# Production Deployment
This guide covers considerations for running BulkSharp in production environments.
## Storage Provider Selection
| Scenario | File Storage | Metadata Storage |
|---|---|---|
| Single server, moderate volume | FileSystem | SQL Server (EF) |
| Multi-server / containerized | S3 | SQL Server (EF) |
| Serverless / ephemeral compute | S3 | SQL Server (EF) |
| Development / testing | InMemory | InMemory |
> **Important**
> In-memory storage is not durable. Never use it in production.
## Database Configuration

### Connection Pooling

BulkSharp uses `IDbContextFactory<T>` to create short-lived `DbContext` instances per repository call. This makes all EF repositories thread-safe for parallel row processing but increases connection pool pressure.
```csharp
services.AddBulkSharp(builder => builder
    .ConfigureOptions(opts => opts.MaxRowConcurrency = 8)
    .UseMetadataStorage(ms => ms.UseSqlServer(sql =>
    {
        sql.ConnectionString = "Server=...;Max Pool Size=200;";
        sql.MaxRetryCount = 5;
        sql.MaxRetryDelay = TimeSpan.FromSeconds(10);
    })));
```

**Sizing guideline:** Set `Max Pool Size` to at least `MaxRowConcurrency * WorkerCount * 2` to avoid pool exhaustion under load.
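The sizing rule can also be computed explicitly in configuration rather than hard-coded; a sketch, assuming `WorkerCount = 4` and a placeholder server name (substitute your real connection details):

```csharp
// Hypothetical sizing: with MaxRowConcurrency = 8 and WorkerCount = 4,
// the guideline gives Max Pool Size >= 8 * 4 * 2 = 64.
const int maxRowConcurrency = 8;
const int workerCount = 4;
int minPoolSize = maxRowConcurrency * workerCount * 2; // 64

services.AddBulkSharp(builder => builder
    .ConfigureOptions(opts => opts.MaxRowConcurrency = maxRowConcurrency)
    .UseMetadataStorage(ms => ms.UseSqlServer(sql =>
    {
        // "Server=..." is a placeholder, not a real connection string.
        sql.ConnectionString = $"Server=...;Max Pool Size={minPoolSize};";
    })));
```

Deriving the pool size from the concurrency settings keeps the two values from drifting apart when one is tuned later.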
### Optimistic Concurrency

`BulkOperation` uses a `RowVersion` column for optimistic concurrency. The EF repository retries up to 5 times on `DbUpdateConcurrencyException`, merging monotonically increasing counters (`ProcessedRows`, `SuccessfulRows`, `FailedRows`) from the database on each retry.

No configuration is needed. If you see frequent concurrency retries in the logs, reduce `MaxRowConcurrency` or increase `FlushBatchSize` to reduce write frequency.
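The two knobs mentioned above can be adjusted together; a sketch (the values are illustrative, not recommendations):

```csharp
services.AddBulkSharp(builder => builder
    .ConfigureOptions(opts =>
    {
        opts.MaxRowConcurrency = 4; // fewer concurrent writers per operation
        opts.FlushBatchSize = 500;  // larger batches => fewer status updates
    }));
```

Both changes reduce how often concurrent writers touch the same `BulkOperation` row, which is what triggers the retries.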
## Scheduler Configuration

### Channels Scheduler

The default `ChannelsScheduler` runs as an `IHostedService` with a bounded channel.
```csharp
builder.UseScheduler(s => s.UseChannels(opts =>
{
    opts.WorkerCount = 4;      // Concurrent operations
    opts.QueueCapacity = 1000; // Bounded queue size
}));
```

- `WorkerCount` controls how many operations process simultaneously. Each worker processes one operation at a time; within each operation, `MaxRowConcurrency` controls row-level parallelism.
- `QueueCapacity` bounds the in-memory queue. If the queue is full, `ScheduleBulkOperationAsync` blocks until space is available.

**Total thread pressure:** `WorkerCount * MaxRowConcurrency` is the maximum number of concurrent row-processing tasks.
## S3 File Storage

### IAM Permissions

The S3 provider requires `s3:PutObject`, `s3:GetObject`, `s3:DeleteObject`, and `s3:ListObjectsV2` on the configured bucket.

### Key Format

Files are stored as `{prefix}{guid}-{sanitizedFileName}`. The prefix is configurable via `S3StorageOptions.Prefix`.
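The key layout can be illustrated as follows; `BuildKey` is a hypothetical helper for illustration, not BulkSharp API:

```csharp
using System;
using System.IO;

static string BuildKey(string prefix, string fileName) =>
    // Mirrors the documented {prefix}{guid}-{sanitizedFileName} layout.
    // Path.GetFileName strips any directory components from the input.
    $"{prefix}{Guid.NewGuid()}-{Path.GetFileName(fileName)}";

// e.g. BuildKey("uploads/", "customers.csv") yields a key like
// "uploads/<guid>-customers.csv"
```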
### Connection Handling

The `S3StorageProvider` wraps `GetObjectResponse` in a custom stream that disposes the response when the stream is closed, preventing HTTP connection pool exhaustion. Callers must dispose the returned stream.
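Because the wrapper stream owns the underlying HTTP response, callers should dispose it deterministically; a sketch, assuming a `GetFileAsync`-style method on the provider (the exact method name may differ in your version):

```csharp
// 'storage' is an injected BulkSharp file-storage provider (hypothetical shape).
await using (Stream stream = await storage.GetFileAsync(fileId, cancellationToken))
{
    // Read or process the stream here. Disposing it closes the S3 response
    // and returns the HTTP connection to the pool.
}
```

`await using` ensures the response is released even if processing throws.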
## Monitoring

### Structured Logging

BulkSharp uses source-generated `LoggerMessage` throughout the Processing layer. Key log events:
| Event | Level | When |
|---|---|---|
| Operation created | Information | New operation submitted |
| Processing started | Information | Worker picks up operation |
| Row validation failed | Warning | Individual row fails validation |
| Processing completed | Information | All rows processed |
| Processing failed | Error | Unhandled exception during processing |
| Step retry | Warning | Step failed, retrying with backoff |
| Concurrency conflict | Warning | Optimistic concurrency retry in EF |
### Diagnostics

`BulkSharpDiagnostics` provides diagnostic event names for integration with `System.Diagnostics.Activity` and OpenTelemetry.
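To export these events, the activity source can be registered with OpenTelemetry; a sketch, where the source name is a placeholder (check `BulkSharpDiagnostics` for the actual value):

```csharp
services.AddOpenTelemetry()
    .WithTracing(tracing => tracing
        // "BulkSharp" is a placeholder source name; use the name
        // exposed by BulkSharpDiagnostics.
        .AddSource("BulkSharp"));
```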
### Health Checks

BulkSharp does not register health checks automatically. For production, consider adding:

```csharp
// Verify database connectivity
services.AddHealthChecks()
    .AddDbContextCheck<BulkSharpDbContext>("bulksharp-db");

// Verify S3 connectivity (custom check)
services.AddHealthChecks()
    .AddCheck("bulksharp-s3", () =>
    {
        // Attempt a lightweight S3 operation
        return HealthCheckResult.Healthy();
    });
```
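The S3 placeholder can be fleshed out as an asynchronous custom check; a sketch using the AWS SDK, where `_bucketName` is whatever bucket you configured for BulkSharp:

```csharp
using Amazon.S3;
using Amazon.S3.Model;
using Microsoft.Extensions.Diagnostics.HealthChecks;

public sealed class S3HealthCheck : IHealthCheck
{
    private readonly IAmazonS3 _s3;
    private readonly string _bucketName; // the bucket configured for BulkSharp

    public S3HealthCheck(IAmazonS3 s3, string bucketName) =>
        (_s3, _bucketName) = (s3, bucketName);

    public async Task<HealthCheckResult> CheckHealthAsync(
        HealthCheckContext context, CancellationToken cancellationToken = default)
    {
        try
        {
            // Lightweight probe: list at most one object in the bucket.
            await _s3.ListObjectsV2Async(
                new ListObjectsV2Request { BucketName = _bucketName, MaxKeys = 1 },
                cancellationToken);
            return HealthCheckResult.Healthy();
        }
        catch (Exception ex)
        {
            return HealthCheckResult.Unhealthy("S3 probe failed", ex);
        }
    }
}

// Registration:
// services.AddHealthChecks().AddCheck<S3HealthCheck>("bulksharp-s3");
```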
## Error Batching

Errors are buffered in memory and flushed to the repository in batches controlled by `FlushBatchSize` (default: 100). This reduces database round-trips during processing.

- After every `FlushBatchSize` rows, pending errors are written and the operation status is updated.
- On operation completion or failure, remaining errors are flushed.
- If the process crashes mid-operation, unflushed errors are lost. The operation will remain in `Processing` status and can be detected by monitoring for stale operations.
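Stale operations can be detected with a periodic query against the metadata store; a sketch, assuming an `Operations` DbSet with `Status` and `StartedAt` columns (names are illustrative — check the actual BulkSharp schema):

```csharp
// Flag operations stuck in Processing for over an hour (threshold is arbitrary).
var cutoff = DateTimeOffset.UtcNow.AddHours(-1);

var stale = await db.Operations
    .Where(op => op.Status == BulkOperationStatus.Processing
              && op.StartedAt < cutoff)
    .ToListAsync(cancellationToken);

foreach (var op in stale)
{
    logger.LogWarning(
        "Operation {OperationId} appears stale (started {StartedAt})",
        op.Id, op.StartedAt);
}
```

Running this from a background service or scheduled job gives you an alerting hook for crashed workers.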
## Scaling Considerations

- **Vertical scaling:** Increase `WorkerCount` and `MaxRowConcurrency` on a single host. Monitor CPU, memory, and database connection usage.
- **Horizontal scaling:** Run multiple instances with the same database and S3 bucket. Each instance runs its own `ChannelsScheduler`. Use an external queue (e.g., SQS) to distribute work if needed — implement a custom `IBulkScheduler` that reads from the queue.
- **Large files:** BulkSharp streams files via `IAsyncEnumerable<T>`, keeping memory usage constant regardless of file size. The `MaxFileSizeBytes` option (default: 100 MB) limits upload size.
## Security

### PII in Error Records

`BulkSharpOptions.IncludeRowDataInErrors` (default: false) controls whether raw row data is serialized into error records. Enabling this is useful for debugging but may store PII in the database.
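If row data is needed for debugging, the flag can be enabled explicitly; a sketch (remember this may persist PII in the metadata database):

```csharp
services.AddBulkSharp(builder => builder
    .ConfigureOptions(opts =>
    {
        // Off by default; enabling stores raw row data in error records.
        opts.IncludeRowDataInErrors = true;
    }));
```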
### File Storage

- Uploaded files are stored as-is; BulkSharp applies no encryption at rest. Use S3 server-side encryption, or encrypted EBS volumes for the FileSystem provider.
- File names are sanitized via `Path.GetFileName()` to prevent path traversal attacks.
### Input Validation

- `MaxFileSizeBytes` limits upload size at the stream level.
- CSV and JSON parsers use streaming and do not load entire files into memory.
- The pre-submission validation endpoint validates metadata and the first row without processing.