dora-storage upload rewrite

Flow contracts, explained in place

Every step explains the decision it makes, the information it needs, and the invariant it protects.

client
storage api
postgres
object store
dora-repo

Session upload

Multipart upload where the client controls part order, while storage records enough state to validate, retry, and clean safely.

client: Create upload session
A session is an upload promise. The client declares the file before bytes arrive: final size, repo destination, MIME hint, and conflict behavior. Storage asks for MIME up front because compression must be chosen consistently before blocks are written; guessing after upload would make retries and reads depend on late, unstable inference.
Input
  • parentPath: Public request field; repo parent path for the later attach.
  • requestedName: Public request field; visible file name and extension fallback for codec choice.
  • expectedSize: Public request field; final logical byte count, not upload progress.
  • mimeType: Public request field; optional compression hint captured before block writes.
  • conflictPolicy: Public request field; fail or auto_index.
  • Storage grant: Middleware supplies user_id, session_id, and root_fs_id.
Guarantee: All later decisions use this contract instead of trusting request timing, client retries, or object-store state.
Failure: Invalid destination, missing grant, impossible size, or unsupported shape fails before any object can be written.
postgres: Persist session row
No file is created yet. The session row is intent, auth context, and recovery state. subject_id proves who owns the upload; auth_session_id prevents a stale browser or another login context from completing it; file_id stays empty until storage can prove the bytes form a whole file.
Input
  • id: Generated session id returned as sessionId in HTTP.
  • subject_id: Stored from grant.user_id.
  • auth_session_id: Stored from grant.session_id; must match later calls.
  • root_id: Stored from grant.root_fs_id.
  • parent_path: Repo parent path copied from the request.
  • requested_name: Repo file name copied from the request.
  • expected_size: Final logical size copied from expectedSize.
  • mime_type: Optional MIME hint copied from mimeType.
  • codec: Derived block encoding decision.
  • expires_at: TTL boundary for abandoned mutable upload.
  • state: Starts as CREATED.
Guarantee: The row is the single source of truth for mutable upload state, so status, part upload, complete, abort, and maintenance see the same lifecycle.
Failure: Unknown or expired handles later become session errors instead of leaking partial file state.
storage api: Claim a part
A part is a numbered slice of the promised file. partNo is the public query field for a zero-based slice index. Part 0 is the first block, part 1 is the next block, and so on. Numbering is the ordering contract: clients may upload in parallel, retry any slice, and arrive out of order without storage ever using arrival time as file order.
Input
  • :session_id: Path parameter converted into an Eid.
  • partNo: Public query field mapped to internal part_no.
  • Request body: Streaming bytes for exactly this slice.
  • expected_size: Stored session promise used to reject out-of-range parts.
Guarantee: The claim checks session state atomically, so abort, expiry, and complete become hard boundaries rather than advisory flags.
Failure: Terminal, expired, finalizing, or out-of-range uploads stop before object storage receives a new write.
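
As an illustration of the out-of-range check in this step, the sketch below derives the logical length a part must have from expected_size and a zero-based part_no. The BLOCK_CAPACITY value and the function name are assumptions made for the example; this document does not state the real block size or how a zero-byte file is handled.

```rust
/// Hypothetical block size; the real BLOCK_CAPACITY value is not given here.
const BLOCK_CAPACITY: u64 = 4 * 1024 * 1024;

/// Returns the logical byte length expected for `part_no`, or `None` when the
/// part lies outside the range implied by `expected_size`.
fn expected_part_len(expected_size: u64, part_no: u64) -> Option<u64> {
    // checked_mul guards against an absurd part number overflowing the offset.
    let offset = part_no.checked_mul(BLOCK_CAPACITY)?;
    if offset >= expected_size {
        // Assumption: a part starting at or past the promised size is rejected.
        return None;
    }
    // Every part is full-sized except possibly the last one.
    Some((expected_size - offset).min(BLOCK_CAPACITY))
}
```

With these assumed values, a 10 MiB promise accepts parts 0, 1, and 2 (the last one 2 MiB long) and rejects part 3 before any object write would start.
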
postgres: Insert part claim row
The part row is the cleanup anchor. Storage records a fresh block_id before object upload starts. That order is intentional: object storage is treated as dumb durable bytes, while Postgres owns identity, ownership, and cleanup. If the worker dies mid-write, maintenance still knows the exact key to delete.
Input
  • session_id: Part-row foreign key to the upload session.
  • part_no: Primary-key half that prevents duplicate slice claims.
  • block_id: Fresh object identity generated before streaming.
  • state: Inserted as UPLOADING when the session is still mutable and not expired.
Guarantee: One request owns the part write; same-part concurrency becomes either an idempotent match or a conflict, not two competing objects.
Failure: An existing uploaded part is compared by content proof instead of overwritten silently.
object store: Write block bytes
A block is the physical object. The logical slice becomes one object under block_id. MIME type and filename extension are used here because this is the only moment storage can cheaply choose a block codec. Text-like content gets zstd; already-compressed media, archives, and unknown content stay raw to avoid spending CPU to make output larger.
Input
  • block_id: Object-store key reserved by Postgres.
  • Request stream: Logical user bytes read for this one block.
  • codec: none or zstd, selected from mime_type and requested_name.
  • Expected part length: Computed from expected_size, part_no, and BLOCK_CAPACITY; validated after the write.
Guarantee: Hash and logical length describe the user bytes; stored length and codec describe the object bytes. Reads need both halves: stored length and codec to decode the object, hash and logical length to verify the decoded bytes.
Failure: Wrong size fails the part; the DB anchor remains available for cleanup.
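
The codec decision described in this step could look roughly like the sketch below: a MIME hint decides when present, and the file extension is the fallback. The lists of text-like MIME types and extensions are assumptions for illustration only; the document says text-like content gets zstd and everything else stays raw, but it does not enumerate the actual rules.

```rust
use std::path::Path;

#[derive(Debug, Clone, Copy, PartialEq, Eq)]
enum Codec {
    None,
    Zstd,
}

/// Hypothetical list of MIME essences treated as text-like.
fn is_text_like_mime(essence: &str) -> bool {
    essence.starts_with("text/")
        || matches!(essence, "application/json" | "application/xml")
        || essence.ends_with("+json")
        || essence.ends_with("+xml")
}

/// Hypothetical list of extensions treated as text-like.
fn is_text_like_extension(ext: &str) -> bool {
    matches!(ext, "txt" | "md" | "csv" | "json" | "xml" | "log")
}

/// Chooses a block codec once, before any block is written.
fn choose_codec(mime_type: Option<&str>, requested_name: &str) -> Codec {
    let text_like = match mime_type {
        // Strip parameters such as "; charset=utf-8" before matching.
        Some(mime) => {
            let essence = mime.split(';').next().unwrap_or("").trim().to_ascii_lowercase();
            is_text_like_mime(&essence)
        }
        // No MIME hint: fall back to the extension of the requested name.
        None => Path::new(requested_name)
            .extension()
            .and_then(|ext| ext.to_str())
            .map(|ext| is_text_like_extension(&ext.to_ascii_lowercase()))
            .unwrap_or(false),
    };
    if text_like { Codec::Zstd } else { Codec::None }
}
```

Unknown content falls through to Codec::None, which matches the stated goal of not spending CPU on data that is unlikely to shrink.
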
postgres: Mark part uploaded
Ownership is checked again after upload. Object writes are slower than DB state changes, so the world can change while bytes stream. The final mark verifies that the worker still owns the same block_id and the session still accepts parts before durable metadata starts trusting that object.
Input
  • block_id: Ownership token claimed before streaming.
  • hash: SHA-256 hex content proof for plain bytes.
  • plain_len: User-visible byte length.
  • stored_len: Actual object byte length after codec.
  • codec: How reads must decode the object.
Guarantee: The uploaded part row becomes a compact proof: this numbered slice has these bytes and this exact object.
Failure: If ownership or state changed, the mark is rejected and cleanup can remove the untrusted object.
Same part, same bytes: Retry succeeds because the proof matches; clients can recover from network drops without re-planning the upload.
Same part, different bytes: Retry fails because accepting different bytes under the same part number would corrupt the declared file order.
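
The two retry cases above reduce to a comparison of content proofs. The following is a minimal sketch under the assumption that the proof is the pair of hash and plain_len stored on the part row; the type and function names are hypothetical.

```rust
/// Content proof for one uploaded part (field names follow the part row).
#[derive(Debug, PartialEq, Eq)]
struct PartProof {
    hash: String, // SHA-256 hex of the plain bytes
    plain_len: u64,
}

#[derive(Debug, PartialEq, Eq)]
enum RetryOutcome {
    /// Same part, same bytes: the retry is acknowledged without a new write.
    IdempotentMatch,
    /// Same part, different bytes: the retry is rejected.
    Conflict,
}

/// Compares the proof already stored for a part with the proof of a retry.
fn check_retry(stored: &PartProof, incoming: &PartProof) -> RetryOutcome {
    if stored == incoming {
        RetryOutcome::IdempotentMatch
    } else {
        RetryOutcome::Conflict
    }
}
```
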

Complete and attach

Completion freezes mutable upload state, proves the file shape, commits a storage manifest, then asks repo to expose it.

storage api: Claim finalization
Finalizing means “stop accepting changes.” FINALIZING is a barrier and a lease. It blocks late parts, abort, and expiry while one caller builds the final file, but it can be recovered if that caller dies before committing anything durable.
Input
  • :session_id: Path parameter converted into an Eid.
  • state: Stored session state; must be completable under UploadSessionState::can_complete.
  • Scope check: HTTP verifies root_id, subject_id, and auth_session_id against the storage grant before storage finalizes.
  • upload_session_parts: Rows locked and validated after the finalization claim.
Guarantee: Only one caller can construct the file manifest; concurrent complete calls observe the same lifecycle instead of duplicating work.
Failure: If validation fails, the session returns to uploadable state; if the worker dies, maintenance can release the stale lease.
postgres: Validate uploaded parts
Expected size becomes a proof. The original byte-count promise now decides what “complete” means. Storage derives required part count, each offset, and the only short block allowed: the final one. That catches skipped middle parts, duplicate tails, short non-final blocks, and oversized uploads before any file exists.
Input
  • part_no: Determines block offset in the final file.
  • hash: Per-part SHA-256 content proof.
  • plain_len: Bytes counted toward expected_size.
  • stored_len: Bytes stored in object storage.
  • codec: Decode rule for this block.
Guarantee: Offsets are computed from part numbers, not DB insertion order. Total logical bytes exactly match the promised file length.
Failure: Missing or badly sized parts reject complete and reopen the session for repair instead of committing a broken manifest.
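
A hedged sketch of this validation follows. It derives the required part count from expected_size, then checks that every part number is present and has the one length its position allows. The BLOCK_CAPACITY value, the error variants, and the function name are assumptions for illustration; only part_no and plain_len are taken from the row fields listed above.

```rust
use std::collections::BTreeMap;

/// Hypothetical block size; the real value is not given in this document.
const BLOCK_CAPACITY: u64 = 4 * 1024 * 1024;

struct UploadedPart {
    part_no: u64,
    plain_len: u64,
}

#[derive(Debug, PartialEq, Eq)]
enum ValidationError {
    MissingPart(u64),
    WrongLength { part_no: u64, expected: u64, found: u64 },
    UnexpectedParts { required: u64, found: u64 },
}

fn validate_parts(expected_size: u64, parts: &[UploadedPart]) -> Result<(), ValidationError> {
    // Ceiling division: a trailing remainder needs one extra (short) block.
    let required = expected_size / BLOCK_CAPACITY + u64::from(expected_size % BLOCK_CAPACITY != 0);
    // Key by part number so row order and arrival order are irrelevant.
    let by_no: BTreeMap<u64, u64> = parts.iter().map(|p| (p.part_no, p.plain_len)).collect();
    for part_no in 0..required {
        let found = *by_no.get(&part_no).ok_or(ValidationError::MissingPart(part_no))?;
        // Only the final part may be shorter than BLOCK_CAPACITY.
        let expected = (expected_size - part_no * BLOCK_CAPACITY).min(BLOCK_CAPACITY);
        if found != expected {
            return Err(ValidationError::WrongLength { part_no, expected, found });
        }
    }
    // Any row beyond the required range means the upload is oversized.
    if by_no.len() as u64 != required {
        return Err(ValidationError::UnexpectedParts { required, found: by_no.len() as u64 });
    }
    Ok(())
}
```

Because each part has exactly one acceptable length, passing this check also implies that the logical lengths sum to expected_size.
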
postgres: Create committed file
A file is a manifest over blocks. The file row stores final content identity; reflinks map each block to an offset. This is why uploaded parts are not “the file” by themselves: a committed manifest is the boundary where independent block objects become one readable logical file.
Input
  • files.hash: Whole-file content identity derived from ordered part hashes.
  • files.len: Whole-file logical byte count, equal to expected_size.
  • block_file_xrefs: The reflink rows; block_id, file_id, and offset mappings.
  • upload_session_parts: Validated source rows that already carry block ids and proofs.
Guarantee: File row, reflinks, and session file_id commit in one transaction, avoiding a file without a session or a session pointing at nothing.
Failure: Crash before commit leaves no file; crash after commit leaves enough session state to attach or retry attach.
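
To make the manifest idea concrete, the sketch below turns validated part rows into block-to-offset mappings. It assumes every non-final block holds exactly BLOCK_CAPACITY bytes, so an offset is part_no times BLOCK_CAPACITY. The struct shapes, the integer file id, and the block size are hypothetical simplifications; the real ids are Eid values and the real rows live in Postgres.

```rust
/// Hypothetical block size; the real value is not given in this document.
const BLOCK_CAPACITY: u64 = 4 * 1024 * 1024;

struct ValidatedPart {
    part_no: u64,
    block_id: String,
}

#[derive(Debug, PartialEq, Eq)]
struct BlockFileXref {
    block_id: String,
    file_id: u64,
    offset: u64,
}

/// Builds one manifest row per part, ordered by offset in the final file.
fn build_xrefs(file_id: u64, parts: &[ValidatedPart]) -> Vec<BlockFileXref> {
    let mut xrefs: Vec<BlockFileXref> = parts
        .iter()
        .map(|p| BlockFileXref {
            block_id: p.block_id.clone(),
            file_id,
            // The offset comes from the part number, never from row order.
            offset: p.part_no * BLOCK_CAPACITY,
        })
        .collect();
    xrefs.sort_by_key(|x| x.offset);
    xrefs
}
```
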
dora-repo: Attach upload
Attach makes storage visible in the repo tree. Storage owns bytes; repo owns the namespace users browse. The saved root_id, parent_path, requested_name, and conflict policy are used here because only repo can decide whether the destination is valid, whether a duplicate name may be auto-renamed, and what node id becomes visible.
Input
  • file_id: Committed storage file to expose.
  • root_fs_id: Repo root from the upload session’s root_id.
  • parent_path: Repo parent path from the upload session.
  • requested_name: Requested visible file name.
  • file_hash: Whole-file hash sent to repo.
  • file_size: Whole-file logical size sent to repo.
  • conflict_policy: Proto enum: fail or auto-index.
Guarantee: Success returns repo entry id and final path, binding storage identity to repo identity.
Failure: Transient or ambiguous failures are treated as possibly side-effecting, so storage keeps bytes and retries instead of deleting a file repo may already reference.
postgres: Persist attach result
Pending is a real state, not a UI shrug. ATTACHED means the file is visible in repo. ATTACH_PENDING means storage has committed bytes and attach was inconclusive or retryable. FAILED means a terminal repo/storage decision says the upload cannot become visible.
Input
  • entry_id: Repo response field saved as attached_entry_id.
  • path: Repo response field saved as attached_path.
  • attach_error: Error string saved when repo attach fails.
  • Error class: Transient tonic codes become ATTACH_PENDING; non-transient errors become FAILED.
Guarantee: The user sees a truthful state: done, still recovering, or failed. No state asks the client to guess whether to upload again.
Failure: Request-time complete may return pending; maintenance owns retry loops so duplicate client clicks do not multiply attach attempts.
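
The mapping from an attach result to a persisted state can be sketched as below. The RpcCode enum is a stand-in for gRPC status codes so that the example stays dependency-free, and which codes count as transient is an assumption; the document only says that transient tonic codes become ATTACH_PENDING and other errors become FAILED.

```rust
/// Stand-in for gRPC status codes; a hypothetical subset for illustration.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
enum RpcCode {
    Unavailable,
    DeadlineExceeded,
    AlreadyExists,
    InvalidArgument,
}

struct RepoEntry {
    entry_id: String,
    path: String,
}

struct AttachError {
    code: RpcCode,
    message: String,
}

#[derive(Debug, PartialEq, Eq)]
enum AttachOutcome {
    Attached { attached_entry_id: String, attached_path: String },
    AttachPending { attach_error: String },
    Failed { attach_error: String },
}

/// Assumption: these codes may hide a side effect, so the attach is retried.
fn is_transient(code: RpcCode) -> bool {
    matches!(code, RpcCode::Unavailable | RpcCode::DeadlineExceeded)
}

fn classify_attach(result: Result<RepoEntry, AttachError>) -> AttachOutcome {
    match result {
        Ok(entry) => AttachOutcome::Attached {
            attached_entry_id: entry.entry_id,
            attached_path: entry.path,
        },
        Err(err) if is_transient(err.code) => AttachOutcome::AttachPending { attach_error: err.message },
        Err(err) => AttachOutcome::Failed { attach_error: err.message },
    }
}
```
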

Simple upload

A one-request upload hides multipart mechanics from the client but keeps the same storage, attach, and recovery rules.

object store: Chunk request body
Simple upload still writes blocks. The client sends one stream instead of numbered parts, but storage still chunks it into block objects so reads, hashing, compression, and size limits share the same machinery. MIME/name still matter because the same codec choice is applied before each block is written.
Input
  • Request body: Single stream from the client.
  • parentPath: Public query field for repo parent path.
  • requestedName: Public query field for repo file name.
  • mimeType: Public query field used as compression hint.
  • conflictPolicy: Public query field; defaults to fail.
  • Storage grant: Middleware supplies user_id, session_id, and root_fs_id.
  • max_file_size_bytes: Upload config guard for the streamed body.
Guarantee: Written block ids are tracked in request memory until a DB transaction anchors them; before that, object storage has no durable owner.
Failure: Stream, size, codec, or hash failure deletes objects written by this request because no later cleanup row exists yet.
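
The request-local tracking described above can be sketched as follows: blocks are written as the stream is read, their ids are remembered in memory, and any failure deletes exactly those ids. The in-memory MemoryStore, the tiny BLOCK_CAPACITY, and the function names are hypothetical stand-ins; hashing and codec handling are left out to keep the sketch short.

```rust
use std::collections::HashMap;
use std::io::{self, Read};

/// Tiny hypothetical block size so the example is easy to follow.
const BLOCK_CAPACITY: usize = 4;

/// Hypothetical in-memory stand-in for the object store.
#[derive(Default)]
struct MemoryStore {
    objects: HashMap<u64, Vec<u8>>,
    next_id: u64,
}

impl MemoryStore {
    fn put(&mut self, bytes: Vec<u8>) -> u64 {
        let id = self.next_id;
        self.next_id += 1;
        self.objects.insert(id, bytes);
        id
    }

    fn delete(&mut self, id: u64) {
        self.objects.remove(&id);
    }
}

/// Reads the body block by block, recording every written id in `written`.
fn write_blocks(store: &mut MemoryStore, body: &mut impl Read, max_file_size_bytes: u64, written: &mut Vec<u64>) -> io::Result<()> {
    let mut total: u64 = 0;
    loop {
        let mut block = Vec::with_capacity(BLOCK_CAPACITY);
        body.by_ref().take(BLOCK_CAPACITY as u64).read_to_end(&mut block)?;
        if block.is_empty() {
            return Ok(()); // end of stream
        }
        total += block.len() as u64;
        if total > max_file_size_bytes {
            return Err(io::Error::new(io::ErrorKind::InvalidData, "body exceeds max_file_size_bytes"));
        }
        written.push(store.put(block));
    }
}

/// Returns the written block ids, or deletes them all if anything failed.
fn chunk_body(store: &mut MemoryStore, mut body: impl Read, max_file_size_bytes: u64) -> io::Result<Vec<u64>> {
    let mut written = Vec::new();
    match write_blocks(store, &mut body, max_file_size_bytes, &mut written) {
        Ok(()) => Ok(written),
        Err(err) => {
            // No DB row names these objects yet, so this request must clean up.
            for block_id in written {
                store.delete(block_id);
            }
            Err(err)
        }
    }
}
```
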
postgres: Create file plus session
The retry anchor is created immediately. Simple upload creates the committed file and a finalized upload session in one transaction. The user did not ask for a “session,” but storage needs one as a durable recovery record if repo attach fails after bytes are already committed.
Input
  • WrittenBlock: Request-local block ids, hashes, lengths, and codecs.
  • files.hash: Whole-file identity computed from written blocks.
  • files.len: Total logical size computed from written blocks.
  • subject_id: Stored from the request’s grant.user_id for future scope checks.
  • auth_session_id: Stored from the request’s grant.session_id for future scope checks.
  • root_id: Stored from the request’s grant.root_fs_id.
  • parent_path: Repo target copied into the finalized session.
  • requested_name: Repo file name copied into the finalized session.
  • conflict_policy: Repo duplicate-name behavior.
Guarantee: Committed file and finalized session appear together, so maintenance can always tell whether bytes still need repo attachment.
Failure: DB failure triggers object cleanup for request-local blocks because the recovery anchor never became durable.
dora-repo: Attach once
Same attach rule as session complete. “Simple” only describes the client API. Once bytes are committed, the same cross-service risk exists: repo may accept the attach and the network may fail before storage hears back. The finalized session is what makes that ambiguity recoverable.
Input
  • file_id: Committed storage file from this request.
  • root_fs_id: Repo root from the upload session’s root_id.
  • parent_path: Repo parent path.
  • requested_name: Requested visible file name.
  • file_hash: Committed file hash.
  • file_size: Committed file logical size.
  • conflict_policy: Attach behavior on collision.
Guarantee: Success returns ATTACHED immediately with the repo-visible entry.
Failure: Transient or ambiguous failure leaves ATTACH_PENDING; terminal conflict leaves a failed upload that can be cleaned after retention.

Abort, expiry, cleanup, recovery

Maintenance follows DB state, treats object storage as key-value bytes, and deletes metadata only after every referenced object is settled.

storage api: Abort or expire
Abort only applies before a file exists. Abort and TTL are upload-cleanup tools, not delete-file tools. Once finalization begins, storage may already have a committed manifest or a repo attach in flight, so abandoning the session could orphan or erase bytes that are already meaningful.
Input
  • :session_id: Abort path parameter, or DB-selected session during expiry.
  • state: Must be CREATED or UPLOADING.
  • expires_at: TTL cutoff for automatic expiry.
Guarantee: Finalizing, finalized, attaching, attached, and pending sessions are not removed by upload expiry.
Failure: Late abort after finalization claim returns invalid state and leaves recovery to completion/maintenance.
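
The state rule for abort and expiry is small enough to sketch directly. The variant list below is assembled from state names mentioned in this document and may not match the real UploadSessionState; can_abort and should_expire are hypothetical helpers, and treating expires_at as an inclusive cutoff is an assumption.

```rust
/// Hypothetical reconstruction of the session states named in this document.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
enum UploadSessionState {
    Created,
    Uploading,
    Finalizing,
    Finalized,
    Attaching,
    Attached,
    AttachPending,
    Failed,
    Aborted,
    Expired,
}

impl UploadSessionState {
    /// Abort and expiry apply only while no file can exist yet.
    fn can_abort(self) -> bool {
        matches!(self, Self::Created | Self::Uploading)
    }
}

/// Expiry is the same rule plus a TTL check (timestamps as epoch seconds).
fn should_expire(state: UploadSessionState, expires_at: u64, now: u64) -> bool {
    state.can_abort() && expires_at <= now
}
```
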
postgres: Recover stale claims
Claims are leases without a separate lease table. FINALIZING and ATTACHING mean “someone is working.” They are not permanent states. Maintenance only treats them as stale after enough age has passed, so it does not steal active work but can recover from crashed workers.
Input
  • state: FINALIZING or ATTACHING.
  • updated_at: Age signal compared with the stale threshold.
  • file_id: Presence decides whether recovery returns to upload or attach.
Guarantee: No-file finalization returns to uploadable state; committed files waiting on repo become pending attach.
Failure: Fresh claims are left alone even if they look slow, because correctness beats eager retry.
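
As a rough sketch of the lease recovery described above, the decision depends on two inputs: how old the claim is and whether a file_id exists. The function name, the integer id type, and the use of epoch seconds are assumptions made for the example.

```rust
#[derive(Debug, PartialEq, Eq)]
enum Recovery {
    /// The claim is still fresh; another worker may be active.
    LeaveAlone,
    /// No file was committed, so the session can accept parts again.
    ReturnToUploadable,
    /// A file exists, so only the repo attach remains to be retried.
    MarkAttachPending,
}

/// Decides what maintenance does with a FINALIZING or ATTACHING session.
fn recover_stale_claim(updated_at: u64, now: u64, stale_after_secs: u64, file_id: Option<u64>) -> Recovery {
    // saturating_sub tolerates clock skew where updated_at is ahead of now.
    let age = now.saturating_sub(updated_at);
    if age < stale_after_secs {
        return Recovery::LeaveAlone;
    }
    match file_id {
        Some(_) => Recovery::MarkAttachPending,
        None => Recovery::ReturnToUploadable,
    }
}
```
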
dora-repo: Retry pending attach
Retry is centralized in maintenance. Users do not need to keep calling complete, and API handlers do not spin retry loops. Any finalized file that lacks a confirmed repo entry can be retried from durable DB state with the same stored root, path, name, and conflict policy.
Input
  • state: ATTACH_PENDING or FINALIZED sessions selected by maintenance batch.
  • file_id: Committed storage file to expose.
  • root_id: Saved repo root target.
  • parent_path: Saved repo parent path.
  • requested_name: Saved requested file name.
  • conflict_policy: Saved attach collision behavior.
  • attempts: Counter incremented after attach failures.
Guarantee: Ambiguous attach errors keep a retry path instead of deleting stored bytes or asking the client to decide what happened.
Failure: Terminal attach failures move to failed state and become cleanup candidates after retention.
object store: Delete exact objects
No object-store listing means no guessing. Cleanup deletes only keys named by DB rows. This avoids expensive and eventually-consistent bucket scans, and it makes cleanup auditable: every delete has a reason in Postgres. If object deletion fails, the row stays because it is the only durable pointer to retry.
Input
  • block_id: Exact object key to delete.
  • upload_session_parts: Part rows selected from aborted, expired, failed, or stale in-flight sessions.
  • block_file_xrefs: Failed finalized-file manifest rows used to discover committed blocks.
  • Candidate state: Cleanup only acts on states the DB selected as safe for deletion.
Guarantee: Object delete success or NotFound is required before DB anchors disappear; NotFound is safe because the desired end state already holds.
Failure: Delete failure keeps metadata for retry instead of hiding a possible storage leak.
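
The delete rule above can be sketched as a filter over the block ids named by DB rows: a row may be removed only when its object delete returned success or NotFound. The DeleteResult enum and the closure-based object-store call are hypothetical stand-ins for the real client.

```rust
/// Hypothetical outcome of deleting one object by exact key.
enum DeleteResult {
    Deleted,
    NotFound,
    Failed(String),
}

/// The desired end state holds when the object is gone, however it got there.
fn is_settled(result: &DeleteResult) -> bool {
    matches!(result, DeleteResult::Deleted | DeleteResult::NotFound)
}

/// Deletes each named object and returns the block ids whose DB rows must
/// stay, because they are the only durable pointer left for a retry.
fn rows_to_keep<F>(block_ids: &[&str], mut delete_object: F) -> Vec<String>
where
    F: FnMut(&str) -> DeleteResult,
{
    let mut keep = Vec::new();
    for &block_id in block_ids {
        if !is_settled(&delete_object(block_id)) {
            keep.push(block_id.to_string());
        }
    }
    keep
}
```

Only the ids that are not returned would then be eligible for metadata removal in the final step.
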
postgres: Remove metadata anchors
Metadata is deleted last. Part rows, file rows, and block rows are removed only after referenced objects are gone. That preserves the core invariant: every live object is either reachable from a committed file or still named by cleanup metadata.
Input
  • upload_sessions.id: Upload metadata to retire.
  • upload_session_parts: Mutable-upload anchors already cleaned in object storage.
  • files: Failed finalized-file metadata safe to remove after object deletion.
  • blocks: Block metadata safe to remove after object deletion.
  • Object delete result: Success or NotFound for every referenced object.
Guarantee: No reachable file loses bytes; no object loses its cleanup key before deletion; failed uploads stop consuming DB attention after cleanup finishes.
Failure: Attached files, pending attach, and active in-flight uploads are excluded because they still represent meaningful user data.