Thoughts On Document Storage and Workflow?

One of the requirements of my system will be to track an application process which will involve users submitting documents to the system that employees would then review/approve/reject.

Typical stuff: proof of insurance, proof of citizenship, proof of address, results of a credit check, etc.

Again, version tracking is important, as is the ability to resell/license the solution, so I am advising against SharePoint for document management (I know there is a free version).

The DMS really just needs to return the collection of documents for a given “case” and show the versions as of a specific point in time. We do not need or want the other features of SharePoint.

Are these documents something you would include in the same event store as everything else?

Basically, I think there is an “Application Process Saga” here (workflow) that can have multiple parallel steps and some fixed, sequential steps - order and type of steps will vary from customer to customer so I am even considering a full-blown workflow solution to manage this.

What are the size limitations one needs to be aware of? Chunking/streaming concerns? Other general thoughts?

Thanks,
Will

Hi Will,

Generally you’d want to store documents of that nature outside the event store, and use the claim check pattern (http://eaipatterns.com/StoreInLibrary.html) to get to them later. Although we’ve tested with fairly large events (around 4-5MB IIRC), it’s not optimal to be storing events of that size.
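
To make the claim check idea concrete, here is a minimal sketch in Python. Everything in it is an assumption for illustration: `BlobStore` is an in-memory stand-in for whatever file store you pick, and the `DocumentSubmitted` event shape and its field names are hypothetical, not anything the event store prescribes. The point is only that the event carries a small key (the claim check) instead of the document bytes.

```python
import hashlib
import json


class BlobStore:
    """In-memory stand-in for a real file store (hypothetical)."""

    def __init__(self):
        self._blobs = {}

    def put(self, data: bytes) -> str:
        # Content-addressed key: the same bytes always map to the same key.
        key = hashlib.sha256(data).hexdigest()
        self._blobs[key] = data
        return key

    def get(self, key: str) -> bytes:
        return self._blobs[key]


def document_submitted(store, case_id, doc_type, filename, data):
    """Check the document in, then build a small event that references it."""
    key = store.put(data)
    return {
        "type": "DocumentSubmitted",  # hypothetical event name
        "case_id": case_id,
        "doc_type": doc_type,
        "filename": filename,
        "size_bytes": len(data),
        "blob_key": key,  # the claim check; the bytes stay outside the event
    }


store = BlobStore()
payload = b"%PDF-1.4 pretend this is a multi-megabyte scan"
event = document_submitted(
    store, "case-42", "proof_of_insurance", "policy.pdf", payload
)

# Only this small JSON body would be written to the event store.
serialized = json.dumps(event)

# Redeem the claim check later to get the original bytes back.
assert store.get(event["blob_key"]) == payload
```

Using a content hash as the key is a design choice, not a requirement: it makes re-uploads of identical bytes idempotent, but any unique identifier would work as the claim check.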

Cheers,

James

As James said, I would probably put the files themselves in a
distributed file store, then put events into the event store with
links to the original documents (e.g. the events describe the action
that caused the file to be put there). A distributed file system will
do a better job than any database at storing large numbers of files,
and these offerings are also very mature.
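
A rough sketch of how events with links could answer the "documents for a case at a point in time" question, in Python. The event names (`DocumentSubmitted`, `DocumentReviewed`), the field names, and the `documents_as_of` helper are all made up for illustration; treat this as one possible projection, not a recommended schema.

```python
from datetime import datetime


def documents_as_of(events, case_id, as_of):
    """Replay a case's events up to `as_of`; return the latest version per doc type."""
    docs = {}
    for e in sorted(events, key=lambda e: e["at"]):
        if e["case_id"] != case_id or e["at"] > as_of:
            continue
        if e["type"] == "DocumentSubmitted":
            previous = docs.get(e["doc_type"], {"version": 0})
            docs[e["doc_type"]] = {
                "version": previous["version"] + 1,
                "blob_key": e["blob_key"],  # link into the file store
                "status": "pending",
            }
        elif e["type"] == "DocumentReviewed" and e["doc_type"] in docs:
            docs[e["doc_type"]]["status"] = e["outcome"]
    return docs


# Hypothetical history: a rejected proof of address, later resubmitted.
events = [
    {"type": "DocumentSubmitted", "case_id": "case-42",
     "doc_type": "proof_of_address", "blob_key": "a1",
     "at": datetime(2024, 1, 5)},
    {"type": "DocumentSubmitted", "case_id": "case-99",
     "doc_type": "proof_of_address", "blob_key": "zz",
     "at": datetime(2024, 1, 6)},
    {"type": "DocumentReviewed", "case_id": "case-42",
     "doc_type": "proof_of_address", "outcome": "rejected",
     "at": datetime(2024, 1, 7)},
    {"type": "DocumentSubmitted", "case_id": "case-42",
     "doc_type": "proof_of_address", "blob_key": "b2",
     "at": datetime(2024, 1, 9)},
]

before_resubmit = documents_as_of(events, "case-42", datetime(2024, 1, 8))
after_resubmit = documents_as_of(events, "case-42", datetime(2024, 1, 10))
```

Here `before_resubmit` reflects version 1 with its rejection, and `after_resubmit` reflects version 2 awaiting review. The files themselves never move; only the links in the events are replayed.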

Greg

Greg

Have you used a particular “distributed file system” in the past that you think will work well alongside the ES?

Depending on requirements/hosting etc., S3 would be a good choice (if
cloud hosted). Azure's offering is also good. If hosting internally
there are a ton of options. How much data is being discussed?