31. July 2012 16:40
A follower on Twitter, @red_square, ran the tests from my previous blog post using MongoDB as the storage engine, which produced some interesting event read rates.
I incorporated this into my test:
Compared to the RDBMSs, these read speeds are blistering: 7 times faster than SQL Server and over 20 times faster than PostgreSQL. There is a cost to this, and we are comparing apples and oranges here. There is plenty on the web comparing MongoDB and other NoSQL storage tech to the traditional options, so I'm not going to regurgitate it here.
For my case, I think I'd still like to present the option of using an RDBMS to our customers. I think they are just more comfortable with it. But it does open the door to an interesting optimization strategy. There are cases, such as deploying a new projection schema, upgrading an existing one, recovering from a crash, or replaying to a point in time for analysis, where you want a pipeline that can rebuild the projections as quickly as possible. So using MongoDB as a secondary event store (with lower reliability requirements), giving you a super-fast read pipeline, could be feasible. This secondary store can be a local async mirror of the primary store that will get you 99% of the events required for a projection store rebuild (it may not have 100% of the events because the mirror is asynchronous). The remaining events can then come from the slower primary event store at the end of the rebuild step. If the MongoDB mirror is ever lost, such as in disaster recovery, it can simply be re-mirrored. I'd hope this would be a very rare occurrence.
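The two-phase rebuild described above can be sketched in a few lines. This is a minimal illustration, not the actual implementation: the store shapes, names, and the in-memory stand-ins for MongoDB and the RDBMS are all hypothetical; the only thing it demonstrates is the bulk-read-then-top-up ordering.

```python
# Sketch of a two-phase projection rebuild: bulk-read from a fast async
# mirror, then fetch only the missing tail from the authoritative primary.
# Stores are modelled as lists of (sequence, event) tuples, sequence ascending;
# in reality the mirror would be MongoDB and the primary an RDBMS.

def rebuild_projection(mirror, primary, apply_event):
    last_seq = 0
    for seq, event in mirror:       # fast bulk phase (mirror may lag)
        apply_event(event)
        last_seq = seq
    for seq, event in primary:      # slow phase: only events the mirror missed
        if seq > last_seq:
            apply_event(event)
            last_seq = seq
    return last_seq                 # highest sequence applied

# Usage: the mirror is one event behind the primary.
primary = [(1, "a"), (2, "b"), (3, "c"), (4, "d")]
mirror = primary[:3]
seen = []
rebuild_projection(mirror, primary, seen.append)
# seen == ["a", "b", "c", "d"]
```

The bulk of the reads hit the fast mirror; the primary is only touched for the tail, so its slower read rate matters far less.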
(Big thanks to Steve Flitcroft for the MongoDB help)
Edit: The specific numbers are not representative of real world numbers. The interesting bit is the relative difference between them, ceteris paribus.
30. June 2012 11:31
(Cross-posted and edited from the DDD/CQRS group)
Edit: Some background info... A Domain Event is expected to be around forever. This means that we need to be able to read and deserialize events that could be five, seven, or ten years old. Think of them as little file schemas - we need to retain the knowledge of that schema and keep it intact for the lifetime of the data, which may be longer than the application itself.
So, I think this will work for BEAM and our particular requirements - I'm certainly not prescribing this as 'best practice' or anything. I've decided to have two representations of a Domain Event, one I have called the "Application Domain Event" and the other the "Storage Domain Event".
Application Domain Event:
- Is published on bus.
- Can be referenced and depended on by any part of the application / system.
- Can follow OO patterns, such as DRY, i.e. sharing a type / enum with a command in a common location is permitted.
- Is versioned in the standard .NET application / assembly manner. Old versions are not kept around.
Storage Domain Event:
- Is the schema format for serializing to, and deserializing from, the event store.
- Is in a project that has no external dependencies, except for 'System'. (And maybe System.Runtime.Serialization if they need to be marked [DataContract] etc).
- Is not referenced or consumed by any part of the system / application except for the event store.
- Versioning is done for the entire set of domain events, rather than for individual events. Currently this is organised by namespace, but I may move to separate assemblies later.
- Will be kept around 'forever'.
- Will, on occasion, utilize wholesale migration if it makes sense.
I use Jonathan Oliver's event store library and am utilizing pipeline hooks to convert between Storage and Application events when read (up-conversion), when committed (commit conversion), and when published (an IDispatchCommits wrapper). These converters are in a separate library and are all, at this point, AutoMapper based with minimal configuration. There is now an additional developer cost in creating / managing the storage representation as well as the application one, but it doesn't seem to be completely wild (yet). A test using AutoMapper's AssertConfigurationIsValid catches most event shape mismatch issues very quickly. There is also a perf hit in the mapping, but my gut tells me it's small compared to serialization and event store I/O.
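The shape of those conversion hooks can be sketched as follows. The real code is C# with AutoMapper doing the field mapping inside the event store's pipeline; this is a hand-rolled Python stand-in with hypothetical event names, just to show the two representations and the read-side / commit-side conversions.

```python
from dataclasses import dataclass

# Hypothetical event shapes. The storage event is the frozen wire/store
# schema with no application dependencies; the application event is free
# to evolve with the codebase.

@dataclass
class AccountOpenedV1:      # Storage Domain Event (namespace-versioned, kept forever)
    account_id: str
    owner_name: str

@dataclass
class AccountOpened:        # Application Domain Event (published on the bus)
    account_id: str
    owner_name: str

def up_convert(stored: AccountOpenedV1) -> AccountOpened:
    """Read-side hook: storage representation -> application representation."""
    return AccountOpened(account_id=stored.account_id,
                         owner_name=stored.owner_name)

def commit_convert(event: AccountOpened) -> AccountOpenedV1:
    """Commit-side hook: application representation -> storage representation."""
    return AccountOpenedV1(account_id=event.account_id,
                           owner_name=event.owner_name)
```

When the application event later diverges from V1 (renamed fields, new members), only `up_convert` grows; the stored V1 shape never changes.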
For 'locking' a version of the storage events I have a glorified copy -> find/replace script + tool (me-ware) that duplicates a storage version, changes the namespace (i.e. ".V1." -> ".V2."), generates a type signature, creates a test to make sure the types don't change, and updates the converters. I'm not convinced that this is ultimately necessary or worth it. I have some time to chew on that and am in no particular rush to settle on an approach.
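The "type signature + test" idea is roughly this: fingerprint the shape of each locked storage event and fail a test if the fingerprint ever changes. A minimal sketch, with a hypothetical event and a simple field-based signature rather than whatever the actual tool generates:

```python
import dataclasses
import hashlib

@dataclasses.dataclass
class AccountOpenedV1:          # hypothetical locked storage event
    account_id: str
    owner_name: str

def type_signature(cls) -> str:
    """Stable fingerprint of an event's shape (sorted field names + types)."""
    fields = sorted((f.name, f.type.__name__) for f in dataclasses.fields(cls))
    return hashlib.sha1(repr(fields).encode("utf-8")).hexdigest()

# When a version is locked, the signature is frozen into the test suite.
# Any later change to the event's fields makes this comparison fail.
LOCKED_SIGNATURE = type_signature(AccountOpenedV1)
assert type_signature(AccountOpenedV1) == LOCKED_SIGNATURE
```

Adding, removing, renaming, or retyping a field changes the signature, so an accidental edit to a 'forever' schema is caught at build time rather than at deserialization time years later.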
So, will this stand the test of time? Time will tell I suppose :)
8. December 2011 17:40
One of the biggest brain speed bumps for people who come from a centralized, data-model-first / RDBMS world is the issue of duplicates. "What if we have duplicate names?" "What if we have duplicate accounts?" etc. This can be of special concern in distributed systems, where your aggregate roots may be created in different places.
Before I engage in any sort of implementation, I have several business questions that must be answered first:
- What is the business impact of something like this happening?
- Under what exact circumstances / business operations could this issue occur?
- What are the chances of this happening? Or, how big is the window of opportunity for this to occur?
- Is it possible to mitigate the chances of it happening through design of the business operations?
- Can it be automatically detected?
- How quickly can it be detected after the issue occurs?
- Can it be compensated (fixed) after occurring?
- Can the compensation be automated and transparent?
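For the detection and compensation questions in particular, the answer is often a simple after-the-fact check over a projection rather than an up-front uniqueness constraint. A minimal sketch, with hypothetical account data, of flagging duplicates as input to a (possibly manual) compensation step:

```python
from collections import defaultdict

def find_duplicates(accounts):
    """accounts: iterable of (account_id, normalized_name) pairs.

    Returns {name: [account_ids]} for every name created more than once,
    e.g. in two different places before either could see the other.
    """
    by_name = defaultdict(list)
    for account_id, name in accounts:
        by_name[name].append(account_id)
    return {name: ids for name, ids in by_name.items() if len(ids) > 1}

# Usage: "acme" was registered twice; the detector surfaces both ids so a
# compensation step (merge, rename, notify) can deal with them.
dupes = find_duplicates([("a1", "acme"), ("a2", "globex"), ("a3", "acme")])
# dupes == {"acme": ["a1", "a3"]}
```

Running something like this periodically (or reactively off the event stream) answers "can it be automatically detected?" and "how quickly?" without forcing a global uniqueness check into every write path.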
99% of the time, it turns out that set-based concerns aren't.