Commercial Suicide - Integration at the Database Level

This post was originally authored by Jak Charlton in 2009 and is originally hosted at devlicio.us. That site appears to be inactive and regularly unavailable so, with permission, I'm re-publishing it here as I think it's a timeless piece.

There are many ways you can commit commercial suicide, but there is possibly no slower and more agonising death than that produced by attempting that great architectural objective, the single authoritative database to which all applications talk.

The theory is good, if we have a single database then we have all our business information in one place, accessible to all, easy to report against, reduced maintenance costs, consistency across all applications, and a host of other good objectives.

However all these noble ideals hide a more fundamental problem, that the single database does not solve any of them, and makes most of them into far bigger problems.

All Information In One Place - The Single Authority

On the face of it this sounds like a great objective - after all developers try to live by the maxim of Don't Repeat Yourself - and data in many place is a clear violation of that principle.

Data that appears across many applications and across many storage mechanisms leads to all sorts of massive problems; inconsistency, duplication, replication, duplicated business logic and code, essentially all boiling down to - you end up with spaghetti data. Spaghetti data is much like spaghetti code - it sprawls, gets tangled up, and becomes hard to pull apart without covering yourself in pasta sauce. This is obviously a "bad thing".

So what is wrong?

Well the first and most obvious thing that is wrong is that all applications have different requirements, and different "world views". Although there may in theory be some concept of a "Customer" for the organisation as a whole, even this most basic of data items varies widely between individual departments and even individual applications within a single department.

You could approach this problem, as many database centric people would do, and say "that is the problem right there, we need to standardise all these applications to use the One True Customer". But that is missing the really important word in the definition of the problem ... "requirements" ... this is not accidental that the Customer is different to different parts of the business, it really is different.

Your database guys could say "well your Customer in your application may be different as you have different requirements, but you will just have to fit it into our One True Customer", but this is then like trying to put snakes into a plastic bag - they really don't want to be in the bag, they don't fit too well in the bag, and sooner or later while putting one in some others are going to escape or bite your hand. And when your other department starts putting his snakes in the bag too you will be fighting for who gets to not be bitten.

And worse still - now you are trying to map from your requirements based Customer to the One True Customer, and spend an inordinate amount of time maintaining this translation layer in your application. When that One True Customer changes, as he undoubtedly will as new applications require he is expanded to deal with more data they require, every previous application needs to be revisited, large parts of it need ot be re-written, and the whole application needs to be regression tested again. And you have to do this for every single application talking to your single authority database.

You could just skip this stuff, and rely on your applications ignoring this new data, and rely on the database not caring if they correctly updated new data - but this really will come back to bite you - when that bag of snakes starts getting really large and really full - you really don't want to be the one trying to get new snakes into it.

The Truth About the One True Authority

There isn't one.

There - I've said it - I have upset all those database guys, probably upset a large number of SOA guys (I'll cover Commercial Suicide - Integration at Service Level in a later post), and have totally disagreed with noble business objectives.

The truth is, data must have context - without context, data is worthless, absolutely and totally worthless. Data stored in a database has no context, and therefore has no value. Context is provided by the applications that read and write that data, and therefore they are the only thing that matters, and their requirements are the only thing that matters. That means, they need data that is specific to their application, structured in a way that makes that application meet the business objectives, and in a way that makes that application meet non-functional requirements like resilience, reliability and consistency.

So Why Does Anyone Want the One True Authority Database?

Well, in legacy terms it is easy to understand why database admins and database developers want it - it is their lifeblood, their whole raison d'etre. More importantly, it is the culture in which they were brought up - the data is the important thing, the data is the centre of the universe, the data must be consistent, uniform and pure.

But leaving database developers aside, more importantly why would a corporation want the One True Authority Database (OTADB)? After all, the title of this post says this is "Commercial Suicide", so why hasn't this got through to management?

Well - the promise of OTADB is that it will reduce errors in duplication, reduce waste, reduce duplicated effort and reduce maintenance costs - all highly desirable business objectives. And indeed, from those that advocate this approach (those database admins and developers again), the OTADB sounds mighty attractive. On the face of it, it achieves all of these objectives.

Where it falls down is that this holy grail of software development is always just out of reach, they never quite manage to achieve it. Each application that is developed starts to make the OTADB worse, people start to hack things into it to get it to meet business requirements, not because developers want to hack things together, but precisely becasue the restriction of the OTADB forces them to do it that way if they are to deliver any kind of functionality at all.

They blame these hacks for ruining their vision of the One True Authority Database, the database admins tell them that they have to fight the application teams to stop them messing up their nice database, but that the OTADB no longer meets the noble objectives as those pesky development teams have messed it up for everyone.

Wait a Minute - What is the BUSINESS Objective Behind the One True Authority Database?

If anyone was to step back from those noble objectives and ask a far more fundamental question, the solution might actually be a lot more obvious than it may seem. While they are all noble objectives, largely actually made worse specifically by the OTADB approach, they are not the real business objective.

Underlying all the other requirements, the ultimate business requirement that drives people (in particular database admins) to want a single database is so that they can see what their company looks like, in other words - so they can produce Management Information - reports to you and me.

This is the single and most fundamental requirement for a business - to have a clear, consistent, accurate and up to date picture of what their company looks. This is what management needs, it is what allows them to make decisions, allows them to identify problems and allows them to spot opportunities.

So, we are going to all this effort, and believe me it is extensive and significant effort, all to support some reporting tools at the end of the day. Reporting tools have problems with data in different formats, with data that is inconsistent, with data that is disparate and distributed. So at some point in the past, the "accepted truth" became "we need one true authority database to be able to produce good reports"

Reporting is a Context - and data only has purpose and relevance in context.

If the Elephant in the Room is Actually Reporting, How Do We Solve The Elephant Problem?

This is almost so easy to deal with, it is silly. Perhaps it is because it is so obviously simple that is has been overlooked by many and rejected by others. Especially as it violates another one of those noble objectives ... to provide quality reporting information, we duplicate more of our data.

Yep - we duplicate it - after all reporting data is read only, so it doesn't matter if it is just a copy of other data. Reporting requirements are also very different to transactional requirements too, so we get the added benefit of being able to optimise that duplicated data for the reporting functions.

Data in relational databases is actually very poor for query and reporting purposes, and there is a constant compromise to make it fast for all those applications to write to, that makes it poor to report on - and vice-versa.

How this data gets into the reporting database isn't my direct concern in this blog post, suffice to say the "easy" way is to publish messages with data changes, and have a reporting application pick those up and persist them. My point here - is that splitting the reporting functions from the day-to-day business functions pays massive dividends.

Now We Have a New Problem

That still leaves us with one problem - what happens when disparate applications really do need to know about data in other applications? What happens when my call centre operatives are asked to update the address for one of the Customers. Now as each application has it's own view of the world, and it's own data stores, my accounting applciation does not have access to that change.

Well, the solution to the "how does data get in the reporting databases" question is exactly the same one here - you published messages from your application when you have changes that the rest of the corporation may be interested in. Fire off a message saying "CustomerAddressUpdated" and any other applciation that is concerned can now listen for that message and deal with it as it sees fit.

As It Sees Fit

And this is the real business objective we were trying to achieve in the beginning ... avoiding Corporate Suicide.

When applications are each responsible for their own data, their own actions, and are only responsible for letting the "enterprise" know they have made some changes that other things may need to know about - then you have your solution.

In good development terms, we have proper separation of concerns ... applications are responsible for their data, and their data only. They decide if they care about data from other applications - they are not forced to use it, nor to work around it.