Data Model != Database
Look, I love databases as much as the next guy. Maybe even more. Ok, definitely more. But this needs to stop.
Databases are not a valid integration point for applications.
I have seen this in virtually every place I’ve been and it’s a problem. Everybody loves the database because we all understand it. It’s easy. It sits there with its port all open waiting for any little application to drop in and leave some data behind. You don’t have to worry about the filesystem or sticking stuff on a particular server somewhere; files are dirty anyway. The database is elegant, with relationships and tables and indexes. You have drivers in every language, and all these pretty abstraction layers, ORM and such. It even uses a nice querying language so you can get exactly what you want out of it.
What could be more perfect?
Unfortunately the database isn’t the cheap date she leads us to believe she is. You’ll pay for all this beauty, yes sirree. How, you ask?
- Once you code to a schema from multiple applications, you can never effectively change it. Sure, you can try, but it’ll take you 10 times longer than it should. And since half of your apps are developed by programmers who are no longer with the company, you may not even know you needed that column until you drop it and all the sudden the payroll system crashes. On second thought maybe you should leave it alone.
- The database schema is designed with a ton of little rules in mind. That absolutely_sure_flag column on the order_items table? That is there to make sure that the guys in the warehouse were positive they stuck that item in the box, and if you don’t set it the accounting system doesn’t include that item in the rollup report. You did remember to set it in your web service, right?
- Databases are flat. The elegance of everything being tables with columns is offset by the fact that meaning is implied through naming conventions and foreign key relationships alone. This is fine if you designed the database and know what everything means. It is less fine if the database was designed by a DB2 veteran who cut his teeth on a System/390 and felt that using more than 8 characters in a table name was a waste of “core.”
- Database columns have a limited range of types, and all format-level validation is done in whatever application language is doing the inserts and updates. Hopefully all of the applications will do the same validation, right?
- Databases have a somewhat broken permission system. This leads to audit properties and permissions being implemented at the application level, which is great if every application has the same login/permissions system but not so much otherwise.
Integrating at the database level gives you a brittle, inflexible system where all of your applications are tightly coupled. After a few years your beautiful bride turns into a gluttonous beast consuming developer cycles like bonbons.
I think the reason for this is a simple misunderstanding.
Developers confuse the data model with the database.
The data model is what your company uses to do its work; it represents a model of the world in which they operate. The database is just an artifact, a rough implementation of that data model. Subject to change as the world comes into focus.
This is an admittedly fine line, but an important one. Data models are tuned, databases are changed. In the data model you can deal with abstractions. Orders have Items on them which correspond to Products and are shipped to Customers. Databases deal with specifics. A row in the `order_items` table points to the `orders` and `products` tables, and the `orders` table has a `customer_person_id` that is keyed to the `people` table. Or, if performance is an issue, maybe the customer details can be placed in columns directly on the `orders` table instead. Maybe we can also drop a `product_name` into the `order_items` table as well, since that is all we need from the `products` table on those checkout screens that get loaded 1,000,456 times a day.
In any event, the data model should be all you need to know in your application code. The database should be hidden behind an abstraction layer.
So how can we achieve this? My personal preference is a business rules engine such as CodaServer. (Did I mention it’s open source and free?) It handles many of the above concerns:
- Data integrity is enforced through an extensible workflow system with triggers that can be made to stop you from doing bad things with data. If you make a trigger that halts an operation when a particular flag isn’t there, it will do so across all applications.
- Tables have different relationships with each other. They can be parents, children, or neither, plain tables or workflow-laden “forms.” Since this is part of the schema, applications can figure it out and plan accordingly.
- There is a regex-based type system so column values can be validated more stringently.
- The user and permissions model is multi-faceted and a help rather than a hinderance.
Another solution is web services. REST can provide a simple way of representing your data model in a database- and language-neutral way, and it is trivial to update to reflect database schema updates. On the updating side, XML-RPC and SOAP can provide ways of parameterizing your data operations and presenting them across application languages.
Databases are wonderful technology, but like any tool, they are more appropriate for certain tasks than others. Storing data and making it available for ad-hoc queries? Great. Providing a common API for applications written in different languages and for different purposes to share data? Not good.
Til next time.
Posted: November 12th, 2008 under Technology.
Comments: 2
Comments
Comment from Suhail
Time: November 13, 2008, 3:12 am
Hi Mike,
I very much agree with you, that RDBMS schema’s are rigid and a pain to change. They perform best when you have a fixed schema and you know for sure that there is no way of it being changed.
But in todays internet driven world where the data needs to continuously evolve, this poses a big problem. In an enterprise you don’t want your data to be struck in isolated silos.
I think the time has come when semantic databases will take over from the RDBMS, a very promising Semantic database is Poseidon Database which is based on open source technologies and comes packed with Brainwave Platform a complete development and deployment suite.
-Suhail.
Pingback from My daily readings 11/13/2008 « Strange Kite
Time: November 13, 2008, 7:39 am
[…] Mike-O-Matic » Data Model != Database […]

















Write a comment