Today, I had a wonderful conversation about transactions and transactional integrity in the NoSQL world, especially with document databases. The most common example being a financial system where two (or more) inserts must be created and either succeed together or fail together.
First of all, not all systems are equal, and this isn’t 2008. RavenDB supports transactions out of the box. Hey look! So does Mongo. So correct faulty data when you get the chance.
But I want to keep going, because this is still a point worth making.
In an accounting system, you have three types of Balance Sheet accounts: Assets, Liabilities, Equity. These accounts must always satisfy the conditions Assets = Liabilities + Equity. We also have two types of Profit & Loss accounts: Operating Revenue and Operating Expense.
So let’s take a typical accounting system. You probably have data that looks something like this. You’d have much more info, of course, like in what order accounts should appear in a chart of accounts, etc. For now, this will suffice.
ACCOUNTS
ID NUMBER TYPE DESCRIPTION BALANCE LAST_POSTED
1 100001 AS Cash 2000.00 2013-03-20 00:00:00
2 100002 AS Accounts Receivable 500.00 2013-03-20 00:00:00
3 100003 AS Inventory 1000.00 2013-03-20 00:00:00
4 200001 LI Accounts Payable 250.00 2013-03-20 00:00:00
5 300001 EQ Owner Equity 2500.00 2013-03-20 00:00:00
6 300002 EQ Retained Earnings 750.00 2013-03-20 00:00:00
7 400001 OR Sales 1000.00 2013-03-20 00:00:00
8 500001 OE Cost of Goods Sold 500.00 2013-03-20 00:00:00
9 500002 OE Discounts Given 100.00 2013-03-20 00:00:00
For this purpose, we’ll ignore taxes, payroll, loans, etc. Real companies have to worry about them, but they won’t add to our discussion.
When an order comes in, customer is charged. If you were doing accounting by hand, you’d probably write something like this.
Dr. Cr. Ref.
Accounts Receivable 100.00 Order #100
Sales 100.00
And in good ol’ relational database world, you would want to ensure that both inserts. Happened simultaneously.
JOURNAL ENTRIES
TIMESTAMP TYPE ACCOUNT_ID AMOUNT REF POSTED_AT
2013-03-20 10:00:00 DR 2 100.00 Order #100 NULL
2013-03-20 10:00:00 CR 7 100.00 Order #100 NULL
Note that your table of accounts is not updated at this time, because no accounting system that I’m aware of posts balances immediately. You usually go through a trial balance process first, then post balances when you’re sure that everything balances. To get the current balance of an account, you must take the account balance then add or subtract the unposted journal entries.
It’s also important to note that a journal entry affects at least two accounts, but could affect more. Here’s another order, later that same day, but this person had a coupon. We would need to add an extra debit for Discounts Given.
Dr. Cr. Ref.
Accounts Receivable 90.00 Order #101
Discounts Given 10.00
Sales 100.00
If we have to pay sales tax, this becomes four lines, because you would need to debit the liability account for Taxes Owed. At least in Europe, it would be 4 lines, because the sales tax is included in the price of the product. If something says 10,69€, then it will cost 10,69€ to walk out of the store. In the United States, taxes are added after the fact, so most companies would add two lines: one for Taxes Collected and one for Taxes Owed. If it says, $9.99, then it will really cost you $10.69 to walk out of the store (at least in Indiana).
The journal entry still balances. It should be obvious that this should become three rows inserted into our JOURNAL_ENTRIES table.
JOURNAL ENTRIES
TIMESTAMP TYPE ACCOUNT_ID AMOUNT REF POSTED_AT
2013-03-20 13:00:00 DR 2 90.00 Order #100 NULL
2013-03-20 13:00:00 DR 9 10.00 Order #100 NULL
2013-03-20 13:00:00 CR 7 100.00 Order #100 NULL
Like so. Or five inserts if we have the tax thing to worry about.
Can I write for a Document DB the same as RDBMS?
Yes. We could create multiple inserts just like this.
{
Timestamp: 2013-03-20 10:00:00,
Type: "DR",
Account: {
Id: Account/2,
Description: "Accounts Receivable"
},
Ref: "Order #100",
PostedAt: null
},
{
Timestamp: 2013-03-20 10:00:00,
Type: "CR",
Account: {
Id: Account/7,
Description: "Sales"
},
Amount: 100.00,
Ref: "Order #100",
PostedAt: null
}
Sure, that works. But what’s the point? What have you really gained by doing this? You’re still thinking like you’re working with an RDBMS. You’re not.
Should I write for a Document DB the same as RDBMS?
No. I would claim that this operation, and most operations with a document database, should be a single insert.
{
Timestamp: 2013-03-20 10:00:00,
Description: "Order Received",
Order: Orders/100,
PostedAt: null
Debits: [{
Account: {
Id: Accounts/2,
Description: "Accounts Receivable"
},
Amount: 100.00
}],
Credits: [{
Account: {
Id: Accounts/7,
Description: "Sales"
},
Amount: 100.00
}]
}
You have now rewritten the problem to leverage the capabilities of a document database. And now you have a fairly simple Map-Reduce to find the current account balance. Our second example isn’t really so different, which is why we recognized that debits and credits should be arrays.
{
Timestamp: 2013-03-20 13:00:00,
Description: "Order Received",
Order: Orders/101,
PostedAt: null
Debits: [{
Account: {
Id: Accounts/2,
Description: "Accounts Receivable"
},
Amount: 90.00
}, {
Account: {
Id: Accounts/9,
Description: "Discounts Given"
},
Amount: 10.00
}],
Credits: [{
Account: {
Id: Accounts/7,
Description: "Sales"
},
Amount: 100.00
}]
}
Next, we need to verify inventory. Again, we don’t really update inventory values. We create a series of inventory transactions that later get rolled up and posted. Just like accounts, the current inventory is equal to the quantity on hand plus the unposted adjustments.
In RDBMS land, this would be 3n inserts, where n is number of line items on the invoice. For each line on the order, you’d have an inventory transaction and two journal entries. You would debit Cost of Goods Sold and credit Inventory.
INVENTORY_TRANSACTIONS
TIMESTAMP ITEM_ID ADJ REF POSTED_AT
2013-03-20 10:05:00 10 -2 Order #100, Line #1 NULL
2013-03-20 10:05:00 11 -1 Order #100, Line #2 NULL
JOURNAL_ENTRIES
TIMESTAMP TYPE ACCOUNT_ID AMOUNT REF POSTED_AT
2013-03-20 10:05:00 DR 8 40.00 Order #100, Line #1 NULL
2013-03-20 10:05:00 CR 3 40.00 Order #100, Line #1 NULL
2013-03-20 10:05:00 DR 8 15.00 Order #100, Line #2 NULL
2013-03-20 10:05:00 CR 3 15.00 Order #100, Line #2 NULL
A simple two-line order requires six inserts that all must succeed or fail together.
Once again, I would make this a single document, encapsulating all of this information. Also, I would probably copy a bit more data to the local document, because that is something that we’re comfortable with doing in a NoSQL world.
{
Timestamp: 2013-03-20 10:05:00,
Description: "Order Released",
Order: Orders/100,
PostedAt: null,
Debits: [{
Account: {
Id: Accounts/8,
Description: "Cost of Goods Sold",
},
Product: {
Id: Products/10,
Description: "Something Awesome",
Line: 1,
Quantity: 2
},
Amount: 40.00
},{
Account: {
Id: Accounts/8,
Description: "Cost of Goods Sold",
},
Product: {
Id: Products/11,
Description: "Something Else Awesome",
Line: 2,
Quantity: 1
},
Amount: 15.00
}],
Credits: [{
Account: {
Id: Accounts/2,
Description: "Inventory",
},
Product: {
Id: Products/10,
Description: "Something Awesome",
Line: 1,
Quantity: 2
},
Amount: 40.00
}, {
Account: {
Id: Accounts/2,
Description: "Inventory",
},
Product: {
Id: Products/11,
Description: "Something Else Awesome",
Line: 2,
Quantity: 1
},
Amount: 15.00
}],
InventoryAdjustments: [{
Product: Products/10,
Adjustment: -2
},{
Product: Products/11,
Adjustment: -1
}]
}
Like a book about your company.
The series of journal entries you create, instead of being a series of disjointed things with foreign keys, becomes a meaningful, human-readable story. This is an incredibly powerful benefit. The data has machine value, business value, and human value contexts. Typically, data in an RDBMS has machine value only. It’s not until it’s translated by code that its business value and human value are realized.
But what about transactions?
What about them? These databases satisfy ACID. You don’t need transactions for single ops. When you find ways to redefine your problem as single-document operations, the whole idea of transactions just becomes (mostly) a waste.
That doesn’t mean that we don’t need transactions. It just means that their importance is greatly diminished. When we roll up trial balances and post them, we want all of those updates to be a single transaction. Depending on the size of your company and the frequency that you post transactions, the trial balance can be an enormous undertaking. Large manufacturing firms with thousands of inventory items and massive sales volumes who post once per month can accrue millions of unposted journal entries.
The trial balance is a collection of all of the journal entries and inventory adjustments to be included in the post. Controllers had the ability to decide which transactions were to be posted. For each account, you know the starting balance, you apply all adjustments, and you know the ending balance.
Hey! We just described another document. It’s a working trial balance.
{
Id: TrialBalances/100,
CreatedAt: 2013-03-22 14:20:00,
CreatedBy: {
Id: Users/jmeyer,
Name: "Jarrett Meyer"
},
PostedAt: null,
JournalEntries: [
JournalEntries/1700,
JournalEntries/1701,
JournalEntries/1702
],
AccountBalances: [{
Id: Accounts/1,
StartingBalance: 1000.00,
Adjustments: 260.00,
EndingBalance: 1260.00
}, {
// ... etc ...
}]
}
With a document like this, you know exactly what is supposed to happen. You know exactly what journal entries are to be included. You know exactly every adjustment to be applied. So even if the posting transaction fails, you know exactly what you should look for. Every journal entry in the list should have a non-null PostedAt value, and the account balances should all match their EndingBalance value. If that’s not the pit of success, then I don’t know what is.
Document databases. It’s your data. Only clearer.