Caffeine-Powered Life

Getting Started With Cassandra

For this post, we are going to cover just the basics: creating a keyspace and a column family, inserting records into Cassandra, and retrieving them again. I am running Cassandra version 1.2.5 and CQLSH 3.0.2.

Your First Keyspace

First, we need to connect to the Cassandra database. Navigate to your Cassandra directory.

$ bin/cqlsh
Connected to Test Cluster at localhost:9160.
[cqlsh 3.0.2 | Cassandra 1.2.5 | CQL spec 3.0.0 | Thrift protocol 19.36.0]
Use HELP for help.
cqlsh>

The first thing we need to do is create a Keyspace.

For all practical purposes, a Cassandra keyspace is functionally the same thing as a database catalog. When creating a keyspace, we must give a replication strategy class and, for SimpleStrategy, a replication factor.

cqlsh> create keyspace demo with replication = {'class':'SimpleStrategy', 'replication_factor':1};
cqlsh> use demo;
cqlsh:demo> describe keyspace demo;

CREATE KEYSPACE demo WITH replication = {
  'class': 'SimpleStrategy',
  'replication_factor': '1'
};

We have successfully created a keyspace.

Your First Column Family

To store anything in Cassandra, you need to create a column family. A column family is functionally the same as a table or view. In fact, when we create a column family, we are going to use a create table command.

cqlsh:demo> create table tasks (
        ...   id timeuuid primary key,
        ...   description text,
        ...   completed boolean
        ... );

The primary key in this case is a timeuuid type. It is a version 1 UUID, built from the system clock and a node identifier (traditionally the MAC address) of the machine that generated the value, so collisions are extremely unlikely. We do not have an option to use autoincrementing integers in a distributed database, so we will get very used to using either UUIDs or deterministic values.
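
To make the clock part a little more concrete, here is a small Java sketch that pulls the embedded timestamp back out of one of the ids that shows up in the query output further down. Java is just my choice for the example. The class and method names are made up, while UUID.version() and UUID.timestamp() are standard java.util.UUID methods.

```java
import java.time.Instant;
import java.util.UUID;

final class TimeUuidDemo {
    // 100-nanosecond intervals between the UUID epoch (1582-10-15) and the Unix epoch (1970-01-01).
    static final long UUID_EPOCH_OFFSET = 0x01B21DD213814000L;

    /** Converts the timestamp embedded in a version 1 UUID to an Instant. */
    static Instant toInstant(UUID uuid) {
        if (uuid.version() != 1) {
            throw new IllegalArgumentException("Not a time-based (version 1) UUID: " + uuid);
        }
        long hundredNanos = uuid.timestamp() - UUID_EPOCH_OFFSET;
        long seconds = Math.floorDiv(hundredNanos, 10_000_000L);
        long nanos = Math.floorMod(hundredNanos, 10_000_000L) * 100L;
        return Instant.ofEpochSecond(seconds, nanos);
    }

    public static void main(String[] args) {
        UUID id = UUID.fromString("0b56dd60-ce94-11e2-a7da-4f403ad60db0");
        System.out.println("version: " + id.version());
        System.out.println("created: " + toInstant(id));
    }
}
```

The printed instant is whatever the generating machine’s clock said at insert time, which is why rows keyed this way carry their own creation time.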

See the DataStax documentation for more information on Cassandra data types.

To define a column family, you must have at least a primary key.

Inserting Rows

cqlsh:demo> insert into tasks (id, description, completed) values (now(), 'This is a test', False);
cqlsh:demo> insert into tasks (id, description, completed) values (now(), 'This is another test', False);
cqlsh:demo> insert into tasks (id, description, completed) values (now(), 'Just one more test for safe keeping', False);

We can also query all of the tasks quite easily with a select statement.

cqlsh:demo> select * from tasks;

 id                                   | completed | description
--------------------------------------+-----------+-------------------------------------
 a31e2680-ce94-11e2-a7da-4f403ad60db0 |     False | Just one more test for safe keeping
 0b56dd60-ce94-11e2-a7da-4f403ad60db0 |     False |                      This is a test
 14af9870-ce94-11e2-a7da-4f403ad60db0 |     False |                This is another test

Since Cassandra has a primary key on the id column, we can query just that record.

cqlsh:demo> select * from tasks where id = 0b56dd60-ce94-11e2-a7da-4f403ad60db0;

 id                                   | completed | description
--------------------------------------+-----------+-------------------------------------
 0b56dd60-ce94-11e2-a7da-4f403ad60db0 |     False |                      This is a test

Now data is being stored, and we can get all of the rows or a single row by id. What if we only want the open tasks? That’s where secondary indexes come into play. If we try to query only the open tasks now, our query will fail.

cqlsh:demo> select * from tasks where completed = False;
Bad Request: No indexed columns present in by-columns clause with Equal operator

cqlsh:demo> create index tasks_completed on tasks (completed);
cqlsh:demo> select * from tasks where completed = False;

 id                                   | completed | description
--------------------------------------+-----------+-------------------------------------
 a31e2680-ce94-11e2-a7da-4f403ad60db0 |     False | Just one more test for safe keeping
 0b56dd60-ce94-11e2-a7da-4f403ad60db0 |     False |                      This is a test
 14af9870-ce94-11e2-a7da-4f403ad60db0 |     False |                This is another test

Inserting Rows from Ruby

For Ruby, I have been really happy with the cql-rb gem. It lets you create prepared statements and parameterized queries.

irb(main)> require 'cql'
irb(main)> client = Cql::Client.connect('127.0.0.1')
irb(main)> client.use('demo')
irb(main)> statement = client.prepare('insert into tasks (id, description, completed) values (now(), ?, ?)')
irb(main)> statement.execute('This task was created by cql-rb', false)

If we go back to our CQLSH, we can see the newly inserted row.

cqlsh:demo> select * from tasks;

 id                                   | completed | description
--------------------------------------+-----------+-------------------------------------
 a31e2680-ce94-11e2-a7da-4f403ad60db0 |     False | Just one more test for safe keeping
 17a0f9c0-ce99-11e2-a7da-4f403ad60db0 |     False |     This task was created by cql-rb
 0b56dd60-ce94-11e2-a7da-4f403ad60db0 |     False |                      This is a test
 14af9870-ce94-11e2-a7da-4f403ad60db0 |     False |                This is another test

Our new task has been successfully stored in Cassandra.

Conclusion

In this blog post, we…

  • Connected to Cassandra with CQLSH.
  • Created a Keyspace.
  • Created a Column Family.
  • Inserted records into Cassandra.
  • Retrieved records from Cassandra.
  • Created an index.
  • Retrieved records by index.
  • Connected to Cassandra with cql-rb.
  • Created a prepared statement with cql-rb.
  • Inserted records with cql-rb.

Base64 Encoding in Java

I’ve spent the last month or so writing Java. I’ve been writing C# for close to 8 years, so Java is just plain frustrating. I can read it just fine, but writing it is an incredibly painful experience. It’s close enough that I think I know what I’m doing, but the classes and libraries are all different enough that too many things produce unexpected results.

Let’s demonstrate an example of this.

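import java.nio.charset.StandardCharsets;

import org.apache.commons.codec.binary.Base64;

// snip...

private void initializeConnectionAuthorizationHeader() {
    // HTTP Basic auth: Base64-encode "username:password" with an explicit charset
    // instead of relying on the platform default.
    byte[] credentials = (USERNAME + ":" + PASSWORD).getBytes(StandardCharsets.UTF_8);
    String authorizationHash = new String(Base64.encodeBase64(credentials), StandardCharsets.US_ASCII);
    connection.addRequestProperty("Authorization", "Basic " + authorizationHash);
}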

The connection object comes from a java.net.HttpURLConnection, which is obviously part of the core Java language library. Why exactly, is the ability to Base64 encode a string not part of the same core Java language library? This should be a really common thing. And when I look to the other languages I know – C#, Ruby, Python – an additional library is not necessary for this kind of simple task. Why, why, do I need an external library?

To do it without a library, you’ll find examples like this Stack Overflow answer, which is the raw algorithm for Base64 encoding and decoding. Yeah, that makes sense.
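
For what it’s worth, Java 8 and later ship java.util.Base64 in the standard library, so on a newer JDK the external dependency goes away. Here is a minimal sketch, assuming Java 8 or later. The class name, method name, and sample credentials are only for illustration.

```java
import java.nio.charset.StandardCharsets;
import java.util.Base64;

final class BasicAuthHeader {
    /** Builds the value of an HTTP Basic Authorization header. */
    static String basicAuthHeader(String username, String password) {
        String credentials = username + ":" + password;
        String encoded = Base64.getEncoder()
                .encodeToString(credentials.getBytes(StandardCharsets.UTF_8));
        return "Basic " + encoded;
    }

    public static void main(String[] args) {
        // Sample credentials only; the result would be passed to
        // connection.addRequestProperty("Authorization", ...).
        System.out.println(basicAuthHeader("Aladdin", "open sesame"));
    }
}
```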

Six Months of CrossFit

The end of May marks six months of CrossFit at TitanFit and about three or four months of very clean eating habits. My last CrossFit post was back in February. I wish I had taken some more before pics, because I’d really like to see just how far I’ve come in the last half year.

Honestly, I didn’t take any before pics because I didn’t like what I saw. This pic took a lot of courage to post. Yes, it’s an in-progress photo. If you don’t like it, go away. That’s unfortunate, I suppose, because now I can’t do an awesome before/after comparison. Although I can’t compare photos, I can compare quite a few numbers.

Weight

The most obvious number is weight. Six months ago, I was at 202 lbs. Last week, I was at 199 lbs. Obviously, my weight is not going down very much. Could I get it to go down? Probably, and I’m going to make a bit of an effort to get back to 190 this summer. If last year is any kind of indication, July and August are a great time for me to lose some blubber.

On the other hand, I’m not really so concerned with pure weight as a series of digits. Weight only tells a tiny fraction of my story. Also, I don’t think it’s possible for my weight to go down that much with the amount I’ve been lifting. I’m trying to add muscle, and that means getting around 100-120g of protein per day. I’ve been resisting having too much of a caloric deficit, just enough to get by.

Body Fat %

I am much more interested in reducing my body fat percentage as opposed to just having a lower weight. Starting next month, I will be tracking my BF% on a more regular basis. When I hear that professional football players weigh 240 lbs and are 11% body fat, I start to think that weight isn’t so important any more.

So, while I’ve only lost 2 or 3 lbs, these pants fit without a belt when I started at TitanFit. My body composition has changed a lot. Now, all of my pants go on and off without unbuttoning them. Belts are required on everything I own.

Functional Movement

This has been a huge area of change, both physically and mentally. Through CrossFit – pull-ups, push-ups, kettlebell swings, box jumps, wall balls, toes-to-bar, handstands, double-unders, and yes sprints – I have begun to measure my body by what it can do. It turns out, I can do quite a lot!

Furthermore, CrossFit has fed back into the other things that I do. My running has gotten faster. My cycling has gotten faster. My whole body has gotten more efficient. I did my first 50 mile bike ride earlier this week. It was remarkably easy, and my average heart rate was in the 120s.

My last timed run was just this past weekend (part of my Murph workout). I ran a 7:34 mile, and I was holding back because I didn’t want to start out too strong.

One more thing: I haven’t taken an elevator since I started CrossFit, so that means walking to the sixth floor at work at least two times a day (usually three or four). This used to leave me a little winded. No longer! (The day after squat day is a little harsh, though.)

Strength Gains

The most obvious gains are strength gains. I have put on a bit of muscle. Ideally, I would love to hit all of the CrossFit intermediate strength standards for my body weight.

  • Overhead Press went from 65 lbs to 115 lbs. Even tonight, I did 5 sets of 5 reps at 100 lbs. My arms and upper body continue to be my weak point. These, however, are some huge newbie gains in just six months. I would love to get this to 150. Yeah, that’s a really high goal. It’s also the intermediate standard. That’s going to take a long time, I know. If there’s anything CrossFit has taught me, it’s that I know I can get there.

  • Clean went from “what’s a clean” to 165 lbs. My first cleans were at 95 lbs, and I remember them being terrible. There’s a lot of technique that goes into a clean. Wrists, elbows, shoulders, back, abs, legs, feet – they all have to work together for this lift. There’s also a special mental challenge with a clean. You just threw 165 lbs into the air. Now convince yourself to squat underneath it and catch it. The intermediate standard for this lift is 205 lbs.

  • Squat from 205 lbs to 265 lbs. I’ve always been happy with my legs. With all the running and cycling I do, I know I have good legs. The intermediate standard for this lift is 285 lbs, so I should be within reach of this in a few more cycles. I think 315 lb squats are in my future!

  • Deadlift from 255 lbs to 315 lbs, and that 315 is low. I’m pretty sure I could beat the intermediate standard of 335. Again, the strong legs and back help. I can see myself passing 405 lbs by this time next year.

I’ve been doing Wendler’s 5/3/1 and using the Big Lifts app for iPhone to help calculate and track everything.

Eating

The gym isn’t the only place I’ve made changes. Those who know me know that I have made a serious attempt to eat well. In the last three months, I have really tried to improve on my base. More fruits and vegetables. No HFCS. No added sugars, including added sugar in its various forms. Evaporated cane extract, evaporated fruit concentrate, and agave nectar are all food industry jargon for sugar. I’ve cut out Splenda and Diet Coke. Even artificial sweeteners reprogram our brains for what a sweetness standard should be.

If you’re interested in this, go read Salt Sugar Fat by Michael Moss. I am married to a food scientist, and I am still learning more and more about the food industry and how our brains react to food.

And then something amazing happened.

I used to watch Food Network and hear judges on various shows say something along the lines of, “This is too sweet.” I was always wondering what that meant. Who says that?

Last week, I said that. Twice.

So there you go. Six months of CrossFit.

I Was Wrong About MSMVC

It’s no secret that I am a fan of Ruby on Rails. A quick look back through my archives shows that my first Ruby post was on 25 August 2010. Ruby has become my get-stuff-done language. I author web sites with it. I deploy with Capistrano. I script with it.

Today, I was talking through an MSMVC vs. Rails example: post a form, parse the form data, save some stuff to the database, dynamically generate and send an HTML-body email, redirect to a success page. I don’t think I’m too off base here. I understand that the same amount of computational “stuff” has to happen no matter what language and framework is used.

The difference is how much is baked into the Rails framework vs. how much is still left as an exercise for the reader in MSMVC. It was then that I realized I shouldn’t be comparing the two.

Frankenstein

MSMVC is a skeleton. You have to add the organs, muscles, and everything else to create your creature. Many of these organs can be downloaded and installed through Nuget. By the time you’ve written as many solutions as I have, you probably have a library of bits and pieces checked into a repo that you can add at will. Some things, like EF Code First, require Visual Studio plugins to be installed, so that’s a little annoying. Add enough stuff to your skeleton, and you can have your very own Frankenstein’s monster of an application.

Rails is also a skeleton. It also comes out of the box with vital organs and enough muscle to breathe, walk, and eat on its own. Active Record is set up and ready to go. Asset bundling is a good idea and included by default. SCSS and CoffeeScript support? Put two lines in your Gemfile. Done. Rails is already Frankenstein’s monster.

Once I figured this out, I realized that MSMVC is actually a really good skeleton. Someone needs to build an opinionated framework on top of MSMVC that makes MSMVC as easy and as configurable as Rails.

Pick an ORM. NHibernate? Entity Framework? Do you go micro with Dapper? I only want to worry about POCO classes. Let me craft my own SQL if I want. You don’t have to support 100% of cases out there.

Pick a DI tool: Structure Map, Ninject, Unity. Whatever, I don’t care. Just don’t make me think about it. If it’s done well, I don’t even want to see any IoC library-specific code. I just want a container that I configure.

Make it stupid simple. Encourage rapid scaffolding. Understand nested resources. The entire stack needs to be testable out of the box. This includes running an in-memory web server, so I can test session values and routing considerations. Tests need to hit a real database, with setup and teardown methods able to run fixtures at test startup.

But if I don’t like something, let me override an implementation with my own stuff. This means that you’ve got to use interfaces and abstract classes and make your methods virtual.

Support easy testing. Out of the box, run an in-memory web server just using a StringBuilder as the output stream. Add some assertions that help with testing MSMVC.

Someone go do that.

Oh wait, someone already did, and it’s a hell of a lot closer to the target than what MS put together.

Not Your Father’s Accounting System

Today, I had a wonderful conversation about transactions and transactional integrity in the NoSQL world, especially with document databases. The most common example is a financial system where two (or more) inserts must be created and either succeed together or fail together.

First of all, not all systems are equal, and this isn’t 2008. RavenDB supports transactions out of the box. Hey look! So does Mongo. So correct faulty data when you get the chance.

But I want to keep going, because this is still a point worth making.

In an accounting system, you have three types of Balance Sheet accounts: Assets, Liabilities, Equity. These accounts must always satisfy the condition Assets = Liabilities + Equity. We also have two types of Profit & Loss accounts: Operating Revenue and Operating Expense.

So let’s take a typical accounting system. You probably have data that looks something like this. You’d have much more info, of course, like in what order accounts should appear in a chart of accounts, etc. For now, this will suffice.

ACCOUNTS
ID  NUMBER  TYPE  DESCRIPTION          BALANCE  LAST_POSTED
 1  100001  AS    Cash                 2000.00  2013-03-20 00:00:00
 2  100002  AS    Accounts Receivable   500.00  2013-03-20 00:00:00
 3  100003  AS    Inventory            1000.00  2013-03-20 00:00:00
 4  200001  LI    Accounts Payable      250.00  2013-03-20 00:00:00
 5  300001  EQ    Owner Equity         2500.00  2013-03-20 00:00:00
 6  300002  EQ    Retained Earnings     750.00  2013-03-20 00:00:00
 7  400001  OR    Sales                1000.00  2013-03-20 00:00:00
 8  500001  OE    Cost of Goods Sold    500.00  2013-03-20 00:00:00
 9  500002  OE    Discounts Given       100.00  2013-03-20 00:00:00

For this purpose, we’ll ignore taxes, payroll, loans, etc. Real companies have to worry about them, but they won’t add to our discussion.

When an order comes in, the customer is charged. If you were doing accounting by hand, you’d probably write something like this.

                      Dr.      Cr.       Ref.
Accounts Receivable   100.00             Order #100
    Sales                      100.00

And in good ol’ relational database world, you would want to ensure that both inserts happened simultaneously.

JOURNAL ENTRIES
TIMESTAMP            TYPE  ACCOUNT_ID  AMOUNT  REF         POSTED_AT 
2013-03-20 10:00:00  DR    2           100.00  Order #100  NULL
2013-03-20 10:00:00  CR    7           100.00  Order #100  NULL

Note that your table of accounts is not updated at this time, because no accounting system that I’m aware of posts balances immediately. You usually go through a trial balance process first, then post balances when you’re sure that everything balances. To get the current balance of an account, you must take the account balance then add or subtract the unposted journal entries.
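
Here is a rough Java sketch of that last rule, using the two unposted rows for Order #100 and the posted balances from the ACCOUNTS table. The class and method names are made up for illustration, and I’m assuming the usual convention that asset and expense accounts grow with debits while everything else grows with credits.

```java
import java.math.BigDecimal;
import java.util.Arrays;
import java.util.List;

final class CurrentBalance {
    /** One unposted journal line; fields mirror the JOURNAL ENTRIES columns. */
    static final class Entry {
        final String type;       // "DR" or "CR"
        final int accountId;
        final BigDecimal amount;

        Entry(String type, int accountId, String amount) {
            this.type = type;
            this.accountId = accountId;
            this.amount = new BigDecimal(amount);
        }
    }

    /** Posted balance plus or minus every unposted entry that touches the account. */
    static BigDecimal currentBalance(int accountId, String accountType,
                                     BigDecimal postedBalance, List<Entry> unposted) {
        // Assumption: assets (AS) and expenses (OE) grow with debits; other types grow with credits.
        boolean debitNormal = accountType.equals("AS") || accountType.equals("OE");
        BigDecimal balance = postedBalance;
        for (Entry entry : unposted) {
            if (entry.accountId != accountId) {
                continue;
            }
            boolean increases = entry.type.equals("DR") == debitNormal;
            balance = increases ? balance.add(entry.amount) : balance.subtract(entry.amount);
        }
        return balance;
    }

    public static void main(String[] args) {
        List<Entry> unposted = Arrays.asList(
                new Entry("DR", 2, "100.00"),
                new Entry("CR", 7, "100.00"));
        System.out.println(currentBalance(2, "AS", new BigDecimal("500.00"), unposted));
        System.out.println(currentBalance(7, "OR", new BigDecimal("1000.00"), unposted));
    }
}
```

With those inputs, Accounts Receivable comes out at 600.00 and Sales at 1100.00, even though neither row in ACCOUNTS has been touched.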

It’s also important to note that a journal entry affects at least two accounts, but could affect more. Here’s another order, later that same day, but this person had a coupon. We would need to add an extra debit for Discounts Given.

                      Dr.      Cr.       Ref.
Accounts Receivable    90.00             Order #101
Discounts Given        10.00
    Sales                      100.00

If we have to pay sales tax, this becomes four lines, because you would need to credit the liability account for Taxes Owed. At least in Europe, it would be 4 lines, because the sales tax is included in the price of the product. If something says 10,69€, then it will cost 10,69€ to walk out of the store. In the United States, taxes are added after the fact, so most companies would add two lines: one for Taxes Collected and one for Taxes Owed. If it says $9.99, then it will really cost you $10.69 to walk out of the store (at least in Indiana).

The journal entry still balances. It should be obvious that this should become three rows inserted into our JOURNAL_ENTRIES table.

JOURNAL ENTRIES
TIMESTAMP            TYPE  ACCOUNT_ID  AMOUNT  REF         POSTED_AT 
2013-03-20 13:00:00  DR    2            90.00  Order #101  NULL
2013-03-20 13:00:00  DR    9            10.00  Order #101  NULL
2013-03-20 13:00:00  CR    7           100.00  Order #101  NULL

Like so. Or five inserts if we have the tax thing to worry about.

Can I write for a Document DB the same as RDBMS?

Yes. We could create multiple inserts just like this.

{
  Timestamp: 2013-03-20 10:00:00,
  Type: "DR",
  Account: {
    Id: Accounts/2,
    Description: "Accounts Receivable"
  },
  Amount: 100.00,
  Ref: "Order #100",
  PostedAt: null
},
{
  Timestamp: 2013-03-20 10:00:00,
  Type: "CR",
  Account: {
    Id: Accounts/7,
    Description: "Sales"
  },
  Amount: 100.00,
  Ref: "Order #100",
  PostedAt: null
}

Sure, that works. But what’s the point? What have you really gained by doing this? You’re still thinking like you’re working with an RDBMS. You’re not.

Should I write for a Document DB the same as RDBMS?

No. I would claim that this operation, and most operations with a document database, should be a single insert.

{
  Timestamp: 2013-03-20 10:00:00,
  Description: "Order Received",
  Order: Orders/100,
  PostedAt: null,
  Debits: [{
    Account: {
      Id: Accounts/2,
      Description: "Accounts Receivable"
    },
    Amount: 100.00
  }],
  Credits: [{
    Account: {
      Id: Accounts/7,
      Description: "Sales"
    },
    Amount: 100.00
  }]
}

You have now rewritten the problem to leverage the capabilities of a document database. And now you have a fairly simple Map-Reduce to find the current account balance. Our second example isn’t really so different, which is why we recognized that debits and credits should be arrays.

{
  Timestamp: 2013-03-20 13:00:00,
  Description: "Order Received",
  Order: Orders/101,
  PostedAt: null,
  Debits: [{
    Account: {
      Id: Accounts/2,
      Description: "Accounts Receivable"
    },
    Amount: 90.00
  }, {
    Account: {
      Id: Accounts/9,
      Description: "Discounts Given"
    },
    Amount: 10.00
  }],
  Credits: [{
    Account: {
      Id: Accounts/7,
      Description: "Sales"
    },
    Amount: 100.00
  }]
}
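
One nice side effect of keeping the whole entry in one document is that the "debits equal credits" rule can be checked in one place before the document is ever saved. Here is a minimal Java sketch of that check, using the amounts from the Order #101 document. The class and method names are hypothetical, and a real system would read the amounts from the document instead of hard-coding them.

```java
import java.math.BigDecimal;
import java.util.Arrays;
import java.util.List;

final class JournalEntryCheck {
    /** Adds up a list of amounts. */
    static BigDecimal sum(List<BigDecimal> amounts) {
        BigDecimal total = BigDecimal.ZERO;
        for (BigDecimal amount : amounts) {
            total = total.add(amount);
        }
        return total;
    }

    /** True when total debits equal total credits. */
    static boolean isBalanced(List<BigDecimal> debits, List<BigDecimal> credits) {
        return sum(debits).compareTo(sum(credits)) == 0;
    }

    public static void main(String[] args) {
        // Debits: Accounts Receivable 90.00 and Discounts Given 10.00. Credits: Sales 100.00.
        List<BigDecimal> debits = Arrays.asList(new BigDecimal("90.00"), new BigDecimal("10.00"));
        List<BigDecimal> credits = Arrays.asList(new BigDecimal("100.00"));
        System.out.println("balanced: " + isBalanced(debits, credits));
    }
}
```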

Next, we need to verify inventory. Again, we don’t really update inventory values. We create a series of inventory transactions that later get rolled up and posted. Just like accounts, the current inventory is equal to the quantity on hand plus the unposted adjustments.

In RDBMS land, this would be 3n inserts, where n is the number of line items on the invoice. For each line on the order, you’d have an inventory transaction and two journal entries. You would debit Cost of Goods Sold and credit Inventory.

INVENTORY_TRANSACTIONS
TIMESTAMP            ITEM_ID  ADJ  REF                  POSTED_AT
2013-03-20 10:05:00  10       -2   Order #100, Line #1  NULL
2013-03-20 10:05:00  11       -1   Order #100, Line #2  NULL

JOURNAL_ENTRIES
TIMESTAMP            TYPE  ACCOUNT_ID  AMOUNT  REF                  POSTED_AT 
2013-03-20 10:05:00  DR    8            40.00  Order #100, Line #1  NULL
2013-03-20 10:05:00  CR    3            40.00  Order #100, Line #1  NULL
2013-03-20 10:05:00  DR    8            15.00  Order #100, Line #2  NULL
2013-03-20 10:05:00  CR    3            15.00  Order #100, Line #2  NULL

A simple two-line order requires six inserts that all must succeed or fail together.

Once again, I would make this a single document, encapsulating all of this information. Also, I would probably copy a bit more data to the local document, because that is something that we’re comfortable with doing in a NoSQL world.

{
  Timestamp: 2013-03-20 10:05:00,
  Description: "Order Released",
  Order: Orders/100,
  PostedAt: null,
  Debits: [{
    Account: {
      Id: Accounts/8,
      Description: "Cost of Goods Sold"
    },
    Product: {
      Id: Products/10,
      Description: "Something Awesome",
      Line: 1,
      Quantity: 2
    },
    Amount: 40.00
  },{
    Account: {
      Id: Accounts/8,
      Description: "Cost of Goods Sold"
    },
    Product: {
      Id: Products/11,
      Description: "Something Else Awesome",
      Line: 2,
      Quantity: 1
    },
    Amount: 15.00
  }],
  Credits: [{
    Account: {
      Id: Accounts/3,
      Description: "Inventory"
    },
    Product: {
      Id: Products/10,
      Description: "Something Awesome",
      Line: 1,
      Quantity: 2
    },
    Amount: 40.00
  }, {
    Account: {
      Id: Accounts/3,
      Description: "Inventory"
    },
    Product: {
      Id: Products/11,
      Description: "Something Else Awesome",
      Line: 2,
      Quantity: 1
    },
    Amount: 15.00
  }],
  InventoryAdjustments: [{
    Product: Products/10,
    Adjustment: -2
  },{
    Product: Products/11,
    Adjustment: -1
  }]
}

Like a book about your company.

The series of journal entries you create, instead of being a series of disjointed things with foreign keys, becomes a meaningful, human-readable story. This is an incredibly powerful benefit. The data has machine value, business value, and human value contexts. Typically, data in an RDBMS has machine value only. It’s not until it’s translated by code that its business value and human value are realized.

But what about transactions?

What about them? These databases make a single-document write atomic. You don’t need transactions for single ops. When you find ways to redefine your problem as single-document operations, the whole idea of transactions just becomes (mostly) a waste.

That doesn’t mean that we don’t need transactions. It just means that their importance is greatly diminished. When we roll up trial balances and post them, we want all of those updates to be a single transaction. Depending on the size of your company and the frequency that you post transactions, the trial balance can be an enormous undertaking. Large manufacturing firms with thousands of inventory items and massive sales volumes who post once per month can accrue millions of unposted journal entries.

The trial balance is a collection of all of the journal entries and inventory adjustments to be included in the post. Controllers have the ability to decide which transactions are to be posted. For each account, you know the starting balance, you apply all adjustments, and you know the ending balance.

Hey! We just described another document. It’s a working trial balance.

{
  Id: TrialBalances/100,
  CreatedAt: 2013-03-22 14:20:00,
  CreatedBy: {
    Id: Users/jmeyer,
    Name: "Jarrett Meyer"
  },
  PostedAt: null,
  JournalEntries: [
    JournalEntries/1700, 
    JournalEntries/1701, 
    JournalEntries/1702
  ],
  AccountBalances: [{
    Id: Accounts/1,
    StartingBalance: 1000.00,
    Adjustments: 260.00,
    EndingBalance: 1260.00 
  }, {
    // ... etc ...
  }]
} 

With a document like this, you know exactly what is supposed to happen. You know exactly what journal entries are to be included. You know exactly every adjustment to be applied. So even if the posting transaction fails, you know exactly what you should look for. Every journal entry in the list should have a non-null PostedAt value, and the account balances should all match their EndingBalance value. If that’s not the pit of success, then I don’t know what is.
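
Here is a rough Java sketch of the balance half of that check. The class mirrors one element of AccountBalances from the document above. The method names are hypothetical, and how you load the live account balance is left out.

```java
import java.math.BigDecimal;

final class TrialBalanceCheck {
    /** Mirrors one element of AccountBalances in the trial balance document. */
    static final class AccountBalance {
        final String accountId;
        final BigDecimal startingBalance;
        final BigDecimal adjustments;
        final BigDecimal endingBalance;

        AccountBalance(String accountId, String starting, String adjustments, String ending) {
            this.accountId = accountId;
            this.startingBalance = new BigDecimal(starting);
            this.adjustments = new BigDecimal(adjustments);
            this.endingBalance = new BigDecimal(ending);
        }

        /** The document itself must add up: starting + adjustments = ending. */
        boolean isConsistent() {
            return startingBalance.add(adjustments).compareTo(endingBalance) == 0;
        }

        /** After posting, the live account balance should equal EndingBalance. */
        boolean isPosted(BigDecimal liveBalance) {
            return liveBalance.compareTo(endingBalance) == 0;
        }
    }

    public static void main(String[] args) {
        AccountBalance cash = new AccountBalance("Accounts/1", "1000.00", "260.00", "1260.00");
        System.out.println("consistent: " + cash.isConsistent());
        // A live balance still at the starting value means the post did not finish for this account.
        System.out.println("posted: " + cash.isPosted(new BigDecimal("1000.00")));
    }
}
```

If the posting blows up halfway, running a check like this over every account in the document tells you which ones still need to be applied.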

Document databases. It’s your data. Only clearer.

Why C# Needs Duck Typing

I was working on a .NET project that lacked a few unit tests. So, being the Boy Scout developer I am, I decided I would add the ones that seemed obvious. Since this was a web project, that meant interacting with an active web session. The good news is that the web session has been wrapped with an adapter. The bad news is that the .NET framework is missing some obvious-to-me hooks.

HttpSessionStateBase.cs
public abstract class HttpSessionStateBase : ICollection, IEnumerable
{
    // snip

    public abstract object this[string key] { get; set;}

    // snip
}

Why isn’t the indexer encapsulated into its own interface? It’s just sitting out there, a property of the class definition. Basic key-value lookup seems like an appallingly obvious role interface.

If it were its own interface, think about how easy it would be to jump back and forth from an in-memory Dictionary<string, object> to a live session. What’s your backing data? Who cares? This would make what I’m doing a heck of a lot easier.

IKeyValueStore.cs
public interface IKeyValueStore<TKey, TValue>
{
    TValue this[TKey key] { get; set;}
}

This interface should be retroactively added to all .NET objects that support indexing. This pattern pops up all over the place in .NET, so why not explicitly call it out when you see it? I’d even add it to System.String as an IKeyValueStore<int, char>. Of course, the setter would throw a NotImplementedException with a very nice explanation that .NET strings are immutable.

This is not the same as an IDictionary<TKey, TValue>, although dictionaries would also implement this interface. There’s a ton of extra stuff you need to make a dictionary, stuff I’m just not interested in writing. Nor do I want to have to throw a ton of NotImplementedExceptions all over my codebase.

So why would duck typing matter? We wouldn’t care about the explicit interface definition. Instead, we’d just call the indexer. If it works, then great! If not, then throw a NotImplementedException. In fact, this is exactly how the foreach keyword works.

In the end, I want to be able to jump back and forth between backing stores for the purposes of writing tests. This isn’t difficult, but it’s work I don’t feel like I should have to do because I want higher code quality.
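
To show the shape of what I’m after without any framework in the way, here is the same role-interface idea sketched in Java, which has no indexers, so get and set stand in for this[key]. Every name here is made up for the example. The point is only that code written against the small interface can run on an in-memory map in tests and on a session adapter in production.

```java
import java.util.HashMap;
import java.util.Map;

/** Role interface: nothing but keyed get and set. */
interface KeyValueStore<K, V> {
    V get(K key);

    void set(K key, V value);
}

/** In-memory implementation, handy as a stand-in for a live session in unit tests. */
final class MapStore<K, V> implements KeyValueStore<K, V> {
    private final Map<K, V> values = new HashMap<>();

    @Override
    public V get(K key) {
        return values.get(key);
    }

    @Override
    public void set(K key, V value) {
        values.put(key, value);
    }
}

/** Hypothetical code under test: it needs keyed access, not a whole web session. */
final class CartCounter {
    private final KeyValueStore<String, Object> store;

    CartCounter(KeyValueStore<String, Object> store) {
        this.store = store;
    }

    int increment() {
        Object current = store.get("cartCount");
        int next = (current == null ? 0 : (Integer) current) + 1;
        store.set("cartCount", next);
        return next;
    }
}

final class KeyValueStoreDemo {
    public static void main(String[] args) {
        CartCounter counter = new CartCounter(new MapStore<String, Object>());
        counter.increment();
        System.out.println(counter.increment());
    }
}
```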

A Gist of this code has been provided. For those of you keeping score at home, that’s 6 classes to do something that should be really simple in 2013.