Monday, June 29, 2009

Tokyo Cabinet Tuning Part 1 - Bucket Array Size

I have been playing around with Tokyo Cabinet for a few weeks now, and I wanted to share some of the tuning hints I have found.

I was loading a database with just shy of two billion records, and the load became unacceptably slow after about the 500 million mark. To improve performance, I began experimenting with the tuning options available through the tcbdbtune function. The first option I experimented with was the number of elements in the bucket array.

Putting records into the B+ tree database will be much slower than you expect unless you increase the number of elements in the bucket array. Some of my runs took over 30 minutes to load 100 million records. I performed over 200 tests with different bucket array sizes, leaf/non-leaf member values, and record counts. In the end I found the bucket array should be between one-tenth and six-tenths of the expected data-set size. Anything smaller or larger results in longer loads. The leaf/non-leaf values had very little impact on the performance of linear record writing.
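
As a rough illustration of the sizing rule above, here is a minimal C sketch. The helper names suggest_bnum and tune_for_load and the USE_TOKYOCABINET guard are my own inventions for this example, not part of Tokyo Cabinet. It also assumes "data-set size" means the expected number of records. The only library call used is tcbdbtune, which has to be called before the database is opened. The BDBTLARGE option is an assumption on my part that the file will grow past 2GB, which seems likely for a load of this size.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical helper (not part of Tokyo Cabinet): returns a bucket array
   size equal to tenths/10 of the expected record count. The 1..6 range
   mirrors the one-tenth to six-tenths window described in this post. */
int64_t suggest_bnum(int64_t expected_records, int tenths) {
    assert(tenths >= 1 && tenths <= 6);
    if (expected_records <= 0) {
        return 0; /* tcbdbtune treats a non-positive bnum as "use the default" */
    }
    return expected_records / 10 * tenths;
}

#ifdef USE_TOKYOCABINET /* hypothetical guard: define it when linking -ltokyocabinet */
#include <stdbool.h>
#include <tcbdb.h>

/* Hypothetical wrapper: tunes only the bucket array and leaves the
   leaf/non-leaf members and the alignment/free-block powers at their
   defaults. Call it after tcbdbnew and before tcbdbopen. */
bool tune_for_load(TCBDB *bdb, int64_t expected_records) {
    int64_t bnum = suggest_bnum(expected_records, 3); /* middle of the window */
    /* BDBTLARGE is an assumption: it allows the file to exceed 2GB. */
    return tcbdbtune(bdb, -1, -1, bnum, -1, -1, BDBTLARGE);
}
#endif
```

For a 100 million record load, suggest_bnum(100000000, 3) gives a bucket array of 30 million elements. Three-tenths is just a point inside the window I measured, so treat it as a starting value to test against your own data rather than a recommendation.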

I am still collecting data on the performance of different leaf/non-leaf settings for random writes, and I will post about those findings in part 2.