Tokyo Cabinet Observations

I’m using Tokyo Cabinet with Python tc for a decent sized amount of data (~19G in a single hash table) on OS X. A few observations and oddities:

  • Writes slow down significantly as the database size grows. I’m writing 97 roughly equal sized batches to the tch table. The first batch takes ~40 seconds, and processing time seems to increase fairly linearly, with the last taking ~14 minutes. Not sure why this would be the case, but it’s discouraging. I’ll probably write a simple partitioning scheme to split the data into multiple databases and keep the size of each small, but it seems like this should be handled out of the box for me.
  • [Update] I implemented a simple partitioning scheme, and sure enough it makes a big difference. Apparently keeping the file size small (where small is < 500G) is important. Surprising – why doens’t TC implement partitioning if it’s susceptible to performance issues with larger file sizes? Is this a python tc issue or a Tokyo Cabinet issue?
  • [Also] Seems I can only open 53-54 tc.HDB()’s before I get an ‘mmap error’, limiting how much I can partition.
  • Reading records that have already been read from the tch seems to go much faster on the second access (like an order of magnitude faster). I suspect this is the disk cache at work, but if anyone has extra info on this please enlighten me.
Another somewhat surprising aspect: using the tc library you’re essentially embedding Tokyo Cabinet in your app; I had assumed it was going to be network based access, but it’s not. You can do network access either using the memcached protocol or using pytyrant.

12 Comments so far

  1. [...] Tokyo Cabinet Observations – Standard Deviations (tags: performance tokyocabinet) [...]

  2. Jason Scheirer on April 14th, 2009

    The same is true of any hash store — as the number of entries increase, the probability of a hash collision increase as well, and once you hit a certain point there is nearly a 100% chance of collision where in addition all the buckets are full. Some basic second-semester algorithms concepts here. Not a TC issue, but a fundamental basic limits of 64 bit integers issue.

    The partitioning should be up to you, TC is lightweight and modular* and you can implement the partitioning yourself. Cabinet does the B-tree or Hash store for you, Tyrant does the remote access, Dystopia does the low-level text indexing that you put your smarter search implementation on top of. Too much feature creep and Cabinet becomes PostgreSQL Jr.

    * The code is shamefully clean and easy to follow C — I say shameful because of how few codebases I have seen written that are nearly as beautiful and perfect and well-disciplined as this one is, and if they were then C would be less of a punchline and more of a language of deep reverence with modern developers.

  3. Parand on April 14th, 2009

    I have ~2.5 million distinct keys. (2^64)/2500000 = 7.37e12 . Not exactly running out of room for keys here. Math is our friend.

    I’m not asking for extra features. I just want the data store to provide reasonable performance for larger data sizes. I don’t think handling large data sizes would be feature bloat or turn TC into postgres jr.

    I’m not trying to dump on TC here – in fact I’m trying hard to use it in my project and have high hopes for it. I’m just curious as to the performance characteristics on large data – given the relative abundance of hash like data stores, I’d have imagined the newer ones would take care of scaling out of the box. My question was mainly to find out if there’s something obvious I’m doing wrong in my setup – which I still suspect there is.

  4. Didier Spezia on May 1st, 2009

    If you partition your data, you may want to play with the tchdbsetxmsiz function (C API) to limit the amount of memory TC mmaps per database. The default is 64 Mb. You may want to reduce it in order to get more partitions. You can also have a look at the ulimit -a output and check the virtual memory, max memory size, and data seg size limits.

    You may also want to review the tuning parameters to be set by tchdbtune (C API), and especially the bnum parameter. It should be set to 2 or 4 times the expected number of items to get good performance.

  5. Parand on May 1st, 2009

    Thanks Didier. I’ve largely given up on the many-tc-instances approach, maybe I can revisit based the tuning parameters you describe.

  6. Robin Luckey on May 11th, 2009

    Did you try increasing the bnum parameter to increase the bucket size? I suspect that by default, the entire 2^64 key space is not available, and that you are in fact encountering key collisions.

    Thanks for posting — I’m preparing to do some large TC experiments myself, and I’m curious about your results.

  7. Parand on May 11th, 2009

    Robin, I didn’t change any of the parameters (including bnum). If you learn of interesting parameters or different setups, if you could leave a comment or post something about it I’d appreciate it.

  8. Reid Lynch on May 21st, 2009

    I’ve only just started experimenting with Tokyo Cabinet and Tyrant, but it’s pretty clear you’ve got to up your hash bucket number (bnum) based on the number of records you anticipate. TC recommends a bnum of .5 – 4x the number of records. With the optimize function, this can be altered on an existing db.

    I would also be very interested to see a comprehensive post somewhere detailing the all of the tuning and optimization parameters than can be done on Tokyo Cabinet databases.

  9. Fz on June 25th, 2009

    You can also try pyrant if you need to do network operations. Great library and clear.

    http://code.google.com/p/pyrant/

  10. Kelvin Tan on October 10th, 2009

    Thanks for the post and for everyone’s comments. I too was running into the slowdown with large hdbs, and I didn’t realize I needed to up bnum.

  11. [...] experienced a similar phenomenon, and just stumbled upon http://parand.com/say/index.php/2009/04/09/tokyo-cabinet-observations/ , where I realized my problem was with bnum being too small (default of [...]

  12. Brian on November 4th, 2009

    Is RAM usage a function of the bnum parameter?

Leave a Reply