<?xml version="1.0" encoding="UTF-8"?><rss version="2.0"
	xmlns:content="http://purl.org/rss/1.0/modules/content/"
	xmlns:dc="http://purl.org/dc/elements/1.1/"
	xmlns:atom="http://www.w3.org/2005/Atom"
	xmlns:sy="http://purl.org/rss/1.0/modules/syndication/"
		>
<channel>
	<title>Comments on: Tokyo Cabinet Observations</title>
	<atom:link href="http://parand.com/say/index.php/2009/04/09/tokyo-cabinet-observations/feed/" rel="self" type="application/rss+xml" />
	<link>http://parand.com/say/index.php/2009/04/09/tokyo-cabinet-observations/</link>
	<description>Parand Tony Darugar: A Cruel and Petty Dictator</description>
	<lastBuildDate>Fri, 03 Sep 2010 03:26:42 -0700</lastBuildDate>
	<generator>http://wordpress.org/?v=2.8.4</generator>
	<sy:updatePeriod>hourly</sy:updatePeriod>
	<sy:updateFrequency>1</sy:updateFrequency>
		<item>
		<title>By: Brian</title>
		<link>http://parand.com/say/index.php/2009/04/09/tokyo-cabinet-observations/comment-page-1/#comment-265136</link>
		<dc:creator>Brian</dc:creator>
		<pubDate>Wed, 04 Nov 2009 14:10:05 +0000</pubDate>
		<guid isPermaLink="false">http://parand.com/say/?p=787#comment-265136</guid>
		<description>Is RAM usage a function of the bnum parameter?</description>
		<content:encoded><![CDATA[<p>Is RAM usage a function of the bnum parameter?</p>
]]></content:encoded>
	</item>
	<item>
		<title>By: TokyoCabinet HDB slowdown :: Kelvin Tan - Lucene Solr Nutch Consultant</title>
		<link>http://parand.com/say/index.php/2009/04/09/tokyo-cabinet-observations/comment-page-1/#comment-262519</link>
		<dc:creator>TokyoCabinet HDB slowdown :: Kelvin Tan - Lucene Solr Nutch Consultant</dc:creator>
		<pubDate>Sat, 10 Oct 2009 15:53:00 +0000</pubDate>
		<guid isPermaLink="false">http://parand.com/say/?p=787#comment-262519</guid>
		<description>[...] experienced a similar phenomenon, and just stumbled upon http://parand.com/say/index.php/2009/04/09/tokyo-cabinet-observations/ , where I realized my problem was with bnum being too small (default of [...]</description>
		<content:encoded><![CDATA[<p>[...] experienced a similar phenomenon, and just stumbled upon <a href="http://parand.com/say/index.php/2009/04/09/tokyo-cabinet-observations/" rel="nofollow">http://parand.com/say/index.php/2009/04/09/tokyo-cabinet-observations/</a> , where I realized my problem was with bnum being too small (default of [...]</p>
]]></content:encoded>
	</item>
	<item>
		<title>By: Kelvin Tan</title>
		<link>http://parand.com/say/index.php/2009/04/09/tokyo-cabinet-observations/comment-page-1/#comment-262518</link>
		<dc:creator>Kelvin Tan</dc:creator>
		<pubDate>Sat, 10 Oct 2009 15:48:38 +0000</pubDate>
		<guid isPermaLink="false">http://parand.com/say/?p=787#comment-262518</guid>
		<description>Thanks for the post and for everyone&#039;s comments. I too was running into the slowdown with large hdbs, and I didn&#039;t realize I needed to up bnum.</description>
		<content:encoded><![CDATA[<p>Thanks for the post and for everyone&#8217;s comments. I too was running into the slowdown with large hdbs, and I didn&#8217;t realize I needed to up bnum.</p>
]]></content:encoded>
	</item>
	<item>
		<title>By: Fz</title>
		<link>http://parand.com/say/index.php/2009/04/09/tokyo-cabinet-observations/comment-page-1/#comment-246732</link>
		<dc:creator>Fz</dc:creator>
		<pubDate>Thu, 25 Jun 2009 19:42:01 +0000</pubDate>
		<guid isPermaLink="false">http://parand.com/say/?p=787#comment-246732</guid>
		<description>You can also try pyrant if you need to do network operations. Great library and clear.

http://code.google.com/p/pyrant/</description>
		<content:encoded><![CDATA[<p>You can also try pyrant if you need to do network operations. Great library and clear.</p>
<p><a href="http://code.google.com/p/pyrant/" rel="nofollow">http://code.google.com/p/pyrant/</a></p>
]]></content:encoded>
	</item>
	<item>
		<title>By: Reid Lynch</title>
		<link>http://parand.com/say/index.php/2009/04/09/tokyo-cabinet-observations/comment-page-1/#comment-240047</link>
		<dc:creator>Reid Lynch</dc:creator>
		<pubDate>Thu, 21 May 2009 14:31:28 +0000</pubDate>
		<guid isPermaLink="false">http://parand.com/say/?p=787#comment-240047</guid>
		<description>I&#039;ve only just started experimenting with Tokyo Cabinet and Tyrant, but it&#039;s pretty clear you&#039;ve got to up your hash bucket number (bnum) based on the number of records you anticipate.  TC recommends a bnum of 0.5 to 4 times the number of records.  With the optimize function, this can be altered on an existing db.

I would also be very interested to see a comprehensive post somewhere detailing all of the tuning and optimization parameters that can be set on Tokyo Cabinet databases.</description>
		<content:encoded><![CDATA[<p>I&#8217;ve only just started experimenting with Tokyo Cabinet and Tyrant, but it&#8217;s pretty clear you&#8217;ve got to up your hash bucket number (bnum) based on the number of records you anticipate.  TC recommends a bnum of 0.5 to 4 times the number of records.  With the optimize function, this can be altered on an existing db.</p>
<p>I would also be very interested to see a comprehensive post somewhere detailing all of the tuning and optimization parameters that can be set on Tokyo Cabinet databases.</p>
]]></content:encoded>
	</item>
	<item>
		<title>By: Parand</title>
		<link>http://parand.com/say/index.php/2009/04/09/tokyo-cabinet-observations/comment-page-1/#comment-238537</link>
		<dc:creator>Parand</dc:creator>
		<pubDate>Mon, 11 May 2009 20:04:18 +0000</pubDate>
		<guid isPermaLink="false">http://parand.com/say/?p=787#comment-238537</guid>
		<description>Robin, I didn&#039;t change any of the parameters (including bnum). If you learn of interesting parameters or different setups, I&#039;d appreciate it if you left a comment or posted something about them.</description>
		<content:encoded><![CDATA[<p>Robin, I didn&#8217;t change any of the parameters (including bnum). If you learn of interesting parameters or different setups, I&#8217;d appreciate it if you left a comment or posted something about them.</p>
]]></content:encoded>
	</item>
	<item>
		<title>By: Robin Luckey</title>
		<link>http://parand.com/say/index.php/2009/04/09/tokyo-cabinet-observations/comment-page-1/#comment-238527</link>
		<dc:creator>Robin Luckey</dc:creator>
		<pubDate>Mon, 11 May 2009 19:11:01 +0000</pubDate>
		<guid isPermaLink="false">http://parand.com/say/?p=787#comment-238527</guid>
		<description>Did you try increasing the bnum parameter to increase the number of buckets? I suspect that by default, the entire 2^64 key space is not available, and that you are in fact encountering key collisions.

Thanks for posting -- I&#039;m preparing to do some large TC experiments myself, and I&#039;m curious about your results.</description>
		<content:encoded><![CDATA[<p>Did you try increasing the bnum parameter to increase the number of buckets? I suspect that by default, the entire 2^64 key space is not available, and that you are in fact encountering key collisions.</p>
<p>Thanks for posting &#8212; I&#8217;m preparing to do some large TC experiments myself, and I&#8217;m curious about your results.</p>
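<p>To make the collision idea concrete, here is a minimal sketch of the average number of records that land in each bucket for a given bnum. It assumes uniform hashing, the roughly 2.5 million records quoted elsewhere in this thread, and a default bnum of 131071, which is my recollection of the TC default and should be checked against the docs. The function name is hypothetical and not part of TC.</p>

```python
# Average records per bucket in a hash table: a rough proxy for how long
# the collision chains get. Everything here is illustrative arithmetic.
# ASSUMPTIONS: 2.5 million records (figure quoted elsewhere in this thread)
# and a default bnum of 131071 (recollection of the TC default; verify it).


def records_per_bucket(num_records: int, bnum: int) -> float:
    """Mean number of records sharing one bucket, assuming uniform hashing."""
    if bnum <= 0:
        raise ValueError("bnum must be positive")
    return num_records / bnum


NUM_RECORDS = 2_500_000
ASSUMED_DEFAULT_BNUM = 131_071  # assumption, not taken from the post

# Compare the assumed default against 2x and 4x the record count.
for bnum in (ASSUMED_DEFAULT_BNUM, 2 * NUM_RECORDS, 4 * NUM_RECORDS):
    load = records_per_bucket(NUM_RECORDS, bnum)
    print(f"bnum={bnum:>10}: {load:.2f} records per bucket")
```

<p>Under these assumptions the default leaves roughly 19 records per bucket, while a bnum of 2 to 4 times the record count brings that below one, which would be consistent with the slowdown being a bucket-count problem rather than a key-space problem.</p>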
]]></content:encoded>
	</item>
	<item>
		<title>By: Parand</title>
		<link>http://parand.com/say/index.php/2009/04/09/tokyo-cabinet-observations/comment-page-1/#comment-236719</link>
		<dc:creator>Parand</dc:creator>
		<pubDate>Fri, 01 May 2009 16:01:11 +0000</pubDate>
		<guid isPermaLink="false">http://parand.com/say/?p=787#comment-236719</guid>
		<description>Thanks Didier. I&#039;ve largely given up on the many-tc-instances approach; maybe I can revisit it based on the tuning parameters you describe.</description>
		<content:encoded><![CDATA[<p>Thanks Didier. I&#8217;ve largely given up on the many-tc-instances approach; maybe I can revisit it based on the tuning parameters you describe.</p>
]]></content:encoded>
	</item>
	<item>
		<title>By: Didier Spezia</title>
		<link>http://parand.com/say/index.php/2009/04/09/tokyo-cabinet-observations/comment-page-1/#comment-236714</link>
		<dc:creator>Didier Spezia</dc:creator>
		<pubDate>Fri, 01 May 2009 15:38:15 +0000</pubDate>
		<guid isPermaLink="false">http://parand.com/say/?p=787#comment-236714</guid>
		<description>If you partition your data, you may want to play with the tchdbsetxmsiz function (C API) to limit the amount of memory TC mmaps per database. The default is 64 MB. You may want to reduce it in order to get more partitions. You can also have a look at the ulimit -a output and check the virtual memory, max memory size, and data seg size limits.

You may also want to review the tuning parameters to be set by tchdbtune (C API), and especially the bnum parameter. It should be set to 2 or 4 times the expected number of items to get good performance.</description>
		<content:encoded><![CDATA[<p>If you partition your data, you may want to play with the tchdbsetxmsiz function (C API) to limit the amount of memory TC mmaps per database. The default is 64 MB. You may want to reduce it in order to get more partitions. You can also have a look at the ulimit -a output and check the virtual memory, max memory size, and data seg size limits.</p>
<p>You may also want to review the tuning parameters to be set by tchdbtune (C API), and especially the bnum parameter. It should be set to 2 or 4 times the expected number of items to get good performance.</p>
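<p>As a rough planning aid, the two rules of thumb above can be sketched as plain arithmetic. This does not call TC; the helper names are hypothetical, and the results are values you would pass to tchdbtune and tchdbsetxmsiz yourself. The partition estimate only divides an address-space budget by the mmap size and ignores everything else the process needs.</p>

```python
import math

# Illustrative helpers only; neither function is part of Tokyo Cabinet.


def suggest_bnum(expected_records: int, factor: float = 2.0) -> int:
    """Bucket count as a multiple of the expected record count, rounded up."""
    if expected_records < 0 or factor <= 0:
        raise ValueError("expected_records must be >= 0 and factor > 0")
    return math.ceil(expected_records * factor)


def max_partitions(address_budget_bytes: int, xmsiz_bytes: int) -> int:
    """How many databases fit if each one mmaps xmsiz_bytes (upper bound)."""
    if xmsiz_bytes <= 0:
        raise ValueError("xmsiz_bytes must be positive")
    return address_budget_bytes // xmsiz_bytes


MIB = 1024 * 1024

# Hypothetical inputs: 2.5 million items and a 1 GiB mmap budget.
print(suggest_bnum(2_500_000, factor=2))      # bnum at 2x the items
print(suggest_bnum(2_500_000, factor=4))      # bnum at 4x the items
print(max_partitions(1024 * MIB, 64 * MIB))   # partitions at the 64 MB default
print(max_partitions(1024 * MIB, 16 * MIB))   # partitions with a smaller xmsiz
```

<p>Shrinking the mmap size from 64 MB to 16 MB in this sketch quadruples the number of partitions that fit in the same budget, which is the trade-off to weigh against per-database performance.</p>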
]]></content:encoded>
	</item>
	<item>
		<title>By: Parand</title>
		<link>http://parand.com/say/index.php/2009/04/09/tokyo-cabinet-observations/comment-page-1/#comment-233724</link>
		<dc:creator>Parand</dc:creator>
		<pubDate>Wed, 15 Apr 2009 03:56:03 +0000</pubDate>
		<guid isPermaLink="false">http://parand.com/say/?p=787#comment-233724</guid>
		<description>I have ~2.5 million distinct keys. (2^64)/2500000 = 7.37e12 . Not exactly running out of room for keys here. Math is our friend. 

I&#039;m not asking for extra features. I just want the data store to provide reasonable performance for larger data sizes. I don&#039;t think handling large data sizes would be feature bloat or turn TC into postgres jr.

I&#039;m not trying to dump on TC here -  in fact I&#039;m trying hard to use it in my project and have high hopes for it. I&#039;m just curious as to the performance characteristics on large data - given the relative abundance of hash-like data stores, I&#039;d have imagined the newer ones would take care of scaling out of the box. My question was mainly to find out if there&#039;s something obvious I&#039;m doing wrong in my setup - which I still suspect there is.</description>
		<content:encoded><![CDATA[<p>I have ~2.5 million distinct keys. (2^64)/2500000 = 7.37e12 . Not exactly running out of room for keys here. Math is our friend. </p>
<p>I&#8217;m not asking for extra features. I just want the data store to provide reasonable performance for larger data sizes. I don&#8217;t think handling large data sizes would be feature bloat or turn TC into postgres jr.</p>
<p>I&#8217;m not trying to dump on TC here &#8211;  in fact I&#8217;m trying hard to use it in my project and have high hopes for it. I&#8217;m just curious as to the performance characteristics on large data &#8211; given the relative abundance of hash-like data stores, I&#8217;d have imagined the newer ones would take care of scaling out of the box. My question was mainly to find out if there&#8217;s something obvious I&#8217;m doing wrong in my setup &#8211; which I still suspect there is.</p>
]]></content:encoded>
	</item>
</channel>
</rss>
