<?xml version="1.0" encoding="UTF-8"?>
<rss version="2.0"
	xmlns:content="http://purl.org/rss/1.0/modules/content/"
	xmlns:wfw="http://wellformedweb.org/CommentAPI/"
	xmlns:dc="http://purl.org/dc/elements/1.1/"
	xmlns:atom="http://www.w3.org/2005/Atom"
	xmlns:sy="http://purl.org/rss/1.0/modules/syndication/"
	xmlns:slash="http://purl.org/rss/1.0/modules/slash/"
	>

<channel>
	<title>Standard Deviations &#187; Scaling</title>
	<atom:link href="http://parand.com/say/index.php/category/scaling/feed/" rel="self" type="application/rss+xml" />
	<link>http://parand.com/say</link>
	<description>Parand Tony Darugar: A Cruel and Petty Dictator</description>
	<lastBuildDate>Wed, 11 Jan 2012 20:33:20 +0000</lastBuildDate>
	<generator>http://wordpress.org/?v=2.8.4</generator>
	<language>en</language>
	<sy:updatePeriod>hourly</sy:updatePeriod>
	<sy:updateFrequency>1</sy:updateFrequency>
			<item>
		<title>Proxies For Request Modification?</title>
		<link>http://parand.com/say/index.php/2009/04/21/proxies-for-request-modification/</link>
		<comments>http://parand.com/say/index.php/2009/04/21/proxies-for-request-modification/#comments</comments>
		<pubDate>Wed, 22 Apr 2009 00:30:03 +0000</pubDate>
		<dc:creator>Parand</dc:creator>
				<category><![CDATA[Programming]]></category>
		<category><![CDATA[Scaling]]></category>

		<guid isPermaLink="false">http://parand.com/say/?p=795</guid>
		<description><![CDATA[Interesting post from igvita on Ruby Proxies for Scale and Monitoring discussing the use of Ruby and EventMachine to create simple proxies for monitoring, benchmarking, content examination, and even request modification.
I&#8217;ve always wanted to do benchmarking as Ilya suggests. Real production traffic is the best way to test. Good stuff.
I&#8217;m tempted by the beanstalkd use case [...]]]></description>
			<content:encoded><![CDATA[<p>Interesting post from igvita on <a href="http://www.igvita.com/2009/04/20/ruby-proxies-for-scale-and-monitoring/" target="_blank">Ruby Proxies for Scale and Monitoring</a> discussing the use of Ruby and EventMachine to create simple proxies for monitoring, benchmarking, content examination, and even request modification.</p>
<p>I&#8217;ve always wanted to do benchmarking as Ilya suggests. Real production traffic is the best way to test. Good stuff.</p>
<p>I&#8217;m tempted by the beanstalkd use case as well &#8211; he uses his proxy to detect and route certain requests to an archiving MySQL instance instead of to his beanstalkd instance. I&#8217;m leery of maintainability issues, however &#8211; I&#8217;ve generally found that indirection, particularly at the wire-protocol level, can quickly lead to hard-to-find bugs.</p>
<p>Something to experiment with at some point.</p>
]]></content:encoded>
			<wfw:commentRss>http://parand.com/say/index.php/2009/04/21/proxies-for-request-modification/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
		<item>
		<title>Distributed Database Talk</title>
		<link>http://parand.com/say/index.php/2009/04/17/distributed-database-talk/</link>
		<comments>http://parand.com/say/index.php/2009/04/17/distributed-database-talk/#comments</comments>
		<pubDate>Fri, 17 Apr 2009 22:04:17 +0000</pubDate>
		<dc:creator>Parand</dc:creator>
				<category><![CDATA[Programming]]></category>
		<category><![CDATA[Scaling]]></category>

		<guid isPermaLink="false">http://parand.com/say/?p=791</guid>
		<description><![CDATA[Very informative PyCon talk on various fancy distributed data stores, including BigTable, Dynamo, Cassandra, and several others.
 
]]></description>
			<content:encoded><![CDATA[<p>Very <a href="http://blip.tv/file/1949416/" target="_blank">informative PyCon talk</a> on various fancy distributed data stores, including BigTable, Dynamo, Cassandra, and several others.</p>
<p> <object classid="clsid:d27cdb6e-ae6d-11cf-96b8-444553540000" width="720" height="510" codebase="http://download.macromedia.com/pub/shockwave/cabs/flash/swflash.cab#version=6,0,40,0"><param name="src" value="http://blip.tv/play/AffKEpWmLQ" /><embed type="application/x-shockwave-flash" width="720" height="510" src="http://blip.tv/play/AffKEpWmLQ"></embed></object></p>
]]></content:encoded>
			<wfw:commentRss>http://parand.com/say/index.php/2009/04/17/distributed-database-talk/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
		<item>
		<title>If you have enough traffic, the cost of servers outweighs the cost of programmers</title>
		<link>http://parand.com/say/index.php/2009/04/08/if-you-have-enough-traffic-the-cost-of-servers-outweighs-the-cost-of-programmers/</link>
		<comments>http://parand.com/say/index.php/2009/04/08/if-you-have-enough-traffic-the-cost-of-servers-outweighs-the-cost-of-programmers/#comments</comments>
		<pubDate>Thu, 09 Apr 2009 06:27:22 +0000</pubDate>
		<dc:creator>Parand</dc:creator>
				<category><![CDATA[Scaling]]></category>

		<guid isPermaLink="false">http://parand.com/say/?p=785</guid>
		<description><![CDATA[Quote from Bill Venners (via):
If you have enough traffic, at some point the cost of servers outweighs the cost of programmers
Absolutely true, which is why places like Yahoo and Google are among the last bastions of very skilled C/C++ programmers.
Of course I should mention: you are not at that point. You really aren&#8217;t. So for [...]]]></description>
			<content:encoded><![CDATA[<p>Quote from Bill Venners (<a href="http://highscalability.com/some-point-cost-servers-outweighs-cost-programmers" target="_blank">via</a>):</p>
<blockquote><p><em><strong>If you have enough traffic, at some point the cost of servers outweighs the cost of programmers</strong></em></p></blockquote>
<p>Absolutely true, which is why places like Yahoo and Google are among the last bastions of very skilled C/C++ programmers.</p>
<p>Of course I should mention: you are not at that point. You really aren&#8217;t. So for now ignore this quote.</p>
]]></content:encoded>
			<wfw:commentRss>http://parand.com/say/index.php/2009/04/08/if-you-have-enough-traffic-the-cost-of-servers-outweighs-the-cost-of-programmers/feed/</wfw:commentRss>
		<slash:comments>1</slash:comments>
		</item>
		<item>
		<title>Queueing Benefits</title>
		<link>http://parand.com/say/index.php/2009/01/19/queuing-benefits/</link>
		<comments>http://parand.com/say/index.php/2009/01/19/queuing-benefits/#comments</comments>
		<pubDate>Tue, 20 Jan 2009 07:44:38 +0000</pubDate>
		<dc:creator>Parand</dc:creator>
				<category><![CDATA[Scaling]]></category>
		<category><![CDATA[queuing]]></category>

		<guid isPermaLink="false">http://parand.com/say/?p=742</guid>
		<description><![CDATA[Queues are nice things. We should be using more of them.
A couple of benefits that I knew of but didn&#8217;t really appreciate until Alex started using beanstalkd in his application:
- Having a worker-pulls-jobs-from-queue model provides near-optimal use of the machine and prevents overload and thrashing. Set up as many simultaneous workers as your resources can [...]]]></description>
			<content:encoded><![CDATA[<p>Queues are nice things. We should be using more of them.</p>
<p>A couple of benefits that I knew of but didn&#8217;t really appreciate until Alex started using beanstalkd in his application:</p>
<p>- Having a worker-pulls-jobs-from-queue model provides near-optimal use of the machine and prevents overload and thrashing. Set up as many simultaneous workers as your resources can handle and let them go. You have a controlled number of workers, preventing thrashing, and your workers work continuously. You don&#8217;t have to worry about a load distribution strategy &#8211; workers pull jobs as fast as they can.</p>
<p>- Provisioning new workers into the system becomes trivial. Want to add another box into the mix? Just set it up and have the workers start pulling jobs. You don&#8217;t have to worry about registering the new box and getting it into the load distribution system &#8211; so long as it knows how to connect to the queue it can grab jobs. Alex commented that scaling his system is as easy as bringing up another virtual machine &#8211; as soon as it&#8217;s up it starts pulling jobs.</p>
<p>These are both very important operational benefits that I had largely ignored.</p>
]]></content:encoded>
			<wfw:commentRss>http://parand.com/say/index.php/2009/01/19/queuing-benefits/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
		<item>
		<title>Happy: Hadoop with Python (Jython)</title>
		<link>http://parand.com/say/index.php/2008/09/24/happy-hadoop-with-python-jython/</link>
		<comments>http://parand.com/say/index.php/2008/09/24/happy-hadoop-with-python-jython/#comments</comments>
		<pubDate>Wed, 24 Sep 2008 17:58:00 +0000</pubDate>
		<dc:creator>Parand</dc:creator>
				<category><![CDATA[Hadoop]]></category>
		<category><![CDATA[Python]]></category>
		<category><![CDATA[Scaling]]></category>

		<guid isPermaLink="false">http://parand.com/say/?p=679</guid>
		<description><![CDATA[The Freebase folks have open-sourced their Python (Jython) based Hadoop framework, calling it Happy. Looks interesting, will need to give it a whirl when I get a chance.
]]></description>
			<content:encoded><![CDATA[<p>The <a href="http://www.freebase.com/" target="_blank">Freebase</a> folks have open-sourced their Python (Jython) based Hadoop framework, calling it <a href="http://code.google.com/p/happy/" target="_blank">Happy</a>. Looks interesting, will need to give it a whirl when I get a chance.</p>
]]></content:encoded>
			<wfw:commentRss>http://parand.com/say/index.php/2008/09/24/happy-hadoop-with-python-jython/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
		<item>
		<title>Disco: Erlang/Python Based Map-Reduce</title>
		<link>http://parand.com/say/index.php/2008/09/15/disco-erlangpython-based-map-reduce/</link>
		<comments>http://parand.com/say/index.php/2008/09/15/disco-erlangpython-based-map-reduce/#comments</comments>
		<pubDate>Tue, 16 Sep 2008 05:53:00 +0000</pubDate>
		<dc:creator>Parand</dc:creator>
				<category><![CDATA[Python]]></category>
		<category><![CDATA[Scaling]]></category>
		<category><![CDATA[map-reduce]]></category>

		<guid isPermaLink="false">http://parand.com/say/?p=675</guid>
		<description><![CDATA[Disco is a map-reduce framework written in Erlang and Python. Seems reasonable &#8211; I definitely prefer Python to Java for writing maps and reduces, and Erlang is rumored to be good at parallel stuff.
Interestingly no mention of an underlying distributed file system.
Via High Scalability.
]]></description>
			<content:encoded><![CDATA[<p><a href="http://discoproject.org/" target="_blank">Disco</a> is a map-reduce framework written in Erlang and Python. Seems reasonable &#8211; I definitely prefer Python to Java for writing maps and reduces, and Erlang is rumored to be good at parallel stuff.</p>
<p>Interestingly no mention of an underlying distributed file system.</p>
<p>Via <a href="http://highscalability.com/mapreduce-framework-disco" target="_blank">High Scalability</a>.</p>
]]></content:encoded>
			<wfw:commentRss>http://parand.com/say/index.php/2008/09/15/disco-erlangpython-based-map-reduce/feed/</wfw:commentRss>
		<slash:comments>1</slash:comments>
		</item>
		<item>
		<title>Drizzle: MySQL Based Slim / Cloud-Oriented DB</title>
		<link>http://parand.com/say/index.php/2008/09/05/drizzle-mysql-based-slim-cloud-oriented-db/</link>
		<comments>http://parand.com/say/index.php/2008/09/05/drizzle-mysql-based-slim-cloud-oriented-db/#comments</comments>
		<pubDate>Fri, 05 Sep 2008 18:27:10 +0000</pubDate>
		<dc:creator>Parand</dc:creator>
				<category><![CDATA[Programming]]></category>
		<category><![CDATA[Scaling]]></category>
		<category><![CDATA[drizzle]]></category>
		<category><![CDATA[mysql]]></category>

		<guid isPermaLink="false">http://parand.com/say/?p=668</guid>
		<description><![CDATA[Drizzle is interesting:
Drizzle: A High-Performance Microkernel DBMS for Scale-Out Applications
Drizzle is a community-driven project based on the popular MySQL DBMS that is focused on MySQL&#8217;s original goals of ease-of-use, reliability and performance.
Headed up by Brian Aker, Director of Architecture at MySQL AB. Take a look at the MySQL Differences page and you&#8217;ll mostly see features [...]]]></description>
			<content:encoded><![CDATA[<p><a href="http://drizzle.wikia.com/wiki/Drizzle_Wiki" target="_blank">Drizzle</a> is interesting:</p>
<blockquote><p>Drizzle: A High-Performance Microkernel DBMS for Scale-Out Applications<br />
Drizzle is a community-driven project based on the popular MySQL DBMS that is focused on MySQL&#8217;s original goals of ease-of-use, reliability and performance.</p></blockquote>
<p>Headed up by <a href="http://en.wikipedia.org/wiki/Brian_Aker" target="_blank">Brian Aker</a>, Director of Architecture at MySQL AB. Take a look at the <a href="http://drizzle.wikia.com/wiki/MySQL_Differences" target="_blank">MySQL Differences</a> page and you&#8217;ll mostly see features <em>removed</em> and cleaned up, which is great. Designed for high levels of concurrency, targeted to &#8220;cloud&#8221; applications. <a href="http://monty-says.blogspot.com/2008/07/what-if.html" target="_blank">Monty</a> and <a href="http://krow.livejournal.com/602409.html" target="_blank">Brian&#8217;s</a> posts offer motivation for the project.</p>
<p>Something to keep an eye on.</p>
]]></content:encoded>
			<wfw:commentRss>http://parand.com/say/index.php/2008/09/05/drizzle-mysql-based-slim-cloud-oriented-db/feed/</wfw:commentRss>
		<slash:comments>1</slash:comments>
		</item>
		<item>
		<title>The Performance Penalty of Virtualization</title>
		<link>http://parand.com/say/index.php/2008/06/27/the-performance-penalty-of-virtualization/</link>
		<comments>http://parand.com/say/index.php/2008/06/27/the-performance-penalty-of-virtualization/#comments</comments>
		<pubDate>Sat, 28 Jun 2008 05:29:42 +0000</pubDate>
		<dc:creator>Parand</dc:creator>
				<category><![CDATA[Scaling]]></category>
		<category><![CDATA[Uncategorized]]></category>
		<category><![CDATA[ec2]]></category>
		<category><![CDATA[vmware]]></category>
		<category><![CDATA[xen]]></category>

		<guid isPermaLink="false">http://parand.com/say/?p=600</guid>
		<description><![CDATA[If you&#8217;ve spent any time with virtualized environments you know how effective and productive they are. The process of expanding capacity for FaceDouble, for example, became significantly simpler once they moved to deploying virtual servers, and SmugMug has been singing the praises of Amazon&#8217;s EC2 with a clever system to provision and remove capacity based [...]]]></description>
			<content:encoded><![CDATA[<p>If you&#8217;ve spent any time with virtualized environments you know how effective and productive they are. The process of expanding capacity for <a href="http://faceddouble.com/" target="_blank">FaceDouble</a>, for example, became significantly simpler once they moved to deploying virtual servers, and SmugMug has been <a href="http://blogs.smugmug.com/don/2008/06/03/skynet-lives-aka-ec2-smugmug/" target="_blank">singing the praises of Amazon&#8217;s EC2</a> with a clever system to provision and remove capacity based on load. My own experiments with <a href="http://hadoop.apache.org/core/" target="_blank">Hadoop</a> and EC2 have been similarly fruitful.</p>
<p>So I&#8217;m wondering what the downside to aggressively going virtual is &#8211; why not make all servers virtual?</p>
<p>The main issue that comes to mind is performance, or the loss thereof. Presumably the performance of a virtual server is less than that of the same server running directly on the native OS.</p>
<p>Just how much of a performance difference is there, say in terms of per-request latency and capacity, for a web server, a database server, and a CPU-bound heavy-computation server, for any of the common virtualization systems (Xen, VMware, etc.)? I haven&#8217;t seen any good materials on this, so if you have knowledge or pointers please let me know.</p>
]]></content:encoded>
			<wfw:commentRss>http://parand.com/say/index.php/2008/06/27/the-performance-penalty-of-virtualization/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
		<item>
		<title>Flickr Capacity Planning Presentation</title>
		<link>http://parand.com/say/index.php/2008/05/13/flickr-capacity-planning-presentation/</link>
		<comments>http://parand.com/say/index.php/2008/05/13/flickr-capacity-planning-presentation/#comments</comments>
		<pubDate>Tue, 13 May 2008 19:53:32 +0000</pubDate>
		<dc:creator>Parand</dc:creator>
				<category><![CDATA[Scaling]]></category>

		<guid isPermaLink="false">http://parand.com/say/index.php/2008/05/13/flickr-capacity-planning-presentation/</guid>
		<description><![CDATA[Unfortunately I missed the Web2.0 Expo this year, but I&#8217;ve been catching up on slides and presentations. I had John Allspaw&#8217;s Capacity Planning For Web Operations open in a tab for several days and finally got to it. Turned out to be much more interesting than I&#8217;d anticipated. Slide 9 &#8211; &#8220;Normal&#8221; growth: 4x increase [...]]]></description>
			<content:encoded><![CDATA[<p>Unfortunately I missed the Web2.0 Expo this year, but I&#8217;ve been catching up on slides and presentations. I had <a target="_new" href="http://www.slideshare.net/jallspaw/capacity-planning-for-web-operations-web20-expo-2008">John Allspaw&#8217;s Capacity Planning For Web Operations</a> open in a tab for several days and finally got to it. Turned out to be much more interesting than I&#8217;d anticipated. Slide 9 &#8211; &#8220;Normal&#8221; growth: 4x increase in photo requests/sec. That&#8217;s pretty obscene. Slide 43 &#8211; diagonal scaling: replacing 67 dual core servers with 18 dual quads results in ~half the load per server. Slide 45: ~70% less power usage, 49U less space. I&#8217;d been curious about that last stat (power usage of fewer, larger servers versus a multitude of smaller servers), good to see some real numbers for it.</p>
<div style="width:425px;text-align:left" id="__ss_372867"><object style="margin:0px" width="425" height="355"><param name="movie" value="http://static.slideshare.net/swf/ssplayer2.swf?doc=web20expocapacityplanning-1209164375125178-9"/><param name="allowFullScreen" value="true"/><param name="allowScriptAccess" value="always"/><embed src="http://static.slideshare.net/swf/ssplayer2.swf?doc=web20expocapacityplanning-1209164375125178-9" type="application/x-shockwave-flash" allowscriptaccess="always" allowfullscreen="true" width="425" height="355"></embed></object>
<div style="font-size:11px;font-family:tahoma,arial;height:26px;padding-top:2px;"><a href="http://www.slideshare.net/?src=embed"><img src="http://static.slideshare.net/swf/logo_embd.png" style="border:0px none;margin-bottom:-5px" alt="SlideShare"/></a> | <a href="http://www.slideshare.net/jallspaw/capacity-planning-for-web-operations-web20-expo-2008?src=embed" title="View 'Capacity Planning for Web Operations - Web20 Expo 2008' on SlideShare">View</a> | <a href="http://www.slideshare.net/upload?src=embed">Upload your own</a></div>
</div>
]]></content:encoded>
			<wfw:commentRss>http://parand.com/say/index.php/2008/05/13/flickr-capacity-planning-presentation/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
	</channel>
</rss>

