Archive for the 'Scaling' Category
Disco: Erlang/Python Based Map-Reduce
Disco is a map-reduce framework written in Erlang and Python. Seems reasonable - I definitely prefer Python to Java for writing maps and reduces, and Erlang is rumored to be good at parallel stuff.
Interestingly, there is no mention of an underlying distributed file system.
Via High Scalability.
Drizzle: MySQL Based Slim / Cloud-Oriented DB
Drizzle is interesting:
Drizzle: A High-Performance Microkernel DBMS for Scale-Out Applications
Drizzle is a community-driven project based on the popular MySQL DBMS that is focused on MySQL’s original goals of ease-of-use, reliability and performance.
Headed up by Brian Aker, Director of Architecture at MySQL AB. Take a look at the MySQL Differences page and you’ll mostly see features removed and cleaned up, which is great. Designed for high levels of concurrency, targeted to “cloud” applications. Monty and Brian’s posts offer motivation for the project.
Something to keep an eye on.
The Performance Penalty of Virtualization
If you’ve spent any time with virtualized environments you know how effective and productive they are. The process of expanding capacity for FaceDouble, for example, became significantly simpler once they moved to deploying virtual servers, and SmugMug has been singing the praises of Amazon’s EC2 with a clever system to provision and remove capacity based on load. My own experiments with Hadoop and EC2 have been similarly fruitful.
So I’m wondering what the downside to aggressively going virtual is - why not make all servers virtual?
The main issue that comes to mind is performance, or the loss thereof. Presumably the performance of a virtual server is less than that of the same server running directly on the native OS.
Just how much of a performance difference is there, say in terms of per-request latency and capacity, for a web server, a database server, and a CPU-bound heavy computation server, for any of the common virtualization systems (Xen, VMware, etc.)? I haven’t seen any good materials on this, so if you have knowledge or pointers please let me know.
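In the absence of published numbers, one way to get a rough answer is to run the same small workload natively and inside a VM and compare the results. Below is a minimal sketch of that idea, not a rigorous benchmark. The names `measure_latency`, `summarize`, and `cpu_bound_task` are made up for illustration, and the CPU-bound loop is only a stand-in for whatever request handler you actually care about.

```python
import statistics
import time


def measure_latency(work, requests=200):
    """Call work() `requests` times and return each call's latency in seconds."""
    samples = []
    for _ in range(requests):
        start = time.perf_counter()
        work()
        samples.append(time.perf_counter() - start)
    return samples


def summarize(samples):
    """Reduce raw latencies to median, 95th percentile, and serial throughput."""
    ordered = sorted(samples)
    # Nearest-rank style index for the 95th percentile, clamped to the last sample.
    p95_index = min(len(ordered) - 1, int(0.95 * len(ordered)))
    return {
        "median_s": statistics.median(ordered),
        "p95_s": ordered[p95_index],
        # Calls completed per second of measured work (single-threaded).
        "throughput_per_s": len(ordered) / sum(ordered),
    }


def cpu_bound_task():
    # Hypothetical stand-in for a CPU-heavy request; swap in a real handler.
    return sum(i * i for i in range(10_000))


if __name__ == "__main__":
    print(summarize(measure_latency(cpu_bound_task)))
```

Running the script once on the bare host and once inside the guest, on the same hardware, gives two summaries to compare. A single-threaded loop like this says nothing about I/O or network overhead, which is probably where a web or database server would differ most.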
Flickr Capacity Planning Presentation
Unfortunately I missed the Web2.0 Expo this year, but I’ve been catching up on slides and presentations. I had John Allspaw’s Capacity Planning For Web Operations open in a tab for several days and finally got to it. Turned out to be much more interesting than I’d anticipated. Slide 9 - “Normal” growth: 4x increase in photo requests/sec. That’s pretty obscene. Slide 43 - diagonal scaling: replacing 67 dual core servers with 18 dual quads results in ~half the load per server. Slide 45: ~70% less power usage, 49U less space. I’d been curious about that last stat (power usage of a few larger servers versus a multitude of smaller ones), good to see some real numbers for it.
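To put the slide 43 and 45 figures side by side, here is a back-of-the-envelope sketch. The server counts and the ~70% power saving come from the slides as quoted above. The per-server core counts are my assumption: I read “dual core” as one dual-core CPU (2 cores) and “dual quad” as two quad-core CPUs (8 cores), and the slides may mean something else. The function name `consolidation_summary` is made up for illustration.

```python
def consolidation_summary(old_servers=67, old_cores_each=2,
                          new_servers=18, new_cores_each=8,
                          power_saving=0.70):
    """Compare the old and new fleets using the figures quoted from the slides.

    old_cores_each and new_cores_each are assumptions, not slide data.
    """
    old_total_cores = old_servers * old_cores_each
    new_total_cores = new_servers * new_cores_each
    # How many old boxes each new box stands in for.
    servers_replaced_per_new = old_servers / new_servers
    # Fleet power drops to (1 - saving) of the original, spread over fewer boxes.
    power_per_server_ratio = (1 - power_saving) * old_servers / new_servers
    return {
        "old_total_cores": old_total_cores,
        "new_total_cores": new_total_cores,
        "servers_replaced_per_new": servers_replaced_per_new,
        "power_per_server_ratio": power_per_server_ratio,
    }


if __name__ == "__main__":
    for name, value in consolidation_summary().items():
        print(f"{name}: {value:.2f}")
```

Under those assumptions the total core count barely changes (134 versus 144), each new server stands in for roughly 3.7 old ones, and each new box draws only about 1.1x the power of an old one. That would suggest the saving comes mostly from running far fewer chassis rather than from more efficient cores.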
