Scaling Lessons From Amazon: SOA

Good read: ACM Queue’s interview with Werner Vogels, CTO of Amazon:

The first and foremost lesson is a meta-lesson: If applied, strict service orientation is an excellent technique to achieve isolation; you come to a level of ownership and control that was not seen before.

SOA, done correctly, means scale. People are always worried about latency (how long it takes to service a single request), but that can be managed. The ability to scale (how many requests can be handled at the same time) ends up being a huge win for SOA.
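
To make the latency vs. scale distinction concrete, here is a rough back-of-the-envelope sketch using Little's law (requests in flight = throughput × latency). The function name and all the numbers are made up for illustration; none of them come from the interview.

```python
def max_throughput(instances: int, workers_per_instance: int, latency_s: float) -> float:
    """Upper bound on requests/second, from Little's law:
    requests in flight = throughput * latency."""
    in_flight = instances * workers_per_instance
    return in_flight / latency_s


# Hypothetical service: each request takes 0.25 s and each
# instance can work on 50 requests at once.
print(max_throughput(4, 50, 0.25))  # 4 instances
print(max_throughput(8, 50, 0.25))  # 8 instances: double the throughput, same latency
```

The per-request latency never changes here; only the number of requests handled at the same time does. That assumes the service scales out cleanly, with no shared bottleneck behind it.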

The development and operational process has greatly benefited from it as well. The services model has been a key enabler in creating teams that can innovate quickly with a strong customer focus. Each service has a team associated with it, and that team is completely responsible for the service—from scoping out the functionality, to architecting it, to building it, and operating it.

SOA also means wins in development. Once you agree on the service interfaces, teams can develop independently, which lets you divide and conquer to scale development.
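
A minimal sketch of what agreeing on an interface first might look like, assuming two teams and a made-up inventory service. Every name here (`InventoryService`, `in_stock`, `can_checkout`, and so on) is hypothetical and not anything Amazon describes.

```python
from typing import Protocol


class InventoryService(Protocol):
    """Hypothetical contract both teams agree on up front."""

    def in_stock(self, sku: str) -> bool: ...


class InMemoryInventory:
    """What the inventory team might ship; the internals are theirs to change."""

    def __init__(self, counts: dict[str, int]) -> None:
        self._counts = counts

    def in_stock(self, sku: str) -> bool:
        return self._counts.get(sku, 0) > 0


class AlwaysInStock:
    """Stub the checkout team can code against before the real service exists."""

    def in_stock(self, sku: str) -> bool:
        return True


def can_checkout(inventory: InventoryService, skus: list[str]) -> bool:
    """Checkout code depends only on the agreed interface, not on an implementation."""
    return all(inventory.in_stock(sku) for sku in skus)
```

The checkout team can build and test `can_checkout` against the stub while the inventory team works on the real thing, and neither has to wait on the other as long as the interface holds.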

Giving developers operational responsibilities has greatly enhanced the quality of the services, both from a customer and a technology point of view.

This is an interesting one. I don’t have first-hand experience with this in a large company (in a startup the developers are always operations and support anyway), but it sounds plausible.

Do we see that customers who develop applications using AWS care about REST or SOAP? Absolutely not! A small group of REST evangelists continue to use the Amazon Web Services numbers to drive that distinction, but we find that developers really just want to build their applications using the easiest toolkit they can find. They are not interested in what goes on the wire or how request URLs get constructed; they just want to build their applications.

Nice. I’ve used the Amazon REST vs. SOAP adoption numbers myself. I should probably stop doing that.

How do you test in an environment like Amazon? Do we build another Amazon.test somewhere, which has the same number of machines, the same number of data centers, the same number of customers, and the same data sets? … Testing in a very large-scale distributed setting is a major challenge.

Indeed. This is a tough problem. How do you test a very large system that’s essentially impossible to replicate?

Anyway, go read the whole thing. There are lots of lessons in there on building and running high-scale applications.
