Archive for February, 2008

Starling in Python?


Starling looks very interesting – it’s a “light-weight persistent queue server that speaks the MemCache protocol”. To use it you fire up your regular memcached client library, point it at the Starling server, and do a regular set to put an item on the queue, and a get to read an item from the queue.


  # Start the Starling server as a daemonized process:
  starling -h 192.168.1.1 -d

  # Put messages onto a queue:
  require 'memcache'
  starling = MemCache.new('192.168.1.1:22122')
  starling.set('my_queue', 12345)

  # Get messages from the queue:
  require 'memcache'
  starling = MemCache.new('192.168.1.1:22122')
  loop { puts starling.get('my_queue') }

This thing is nice in many ways: it’s very simple with practically no configuration, à la memcached; it’s stable and scalable, running Twitter’s production backend clusters; and it speaks a simple and universally available protocol (memcache), meaning you can use any of the existing client libraries to access it.
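Because the protocol is plain memcache, the same exchange can be sketched from Python without any client library. The helpers below are hypothetical names of my own; they only build and parse the standard memcache text commands (`set` and `get`), and the host and port are the ones from the Ruby snippet above. Treat it as a rough illustration of what a client library sends on the wire, not as Starling's official interface.

```python
import socket


def build_set(key: str, value: bytes) -> bytes:
    """Build a memcache text-protocol 'set' with flags=0 and no expiry."""
    return f"set {key} 0 0 {len(value)}\r\n".encode() + value + b"\r\n"


def build_get(key: str) -> bytes:
    """Build a memcache text-protocol 'get' for a single key."""
    return f"get {key}\r\n".encode()


def parse_get_reply(reply: bytes):
    """Return the payload of a 'get' reply, or None when nothing came back."""
    if not reply.startswith(b"VALUE "):
        return None
    header, rest = reply.split(b"\r\n", 1)
    size = int(header.split()[3])  # header is: VALUE <key> <flags> <bytes>
    return rest[:size]


def dequeue(host: str, port: int, key: str):
    """Hypothetical helper: read one item from the queue named `key`."""
    with socket.create_connection((host, port)) as sock:
        sock.sendall(build_get(key))
        reply = b""
        while not reply.endswith(b"END\r\n"):
            chunk = sock.recv(4096)
            if not chunk:
                break
            reply += chunk
    return parse_get_reply(reply)
```

With a server listening, `dequeue('192.168.1.1', 22122, 'my_queue')` would play the role of `starling.get('my_queue')` above.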

This answers half of my request for a Python or Ruby messaging server (it does the work-queue half, doesn’t do the pub/sub half). I think I’m going to give it a try. Let me know if you’ve tried it.

It also has me thinking – with the multitude of async-IO options out there for Python, why isn’t there something like this implemented in Python? Between Twisted, asynchat, Eventlet, and the 19 other libraries and toolkits out there, surely somebody smart could whip something together in short order?
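To make the question concrete, here is a minimal sketch of what such a server might look like. It uses `asyncio` from today's standard library rather than any of the toolkits named above, understands only `set` and `get`, and keeps the queues in memory, so it has none of Starling's persistence. The class name `QueueServer` and the `demo` function are my own inventions for illustration.

```python
import asyncio
from collections import defaultdict, deque


class QueueServer:
    """In-memory work queues behind a small subset of the memcache text protocol."""

    def __init__(self):
        self.queues = defaultdict(deque)

    async def handle(self, reader, writer):
        try:
            while True:
                line = await reader.readline()
                if not line:
                    break  # client closed the connection
                parts = line.decode().split()
                if len(parts) == 5 and parts[0] == "set":
                    # set <key> <flags> <exptime> <bytes>, then the payload and \r\n
                    key, flags, size = parts[1], parts[2], int(parts[4])
                    payload = await reader.readexactly(size + 2)
                    self.queues[key].append((flags, payload[:-2]))
                    writer.write(b"STORED\r\n")
                elif len(parts) == 2 and parts[0] == "get":
                    queue = self.queues[parts[1]]
                    if queue:
                        flags, data = queue.popleft()  # a get consumes the item
                        header = f"VALUE {parts[1]} {flags} {len(data)}\r\n"
                        writer.write(header.encode() + data + b"\r\n")
                    writer.write(b"END\r\n")
                else:
                    writer.write(b"ERROR\r\n")
                await writer.drain()
        except (asyncio.IncompleteReadError, ConnectionResetError, ValueError):
            pass  # malformed request or dropped client: just hang up
        finally:
            writer.close()


async def demo():
    """Start the server on a free local port and push one item through it."""
    server = await asyncio.start_server(QueueServer().handle, "127.0.0.1", 0)
    port = server.sockets[0].getsockname()[1]
    reader, writer = await asyncio.open_connection("127.0.0.1", port)

    async def ask(request, terminator):
        writer.write(request)
        await writer.drain()
        return await reader.readuntil(terminator)

    replies = [
        await ask(b"set my_queue 0 0 5\r\n12345\r\n", b"\r\n"),
        await ask(b"get my_queue\r\n", b"END\r\n"),
        await ask(b"get my_queue\r\n", b"END\r\n"),  # queue is empty again
    ]
    writer.close()
    server.close()
    return replies


if __name__ == "__main__":
    print(asyncio.run(demo()))
```

The second `get` comes back as a bare `END`, which is how a memcache client learns the queue is empty. Durability, expiry, and blocking reads are all left out here.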

Mille Fleurs: Disappointed


Mille Fleurs is one of San Diego’s better-known high-end restaurants. We ate there with some friends last night and once again I was disappointed. The lobster bisque was lukewarm and a tad salty. The steak was good, but only cooked on one end (I’m not even sure how that’s possible). My friend asked for one of the dishes to be prepared without meat and ended up with an extremely bland pasta, devoid of any flavor. The lamb was OK, but my wife found the lamb we had at Robbie and Julie’s much better. The dessert was slightly better fare, but overall quite disappointing. My previous dining experience there was similar – underwhelming. I think I’m done with Mille Fleurs.

Parsing and Normalizing Dates with Timezones in Python


This was a bit painful and not well documented, so I’m documenting it here for future reference.

Say you want to parse and normalize dates with timezones (e.g. dates in email headers, which I believe follow RFC 822). Here’s what you do:

Install pytz.


import datetime
import email.utils
import pytz

# msg is assumed to be an already-parsed email message
utctimestamp = email.utils.mktime_tz(email.utils.parsedate_tz(msg['Date']))
utcdate = datetime.datetime.fromtimestamp(utctimestamp, pytz.utc)
pacificdate = utcdate.astimezone(pytz.timezone('US/Pacific'))

parsedate_tz produces a tuple that can be digested by mktime_tz, which in turn spits out a timestamp based on the UTC timezone. You can turn this into a datetime via fromtimestamp and set its timezone to UTC. Once you have the TZ aware datetime you can manipulate it to your heart’s content; the final line above converts it to a US/Pacific date.

Full example:


>>> import datetime
>>> import email.utils
>>> import pytz
>>> date_eastern = 'Thu, 31 Jan 2008 17:56:13 -0500'
>>> utctimestamp = email.utils.mktime_tz(email.utils.parsedate_tz(date_eastern))
>>> utcdate = datetime.datetime.fromtimestamp(utctimestamp, pytz.utc)
>>> utcdate
datetime.datetime(2008, 1, 31, 22, 56, 13, tzinfo=<UTC>)
>>> utcdate.astimezone(pytz.timezone('US/Pacific'))
datetime.datetime(2008, 1, 31, 14, 56, 13, tzinfo=<DstTzInfo 'US/Pacific' PST-1 day, 16:00:00 STD>)
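For comparison, the UTC normalization step can be sketched with only the standard library on recent Python versions. This is an alternative to the pytz recipe above, not a replacement for it: `to_utc` is a hypothetical helper name, and the sketch assumes the header carries an explicit numeric offset such as `-0500`.

```python
from datetime import datetime, timezone
from email.utils import parsedate_to_datetime


def to_utc(date_header: str) -> datetime:
    """Parse an email-style date string and normalize it to UTC."""
    # parsedate_to_datetime returns an aware datetime when an offset is present
    return parsedate_to_datetime(date_header).astimezone(timezone.utc)


# The same instant written in two different offsets
eastern = to_utc('Thu, 31 Jan 2008 17:56:13 -0500')
pacific = to_utc('Thu, 31 Jan 2008 14:56:13 -0800')
```

Once everything is in UTC, dates from different senders compare and sort correctly; here `eastern` and `pacific` are the same instant.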

Jumping on the Obama Bandwagon


Obama continues to impress me. Listen to this speech on religion – he actually speaks in intelligent terms, assumes an intelligent audience, and refuses to take a divisive, one-sided stance.

Bella: See It


Bella Poster

Bella is an interesting movie. It flashes back and forth in time – my uncle commented that the direction reminded him of 21 Grams. It strikes a nice balance of being obvious and mysterious at the same time. Worth watching.

Engineering Enrollment Declines


I attended a talk by the dean of UCSD’s Jacobs School of Engineering last night. Besides his pictures of the Bauhaus, one of his most interesting slides tracked engineering student enrollment over the last few years. Since the dot-com bust, enrollment has declined significantly – for CS majors I believe it dropped to 1/3 of the pre-bust levels. I asked him if this was a UCSD-only phenomenon or widespread, and he told me it was across all US schools, and possibly worldwide. Interestingly, graduate student enrollment didn’t suffer the same fate; perhaps it was all those graduates going back to school instead of getting jobs.

Apparently enrollment is once again picking up. I was fairly shocked by the numbers, however – a 2/3 drop. Daniel Lemire opines that there is no shortage of IT workers – which may be true. I don’t really think of UCSD CS grads as IT workers – you wouldn’t find them administering the local mail server, for example – but surely a 2/3 drop in the supply of qualified graduates hurts. As someone hiring for “technically sophisticated” problems I can tell you we could use more fresh faces.

Image by KK+

Me Moustache!


So I thought International Mustache Month was last month and gave it a belated effort.

Moustache

It’s actually this month, so perhaps I’ll go for it again.

Map/Reduce (Hadoop) First Impressions


I’m finally getting a chance to actually implement map/reduce instead of just reading or writing about it. General impressions so far:

  • Hadoop is fairly easy to install and get running.
  • The choice of Java as the default programming language feels strange to me. It’d feel more natural in Perl, Python, or Ruby since most of what you do is read and massage records. (I’m actually using Python with Hadoop Streaming.)
  • The map/reduce paradigm is very nice, but doesn’t fit everything. In fact, so far it hasn’t fit anything I’m trying perfectly. It works, but it always feels like you’re shoe-horning the problem into a map/reduce mold. I’m wondering how well it’d work to remove the map/reduce model and make it just a general work distribution mechanism, with map and reduce as easy add-ons. So if I only need a map, or only a reduce, or just a sort, I can do only that. Or, if my map actually produces 2 different sets of output for processing by 2 different sets of reducers, there should be an easy way to do that too.
  • Pig is promising; I haven’t actually used it hands-on yet. A higher level language seems like the right way to go.
  • Hadoop does scale as advertised, at least to the number of boxes I’ve tried so far (30). It’s great to see it crunch through something that used to take 30 minutes in 1 1/2 minutes. I’ll be trying larger clusters soon.
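For anyone who hasn’t seen Hadoop Streaming with Python, the general shape is roughly the sketch below: the mapper and reducer are plain scripts that read lines on stdin and write tab-separated key/value lines on stdout, and Hadoop sorts by key in between. Word counting is only a stand-in here, not the job I’m actually running, and the function names and the `map`/`reduce` command-line argument are my own conventions.

```python
import sys
from itertools import groupby


def map_lines(lines):
    """Mapper: emit one 'word<TAB>1' record per word."""
    for line in lines:
        for word in line.split():
            yield f"{word.lower()}\t1"


def reduce_lines(lines):
    """Reducer: sum the counts per key. Input must already be sorted by key."""
    records = (line.rstrip("\n").split("\t", 1) for line in lines)
    for word, group in groupby(records, key=lambda record: record[0]):
        yield f"{word}\t{sum(int(count) for _, count in group)}"


def run(step_name, stdin=sys.stdin, stdout=sys.stdout):
    """Stream stdin through the chosen step, one record per output line."""
    step = map_lines if step_name == "map" else reduce_lines
    for record in step(stdin):
        stdout.write(record + "\n")


if __name__ == "__main__" and len(sys.argv) == 2 and sys.argv[1] in ("map", "reduce"):
    run(sys.argv[1])
```

Locally, `sorted()` can stand in for Hadoop’s shuffle, e.g. `list(reduce_lines(sorted(map_lines(["the cat", "The hat"]))))`, which makes it easy to check the logic before going near a cluster.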

No Country For Old Men: See it


No Country For Old Men is an odd film. The film ends and you’re sitting there thinking: huh, that was strange…

The next day: that was a good movie, I should watch it again…

The following day: That was a great movie!

It’s the quintessential non-Hollywood movie, consistently doing things that make perfect sense but surprise you. Javier Bardem is particularly good.

Go see it.