Archive for September, 2007

Parsing HTTP Log Files with Python

A couple of quick snippets on parsing Apache HTTP log files (combined format) with Python. This is the regular expression for parsing each line of the log:


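import re
# Only the group names 'date' and 'useragent' are confirmed by the code below;
# the other names are assumed, since the blog formatting stripped the originals.
combined_format_re = re.compile(
    r'(?P<host>.*?) - (?P<user>.*?) \[(?P<date>.*?)\] "(?P<method>.*?) (?P<path>.*?)(?P<query>\?.*?)? '
    r'(?P<protocol>.*?)" (?P<status>\d*) (?P<size>.*?) "(?P<referer>.*?)" "(?P<useragent>.*?)"')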

You can use it ala:


match = combined_format_re.search(line)

And you can get the matches in a convenient hash form via:


fields = match.groupdict()
print(fields['useragent'])

And while we're at it, here's how you parse the timestamp into a Python datetime object (the split() drops the timezone offset that follows the time):


import datetime
timestamp = datetime.datetime.strptime(fields['date'].split()[0], '%d/%b/%Y:%H:%M:%S')
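For reference, here is a minimal end-to-end sketch that puts the pieces above together. It is not the code from the post: the pattern is a stricter variant written for illustration, every group name except 'date' and 'useragent' is an assumption, and `parse_line` and the sample line are hypothetical. It also keeps the timezone offset by using `%z`, which needs Python 3.

```python
import datetime
import re

# Hypothetical pattern for the Apache "combined" log format. Only the group
# names 'date' and 'useragent' come from the post; the rest are assumed.
LOG_RE = re.compile(
    r'(?P<host>\S+) (?P<ident>\S+) (?P<user>\S+) \[(?P<date>[^\]]+)\] '
    r'"(?P<request>[^"]*)" (?P<status>\d{3}) (?P<size>\S+) '
    r'"(?P<referer>[^"]*)" "(?P<useragent>[^"]*)"'
)


def parse_line(line):
    """Return a dict of fields for one log line, or None if it does not match."""
    match = LOG_RE.match(line)
    if match is None:
        return None
    fields = match.groupdict()
    # %z keeps the UTC offset instead of discarding it with split()[0].
    fields['timestamp'] = datetime.datetime.strptime(
        fields['date'], '%d/%b/%Y:%H:%M:%S %z')
    return fields


# Made-up sample line in combined format.
sample = ('127.0.0.1 - frank [10/Oct/2000:13:55:36 -0700] '
          '"GET /apache_pb.gif HTTP/1.0" 200 2326 '
          '"http://www.example.com/start.html" "Mozilla/4.08 [en] (Win98; I ;Nav)"')
fields = parse_line(sample)
print(fields['useragent'], fields['timestamp'].isoformat())
```

One caveat with this sketch: user agents that contain escaped quotes will be cut short by the `[^"]*` groups, so real-world logs may need a more forgiving pattern.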

WordPress formatting will probably mess up some of the code above; in theory I'll be releasing a small piece of code soon that uses all of these so you can get the source.

Ruby On Rails Fanboys The New Java Zombies?

Reading the reactions to Sivers’ 7 reasons I switched back to PHP after 2 years on Rails has me thinking: RonR is spawning a whole new generation of fanboys to rival the past generation of Java zombies in their blind devotion and thou-shalt-burn-in-hell-because-you-disagree zealotry. It’s an early indicator of future crapitude when a language/framework/movement gathers this crowd as followers.

Which really is too bad, because Rails has been fantastic in terms of its influence on Web development.

The core RonR guys seem to have good heads on their shoulders, as evidenced by the various joint Django/RonR talks, but the RonR movement sure is attracting a lot of riff-raff…

Read Chris Cummer’s post for a more intelligent take on this.

Finally Decided: Pentax K100D

I was in mid-sentence, telling my wife that it’s ok, Kamran (the 4-year-old) is not going to damage the Canon S500, when he dropped and broke it. Unlucky really; he’s a good photographer and had used it plenty in the past.

I did a painful amount of research before finally settling on the Pentax K100D.

The primary alternatives for me were the Nikon D40, which looks to be a very nice camera and is in a similar price range (~$500), and the Nikon D50, also a nice camera. The main deciding factors were:

  • The “highly recommended” from the DPReview review. Most other reviews were also quite positive.
  • The price point – $400 after rebate.
  • Image stabilization.
  • Lens selection and pricing.

This will be my first DSLR; I’m really looking forward to learning. It’ll take a while to get here; I’ll post my impressions once I try it out.

JavaScript: The Most Implemented Language?

John Resig’s post on JavaScript Engine Speeds got me thinking – there are a lot of solid implementations of the JavaScript language specification. Is JavaScript the language with the most implementations? Are there any other languages that have as many runtimes or as many implementations of the language specification?

Ideal Startup Space = Large Fishbowl

I’m with Dick on this one: ideal office space for a startup is a big fishbowl with no/few offices. Read Dick’s post for the benefits; I largely agree with everything he says.

For what it’s worth, Overture used to be the-whole-floor-is-one-big-open-space-with-no-cubicles, so it can be scaled to much larger than 30 people (each floor was probably 100 people). Not that everybody liked it – in fact, many disliked it.

My favorite workspaces: the lab at Scripps C4, which was a bunch of workstations in a large U shape with shared private offices for taking phone calls / naps, and our first space at Binary Evolution, which was a wonderful 500 square foot no-windows former sweat-shop (literally) next to a porn shop. I was probably the only person that liked that space though.

Where Are the Python/Ruby/etc Messaging Systems?

Scaling requires the ability to decompose apps into pieces which can be spread across multiple boxes. An application composed of multiple pieces needs an efficient, reliable way to communicate between the pieces.

With multiple cores now standard, even in a single box scenario we need pieces that can run independently and communicate efficiently.

Java has this via JMS, with multiple good implementations, both open and closed source.

So where are the high quality, standard communication pipes for scripting languages? Where is the equivalent of JMS for Python?

I realize I can use ActiveMQ with HJB and PyHJB, but I’m loath to introduce three pieces of new software along with a new VM (the JVM required to run ActiveMQ) into the picture.

What I’m looking for is a standard interface for messaging that allows implementations that optimize both intra- and inter-box scenarios. It should allow work-queue scenarios as well as pub/sub. It should support persistence / fail-safe modes. It should enable message- or event-driven apps, i.e. have an event loop that’s driven by message availability, à la MDBs in the Java world.

Is there such a thing and I’ve missed it? Or is it that no-one wants it?

I suppose there’s always Gearman and TheSchwartz.
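To make the wish list above a bit more concrete, here is a toy, in-process sketch of the two delivery styles (work queue vs. pub/sub) and a message-driven consumer loop. Everything in it is hypothetical: `Broker`, `run_consumer`, and the queue and topic names are made up for illustration and do not correspond to any existing library. It covers only the single-box case with threads and has no persistence.

```python
import queue
import threading


class Broker:
    """Toy in-process broker; a hypothetical interface, not an existing library."""

    def __init__(self):
        self._work = {}   # queue name -> one shared Queue (work-queue semantics)
        self._subs = {}   # topic name -> list of per-subscriber Queues (pub/sub)
        self._lock = threading.Lock()

    def work_queue(self, name):
        # All consumers of a work queue share one Queue, so each message
        # is delivered to exactly one of them.
        with self._lock:
            return self._work.setdefault(name, queue.Queue())

    def subscribe(self, topic):
        # Each subscriber gets its own Queue, so every subscriber sees
        # every message published after it subscribed.
        inbox = queue.Queue()
        with self._lock:
            self._subs.setdefault(topic, []).append(inbox)
        return inbox

    def publish(self, topic, message):
        with self._lock:
            inboxes = list(self._subs.get(topic, []))
        for inbox in inboxes:
            inbox.put(message)


def run_consumer(inbox, handler):
    """Message-driven loop: block until a message arrives, then handle it."""
    while True:
        message = inbox.get()
        if message is None:   # None is the shutdown sentinel in this sketch
            break
        handler(message)


broker = Broker()

# Work queue: one worker thread handles each job exactly once.
jobs = broker.work_queue('resize')
done = []
worker = threading.Thread(target=run_consumer, args=(jobs, done.append))
worker.start()
jobs.put('photo-1.jpg')
jobs.put(None)
worker.join()

# Pub/sub: both subscribers receive the same message.
a, b = broker.subscribe('news'), broker.subscribe('news')
broker.publish('news', 'hello')
```

A real implementation would hide the transport behind the same calls, so that the intra-box case could use shared memory or pipes and the inter-box case a network protocol, with persistence layered underneath.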

Telepresence: I’m a Believer

We’ve been using Cisco Telepresence systems for a month or two at work. I have to say, I’m a believer.

My previous experience with video conferencing had left me unimpressed at best. Choppy video on a small screen, not worth the time or effort.

The Cisco system consists of half a room with a roundish table and 8-10 chairs. The other half of the room is displayed on three large screens. People appearing on screen are life-sized and look approximately as though they’re sitting across the table from you. The video is extremely clear and sound is directional – the voice comes from the person speaking.

The important part is that after about 10 minutes you forget the other people are on a teleconference. The body language is there – the eye movements, the subtle noises, everything it takes to make it work.

These things are very costly – something like $300k upfront and several tens of thousands per month to operate. But they absolutely do work. I could see this cutting travel down by a very significant percentage in many situations.

As with all things tech I’m sure the prices will drop as the devices become more commonplace. I can’t wait till I have one at the house. For now the price tag severely restricts their deployment. Someone should set it up as a per-use service with locations in all major cities.

If you get a chance you really should give it a try. It’s one of those things you have to experience to get a true sense of. In a few years we’ll all have these and world distances will once again be warped by technology.

jQuery 1.2: Smaller and Faster

One more reason to like jQuery: each release gets smaller and faster. One of the few projects that favors efficiency over bloat and actively removes unnecessary features between releases. Smaller and Faster should be the top two features of every piece of software.

Go check out the 1.2 release. Via Simon Willison.

HBO’s Tell Me You Love Me: Skip It

With Entourage out of the lineup I decided to give HBO’s new series, Tell Me You Love Me, a try.

Summary: closeups of overacting interspersed with gratuitous, needless, explicit adult content.

The sex scenes seem designed to offend and be in-your-face in their explicitness. I’m not impressed.

The acting hits you over the head. Not in a good way. In one scene the woman twists her lips, as would a 5-year-old if you asked her to act, not once but twice to show her consternation. Perhaps it wouldn’t be as noticeable if there weren’t so many closeups.

I forget who said it, reacting to a man with a large nose and a prominent mustache: with a nose like that, I wouldn’t underline it. With actors like these, I wouldn’t close-up so much.

I’m going to be skipping Tell Me You Love Me; despite all the attempted shock, it turns out to contain very little of interest.