Archive for the 'Uncategorized' Category


Preserve JavascriptDB: Yet Another Non-Traditional Data Store

Non-traditional data stores are coming fast and furious these days. Here’s another interesting one: Preserve with JavascriptDB. This one I’d like to check out.

On Magic Powers

As with most profound experiences in my life, this one took place on a late Southwest flight after a long, exhausting day. The young man seated a few seats away began to scream, threaten, move violently, and curse wildly as the plane took off.

The initial reaction of people close by was fear, disgust, and horror, followed by understanding, compassion, and finally by a genteel unspoken agreement to pretend he didn’t exist.

The young man suffered from Tourette’s syndrome.

I realized at that moment that taking offense is a choice.

We chose not to be offended by this young man due to his condition.

Knowing this, I’ve gained a magic power. I can choose not to take offense. I can choose not to be bothered by things that really should bother me.

And once in a while I actually employ this power – instead of fuming and screaming bloody murder, sometimes I choose simply not to be offended and move on.

Not often enough though.

Twitter Scala/Ruby Drama

The Twitter folks decide to use Scala, and one of their prominent developers decides to write a book about it. Interesting; it motivates me to take a look at Scala.

Particularly interesting is their happiness with the type system in Scala. I’ve found happiness with duck typing, and these guys are moving away from it to static typing, so that’s another viewpoint for me to check out. Good.

But – blasphemy – they’re using Scala to replace Ruby. The Ruby community is incensed. Did these guys do their homework? Did they research every possible queuing system in existence before writing their own? Did they not try JRuby? Surely there’s a way to make it work with Ruby. These guys must be incompetent, lazy, or just plain stupid.

I’m not going to link to all the drama, but here is one of the most reasonable, well-written criticisms.

Now this is a reasonable criticism, and the comments do provide a good bit of insight and justification. Heck, even one of the authors of RabbitMQ justifies why the Twitter guys decided not to use RabbitMQ.

Fine. But this thing with the Ruby community getting bent out of shape whenever someone decides to use another language is getting old. From all appearances the Twitter folks did much more evaluation and study than 95% of the rest of the world would have. They decided to use something else. They’re writing a book about it.

So move on. Somebody found something they like better than Ruby. Shocking.

Not everybody is going to like your system. I thought DHH had already expressed how he feels about what he judges to be extraneous requirements. I think DHH meant he doesn’t care. Looks like the rest of the Ruby crowd cares deeply, religiously, fervently.

San Jose San Fran Next Week, Drop Me A Line If You Want To Meet

The title pretty much says it: I’m planning an SJC/SF trip next week. If you want to meet or chat, drop me a line – Darugar at gmail.

Python Script For Finding And Removing Duplicate Files

My image, mp3, and ebook collections were a mess after years of copying to various servers, consolidating, and re-copying. I had lots of duplicates.

I looked for an app to find and remove duplicates but surprisingly didn’t find anything very good. So I had to write my own.

This is a very simple script – it scans the directory tree you specify, looks for exact duplicates, and removes the duplicates.

It’s not very smart about which copy it removes. It’s not smart about finding files that are “similar” – it only finds exact matches. It ignores small files (intentionally – it’s easy to make it deal with small files).

It uses /temp for its output and cache files, so it’s targeting Windows. Change that to /tmp if you’re running Unix.

I built in a caching mechanism to save the results of scanning the disk, but it turned out not to be very useful because the script ran faster than I expected, so the caching is commented out.

Here it is: FileInfo.py.
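For a quick look at the approach, here is a minimal Python sketch of the same idea. It is not FileInfo.py itself: the function names, the 1 KB small-file cutoff, and the group-by-size-then-SHA-1 strategy are assumptions of this sketch. Like the original, it only finds exact matches and isn’t smart about which copy it keeps (the first path in sorted order).

```python
import hashlib
import os
from collections import defaultdict


def find_duplicates(root, min_size=1024):
    """Return groups of paths under root whose contents have the same SHA-1.

    Files smaller than min_size bytes are skipped (hypothetical cutoff).
    """
    # Pass 1: bucket by size; a file with a unique size can't be a duplicate.
    by_size = defaultdict(list)
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            path = os.path.join(dirpath, name)
            try:
                size = os.path.getsize(path)
            except OSError:
                continue  # unreadable or vanished file
            if size >= min_size:
                by_size[size].append(path)

    # Pass 2: within each size bucket, hash full contents in 1 MB chunks.
    duplicates = []
    for paths in by_size.values():
        if len(paths) < 2:
            continue
        by_hash = defaultdict(list)
        for path in paths:
            digest = hashlib.sha1()
            with open(path, "rb") as f:
                for chunk in iter(lambda: f.read(1 << 20), b""):
                    digest.update(chunk)
            by_hash[digest.hexdigest()].append(path)
        duplicates.extend(sorted(g) for g in by_hash.values() if len(g) > 1)
    return duplicates


def remove_duplicates(groups):
    """Keep the first path of each group (no smarter choice) and delete the rest."""
    for group in groups:
        for path in group[1:]:
            os.remove(path)
```

Grouping by size first means only files that share a size ever get read and hashed, which keeps the scan reasonably fast on large media collections.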

That’s How I Roll, MoFo

Funny.

jQuery serializeArray: Why Not An Associative Array?

I’m trying to examine and modify form variables from jQuery by catching the submit event. jQuery has a serializeArray method that hands you the form variables in a nice array. For example:


	$('#someform').submit( function() {
		$.post("/some/url/", $(this).serializeArray(),
			function(data){
			    console.log(data);
			}, "json");
		return false;
	 } );

This is great, but the result of serializeArray is an integer-indexed array whose values are objects holding a name and a value (in effect, key/value pairs). E.g.:


	var data = $(this).serializeArray();
	console.log( data[0] );
>> output: Object name=somename value=537

I’m wondering why the array looks like this, instead of being a dictionary (associative array, hash, or whatever you want to call it) whose keys are the “name”s and whose values are the “value”s. E.g.:


	var data = $(this).serializeArray();
	console.log( data.somename );
>> output: 537

Anybody know the answer?
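One plausible answer (a guess, not something the jQuery documentation states): a form can legitimately submit the same name more than once, for example a group of checkboxes or a multi-select, and a plain dictionary keyed by name would keep only one of those values. The array also preserves field order. Folding the pairs into a dictionary by hand is easy once you decide how to treat repeats. The sketch below does it in Python, with a list of name/value dicts standing in for serializeArray’s output and a hypothetical pairs_to_dict helper that collects repeated names into lists.

```python
def pairs_to_dict(pairs):
    """Fold [{'name': n, 'value': v}, ...] into {n: v}; repeated names become lists."""
    result = {}
    for pair in pairs:
        name, value = pair["name"], pair["value"]
        if name not in result:
            result[name] = value
        elif isinstance(result[name], list):
            result[name].append(value)  # third or later occurrence
        else:
            result[name] = [result[name], value]  # second occurrence
    return result


form = [
    {"name": "somename", "value": "537"},
    {"name": "color", "value": "red"},
    {"name": "color", "value": "blue"},  # e.g. two checked checkboxes
]
print(pairs_to_dict(form))  # {'somename': '537', 'color': ['red', 'blue']}
```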

Javascript Is The Guy With The Thing

In most programming languages (Java, C, Python, Perl) I’m generally thinking “I’ll put this thing on this shelf here, then I’ll do x, then I’ll pick up that thing, do some work on it, put the result over here,” and so forth.

With Javascript, particularly when used correctly, which for me means the Way of jQuery, the thought process is more like “When some event happens, this guy will wake up and he’ll know what to do. He’ll remember his name, what he was supposed to work on, and he’ll be carrying his own tools. He might get blocked at some point, but then he’ll just wait around, and when he’s ready to go he’ll remember who he is, what he was doing, and how far along he was. And when he’s done he’ll go away, and along with him will go his tools and any other mess he made.”

Javascript is a lot more “guy with the thing” thinking instead of “what’s on this shelf here?” thinking. I guess that’s called closures, or something like that. Anyway, I’m liking it.
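The same idea works in any language with closures. A minimal Python sketch, where make_worker is a hypothetical example rather than anything from a library: each handler it returns carries its own name, its own work list, and how far along it is, and picks up where it left off every time its event fires.

```python
def make_worker(name, items):
    """Return a handler that remembers its name, its work list, and its progress."""
    position = 0  # captured by the closure; survives between calls

    def on_event():
        nonlocal position
        if position >= len(items):
            return f"{name}: done"
        item = items[position]
        position += 1
        return f"{name}: processed {item} ({position}/{len(items)})"

    return on_event


# Two independent "guys", each carrying his own tools and state.
alice = make_worker("alice", ["a", "b"])
bob = make_worker("bob", ["x"])
for handler in (alice, bob, alice, bob, alice):
    print(handler())
# alice: processed a (1/2)
# bob: processed x (1/1)
# alice: processed b (2/2)
# bob: done
# alice: done
```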

Photo by St-Even.

D Lazy Evaluation Prettiness

This is kind of pretty:


void log(lazy char[] dg)
{
    if (logging)
        fwritefln(logfile, dg());
}

void foo(int i)
{
    log("Entering foo() with i set to " ~ toString(i));
}

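Note the lazy keyword in the definition of the log function, which tells D to evaluate the argument expression only when the function actually uses it (i.e., lazily). Here that means the string concatenation in foo() is skipped entirely when logging is off.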

Nice. Smells a little like Twisted’s deferred business, except different.
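Python has no lazy parameter modifier, but a rough equivalent is to pass a zero-argument function (a thunk) that the logger calls only when logging is on. The Logger class and its lines list standing in for the log file are assumptions of this sketch. The difference from D is that the caller has to write the lambda explicitly, whereas D wraps the argument expression automatically.

```python
class Logger:
    """Hypothetical logger: collects messages in a list (stand-in for the log file)."""

    def __init__(self, enabled):
        self.enabled = enabled
        self.lines = []

    def log(self, make_message):
        # Like D's lazy parameter: the message is only built if we're logging.
        if self.enabled:
            self.lines.append(make_message())


def foo(i, logger):
    # The concatenation inside the lambda never runs while logging is off.
    logger.log(lambda: "Entering foo() with i set to " + str(i))
```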

Via Raganwald.

The Performance Penalty of Virtualization

If you’ve spent any time with virtualized environments you know how effective and productive they are. The process of expanding capacity for FaceDouble, for example, became significantly simpler once they moved to deploying virtual servers, and SmugMug has been singing the praises of Amazon’s EC2 with a clever system to provision and remove capacity based on load. My own experiments with Hadoop and EC2 have been similarly fruitful.

So I’m wondering what the downside to aggressively going virtual is – why not make all servers virtual?

The main issue that comes to mind is performance, or the loss thereof. Presumably a virtual server performs worse than the same server running natively, without a hypervisor.

Just how much of a performance difference is there, say in terms of per-request latency and capacity, for a web server, a database server, and a CPU-bound heavy computation server, for any of the common virtualization systems (Xen, VMware, etc.)? I haven’t seen any good materials on this, so if you have knowledge or pointers, please let me know.
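One way to get rough numbers for the CPU-bound case is to run an identical micro-benchmark on a native box and on a VM with comparable hardware and compare the latency distributions. The sketch below assumes an arbitrary sum-of-squares workload and 200 runs; it reports median, approximate p99, and max latency in milliseconds. It says nothing about disk or network I/O, which a web or database server would also need tested.

```python
import statistics
import time


def cpu_work(n=20000):
    """A small CPU-bound task: sum of squares (arbitrary stand-in workload)."""
    return sum(i * i for i in range(n))


def measure(task, runs=200):
    """Time `runs` calls of task and return latency stats in milliseconds."""
    samples = []
    for _ in range(runs):
        start = time.perf_counter()
        task()
        samples.append((time.perf_counter() - start) * 1000.0)
    samples.sort()
    return {
        "median_ms": statistics.median(samples),
        "p99_ms": samples[int(0.99 * (len(samples) - 1))],  # approximate percentile
        "max_ms": samples[-1],
    }


if __name__ == "__main__":
    # Run once on the native host and once inside the VM, then compare.
    print(measure(cpu_work))
```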

Time Warner Discontinues Usenet

Looks like Time Warner has discontinued Usenet service:

Time Warner Ends Usenet

A good reason to leave Time Warner. I currently get the majority of my services through them (cable, phone, web); I’ll be looking for alternatives.

Pig (Hadoop) Commands And Sample Results

I find seeing the results of Pig commands on sample data a good companion to the Pig Latin language reference, so I set up some simple sample data and ran commands, capturing the results. Here’s the sample data as well as the commands:

/data/one:


a	A	1
b	B	2
c	C	3
a	AA	11
a	AAA	111
b	BB	22

/data/two:


x	X	a
y	Y	b
x	XX	b
z	Z	c

Pig commands and their results:


one = load 'data/one' using PigStorage();
two = load 'data/two' using PigStorage();

generated = FOREACH one GENERATE $0, $2;
(a, 1)
(b, 2)
(c, 3)
(a, 11)
(a, 111)
(b, 22)

grouped = GROUP one BY $0;
(a, {(a, A, 1), (a, AA, 11), (a, AAA, 111)})
(b, {(b, B, 2), (b, BB, 22)})
(c, {(c, C, 3)})

grouped2 = GROUP one BY ($0, $1);
((a, A), {(a, A, 1)})
((a, AA), {(a, AA, 11)})
((a, AAA), {(a, AAA, 111)})
((b, B), {(b, B, 2)})
((b, BB), {(b, BB, 22)})
((c, C), {(c, C, 3)})

summed = FOREACH grouped GENERATE group, SUM(one.$2);
(a, 123.0)
(b, 24.0)
(c, 3.0)

counted = FOREACH grouped GENERATE group, COUNT(one);
(a, 3)
(b, 2)
(c, 1)

flat = FOREACH grouped GENERATE FLATTEN(one);
(a, A, 1)
(a, AA, 11)
(a, AAA, 111)
(b, B, 2)
(b, BB, 22)
(c, C, 3)

cogrouped = COGROUP one BY $0, two BY $2;
(a, {(a, A, 1), (a, AA, 11), (a, AAA, 111)}, {(x, X, a)})
(b, {(b, B, 2), (b, BB, 22)}, {(y, Y, b), (x, XX, b)})
(c, {(c, C, 3)}, {(z, Z, c)})

flatc = FOREACH cogrouped GENERATE FLATTEN(one.($0,$2)), FLATTEN(two.$1);
(a, 1, X)
(a, 11, X)
(a, 111, X)
(b, 2, Y)
(b, 22, Y)
(b, 2, XX)
(b, 22, XX)
(c, 3, Z)

joined = JOIN one BY $0, two BY $2;
(a, A, 1, x, X, a)
(a, AA, 11, x, X, a)
(a, AAA, 111, x, X, a)
(b, B, 2, y, Y, b)
(b, BB, 22, y, Y, b)
(b, B, 2, x, XX, b)
(b, BB, 22, x, XX, b)
(c, C, 3, z, Z, c)

crossed = CROSS one, two;
(a, AA, 11, z, Z, c)
(a, AA, 11, x, XX, b)
(a, AA, 11, y, Y, b)
(a, AA, 11, x, X, a)
(c, C, 3, z, Z, c)
(c, C, 3, x, XX, b)
(c, C, 3, y, Y, b)
(c, C, 3, x, X, a)
(b, BB, 22, z, Z, c)
(b, BB, 22, x, XX, b)
(b, BB, 22, y, Y, b)
(b, BB, 22, x, X, a)
(a, AAA, 111, x, XX, b)
(b, B, 2, x, XX, b)
(a, AAA, 111, z, Z, c)
(b, B, 2, z, Z, c)
(a, AAA, 111, y, Y, b)
(b, B, 2, y, Y, b)
(b, B, 2, x, X, a)
(a, AAA, 111, x, X, a)
(a, A, 1, z, Z, c)
(a, A, 1, x, XX, b)
(a, A, 1, y, Y, b)
(a, A, 1, x, X, a)

SPLIT one INTO one_under IF $2 < 10, one_over IF $2 >= 10;
-- one_under:
(a, A, 1)
(b, B, 2)
(c, C, 3)
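As a cross-check on the semantics (not on Pig itself), a few of these operators can be mimicked in plain Python on the same sample data. This is an approximation: output order may differ, and SUM here returns integers (123) where the Pig run above printed doubles (123.0). It also reflects the relationship visible in the cogrouped and flatc results: for an inner join, JOIN amounts to COGROUP followed by flattening each key’s two bags into their cross product. The helper names group_by and cogroup are hypothetical.

```python
from collections import defaultdict
from itertools import product

# The sample data above: /data/one and /data/two.
one = [("a", "A", 1), ("b", "B", 2), ("c", "C", 3),
       ("a", "AA", 11), ("a", "AAA", 111), ("b", "BB", 22)]
two = [("x", "X", "a"), ("y", "Y", "b"), ("x", "XX", "b"), ("z", "Z", "c")]


def group_by(rows, key):
    """GROUP rows BY key: {key value: [matching rows]}, in first-seen order."""
    groups = defaultdict(list)
    for row in rows:
        groups[key(row)].append(row)
    return dict(groups)


grouped = group_by(one, lambda r: r[0])
summed = {k: sum(r[2] for r in rows) for k, rows in grouped.items()}
counted = {k: len(rows) for k, rows in grouped.items()}


def cogroup(left, lkey, right, rkey):
    """COGROUP: for each key, (bag of matching left rows, bag of matching right rows)."""
    lg, rg = group_by(left, lkey), group_by(right, rkey)
    keys = list(dict.fromkeys(list(lg) + list(rg)))
    return {k: (lg.get(k, []), rg.get(k, [])) for k in keys}


cogrouped = cogroup(one, lambda r: r[0], two, lambda r: r[2])

# Inner JOIN: flatten each key's two bags into their cross product.
# A key with an empty bag on either side contributes no rows.
joined = [l + r for lbag, rbag in cogrouped.values() for l, r in product(lbag, rbag)]

crossed = [l + r for l, r in product(one, two)]

one_under = [r for r in one if r[2] < 10]
one_over = [r for r in one if r[2] >= 10]
```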

Using Django Signals To Watch For Changes To Instances

Say you want to monitor changes to instances of a model and update something based on the changes. In my example I wanted to maintain a sum of the values that had certain characteristics. You can accomplish this with Django Signals.

Signals are events that fire at various pre-defined moments – for example, before an instance is saved, after it’s saved, etc. You can subscribe to these events, allowing your callback handler to be called at those moments.

The code below subscribes to the post_init and post_save signals. post_init gets triggered when a model’s __init__ method is done executing, which generally means when a model instance is created for the first time or instantiated from a query to the DB. This is actually too frequent for the use case I have in mind (checking the before-modification and after-modification values of certain fields), but it seems to be the only place I can hook in to get the pre-modification values.

post_save gets triggered after the instance is saved to the DB. The code below stores the pre-modification values in pre_save when it gets triggered by the post_init signal, and checks them against the post-modification values when it gets triggered by the post_save signal.

Note that you’ll probably want to clean up pre_save periodically. Unfortunately post_init and post_save are not symmetrical (you’ll get a post_init anytime an instance is created, for example when you query the DB), so you can’t simply delete from pre_save when the post_save signal gets triggered.


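from django.dispatch import dispatcher
from django.db.models import signals
from myapp.models import Expense  # hypothetical location of the watched model

# Pre-modification values of the watched fields, keyed by primary key
pre_save = {}

def change_watcher(sender, instance, signal, *args, **kwargs):
    print "SIGNAL:", sender, instance, signal, args, kwargs
    if signal == signals.post_init:
        # Instance created or loaded from the DB: remember its current values
        pre_save[instance.id] = (instance.field1, instance.field2)
    else:
        # post_save: compare against the remembered values, if any.
        # A brand-new instance had no id at post_init time, so it has no entry.
        before = pre_save.get(instance.id)
        if before is None:
            return
        if before[0] != instance.field1:
            print "Changed field1"
        if before[1] != instance.field2:
            print "Changed field2"
        # The saved values become the baseline for the next save
        pre_save[instance.id] = (instance.field1, instance.field2)

for signal in (signals.post_init, signals.post_save):
    dispatcher.connect(change_watcher, sender=Expense, signal=signal)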
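An alternative that sidesteps the cleanup problem is to keep the snapshot on the instance itself rather than in a module-level dict, so it is garbage-collected along with the instance. The sketch below is plain Python with no Django: the Expense class is a hypothetical stand-in that calls the two hooks from __init__ and save() the way post_init and post_save receivers would, and field1/field2 are the same placeholder field names as in the code above.

```python
WATCHED_FIELDS = ("field1", "field2")  # placeholder field names


def remember_values(instance):
    """post_init-style hook: snapshot the watched fields on the instance itself."""
    instance._original_values = {f: getattr(instance, f) for f in WATCHED_FIELDS}


def changed_fields(instance):
    """post_save-style hook: list watched fields that differ, then reset the snapshot."""
    original = getattr(instance, "_original_values", {})
    changed = [f for f in WATCHED_FIELDS if original.get(f) != getattr(instance, f)]
    remember_values(instance)  # saved values become the new baseline
    return changed


class Expense:
    """Stand-in for a Django model (hypothetical); calls the hooks like signals would."""

    def __init__(self, field1, field2):
        self.field1 = field1
        self.field2 = field2
        remember_values(self)  # where post_init would fire

    def save(self):
        # (a real model would write to the database here)
        return changed_fields(self)  # where post_save would fire


expense = Expense(10, "lunch")
expense.field1 = 12
print(expense.save())  # ['field1']
print(expense.save())  # []
```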

Static Typing and Breath Mints

Laughed out loud at this one:

Static typing is like giving a drunk a bunch of breath mints and saying “Don’t drive drunk. But if you must, use these breath mints in case you get pulled over.”

Via Simon.

Sex and the City Second Hand Review

My wife came back from the movie last night, not happy. “It was horrible”. “Depressing”. “Too serious”.

So there you have it. Not good, apparently.

IE6 Image Works Surprisingly Well

I needed to test a site on IE6. Normally this involves stealing my wife’s laptop, since it’s the last computer in the house to still have IE6.

Today I was too lazy to go downstairs, so instead I decided to give the IE6 Image a try. This is basically a Windows virtual machine image with IE6 loaded and ready to go.

The download was fairly large, but the install was easy, all the defaults just worked, and it’s now running smoothly in its own little sandbox on my Vista box. Well done Microsoft, and well done virtualization.

Had To Happen

Well, it’s official – I’m an iPhone fanboy. This thing is just fantastic, one of those “just works” deals. I’m supposed to give it to the wife; I’m gonna miss it dearly…

Finally Bought An iPhone

Well, after a ridiculous amount of procrastination I finally bought the iPhone. I’m using it to post this. So far: amazing, revolutionary, frustrating, imperfect, and really very nice. Typing is by far the least successful part of this thing. Thank god for the auto correction, I don’t think I’ve spelled a single word correctly yet.

Now to turn off the incredibly annoying key click…

Engineering Enrollment Declines

I attended a talk by the dean of UCSD’s Jacobs School of Engineering last night. Besides his pictures of the Bauhaus, one of his most interesting slides tracked engineering student enrollment over the last few years. Since the dot-com bust, enrollment declined significantly – for CS majors I believe it dropped to 1/3 of pre-bust levels. I asked him if this was a UCSD-only phenomenon or widespread, and he told me it was across all US schools, and possibly worldwide. Interestingly, graduate student enrollment didn’t suffer the same fate; perhaps it was all those graduates going back to school instead of getting jobs.

Apparently enrollment is once again picking up. I was fairly shocked by the numbers, however – a 2/3 drop. Daniel Lemire opines that there is no shortage of IT workers, which may be true. I don’t really think of UCSD CS grads as IT workers – you wouldn’t find them administering the local mail server, for example – but surely a 2/3 drop in the supply of qualified graduates hurts. As someone hiring for “technically sophisticated” problems, I can tell you we could use more fresh faces.

Image by KK+

Map/Reduce (Hadoop) First Impressions

I’m finally getting a chance to actually implement map/reduce instead of just reading or writing about it. General impressions so far:

  • Hadoop is fairly easy to install and get running.
  • The choice of Java as the default programming language feels strange to me. It’d feel more natural in Perl, Python, or Ruby, since most of what you do is read and massage records. (I’m actually using Python with Hadoop Streaming.)
  • The map/reduce paradigm is very nice, but doesn’t fit everything. In fact, so far it hasn’t been a perfect fit for anything I’ve tried. It works, but it always feels like you’re shoehorning the problem into a map/reduce model. I’m wondering how well it’d work to remove the map/reduce model and make it just a general work distribution mechanism, with map and reduce as easy add-ons. So if I only need a map, or only a reduce, or just a sort, I can do only that. Or, if my map actually produces 2 different sets of output for processing by 2 different sets of reducers, there should be an easy way to do that too.
  • Pig is promising; I haven’t actually used it hands-on yet. A higher level language seems like the right way to go.
  • Hadoop does scale as advertised, at least to the number of boxes I’ve tried so far (30). It’s great to see it crunch through in 1 1/2 minutes something that used to take 30 minutes. I’ll be trying larger clusters soon.
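For reference, a Hadoop Streaming job in Python is just a mapper and a reducer that read lines on stdin and write tab-separated key/value lines to stdout; the framework sorts mapper output by key before the reducer sees it. Below is a minimal word-count sketch (word count is only an example workload), with a hypothetical run_locally helper that simulates `cat input | mapper | sort | reducer` in-process so it can be tried without a cluster.

```python
import sys
from itertools import groupby


def mapper(lines):
    """Emit 'word<TAB>1' for every word, as a streaming mapper would on stdout."""
    for line in lines:
        for word in line.split():
            yield f"{word}\t1"


def reducer(sorted_lines):
    """Sum counts per word; input must already be sorted by key, as Hadoop guarantees."""
    pairs = (line.rstrip("\n").split("\t", 1) for line in sorted_lines)
    for word, group in groupby(pairs, key=lambda kv: kv[0]):
        yield f"{word}\t{sum(int(count) for _, count in group)}"


def run_locally(lines):
    """Simulate `cat input | mapper | sort | reducer` without a cluster."""
    return list(reducer(sorted(mapper(lines))))


if __name__ == "__main__" and sys.argv[1:2] in (["map"], ["reduce"]):
    # As a streaming script: `python wc.py map` or `python wc.py reduce`.
    stage = mapper if sys.argv[1] == "map" else reducer
    for out in stage(sys.stdin):
        print(out)
```

Selecting the stage with a `map`/`reduce` argument is an assumption of this sketch, not a Hadoop requirement; separate mapper and reducer scripts work just as well.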
