Preserve JavascriptDB: Yet Another Non-Traditional Data Store
Non-traditional data stores are coming fast and furious these days. Here’s another interesting one: Preserve with JavascriptDB. This one I’d like to check out.
As with most profound experiences in my life this took place on a late Southwest flight after a long, exhausting day. The young man seated a few seats away began to scream, threaten, move violently, and curse wildly as the plane took off.
The initial reaction of people close by was fear, disgust, and horror, followed by understanding, compassion, and finally by a genteel unspoken agreement to pretend he doesn’t exist.
The young man suffered from Tourette’s syndrome.
I realized at that moment that taking offense is a choice.
We chose not to be offended by this young man due to his condition.
Knowing this I’ve gained a magic power. I can choose not to take offense. I can choose not to be bothered by things that really should bother me.
And once in a while I actually employ this power – instead of fuming and screaming bloody murder, sometimes I choose simply not to be offended and move on.
Not often enough though.
The Twitter folks decide to use Scala, and one of their prominent developers decides to write a book about it. Interesting, motivates me to take a look at Scala.
Particularly interesting is their happiness with the type system in Scala. I’ve found happiness with duck typing, and these guys are moving away from duck typing to something else, so another viewpoint for me to check out. Good.
But – blasphemy – they’re using Scala to replace Ruby. The Ruby community is incensed. Did these guys do their homework? Did they research every possible queuing system in existence before writing their own? Did they not try JRuby? Surely there’s a way to make it work with Ruby. These guys must be incompetent, lazy, or just plain stupid.
I’m not going to link to all the drama, but here is one of the most reasonable, well written criticisms.
Now this is a reasonable criticism, and the comments do provide a good bit of insight and justification. Heck, even one of the authors of RabbitMQ justifies why the Twitter guys decided not to use RabbitMQ.
Fine. But this thing with the Ruby community getting bent out of shape whenever someone decides to use another language is getting old. From all appearances the Twitter folks did much more evaluation and study than 95% of the rest of the world would have. They decided to use something else. They’re writing a book about it.
So move on. Somebody found something they like better than Ruby. Shocking.
Not everybody is going to like your system. I thought DHH had already expressed how he feels about what he judges to be extraneous requirements. I think DHH meant he doesn’t care. Looks like the rest of the Ruby crowd cares deeply, religiously, fervently.
The title pretty much says it; I’m planning an SJC/SF trip next week, if you want to meet or chat drop me a line – Darugar at gmail.
My image, mp3, and ebook collection were a mess after years of copying to various servers, consolidating, and re-copying. I had lots of duplicates.
I looked for an app to find and remove duplicates but surprisingly didn’t find anything very good. So I had to write my own.
This is a very simple script – it scans the directory tree you specify, looks for exact duplicates, and removes the duplicates.
It’s not very smart about which copy it removes. It’s not smart about finding files that are “similar” – it only finds exact matches. It ignores small files (intentionally – it’s easy to make it deal with small files).
It uses /temp for its output and cache files, so it’s targeting Windows. Change that to /tmp if you’re running Unix.
I built in a caching mechanism to save the results of scanning the disk, but it turned out not to be too useful and the script ran faster than I expected, so the caching is commented out.
Here it is: FileInfo.py .
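For readers without the script handy, here is a minimal sketch of the approach described above. It is not the contents of FileInfo.py. The names (find_duplicates, remove_duplicates, MIN_SIZE), the 1 KB cutoff, and the choice of SHA-256 are all assumptions made for illustration. It groups files by size first, hashes only the files whose size collides, and, like the original, is not smart about which copy it keeps.

```python
import hashlib
import os
from collections import defaultdict

MIN_SIZE = 1024  # hypothetical cutoff; the original script's threshold is not stated


def file_digest(path, chunk_size=1 << 20):
    """Return the SHA-256 hex digest of a file, reading it in chunks."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def find_duplicates(root, min_size=MIN_SIZE):
    """Return lists of paths under root whose contents are byte-for-byte equal."""
    by_size = defaultdict(list)
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            path = os.path.join(dirpath, name)
            if os.path.islink(path):
                continue  # skip symlinks so a link is never counted as a copy
            size = os.path.getsize(path)
            if size >= min_size:
                by_size[size].append(path)

    groups = []
    for paths in by_size.values():
        if len(paths) < 2:
            continue  # a unique size cannot have an exact duplicate
        by_digest = defaultdict(list)
        for path in paths:
            by_digest[file_digest(path)].append(path)
        groups.extend(sorted(g) for g in by_digest.values() if len(g) > 1)
    return groups


def remove_duplicates(groups, dry_run=True):
    """Keep the first path in each group; delete (or only list) the rest."""
    removed = []
    for group in groups:
        for path in group[1:]:
            if not dry_run:
                os.remove(path)
            removed.append(path)
    return removed
```

Running remove_duplicates(find_duplicates(some_dir)) with the default dry_run=True only lists what would be deleted, which is worth doing once before letting it loose on a real collection.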
I’m trying to examine and modify form variables from jQuery by catching the submit event. jQuery has a serializeArray method that hands you the form variables in a nice array. For example:
$('#someform').submit( function() {
    $.post("/some/url/", $(this).serializeArray(),
        function(data){
            console.log(data);
        }, "json");
    return false;
} );
This is great, but the result of serializeArray is an integer-indexed array whose values are (key, value) pairs. E.g.
var data = $(this).serializeArray();
console.log( data[0] );
>> output: Object name=somename value=537
I’m wondering why the array looks like this, instead of being a dictionary (associative array, hash, or whatever you want to call it) such that the keys are “name”s and values are “value”s. E.g.
var data = $(this).serializeArray();
console.log( data.somename );
>> output: 537
Anybody know the answer?
In most programming languages (Java, C, Python, Perl) I’m generally thinking “I’ll put this thing on this shelf here, then I’ll do x, then I’ll pick up that thing, do some work on it, put the result over here,” and so forth.
With Javascript, particularly when used correctly, which for me means in the Way Of JQuery, the thought process is more like “When some event happens, this guy will wake up and he’ll know what to do. He’ll remember his name, what he was supposed to work on, and he’ll be carrying his own tools. He might get blocked at some point, but then he’ll just wait around and when he’s ready to go he’ll remember who he is, what he was doing, and how far along doing it he was. And when he’s done he’ll go away and along with him will go his tools and any other mess he made”.
Javascript is a lot more “guy with the thing” thinking instead of “what’s on this shelf here?” thinking. I guess that’s called closures, or something like that. Anyway, I’m liking it.
Photo by St-Even.
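The “guy with the thing” idea is not specific to Javascript. Here is a small sketch of the same pattern in Python, with made-up names (make_worker, on_event). The returned callback carries its own name, task, and progress, and that state goes away when the callback does.

```python
def make_worker(name, task):
    """Return a callback that remembers its name, its task, and its progress."""
    steps_done = 0  # private state captured by the closure

    def on_event(event):
        nonlocal steps_done
        steps_done += 1
        return f"{name} handling {event!r} for {task} (step {steps_done})"

    return on_event


resizer = make_worker("resizer", "thumbnails")
print(resizer("click"))
print(resizer("click"))
```

Each call to make_worker produces an independent worker, so two workers never share or clobber each other's step counter.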
This is kind of pretty:
void log(lazy char[] dg)
{
    if (logging)
        fwritefln(logfile, dg());
}
void foo(int i)
{
    log("Entering foo() with i set to " ~ toString(i));
}
Note the lazy keyword in the definition of the log function, which tells D to only evaluate the value if needed (ie. lazily).
Nice. Smells a little like Twisted’s deferred business, except different.
Via Raganwald.
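Python has no lazy keyword, but the effect can be approximated by hand by passing a callable instead of a value. This sketch uses a hypothetical Logger class to show the idea: the message is only built if logging is switched on.

```python
class Logger:
    """Tiny logger that takes a message factory instead of a message."""

    def __init__(self, enabled):
        self.enabled = enabled
        self.lines = []

    def log(self, make_message):
        # make_message is only called, and the string only built, when enabled.
        if self.enabled:
            self.lines.append(make_message())


def foo(logger, i):
    logger.log(lambda: "Entering foo() with i set to " + str(i))


quiet = Logger(enabled=False)
foo(quiet, 42)   # the lambda is never called
loud = Logger(enabled=True)
foo(loud, 42)    # the lambda runs once and its result is stored
```

The difference from D is that the caller has to write the lambda explicitly; D's lazy parameter does that wrapping for you at the call site.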
If you’ve spent any time with virtualized environments you know how effective and productive they are. The process of expanding capacity for FaceDouble, for example, became significantly simpler once they moved to deploying virtual servers, and SmugMug has been singing the praises of Amazon’s EC2 with a clever system to provision and remove capacity based on load. My own experiments with Hadoop and EC2 have been similarly fruitful.
So I’m wondering what the downside to aggressively going virtual is – why not make all servers virtual?
The main issue that comes to mind is performance, or the loss thereof. Presumably the performance of a virtual server is less than that of the same server running directly on the native OS.
Just how much of a performance difference is there, say in terms of per request latency and capacity, for a web server, a database server, and a cpu-bound heavy computation server, for any of the common virtualization systems (Xen, VMWare, etc)? I haven’t seen any good materials on this, so if you have knowledge or pointers please let me know.
Looks like Time Warner has discontinued usenet service:

A good reason to leave Time Warner. I currently have the majority of my services through them (cable, phone, web), I’ll be looking for alternatives.
I find seeing the results of Pig commands on sample data a good companion to the PigLatin language reference, so I set up some simple sample data and ran commands, capturing the results. Here’s the sample data as well as the commands:
/data/one:
a A 1
b B 2
c C 3
a AA 11
a AAA 111
b BB 22
/data/two:
x X a
y Y b
x XX b
z Z c
Pig commands and their results:
one = load 'data/one' using PigStorage();
two = load 'data/two' using PigStorage();
generated = FOREACH one GENERATE $0, $2;
(a, 1)
(b, 2)
(c, 3)
(a, 11)
(a, 111)
(b, 22)
grouped = GROUP one BY $0;
(a, {(a, A, 1), (a, AA, 11), (a, AAA, 111)})
(b, {(b, B, 2), (b, BB, 22)})
(c, {(c, C, 3)})
grouped2 = GROUP one BY ($0, $1);
((a, A), {(a, A, 1)})
((a, AA), {(a, AA, 11)})
((a, AAA), {(a, AAA, 111)})
((b, B), {(b, B, 2)})
((b, BB), {(b, BB, 22)})
((c, C), {(c, C, 3)})
summed = FOREACH grouped GENERATE group, SUM(one.$2);
(a, 123.0)
(b, 24.0)
(c, 3.0)
counted = FOREACH grouped GENERATE group, COUNT(one);
(a, 3)
(b, 2)
(c, 1)
flat = FOREACH grouped GENERATE FLATTEN(one);
(a, A, 1)
(a, AA, 11)
(a, AAA, 111)
(b, B, 2)
(b, BB, 22)
(c, C, 3)
cogrouped = COGROUP one BY $0, two BY $2;
(a, {(a, A, 1), (a, AA, 11), (a, AAA, 111)}, {(x, X, a)})
(b, {(b, B, 2), (b, BB, 22)}, {(y, Y, b), (x, XX, b)})
(c, {(c, C, 3)}, {(z, Z, c)})
flatc = FOREACH cogrouped GENERATE FLATTEN(one.($0,$2)), FLATTEN(two.$1);
(a, 1, X)
(a, 11, X)
(a, 111, X)
(b, 2, Y)
(b, 22, Y)
(b, 2, XX)
(b, 22, XX)
(c, 3, Z)
joined = JOIN one BY $0, two BY $2;
(a, A, 1, x, X, a)
(a, AA, 11, x, X, a)
(a, AAA, 111, x, X, a)
(b, B, 2, y, Y, b)
(b, BB, 22, y, Y, b)
(b, B, 2, x, XX, b)
(b, BB, 22, x, XX, b)
(c, C, 3, z, Z, c)
crossed = CROSS one, two;
(a, AA, 11, z, Z, c)
(a, AA, 11, x, XX, b)
(a, AA, 11, y, Y, b)
(a, AA, 11, x, X, a)
(c, C, 3, z, Z, c)
(c, C, 3, x, XX, b)
(c, C, 3, y, Y, b)
(c, C, 3, x, X, a)
(b, BB, 22, z, Z, c)
(b, BB, 22, x, XX, b)
(b, BB, 22, y, Y, b)
(b, BB, 22, x, X, a)
(a, AAA, 111, x, XX, b)
(b, B, 2, x, XX, b)
(a, AAA, 111, z, Z, c)
(b, B, 2, z, Z, c)
(a, AAA, 111, y, Y, b)
(b, B, 2, y, Y, b)
(b, B, 2, x, X, a)
(a, AAA, 111, x, X, a)
(a, A, 1, z, Z, c)
(a, A, 1, x, XX, b)
(a, A, 1, y, Y, b)
(a, A, 1, x, X, a)
SPLIT one INTO one_under IF $2 < 10, one_over IF $2 >= 10;
-- one_under:
(a, A, 1)
(b, B, 2)
(c, C, 3)
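If the Pig semantics are still fuzzy, it can help to see rough equivalents in an ordinary language. The following is a plain-Python sketch (no Pig or Hadoop involved) that mimics GROUP, SUM, COUNT, JOIN, and SPLIT on the same sample data. The helper name group_by is made up for this illustration, and the numbers here are Python ints, whereas Pig printed the sums as doubles above.

```python
from collections import defaultdict

one = [("a", "A", 1), ("b", "B", 2), ("c", "C", 3),
       ("a", "AA", 11), ("a", "AAA", 111), ("b", "BB", 22)]
two = [("x", "X", "a"), ("y", "Y", "b"), ("x", "XX", "b"), ("z", "Z", "c")]


def group_by(rows, key_index):
    """Rough analog of GROUP rows BY $key_index: key -> list of full rows."""
    groups = defaultdict(list)
    for row in rows:
        groups[row[key_index]].append(row)
    return dict(groups)


grouped = group_by(one, 0)

# FOREACH grouped GENERATE group, SUM(one.$2)
summed = {key: sum(row[2] for row in rows) for key, rows in grouped.items()}

# FOREACH grouped GENERATE group, COUNT(one)
counted = {key: len(rows) for key, rows in grouped.items()}

# JOIN one BY $0, two BY $2: concatenate every matching pair of rows
joined = [left + right for left in one for right in two if left[0] == right[2]]

# SPLIT one INTO one_under IF $2 < 10, one_over IF $2 >= 10
one_under = [row for row in one if row[2] < 10]
one_over = [row for row in one if row[2] >= 10]

print(summed)
print(counted)
print(len(joined))
```

The nested-loop join is fine for ten rows and hopeless for real data, which is rather the point of handing the job to Pig.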
Say you want to monitor changes to instances of a model and update something based on the changes. In my example I wanted to maintain a sum of the values that had certain characteristics. You can accomplish this with Django Signals.
Signals are events that fire at various pre-defined moments – for example, before an instance is saved, after it’s saved, etc. You can subscribe to these events, allowing your callback handler to be called at those moments.
The code below subscribes to the post_init and post_save signals. post_init gets triggered when a model’s __init__ method is done executing, which generally means when a model instance is created for the first time or instantiated from a query to the DB. This is actually too frequent for the use case I have in mind (checking the before-modification and after-modification values of certain fields), but seems to be the only place I can hook in to get the pre-modification values.
post_save gets triggered after the instance is saved to the DB. The code below stores the pre-modification values in pre_save when it gets triggered by the post_init signal, and checks them against the post-modification values when it gets triggered by the post_save signal.
Note that you’ll probably want to clean up pre_save periodically. Unfortunately post_init and post_save are not symmetrical (you’ll get a post_init anytime an instance is created, for example when you query the DB), so you can’t simply delete from pre_save when the post_save signal gets triggered.
from django.dispatch import dispatcher
from django.db.models import signals

# Maps instance id -> (field1, field2) as they were when the instance was loaded.
pre_save = {}

def change_watcher(sender, instance, signal, *args, **kwargs):
    print "SIGNAL:", sender, instance.report, signal, args, kwargs
    if signal == signals.post_init:
        pre_save[instance.id] = (instance.field1, instance.field2)
    else:
        # A brand-new instance had no id at post_init time, so it may have no entry.
        before = pre_save.get(instance.id)
        if before is None:
            return
        if before[0] != instance.field1:
            print "Changed field1"
        if before[1] != instance.field2:
            print "Changed field2"

# Expense is the model being watched.
for signal in (signals.post_init, signals.post_save):
    dispatcher.connect(change_watcher, sender = Expense, signal = signal)
My wife came back from the movie last night, not happy. “It was horrible”. “Depressing”. “Too serious”.
So there you have it. Not good, apparently.
I needed to test a site on IE6. Normally this involves stealing my wife’s laptop, since it’s the last computer in the house to still have IE6.
Today I was too lazy to go downstairs, so instead I decided to give the IE6 Image a try. This is basically a Windows virtual machine image with IE6 loaded and ready to go.
The download was fairly large, but the install was easy, all the defaults just worked, and it’s now running smoothly in its own little sandbox on my Vista box. Well done Microsoft, and well done virtualization.
Well, it’s official – I’m an iPhone fanboy. This thing is just fantastic, one of those “just works” deals. I’m supposed to give it to the wife, I’m gonna miss it dearly…
Well, after a ridiculous amount of procrastination I finally bought the iPhone. I’m using it to post this. So far: amazing, revolutionary, frustrating, imperfect, and really very nice. Typing is by far the least successful part of this thing. Thank god for the auto correction, I don’t think I’ve spelled a single word correctly yet.
Now to turn off the incredibly annoying key click…

I attended a talk by the dean of UCSD’s Jacobs School of Engineering last night. Besides his pictures of the Bauhaus, one of his most interesting slides tracked engineering student enrollment over the last few years. Since the dot-com bust enrollment declined significantly – for CS majors I believe it dropped to 1/3 of the pre-bust levels. I asked him if this was a UCSD-only phenomenon or widespread, and he told me it was across all US schools, and possibly worldwide. Interestingly, graduate student enrollment didn’t suffer the same fate; perhaps it was all those graduates going back to school instead of getting jobs.
Apparently enrollment is once again picking up. I was fairly shocked by the numbers however – a 2/3 drop. Daniel Lemire opines that there is no shortage of IT workers - which may be true. I don’t really think of UCSD CS grads as IT workers – you wouldn’t find them administering the local mail server, for example – but surely a 2/3 drop in supply of qualified graduates hurts. As someone hiring for “technically sophisticated” problems I can tell you we could use more fresh faces.
I’m finally getting a chance to actually implement map/reduce instead of read or write about it. General impressions so far: