Archive for the 'Uncategorized' Category
jQuery serializeArray: Why Not An Associative Array?
I’m trying to examine and modify form variables from jQuery by catching the submit event. jQuery has a serializeArray method that hands you the form variables in a nice array. For example:
$('#someform').submit(function() {
    $.post("/some/url/", $(this).serializeArray(),
        function(data) {
            console.log(data);
        }, "json");
    return false;
});
This is great, but the result of serializeArray is an integer-indexed array whose elements are objects, each holding one field’s name and value. E.g.
var data = $(this).serializeArray();
console.log( data[0] );
>> output: Object name=somename value=537
I’m wondering why the array looks like this, instead of being a dictionary (associative array, hash, or whatever you want to call it) such that the keys are “name”s and values are “value”s. E.g.
var data = $(this).serializeArray();
console.log( data.somename );
>> output: 537
Anybody know the answer?
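One plausible reason, offered as an assumption rather than a quote from the jQuery documentation: a form may contain several fields with the same name (a group of checkboxes, a multi-select), and a plain dictionary keeps only one value per key. The sketch below shows the effect in Python instead of JavaScript; the sample field names and both helper functions are made up for illustration.

```python
from collections import defaultdict

# Hypothetical form data in the same shape serializeArray produces:
# an ordered list of name/value records.
pairs = [
    {"name": "somename", "value": "537"},
    {"name": "tag", "value": "red"},
    {"name": "tag", "value": "blue"},  # same name twice, e.g. two checked boxes
]


def to_dict(pairs):
    """Collapse to one value per name; a later duplicate overwrites an earlier one."""
    return {p["name"]: p["value"] for p in pairs}


def to_multidict(pairs):
    """Keep every value by collecting them into a list per name."""
    grouped = defaultdict(list)
    for p in pairs:
        grouped[p["name"]].append(p["value"])
    return dict(grouped)


print(to_dict(pairs))       # the "red" tag is lost here
print(to_multidict(pairs))  # both tags survive
```

If you know your form never repeats a name, the simple dictionary version is enough; otherwise the list of pairs (or a dictionary of lists) is the safer shape.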
JavaScript Is The Guy With The Thing
In most programming languages (Java, C, Python, Perl) I’m generally thinking “I’ll put this thing on this shelf here, then I’ll do x, then I’ll pick up that thing, do some work on it, put the result over here,” and so forth.
With JavaScript, particularly when used correctly, which for me means in the Way Of jQuery, the thought process is more like “When some event happens, this guy will wake up and he’ll know what to do. He’ll remember his name, what he was supposed to work on, and he’ll be carrying his own tools. He might get blocked at some point, but then he’ll just wait around and when he’s ready to go he’ll remember who he is, what he was doing, and how far along he was. And when he’s done he’ll go away and along with him will go his tools and any other mess he made.”
JavaScript is a lot more “guy with the thing” thinking instead of “what’s on this shelf here?” thinking. I guess that’s called closures, or something like that. Anyway, I’m liking it.
Photo by St-Even.
D Lazy Evaluation Prettiness
This is kind of pretty:
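void log(lazy char[] dg)
{
    // dg is only evaluated here, when logging is on
    if (logging)
        fwritefln(logfile, dg());
}

void foo(int i)
{
    // the concatenation below runs only if log() actually uses its argument
    log("Entering foo() with i set to " ~ toString(i));
}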
Note the lazy keyword in the definition of the log function, which tells D to evaluate the argument only if it is actually used (i.e. lazily). When logging is off, the string concatenation in foo() never runs.
Nice. Smells a little like Twisted’s deferred business, except different.
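For comparison, here is a rough Python sketch of the same idea. Python has no lazy keyword, so the caller has to wrap the message in a function by hand; the names expensive_message, log and the calls list are made up for this illustration.

```python
calls = []  # records every time the message is actually built


def expensive_message(i):
    """Stand-in for a costly string build; notes that it ran."""
    calls.append(i)
    return "Entering foo() with i set to " + str(i)


def log(make_message, logging_enabled, out):
    """Call make_message only when logging is switched on."""
    if logging_enabled:
        out.append(make_message())


out = []
log(lambda: expensive_message(1), False, out)  # message is never built
log(lambda: expensive_message(2), True, out)   # message is built and logged

print(calls)
print(out)
```

The D version is prettier precisely because the compiler does the wrapping for you: the call site looks like an ordinary function call.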
Via Raganwald.
The Performance Penalty of Virtualization
If you’ve spent any time with virtualized environments you know how effective and productive they are. The process of expanding capacity for FaceDouble, for example, became significantly simpler once they moved to deploying virtual servers, and SmugMug has been singing the praises of Amazon’s EC2 with a clever system to provision and remove capacity based on load. My own experiments with Hadoop and EC2 have been similarly fruitful.
So I’m wondering what the downside to aggressively going virtual is - why not make all servers virtual?
The main issue that comes to mind is performance, or the loss thereof. Presumably the performance of a virtual server is less than that of the same server running directly on the native OS.
Just how much of a performance difference is there, say in terms of per-request latency and capacity, for a web server, a database server, and a CPU-bound heavy computation server, for any of the common virtualization systems (Xen, VMware, etc.)? I haven’t seen any good materials on this, so if you have knowledge or pointers please let me know.
Time Warner Discontinues Usenet
Looks like Time Warner has discontinued Usenet service.

A good reason to leave Time Warner. I currently have the majority of my services through them (cable, phone, web), so I’ll be looking for alternatives.
Pig (Hadoop) Commands And Sample Results
I find seeing the results of Pig commands on sample data a good companion to the PigLatin language reference, so I set up some simple sample data and ran commands, capturing the results. Here’s the sample data as well as the commands:
/data/one:
a A 1
b B 2
c C 3
a AA 11
a AAA 111
b BB 22
/data/two:
x X a
y Y b
x XX b
z Z c
Pig commands and their results:
one = load 'data/one' using PigStorage();
two = load 'data/two' using PigStorage();
generated = FOREACH one GENERATE $0, $2;
(a, 1)
(b, 2)
(c, 3)
(a, 11)
(a, 111)
(b, 22)
grouped = GROUP one BY $0;
(a, {(a, A, 1), (a, AA, 11), (a, AAA, 111)})
(b, {(b, B, 2), (b, BB, 22)})
(c, {(c, C, 3)})
grouped2 = GROUP one BY ($0, $1);
((a, A), {(a, A, 1)})
((a, AA), {(a, AA, 11)})
((a, AAA), {(a, AAA, 111)})
((b, B), {(b, B, 2)})
((b, BB), {(b, BB, 22)})
((c, C), {(c, C, 3)})
summed = FOREACH grouped GENERATE group, SUM(one.$2);
(a, 123.0)
(b, 24.0)
(c, 3.0)
counted = FOREACH grouped GENERATE group, COUNT(one);
(a, 3)
(b, 2)
(c, 1)
flat = FOREACH grouped GENERATE FLATTEN(one);
(a, A, 1)
(a, AA, 11)
(a, AAA, 111)
(b, B, 2)
(b, BB, 22)
(c, C, 3)
cogrouped = COGROUP one BY $0, two BY $2;
(a, {(a, A, 1), (a, AA, 11), (a, AAA, 111)}, {(x, X, a)})
(b, {(b, B, 2), (b, BB, 22)}, {(y, Y, b), (x, XX, b)})
(c, {(c, C, 3)}, {(z, Z, c)})
flatc = FOREACH cogrouped GENERATE FLATTEN(one.($0,$2)), FLATTEN(two.$1);
(a, 1, X)
(a, 11, X)
(a, 111, X)
(b, 2, Y)
(b, 22, Y)
(b, 2, XX)
(b, 22, XX)
(c, 3, Z)
joined = JOIN one BY $0, two BY $2;
(a, A, 1, x, X, a)
(a, AA, 11, x, X, a)
(a, AAA, 111, x, X, a)
(b, B, 2, y, Y, b)
(b, BB, 22, y, Y, b)
(b, B, 2, x, XX, b)
(b, BB, 22, x, XX, b)
(c, C, 3, z, Z, c)
crossed = CROSS one, two;
(a, AA, 11, z, Z, c)
(a, AA, 11, x, XX, b)
(a, AA, 11, y, Y, b)
(a, AA, 11, x, X, a)
(c, C, 3, z, Z, c)
(c, C, 3, x, XX, b)
(c, C, 3, y, Y, b)
(c, C, 3, x, X, a)
(b, BB, 22, z, Z, c)
(b, BB, 22, x, XX, b)
(b, BB, 22, y, Y, b)
(b, BB, 22, x, X, a)
(a, AAA, 111, x, XX, b)
(b, B, 2, x, XX, b)
(a, AAA, 111, z, Z, c)
(b, B, 2, z, Z, c)
(a, AAA, 111, y, Y, b)
(b, B, 2, y, Y, b)
(b, B, 2, x, X, a)
(a, AAA, 111, x, X, a)
(a, A, 1, z, Z, c)
(a, A, 1, x, XX, b)
(a, A, 1, y, Y, b)
(a, A, 1, x, X, a)
SPLIT one INTO one_under IF $2 < 10, one_over IF $2 >= 10;
-- one_under:
(a, A, 1)
(b, B, 2)
(c, C, 3)
-- one_over:
(a, AA, 11)
(a, AAA, 111)
(b, BB, 22)
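If the Pig semantics are easier to follow in a general-purpose language, here is a rough plain-Python sketch of what GROUP, SUM, COUNT and JOIN do to the sample data above. It only mirrors the results; it says nothing about how Pig executes them, and the helper name group_by_first is made up.

```python
# The sample data from /data/one and /data/two as tuples.
one = [("a", "A", 1), ("b", "B", 2), ("c", "C", 3),
       ("a", "AA", 11), ("a", "AAA", 111), ("b", "BB", 22)]
two = [("x", "X", "a"), ("y", "Y", "b"), ("x", "XX", "b"), ("z", "Z", "c")]


def group_by_first(rows):
    """Like GROUP one BY $0: map each key to the list (bag) of its rows."""
    groups = {}
    for row in rows:
        groups.setdefault(row[0], []).append(row)
    return groups


grouped = group_by_first(one)

# Like FOREACH grouped GENERATE group, SUM(one.$2)
# (Pig printed doubles above; plain ints are used here).
summed = {key: sum(r[2] for r in rows) for key, rows in grouped.items()}

# Like FOREACH grouped GENERATE group, COUNT(one)
counted = {key: len(rows) for key, rows in grouped.items()}

# Like JOIN one BY $0, two BY $2: pair up rows whose keys match.
joined = [o + t for o in one for t in two if o[0] == t[2]]

print(summed)
print(counted)
print(len(joined))
```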
Using Django Signals To Watch For Changes To Instances
Say you want to monitor changes to instances of a model and update something based on the changes. In my example I wanted to maintain a sum of the values that had certain characteristics. You can accomplish this with Django Signals.
Signals are events that fire at various pre-defined moments - for example, before an instance is saved, after it’s saved, etc. You can subscribe to these events, allowing your callback handler to be called at those moments.
The code below subscribes to the post_init and post_save signals. post_init gets triggered when a model’s __init__ method is done executing, which generally means when a model instance is created for the first time or instantiated from a query to the DB. This is actually too frequent for the use case I have in mind (checking the before-modification and after-modification values of certain fields), but seems to be the only place I can hook in to get the pre-modification values.
post_save gets triggered after the instance is saved to the DB. The code below stores the pre-modification values in pre_save when it gets triggered by the post_init signal, and checks them against the post-modification values when it gets triggered by the post_save signal.
Note that you’ll probably want to clean up pre_save periodically. Unfortunately post_init and post_save are not symmetrical (you’ll get a post_init anytime an instance is created, for example when you query the DB), so you can’t simply delete from pre_save when the post_save signal gets triggered.
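from django.dispatch import dispatcher
from django.db.models import signals

# Expense is the model being watched; import it from your own app's models.

# Maps instance id -> (field1, field2) as they were when the instance was loaded.
pre_save = {}

def change_watcher(sender, instance, signal, *args, **kwargs):
    print "SIGNAL:", sender, instance.report, signal, args, kwargs
    if signal == signals.post_init:
        pre_save[instance.id] = (instance.field1, instance.field2)
    else:
        # A brand new instance had no id yet at post_init time, so there is
        # no snapshot stored under its id; skip the comparison in that case.
        before = pre_save.get(instance.id)
        if before is None:
            return
        if before[0] != instance.field1:
            print "Changed field1"
        if before[1] != instance.field2:
            print "Changed field2"

for signal in (signals.post_init, signals.post_save):
    dispatcher.connect(change_watcher, sender = Expense, signal = signal)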
Sex and the City Second Hand Review
My wife came back from the movie last night, not happy. “It was horrible”. “Depressing”. “Too serious”.
So there you have it. Not good, apparently.
IE6 Image Works Surprisingly Well
I needed to test a site on IE6. Normally this involves stealing my wife’s laptop, since it’s the last computer in the house to still have IE6.
Today I was too lazy to go downstairs, so instead I decided to give the IE6 Image a try. This is basically a Windows virtual machine image with IE6 loaded and ready to go.
The download was fairly large, but the install was easy, all the defaults just worked, and it’s now running smoothly in its own little sandbox on my Vista box. Well done Microsoft, and well done virtualization.
Had To Happen
Well, it’s official - I’m an iPhone fanboy. This thing is just fantastic, one of those “just works” deals. I’m supposed to give it to the wife, I’m gonna miss it dearly…
Finally Bought An iPhone
Well, after a ridiculous amount of procrastination I finally bought the iPhone. I’m using it to post this. So far: amazing, revolutionary, frustrating, imperfect, and really very nice. Typing is by far the least successful part of this thing. Thank god for the auto correction, I don’t think I’ve spelled a single word correctly yet.
Now to turn off the incredibly annoying key click…
Engineering Enrollment Declines
I attended a talk by the dean of UCSD’s Jacobs School of Engineering last night. Besides his pictures of the Bauhaus, one of his most interesting slides tracked engineering student enrollment over the last few years. Since the dot-com bust, enrollment has declined significantly - for CS majors I believe it dropped to 1/3 of the pre-bust levels. I asked him if this was a UCSD-only phenomenon or widespread, and he told me it was across all US schools, and possibly worldwide. Interestingly, graduate student enrollment didn’t suffer the same fate; perhaps it was all those graduates going back to school instead of getting jobs.
Apparently enrollment is once again picking up. I was fairly shocked by the numbers however - a 2/3 drop. Daniel Lemire opines that there is no shortage of IT workers - which may be true. I don’t really think of UCSD CS grads as IT workers - you wouldn’t find them administering the local mail server, for example - but surely a 2/3 drop in supply of qualified graduates hurts. As someone hiring for “technically sophisticated” problems I can tell you we could use more fresh faces.
Map/Reduce (Hadoop) First Impressions
I’m finally getting a chance to actually implement map/reduce instead of just reading or writing about it. General impressions so far:
- Hadoop is fairly easy to install and get running.
- The choice of Java as the default programming language feels strange to me. It’d feel more natural in Perl, Python, or Ruby since most of what you do is read and massage records. (I’m actually using Python with Hadoop Streaming)
- The map/reduce paradigm is very nice, but doesn’t fit everything. In fact, so far it hasn’t fit anything I’m trying perfectly. It works, but it always feels like you’re shoe-horning the problem into a map/reduce model. I’m wondering how well it’d work to remove the map/reduce model and make it just a general work distribution mechanism, with map and reduce as easy add-ons. So if I only need a map, or only a reduce, or just a sort, I can do only that. Or, if my map actually produces 2 different sets of output for processing by 2 different sets of reducers, there should be an easy way to do that too.
- Pig is promising; I haven’t actually used it hands-on yet. A higher level language seems like the right way to go.
- Hadoop does scale as advertised, at least to the number of boxes I’ve tried so far (30). It’s great to see it crunch through something that used to take 30 minutes in 1 1/2 minutes. I’ll be trying larger clusters soon.
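For readers who haven’t seen the Python-with-Streaming approach mentioned above, here is a minimal sketch of its shape, using word counting as a stand-in problem (not one of the jobs described in this post). With Hadoop Streaming the mapper and reducer are separate scripts that read lines from standard input and write tab-separated key/value lines to standard output, and Hadoop sorts the mapper output by key in between. The run_locally helper is made up to simulate that sort step in a single process.

```python
from itertools import groupby


def mapper(lines):
    """Emit one 'word<TAB>1' line per word, as a streaming mapper would print."""
    for line in lines:
        for word in line.split():
            yield "%s\t1" % word


def reducer(sorted_lines):
    """Sum the counts for each key; relies on the input being sorted by key."""
    keyed = (line.split("\t", 1) for line in sorted_lines)
    for key, group in groupby(keyed, key=lambda kv: kv[0]):
        yield "%s\t%d" % (key, sum(int(value) for _, value in group))


def run_locally(lines):
    """Stand-in for the framework: map, sort by key, then reduce."""
    return list(reducer(sorted(mapper(lines))))


result = run_locally(["a b a", "b c"])
print(result)
```

In a real job each function would live in its own script looping over sys.stdin, and a map-only or reduce-only step means supplying a trivial pass-through for the other half, which is part of the shoe-horning feeling described above.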
New Look, New Day
The blog went bonkers, couldn’t be easily fixed, so we have a new WordPress install and a new theme. Lots of things are probably broken; let me know what and I’ll fix’em.
Gmail Contact List API?
UPDATE: Google has released an official contact list API for Gmail. This should supersede the various libraries out there.
I’m looking for an API to extract contact lists from Gmail accounts. I’ve tried both libgmail and gmail.py, and neither works for me, returning “HTTP Error 302: The HTTP server returned a redirect error that would lead to an infinite loop.” and “LoginFailure: Wrong username or password.” respectively.
Ian Murdock’s post on the topic brings up the very intriguing possibility of using XMPP to access the contacts, but doesn’t get into details. I might give that a try. If you’ve tried it or have a code sample for how to get at the contacts please leave a comment with details.
There’s also OpenSocial’s People Data API, but I don’t immediately see the path to getting the contact list and it doesn’t look like Gmail supports the API yet (I’m not particularly interested in the Orkut friends list). Perhaps there’s a way in there somewhere.
So if you have a way of getting at the contact list from Gmail (or Hotmail, MSN, or Yahoo for that matter), leave a comment and clue me in.
awk Example
This example captures most of the common things I end up doing with awk, so I’m noting it here for future reference:
awk -F, '/00:00:/ { print $3-past, $1, $2, $3; past=$3 }' state9930.rate | sort -n -r > daily.sorted.rate
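Which is saying:
- The field separator is “,” (i.e. the input file fields are separated by commas).
- For each line that matches “00:00:”:
  - “$3-past” subtracts the “past” value from the current third field (“past” starts out empty, which awk treats as 0).
  - Print the difference followed by the first three fields.
  - After printing, “past” is set to the current third field.
- The output is piped through “sort -n -r”, which orders it numerically, largest first, and the result is written to daily.sorted.rate.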
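The same steps can be sketched in Python, which may help if the awk one-liner reads as line noise. The function name daily_deltas and the sample lines are made up; the real layout of state9930.rate isn’t shown here, so the sample only assumes comma-separated fields with a number in the third column.

```python
def daily_deltas(lines):
    """Mimic the awk one-liner: diff the third field across midnight lines."""
    past = 0.0  # awk treats the unset variable as 0 on the first match
    rows = []
    for line in lines:
        if "00:00:" not in line:   # the /00:00:/ pattern
            continue
        fields = line.rstrip("\n").split(",")   # -F,
        current = float(fields[2])
        rows.append((current - past, fields[0], fields[1], fields[2]))
        past = current
    # sort -n -r: numeric order on the leading number, largest first
    rows.sort(key=lambda row: row[0], reverse=True)
    return rows


# Hypothetical input lines.
sample = [
    "2008-06-01,00:00:00,100",
    "2008-06-01,12:00:00,130",
    "2008-06-02,00:00:00,150",
    "2008-06-03,00:00:00,170",
]

for row in daily_deltas(sample):
    print(row)
```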
How Yahoo and Google Make Money
How do Yahoo and Google make money? This is a frequently asked question, so let me give a high level overview.
The short answer is targeted advertising. Why and how does it work?
What do you use Google for? Search. Let’s look at search. Say the user searches for “mountain bike San Diego”. Chances are he’s looking for a place to go mountain biking. Or, perhaps he’s looking to buy a mountain bike. Google will go and find the most relevant Web pages on that topic.
Now imagine you own a mountain bike store. If somebody told you they would let you show your advertisement to this specific user exactly at the moment he’s expressed interest in mountain bikes, while it’s foremost in his mind, would you be interested? Sure you would. That’s what Yahoo and Google do: not only do they find the most relevant Web pages, they also find the most relevant ads and show them to the user in the form of Sponsored Links.
The sponsored links are actually relevant, so some percentage of the people that see them click on them. Note that this is very different from Banner Ads - those are generic ads targeted at a demographic (or sometimes not targeted at all).
Each time a user clicks on your ad, Google or Yahoo has effectively sent you a lead, someone who’s likely to buy something from your mountain bike store.
This lead is valuable to you and you’re willing to pay for it. But how much?
Turns out you’re not the only mountain bike store in San Diego. I own a store too, and I want that same lead. I’m willing to pay for it too.
So how do we determine the price? Very approximately speaking, by bidding on it. It’s sort of an auction.
I want to advertise my mountain bikes. I go to Yahoo or Google and tell them: every time somebody searches on “mountain bike”, show my ad. I do this by specifying a bunch of terms related to mountain bikes, and I provide the text for my ad, and a link to my Web page. Something like:
keyword: mountain bike, offroad bike, offroad bicycle
advertisement: buy my wonderful mountain bikes, they’re the best
url: www.mywonderfulmountainbikes.com
You own a mountain bike store too, so you do a similar thing.
Along with my advertisement, I specify how much I’m willing to pay for each lead. What is a lead? It’s defined as a click on my ad. So my bid says I’m willing to pay X each time someone clicks on my ad.
Note that I pay for clicks, not for my ad being shown (aka impressions). It’s a good deal - I only pay if the user is interested enough in what I offer to click on it.
I specify a bid. You specify a bid too. Approximately speaking, the higher the bid, the more prominently/more frequently the ad is shown. Other factors go into picking the actual ordering of the ads, but bids play a big part.
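To make the bidding idea concrete, here is a deliberately oversimplified sketch: ads are ordered by bid alone and an advertiser pays its own bid per click. Real systems weigh the other factors mentioned above and price clicks in their own ways, so treat every name and number below as made up for illustration.

```python
def rank_ads(bids):
    """Order advertisers by bid, highest first (ignores every other factor)."""
    return sorted(bids, key=lambda ad: ad["bid_cents"], reverse=True)


def cost_in_cents(ad, clicks):
    """Pay per click: showing the ad (an impression) costs nothing by itself."""
    return ad["bid_cents"] * clicks


# Two hypothetical mountain bike stores bidding on the same keyword.
bids = [
    {"advertiser": "my store", "bid_cents": 40},
    {"advertiser": "your store", "bid_cents": 55},
]

ranked = rank_ads(bids)
print([ad["advertiser"] for ad in ranked])
print(cost_in_cents(bids[0], 10))  # ten clicks on my ad
print(cost_in_cents(bids[0], 0))   # shown but never clicked
```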
So there you have it. Lots of advertisers bidding on many millions of keywords, and hundreds of millions of users doing billions of searches, a small charge each time somebody clicks on an ad, and you get a big business.
There are also contextual ads, the ads you see on blogs and other random web pages. Same idea, except instead of using the user’s search term to select the ads, the contents of the page you’re looking at is used. So if you’re looking at a page about mountain bikes, you’d see ads related to mountain bikes.
There’s more to it than this, but to a first order approximation, this is the core of the business.
I Blog
I have nothing interesting to say. Yet the urge to share my inner-most superficial thoughts with you, dear stranger, is so strong, I must start blogging. All the cool kids are doing it. And so I blog.