Archive for June, 2008

Windows Should Include Virtual Machines For Virus Protection

0

I’m about to buy a new laptop for my wife, and it has me thinking about how to keep computers for non-technical folks virus-free.

The best method I can think of is to have a virtual machine running on her machine. She can keep her important things on the core operating system (e.g. financial stuff, other core software), and do riskier stuff on the virtual machine (e.g. email, web surfing). If the virtual machine gets infected, no big deal: blow it away and start from a clean image. The virtual machine should operate like Parallels: apps running on it should appear as regular windows within the host system so it’s seamless for the user.

The biggest source of viruses on my home computer has been visitors – somehow whenever a niece visits she ends up installing her favorite software, and leaves the computer with either a virus or a trojan of some sort. Visitors would get their own virtual images.

Should be fairly straightforward to do – Virtual PC already works well, and creating a small base Windows image should be quite simple. The concepts are quite complex for end-users, so it’ll take some cleverness to make it understandable. However, Vista’s permission business is already quite a hassle (I turned it off the second day I was on Vista), and this will probably be easier to understand.

D Lazy Evaluation Prettyness

0

This is kind of pretty:


void log(lazy char[] dg)
{
    if (logging)
        fwritefln(logfile, dg());
}

void foo(int i)
{
    log("Entering foo() with i set to " ~ toString(i));
}

Note the lazy keyword in the definition of the log function, which tells D to evaluate the argument only if it is actually needed (i.e. lazily). When logging is off, the string concatenation in foo() never runs.
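
For comparison, here is a rough Python sketch of the same idea. Python has no lazy keyword, so the caller passes a callable that builds the message only on demand. The Logger class and its lines attribute are made up for illustration and are not part of any library.

```python
class Logger:
    """Tiny stand-in logger that only builds messages when enabled."""

    def __init__(self, enabled=False):
        self.enabled = enabled
        self.lines = []  # collected messages, kept in memory for the demo

    def log(self, make_message):
        # make_message is a zero-argument callable; it is only invoked
        # when logging is on, so its cost is skipped otherwise.
        if self.enabled:
            self.lines.append(make_message())


def foo(logger, i):
    # The lambda delays the string building until (and unless) log() needs it.
    logger.log(lambda: "Entering foo() with i set to " + str(i))
```

The difference from D is that the laziness is visible at the call site (the lambda), whereas D hides it in the function signature.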

Nice. Smells a little like Twisted’s deferred business, except different.

Via Raganwald.

The Performance Penalty of Virtualization

0

If you’ve spent any time with virtualized environments you know how effective and productive they are. The process of expanding capacity for FaceDouble, for example, became significantly simpler once they moved to deploying virtual servers, and SmugMug has been singing the praises of Amazon’s EC2 with a clever system to provision and remove capacity based on load. My own experiments with Hadoop and EC2 have been similarly fruitful.

So I’m wondering what the downside to aggressively going virtual is – why not make all servers virtual?

The main issue that comes to mind is performance, or the loss thereof. Presumably the performance of a virtual server is less than that of the same server running directly on the native OS.

Just how much of a performance difference is there, say in terms of per-request latency and capacity, for a web server, a database server, and a CPU-bound heavy computation server, for any of the common virtualization systems (Xen, VMware, etc.)? I haven’t seen any good materials on this, so if you have knowledge or pointers please let me know.

OpenID for Email Verification?

2

I have a need to verify user email addresses, which I’ve been doing the traditional way – sending an email with a secret to the user’s address and having them reply or click on a URL.
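
For reference, the "secret in a URL" step of the traditional approach can be sketched roughly as below. This is only an illustration: SERVER_KEY, the function names, and the example.com URL are assumptions, and a real system would also want token expiry.

```python
import hashlib
import hmac
import secrets
from urllib.parse import urlencode

# Hypothetical server-side secret; in practice this would come from config.
SERVER_KEY = secrets.token_bytes(32)


def make_token(email, key=SERVER_KEY):
    """Derive a token that only the server can compute for this address."""
    normalized = email.strip().lower().encode("utf-8")
    return hmac.new(key, normalized, hashlib.sha256).hexdigest()


def verification_url(email, base="https://example.com/verify"):
    """Build the link that goes into the verification email."""
    return base + "?" + urlencode({"email": email, "token": make_token(email)})


def is_valid(email, token, key=SERVER_KEY):
    """Check a token that came back from a clicked link."""
    return hmac.compare_digest(make_token(email, key), token)
```

Deriving the token from the address means the server does not have to store pending secrets, at the cost of tokens never expiring unless a timestamp is added to the signed data.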

Unfortunately this is not optimal – emails tend to not make it to the user, go into bulk/spam buckets, and are less real-time than I’d like. I’m looking for a better way.

I’m hoping OpenID will help me. I mainly care about Yahoo, Google, and Hotmail, all of which support OpenID to some extent.

I believe OpenID Simple Registration is what I’m looking for. I have a lot of homework to do to see which providers support SREG, how to use them, etc. I’ll post my progress here, and if you have knowledge / experience with this, please leave a comment below.

Time Warner Discontinues Usenet

0

Looks like Time Warner has discontinued Usenet service:

Time Warner Ends Usenet

A good reason to leave Time Warner. I currently have the majority of my services through them (cable, phone, web), so I’ll be looking for alternatives.

Russia Wins!

1

Just finished watching Russia vs. Netherlands. Most of it anyway – I’d only DVR’d the main match, missed most of the extra time.

What a fantastic game. Arshavin is the real deal, looking forward to seeing more of him. Sorry to all my Dutch buddies, but Russia really looked like they wanted it. I’m always impressed when the underdog comes out to win instead of play to tie.

We don’t get nearly enough football in the US, I’m really enjoying Euro2008 in HD.

Pig (Hadoop) Commands And Sample Results

3

I find seeing the results of Pig commands on sample data a good companion to the Pig Latin language reference, so I set up some simple sample data and ran commands, capturing the results. Here’s the sample data as well as the commands:

/data/one:


a	A	1
b	B	2
c	C	3
a	AA	11
a	AAA	111
b	BB	22

/data/two:


x	X	a
y	Y	b
x	XX	b
z	Z	c

Pig commands and their results:


one = load 'data/one' using PigStorage();
two = load 'data/two' using PigStorage();

generated = FOREACH one GENERATE $0, $2;
(a, 1)
(b, 2)
(c, 3)
(a, 11)
(a, 111)
(b, 22)

grouped = GROUP one BY $0;
(a, {(a, A, 1), (a, AA, 11), (a, AAA, 111)})
(b, {(b, B, 2), (b, BB, 22)})
(c, {(c, C, 3)})

grouped2 = GROUP one BY ($0, $1);
((a, A), {(a, A, 1)})
((a, AA), {(a, AA, 11)})
((a, AAA), {(a, AAA, 111)})
((b, B), {(b, B, 2)})
((b, BB), {(b, BB, 22)})
((c, C), {(c, C, 3)})

summed = FOREACH grouped GENERATE group, SUM(one.$2);
(a, 123.0)
(b, 24.0)
(c, 3.0)

counted = FOREACH grouped GENERATE group, COUNT(one);
(a, 3)
(b, 2)
(c, 1)

flat = FOREACH grouped GENERATE FLATTEN(one);
(a, A, 1)
(a, AA, 11)
(a, AAA, 111)
(b, B, 2)
(b, BB, 22)
(c, C, 3)

cogrouped = COGROUP one BY $0, two BY $2;
(a, {(a, A, 1), (a, AA, 11), (a, AAA, 111)}, {(x, X, a)})
(b, {(b, B, 2), (b, BB, 22)}, {(y, Y, b), (x, XX, b)})
(c, {(c, C, 3)}, {(z, Z, c)})

flatc = FOREACH cogrouped GENERATE FLATTEN(one.($0,$2)), FLATTEN(two.$1);
(a, 1, X)
(a, 11, X)
(a, 111, X)
(b, 2, Y)
(b, 22, Y)
(b, 2, XX)
(b, 22, XX)
(c, 3, Z)

joined = JOIN one BY $0, two BY $2;
(a, A, 1, x, X, a)
(a, AA, 11, x, X, a)
(a, AAA, 111, x, X, a)
(b, B, 2, y, Y, b)
(b, BB, 22, y, Y, b)
(b, B, 2, x, XX, b)
(b, BB, 22, x, XX, b)
(c, C, 3, z, Z, c)

crossed = CROSS one, two;
(a, AA, 11, z, Z, c)
(a, AA, 11, x, XX, b)
(a, AA, 11, y, Y, b)
(a, AA, 11, x, X, a)
(c, C, 3, z, Z, c)
(c, C, 3, x, XX, b)
(c, C, 3, y, Y, b)
(c, C, 3, x, X, a)
(b, BB, 22, z, Z, c)
(b, BB, 22, x, XX, b)
(b, BB, 22, y, Y, b)
(b, BB, 22, x, X, a)
(a, AAA, 111, x, XX, b)
(b, B, 2, x, XX, b)
(a, AAA, 111, z, Z, c)
(b, B, 2, z, Z, c)
(a, AAA, 111, y, Y, b)
(b, B, 2, y, Y, b)
(b, B, 2, x, X, a)
(a, AAA, 111, x, X, a)
(a, A, 1, z, Z, c)
(a, A, 1, x, XX, b)
(a, A, 1, y, Y, b)
(a, A, 1, x, X, a)

SPLIT one INTO one_under IF $2 < 10, one_over IF $2 >= 10;
-- one_under:
(a, A, 1)
(b, B, 2)
(c, C, 3)
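
If the relational operators are unfamiliar, it can help to see a few of them spelled out in plain Python over the same sample data. This is only a sketch of what GROUP, SUM, COUNT, JOIN and SPLIT compute; the helper names are made up, it says nothing about how Pig executes them, and Pig does not guarantee row order.

```python
from collections import defaultdict

one = [("a", "A", 1), ("b", "B", 2), ("c", "C", 3),
       ("a", "AA", 11), ("a", "AAA", 111), ("b", "BB", 22)]
two = [("x", "X", "a"), ("y", "Y", "b"), ("x", "XX", "b"), ("z", "Z", "c")]


def group_by(rows, key_index):
    """GROUP rows BY $key_index: map each key to the list of its rows."""
    groups = defaultdict(list)
    for row in rows:
        groups[row[key_index]].append(row)
    return dict(groups)


def join(left, left_index, right, right_index):
    """JOIN left BY $left_index, right BY $right_index (inner join)."""
    right_groups = group_by(right, right_index)
    return [l + r for l in left for r in right_groups.get(l[left_index], [])]


grouped = group_by(one, 0)
summed = {key: sum(row[2] for row in rows) for key, rows in grouped.items()}
counted = {key: len(rows) for key, rows in grouped.items()}
joined = join(one, 0, two, 2)

# SPLIT one INTO one_under IF $2 < 10, one_over IF $2 >= 10;
one_under = [row for row in one if row[2] < 10]
one_over = [row for row in one if row[2] >= 10]
```

Note that Pig's SUM returned doubles in the output above (123.0), while this sketch keeps the integers as they are.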

iTunes SUCKS

2

I revived a bunch of MP3s onto my Windows box. I want to get them onto my iPhone. Should be drop-dead simple.

I’ve spent 10 minutes with iTunes, pressed the “Sync” button a ridiculous number of times, and I’m no closer to getting the songs onto the iPhone.

I found an article that suggested creating a special playlist and sync’ing that with the iPhone. Fine. However, when I try to sync that it tells me my iPhone is already sync’d with another computer (the infamous mac) and that sync’ing would erase the existing songs on there.

Why on earth is this difficult? Did Apple really assume you’d only ever want to transfer songs from a single computer?

I’m really puzzled and disgusted. I should be able to drag and drop my songs onto the iPhone and magic should happen. Bah.

Turtles Can Fly: See It

0

Turtles Can Fly

We watched Turtles Can Fly tonight. The story of Iraqi-Kurdish refugee children before, during, and just after the US invasion.

The movie is harsh. The life, the sheer numbers, and what they have to do to get by is just harsh.

The movie is well done; it’s not preachy and doesn’t dwell on the negatives. But it’s hard to watch at times.

Well worth seeing, give it a look.

All Internal Awards / Tchotchkes Should be Cups

0

As I waste yet another styrofoam cup and stare at a desk full of various thank you / congratulations / etc. tchotchkes, it occurs to me that if I were handed a real cup / mug instead of the various junk I currently get I could stop wasting styrofoam. A cup is useful, long lasting, and a good place to put your message / logo.

No more nerf cellphone holder chairs please.

Using Django Signals To Watch For Changes To Instances

4

Say you want to monitor changes to instances of a model and update something based on the changes. In my example I wanted to maintain a sum of the values that had certain characteristics. You can accomplish this with Django Signals.

Signals are events that fire at various pre-defined moments – for example, before an instance is saved, after it’s saved, etc. You can subscribe to these events, allowing your callback handler to be called at those moments.

The code below subscribes to the post_init and post_save signals. post_init gets triggered when a model’s __init__ method is done executing, which generally means when a model instance is created for the first time or instantiated from a query to the DB. This is actually too frequent for the use case I have in mind (checking the before-modification and after-modification values of certain fields), but seems to be the only place I can hook in to get the pre-modification values.

post_save gets triggered after the instance is saved to the DB. The code below stores the pre-modification values in pre_save when it gets triggered by the post_init signal, and checks them against the post-modification values when it gets triggered by the post_save signal.

Note that you’ll probably want to clean up pre_save periodically. Unfortunately post_init and post_save are not symmetrical (you’ll get a post_init anytime an instance is created, for example when you query the DB), so you can’t simply delete from pre_save when the post_save signal gets triggered.


from django.db.models import signals

# Expense is the model being watched; import it from your own models module.

pre_save = {}

def change_watcher(sender, instance, signal, **kwargs):
    print("SIGNAL:", sender, instance.report, signal, kwargs)
    if signal is signals.post_init:
        # Remember the values as loaded, before any modification.
        pre_save[instance.id] = (instance.field1, instance.field2)
    else:
        # post_save: compare against the remembered values. A brand-new
        # instance had no id at post_init time, so there may be no entry.
        before = pre_save.get(instance.id)
        if before is None:
            return
        if before[0] != instance.field1:
            print("Changed field1")
        if before[1] != instance.field2:
            print("Changed field2")

# Signals expose connect() directly in Django 1.0 and later.
for signal in (signals.post_init, signals.post_save):
    signal.connect(change_watcher, sender=Expense)
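
One way to sidestep the pre_save cleanup problem is to keep the snapshot on the instance itself, so it disappears when the instance does. The sketch below is an assumption-laden variation, not something from the Django docs: the _original attribute, WATCHED_FIELDS, and both function names are made up, and the handlers are written so they can be tried on any plain object.

```python
WATCHED_FIELDS = ("field1", "field2")


def remember_original(sender, instance, **kwargs):
    """post_init-style handler: stash the watched values on the instance."""
    instance._original = {name: getattr(instance, name) for name in WATCHED_FIELDS}


def detect_changes(sender, instance, **kwargs):
    """post_save-style handler: list changed fields, then refresh the snapshot."""
    original = getattr(instance, "_original", {})
    changed = [
        name for name in WATCHED_FIELDS
        if name in original and original[name] != getattr(instance, name)
    ]
    # Refresh so a second save only reports changes made since this one.
    remember_original(sender, instance)
    return changed


# With Django available, the handlers would be hooked up along these lines:
#   signals.post_init.connect(remember_original, sender=Expense)
#   signals.post_save.connect(detect_changes, sender=Expense)
```

Because nothing is keyed on instance.id, this also works for instances that have not been saved yet.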

Django+MySQL: How To Fix Unicode (aka Mysterious Question Marks)

1

If you’re running into the problem where unicode items in your Django / MySQL project are displayed as question marks, here’s the likely problem and solution, found in this django-users thread:

The likely problem is that your MySQL encoding is set to latin1, as opposed to utf8. You can check this via:

 mysqld --verbose --help | grep character-set

You’ll probably see:

character-set-server              latin1

You want this to be utf8. To modify it, edit your my.cnf file ( /etc/mysql/my.cnf on Ubuntu ), adding the following lines to the appropriate sections:


[client]
...
default-character-set = utf8

[mysqld]
...
character-set-server=utf8
collation-server=utf8_unicode_ci
init_connect='set collation_connection = utf8_unicode_ci;'

Now restart mysql:


sudo /etc/init.d/mysql restart

And alter your existing tables to use the utf8 encoding:


mysql your_db_name

alter table your_table_name convert to character set utf8;

And that should do it.
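
To see where the question marks come from, here is a small Python illustration of what happens when text is forced through latin1. It only mimics the effect (characters with no latin1 equivalent get replaced by "?"); it is not what MySQL runs internally, and the function names are made up.

```python
def through_latin1(text):
    """Round-trip text through latin1, replacing anything it can't hold."""
    return text.encode("latin-1", errors="replace").decode("latin-1")


def through_utf8(text):
    """Round-trip text through utf8, which can represent all of Unicode."""
    return text.encode("utf-8").decode("utf-8")


japanese = "日本語"
french = "café"
```

Here through_latin1(japanese) loses every character, while through_latin1(french) survives because "é" exists in latin1. That is why the problem tends to show up only for some users' data.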

Static Typing and Breath Mints

1

Laughed out loud at this one:

Static typing is like giving a drunk a bunch of breath mints and saying “Don’t drive drunk. But if you must, use these breath mints in case you get pulled over.”

Via Simon.

Startup Equity and What You Stand To Make

0

This is one of the most frequent questions I get about the startup world, and Chris Michel covers it well in this F|R interview, so I’m quoting here:

Typically VPs of early-stage companies get between 1 percent and 1.7 percent of a company. That’s just the benchmark, but it’s what investors will expect to see. The equity structure of a VC-backed company looks like this: The investors own 40 percent; the founder(s) own 40 percent; 20 percent is set aside in an employee option pool. After a round of additional funding, your senior managers may each be diluted from 1.5 percent to 0.75 percent. If you sell the company for $100 million — a very good outcome for a startup — the managers each get $750,000. If you toiled away for five years to build the company, is that worth giving up five years of a great salary? Maybe not.
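
The arithmetic in that quote is easy to play with. A quick sketch follows; the function names are made up, and the 50% figure is simply what it takes for 1.5 percent to become 0.75 percent, which is an inference from the quote rather than something it states.

```python
def dilute(ownership_percent, new_investor_percent):
    """Ownership left after new investors take new_investor_percent of the company."""
    return ownership_percent * (100 - new_investor_percent) / 100


def payout(sale_price, ownership_percent):
    """Dollar value of an ownership stake at a given sale price."""
    return sale_price * ownership_percent / 100


stake_after_round = dilute(1.5, 50)            # 1.5% halves to 0.75%
exit_value = payout(100_000_000, stake_after_round)
per_year = exit_value / 5                      # spread over five years of work
```

With the quote's numbers, exit_value comes to $750,000, or $150,000 for each of the five years, before taxes and before accounting for the salary given up.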

Babies Eating Ants

0

Step 1: Give the baby some inappropriate sugary thing since mom is not home.

Step 2: Go to the backyard. Get fascinated by the fact that you get better wifi reception in the backyard than you do inside the house.

Step 3: Come back to consciousness as you realize you haven’t heard from baby in a few minutes.

Step 4: Watch as baby eats the ants that have gathered on his sugary thing that he’s dropped in the backyard.

How To Unlock The BlackBerry Pearl

0

The following worked for me for unlocking the Cingular BlackBerry Pearl I recently got:

  • Figure out your IMEI number. Press *#06# to find it.
  • Go to unlock8800 and pay for your unlock code. Mine arrived via email the next morning. I paid $20.
  • Use the instructions here to enter the unlock code. The instructions you get from the unlock8800 guys aren’t too clear; you don’t need to press each key for MEPPD and MEPP2 twice; i.e. to get the letter P you push the [OP] key just once.
  • Note that when you first go to the “SIM Card” page you won’t see what you type, but once you enter MEPPD correctly you’ll see a menu.

Your Choice of University Is Key, Short Version

1

In case you didn’t believe my earlier ramblings, here’s the short version:

A small group of schools account for a disproportionate amount of billionaire education. Just 20 universities and colleges account for 52% of the billionaire graduates while 182 schools count for the remainder.

Via Yahoo Finance.

The Hamburger Theory of Threads and Processes

2

Hamburger

You’re busy making hamburgers and suddenly you get lots of customers. You want to scale your service to take care of more customers more quickly.

Threads: all of your workers share a single set of tools, utensils, and the same workspace. One puts mustard on the spreader, turns to grab the bun, and finds that someone else has put ketchup on there while he was turning. So you come up with complex rules about who must ask permission, under what circumstances, for grabbing what tools. Sometimes worker A is waiting for worker B to put down the knife, while B waits for A to put down the cheese, and they end up waiting forever.

Processes: each worker gets his own tools, utensils, and work space. Any sharing is explicit: worker A must intentionally pass the utensil to worker B.

More tools are used with processes, so in some sense it’s less efficient. But the rules are much simpler.

If the tools and utensils are very large and valuable, perhaps threads will work. Picture lots of bees each working on a piece of the beehive.

When the tools are small their duplication is less wasteful. The simplicity of the rules makes it easy for you to get your system going, add new items to the menu, and spend a lot less time worrying about your workers waiting for somebody else to put down the cheese.
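
For the programmers in the audience, here is roughly what the threads kitchen looks like in Python. It is only a sketch: the names are made up, and the "pick up the knife before the cheese" ordering is one common rule for avoiding the wait-forever situation, not the only one.

```python
import threading

# Shared tools: every worker thread uses the same knife and cheese.
knife = threading.Lock()
cheese = threading.Lock()

burgers = []                      # the shared workspace
burgers_lock = threading.Lock()   # rule: ask permission before touching it


def worker(name, count):
    for i in range(count):
        # Everyone picks up the knife first, then the cheese, never the
        # other way round. A fixed order means no two workers can each
        # hold one tool while waiting for the other.
        with knife:
            with cheese:
                burger = f"{name}-{i}"
        with burgers_lock:
            burgers.append(burger)


def run_kitchen(workers=4, per_worker=100):
    """Start the workers, wait for them all, and count the burgers made."""
    burgers.clear()
    threads = [
        threading.Thread(target=worker, args=(f"w{n}", per_worker))
        for n in range(workers)
    ]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return len(burgers)
```

In the processes kitchen none of these locks would exist: each worker would build burgers with its own tools and hand the finished ones back explicitly, for example through a multiprocessing queue.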

Photo by JustABigGeek.

On The Value of Lurkers

0

Don Dodge discussed the very small percentage of content creators versus viewers:

in a group of 100 people online, one will create content, 10 will “interact” with it (commenting or adding to it) and the other 89 will just view it.

True enough. However, it’s important not to discount the value of “just viewing”. Viewing is an expression of attention, and attention is an immensely valuable metric to track. View and click data, for example, play a significant part in how Google/Yahoo/etc rank their search results – the more users click on a given item in the search results, the more prominent rank it gets in the results.

Consider a list of 100 random headlines pulled from random news sources. Consider 100k users viewing those headlines and clicking on the ones that interest them. Immediately you have a system for finding the most interesting stories of the day, simply by tallying which headlines get the most views.
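
The tallying itself is about as simple as code gets. A minimal sketch, assuming each view or click arrives as a headline identifier in a list or stream (the function name and the sample data are made up):

```python
from collections import Counter


def rank_headlines(clicks, top=3):
    """Return the most-clicked headlines as (headline, count) pairs."""
    return Counter(clicks).most_common(top)


# Each entry is one lurker clicking one headline.
sample_clicks = ["budget", "election", "budget", "weather", "budget", "election"]
top_stories = rank_headlines(sample_clicks)
```

None of the users in sample_clicks wrote a word, yet the ranking in top_stories exists only because of them.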

A big part of the power of the web is the latent data that can be gleaned from the day-to-day, non-explicit actions of the masses. Content creators create obvious value, but lurkers and viewers play a vital role in unlocking, exposing, and magnifying the value of that content.