Archive for the 'Python' Category


Using Python-Oauth2 To Access OAuth Protected Resources

Most of the examples I found for python-oauth2 show how to use the library to request and approve tokens, but not how to use the access token to access a protected resource (i.e., how to actually make a call to the service you’re trying to access). Here’s an example:


import oauth2 as oauth

consumer = oauth.Consumer('consumer-key-here', 'consumer-secret-here')
token = oauth.Token('access-key-here', 'access-key-secret-here')
client = oauth.Client(consumer, token)
# request() returns a (response, content) pair, as in httplib2
response, content = client.request('http://someservice.com/api/something/')

And here’s how you make a POST call:


import urllib

# reuses the client created in the previous example
response, content = client.request("http://someservice.com/api/something/",
    method="POST",
    body=urllib.urlencode({'name': 'value', 'another_name': 'another value'}))

Django: Using The Permission System

I was surprised at how little information I found on making use of Django’s permission system. Here are some quick notes on one way to use it:

Groups are groups of users. For example, you could define a group of users who have premium accounts, or have been verified in some way, or are somehow special:


from django.contrib.auth.models import Group, Permission
special_users = Group(name='Special Users')
special_users.save()
really_special_users = Group(name='Super Special Users')
really_special_users.save()

Now you have two groups defined and can define permissions for them. Django associates permissions with models (note: not model instances, but models). You’ll need to select a model to apply the permissions to, and do a small dance with “ContentType” to find that model’s content type:


from django.contrib.contenttypes.models import ContentType
somemodel_ct = ContentType.objects.get(app_label='myapp', model='somemodel')

can_view = Permission(name='Can View', codename='can_view_something',
                       content_type=somemodel_ct)
can_view.save()

can_modify = Permission(name='Can Modify', codename='can_modify_something',
                       content_type=somemodel_ct)
can_modify.save()

You’ve now defined two permissions and can associate them with your Groups:


special_users.permissions.add(can_view)
# add() accepts several permissions at once
really_special_users.permissions.add(can_view, can_modify)

Our groups and their associated permissions are ready to go. Now we just have to associate these permissions with users:


from django.contrib.auth.models import User

jack = User.objects.get(email='jack@test.com')
jack.groups.add(special_users)

jill = User.objects.get(email='jill@test.com')
jill.groups.add(really_special_users)

We’re all done. Now we can check the users’ permissions:


>>> jack.has_perm('myapp.can_view_something')
True
>>> jack.has_perm('myapp.can_modify_something')
False

>>> jill.has_perm('myapp.can_view_something')
True
>>> jill.has_perm('myapp.can_modify_something')
True

And to use it in your templates:


{% if perms.myapp.can_view_something %}
Here is something for you to see.
{% else %}
Can't show you!
{% endif %}

Django-mptt: Tree Storage in Django: A Brief Overview

django-mptt is a library for storing tree-oriented data using the Django ORM. It allows you to place your model instances into a tree structure and efficiently query for ancestors and children.

Here’s a brief tutorial on how to use it:

After installing, you’ll need to modify your model to include a “parent” field, and register it with mptt:

from django.db import models
import mptt

class Person(models.Model):
    contact = models.ForeignKey(Contact, db_index=True)
    role    = models.CharField(max_length=20, blank=True)
    # the self-referencing "parent" field that mptt builds the tree from
    parent  = models.ForeignKey('self', null=True, blank=True, related_name='children')

    def __unicode__(self):
        return "Person: <%s>" % (self.contact.email, )

mptt.register(Person)

mptt dynamically adds fields to your model, so you’ll need to syncdb after you’ve added the parent attribute and the mptt.register call to your model.

The basics are fairly easy to use:

To move a node to the root of the tree, use move_to with a target of None:

person1.move_to(None)
person1.save()

To make a node the child of another, set its parent:

person2.parent = person1
person2.save()

To find the children of a node, use the children field:

>>> person1.children.all()
[<Person: Person: <test2@testing.com>>, <Person: Person: <test3@testing.com>>]

Here’s a little snippet of code to set up a 15-node tree where each node has two child nodes:

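[UPDATE] The code in this snippet was not correct as first posted: you have to save each node as you update it, then look it up again. You can’t modify a node, save it, then use the reference you already have for it. The version below looks each node up again before using it.

# mod is the module that defines the Contact and Person models
contacts = []
people = []
for n in range(15):
    c = mod.Contact(email="test" + str(n) + "@testing.com")
    c.save()
    contacts.append(c)
    p = mod.Person(contact=c)
    p.save()
    people.append(p)

people[0].move_to(None)  # Root
people[0].save()
for n in range(1, 15):
    # look up fresh copies of the child and its parent before linking them
    child = mod.Person.objects.get(pk=people[n].pk)
    child.parent = mod.Person.objects.get(pk=people[(n - 1) // 2].pk)
    child.save()

# reload the list so the references used below are current
people = [mod.Person.objects.get(pk=p.pk) for p in people]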

Now let’s take a look around:

>>> people[7].parent
<Person: Person: <test3@testing.com>>

>>> people[3].children.all()
[<Person: Person: <test7@testing.com>>, <Person: Person: <test8@testing.com>>]

Now let’s move things around a bit; we’ll take people[3], which is two levels down from the root, and make it a direct child of the root:

>>> people[3].parent = people[0]
>>> people[3].save()

>>> people[0].children.all()
[<Person: Person: <test1@testing.com>>, <Person: Person: <test2@testing.com>>, <Person: Person: <test3@testing.com>>]

And we can look at the ancestors of a given node:

people[14].get_ancestors()
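
For a feel of why ancestor queries are cheap, here’s a rough pure-Python sketch of the modified preorder tree traversal numbering that django-mptt is named after. It is not django-mptt’s code: the function names are made up for illustration, and the tree is the same 15-node shape as above (node n’s parent is node (n-1)//2). A depth-first walk gives each node a left and a right number, and a node’s ancestors are exactly the nodes whose interval encloses its own.

```python
def build_children(n_nodes):
    """Child lists for a tree where node n hangs under node (n - 1) // 2."""
    children = dict((n, []) for n in range(n_nodes))
    for n in range(1, n_nodes):
        children[(n - 1) // 2].append(n)
    return children


def number_tree(children, root=0):
    """Assign (left, right) numbers to every node with a depth-first walk."""
    bounds = {}
    counter = [1]  # a list so the nested function can update it

    def visit(node):
        left = counter[0]
        counter[0] += 1
        for child in children[node]:
            visit(child)
        bounds[node] = (left, counter[0])
        counter[0] += 1

    visit(root)
    return bounds


def ancestors(bounds, node):
    """Nodes whose interval encloses the given node's, root first."""
    left, right = bounds[node]
    found = [n for n, (l, r) in bounds.items() if l < left and r > right]
    return sorted(found, key=lambda n: bounds[n][0])


def descendants(bounds, node):
    """Nodes whose interval lies strictly inside the given node's."""
    left, right = bounds[node]
    return sorted(n for n, (l, r) in bounds.items() if l > left and r < right)


children = build_children(15)
bounds = number_tree(children)
print(ancestors(bounds, 14))
print(descendants(bounds, 3))
```

In a database the two list comprehensions become simple range comparisons on two indexed integer columns, which is why no recursive walking is needed at query time.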

Eclipse + PyDev : I Recommend It

I used to be a vi guy who finally made the move to graphical editors. I looked for the simplest, lightest possible solutions, using ConTEXT for quite a while.

Some years ago I was forced into using Eclipse for reasons I can’t quite recall; probably Java development. I didn’t like it – the forced Project concept, the bloat, the general slowness.

Eventually I got comfortable with it, got PyDev installed, and made it my primary development environment. These days most of my development lives in Eclipse.

With the 1.5 release PyDev included quite a few previously pay-only features in the free / open source version. Since then I’ve found I’ve become even more productive in the environment, and now actually enjoy it.

In particular, the code analysis is very useful. I love the fact that it points out unused imports and variables as well as syntax errors. Going through old code I was surprised at how many spurious imports I had, as well as a few actual errors in code that had been in production for several years in rarely exercised branches.

If you’re doing Python development I recommend you take a look at Eclipse+PyDev. I was surprised at the level of increased productivity it brought me.

Python Simple Mock Object

Another one of those that’s mainly for my own notes. I needed to create a mock object that always returns an empty list no matter what method is called on it. That is, you can instantiate one of these and call any made-up method on it and get an empty list back. Here it is:


class Dummy:
    def __getattr__(self, name):
        # any attribute lookup yields a callable that returns an empty list
        return lambda *args, **kwargs: []

d = Dummy()
d.something()   # returns []
d.something_else("a", 1, "b")   # returns []

Python SMTP Debugging Server

Note to self: this is how you start a python SMTP debugging server:

python -m smtpd -n -c DebuggingServer localhost:25

Django-Piston: REST Framework for Django

Django-Piston is a promising-looking REST framework for Django. On first inspection it seems to have all the right attributes and setup. I hope to give it a try soon.

[Update] By the way, I’ve been using django-piston in a real project and like it quite a bit. I recommend it.

One question I have – while I agree HTTP PUT and DELETE are the right verbs to use for Update and Delete, in practice they’re not well supported and can cause confusion. I’m wondering if there’s a way to change the mapping to the following:

POST /resource  -- Create
POST /resource/id -- Update
POST /resource/id?action=delete -- Delete
GET /resource/id -- Read
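
One way to picture the proposed mapping is as a small dispatch function. This is only a sketch of the idea, not something django-piston provides; resolve_operation and its arguments are made-up names, and it assumes the URL has already been split into a resource id and an optional action parameter.

```python
def resolve_operation(method, resource_id=None, action=None):
    """Map an HTTP method plus URL parts onto a CRUD operation name."""
    method = method.upper()
    if method == "GET" and resource_id is not None:
        return "read"
    if method == "POST":
        if resource_id is None:
            return "create"
        if action == "delete":
            return "delete"
        return "update"
    raise ValueError("unsupported request: %s" % method)


print(resolve_operation("POST"))                        # POST /resource
print(resolve_operation("POST", resource_id=7))         # POST /resource/7
print(resolve_operation("POST", 7, action="delete"))    # POST /resource/7?action=delete
print(resolve_operation("GET", resource_id=7))          # GET /resource/7
```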

Python Simple Inheritance Example

Another one of those things I tend to forget the syntax of, noting for easy future lookup:

 


class Base():
    def __init__(self, param):
        print "Base:", param
    def method(self, param):
        print "Base.method:", param

class Derived(Base):
    def __init__(self, param):
        Base.__init__(self, param)
        print "Derived:", param
    def method(self, param):
        Base.method(self, param)
        print "Derived.method:", param

>>> d = Derived("me")
Base: me
Derived: me
>>> d.method("you")
Base.method: you
Derived.method: you
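
For new-style classes (ones that inherit from object), the same calls can go through super() instead of naming the base class directly. A minimal sketch of that variant; it collects the messages in lists rather than printing them, purely so the result is easy to inspect:

```python
class Base(object):
    def __init__(self, param):
        self.log = ["Base: %s" % param]

    def method(self, param):
        return ["Base.method: %s" % param]


class Derived(Base):
    def __init__(self, param):
        # run the base initializer first, then add our own message
        super(Derived, self).__init__(param)
        self.log.append("Derived: %s" % param)

    def method(self, param):
        lines = super(Derived, self).method(param)
        lines.append("Derived.method: %s" % param)
        return lines


d = Derived("me")
print(d.log)
print(d.method("you"))
```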

Python S3 Library For Chunked / Streaming Download

Born of a need to deal with multi-gig files on S3, I’ve modified the Python Amazon S3 library to allow you to read data in chunks, and added a simple file-like object that lets you read the file one line at a time (à la for line in f ).

The plan was to create this as a patch against the official Python S3 library from Amazon (it’s only a small change), and possibly even do a github thing, but it’s evident I’ll never get around to it, so I’m simply uploading it here.

The small change is the addition of an optional readbody argument to AWSAuthConnection.get that tells the library not to read the body of the message, and an S3File class that provides the line interface. Here’s an example of using the S3File class:


if use_S3:
    f = S3.S3File(AWS_ACCESS_KEY_ID, AWS_SECRET_ACCESS_KEY, BUCKET, FILE)
else:
    f = open(conf.local_location + '/' + FILE)

for line in f:
    pass  # do something with line
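
To give an idea of how a line-at-a-time interface can sit on top of chunked reads, here’s a rough sketch. It is not the S3File class from the modified library; ChunkedLineReader is a made-up name, and it assumes only that the underlying object has a read(size) method that returns bytes (an io.BytesIO stands in for the S3 response here).

```python
import io


class ChunkedLineReader(object):
    """Iterate over lines while pulling data from the stream in fixed-size chunks."""

    def __init__(self, stream, chunk_size=64 * 1024):
        self.stream = stream
        self.chunk_size = chunk_size

    def __iter__(self):
        pending = b""
        while True:
            chunk = self.stream.read(self.chunk_size)
            if not chunk:
                break
            pending += chunk
            lines = pending.split(b"\n")
            # the last piece may be an incomplete line; keep it for the next chunk
            pending = lines.pop()
            for line in lines:
                yield line + b"\n"
        if pending:
            yield pending  # final line without a trailing newline


reader = ChunkedLineReader(io.BytesIO(b"a\nbb\nccc"), chunk_size=2)
for line in reader:
    print(repr(line))
```

Only one chunk plus any partial line is held in memory at a time, which is the point when the file is several gigabytes.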

Use Python’s Marshal For Faster Serialization

I’d been using Python’s cPickle module for serializing data structures to disk. As my data size got larger I started to care about performance, and after a few searches ended up at the marshal module. It comes with a few caveats (it only handles built-in types, its format is not guaranteed to stay the same across Python versions, and it is not safe to load untrusted data with it), but it is working great and is much faster for me than cPickle. So if you’re looking for performance and can live with the caveats, give marshal a try.
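
A small sketch of what that looks like in practice, using only the standard library. The data and the Point class are made up for illustration; no timings are shown, since the speed difference depends on your data and Python version and is worth measuring yourself.

```python
import marshal
import pickle

data = {"ids": list(range(5)), "name": "example", "ratio": 0.25}

# Both modules round-trip plain built-in data structures.
assert marshal.loads(marshal.dumps(data)) == data
assert pickle.loads(pickle.dumps(data)) == data


class Point(object):
    """A user-defined class, which marshal does not know how to serialize."""

    def __init__(self, x, y):
        self.x = x
        self.y = y


def can_marshal(obj):
    """Return True if marshal can serialize obj, False if it refuses."""
    try:
        marshal.dumps(obj)
        return True
    except ValueError:
        return False


marshal_handles_dict = can_marshal(data)
marshal_handles_point = can_marshal(Point(1, 2))
print(marshal_handles_dict, marshal_handles_point)
```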

JSON faster than Thrift and Protocol Buffers?

This is a bit surprising. According to Justin, compressed JSON is faster than both Thrift and Protocol Buffers when used with Python (via Dion). I had previously asked about a performance comparison of Thrift and Protocol Buffers on StackOverflow, but I had assumed JSON would be significantly slower. Maybe not. I’d love to see more on this.

Another Python Convert

Looks like I’ve finally converted Alex to Python as well. “This is actually shorter than Perl and still makes sense”. Be forewarned: if you code with me, you’ll be doing Python in no time.

On Concurrency, Threads, and Python

Nice and detailed article from Jesse Noller on concurrency, threads, the GIL, and Python. Worth a read; I learned a lot from it.

For the record, I’m one of the people who don’t think threads are the true solution to the concurrency problem.

Python Based Key Sniffer In 10 Lines

I love Autohotkey but I’m not crazy about its programming language, so I decided to investigate building an alternative with a simpler language, namely Python.

Turns out the key sniffing portion is actually quite easy to do. Here’s a simple script from the Keylogger in Python thread that does it in 10 lines using pyHook and Mark Hammond’s Win32 Extensions:


import pyHook
import pythoncom

def OnKeyboardEvent(event):
    print event.Ascii
    return True   # pass the event on to other handlers

hm = pyHook.HookManager()
hm.KeyDown = OnKeyboardEvent
hm.HookKeyboard()

while True:
    pythoncom.PumpMessages()

Mindtrove’s post has further details including code and examples for event filtering.

Python Dot Notation Dictionary Access

In most cases I prefer dot notation over bracket notation for dictionary access. That is, I prefer mydictionary.myfield over mydictionary['myfield']. I also prefer attempted access to undefined keys to return None instead of raising an exception.

With the help of this thread, this is what I’ve been using:



class dotdict(dict):
    def __getattr__(self, attr):
        return self.get(attr, None)
    __setattr__= dict.__setitem__
    __delattr__= dict.__delitem__

>>> dd = dotdict()
>>> dd.a
>>> dd.a = 'one'
>>> dd.a
'one'
>>> dd.keys()
['a']

>>> existing = {'a':'A', 'b':'B'}
>>> dot_existing = dotdict(existing)
>>> dot_existing.a
'A'

Python Scripts For Dumping Oracle Data And Loading Onto Hadoop DFS

There have been several requests for this, so I might as well post it here for general use. I put together a simple system for dumping data out of Oracle databases and loading onto Hadoop DFS. The slightly interesting part is the parallelism – Python’s Processing library is used to dump partitions in parallel and copy and load them onto DFS in parallel. This helps when dumping large amounts of data from partitioned Oracle tables.

The database interaction is handled by db.py . There are a couple of helper functions for finding table partitions, etc. DBDumper dumps the requested fields from the requested table:


dumper = db.DBDumper('username/password@yourhost:9999/DB', 'table_name',
      ('field1', 'field2', 'field3'), 'owner', 'partition', 'output_dir', 10)
dumper.dump(cp)

Where 10 is the level of concurrency, owner is the owner of the table, and partition is the name of the partitions you’re interested in (can be None).

dfs.py copies the dumped files over in parallel, again using PyProcessing. It’s simply a wrapper around “cat | ssh | hadoop dfs -put”.

DBDumper and dfs are tied together via a callback – when each partition is dumped, the callback is invoked, triggering the dfs copy.

Here’s a complete example of using these to dump and copy data:


import db
import dfs

fs = dfs.RemoteDFS('address.of.remote.machine')

def cp(arg):
    print "CALLBACK:", arg
    fs.cp(arg[0], '/some/directory/' + arg[1] + '/' + arg[2])

dumper = db.DBDumper('username/password@yourhost:9999/DB', 'table_name',
     ('field1', 'field2', 'field3'), 'owner', 'partition', 'output_dir', 10)
dumper.dump(cp)
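
The callback wiring between the dump and the copy can be sketched with nothing but the standard library. This is not db.py or dfs.py: dump_partition and dump_all are made-up stand-ins, the "dump" just returns a tuple shaped like the callback argument above, and a thread pool is used in place of the Processing library so the example stays self-contained.

```python
from multiprocessing.pool import ThreadPool


def dump_partition(partition):
    """Stand-in for the real dump: pretend the partition was written to a file."""
    return ("output_dir/%s.dat" % partition, "table_name", partition)


def dump_all(partitions, callback, concurrency=10):
    """Dump partitions in parallel, calling callback with each result as it finishes."""
    pool = ThreadPool(concurrency)
    for partition in partitions:
        pool.apply_async(dump_partition, (partition,), callback=callback)
    pool.close()
    pool.join()  # returns once every dump and its callback has run


copied = []

def cp(arg):
    # the real callback would kick off the DFS copy; here we only record it
    copied.append((arg[0], '/some/directory/' + arg[1] + '/' + arg[2]))


dump_all(["p1", "p2", "p3"], cp, concurrency=2)
print(sorted(copied))
```

The useful property is that copying a finished partition can start while other partitions are still being dumped.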

Python Script For Finding And Removing Duplicate Files

My image, mp3, and ebook collections were a mess after years of copying to various servers, consolidating, and re-copying. I had lots of duplicates.

I looked for an app to find and remove duplicates but surprisingly didn’t find anything very good. So I had to write my own.

This is a very simple script – it scans the directory tree you specify, looks for exact duplicates, and removes the duplicates.

It’s not very smart about which copy it removes. It’s not smart about finding files that are “similar” – it only finds exact matches. It ignores small files (intentionally – it’s easy to make it deal with small files).

It uses /temp for its output and cache files, so it’s targeting Windows. Change that to /tmp if you’re running Unix.

I built in a caching mechanism to save the results of scanning the disk, but it turned out not to be too useful and the script ran faster than I expected, so the caching is commented out.

Here it is: FileInfo.py .
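
The general approach described above can be sketched in a few lines. This is not FileInfo.py: the function names, the SHA-1 choice, and the 1 KB small-file cutoff are assumptions for illustration, and the sketch only reports groups of identical files rather than deleting anything.

```python
import hashlib
import os
from collections import defaultdict


def file_digest(path, chunk_size=1024 * 1024):
    """Hash a file's contents in chunks so large files don't fill memory."""
    h = hashlib.sha1()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            h.update(chunk)
    return h.hexdigest()


def find_duplicates(root, min_size=1024):
    """Return lists of paths under root whose contents are byte-for-byte identical."""
    by_size = defaultdict(list)
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            path = os.path.join(dirpath, name)
            if os.path.isfile(path) and not os.path.islink(path):
                size = os.path.getsize(path)
                if size >= min_size:  # small files are ignored on purpose
                    by_size[size].append(path)

    groups = []
    for paths in by_size.values():
        if len(paths) < 2:
            continue  # a unique size cannot have an exact duplicate
        by_hash = defaultdict(list)
        for path in paths:
            by_hash[file_digest(path)].append(path)
        groups.extend(sorted(g) for g in by_hash.values() if len(g) > 1)
    return sorted(groups)
```

Calling find_duplicates('/path/to/collection') returns a list of groups; deciding which copy in each group to keep is left to the caller.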

Access Python Dictionary Keys As Properties

Say you want to access the values of your dictionary via the dot notation instead of the dictionary syntax. That is, you have:


d = {'name':'Joe', 'mood':'grumpy'}

And you want to get at “name” and “mood” via


d.name
d.mood

instead of the usual


d['name']
d['mood']

Why would you want to do this? Maybe you’re fond of the JavaScript Way. Or you find it more aesthetic. In my case I need to have the same piece of code deal with items that are either instances of Django models or plain dictionaries, so I need to provide a uniform way of getting at the attributes.

Turns out it’s pretty simple:



class DictObj(object):
    def __init__(self, d):
        self.d = d

    def __getattr__(self, m):
        return self.d.get(m, None)

d = DictObj(d)
d.name
# prints Joe
d.mood
# prints grumpy
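
Here’s a small sketch of the mixed use case mentioned above, where one function handles both kinds of item. The Person class is a made-up stand-in for a Django model instance, and describe is a hypothetical helper; DictObj is repeated so the example runs on its own.

```python
class DictObj(object):
    def __init__(self, d):
        self.d = d

    def __getattr__(self, m):
        # only called when normal attribute lookup fails
        return self.d.get(m, None)


class Person(object):
    """Stand-in for a model instance with real attributes."""

    def __init__(self, name, mood):
        self.name = name
        self.mood = mood


def describe(item):
    # works for anything that exposes .name and .mood
    return "%s is %s" % (item.name, item.mood)


items = [Person("Ann", "cheerful"), DictObj({'name': 'Joe', 'mood': 'grumpy'})]
descriptions = [describe(item) for item in items]
print(descriptions)
```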

Beanstalkd / Python Basic Tutorial

(First install beanstalkd and pybeanstalk)

Beanstalkd is an in-memory queuing system. It supports named queues (called ‘tubes’), priorities, and delayed delivery of messages.

Terminology: a message is called a job, and queues are called tubes.

Let’s look at an example scenario. Say you want to create 2 tubes, one called “orders” and another called “emails”, place orders into the first tube (or queue) and emails into the second, and have different processes handle orders and emails.

You can create queues by simply naming and putting messages into them. On the producer side:


from beanstalk import serverconn
# 11300 is beanstalkd's default port (TCP ports must be below 65536);
# use whichever port your daemon is listening on
c = serverconn.ServerConn('localhost', 11300)

# put a message (or job) into the default queue:
c.put('first message, into default tube')

# now start using a named tube:
c.use('orders')
c.put('second message, into orders tube')

Now on the consumer:


from beanstalk import serverconn
c = serverconn.ServerConn('localhost', 11300)

# by default your connection will be listening on the 'default' tube.
# switch it to use the 'orders' tube.
# This should return the 'orders' message and ignore the 'default' message:
c.watchlist = ['orders']
j = c.reserve()
print j
# {'data': 'second message, into orders tube', 'jid': 39, 'bytes': 32, 'state': 'ok'}

You can similarly set up another consumer to listen on only the ‘emails’ tube, or both, or any other scenario you want.

Beanstalkd also supports priorities, with 0 being highest priority and higher numbers meaning lower priority. You define message priority with:


c.put('low priority message', pri=999 )
c.put('high priority message', pri=0 )

j = c.reserve()
print j
# {'data': 'high priority message', 'jid': 41, 'bytes': 21, 'state': 'ok'}
# the high priority message was delivered before the low priority message, even
# though the low priority message was first into the queue

The beanstalkd consumption model is to “reserve” a message (or job), process it, and then tell beanstalkd you’ve successfully dealt with the message so it can be thrown away. When you first get the job via c.reserve() you haven’t actually fully consumed it; you’ve just reserved it for processing.

What does this mean? Imagine a scenario where you reserve a message but your process dies before you have a chance to fully process it. Beanstalkd holds your message in reserve for a period of time, but since it hasn’t heard from you confirming you’ve successfully dealt with the message, it eventually removes the reservation and makes the message available once again for the next consumer to grab. This is a basic handshake between the consumer and the server to allow for some level of resiliency.

So once you’re finished dealing with the message you’ve reserved, be sure to “delete” it, letting the beanstalkd server know it can throw that message away:


j = c.reserve()
# do some processing with j
c.delete(j['jid'])
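
To make the reserve / delete handshake concrete, here’s a toy in-memory model of a single tube. It is purely illustrative and is not how beanstalkd or pybeanstalk is implemented: ToyTube is a made-up class, time is passed in explicitly as now so the behaviour is easy to follow, the default pri and ttr values are arbitrary, and reserve returns None where the real server would wait for a job. The time a job may stay reserved is modelled as a per-job ttr (time to run).

```python
import heapq
import itertools


class ToyTube(object):
    """Toy model of priorities plus the reserve/delete handshake."""

    def __init__(self):
        self._ready = []       # heap of (pri, jid, data, ttr); lowest pri first
        self._reserved = {}    # jid -> (pri, data, ttr, deadline)
        self._ids = itertools.count(1)

    def put(self, data, pri=1024, ttr=60):
        jid = next(self._ids)
        heapq.heappush(self._ready, (pri, jid, data, ttr))
        return jid

    def _release_expired(self, now):
        # reservations that ran out of time go back on the ready queue
        for jid, (pri, data, ttr, deadline) in list(self._reserved.items()):
            if deadline <= now:
                del self._reserved[jid]
                heapq.heappush(self._ready, (pri, jid, data, ttr))

    def reserve(self, now):
        self._release_expired(now)
        if not self._ready:
            return None
        pri, jid, data, ttr = heapq.heappop(self._ready)
        self._reserved[jid] = (pri, data, ttr, now + ttr)
        return {'jid': jid, 'data': data}

    def delete(self, jid):
        # the consumer confirms it is done; the job is gone for good
        del self._reserved[jid]


tube = ToyTube()
tube.put('low priority message', pri=999)
tube.put('high priority message', pri=0)

first = tube.reserve(now=0)       # the high priority job comes out first
tube.delete(first['jid'])         # processed successfully

second = tube.reserve(now=1)      # reserved, but this consumer "dies" without deleting
nothing = tube.reserve(now=2)     # nothing is ready while the job is still reserved
again = tube.reserve(now=61)      # reservation timed out, so the job is handed out again
```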

So there you have the basics. Let me know if you’re interested and I can cover a few more topics.

Setting Up Beanstalkd on Ubuntu for Python

beanstalkd is a promising in-memory queuing system in the mold of memcached (minimal configuration, just works) with client libraries in a variety of languages. The following worked for me for installing it on Ubuntu 8.04:


mkdir ~/packages

# pre-requisite: libevent.
cd ~/packages
wget http://monkey.org/~provos/libevent-1.4.8-stable.tar.gz
tar zxvf libevent-1.4.8-stable.tar.gz
cd libevent-1.4.8-stable
./configure
make
sudo make install

# add /usr/local/lib to your load library path so beanstalkd can find libevent
vi ~/.bashrc   (add the following somewhere near the end):
export LD_LIBRARY_PATH=$LD_LIBRARY_PATH:/usr/local/lib

(exit vi)
source ~/.bashrc

# need git in order to get latest code for beanstalkd
cd ~/packages
sudo apt-get install git-core

# grab beanstalkd
git clone http://xph.us/src/beanstalkd.git
cd beanstalkd
make

# now you should be able to start the beanstalkd daemon
# (TCP ports must be below 65536; 11300 is beanstalkd's default)
./beanstalkd -d -p 11300

# get the python beanstalkd client
cd ~/packages
svn checkout http://pybeanstalk.googlecode.com/svn/trunk/ pybeanstalk-read-only

cd pybeanstalk-read-only
sudo python setup.py install

# get pyyaml, a pre-requisite for the python beanstalkd client
cd ~/packages
wget http://pyyaml.org/download/pyyaml/PyYAML-3.06.tar.gz
tar zxvf PyYAML-3.06.tar.gz
cd PyYAML-3.06
sudo python setup.py install

# open two different shells (or use screen), cd to the examples directory in each,
# then run the producer in one shell and the consumer in the other:
cd ~/packages/pybeanstalk-read-only/examples
python simple_clients.py producer localhost 11300
python simple_clients.py consumer localhost 11300
