Archive for the 'Python' Category


How To Dynamically Import Python Modules

Here’s how to import a Python module dynamically. Say your module is called “mysettings” and lives in the directory “config”. Make sure there is an __init__.py in the “config” directory, then use:


mysettings = __import__("config.mysettings", fromlist=["config"])

This is equivalent to


from config import mysettings
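
On Python 2.7 and later, importlib.import_module does the same job without the fromlist trick. A minimal sketch, written for Python 3 and using the standard library's json.decoder as a stand-in for config.mysettings so that it runs anywhere:

```python
import importlib

# "json.decoder" stands in for "config.mysettings" so the example is runnable.
module_name = "json.decoder"

# importlib.import_module returns the leaf module (json.decoder) directly.
via_importlib = importlib.import_module(module_name)

# __import__ with a non-empty fromlist also returns the leaf module...
via_dunder = __import__(module_name, fromlist=["decoder"])

# ...but without fromlist it returns the top-level package (json).
top_level = __import__(module_name)

print(via_importlib.__name__)
print(via_dunder is via_importlib)
print(top_level.__name__)
```

The last call shows why the fromlist argument matters: without it you get the package, not the module you asked for.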

Django: Checking If A User Is In A Group

Adding to my earlier post on the django permission system:

You can check if a user is part of a group using this simple snippet. It will allow you to check if a user is in a group with the following template syntax:


{% if user|in_group:"Friends,Enemies" %}
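
The linked snippet is not reproduced here. As a rough sketch of what such a filter might look like (the function body is an assumption, not the original snippet), the check boils down to comparing the user's group names against a comma-separated list. The Django registration lines are shown as comments so the logic can be read and run on its own:

```python
def in_group(user, group_names):
    """Return True if the user belongs to any of the comma-separated groups."""
    wanted = set(name.strip() for name in group_names.split(",") if name.strip())
    # user.groups is Django's many-to-many manager; values_list gives the names.
    user_groups = set(user.groups.values_list("name", flat=True))
    return bool(wanted & user_groups)

# In a templatetags module you would register it along these lines:
#   from django import template
#   register = template.Library()
#   register.filter("in_group", in_group)
```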

Generating static html from markdown in python

Note to self, for future reference.

Generating static html:


import markdown2
html = markdown2.markdown_path('classification.md', extras=["code-friendly"])
with open('classification.html', 'w') as out:  # closes the file when done
    out.write(html)

Using Python-Oauth2 To Access OAuth Protected Resources

Most of the examples I found for python-oauth2 show how to use the library to request and approve tokens, but not how to use the access token to access a protected resource (i.e. how to actually make a call to the service you’re trying to access). Here’s an example:


import oauth2 as oauth
consumer = oauth.Consumer('consumer-key-here','consumer-secret-here')
token = oauth.Token('access-key-here','access-key-secret-here')
client = oauth.Client(consumer, token)
# request() returns a (response headers, body) pair, as in httplib2
response, content = client.request('http://someservice.com/api/something/')

And here’s how you make a POST call:


import urllib
# "client" is the oauth.Client created above
response, content = client.request("http://someservice.com/api/something/",
    method="POST", body=urllib.urlencode({'name': 'value', 'another_name': 'another value'}))

Django: Using The Permission System

I was surprised at how little information I found on making use of Django’s permission system. Here are some quick notes on one way to use it:

Groups are groups of users. For example, you could define a group of users who have premium accounts, or have been verified in some way, or are somehow special:


from django.contrib.auth.models import Group, Permission
special_users = Group(name='Special Users')
special_users.save()
really_special_users = Group(name='Super Special Users')
really_special_users.save()

Now you have two groups defined and can define permissions for them. Django associates permissions with models (note: not model instances, but models). You’ll need to select a model to apply the permissions to, and do a small dance with “ContentType” to find that model’s content type:


from django.contrib.contenttypes.models import ContentType
somemodel_ct = ContentType.objects.get(app_label='myapp', model='somemodel')

can_view = Permission(name='Can View', codename='can_view_something',
                       content_type=somemodel_ct)
can_view.save()

can_modify = Permission(name='Can Modify', codename='can_modify_something',
                       content_type=somemodel_ct)
can_modify.save()

You’ve now defined two permissions and can associate them with your Groups:


special_users.permissions.add(can_view)
really_special_users.permissions = [can_view, can_modify]

Our groups and their associated permissions are ready to go. Now we just have to associate these permissions with users:


from django.contrib.auth.models import User

jack = User.objects.get(email='[email protected]')
jack.groups.add(special_users)

jill = User.objects.get(email='[email protected]')
jill.groups.add(really_special_users)

We’re all done. Now we can check the users’ permissions:


>>> jack.has_perm('myapp.can_view_something')
True
>>> jack.has_perm('myapp.can_modify_something')
False

>>> jill.has_perm('myapp.can_view_something')
True
>>> jill.has_perm('myapp.can_modify_something')
True

And to use it in your templates:


{% if perms.myapp.can_view_something %}
Here is something for you to see.
{% else %}
Can't show you!
{% endif %}

Django-mptt: Tree Storage in Django: A Brief Overview

django-mptt is a library for storing tree-oriented data using the Django ORM. It allows you to place your model instances into a tree structure and efficiently query for ancestors and children.

Here’s a brief tutorial on how to use it:

After installing, you’ll need to modify your model to include a “parent” field, and register it with mptt:

class Person(models.Model):
    contact   = models.ForeignKey( Contact, db_index=True )
    role      = models.CharField(max_length=20, blank=True)
    parent    = models.ForeignKey('self', null=True, blank=True, related_name='children')

    def __unicode__(self):
        return "Person: <%s>" % (self.contact.email, )

mptt.register(Person)

mptt dynamically adds fields to your model, so you’ll need to syncdb after you’ve added the parent attribute and the mptt.register call to your model.

The basics are fairly easy to use:

To move a node to the root of the tree, use move_to with a target of None:

person1.move_to(None)
person1.save()

To make a node the child of another, set its parent:

person2.parent = person1
person2.save()

To find the children of a node, use the children field:

>>> person1.children.all()
[<Person: Person: <[email protected]>>, <Person: Person: <[email protected]>>]

Here’s a little snippet of code to set up a 15-node tree where each node has two child nodes:

[UPDATE] The code in this snippet is not correct – you have to save each node as you update it, then look it up again. You can’t modify a node, save it, then use the reference you already have for it. I’ll update the code when I get a chance.

contacts = []
people = []
for n in range(15):
    c = mod.Contact(email="test" + str(n) + "@testing.com")
    c.save()
    contacts.append(c)
    p = mod.Person(contact=c)
    p.save()
    people.append(p)

people[0].move_to(None)  # Root
people[0].save()
for n in range(1,15):
    people[n].parent = people[(n-1)//2]  # integer division picks the parent's index
    people[n].save()
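
The index arithmetic in that loop can be checked without a database. A small sketch in plain Python (the helper names are made up for illustration) showing which parent, children, and ancestors each position gets in a 15-node tree:

```python
NODE_COUNT = 15

def parent_index(n):
    """Index of the parent of node n, or None for the root (node 0)."""
    return None if n == 0 else (n - 1) // 2

def child_indexes(n, count=NODE_COUNT):
    """Indexes of the (up to two) children of node n."""
    return [c for c in (2 * n + 1, 2 * n + 2) if c < count]

def ancestor_indexes(n):
    """Indexes of the ancestors of node n, from the root down to its parent."""
    chain = []
    p = parent_index(n)
    while p is not None:
        chain.append(p)
        p = parent_index(p)
    return chain[::-1]

print(parent_index(7))       # 3
print(child_indexes(3))      # [7, 8]
print(ancestor_indexes(14))  # [0, 2, 6]
```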

Now let’s take a look around:

>>> people[7].parent
<Person: Person: <[email protected]>>

>>> people[3].children.all()
[<Person: Person: <[email protected]>>, <Person: Person: <[email protected]>>]

Now let’s move things around a bit; we’ll take person3, which is 2 levels down from the root, and make it a direct child of the root:

>>> people[3].parent = people[0]
>>> people[3].save()

>>> people[0].children.all()
[<Person: Person: <[email protected]>>, <Person: Person: <[email protected]>>, <Person: Person: <[email protected]>>]

And we can look at the ancestors of a given node:

people[14].get_ancestors()

Eclipse + PyDev : I Recommend It

I used to be a vi guy who finally made the move to graphical editors. I looked for the simplest, lightest possible solutions, using ConTEXT for quite a while.

Some years ago I was forced into using Eclipse for reasons I can’t quite recall; probably Java development. I didn’t like it – the forced Project concept, the bloat, the general slowness.

Eventually I got comfortable with it, got PyDev installed, and made it my primary development environment. These days most of my development lives in Eclipse.

With the 1.5 release PyDev included quite a few previously pay-only features in the free / open source version. Since then I’ve found I’ve become even more productive in the environment, and now actually enjoy it.

In particular, the code analysis is very useful. I love the fact that it points out unused imports and variables as well as syntax errors. Going through old code I was surprised at how many spurious imports I had, as well as a few actual errors in code that had been in production for several years in rarely exercised branches.

If you’re doing Python development I recommend you take a look at Eclipse+PyDev. I was surprised at the level of increased productivity it brought me.

Python Simple Mock Object

Another one of those that’s mainly for my own notes. I needed to create a mock object that always returns an empty list no matter what method is called on it. That is, you can instantiate one of these, call any made-up method on it, and get an empty list back. Here it is:


class Dummy:
	def __getattr__(self, name): return lambda *args: []
d = Dummy()
d.something()   # returns []
d.something_else("a",1,"b")   # returns []
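
The lambda above only accepts positional arguments. A slightly more forgiving sketch (a variation for illustration, not something from a library) also swallows keyword arguments and hands back a fresh list on every call:

```python
class Dummy(object):
    """Stand-in object: any method called on it returns an empty list."""

    def __getattr__(self, name):
        # Only invoked for attributes that are not found the normal way.
        def method(*args, **kwargs):
            return []
        return method

d = Dummy()
print(d.something())
print(d.something_else("a", 1, key="b"))
```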

Python SMTP Debugging Server

Note to self: this is how you start a Python SMTP debugging server:

python -m smtpd -n -c DebuggingServer localhost:25

Django-Piston: REST Framework for Django

Django-Piston is a promising-looking REST framework for Django. On first inspection it seems to have all the right attributes and setup. I hope to give it a try soon.

[Update] By the way, I’ve since been using django-piston in a real project and like it quite a bit. I recommend it.

One question I have – while I agree HTTP PUT and DELETE are the right verbs to use for Update and Delete, in practice they’re not well supported and can cause confusion. I’m wondering if there’s a way to change the mapping to the following:

POST /resource  -- Create
POST /resource/id -- Update
POST /resource/id?action=delete -- Delete
GET /resource/id -- Read
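
To make the proposed mapping concrete, here is a small sketch of the dispatch rule in plain Python. It is not django-piston code; resolve_operation and its arguments are hypothetical names used only for illustration:

```python
def resolve_operation(method, resource_id=None, action=None):
    """Map an HTTP method plus URL parts to a CRUD operation name."""
    method = method.upper()
    if method == "GET" and resource_id is not None:
        return "read"
    if method == "POST":
        if resource_id is None:
            return "create"
        if action == "delete":
            return "delete"
        return "update"
    raise ValueError("unsupported request: %s %r" % (method, resource_id))

print(resolve_operation("POST"))
print(resolve_operation("POST", 42))
print(resolve_operation("POST", 42, action="delete"))
print(resolve_operation("GET", 42))
```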

Python Simple Inheritance Example

Another one of those things I tend to forget the syntax of, noting for easy future lookup:

[code updated based on comments]


class Base(object):
    def __init__(self, param):
        print "Base:", param
    def method(self, param):
        print "Base.method:", param

class Derived(Base):
    def __init__(self, param):
        super(Derived, self).__init__(param)
        print "Derived:", param
    def method(self, param):
        Base.method(self, param)
        print "Derived.method:", param

>>> d = Derived("me")
Base: me
Derived: me
>>> d.method("you")
Base.method: you
Derived.method: you
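
The listing above is Python 2. For comparison, a sketch of the same pattern in Python 3, where super() needs no arguments and can also be used for the overridden method. The methods collect strings instead of printing so the result is easy to inspect:

```python
class Base:
    def __init__(self, param):
        self.log = ["Base: %s" % param]

    def method(self, param):
        return ["Base.method: %s" % param]


class Derived(Base):
    def __init__(self, param):
        super().__init__(param)          # run Base.__init__ first
        self.log.append("Derived: %s" % param)

    def method(self, param):
        lines = super().method(param)    # call the overridden Base.method
        lines.append("Derived.method: %s" % param)
        return lines


d = Derived("me")
print(d.log)
print(d.method("you"))
```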

Python S3 Library For Chunked / Streaming Download

Born of a need to deal with multi-gig files on S3, I’ve modified the Python Amazon S3 library to allow you to read data in chunks, and added a simple file-like object that lets you read the file one line at a time (à la for line in f ).

The plan was to create this as a patch against the official Python S3 library from Amazon (it’s only a small change), and possibly even do a GitHub thing, but it’s evident I’ll never get around to it, so I’m simply uploading it here.

The small change is the addition of an optional readbody argument to AWSAuthConnection.get that tells the library not to read the body of the message, and an S3File class that provides the line interface. Here’s an example of using the S3File class:


if use_S3:
    f = S3.S3File(AWS_ACCESS_KEY_ID, AWS_SECRET_ACCESS_KEY, BUCKET, FILE)
else:
    f = file(conf.local_location + '/' + FILE)

for line in f:
    pass  # do something with each line
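
The modified library itself is not shown here. As an illustration of the general idea only (this is not the S3File implementation), reading line by line from a chunked source comes down to buffering chunks and yielding whenever a newline turns up. iter_lines and the sample chunks are made up for the example:

```python
def iter_lines(chunks):
    """Yield complete lines (newline included) from an iterable of byte chunks."""
    buffer = b""
    for chunk in chunks:
        buffer += chunk
        # Emit every complete line currently in the buffer.
        while True:
            newline_at = buffer.find(b"\n")
            if newline_at == -1:
                break
            yield buffer[:newline_at + 1]
            buffer = buffer[newline_at + 1:]
    if buffer:
        # Trailing data without a final newline.
        yield buffer

# Chunk boundaries deliberately fall in the middle of lines.
sample_chunks = [b"first li", b"ne\nsecond line\nthi", b"rd line"]
lines = list(iter_lines(sample_chunks))
print(lines)
```

Only one chunk plus any partial line is held in memory at a time, which is what makes this workable for multi-gig files.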

Use Python’s Marshal For Faster Serialization

I’d been using Python’s cPickle module for serializing data structures to disk. As my data size got larger I started to care about performance, and after a few searches ended up at the marshal module. It comes with a few caveats, but it is working great and is much faster for me than cPickle. So if you’re looking for performance and can live with the caveats, give marshal a try.
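
A quick round-trip sketch for comparison. The caveats worth knowing: marshal handles only core built-in types (no class instances), and its format is not guaranteed to be stable between Python versions. The sample data is made up:

```python
import marshal
import pickle

data = {"ids": list(range(5)), "name": "example", "ratio": 0.5}

# marshal: built-in types only, format tied to the Python version.
marshalled = marshal.dumps(data)
restored = marshal.loads(marshalled)

# pickle: more general, handles far more kinds of objects.
pickled = pickle.dumps(data, protocol=pickle.HIGHEST_PROTOCOL)

print(restored == data)
print(pickle.loads(pickled) == data)
```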

JSON faster than Thrift and Protocol Buffers?

This is a bit surprising. According to Justin compressed JSON is faster than both Thrift and Protocol Buffers when used with Python (via Dion). I had previously asked about performance comparison of Thrift and Protocol Buffers on StackOverflow, but I had assumed JSON would be significantly slower. Maybe not. I’d love to see more on this.
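
For reference, “compressed JSON” in practice usually means serializing to JSON and then running the bytes through a general-purpose compressor. A minimal sketch using zlib from the standard library (the record is made up, the choice of zlib is an assumption, and no timing claims are implied):

```python
import json
import zlib

record = {"user": "example", "scores": [1, 2, 3], "active": True}

# Serialize to JSON text, encode to bytes, then compress.
packed = zlib.compress(json.dumps(record).encode("utf-8"))

# Reverse the steps to get the original structure back.
unpacked = json.loads(zlib.decompress(packed).decode("utf-8"))

print(unpacked == record)
```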

Another Python Convert

Looks like I’ve finally converted Alex to Python as well. “This is actually shorter than perl and still makes sense”. Be forewarned – if you code with me, you’ll be doing Python in no time.

On Concurrency, Threads, and Python

Nice and detailed article from Jesse Noller on concurrency, threads, the GIL, and Python. Worth a read; I learned a lot from it.

For the record I’m one of the people who doesn’t think threads are the true solution to the concurrency problem.

Python Based Key Sniffer In 10 Lines

I love Autohotkey but I’m not crazy about its programming language, so I decided to investigate building an alternative with a simpler language, namely Python.

Turns out the key sniffing portion is actually quite easy to do. Here’s a simple script from the Keylogger in Python thread that does it in 10 lines using pyHook and Mark Hammond’s Win32 Extensions:


import pyHook
import pythoncom

def OnKeyboardEvent(event):
	print event.Ascii

hm = pyHook.HookManager()
hm.KeyDown = OnKeyboardEvent
hm.HookKeyboard()

while True:
	pythoncom.PumpMessages()

Mindtrove’s post has further details including code and examples for event filtering.

Python Dot Notation Dictionary Access

In most cases I prefer dot notation over bracket notation for dictionary access. That is, I prefer mydictionary.myfield over mydictionary['myfield']. I also prefer attempted access to undefined keys to return None instead of raising an exception.

With the help of this thread, this is what I’ve been using:



class dotdict(dict):
    def __getattr__(self, attr):
        return self.get(attr, None)
    __setattr__= dict.__setitem__
    __delattr__= dict.__delitem__

>>> dd = dotdict()
>>> dd.a
>>> dd.a = 'one'
>>> dd.a
'one'
>>> dd.keys()
['a']

>>> existing = {'a':'A', 'b':'B'}
>>> dot_existing = dotdict(existing)
>>> dot_existing.a
'A'
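
The class works unchanged on Python 3, with one visible difference: keys() returns a view rather than a list. A short sketch exercising the behaviors described above, including the None for missing keys:

```python
class dotdict(dict):
    """Dictionary whose keys can also be read and written as attributes."""

    def __getattr__(self, attr):
        # Missing keys give None instead of raising AttributeError.
        return self.get(attr, None)

    __setattr__ = dict.__setitem__
    __delattr__ = dict.__delitem__

dd = dotdict({"a": "A"})
dd.b = "B"          # same as dd["b"] = "B"
del dd.a            # same as del dd["a"]

print(dd.missing)
print(list(dd.keys()))
```

One thing to keep in mind: because missing attributes quietly come back as None, a typo in a key name will not raise an error.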

Python Scripts For Dumping Oracle Data And Loading Onto Hadoop DFS

There have been several requests for this, so I might as well post it here for general use. I put together a simple system for dumping data out of Oracle databases and loading onto Hadoop DFS. The slightly interesting part is the parallelism – Python’s Processing library is used to dump partitions in parallel and copy and load them onto DFS in parallel. This helps when dumping large amounts of data from partitioned Oracle tables.

The database interaction is handled by db.py . There are a couple of helper functions for finding table partitions, etc. DBDumper dumps the requested fields from the requested table:


dumper = db.DBDumper('username/password@yourhost:9999/DB', 'table_name',
      ('field1', 'field2', 'field3'), 'owner', 'partition', 'output_dir', 10)
dumper.dump(cp)

Where 10 is the level of concurrency, owner is the owner of the table, and partition is the name of the partitions you’re interested in (can be None).

dfs.py copies the dumped files over in parallel, again using PyProcessing. It’s simply a wrapper around “cat | ssh | hadoop dfs -put”.

DBDumper and dfs are tied together via a callback – when each partition is dumped, the callback is invoked, triggering the dfs copy.
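
db.py and dfs.py are not listed here, but the callback pattern itself is easy to sketch with the standard library's multiprocessing package, the successor to PyProcessing. This is an illustration of the pattern, not the actual DBDumper code: dump_partition and on_dumped are hypothetical stand-ins, and a thread-backed pool is used so the sketch runs anywhere (swap in multiprocessing.Pool for real processes):

```python
from multiprocessing.pool import ThreadPool

def dump_partition(table, partition):
    """Pretend to dump one partition; return (local_path, table, partition)."""
    local_path = "output_dir/%s.%s.dat" % (table, partition)
    return (local_path, table, partition)

copied = []

def on_dumped(result):
    """Callback fired as each dump finishes; the real one would start the copy."""
    local_path, table, partition = result
    copied.append("/some/directory/%s/%s" % (table, partition))

pool = ThreadPool(4)  # 4 is the level of concurrency
for partition in ("p1", "p2", "p3"):
    pool.apply_async(dump_partition, ("table_name", partition), callback=on_dumped)
pool.close()
pool.join()  # wait for all dumps and their callbacks

print(sorted(copied))
```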

Here’s a complete example of using these to dump and copy data:


import db
import dfs

fs = dfs.RemoteDFS('address.of.remote.machine')

def cp(arg):
    print "CALLBACK:", arg
    fs.cp(arg[0], '/some/directory/' + arg[1] + '/' + arg[2])

dumper = db.DBDumper('username/password@yourhost:9999/DB', 'table_name',
     ('field1', 'field2', 'field3'), 'owner', 'partition', 'output_dir', 10)
dumper.dump(cp)

Python Script For Finding And Removing Duplicate Files

My image, mp3, and ebook collections were a mess after years of copying to various servers, consolidating, and re-copying. I had lots of duplicates.

I looked for an app to find and remove duplicates but surprisingly didn’t find anything very good. So I had to write my own.

This is a very simple script – it scans the directory tree you specify, looks for exact duplicates, and removes the duplicates.

It’s not very smart about which copy it removes. It’s not smart about finding files that are “similar” – it only finds exact matches. It ignores small files (intentionally – it’s easy to make it deal with small files).

It uses /temp for its output and cache files, so it’s targeting Windows. Change that to /tmp if you’re running Unix.

I built in a caching mechanism to save the results of scanning the disk, but it turned out not to be too useful and the script ran faster than I expected, so the caching is commented out.

Here it is: FileInfo.py .
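
FileInfo.py is not reproduced here. The approach described above can be sketched roughly as follows: group files by size, hash only those that share a size, and report groups with identical hashes. This is an independent sketch, not the original script; it only reports duplicates rather than deleting them, and find_duplicates and min_size are names chosen for the example:

```python
import hashlib
import os
from collections import defaultdict

def file_digest(path, chunk_size=1 << 20):
    """SHA-256 of a file, read in chunks so large files don't fill memory."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()

def find_duplicates(root, min_size=1024):
    """Return lists of paths under root whose contents are identical."""
    by_size = defaultdict(list)
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            path = os.path.join(dirpath, name)
            size = os.path.getsize(path)
            if size >= min_size:  # small files are skipped on purpose
                by_size[size].append(path)

    by_hash = defaultdict(list)
    for paths in by_size.values():
        if len(paths) > 1:  # a unique size cannot have a duplicate
            for path in paths:
                by_hash[file_digest(path)].append(path)
    return [sorted(paths) for paths in by_hash.values() if len(paths) > 1]
```

Grouping by size first means most files never get hashed at all, which is likely part of why a scan like this runs faster than you might expect.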
