Tag: "python"

Here are all posts that have been tagged with "python".

There is also an Atom feed for this tag.

How to parse the output of git log

Here is how to get the output of "git log" in an easy-to-parse format and build a Python dict from the result. You could then convert the dict to JSON, XML, HTML, etc.

First, look at the git-log man page and find the section on "Pretty Formats." It lists printf-style placeholder codes for the commit metadata (e.g. %an for author name).

Store these codes, along with the corresponding field names, in two lists:

GIT_COMMIT_FIELDS = ['id', 'author_name', 'author_email', 'date', 'message']
GIT_LOG_FORMAT = ['%H', '%an', '%ae', '%ad', '%s']

Then, join the format fields together with "\x1f" (the ASCII unit separator) and delimit the records with "\x1e" (the ASCII record separator). These characters are unlikely to appear in your commit data, so they are pretty safe to use for parsing.

GIT_LOG_FORMAT = '%x1f'.join(GIT_LOG_FORMAT) + '%x1e'

Then run git log --format="..." with your format string, split the fields, and make a dict from them:

from subprocess import Popen, PIPE

p = Popen('git log --format="%s"' % GIT_LOG_FORMAT, shell=True, stdout=PIPE)
(log, _) = p.communicate()
log = log.strip('\n\x1e').split("\x1e")
log = [row.strip().split("\x1f") for row in log]
log = [dict(zip(GIT_COMMIT_FIELDS, row)) for row in log]

Output:

$ python commits.py
[{'author_email': 'skryskalla@gmail.com',
  'author_name': 'stevek',
  'date': 'Sat Feb 18 12:58:00 2012 -0800',
  'id': 'f1dc488e092e5e725c2ec3b7afc3962f0ba707d3',
  'message': 'third commit'},
 {'author_email': 'skryskalla@gmail.com',
  'author_name': 'stevek',
  'date': 'Sat Feb 18 12:57:54 2012 -0800',
  'id': '1bf26e9aa0cb8c9b95b579695c6af349319a88ab',
  'message': 'second commit'},
 {'author_email': 'skryskalla@gmail.com',
  'author_name': 'stevek',
  'date': 'Sat Feb 18 12:57:47 2012 -0800',
  'id': '9c2db5dffa7c70358ab78b6092539ce26006775b',
  'message': 'this is the first commit'}]
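The parsing step can also be pulled out into a small function, which makes it easy to try without a repository. This is only a sketch for Python 3 (the snippets above are Python 2); the names parse_git_log, read_git_log and commits_to_json are made up for illustration. One thing to watch for: in Python 3, a bare str.strip() treats "\x1e" and "\x1f" as whitespace, so the sketch strips newlines explicitly.

```python
import json
import subprocess

GIT_COMMIT_FIELDS = ['id', 'author_name', 'author_email', 'date', 'message']
GIT_LOG_FORMAT = '%x1f'.join(['%H', '%an', '%ae', '%ad', '%s']) + '%x1e'

FIELD_SEP = '\x1f'   # separates fields inside one commit
RECORD_SEP = '\x1e'  # terminates each commit record


def parse_git_log(raw):
    """Turn raw `git log` text (already decoded to str) into a list of dicts."""
    records = raw.strip('\n' + RECORD_SEP).split(RECORD_SEP)
    # Strip only newlines: a bare strip() would also eat a trailing "\x1f"
    # and silently drop an empty last field.
    rows = [r.strip('\n').split(FIELD_SEP) for r in records if r.strip('\n')]
    return [dict(zip(GIT_COMMIT_FIELDS, row)) for row in rows]


def read_git_log():
    """Run git in the current directory and parse its output (needs git on PATH)."""
    raw = subprocess.check_output(
        ['git', 'log', '--format=' + GIT_LOG_FORMAT], text=True)
    return parse_git_log(raw)


def commits_to_json(commits):
    """Serialize the parsed commits, e.g. for a web page or an API."""
    return json.dumps(commits, indent=2)
```

Passing the arguments as a list avoids shell=True and the quoting around the format string.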

Full working example.

Spec runner using withhacks

Here is an interesting video (and blog post) on Ruby vs. Python by Gary Bernhardt. One of the advantages Gary gives to Ruby is the ability to develop and use tools like rspec or cucumber, which use Ruby's block syntax for really nice looking unit tests, runnable specs, etc.

In the talk Gary shows that he was able to create a spec runner with syntax similar to rspec by using "really nasty" and "ugly" hacks (sys.settrace, I believe). But even if these techniques are ugly, they are becoming more accessible and easier to develop with thanks to tools like withhacks, which abstracts the ugliness away.

I'm sure this does much less than what Gary's mote does, but here is a simple spec runner using withhacks:

from __future__ import with_statement

from withhacks import CaptureOrderedLocals, CaptureBytecode

class specs(CaptureOrderedLocals,CaptureBytecode):
    def __init__(self,what, *args, **kwargs):
        self.__what = what
        self.__args = args
        self.__kwargs = kwargs
        self.results = []
        super(specs,self).__init__()

    def __exit__(self,*args):
        retcode = super(specs,self).__exit__(*args)
        results = self.run_specs(self.locals)
        self.results = results

    def run_specs(self, cases):
        results = []
        num_pass, num_fail = 0,0
        print "Testing %s specs for %s:" % (len(cases), repr(self.__what))
        for (name, func) in cases:
            if not callable(func): continue
            name = name.replace('_', ' ')
            name = name.capitalize() + '.'
            print "->",
            try:
                func()
                error = None
                num_pass += 1
            except BaseException, e:
                error = repr(e)
                num_fail += 1
            if error:
                print "[FAIL]", name
                print "--->", error
                results.append((name, False, error))
            else:
                print "[pass]", name
                results.append((name, True, None))
        print "Result: %s/%s passed, %s/%s failed" % (num_pass, len(cases), num_fail, len(cases))
        print "-"*20
        return results

And here is an example spec:

class MyClass(object):
    def add(self, a, b):
        return a+b

with specs(MyClass):
    def it_adds_two_and_two():
        c = MyClass()
        assert c.add(2,2) == 4

    def it_adds_negatives():
        c = MyClass()
        assert c.add(10,-10) == 0

    def it_fails_adding_int_and_string():
        c = MyClass()
        try:
            c.add(10, 'foo')
        except TypeError:
            pass #correct!

    def testing_what_a_spec_failure_looks_like():
        c = MyClass()
        c.thisdoesntexist()

And here is the output:

Testing 4 specs for <class '__main__.MyClass'>:
-> [pass] It adds two and two.
-> [pass] It adds negatives.
-> [pass] It fails adding int and string.
-> [FAIL] Testing what a spec failure looks like.
---> AttributeError("'MyClass' object has no attribute 'thisdoesntexist'",)
Result: 3/4 passed, 1/4 failed
--------------------
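The reporting half of the runner does not depend on withhacks at all: it only needs (name, function) pairs. Here is a rough Python 3 sketch of that part on its own, with hypothetical helper names (spec_title, run_cases). It collects results instead of printing, and it does not try to reproduce the with-block capture.

```python
def spec_title(name):
    """Turn a function name like 'it_adds_negatives' into 'It adds negatives.'"""
    return name.replace('_', ' ').capitalize() + '.'


def run_cases(cases):
    """Run each callable in (name, func) pairs; return (title, passed, error) tuples."""
    results = []
    for name, func in cases:
        if not callable(func):
            continue  # skip plain values captured alongside the specs
        try:
            func()
        except Exception as exc:
            results.append((spec_title(name), False, repr(exc)))
        else:
            results.append((spec_title(name), True, None))
    return results
```

Catching Exception rather than BaseException lets Ctrl-C still interrupt a long run, which is a small departure from the runner above.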

I put the code on bitbucket here.

Decorator for preventing recursion

Here's a decorator that will prevent a recursive function from calling itself:

def norecursion(default=None):
    '''Prevents recursion into the wrapped function.'''
    def entangle(f):
        def inner(*args, **kwds):
            if not hasattr(f, 'callcount'):
                f.callcount = 0
            if f.callcount >= 1:
                print "recursion detected %s calls deep. exiting." % f.callcount
                return default
            f.callcount += 1
            try:
                return f(*args, **kwds)
            finally:
                # always reset the counter, even if f raises
                f.callcount -= 1
        return inner
    return entangle

It's based on this recipe. The function in that recipe relies on keeping track of which arguments were passed into the function, which means that it could not work on a function without any arguments. The decorator above works by attaching an attribute to the wrapped function that counts how many calls are currently in progress, and returning the default value as soon as a nested call is detected.

Here's how you use it:

@norecursion(default=1)
def fact(x):
  if x <= 1:
    return 1
  else:
    return x*fact(x-1)

Now when you call fact it won't make the recursive call; instead it will return the default value of 1:

>>> fact(0)
1
>>> fact(1)
1
>>> fact(2)
recursion detected 1 calls deep. exiting.
2
>>> fact(3)
recursion detected 1 calls deep. exiting.
3

Why I needed this: I have a function on a Jinja2 template which builds a list of all pages and their metadata (a bunch of variables defined at the top of the template). Let's say I use the function on index.html. When it iterates over all the pages, it comes to index.html and then tries to get the list of all pages again. This causes the infinite recursion. On the second call deep, I don't need the whole page list, I only need the template metadata, so I can safely wrap the function in @norecursion(default=[]) to prevent it from running subsequent times.
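To make that scenario concrete, here is a small self-contained Python 3 sketch. The page names and the list_pages / page_metadata functions are invented stand-ins for the template function, not the real Jinja2 code, and the decorator keeps its counter in a closure instead of on the function.

```python
import functools


def norecursion(default=None):
    """Return `default` instead of re-entering the wrapped function."""
    def entangle(f):
        state = {'depth': 0}  # number of calls currently in progress

        @functools.wraps(f)
        def inner(*args, **kwds):
            if state['depth'] >= 1:
                return default
            state['depth'] += 1
            try:
                return f(*args, **kwds)
            finally:
                state['depth'] -= 1  # reset even if f raises
        return inner
    return entangle


PAGES = ['index.html', 'about.html']  # hypothetical site


@norecursion(default=[])
def list_pages():
    """Build (name, metadata) pairs for every page."""
    return [(name, page_metadata(name)) for name in PAGES]


def page_metadata(name):
    """Pretend to render a page; index.html asks for the page list again."""
    nested = list_pages() if name == 'index.html' else None
    return {'title': name.split('.')[0], 'nested_pages': nested}
```

Without the decorator, list_pages() would recurse until Python gave up; with it, the nested call just gets the empty list.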

Update: Reading this post again I think I could have just used a memoization decorator instead. At the time preventing recursion with a decorator seemed like an okay solution, but memoization would have been a little less weird and probably worked fine.

Writing mercurial plugins

Getting my feet wet with writing some Mercurial plugins... First impression is that the API is very low-level, but I guess that makes sense since HG (and its plugins) have to be low-level to perform well.

#!/usr/bin/env python

from mercurial import hg
from binascii import hexlify
from mercurial import util

def interact(ui, repo, **opts):
    """poke around the mercurial API for this repo in a python interpreter"""
    print "Locals are:", dir()
    import code; code.interact(local=locals())

def short_incoming(ui, repo, **opts):
    """Shows a shortened form of 'hg incoming'"""
    default = hg.repository(ui, ui.expandpath('default'))
    inc = repo.findincoming(default)
    nodes = default.changelog.nodesbetween(inc, None)[0]
    for node in nodes:
        cs = default.changelog.read(node)
        print hexlify(cs[0])[:6], '|', cs[1], '|', util.datestr(cs[2]), \
              '|', len(cs[3]), 'files', '|', cs[5], '|', cs[4]

cmdtable = {
    "interact": (
        interact,
        [],
        interact.__doc__
    ),
    "short": (
        short_incoming,
        [],
        short_incoming.__doc__
    ),
}

Part of me just wants to scrape the text of the different subcommands.

WSGI talk code now online

At the September 2009 Detroit Perlmongers / dynamic language meetup I gave a talk on Python and WSGI.

I walked through six different examples showing what WSGI is and some parts of the WSGI web development ecosystem. The six examples were:

  1. the simplest WSGI app, and serving that same app under many different WSGI servers
def application(environ, start_response):
    '''The simplest WSGI app.'''
    start_response('200 OK', [('content-type', 'text/html')])
    yield '<h1>Hello world</h1>'
  2. request and response wrappers, introduction to middleware
  3. a slightly more full example of middleware
  4. a fleshed out app with templates, URL routing, and some more middleware
http://lost-theory.org/images/wsgi-middleware.png

Great diagram explaining middleware from the Pylons documentation.

  5. form generation and validation
  6. using an existing app with authn / authz middleware
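For readers who have not seen middleware before, here is a tiny Python 3 sketch of the idea. It is not taken from the talk code: UppercaseMiddleware and call_app are illustrative names, and only the standard library's wsgiref is used.

```python
from wsgiref.util import setup_testing_defaults


def application(environ, start_response):
    """The simplest WSGI app (Python 3 wants bytes in the body)."""
    start_response('200 OK', [('Content-Type', 'text/html')])
    return [b'<h1>Hello world</h1>']


class UppercaseMiddleware:
    """Wraps any WSGI app and upper-cases whatever body it produces."""

    def __init__(self, app):
        self.app = app

    def __call__(self, environ, start_response):
        for chunk in self.app(environ, start_response):
            yield chunk.upper()


def call_app(app):
    """Call a WSGI app directly, without a server; return (status, headers, body)."""
    environ = {}
    setup_testing_defaults(environ)  # fills in a minimal fake request
    captured = {}

    def start_response(status, headers, exc_info=None):
        captured['status'] = status
        captured['headers'] = headers

    body = b''.join(app(environ, start_response))
    return captured['status'], captured['headers'], body
```

Because the middleware is itself a WSGI callable, it can be handed to any WSGI server in place of the app, for example wsgiref.simple_server.make_server('', 8000, UppercaseMiddleware(application)).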

You can find the code and instructions for running the examples on Bitbucket.

Ltchinese 0.1 release

I've given one of my old projects, ltchinese, an official release on PyPI, the Python Package Index.

I followed this excellent guide to make the package and publish it to PyPI. This is my first real open source Python package.

ltchinese is a small library of tools I built up when creating some of my Chinese language learning pages (Ocrat mirror site, Mandarin phonetics table, etc.). It would be useful for developers who are building tools or web apps that deal with the Chinese language. It also includes a programmatic interface to some of the data on my site.

There is also documentation available (which is hosted by PyPI, cool).

I also got my first bit of feedback that someone was able to use the library for something useful. Thank you Vathanan!

Running zine on Dreamhost PS

I started using Dreamhost PS in July 2009 and after a few hours had a Zine blog running. That was one of the quickest turnarounds I've ever experienced for getting a Python web app up and running and exposed to the web.

It was better than starting from scratch on a new VPS, but there was some weirdness. Dreamhost PS is like a VPS where you can run what you want, but without root. It's like using your friend's server and he doesn't trust you very much. :)

Here are my steps for getting Zine running on Dreamhost PS.

Administrivia

  • First, I cranked the resources down to the minimum in the Dreamhost PS admin panel. They start you off at the highest setting so you can monitor how many resources your server actually needs, but if you're going to have little traffic to start I don't see the point.
  • Next, I created a new shell user on my Dreamhost PS instance.
  • Finally, I made a new subdomain (blog.lost-theory.org) belonging to that user.

virtualenv & pip

Dreamhost PS has a good version of Python (2.4.4) (update for 2011: this qualified as good in 2009) and easy_install, so you can dive right in. I started by setting up my virtualenv:

$ cd ~
$ mkdir -p zine/lib
$ easy_install --install-dir=~/zine/lib virtualenv
$ easy_install --install-dir=~/zine/lib pip
$ cd zine
$ virtualenv .

Install Zine and its packages

You can then start installing Zine and the packages it depends on into that virtualenv.

$ cd ~/zine
$ wget http://zine.pocoo.org/releases/Zine-0.1.2.zip
$ unzip Zine-0.1.2.zip
$ pip -E . install Werkzeug
$ pip -E . install Jinja2
$ pip -E . install MySQL-python
$ pip -E . install SQLAlchemy
$ pip -E . install simplejson
$ pip -E . install pytz
$ pip -E . install Babel
$ pip -E . install html5lib
$ (try to install lxml...)
(over 9000 compilation errors)

Here you will run into a problem, since lxml requires the libxml2 and libxslt libraries. On Dreamhost PS we don't have root, so we can't install them with apt-get install. I took a peek at how Zine uses lxml and it seemed like I might be able to get away without installing it:

./importers/wordpress.py:15:    from lxml import etree
./importers/feed.py:13:    from lxml import etree
./zxa.py:21:    from lxml import etree

I tried wrapping those lines with try/except like so:

try:
    from lxml import etree
except ImportError:
    print "Skipping lxml import... will die later"

That will let you start serving up your Zine instance. I haven't had a problem so far with skipping the lxml import (because I haven't yet used the features that require it). It might be possible to use elementtree instead, but it's working fine for now.
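The elementtree idea could look something like the sketch below, written for Python 3. Whether Zine's importers actually work with the standard library module is untested here: lxml offers more (full XPath, XSLT) than xml.etree.ElementTree, so treat this as a fallback for simple parsing only.

```python
try:
    from lxml import etree  # preferred when the C libraries are available
except ImportError:
    # Standard library fallback; covers basic parsing and find(), not XSLT.
    import xml.etree.ElementTree as etree

# Both modules accept this call, so simple code does not need to care
# which one was imported.
root = etree.fromstring('<feed><entry>hi</entry></feed>')
entry_text = root.find('entry').text
```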

Install and quickstart

Install the Zine package:

$ cd ~/zine/Zine-0.1.2/
$ ./configure --prefix=~/zine
$ make install

After that you can create and start an instance:

$ cd ~/zine
$ mkdir instance
$ ./Zine-0.1.2/scripts/server -I instance

This will start the install wizard on port 4000. Go check it out!

Database

You won't get very far without a database though. One important thing to remember is that you do not need to use the Dreamhost PS MySQL service. You can use your existing DH shared hosting MySQL. All I had to do was set up a new user and database on the existing MySQL from my regular shared hosting service.

After that's set up, put the DB URI in the install wizard and you're pretty much done.

At this point you'll have Zine fully functional on port 4000. You can start writing entries and checking out the themes and all that. But we can do better than running on port 4000 with a development server, can't we?

Serve Zine using cherrypy and lighttpd

I want to run Zine on port 80 and serve it with something a little more powerful, so I checked out what the Dreamhost admin panel offered. There are settings for proxying and Mongrel and FCGI, but those don't really apply.

DH gives you two choices for serving on port 80: Apache (the default) or lighttpd. You can run your own long-running processes, but they have to serve through Apache/lighty (using CGI, FCGI, and I think Phusion Passenger and maybe a few other options). You can't run your own server on 80 since you don't have root.

I chose lighty for the smaller footprint and because I find its configuration a bit easier. To proxy all requests at the root of the domain from port 80 to port 4000 you can use the following:

$HTTP["host"] == "blog.lost-theory.org"
{
  proxy.server = (
    "" => (
      "blog" => (
        "host" => "127.0.0.1",
        "port" => 4000
      )
    )
  )
}

The configs are stored per-host here: /usr/local/dh/lighttpd/servers/your-user/

Once that was set up I decided to swap out the Werkzeug development server for CherryPy's WSGI server. You can keep most of the Zine-0.1.2/scripts/server script the same, just pip install cherrypy and switch Werkzeug's server to CherryPyWSGIServer (yay WSGI).
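Roughly, the swap looks like the sketch below. It assumes CherryPy 3.x, where the server lives in cherrypy.wsgiserver (later releases moved it into the separate cheroot package). placeholder_app is a made-up stand-in for the WSGI application that the Zine server script builds, and serve_with_cherrypy is a name invented for this example.

```python
def placeholder_app(environ, start_response):
    """Hypothetical stand-in for the WSGI app created by Zine's server script."""
    start_response('200 OK', [('Content-Type', 'text/plain')])
    return [b'Zine would answer here']


def serve_with_cherrypy(app, host='127.0.0.1', port=4000):
    """Serve a WSGI app with CherryPy's server instead of Werkzeug's dev server."""
    # Imported lazily so this file still loads where CherryPy is not installed.
    # Assumption: CherryPy 3.x module layout.
    from cherrypy.wsgiserver import CherryPyWSGIServer

    server = CherryPyWSGIServer((host, port), app)
    try:
        server.start()  # blocks and serves requests until interrupted
    except KeyboardInterrupt:
        server.stop()
```

Binding to 127.0.0.1 on port 4000 matches the lighttpd proxy config above, so nothing else has to change.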

One more important change: Zine will probably still think that the address of the blog is http://example.com:4000/. This will make all the links point to that site, which is ugly. To fix this just drop the port number off of the blog_url setting in zine.ini.

Conclusion

That's all it took! I am pretty happy with how easy it was.

Python web apps are not as easy as something like PHP to get up-and-running, but this process was pretty ideal in my opinion. I hated using a bare bones VPS because you end up spending more time on sysadmin and thinking about security holes than on building and deploying cool apps. Dreamhost PS seems like a good middle ground between bare bones VPS / dedicated servers and shared hosting. It has a lot of good defaults, polished administration (via the panel), decent price, and the ability to scale upwards.

Google App Engine is a similar service, but I haven't tried it yet. I am a little worried that existing Python apps / code aren't portable to GAE.

As far as resources go, the 150MB memory has been smooth so far. I'll monitor if I need to up my resources. If I do need to increase my resources I'll probably do it since they make it so easy and the price is reasonable. :)

http://lost-theory.org/images/dreamhostps-usage.png

One interesting thing is that after switching to lighttpd+cherrypy (after the spike in the graph) my memory usage went down. I've heard that Dreamhost PS has about a 100MB footprint when idle. After switching to lighty+cherrypy my memory usage is ~50MB when idle.

Hope this was helpful if you're interested in running Python apps on Dreamhost PS. Happy serving to you!

Update: Dreamhost PS now gives you root access by setting up a new account under "Manage Admin Users" in your control panel. This makes things a lot easier and gets rid of all my whinging about not having root.

Update 2012: I moved away from Dreamhost PS and Zine for a static blog (using Blohg). It is funny re-reading this now and seeing that I thought this was "easy". Now I am spoiled by ep.io and dotcloud.
