Archive for the 'python tricks' Category

Python Tricks: Advanced String Formatting

April 22nd, 2009 | Category: python,python tricks

I ran across this a long time ago but never really used it until recently. It really shows the power and capability of Python.

So, say you have a big set of local variables that need to go into a string you are building, say for a YAPWF (Yet Another Python Web Framework).

That means you have this messy-looking code:

s="%d%f" % (car, calc)

It’s messy because you’ve separated the contextual information from the place where it actually matters. If the string is very long and confusing, that opens you up to errors down the road. In addition, you are tied to having those variables exist locally.

However, there is a much, much nicer way to do this.

d={"number":100, "float":4.2, "status":"okay", "ph":8665782}
s="%(number)d%(float)f%(status)s%(ph)d" % d
print(s)
1004.200000okay8665782

What this means is that you can now put each relevant variable under a meaningful key in a single dictionary object, and pass that around easily. The only thing you’re tied to now is the set of dictionary keys, giving you a bit of Adapter action.

Just one note, the format goes like this:

"%({key name}){format modifier}" % dictionary

You can also give the keys meaningful names, like status, phone, etc., and your string formatting becomes a lot easier to read.
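Here is a small sketch of the same idea with readable key names and the usual % modifiers for precision. The record dictionary and its values are made up for illustration.

```python
# Hypothetical record; the keys and values are made up for illustration.
record = {"status": "okay", "phone": 8665782, "price": 4.2}

# Named keys plus the usual % modifiers (here, precision) in one template.
template = "status=%(status)s phone=%(phone)d price=%(price).2f"
line = template % record
print(line)  # status=okay phone=8665782 price=4.20

# Extra keys in the dictionary are simply ignored...
record["unused"] = "ignored"
assert template % record == line

# ...but a key the template needs and the dictionary lacks raises KeyError.
try:
    "%(missing)s" % record
except KeyError as exc:
    missing_key = exc.args[0]
```

The KeyError case is worth keeping in mind: the template and the dictionary keys have to stay in sync, which is the one coupling this approach leaves you with.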


Changing types in Python

December 10th, 2008 | Category: programming,python,python tricks

Okay, so it’s another entry in my long-abandoned python tricks series.

First, let’s take a look at a couple of classes:

class B(object):
    def foo(self):
        return '10'
 
class A(object):
    def __init__(self):
        self.a='a'
    def foo(self):
        return '1'

The important part here is the (object) subclassing. This trick only works if you do that. Let’s take a look at it in operation:

>>> a=A()
>>> a
<changetype.A object at 0x7f0f18dbf9d0>
>>> a.__class__
<class 'changetype.A'>
>>> a.a
'a'
>>> a.foo
<bound method A.foo of <changetype.A object at 0x7f0f18dbf9d0>>
>>> a.foo()
'1'
>>> a.__class__=B
>>> a.a
'a'
>>> a.foo
<bound method B.foo of <changetype.B object at 0x7f0f18dbf9d0>>
>>> a.foo()
'10'
>>> isinstance(a, B)
True

As you can see, I was able to override the foo() method with the one in class B, while the instance attribute a.a was left untouched. Note that this only works on instances. Now, what use is this, you may ask? It is basically meta-programming. For 90% of programming tasks, meta-programming is not needed at all. But for the remaining 10%, you can do some seriously cool stuff, like changing the behaviour of an object at runtime without modifying the class itself.

As a real-life example, I just finished writing a simplistic query parser. I realized I had forgotten to take price queries into account, so I created a PriceToken class that, on instantiation, changes its own class to that of a TermToken. Otherwise, I would have had to rewrite about 100 lines of code to allow the use of a PriceToken. Mostly useful because I’m lazy. But, yes, in Python, you can change the type of an object on the fly.
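The parser code itself is not shown in this post, so the following is only a rough sketch of what a class that swaps its own type on instantiation might look like. The attribute text and the method describe() are hypothetical; only the class names PriceToken and TermToken come from the description above.

```python
class TermToken(object):
    def __init__(self, text):
        self.text = text

    def describe(self):
        # Hypothetical method standing in for whatever the parser calls.
        return "term:" + self.text


class PriceToken(object):
    def __init__(self, amount):
        # Store the data in the attribute TermToken's methods expect...
        self.text = "price=%s" % amount
        # ...then swap this instance's class, so the rest of the parser
        # sees an ordinary TermToken.
        self.__class__ = TermToken


token = PriceToken(20)
print(type(token).__name__)  # TermToken
print(token.describe())      # term:price=20
```

The swap only works cleanly because the instance already carries every attribute that the TermToken methods rely on. TermToken.__init__ is never run on it.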


Functional for the win

June 19th, 2008 | Category: programming,python,python tricks

As part of my new job as a research student, I have to process a lot of data, on the order of several hundred thousand records. So I turned to my favourite language, Python. It’s what caused my earlier issues with memory management.

A few more lessons I’ve learned:

  • sometimes there’s a faster way to do stuff.
  • make sure you aren’t putting repetitive data into a database
  • verify the data format before you download 2-3 GB of it

Okay, so, faster way to do stuff. Python has both a performance advantage and disadvantage. The advantage is that it only takes one or two hours to code up something to process data. The disadvantage is that in an effort to be clear and simple, one can end up coding a task that takes 20-100x longer than it should. Like I did.

I had a lot of nested loops running at Python speed, involving lots of duplicate handling, inserting into a database, etc. It took five hours to clear out only 14 sets of queries. I had a lot of duplicate rows that needed to be erased. For a query looking for rows with an id of 15, there are 25,607 records. Removing the duplicates yields… 807. Brilliant.

Today, I decided to take most of the loops and replace them with their functional counterparts: map, lambda, and goopy, Google’s functional tools for Python. In only two hours, it has now gone through 160 ids. That is more than ten times as many ids in less than half the time. Pretty amazing, eh?

Here’s a sample:

for row in all:
    curs=curs.execute('insert into elements2 values(?, ?, ?, ?, ?)', (row[0], row[1], str(row[2]), str(row[3]), ''))

replaced with:

map(lambda row: conn.execute('insert into elements2 values(?, ?, ?, ?, ?)', (row[0], row[1], str(row[2]), str(row[3]), '')), all)

A little more difficult to read, yes, but the speedup is well over 50 times…
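One caveat if you try this on Python 3: map() returns a lazy iterator there, so the map version above would not run any inserts unless something consumes it. A possible alternative, assuming the database is sqlite3 (the ? placeholders suggest it, but the post does not say), is executemany(). The in-memory table and sample rows below are made up for illustration.

```python
import sqlite3

# In-memory database and made-up rows stand in for the real data set.
conn = sqlite3.connect(":memory:")
conn.execute("create table elements2 (a, b, c, d, e)")
rows = [(1, 2, 3.5, 4.5), (5, 6, 7.5, 8.5)]

# A generator builds the parameter tuples; executemany runs the insert
# once per tuple.
params = ((r[0], r[1], str(r[2]), str(r[3]), "") for r in rows)
conn.executemany("insert into elements2 values (?, ?, ?, ?, ?)", params)
conn.commit()

count = conn.execute("select count(*) from elements2").fetchone()[0]
print(count)  # 2
```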

All this work to clear out duplicate entries, and how intensive it can be to fix, is a perfect example of why the “Look Before You Leap” programming paradigm works better than “Ask for Forgiveness, Not Permission”.
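As a sketch of what looking before leaping could mean here, the helper below drops duplicate rows before they ever reach the database. The function unique_rows and the sample rows are hypothetical, and it assumes each row can be turned into a hashable tuple.

```python
def unique_rows(rows):
    """Yield each row once, keeping first-seen order."""
    seen = set()
    for row in rows:
        key = tuple(row)
        if key not in seen:  # look before you leap
            seen.add(key)
            yield row


rows = [(15, "a"), (15, "a"), (15, "b"), (15, "a")]
deduped = list(unique_rows(rows))
print(deduped)  # [(15, 'a'), (15, 'b')]
```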

Now, once the processing is done, I’ll be able to go through all the data… and add a piece of information I forgot to process in the first place to each entry. Yay me!


Memory management in Python

June 17th, 2008 | Category: design,design patterns,programming,python,python tricks

That’s right, the dreaded two m’s. If you use C or C++, it’s quite dreaded. However, for about the first time ever, I ran into memory management issues in Python! Well, partially due to my own mistakes, and partially due to the way Python handles memory.

Memory handling schemes go awry

Well, not quite. More like a leaky abstraction, compounded by my own idiot mistakes in the code. When memory is allocated in a loop, it is normally a smart optimization to hold onto that memory. What this means is that there are some cool things you can do, like refer to variables that were assigned inside a loop after the loop has finished: they are still in the local scope and simply hold the latest value that was assigned to them.
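A tiny illustration of that scoping behaviour; the function last_square is made up for this example.

```python
def last_square(numbers):
    for n in numbers:
        square = n * n
    # Both names are still bound here, holding their final values.
    return n, square


print(last_square([1, 2, 3]))  # (3, 9)
```

The flip side is that whatever object such a name refers to stays alive until the name is rebound or the function returns, which matters when the object is large.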

However, in my case it backfired. The memory was held onto… and never reused. Essentially, I had a memory leak.

My own stupid mistakes

What happened, is that I was loading in a variable from a pickled file. However, what I had done was:

elements = pickle.load(open('file', 'rb'))

Normally this would be okay. However, because the files were large and I was doing this in a loop over different files, each iteration left an opened multi-megabyte file floating around that had not been closed or released. Actually, several. I came home after two hours to find that almost all of my RAM and 70% of my swap space had been used. That’s 70% of 3.8 GB. Yeah, great job Zeroth.

The lesson here is that what works well for a small loop, or small files, does not necessarily work well for large files or large loops. Always make sure you close every file you open, and if worst comes to worst, release memory explicitly by using the del keyword in Python. I’m using this, and there is only very moderate memory growth, compared to the quite massive, almost 6 GB of memory the code was taking up before.

So the proper way to do this is:

f=open('file', 'rb')
elements = pickle.load(f)
f.close()
del f
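On Python 2.6 and later (or 2.5 with from __future__ import with_statement), the with statement does the closing for you. A minimal sketch, using a throwaway pickle file so that it runs on its own; the file name and contents are made up.

```python
import os
import pickle
import tempfile

# Write a small throwaway pickle so the example is self-contained.
path = os.path.join(tempfile.mkdtemp(), "elements.pkl")
with open(path, "wb") as f:
    pickle.dump([1, 2, 3], f)

# The file is closed as soon as the block exits, even on an exception.
with open(path, "rb") as f:
    elements = pickle.load(f)

print(elements)  # [1, 2, 3]
print(f.closed)  # True
```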

I do need to make this clear. This fault was not due to Python, or any problems or bugs in it, but rather with my own sloppy code. Even the best of us make mistakes, and I’m definitely not the best of us.


Cool Python Tricks part II

December 05th, 2007 | Category: programming,python,python tricks

A new installment in my ever-popular series.

Today’s installment will concern one of the best performance-enhancing tricks in Python: list comprehensions.

Most beginning pythonistas will produce code like this:

lst=[]
for i in range(0, 10):
    lst.append(i**2)
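This excerpt stops at the loop version. For comparison, a list comprehension that builds the same list would presumably look something like this (the name squares is mine):

```python
# Loop version, as above.
lst = []
for i in range(0, 10):
    lst.append(i ** 2)

# List comprehension version: same result in one expression.
squares = [i ** 2 for i in range(10)]

print(squares == lst)  # True
```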
