Archive for the 'programming' Category
Functional for the win
As part of my new job as a research student, I have to process a lot of data: on the order of several hundred thousand records. So I turned to my favourite language, Python. It’s what caused my earlier issues with memory management.
A few more lessons I’ve learned:
- sometimes there’s a faster way to do stuff.
- make sure you aren’t putting repetitive data into a database
- verify the data format before you download 2-3 GB of it
Okay, so: the faster way to do stuff. Python has both a performance advantage and a disadvantage. The advantage is that it only takes an hour or two to code up something to process data. The disadvantage is that, in an effort to be clear and simple, one can end up coding a task that takes 20 to 100 times longer than it should. Like I did.
I had a lot of nested loops running at plain Python speed, doing lots of duplicate handling, database inserts, and so on. It took five hours to clear out only 14 sets of queries. I had a lot of duplicate rows that needed to be erased: for a query looking for rows with an id of 15, there are 25,607 records. Removing the duplicates yields… 807. Brilliant.
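One way to do this kind of clean-up inside the database, instead of in nested Python loops, is a single SQL `DELETE` that keeps the first copy of each group of identical rows and drops the rest. This is a sketch assuming SQLite through Python’s `sqlite3` module (the `?` placeholders in this post’s insert statements suggest it, but that is a guess); `remove_duplicate_rows` and the `records` table are hypothetical.

```python
import sqlite3

def remove_duplicate_rows(conn, table, columns):
    """Delete rows whose values in `columns` repeat an earlier row, keeping the lowest rowid."""
    cols = ", ".join(columns)
    # Table and column names can't be bound as ? parameters, so they are
    # interpolated into the SQL; only pass trusted names here.
    conn.execute(
        f"DELETE FROM {table} WHERE rowid NOT IN "
        f"(SELECT MIN(rowid) FROM {table} GROUP BY {cols})"
    )
    conn.commit()

if __name__ == "__main__":
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE records (id INTEGER, name TEXT, value TEXT)")
    rows = [(15, "a", "x")] * 3 + [(15, "b", "y")] * 2  # hypothetical duplicated data
    conn.executemany("INSERT INTO records VALUES (?, ?, ?)", rows)
    remove_duplicate_rows(conn, "records", ["id", "name", "value"])
    print(conn.execute("SELECT COUNT(*) FROM records").fetchone()[0])  # prints 2
```

The database does the grouping in one pass, so no Python code touches the individual rows at all.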
Today, I replaced most of the loops with their functional counterparts: map, lambda, and goopy, Google’s functional tools for Python. In only two hours, less than half the time the old code spent on 14 ids, it has gone through 160 ids. Pretty amazing, eh?
Here’s a sample:
for row in all:
    curs = curs.execute('insert into elements2 values(?, ?, ?, ?, ?)',
                        (row[0], row[1], str(row[2]), str(row[3]), ''))
replaced with:
# map() is lazy in Python 3, so list() forces every insert to actually run
list(map(lambda row: conn.execute('insert into elements2 values(?, ?, ?, ?, ?)',
                                  (row[0], row[1], str(row[2]), str(row[3]), '')),
         all))
A little more difficult to read, yes, but it runs well over 50 times faster…
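For bulk inserts specifically, Python’s `sqlite3` module (assuming that is the database in use here) also offers `executemany`, which takes an iterable of parameter tuples and runs the statement once per tuple, with no Python-level loop around `execute`. This is an alternative to the `map` approach, not what the post used; `insert_elements` is a hypothetical helper.

```python
import sqlite3

def insert_elements(conn, rows):
    """Bulk-insert rows into elements2, converting columns 2 and 3 to strings like this post's insert."""
    conn.executemany(
        "insert into elements2 values(?, ?, ?, ?, ?)",
        # A generator: each row is converted as it is consumed, not copied into a new list first.
        ((row[0], row[1], str(row[2]), str(row[3]), "") for row in rows),
    )
    conn.commit()
```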
All this work to clear out duplicate entries, and how intensive the fix turns out to be, is a perfect example of why the “Look Before You Leap” programming paradigm works better here than “Ask for Forgiveness, Not Permission”.
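Applied to this problem, “look before you leap” means checking whether a row already exists before inserting it, so duplicates never land in the table. A minimal sketch, assuming SQLite via `sqlite3`; the `readings(id, name, value)` table and `insert_if_new` are hypothetical.

```python
import sqlite3

def insert_if_new(conn, row):
    """Insert `row` into readings only if an identical row isn't already there.

    Returns True if the row was inserted, False if it was a duplicate.
    """
    # SQLite's IS works like = but treats two NULLs as equal, so rows
    # containing NULLs are still recognised as duplicates.
    exists = conn.execute(
        "select 1 from readings where id is ? and name is ? and value is ? limit 1",
        row,
    ).fetchone()
    if exists:
        return False
    conn.execute("insert into readings values(?, ?, ?)", row)
    return True
```

Each insert costs an extra query, so an index on the checked columns helps on large tables. Alternatively, a `UNIQUE` constraint with `INSERT OR IGNORE` lets SQLite do the check itself.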
Now, once the processing is done, I’ll be able to go through all the data… and add a piece of information I forgot to process in the first place to each entry. Yay me!
Memory management in Python
That’s right, the dreaded two m’s. If you use C or C++, they’re quite dreaded. However, for about the first time ever, I ran into memory management issues in Python! Partially due to my own mistakes, and partially due to the way Python handles memory.
Memory handling schemes go awry
Well, not quite. More like a leaky abstraction, compounded by my own idiotic mistakes in the code. In Python, a loop doesn’t get its own scope: a variable assigned inside a loop is an ordinary variable of the enclosing function (or module), so it is still there after the loop ends, holding the last value assigned to it. That lets you do some cool things, like refer to a loop variable once the loop is done. It also means the object that variable refers to stays in memory until the name is reassigned or deleted.
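A small illustration of that scoping behaviour: names assigned inside a `for` loop are plain local variables of the enclosing function, so they outlive the loop. The function name is just for illustration.

```python
def last_square(n):
    """Return the loop variable and the last value computed inside the loop."""
    for i in range(n):
        square = i * i
    # The loop created no new scope: `i` and `square` are still bound here,
    # holding their values from the final iteration. (n must be at least 1,
    # otherwise neither name is ever assigned and this raises UnboundLocalError.)
    return i, square

print(last_square(4))  # prints (3, 9)
```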
However, in my case it backfired. The memory was held onto… and never reused. Essentially, I had a memory leak.
My own stupid mistakes
What happened is that I was loading a variable from a pickled file, and what I had done was:
elements = pickle.load(open('file', 'rb'))
Normally this would be okay. However, because the files were large and I was doing this in a loop over different files, I was left with an open multi-megabyte file floating around that had not been closed or freed. Actually, several. I came home after two hours to find that almost all of my RAM and 70% of my swap space had been used. That’s 70% of 3.8 GB. Yeah, great job, Zeroth.
The lesson here is that what works well for a small loop or small files does not necessarily work well for large files or large loops. Always make sure you close every file you open, and if worst comes to worst, release memory by deleting references with Python’s del keyword. I’m doing this now, and memory growth is very moderate compared to the almost 6 GB the code was taking up before.
So the proper way to do this is:
import pickle

f = open('file', 'rb')
elements = pickle.load(f)
f.close()
del f
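The same idea applied to a loop over many files might look like the sketch below. It swaps in a `with` block, which closes each file even if `pickle.load` raises, and uses `del` so the previous file’s data can be freed before the next file is loaded. `process_pickles` and the `handle` callback are hypothetical stand-ins for whatever processing the real code did.

```python
import pickle

def process_pickles(paths, handle):
    """Load each pickled file in turn and pass its contents to handle()."""
    for path in paths:
        # 'with' closes the file as soon as the block ends, even on an exception.
        with open(path, 'rb') as f:
            elements = pickle.load(f)
        handle(elements)
        # Drop this reference so the data can be freed before the next load
        # (assuming handle() didn't keep a reference of its own); otherwise
        # two files' worth of data would be alive at once during the next load.
        del elements
```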
I do need to make this clear: this fault was not due to Python or any bugs in it, but to my own sloppy code. Even the best of us make mistakes, and I’m definitely not the best of us.
Cool Python Tricks part II
A new installment in my ever-popular series.
Today’s installment concerns one of the best performance-enhancing tricks in Python: list comprehensions.
Most beginning pythonistas will produce code like this:
lst = []
for i in range(0, 10):
    lst.append(i**2)
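For comparison, a list comprehension builds the same list in a single expression. A sketch with hypothetical function names, plus a `timeit` harness you can run yourself; timings depend on the machine and Python version, so none are quoted here.

```python
import timeit

def squares_loop(n):
    """The append-in-a-loop version."""
    lst = []
    for i in range(0, n):
        lst.append(i**2)
    return lst

def squares_comprehension(n):
    """The same list built with a list comprehension."""
    return [i**2 for i in range(n)]

if __name__ == "__main__":
    assert squares_loop(10) == squares_comprehension(10)
    for fn in (squares_loop, squares_comprehension):
        # Total seconds for 1000 calls of fn(1000).
        print(fn.__name__, timeit.timeit(lambda: fn(1000), number=1000))
```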
Cool Python Tricks part I
I’ve spent almost five years programming as a hobby, and my first language was Python. It’s still one of my favourites for many reasons: simplicity, power, rapid development, and it’s fun.
Python is very flexible and versatile, and over the years I’ve learned some very cool and useful Python idioms and tricks.
What I will show today is a design pattern that fits Python’s philosophy well and delivers a great deal of flexibility and usefulness. It’s called the Borg design pattern.
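The Borg pattern, as it is commonly written in Python, makes every instance share one attribute dictionary: the objects are distinct, but their state is shared. A minimal sketch, which may differ from the version in the full post:

```python
class Borg:
    """Every instance shares one attribute dictionary, and therefore one state."""
    _shared_state = {}

    def __init__(self):
        # Point this instance's attribute dict at the class-level shared dict.
        self.__dict__ = self._shared_state

a = Borg()
b = Borg()
a.mode = "debug"
print(b.mode)  # prints debug
print(a is b)  # prints False: distinct objects, shared state
```

Unlike a singleton, you can create as many instances as you like. Subclasses share the same dictionary unless they define their own `_shared_state`.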
Mission Statement
I’ve made enough posts here without making a mission statement of any sort, so here goes.
I want to help others like myself, curious and wanting to learn. I wish I’d had someone to share things with me several years ago, things I learned through my own trial and error, through blood and sweat dearly earned.
I chose to become a programmer simply because there is nothing quite like it. All you need is intellect, logic, and creativity. And with those base ingredients, you can construct anything. Anything at all. It appealed to me, the power a programmer has over this inscrutable machine called a computer. I found it easy, and interesting, and thus my life goal was set.
However, what kind of programmer am I to be? Will I be an egghead academic, working on complex and unknowable topics? Will I become a Unix hacker, complete with scruffy beard and free software on the tongue? Will I become one of those web 2.0 darlings, perfectly awkward in the limelight, and caring about things like ‘spiders’ and ‘eyeballs’ like they were some witch’s brew? I don’t know yet, but I do know I have more fun making and designing games than anything else, and that is where I will head.
But no, I don’t want to join Blizzard, or Bioware. No, I’m starting my own company! One where code monkeys can swing in the trees and go home at the end of the day. One where Artists of every kind aspire to be hired. A place where brilliance seeps out of the walls, and is breathed in like so many spores. I am making OddCo, with my best friend, Ruby. Join me for the ride.