On a lark, I wrote this haiku generator one afternoon at work, with the Python becoming more and more obfuscated as I went:
from random import choice as CH
from itertools import product as P
from functools import partial as PA

def haiku(syls=dict((i,s.split()) for i,s in enumerate(
        "fair blue green sun earth sky air lake bird fish wheat tree sand life death grim "
        "man boy girl runs plays hunts far old young deep high I my his her sings sad old "
        "red black white night day sleeps broods joy eats drinks swims leaps dreams worm "
        "eye hand arm leg foot mouth blood tears years moon stars days wild dog cat time "
        "friend son takes lives dies loves hates laughs cries shouts flies flames burnt "
        "tall short grass low mourns moans chants wind blows grows rain snow light dark "
        "prays prayer soul tone hue shade fades rich leaf leaves arch oak rose song just "
        "child void|"
        "golden yellow meadow open mountain nature garden ocean tiger vision only "
        "flower zebra lion woman baby wonder joyful dances dancing laughing single "
        "morning evening sleepy drowsy awake asleep away hunger pleasure simple deadly "
        "today tonight progress prefers dances mother father daughter droplet ocean "
        "laughter sorrow running jumping rushing turning whispers cascade arrow simple "
        "longing elder willow honey water heaven above below after before river "
        "quiet embers wishes under over respite relief journey afar glistens abyss "
        "shimmers ripples shudders shivers trembles slumbers hiding gently slowly "
        "slender brooding somber ochre velvet ruby entwined spirit blossom ribbon iris "
        "willow meaning resolve omen|"
        "waterfall aurora indigo butterfly lioness whimsical yesterday enemy "
        "contemplates considers grandfather grandmother family glistening emerges "
        "fortunate absolute horizon precipice oasis shivering shimmering tenderly "
        "sleepily carefully wistfully beautiful entwining sepia unknowing "
        "feverish|"
        "caterpillar captivating fascinating philosophy iridescent "
        "intermingled understanding proximity unknowable unreachable|"
        "inescapable undeniable inevitable unfathomable".split('|'),1)),
        combs=map(PA(filter,None),P(*[range(6)]*7))):
    return '\n'.join(' '.join(CH(syls[x])
                for x in CH(filter(lambda c:sum(c)==n, combs)))
                    for n in (5,7,5))

for i in range(10):
    print (haiku()+'\n')
feverish moans runs
moans unfathomable time
garden baby earth

absolute moon cries
wheat moon evening blossom song
earth takes shudders plays

life grim precipice
blue understanding abyss
fades grim aurora

broods willow grim dies
fair respite iridescent
foot shade I hiding

my stars family
iridescent heaven sand
hunts grass cascade runs

dreams turning red snow
waterfall shade shimmering
captivating stars

sand grim tonight night
moon nature shimmering prays
my flower shade blows

old white glistening
meadow only shade slender
wild his omen cat

willow hue whispers
dies grandfather son spirit
garden dark above

tears garden relief
laughs shimmering tenderly
song time mountain dies
Are these really haiku? Not in the conventional sense, and certainly not ones that would hold up to critical review. But if you select a seed dictionary of words that collectively suggest a particular mood, the generated verse can evoke that same mood.
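For the curious, here is the same basic algorithm with the obfuscation unwound and the word lists abbreviated – a sketch, not a line-for-line translation:

import random
from itertools import product

# words grouped by syllable count (lists abbreviated from the
# full version above)
words = {
    1: "fair blue green sun earth sky grim".split(),
    2: "golden yellow meadow garden willow".split(),
    3: "waterfall aurora shimmering".split(),
    4: "iridescent understanding".split(),
    5: "inescapable unfathomable".split(),
}

def patterns(total):
    # all sequences of 1-5 syllable words (at most 7 of them) whose
    # syllable counts add up to the target line length
    return [[c for c in p if c]
            for p in product(range(6), repeat=7)
            if sum(p) == total]

def line(total):
    counts = random.choice(patterns(total))
    return ' '.join(random.choice(words[n]) for n in counts)

print '\n'.join(line(n) for n in (5, 7, 5))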
Sometimes we’d like to have a class that is tolerant of attribute accesses that weren’t anticipated at class design time, returning a default ‘None’ value for an uninitialized attribute. Using defaultdict to override the `__dict__` attribute makes this very easy:
from collections import defaultdict

class X(object):
    def __init__(self):
        self.__dict__ = defaultdict(lambda : None)

    def __getattr__(self, attr):
        return self.__dict__[attr]
Now we can initialize an object with some attributes, but access others and get back the default None:
x = X()
x.color = "RED"
x.size = 6

print x.color
print x.size
print x.material
RED
6
None
Of course, this defeats one of Python’s built-in sanity checks – a misspelled attribute name no longer raises an AttributeError – so you’ll need to take extra care that you specify attributes correctly.
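For example, with the class X defined above:

x = X()
x.color = "RED"

print x.color     # prints RED
print x.colr      # typo! - quietly prints None instead of raising
                  # AttributeError (and, as a side effect, inserts a
                  # "colr" entry into x.__dict__, since defaultdict
                  # adds missing keys on lookup)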
From time to time, I find that I need a function to return +1, 0, or -1 representing the sign of a value: +1 or -1 for positive or negative values, or 0 for a zero value. I remember learning this as the sgn function in high school. Python’s standard lib leaves this out, but does supply cmp as a built-in, so the standard approach would probably be to define sgn using:
sgn = lambda x : cmp(x,0)
I came across the Bit Twiddling Hacks website the other day, and it had a nice alternative for sgn, which in Python would be:
sgn = lambda x : (x>0) - (x<0)
The elegance of this appeals to me and I did a quick timing pass:
C:\Users\Paul>python -m timeit "[cmp(x,0) for x in (-100,0,14)]"
1000000 loops, best of 3: 0.645 usec per loop
C:\Users\Paul>python -m timeit "[lambda x:x>0 - x<0 for x in (-100,0,14)]"
1000000 loops, best of 3: 0.496 usec per loop
So by cutting out the repeated function calls to cmp, this also has the benefit of being just a tad faster.
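A quick check (Python 2, since it relies on the cmp built-in) that the two forms really do agree across signs:

sgn_cmp = lambda x : cmp(x,0)
sgn_sub = lambda x : (x>0) - (x<0)

# spot-check negatives, zero, positives, ints and floats
for x in (-100, -0.5, 0, 0.25, 14):
    assert sgn_cmp(x) == sgn_sub(x)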
I came across some code today that used a set to keep track of previously seen values while iterating over a sequence, keeping just the not-seen-before items. A brute force kind of thing:
seen = set()
unique = []
for item in sequence:
    if item not in seen:
        unique.append(item)
        seen.add(item)
I remembered that I had come up with a simple form of this a while ago, using a list comprehension to do this in a single expression. I dug up the code, and wrapped it up in a nice little method. This version accepts any sequence or generator, and makes an effort to return a value of the same type as the input sequence:
def unique(seq):
    """Function to keep only the unique values supplied in a given
       sequence, preserving original order."""
    # determine what type of return sequence to construct
    if isinstance(seq, (list,tuple)):
        returnType = type(seq)
    elif isinstance(seq, basestring):
        returnType = type(seq)('').join
    else:
        # - generators and their ilk should just return a list;
        #   capture their contents now, so that seq can be safely
        #   re-iterated by the fallback below if the items turn out
        #   to be unhashable
        seq = list(seq)
        returnType = list

    try:
        seen = set()
        return returnType(item for item in seq
                            if not (item in seen or seen.add(item)))
    except TypeError:
        # sequence items are not of a hashable type, can't use a set
        # for uniqueness - fall back to a (slower) list membership test
        seen = []
        return returnType(item for item in seq
                            if not (item in seen or seen.append(item)))
My first pass at this tried to compare the benefit of using a list vs. a set for seen – it turns out both versions are useful: if the items in the incoming sequence aren’t hashable, the only option for seen is a list.
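In use:

print unique([1, 2, 2, 3, 1, 4])      # [1, 2, 3, 4]
print unique("abracadabra")           # abrcd
print unique(([1,2], [3], [1,2]))     # ([1, 2], [3]) - the unhashable
                                      # list items quietly trigger the
                                      # list-based fallback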
While moving files from my old laptop drive to my new one, I found a nice Runge-Kutta integrator class that I had written ages ago. So long ago, in fact, that I was a little embarrassed at the newbiness of some of my code. So I decided to update my code to get a nice RK class out of it, using list comprehensions instead of “for i in range” loops, and including an integrate method that acts as a generator so that the calling code can cycle through each integration step. As is typical in R-K, the system state is maintained in a vector X, and the calling method must provide a callback function that will return dX/dt.
Here is the class:
class RKIntegrator:
    "Class used to perform Runge-Kutta integration of a set of ODEs"
    def __init__( self, dt, derivFunc, degree=0, initConds=None ):
        self.dt = float(dt)
        self.dt_2 = dt / 2.0
        self.t = 0.0
        if not (degree or initConds):
            raise ValueError("must specify degree or initial conditions")
        if initConds is not None:
            self.x = initConds[:]
        else:
            self.x = [0.0 for i in range(degree)]
        self.derivFunc = derivFunc

    def doIntegrationStep( self ):
        dt = self.dt
        dxFunc = self.derivFunc
        t2 = self.t + self.dt_2

        # classic 4th-order Runge-Kutta: four derivative evaluations
        # per step, at t, t+dt/2 (twice), and t+dt
        dx = dxFunc( self.t, self.x )
        delx0 = [ dx_i*dt for dx_i in dx ]

        xv = [ x_i + delx0_i/2.0 for x_i, delx0_i in zip(self.x, delx0) ]
        dx = dxFunc( t2, xv )
        delx1 = [ dx_i*dt for dx_i in dx ]

        xv = [ x_i + delx1_i/2.0 for x_i, delx1_i in zip(self.x, delx1) ]
        dx = dxFunc( t2, xv )
        delx2 = [ dx_i*dt for dx_i in dx ]

        xv = [ x_i + delx2_i for x_i, delx2_i in zip(self.x, delx2) ]
        self.t += dt
        dx = dxFunc( self.t, xv )

        # combine the four increments with the usual 1-2-2-1 weighting
        self.x = [ x_i + ( delx0_i + dx_i*dt + 2.0*(delx1_i + delx2_i) ) / 6.0
                        for x_i, dx_i, delx0_i, delx1_i, delx2_i
                            in zip(self.x, dx, delx0, delx1, delx2) ]

    def integrate(self):
        while True:
            self.doIntegrationStep()
            yield self.t, self.x
Here is an example of finding X with constant acceleration of 4:
def getDX( t, x ):
    # state vector is [position, velocity]; dX/dt = [velocity, accel]
    return [ x[1], 4.0 ]

isWhole = lambda x : abs(x-round(x)) < 1e-6

rk = RKIntegrator( dt=0.1, derivFunc=getDX, initConds=[0.0, 0.0] )
for t,x in rk.integrate():
    if t > 10: break
    if isWhole(t):
        print t, ', '.join('%.2f' % x_i for x_i in x)
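As a quick sanity check, this system has the closed-form solution x(t) = 2t² and v(t) = 4t, so the output should read 2.00, 4.00 at t = 1 and 200.00, 40.00 at t = 10 – and since all derivatives beyond the second vanish for this polynomial, RK4 should reproduce it essentially exactly, to within float roundoff.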
Googling for ‘Python runge kutta’, I came across a blog posting by Doswa. It does a good job, but hardcodes the vector size to just x, velocity, and acceleration. Here is how my R-K integrator would implement Doswa’s code:
def accel(t,x):
    stiffness = 1
    damping = -0.005
    x,v = x
    return -stiffness*x - damping*v

def getDX(t,x):
    return [ x[1], accel(t,x) ]

rk = RKIntegrator( dt=1.0/40.0, derivFunc=getDX, initConds=[50.0, 5.0] )
for t,x in rk.integrate():
    if t > 100.1: break
    if isWhole(t):
        print t, ', '.join('%.2f' % x_i for x_i in x)
My results match the posted results to 2 places.
It’s funny how a little experiment can start to take on momentum all by itself. After looking at other Python databases, it wasn’t long before Google’s BigTable cropped up in my searches. This suggested to me a more descriptive and maybe more appropriate name for my experiment – littletable. Its expectations are modest, and so it has a fairly modest-sounding name.
Tables of objects are created simply by creating an empty table and loading like objects into it. No schema, no SQL. The attributes of the objects themselves, and the attributes used in the queries and joins, describe an organic, emergent schema. I loaded a data table of zipcodes by state (from xxx), and a table of states. There are a total of over 42,000 defined zipcodes (data as of 1999). Here is a query joining each zipcode with its matching state record:
fullzips = (zips.join_on("statecode") + states)()
A table can keep an index on a particular attribute, with the option to require uniqueness or not. Indexes are used at join time to optimize the join performance, by minimizing the number of records that have to be sifted through.
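Setting one up is a one-liner – a sketch, assuming the create_index(attrname, unique=...) spelling:

states.create_index("statecode", unique=True)   # one record per state code
zips.create_index("statecode")                  # many zipcodes per state code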
The latest version of littletable (0.3) now includes table pivoting. This makes it very easy to look at data in a large table to see how it is distributed across particular keys. For instance, here is a table of the top 20 states with the most zip codes:
TX Texas            2676
CA California       2675
NY New York         2238
PA Pennsylvania     2224
IL Illinois         1595
OH Ohio             1470
FL Florida          1449
VA Virginia         1253
MO Missouri         1192
MI Michigan         1169
NC North Carolina   1083
IA Iowa             1073
MN Minnesota        1036
KY Kentucky         1016
IN Indiana           992
GA Georgia           975
WV West Virginia     930
WI Wisconsin         914
AL Alabama           847
TN Tennessee         806
created by “pivoting” the zip code table on the single attribute stateabbr.
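The call itself is brief – a sketch from memory of the 0.3 API, in which the pivot attribute must already be indexed:

zips.create_index("stateabbr")
pivot = zips.pivot("stateabbr")
pivot.dump_counts()      # prints one line per key, with its record count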
The states with the fewest zip codes are:
GU Guam              21
VI Virgin Islands    16
FM Federated State    4
MP Northern Marian    3
MH Marshall Island    2
AS American Samoa     1
PW Palau              1
And this query:
nozips = states.where(lambda o:o.statecode not in zips.statecode)
returns a single record, for “UM” – the postal state abbreviation for the U.S. Minor Outlying Islands, a group of uninhabited islands SW of Hawaii (see http://en.wikipedia.org/wiki/U.S._Minor_Outlying_Islands).
A nice characteristic of littletable queries and joins is that they each return a new fully-functional table, containing the joined and/or filtered records described in the query. Tables can then be exported to CSV files, making it easy to save and restore the results of a particular query. Tables are just wrappers around Python lists, so it is still possible to access parts of them using slice notation.
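For instance, a filtered result can be saved off directly (csv_export is the method spelling I’ll assume here):

# every query returns a full Table, so the result exports as-is
co_zips = zips.where(lambda rec : rec.stateabbr == "CO")
co_zips.csv_export("colorado_zips.csv")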
Here is a query from a database of US place names, retrieving all of the tunnels in the US, sorted by descending elevation.
tunnels = us_names.query(feature="TNL", _orderby="elev desc")
Using basic python slicing, we can then find the 15 highest and 15 lowest tunnels in the country:
for t in tunnels[:15] + tunnels[-15:]:
    print "%-30.30s %s %5d" % (t.name, t.state, t.elev)
Twin Lakes Reservoir and Canal CO  4003
Harold D Roberts Tunnel        CO  3763
Eisenhower Memorial Tunnel     CO  3728
Twin Lakes Reservoir Tunnel Nu CO  3709
Ivanhoe Tunnel                 CO  3680
Vasquez Tunnel                 CO  3653
Old Alpine Tunnel (historical) CO  3639
Hagerman Tunnel                CO  3637
McCullough Tunnel              CO  3635
Strickler Tunnel               CO  3608
August P Gumlick Tunnel        CO  3605
Charles H Boustead Tunnel      CO  3603
Quandary Tunnel                CO  3574
Chapman Tunnel                 CO  3561
Hoosier Pass Tunnel            CO  3552
Harvey Tunnel                  LA     2
Posey Tube                     CA     2
Harbor Tunnel                  MD     0
Baytown Tunnel                 TX     0
Chesapeake Channel Tunnel      VA     0
Downtown Tunnel                VA     0
Midtown Tunnel                 VA     0
Thimble Shoal Channel Tunnel   VA     0
Holland Tunnel                 NJ     0
Lincoln Tunnel                 NJ     0
Brooklyn-Battery Tunnel        NY     0
Pennsylvania Tunnnels          NY     0
Queens Midtown Tunnel          NY     0
Webster Street Tube            CA     0
Rapid Transit Trans-Bay Tube   CA    -2
So it isn’t necessary to support every SQL feature in the littletable API, since the objects *are* in what is essentially a list structure.
So far littletable has been a handy little tool for quick data manipulation, and maybe some simple database concept experimentation. Not sure if there is much more I really need to add – we’ll see if there are many takers out there for this little recipe.
Not sure how I got started with this – I think I was looking at some ORM-style APIs and wanted to try my hand at it. Not too surprisingly, my result is reminiscent of pyparsing: using operator ‘+’ to define table joins, and __call__ to execute joins and queries. I called this little project “dulce”, as it is really little more than a syntactic sweet, a wrapper around a Python list of Python objects. But here are two things I like:
- a simple join syntax:
wishlists = customers.join_on("id") + wishitems.join_on("custid") + catalog.join_on("sku")
- a simple query-by-key syntax:
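Something along these lines – a guess at the spelling, with a made-up key value, assuming “id” has been indexed as a unique key:

# hypothetical: look up one customer record by its unique "id" key
billing_cust = customers.id["0030"]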
Also, all queries and joins return a new full-fledged dulce Table object, so chaining queries or exporting the output of joins is very easy. And there is no schema definition; the schema “emerges” from the object attributes used in queries and joins.
As it is pure Python, I think it would be ridiculous to make claims about how fast this is, although with psyco acceleration, loading and joining a table with 10,000+ zip codes takes about 2 seconds. I guess the biggest advantages would be:
- pure Python portability
- small Python footprint – single script file, about 500 lines long, including docs
- quick start to organize a collection of objects (see the sketch after this list)
- simple join interface
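To give a feel for that quick start, here is a minimal sketch – the class and attribute names are purely illustrative, and insert/create_index are the method names that carried forward into littletable:

import dulce

class CatalogItem(object):
    def __init__(self, sku, descr):
        self.sku = sku
        self.descr = descr

# no schema - just create a table, index it, and load objects
catalog = dulce.Table()
catalog.create_index("sku", unique=True)
catalog.insert(CatalogItem("BRDSD-001", "bird seed"))
catalog.insert(CatalogItem("BBS-MINI-24", "miniature bb's, 24-pack"))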
I created a SourceForge project for this little thing; you can find it here. I haven’t even packaged any releases yet, but the source can be extracted from the project SVN repository.