“dulce” becomes “littletable”

It’s funny how a little experiment can start to take on momentum all by itself. After looking at other Python databases, it wasn’t long before Google’s BigTable cropped in my searches.  This suggested to me a more descriptive and maybe more appopriate name for my experiment – littletable.  It’s expectations are modest, and so it has a fairly modest-sounding name.

Tables of objects are created simply by creating an empty table and loading like objects into it.  No schema, no SQL.  The attributes of the objects themselves, and the attributes used in the queries and joins, describe an organic, emergent schema.  I loaded a data table of zipcodes by state (from xxx), and a table of states.  There are a total of over 42,000 defined zipcodes (data as of 1999).  Here is a query of zipcodes:

fullzips = (zips.join_on("statecode") + states)()

A table can keep an index on a particular attribute, with the option to require uniqueness or not.  Indexes are used at join time to optimize the join performance, by minimizing the number of records that have to be sifted through.

The latest version of littletable (0.3) now includes table pivoting.  This makes it very easy to look at data in a large table to see how it is distributed across particular keys.  For instance, here is a table of the top 20 states with the most zip codes:

    TX Texas             2676
    CA California        2675
    NY New York          2238
    PA Pennsylvania      2224
    IL Illinois          1595
    OH Ohio              1470
    FL Florida           1449
    VA Virginia          1253
    MO Missouri          1192
    MI Michigan          1169
    NC North Carolina    1083
    IA Iowa              1073
    MN Minnesota         1036
    KY Kentucky          1016
    IN Indiana            992
    GA Georgia            975
    WV West Virginia      930
    WI Wisconsin          914
    AL Alabama            847
    TN Tennessee          806

created by “pivoting” the zip code table on the single attribute stateabbr.

The states with the fewest zip codes are:

    GU Guam                21
    VI Virgin Islands      16
    FM Federated State      4
    MP Northern Marian      3
    MH Marshall Island      2
    AS American Samoa       1
    PW Palau                1

And this query:

    nozips = states.where(lambda o:o.statecode not in zips.statecode)

returns a single record:

    ['UM', None]

(“UM” is the postal state abbreviation for the U.S. Minor Outlying Islands, a group of uninhabited islands SW of Hawaii – see  http://en.wikipedia.org/wiki/U.S._Minor_Outlying_Islands).

A nice characteristic of littletable queries and joins is that they each return a new fully-functional table, containing the joined and/or filtered records described in the query.  Tables can then be exported to CSV files, making it easy to save and restore the results of a particular query.  Tables are just wrappers around Python lists, so it is still possible to access parts of them using slice notation.

Here is a query from a database of US place names, retrieving all of the tunnels in the US, sorted by descending elevation.

    tunnels = us_names.query(feature="TNL", _orderby="elev desc")

Using basic python slicing, we can then find the 15 highest and 15 lowest tunnels in the country:

    for t in tunnels[:15]+tunnels[-15:]:
        print "%-30.30s %s %5d" % (t.name, t.state, t.elev)

Giving:

    Twin Lakes Reservoir and Canal CO  4003
    Harold D Roberts Tunnel        CO  3763
    Eisenhower Memorial Tunnel     CO  3728
    Twin Lakes Reservoir Tunnel Nu CO  3709
    Ivanhoe Tunnel                 CO  3680
    Vasquez Tunnel                 CO  3653
    Old Alpine Tunnel (historical) CO  3639
    Hagerman Tunnel                CO  3637
    McCullough Tunnel              CO  3635
    Strickler Tunnel               CO  3608
    August P Gumlick Tunnel        CO  3605
    Charles H Boustead Tunnel      CO  3603
    Quandary Tunnel                CO  3574
    Chapman Tunnel                 CO  3561
    Hoosier Pass Tunnel            CO  3552
    Harvey Tunnel                  LA     2
    Posey Tube                     CA     2
    Harbor Tunnel                  MD     0
    Baytown Tunnel                 TX     0
    Chesapeake Channel Tunnel      VA     0
    Downtown Tunnel                VA     0
    Midtown Tunnel                 VA     0
    Thimble Shoal Channel Tunnel   VA     0
    Holland Tunnel                 NJ     0
    Lincoln Tunnel                 NJ     0
    Brooklyn-Battery Tunnel        NY     0
    Pennsylvania Tunnnels          NY     0
    Queens Midtown Tunnel          NY     0
    Webster Street Tube            CA     0
    Rapid Transit Trans-Bay Tube   CA    -2

So it isn’t necessary to support every SQL feature in the litletable API, since the objects *are* in what is essentially a list structure.

So far littletable has been a handy little tool for quick data manipulation, and maybe some simple database concept experimentation. Not sure if there is much more I really need to add – we’ll see if there are many takers out there for this little recipe.

Advertisements
  1. #1 by ptmcg on March 16, 2012 - 4:38 am

    littletable is now a significant component of the reporting engine at the wholesale website, Buyer’s Best Friend – http://www.bbfdirect.com – best of luck, Adam Sah and company! If you own a specialty shop, find wholesale prices on chocolate, coffee, candies, snacks. If you are a small specialty item producer, list your wares on Buyer’s Best Friend, and start taking orders from anywhere in the world!

    And thank you Adam for your many additions to littletable!

Leave a Reply

Fill in your details below or click an icon to log in:

WordPress.com Logo

You are commenting using your WordPress.com account. Log Out / Change )

Twitter picture

You are commenting using your Twitter account. Log Out / Change )

Facebook photo

You are commenting using your Facebook account. Log Out / Change )

Google+ photo

You are commenting using your Google+ account. Log Out / Change )

Connecting to %s

%d bloggers like this: