
Programming

I have been a software developer since the late 1980s, mostly working with academic clients in the field of high energy physics and particle astrophysics. My specialties are developing distributed and/or Web-based applications in Python, and low level software engineering (Linux device drivers and embedded systems).

I prefer iterative, test-driven development and fully-automated deployment workflows with frequent releases (“Agile”). I like Clojure, other Lisps, Python, Git, Eclipse, Emacs, and many, many more things.

The site for my consulting business has more details. I’m also on GitHub at http://eigenhombre.github.com

Occasionally I will post a review of a workshop, conference or other interesting tidbit on the following blog. I also tweet about various obsessions, software or otherwise.





Continuous Testing in Python, Clojure, and Blub

Saturday, March 31 2012 8:07 p.m. UTC

A separate monitor is handy for showing test results continuously while working. The paintbrushes are strictly optional.

What follows is a somewhat rambling introduction to continuous, test-driven development, focusing mainly on Python and influenced by Clojure tools and philosophy. At the end, a simple script is introduced to help facilitate continuous TDD in (almost) any language.

For the last four years I have increasingly followed a test-driven approach in my development. My approach continues to evolve and deepen even as some of the limits of TDD are becoming clearer to me.

Initially, I had a hard time getting my head around TDD. Writing tests AND production code seemed like twice as much work, and I typically ran the program under development, e.g. with print statements added, to test each change. But making changes to old code was always a fairly daunting proposition, since there was no way to validate all the assumptions I’d checked “by eye” just after I’d written the code.

TDD helps reduce risk by continuously verifying your assumptions about how the code should perform at any time. Using TDD for a fairly large project has saved my bacon any number of times.

The basic approach is that the test code and the production code evolve together more or less continuously, as one follows these rules:

  1. Don’t write any production code without a failing unit test
  2. Write only enough production code needed to make the tests pass
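
As a minimal illustration of the two rules (the function name slugify and its behavior are hypothetical, chosen only for this sketch), the tests below would be written first and watched to fail, and then just enough production code added to make them pass:

```python
import unittest


def slugify(title):
    """Just enough production code to satisfy the tests below."""
    return title.strip().lower().replace(" ", "-")


class SlugifyTest(unittest.TestCase):
    # Rule 1: these tests came first and failed until slugify existed.
    def test_lowercases_and_joins_words(self):
        self.assertEqual(slugify("Continuous Testing"), "continuous-testing")

    def test_strips_surrounding_whitespace(self):
        self.assertEqual(slugify("  Blub "), "blub")
```

Saved in a test module, this can be run with any standard runner, for example python -m unittest. Nothing beyond what the tests demand (punctuation handling, say) gets written until a new failing test asks for it.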

Once I started writing tests for all new production code, I found I could change that code and make it better without fear. That led to much better (and usually simpler) code. I realized I was spending much less time debugging; and when there were bugs, the tests helped find them much faster. As I have gained experience with this approach I have found that the reliability of, and my trust in, code written with TDD is vastly superior to that of code written without it. The two-rule cycle also tends to foster simplicity, as one tends to eschew any modifications that don’t actually achieve the desired objectives (i.e. meet the goals of the software, as specified formally in the tests themselves). The process is also surprisingly agreeable!

It goes without saying that if you follow this approach you will be running your tests a lot. The natural next step is to automate this more. In the book Foundations of Agile Python Development, Jeff Younker explains how to make Eclipse run your unit tests every time you save a file in the project. The speed and convenience of this approach was enough to get me to switch from Emacs to Eclipse for a while (now I use both, in roughly equal measure).

Most of my daily programming work is in Python, but I have been an avid hobbyist in Clojure for several months now. It wasn’t until I saw Bill Caputo’s preparatory talk for Clojure/West here in Chicago that I heard the term continuous testing and realized that this is what I was already doing; namely, the natural extension of TDD in which one runs tests continuously rather than “by hand.” Bill demoed the expectations module and the autoexpect plugin for Leiningen, which runs your tests after every save without incurring the overhead of starting a fresh JVM each time.

(One point Bill made in his talk was that if your tests are slow, i.e. if you introduce some new inefficiency, you really notice it. Ideally the tests should take a few seconds or less to complete.)

Back to Python-land. Not wanting to be always leashed to Eclipse, and inspired by the autoexpect plugin, I started looking for an alternative to using Eclipse’s auto-builders — something I could use with Emacs or any other editor. There are a lot of continuous build systems out there, but I wanted something simple which would just run on the command line on my laptop screen while I edited code on my larger external monitor. I found tdaemon on GitHub; this program walks a directory tree and runs tests whenever anything changes (as determined by keeping a dictionary/map of SHA values for all the files). This is most of what I want, but it restricts you to its own choices of test programs.

In a large project with many tests, some fast and some slow, I often need to specify a specific test program or arguments. For example, I have a wrapper for nosetests which will alternately run all my “fast” unit tests, check for PEP-8 compliance, run Django tests, etc. In some cases, such as debugging a system with multiple processes, I may need to do something complex at the shell prompt to set up and/or tear down enough infrastructure to perform an existing test in a new way.

One piece of Clojure philosophy (from Functional Programming, a.k.a. “FP”) that has been influencing my thinking of late is the notion of composability: the decoupling or disentanglement of the pieces of the systems one builds into small, general, composable pieces. This will make those pieces easier to reuse in new ways, and will also facilitate reasoning about their use and behaviors. (Unfortunately, the merits of the FP approach, which are many, have poisoned my enthusiasm for OO to the extent that I will typically use a function, or even a closure, before using an object, which would perhaps be more Pythonic in some cases).

So, in the current case under discussion (continuous testing), rather than making some kind of stateful object which knows about not only your current file system, but also what tests should be allowed, their underlying dependencies, etc., it would be better (or at least more 'functional’) instead to simply provide a directory-watching function that checks a dictionary of file hashes, and compose that function with whatever test program suits your purposes at the moment.
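
To make the idea concrete, here is a rough sketch of such a directory-watching function composed with an arbitrary shell command. It is not the actual source of conttest or tdaemon; the names file_hashes and watch, the one-second polling interval and the choice of SHA-1 are all assumptions made for illustration.

```python
import hashlib
import os
import subprocess
import time


def file_hashes(root):
    """Map every readable file path under root to the SHA-1 of its contents."""
    hashes = {}
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            path = os.path.join(dirpath, name)
            try:
                with open(path, "rb") as f:
                    hashes[path] = hashlib.sha1(f.read()).hexdigest()
            except OSError:
                # The file vanished or is unreadable; skip it.
                continue
    return hashes


def watch(root, command, interval=1.0):
    """Run the shell command once, then again whenever any hash changes."""
    previous = None
    while True:
        current = file_hashes(root)
        if current != previous:
            # The command is an opaque string: any test program will do.
            subprocess.call(command, shell=True)
            previous = current
        time.sleep(interval)
```

Calling watch(".", "nosetests") would loop forever, re-running the command after each change. The watcher knows nothing about tests; all of that lives in the command string, which is the point of the composition. Files generated by the command itself (compiled bytecode, for instance) may trigger one extra run, so a real tool would probably want to filter those out.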

The result of these thoughts is a small script called conttest which is a simplification of tdaemon that composes well with any test suite you can specify on the command line.

Some examples follow:

$ conttest nosetests # Runs nosetests whenever the files on disk change

$ conttest nosetests path.to.test:harness # Runs only tests in 'harness' object in path/to/test

$ conttest 'pep8 -r . ; nosetests' # Check both PEP-8 style and unit tests

It would work equally well with a different language ('blub’) with a separate compilation step:

$ conttest 'make && ./run-tests'

Using this program, depending on my needs of the moment, I can continuously run a single unit test, all my “fast” unit tests, or, if I’m willing to deal with slower turnaround times, all my unit and integration tests.

The script is on GitHub for your continuous enjoyment of continuous testing. May you find it helpful.

(Ironically, this script does NOT work that well for JVM languages like Clojure since the JVM startup time is lengthy (a couple of seconds on my MacBook Pro). For Clojure, 'lein autoexpect’ works great.)

Programming Languages

Thursday, Dec. 22 2011 6:09 p.m. UTC

Clojure Koans on a C-17 bound for Antarctica

Today I am inspired to ponder many languages at once and review which ones I use regularly, which ones I’m curious about, which ones I avoid, and what I’d like to use, if it were to exist.

Programming Languages I Use Regularly

Python

By far the language I use most for work. I like it for its clean philosophy, its expressiveness, its 'batteries included’ extensive set of libraries, and, first and foremost, for its readability.

C

Of all the languages I use regularly, C is the one I learned first. I maintain a large Linux kernel device driver I wrote for the IceCube project as well as an embedded system written for 5000+ sensors designed for the same.

C now feels like assembly language to me but I still appreciate its power and elegance.

Clojure

I have dabbled in Lisp since the 1980s but not seriously until recently. Somewhat seduced by Paul Graham’s essays on Lisp and encouraged by a bit of a Lisp renaissance, I have started reading up on Clojure and working through problems on 4clojure.com. While not without its warts, I like many things about Clojure, including the Lisp 'code-as-data’ philosophy, availability of macros (something I wish Python had) and its interoperability with Java classes. While I doubt I’ll be able to use this in my paying work any time soon, I have started playing with Clojure for personal projects.

Having to deal with significantly concurrent systems in my work, I am intrigued by functional programming, as opposed to the usual object-oriented approaches where state is king and where tangled hierarchies of relatively meaningless relationships can obscure intent (see The Kingdom of the Nouns). Clojure takes an interesting approach, with its emphasis on immutability, software-transactional memory and other concurrency primitives.

Bash / Unix tools

It amuses me slightly to include bash here, but combining simple iteration with conditional statements and adding basic Unix concepts and tools such as pipelines, grep, awk, sed, wc, etc. is surprisingly powerful. Every small Bash trick or new tiny-Unix-tool I learn seems to eliminate the need for some number of actual programs, at least for quick-and-dirty work. The results tend to be obscure and hard to parse; if I can do something in a single line of bash, I will; otherwise I’ll resort to Python for most things.

JavaScript / CoffeeScript

Not my favorite language by any stretch, but you can’t avoid it if you’re working in the browser (I don’t consider closed-source Flash an option). The language has a lot of warts, but some good parts too. I can feel the Lisp bones deep underneath the surface of the language when I dive into JavaScript. CoffeeScript is sweet because it’s so much more readable and offers protection from common JavaScript gotchas, but has some flaws of its own.

Programming Languages I Have Used in the Past but Tend to Avoid

Perl

I fell in love with the power of Perl (“the duct-tape of the Internet”) back in the 1990s, but now dislike its strange, ad-hoc syntax and the relative inscrutability when compared to Python.

Java

I haven’t done a ton of Java development, but have done enough to be irritated by certain things about it: its extremely verbose syntax, strict typing, distance from the actual hardware, and lack of (at least until now) anonymous functions (“lambda”). Also the JVM startup time is irritating, a problem Clojure inherits from Java (though there are workarounds).

Java has become so ubiquitous, however, that it’s hard to avoid, and it does have a certain self-consistent habitability to it. I think current JVM languages such as Clojure and Scala will only strengthen the role of Java and the JVM in modern computing, unless Oracle massively screws things up.

C++

Another language I’ve played with a bit. A language that splits the difference between C and Java (I realize C++ came before Java); I would prefer to write in a “real” higher level language and glue C in where needed.

FORTRAN

I’m sorry to say that, coming from physics, I’ve written more FORTRAN code than I care to admit. I find it interesting, however, that while Lisp and FORTRAN are almost the same age, Lisp still holds interest where FORTRAN does not (except to pure number-crunchers, due to ancient and venerable numeric libraries).

Languages I’m Curious About But Haven’t Had Time to Look At Much

Exposure to purely functional programming and lazy evaluation in Clojure made me curious about Haskell.

I am curious about Erlang, which is supposed to have excellent concurrency features.

I saw some talks about Go at OSCON. Go looks like it has some really nice features compared to C (compilation speed, concurrency support, and improved readability), but it may be a bit low-level for my interests.

I have only tinkered with Objective-C, but that is the language of choice for serious Mac OS X or iOS development. Its syntax looks pretty odd, but perhaps that’s a small price to pay for running on all that pretty hardware.

Purely logical languages such as Prolog (equivalents of which can be easily implemented in Lisp) are of interest for their ability to process large amounts of semantically-related content. I’m curious about expert systems, ontologies, the Semantic Web, and many other related areas of AI research.

Languages I Haven’t Used Much or At All and Hope to Avoid

Anything .NET (Microsoft-centric). PHP (even more ad-hoc than Perl).

Ruby

Ruby is in a category of its own, because I don’t really love the language, but I appreciate certain of its syntactic features, and I know people who are quite passionate about it (including my dad, who has written a comprehensive and full-featured family-tree-and-photo Web application in Ruby on Rails). Also because it has made such an impact on the Web application world, through Rails. Ruby itself seems to occupy a space somewhere between Perl and Python, borrowing the elegant parts of each more than the warts. The main reason I have for exploring Ruby, however, is that it is the scripting language for SketchUp.

The Language I Wish Existed

The perfect language would:

  1. Be very readable, like Python (whitespace or other visual cues probably playing a significant role)
  2. Support full Lisp-like macros (“homoiconicity”)
  3. Have very broad library support (Python, Java, ...)
  4. Have built-in features in support of test-driven development (Python’s doctests and Clojure’s :test metadata seem like just the beginning of what might be possible)
  5. Start fast — crucial for test-driven development (unfortunately the JVM startup time rules this out for Clojure, unless you use Cake)
  6. Handle concurrency very well (Clojure, Erlang, ... but not Python)
  7. Run in the browser, or be implemented efficiently on top of JavaScript
  8. Allow you to get very close to the machine if necessary, or at least the bytecodes of the virtual machine or interpreter (Python, C, C++, ...)

Points 1-3 are the most important to me. Resolving the tension between points 1 and 2 is of particular interest.

I doubt such a language will come along any time soon. But I’m taking a class next month which, who knows? ... might help someday.

OSCON video

Saturday, June 4 2011 10:25 p.m. UTC

I keep forgetting to post this slashdotted video.

Python Concurrency Workshop

Monday, May 18 2009 midnight UTC

Thursday and Friday, fellow IceCube-r Dave Glowacki and I took David Beazley’s workshop on concurrency in Python here in Chicago.

Dr. Beazley is a former math/physics geek and University of Chicago professor, musician, and consultant, currently teaching on Python full time. He is the author of the Python Essential Reference (soon to be out in a new edition).

I felt quite positive on the whole about the class. Most of the software I’ve worked on recently for IceCube has been parallelized in various ways (relying heavily on RPC and/or threads), and I’ve also wrestled with some frightening concurrency issues while writing device drivers, but it was pretty mind-blowing to see how many multiprocessing tools (and hazards!) there are in Python:

  • Threads. We covered locks, reentrant locks, R/W locks, context managers, decorators, metaclasses, semaphores, queues, and condition variables. Also race conditions, deadlocks, etc. Unfortunately Python performs poorly when it comes to CPU-bound threads on multiple cores, due to somewhat broken contention semantics relating to the Global Interpreter Lock. We covered this in some detail, even studying the interpreter source code itself.
  • Traditional Unix tools for IPC: pipes, named pipes/FIFOs, sockets, memory mapping.
  • Object serialization strategies and getting data into/out of low-level data structures.

  • The 'multiprocessing’ library, new to Python 2.6. This is a very cool and flexible module which provides nearly the same semantics as the threading library, but which has add-ons for using and managing pools of worker processes on the same physical machine… an approach which becomes more and more relevant as the limitations of physics push CPU design into more and more cores.

  • Standard concurrency algorithms and patterns: pipelines, worker pools, map-reduce, ...
  • 'Asynchronous’ I/O — both the true asynchronous I/O which is based on signals, and the more common sort which is based on poll() or select(). Using 'select’ to create event-driven programs.
  • Generators and coroutines.

Though I was pretty fried by the end due to the massive info-dump and the simulated jet lag (the class schedule was early for me), I found the very last part to be the most interesting. I’d used generators before, but I had only seen coroutines in extremely gnarly C-language contexts and in Lisp.

Coroutines were introduced into Python in version 2.5. With coroutines and generators, you can make really wild, elegant and powerful patterns in modular programs which pass data between sub-sections, simulating a sort of concurrency without any of the usual race condition hazards one has when using threads. David even showed an elegant, mind-blowing task scheduler implemented with generators in pure Python. It’s not completely obvious to me why you’d want to do this, but it’s really cool that you can!
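
For readers who haven’t seen the pattern, here is a minimal sketch of a generator-based coroutine pipeline using send(). It is my own small illustration rather than material from the workshop, and the names coroutine, grep, collect and feed are chosen just for this example.

```python
def coroutine(func):
    """Decorator that advances a new generator to its first yield."""
    def start(*args, **kwargs):
        gen = func(*args, **kwargs)
        next(gen)
        return gen
    return start


@coroutine
def grep(pattern, target):
    """Forward only the lines containing pattern to the target coroutine."""
    while True:
        line = yield
        if pattern in line:
            target.send(line)


@coroutine
def collect(results):
    """Sink: append every received item to the results list."""
    while True:
        results.append((yield))


def feed(lines, target):
    """Source: push each line into the pipeline."""
    for line in lines:
        target.send(line)


matches = []
feed(["python rocks", "blub", "python 2.5"], grep("python", collect(matches)))
print(matches)  # ['python rocks', 'python 2.5']
```

Each stage runs only when data is sent to it, so the stages interleave cooperatively in a single thread, which is why the usual locking worries do not arise.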

David’s presentation skills were enviably clear, efficient and easygoing. One of the best things about the class was that we would talk for a while, and then David would toss us a problem, and we’d hack for a while on that. Then we’d go back to talking, followed by more hacking, and so on. This really allowed the concepts to cement much more quickly than they would have if David had just talked the whole time.

I have always loved seeing languages (human and computer) pushed to the limit, showing their expressive power in new ways. For example, one of the things I’ve always loved about Lisp was the implicit emphasis on metaprogramming (for example, Lisp macros allow one to create arbitrary new language constructs, not just objects or functions). Before this class, and our study of metaclasses and decorators, I didn’t realize Python had as many metaprogramming capabilities as it does (and I hope it gets more). Metaclasses in particular seemingly don’t (and probably shouldn’t) get used for many ordinary programming tasks, but it’s good to know they are there, because occasionally one does find situations where a whole mess of repeated code can be eliminated with the judicious use of metaprogramming.

It was a real pleasure to discover new capabilities of what has become my language-of-choice for most personal and work-related tasks. Though I had my doubts about the cost of the course (I do wonder if the course cost could be trimmed slightly with a cheaper venue and communal lunch in a nearby restaurant), the experience on the whole was energizing and informative… so much so that I spent the weekend hacking Python code for my various Web sites.

I look forward to more locally-grown and -taught advanced Python classes from David Beazley.

OSCON 2008

Wednesday, July 30 2008 midnight UTC

Highlights from the O’Reilly Open Source Convention, July, 2008 (Portland, OR)

General Observations

This was my first OSCON.

It was interesting to see first hand the evolving relationship between the big vendors and the Open Source Software community. Of course, the big guys were all in attendance at the Expo: Google, Amazon, Microsoft, Sun, etc. (with the notable exception of Apple). To judge by comments and reactions expressed at the conference, Microsoft’s star definitely seems to be setting and Apple’s rising… there is a big outcry against 'vendor lock-in’ with Apple’s iTunes/iPod/iPhone/... as the classic bad-guy example; yet there are tons of iPhones around at the conference and the majority of participants seem to have Macs. There seems to be a sense of good design and usability gaining in importance, with Apple seen both as something to fight and to emulate.

There was a big focus on scripting and the Web, but rather less than I would have expected on Linux and desktop apps (if Linus Torvalds was in attendance, I didn’t see any evidence of it). There was even an introductory talk on C for people who cut their teeth on higher-level languages (which I skipped).

There were also several talks on design and usability (some better than others) which may indicate an increased focus on usable software from the community, which would be a welcome change.

Notes from the talks I attended

Christine Peterson gave a very interesting talk on security and privacy. She said the Open Source community dropped the ball on Electronic Voting, which is neither transparent nor secure in many or most US implementations. Nanotech and new sensor technologies will be used to try to counter terrorist and other threats. She asserted that the Open Source community, with its deep knowledge of both privacy and security issues, is uniquely positioned to influence the direction of the new technology and how it is used.

Two other engaging keynotes were Why Whinging Doesn’t Work by Danese Cooper on the relative lack of women in software and what to do about it, and fork() && exec(): Spawning the Next Generation of Hackers by Nathan Torkington on how to teach your kids to program.

Eat my Data – How Everybody Gets File IO Wrong by Stewart Smith explained the various stages of POSIX file I/O and pointed out various ways your application can lose data even if you think it has been written to disk correctly. If I got it right, you have to do the following:

  • open() a temp file
  • write()
  • fsync() the file to get your blocks written to disk (not implemented correctly on OS X!)
  • close()
  • rename the file to the desired location
  • fsync() the directory to get the inodes updated
  • Most importantly, check error codes in each case!
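
The steps above might look roughly like this in Python. This is a sketch from my notes rather than code from the talk: the function name durable_write is made up, it assumes a POSIX system (opening a directory to fsync it does not work on Windows), and Python raises OSError where C code would have to check each return value.

```python
import os
import tempfile


def durable_write(path, data):
    """Write bytes to path via a temp file, fsync, and rename."""
    directory = os.path.dirname(os.path.abspath(path))
    # Create the temp file in the target directory so the rename
    # stays on one filesystem.
    fd, tmp_path = tempfile.mkstemp(dir=directory)
    try:
        with os.fdopen(fd, "wb") as f:
            f.write(data)
            f.flush()              # push Python's buffer to the OS
            os.fsync(f.fileno())   # ask the OS to push it to disk
        os.rename(tmp_path, path)  # atomic replace on POSIX
    except BaseException:
        # Clean up the temp file if anything failed before the rename.
        if os.path.exists(tmp_path):
            os.unlink(tmp_path)
        raise
    # fsync the directory so the new name itself is persisted.
    dir_fd = os.open(directory, os.O_RDONLY)
    try:
        os.fsync(dir_fd)
    finally:
        os.close(dir_fd)
```

Note that mkstemp creates the file readable and writable only by its owner, so a real implementation may also need to set permissions before the rename.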

After watching Smith’s talk you’d be amazed that any data ever makes it to its proper place!

The Age of Literate Machines by Zak Greant was a wonderful historical and philosophical foray into the history of language, law and computation, with an emphasis on the impact of free software and free information on a free society. This was possibly my favorite talk of the conference.

Matt Russell gave a neat talk on JavaScript animations with the Dojo toolkit. I have been using jQuery happily for IceCube Live but it looks like Dojo is even more powerful (though with a slightly more complicated interface). I even won a copy of his book!

Ryan Briones spoke about Ruby frameworks. I didn’t realize Rails had so much company. It still seems Rails is the main go-to focus if you’re in the Ruby camp (personally I love Django but my Dad’s crazy for Rails).

Chris Shiflett gave an overview of the two primary Web site security vulnerabilities: Cross-site Scripting (XSS) and cross-site request forgeries (CSRF). Simple rule of thumb: filter your inputs, and escape (de-HTML-ize) your outputs. These attacks have plagued even the big names (Amazon, Google, ...).
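
The "escape your outputs" half of that rule can be sketched in a few lines of modern Python using the standard library’s html.escape. The render_comment function is hypothetical, and input filtering and CSRF protection are not shown here.

```python
import html


def render_comment(author, text):
    """Build an HTML fragment, escaping both user-supplied values."""
    return "<p><b>{}</b>: {}</p>".format(html.escape(author), html.escape(text))


print(render_comment("mallory", "<script>alert('xss')</script>"))
```

The script tag comes out as inert text (&lt;script&gt;...) instead of being executed by the browser. Template systems such as Django’s do this escaping automatically by default, which is generally safer than remembering to call it by hand.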

After sitting with Keith, Dave and me for lunch, Jacob Kaplan-Moss gave a talk about some cool undocumented features in Django; there was a more comprehensive Django tutorial which occurred before I arrived, but the notes (which I downloaded but can’t remember from where) are very helpful. Django is about to cough out its 1.0 release… to keep up with the latest features (and they are doing nice things), you have to stay on the trunk of the code.

One of the funnest, or at least funniest, of the talks was given by two Google Guys — a well-rehearsed and inspirational talk about usability, although most of the topics they covered are available in a few good books.

Expo

As far as the expo went, one of my favorite exhibitors was Inveneo, who are bringing affordable low-power, Open Source computers to Africa (not the same as OLPC, as it seems they are focusing on shoring up services such as education, health care and economic development rather than providing laptops to individuals).

Our Presentation

Dave Glowacki and Keith Beattie, my partners-in-crime

The Electromagnetic Sonic Boom (photo by Keith)

Our presentation (slides here) went pretty well (5/5 stars so far in all our ratings). The crowd was maybe 60 people only, probably due in part to it being the last time slot. But the people who did attend seemed more 'present’ than usual (less multitasking on laptops), and there were some good questions both during and after the talk.

There was a similar talk about NASA data processing which I did not attend due to the fact that it was in a different room right before our talk.

Afterwards

Portland Beer Festival

Portland is a wonderful city — I stayed with friends after the conference and experienced the Portland Beer Festival. Thanks to the Lishkas and to Robin and Neil for a great weekend!
