Agile Development and Testing in Python

Grig Gheorghiu and Titus Brown

Tutorial given at PyCon 2006, Feb. 23, Dallas

"We must be steady enough in ourselves, to be open and to let the winds of life blow through us, to be our breath, our inspiration; to breathe with them, mobile and soft in the limberness of our bodies, in our agility, our ability, as it were, to dance, and yet to stand upright." -- T.S. Eliot


  • "agile" means primarily rapid feedback, small frequent iterations, continuous integration, automated testing
  • testing pyramid with the various types of testing: unit tests, functional/acceptance tests, UI tests

  • functionality implemented as stories; a story is not done unless it is unit-tested and acceptance-tested
  • automated tests
    • safety net that enables merciless refactoring
    • unit tests ("code-facing tests") should run fast
    • acceptance/functional tests ("customer tests", "business-facing tests") should run via continuous integration (smoke test)
    • "holistic" testing: test the same story at all layers: unit, business logic and UI
  • "tracer bullet" development technique: like candle-making (dip wick in wax, you get a thin but functional candle; keep dipping until you get a fully formed one)
  • remote pair programming works really well: each of the two is accountable to the other
  • writing unit tests for code you did not write is a great way to learn and understand that code, whether it is a colleague's module or a 3rd-party library
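As a rough illustration of writing unit tests for code you did not write, here is a minimal sketch of a "learning test" against the standard library's urlsplit function. The choice of urlsplit, the class name and the run_learning_tests helper are assumptions made for this example; they are not from the tutorial.

```python
import unittest
from urllib.parse import urlsplit


class UrlsplitLearningTest(unittest.TestCase):
    """Pin down how a library function behaves before relying on it."""

    def test_components(self):
        parts = urlsplit("http://example.com:8080/path/page?x=1#top")
        self.assertEqual(parts.scheme, "http")
        self.assertEqual(parts.hostname, "example.com")
        self.assertEqual(parts.port, 8080)
        self.assertEqual(parts.path, "/path/page")
        self.assertEqual(parts.query, "x=1")
        self.assertEqual(parts.fragment, "top")

    def test_missing_scheme_leaves_netloc_empty(self):
        # Without "//", the host name is parsed as part of the path.
        parts = urlsplit("example.com/path")
        self.assertEqual(parts.netloc, "")
        self.assertEqual(parts.path, "example.com/path")


def run_learning_tests():
    """Run the test case above and return the unittest result object."""
    loader = unittest.defaultTestLoader
    suite = loader.loadTestsFromTestCase(UrlsplitLearningTest)
    return unittest.TextTestRunner(verbosity=0).run(suite)
```

The second test is the kind of surprise a learning test tends to surface: it documents a behavior you might otherwise only discover in production.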


Setting up your project


  • Trac is a combined system for managing tasks, tracking bugs, collaborating on docs and specs via a wiki, and browsing source code.
  • Trac is easy to set up, easy to use, attractive, and the interface should scale to projects with 10 or more developers.
  • One extremely useful thing about Trac is the ability to link between bug/task tickets, changesets, source, and wiki pages: everything is treated as wiki text.
  • Also, "recent changes" and "roadmap" pages are very useful.
  • Ease of wiki editing makes it a good way to jot down idle thoughts; same with ticketing system.

Lesson learned: Trac is a fantastic collaborative tool for doing 'test-enhanced' development.



  • Subversion is a centralized source-code management system like CVS, but better in a number of ways.
  • integrates with Trac.
  • talk about layout... what else? -> maybe trunk/branches/tags?

packaging with


Continuous integration with buildbot

  • continuous integration is important for the immediate feedback it gives you about changes you made; the more often you build and test your software, the quicker you will discover and fix bugs
  • buildbot is based on Twisted; pretty hard to configure, but works really well once you have it up and running; can be deployed behind Apache for access control purposes
  • master process kicks off build-and-test process on configured slaves (via scheduler or email notifications)
  • easy to extend master.cfg module with extensions for running various types of commands and tests
  • you get the most value out of the process if you have slaves running on as many of the OSes, Python versions, etc. as you plan to support
  • run as many types of testing as possible via buildbot: unit, acceptance w. fitnesse & texttest, unit/functional with twill, UI with selenium, coverage and profiling, egg creation and installation (this exercises!); the more aspects of the app you cover, the better protected you are against breakages

Lesson learned: running unit/acceptance tests as the 'buildbot' user will uncover all kinds of environment-specific issues (OS version, Python version, paths, required package versions)
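The build-and-test idea can be sketched in plain Python. This is not buildbot's API or its master.cfg format; it is only a minimal stand-in, assuming each step is a command line that signals failure through a non-zero exit code. The step names and commands in STEPS are hypothetical.

```python
import subprocess
import sys


def run_steps(steps):
    """Run (name, argv) steps in order and stop at the first failure.

    Returns a list of (name, exit_code) pairs for the steps that ran.
    """
    results = []
    for name, argv in steps:
        proc = subprocess.run(argv, capture_output=True, text=True)
        results.append((name, proc.returncode))
        if proc.returncode != 0:
            break  # later steps are pointless once the build is broken
    return results


# Hypothetical steps; a real setup would invoke the project's test runners.
STEPS = [
    ("unit tests", [sys.executable, "-c", "assert 1 + 1 == 2"]),
    ("acceptance tests", [sys.executable, "-c", "import sys; sys.exit(1)"]),
    ("packaging", [sys.executable, "-c", "print('never reached')"]),
]
```

Calling run_steps(STEPS) runs the first two steps and skips packaging, because the simulated acceptance step exits with a failure code.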


Code coverage analysis

  • is a tool for analyzing exactly which lines of code are touched in a Python session.
  • uses settrace, and hence works with multiple threads in a single process.
  • won't work with multiple processes without some tweaking, because of the way it saves results.
  • can measure coverage from the command line, or build it into your tests; nose has built-in support.
  • aim for 90-95% coverage on basic automated unit tests.

Lesson learned: code coverage analysis is an excellent way to figure out which portions of your code are being missed by your various tests. Code coverage analysis by itself isn't incredibly useful; instead, use it to figure out which tests to add or modify.
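To show roughly how settrace-based line coverage works, here is a minimal sketch built directly on sys.settrace. It does not use or reproduce the tool described above; trace_lines and classify are names invented for this example, and the sketch only records lines executed inside one function.

```python
import sys


def trace_lines(func, *args, **kwargs):
    """Call func and return the set of line numbers executed inside it."""
    executed = set()
    target = func.__code__

    def tracer(frame, event, arg):
        if frame.f_code is not target:
            return None  # do not trace frames of other functions
        if event == "line":
            executed.add(frame.f_lineno)
        return tracer  # keep receiving events for this frame

    previous = sys.gettrace()
    sys.settrace(tracer)
    try:
        func(*args, **kwargs)
    finally:
        sys.settrace(previous)  # restore any tracer that was active before
    return executed


def classify(n):
    if n < 0:
        return "negative"
    return "non-negative"
```

Tracing classify(5) records the "if" line and the final return but not the "negative" branch, which is exactly the kind of gap that tells you which test to add next.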


Acceptance/regression testing with texttest

  • texttest is a tool for "behavior-based" acceptance testing, i.e. it looks at the overall behavior of the application under test (AUT) as expressed by log files, stdout and stderr
  • golden images of logs, stdout/err are compared with what is obtained at run time
  • documentation on project's home page is plentiful, but lacks howtos
  • to get the best mileage out of it, plan your logging carefully, with different severity levels, and have different log files for different functionality areas of the app (a single log file tends to change too much and too rapidly)

Lesson learned: as a developer you need to be disciplined about how you add new logging to your app: add new code with no new logging, then run texttest and make sure nothing broke, then add new logging and regenerate golden images (alternatively, you can tweak log severity levels)
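The golden-image comparison can be sketched with the standard library's difflib. This is not how texttest itself is implemented or configured; compare_to_golden and the sample log text are assumptions made for illustration.

```python
import difflib


def compare_to_golden(golden_text, actual_text, name="output"):
    """Return a unified diff as a list of lines; an empty list means a match."""
    return list(
        difflib.unified_diff(
            golden_text.splitlines(),
            actual_text.splitlines(),
            fromfile=f"golden/{name}",
            tofile=f"actual/{name}",
            lineterm="",
        )
    )


# Hypothetical log contents for one functionality area of the app.
GOLDEN_LOG = "INFO started\nINFO processed 3 records\nINFO finished\n"
CHANGED_LOG = "INFO started\nINFO processed 4 records\nINFO finished\n"
```

An unchanged run yields an empty diff, while CHANGED_LOG yields a diff showing the "processed 3 records" line replaced by "processed 4 records". Keeping timestamps and other volatile data out of the compared logs is what makes this kind of comparison stable.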


Acceptance/functional testing with FitNesse

  • FitNesse is a more user-friendly variant of Ward Cunningham's FIT framework
  • "business-facing" tests, as opposed to "code-facing" tests (i.e. unit tests)
  • tests are expressed as stories ("storytests") in business domain specific language, at a higher level compared to unit tests
  • FitNesse tests make sure you "write the right code", while unit tests make sure you "write the code right"
  • tests are written in tabular format, with inputs and expected outputs
  • wiki format encourages collaboration
  • one of the main advantages of using Fit/FitNesse is that it brings together business customers, developers and testers, and it forces them to focus on the core business rules of the application
  • James Shore: "Done right, FIT fades into the background"
  • PyFIT is a Python port of FIT; it also supports FitNesse
  • fixtures are a thin layer of glue code that ties the test tables to the application
  • the main types of fixtures are ColumnFixture (similar to SQL select/insert row from/into table) and RowFixture (similar to SQL select from table); together they cover the vast majority of testing needs
  • declarative style (ColumnFixture) vs. procedural style (ActionFixture)
  • FitNesse tests can also be executed from the command line, so they can be included in a smoke test run via a continuous integration tool such as buildbot

Lesson learned: acceptance tests written with FitNesse prove to be very resilient in the presence of code changes, as opposed to GUI tests, which are fragile and need frequent changes to keep up with changes in the UI.
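The declarative, table-driven style can be sketched in plain Python. This does not use PyFIT or the ColumnFixture API; the discount rule, the TABLE rows and run_table are hypothetical and only mimic the shape of a test table with inputs and expected outputs.

```python
def discount(amount):
    """Hypothetical business rule: 5% off orders of 1000 or more."""
    return amount * 5 // 100 if amount >= 1000 else 0


# Each row pairs an input with the output the customer expects.
TABLE = [
    # (amount, expected discount)
    (0, 0),
    (999, 0),
    (1000, 50),
    (2000, 100),
]


def run_table(rule, rows):
    """Apply rule to each input; return (input, expected, actual, passed)."""
    report = []
    for value, expected in rows:
        actual = rule(value)
        report.append((value, expected, actual, actual == expected))
    return report
```

The rows state only business facts, so the table stays valid when the implementation of discount is refactored, which is one reason such tests tend to be resilient to code changes.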


Unit and Functional Testing Web apps with twill

  • testing Web applications is a pain: can test backend code in many ways, but Web frontend is annoying and picky.
  • twill is a domain-specific language that runs an "HTTP driver", i.e. it's a command-line programmable browser.
  • written in Python, so you can script it and extend it from Python quite easily (although this isn't necessary).
  • should work like a normal browser, with the exception of JavaScript (which it doesn't understand).
  • twill fits somewhere between unit tests and functional tests. You can use it to test small isolated bits of functionality if your Web app is built that way; many Web apps require a sequence of events (login, cookies, multiple forms, etc.), so those tests end up less "unit"-y and more functional.
  • we have used twill for both quick, automated "smoke tests" (can the site serve a basic Web page?) and more extensive functional tests.
  • also useful for setting up/tearing down Selenium tests (see below).
  • twill can be used for in-process testing that bypasses the network, obviating the need to bind a socket.

Lesson learned: twill can test essentially all non-JavaScript Web functionality quite easily, and when combined with coverage analysis it's an excellent way to quickly "cover" your application with tests.
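The idea of in-process testing that bypasses the network can be sketched with nothing but the standard library. This does not use twill or its commands; it simply calls a WSGI application directly, without binding a socket. The hello_app application and the call_in_process helper are hypothetical names for this example.

```python
from wsgiref.util import setup_testing_defaults


def hello_app(environ, start_response):
    """Hypothetical application under test: one page, everything else 404."""
    if environ["PATH_INFO"] == "/":
        status, body = "200 OK", b"Hello, world!"
    else:
        status, body = "404 Not Found", b"Not found"
    headers = [("Content-Type", "text/plain"),
               ("Content-Length", str(len(body)))]
    start_response(status, headers)
    return [body]


def call_in_process(app, path="/", method="GET"):
    """Call a WSGI app directly and return (status, headers dict, body)."""
    environ = {"SCRIPT_NAME": "", "PATH_INFO": path, "REQUEST_METHOD": method}
    setup_testing_defaults(environ)  # fills in the remaining WSGI keys
    captured = {}

    def start_response(status, headers, exc_info=None):
        captured["status"] = status
        captured["headers"] = headers

    chunks = app(environ, start_response)
    try:
        body = b"".join(chunks)
    finally:
        close = getattr(chunks, "close", None)
        if close is not None:
            close()  # WSGI requires closing the iterable if it supports it
    return captured["status"], dict(captured["headers"]), body
```

A smoke test then reduces to calling call_in_process(hello_app, "/") and checking the status and body, which runs fast enough to sit alongside ordinary unit tests.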


Functional/acceptance testing of Web UIs with Selenium

  • functional/acceptance testing at the UI level for Web applications
  • uses an actual browser driven via JavaScript
  • unique features
    • client-side JavaScript testing (think AJAX)
    • browser compatibility testing: can be used cross-platform and cross-browser
  • Selenium framework needs to be deployed on the same server that is running the AUT
  • individual tests and test suites written as HTML tables, similar to tests written in FIT/FitNesse
  • tests consist of actions (open, type, select, click) and assertions
  • HTML elements identified by
    • id or name attribute (ideal)
    • XPath expression (can be tricky, doesn't work very well in IE)
    • DOM traversal syntax
  • Selenium IDE: very helpful record/playback tool
  • XPath Checker and XPather Firefox extensions
  • "Driven mode": stand-alone Selenium server using Twisted + XML-RPC
    • uses a reverse proxy to get around the browser's same-origin security restriction on JavaScript (the "cross-site scripting" limitation)
    • can be driven by scripts written in any language with XML-RPC bindings
  • Selenium tests can be added to the continuous integration process
    • setup/teardown via twill
    • post results for reporting purposes

Lessons learned:

  • Selenium tests are fairly brittle (as all GUI tests are) in the presence of UI changes
  • it's no great fun to write the tests either, but the Selenium IDE helps a lot
  • Selenium is the only way known to humankind for testing AJAX


Agile documentation with doctest and epydoc

  • doctest: "literate testing" or "executable documentation"
  • unit tests expressed as stories (big docstrings) that offer context
  • many projects generate documentation from doctests with minimal processing code (Django, Zope3)
  • epydoc makes it trivial to show the storytests
  • test lists: set of unit tests for a given module
  • test maps: set of unit test functions that exercise a given application function/method
  • easy to generate automatically and integrate with epydoc

Lesson learned: unit test duplication is not a bad thing; when writing "agile documentation", we discovered bugs that weren't caught by the initial unit tests
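A small sketch of "executable documentation" with doctest follows. The slugify function and the run_story helper are made up for this example; only the doctest module calls come from the standard library.

```python
import doctest


def slugify(title):
    """Turn a page title into a wiki-style slug.

    The story starts with the common case:

    >>> slugify("Agile Testing in Python")
    'agile-testing-in-python'

    Runs of whitespace, including leading and trailing ones, collapse
    into single hyphens:

    >>> slugify("  unit   tests  ")
    'unit-tests'
    """
    return "-".join(title.lower().split())


def run_story(func):
    """Run the examples in func's docstring; return (failed, attempted)."""
    parser = doctest.DocTestParser()
    test = parser.get_doctest(
        func.__doc__, {func.__name__: func}, func.__name__, None, 0
    )
    runner = doctest.DocTestRunner(verbose=False)
    results = runner.run(test)
    return results.failed, results.attempted
```

The prose between the examples is what turns a list of assertions into a story; the same docstring then serves as both test and documentation once a tool such as epydoc renders it.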



[tests] DO OR DO NOT [pass]


-- paraphrasing Yoda