From the book writing procrastination dep't
I've recently had to step away from a great project at Lotterywest – it was difficult to continue after having to leave the country. :-) The project goes live in a few months, but (hopefully) we managed to finish most of the development work.
I can't talk too much about the project itself, except to say that when it launches, it will probably be one of the highest-traffic Plone 4 sites in existence. Moreover, it's a site with long lulls followed by huge spikes in traffic, and it needs to cater for both logged-in members of the public and anonymous users, each with a different load profile.
I do want to talk a bit about the tools and technologies we used, in the hopes that others will find it interesting. This is easily the most sophisticated stack I've ever used in a real-world project. And on the whole, it's worked extremely well so far.
The team
Beyond myself, the full-time team consisted of two very capable developers with experience of Plone 2.5 and through-the-web development, but limited experience of filesystem-based Plone development. In addition, we had the help of a network engineer and a tester, as well as a peripheral team of testers, trainers and others.
A side-goal of the project was to leave behind tools and working practices that could be applied to future development projects. As such, a lot of thought went into how the development environment was set up and how it could be reused.
Development environment
We used an agile development process centered around Scrum. To support this, we used Pivotal Tracker to manage stories and defects. This is my third attempt at using Pivotal in anger, and I'm pretty happy with it. Whilst certainly not perfect, it's simple and user friendly enough to fit into my preferred workflow, and it helps the release planning and story estimation process. That, and the price is right.
For source control, we used an internal Subversion server. This was a step up from CVS. I'm pretty confident that Subversion was the right choice: a DVCS would have been far too complicated and confusion-prone for this project.
We installed Trac on our development server as well. We didn't actually use its issue tracking capabilities (since we kept all features and defects in Pivotal), but made use of its wiki and the Subversion browser. I find a project wiki useful for keeping track of "tips and tricks", build instructions, third party contact details and other transient information. I'm not sure Trac was the only (or even the best) choice here, but it did the job. My biggest gripe with it is probably that the wiki syntax is a bit awkward.
Naturally, our development process insisted on having tests for everything. We used Hudson to run the tests regularly and alert us to any regressions. Continuous Integration is hugely important. If you don't have it in your project, get it now. Hudson is easy to set up and flexible enough for all your CI needs.
The last thing to go on our development server was an Apache instance serving a static directory to which selected users had write access over scp. We used this as the release target for jarn.mkrelease when making internal releases of our own packages: The rule was that no production release could contain Subversion checkouts of packages. Instead, we made internal releases to this directory, which was listed in the find-links option in our buildout.
All of our environments were managed with Buildout, of course. To make the buildouts more re-usable and robust, we kept a number of buildout files in a directory called buildout.d. Each file was responsible for building or configuring one aspect of the system. For example, prod-nginx.cfg would configure nginx for the production server, and dev-beaker.cfg would configure Beaker (which we used for session management via collective.beaker) for the development environment. In addition, buildout.d/templates contained templates for configuration files, which were set up using collective.recipe.template.
At the top level, we had the following files:
- packages.cfg, listing known good sets for the packages we used, specifying checkout locations and which packages to check out with mr.developer (an indispensable development package management tool), and defining egg working sets for deployment and testing. The buildout.dumppickedversions extension was used to notify us of unpinned dependencies.
- versions.cfg, containing the known good set for our own released packages, as well as third party dependencies not covered by other known good sets. This file was included from packages.cfg.
- One top level file for each environment. The default buildout.cfg was used for the development environment. The other files had names corresponding to servers, e.g. prod-app-master.cfg for the main application server and prod-web-1.cfg for the first of two web servers.
The top level "environment" files were only allowed to include extends lines to bring in the required components, and settings for host names, ports, users, etc. For example:
    [buildout]
    extends =
        buildout.d/base.cfg
        buildout.d/prod-base.cfg
        packages.cfg
        buildout.d/prod-lxml.cfg
        buildout.d/postgres.cfg
        buildout.d/postgres-relstorage.cfg
        buildout.d/prod-beaker.cfg
        buildout.d/prod-instance.cfg

    # Hostnames to use for various services
    [hosts]
    public = www.example.org
    master = server-1
    slave = server-2
    postgres = server-1

    # Ports to use for various services
    [ports]
    instance1 = 8801
    instance2 = 8802
    instance3 = 8803
    instance4 = 8804
    postgres = 5432

    # Users to run as
    [users]
    zope = nobody
    postgres = nobody
Each file in buildout.d used the same pattern. Here is an example that builds HAProxy:
    ##############################################################################
    # Production HAProxy - load balancer
    ##############################################################################

    [buildout]
    parts +=
        haproxy-build
        haproxy-config

    # Configuration
    # *************

    [hosts]
    haproxy = localhost

    [ports]
    haproxy = 8200

    [users]
    haproxy = nobody

    [downloads]
    haproxy = http://haproxy.1wt.eu/download/1.4/src/haproxy-1.4.8.tar.gz

    [haproxy-build]
    target = generic
    cpu = generic

    # Recipes
    # *******

    [haproxy-build]
    recipe = plone.recipe.haproxy
    url = ${downloads:haproxy}

    [haproxy-config]
    recipe = collective.recipe.template
    input = ${buildout:directory}/buildout.d/templates/haproxy.conf.in
    output = ${buildout:directory}/etc/haproxy.conf
The settings under [hosts] and [ports] are expected to be overridden in the top-level buildout file.
In the development buildout, we installed a number of tools:
- An "omelette" of all installed eggs for easier debugging, using collective.recipe.omelette.
- A test runner and coverage reporting tool.
- A script to help check for new versions of pinned packages.
- ZopeSkel, for making new packages.
- jarn.mkrelease, for making internal releases easily.
We also installed the following eggs into the main development Zope instance:
- BPython, for a nice interactive shell
- plone.reload - absolutely indispensable
- Products.PDBDebugMode, for instant debugging
- Products.PrintingMailHost, to help debug code that sends mail
Finally, we installed Sphinx, which we used to build documentation from reStructuredText files under source control in the docs directory in the build. This is probably the thing I'm most pleased about. We had a rule that no story could be completed without documentation being added to Sphinx. We then set up Hudson to automatically build and deploy the documentation after a successful build. The result is the best-documented project I've ever worked on. Design decisions, maintenance tasks, critical go-live activities and "how the hell did that work again" type documentation all found their way into the Sphinx documentation. Instead of leaving all the docs to the end, we had a continually expanding body of knowledge, and a process to ensure that it was not neglected during busy times of the project.
Each developer's machine ran Mac OS X with TextMate, Terminal and Firefox as the main "IDE". Firebug was of course installed. We used the Zope bundle for TextMate, which includes "pyflakes-on-save" functionality - a big time saver and code quality improver. We also used David Glick's mr.igor to help remember Python imports.
During deployment, we used FunkLoad for extensive load testing. At one point, we had two 8-way/32GB machines generating load. To facilitate that, we wrote some scripts, since released as BenchMaster. If you have never worked with FunkLoad or done proper load testing of your solutions, you're missing out. It's hugely important, and it helped us identify a number of bottlenecks and make optimisations without which the site would almost certainly have been brought to its knees in its first week after launch.
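To give a flavour of what this looks like, here is a minimal FunkLoad test sketch. The URL, page paths and configuration file are made-up placeholders for illustration, not our actual scenarios:

    import unittest
    from funkload.FunkLoadTestCase import FunkLoadTestCase

    class AnonymousBrowsing(FunkLoadTestCase):
        """Simulate an anonymous visitor hitting a couple of pages.

        FunkLoad reads its settings from a .conf file named after the
        test class (AnonymousBrowsing.conf in this hypothetical example).
        """

        def setUp(self):
            # Base URL of the site under test, from the [main] section
            self.server_url = self.conf_get('main', 'url')

        def test_front_page(self):
            # Each get() is timed and recorded for the bench report
            self.get(self.server_url + '/', description='Front page')
            self.get(self.server_url + '/news', description='News listing')

A single functional run with fl-run-test, or a bench run with fl-run-bench followed by fl-build-report, then gives you response-time figures you can compare between configurations.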
Production deployment
We used a number of technologies in the production deployment. Each deserves a blog post in its own right, but here is a quick run-down:
- We had two identical, redundant servers running nginx and Varnish.
- nginx was used to accept SSL traffic, perform Zope virtual host monster URL rewriting, and force the user into and out of SSL as necessary. We also used nginx to add certain request headers used to optimise caching and load balancing, and to serve a "panic page" - a static HTML file to which HAProxy would redirect if no Zope backends were available.
- Varnish did what Varnish does: Make the site fast. We used the Varnish configuration bundled with plone.app.caching as a starting point, and tweaked it for our fairly unique load profile.
- Behind these servers, we had two application servers: One running HAProxy, Zope, Memcached and PostgreSQL, and one running additional Zope instances.
- HAProxy was configured to distribute load across the back-end Zope instances. It used headers set by nginx to route content authors, other logged-in users and anonymous users to appropriate back end Zope instances. We kept a pool of "shared" instances, with some instances ring-fenced for certain types of traffic. If no instances could be found, HAProxy would redirect the user to a "panic page" served directly by nginx.
- Zope was used to run Plone, obviously. In total, we had 16 Zope instances on each of the two 8-way back-end machines. These were configured to use RelStorage against a Postgres database. Additional relational database access was provided via SQLAlchemy. Session management and shared caching used Beaker (via collective.beaker), which was configured to store its data in Memcached, allowing sessions to be non-sticky (see the sketch after this list). Theming was provided by XDV, via collective.xdv (in our tests, we got better performance out of this than deploying the theme to a separate nginx instance). Cache control was provided by plone.app.caching. All custom content types were built using Dexterity. We used collective.tinymcetemplates for content templating, and collective.transmogrifier for migration from the previous site.
- Memcached was used by RelStorage and Beaker.
- Additionally, each build contained a specific Supervisord configuration to start and stop all relevant services with a single command.
- We configured a central syslog server using rsyslog, collecting logs from all relevant services on all production servers. This was configured to insert log entries into a separate Postgres database. We created views for common log queries (e.g. "all errors in the last 24 hours") and exposed these via phpPgAdmin - a simple but effective solution for centralised log analysis.
- The logging server also acted as a Munin server, with each production server acting as a node.
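To illustrate the non-sticky session point above: with collective.beaker, a view gets at session data by adapting the request, and Beaker takes care of persisting it to Memcached. A minimal sketch, in which the view and the key it stores are invented for illustration:

    from Products.Five.browser import BrowserView
    from collective.beaker.interfaces import ISession

    class RememberLastVisited(BrowserView):
        """Hypothetical view that stashes a value in the Beaker session."""

        def __call__(self):
            session = ISession(self.request)
            # The session behaves like a dictionary...
            session['last-visited'] = self.context.absolute_url()
            # ...but changes must be saved explicitly
            session.save()
            return u"Noted."

Because the data lives in Memcached rather than in instance memory, any Zope instance can serve the next request from the same user.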
Overall, this project started to "feel right" pretty early on. The development environment and infrastructure held up very well, and were able to accommodate changes both in the requirements and in our understanding of the problem domain. Some highlights for me were:
- We managed to build documentation with the code, by incorporating Sphinx into our workflow.
- We avoided deploying code from Subversion by using internal releases. jarn.mkrelease was a big help here.
- Varnish is just such an awesome piece of software.
- And nginx is not much worse. :-)
- HAProxy could handle everything we threw at it, and then some. I will definitely use it again. For very simple scenarios, the built-in nginx or Varnish load balancers may suffice, but for complex setups like this, HAProxy is awesome.
- FunkLoad was a revelation. Not only did we find bottlenecks we wouldn't have found otherwise; thinking through the load test scenarios and results also helped us understand how the site would need to be built to perform acceptably under load.
- Plone 4 is a fantastic release. We started around beta 2, and it's been virtually flawless since, save for a few minor hiccups.
- XDV is clearly the future of theming - perhaps not just Plone theming, but theming in general. We ended up proposing several improvements that Laurence kindly put in for us. With the 0.4 release, I think it's reaching maturity. It quickly became an integral part of our workflow, and a favourite of one of our team members, who has considerable experience theming Plone and other CMSes.
- Having taught people Archetypes development in the past, I have no doubt that I prefer teaching Dexterity (and Grok-like views - see the sketch below). It's quicker to learn, more intuitive and more consistent with "modern" Zope and Plone.
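To illustrate what I mean, here is roughly what a minimal Dexterity type plus a Grok-style view looks like. The type, field and view names are invented, and the GenericSetup FTI registration is omitted:

    from five import grok
    from plone.directives import form
    from zope import schema

    class IPressRelease(form.Schema):
        """Schema for a hypothetical 'Press Release' content type."""

        summary = schema.Text(
            title=u"Summary",
            required=False,
        )

    class Summary(grok.View):
        """Renders just the summary text at .../@@summary."""

        grok.context(IPressRelease)
        grok.require('zope2.View')

        def render(self):
            return self.context.summary or u""

Compare that with the boilerplate of an equivalent Archetypes schema and it's easy to see why it's quicker to teach.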
For me personally, this project was a very positive experience. If nothing else, it has taught me a lot of things I intend to put in the book I should be writing an update for right now...