How We Reduced Our Rails Test Runtimes By 10x
Testing code is a core requirement for delivering quality software products. Like many others, at Socialcast we strive for comprehensive automated test coverage, and apply tight quality controls — including, but not limited to, requiring that each code release passes all tests with no failures in a controlled, continuous-integration (CI) testing environment.
In order to iteratively develop and quickly deliver innovations and improvements to our products, it’s imperative that we keep the testing feedback cycle quick. Slow test suites mean that developers are less likely to execute the complete set of tests before promoting code, increasing the likelihood of CI test failures. Worse, they also increase the cost and risk of those failures: during the time of initial test execution, fix, and re-run of tests, additional changes can get merged as other engineers promote their code. This can lead to a vicious cycle of failures and re-tests, inadvertently cascading discrete change sets into waterfall-like bundled releases, which are notoriously more difficult to troubleshoot.
Metrics and Goals
Like any optimization effort, having reliable metrics and setting goals against which we could measure progress was critical for us. In the context of continuous integration builds, we found the CI::Reporter gem to be a handy, low-overhead tool for tracking our test runtimes. Through its generation of JUnit-formatted test result files, we were able to use the test trending reports and graphs built into our CI server, Jenkins, to visualize performance over time both holistically, and also at the test suite and individual test case levels. This allowed us to not only objectively measure the effectiveness of changes, but also to focus efforts on the areas of the test suite with the greatest potential payoff.
In addition to setting a specific runtime goal, we also constrained ourselves to minimize changes to the actual production codebase, to avoid major test maintainability or readability degradation, and to have zero loss in test coverage.
Parallelism, memory usage, leaks, and garbage collection
Our initial approach to speeding up the full test suite execution was simply to parallelize the problem away, employing the parallel_tests framework. However, while the CPU headroom was there, it quickly became apparent that memory usage was both a capacity constraint and the driver of a significant garbage collection contribution to the test runtime. We applied garbage collection tuning parameters and, inspired by earlier successes by 37signals, implemented variable scrubbing in test teardowns. We saw good results, but continued to see memory leakage during test runs. Using memprof to investigate, we found the following additional changes to be beneficial:
- For functional tests, Rails appears to hold on to references to the Controller objects even after scrubbing the test’s @controller variable. Scrubbing the instance variables of the controller itself in an ActionController::TestCase teardown (see gist) allowed the garbage collector to also purge this data successfully.
- With transactional fixtures in play, Rails issue #3300 leads to memory bloat, which we addressed via the patch referenced in the comments there.
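To make the scrubbing idea concrete, here is a minimal sketch in plain Ruby. It is not the gist referenced above: the module name `InstanceVariableScrubber`, the `keep` option, and the `FakeController` and `FakeFunctionalTest` stand-ins are all hypothetical, and a real Rails test case may need to preserve some framework-owned instance variables.

```ruby
# Minimal sketch: drop references held in instance variables so the garbage
# collector can reclaim the objects they point to. All names are hypothetical.
module InstanceVariableScrubber
  # keep: instance variables that must survive (an assumption; which ones a
  # real test framework needs depends on the framework and its version).
  def self.scrub(object, keep: [])
    object.instance_variables.each do |ivar|
      next if keep.include?(ivar)
      object.instance_variable_set(ivar, nil)
    end
    object
  end
end

# Stand-in for a controller that accumulates state during a request.
class FakeController
  def initialize
    @current_user = { name: "Ada" }
    @records = Array.new(1_000) { |i| "record #{i}" }
  end
end

# Stand-in for a functional test case with setup and teardown hooks.
class FakeFunctionalTest
  attr_reader :controller

  def setup
    @controller = FakeController.new
    @fixture_data = "large fixture payload"
  end

  def teardown
    # Scrub the controller's own state first, then the test's references.
    InstanceVariableScrubber.scrub(@controller) if @controller
    InstanceVariableScrubber.scrub(self)
  end
end
```

In a Rails application the body of `teardown` would presumably live in a teardown hook on `ActionController::TestCase`; that wiring is left out here so the sketch runs without Rails.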
The progressive application of these changes cut the peak memory footprint of serial runs of the full test suite by 75%, and reduced the runtime by over 30%.
Equally important, we were now able to proceed with parallelizing the test runs further before running into memory constraints. Seeding parallel_tests with runtime information allowed for balanced execution of test suites across multiple processes, giving us near-linear scalability, limited by hardware resources and eventually by the runtime of the slowest test suite.
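To illustrate why recorded runtimes help with balancing, the sketch below assigns test files to processes greedily, longest first. This is one common heuristic and is not claimed to be what parallel_tests does internally; the function name, file names, and runtimes are made up for the example.

```ruby
# Greedy "longest runtime first" partitioning of test files across processes.
# runtimes: hash of file name => recorded runtime in seconds.
def balance_by_runtime(runtimes, process_count)
  groups = Array.new(process_count) { { files: [], total: 0.0 } }

  # Place the slowest files first, always onto the currently lightest group.
  runtimes.sort_by { |file, seconds| [-seconds, file] }.each do |file, seconds|
    lightest = groups.min_by { |group| group[:total] }
    lightest[:files] << file
    lightest[:total] += seconds
  end

  groups
end

# Hypothetical runtime data, e.g. collected from an earlier CI run.
runtimes = {
  "test/unit/access_control_test.rb" => 180.0,
  "test/unit/user_test.rb" => 60.0,
  "test/functional/messages_controller_test.rb" => 90.0,
  "test/unit/group_test.rb" => 45.0,
  "test/functional/api_controller_test.rb" => 75.0,
}

groups = balance_by_runtime(runtimes, 2)
groups.each_with_index do |group, index|
  puts "process #{index}: #{group[:total]}s #{group[:files].inspect}"
end
```

With two processes, both groups in this example end up at 225 seconds. With five processes, the wall-clock time cannot drop below the 180 seconds of the slowest file, which is the limit described above.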
Factories, Fixtures, and Transactionality
Up until this point, we hadn’t really had to change anything within the tests themselves — just the environmental structure in which they executed. However, as we increased parallelism, it became clear that we had a problem: the Big Slow Testsuite, a single test file with a runtime measured in minutes. Parallelization at the suite level couldn’t solve this “slow boat” issue, and splitting it up didn’t make architectural sense, so we dug into the code itself. Doing so exposed effective optimization approaches we’ve since applied to other tests.
The test in question involved validating access controls between and among the various models in our application. As such, it included instantiations of nearly every kind of fixture and factory defined for our tests — we use both traditional Rails fixtures (for simplicity and speed) and FactoryGirl factories (for more flexible and true-to-life modeling). Upon closer analysis, it became clear that the factory invocations were responsible for the majority of the runtime, and that we were repeatedly recreating equivalent factory objects for multiple tests within the suite.
Reluctant to abandon the power of factories, we made two significant changes to the test:
- We used the Transactionata gem to invoke factory creation once, at the beginning of the test suite. It then relies on transactional fixtures behavior to restore the database to that known state before each individual test, without the recurring expense of factory invocation.
- We changed our test code, which in this case is built using the Shoulda framework, to consolidate multiple, related, non-mutating assertions into fewer test cases, by converting separate “should” blocks into “assert” statements within an aggregate “should” block.
Pragmatic testing: One experiment, multiple measurements
The payoff from the Transactionata implementation was clear: if we created an object from a factory ten times in a test, creating it only once and simply rehydrating it from the database the other nine times is a significant win, as long as the test logic itself neither relies on observing changes as they happen nor requires the absence of pre-created data. (We did find the single setup block per test suite to be somewhat limiting, so we extended the approach internally to one that works in fundamentally the same way but allows for switching out transactional contexts within a single test suite.)
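The create-once, roll-back-per-test pattern can be shown with a toy model. The sketch below does not use the Transactionata API or a real database: `ToyDatabase`, `create_user`, and `run_suite` are hypothetical stand-ins, and the snapshot-and-restore step only imitates what a rolled-back transaction would do.

```ruby
# Toy in-memory "database" whose changes inside a block can be discarded,
# imitating a transaction that is rolled back after each test.
class ToyDatabase
  attr_reader :rows

  def initialize
    @rows = []
  end

  def insert(row)
    @rows << row
  end

  # Run the block, then restore the rows to their state before the block.
  # (Shallow copy: rows added or removed are undone, in-place edits are not.)
  def rolled_back
    snapshot = @rows.dup
    yield
  ensure
    @rows = snapshot
  end
end

DB = ToyDatabase.new
FACTORY_CALLS = Hash.new(0)

# Hypothetical factory; the counter stands in for its real cost.
def create_user(name)
  FACTORY_CALLS[name] += 1
  DB.insert({ type: :user, name: name })
end

# Runs the shared setup a single time, then each test against that known
# state, discarding whatever the individual test wrote.
def run_suite(once_setup, tests)
  once_setup.call
  tests.map { |test| DB.rolled_back { test.call } }
end

tests = [
  -> { create_user("temp"); DB.rows.size }, # sees the shared row plus its own
  -> { DB.rows.size },                      # sees only the shared row again
  -> { DB.rows.first[:name] },              # reads the shared row
]

results = run_suite(-> { create_user("shared") }, tests)
puts results.inspect
puts "shared factory calls: #{FACTORY_CALLS["shared"]}"
```

Three tests run against the shared user, yet its factory is invoked only once. The second test also shows the caveat from the text: it cannot assume an empty table, because the pre-created data is always present.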
Consolidating assertions might seem less palatable, being in apparent contradiction with the “one assertion per test” approach. However, particularly when testing object changes on the fly, the benefits of reducing executions of test setup blocks were substantial. For example, asserting that an API response has ten fields set correctly in one test is roughly ten times faster than generating that identical response in ten different tests and asserting each field’s correctness individually. By annotating assertions with relevant text in the sometimes-neglected “message” argument, we were able to retain sufficient self-documentation, and avoided the cost of such things as repeating controller invocations only to assert a 200/OK response in one test, and a JSON content-type in another test of the same response, and so on.
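As a rough illustration of consolidated assertions, the sketch below uses plain Minitest rather than the Shoulda syntax mentioned earlier, so that it runs without extra gems. `FakeResponse` and `fetch_profile_response` are hypothetical stand-ins for an expensive controller invocation.

```ruby
require "minitest/autorun"
require "json"

# Hypothetical stand-in for the result of a controller invocation.
FakeResponse = Struct.new(:status, :content_type, :body)

# Hypothetical stand-in for the expensive request itself.
def fetch_profile_response
  body = JSON.generate("id" => 42, "name" => "Ada")
  FakeResponse.new(200, "application/json", body)
end

class ProfileResponseTest < Minitest::Test
  # One test, one request, several related non-mutating assertions.
  # The message argument keeps each check self-documenting on failure.
  def test_profile_response_is_well_formed
    response = fetch_profile_response
    payload = JSON.parse(response.body)

    assert_equal 200, response.status, "should respond with 200/OK"
    assert_equal "application/json", response.content_type,
                 "should return a JSON content type"
    assert_equal 42, payload["id"], "should include the user id"
    assert_equal "Ada", payload["name"], "should include the user name"
  end
end
```

Written as four separate tests, the request would be issued four times. The trade-off is that the first failing assertion stops the test, so later checks are not reported until it is fixed.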
Surprises and miscellany
A few interesting things that came up along the way:
- When running in parallel, we found that image manipulation activities ran surprisingly slowly in the continuous integration environment. We tracked this down to ImageMagick defaulting to using all processors in parallel, which caused CPU contention when more than one operation took place concurrently; limiting the threads available to ImageMagick (which we did by setting the MAGICK_THREAD_LIMIT environment variable) made this more comfortably parallel.
- Placing the database and search engine filesystems on ramdisk, and disabling SQL logging in test environments, shaved about 5-10% off total test runtime. (These changes are safe because the test data is all transient.)
- Utilizing backgrounded processing in the production codebase, and selectively invoking that functionality only when needed for specific tests (via, e.g., resque_unit) also saved unnecessary overhead.
- Beware the obvious: An occasional “sleep” or remote web call can slip into any sufficiently mature test suite, with associated performance costs. Similarly, eliminating the spawning of subprocesses, especially involving image and other file manipulation, is a classic and effective optimization.
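For the ImageMagick item above, one way to apply the limit from Ruby is to set the variable early in the test process, before any image processing starts. The helper name and the default of 1 are assumptions; the value actually used is not stated here, and the variable could equally be set in the CI job's environment.

```ruby
# Cap the number of threads ImageMagick may use. Child processes spawned by
# the tests inherit this environment variable.
# The default of 1 is an assumption, not a recommendation from the text.
def limit_imagemagick_threads(limit = 1)
  ENV["MAGICK_THREAD_LIMIT"] = limit.to_s
end

# Would typically be called near the top of a test helper.
limit_imagemagick_threads
```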
Through controlled memory usage, increased parallelism, transactional factory invocation, pragmatic test refactoring, and a few miscellaneous extras, we were able to reduce full test suite runtimes in our continuous-integration environments by a factor of ten. While mileage will of course vary depending on the size and scope of applications, their test suites, and architectural details, we hope that some of the above may be useful either concretely or motivationally: faster test suites help make happier, more effective, and more productive teams.