Acceptance Testing with Pavilion
Paul Ferrell
Abstract
Testing new clusters for operability, performance, and contract compliance,
otherwise known as acceptance testing, generally requires a great deal of
time and effort from a significant number of staff. Tests must be scripted,
made compatible with the new cluster, and calibrated for optimal
performance. Documentation for running these tests is typically lacking,
with expertise spread across disparate individuals who each have their own
methodologies and organization. Everything from test build scripts to
results is in custom, incompatible formats, necessitating a substantial
amount of coordination when performing acceptance testing.
Pavilion 2.0 was developed with the goal of streamlining the day-to-day
operational and regression testing of production clusters at Los Alamos
National Laboratory (LANL). Pavilion provides a standardized method, based
on YAML configs, that automatically handles many of the complicated,
system-dependent tasks involved in building and running tests and collecting
performance metrics. Results are gathered automatically into a common JSON format that
can be displayed or graphed directly via Pavilion, or fed into Splunk or
similar tools. Pavilion has allowed our team to drastically improve the
quality and quantity of the tests we run to ensure consistent functionality
of our systems.
With the advent of Crossroads, our new flagship Advanced Technology System
(ATS), and its associated smaller clusters, we sought to simplify our
acceptance testing processes as well. While this has involved the addition
of new features in Pavilion itself, it has primarily entailed the
development of Pavilion test configs for a wide variety of performance and
functionality tests. By developing the test configurations under Pavilion,
we were able to take advantage of Pavilion’s ability to modularize the
process, allowing these tests to be easily adapted to each of the new
clusters. Moreover, because the tests are standardized under Pavilion,
anyone on our testing team can now build and run any of the acceptance
tests and gather results in a common format. These new test configurations
were written with more than just LANL in mind: they are intended for
incorporation into Pavilion itself as a library adaptable to any new cluster.