After more than a decade, the sun has set on Python 2. Love it or hate it, Python 2.7.18 is the final official release — and to remain current with security patches and continue enjoying all of the new developments Python has to offer, it’s time to upgrade to Python 3.
Nearly all common Python packages have already made the conversion, and many major projects have dropped backward support. This leaves a shrinking island of support for anyone still maintaining legacy software.
One of the most difficult decisions in program management is whether to invest in upgrading to a new major software version, or start freezing requirements and commit to legacy long-term support. For our data science team, we had to weigh the cost of maintaining backports of major machine learning and math libraries against the cost of reworking all of our existing feature extractors and applications.
Our team could not ignore the security implications of being trapped on a legacy code base, both in CPython itself and in dependent libraries, where we were already maintaining finicky internal forks of public repositories. This blog post explores how we brought approximately 200K lines of Python into a modern framework.
Step 0: Get Management Support
On its face, upgrading to Python 3 is not technically difficult, and there are many tools and guides to help. But let’s be clear at the outset:
Pitching a project like this is difficult. Developers at all levels will have valid concerns about touching old code that might be mission-critical.
We needed to align both engineers and other stakeholders on the value of upgrading as part of a long-term support picture rather than a one-time quarterly goal. The benefits to the organization as a whole will come in ways that might not feel critical on a short-term scale but will yield force multipliers down the road, such as:
- Remedying security problems by taking upgraded packages
- Taking performance gains from common packages
- Keeping up with other partners and their upgrades
- More easily training/hiring new developers without specific legacy or seniority requirements
These “soft” benefits will not improve the bottom line today or this quarter, but with buy-in from above, we can power through the social inertia of “don’t touch” and not leave any important piece of code orphaned. It is absolutely critical to have leadership help push through and shepherd this project to completion.
This is also an excellent chance to discuss what products need support going forward. Maybe long-term support (LTS) from an organization such as Red Hat or ActiveState is appropriate for your organization, though it comes at a substantially greater cost. Or perhaps it’s time to adopt a development strategy of accretion over support: creating new things to replace broken things.
Take this moment to identify business-critical products and prune back products that are legacy or otherwise not providing value to your team or your customers. The easiest possible conversion for Python 3 compatibility is deprecation.
Step 1: Prepare the Ground
The most important first step is to properly catalog the code: applications, libraries, dependencies and scripts. Our goal was to completely eliminate Python 2 from our Docker environments, so we could not leave any product or build script behind.
For us, this process started by analyzing our largest project. The benefit here was that it touched nearly every internal dependency and highlighted most of our external dependencies. This allowed us to take stock of the kinds of issues we expected to see (string encoding, Cython, raw file parsing) and work back through helper libraries to build an estimate of the work required. Also, by collecting a full list of external dependencies and simply updating the required versions, we were able to identify both easy fixes and more difficult pain points.
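One way to take stock of the issues to expect is a crude text scan for Python 2 idioms. The sketch below is an illustration only, not the tooling our team used; the pattern table and the `scan_source` helper are hypothetical names, and the regular expressions are deliberately simple and will miss cases.

```python
import re

# Hypothetical first-pass patterns; illustrative, not exhaustive.
PY2_PATTERNS = {
    "print statement": re.compile(r"^\s*print\s+[^(\s]"),
    "dict.iter* method": re.compile(r"\.iter(items|values|keys)\(\)"),
    "unicode/basestring builtin": re.compile(r"\b(unicode|basestring)\b"),
    "old except syntax": re.compile(r"^\s*except\s+[\w.]+\s*,\s*\w+\s*:"),
}


def scan_source(text):
    """Return (line_number, issue_label) pairs in line order."""
    hits = []
    for lineno, line in enumerate(text.splitlines(), start=1):
        for label, pattern in PY2_PATTERNS.items():
            if pattern.search(line):
                hits.append((lineno, label))
    return hits


sample = '''\
print "hello"
for k, v in d.iteritems():
    pass
try:
    pass
except ValueError, e:
    name = unicode(e)
'''

for lineno, label in scan_source(sample):
    print(lineno, label)
```

Counting hits per label across a repository gives a rough sense of which categories of work dominate before any code is changed.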
This also gave us the chance to solidify our continuous integration/continuous delivery (CI/CD) pipelines. Without robust testing, build, release and roll-back tools, many of our projects would have been stuck in unstable states. In principle, tests and build tools should empower developers to make whatever changes are necessary; in practice, most projects fall far short of that goal.
Step 2: Automate the Easy Stuff
If we can update and release safe code with a command-line tool or a requirements update, we should do that first to clear as much ground as possible at the start. Where automation falls short, we should use drop-in replacements/backports to smooth over the 2to3 transition. Again, the chief goal is to just get it running in Python 3; things don’t have to be beautiful at the start.
First, to validate that things continue to work, we need Tox. Tox is a test environment parameterization tool that runs the same test suite over multiple versions of Python. This is useful not only in our 2to3 work; it can also be invaluable for more complicated libraries that want to provide support for things like PyPy or specialty implementations like Jython. Let’s break down our sample Tox config:
[tox]
envlist = py27,py36

[testenv]
deps =
    pytest
commands =
    pytest tests/ --junitxml=test-results/junit-{envname}.xml {posargs}
- envlist: This shows which versions of Python should be tested. Versions must be installed in the environment ahead of time, although it would be nice if pyenv were better integrated to install on demand.
- deps: These are the test-only dependencies required to run tests. This allows us to install these requirements only in a test context and never release them as part of our production code.
- commands: This is a shell-like command line to execute tests. It’s important to note the {posargs} placeholder, which forwards any arguments given after -- on the Tox command line to pytest, so we can control both tools from a single invocation.
For example:
tox --sitepackages -p 2 -- -x
Run Tox, allow use of system site packages, run up to two test environments in parallel, and stop pytest after the first failure.
Now, with a place to check our work, we can move on to the next step in automation: code fixing. Here we used Futurize and 2to3 to automatically apply fixes across the codebase. These tools provide turnkey modes to fix everything possible, as well as individual recipes that can be targeted to fix things like print -> print(). We strongly advise reading Futurize’s best practices process; we were not able to take many of their “Phase 2” fixes in our codebase.
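To make the print fix concrete, here is a small sketch of the before and after shape of such a change. Both snippets are hand-written for illustration rather than actual tool output, and `compiles_on_this_interpreter` is a hypothetical helper that only checks whether the running interpreter accepts the source.

```python
# Python 2 only: print is a statement.
PY2_STYLE = 'print "total:", 7 / 2\n'

# Portable form: valid on Python 2.7 and Python 3.
PORTABLE = (
    "from __future__ import division, print_function\n"
    "print('total:', 7 / 2)\n"
)


def compiles_on_this_interpreter(source):
    """Return True if the running interpreter can compile the source."""
    try:
        compile(source, "<example>", "exec")
        return True
    except SyntaxError:
        return False


for name, source in (("py2 style", PY2_STYLE), ("portable", PORTABLE)):
    print(name, compiles_on_this_interpreter(source))
```

Under Python 3 the first snippet is rejected as a syntax error while the second compiles, which is why fixes like this can be applied and released while Python 2 is still in service.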
Futurize by itself probably will not be enough, so the last tool we used to apply fixes was Six. Six provides quick access to many backports and is a lightweight pure-Python dependency that integrates nicely even with Cython code.
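Much of what Six offers comes down to version switches behind a stable name. The sketch below hand-rolls that idea so it runs without installing anything; it is not Six’s source code. The names `text_type`, `binary_type` and `iteritems` mirror names that Six exposes, while `to_text` is a hypothetical helper.

```python
import sys

PY2 = sys.version_info[0] == 2

# Stand-ins for six.text_type and six.binary_type.
if PY2:
    text_type = unicode  # noqa: F821 (this name only exists on Python 2)
    binary_type = str
else:
    text_type = str
    binary_type = bytes


def iteritems(mapping):
    """Iterate over (key, value) pairs lazily on either major version."""
    return mapping.iteritems() if PY2 else iter(mapping.items())


def to_text(value, encoding="utf-8"):
    """Return value as text, decoding it first if it is a byte string."""
    if isinstance(value, binary_type):
        return value.decode(encoding)
    if isinstance(value, text_type):
        return value
    raise TypeError("expected text or bytes, got %r" % type(value))


print(to_text(b"feature"), sorted(iteritems({"a": 1, "b": 2})))
```

In practice, importing the real library is preferable to maintaining shims like these; the point is only that call sites stay identical on both interpreters.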
This combination of automatic and recipe fixes got us through the bulk of our library and dependency updates. By maintaining a light touch on existing code, we could quickly unblock the harder work of fixing our more complex code.
Step 3: The Hard Part
We could not rely on automated fixes alone. In our feature extractors, Python code is used to translate executable files into feature vectors: mathematical descriptions of those files that can be used for machine learning. This is not an exact science, and many files have unusual properties, unique encodings and complex specifications. Especially in malware files, bad actors often throw out-of-specification variations specifically to complicate parsing. Our migrated code needs to produce consistent results, even for these all-too-common corner cases. Even if we know a specification should use certain encodings, we must be very careful about decoding this metadata and handling errors in production.
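As a sketch of what careful decoding can look like, the helper below decodes strictly first and only falls back to replacement characters when the bytes violate the expected encoding, reporting which path was taken. The function name `decode_metadata` and its return shape are assumptions made for this example, not our production code.

```python
def decode_metadata(raw, expected="utf-8"):
    """Decode raw bytes and report whether they matched the specification.

    Returns (text, clean). clean is False when invalid byte sequences had
    to be replaced with U+FFFD, so callers can log or count such files.
    """
    if not isinstance(raw, bytes):
        raise TypeError("expected bytes, got %s" % type(raw).__name__)
    try:
        return raw.decode(expected), True
    except UnicodeDecodeError:
        return raw.decode(expected, errors="replace"), False


print(decode_metadata(b"caf\xc3\xa9"))
print(decode_metadata(b"ok\xff"))
```

Returning a flag rather than raising keeps the extractor producing output for malformed files, while still making out-of-specification input visible downstream.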
While we used a light touch on functional code to avoid scope creep or unintended bugs, we focused our energy on major refactors of test code to better exercise and cover our most complex code. In many cases, we improved our integration testing significantly by adding live or like-live samples to validate complex behaviors.
To approach these problems, we leaned heavily on pytest to help isolate and exercise complex code. By using PySnooper, we were able to combine debugger output with pytest marks to isolate tricky bugs. Using pytest parametrize to refactor our existing tests into a more granular form allowed us to focus on specific subsets of issues rather than only having macro failure information. By using robust pytest fixtures, we were able to safely add parallelization to our full test pass. Also, by getting very granular with marks inside parameterized tests, we were able to highlight specific issues for specialists to focus on while continuing 2to3 work. This took our test suite from 40-50 tests to over 3,000 test cases after parameterization and new integration tests were added, vastly improving specificity in our testing suite.
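A minimal sketch of granular, marked parameterization follows. It assumes pytest is installed; `decode_name` is a hypothetical stand-in for a parser helper, and `encoding` is a hypothetical custom mark that would need to be registered in the project’s pytest configuration to avoid an unknown-mark warning.

```python
import pytest


def decode_name(raw):
    """Hypothetical stand-in for a parser helper under test."""
    return raw.decode("utf-8", errors="replace")


@pytest.mark.parametrize(
    "raw, expected",
    [
        pytest.param(b"kernel32.dll", "kernel32.dll", id="ascii"),
        pytest.param(b"caf\xc3\xa9", "caf\u00e9", id="utf8"),
        # The custom mark lets a specialist select just these cases
        # with `pytest -m encoding`.
        pytest.param(
            b"bad\xff", "bad\ufffd", id="invalid-byte",
            marks=pytest.mark.encoding,
        ),
    ],
)
def test_decode_name(raw, expected):
    assert decode_name(raw) == expected
```

Each parameter set is reported as its own test case, so a failure points at one specific input instead of one large test.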
In addition, we designed a last-mile validation test before shipping our feature extractors. By quantifying any drift in extractor output and looping back on the highlighted issues, we can avoid problems like model drift as well as hard-to-catch issues such as missing values in production data. Adding a robust A/B test at the end of the process enabled us to satisfy all stakeholders that code quality was maintained, even if at that moment this process was largely manual. Now that a process has been demonstrated and refined, we are planning to automate it in the near future.
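The core of such a check can be sketched as a position-by-position comparison of the feature vectors produced by the old and new extractors for the same file. This is a simplified illustration under assumed conventions (vectors as lists of floats, NaN standing for a missing value), not our actual validation harness; `compare_vectors` and the tolerance are hypothetical.

```python
import math


def compare_vectors(baseline, candidate, tolerance=1e-9):
    """Return the indices where two feature vectors disagree.

    A position counts as drift if exactly one side is NaN (a missing
    value) or if the absolute difference exceeds the tolerance.
    """
    if len(baseline) != len(candidate):
        raise ValueError("feature vectors differ in length")
    drifted = []
    for index, (old, new) in enumerate(zip(baseline, candidate)):
        old_missing, new_missing = math.isnan(old), math.isnan(new)
        if old_missing or new_missing:
            if old_missing != new_missing:
                drifted.append(index)
        elif abs(old - new) > tolerance:
            drifted.append(index)
    return drifted


py2_features = [0.5, 1.0, float("nan"), 3.0]
py3_features = [0.5, 1.0000001, 0.0, 3.0]
print(compare_vectors(py2_features, py3_features))  # prints [1, 2]
```

Aggregating the drifted indices over a sample corpus shows which features need attention before the new extractor ships.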
Conclusion
Upgrading from Python 2 to Python 3 was difficult from a process and planning perspective. Changing the wheels and parts on a moving car should be done carefully and with intent.
Some additional advice as you progress through your 2to3 journey:
- Combative development: New code should, whenever possible, be Python 3 only. New tools should serve as carrots to help move old systems into the future. In other teams, we used new common libraries as a beachhead and used things like f-strings and install-requires to limit back-sliding.
- Prepare for the unexpected: Our final project in this journey was extremely complex. This led to big pushes to clear bugs that then uncovered new sets of unseen bugs. We tackled this with a scrum approach of tagging bugs and triaging them several times per week.
- Push for priority: It can be difficult to get teams to take over orphaned or legacy code. As legacy items are touched, formalize support and ownership going forward.
- Limit touches: It’s tempting to think “while I’m in here…” and start refactoring code. Resist that urge and just make it work. Ticketing and cataloging those pain points is useful, but we focused our refactoring work purely on tests.
Now that we have 100% Python 3 code, our team is excited to use the full width and breadth of the tools available to us. Another benefit has been the increased visibility on complex legacy code that is still mission-critical.
This process was not easy, but we have come out on the other side with much higher code quality and a deeper understanding of our processes as we move forward into much more ambitious projects.