Unpatched 15-year-old Python bug allows code execution in 350k projects

A vulnerability in the Python programming language that has been overlooked for 15 years is now back in the spotlight as it likely affects more than 350,000 open-source repositories and can lead to code execution.

Disclosed in 2007 and tagged as CVE-2007-4559, the security issue never received a patch, the only mitigation provided being a documentation update warning developers about the risk.

Unpatched since 2007

The vulnerability is in the Python tarfile module and affects code that calls the tarfile.extract() function on unsanitized input or relies on the built-in defaults of tarfile.extractall(). It is a path traversal bug that enables an attacker to overwrite arbitrary files.

Technical details for CVE-2007-4559 have been available since the initial report in August 2007. While there are no reports about the bug being leveraged in attacks, it represents a risk in the software supply chain.

Earlier this year, while investigating another security issue, a researcher at Trellix rediscovered CVE-2007-4559. Trellix is a new business providing extended detection and response (XDR) solutions that resulted from the merger of McAfee Enterprise and FireEye.

"Failure to write any safety code to sanitize the members files before calling for tarfile.extract() tarfile.extractall() results in a directory traversal vulnerability, enabling a bad actor access to the file system" - Charles McFarland, vulnerability researcher in the Trellix Advanced Threat Research team

The flaw stems from the fact that code in the extract function in Python's tarfile module explicitly trusts the information in the TarInfo object "and joins the path that is passed to the extract function and the name in the TarInfo object."

CVE-2007-4559 - path joining with filename
source: Trellix
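
To illustrate the path joining described above, the following minimal sketch shows how a member name containing "../" sequences can escape the destination directory once the two paths are joined. It is an illustration only, not the tarfile source: the directory and the member name are hypothetical, and posixpath is used so the result is the same on any platform.

```python
import posixpath

# Hypothetical values, chosen only for illustration.
dest = "/home/user/extracted"             # directory the caller passes in
member_name = "../../../etc/cron.d/evil"  # name stored inside a crafted archive

# Joining the two without validation keeps the "../" components intact.
joined = posixpath.join(dest, member_name)

# Normalizing the result shows where the file would actually be written.
resolved = posixpath.normpath(joined)

print(joined)    # /home/user/extracted/../../../etc/cron.d/evil
print(resolved)  # /etc/cron.d/evil
```

The resolved path lies outside the intended directory, which is what makes the bug a path traversal.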

Less than a week after the disclosure, a message on the Python bug tracker announced that the issue was closed, the fix being updating the documentation with a warning "that it might be dangerous to extract archives from untrusted sources."
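
Because the documentation warning leaves validation to the caller, developers have to check archive members themselves before extracting. One possible check is sketched below, assuming it is enough to reject any member whose resolved path falls outside the destination. The helper names is_within_directory and safe_extractall are hypothetical and are not part of the tarfile module.

```python
import os
import tarfile


def is_within_directory(base_dir: str, target: str) -> bool:
    """Return True if target resolves to a location inside base_dir."""
    base = os.path.realpath(base_dir)
    resolved = os.path.realpath(target)
    return os.path.commonpath([base, resolved]) == base


def safe_extractall(archive_path: str, dest: str) -> None:
    """Extract archive_path into dest, refusing members that would escape it."""
    with tarfile.open(archive_path) as tar:
        # Validate every member name before anything is written to disk.
        for member in tar.getmembers():
            candidate = os.path.join(dest, member.name)
            if not is_within_directory(dest, candidate):
                raise ValueError(f"blocked path traversal: {member.name!r}")
        tar.extractall(dest)
```

This sketch only inspects member names. Symbolic and hard links stored inside an archive would need a separate check, so it should be read as a starting point rather than a complete defense.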

Estimated 350,000 projects impacted 

Analyzing the impact, Trellix researchers found that the vulnerability was present in thousands of software projects, both open and closed source.

The researchers scraped a set of 257 repositories more likely to include the vulnerable code and manually checked 175 of them to see if they were affected. This revealed that 61% of them were vulnerable.

Running an automated check on the rest of the repositories raised the share of impacted projects to 65%, indicating a widespread issue.

However, the small sample set served only as a baseline for coming up with an estimation of all impacted repositories available on GitHub.
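
The article does not describe how the automated check worked. As a rough idea of what such a scan could look like, the sketch below uses Python's ast module to list every call to a method named extract or extractall in a source file. This is an assumption for illustration, not Trellix's tool: the function name find_tar_extract_calls is hypothetical, and the scan is deliberately crude, since it would also flag unrelated objects with the same method names and cannot tell whether the members were sanitized first.

```python
import ast


def find_tar_extract_calls(source: str) -> list[tuple[int, str]]:
    """Return (line, method) pairs for every .extract()/.extractall() call."""
    hits = []
    for node in ast.walk(ast.parse(source)):
        if (
            isinstance(node, ast.Call)
            and isinstance(node.func, ast.Attribute)
            and node.func.attr in {"extract", "extractall"}
        ):
            hits.append((node.lineno, node.func.attr))
    return sorted(hits)


# Hypothetical snippet standing in for a file from a scanned repository.
sample = '''
import tarfile

with tarfile.open("upload.tar") as tar:
    tar.extractall("/tmp/out")
'''

print(find_tar_extract_calls(sample))  # [(5, 'extractall')]
```

Each hit would still need a manual or more precise review, which is consistent with the researchers checking part of their sample by hand.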

"With GitHub’s help we were able to get a much larger dataset to include 588,840 unique repositories that include ‘import tarfile’ in its python code" - Charles McFarland

Using the 61% vulnerability rate verified manually, Trellix estimates that there are more than 350,000 vulnerable repositories, many of them used by machine learning tools (e.g. GitHub Copilot) that help developers complete a project faster.

Such automated tools rely on code from hundreds of thousands of repositories to provide "auto-complete" options. If they provide insecure code, the issue propagates to other projects without the developer knowing it.
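
As a quick check of the estimate, applying the manually verified 61% rate to the 588,840 repositories in the GitHub dataset gives a figure just above 350,000. The variable names below are illustrative, and the calculation assumes the rate from the small sample carries over to the full dataset.

```python
repos_importing_tarfile = 588_840  # unique repositories quoted by Trellix
manual_vulnerable_rate = 0.61      # share found vulnerable in the manual review

estimated_vulnerable = round(repos_importing_tarfile * manual_vulnerable_rate)
print(f"{estimated_vulnerable:,}")  # 359,192
```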

GitHub Copilot suggesting vulnerable tarfile extraction code
source: Trellix

Looking further into the problem, Trellix found that open-source code vulnerable to CVE-2007-4559 "spans a vast number of industries."

As expected, the most impacted is the development sector, followed by web and machine learning technology.

Code vulnerable to CVE-2007-4559 present across industries
source: Trellix

Exploiting CVE-2007-4559

In a technical blog post today, Trellix vulnerability researcher Kasimir Schulz, who rediscovered the bug, described the simple steps to exploit CVE-2007-4559 in the Windows version of Spyder IDE, an open-source cross-platform integrated development environment for scientific programming.

The researchers showed that the vulnerability can be leveraged on Linux, too. They managed to escalate the file write and achieve code execution in a test on the Polemarch IT infrastructure management service.

Apart from drawing attention to the vulnerability and the risk it poses, Trellix also created patches for a little over 11,000 projects. The fixes will be available in a fork of each impacted repository. Later, they will be added to the main projects via pull requests.

Because of the large number of affected repositories, the researchers expect more than 70,000 projects to receive a fix in the next few weeks. Hitting the 100% mark is a tough challenge, though, as the pull requests also need to be accepted by the maintainers.

BleepingComputer has reached out to the Python Software Foundation for a comment about CVE-2007-4559 but had not received an answer by publishing time.
