TL;DR: Deleting a PyPI package does not remove it from PyPI’s servers. We recovered and scanned 600,000 package releases, discovering 190 unique secrets, including a GitHub PAT with admin access to the Apache and Astronomer organizations.
We discovered a GitHub Personal Access Token (PAT) lurking in a package from April 2023. This token granted admin access to the Apache and Astronomer GitHub organizations.
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.
The PAT was base64 encoded inside a helpers.py file. This package also
contained some hardcoded AWS keys (which were invalid) as well as some other OAuth
tokens.
Given that these organizations support hundreds of thousands of users, a malicious actor could have compromised critical repositories and the downstream supply chain. We disclosed the PAT to both organizations; they verified the vulnerability and revoked the credentials within hours.
Hundreds of Python packages are published to PyPI every day, and some get deleted by
their owners. We found that deleting a PyPI package doesn’t actually remove the
artifact. PyPI’s UI and API (pip install <package>) return a
clean 404 when requesting a deleted package, but the underlying object storage keeps
serving the artifacts to anyone who knows the direct URL, indefinitely.
Package deletion should mean permanent removal. Unfortunately, on PyPI it doesn’t. PyPI only hard-deletes artifacts from object storage when PII is reported, and only on request. For everything else, the files persist. We call this gap Ghost Persistence, and it allowed us to recover the code of over 600,000 deleted package releases.
Let’s look at how Ghost Persistence works in practice.
First, we will publish a package to PyPI called deleted-pypi-package-test.
Next, let’s install this package to verify it was published properly:
pip3 install deleted-pypi-package-test
Now, let’s delete the package from PyPI via the GUI.
We’ll remove it from our system by uninstalling it:
pip3 uninstall deleted-pypi-package-test
Finally, let’s try to re-install:
pip3 install deleted-pypi-package-test==1.0
This all works as you would expect: we’ve shown that you can’t install a deleted package. Why?
When you run pip3 install <package>, the first thing pip3 does is look
up that package using the PyPI API to return metadata, including links to download the
package.
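To make the lookup step concrete, here is a minimal sketch of what that metadata request can look like. It assumes the JSON form of PyPI's Simple API (PEP 691) at https://pypi.org/simple/<package>/; the helper names fetch_index and extract_download_urls are hypothetical, and pip's real resolver does considerably more (caching, hash checking, wheel selection).

```python
import json
import urllib.request

# Assumed endpoint and media type: the PEP 691 JSON form of the Simple API.
SIMPLE_URL = "https://pypi.org/simple/{package}/"
ACCEPT = "application/vnd.pypi.simple.v1+json"


def fetch_index(package: str) -> dict:
    """Fetch a project's index. urlopen raises HTTPError (404) if the project is gone."""
    req = urllib.request.Request(
        SIMPLE_URL.format(package=package), headers={"Accept": ACCEPT}
    )
    with urllib.request.urlopen(req, timeout=15) as resp:
        return json.load(resp)


def extract_download_urls(index: dict) -> list[str]:
    """Return the artifact URLs listed under the index's "files" key."""
    return [entry["url"] for entry in index.get("files", [])]
```

Calling extract_download_urls(fetch_index("requests")) should list download links for every release file of a live project, while a deleted project fails at the fetch step with a 404.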
Let’s look at an example of installing the popular requests package with the verbose flag (-vv):
pip3 install -vv requests
When a user deletes a package, the registry’s database removes the pointers that the website and API rely on. The file itself, however, stays where it was: on the Python Package Index’s official object storage server (files.pythonhosted.org). The links to that server are the same ones pip3 uses to download the package code.
Here’s the tarball for the test package that we uploaded and “deleted”:
https://files.pythonhosted.org/packages/50/fd/6ab738a1d520996f7352086bd0463d14b33613072348e9169481663401d9/deleted_pypi_package_test-1.0.tar.gz
You can still download it today, months after we “deleted” it from PyPI.
That URL is only one piece of the puzzle. The object storage path follows a hash-based directory structure: three path components derived from the file’s blake2b-256 digest (the first two hex characters, the next two, and the remaining sixty), followed by the artifact filename, which contains the package name and version.
Brute-forcing the blake2b-256 path components isn’t realistic, and
you need the whole hash to access the package. Instead, we leveraged PyPI’s own
public metadata to reconstruct these paths.
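For reference, the path layout can be sketched in a few lines of Python. This assumes the 64-character blake2b-256 hex digest is split 2/2/60, which matches the example URL earlier in this post; the function names are hypothetical.

```python
import hashlib

FILES_HOST = "https://files.pythonhosted.org/packages"


def blake2b_256_hex(data: bytes) -> str:
    """blake2b with a 32-byte digest (blake2b-256), as 64 hex characters."""
    return hashlib.blake2b(data, digest_size=32).hexdigest()


def storage_url(digest_hex: str, filename: str) -> str:
    """Build the object storage URL from a full digest and an artifact filename."""
    if len(digest_hex) != 64:
        raise ValueError("expected a 64-character blake2b-256 hex digest")
    # Assumed layout: first two hex chars / next two / remaining sixty / filename.
    return (
        f"{FILES_HOST}/{digest_hex[:2]}/{digest_hex[2:4]}/"
        f"{digest_hex[4:]}/{filename}"
    )
```

Note that blake2b_256_hex needs the file's bytes, which is exactly what you don't have for a deleted package. In practice the digest has to come from a metadata source rather than being computed.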
PyPI maintains a public BigQuery dataset containing metadata for every package ever uploaded. Crucially, this data is never removed, even if a release or project is deleted. By querying this dataset, we can reconstruct the exact object storage download URLs for packages that no longer exist.
“The table is meant to be a data dump of metadata from every release on PyPI, which means that the rows in this BigQuery table are immutable and are not removed even if a release or project is deleted.” - https://docs.pypi.org/api/bigquery/
Next, we needed to identify which packages had been deleted. To do this, we saved every unique package name in the BigQuery dataset and ran the names through a simple Python script that checked whether each package was still available.
select distinct(name) from bigquery-public-data.pypi.distribution_metadata;
The Python script checked the response code of requests to the endpoint https://pypi.org/pypi/{package}/json. A 404 meant the package was “deleted”.
import re
import time

import requests

PYPI_URL = "https://pypi.org/pypi/{package}/json"


def check_package(name: str, session: requests.Session) -> dict:
    """Check if a package exists on PyPI."""
    # Normalize per PEP 503: lowercase, runs of "-", "_" and "." become one hyphen.
    url = PYPI_URL.format(package=re.sub(r"[-_.]+", "-", name).lower())
    for attempt in range(3):
        try:
            resp = session.head(url, allow_redirects=True, timeout=15)
            if resp.status_code == 429:
                # Rate limited: back off exponentially, then retry.
                print("rate limited")
                time.sleep(2 ** attempt)
                continue
            return {"name": name, "alive": resp.status_code == 200}
        except requests.RequestException:
            return {"name": name, "alive": False, "error": True}
    # Still rate limited after three attempts.
    return {"name": name, "alive": False, "error": True}
We found 150,868 “deleted” packages. Some packages had multiple versions, so in total, we found 678,376 unique artifacts using this method. This methodology excludes packages where a new owner re-registered the original name.
We downloaded the packages using their object storage URLs and scanned them with TruffleHog’s filesystem command.
trufflehog filesystem <dir> --only-verified --no-update --allow-verification-overlap --json
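The --json output lends itself to a simple tally of unique secrets per detector. The following is a sketch rather than our exact tooling: it assumes each output line is a JSON object carrying DetectorName, Raw, and Verified fields, and the function name count_unique_secrets is hypothetical.

```python
import json
from collections import Counter
from typing import Iterable


def count_unique_secrets(lines: Iterable[str]) -> Counter:
    """Count distinct verified secrets per detector from JSON-lines output."""
    seen = set()
    for line in lines:
        line = line.strip()
        if not line:
            continue
        try:
            finding = json.loads(line)
        except json.JSONDecodeError:
            continue  # skip anything that is not a JSON finding
        if not isinstance(finding, dict) or not finding.get("Verified"):
            continue
        # Assumed field names: the same raw secret from the same detector counts once.
        seen.add((finding.get("DetectorName"), finding.get("Raw")))
    return Counter(detector for detector, _ in seen)
```

Feeding it the lines of a saved results file and calling .most_common(5) on the returned Counter gives the most prevalent detectors.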
The 190 secrets span 47 different TruffleHog detectors. The five detectors below are the most prevalent. The long tail spans another 42 secret types.
Nearly half of the secrets we found weren’t introduced in a single bad release. They leaked in every version of the package they appeared in.
Of the 161 packages with secrets, fifteen leaked more than one unique credential in a single version: an entire wallet of keys in one tarball.
The oldest package that contained a secret was published in 2018 and contained a live Telegram Bot Token. The latest package was published in January 2026 and contained valid Postgres credentials.
This is a significant gap in the deletion process. Some of these secrets grant permissive access to sensitive services and information, and because the packages are believed to be deleted, organizations aren’t looking here for leaks.
This research highlights a fundamental misunderstanding of secret remediation: deletion is not revocation. Whether it is a ‘deleted’ PyPI package, a force-pushed Git commit, or a repo made private, once a secret touches a public ecosystem, consider it compromised.
Although the results surprised us, they make sense. Most developers assume that once a package has been deleted, that artifact is no longer available. Some probably even used deleting the package as a method of remediation, “out of sight, out of mind”.
If you saw this box would you assume your data was permanently deleted?
The only reliable remediation is to revoke and rotate credentials as soon as they have been exposed to public ecosystems. Never treat deletion as a fix; this research shows that ‘deleted’ doesn’t always mean inaccessible.
We contacted PyPI regarding this issue. According to their policy, hard deletions from the underlying object storage are only performed when PII is reported. They also noted that deleting packages has limited effectiveness, as PyPI mirrors continuously archive package code.