Luke Marshall The Dig May 27, 2026

Admin on Apache Org Exposed for 2.5 Years via Deleted PyPI Package

TL;DR: Deleting a PyPI package does not remove it from PyPI’s servers. We recovered and scanned more than 600,000 deleted package releases and found 190 unique live secrets, including a GitHub PAT with admin access to the Apache and Astronomer organizations.

Deleted packages recovered: 678,376
Unique live secrets: 190
Packages with secrets: 161

Case Study: Admin Access to Apache & Astronomer

We discovered a GitHub Personal Access Token (PAT) lurking in a package from April 2023. This token granted admin access to the Apache and Astronomer GitHub organizations.

Output showing a verified GitHub Personal Access Token with admin access to the Apache and Astronomer organizations
TruffleHog verification showing the OAuth scopes the token has and org access.
The base64-encoded GitHub PAT hidden as a default argument inside helpers.py
Decoding the GitHub token we get a classic PAT: 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.
Hardcoded AWS access keys found inside the same package
AWS keys were also found in the same package. These were not valid.

The PAT was base64 encoded inside a helpers.py file. This package also contained some hardcoded AWS keys (which were invalid) as well as some other OAuth tokens.
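
Base64 encoding hides a token from a casual reader but not from a scanner. The following is a minimal sketch of how such a token could be surfaced, assuming the classic PAT format (ghp_ followed by 36 alphanumeric characters). The function name and both regular expressions are hypothetical, and this is not a description of how TruffleHog is implemented.

```python
import base64
import binascii
import re

# Classic GitHub PATs: "ghp_" followed by 36 alphanumeric characters.
PAT_RE = re.compile(r"ghp_[A-Za-z0-9]{36}")
# Candidate base64 runs long enough to hold an encoded 40-character token.
B64_RE = re.compile(r"[A-Za-z0-9+/]{40,}={0,2}")


def find_encoded_pats(source: str) -> list[str]:
    """Return classic PATs hidden as base64 strings inside source text."""
    found = []
    for candidate in B64_RE.findall(source):
        try:
            decoded = base64.b64decode(candidate, validate=True).decode("ascii")
        except (binascii.Error, UnicodeDecodeError):
            continue  # not valid base64, or the decoded bytes are not text
        found.extend(PAT_RE.findall(decoded))
    return found
```

Running this over the text of a file such as helpers.py would return any decoded strings that match the classic PAT pattern. Whether a match is live still has to be verified separately.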

Given that these organizations support hundreds of thousands of users, a malicious actor could have compromised critical repositories and the downstream supply chain. We disclosed the PAT to both organizations; they verified the vulnerability and revoked the credentials within hours.

The Persistence Gap

Hundreds of Python packages are published to PyPI every day, and some get deleted by their owners. We found that deleting a PyPI package doesn’t actually remove the artifact. PyPI’s UI and API (pip install <package>) return a clean 404 when requesting a deleted package, but the underlying object storage keeps serving the artifacts to anyone who knows the direct URL, indefinitely.

Package deletion should mean permanent removal. Unfortunately, on PyPI, it doesn’t. PyPI only hard-deletes artifacts from object storage when PII is reported, and only on request. For everything else, the files persist. We call this gap Ghost Persistence, and it allowed us to recover the code of over 600,000 deleted packages.

PyPI's package deletion confirmation modal
PyPI prompts for confirmation before deleting a published package. Note it says “permanently deleting all releases for this project”.

How Ghost Persistence Works

Let’s look at how Ghost Persistence works in practice.

First, we will publish a package to PyPI called deleted-pypi-package-test.

Terminal output showing the test package being published to PyPI
Publishing a package to PyPI via the CLI using twine.

Next, let’s install this package to verify it was published properly: pip3 install deleted-pypi-package-test.

pip3 successfully installing the test package
Installing the published package using pip3.

Now, let’s delete the package from PyPI via the GUI.

The PyPI web UI showing the project being deleted
Deleting the package from PyPI via the GUI. This is the second and final confirmation prompt.

We’ll remove it from our system by uninstalling it: pip3 uninstall deleted-pypi-package-test

Terminal output showing pip3 uninstalling the test package
Uninstalling the package from our system.

Finally, let’s try to re-install: pip3 install deleted-pypi-package-test==1.0

pip3 returning a 404 when trying to re-install the deleted package
Trying to re-install via pip3 gives us an error.

This is all functioning as you would expect: we’ve proven you can’t install a deleted package. Why?

When you run pip3 install <package>, the first thing pip3 does is look up that package through the PyPI API to retrieve its metadata, including links to download the package.

Excerpt of the pip architecture documentation describing how pip locates packages
This comes from PyPI’s official architecture documentation (link).

Let’s look at an example: installing the popular requests package with the verbose flag (-vv): pip3 install -vv requests

Verbose pip output showing the package metadata lookup and the files.pythonhosted.org download URL
Using the verbose flag you can see the network requests pip3 makes to PyPI when trying to install a package.

When a user deletes a package, the registry removes the pointers to it from the website and the API. The file itself, however, stays where it was: on the Python Package Index’s official object storage server (files.pythonhosted.org). These are the same links pip3 uses to download the package code.

Here’s the tarball for the test package that we uploaded and “deleted”:

https://files.pythonhosted.org/packages/50/fd/6ab738a1d520996f7352086bd0463d14b33613072348e9169481663401d9/deleted_pypi_package_test-1.0.tar.gz

You can still download it today, months after we “deleted” it from PyPI.
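
The gap can be checked programmatically. Below is a minimal sketch, using only the standard library, that compares the JSON API status for a project with the status of a direct file URL. The helper names head_status and is_ghost are hypothetical, and the status_of parameter exists only so the check can be exercised without network access.

```python
import urllib.error
import urllib.request
from typing import Callable


def head_status(url: str) -> int:
    """Return the HTTP status code of a HEAD request to url."""
    request = urllib.request.Request(url, method="HEAD")
    try:
        with urllib.request.urlopen(request, timeout=15) as response:
            return response.status
    except urllib.error.HTTPError as err:
        return err.code  # urllib raises on 4xx/5xx; the code is what we want


def is_ghost(project: str, file_url: str,
             status_of: Callable[[str], int] = head_status) -> bool:
    """True if the project 404s on the JSON API but its file still downloads."""
    api_status = status_of(f"https://pypi.org/pypi/{project}/json")
    file_status = status_of(file_url)
    return api_status == 404 and file_status == 200
```

Calling is_ghost with the test package name and the tarball URL above performs two live HEAD requests; passing a stub for status_of keeps the check offline.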

Enumerating “deleted” Packages

That URL is only one piece of the puzzle. The object storage path follows a hash-based directory structure: three path components derived from the file’s blake2b-256 digest (its first two hex characters, the next two, and the remaining 60), followed by the file name, which contains the package name and version.

Diagram showing the blake2b-256 digest split into three path components in the object storage URL
A breakdown of the object storage URL: the server URL, digest, and package name and version.
Annotated breakdown of the files.pythonhosted.org URL showing each segment of the hash
PyPI exposes the blake2b-256 hash for live packages, but the API hides this metadata behind a 404 the moment a package is deleted.
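
As a sketch of the layout just described, the function below rebuilds a download URL from a hex digest and a file name. The 2/2/60 character split is read off the example URL earlier in this article; the function name and the input validation are illustrative assumptions.

```python
FILES_HOST = "https://files.pythonhosted.org/packages"


def artifact_url(blake2_256_hex: str, filename: str) -> str:
    """Build the object storage URL for a file from its blake2b-256 digest."""
    digest = blake2_256_hex.lower()
    # A 256-bit digest is 64 hex characters.
    if len(digest) != 64 or any(c not in "0123456789abcdef" for c in digest):
        raise ValueError("expected a 64-character hex digest")
    # Path components: first two characters, next two, then the remaining 60.
    return f"{FILES_HOST}/{digest[:2]}/{digest[2:4]}/{digest[4:]}/{filename}"
```

Feeding it the digest and file name of the test package reproduces the tarball URL shown earlier.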

Brute-forcing the blake2b-256 path components isn’t realistic, and you need the whole hash to access the package. Instead, we leveraged PyPI’s own public metadata to reconstruct these paths.

PyPI maintains a public BigQuery dataset containing metadata for every package ever uploaded. Crucially, this data is never removed, even if a release or project is deleted. By querying this dataset, we can reconstruct the exact object storage download URLs for packages that no longer exist.

“The table is meant to be a data dump of metadata from every release on PyPI, which means that the rows in this BigQuery table are immutable and are not removed even if a release or project is deleted.” - https://docs.pypi.org/api/bigquery/

BigQuery console showing the PyPI distribution_metadata table
It’s easier to query the public PyPI distribution metadata using BigQuery, which gives us a list of all the information we need.
Close-up of the path column from the BigQuery dataset showing artifact storage paths
Reconstructing the URL and performing a curl lets us download the package tarball.
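
A downloaded file can also be checked against the digest embedded in its own path. This sketch assumes the path column holds values of the form xx/yy/<remaining digest>/<filename> without a leading packages/ prefix, and that the digest is BLAKE2b with a 32-byte output; the function names are hypothetical.

```python
import hashlib

FILES_HOST = "https://files.pythonhosted.org/packages"


def url_from_path(path: str) -> str:
    """Turn a stored artifact path into a full download URL."""
    return f"{FILES_HOST}/{path.lstrip('/')}"


def digest_from_path(path: str) -> str:
    """Rejoin the three digest components at the start of the path."""
    first, second, rest = path.strip("/").split("/")[:3]
    return first + second + rest


def matches_path(path: str, content: bytes) -> bool:
    """Check that content hashes to the digest encoded in its path."""
    actual = hashlib.blake2b(content, digest_size=32).hexdigest()
    return actual == digest_from_path(path)
```

A mismatch would suggest a truncated download or a path that was reconstructed incorrectly.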

Next, we needed to identify which packages had been deleted. To do this, we saved every unique package name in the BigQuery dataset and ran the names through a simple Python script to check availability.

SELECT DISTINCT name FROM `bigquery-public-data.pypi.distribution_metadata`;

The Python script checked the response code of requests to the endpoint https://pypi.org/pypi/{package}/json; a 404 meant the package was “deleted”.

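import re
import time

import requests

PYPI_URL = "https://pypi.org/pypi/{package}/json"


def check_package(name: str, session: requests.Session) -> dict:
    """Check if a package exists on PyPI."""
    # Normalize the name per PEP 503: runs of -, _ and . collapse to one hyphen.
    url = PYPI_URL.format(package=re.sub(r"[-_.]+", "-", name).lower())
    for attempt in range(3):
        try:
            resp = session.head(url, allow_redirects=True, timeout=15)
        except requests.RequestException:
            return {"name": name, "alive": False, "error": True}
        if resp.status_code == 429:
            # Rate limited: back off exponentially (1s, 2s, 4s), then retry.
            print("rate limited")
            time.sleep(2 ** attempt)
            continue
        if resp.status_code in (200, 404):
            # 200 means the project is live; 404 means it was "deleted".
            return {"name": name, "alive": resp.status_code == 200}
        break  # unexpected status (e.g. 5xx): report an error, not a deletion
    return {"name": name, "alive": False, "error": True}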

We found 150,868 “deleted” packages. Some packages had multiple versions, so in total, we found 678,376 unique artifacts using this method. This methodology excludes packages where a new owner re-registered the original name.
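
The step from deleted package names to unique artifacts can be sketched as a simple grouping. The code below assumes the metadata rows have been exported as (name, path) pairs and that the deleted names are already normalized; the function names and the row shape are hypothetical.

```python
import re
from collections import defaultdict
from typing import Iterable


def normalize(name: str) -> str:
    """Normalize a project name as described in PEP 503."""
    return re.sub(r"[-_.]+", "-", name).lower()


def artifacts_for_deleted(rows: Iterable[tuple[str, str]],
                          deleted: set[str]) -> dict[str, set[str]]:
    """Group unique artifact paths by deleted project name."""
    artifacts = defaultdict(set)
    for name, path in rows:
        key = normalize(name)
        if key in deleted:
            artifacts[key].add(path)  # a set drops duplicate paths
    return dict(artifacts)
```

Summing the sizes of the returned sets gives the number of unique artifacts to download.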

We downloaded the packages using their object storage URLs and scanned them with TruffleHog’s filesystem command.

trufflehog filesystem <dir> --only-verified --no-update --allow-verification-overlap --json

The five most common secret types

The 190 secrets span 47 different TruffleHog detectors. The five detectors below are the most prevalent. The long tail spans another 42 secret types.
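
A tally like this can be produced from the scanner's JSON output. The sketch below assumes one JSON object per line with DetectorName, Verified, and Raw fields, and it deduplicates on the (detector, raw secret) pair. The function name is hypothetical and the exact aggregation used for this research is not shown in the article.

```python
import json
from collections import Counter
from typing import Iterable


def top_detectors(json_lines: Iterable[str], n: int = 5) -> list[tuple[str, int]]:
    """Count unique verified secrets per detector and return the top n."""
    seen = set()
    for line in json_lines:
        line = line.strip()
        if not line:
            continue  # skip blank lines
        finding = json.loads(line)
        if not finding.get("Verified"):
            continue  # keep only secrets that verified as live
        seen.add((finding.get("DetectorName"), finding.get("Raw")))
    counts = Counter(detector for detector, _ in seen)
    return counts.most_common(n)
```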

Leaks across every version

Nearly half of the secrets we found weren’t introduced in a single bad release. They leaked in every version of the package they appeared in.

Multiple secrets in a single version

Of the 161 packages with secrets, fifteen leaked more than one unique credential in a single version: an entire wallet of keys in one tarball.

15 packages leaked more than one unique secret in a single version
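
Both breakdowns above reduce to grouping findings by package and version. Here is a minimal sketch, assuming findings are available as (package, version, secret) tuples and that the full set of versions per package is known from the metadata. All names are hypothetical.

```python
from collections import defaultdict
from typing import Iterable

Finding = tuple[str, str, str]  # (package, version, secret)


def leaked_in_every_version(findings: Iterable[Finding],
                            versions: dict[str, set[str]]) -> set[tuple[str, str]]:
    """Return (package, secret) pairs found in every version of the package."""
    seen = defaultdict(set)
    for package, version, secret in findings:
        seen[(package, secret)].add(version)
    return {key for key, found in seen.items()
            if found >= versions.get(key[0], set())}


def multi_secret_packages(findings: Iterable[Finding]) -> set[str]:
    """Return packages with more than one unique secret in a single version."""
    per_release = defaultdict(set)
    for package, version, secret in findings:
        per_release[(package, version)].add(secret)
    return {package for (package, _), secrets in per_release.items()
            if len(secrets) > 1}
```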

The oldest package that contained a secret was published in 2018 and contained a live Telegram Bot Token. The latest package was published in January 2026 and contained valid Postgres credentials.

This is a significant gap in the deletion process. Some of these secrets grant broad access to sensitive services and information, and because the packages are believed to be deleted, organizations aren’t looking here for leaks.

Deletion is Not a Security Control

This research highlights a fundamental misunderstanding of secret remediation: deletion is not revocation. Whether it is a ‘deleted’ PyPI package, a force-pushed Git commit, or a repo made private, once a secret touches a public ecosystem, consider it compromised.

Although the results surprised us, they make sense. Most developers assume that once a package has been deleted, that artifact is no longer available. Some probably even deleted a package as their method of remediation: “out of sight, out of mind”.

If you saw this box would you assume your data was permanently deleted?

The PyPI deletion confirmation modal

The only reliable remediation is to revoke and rotate credentials as soon as they have been exposed to public ecosystems. Never treat deletion as a fix; this research shows that ‘deleted’ doesn’t always mean inaccessible.

PyPI’s Response

We contacted PyPI regarding this issue. According to their policy, hard deletions from the underlying object storage are only performed when PII is reported. They also noted that deleting packages has limited effectiveness, because PyPI mirrors continuously archive package code.