Your robots.txt is telling crawlers exactly where /admin lives

Open yourdomain.com/robots.txt in a browser right now. If there's a Disallow: /admin line in there, you've just done the exact thing a stranger does on their first visit, and you've both read the same answer.

You added that file to keep your admin area out of Google. You listed the paths you didn't want indexed, deployed it, the panel stopped showing up in search, and you moved on. But Google isn't the only thing that fetches robots.txt. It's public by design, it's one of the first files automated tools pull, and every Disallow: line in it is a labeled map of the parts of your site you'd rather strangers didn't find. So who else is reading it, and what does it tell them?

What robots.txt is, and what it is not

robots.txt is a politeness convention. It lives at yourdomain.com/robots.txt, and it tells well-behaved crawlers which paths to skip. That's the entire scope of its authority: a request not to crawl. It asks; it does not enforce.

It does nothing to stop access. A Disallow: line does not require a password, does not return a 403, does not hide anything. The path stays exactly as reachable as it was before. You've just written it down in a public file and asked nicely. Anyone can open the file in a browser. Crawlers that don't care about the convention, which includes essentially every tool an attacker runs, read the Disallow: lines as a curated list of where the interesting stuff is and go straight there.

So the file you added to reduce attention does the opposite for the audience that matters. To Google, Disallow: /admin means "skip this." To a stranger enumerating your site, it means "the admin panel is at /admin, start there."

Request:
GET /robots.txt HTTP/1.1

Response:
User-agent: *
Disallow: /admin
Disallow: /phpmyadmin
Disallow: /backup
Disallow: /staging
Disallow: /api/internal

One public GET returns a labeled index of the paths you wanted kept quiet.
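
To see how little work that takes, here is a minimal Python sketch that pulls the Disallow: paths out of a robots.txt body. The function name disallowed_paths and the SAMPLE text are illustrative assumptions, and the parsing is deliberately simple (it ignores User-agent grouping), so treat it as a sketch rather than a full parser.

```python
def disallowed_paths(robots_txt: str) -> list[str]:
    """Return every path named in a Disallow: line, in file order."""
    paths = []
    for raw_line in robots_txt.splitlines():
        # Drop trailing comments and surrounding whitespace.
        line = raw_line.split("#", 1)[0].strip()
        field, sep, value = line.partition(":")
        if sep and field.strip().lower() == "disallow":
            value = value.strip()
            if value:  # an empty "Disallow:" means nothing is disallowed
                paths.append(value)
    return paths


# Illustrative input: the same response shown above.
SAMPLE = """\
User-agent: *
Disallow: /admin
Disallow: /phpmyadmin
Disallow: /backup
Disallow: /staging
Disallow: /api/internal
"""

print(disallowed_paths(SAMPLE))
```

A dozen lines of standard-library code is the whole cost of turning your file into a target list.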

What a stranger does with the list

The first move against any site is reconnaissance: figure out what's there before trying anything. robots.txt hands that over for free. A stranger fetches it before doing anything else, and now they have a list of paths you yourself flagged as sensitive enough to hide.

Each line is a lead. Disallow: /phpmyadmin says you may be running phpMyAdmin, so they hit it and check for a login page, default credentials, or a known CVE. Disallow: /admin gets them to your login form to try credential stuffing or brute force. Disallow: /backup is the one that hurts: people park backup.sql or a zipped export there, and a directory that's "hidden" by robots.txt but has directory listing turned on will happily show the file. Disallow: /staging points at an environment that's usually less locked down than production and often shares its database. Disallow: /api/internal maps endpoints you assumed nobody knew about.

None of this is hacking. It's reading a public file and visiting the URLs in it. The paths you wrote to keep crawlers out become a to-do list for the visitor who was the real threat all along.
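
The triage described above is mechanical enough to sketch in a few lines of Python. The LEADS table and the leads_for function are hypothetical names, and the keywords are just the examples from this article, not a complete list.

```python
# Hypothetical keyword table: what each kind of path suggests to a reader
# of robots.txt. Order matters: "phpmyadmin" is checked before "admin".
LEADS = {
    "phpmyadmin": "database admin tool: login page, default credentials, known CVEs",
    "admin": "login form: credential stuffing or brute force",
    "backup": "possible database dump or zipped export",
    "staging": "less locked-down environment, may share the production database",
    "internal": "endpoints assumed to be unknown",
}


def leads_for(paths):
    """Pair each disallowed path with the first matching hint, if any."""
    found = []
    for path in paths:
        lowered = path.lower()
        for keyword, hint in LEADS.items():
            if keyword in lowered:
                found.append((path, hint))
                break  # first match wins
    return found


for path, hint in leads_for(["/admin", "/phpmyadmin", "/backup", "/about"]):
    print(f"{path}: {hint}")
```

Paths that match nothing, like /about here, are simply skipped. Everything you flagged as sensitive comes back with a next step attached.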

The fix is not "block robots.txt"

The instinct is to lock the file down or stuff it with decoys. Skip both. Crawlers you want, like Google, need to read it, and an attacker doesn't rely on it anyway. The real problem was never the file. It's that the file revealed paths that aren't actually protected.

So fix the protection, and stop advertising the paths. The two go together.

# robots.txt: a public map of your soft spots
User-agent: *
Disallow: /admin
Disallow: /phpmyadmin
Disallow: /backup
Disallow: /staging

Don't name sensitive paths. Protect them at the path itself, and use a noindex header for what must stay reachable but unindexed.

Take the sensitive paths out of robots.txt. If something must stay crawler-reachable but unindexed, use an X-Robots-Tag: noindex response header on those pages instead, which keeps them out of search without publishing their addresses. Then make the paths actually safe, because that's the part that was missing. Put authentication in front of /admin, and an IP allow-list if you can. Turn directory listing off so a "hidden" folder doesn't enumerate its own contents. Get database dumps and backups out of the web root entirely so there's no backup.sql to find. Lock staging behind auth or take it off the public internet.
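
If your site happens to run on a Python WSGI stack, the noindex header can be added with a small wrapper like the sketch below. The /reports prefix, the add_noindex function, and demo_app are assumptions made up for illustration; most web servers and frameworks have their own way to set response headers, and that is usually the better place to do it.

```python
# Hypothetical: pages that must stay reachable but should not be indexed.
NOINDEX_PREFIXES = ("/reports",)


def add_noindex(app, prefixes=NOINDEX_PREFIXES):
    """Wrap a WSGI app so matching paths carry X-Robots-Tag: noindex."""
    def wrapped(environ, start_response):
        path = environ.get("PATH_INFO", "")

        def patched_start(status, headers, exc_info=None):
            # Append the header only for paths under a listed prefix.
            if path.startswith(prefixes):
                headers = list(headers) + [("X-Robots-Tag", "noindex")]
            return start_response(status, headers, exc_info)

        return app(environ, patched_start)
    return wrapped


def demo_app(environ, start_response):
    """Stand-in application used only to show the wrapper working."""
    start_response("200 OK", [("Content-Type", "text/plain")])
    return [b"hello"]


application = add_noindex(demo_app)
```

One thing to keep in mind: a crawler has to be allowed to fetch the page to see the header, so a path handled this way should not also appear in a Disallow: line.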

A related aside: the same instinct that leaks paths through robots.txt tends to come with the DNS gaps in this pillar, like a DMARC record stuck at p=none doing no enforcement. They're all the same shape: a setting that looks done but quietly protects nothing.

This is the rare finding you can verify yourself in ten seconds: open the file and look. SurfaceCheckr just does it at scale and follows the thread. It reads the Disallow: lines that name admin panels, backups, staging, or internal APIs, then walks each one to see whether it's actually exposed: whether /admin answers, whether a directory lists its files, whether a backup is sitting there downloadable. Every request comes from outside with no access to your server, because that's the position an attacker is in too. We're not going to brute-force your login or test your business logic; that's a pentest, and a different job. We just read what your site already hands to anyone who asks, which, with robots.txt, turns out to be quite a lot.
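
If you want to run the same kind of check by hand on your own site, the sketch below shows one way to read the result of requesting a listed path. It is not SurfaceCheckr's implementation. The classify function and its labels are assumptions, and the "Index of /" test is only a heuristic based on the title that Apache and nginx put on their auto-generated directory listings. The status code and body would come from whatever HTTP client you already use.

```python
def classify(status: int, body: str = "") -> str:
    """Roughly label what an outside request to a listed path revealed."""
    if status in (401, 403):
        return "protected"       # the server asked for credentials or refused
    if status in (404, 410):
        return "absent"          # nothing is served at this path
    if 300 <= status < 400:
        return "redirect"        # follow it and classify the destination
    if status == 200:
        # Heuristic: common auto-generated listing pages carry this title.
        if "Index of /" in body:
            return "directory listing"
        return "reachable"
    return "other"


print(classify(200, "<title>Index of /backup</title>"))
print(classify(403))
```

A "protected" or "absent" result on every path you used to list is what you want to see. "reachable" on /admin only tells you a page answered, so check that what answered is a login wall and not the panel itself.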

Find it before someone else does.

Paste your domain. The grade and issue count are free, and you'll see in a couple of minutes exactly what's reachable from outside.