Click rate is deceptive. Use report rate, time-to-report, repeat risk, and difficulty scoring to track real phishing simulation progress.

Click rate is the metric everybody quotes, and it is the metric most likely to mislead you.
If you run phishing simulations long enough, you start seeing it. A great campaign with a low click rate that still produces zero reports. A bad campaign with a spike in clicks that turns out to be security scanners detonating links, auto-previews fetching URLs, or a mobile client helpfully rendering a page in the background. A quarter of steady improvement that disappears the moment you change the lure style. A team that learns how to pass the test, not how to stop the attack.
Brand protection and security teams don’t have time for vanity metrics. When your brand is being impersonated, attackers aren’t grading you on clicks. They are measuring whether they can move someone from a believable message to a credential capture, a helpdesk reset, a wire, a gift card, or a customer trust incident.
So if clicks are still your headline number, here is the hard truth. Click rate is a noisy proxy for behavior and is easy to game, misread, or inflate through automation. It can be useful as a supporting signal, but it isn’t a readiness score on its own.
This article lays out a measurement framework for phishing simulation metrics that better map to risk reduction for security and brand protection teams. It covers how to reduce false positives, separate machine traffic from human behavior, and normalize results with difficulty scoring (including the NIST Phish Scale) so trends remain meaningful as lures become more realistic.
Click rate is easy to distort. Link scanners, safe-link rewrites, preview fetching, and training to the test can swing clicks without changing real-world risk. A stronger set of phishing simulation metrics centers on behaviors that reduce damage: correct report rate, time-to-report, repeat susceptibility (weighted by severity), and performance in high-risk cohorts. Pair those with false-positive controls and difficulty scoring, so you can compare campaigns over time without rewarding easy lures or punishing realistic ones.
Click rate lies because a click is no longer a single human decision. It is an event your tooling observes through layers of email security, browser protections, mobile clients, link wrappers, and automated scanners. That stack generates traffic that looks like user behavior even when no one did anything. Even when the click is real, it may not indicate meaningful susceptibility. Many employees interact with a message in order to inspect it. Hovering alone typically shouldn’t register as a click, but previews, link rewrites, auto-loading, and security tooling can still create click-like events that appear human in reporting. If you treat every click as a failure, you will overstate risk, misrank teams, and optimize your program toward cosmetic improvements instead of safer behavior.
Here are the usual culprits.
Many organizations use link-scanning and detonation technologies that automatically fetch URLs. Some do it at delivery, others at click, others at both. Some email clients also prefetch content. If your simulation records a click when an automated system requests the link, your reported click rate can inflate even if no user ever touches it.
Mobile clients can register unintentional interactions. A fat-finger tap while scrolling. A preview pane loading a URL. A safety banner that rewrites links. A safe link wrapper that changes how your tracking works. If you don’t normalize these behaviors, you end up comparing different client behaviors (mobile vs desktop, different mail apps, different security stacks) as if they were the same human decision.
Teams learn the patterns of your tests. The same templates. The same internal sender style. The same cadence. People stop reading email, and instead look for the tells that indicate “this is probably a simulation.” That drives click rate down while real-world resilience stays flat.
If employees feel punished, shamed, or tricked for sport, you will reduce reporting. You will also teach people to hide mistakes. That is the opposite of what you need in a real incident. A program that optimizes for embarrassment will optimize against early detection.
You should measure what you want to happen in a real phishing event. That means shifting from “did someone interact with the lure” to “did the organization detect it early, route it correctly, and reduce the chance of repeat compromise.” Strong phishing simulation metrics reward actions that help defenders and protect the business. Reporting is the most obvious, but it isn’t enough on its own. You also need speed, consistency, and severity-aware measurement. Otherwise, you end up celebrating lower clicks while credential entry, helpdesk bypasses, or workflow violations quietly stay the same.
A strong metric set usually includes:
Track actions by severity, not as a single fail. Clicking isn’t the same as entering credentials, and credential entry isn’t the same as sharing an OTP or bypassing an identity check. A severity-weighted metric lets you show real progress even when you intentionally run more realistic lures that might increase low-severity interactions.
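A severity-weighted metric can be sketched in a few lines. This is a minimal illustration, not a standard formula: the action names and weights below are assumptions, and each user is counted once at their most severe action.

```python
# Hypothetical severity weights; tune these to your own risk model.
SEVERITY_WEIGHTS = {
    "click": 1,
    "credential_entry": 5,
    "otp_shared": 8,
    "identity_check_bypassed": 10,
}


def severity_weighted_score(worst_actions, recipients):
    """Average severity per recipient.

    worst_actions: dict of user id -> most severe action that user took
                   (users who took no action are simply absent).
    recipients:    number of people who received the lure.
    """
    if recipients <= 0:
        raise ValueError("recipients must be positive")
    total = sum(SEVERITY_WEIGHTS[action] for action in worst_actions.values())
    return total / recipients
```

With these weights, a campaign where three of ten recipients interacted (one click, two credential entries) scores 1.1, while a harder campaign where five of ten interacted (four clicks, one credential entry) scores 0.9. The raw interaction count went up, but the severity-weighted score went down.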
A good report rate climbs steadily as you increase realism. That’s the key. If report rate only looks good when lures are obvious, you haven’t built durable detection behavior. Report rate should also be interpreted alongside noise. A program that drives tons of reports but overwhelms triage isn’t succeeding. It is shifting the burden. The best report-rate trends are paired with stable or improving report quality. More correct reports, fewer low-signal “everything is phishing” submissions, and faster routing into the right queue so response teams can act.
To make report rate meaningful:
Time-to-report matters because speed turns a suspicious email into an incident you can contain early. Most organizations don’t fail because nobody reports anything. They fail because reporting is slow, inconsistent, or routed into a dead-end mailbox. A strong time-to-report metric captures both human recognition and workflow design. If you make reporting frictionless and you reinforce that reporting is valued, time-to-report usually drops fast, especially among people who see high volumes of external messages. That is a meaningful win because early reports give defenders a head start on blocking sender infrastructure, searching mailboxes, and warning targeted teams.
Time-to-report captures:
Practical ways to track it:
If your report flow requires three clicks, a login, and a form, your metric is grading your UX, not your humans.
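As a rough sketch, report rate and time-to-report can be computed from two timestamps per user. The data shapes here (dictionaries keyed by a user id) are assumptions for illustration; a real program would pull these from the mail platform and the report button's logs.

```python
from datetime import datetime, timedelta
from statistics import median


def report_metrics(deliveries, reports):
    """Return report rate, median minutes to report, and minutes to first report.

    deliveries: dict of user id -> datetime the lure was delivered.
    reports:    dict of user id -> datetime the user reported it.
    """
    if not deliveries:
        raise ValueError("no deliveries")
    # Minutes between delivery and report, for users who reported.
    delays = [
        (reports[user] - delivered).total_seconds() / 60
        for user, delivered in deliveries.items()
        if user in reports and reports[user] >= delivered
    ]
    return {
        "report_rate": len(delays) / len(deliveries),
        "median_minutes_to_report": median(delays) if delays else None,
        "minutes_to_first_report": min(delays) if delays else None,
    }
```

The median is used rather than the mean so that one report filed days later does not swamp the trend. Minutes to first report is worth tracking separately, since the first report is what gives defenders their head start.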
Repeat susceptibility is the most honest metric you can track, and it is also the easiest to misuse. The goal is to identify where targeted coaching, guardrails, or workflow changes reduce risk.
Use it like this:
If the same cohort repeats, treat it as a design problem, not a character flaw. Are they overloaded? Are they trained on outdated examples? Are attackers targeting their workflows more intensely?
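One way to look at repeats at the cohort level rather than the individual level is sketched below. The input shapes and the two-campaign threshold are assumptions; the point is that the output names cohorts, not people.

```python
from collections import defaultdict


def repeat_rate_by_cohort(failures, cohorts, min_failures=2):
    """Share of each cohort that failed in at least `min_failures` campaigns.

    failures: iterable of (campaign id, user id) pairs, one per severe failure.
    cohorts:  dict of user id -> cohort name, for everyone who was tested.
    """
    # Distinct campaigns each user failed; duplicates within a campaign count once.
    campaigns_failed = defaultdict(set)
    for campaign, user in failures:
        campaigns_failed[user].add(campaign)

    size = defaultdict(int)
    repeaters = defaultdict(int)
    for user, cohort in cohorts.items():
        size[cohort] += 1
        if len(campaigns_failed[user]) >= min_failures:
            repeaters[cohort] += 1
    return {cohort: repeaters[cohort] / size[cohort] for cohort in size}
```

A cohort whose repeat rate stays high across campaigns is a candidate for workflow or guardrail changes, which fits the "design problem" framing above.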
For brand protection teams, repeat susceptibility also serves as a bridge between internal readiness and external threat reality. When you detect an impersonation campaign targeting customers, you can model similar lures internally and see whether the same weaknesses exist.
High-risk cohorts aren’t just privileged users. They are the people whose everyday workflows intersect with money movement, identity verification, and exceptions. That includes finance, payroll, and AP. It includes IT helpdesk staff who can reset MFA or approve access changes. It includes customer support teams who can override safeguards or validate identity under pressure. It includes executive assistants who act as trusted proxies. These groups are targeted differently, so they shouldn’t be judged by the same generic baseline as the rest of the organization. If you don’t break out their performance, your overall averages will hide the outcomes you actually need to improve.
Typical high-risk cohorts include:
Your program should report results for these groups separately, even if you also maintain an overall dashboard. If your overall click rate drops while your helpdesk cohort stays flat, your real risk has not improved.
The strongest metrics for high-risk cohorts are correct report rate, time-to-report, and critical-action rates, such as credential submission, MFA/OTP sharing, and workflow violations (for example, approving a vendor change without out-of-band verification).
Clicks are weak here because the business impact usually happens after the click. The goal is to measure whether the cohort detects fast, escalates correctly, and avoids the irreversible actions that attackers need.
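To show how an overall average can hide a high-risk cohort, here is a small sketch that reports critical-action rates per cohort alongside the overall figure. The action labels are assumed names for illustration.

```python
from collections import defaultdict

# Assumed labels for the irreversible actions attackers need.
CRITICAL_ACTIONS = {"credential_submission", "otp_shared", "workflow_violation"}


def critical_action_rate_by_cohort(results):
    """Critical-action rate per cohort, plus an "overall" entry.

    results: iterable of (cohort, action) pairs, one per recipient, where
             action is None if the recipient took no risky action.
    """
    totals = defaultdict(int)
    critical = defaultdict(int)
    for cohort, action in results:
        for key in (cohort, "overall"):
            totals[key] += 1
            if action in CRITICAL_ACTIONS:
                critical[key] += 1
    return {key: critical[key] / totals[key] for key in totals}
```

If two of four helpdesk recipients take a critical action and none of sixteen other recipients do, the overall rate is 0.1 while the helpdesk rate is 0.5. The dashboard should surface the second number.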
If simulations never align with real outcomes, the program becomes a quarterly ritual rather than a risk-reduction engine. The point is to shorten detection time, improve escalation accuracy, and reduce successful compromise in the scenarios you see in the wild. You can tie simulations to outcomes without perfect attribution. Track whether reports create actionable triage events. Track whether the security team responds faster when reports are high-quality. Track whether specific workflow failures decline over time. For brand protection teams, the connection can be even tighter. Simulation themes can mirror real impersonation tactics your organization is seeing externally, so you’re measuring readiness against current threats rather than generic templates.
Start with two questions.
You can build a lightweight outcome model without creating a giant analytics project.
If you’re already monitoring external phishing infrastructure, you can use live campaign patterns to inform simulations, then see whether internal metrics align with those themes. That creates a tight loop between threat reality and human readiness.
False positives are where many simulation programs quietly lose credibility. If people are blamed for clicks that were actually scanners, or if a campaign fails because the tracking is distorted by link rewriting, the metrics stop being trusted. Trust matters because you need employees to report honestly, and you need leadership to fund the program based on signal, not noise. Reducing false positives is both a technical and a program design problem. You need telemetry that can distinguish machine behavior from human behavior, and you need definitions that don’t shift from campaign to campaign. When you get this right, your trendlines become defensible. That makes the rest of the framework worth implementing.
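A first pass at separating machine traffic from human clicks might look like the sketch below. Everything specific in it is an assumption: the user-agent markers are illustrative placeholders, and the 10-second threshold is a guess to be tuned against your own mail-security telemetry, not an industry standard.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta

# Illustrative markers only; build the real list from your own telemetry.
SCANNER_AGENT_MARKERS = ("bot", "scanner", "urlpreview")
MIN_HUMAN_DELAY = timedelta(seconds=10)  # assumed threshold


@dataclass
class ClickEvent:
    user: str
    delivered_at: datetime
    clicked_at: datetime
    user_agent: str


def is_likely_automated(event):
    """Flag clicks that arrive too soon after delivery or carry a scanner-like user agent."""
    too_fast = event.clicked_at - event.delivered_at < MIN_HUMAN_DELAY
    agent = event.user_agent.lower()
    scanner_agent = any(marker in agent for marker in SCANNER_AGENT_MARKERS)
    return too_fast or scanner_agent


def human_click_rate(events, recipients):
    """Click rate counting each user at most once and ignoring flagged events."""
    humans = {e.user for e in events if not is_likely_automated(e)}
    return len(humans) / recipients
```

Heuristics like these will misclassify some events in both directions, so it helps to keep the flagged events and report both the raw and filtered rates. Whatever rules you choose, keep them fixed across campaigns so the definition of a click does not shift underneath the trendline.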
If half your org is on mobile and half on desktop, your click and report patterns will differ. Your metrics should reflect that reality instead of averaging it away.
If your employees cannot report in under 10 seconds, your time-to-report is measuring friction.
You score difficulty so that your results aren’t just a reflection of how tricky your latest template was. Difficulty scoring lets you compare performance across quarters, teams, and lure types with less self-deception.
This is where frameworks like the NIST Phish Scale help. The Phish Scale rates human phishing detection difficulty by scoring observable cues and premise alignment (how well the lure matches the recipient’s context), then mapping that to a difficulty rating. In plain terms, it helps you label whether a simulation was easy or hard based on properties that actually influence human judgment.
You don’t need to turn this into a dissertation.
If you want the program to feel current, difficulty scoring also encourages you to keep up with attacker realism. When you run more realistic lures, your raw click rate might rise. Your difficulty-normalized performance can still improve, and that is the story your leadership actually needs to hear.
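One simple way to normalize by difficulty is to compare each campaign's observed failure rate with a baseline for lures of the same tier. The sketch below is not part of the NIST Phish Scale itself: the tier names are placeholders for whatever difficulty labels you assign, and the baseline rates are hypothetical values you would derive from your own campaign history.

```python
# Hypothetical baseline failure rates per difficulty tier.
BASELINE_FAILURE_RATE = {"least": 0.04, "moderate": 0.10, "very": 0.20}


def normalized_failure(failures, recipients, difficulty):
    """Observed failure rate divided by the baseline for that difficulty tier.

    Values below 1.0 mean the audience did better than the baseline
    for a lure of this difficulty; values above 1.0 mean worse.
    """
    if recipients <= 0:
        raise ValueError("recipients must be positive")
    observed = failures / recipients
    return observed / BASELINE_FAILURE_RATE[difficulty]
```

Under these assumed baselines, 30 failures out of 500 on a "least" difficult lure normalizes to roughly 1.5, while 60 failures out of 500 on a "very" difficult lure normalizes to roughly 0.6. The raw rate doubled, yet performance relative to difficulty improved.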
A better metrics dashboard shows detection and response behaviors first, then susceptibility, then click data as supporting context.
Here is a practical structure that works.
If your current reporting stack cannot produce this, that is usually not a people problem. It is a telemetry and workflow problem.
You prevent gotcha culture by aligning metrics with learning and response, not punishment. That means no public leaderboards, no shaming, and no incentives that encourage hiding mistakes.
Practical guardrails:
Security awareness only works when people trust the system they are part of.
If clicks still dominate your phishing simulation program, you are probably leaving risk insights on the table. Shift measurement toward reporting, speed, severity-weighted actions, and high-risk cohort performance. Normalize by difficulty, then use the results to fix workflows and controls, not just training content.
If you want to pressure-test realistic social engineering flows and measure outcomes that map to risk reduction, Doppel Simulation is built for that kind of threat-informed loop.