
Battling Scams at Scale: Inside Doppel’s High-Throughput ML Platform

How our engineering team took our ML platform from zero to inference at internet scale in four months.

Doppel Engineering Team • August 28, 2025


Detect. Takedown. Monitor. Repeat.

The internet is both a powerful marketplace and a playground for threat actors. At Doppel, we see firsthand how a single undetected phishing site can cost a brand millions of dollars in fraud losses and erode customer trust overnight. With billions of URLs live at any moment—and new malicious sites spun up in seconds—our window to catch threats is razor‑thin.

Our mission at Doppel is to protect organizations from social engineering—fake login pages, scam ads, spoofed social accounts, and malicious mobile apps that trick real people into handing over their data. As a full‑stack digital risk protection solution, we handle everything from initial detections to full takedowns. A “takedown” involves working with hosting providers, domain registrars, and platform operators to remove or disable malicious sites, ads, or accounts the moment we identify them.


Figure 1. Overview of Doppel’s Digital Risk Protection Platform: live Internet signals flow into the system, where we automatically Detect malicious content, initiate Takedowns, and maintain Continuous Monitoring in a closed loop to catch any re‑emergence.

In our early days, we leaned on third‑party machine learning (ML) vendors to help us keep pace. But these tools left us:

  • Vulnerable to rapid attacker innovation, with retraining cycles measured in days.
  • In the dark about model decisions, because of black‑box predictions.
  • Burdened by tuning constraints, driving up false positives and missing real scams.

Faced with escalating volume and stakes, we knew end‑to‑end control was the only way forward.

Why build an ML Platform?

To overcome these limitations, we needed more than individual point solutions—we needed a unified ML platform. As our business scaled, we found that feature engineering, model training, and serving are all highly repetitive workflows: pulling data, transforming features, spinning up training jobs, packaging models, and deploying endpoints. Manually orchestrating each step not only slowed development and wasted engineering cycles, but also led to divergent design patterns across teams and inconsistent results.

By abstracting away this complexity and standardizing end‑to‑end patterns, we set out to build a platform that lets any engineer at Doppel:

  • Write and ship new features rapidly: Define and deploy feature transformations without rebuilding data pipelines.
  • Use a unified training & serving interface: Track experiments, version datasets, and run validation checks through a single workflow.
  • Push models live with confidence: Deploy models with built‑in performance monitoring and auto‑scaling for real‑time throughput.

With our ML platform now fully in place, we’ve transformed how quickly and effectively we defend against threats. In this post, we’ll walk through how we bootstrapped adversarial model development, built a real‑time serving stack at scale, and distilled key lessons from running ML in production.

TL;DR: Impact

In just 4 months, we’ve achieved the following outcomes:

  • Expanded model portfolio: We’ve replaced ~3 opaque vendor classifiers with six in‑house models, each backed by versioned feature sets.
  • Unified feature platform: We’ve moved from brittle DIY abstractions—and the headaches of training‑serving skew—to a single source of truth that standardizes training and serving pipelines and supports hundreds of batch and real‑time features powering our detection systems.
  • Accelerated time from ideation to production: We’ve cut the train‑and‑deploy cycle from multiple days to mere hours.

1. Bootstrapping Model Development in an Adversarial Space

When you’re detecting malicious content at scale, the first challenge isn’t infrastructure; it’s signal. You need labeled data to train models, but in adversarial environments like ours, that data is sparse, messy, and constantly evolving. The ground truth is often unclear. Threat actors don’t announce themselves, and most URLs on the internet are benign noise.

Weak signals, strong patterns

Our earliest models were trained on a combination of:

  • Confirmed takedowns from real impersonation cases
  • Manual reviews by our security operations team
  • Heuristics encoded from subject-matter expertise — things like measuring keyword similarity to official brand names, computing domain string entropy, and flagging inconsistencies between page content and the claimed brand.
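
To make the last of those heuristics concrete, here is a minimal sketch of domain string (Shannon) entropy; the function name and example domains are illustrative, not taken from our production code.

```python
import math
from collections import Counter

def domain_entropy(domain: str) -> float:
    """Shannon entropy (bits per character) of a domain string.

    Algorithmically generated phishing domains tend to score higher
    than ordinary brand domains.
    """
    domain = domain.lower().strip(".")
    if not domain:
        return 0.0
    counts = Counter(domain)
    total = len(domain)
    return -sum((n / total) * math.log2(n / total) for n in counts.values())

print(domain_entropy("acme.com"))                    # relatively low
print(domain_entropy("x93-acme1-secure-login.net"))  # noticeably higher
```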

These signals were noisy, but directional. We built labeling pipelines that could aggregate weak supervision at scale, and prioritized models that could help rank content for human review, not make binary decisions.

Biases in the data, and how we handled them

Our initial labeled sets risked overfitting to high‑confidence edge cases. To combat this, we built a suite of data build tool (dbt) models in our data warehouse that:

  • Codify label sources: We formalize derived labels (e.g., heuristic flags) alongside human review annotations, and apply consistent transformation logic so every signal follows the same lineage.
  • Unify external and internal datasets: We ingest third‑party threat feeds and internal security‑ops tags, then merge and reconcile these sources into a single, versioned training table.

Whenever our security team flags a benign domain as suspicious, we capture that correction in dbt and send it through the exact same pipelines as our other labels. This way, every false positive becomes part of our training data, keeping our models grounded in real‑world feedback.
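
The label logic itself lives in dbt SQL models in our warehouse; purely as a sketch of the same reconciliation idea, here is the equivalent expressed in Python/pandas, with illustrative column names and source priorities.

```python
import pandas as pd

def build_training_table(*sources: pd.DataFrame) -> pd.DataFrame:
    """Merge label sources into one table: human review outranks automated
    signals, and within the same source the newest label wins."""
    labels = pd.concat(sources, ignore_index=True)
    priority = {"heuristic": 0, "threat_feed": 1, "secops_review": 2}
    labels["priority"] = labels["source"].map(priority)
    labels = labels.sort_values(["priority", "labeled_at"])
    return labels.drop_duplicates(subset="url", keep="last").drop(columns="priority")

# A security-ops correction of a heuristic false positive wins:
heuristic = pd.DataFrame(
    {"url": ["login-acme.example"], "label": [1], "source": ["heuristic"],
     "labeled_at": ["2025-01-01"]}
)
secops = pd.DataFrame(
    {"url": ["login-acme.example"], "label": [0], "source": ["secops_review"],
     "labeled_at": ["2025-01-03"]}
)
print(build_training_table(heuristic, secops))  # the label 0 (benign) row survives
```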

Optimizing for learning velocity

We didn’t optimize for model performance out of the gate. We optimized for learning velocity — how quickly we could train, evaluate, ship, and get feedback. That meant:

  • Detailed, versioned datasets with rich label metadata: we tag every training example with standardized metadata, like label source, timestamp, and labeler, so it’s easy to trace exactly how each label was generated and updated over time.
  • Lightweight model experimentation with reproducible notebooks and metrics
  • Tight integration with our takedown system to close the loop between predictions and outcomes

This early investment let us move fast without losing track of what was working, and gave us a foundation we could confidently scale on top of.

2. Serving Real-Time ML at Scale

Model training is one challenge—deploying inference in production at scale is another entirely. Our serving infrastructure must process a continuous stream of 100 million+ URL checks per day, maintaining sub‑100 ms P99 latency under bursty traffic. In an adversarial context, a single false negative lets a phishing site slip through undetected, while thousands of false positives per second would drown our SOC team in noise and degrade our signal‑to‑noise ratio.

We needed a serving stack that was:

  • Low-latency: supporting both bulk‑style and point‑based feature generation and inference in real time
  • High-throughput: tens of millions of predictions daily, with peak hour spikes
  • Auditable and debuggable: we had to be able to explain predictions to ourselves and to customers

Chalk as our feature store

At the core of our real-time serving is Chalk — a real-time feature platform we use to compute and serve features dynamically based on the latest web content. The features we define are “resolved” by Python-native resolver functions that compute everything from domain string features (e.g., entropy, brand overlap, token patterns) to page-level metadata extracted from crawled content.

This pattern enables us to write features which are:

  • Versioned and testable
  • Composable into higher-level features
  • Servable at request time or in batch depending on use case

This lets us reuse the same logic in both training and production, reducing drift and improving reproducibility.

Productionizing model inference

Model inference is orchestrated within Chalk: Each model consumes raw feature primitives—domain strings, extracted HTML, metadata, and upstream feature outputs—and emits its predictions as first‑class features. This lets us treat model scores just like any other resolver, composing them seamlessly into downstream workflows.

As an illustrative example, imagine we had a general phishing detection model that consumes three intuitive signals to gauge phishing risk:

  • num_login_forms: The count of login forms on the page—more forms can indicate an attempt to harvest credentials.
  • has_suspicious_language: A boolean flag indicating whether the page’s content language appears on a list of suspicious languages.
  • external_reputation_score: A third‑party trust metric, where lower scores signal riskier domains.


We could first spell out those features in code in a feature class like so:
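
A minimal sketch of that feature class, assuming Chalk’s Python @features decorator, might look like the following; the id field and exact field names are illustrative choices rather than our production definitions.

```python
from chalk.features import features

@features
class Url:
    # The feature named "id" acts as the primary key for the class.
    id: str
    # Signals described above.
    num_login_forms: int
    has_suspicious_language: bool
    external_reputation_score: float
    # Populated by the phishing-model resolver shown below.
    phishing_probability: float
```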

We could then write a resolver which passes these primitives to our phishing detection service, which returns a single floating‑point phishing_probability between 0.0 (safe) and 1.0 (definitely phishing). That value is exposed as Url.phishing_probability, making it seamlessly available for any downstream resolver or workflow.
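
As a companion sketch, assuming Chalk’s @online resolver decorator, the resolver might look like this; the Cloud Run endpoint URL and the response field name are hypothetical placeholders, not our actual service.

```python
import requests
from chalk import online

# Hypothetical Cloud Run endpoint for the containerized phishing model.
PHISHING_MODEL_URL = "https://phishing-model.example.run.app/predict"

@online
def resolve_phishing_probability(
    num_login_forms: Url.num_login_forms,
    has_suspicious_language: Url.has_suspicious_language,
    external_reputation_score: Url.external_reputation_score,
) -> Url.phishing_probability:
    """Send the feature primitives to the model service and expose the
    returned score as the Url.phishing_probability feature."""
    response = requests.post(
        PHISHING_MODEL_URL,
        json={
            "num_login_forms": num_login_forms,
            "has_suspicious_language": has_suspicious_language,
            "external_reputation_score": external_reputation_score,
        },
        timeout=1.0,
    )
    response.raise_for_status()
    return response.json()["phishing_probability"]
```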

Behind the scenes, we package each model and its dependencies into custom Docker containers and serve them via lightweight Cloud Run services. This serverless approach keeps inference modular and testable, and allows us to scale, version, and monitor each model independently, while keeping feature logic and model orchestration centralized in Chalk.

To meet real‑time latency targets, we optimize along three key dimensions:

  • Raw input caching: Store crawled HTML in the online feature store with a TTL to eliminate redundant fetches and parsing.
  • Entity‑level feature cache: Precompute and cache feature vectors for customer‑related entities to slash per‑request computation.
  • Prediction caching: Cache model outputs for frequently seen URLs to bypass repeat inference.
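
To illustrate just the prediction-caching idea, here is a minimal in-process TTL cache sketch; the class and parameter names are ours, and in production this role is played by the online feature store rather than process-local memory.

```python
import time
from typing import Callable

class TTLPredictionCache:
    """Minimal in-process TTL cache for model scores keyed by URL."""

    def __init__(self, ttl_seconds: float = 300.0) -> None:
        self.ttl = ttl_seconds
        self._store: dict[str, tuple[float, float]] = {}  # url -> (score, expires_at)

    def get_or_score(self, url: str, score_fn: Callable[[str], float]) -> float:
        now = time.monotonic()
        cached = self._store.get(url)
        if cached is not None and cached[1] > now:
            return cached[0]                    # fresh hit: skip inference entirely
        score = score_fn(url)                   # miss or expired: run the model
        self._store[url] = (score, now + self.ttl)
        return score

# Usage: cache.get_or_score("https://login-acme.example", model.predict)
```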

Built‑in observability spans schema enforcement, performance telemetry, and end‑to‑end traceability:

  • Input/output contracts: Enforce schemas with Pydantic to catch data mismatches before inference.
  • Metrics tracking: Surface latency, throughput, and error rates via Cloud Run dashboards and alerts.
  • Audit logs: Persist full feature snapshots and model version metadata per prediction for compliance and post‑hoc analysis.
  • Inference lineage: Leverage Chalk’s query planning DAG to reconstruct the exact computation graph for every score—vital for debugging, validation, and customer audits.
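
As one concrete example of those input/output contracts, a sketch of Pydantic request and response schemas for the illustrative phishing model above might look like this; the score bounds and the version field are assumptions layered on top of that example.

```python
from pydantic import BaseModel, Field

class PhishingModelRequest(BaseModel):
    """Validated before a request reaches the model container."""
    url: str
    num_login_forms: int = Field(ge=0)
    has_suspicious_language: bool
    external_reputation_score: float

class PhishingModelResponse(BaseModel):
    """Validated on the model's output before it becomes a feature."""
    phishing_probability: float = Field(ge=0.0, le=1.0)
    version: str

# A malformed payload fails loudly here instead of silently skewing predictions:
PhishingModelRequest(
    url="https://login-acme.example",
    num_login_forms=2,
    has_suspicious_language=False,
    external_reputation_score=0.7,
)
```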

Figure 2: High‑level real‑time serving architecture where multiple detection workloads (A, B, C) funnel through Chalk’s feature platform for on‑the‑fly feature generation, and individual models (A, B, C) are deployed as containerized Cloud Run services to provide low‑latency inference back into the feature store.

3. Lessons from Running ML in Production

Owning our ML stack end‑to‑end revealed critical engineering insights that map directly back to the challenges we tackled:

  • Centralized label management with dbt: Relying on ad hoc CSVs and manual tagging led to brittle, non‑reproducible training sets. By codifying label logic in dbt—merging external threat feeds, security‑ops annotations, and false‑positive flags into a single versioned table—we keep our training data in lockstep with evolving attacker behaviors.
  • Treat features as first‑class, versioned artifacts: Divergent feature definitions across teams spawned training‑serving skew and hard‑to‑debug errors. Centralizing all feature resolvers in one platform, versioning them, and sharing the same code paths for batch and real‑time compute eliminated drift and made model behavior predictable.
  • Layered caching + containerized inference for sub‑100 ms P99: Real‑time detection at 100M+ URL checks/day demanded more than raw compute power. Our layers of caching combined with custom Docker containers on Cloud Run smashed redundant work, drove down tail latency, and kept costs under control.
  • Shift‑left validation & end‑to‑end observability: Late‑stage surprises—schema mismatches, silent feature drift, or performance regressions—are unacceptable in adversarial settings. We baked Pydantic schema checks, dataset‑drift alerts, and “shadow” inference tests into our CI/CD pipelines, and log every inference with full feature snapshots and model metadata to BigQuery for instant traceability.

These practices have transformed Doppel’s ML platform from a collection of one‑off scripts into a robust, scalable ecosystem—empowering engineers to safely ship and operate new models at internet scale.

We’re Hiring

Interested in pushing the boundaries of AI applications in cybersecurity? We’re hiring—let’s build the future of Social Engineering Defense.
