Description
Doppel leverages the latest OpenAI models to automate key SOC workflows, detecting and taking down social engineering attacks faster than traditional methods. By combining AI-driven automation with human oversight, Doppel has transformed threat response while reducing operational burden.
Watch this webinar with Doppel’s CTO and Co-Founder, Rahul Madduluri, OpenAI’s Théophile Sautory, and Doppel's Software Engineer Kiran Arimilli to understand how Doppel is using OpenAI’s latest innovation around reinforcement fine-tuning.
We’ll cover recent results including:
- 80% reduction in overall SOC workloads
- 3× higher threat-handling capacity with faster response times
- Scalable operations enabling rapid growth and consistent performance across clients
And Doppel will discuss:
- How Doppel fine-tunes AI models for low-fault-tolerance security environments
- How human analysts coach and refine AI systems to maximize reliability
- How autonomous detection and response can accelerate SOC efficiency and business impact
- Our roadmap toward a fully autonomous SOC, capable of identifying, triaging, and neutralizing threats at internet scale
Learn more here: www.doppel.com/blog/minutes-not-hours-what-our-openai-case-study-proves-about-social-engineering-defense
See OpenAI's case study here: openai.com/index/doppel/
Transcript
Hey everyone, and welcome. Really happy to have you all here, and we're excited to get started. I'm Rahul, co-founder and CTO of Doppel, and we're thrilled to have you all here for our webinar, Scaling Threat Response with OpenAI: Doppel's Impact on the Modern SOC. Before we get started, just a couple of quick housekeeping notes. Today's session is being recorded, and you will all get a replay when it's over. To keep things smooth, all mics will be muted. And lastly, if you have any questions, feel free to drop them in the Q&A section throughout the presentation, and we'll be more than happy to answer them all towards the end. So, with that, let's kick things off.
Aside from me, we have two amazing special guests that we're fortunate to have here. One is Theo. Theo joins us from OpenAI, where he works on the Applied AI team as an engineer; more specifically, he worked on the reinforcement fine-tuning API that we'll be talking about in more depth shortly. Joining him is Kiran from Doppel. Kiran has worked on a lot of our AI security work for quite a while and has done everything with regard to setting up the agent with the OpenAI team to reach a superhuman level of performance.
We'll go into more detail on all of that pretty shortly. But thank you both for joining us today.
So, before we get into some of the LLM work that we did and the reinforcement fine-tuning, I just want to give you all a quick overview of Doppel. For those of you who are not super well acquainted with what Doppel is, we're the social engineering defense company. There's a lot I can go into as far as how it works and what products we sell, but before that, I just want to talk a little bit about what makes Doppel special. Social engineering is obviously a big problem, but I think the unique thing about Doppel is that the way we've been approaching solving it is quite different from the way it's been solved in the past. For one, we've gone from a single-channel world to a multi-channel world. A lot of these
threats used to be primarily focused on websites and domains, and in the post-AI world, a lot of these attacks are coming in a multi-channel format. So instead of just a single website happening in isolation, that website is now connected to social media accounts and mobile apps. You can think of a Facebook group that links out to a Telegram chat, which then links to the website. So being able to have visibility across all these channels, across the entire internet, is a completely different ballgame, and it's something that Doppel has done pretty much since day one. The second thing is graph-driven intelligence. Like I said, a lot of these attacks used
to be done in isolation on the attacker side: they would spin up a website, we'd find the website on the defense side and take it down, and you'd have this whack-a-mole kind of situation. One of the biggest innovations that we brought to the market is taking a graph-based approach to this. Instead of just finding these one at a time, what we're doing is mapping out the entire attacker infrastructure. So in the example I just gave, instead of just finding that website, we would find the website, the social media accounts, the Telegram groups, everything attached to it, and take it all down together. It improves our ability to find things that would be difficult to find if you just did a very traditional crawl, and it also improves our ability to do as
much damage as possible to the attacker infrastructure by taking it all down as a group. And lastly, what we're all here for today, and what everyone's super excited about, is the agentic AI automation. So instead of finding something and having analysts triage it to another set of analysts, who then triage it to another set of analysts who make a final decision, with this whole process taking days or even hours, what we've been able to build is an end-to-end automated system that can go from identifying a new attack to remediating it within minutes. More to come shortly on that, but that's been one of the biggest differentiators for us versus our competitors.
So let's talk about what we actually do here. There are several product lines, but I'd like to divide them into two big categories. On one side, we go out and find all these social engineering threats and take them down for you. You can think of this as a detection engine across the internet for social engineering attacks targeting your brand, your employees, your execs, and any other stakeholders. The second big category I would point to is the ability to simulate attacks. So detect on one side, simulate on the other. We're not trying to do damage to people; it's more that we're trying to
help companies assess risk. So if I can get code access from your company, what is that risk worth to you as a business in terms of making sure that your security posture is up to date? And so we're two sides of the coin: we're your blue team, and we're your red team as well.
So, taking a little bit of a step back: why did we even start this company in the first place? Obviously AI has become a big part of public discourse, and more recently a really big part of the security world's discourse as well.
The framing Kevin and I came from was that we were seeing LLMs radically improve the ability to deceive people online, back around 2022, and since then it's only gotten better extremely quickly. We viewed it as one of the biggest problems in the post-AI world: the ability to create just about infinite deception at scale, for almost no cost on the attacker side, was something that we thought would radically change the attack landscape, and it's the reason Doppel exists today.
And how that's actually translated into the cyber world is that businesses are being infiltrated through social engineering more often than ever. Organizations like Scattered Spider are increasingly effective, and the vast majority of breaches continue to come via human error. That's been the case for quite a while, and it's only become a bigger problem in the post-AI world, and from Kevin's and my perspective, there wasn't enough being done on the defense side to account for it.
So, to give a little more color there: what worked before isn't as effective now. I touched on several of these points, but for one, you can't be single-channel, you need to be multi-channel. You need to take a more sophisticated, graph-driven approach to solving this problem instead of playing whack-a-mole. And in terms of speed, the volume of attacks is so high now that if you just try to rely on human analysts, even if they're experts, you simply won't have the number of analysts to account for the number of attacks that are happening. It's a simple toy example, just supply and demand, right?
If you have 100 analysts and they're dealing with a thousand attacks per day, what happens when you get to 10,000 attacks per day? Do you hire 10 times as many analysts? Do you just slow down? Do you grind to a halt and take things down over weeks instead of days or hours? You really have to use AI. It's not an option anymore.
It's now a requirement to be able to solve the problem at scale. As far as how the system actually works, we cover a variety of attack surfaces, as you're all familiar with: fake LinkedIn accounts, paid advertisements, SEO poisoning, phishing attacks. Everyone on this call is probably getting texts or phone calls that you weren't getting in the past, and we're now entering a world where those phone calls are as realistic as real ones; that's a whole separate topic to get into. But the way our system works is that we have an AI crawler that's constantly finding and crawling all this public information out there. As soon as we find any
evidence of a potential attack, the first thing we do is crawl to see what is connected to that attack. The next thing we do is score and validate all these attacks: is it likely to be a threat, based on all the threats we've ever seen? That comes from our machine learning models, which have been fine-tuned on thousands and thousands of decisions made in the past. At that point, we still have a human in the loop for some of the alerts, and we make sure that our analysts are focusing on the ones that are least obvious. For the most obvious ones, we don't need oversight. But for things that are not obvious, we want as much human oversight as possible to train the system to get better. The goal is
for the system to get better and smarter with every new attack that enters the system. And lastly, we use agentic AI to identify the potential attacks, to decide what the remediation plan is going to be, and to ultimately send the takedown request that gets the content removed from whatever platform is hosting it.
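To give a rough mental model of the routing step just described, here is a minimal, hypothetical Python sketch. It is not Doppel's actual implementation; the field names, threshold, and routing labels are illustrative assumptions meant only to show the shape of a confidence-based split between automated takedown, dismissal, and human review.

```python
from dataclasses import dataclass, field

@dataclass
class Finding:
    url: str
    linked_assets: list[str] = field(default_factory=list)  # social accounts, chat groups, apps
    score: float = 0.0  # model confidence that this is a real threat

def triage(finding: Finding, auto_threshold: float = 0.99) -> str:
    """Route a finding: obvious threats go straight to automated takedown,
    obvious non-threats are dismissed, and ambiguous ones go to an analyst
    whose decision can feed back into training."""
    if finding.score >= auto_threshold:
        return "auto_takedown"   # agentic remediation path
    if finding.score <= 1 - auto_threshold:
        return "dismiss"
    return "human_review"        # least-obvious alerts get human oversight

print(triage(Finding(url="login-acme-support.example", score=0.997)))  # auto_takedown
print(triage(Finding(url="acme-fan-page.example", score=0.60)))        # human_review
```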
So lastly, we really appreciate all of our partners and customers. They have been an extremely valuable asset in terms of improving our product, and we're also just really grateful to work with such incredible companies across pretty much every industry: finance, tech, defense, oil and gas, retail, travel, pretty much every industry you can think of, we're working with them. The reason I mention that is because each industry has a slightly different variety of attacks, and by having customers across all these industries, the great thing is that we get to learn from them. So if
there's a huge number of attacks targeting one airline, another airline will have a very similar variety of attacks. It's very important to get a little bit of exposure to all of them in order to build the broadest system you possibly can and to learn in real time.
And that brings us to the main topic for today, which is the OpenAI case study we did with them, and explaining the work we did behind the scenes in order to achieve some of the outcomes that we've been really proud of. We published a case study with OpenAI just recently. It was written by the great writers at OpenAI, and it explores some of the work we've done both with GPT-5 and prompt engineering, but also with the reinforcement fine-tuning API. That's what Theo here is an expert in, and we'll go into it in more depth, but as a result of using these tools, we've been able to achieve superhuman performance that exceeds even what some of our best analysts were able to do
independently. Quickly stepping back to give you the historical trajectory of how LLMs have radically improved our system: this feels like it was 20 or 30 years ago, but it was only earlier this year, and it's almost hard to believe that's true. We introduced LLMs into our system to replace some of the key workflows at the L1 analyst layer. We have L1, L2, and L3 analysts, and our L1 analysts were doing a lot of the work of looking through the sea of noise and trying to identify the threats that were most relevant to them. By introducing LLMs, we were able to automate the entire L1 analyst workflow within one month. And that was extremely
powerful. It allowed us to get 30% more efficient. More important than anything, it proved that you could bring LLMs into the SOC and actually get the kind of gains that many people at the time were mostly talking about or theorizing. It was pretty significant for us and for the cybersecurity industry as a whole, and OpenAI showcased it at their Dev Day earlier this year.
And with that, we'd love to jump into talking about what reinforcement fine-tuning is, at a very high level. I'll hand it off to Theo to talk about that.
>> Yeah, thanks a lot, Rahul. So, reinforcement fine-tuning: I guess many of you are familiar with supervised fine-tuning, which is a technique to change the weights of a model so that it better matches your distribution of data and performs better on your task. Reinforcement fine-tuning is another form of fine-tuning where, again, we're going to change the weights of a model, but here we want to change the weights of a reasoning model. So instead of giving a full solution and full output to the model and trying to get it to learn every single token, here we're trying to get the model to be better at providing the
final answer. Given the task and the prompt, you let the model reason, and what you're actually grading the model on, and training the model on, is this final answer. You can see on the right-hand side we have a schematic, with the data on the left-hand side and the grader on the right-hand side. The data is the input, and the grader is where you compare the output of the model with the reference answer. And so when you do this, you're actually training the model to take a reasoning path that increases its performance on a specific task, but you're not telling it how to change the reasoning. That just happens during
the training process, when the model explores different traces and different trajectories to get to higher rewards. So what essentially happens is that the model starts to reinforce the good reasoning patterns, and it starts reasoning maybe slightly more like an expert in that field to achieve better results. And because the supervision happens only at the very end, you can actually work with tens or hundreds of samples, not necessarily thousands, because the model will naturally explore so many different trajectories that you'll be comparing some good ones and some bad ones throughout, and that provides a lot of
signal per sample. So it's a very efficient and very powerful technique to improve reasoning models, and that's what we've been doing with Doppel.
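To make the data-plus-grader picture concrete, here is a minimal Python sketch. The JSONL-style layout, the threat label, and the `grade` function are illustrative assumptions, not OpenAI's exact RFT schema or Doppel's data; the point is simply that only the final answer is scored against a reference while the reasoning is left free.

```python
import json

# One training sample: an input prompt plus a reference final answer.
# (Hypothetical layout and label, for illustration only.)
sample = {
    "messages": [
        {"role": "user",
         "content": "Classify this alert: newly registered domain login-acme-support.example"}
    ],
    "reference_answer": "credential_phishing",
}

def grade(model_final_answer: str, reference_answer: str) -> float:
    """Reward in [0, 1] computed from the final answer only;
    the model's intermediate reasoning is never graded directly."""
    return 1.0 if model_final_answer.strip() == reference_answer else 0.0

# During RFT, many reasoning trajectories are sampled per prompt, and the
# trajectories whose final answers earn higher rewards are reinforced.
print(grade("credential_phishing", sample["reference_answer"]))  # 1.0
print(json.dumps(sample, indent=2))
```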
>> Kiran, do you want to talk about Doppel's use of RFT? >> Yeah, definitely. So on our side, we're always trying to find the best tool to solve the problem we're trying to address.
In our case here, we're really trying to improve our model's ability to predict certain threat types within a group of data that we have, with the goal of unlocking end-to-end automatic mitigation for certain threats. Early on in our journey, as Rahul mentioned earlier, we did a lot of really good work with the models that OpenAI has made publicly available in production for people to use. That was a lot of prompt engineering work, and we saw some massive gains. But when it came to automatically making decisions on mitigating threats, we wanted to have a really high level of
precision in some of those threat-type classifications. We were one of the early adopters and users of RFT, and when we were doing that, we understood the potential there. So when we came back around to trying to figure out how to make our precision super high, we felt that trying RFT for that use case was a no-brainer for us. The problem we brought forward, and what really helped us, was structuring it as a multi-class classification problem: asking the model to choose certain
threat types based on a given set of data. Then, when designing our grader, which Theo mentioned is how you score the model's output, we basically graded it on accuracy and allowed for partial credit. That was the setup we put together, and it's what allowed it to work well for us.
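As one way to picture the accuracy-with-partial-credit grading Kiran describes, here is a hedged Python sketch. The threat-type labels and the exact scoring rule are illustrative assumptions rather than Doppel's production grader; the idea is just that overlapping label sets earn a score between 0 and 1 instead of all-or-nothing.

```python
def partial_credit_grade(predicted: list[str], reference: list[str]) -> float:
    """Score predicted threat types against the reference set: full credit (1.0)
    for an exact match, partial credit for overlap, 0.0 when nothing matches."""
    pred, ref = set(predicted), set(reference)
    if not ref:
        return 1.0 if not pred else 0.0
    correct = len(pred & ref)    # labels the model got right
    spurious = len(pred - ref)   # labels it should not have chosen
    # Reward correct picks, lightly penalize spurious ones, clamp to [0, 1].
    return max(0.0, min(1.0, (correct - 0.5 * spurious) / len(ref)))

# Example: one of two threat types correct, plus one spurious label.
print(partial_credit_grade(
    predicted=["credential_phishing", "brand_impersonation"],
    reference=["credential_phishing", "executive_impersonation"],
))  # -> 0.25
```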
And just to add a little more color on this: we're all in security, and I think we all know how important it is to minimize false positives and false negatives. With many of the products you look to buy, those are probably some of the top things that come to mind, and vendors will constantly say, "We have the fewest false positives because we use AI," or "We have the fewest false negatives because we use AI." I think it really comes down to the last 0.1% of improvement. That's where you can really get superhuman performance, because if you can't get that, sure, it might cost the company less money to use AI, but you're going to get worse outcomes than having analysts who
are experts in their field and have been doing this for 10 years. So it's really important to actually be able to match that level of performance; otherwise it's just a cost-saving measure. The reason we used RFT is because RFT is particularly good at getting that last mile of accuracy:
being able to minimize false positives and false negatives. And that's the main reason this is such a huge win for us as a company.
So yeah, I just want to talk about best practices learned along the way as well. >> Yeah, I can start, and maybe Kiran can add if he has something on this, but I think there are many best practices that we learned together.
The first one is that when you work with RFT, as we mentioned, you can work with very small datasets. So the quality of the dataset is extremely critical, and therefore you want to spend some time building and sourcing those 50, 100, or 1,000 samples. Here, because we were looking at a classification task, one part of it was naturally to work on label imbalance and make sure the samples were representative of some of the easy but also some of the hard cases that Doppel was facing. A lot of work goes into this. The second part is that because you're fine-tuning a model, you have to take the gradient through
all your input tokens, and this takes quite some time. So what we did together was also work on the size of the context, reducing it not only so training would be faster but also so inference would be faster later on, so that when you use it in production you get results faster as well. The third part was designing a business-aligned grader. Kiran mentioned that we transformed this into a multi-class classification. Initially it was a list of classes with a boolean for each of them, but that does not work particularly well,
in particular with RFT, which really leverages the semantic information in all the classes it has to predict. So by transforming this boolean prediction over multiple classes into a single multi-class classification task, we got much better results. And the last one is to iterate. Of course, we're working with a stochastic process where we don't have full control over what's going to happen, so we had to do a lot of iteration on hyperparameters, which can be batch size or number of epochs, but also on stacking different runs. You can actually start an RFT run with
high reasoning effort to explore more solutions and find better reasoning patterns, and then stack another run with lower reasoning effort so that you reduce the latency and the number of tokens consumed by the model. By stacking those two, you can have the model find very deep and good reasoning patterns initially and then make them more efficient. So those are a lot of the best practices that we learned, and from our side it was also great to do this with Doppel, learn from it, and design some of those best practices together.
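As a concrete illustration of the first best practice, curating a small but balanced training set, here is a hedged Python sketch. The label names, counts, and per-label cap are hypothetical; it only shows one simple way to keep rare, often harder classes from being drowned out when assembling 50 to a few hundred samples.

```python
import random
from collections import defaultdict

def balanced_sample(examples: list[dict], per_label: int, seed: int = 0) -> list[dict]:
    """Pick up to `per_label` examples of each threat type so that rare
    classes are not drowned out by the common ones."""
    random.seed(seed)
    by_label: dict[str, list[dict]] = defaultdict(list)
    for ex in examples:
        by_label[ex["label"]].append(ex)
    picked = []
    for group in by_label.values():
        random.shuffle(group)
        picked.extend(group[:per_label])
    random.shuffle(picked)
    return picked

# Example: a raw feed dominated by one class, trimmed to a small balanced set.
raw = [{"label": "credential_phishing", "text": "..."}] * 400 + \
      [{"label": "executive_impersonation", "text": "..."}] * 12
train_set = balanced_sample(raw, per_label=25)
print(len(train_set))  # 25 + 12 = 37 samples, far more balanced than the raw feed
```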
Yeah, I think Theo basically hit on everything here, but the biggest thing for me that was really helpful was starting to understand the strategy behind how you should organize your runs, like what Theo was talking about just now, where you can stack different runs. And the hyperparameter tuning: once we had the problem set up, tweaking that was how we got to the final model that we ended up using. Overall, we definitely learned a lot from the whole thing, and it was really powerful for us.
Yeah. So this is just one graph, an example of how we saw improvement. This is essentially the graph that gets generated when you run RFT on your dataset. The green is a per-step reward, and the purple is the reward on your validation set, your holdout set; at every grouping of steps it calculates the reward. So you can see the line basically goes up,
and that's a good thing for us. Really, what I wanted to hit on here is what Rahul was saying earlier:
we found that we got pretty good results out of just using OpenAI's production models, but it wasn't good enough to make any automated decisions on.
And with RFT, as you can see, it really jumps performance up for the specific task we wanted to take care of. Within this, you can break it down even further. OpenAI gives you a lot of visibility into performance at a more granular level, where you can see, per classification category, how did we do? Were we really good at one? Were we not so good at another? What we learned from that is there are a couple of categories where we were really able to get to that 99th-percentile
level, and that's what was really exciting for us and what allowed us to take a lot of steps after that to really improve things. So going forward: this obviously worked really well for our domains product, which covers websites, and we want to expand it across all of our product verticals and make our categorization a lot more granular. In our first run and the few runs after that, we had much larger categories; we want to break those down, and we feel that will
be effective. We also retrain with a specific size of training set, so we want to experiment with that. And then ultimately we just want to automate this whole loop. Right now it is manual: when we retrain the model, people take a look and see how it's doing. Down the line, we want this to be an automated process for us, and that would be really exciting.
>> Cool. So now we'd love to shift to a Q&A. We'll start with a couple of questions and then open it up to the floor for any questions the audience has. I'll start with a quick one on my end. Theo, who should use RFT, and when does it make sense?
>> Yeah, thanks for the question. I think I'll start with when it makes sense. If you're a company building an LLM app or using LLMs to solve a task, and you've tried all the frontier models and they work somewhat, they work a little bit, but they don't work well enough for a particular use case, then if you really want to push the performance, you would start looking at RFT. You need to have a way of grading the output that is aligned with the domain specialists or experts in your field. That will
require a bit of work, and if the solution is a bit ambiguous, then you will have to spend more time on building that grader to make sure you provide the right signal to the model. So that's when you would do it. As for who would do it: if you're working in a very niche domain, in particular like cyber, I mean security here, where the models are exposed to the high-level concepts but sometimes lack the level of accuracy, precision, and recall that you need for a production use case, then this is a really good opportunity to use RFT, because you're interested in those last three, four, five percentage points of performance
for your particular task. So if you're in any of those very niche domains, I think it's a great opportunity for you.
>> Makes a lot of sense. >> Yeah, and then I have a quick one for you, Rahul. Obviously, the impact of all of these changes we've made, the prompt engineering and RFT, led to this 80% reduction in SOC workload, and it's really allowed us to decrease our takedown and mitigation timelines, especially from weeks to days.
What is the tangible impact of that automation for security leaders?
>> Yeah, ultimately the impact is felt in how quickly we're finding things and taking them down. In the previous world, when we had humans in the loop for every single website that came in, even if our system was set up perfectly, it would still take hours before we could mitigate an attack. After launching these new models with RFT and getting to that 99.9% accuracy, we've been able to get mitigation times down to 10 to 15 minutes. So we discover something that just spun up, and 15 minutes later it's mitigated and no longer accessible by the majority of people.
And if you think about this from the attacker perspective, it's a huge disincentive to go after your business. If you're spinning something up and it's coming down as fast as it's going up, you're probably just going to shift your attention to something else; it's probably not worth your time at that point. So this is one of the biggest improvements we've made to our system, and to the industry, to date. We're pretty proud of this work, and I think it's a huge reason for attackers to stop bothering with your business and think about other targets that are more worth their time.
Cool. We'd love to shift it to the audience. I see that there are some questions in the chat, so I'm just going to open it up.
Okay. The question is: can you share more about Doppel's ability to protect against deepfakes? This is a great question, one we get from customers all the time, and the good news is that Doppel already does protect against deepfakes. The system is constantly scanning not just images or text; we're also scanning videos across all the major platforms, TikTok, Instagram Reels, pretty much every platform you can think of, and we're finding things we think are impersonations and taking them down.
So on the detection side, we've been detecting deepfakes and taking them down successfully for quite a while now. And on the other side, the simulation side, we're now generating our own deepfakes as well to test and train your employee force. One thing we do is record the voice of many of your executives or people on your team, and then we can call people on your team in that voice in order to convince them to give up code access, click a link, or make a payment. That's been extremely effective, and it's been very eye-opening for a lot of organizations who felt they were very protected but, once they experienced
these kinds of attacks, realized they needed another layer of visibility. Cool. Just looking through; if you have any other questions, feel free to post.
I don't see any open questions at the moment, though. Oh, there we go. Whitney says: I had a great conversation about alignment with the healthcare market for Doppel. Where do you see your value proposition for large healthcare orgs, and how would you use domain expertise from a clinician for that market? So, like I mentioned earlier, one of the things that's really important is to have many customers within every vertical, because there's so much overlap in the types of attacks that are used. We have a number of healthcare companies that we
work with very closely, and those companies see attacks that very often go after their patients. So we look for the types of attacks that are going after those patients, we try to fingerprint those kinds of attacks, and then when we work with a new healthcare company we use those same exact methods. We've found that to be very effective.
Something else is that digital health is becoming a bigger and bigger thing, so healthcare companies are trying to reach their customers in different ways than they did in the past. Maybe it's mobile apps, or maybe patients get to talk to an AI, which is something that's recently come up, and I think it's extra important to have this kind of protection and visibility in that world, because people give a lot of very sensitive information to their healthcare providers. They trust them a lot more than they would almost anyone else, so the cost of impersonation is very, very high. Just a toy example, but let's say someone's interacting with a
healthcare system and they go to a fake link, and the link has some sort of chat interface, and then they give a lot of their personal medical information away.
That's a pretty bad scenario, and we want to make sure we can minimize the likelihood that that event occurs. Oh, okay. There are some additional questions here.
So the question is: what LLM models are you using for Doppel, and did you perform any benchmarking of different LLM models? Kiran, do you want to take that?
Yeah, definitely. Right now we use a mix of GPT-5.1 and o4-mini. When we did the RFT exercise, we did it on o4-mini.
That's the model that OpenAI has made available for RFT, and a fun thing about that is that we did this before GPT-5 came out. So we had our benchmarks and so on, and then when GPT-5 came out, I thought, okay, did this just blow our fine-tuned model out of the water? So we ran a bunch of evals at different levels of reasoning and all sorts of things, and it turned out that across the board, no, the fine-tuned model still performed better for that particular task, which I thought was pretty cool.
Yeah, this is a really important point, and it's really relevant to Theo's answer on when you should use RFT versus not for our use case. Just to reiterate Kiran's point: the older model outperformed the newer model when we used RFT. For most tasks that will not be true, but for the tasks where it is true, those are the situations where RFT really makes a big difference.
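For readers who want to run this kind of check themselves, here is a hedged sketch of an eval loop along the lines Kiran describes: the same held-out set is run through several models at different reasoning-effort levels and accuracy is compared. The model IDs, the fine-tuned model name, the `load_holdout_set` helper, and the assumption that every model used here accepts `reasoning_effort` via Chat Completions are placeholders, not a confirmed recipe.

```python
from openai import OpenAI

client = OpenAI()

def classify(model: str, alert_text: str, effort: str = "medium") -> str:
    """Ask a model for a single threat-type label for one alert."""
    resp = client.chat.completions.create(
        model=model,
        reasoning_effort=effort,  # assumption: the chosen model supports this parameter
        messages=[
            {"role": "system", "content": "Reply with exactly one threat-type label."},
            {"role": "user", "content": alert_text},
        ],
    )
    return resp.choices[0].message.content.strip()

def accuracy(model: str, holdout: list[dict], effort: str) -> float:
    """Fraction of held-out alerts whose predicted label matches the reference."""
    correct = sum(classify(model, ex["text"], effort) == ex["label"] for ex in holdout)
    return correct / len(holdout)

# holdout = load_holdout_set()  # hypothetical helper returning [{"text": ..., "label": ...}]
# for model in ["gpt-5", "ft:o4-mini:your-org::example"]:  # placeholder model IDs
#     for effort in ["low", "medium", "high"]:
#         print(model, effort, round(accuracy(model, holdout, effort), 3))
```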
Cool, we have another question: can you describe in more detail the red team simulations you can create? Yeah, so we're able to generate simulations across every channel. It could be a phone call, an email, a text message, a Telegram message, or a Signal message.
We're able to generate messages in whatever the attack medium is and reach your employees. For voice, we might call them in the voice of one of your team members; if it's text, we'll just send an SMS to their phone number. But I think the key thing here is not just the delivery mechanism, it's the sophistication of the attacks themselves. We do a lot of public research on the internet about your organization: what is your org chart?
What tools do you use? This is all information attackers have access to right now, and if they use the latest AI tools, they're able to pull this information into their systems very easily and deploy a lot of these attacks against your organization. So it's very important to get visibility into what is possible today that wasn't possible even a year ago.
>> Cool. With that, we're going to wrap up now, but thank you all for attending the session and the webinar. We really appreciate your time. Hopefully it was informative and useful, and I'm happy to answer any questions you have afterwards as well; feel free to reach out to me directly. Aside from that, thank you, and I hope you have a good rest of your day.


