Deepfakes Turn Video Calls Into Fraud Vector

The video call was supposed to be the final check for trust. If an email looked suspicious, a quick call would confirm the request. Deepfakes have broken that assumption. Live video is now a delivery mechanism for fraud, not a defense against it.

The Arup case illustrates the problem clearly. In its 2024 financial statement, the engineering firm disclosed that criminals used “fake voice, signatures and images” to execute a social engineering attack. Arup’s networks were not compromised. No personal or project data was accessed. The attack relied entirely on impersonation through synthetic media. The video call was the step that sealed the fraud.

After-the-fact digital forensics can reconstruct what happened, preserve files, and support litigation. But a forensic report cannot pull back a wire transfer or undo a false admission made during a live call. The synthetic media may never even exist as a saved file on the victim’s device. The call happens, the instruction is given, and the harm follows immediately.

The Arup Case: Attack Vector Without Network Breach

The attack on Arup was a social engineering campaign using synthetic media to impersonate trusted individuals. The fraud succeeded because the victim saw and heard what they believed was a real person. The video call was not the solution to the trust problem. It was the problem.

This is not a theoretical risk. Deepfake generation tools are widely available, and live video manipulation is advancing rapidly. The Arup case shows that even large, sophisticated organizations can be targeted. The exposure is not limited to finance departments. Legal teams, HR staff, executive assistants, and anyone else who handles sensitive information or authorizes payments are potential targets.

What the Attack Did Not Require

No network compromise: Arup’s systems were not breached.
No stolen credentials: The attackers did not need access to internal accounts.
No malware: The fraud relied entirely on impersonation through synthetic media.

The attack vector was the human trust that a live video call traditionally carried. That trust is now a vulnerability.

Cloud Detection vs On-Device: The Privacy-Latency Tradeoff

The natural response is to add deepfake detection to video calls. The architecture of that detection matters. If a live face and voice stream must be sent to a cloud API for analysis, the organization solves one risk by creating another. It moves biometric data – faces, voices, and derived signals – away from the endpoint and into an external system.

This is a privacy-law issue, but not only that. GDPR Article 9 treats biometric data processed to uniquely identify a person as a special category of personal data. Illinois’ Biometric Information Privacy Act (740 ILCS 14/10) includes voiceprints and scans of face geometry in its definition of biometric identifiers. Beyond compliance, sending that data to a cloud provider means trusting that provider’s security, logging, retention, and deletion practices. Any copy that is retained, even briefly, can be breached.

The Privacy-Execution Tradeoff

Cloud detection: Offloads compute but exposes biometric data to third-party risk.
On-premises detection: Reduces vendor exposure but still moves the stream off the endpoint.
On-device detection: Keeps the analysis where the call is already happening.
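The three options differ mainly in where a live biometric stream can end up. The Python sketch below models that data flow. It assumes, hypothetically, that on-premises detection runs on servers the organization operates itself, so no outside vendor handles the stream; the names `Deployment`, `DataFlow`, and `biometric_exposure` are illustrative only.

```python
from dataclasses import dataclass
from enum import Enum


class Deployment(Enum):
    CLOUD = "cloud"
    ON_PREMISES = "on_premises"
    ON_DEVICE = "on_device"


@dataclass(frozen=True)
class DataFlow:
    leaves_endpoint: bool      # does the live face/voice stream leave the user's device?
    reaches_third_party: bool  # does an outside vendor receive or process it?


# Illustrative mapping of the three options. Assumption: on-premises
# detection runs on servers the organization itself operates.
FLOWS = {
    Deployment.CLOUD: DataFlow(leaves_endpoint=True, reaches_third_party=True),
    Deployment.ON_PREMISES: DataFlow(leaves_endpoint=True, reaches_third_party=False),
    Deployment.ON_DEVICE: DataFlow(leaves_endpoint=False, reaches_third_party=False),
}


def biometric_exposure(mode: Deployment) -> list[str]:
    """Return every location the live biometric stream passes through or is analyzed in."""
    flow = FLOWS[mode]
    locations = ["endpoint"]  # the call itself always runs on the endpoint
    if flow.leaves_endpoint:
        locations.append("network path")
        locations.append("vendor systems" if flow.reaches_third_party else "internal servers")
    return locations


for mode in Deployment:
    print(mode.value, biometric_exposure(mode))
```

Only the on-device option keeps the list to the device where the call is already happening, which is the core of the privacy argument.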

The privacy argument does not make the engineering easy. A detector that runs on a laptop or phone must live with limited processing power, battery drain, heat, and latency. Larger models catch more subtle artifacts but are harder to run live. Smaller models may be fast enough, but they must remain effective as fakes improve.
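The latency side of this tradeoff is simple arithmetic: at a given frame rate, each frame leaves a fixed time budget, and a detector slower than that budget can only inspect every Nth frame. The sketch below assumes one inference runs at a time (no pipelining) and uses a hypothetical `cpu_share` knob for how much of each frame interval the detector may use. The inference times are illustrative, not measurements of any real model.

```python
import math


def frame_budget_ms(fps: float) -> float:
    """Milliseconds between frames at a given frame rate."""
    return 1000.0 / fps


def analysis_stride(inference_ms: float, fps: float, cpu_share: float = 1.0) -> int:
    """Analyze every Nth frame so detection keeps pace with the live call.

    cpu_share (hypothetical knob) is the fraction of each frame interval the
    detector may use; the call itself also needs compute, battery, and thermal headroom.
    """
    if not 0.0 < cpu_share <= 1.0:
        raise ValueError("cpu_share must be in (0, 1]")
    usable_ms = frame_budget_ms(fps) * cpu_share
    return max(1, math.ceil(inference_ms / usable_ms))


def frames_analyzed_per_second(inference_ms: float, fps: float, cpu_share: float = 1.0) -> float:
    """How many frames per second the detector actually inspects."""
    return fps / analysis_stride(inference_ms, fps, cpu_share)


# Illustrative inference times, not benchmarks of any real detector.
for label, ms in [("small model", 8.0), ("large model", 120.0)]:
    stride = analysis_stride(ms, fps=30, cpu_share=0.5)
    print(label, "stride:", stride, "frames/s:", frames_analyzed_per_second(ms, 30, 0.5))
```

At 30 fps with half of each frame interval available, a hypothetical 8 ms model can check every frame, while a hypothetical 120 ms model checks only every 8th frame, or 3.75 frames per second. The larger model may catch subtler artifacts but sees far less of the call.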

Adoption Matters: Invisible Security Over Perfect Human Behavior

Security that depends on perfect human behavior fails because humans are busy, rushed, and social. Cloud deepfake detection can become another step someone has to enable, route, approve, and trust before or during a call. Those extra steps are often the first things to disappear when people are under pressure.

On-device detection can be closer to invisible security. It can be built into the call experience, run by default, and stay out of the user’s way unless something looks wrong. When risk rises, the user does not need to remember a separate process. The system can surface the one action humans can actually perform in the moment: pause and verify through another channel.
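One way to stay out of the user’s way is to smooth noisy per-frame detector scores and interrupt only when elevated risk is sustained. The sketch below assumes a hypothetical detector that emits one score in [0, 1] per analyzed frame. The `RiskGate` class, its exponential moving average, and its thresholds are illustrative choices, not settings from any shipping product.

```python
from dataclasses import dataclass
from typing import Optional

PROMPT = "Pause and verify this request through another channel."


@dataclass
class RiskGate:
    """Stay silent on noisy per-frame scores; warn only on sustained risk."""

    alpha: float = 0.2      # EMA smoothing factor (illustrative)
    raise_at: float = 0.7   # smoothed score that triggers the warning
    clear_at: float = 0.4   # lower score that clears it (hysteresis)
    smoothed: float = 0.0
    warning: bool = False

    def update(self, score: float) -> Optional[str]:
        """Feed one detector score in [0, 1]; return PROMPT only when a warning starts."""
        self.smoothed = self.alpha * score + (1 - self.alpha) * self.smoothed
        if not self.warning and self.smoothed >= self.raise_at:
            self.warning = True
            return PROMPT
        if self.warning and self.smoothed <= self.clear_at:
            self.warning = False
        return None


gate = RiskGate()
print(gate.update(1.0))  # a lone spike from a fresh gate: None
```

With these defaults, a single 1.0 spike on a fresh gate only moves the smoothed score to 0.2 and stays silent, while a steady run of 0.9 scores raises the prompt once, on the seventh update. The lower clearing threshold keeps the warning from flickering on and off near the boundary.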

Where This Matters Most

Scams targeting elderly people: Older targets are less likely to navigate extra security steps.
Video calls on dating apps or websites: Trust is already fragile; a detection warning can prevent exploitation.
Employees receiving urgent executive requests: Pressure to act quickly makes verification steps easy to skip.
Lawyers preparing remote depositions: A false admission during a call cannot be easily undone.
Witnesses in high-pressure video meetings: The cognitive load of the situation makes detection warnings critical.

What Would Confirm the On-Device Thesis

Confirmation would come from major video call platforms integrating on-device detection as a default feature, not an optional add-on. If Microsoft Teams, Zoom, or Google Meet ships native on-device deepfake detection turned on by default, the market will have validated the architecture. A second signal would be enterprise security teams specifying on-device detection in their procurement requirements for collaboration tools.

Another confirmatory signal would be hardware and operating system vendors using the neural processing units (NPUs) now common in consumer laptops and phones to run real-time deepfake detection. That would ease the engineering tradeoff and make on-device detection practical at scale.

What Would Weaken It

The thesis weakens if cloud-based detection becomes the industry standard. That scenario exposes biometric data to third-party risk and creates a new attack surface. A second weakening signal: on-device detectors prove too slow or too inaccurate against the next generation of deepfakes. The engineering challenge is real. Smaller models must be efficient without becoming useless against better fakes. If detection quality degrades significantly on consumer hardware, the privacy advantage may not matter.

A third weakening scenario: regulation mandates cloud-based logging of all video calls for anti-fraud purposes. That would force the data off the device by law, making on-device detection a secondary layer rather than the primary control.

The Decision Point for Security Teams

The risk is not going away. Deepfakes will only get better. The question is whether the security industry will build detection that works where people actually need it: inside the live call experience, without sending their faces to the cloud.

For now, the practical rule is simple. If a video call involves money, sensitive data, or legal testimony, treat the video as potentially synthetic. Verify the request through a separate channel before acting. That is the only reliable defense until on-device detection becomes standard across platforms. Security teams should start evaluating on-device detection vendors today, because the next Arup-style attack is already in progress somewhere.
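The practical rule is simple enough to encode as a policy check, for example inside a payment approval workflow. A minimal Python sketch; `CallRequest`, its fields, and `may_act` are hypothetical names. A real workflow would also need to define what counts as a separate channel, such as a phone number already on file rather than one supplied during the call.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class CallRequest:
    """A request made during a video call (hypothetical fields)."""

    moves_money: bool = False
    shares_sensitive_data: bool = False
    is_legal_testimony: bool = False
    verified_out_of_band: bool = False  # confirmed via a separate, independently known channel


def may_act(request: CallRequest) -> bool:
    """High-stakes requests need out-of-band confirmation before anyone acts.

    There is deliberately no "video looked real" input: under the rule, the
    video is treated as potentially synthetic no matter how convincing it is.
    """
    high_stakes = (
        request.moves_money
        or request.shares_sensitive_data
        or request.is_legal_testimony
    )
    return request.verified_out_of_band or not high_stakes


print(may_act(CallRequest(moves_money=True)))
print(may_act(CallRequest(moves_money=True, verified_out_of_band=True)))
```

Leaving the detector verdict out of the decision is the point: detection can prompt the pause, but only the separate channel clears the request.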
