
Arup deepfake case shows video call trust is broken. On-device detection avoids biometric data exposure. Adoption and engineering tradeoffs define the next decision point.
The video call was supposed to be the final check for trust. If an email looked suspicious, a quick call would confirm the request. Deepfakes have broken that assumption. Live video is now a delivery mechanism for fraud, not a defense against it.
The Arup case illustrates the problem clearly. In its 2024 financial statement, the engineering firm disclosed that criminals used “fake voice, signatures and images” to execute a social engineering attack. Arup’s networks were not compromised. No personal or project data was accessed. The attack relied entirely on impersonation through synthetic media. The video call was the step that sealed the fraud.
After-the-fact digital forensics can reconstruct what happened, preserve files, and support litigation. A forensic report does not pull back a wire transfer or undo a false admission made during a live call. The synthetic media may never become a preserved file on the victim’s device. The call happens, the instruction is given, and the harm moves immediately.
The attack on Arup was a social engineering campaign using synthetic media to impersonate trusted individuals. The fraud succeeded because the victim saw and heard what they believed was a real person. The video call was not the solution to the trust problem. It was the problem.
This is not a theoretical risk. Deepfake generation tools are widely available, and live video manipulation is advancing rapidly. The Arup case shows that even large, sophisticated organizations can be targeted. The exposure is not limited to finance departments. Legal teams, HR, executive assistants, and anyone else who handles sensitive information or authorizes payments are potential targets.
The attack vector was the human trust that a live video call traditionally carried. That trust is now a vulnerability.
The natural response is to add deepfake detection to video calls. The architecture of that detection matters. If a live face and voice stream must be sent to a cloud API for analysis, the organization solves one risk by creating another. It moves biometric data – faces, voices, and derived signals – away from the endpoint and into an external system.
This is a legal issue and a security issue. GDPR Article 9 treats certain biometric data as a special category of personal data. Illinois’ Biometric Information Privacy Act (740 ILCS 14/10) includes voiceprints and scans of face geometry in its definition of biometric identifiers. Beyond the law, sending that data to a cloud provider means trusting that provider’s security, logging, retention, and deletion practices. Any copy the provider stores is a copy that can be breached.
The privacy argument does not make the engineering easy. A detector that runs on a laptop or phone must live with limited processing power, battery pressure, heat, and latency. Larger models catch more subtle artifacts but are harder to run live. Smaller models may be fast enough, but they must remain effective against better fakes.
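The latency constraint can be made concrete with a rough budget calculation. The sketch below is illustrative only: the function name `frame_stride`, the 25 percent compute share, and the inference times are all assumptions, not measurements from any real detector. It estimates how many frames a detector would have to skip so that it consumes no more than a chosen share of the device's time during a call.

```python
import math


def frame_stride(fps: float, inference_ms: float, compute_share: float = 0.25) -> int:
    """Return N such that analysing every Nth frame keeps the detector within
    `compute_share` of wall-clock time. All inputs are hypothetical estimates."""
    if fps <= 0 or inference_ms <= 0 or not 0 < compute_share <= 1:
        raise ValueError("fps and inference_ms must be positive; compute_share in (0, 1]")
    frame_interval_ms = 1000.0 / fps                      # time between frames
    budget_per_frame_ms = frame_interval_ms * compute_share
    # A slower model must skip more frames to stay inside the same budget.
    return max(1, math.ceil(inference_ms / budget_per_frame_ms))


# A hypothetical larger model (40 ms per inference) versus a smaller one (5 ms)
# on a 30 fps call:
print(frame_stride(fps=30, inference_ms=40))  # larger model: sparse sampling
print(frame_stride(fps=30, inference_ms=5))   # smaller model: every frame
```

Under these assumed numbers the larger model sees only a fraction of the frames, which is the tradeoff in miniature: better per-frame accuracy, but fewer chances to catch a brief artifact.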
Security that depends on perfect human behavior fails because humans are busy, rushed, and social. Cloud deepfake detection can become another step someone has to enable, route, approve, and trust before or during a call. Those extra steps are often the first things to disappear when people are under pressure.
On-device detection can be closer to invisible security. It can be built into the call experience, run by default, and stay out of the user’s way unless something looks wrong. When risk rises, the user does not need to remember a separate process. The system can surface the one action humans can actually perform in the moment: pause and verify through another channel.
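One way to picture "stay out of the way unless something looks wrong" is a local monitor that smooths per-frame scores and only prompts the user when the trend stays high. This is a minimal sketch under stated assumptions: the scores are imagined outputs of some on-device detector (0 means likely real, 1 means likely synthetic), and `CallRiskMonitor`, the 0.7 threshold, and the smoothing weight are hypothetical names and values, not any platform's API.

```python
from dataclasses import dataclass
from typing import Iterable, Optional


@dataclass
class CallRiskMonitor:
    """Keeps a running risk estimate on the device; no frames or scores leave it."""
    alert_threshold: float = 0.7      # hypothetical value; would need tuning
    smoothing: float = 0.2            # weight given to the newest score
    smoothed: Optional[float] = None  # exponential moving average so far

    def update(self, frame_score: float) -> bool:
        """Feed one score in [0, 1]; return True if the user should be prompted."""
        if not 0.0 <= frame_score <= 1.0:
            raise ValueError("frame_score must be between 0 and 1")
        if self.smoothed is None:
            self.smoothed = frame_score
        else:
            self.smoothed = (self.smoothing * frame_score
                             + (1 - self.smoothing) * self.smoothed)
        return self.smoothed >= self.alert_threshold


def first_alert_index(scores: Iterable[float], monitor: CallRiskMonitor) -> Optional[int]:
    """Return the index of the first score that triggers a prompt, or None."""
    for i, score in enumerate(scores):
        if monitor.update(score):
            return i
    return None


# A single noisy frame does not interrupt the call; a sustained run does.
print(first_alert_index([0.1, 0.95, 0.1], CallRiskMonitor()))
print(first_alert_index([0.1, 0.1] + [0.9] * 7, CallRiskMonitor()))
```

The smoothing is a design choice, not a requirement: it trades a slower alert for fewer false interruptions, which matters if the only action asked of the user is to pause and verify through another channel.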
Confirmation that on-device detection is the right architecture would come from major video call platforms integrating it as a default feature, not an optional add-on. If Microsoft Teams, Zoom, or Google Meet ship native on-device deepfake detection in an upcoming release, the market will have validated the architecture. A second signal would be enterprise security teams specifying on-device detection in their procurement requirements for collaboration tools.
Another confirmatory signal: hardware vendors include dedicated neural processing units (NPUs) for real-time deepfake detection in consumer laptops and phones. That would reduce the engineering tradeoff and make on-device detection practical at scale.
The thesis weakens if cloud-based detection becomes the industry standard. That scenario exposes biometric data to third-party risk and creates a new attack surface. A second weakening signal: on-device detectors prove too slow or too inaccurate against the next generation of deepfakes. The engineering challenge is real. Smaller models must be efficient without becoming useless against better fakes. If detection quality degrades significantly on consumer hardware, the privacy advantage may not matter.
A third weakening scenario: regulation mandates cloud-based logging of all video calls for anti-fraud purposes. That would force the data off the device by law, making on-device detection a secondary layer rather than the primary control.
The risk is not going away. Deepfakes will only get better. The question is whether the security industry will build detection that works where people actually need it: inside the live call experience, without sending their faces to the cloud.
For now, the practical rule is simple. If a video call involves money, sensitive data, or legal testimony, treat the video as potentially synthetic. Verify the request through a separate channel before acting. That is the only reliable defense until on-device detection becomes standard across platforms. Security teams should start evaluating on-device detection vendors today, because the next Arup-style attack is already in progress somewhere.
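The rule above is simple enough to write down as a policy check. The sketch below is one hypothetical encoding, with `CallRequest` and `may_act_on` as illustrative names: a request made on a video call that touches money, sensitive data, or legal testimony is blocked until it has been confirmed through a separate channel.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class CallRequest:
    """What was asked for on the call, and whether it was confirmed elsewhere."""
    involves_money: bool = False
    involves_sensitive_data: bool = False
    involves_legal_testimony: bool = False
    verified_out_of_band: bool = False  # e.g. a call-back to a known number


def may_act_on(request: CallRequest) -> bool:
    """High-risk requests require separate-channel verification; others pass."""
    high_risk = (request.involves_money
                 or request.involves_sensitive_data
                 or request.involves_legal_testimony)
    return (not high_risk) or request.verified_out_of_band


# A payment instruction given on a call is held until it is verified.
print(may_act_on(CallRequest(involves_money=True)))
print(may_act_on(CallRequest(involves_money=True, verified_out_of_band=True)))
```

Note that the check never looks at how convincing the video was. That is the point of the rule: the decision depends on what is being asked, not on whether the face looked real.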
Prepared with AlphaScala research tooling and grounded in primary market data: live prices, fundamentals, SEC filings, hedge-fund holdings, and insider activity. Each story is checked against AlphaScala publishing rules before release. Educational coverage, not personalized advice.