Gaze Forensics is my first academic project. This post will briefly explain the motivation, key ideas, and results of our paper.

Gaze Forensics Explained

DeepFake Detection

The goal of DeepFake detection is to build a model that can tell whether a face image is authentic or forged.

model(image) -> DeepFake/Authentic


When generating DeepFakes, it’s hard to keep the details of the face consistent, especially in cross-frame scenarios.

We found that the eye region is rich in such details, e.g. highlights, geometric shape, and iris color.

So we decided to utilize these features for DeepFake detection.

Gaze Estimation

To extract the aforementioned features, we found gaze estimation a suitable foundation.

The goal of 3D gaze estimation is to build a model that predicts the direction of a person's gaze relative to the camera.

model(image) -> GazeDirection
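
To make the output concrete, here is a minimal sketch that assumes the gaze direction is represented as a 3D vector in camera coordinates. The post does not state the representation, so this is an assumption. Under it, two gaze directions can be compared by the angle between them. The function name `angular_error_deg` is made up for illustration.

```python
import math


def angular_error_deg(pred, target):
    """Angle in degrees between two 3D direction vectors.

    Assumption: gaze directions are plain (x, y, z) vectors; they do not
    need to be normalized beforehand.
    """
    dot = sum(p * t for p, t in zip(pred, target))
    norm_p = math.sqrt(sum(p * p for p in pred))
    norm_t = math.sqrt(sum(t * t for t in target))
    # Clamp to [-1, 1] so rounding errors cannot break acos.
    cos_angle = max(-1.0, min(1.0, dot / (norm_p * norm_t)))
    return math.degrees(math.acos(cos_angle))
```

Identical directions give an angle of 0 degrees, and perpendicular directions give 90 degrees.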

Key Ideas

We add an MSE loss term to constrain the CNN part (backend) of the model. The leaky features are designed to boost accuracy.
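
A rough sketch of what such a constrained objective could look like is given below. It is not the exact formulation from the paper. It assumes that the MSE term pulls the backend's features toward the features of a pretrained gaze estimator, and that the detection loss is binary cross-entropy. The names `total_loss`, `backend_feat`, `gaze_feat`, and `weight` are hypothetical.

```python
import math


def mse(a, b):
    """Mean squared error between two equally long feature vectors."""
    if len(a) != len(b):
        raise ValueError("feature vectors must have the same length")
    return sum((x - y) ** 2 for x, y in zip(a, b)) / len(a)


def binary_cross_entropy(prob_fake, label, eps=1e-7):
    """BCE for one sample; label is 1 for DeepFake and 0 for authentic."""
    p = min(max(prob_fake, eps), 1.0 - eps)  # avoid log(0)
    return -(label * math.log(p) + (1 - label) * math.log(1.0 - p))


def total_loss(prob_fake, label, backend_feat, gaze_feat, weight=1.0):
    """Detection loss plus an MSE penalty on the backend features.

    Assumption: gaze_feat comes from a fixed, pretrained gaze estimator and
    backend_feat from the trainable backend. `weight` (hypothetical) sets how
    strongly the backend is tied to the gaze features.
    """
    return binary_cross_entropy(prob_fake, label) + weight * mse(backend_feat, gaze_feat)
```

In this sketch, `weight` controls how strongly the backend is tied to the gaze features. With `weight=0` the objective reduces to the detection loss alone, which corresponds to plain fine-tuning.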

Q: Why don’t we use the pretrained gaze estimation backend directly (why don’t we use a frozen version)?

A: It doesn't work. The backend has to stay trainable (not frozen) for the model to work.

Q: Why don't we simply fine-tune the pretrained gaze estimation backend?

A: It did work, but you lose control over how much the model relies on gaze information: over the training epochs it drifts toward other features instead of staying focused on the eye region. The experimental results of this configuration are also worse than those of the MSE-constrained one.

Q: Why does such an MSE constraint work better?

A: Refer to Subsection 3.1 in our paper, and see the explanation for the following formulation:

Model Structure

We used an attention mechanism to compare different frames effectively, and it indeed works better than RNNs.

Q: Why don't we use a transformer or more layers of multi-head attention?

A: The datasets we used are not large or complex enough to support a more complex model.
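
For readers unfamiliar with attention over frames, the sketch below shows single-head scaled dot-product self-attention applied to a list of per-frame feature vectors. It illustrates the general technique only, not the architecture from the paper. It omits the learned query, key, and value projections, and the function names are made up for illustration.

```python
import math


def softmax(xs):
    """Numerically stable softmax over a list of floats."""
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]


def self_attention(frames):
    """Single-head scaled dot-product self-attention across frames.

    frames: list of T feature vectors, each of length D.
    Returns (outputs, weights): outputs[i] is a weighted mix of all frames,
    weights[i][t] is how much frame i attends to frame t.
    Simplification: queries, keys and values are the raw features
    (no learned projections).
    """
    dim = len(frames[0])
    scale = math.sqrt(dim)
    outputs, weights = [], []
    for query in frames:
        scores = [sum(q * k for q, k in zip(query, key)) / scale for key in frames]
        w = softmax(scores)
        mixed = [sum(w[t] * frames[t][j] for t in range(len(frames))) for j in range(dim)]
        outputs.append(mixed)
        weights.append(w)
    return outputs, weights
```

Each row of `weights` sums to 1, and frames with similar features attend to each other more strongly than to dissimilar ones. Unlike an RNN, every frame is compared with every other frame directly.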


We achieved SOTA accuracy on the WildDeepfake and FaceForensics++ datasets.

The effectiveness of our method is also supported by the occlusion sensitivity test shown in the following figure.
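
As background, an occlusion sensitivity test masks one image region at a time and records how much the model's score drops. A minimal sketch of the general procedure follows. The `score_fn` argument stands in for any model that maps an image to a score, and the patch size and fill value are arbitrary choices. This is not the evaluation code from the paper.

```python
def occlusion_sensitivity(image, score_fn, patch=2, fill=0.0):
    """Score drop caused by occluding each patch of a 2D image.

    image: list of rows (H x W) of floats.
    score_fn: hypothetical callable mapping an image to a float score.
    Returns a grid with one entry per patch: base score minus occluded score.
    A large value means the model relies heavily on that region.
    """
    base = score_fn(image)
    height, width = len(image), len(image[0])
    heatmap = []
    for top in range(0, height, patch):
        row = []
        for left in range(0, width, patch):
            occluded = [r[:] for r in image]  # copy so the input stays intact
            for y in range(top, min(top + patch, height)):
                for x in range(left, min(left + patch, width)):
                    occluded[y][x] = fill
            row.append(base - score_fn(occluded))
        heatmap.append(row)
    return heatmap
```

If the model really depends on the eye region, the largest drops in the resulting grid should fall on the patches that cover the eyes.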

However, our model failed on cross-dataset evaluations. We have not yet found a convincing test or explanation for this failure.