Multi-Attentional Deepfake Detection — Hanqing Zhao, Wenbo Zhou, Dongdong Chen, Tianyi Wei, Weiming Zhang, Nenghai Yu, University of Science and Technology of...

Reproduce the results of this deepfake detection code. I would like instructions on how to run this code and reproduce the results from it: what program to use and how to run it. I am willing to speak to the tutor on the phone or through a Zoom call as well.




Multi-attentional Deepfake Detection
Hanqing Zhao1, Wenbo Zhou1,†, Dongdong Chen2, Tianyi Wei1, Weiming Zhang1,†, Nenghai Yu1
University of Science and Technology of China1, Microsoft Cloud AI2
{zhq2015@mail, welbeckz@, bestwty@mail, zhangwm@, ynh@}.ustc.edu.cn, [email protected]

Abstract

Face forgery by deepfake is widely spread over the internet and has raised severe societal concerns. Recently, how to detect such forgery content has become a hot research topic and many deepfake detection methods have been proposed. Most of them model deepfake detection as a vanilla binary classification problem, i.e., first use a backbone network to extract a global feature and then feed it into a binary classifier (real/fake). But since the difference between the real and fake images in this task is often subtle and local, we argue this vanilla solution is not optimal. In this paper, we instead formulate deepfake detection as a fine-grained classification problem and propose a new multi-attentional deepfake detection network. Specifically, it consists of three key components: 1) multiple spatial attention heads to make the network attend to different local parts; 2) a textural feature enhancement block to zoom in on the subtle artifacts in shallow features; 3) aggregation of the low-level textural features and the high-level semantic features guided by the attention maps. Moreover, to address the learning difficulty of this network, we further introduce a new regional independence loss and an attention-guided data augmentation strategy. Through extensive experiments on different datasets, we demonstrate the superiority of our method over the vanilla binary classifier counterparts, and achieve state-of-the-art performance. The models will be released at https://github.com/yoctta/multiple-attention.

1. Introduction

Benefiting from the great progress in generative models, deepfake techniques have achieved significant success recently and various face forgery methods [19, 41, 21, 31, 32, 44, 28, 38] have been proposed. As such techniques can generate high-quality fake videos that are even indistinguishable to human eyes, they can easily be abused by malicious users to cause severe societal problems or political threats. († Corresponding Author.)

Figure 1: Example of the multiple attentional regions obtained by our method. The attention regions are separated and respond to different discriminative features.

To mitigate such risks, many deepfake detection approaches [27, 34, 22, 33, 26, 45] have been proposed. Most of them model deepfake detection as a vanilla binary classification problem (real/fake). Basically, they often first use a backbone network to extract global features of the suspect image and then feed them into a binary classifier to discriminate the real and fake ones.

However, as the counterfeits become more and more realistic, the differences between real and fake ones become more subtle and local, so such global-feature-based vanilla solutions do not work well. This subtle and local property shares a similar spirit with the fine-grained classification problem. For example, in the fine-grained bird classification task, some species look very similar and only differ from each other by small and local differences, such as the shape and color of the beak. Based on this observation, we propose to model deepfake detection as a special fine-grained classification problem with two categories.

Inspired by the success of parts-based models in the fine-grained classification field, this paper presents a novel multi-attention network for deepfake detection.
First, in order to make the network attend to different potential artifact regions, we design multi-attention heads to predict multiple spatial attention maps from the deep semantic features. Second, to prevent the subtle differences from disappearing in the deep layers, we enhance the textural features obtained from shallow layers and then aggregate both the low-level texture features and the high-level semantic features as the representation for each local part. Finally, the feature representations of each local part are independently pooled by a bilinear attention pooling layer and fused as the representation for the whole image. Figure 1 gives an example of the discriminative features obtained by our method.

However, training such a multi-attentional network is not trivial. This is mainly because, unlike a single-attentional network [6] which can use the video-level labels as explicit guidance and be trained in a supervised way, the multi-attentional structure can only be trained in an unsupervised or weakly-supervised way. Using a common learning strategy, we find that the multi-attention heads degrade to a single-attention counterpart, i.e., only one attention region produces a strong response while all remaining attention regions are suppressed and cannot capture useful information. To address this problem, we further propose a new attention-guided data augmentation mechanism. In detail, during training we deliberately blur some high-response attention regions (soft attention dropping) and force the network to learn from other attention regions. Simultaneously, we introduce a new regional independence loss to encourage different attention heads to attend to different local parts.

To demonstrate the effectiveness of our multi-attentional network, we conduct extensive experiments on different existing datasets, including FaceForensics++ [34], Celeb-DF [25] and DFDC [9].
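The regional independence loss is only described at a high level in the paper. As an illustrative stand-in (not the authors' exact formulation; the function name and shapes below are mine), the core idea can be sketched by penalizing pairwise cosine similarity between the attention maps, so that different heads are pushed toward different regions:

```python
import numpy as np

def regional_independence_penalty(attn_maps):
    """Simplified sketch of a regional-independence-style loss.

    attn_maps: (M, H, W) array of M spatial attention maps.
    Penalizes the mean pairwise cosine similarity between maps,
    pushing different heads toward different regions.
    """
    M = attn_maps.shape[0]
    flat = attn_maps.reshape(M, -1)
    flat = flat / (np.linalg.norm(flat, axis=1, keepdims=True) + 1e-8)
    sim = flat @ flat.T                    # (M, M) cosine similarities
    off_diag = sim - np.eye(M)             # ignore self-similarity
    return off_diag.sum() / (M * (M - 1))  # mean pairwise similarity

# Disjoint maps incur ~0 penalty; identical maps incur ~1.
a = np.zeros((2, 4, 4)); a[0, :2] = 1.0; a[1, 2:] = 1.0  # disjoint heads
b = np.ones((2, 4, 4))                                    # collapsed heads
```

Minimizing such a term alongside the classification loss is one plausible way to counteract the collapse to a single dominant attention head described above.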
The results show that our method is superior to the vanilla binary classifier baselines and achieves state-of-the-art performance. In summary, the contributions of this paper are threefold:

• We reformulate deepfake detection as a fine-grained classification task, which brings a novel perspective to this field.
• We propose a new multi-attentional network architecture to capture local discriminative features from multiple attentive face regions. To train this network, we also introduce a regional independence loss and design an attention-guided data augmentation mechanism that assists the network training in an adversarial learning way.
• Extensive experiments demonstrate that our method outperforms the vanilla binary classification baselines and achieves state-of-the-art detection performance.

2. Related Works

Face forgery detection is a classical problem in computer vision and graphics. Recently, the rapid progress in deep generative models has made face forgery techniques "deep" and able to generate realistic results, which presents the new problem of deepfake detection and brings significant challenges. Most deepfake detection methods treat the problem as vanilla binary classification; however, the subtle and local modifications of forged faces make it more similar to a fine-grained visual classification problem.

2.1. Deepfake Detection

Since face forgery poses a great threat to societal security, it is of paramount importance to develop effective countermeasures against it. Many works [46, 23, 4, 53, 34, 22, 33, 26, 45, 43] have been proposed. Early works [46, 23] detect forgery through visual biological artifacts, e.g., unnatural eye blinking or an inconsistent head pose. As learning-based methods became mainstream, some works [53, 34] proposed frameworks that extract features from the spatial domain and achieved excellent performance on specific datasets. Recently, more data domains have been considered by emerging methods.
[45] detects tampered faces through spatial, steganalysis and temporal features. It adds a stream of simplified Xception with a constrained convolution layer and an LSTM. [26] uses a two-branch representation extractor to combine information from the color domain and the frequency domain using a multi-scale Laplacian of Gaussian (LoG) operator. [33] uses frequency-aware decomposition and local frequency statistics to expose deepfake artifacts in the frequency domain and achieves state-of-the-art performance.

Most existing methods treat deepfake detection as a universal binary classification problem. They focus on constructing sophisticated feature extractors followed by a dichotomy to distinguish real and fake faces. However, photo-realistic counterfeits bring a significant challenge to this binary classification framework. In this paper, we redefine the deepfake detection problem as a fine-grained classification problem according to their similarity.

2.2. Fine-grained Classification

Fine-grained classification [50, 49, 13, 37, 12, 52, 47, 17, 10] is a challenging research task in computer vision, which captures local discriminative features to distinguish different fine-grained categories. Studies in this field mainly focus on locating the discriminative regions and learning a diverse collection of complementary parts in weakly-supervised manners. Previous works [50, 49] build part models to localize objects and treat the objects and semantic parts equally. Recently, several works [52, 47, 10] have been proposed under a multiple-attentional framework.

Figure 2: The framework of our method. Three components play an important role in our framework: an Attention Module for generating multiple attention maps, a texture enhancement block for extracting and enhancing the textural information, and a bidirectionally used bilinear attention pooling for aggregating textural and semantic features.

The core idea of these methods is to learn discriminative regions at multiple scales or from multiple image parts simultaneously and to encourage the fusion of the features from these different regions. In addition, [17] designs attention cropping and attention dropping to obtain more balanced attention maps. In this paper, we model deepfake detection as a special fine-grained classification problem for the first time. It shares the same spirit in learning subtle and discriminative features, but only involves two categories, i.e., real and fake.

3. Methods

3.1. Overview

In this section, we first state the motivation of the design and give a brief overview of our framework. As aforementioned, the discrepancy between real and fake faces is usually subtle and occurs in local regions, which is not easy to capture with single-attentional network structures. Thus we argue that decomposing the attention into multiple regions can be more efficient for collecting local features for the deepfake detection task. Meanwhile, the global average pooling commonly adopted by current deepfake detection approaches is replaced with local attention pooling in our framework. This is mainly because the textural patterns vary drastically among different regions; features extracted from different regions may be averaged out by the global pooling operation, resulting in a loss of distinguishability. On the other hand, we observe that the slight artifacts caused by forgery methods tend to be preserved in the textural information of shallow features.
Here, the textural information represents the high-frequency component of the shallow features, much like the residual information of an RGB image. Therefore, the shallow features should receive more focus and enhancement, which has not been considered by current state-of-the-art detection approaches.

[Figure 3: Attention Module and the procedure of obtaining the textural feature matrix; the figure graphics are not recoverable from this text extraction.]
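To make these two operations concrete, here is a minimal NumPy sketch: a high-frequency textural residual obtained by subtracting a block-wise local average (the pooling size is an assumption), and bilinear attention pooling that yields one descriptor per attended part instead of a single globally averaged vector. Function names and shapes are illustrative, not taken from the released code.

```python
import numpy as np

def texture_residual(feat, pool=2):
    """High-frequency 'textural' component of a shallow feature map:
    subtract a block-wise local average, analogous to taking the
    residual of an RGB image. Assumes pool divides H and W."""
    C, H, W = feat.shape
    blocks = feat.reshape(C, H // pool, pool, W // pool, pool)
    local_avg = blocks.mean(axis=(2, 4))                          # (C, H/p, W/p)
    smooth = np.repeat(np.repeat(local_avg, pool, axis=1), pool, axis=2)
    return feat - smooth                                          # (C, H, W)

def bilinear_attention_pool(features, attn_maps):
    """Bilinear attention pooling: weight the feature maps by each
    attention map and average spatially, producing one C-dim
    descriptor per part rather than one global average vector.

    features: (C, H, W), attn_maps: (M, H, W) -> (M, C)"""
    _, H, W = features.shape
    return np.einsum('mhw,chw->mc', attn_maps, features) / (H * W)

F = np.random.rand(8, 16, 16)      # toy semantic features
A = np.random.rand(4, 16, 16)      # four spatial attention maps
T = texture_residual(F)            # textural residual, same shape as F
P = bilinear_attention_pool(F, A)  # (4, 8) part feature matrix
```

With a single all-ones attention map, `bilinear_attention_pool` reduces to global average pooling, which is exactly the behavior the framework replaces with per-region pooling.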
Answered 4 days after Jan 20, 2022

Answer To: Multi-Attentional Deepfake Detection

Pawan answered on Jan 25 2022
Instructions to install
1. Download and install Miniconda from https://docs.conda.io/en/latest/miniconda.html and follow the installation process described there.
2. Add the path of the Miniconda installation directory to the PATH environment variable:
Open "Edit the system environment variables"
Go to Environment Variables and edit the system Path variable
Click New
Add the path to the Miniconda installation
3. Open the Anaconda Prompt.
4. Check the Python and pip versions. If these commands succeed, the installation was successful.
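For step 4, the concrete commands to type in the Anaconda Prompt are:

```shell
python --version
python -m pip --version
```

(`pip --version` also works once pip is on the PATH; `python -m pip` avoids any ambiguity about which pip is found.)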
5. Edit the following file: in dataset/data.py, comment out line 38. That line reads a dataset JSON which is not currently available. Undo this change once you have generated the dataset for celeb.json.
6. Install all the software listed below by typing these commands at the prompt:
● pip install numpy
● pip install torch
● pip install kornia
● pip install torchvision
● pip install -U albumentations[imgaug]
● pip...
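Equivalently, the packages listed above (leaving aside the truncated final entry) can be captured in a requirements.txt; this is a sketch with no version pins, which you may want to add for reproducibility:

```
numpy
torch
kornia
torchvision
albumentations[imgaug]
```

Then a single `pip install -r requirements.txt` installs everything at once.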