CN115471675A - Disguised object detection method based on frequency domain enhancement - Google Patents

Disguised object detection method based on frequency domain enhancement

Info

Publication number
CN115471675A
Authority
CN
China
Prior art keywords
frequency domain
frequency
camouflage
disguised
network
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202211226059.3A
Other languages
Chinese (zh)
Inventor
黄小珊
朱江海
仲亦杰
田锋亮
刘政龙
黄培强
张文强
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Xuchang 3d Mapping Co ltd
Original Assignee
Xuchang 3d Mapping Co ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Xuchang 3d Mapping Co ltd filed Critical Xuchang 3d Mapping Co ltd
Priority to CN202211226059.3A priority Critical patent/CN115471675A/en
Publication of CN115471675A publication Critical patent/CN115471675A/en
Pending legal-status Critical Current


Classifications

    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06N: COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N 3/00: Computing arrangements based on biological models
    • G06N 3/02: Neural networks
    • G06N 3/08: Learning methods
    • G06V: IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V 10/00: Arrangements for image or video recognition or understanding
    • G06V 10/40: Extraction of image or video features
    • G06V 10/44: Local feature extraction by analysis of parts of the pattern, e.g. by detecting edges, contours, loops, corners, strokes or intersections; Connectivity analysis, e.g. of connected components
    • G06V 10/56: Extraction of image or video features relating to colour
    • G06V 10/70: Arrangements for image or video recognition or understanding using pattern recognition or machine learning
    • G06V 10/77: Processing image or video features in feature spaces; using data integration or data reduction, e.g. principal component analysis [PCA] or independent component analysis [ICA] or self-organising maps [SOM]; Blind source separation
    • G06V 10/774: Generating sets of training patterns; Bootstrap methods, e.g. bagging or boosting
    • G06V 10/82: Arrangements for image or video recognition or understanding using pattern recognition or machine learning using neural networks


Abstract

The invention provides a camouflaged object detection method based on frequency domain enhancement, which comprises the following steps: step 1, constructing a camouflaged object network: a camouflaged object detection network is built, an image set of camouflaged objects is input into the network, and the network is optimized by iteratively minimizing a loss function; step 2, constructing a feature alignment module and a high-frequency channel selection module: the camouflaged object network uses these modules to achieve spatial-frequency feature alignment and high-frequency feature screening; step 3, dividing the network framework into a model training stage and a testing stage: in the training stage, the data-preprocessed camouflaged object training picture set is input into the network, and the network is optimized by using the frequency domain enhancement module and multi-stage supervision; in the testing stage, the camouflaged object image to be detected is input into the trained network to obtain the corresponding camouflaged object segmentation image.

Description

Method for detecting disguised object based on frequency domain enhancement
Technical Field
The invention relates to a method for detecting camouflaged objects in images, and in particular to a camouflaged object detection method based on frequency domain enhancement.
Background
In recent years, with the rapid development of deep convolutional networks, camouflaged object detection has made great breakthroughs. Compared with traditional camouflaged object detection algorithms, methods based on deep learning are greatly improved in accuracy: a deep neural network can acquire high-level semantic information of an image, and this information can be used to detect camouflaged objects in a scene more accurately. For example, Deng-Ping Fan, Ge-Peng Ji, Ming-Ming Cheng, and Ling Shao, "Concealed object detection," CoRR, 2021, and Jingjing Ren, Xiaowei Hu, Lei Zhu, Xuemiao Xu, Yangyang Xu, Weiming Wang, Zijun Deng, and Pheng-Ann Heng, "Deep texture-aware features for camouflaged object detection," CoRR, 2021, attempt to design texture enhancement modules or use attention mechanisms to guide the model to camouflaged areas. Yunqiu Lv, Jing Zhang, Yuchao Dai, Aixuan Li, Bowen Liu, Nick Barnes, and Deng-Ping Fan, "Simultaneously localize, segment and rank the camouflaged objects," CVPR, 2021, and P. Sengottuvelan, Amitabh Wahi, and A. Shanmugam, "Performance of decamouflaging through exploratory image analysis," ICETET, 2008, treat camouflage segmentation as a two-stage process to improve the performance of the network algorithm.
Although these methods can further improve the accuracy of camouflaged object detection by improving the network structure, they detect camouflaged objects only in RGB space and ignore the differences between camouflaged objects and other regions in the frequency domain, which prevents further performance improvement.
Disclosure of Invention
Purpose of the invention: aiming at the defects of the prior art, the invention provides a camouflaged object detection method based on frequency domain enhancement.
In order to solve the technical problem, the invention discloses a method for detecting a disguised object based on frequency domain enhancement.
The method disclosed by the invention uses a frequency domain enhancement module to better separate the camouflaged object from the background. In particular, the invention designs a new frequency enhancement module (FEM) to mine clues about camouflaged objects in the frequency domain. In addition, the invention provides a feature alignment module (FA) to fuse the features of the RGB domain and the frequency domain. Finally, in order to further exploit the frequency information, a high-order relation module (HOR, also referred to herein as the high-frequency channel selection module) is proposed to process the rich fused features.
The method comprises the following specific steps:
step 1, constructing a camouflage object network based on a frequency domain enhancement module; the disguised object network comprises: the device comprises a camouflage object network framework, a frequency domain enhancement module, a feature alignment module and a high-frequency channel selection module;
the device comprises a frequency domain enhancement module, a characteristic alignment module, a high-frequency channel selection module and a frequency domain matching module, wherein the frequency domain enhancement module is used for extracting the characteristics of a disguised object in a frequency domain;
step 2, training the camouflaged object network, comprising: inputting the data-preprocessed camouflaged object training image set into the network, and optimizing the network by using the frequency domain enhancement module and multi-stage supervision to obtain a trained camouflaged object network;
step 3, testing by adopting the trained camouflaged object network, comprising: inputting the camouflaged object image to be detected into the trained network to obtain the corresponding camouflaged object segmentation image, completing the frequency-domain-enhancement-based camouflaged object detection.
The step 1 comprises the following steps:
step 1-1, constructing a camouflage object network framework to extract RGB characteristics;
step 1-2, designing a frequency domain enhancement module FEM to extract the characteristics of the disguised object in the frequency domain;
step 1-3, constructing a feature alignment module FA to fuse the spatial-domain features and the frequency-domain features;
step 1-4, constructing a high-frequency channel selection module HOR to perform high-frequency feature screening.
In step 1-1, the camouflaged object network skeleton comprises four stages, each stage containing two 3 × 3 convolution layers with stride 2; the skeleton extracts from the RGB image the corresponding feature maps $x_i \in \mathbb{R}^{H_i \times W_i \times C_i}$, $i = 1, \dots, 4$, where H denotes the height of the image, W its width, and $H_i \times W_i$ the overall resolution of the i-th feature map, halved at each stage by the stride-2 convolution.
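For illustration only, the following is a minimal PyTorch sketch of such a four-stage skeleton; the channel widths and the placement of the stride-2 convolution within each stage are assumptions, since the text only specifies four stages of two 3 × 3 convolution layers.

```python
import torch
import torch.nn as nn

# Minimal sketch of the four-stage convolutional skeleton described above.
# The channel widths (64, 128, 256, 512) and putting the stride-2 convolution
# first in each stage are assumptions not fixed by the patent text.
class Skeleton(nn.Module):
    def __init__(self, in_ch=3, widths=(64, 128, 256, 512)):
        super().__init__()
        self.stages = nn.ModuleList()
        ch = in_ch
        for w in widths:
            self.stages.append(nn.Sequential(
                nn.Conv2d(ch, w, 3, stride=2, padding=1),  # halves resolution
                nn.ReLU(inplace=True),
                nn.Conv2d(w, w, 3, stride=1, padding=1),
                nn.ReLU(inplace=True),
            ))
            ch = w

    def forward(self, x_rgb):
        feats = []
        for stage in self.stages:
            x_rgb = stage(x_rgb)
            feats.append(x_rgb)  # multi-scale features x_1 .. x_4
        return feats
```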
The frequency domain enhancement module in step 1-2 comprises an offline discrete cosine transform and an online learning enhancement module OLE: the offline discrete cosine transform obtains frequency domain information from the RGB image, and the online learning enhancement module OLE obtains the features of the camouflaged object hidden in the frequency space, namely the frequency domain features.
The offline discrete cosine transform described in step 1-2 converts the feature map $x_{rgb}$ into YCbCr space, in which the feature map is represented as $x_{YCbCr} \in \mathbb{R}^{H \times W \times 3}$. Subsequently, $x_{YCbCr}$ is divided into images of size 8 × 8, $x^{i,j}_{YCbCr} \in \mathbb{R}^{8 \times 8 \times 3}$, each representing one region, with i, j the coordinates of the region; each region is then processed by the discrete cosine transform (DCT) into a frequency spectrum $d^{i,j} \in \mathbb{R}^{8 \times 8 \times 3}$, where each value corresponds to the intensity of a certain frequency band. The above process is represented by the following formula:

$$x_d = \mathrm{flatten}\left(\left\{ \mathrm{DCT}\left(x^{i,j}_{YCbCr}\right) \right\}\right) \in \mathbb{R}^{\frac{H}{8} \times \frac{W}{8} \times 192},$$

in which $\{\cdot\}$ represents the collection of the spectra $d^{i,j}$ of all regions, and flatten is the collection method that gathers all regions with the same frequency into one channel, so that a feature map is obtained again (64 frequencies × 3 color channels = 192 channels).
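A minimal NumPy/SciPy sketch of this offline step is given below; the YCbCr conversion is assumed to have been done already, and the orthonormal DCT variant is an assumption, since the text does not fix a normalization.

```python
import numpy as np
from scipy.fftpack import dct

# Sketch of the offline DCT step on an image already in YCbCr space,
# shape (H, W, 3) with H and W divisible by 8.
def offline_dct(x_ycbcr: np.ndarray) -> np.ndarray:
    H, W, C = x_ycbcr.shape
    # split into 8x8 patches: (H/8, W/8, 8, 8, C)
    patches = x_ycbcr.reshape(H // 8, 8, W // 8, 8, C).transpose(0, 2, 1, 3, 4)
    # 2-D type-II DCT on each 8x8 patch, applied separably over both axes
    spec = dct(dct(patches, axis=2, norm='ortho'), axis=3, norm='ortho')
    # gather all components of the same frequency into one channel:
    # (H/8, W/8, 8*8*C) = (H/8, W/8, 192) for a 3-channel input
    x_d = spec.reshape(H // 8, W // 8, 64 * C)
    return x_d
```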
In step 1-2, the online learning enhancement module OLE is used to obtain the features of the camouflaged object hidden in the frequency space, specifically as follows:

First, so that the neural network can learn information of different frequency bands in the image, the signal is down-sampled and divided into two parts: the first 96 channels form the low-frequency segment $x_l \in \mathbb{R}^{k \times k \times 96}$ and the last 96 channels form the high-frequency segment $x_h \in \mathbb{R}^{k \times k \times 96}$, where k denotes the down-sampled size. In order to enhance the signal in the corresponding frequency bands, the low segment and the high segment are input into two multi-head self-attention (MHSA) blocks respectively, and the outputs are concatenated to restore the original shape.

Then another multi-head self-attention MHSA reconciles all the different frequency bands, and the newly formed signal is represented as $x_d'$. Multi-head self-attention captures the rich dependencies between the patches of the input representation $x_d'$. The specific method is as follows: first, $x_d'$ is reshaped to form a feature map, and the relationships among all patches are then modeled with MHSA; finally, the result is up-sampled to obtain the enhanced frequency signal $x_{freq}$.
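The following PyTorch sketch illustrates one possible reading of the OLE module; the down-sampled size k, the head count, and the use of nn.MultiheadAttention are assumptions not fixed by the text.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

# Sketch of the online learning enhancement (OLE) module: the down-sampled
# frequency signal is split into a low band (first 96 channels) and a high
# band (last 96 channels), each band is refined by its own multi-head
# self-attention, a third MHSA mixes all bands, and the result is up-sampled.
class OLE(nn.Module):
    def __init__(self, channels=192, k=22, heads=4):
        super().__init__()
        self.k = k
        self.low_attn = nn.MultiheadAttention(96, heads, batch_first=True)
        self.high_attn = nn.MultiheadAttention(96, heads, batch_first=True)
        self.mix_attn = nn.MultiheadAttention(channels, heads, batch_first=True)

    def _mhsa(self, attn, tokens):
        out, _ = attn(tokens, tokens, tokens)  # self-attention: q = k = v
        return out

    def forward(self, x_d):                     # x_d: (B, 192, H/8, W/8)
        B, C, H, W = x_d.shape
        x = F.adaptive_avg_pool2d(x_d, self.k)  # down-sample to (B, C, k, k)
        tokens = x.flatten(2).transpose(1, 2)   # (B, k*k, C)
        low = self._mhsa(self.low_attn, tokens[..., :96])
        high = self._mhsa(self.high_attn, tokens[..., 96:])
        x = torch.cat([low, high], dim=-1)      # restore the original shape
        x = self._mhsa(self.mix_attn, x)        # reconcile all frequency bands
        x = x.transpose(1, 2).reshape(B, C, self.k, self.k)
        # up-sample to obtain the enhanced frequency signal x_freq
        return F.interpolate(x, size=(H, W), mode='bilinear', align_corners=False)
```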
In step 1-3, the method for fusing the spatial-domain features and the frequency-domain features is as follows:

The feature alignment module FA fuses the spatial-domain features $X_i$ and the frequency-domain features $X_{fre}$. A binary base filter $f_{base}$ covering the high frequency band is designed, and three different learnable filters $f_z$, with z the index of the learnable filter, are added for the Y, Cb and Cr color spaces. The filtering is the dot product between the frequency response and the combined filter $f_{base} + \sigma(f_z)$, where the sigma function is

$$\sigma(f) = \frac{1}{1 + \exp(-f)},$$

with exp the exponential function. For the input frequency-domain features $X_{fre}$, three signals of different frequency bands are obtained through the following formula:

$$X^{z}_{freq} = X_{fre} \odot \left(f_{base} + \sigma(f_z)\right), \quad z = 1, 2, 3,$$

where $\odot$ is the element-level product. The information of the three different frequency bands obtained with the 3 different filters is spliced together to obtain the frequency domain output:

$$X_{freq} = \mathrm{concat}\left(X^{1}_{freq}, X^{2}_{freq}, X^{3}_{freq}\right).$$

The spatial domain information and the frequency domain information are then spliced. The specific method is as follows: $X_i$ and $X_{freq}$ are connected and input into a convolution layer with 4 output channels, whose output is T; the slices $T_1, T_2, T_3, T_4$ are taken from the third dimension and reshaped to $HW \times n$, and the alignment features are mapped by:

$$T_1 = T_1 (T_2)^{\mathsf{T}}, \qquad T_2 = T_3 (T_4)^{\mathsf{T}}.$$

The result is then multiplied by the transform and a learned vector to adjust the intensity of each channel, which defines the aligned feature field of each channel. Finally, a fused feature $X_{fuse}$ is obtained by adding the features of the two domains:

$$X_{fuse} = \hat{X}_i + \hat{X}_{freq},$$

where $\hat{X}_i$ and $\hat{X}_{freq}$ denote the aligned spatial-domain and frequency-domain features.
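Below is a simplified PyTorch sketch of the band-filtering and fusion idea; the definition of the high band in f_base, the 1 × 1 projection, and the omission of the T-matrix alignment step are simplifying assumptions.

```python
import torch
import torch.nn as nn

# Sketch of the feature alignment (FA) filtering: a fixed binary filter
# f_base passing the high band plus three learnable filters (one per YCbCr
# channel) gate the 8x8 DCT spectrum; the three band-limited signals are
# concatenated, and spatial/frequency features are fused by addition after a
# learned per-channel rescaling. The T1/T2 affinity alignment is omitted.
class FeatureAlign(nn.Module):
    def __init__(self, channels=192):
        super().__init__()
        # binary base filter over the 64 DCT frequencies; treating the
        # frequencies with u + v >= 8 as the "high band" is an assumption
        u, v = torch.meshgrid(torch.arange(8), torch.arange(8), indexing='ij')
        self.register_buffer('f_base', ((u + v) >= 8).float().reshape(64))
        self.f_z = nn.Parameter(torch.zeros(3, 64))  # learnable Y/Cb/Cr filters
        self.proj = nn.Conv2d(3 * channels, channels, 1)
        self.v = nn.Parameter(torch.ones(channels))  # learned channel intensity

    def forward(self, x_spatial, x_fre):
        # x_fre: (B, 192, h, w), channels ordered frequency-major, color-minor;
        # x_spatial is assumed resized to the same (B, 192, h, w) shape
        B, C, h, w = x_fre.shape
        bands = []
        for z in range(3):
            filt = self.f_base + torch.sigmoid(self.f_z[z])  # f_base + sigma(f_z)
            filt = filt.repeat_interleave(C // 64)           # broadcast over colors
            bands.append(x_fre * filt.view(1, C, 1, 1))      # element-level product
        x_freq = self.proj(torch.cat(bands, dim=1))          # splice the three bands
        # fuse by adding the features of the two domains
        return x_spatial + self.v.view(1, C, 1, 1) * x_freq
```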
In step 1-4, the method for high-frequency feature screening is as follows:

Let $X \in \mathbb{R}^{C \times H \times W}$ represent the input features, with C the channel dimension; X is first reshaped to $C \times HW$. Since the frequency response comes from a local area, it is necessary to encode the positional importance of the original features to distinguish the camouflaged object from other objects, and a location attention weight W is computed from X. Moreover, different network layers present potential information at different scales, the later layers having a larger receptive field, so the multi-scale representation is also enhanced with cross-layer semantics. Here $\psi(X)$ represents a subsequent layer of the input feature X, and W serves as an attention weight to find the correlations between the RGB and frequency responses of different layers. The position weights strengthen the original features, and the most useful features for different samples are then selected through an adaptive gating operation:

$$A = g(X) \odot (W \cdot X),$$

where $g(X)$ represents the gating weights generated by the FC layer; the gating operation is generated based on spatial perception and forms position-aware features.

After obtaining the position-enhanced feature A, a channel-aware relation matrix $D \in \mathbb{R}^{C \times C}$ is established through similar operations, where C is the channel dimension of the position-aware features. Finally, the relation matrix D is applied to X to obtain the selection information beneficial to the camouflaged object:

$$X_{out} = D \cdot X.$$

The feature $X_{out}$ is then input into the decoding process.
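The following PyTorch sketch shows one plausible reading of the HOR computation; since the exact attention and gating formulas are only partially recoverable from the text, the softmax placement and the FC gate below are assumptions, not the definitive method.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

# Sketch of the high-frequency channel selection (HOR) idea: a location
# attention over the flattened feature, an FC-generated gate that keeps the
# most useful positions, and a channel relation matrix D applied back to X.
class HOR(nn.Module):
    def __init__(self, channels):
        super().__init__()
        self.psi = nn.Conv2d(channels, channels, 3, padding=1)  # "subsequent layer"
        self.gate = nn.Linear(channels, channels)               # FC gating weights

    def forward(self, x):                       # x: (B, C, H, W)
        B, C, H, W = x.shape
        X = x.flatten(2)                        # reshape to (B, C, HW)
        P = self.psi(x).flatten(2)              # psi(X): (B, C, HW)
        # location attention weight over positions
        Wp = F.softmax(torch.bmm(X.transpose(1, 2), P), dim=-1)   # (B, HW, HW)
        A = torch.bmm(X, Wp)                    # position-enhanced feature (B, C, HW)
        g = torch.sigmoid(self.gate(A.mean(-1)))                  # (B, C) gate
        A = A * g.unsqueeze(-1)                 # adaptive gating
        # channel-aware relation matrix D: (B, C, C)
        D = F.softmax(torch.bmm(A, A.transpose(1, 2)) / (H * W) ** 0.5, dim=-1)
        X_out = torch.bmm(D, X).view(B, C, H, W)  # selection information
        return X_out
```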
The method for training the camouflaged object network in step 2 comprises the following steps:
step 2-1, data preprocessing: data enhancement in the form of random flipping and random cropping is applied to the camouflaged object training set before it is input into the camouflaged object network, specifically:
step 2-1-1, random flipping: flipping the image in either the horizontal or vertical direction;
step 2-1-2, random cropping: cropping a region whose size is a random proportion of the picture while keeping the aspect ratio unchanged;
step 2-2, training: the data-enhanced images are input into the camouflaged object network, which is optimized through the loss function so that it generates complete and accurate camouflaged object segmentation maps; training is repeated for a set number of epochs, and the final network model parameters are saved.
In step 2-2, for the input camouflaged object image, the camouflaged object network based on the frequency domain enhancement module is trained by means of the weighted BCE loss $L_{bce}$ and the weighted IoU loss $L_{iou}$; the supervision function $L_k$ is defined as:

$$L_k = L_{bce}(P_k, M) + L_{iou}(P_k, M),$$

where M is the ground-truth label, k is the k-th stage of the network, and $P_k$ represents the prediction result of the k-th stage. Finally, the overall loss function $L_{overall}$ is:

$$L_{overall} = \sum_{k} L_k.$$
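For concreteness, the sketch below implements a weighted BCE + weighted IoU supervision summed over the stage predictions; the boundary-aware pixel weighting is an assumption borrowed from common practice in camouflaged object detection, as the text only names "weighted" losses.

```python
import torch
import torch.nn.functional as F

# Sketch of the stage-wise supervision L_k = L_bce(P_k, M) + L_iou(P_k, M).
# pred: raw logits (B, 1, H, W); mask: ground truth in {0, 1}, (B, 1, H, W).
def structure_loss(pred, mask):
    # pixel weight: larger near the boundaries of the ground-truth mask
    # (assumed weighting; a local average of the mask highlights edges)
    weit = 1 + 5 * torch.abs(
        F.avg_pool2d(mask, 31, stride=1, padding=15) - mask)
    wbce = F.binary_cross_entropy_with_logits(pred, mask, reduction='none')
    wbce = (weit * wbce).sum(dim=(2, 3)) / weit.sum(dim=(2, 3))

    pred = torch.sigmoid(pred)
    inter = ((pred * mask) * weit).sum(dim=(2, 3))
    union = ((pred + mask) * weit).sum(dim=(2, 3))
    wiou = 1 - (inter + 1) / (union - inter + 1)
    return (wbce + wiou).mean()

def overall_loss(stage_preds, mask):
    # L_overall = sum_k [ L_bce(P_k, M) + L_iou(P_k, M) ]
    return sum(structure_loss(p, mask) for p in stage_preds)
```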
Beneficial effects:
First, the invention introduces the frequency domain as an additional clue to better detect the camouflaged object against the background; second, in order to make fuller use of the frequency information, a high-order relation module (HOR) is provided to process the rich fused features.
Drawings
The foregoing and/or other advantages of the invention will become more apparent from the following detailed description of the invention when taken in conjunction with the accompanying drawings.
FIG. 1 is a schematic process flow diagram of the present invention.
FIG. 2 is a schematic diagram of an example set of input camouflaged object images.
Detailed Description
The invention discloses a method for detecting a disguised object based on frequency domain enhancement, which is implemented according to the following steps, as shown in figure 1:
1. constructing a camouflage object network G:
Input: a set of camouflaged object images.
Output: the corresponding camouflaged object segmentation image.
1.1 constructing the camouflaged object network model skeleton to extract features;
The skeleton comprises four stages, each containing two 3 × 3 convolution layers with stride 2. The RGB feature maps are extracted with this skeleton, so that the spatial resolution is reduced at each stage and the constructed backbone extracts the multi-scale features $x_i$ under a limited parameter budget.
1.2 designing the frequency domain enhancement module FEM to extract the features of the camouflaged object in the frequency domain;
The frequency domain enhancement module FEM extracts the features of the camouflaged object in the frequency domain. The offline discrete cosine transform converts $x_{rgb}$ into YCbCr space (represented by $x_{YCbCr}$), and $x_{YCbCr}$ is divided into a set of 8 × 8 regions of the frequency-domain color channels, $x^{i,j}_{YCbCr}$, each representing the region of a certain color channel. Each region is DCT-processed into a frequency spectrum $d^{i,j} \in \mathbb{R}^{8 \times 8 \times 3}$, where each value corresponds to the intensity of a certain frequency band. Following the regional rule, $\{d^{i,j}\}$ represents the collection of all region spectra; all components with the same frequency are gathered into one channel, and the spectra are reshaped into a new input $x_d \in \mathbb{R}^{\frac{H}{8} \times \frac{W}{8} \times 192}$.
Thus, the original color input is converted to the frequency domain. The online learning enhancement module OLE is used to obtain the camouflaged object features hidden in the frequency space. First, to raise the coefficients of the local frequency bands, the signal is down-sampled and divided into two parts, a low-frequency segment $x_l$ and a high-frequency segment $x_h$, where k denotes the down-sampled size. To enhance the signals in the corresponding frequency bands, we input them separately into two multi-head self-attention (MHSA) blocks and concatenate their outputs to restore the original shape. Then another MHSA reconciles all the different frequency bands, and the newly formed signal is represented as $x_d'$. MHSA is able to capture the rich correlations between the items of the input features; at this point, the different spectra of the image interact fully. For the DCT, the patches are independent of each other, and the above process only enhances each patch individually. To help the network identify the location of the camouflaged object, connections must be established between patches: we first reshape $x_d'$ into a feature map, then model the relationships among all patches using MHSA. Finally, we up-sample and obtain the enhanced frequency signal $x_{freq}$. Both $x_{rgb}$ and $x_{freq}$ are input to the network. Since only single-layer MHSA is applied everywhere and the spatial scale of the frequency signal is small, this does not bring a high computational cost.
2. Building the feature alignment module and the high-frequency channel selection module
Input: a frequency-domain image and a spatial-domain image;
Output: the camouflaged object loss;
2.1 constructing the feature alignment module FA and fusing the spatial-domain features and the frequency-domain features;
The feature alignment module FA fuses the spatial-domain features $X_i$ and the frequency-domain features $X_{freq2s}$. We design a binary base filter $f_{base}$ covering the high frequency band, and add three learnable filters $f_z$ for the Y, Cb and Cr color spaces. The filtering is the dot product between the frequency response and the combined filter $f_{base} + \sigma(f_z)$, where $\sigma(f) = 1/(1 + \exp(-f))$. For the input frequency-domain features $X_{fre}$, the network obtains the band-limited signals through

$$X^{z}_{freq} = X_{fre} \odot \left(f_{base} + \sigma(f_z)\right), \quad z = 1, 2, 3,$$

where $\odot$ is the element-level product. Finally, they are spliced together:

$$X_{freq} = \mathrm{concat}\left(X^{1}_{freq}, X^{2}_{freq}, X^{3}_{freq}\right).$$

Then we calculate the variation of the two signals from the spatial and frequency domains. We connect $X_i$ and $X_{freq}$ and feed them into a convolution layer with 4 output channels, whose output is T. We take the slices $T_1, T_2, T_3, T_4$ from the third dimension, reshape them to $HW \times n$, and proceed by:

$$T_1 = T_1 (T_2)^{\mathsf{T}}, \qquad T_2 = T_3 (T_4)^{\mathsf{T}}.$$

In this way, the feature maps can be aligned. The result is then multiplied by the transform and a learned vector to adjust the intensity of each channel, which defines the aligned feature field of each channel. Finally, we obtain a fused feature $X_{fuse}$ by adding the features of these two domains:

$$X_{fuse} = \hat{X}_i + \hat{X}_{freq},$$

where $\hat{X}_i$ and $\hat{X}_{freq}$ denote the aligned spatial-domain and frequency-domain features.
2.2 constructing the high-frequency channel selection module HOR for high-frequency feature screening;
Let $X \in \mathbb{R}^{C \times H \times W}$ denote the input features; we first reshape X to $C \times HW$. Since the frequency response comes from a local area, it is necessary to encode the positional importance of the original features to distinguish the camouflaged object from other objects, and a location attention weight W is computed. In addition, different network layers present potential information at different scales, the later layers having larger receptive fields, so the multi-scale representation is also enhanced with cross-layer semantics. Here $\psi(X)$ denotes the subsequent layer of X, and W thus serves as an attention weight to find the correlations between the RGB and frequency responses of different layers. The position weights then strengthen the original features, after which an adaptive gating operation selects the most useful features for different samples:

$$A = g(X) \odot (W \cdot X),$$

where $g(X)$ denotes the gating weights generated by the FC layer; the gating operation is generated based on spatial perception and forms position-aware features.
After obtaining the position-enhanced feature A, a channel-aware relation matrix $D \in \mathbb{R}^{C \times C}$ can be established by similar operations. Each tensor in the channel-aware relation has the same C dimension in both the semantic and frequency mappings, corresponding to the original feature channels and the spectrum. Finally, we apply this relation matrix to X to obtain the selection information beneficial to the camouflaged object, $X_{out} = D \cdot X$, and the feature $X_{out}$ is input into the decoding process.
3. Training the overall framework;
The training of the dual-branch deep convolutional neural network comprises a data preprocessing stage, a model framework training stage and a testing stage.
3.1, preprocessing the data;
The input camouflaged object image set is adjusted by operations such as scaling and flipping and then input into the camouflaged object network.
Input: a set of camouflaged object images.
Output: the data-enhanced camouflaged object image set.
Geometric enhancement: the generalization ability of the model can be enhanced by methods that change the image geometry, such as translation, rotation and shearing; a sketch of such a pipeline follows.
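As a concrete illustration, the data enhancement described above can be assembled with torchvision; the flip probabilities and the crop scale range are assumptions, since the text does not specify them.

```python
from torchvision import transforms

# Sketch of the preprocessing pipeline: random horizontal/vertical flipping,
# random cropping, and resizing to the 352 x 352 network input. The scale
# range (0.6, 1.0) and square ratio are assumptions; in segmentation, the
# same geometric transform must also be applied to the ground-truth mask.
train_transform = transforms.Compose([
    transforms.RandomHorizontalFlip(p=0.5),
    transforms.RandomVerticalFlip(p=0.5),
    transforms.RandomResizedCrop(352, scale=(0.6, 1.0), ratio=(1.0, 1.0)),
    transforms.ToTensor(),
])
```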
3.2 model framework training
Input: the data-enhanced camouflaged object image set
Output: the segmentation results of the camouflaged object image set
For the input camouflaged object image, the network is trained by means of the weighted BCE loss $L_{bce}$ and the weighted IoU loss $L_{iou}$ to obtain a robust camouflaged object detection network; the supervision function $L_i$ is defined as:

$$L_i = L_{bce}(P_i, M) + L_{iou}(P_i, M),$$

where M is the ground-truth label and i is the i-th stage of the network. Finally, the overall loss function is:

$$L_{overall} = \sum_{i} L_i.$$
during training, a small batch Stochastic Gradient Descent (SGD) optimization algorithm with a batchsize of 32, a momentum of 0.9, and a weight decay of 1e-5 may be used. The learning rate is set to 1e-4 and the maximum epoch is set to 100. The training image is adjusted to 352 x 352 as input to the entire network.
3.3 testing the model framework;
Input: a set of camouflaged object images;
Output: the corresponding camouflaged object segmentation images;
Step 1, the input camouflaged object image set is fed into the convolutional neural network, and an initial camouflage segmentation result is obtained with the trained network model;
Step 2, the features obtained in step 1 are refined with details, finally yielding a detail-rich final camouflage segmentation result.
In the present invention, as shown in fig. 2, the first column is the input camouflaged object image, the second column is the ground-truth segmentation result, and the third column is the segmentation result of the network of the present invention.
In a specific implementation, the present application provides a computer storage medium and a corresponding data processing unit, where the computer storage medium is capable of storing a computer program, and the computer program, when executed by the data processing unit, may execute the inventive content of the method for detecting a disguised object based on frequency domain enhancement and some or all of the steps in each embodiment of the method for detecting a disguised object based on frequency domain enhancement provided by the present invention. The storage medium may be a magnetic disk, an optical disk, a read-only memory (ROM), a Random Access Memory (RAM), or the like.
It is clear to those skilled in the art that the technical solutions in the embodiments of the present invention can be implemented by means of a computer program and its corresponding general-purpose hardware platform. Based on such understanding, the technical solutions in the embodiments of the present invention, or the portions thereof that contribute to the prior art, may be embodied in the form of a computer program, that is, a software product, which may be stored in a storage medium and include several instructions for enabling a device including a data processing unit (which may be a personal computer, a server, a single-chip microcomputer, an MCU, or a network device, etc.) to execute the method according to each embodiment or some portions of the embodiments of the present invention.
The present invention provides a camouflaged object detection method based on frequency domain enhancement and the concept thereof; there are numerous methods and ways to implement the technical scheme, and the above description is only a preferred embodiment of the present invention. It should be noted that, for a person skilled in the art, a number of improvements and embellishments can be made without departing from the principle of the present invention, and these improvements and embellishments should also be regarded as falling within the protection scope of the present invention. All components not specified in the present embodiment can be realized by the prior art.

Claims (10)

1. A camouflaged object detection method based on frequency domain enhancement is characterized by comprising the following steps:
step 1, constructing a camouflage object network based on a frequency domain enhancement module; the disguised object network comprises: the device comprises a camouflage object network framework, a frequency domain enhancement module, a characteristic alignment module and a high-frequency channel selection module;
wherein the frequency domain enhancement module is used for extracting the features of the camouflaged object in the frequency domain, the feature alignment module is used for fusing spatial-domain and frequency-domain features, and the high-frequency channel selection module is used for screening high-frequency features;
step 2, training the camouflage object network, which comprises the following steps: inputting a camouflage object training image set subjected to data preprocessing into a camouflage object network, and optimizing the camouflage object network by using a frequency domain enhancement module and a supervision loss to obtain a trained camouflage object network;
step 3, testing by adopting the trained camouflaged object network, comprising the following steps: inputting the camouflaged object image to be detected into the trained network to obtain the corresponding camouflaged object segmentation image, and completing the frequency-domain-enhancement-based camouflaged object detection.
2. The method for detecting a camouflaged object based on frequency domain enhancement as claimed in claim 1, wherein step 1 comprises the following steps:
step 1-1, constructing a camouflage object network framework to extract RGB characteristics;
step 1-2, designing a frequency domain enhancement module FEM to extract the characteristics of the camouflage object in a frequency domain;
step 1-3, constructing a feature alignment module FA to fuse the spatial-domain features and the frequency-domain features;
step 1-4, constructing a high-frequency channel selection module HOR to perform high-frequency feature screening.
3. The method for detecting the camouflaged object based on frequency domain enhancement as claimed in claim 2, wherein in step 1-1 the camouflaged object network skeleton comprises four stages, each stage being two 3 × 3 convolution layers with stride 2; the skeleton extracts from the RGB image the corresponding feature maps $x_i \in \mathbb{R}^{H_i \times W_i \times C_i}$, $i = 1, \dots, 4$, where H represents the height of the image, W the width of the image, and $H_i \times W_i$ the overall resolution of the i-th feature map.
4. The method for detecting a camouflaged object based on frequency domain enhancement as claimed in claim 3, wherein the frequency domain enhancement module in step 1-2 comprises an offline discrete cosine transform and an online learning enhancement module OLE; frequency domain information is obtained from the RGB image through the offline discrete cosine transform, and the online learning enhancement module OLE is used for obtaining the features of the camouflaged object hidden in the frequency space, namely the frequency domain features.
5. The method for detecting a camouflaged object based on frequency domain enhancement as claimed in claim 4, wherein in step 1-2 the offline discrete cosine transform converts the feature map $x_{rgb}$ into YCbCr space, in which the feature map is represented as $x_{YCbCr} \in \mathbb{R}^{H \times W \times 3}$; subsequently, $x_{YCbCr}$ is divided into images of size 8 × 8, $x^{i,j}_{YCbCr} \in \mathbb{R}^{8 \times 8 \times 3}$, each representing one region, with i, j the coordinates of the region; each region is then processed by the discrete cosine transform (DCT) into a frequency spectrum $d^{i,j} \in \mathbb{R}^{8 \times 8 \times 3}$, where each value corresponds to the intensity of a certain frequency band; the above process is represented by the following formula:

$$x_d = \mathrm{flatten}\left(\left\{ \mathrm{DCT}\left(x^{i,j}_{YCbCr}\right) \right\}\right) \in \mathbb{R}^{\frac{H}{8} \times \frac{W}{8} \times 192},$$

in which $\{\cdot\}$ represents the collection of the spectra of all regions, and flatten is the collection method that gathers all regions with the same frequency into one channel, so that a feature map is obtained again.
6. The method for detecting a camouflaged object based on frequency domain enhancement as claimed in claim 5, wherein in step 1-2 the online learning enhancement module OLE is used to obtain the features of the camouflaged object hidden in the frequency space, specifically comprising:
first, the signal $x_d$ is down-sampled and divided into two parts, the first 96 channels forming the low-frequency segment $x_l \in \mathbb{R}^{k \times k \times 96}$ and the last 96 channels forming the high-frequency segment $x_h \in \mathbb{R}^{k \times k \times 96}$, where k represents the down-sampled size; the low segment and the high segment are input into two multi-head self-attention (MHSA) blocks respectively, and the outputs are concatenated to restore the original shape;
then another multi-head self-attention MHSA reconciles all the different frequency bands, and the newly formed signal is represented as $x_d'$; multi-head self-attention captures the rich dependencies between the patches of the input representation $x_d'$; the specific method is as follows: first, $x_d'$ is reshaped into a feature map, and the relationships among all patches are then modeled with MHSA;
finally, the result is up-sampled to obtain the enhanced frequency signal $x_{freq}$.
7. The method for detecting a camouflaged object based on frequency domain enhancement as claimed in claim 6, wherein in step 1-3 the method for fusing the spatial-domain features and the frequency-domain features comprises:
the feature alignment module FA fuses the spatial-domain features $X_i$ and the frequency-domain features $X_{freq2s}$; a binary base filter $f_{base}$ covering the high frequency band is designed, and three different learnable filters $f_z$, with z the index of the learnable filter, are added for the Y, Cb and Cr color spaces; the filtering is the dot product between the frequency response and the combined filter $f_{base} + \sigma(f_z)$, where the sigma function is

$$\sigma(f) = \frac{1}{1 + \exp(-f)},$$

with exp the exponential function; for the input frequency-domain features $X_{fre}$, three signals of different frequency bands are obtained through the following formula:

$$X^{z}_{freq} = X_{fre} \odot \left(f_{base} + \sigma(f_z)\right), \quad z = 1, 2, 3,$$

where $\odot$ is the element-level product; the information of the three different frequency bands obtained with the 3 different filters is spliced together to obtain the frequency domain output:

$$X_{freq} = \mathrm{concat}\left(X^{1}_{freq}, X^{2}_{freq}, X^{3}_{freq}\right);$$

the spatial domain information and the frequency domain information are then spliced; the specific method is as follows: $X_i$ and $X_{freq}$ are connected and input into a convolution layer with 4 output channels, whose output is T; the slices $T_1, T_2, T_3, T_4$ are taken from the third dimension and reshaped to $HW \times n$, and the alignment features are mapped by:

$$T_1 = T_1 (T_2)^{\mathsf{T}}, \qquad T_2 = T_3 (T_4)^{\mathsf{T}};$$

the result is then multiplied by the transform and a learned vector to adjust the intensity of each channel, which defines the aligned feature field of each channel; finally, a fused feature $X_{fuse}$ is obtained by adding the features of the two domains:

$$X_{fuse} = \hat{X}_i + \hat{X}_{freq}.$$
8. The method for detecting a camouflaged object based on frequency domain enhancement as claimed in claim 7, wherein in step 1-4 the method for high-frequency feature screening comprises:
let $X \in \mathbb{R}^{C \times H \times W}$ represent the input features, with C the channel dimension; X is first reshaped to $C \times HW$, and the location attention weight W is computed, where $\psi(X)$ represents a subsequent layer of the input feature X and W serves as an attention weight to find the correlations between the RGB and frequency responses of different layers; the position weights strengthen the original features, and the most useful features for different samples are then selected through an adaptive gating operation:

$$A = g(X) \odot (W \cdot X),$$

where $g(X)$ represents the gating weights generated by the FC layer; the gating operation is generated based on spatial perception and forms position-aware features;
after obtaining the position-enhanced feature A, a channel-aware relation matrix $D \in \mathbb{R}^{C \times C}$ is established through similar operations, where C represents the channel dimension of the position-aware features; finally, the relation matrix D is applied to X to obtain the selection information beneficial to the camouflaged object:

$$X_{out} = D \cdot X;$$

the feature $X_{out}$ is then input into the decoding process.
9. The method for detecting a camouflaged object based on frequency domain enhancement as claimed in claim 8, wherein the method for training the camouflaged object network in step 2 comprises the following steps:
step 2-1, data preprocessing: data enhancement in the form of random flipping and random cropping is applied to the camouflaged object training set before it is input into the camouflaged object network, specifically:
step 2-1-1, random flipping: flipping the image in either the horizontal or vertical direction;
step 2-1-2, random cropping: cropping a region whose size is a random proportion of the picture while keeping the aspect ratio unchanged;
step 2-2, training: the data-enhanced images are input into the camouflaged object network, which is optimized through the loss function so that it generates complete and accurate camouflaged object segmentation maps; training is repeated for a set number of epochs, and the final network model parameters are saved.
10. The method according to claim 9, wherein in step 2-2, for the input camouflaged object image, the camouflaged object network based on the frequency domain enhancement module is trained by means of the weighted BCE loss $L_{bce}$ and the weighted IoU loss $L_{iou}$; the supervision function $L_k$ is defined as:

$$L_k = L_{bce}(P_k, M) + L_{iou}(P_k, M),$$

where M is the ground-truth label, k is the k-th stage of the network, and $P_k$ represents the prediction result of the k-th stage; finally, the overall loss function $L_{overall}$ is:

$$L_{overall} = \sum_{k} L_k.$$
CN202211226059.3A 2022-10-09 2022-10-09 Disguised object detection method based on frequency domain enhancement Pending CN115471675A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202211226059.3A CN115471675A (en) 2022-10-09 2022-10-09 Disguised object detection method based on frequency domain enhancement

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202211226059.3A CN115471675A (en) 2022-10-09 2022-10-09 Disguised object detection method based on frequency domain enhancement

Publications (1)

Publication Number Publication Date
CN115471675A true CN115471675A (en) 2022-12-13

Family

ID=84337310

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202211226059.3A Pending CN115471675A (en) 2022-10-09 2022-10-09 Disguised object detection method based on frequency domain enhancement

Country Status (1)

Country Link
CN (1) CN115471675A (en)


Cited By (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN116664990A (en) * 2023-08-01 2023-08-29 苏州浪潮智能科技有限公司 Camouflage target detection method, model training method, device, equipment and medium
CN116664990B (en) * 2023-08-01 2023-11-14 苏州浪潮智能科技有限公司 Camouflage target detection method, model training method, device, equipment and medium
CN117828536A (en) * 2024-03-04 2024-04-05 粤港澳大湾区数字经济研究院(福田) Prediction method, model, terminal and medium for node interaction
CN117828536B (en) * 2024-03-04 2024-06-11 粤港澳大湾区数字经济研究院(福田) Prediction method, model, terminal and medium for node interaction


Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination