CN113269730B - Image processing method, image processing device, computer equipment and storage medium - Google Patents


Info

Publication number
CN113269730B
CN113269730B
Authority
CN
China
Prior art keywords
feature
attention
image
region
area
Prior art date
Legal status
Active
Application number
CN202110512984.1A
Other languages
Chinese (zh)
Other versions
CN113269730A (en)
Inventor
彭程威
周锴
王雷
张睿
Current Assignee
Beijing Sankuai Online Technology Co Ltd
Original Assignee
Beijing Sankuai Online Technology Co Ltd
Priority date
Filing date
Publication date
Application filed by Beijing Sankuai Online Technology Co Ltd filed Critical Beijing Sankuai Online Technology Co Ltd
Priority to CN202110512984.1A priority Critical patent/CN113269730B/en
Publication of CN113269730A publication Critical patent/CN113269730A/en
Application granted granted Critical
Publication of CN113269730B publication Critical patent/CN113269730B/en

Classifications

    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06T: IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T 7/00: Image analysis
    • G06T 7/0002: Inspection of images, e.g. flaw detection
    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06N: COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N 3/00: Computing arrangements based on biological models
    • G06N 3/02: Neural networks
    • G06N 3/04: Architecture, e.g. interconnection topology
    • G06N 3/045: Combinations of networks
    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06N: COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N 3/00: Computing arrangements based on biological models
    • G06N 3/02: Neural networks
    • G06N 3/08: Learning methods
    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06T: IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T 2207/00: Indexing scheme for image analysis or image enhancement
    • G06T 2207/20: Special algorithmic details
    • G06T 2207/20081: Training; Learning
    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06T: IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T 2207/00: Indexing scheme for image analysis or image enhancement
    • G06T 2207/20: Special algorithmic details
    • G06T 2207/20084: Artificial neural networks [ANN]

Landscapes

  • Engineering & Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Theoretical Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • General Health & Medical Sciences (AREA)
  • General Engineering & Computer Science (AREA)
  • Biophysics (AREA)
  • Computational Linguistics (AREA)
  • Data Mining & Analysis (AREA)
  • Evolutionary Computation (AREA)
  • Artificial Intelligence (AREA)
  • Molecular Biology (AREA)
  • Computing Systems (AREA)
  • Biomedical Technology (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Mathematical Physics (AREA)
  • Software Systems (AREA)
  • Health & Medical Sciences (AREA)
  • Quality & Reliability (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Image Analysis (AREA)

Abstract

The application provides an image processing method, an image processing device, computer equipment and a storage medium, and belongs to the technical field of computers. The method comprises the following steps: determining a first attention feature based on a first image feature of a first candidate region in an image, the first attention feature representing how important different sub-features in the first image feature are for determining a tampered region; determining a second candidate region in the image based on the first attention feature and the first candidate region; and determining a target region in the image based on a second image feature of the second candidate region and the second candidate region, the target region indicating a tampered region. By determining attention features, the features that need particular attention when determining the tampered region become more prominent, and the tampered region is obtained through step-by-step progressive region adjustment based on the attention features, which improves the accuracy of locating the tampered region.

Description

Image processing method, image processing device, computer equipment and storage medium
Technical Field
The present application relates to the field of computer technologies, and in particular, to an image processing method and apparatus, a computer device, and a storage medium.
Background
With the rapid development of image editing technology, editing images has become more and more convenient, but images can also be tampered with maliciously, causing adverse effects. Locating tampered regions in images is therefore of great significance.
In the related art, whether a candidate region belongs to a tampered region is determined based only on the features of that candidate region in the image, and the candidate region is taken as the tampered region if so. The tampered region determined in this way can differ considerably from the real tampered region, so the accuracy of locating the tampered region is low.
Disclosure of Invention
The embodiment of the application provides an image processing method and device, computer equipment and a storage medium, and can improve the accuracy of positioning a tampered area. The technical scheme is as follows:
in one aspect, an image processing method is provided, and the method includes:
determining a first attention feature based on a first image feature of a first candidate region in an image, the first attention feature being used for representing the importance degree of different sub-features in the first image feature for determining a tampered region;
determining a second candidate region in the image based on the first attention feature and the first candidate region;
determining a target region in the image based on a second image feature of the second candidate region and the second candidate region, the target region being indicative of a tampered region.
In one possible implementation, the determining a first attention feature based on a first image feature of a first candidate region in an image includes:
processing the first image feature through a first attention model in a region determination network to obtain the first attention feature, wherein the region determination network is obtained by training based on a sample image marked with a tampered region.
In another possible implementation manner, the determining a first attention feature based on a first image feature of a first candidate region in an image includes:
determining an attention weight for a sub-feature of each channel in the first image feature based on the first image feature;
determining a channel attention feature based on the attention weight of the sub-feature of each channel and the sub-feature of each channel;
determining an attention weight for a sub-feature for each location in the channel attention feature based on the channel attention feature;
determining the first attention feature based on the attention weight of the sub-feature of each location and the sub-feature of each location.
In another possible implementation, the determining a second candidate region in the image based on the first attention feature and the first candidate region includes:
processing the first attention feature and the first candidate region through a first position regression model in the region determination network to obtain a second candidate region;
the area determination network is obtained by training based on a sample image marked with a tampered area, the first position regression model is used for processing any feature and any candidate area to obtain a processed candidate area, and the coincidence degree of the processed candidate area and the real tampered area is larger than that of the candidate area and the real tampered area before processing.
In another possible implementation manner, before the processing the first attention feature and the first candidate region through the first location regression model in the region determination network to obtain the second candidate region, the method further includes:
processing the first attention characteristic through a classification model in the area determination network to obtain tampering classification information, wherein the tampering classification information is used for indicating whether the first candidate area is a tampering area;
the processing the first attention feature and the first candidate region through a first location regression model in the area determination network to obtain the second candidate region includes:
if the tampering classification information indicates that the first candidate area is a tampered area, processing the first attention feature and the first candidate area through the first position regression model to obtain the second candidate area.
In another possible implementation manner, the determining a target region in the image based on the second image feature of the second candidate region and the second candidate region includes:
processing the second image feature through a second attention model in the area determination network to obtain a second attention feature;
processing the second attention feature and the second candidate area through a second position regression model in the area determination network to obtain the target area;
the area determination network is obtained by training based on a sample image marked with a tampered area, the second position regression model is used for processing any feature and any candidate area to obtain a processed candidate area, and the coincidence degree of the processed candidate area and the real tampered area is larger than that of the candidate area and the real tampered area before processing.
In another possible implementation, the network structure of the first attention model is different from the network structure of the second attention model.
In another possible implementation manner, the determining a target region in the image based on the second image feature of the second candidate region and the second candidate region includes:
determining a third candidate region in the image based on the second image feature and the second candidate region;
determining a target region in the image based on a third image feature of the third candidate region and the third candidate region.
In one aspect, an image processing apparatus is provided, the apparatus including:
an attention feature determination module for determining a first attention feature based on a first image feature of a first candidate region in an image, the first attention feature being indicative of how important different sub-features in the first image feature are for determining a tampered region;
a candidate region determination module to determine a second candidate region in the image based on the first attention feature and the first candidate region;
a target region determination module to determine a target region in the image based on a second image feature of the second candidate region and the second candidate region, the target region to indicate a tampered region.
In one possible implementation, the attention feature determination module is configured to process the first image feature through a first attention model in a region determination network to obtain the first attention feature, where the region determination network is obtained by training based on a sample image marked with a tampered region.
In another possible implementation manner, the attention feature determination module is configured to:
determining an attention weight for a sub-feature of each channel in the first image feature based on the first image feature;
determining a channel attention feature based on the attention weight of the sub-feature of each channel and the sub-feature of each channel;
determining an attention weight for a sub-feature of each location in the channel attention feature based on the channel attention feature;
determining the first attention feature based on the attention weight of the sub-feature of each location and the sub-feature of each location.
In another possible implementation manner, the candidate region determining module is configured to:
processing the first attention feature and the first candidate region through a first position regression model in the region determination network to obtain a second candidate region;
the area determination network is obtained by training based on a sample image marked with a tampered area, the first position regression model is used for processing any feature and any candidate area to obtain a processed candidate area, and the coincidence degree of the processed candidate area and the real tampered area is larger than that of the candidate area and the real tampered area before processing.
In another possible implementation manner, the apparatus further includes:
a tampering classification information determining module, configured to process the first attention feature through a classification model in the area determination network to obtain tampering classification information, where the tampering classification information is used to indicate whether the first candidate area is a tampering area;
and the candidate area determining module is configured to, if the tampering classification information is used to indicate that the first candidate area is a tampered area, process the first attention feature and the first candidate area through the first position regression model to obtain the second candidate area.
In another possible implementation manner, the target area determination module is configured to:
processing the second image feature through a second attention model in the area determination network to obtain a second attention feature;
processing the second attention feature and the second candidate region through a second position regression model in the region determination network to obtain the target region;
the area determination network is obtained by training based on a sample image marked with a tampered area, the second position regression model is used for processing any feature and candidate area to obtain a processed candidate area, and the overlap ratio of the processed candidate area and the real tampered area is larger than that of the candidate area and the real tampered area before processing.
In another possible implementation, the network structure of the first attention model is different from the network structure of the second attention model.
In another possible implementation manner, the target area determining module is configured to:
determining a third candidate region in the image based on the second image feature and the second candidate region;
determining a target region in the image based on a third image feature of the third candidate region and the third candidate region.
In another aspect, a computer device is provided, which includes a processor and a memory, where at least one program code is stored in the memory, and the at least one program code is loaded and executed by the processor to implement the image processing method according to any one of the above possible implementation manners.
In another aspect, a computer-readable storage medium is provided, in which at least one program code is stored, and the at least one program code is loaded and executed by a processor to implement the image processing method according to any one of the above possible implementation manners.
In another aspect, a computer program product or a computer program is provided, the computer program product or the computer program comprising computer program code, the computer program code being stored in a computer-readable storage medium, the computer program code being read by a processor of a computer device from the computer-readable storage medium, the computer program code being executed by the processor to cause the computer device to perform the image processing method according to any one of the above-mentioned possible implementation manners.
According to the technical scheme provided by the embodiments of the application, the attention feature corresponding to the initial candidate region is determined so that the features needing particular attention when determining the tampered region become more prominent. The initial candidate region is preliminarily adjusted based on the attention feature, yielding a candidate region that represents the tampered region more accurately, and the preliminarily adjusted candidate region is then adjusted further based on its own image features. The finally determined tampered region is thus obtained through step-by-step progressive region adjustment, which improves the accuracy of locating the tampered region.
Drawings
In order to more clearly illustrate the technical solutions in the embodiments of the present application, the drawings needed to be used in the description of the embodiments are briefly introduced below, and it is obvious that the drawings in the following description are only some embodiments of the present application, and it is obvious for those skilled in the art to obtain other drawings based on these drawings without creative efforts.
FIG. 1 is a schematic illustration of an implementation environment provided by an embodiment of the present application;
fig. 2 is a flowchart of an image processing method provided in an embodiment of the present application;
fig. 3 is a flowchart of an image processing method provided in an embodiment of the present application;
FIG. 4 is a schematic diagram of a feature extractor provided in an embodiment of the present application;
FIG. 5 is a schematic diagram of an attention feature extraction provided by an embodiment of the present application;
fig. 6 is a schematic diagram of an image processing method provided in an embodiment of the present application;
FIG. 7 is a diagram illustrating an image processing result provided by an embodiment of the present application;
fig. 8 is a block diagram of an image processing apparatus according to an embodiment of the present application;
fig. 9 is a block diagram of a terminal according to an embodiment of the present application;
fig. 10 is a block diagram of a server according to an embodiment of the present application.
Detailed Description
To make the objects, technical solutions and advantages of the present application more clear, embodiments of the present application will be described in further detail below with reference to the accompanying drawings.
The terms "first," "second," "third," and "fourth," etc. in the description and claims of this application and the accompanying drawings are used for distinguishing between different objects and not for describing a particular order. Furthermore, the terms "comprising" and "having," as well as any variations thereof, are intended to cover non-exclusive inclusions. For example, a process, method, system, article, or apparatus that comprises a list of steps or elements is not limited to only those steps or elements listed, but may alternatively include other steps or elements not listed, or inherent to such process, method, article, or apparatus.
Fig. 1 is a schematic diagram of an implementation environment provided by an embodiment of the present application. Referring to fig. 1, the implementation environment includes a terminal 101 and a server 102.
Optionally, the terminal 101 is a smartphone, a tablet, a laptop, or a desktop computer, but is not limited thereto. The terminal 101 has an image editing function, the terminal 101 is connected to the server 102 via a wireless or wired network, and the terminal 101 transmits an image to the server 102 in response to an upload operation of the image.
Optionally, the server 102 is a single server; alternatively, the server 102 is a server cluster or a distributed system composed of a plurality of servers; alternatively, the server 102 is a cloud computing service center. The server 102 has the function of locating a tampered area in an image: it receives the image transmitted by the terminal 101 and locates the tampered area in the image.
Fig. 2 is a flowchart of an image processing method according to an embodiment of the present application. The image processing method will be briefly described with reference to fig. 2, and with reference to fig. 2, the image processing method includes the following steps:
201. the computer device determines a first attention feature based on a first image feature of a first candidate region in the image, the first attention feature being indicative of how important different sub-features in the first image feature are for determining the tampered region.
The first candidate region is a preliminarily determined candidate region that may be a tampered region. Image tampering refers to modifying the content of an image captured by an image acquisition device; a tampered image contains tampering traces, and the tampered region is the region where those traces are located.
In the channel dimension, the first image feature comprises sub-features of a plurality of channels, one sub-feature being the sub-feature of one channel of the first image feature. In the spatial dimension, the first image feature comprises sub-features of a plurality of locations, one sub-feature being the sub-feature of one location in the first image feature, where a location is a feature point in the first image feature. For example, if the first image feature is a feature map of width W, height H, and C channels, then in the channel dimension it comprises the sub-features of C channels, and in the spatial dimension it comprises the sub-features of W × H locations.
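As a concrete illustration, the two kinds of sub-feature can be read directly off a feature tensor; the shapes below are hypothetical examples, not values from the patent:

```python
import torch

# A hypothetical first image feature with C = 256 channels on a 7 x 7 grid.
feat = torch.randn(256, 7, 7)  # (C, H, W)

channel_sub_feature = feat[0]        # sub-feature of one channel, shape (H, W) = (7, 7)
spatial_sub_feature = feat[:, 3, 2]  # sub-feature of one location, shape (C,) = (256,)
```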
202. The computer device determines a second candidate region in the image based on the first attention feature and the first candidate region.
And the computer equipment adjusts the first candidate region based on the first attention feature and in combination with the importance degree of different sub-features for determining the tampered region to obtain a second candidate region.
203. The computer device determines a target region in the image based on a second image feature of the second candidate region and the second candidate region, the target region indicating a tampered region.
After preliminarily adjusting the first candidate region to obtain the second candidate region, the computer device continues to adjust the second candidate region based on the second image feature of the second candidate region to obtain the target region, and takes the target region as the tampered region.
According to the technical scheme provided by the embodiments of the application, the attention feature corresponding to the initial candidate region is determined so that the features needing particular attention when determining the tampered region become more prominent. The initial candidate region is preliminarily adjusted based on the attention feature, yielding a candidate region that represents the tampered region more accurately, and the preliminarily adjusted candidate region is then adjusted further based on its own image features. The finally determined tampered region is thus obtained through step-by-step progressive region adjustment, which improves the accuracy of locating the tampered region.
Fig. 3 is a flowchart of an image processing method according to an embodiment of the present application. The image processing method is described in detail below with reference to fig. 3, and referring to fig. 3, the image processing method includes the following steps:
301. The computer device performs feature extraction on the image to obtain the global features of the image.
In some embodiments, the computer device inputs the image into a feature extractor for feature extraction, resulting in global features of the image. Optionally, the network structure of the feature extractor is a convolutional neural network truncated before the pooling layer, and the computer device takes the features output by the last convolutional layer of the feature extractor as global features of the image. The convolutional neural network is a pre-trained convolutional neural network, optionally, the convolutional neural network is obtained by training based on a classification task, for example, the convolutional neural network is obtained by training based on a sample image labeled with a tampered label and a sample image labeled with an untampered label, and the trained convolutional neural network is used for determining whether an input image is a tampered image.
Optionally, the network structure of the feature extractor is a ResNet (Residual Network) truncated before the average pooling layer. The network structure of the feature extractor is shown in fig. 4 and includes a convolution layer, a batch normalization layer, a Rectified Linear Unit (ReLU), a max pooling layer, a first residual network, a second residual network, a third residual network, and a fourth residual network. The four residual networks all use residual blocks as their basic structure, but their network structures differ from one another. Optionally, the ResNet is ResNet101 with a depth of 101, ResNet50 with a depth of 50, ResNet152 with a depth of 152, or the like; the network structure of the ResNet is not limited in the embodiments of the present application.
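A minimal sketch of such a truncated backbone, assuming PyTorch and a torchvision ResNet-101; the name feature_extractor and the input size are illustrative, not from the patent:

```python
import torch
import torchvision

# ResNet-101 truncated before the average pooling layer: the output of the
# last convolutional stage serves as the global feature map of the image.
backbone = torchvision.models.resnet101(weights=None)
feature_extractor = torch.nn.Sequential(
    backbone.conv1, backbone.bn1, backbone.relu, backbone.maxpool,
    backbone.layer1, backbone.layer2, backbone.layer3, backbone.layer4,
)

image = torch.randn(1, 3, 800, 800)        # a dummy normalized RGB image
global_feature = feature_extractor(image)  # shape (1, 2048, 25, 25)
```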
In some embodiments, before extracting features from the image, the computer device normalizes the image and extracts features from the normalized image. Optionally, the computer device normalizes the image by subtracting the mean and dividing by the variance: the computer device determines the mean of the pixel values in the image; determines the variance of the pixel values based on the mean; and, for each pixel value in the image, determines the difference between that pixel value and the mean and takes the ratio of the difference to the variance as the normalized pixel value.
Optionally, before normalizing the image by subtracting the mean and dividing by the variance, the computer device extracts the image of each color channel and normalizes each extracted channel image separately. Optionally, the computer device converts the image into the RGB (Red Green Blue) color mode; extracts the image of the R (Red) channel, the image of the G (Green) channel, and the image of the B (Blue) channel; and normalizes the R-channel, G-channel, and B-channel images respectively.
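A minimal sketch of this per-channel normalization, assuming a PyTorch tensor image; the function name and the epsilon guard are assumptions (the text divides by the variance, so the sketch does too, although dividing by the standard deviation is the more common convention):

```python
import torch

def normalize_per_channel(image: torch.Tensor) -> torch.Tensor:
    # image: (3, H, W) RGB tensor; each color channel is normalized on its own.
    mean = image.mean(dim=(1, 2), keepdim=True)  # per-channel mean
    var = image.var(dim=(1, 2), keepdim=True)    # per-channel variance
    return (image - mean) / (var + 1e-6)         # epsilon avoids division by zero
```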
302. The computer device determines a plurality of first candidate regions based on global features of the image.
Optionally, the computer device determines a plurality of coordinate information based on the global feature of the image, one first candidate region being indicated by one coordinate information. Optionally, one piece of coordinate information includes an abscissa value of a center point of the first candidate region, an ordinate value of the center point, a width of the first candidate region, and a height of the first candidate region. Optionally, one coordinate information includes abscissa and ordinate values of four vertices of the first candidate region.
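The two coordinate encodings just described are interchangeable; a small sketch of converting the center-point form to a common two-corner form of the vertex encoding (the function name is illustrative):

```python
import torch

def center_to_corners(box: torch.Tensor) -> torch.Tensor:
    # (cx, cy, w, h) -> (x1, y1, x2, y2): from the center-point encoding
    # of a candidate region to its corner coordinates.
    cx, cy, w, h = box.unbind(-1)
    return torch.stack((cx - w / 2, cy - h / 2, cx + w / 2, cy + h / 2), dim=-1)
```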
In some embodiments, the computer device processes the global feature of the image through an RPN (Region Proposal Network) to obtain a plurality of first candidate regions. The confidence that a first candidate region overlaps the real tampered region is higher than a reference confidence, but the overlap ratio between the first candidate region and the real tampered region is lower than a reference overlap ratio; that is, a first candidate region is likely to contain a small-area tampered region together with a large-area non-tampered region. The reference confidence is any value greater than or equal to 0.5 and less than or equal to 1, for example 50%, 60%, or 70%. The reference overlap ratio is any value greater than 0 and less than or equal to 1, for example 0.75, 0.8, or 0.85. The RPN is obtained by training based on sample images marked with tampered regions.
After the computer device determines the plurality of first candidate regions, each first candidate region is adjusted respectively, so that the overlap ratio between the adjusted first candidate region and the tampered region is higher, the process of adjusting each first candidate region by the computer device is the same, the following steps are described by taking the adjustment process of one first candidate region as an example, and the adjustment process of the plurality of first candidate regions is not repeated.
303. The computer device obtains a first image feature of the first candidate region from the global features of the image.
The computer device samples from the global features of the image based on the coordinate information of the first candidate region to obtain the first image feature of the first candidate region, that is, the image feature of the region indicated by the coordinate information. Optionally, after acquiring the image feature of the first candidate region, the computer device performs Region of Interest Pooling (ROI Pooling) on it to obtain the first image feature, which reduces the data amount of the image feature of the first candidate region and speeds up subsequent processing. Optionally, the region-of-interest pooling is maximum pooling or average pooling, which is not limited in the embodiments of the present application.
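A sketch of this step using torchvision's ROI pooling operator (the max pooling variant); the spatial_scale of 1/32 assumes the ResNet-style feature extractor sketched above and is not stated in the patent:

```python
import torch
from torchvision.ops import roi_pool

global_feature = torch.randn(1, 2048, 25, 25)          # from the feature extractor
boxes = [torch.tensor([[120.0, 80.0, 360.0, 290.0]])]  # one candidate region, (x1, y1, x2, y2)

# Max-pool the region's features to a fixed 7 x 7 grid; spatial_scale maps
# image coordinates onto the feature map.
first_image_feature = roi_pool(global_feature, boxes, output_size=(7, 7),
                               spatial_scale=1.0 / 32)
print(first_image_feature.shape)  # torch.Size([1, 2048, 7, 7])
```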
304. The computer device determines a first attention feature based on a first image feature of a first candidate region in the image.
In some embodiments, the computer device determines an attention weight for each sub-feature in the first image feature based on the first image feature; a first attention feature is determined based on the attention weight of each sub-feature and each sub-feature. The attention weight of one sub-feature is used for representing the importance degree of the sub-feature for determining the tampered region, and the larger the attention weight of one sub-feature is, the larger the influence degree of the sub-feature on determining the tampered region is, and the higher the attention degree of the sub-feature in the process of determining the tampered region is. For each sub-feature, the computer device multiplies the sub-feature by the attention weight of the sub-feature to obtain an attention sub-feature corresponding to the sub-feature, so that the attention sub-features corresponding to the plurality of sub-features constitute the first attention feature.
Optionally, the computer device processes the first image feature through the first attention model to obtain the first attention feature. The first attention model is used for determining, based on any feature, the attention feature corresponding to that feature. Optionally, the first attention model is an attention model based on a spatial-domain attention mechanism, for example an STN (Spatial Transformer Networks) model. Optionally, the first attention model is an attention model based on a channel-domain attention mechanism, for example a SENet (Squeeze-and-Excitation Networks) model. Optionally, the first attention model is an attention model based on a mixed-domain attention mechanism, for example a Residual Attention Network.
In some embodiments, the computer device determines attention weights from dimensions of the space, indicating that it makes sense to focus on the "where" feature in determining the tampered region; the attention weight is determined from the dimensions of the channel, indicating that it makes sense to focus on "what" features in determining the tampered area. Optionally, the computer device processes the first image feature through a first Attention model, which is CBAM (Convolutional Block Attention Module), to obtain the first Attention feature. Accordingly, the step of the computer device determining the first attention feature based on the first image feature of the first candidate region in the image includes the following steps 3041 to 3044:
3041. the computer device determines an attention weight for a sub-feature of each channel in the first image feature based on the first image feature.
Referring to fig. 5, the computer device inputs the first image feature to a channel attention module of the CBAM, and processes the first image feature through the channel attention module to obtain an attention weight of the sub-feature of each channel.
3042. The computer device determines a channel attention feature based on the attention weight of the sub-feature of each channel and the sub-feature of each channel.
For each channel sub-feature, the computer device multiplies the channel sub-feature by the attention weight of the channel sub-feature to obtain a channel attention sub-feature for the channel, such that the channel attention sub-features of the plurality of channels constitute the channel attention feature.
3043. The computer device determines an attention weight for the sub-feature for each location in the channel attention feature based on the channel attention feature.
Optionally, with continued reference to fig. 5, the computer device processes the channel attention feature through a spatial attention module in the CBAM to obtain an attention weight for the sub-feature at each location in the channel attention feature.
3044. The computer device determines a first attention feature based on the attention weight of the sub-features for each location and the sub-features for each location.
For each sub-feature of a location, the computer device multiplies the sub-feature of the location by the attention weight of the sub-feature of the location to obtain a spatial attention sub-feature of the location, such that the spatial attention sub-features of the plurality of locations constitute the spatial attention feature. In some embodiments, the spatial attention feature is a first attention feature. In some embodiments, with continued reference to fig. 5, the computer device also pools the spatial attention feature to obtain a first attention feature.
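A compact sketch of a CBAM-style block implementing steps 3041 to 3044 (channel attention followed by spatial attention); the reduction ratio, kernel size, and pooling choices follow the published CBAM design rather than anything specified in the patent:

```python
import torch
import torch.nn as nn

class CBAM(nn.Module):
    def __init__(self, channels: int, reduction: int = 16):
        super().__init__()
        # Shared MLP that produces one attention weight per channel.
        self.mlp = nn.Sequential(
            nn.Linear(channels, channels // reduction), nn.ReLU(),
            nn.Linear(channels // reduction, channels),
        )
        # 7x7 convolution that produces one attention weight per location.
        self.spatial_conv = nn.Conv2d(2, 1, kernel_size=7, padding=3)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        b, c, _, _ = x.shape
        # Steps 3041-3042: attention weight per channel sub-feature, then multiply.
        avg = self.mlp(x.mean(dim=(2, 3)))
        mx = self.mlp(x.amax(dim=(2, 3)))
        channel_weight = torch.sigmoid(avg + mx).view(b, c, 1, 1)
        x = x * channel_weight  # channel attention feature
        # Steps 3043-3044: attention weight per location sub-feature, then multiply.
        stacked = torch.cat([x.mean(dim=1, keepdim=True),
                             x.amax(dim=1, keepdim=True)], dim=1)
        spatial_weight = torch.sigmoid(self.spatial_conv(stacked))
        return x * spatial_weight  # first attention feature (optionally pooled afterwards)
```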
The above embodiments are described by taking an example that the computer device determines the channel attention feature first, then determines the spatial attention feature based on the channel attention feature, and determines the first attention feature based on the spatial attention feature, in some embodiments, the computer device may also determine the channel attention feature and the spatial attention feature in other orders. For example, the computer device first determines an attention weight for a sub-feature for each location in the first image feature based on the first image feature; determining a spatial attention feature based on the attention weight of the sub-feature of each location and the sub-feature of each location; determining an attention weight for a sub-feature of each channel in the spatial attention feature based on the spatial attention feature; the first attention feature is determined based on the attention weight of the sub-feature of each channel and the sub-feature of each channel. As another example, the computer device determines an attention weight for the sub-feature for each location in the first image feature based on the first image feature and determines an attention weight for the sub-feature for each channel in the first image feature based on the first image feature; and multiplying the attention weight of the sub-feature of each position by the sub-feature of each position, and multiplying the attention weight of the sub-feature of each channel by the sub-feature of each channel to obtain a first attention feature.
According to the above technical solution, by introducing an attention mechanism, the tampering-related features that need particular attention when determining the tampered region are weighted automatically, so that tampering traces occupying only a small proportion of the tampered image can still be attended to. This reduces the loss of important information and the omission of tampered regions, and improves the accuracy of tampered-region detection.
305. The computer device determines a second candidate region in the image based on the first attention feature and the first candidate region.
In some embodiments, the computer device adjusts the first candidate region based on the first attention feature to obtain the second candidate region. Optionally, the first candidate region is represented by coordinate information, and the computer device adjusts the coordinate information of the first candidate region based on the first attention feature to obtain the coordinate information of the second candidate region. Optionally, the computer device processes the first attention feature and the first candidate region through a first position regression model to obtain a second candidate region, the first position regression model is used for processing any feature and any candidate region to obtain a processed candidate region, so that the coincidence degree of the processed candidate region and the real tampered region is greater than the coincidence degree of the candidate region before processing and the real tampered region, and the first position regression model is trained based on a sample image labeled with the tampered region.
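A hedged sketch of such a position regression model: a small head that predicts box offsets in the standard (dx, dy, dw, dh) parameterization and applies them to the candidate box. The layer sizes and the delta parameterization are assumptions borrowed from common detection practice, not details given in the patent:

```python
import torch
import torch.nn as nn

class BoxRegressionHead(nn.Module):
    def __init__(self, in_dim: int):
        super().__init__()
        self.fc = nn.Sequential(nn.Linear(in_dim, 1024), nn.ReLU(),
                                nn.Linear(1024, 4))

    def forward(self, feature: torch.Tensor, box: torch.Tensor) -> torch.Tensor:
        # feature: pooled attention feature, flattened; box: (cx, cy, w, h).
        dx, dy, dw, dh = self.fc(feature.flatten(1)).unbind(-1)
        cx, cy, w, h = box.unbind(-1)
        # Apply the predicted deltas to move the box toward the tampered region.
        return torch.stack((cx + dx * w, cy + dy * h,
                            w * dw.exp(), h * dh.exp()), dim=-1)
```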
In some embodiments, the computer device determines a second candidate region in the image based on the first attention feature and the first candidate region only if the first candidate region is determined to be a tampered region. Accordingly, the step of the computer device determining a second candidate region in the image based on the first attention feature and the first candidate region comprises: the computer device determines tampering classification information of the first candidate region based on the first attention feature, the tampering classification information indicating whether the first candidate region is a tampered region; if the tampering classification information indicates that the first candidate region is a tampered region, the computer device determines the second candidate region based on the first attention feature and the first candidate region. If the tampering classification information indicates that the first candidate region is a non-tampered region, the computer device deletes the information related to the first candidate region and performs no subsequent processing on it.
Optionally, the computer device processes the first attention feature through the first classification model to obtain tampering classification information of the first candidate region. The first classification model is used for classifying any candidate region based on the characteristics of the candidate region, that is, determining whether the candidate region is a tampered region. The first classification model is obtained by training based on the sample image marked with the tampered label and the sample image marked with the untampered label.
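A sketch of this classify-then-refine gating; cls_model, reg_model, and the 0.5 threshold are assumed names and values for illustration:

```python
import torch

def refine_if_tampered(cls_model, reg_model, attention_feat, boxes, thresh=0.5):
    # Score each candidate; candidates classified as non-tampered are dropped,
    # and the rest are refined by the position regression model.
    tamper_prob = torch.sigmoid(cls_model(attention_feat.flatten(1))).squeeze(-1)
    keep = tamper_prob > thresh  # tampering classification information
    return reg_model(attention_feat[keep], boxes[keep]), keep
```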
306. The computer device determines a target region in the image based on a second image feature of the second candidate region and the second candidate region, the target region indicating a tampered region.
In some embodiments, the computer device adjusts the second candidate region based on the second image feature to obtain the target region. Optionally, the second candidate region is represented by coordinate information, and the computer device adjusts the coordinate information of the second candidate region based on the second image feature to obtain the coordinate information of the target region. Optionally, the computer device processes the second image feature and the second candidate region through a second position regression model to obtain the target region. The second position regression model is used for processing any feature and any candidate region to obtain a processed candidate region whose overlap with the real tampered region is greater than that of the candidate region before processing, and the second position regression model is obtained by training based on sample images marked with tampered regions. The second position regression model and the first position regression model are two different position regression models. Optionally, the network structure of the second position regression model is different from that of the first position regression model; alternatively, the network structure of the second position regression model is the same as that of the first position regression model, which is not limited in the embodiments of the present application.
According to the technical scheme provided by the embodiments of the application, the attention feature corresponding to the initial candidate region is determined so that the features needing particular attention when determining the tampered region become more prominent. The initial candidate region is preliminarily adjusted based on the attention feature, yielding a candidate region that represents the tampered region more accurately, and the preliminarily adjusted candidate region is then adjusted further based on its own image features. The finally determined tampered region is thus obtained through step-by-step progressive region adjustment, which improves the accuracy of locating the tampered region.
In some embodiments, the computer device determines a second attention feature based on the second image feature, wherein the second attention feature is used to represent the importance of different sub-features in the second image feature to determining the tampered region; the computer device determines a target region in the image based on the second attention feature and the second candidate region. Optionally, the computer device processes the second image feature through the second attention model to obtain a second attention feature. Wherein the second attention model is used for determining the attention feature corresponding to any feature based on the feature. Optionally, the network structure of the second attention model is the same as the network structure of the first attention model. Optionally, the network structure of the second attention model is different from the network structure of the first attention model, and the computer device determines the attention characteristics in sequence through the two attention models with different network structures, so that the tampering related characteristics that the two attention models focus on together can be focused on more, and the accuracy of positioning the tampering region can be further improved.
In some embodiments, the computer device marks the target area on the image, and outputs the image after marking the target area. In some embodiments, the computer device further determines, based on the second image feature of the second candidate region, a probability that the second candidate region is a tampered region, i.e., a confidence that the second candidate region is a tampered region; and outputting the probability, wherein the probability represents the probability that the target area belongs to the tampered area. Optionally, the computer device processes the second image feature through the second classification model to obtain a probability that the second candidate region is a tampered region. And the second classification model is used for determining the probability that any candidate region is a tampered region based on the characteristics of the candidate region. The second classification model is obtained by training based on the sample image marked with the tampered label and the sample image marked with the untampered label. The second classification model and the first classification model are two different classification models. Optionally, the network structure of the second classification model is different from the network structure of the first classification model; alternatively, the network structure of the second classification model is the same as the network structure of the first classification model, and this is not limited in this embodiment of the application.
In some embodiments, after adjusting the second candidate region, the computer device further adjusts the adjusted second candidate region to obtain the target region, and accordingly, the step of determining the target region in the image based on the second image feature of the second candidate region and the second candidate region by the computer device includes the following steps 3061 to 3062:
3061. the computer device determines a third candidate region in the image based on the second image feature and the second candidate region.
And the computer equipment adjusts the second candidate region based on the second image characteristic to obtain a third candidate region. Optionally, the computer device processes the second image feature and the second candidate region through a second position regression model to obtain a third candidate region.
In some embodiments, in the case that the computer device determines that the second candidate region is a tampered region, determining a third candidate region in the image based on the second image feature and the second candidate region, and accordingly, the step of determining the third candidate region in the image based on the second image feature and the second candidate region by the computer device includes: the computer device determines tampering classification information of the second candidate region based on the second image feature; if the tampering classification information of the second candidate region is used to indicate that the second candidate region is a tampered region, the computer device determines a third candidate region based on the second attention feature and the second candidate region. And if the tampering classification information of the second candidate area is used for indicating that the second candidate area is a non-tampering area, the computer equipment deletes the relevant information of the second candidate area and does not perform subsequent processing on the second candidate area.
Optionally, the computer device processes the second image feature through the third classification model to obtain the falsification classification information of the second candidate region. The third classification model is used for classifying any candidate region based on the characteristics of the candidate region, that is, determining whether the candidate region is a tampered region. The third classification model is obtained by training based on the sample image marked with the tampered label and the sample image marked with the untampered label. The third classification model and the first classification model are two different classification models. Optionally, the network structure of the third classification model is different from the network structure of the first classification model; alternatively, the network structure of the third classification model is the same as that of the first classification model, which is not limited in this embodiment of the present application.
In some embodiments, the computer device determines a second attention feature based on the second image feature, wherein the second attention feature is used to represent the importance of different sub-features in the second image feature to determining the tampered region; the computer device determines a third candidate region based on the second attention feature and the second candidate region. Optionally, the computer device processes the second image feature through the second attention model to obtain a second attention feature. The process of determining the third candidate region by the computer device based on the second attention feature and the second candidate region is the same as the process of determining the third candidate region by the computer device based on the second image feature and the second candidate region.
3062. The computer device determines a target region in the image based on a third image feature of the third candidate region and the third candidate region.
In some embodiments, the computer device adjusts the third candidate region based on a third image feature of the third candidate region to obtain the target region. Optionally, the computer device processes the third image feature and the third candidate region through a third position regression model to obtain the target region. The third position regression model is used for processing any one feature and the candidate region to obtain a processed candidate region, the coincidence degree of the processed candidate region and the real tampered region is larger than the coincidence degree of the candidate region and the real tampered region before processing, and the third position regression model is obtained based on sample image training marked with the tampered region. Wherein, the third position regression model and the second position regression model are two different position regression models. Optionally, the network structure of the third location regression model is the same as the network structure of the second location regression model; alternatively, the network structure of the third location regression model is different from the network structure of the second location regression model, and this is not limited in this embodiment of the present application.
In some embodiments, the computer device determines a third attention feature based on a third image feature, wherein the third attention feature is used to represent the importance of different sub-features in the third image feature to determining the tampered region; the computer device determines a target region based on the third attention feature and the third candidate region. Optionally, the computer device processes the third image feature through a third attention model to obtain a third attention feature. Wherein the third attention model is used for determining the attention feature corresponding to any feature based on the feature. Optionally, the network structure of the third attention model is the same as the network structure of the second attention model. Optionally, the network structure of the third attention model is different from the network structure of the second attention model, and the computer device determines the attention characteristics in sequence through the attention models with different network structures, so that the tampering related characteristics that are concerned by different attention models together can be concerned more, and the accuracy of positioning the tampering region can be further improved. The process of determining the target region by the computer device based on the third attention feature and the third candidate region is the same as the process of determining the target region by the computer device based on the third image feature and the third candidate region.
Optionally, after the computer device adjusts the third candidate region, the adjusted third candidate region is further adjusted one or more times, so that the target region obtained after final adjustment more accurately represents the tampered region, and accuracy of positioning the tampered region is improved.
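The repeated adjustment amounts to a cascade of refinement stages, each re-sampling features for the current box and regressing a better box; a hedged sketch of the loop (all names are assumed, and the per-stage classification gate from the text is omitted for brevity):

```python
def progressive_refine(box, global_feature, stages, pool_features):
    # stages: a sequence of (attention_model, position_regression_model) pairs,
    # e.g. three stages; pool_features samples the box's features from the
    # global feature map (ROI pooling as sketched earlier).
    for attention_model, regressor in stages:
        feature = attention_model(pool_features(global_feature, box))
        box = regressor(feature, box)  # candidate region -> better candidate region
    return box                         # the final target region
```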
Because the initially determined first candidate region contains a small-area tampered region together with a large-area non-tampered region, a single adjustment of the first candidate region can hardly yield an accurate tampered region. Adjusting the candidate region progressively over multiple steps therefore allows the finally determined target region to represent the real tampered region more accurately.
In some embodiments, the computer device marks the target area on the image, and outputs the image after marking the target area. In some embodiments, the computer device further determines, based on a third image feature of the third candidate region, a probability that the third candidate region is a tampered region, that is, a confidence that the third candidate region is a tampered region; and outputting the probability, wherein the probability represents the probability that the target area belongs to the tampered area. Optionally, the computer device marks the target region in the image in the form of a rectangular box, and marks the probability that the target region is a tampered region. Optionally, the computer device determines that the target region is a tampered region when the probability is greater than a threshold, determines the image as a tampered image, marks a target region in the image, and marks the probability that the target region is the tampered region. The threshold value can be flexibly configured, for example, the threshold value is 0.7, 0.8, or 0.9, etc.
Optionally, the computer device processes the third image feature through a fourth classification model to obtain a probability that the third candidate region is a tampered region. The fourth classification model is used for determining the probability that any candidate region is a tampered region based on the characteristics of the candidate region. The fourth classification model is obtained by training based on the sample image marked with the tampered label and the sample image marked with the untampered label. The fourth classification model and the third classification model are two different classification models. Optionally, the network structure of the fourth classification model is different from the network structure of the third classification model; or, the network structure of the fourth classification model is the same as the network structure of the third classification model, which is not limited in this embodiment of the application.
Optionally, the computer device performs the above-mentioned steps 303 to 306 through the area determination network to determine the target area in the image. Optionally, the area determination network is built on Faster R-CNN (Faster Region-CNN), an object detection framework based on CNNs (Convolutional Neural Networks). In some embodiments, the area determination network comprises a first attention model, a first position regression model, and a second position regression model. In some embodiments, the area determination network further comprises a first classification model. In some embodiments, the area determination network further comprises a second classification model. In some embodiments, the area determination network further comprises a second attention model. In some embodiments, the area determination network further comprises a third position regression model. In some embodiments, the area determination network further comprises a third attention model. In some embodiments, the area determination network further comprises a third classification model. The area determination network is obtained by training based on sample images marked with tampered areas. In the training process, the computer device inputs the sample image into the area determination network and acquires the predicted target area output by the network; it then performs back-propagation based on the difference between the predicted target area and the tampered area marked in the sample image, updating the parameters of the area determination network so as to reduce the difference between the target area predicted with the updated parameters and the marked tampered area.
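A minimal sketch of one such training step, assuming region_net returns a predicted box and using a smooth-L1 box loss; a real setup would combine classification and regression losses across all stages:

```python
import torch

def train_step(region_net, optimizer, sample_image, gt_box):
    predicted_box = region_net(sample_image)  # predicted target region
    loss = torch.nn.functional.smooth_l1_loss(predicted_box, gt_box)
    optimizer.zero_grad()
    loss.backward()   # back-propagate the region difference
    optimizer.step()  # update the area determination network's parameters
    return loss.item()
```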
In some embodiments, referring to fig. 6, the region determination network includes a first attention model, a first classification model, a first position regression model, a second attention model, a second classification model, a second position regression model, a third attention model, a third classification model, and a third position regression model. The computer device performs feature extraction on the image through a feature extractor to obtain the global feature of the image; determines a first candidate region through a region proposal network; samples the global feature based on the first candidate region to obtain a first image feature; processes the first image feature through the first attention model to obtain a first attention feature; processes the first attention feature through the first classification model to determine whether the first candidate region is a tampered region; if the first candidate region is a tampered region, processes the first attention feature and the first candidate region through the first position regression model to obtain a second candidate region; samples the global feature based on the second candidate region to obtain a second image feature; processes the second image feature through the second attention model to obtain a second attention feature; processes the second attention feature through the second classification model to determine whether the second candidate region is a tampered region; if the second candidate region is a tampered region, processes the second attention feature and the second candidate region through the second position regression model to obtain a third candidate region; samples the global feature based on the third candidate region to obtain a third image feature; processes the third image feature through the third attention model to obtain a third attention feature; processes the third attention feature through the third classification model to determine the probability that the third candidate region is a tampered region; processes the third attention feature and the third candidate region through the third position regression model to obtain the target region; and outputs the probability and the image with the target region marked. A sketch of this cascade is given below.
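The following is a minimal PyTorch-style sketch of the three-stage cascade shown in fig. 6. All module and function names (CascadeTamperDetector, roi_sample, and so on) are illustrative assumptions; in particular, RoI Align stands in for the "sampling" over the global feature, which the patent describes only as a sampling step, and batch size 1 is assumed.

```python
# Hedged sketch of the fig. 6 cascade. Module names, roi_sample, and the
# use of RoI Align for the "sampling" step are assumptions, not the
# patent's prescribed implementation. Batch size 1 is assumed.
import torch.nn as nn
from torchvision.ops import roi_align

def roi_sample(global_feat, boxes, size=7):
    """Sample region features from the global feature map (assumes RoI
    Align realizes the 'sampling based on the candidate region' step)."""
    return roi_align(global_feat, [boxes], output_size=(size, size))

class CascadeTamperDetector(nn.Module):
    def __init__(self, backbone, rpn, stages):
        super().__init__()
        self.backbone = backbone  # feature extractor -> global feature
        self.rpn = rpn            # region proposal network -> first candidates
        # stages: three (attention model, classifier, position regressor) triples
        self.stages = nn.ModuleList(nn.ModuleList(s) for s in stages)

    def forward(self, image):
        global_feat = self.backbone(image)
        boxes = self.rpn(global_feat)              # first candidate regions
        prob = None
        for attention, classifier, regressor in self.stages:
            feat = roi_sample(global_feat, boxes)  # i-th image feature
            att = attention(feat)                  # i-th attention feature
            prob = classifier(att)                 # tampered-region confidence
            # (the per-stage "proceed only if tampered" gate of fig. 6
            # is omitted here for brevity)
            boxes = regressor(att, boxes)          # refined candidate regions
        return boxes, prob                         # target region and probability
```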
Compared with related schemes, the technical scheme provided by the embodiments of the present application performs better on existing datasets. Referring to table 1, the tampering detection method based on CFA (Color Filter Array), hereinafter referred to as CFA, has an AUC (Area Under the ROC (Receiver Operating Characteristic) Curve) value of 0.522 on CASIA (an image tampering dataset) and an AUC value of 0.485 on COVERAGE (another image tampering dataset). Here, the AUC value reported for a dataset is the average of the AUC values of its images, and the AUC value of one image is computed over the pixel-level predictions for the pixels in that image. The tampering detection method based on ELA (Error Level Analysis), hereinafter referred to as ELA, has an AUC value of 0.613 on CASIA and 0.583 on COVERAGE. J-LSTM, a tampering detection method based on LSTM (Long Short-Term Memory recurrent neural networks), has an AUC value of 0.614 on COVERAGE. NOI1, a tampering detection method based on noise inconsistency, has an AUC value of 0.612 on CASIA and 0.587 on COVERAGE. RGB-N, a tampering detection method based on an RGB stream and a noise stream, has an AUC value of 0.795 on CASIA and 0.817 on COVERAGE. The technical scheme provided in the embodiments of the present application has an AUC value of 0.812 on CASIA and 0.830 on COVERAGE, both higher than the AUC values of the other schemes on the same datasets (a sketch of this evaluation protocol follows table 1).
TABLE 1
Method            CASIA    COVERAGE
CFA               0.522    0.485
ELA               0.613    0.583
J-LSTM            -        0.614
NOI1              0.612    0.587
RGB-N             0.795    0.817
This application  0.812    0.830
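The pixel-level AUC protocol described above can be sketched as follows using scikit-learn; the per-image-then-average aggregation is an interpretation of the description, and the function and variable names are assumptions, not a prescribed evaluation API.

```python
# Hedged sketch of the AUC protocol described above, using scikit-learn.
# The per-image pixel-level AUC, averaged over the dataset, is an
# interpretation of the description, not a prescribed evaluation API.
import numpy as np
from sklearn.metrics import roc_auc_score

def dataset_auc(pred_maps, gt_masks):
    """Average over images of each image's pixel-level AUC."""
    per_image = []
    for scores, mask in zip(pred_maps, gt_masks):
        # Flatten the per-pixel tamper scores and the binary ground truth.
        # (Images whose mask contains only one class would need skipping.)
        per_image.append(roc_auc_score(mask.ravel(), scores.ravel()))
    return float(np.mean(per_image))
```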
It should be noted that the technical scheme provided by the embodiments of the present application can be applied to any image tampering detection scenario, and adverse effects are avoided by detecting maliciously tampered regions in images. For a take-out service platform, a review service platform, or a map service platform, the technical scheme provided by the embodiments of the present application can be used for tampering detection on shop images uploaded by users. For example, the shop image is a door head image, that is, an image including a shop signboard; referring to fig. 7, four images are processed by the technical scheme provided by the embodiments of the present application, yielding four images with target regions marked, together with the probabilities that those target regions are tampered regions. In a take-out scenario, the technical scheme provided by the embodiments of the present application can be used for tampering detection on the arrival image uploaded by a rider upon reaching the store, so as to ensure that the uploaded image is authentic and reliable. For an insurance platform, the technical scheme provided by the embodiments of the present application can be used for tampering detection on insurance documents.
Compared with related schemes, the technical scheme provided by the embodiments of the present application also shows better detection performance in the scenario of door head image tampering detection. Referring to table 2, a test set of 6762 images was constructed to simulate a real online environment, containing 181 tampered images and 6581 non-tampered images. The recall rate of the related scheme on this test set is 93.92%, whereas the recall rate of the technical scheme provided by the embodiments of the present application reaches 94.48%, improving the accuracy of detecting tampered regions. Here, the recall rate is the ratio of the number of correctly predicted tampered images to the total number of tampered images. To avoid false detection, images determined by the computer device to be tampered are usually reviewed manually; the exemption rate is the ratio of the number of images determined by the computer device to be non-tampered, and thus exempt from manual review, to the total number of images to be detected. The exemption rate of the related scheme on this test set is 44.69%, whereas the exemption rate of the technical scheme provided by the embodiments of the present application is 70.50%. With the technical scheme provided by the embodiments of the present application, only 29.5% of the images need to be reviewed manually, which reduces the cost of manual review and improves its efficiency (a sketch of these metrics follows table 2).
TABLE 2
Method            Recall rate    Exemption rate
Related scheme    93.92%         44.69%
This application  94.48%         70.50%
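The two metrics in table 2 can be computed as in the following sketch; the variable names and the image-level boolean prediction format are illustrative assumptions.

```python
# Hedged sketch of the recall and exemption-rate metrics in table 2.
# preds and labels are per-image booleans: True means "tampered".
def recall_and_exemption(preds, labels):
    total = len(labels)
    tampered = sum(labels)
    true_positives = sum(p and y for p, y in zip(preds, labels))
    recall = true_positives / tampered            # correctly found tampered images
    exempt = sum(not p for p in preds) / total    # images skipping manual review
    return recall, exempt
```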
All of the above optional technical solutions may be combined in any manner to form optional embodiments of the present application, and details are not repeated here.
Fig. 8 is a block diagram of an image processing apparatus according to an embodiment of the present application. Referring to fig. 8, the apparatus includes:
an attention feature determination module 801, configured to determine a first attention feature based on a first image feature of a first candidate region in an image, where the first attention feature is used to indicate importance of different sub-features in the first image feature for determining a tampered region;
a candidate region determination module 802 for determining a second candidate region in the image based on the first attention feature and the first candidate region;
a target region determining module 803, configured to determine a target region in the image based on the second image feature of the second candidate region and the second candidate region, where the target region is used to indicate a tampered region.
According to the image processing apparatus provided by the embodiments of the present application, the attention feature corresponding to the initial candidate region is determined, making the features that require focused attention in determining the tampered region more prominent. The initial candidate region is first adjusted based on the attention feature, so that a candidate region more accurately representing the tampered region is obtained; the preliminarily adjusted candidate region is then further adjusted based on its image feature. The final tampered region is thus obtained through step-by-step progressive region adjustment, which improves the accuracy of locating the tampered region.
In one possible implementation, determining the first attention feature based on the first image feature of the first candidate region in the image includes:
the attention feature determination module 801 is configured to process the first image feature through a first attention model in a region determination network to obtain the first attention feature, where the region determination network is obtained by training based on sample images labeled with tampered regions.
In another possible implementation, the attention feature determination module 801 is configured to perform the following steps, a sketch of which is given after this list:
determining attention weights for sub-features of each channel in the first image feature based on the first image feature;
determining a channel attention feature based on the attention weight of the sub-feature of each channel and the sub-feature of each channel;
determining an attention weight of the sub-feature for each location in the channel attention feature based on the channel attention feature;
determining the first attention feature based on the attention weight of the sub-feature of each location and the sub-feature of each location.
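A minimal sketch of the channel-then-position attention computation in the steps above is given below, in the spirit of CBAM-style attention; the pooling choices, gate layers, and kernel sizes are assumptions, since the patent specifies only the weighting steps, not the exact operators.

```python
# Hedged sketch of channel attention followed by position (spatial)
# attention, as in the steps above. Operator choices (average pooling,
# sigmoid gates, kernel sizes) are assumptions in the CBAM style.
import torch.nn as nn

class ChannelThenSpatialAttention(nn.Module):
    def __init__(self, channels, reduction=16):
        super().__init__()
        self.channel_gate = nn.Sequential(
            nn.AdaptiveAvgPool2d(1),                      # pool each channel
            nn.Conv2d(channels, channels // reduction, 1),
            nn.ReLU(inplace=True),
            nn.Conv2d(channels // reduction, channels, 1),
            nn.Sigmoid(),                                 # per-channel weights
        )
        self.spatial_gate = nn.Sequential(
            nn.Conv2d(1, 1, kernel_size=7, padding=3),
            nn.Sigmoid(),                                 # per-position weights
        )

    def forward(self, x):
        # Attention weight for the sub-feature of each channel.
        x = x * self.channel_gate(x)                      # channel attention feature
        # Attention weight for the sub-feature of each position.
        pos = self.spatial_gate(x.mean(dim=1, keepdim=True))
        return x * pos                                    # first attention feature
```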
In another possible implementation manner, the candidate region determining module 802 is configured to:
processing the first attention feature and the first candidate region through a first position regression model in the region determination network to obtain a second candidate region;
the region determination network is obtained by training based on sample images marked with tampered regions, and the first position regression model is used for processing any feature and any candidate region to obtain a processed candidate region, where the coincidence degree between the processed candidate region and the real tampered region is greater than the coincidence degree between the candidate region before processing and the real tampered region.
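The "coincidence degree" above is naturally read as the overlap between a candidate box and the ground-truth tampered box, for example intersection over union (IoU); the following sketch makes that reading concrete, with the IoU interpretation itself being an assumption.

```python
# Hedged sketch: coincidence degree read as IoU between two boxes,
# each given as (x1, y1, x2, y2). The IoU reading is an assumption.
def iou(box_a, box_b):
    ax1, ay1, ax2, ay2 = box_a
    bx1, by1, bx2, by2 = box_b
    # Intersection rectangle (empty if the boxes do not overlap).
    iw = max(0.0, min(ax2, bx2) - max(ax1, bx1))
    ih = max(0.0, min(ay2, by2) - max(ay1, by1))
    inter = iw * ih
    union = ((ax2 - ax1) * (ay2 - ay1)
             + (bx2 - bx1) * (by2 - by1) - inter)
    return inter / union if union > 0 else 0.0
```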
In another possible implementation manner, the apparatus further includes:
a tampering classification information determining module, configured to process the first attention feature through a classification model in the region determination network to obtain tampering classification information, where the tampering classification information is used to indicate whether the first candidate region is a tampered region;
the candidate region determining module 802 is configured to, if the tampering classification information indicates that the first candidate region is a tampered region, process the first attention feature and the first candidate region through the first position regression model to obtain the second candidate region.
In another possible implementation manner, the target region determining module 803 is configured to:
processing the second image feature through a second attention model in the region determination network to obtain a second attention feature;
processing the second attention feature and the second candidate region through a second position regression model in the region determination network to obtain the target region;
the region determination network is obtained by training based on sample images marked with tampered regions, and the second position regression model is used for processing any feature and any candidate region to obtain a processed candidate region, where the coincidence degree between the processed candidate region and the real tampered region is greater than the coincidence degree between the candidate region before processing and the real tampered region.
In another possible implementation, the network structure of the first attention model is different from the network structure of the second attention model.
In another possible implementation manner, the target region determining module 803 is configured to:
determining a third candidate region in the image based on the second image feature and the second candidate region;
a target region in the image is determined based on a third image feature of the third candidate region and the third candidate region.
It should be noted that, when the image processing apparatus provided in the above embodiments performs image processing, only the division into the above functional modules is illustrated; in practical applications, the functions may be allocated to different functional modules as needed, that is, the internal structure of the computer device may be divided into different functional modules to complete all or part of the functions described above. In addition, the image processing apparatus and the image processing method provided by the above embodiments belong to the same concept; their specific implementation processes are described in detail in the method embodiments and are not repeated here.
Alternatively, the computer device is configured as a terminal, and the above-described image processing method is executed by the terminal. Fig. 9 shows a block diagram of a terminal 900 according to an exemplary embodiment of the present application. The terminal 900 may be: a desktop computer, a laptop computer, a tablet computer, or a smartphone. Terminal 900 may also be referred to by other names such as user equipment, portable terminals, laptop terminals, desktop terminals, and the like.
In general, terminal 900 includes: a processor 901 and a memory 902.
Processor 901 may include one or more processing cores, such as a 4-core processor or an 8-core processor. The processor 901 may be implemented in at least one hardware form of a DSP (Digital Signal Processing), an FPGA (Field-Programmable Gate Array), and a PLA (Programmable Logic Array). Processor 901 may also include a main processor and a coprocessor, where the main processor is a processor for processing data in an awake state, also called a CPU (Central Processing Unit), and the coprocessor is a low-power processor for processing data in a standby state. In some embodiments, the processor 901 may be integrated with a GPU (Graphics Processing Unit), which is responsible for rendering and drawing the content to be displayed by the display screen. In some embodiments, the processor 901 may further include an AI (Artificial Intelligence) processor for processing computing operations related to machine learning.
Memory 902 may include one or more computer-readable storage media, which may be non-transitory. The memory 902 may also include high-speed random access memory, as well as non-volatile memory, such as one or more magnetic disk storage devices, flash memory storage devices. In some embodiments, a non-transitory computer readable storage medium in the memory 902 is used to store at least one program code for execution by the processor 901 to implement the image processing method provided by the method embodiments in the present application.
In some embodiments, terminal 900 can also optionally include: a peripheral interface 903 and at least one peripheral. The processor 901, memory 902, and peripheral interface 903 may be connected by buses or signal lines. Various peripheral devices may be connected to the peripheral interface 903 via a bus, signal line, or circuit board. Specifically, the peripheral device includes: at least one of radio frequency circuitry 904, display screen 905, camera assembly 906, and power supply 907.
The peripheral interface 903 may be used to connect at least one peripheral related to I/O (Input/Output) to the processor 901 and the memory 902. In some embodiments, the processor 901, memory 902, and peripheral interface 903 are integrated on the same chip or circuit board; in some other embodiments, any one or two of the processor 901, the memory 902 and the peripheral interface 903 may be implemented on a separate chip or circuit board, which is not limited by this embodiment.
The Radio Frequency circuit 904 is used for receiving and transmitting RF (Radio Frequency) signals, also called electromagnetic signals. The radio frequency circuitry 904 communicates with communication networks and other communication devices via electromagnetic signals. The radio frequency circuit 904 converts an electrical signal into an electromagnetic signal to transmit, or converts a received electromagnetic signal into an electrical signal. Optionally, the radio frequency circuit 904 comprises: an antenna system, an RF transceiver, one or more amplifiers, a tuner, an oscillator, a digital signal processor, a codec chipset, a subscriber identity module card, and so forth. The radio frequency circuit 904 may communicate with other terminals via at least one wireless communication protocol. The wireless communication protocols include, but are not limited to: the world wide web, metropolitan area networks, intranets, various generations of mobile communication networks (2G, 3G, 4G, and 5G), wireless local area networks, and/or WiFi (Wireless Fidelity) networks. In some embodiments, the radio frequency circuit 904 may also include NFC (Near Field Communication) related circuits, which are not limited in this application.
The display screen 905 is used to display a UI (User Interface). The UI may include graphics, text, icons, video, and any combination thereof. When the display screen 905 is a touch display screen, it also has the ability to capture touch signals on or above its surface. Such a touch signal may be input to the processor 901 as a control signal for processing. The display screen 905 may then also be used to provide virtual buttons and/or a virtual keyboard, also referred to as soft buttons and/or a soft keyboard. In some embodiments, there may be one display screen 905, disposed on the front panel of the terminal 900; in other embodiments, there may be at least two display screens 905, each disposed on a different surface of the terminal 900 or in a foldable design; in still other embodiments, the display screen 905 may be a flexible display screen disposed on a curved or folded surface of the terminal 900. The display screen 905 may even be arranged in a non-rectangular irregular pattern, that is, a shaped screen. The display screen 905 may be manufactured using materials such as an LCD (Liquid Crystal Display) or an OLED (Organic Light-Emitting Diode).
The camera assembly 906 is used to capture images or video. Optionally, camera assembly 906 includes a front camera and a rear camera. Typically, the front camera is disposed at the front panel of the terminal 900 and the rear camera is disposed at the rear of the terminal 900. In some embodiments, the number of the rear cameras is at least two, and each rear camera is any one of a main camera, a depth-of-field camera, a wide-angle camera and a telephoto camera, so that the main camera and the depth-of-field camera are fused to realize a background blurring function, and the main camera and the wide-angle camera are fused to realize panoramic shooting and VR (Virtual Reality) shooting functions or other fusion shooting functions. In some embodiments, camera assembly 906 may also include a flash. The flash lamp can be a monochrome temperature flash lamp or a bicolor temperature flash lamp. The double-color-temperature flash lamp is a combination of a warm-light flash lamp and a cold-light flash lamp, and can be used for light compensation at different color temperatures.
Power supply 907 is used to provide power to the various components in terminal 900. Power source 907 can be alternating current, direct current, disposable or rechargeable batteries. When power supply 907 includes a rechargeable battery, the rechargeable battery may be a wired rechargeable battery or a wireless rechargeable battery. The wired rechargeable battery is a battery charged through a wired line, and the wireless rechargeable battery is a battery charged through a wireless coil. The rechargeable battery may also be used to support fast charge technology.
Those skilled in the art will appreciate that the configuration shown in fig. 9 does not constitute a limitation of terminal 900, and may include more or fewer components than those shown, or may combine certain components, or may employ a different arrangement of components.
Alternatively, the computer device is configured as a server, and the above-described image processing method is executed by the server. Fig. 10 is a block diagram of a server 1000 according to an embodiment of the present application. The server 1000 may vary considerably in configuration or performance, and may include one or more processors (CPUs) 1001 and one or more memories 1002, where at least one program code is stored in the memory 1002 and is loaded and executed by the processor 1001 to implement the image processing methods provided by the foregoing method embodiments. Of course, the server may also have components such as a wired or wireless network interface, a keyboard, and an input/output interface for input and output, and may include other components for implementing device functions, which are not described herein again.
In an exemplary embodiment, there is also provided a computer-readable storage medium having at least one program code stored therein, the at least one program code being executable by a processor in a computer device to perform the image processing method in the above-described embodiments. For example, the computer-readable storage medium may be a ROM (Read-Only Memory), a RAM (Random Access Memory), a CD-ROM (Compact Disc Read-Only Memory), a magnetic tape, a floppy disk, an optical data storage device, and the like.
The present application also provides a computer program product or a computer program comprising computer program code stored in a computer readable storage medium, which is read by a processor of a computer device from the computer readable storage medium, and which is executed by the processor to cause the computer device to execute the image processing method in the above method embodiments.
It will be understood by those skilled in the art that all or part of the steps for implementing the above embodiments may be implemented by hardware, or may be implemented by a program instructing relevant hardware, where the program may be stored in a computer-readable storage medium, and the above-mentioned storage medium may be a read-only memory, a magnetic disk or an optical disk, etc.
The above description is only exemplary of the present application and should not be taken as limiting, as any modification, equivalent replacement, or improvement made within the spirit and principle of the present application should be included in the protection scope of the present application.

Claims (7)

1. An image processing method, characterized in that the method comprises:
processing the global features of the image through a region proposal network to obtain a plurality of first candidate regions;
wherein a region determination network comprises a first attention model, a first classification model, and a first position regression model; processing a first image feature of each first candidate region of the plurality of first candidate regions in the image through the first attention model to determine a first attention feature, wherein the first attention feature is used for representing the importance degree of different sub-features in the first image feature for determining a tampered region;
processing the first attention feature through the first classification model to obtain tampering classification information, wherein the tampering classification information is used for indicating whether the first candidate region is a tampered region;
if the tampering classification information indicates that the first candidate region is a tampered region, processing the first attention feature and the first candidate region through the first position regression model to determine a second candidate region in the image;
wherein the region determination network further comprises a second attention model, a second classification model, and a second position regression model;
processing a second image feature of the second candidate region through the second attention model to obtain a second attention feature;
processing the second attention feature through the second classification model to determine whether the second candidate region is a tampered region;
if the second candidate region is a tampered region, processing the second attention feature and the second candidate region through the second position regression model to obtain a target region;
wherein the region determination network is obtained by training based on sample images marked with tampered regions, and each of the first position regression model and the second position regression model is used for processing any feature and any candidate region to obtain a processed candidate region, the coincidence degree between the processed candidate region and the real tampered region being greater than the coincidence degree between the candidate region before processing and the real tampered region.
2. The method of claim 1, wherein a network structure of the first attention model is different from a network structure of the second attention model.
3. The method of claim 1, wherein the processing, by the first attention model, the first image feature of each of the plurality of first candidate regions in the image to determine a first attention feature comprises:
processing the first image feature through the first attention model, and determining attention weights of sub-features of each channel in the first image feature;
determining a channel attention feature based on the attention weight of the sub-feature of each channel and the sub-feature of each channel;
determining an attention weight for a sub-feature of each location in the channel attention feature based on the channel attention feature;
determining the first attention feature based on the attention weight of the sub-feature of each location and the sub-feature of each location.
4. The method of claim 1, wherein the processing the second attention feature and the second candidate region through the second location regression model to obtain the target region comprises:
processing the second attention feature and the second candidate region through the second position regression model to determine a third candidate region in the image;
determining a target region in the image based on a third image feature of the third candidate region and the third candidate region.
5. An image processing apparatus, characterized in that the apparatus comprises:
means for performing the steps of: processing the global features of the image through a region proposal network to obtain a plurality of first candidate regions;
wherein a region determination network comprises a first attention model, a first classification model, and a first position regression model; an attention feature determination module, configured to process a first image feature of each first candidate region of the plurality of first candidate regions in the image through the first attention model to determine a first attention feature, wherein the first attention feature is used for representing the importance degree of different sub-features in the first image feature for determining a tampered region;
a tampering classification information determining module, configured to process the first attention feature through the first classification model to obtain tampering classification information, wherein the tampering classification information is used for indicating whether the first candidate region is a tampered region;
a candidate region determining module, configured to, if the tampering classification information indicates that the first candidate region is a tampered region, process the first attention feature and the first candidate region through the first position regression model to determine a second candidate region in the image;
wherein the region determination network further comprises a second attention model, a second classification model, and a second position regression model;
a target region determining module, configured to process a second image feature of the second candidate region through the second attention model to obtain a second attention feature;
the apparatus further comprising means for performing the step of: processing the second attention feature through the second classification model to determine whether the second candidate region is a tampered region;
the target region determining module being further configured to, if the second candidate region is a tampered region, process the second attention feature and the second candidate region through the second position regression model to obtain the target region;
wherein the region determination network is obtained by training based on sample images marked with tampered regions, and each of the first position regression model and the second position regression model is used for processing any feature and any candidate region to obtain a processed candidate region, the coincidence degree between the processed candidate region and the real tampered region being greater than the coincidence degree between the candidate region before processing and the real tampered region.
6. A computer device, characterized in that it comprises a processor and a memory, in which at least one program code is stored, which is loaded and executed by the processor to implement the image processing method according to any one of claims 1 to 4.
7. A computer-readable storage medium, characterized in that at least one program code is stored therein, which is loaded and executed by a processor, to implement the image processing method of any one of claims 1 to 4.
CN202110512984.1A 2021-05-11 2021-05-11 Image processing method, image processing device, computer equipment and storage medium Active CN113269730B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202110512984.1A CN113269730B (en) 2021-05-11 2021-05-11 Image processing method, image processing device, computer equipment and storage medium

Publications (2)

Publication Number Publication Date
CN113269730A CN113269730A (en) 2021-08-17
CN113269730B (en) 2022-11-29

Family

ID=77230426

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202110512984.1A Active CN113269730B (en) 2021-05-11 2021-05-11 Image processing method, image processing device, computer equipment and storage medium

Country Status (1)

Country Link
CN (1) CN113269730B (en)

Families Citing this family (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN116547700A (en) * 2021-11-30 2023-08-04 京东方科技集团股份有限公司 Image recognition method and system, training method and electronic equipment

Citations (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
WO2018111601A1 (en) * 2016-12-16 2018-06-21 Square, Inc. Tamper detection system
CN111080628A (en) * 2019-12-20 2020-04-28 湖南大学 Image tampering detection method and device, computer equipment and storage medium
CN112016569A (en) * 2020-07-24 2020-12-01 驭势科技(南京)有限公司 Target detection method, network, device and storage medium based on attention mechanism
CN112233077A (en) * 2020-10-10 2021-01-15 北京三快在线科技有限公司 Image analysis method, device, equipment and storage medium

Family Cites Families (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN1703722A (en) * 2002-10-09 2005-11-30 皇家飞利浦电子股份有限公司 Localisation of image tampering

Also Published As

Publication number Publication date
CN113269730A (en) 2021-08-17

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant