CN110570490B - Saliency image generation method and equipment - Google Patents


Info

Publication number
CN110570490B
Authority
CN
China
Prior art keywords
saliency
module
map
image
feature
Prior art date
Legal status
Active
Application number
CN201910840719.9A
Other languages
Chinese (zh)
Other versions
CN110570490A (en)
Inventor
李甲
李道伟
付奎
赵一凡
赵沁平
Current Assignee
Beihang University
Original Assignee
Beihang University
Priority date
Filing date
Publication date
Application filed by Beihang University
Priority to CN201910840719.9A
Publication of CN110570490A
Application granted
Publication of CN110570490B

Classifications

    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06T: IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T11/00: 2D [Two Dimensional] image generation

Abstract

According to the saliency image generation method and device, a single sample image of the target domain in a twin network is selected as the first image, and any image in the image dataset of the source domain in the twin network is selected as the second image. The first feature extraction module, the second feature extraction module, the first saliency map generation module, the second saliency map generation module and the discrimination module are trained with the first image and the second image until the generated first saliency map and the generated second saliency map meet the saliency prediction standard. The first feature extraction module and the first saliency map generation module then form a saliency image generation model, and the saliency image of a target image is generated after the target image is input into the saliency image generation model, which improves the efficiency of generating the saliency map of the target image.

Description

Saliency image generation method and equipment
Technical Field
The embodiment of the invention relates to the field of computer vision, in particular to a method and equipment for generating a saliency image.
Background
Image saliency refers to simulating, in the field of computer vision, the visual attention mechanism of the human eye with an intelligent algorithm in order to extract the regions of an image that contain important information, i.e., the regions of interest to humans, or the salient features. A visual saliency detection algorithm extracts and encodes the features in an image by simulating the human visual attention mechanism, calculates the important components of the information in the visual field, and finally integrates and localizes the features and information to give the final saliency map.
The traditional algorithm is a saliency-based visual attention model; the model extracts primary visual features from the image, including color, brightness, orientation, center-surround contrast at different scales and other features, and fuses these features to obtain the final saliency map. Compared with the traditional algorithm, a deep neural network has richer feature representations and can be trained on multiple saliency datasets, but it suffers from degraded saliency performance on different images. To address this degradation, a new image domain can be annotated, i.e., a new training dataset is built to retrain the network; alternatively, the deep network is fine-tuned with a small amount of data to meet the requirement of saliency prediction on new images.
However, training a deep neural network requires a large amount of data, which means that a large amount of manpower and material resources would be needed to prepare new data for the saliency prediction task. How to migrate a saliency prediction model trained on an existing dataset to a new image domain and still generate a good predicted saliency map therefore becomes an urgent problem to be solved.
Disclosure of Invention
The embodiments of the invention provide a saliency image generation method and device: a saliency prediction model is trained with a source-domain image dataset and a single sample image of the target domain, a saliency image is generated with the trained model, and the efficiency of generating the saliency map of a target image is thereby improved.
In a first aspect, an embodiment of the present invention provides a saliency image generation method, which is applied to a twin network, and includes:
selecting a single sample image of a target domain in the twin network as a first image, and selecting any image in a source domain image data set in the twin network as a second image;
the training process of the first characteristic extraction module and the second characteristic extraction module is as follows: extracting feature information of a first image from the first image as first feature information by a first feature extraction module in a target domain in the twin network, extracting feature information of a second image as second feature information by a second feature extraction module in a source domain in the twin network, and hierarchically fusing the first feature information into the second feature information to generate third feature information;
the training process of the first saliency map generation module and the second saliency map generation module is as follows: generating, by a first saliency map generation module in a target domain in the twin network, a first saliency map from the first feature information; generating, by a second saliency map generation module in a source domain in the twin network, a second saliency map from the third feature information;
the training process of the discrimination module is as follows: distinguishing the first saliency map through the distinguishing module, distinguishing the second saliency map through the distinguishing module, and judging whether the generated first saliency map and the generated second saliency map meet saliency prediction standards;
if the first saliency map and the second saliency map do not accord with the saliency prediction standard, taking a single sample image of a target domain as a first image again, selecting any image in the source domain image data set in the twin network as a second image, and repeating the training processes of the first feature extraction module, the second feature extraction module, the first saliency map generation module, the second saliency map generation module and the discrimination module until the generated first saliency map and the generated second saliency map accord with the saliency prediction standard;
if the first saliency map and the second saliency map meet the saliency prediction standard, a first feature extraction module and a first saliency map generation module form a saliency image generation model;
and after the target image is input into the saliency image generation model, generating a saliency image of the target image.
In one possible design, the first saliency prediction map comprises first label data representing saliency features of a first image; the second saliency prediction map comprises second label data representing saliency features of a second image;
the discriminating of the first saliency map by the discrimination module, the discriminating of the second saliency map by the discrimination module, and the judging of whether the generated first saliency map and the generated second saliency map meet the saliency prediction standard include:
the discrimination module judges whether the first saliency map meets the saliency prediction standard according to the first label data in the first saliency prediction map; if the discrimination module judges that the first saliency map can correctly map the salient features marked by the first label data, the first saliency map is considered to meet the saliency prediction standard; if the discrimination module judges that the first saliency map cannot correctly map the salient features marked by the first label data, the first saliency map is considered not to meet the saliency prediction standard;
the discrimination module judges whether the second saliency map meets the saliency prediction standard according to the second label data in the second saliency prediction map; if the discrimination module judges that the second saliency map can correctly map the salient features marked by the second label data, the second saliency map is considered to meet the saliency prediction standard; and if the discrimination module judges that the second saliency map cannot correctly map the salient features marked by the second label data, the second saliency map is considered not to meet the saliency prediction standard.
In one possible design, the discriminating, by the discrimination module, of the first saliency map includes:
sending the first saliency map to the discrimination module, the discrimination module judging whether the salient features of the first saliency map correctly map the salient features marked by the first label data, and if the discrimination module judges that the salient features of the first saliency map cannot correctly map the salient features marked by the first label data, performing the following operations:
the discrimination module updates its discrimination weights according to the deviation between the salient features of the generated first saliency map and the first label data, and the training processes of the first feature extraction module, the second feature extraction module, the first saliency map generation module, the second saliency map generation module and the discrimination module are repeated at the same time, until the discrimination module judges that the salient features of the first saliency map correctly map the salient features marked by the first label data.
In one possible design, the discriminating, by the discrimination module, of the second saliency map includes:
sending the second saliency map to the discrimination module, the discrimination module judging whether the salient features of the second saliency map correctly map the salient features marked by the second label data, and if the discrimination module judges that the salient features of the second saliency map cannot correctly map the salient features marked by the second label data, performing the following operations:
the discrimination module updates its discrimination weights according to the deviation between the salient features of the generated second saliency map and the second label data, and the training processes of the first feature extraction module, the second feature extraction module, the first saliency map generation module, the second saliency map generation module and the discrimination module are repeated at the same time, until the discrimination module judges that the salient features of the second saliency map correctly map the salient features marked by the second label data.
In one possible design, random noise information is added in the process of generating the first feature information into a first significance map through a first significance map generating module in a target domain in the twin network;
adding random noise information in the process of generating the second significance map by a second significance map generating module in a source domain in the twin network through the third feature information.
In a possible design, if the first discrimination module judges that the first saliency map can correctly map the salient features marked by the first label data, the first saliency map is considered to meet the saliency prediction standard; if the first discrimination module judges that the first saliency map cannot correctly map the salient features marked by the first label data, the first saliency map is considered not to meet the saliency prediction standard, which includes the following:
the first discrimination module calculates a deviation function value between the first saliency map and the first label data, and if the deviation function value is smaller than or equal to a preset deviation threshold, the first saliency map is considered to meet the saliency prediction standard; if the deviation function value is larger than the preset deviation threshold, the first saliency map is considered not to meet the saliency prediction standard;
if the second discrimination module judges that the second saliency map can correctly map the salient features marked by the second label data, the second saliency map is considered to meet the saliency prediction standard; if the second discrimination module judges that the second saliency map cannot correctly map the salient features marked by the second label data, the second saliency map is considered not to meet the saliency prediction standard, which includes the following:
the second discrimination module calculates a deviation function value between the second saliency map and the second label data, and if the deviation function value is smaller than or equal to the preset deviation threshold, the second saliency map is considered to meet the saliency prediction standard; and if the deviation function value is larger than the preset deviation threshold, the second saliency map is considered not to meet the saliency prediction standard.
In a possible design, if the first saliency map and the second saliency map do not meet the saliency prediction criterion, the step of taking a single sample image of a target domain as a first image, selecting any one image from a source domain image dataset in the twin network as a second image, and repeating the training process of the first feature extraction module, the second feature extraction module, the first saliency map generation module, the second saliency map generation module, and the discrimination module until the generated first saliency map and the generated second saliency map meet the saliency prediction criterion includes:
calculating deviation function values of the first feature extraction module, the second feature extraction module, the first saliency map generation module and the second saliency map generation module respectively;
calculating a gradient of each module through the deviation function values of the first feature extraction module, the second feature extraction module, the first saliency map generation module and the second saliency map generation module;
updating the weights of the modules according to the gradients of the first feature extraction module, the second feature extraction module, the first saliency map generation module and the second saliency map generation module;
according to the modules with updated weights, carrying out the next round of training of each module until the judgment module judges that the generated first saliency map and the second saliency map conform to the saliency prediction standard.
In one possible design, the first feature extraction module includes a first extraction sub-block, a second extraction sub-block, a third extraction sub-block, and a fourth extraction sub-block; the second feature extraction module comprises a fifth extraction sub-block, a sixth extraction sub-block, a seventh extraction sub-block and an eighth extraction sub-block;
the extracting, by a first feature extraction module in a target domain in the twin network, feature information of a first image from the first image as first feature information, extracting, by a second feature extraction module in a source domain in the twin network, feature information of a second image as second feature information, and hierarchically fusing the first feature information into the second feature information to generate third feature information includes:
the first image obtains a first sub-feature through a first extraction sub-block, the first sub-feature obtains a second sub-feature through a second extraction sub-block, the second sub-feature obtains a third sub-feature through a third extraction sub-block, the third sub-feature obtains a fourth sub-feature through a fourth extraction sub-block, and the fourth sub-feature is the first feature information;
the second image obtains a fifth sub-feature through the fifth extraction sub-block, the fifth sub-feature is fused with the first sub-feature and then obtains a sixth sub-feature through the sixth extraction sub-block, the sixth sub-feature is fused with the second sub-feature and then obtains a seventh sub-feature through the seventh extraction sub-block, the seventh sub-feature is fused with the third sub-feature and then obtains an eighth sub-feature through the eighth extraction sub-block, wherein the eighth sub-feature is the second feature information, and the eighth sub-feature is fused with the fourth sub-feature to obtain the third feature information.
In one possible design, the label data for a single sample image of the target domain in the twin network shows similar salient features to the label data for all images in the test set of target images in the twin network.
In a second aspect, an embodiment of the present invention provides a terminal device, including: at least one processor and memory;
the memory stores computer-executable instructions;
the at least one processor executing the computer-executable instructions stored by the memory causes the at least one processor to perform the saliency image generation method as described above in the first aspect and various possible designs of the first aspect.
In a third aspect, an embodiment of the present invention provides a computer-readable storage medium, where computer-executable instructions are stored, and when a processor executes the computer-executable instructions, the saliency image generation method according to the first aspect and various possible designs of the first aspect is implemented.
According to the saliency image generation method and device provided by the embodiments of the invention, a single sample image of the target domain in a twin network is selected as the first image and any image in the image dataset of the source domain in the twin network is selected as the second image. The first feature extraction module, the second feature extraction module, the first saliency map generation module, the second saliency map generation module and the discrimination module are trained with the first image and the second image until the generated first saliency map and the generated second saliency map conform to the saliency prediction standard, whereupon the first feature extraction module and the first saliency map generation module form a saliency image generation model. The saliency image of a target image is generated after the target image is input into the saliency image generation model, which improves the efficiency of generating the saliency map of the target image.
Drawings
In order to more clearly illustrate the embodiments of the present invention or the technical solutions in the prior art, the drawings needed to be used in the description of the embodiments or the prior art will be briefly introduced below, and it is obvious that the drawings in the following description are some embodiments of the present invention, and for those skilled in the art, other drawings can be obtained according to these drawings without creative efforts.
Fig. 1 is a schematic block diagram of a saliency image generation model according to an embodiment of the present invention;
fig. 2 is a first flowchart of a saliency image generation method according to an embodiment of the present invention;
FIG. 3 is a block diagram illustrating a saliency image generation method according to another embodiment of the present invention;
fig. 4 is a flowchart illustrating a second method for generating a saliency image according to an embodiment of the present invention;
fig. 5 is a schematic diagram of a hardware structure of a terminal device according to an embodiment of the present invention.
Detailed Description
In order to make the objects, technical solutions and advantages of the embodiments of the present invention clearer, the technical solutions in the embodiments of the present invention will be clearly and completely described below with reference to the drawings in the embodiments of the present invention, and it is obvious that the described embodiments are some, but not all, embodiments of the present invention. All other embodiments, which can be derived by a person skilled in the art from the embodiments given herein without making any creative effort, shall fall within the protection scope of the present invention.
Fig. 1 is a schematic diagram of a saliency image generation model according to an embodiment of the present invention, and as shown in fig. 1, a saliency image generation model 10 includes: the system comprises a first feature extraction module 11, a second feature extraction module 12, a first saliency map generation module 13, a second saliency map generation module 14 and a judgment module 15. The first feature extraction module 11 is connected to the first saliency map generation module 13, the second feature extraction module 12 is connected to the second saliency map generation module 14, and the first saliency map generation module 13 and the second saliency map generation module 14 are respectively connected to the determination module 15.
The saliency image generation model provided by the embodiment of the invention is constructed according to a convolutional neural network, wherein a first feature extraction module 11 and a second feature extraction module 12 are used for extracting feature information of an image; the first saliency map generation module 13 and the second saliency map generation module 14 are used for generating saliency images from the images with the characteristic information; the judging module 15 is used for judging whether the generated saliency image meets the saliency prediction standard.
Fig. 2 is a schematic flow chart of a saliency image generation method provided by an embodiment of the present invention, and based on the saliency image generation model 10 provided in fig. 1, as shown in fig. 2, the method includes:
s201, selecting a single sample image of a target domain in the twin network as a first image, and selecting any image in a source domain image data set in the twin network as a second image.
The twin network provided by the embodiment of the invention is a two-path network structure based on a convolutional neural network; the first path and the second path of the two-path network handle the single sample image of the target domain in the twin network and the source-domain image dataset in the twin network, respectively. The single sample image of the target domain in the first path is selected as the first image, and any image in the source-domain image dataset in the second path is selected as the second image.
A single target-domain sample image (x_0, y_0) is selected as the first image, and an image x_i with its label y_i is selected from the source-domain image dataset {(x_i, y_i)} as the second image; the two images are then reorganized into the data format required by model training. For the twin network structure and the feature distribution space, the first image and the second image are combined into an image pair.
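As an illustration of how such image pairs might be assembled for training, the following sketch (PyTorch-style Python; the class name PairDataset and all variable names are hypothetical and not taken from the patent) pairs the single target-domain sample with every source-domain image:

```python
from torch.utils.data import Dataset

class PairDataset(Dataset):
    """Pairs the single target-domain sample (x0, y0) with each source-domain image (xi, yi)."""

    def __init__(self, source_images, source_labels, target_image, target_label):
        self.source_images = source_images  # tensors of source-domain images
        self.source_labels = source_labels  # corresponding saliency map labels
        self.target_image = target_image    # the single target-domain sample x0
        self.target_label = target_label    # its saliency map label y0

    def __len__(self):
        return len(self.source_images)

    def __getitem__(self, i):
        # Each training item is the image pair (xi, x0) together with its labels (yi, y0).
        return (self.source_images[i], self.target_image,
                self.source_labels[i], self.target_label)
```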
In a possible implementation manner, the saliency image generation method provided by the embodiment of the present invention is applied to academic research such as solar image analysis, solar activity prediction and space weather forecasting; the single sample image of the target domain in the first path may be a solar image, and the source-domain image dataset in the second path is a set of ground natural images. The model and the method provided by the embodiment of the invention can generate an image that reflects the saliency features of the sun.
S202, the training process of the first feature extraction module 11 and the second feature extraction module 12 is as follows: feature information of the first image is extracted from the first image as first feature information by the first feature extraction module 11 in the target domain in the twin network, feature information of the second image is extracted as second feature information by the second feature extraction module 12 in the source domain in the twin network, and the first feature information is hierarchically fused into the second feature information to generate third feature information.
As shown in fig. 1, the first image is an input of the first feature extraction module 11, and an output of the first feature extraction module 11 is first feature information, where the first feature information is an image feature displayed by the first image; the second image is an input of the second feature extraction module 12, and an output of the second feature extraction module 12 is second feature information, where the second feature information is an image feature displayed by the second image. And generating third feature information after the first feature information is hierarchically fused with the second feature information, wherein the third feature information is an image feature reflecting the first feature information in the second feature information.
S203, the training process of the first saliency map generation module 13 and the second saliency map generation module 14 is as follows: a first saliency map is generated from the first feature information by the first saliency map generation module 13 in the target domain in the twin network; a second saliency map is generated from the third feature information by the second saliency map generation module 14 in the source domain in the twin network.
As shown in fig. 1, the first feature information is an input of the first saliency map generation module 13, and an output of the first saliency map generation module 13 is a first saliency map, where the first saliency map reflects an important feature displayed by the first feature information. The third feature information is input to the second saliency map generation module 14, and the output of the second saliency map generation module 14 is a second saliency map, where the second saliency map reflects an important feature displayed by the third feature information.
For the source domain image set {(x_i, y_i)}, where x_i represents an image in the source dataset and y_i represents the saliency map label corresponding to x_i, and the target image dataset {(x_i, y_i)}, where x_i represents an image in the target dataset and y_i represents the corresponding saliency map label, the aim is to find a mapping F(x) that maps the information of the source domain images and the single target image into the same distribution space R^d, i.e., X → R^d. The information is then mapped into the respective saliency maps by a generation module G(x), i.e., R^d → Y. The mapping process can be expressed as:
H(x) = G(λ·F(x_i) + (1-λ)·F(x_0)), x = (x_i, x_0)
where H(x) represents the mapping of the input image pair to the saliency map, and λ represents the degree to which the single target-domain image information is fused into the source-domain image information. During the training of the convolutional neural network, λ is learnable in order to improve the information fusion efficiency.
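A minimal sketch of this fusion with a learnable λ is shown below (PyTorch; the sigmoid used to keep λ in (0, 1) is an assumption, since the patent only states that λ is learnable):

```python
import torch
import torch.nn as nn

class LambdaFusion(nn.Module):
    """Computes λ·F(x_i) + (1-λ)·F(x_0) with a learnable fusion degree λ."""

    def __init__(self):
        super().__init__()
        self.raw_lam = nn.Parameter(torch.tensor(0.0))  # learned jointly with the network

    def forward(self, f_source, f_target):
        lam = torch.sigmoid(self.raw_lam)       # constrain the fusion degree to (0, 1)
        return lam * f_source + (1.0 - lam) * f_target
```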
S204, the training process of the discrimination module 15 is as follows: the first saliency map is distinguished through the distinguishing module 15, the second saliency map is distinguished through the distinguishing module 15, and whether the generated first saliency map and the generated second saliency map meet the saliency prediction standard or not is judged.
As shown in fig. 1, the first saliency map and the second saliency map are used as input of the determination module 15, and the determination module 15 determines whether the first saliency map and the second saliency map meet the saliency prediction criterion.
S205, if the first saliency map and the second saliency map do not accord with the saliency prediction standard, taking the single sample image of the target domain as the first image again, selecting any image in the source domain image data set in the twin network as the second image, and repeating the training processes of the first feature extraction module 11, the second feature extraction module 12, the first saliency map generation module 13, the second saliency map generation module 14 and the discrimination module 15 until the generated first saliency map and the generated second saliency map accord with the saliency prediction standard.
If the first saliency map and the second saliency map do not meet the saliency prediction standard, the single sample image of the target domain is again taken as the first image, any image in the source-domain image dataset in the twin network is selected as the second image, and the processes from S202 to S204 provided by the embodiment of the invention are repeated. The reorganized training data is the image pair (x_i, x_0) with the corresponding label pair (y_i, y_0).
S206, if the first saliency map and the second saliency map meet the saliency prediction standard, the first feature extraction module 11 and the first saliency map generation module 13 are combined into the saliency image generation model 10.
In the embodiment of the present invention, the first feature extraction module 11, the second feature extraction module 12, the first saliency map generation module 13, the second saliency map generation module 14, and the discrimination module 15 are repeatedly trained to finally make the generated first saliency map and the generated second saliency map conform to the saliency prediction standard, and the first feature extraction module 11, the second feature extraction module 12, the first saliency map generation module 13, the second saliency map generation module 14, and the discrimination module 15 form the saliency image generation model 10.
S207, the target image is input to the saliency image generation model 10, and then the saliency image of the target image is generated.
For the target-domain sample image, x_0g = [F(x_0), z] is defined as the input of the target-path generation module, where F(x_0) is the output of the target-path feature extraction module. The output of the target-path generation module is expressed as:
y_0p = G(x_0g) = G([F(x_0), z])
For the source domain images, x_ig = [λF(x_i) + (1-λ)F(x_0), z] is defined as the input of the source-domain-path generation module, where F(x_i) is the output of the source-domain-path feature extraction module. The output of the source-domain-path generation module is expressed as:
y_ip = G(x_ig) = G([λF(x_i) + (1-λ)F(x_0), z])
where z is random noise, z ∈ R^d and z ~ N(0, 1).
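The concatenation of the noise vector z with the extracted features could look like the following sketch (PyTorch; the tensor shapes and the 64 noise channels are illustrative assumptions only):

```python
import torch

def generator_input(features, z):
    """Builds x_g = [F(x), z] by concatenating noise z to the features along the channel axis."""
    return torch.cat([features, z], dim=1)

# Illustrative shapes: batch of 1, 512 feature channels, a 7x7 feature map, 64 noise channels.
f_target = torch.randn(1, 512, 7, 7)   # stands in for F(x_0)
f_fused = torch.randn(1, 512, 7, 7)    # stands in for λ·F(x_i) + (1-λ)·F(x_0)
z = torch.randn(1, 64, 7, 7)           # z ~ N(0, 1), sampled once and shared by both paths
x0g = generator_input(f_target, z)     # target-path generator input
xig = generator_input(f_fused, z)      # source-domain-path generator input
```

Sampling z once and reusing it for both paths matches the later requirement that the same random noise be added to both saliency map generation modules.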
According to the saliency image generation method provided by the embodiment of the invention, a single sample image of the target domain in the twin network is selected as the first image and any image in the image dataset of the source domain in the twin network is selected as the second image. The first feature extraction module, the second feature extraction module, the first saliency map generation module, the second saliency map generation module and the discrimination module are trained with the first image and the second image until the generated first saliency map and the generated second saliency map meet the saliency prediction standard, whereupon the first feature extraction module and the first saliency map generation module form a saliency image generation model. The saliency image of a target image is generated after the target image is input into the saliency image generation model, which improves the efficiency of generating the saliency map of the target image.
Fig. 3 is a schematic block diagram of a saliency image generation method according to another embodiment of the present invention, as shown in fig. 3, in this embodiment, based on the embodiment in fig. 1, a first feature extraction module 11 includes: a first extraction sub-block 111, a second extraction sub-block 112, a third extraction sub-block 113, and a fourth extraction sub-block 114; the second feature extraction module 12 includes: a fifth extraction sub-block 121, a sixth extraction sub-block 122, a seventh extraction sub-block 123, and an eighth extraction sub-block 124.
The first extraction sub-block 111 is connected to the second extraction sub-block 112, the second extraction sub-block 112 is connected to the third extraction sub-block 113, and the third extraction sub-block 113 is connected to the fourth extraction sub-block 114. The first image obtains a first sub-feature through a first extraction sub-block 111, the first sub-feature obtains a second sub-feature through a second extraction sub-block 112, the second sub-feature obtains a third sub-feature through a third extraction sub-block 113, the third sub-feature obtains a fourth sub-feature through a fourth extraction sub-block 114, and the fourth sub-feature is first feature information.
The fifth extraction sub-block 121 is connected to the sixth extraction sub-block 122, the sixth extraction sub-block 122 is connected to the seventh extraction sub-block 123, and the seventh extraction sub-block 123 is connected to the eighth extraction sub-block 124. The second image obtains a fifth sub-feature through a fifth extraction sub-block 121, the fifth sub-feature is fused with the first sub-feature and then obtains a sixth sub-feature through a sixth extraction sub-block 122, the sixth sub-feature is fused with the second sub-feature and then obtains a seventh sub-feature through a seventh extraction sub-block 123, the seventh sub-feature is fused with the third sub-feature and then obtains an eighth sub-feature through an eighth extraction sub-block 124, wherein the eighth sub-feature is second feature information, and the eighth sub-feature is fused with the fourth sub-feature and then obtains third feature information.
An embodiment of the present invention provides a first feature extraction module, including: a first extraction subblock, a second extraction subblock, a third extraction subblock and a fourth extraction subblock; the second feature extraction module includes: a fifth extraction sub-block, a sixth extraction sub-block, a seventh extraction sub-block, and an eighth extraction sub-block. According to the common feature space in the source domain image and the target domain single sample image, the features of the first image are fused into the features of the second image, so that the generated third feature image can display the features of the first image, the features of the target domain single sample image are fully considered, and the features of the target domain single sample image can be fully extracted for significance prediction.
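A compact sketch of this twin, hierarchically fused extractor is given below (PyTorch). The internal structure of each extraction sub-block (a convolution followed by ReLU and pooling), the channel widths, and the weighted-sum form of the fusion are assumptions for illustration; the patent does not fix these details.

```python
import torch
import torch.nn as nn

def sub_block(in_ch, out_ch):
    """One extraction sub-block (assumed structure: conv + ReLU + 2x downsampling)."""
    return nn.Sequential(
        nn.Conv2d(in_ch, out_ch, kernel_size=3, padding=1),
        nn.ReLU(inplace=True),
        nn.MaxPool2d(2),
    )

class TwinFeatureExtractor(nn.Module):
    """Target path: four sub-blocks in series; source path: four sub-blocks whose inputs
    are hierarchically fused with the corresponding target-path sub-features."""

    def __init__(self, chs=(3, 32, 64, 128, 256)):
        super().__init__()
        self.target_blocks = nn.ModuleList(sub_block(chs[i], chs[i + 1]) for i in range(4))
        self.source_blocks = nn.ModuleList(sub_block(chs[i], chs[i + 1]) for i in range(4))
        self.raw_lam = nn.Parameter(torch.tensor(0.0))   # learnable fusion degree λ

    def forward(self, x_source, x_target):
        lam = torch.sigmoid(self.raw_lam)
        # Target path: the fourth sub-feature is the first feature information.
        f1 = self.target_blocks[0](x_target)
        f2 = self.target_blocks[1](f1)
        f3 = self.target_blocks[2](f2)
        f4 = self.target_blocks[3](f3)
        # Source path: each output is fused with the matching target-path sub-feature.
        f5 = self.source_blocks[0](x_source)
        f6 = self.source_blocks[1](lam * f5 + (1 - lam) * f1)
        f7 = self.source_blocks[2](lam * f6 + (1 - lam) * f2)
        f8 = self.source_blocks[3](lam * f7 + (1 - lam) * f3)   # second feature information
        third = lam * f8 + (1 - lam) * f4                        # third feature information
        return f4, third
```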
Fig. 4 is a second flowchart of the method for generating a saliency image according to the embodiment of the present invention, where on the basis of the method for generating a saliency image shown in fig. 2, the process of repeating the training of the first feature extraction module 11, the second feature extraction module 12, the first saliency map generation module 13, the second saliency map generation module 14, and the discrimination module 15 includes:
s401, calculating deviation function values of the first feature extraction module 11, the second feature extraction module 12, the first saliency map generation module 13 and the second saliency map generation module 14 respectively.
And S402, calculating the gradient of each module through the deviation function values of the first feature extraction module 11, the second feature extraction module 12, the first saliency map generation module 13 and the second saliency map generation module 14.
And S403, updating the weights of the modules according to the gradients of the first feature extraction module 11, the second feature extraction module 12, the first saliency map generation module 13 and the second saliency map generation module 14.
The updating process of the discrimination module D is as follows: given an image pair (x_i, x_0) of the first image and the second image, the pair of the first saliency map and the second saliency map generated by the first feature extraction module 11, the second feature extraction module 12, the first saliency map generation module 13 and the second saliency map generation module 14 is denoted (x_ig, x_0g). The discrimination module 15 outputs two distributions D(x_i) and D(x_0) according to the discrimination result, which produce two binary cross-entropy losses L_ds,s and L_dt,t (where s denotes source domain data and t denotes target domain data). The whole discrimination module 15 is therefore updated with a penalty function that combines these two losses (shown as an equation image in the original patent and not reproduced here).
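A hedged sketch of such a discriminator update step is shown below (PyTorch). The real/fake label assignment and the simple sum of the two binary cross-entropy terms are assumptions; the patent's exact penalty function is only given as an equation image.

```python
import torch
import torch.nn.functional as F

def discriminator_loss(d_source_pred, d_target_pred):
    """Sums two binary cross-entropy losses, one per path, assuming the discrimination
    module outputs probabilities in [0, 1]: the source-path prediction is pushed towards 1
    and the target-path prediction towards 0 (label convention assumed)."""
    l_ds = F.binary_cross_entropy(d_source_pred, torch.ones_like(d_source_pred))
    l_dt = F.binary_cross_entropy(d_target_pred, torch.zeros_like(d_target_pred))
    return l_ds + l_dt
```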
Update of the second saliency map generation module 14: in order to ensure the correctness of the saliency map generated by the second saliency map generation module 14, a strongly supervised MSE loss (mean square error loss) is applied to the second saliency map before it is sent to the discrimination module 15. Conversely, because there is only one labeled sample on the target path, no strong supervision is used there, in order to prevent the network from overfitting. In addition, as in a classical generative adversarial network, the second saliency map generation module 14 tries to make the generated second saliency map fool the judgment of the discrimination module 15. The loss of the second saliency map generation module 14 therefore comes from two parts: the strongly supervised MSE loss and the loss from the discrimination module 15 (the combined penalty function is shown as an equation image in the original patent and not reproduced here).
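In the same hedged spirit, the two-part loss of the second saliency map generation module might be combined as follows (PyTorch; the equal weighting of the MSE and adversarial terms is an assumption, since the exact formula is an equation image in the original):

```python
import torch
import torch.nn.functional as F

def source_generator_loss(pred_saliency, label_saliency, d_pred):
    """Strongly supervised MSE against the saliency label plus an adversarial term that
    rewards fooling the discrimination module (i.e. pushing its prediction towards 1)."""
    mse = F.mse_loss(pred_saliency, label_saliency)
    adv = F.binary_cross_entropy(d_pred, torch.ones_like(d_pred))
    return mse + adv
```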
Updating of the second feature extraction module 12: similarly to the second saliency map generation module 14, the update of the weights of the second feature extraction module 12 also uses the strongly supervised loss function and the loss function from the discrimination module 15. To ensure that the target-path information is fused more fully into the source-domain path, different weights α and β are applied to the two losses coming from the discrimination module. The penalty function of the second feature extraction module 12 is configured accordingly (shown as an equation image in the original patent and not reproduced here).
In the present invention, repeated experiments show that the performance of the transferred model is optimal when α = 0.2 and β = 0.8.
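A corresponding sketch for the source-path feature extraction module, using the α = 0.2 and β = 0.8 weights reported above, could be the following (PyTorch; which discriminator output each weight multiplies, and the presence of the MSE term, are assumptions since the formula is an equation image):

```python
import torch
import torch.nn.functional as F

def source_extractor_loss(pred_saliency, label_saliency,
                          d_source_pred, d_target_pred, alpha=0.2, beta=0.8):
    """Strongly supervised MSE plus the two discriminator-derived losses weighted by α and β."""
    mse = F.mse_loss(pred_saliency, label_saliency)
    adv_source = F.binary_cross_entropy(d_source_pred, torch.ones_like(d_source_pred))
    adv_target = F.binary_cross_entropy(d_target_pred, torch.ones_like(d_target_pred))
    return mse + alpha * adv_source + beta * adv_target
```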
S404, according to the modules with updated weights, the next round of training of each module is performed until the judging module 15 judges that the generated first saliency map and the second saliency map meet the saliency prediction standard.
According to the embodiment of the invention, the image pairs are input in sequence into the first feature extraction module 11, the second feature extraction module 12, the first saliency map generation module 13, the second saliency map generation module 14 and the judgment module 15 for training, and the module weights are updated by back-propagating the loss values. In this way, over the repeated training of these modules, the saliency map generated by the saliency map generation model built from the first feature extraction module 11 and the first saliency map generation module 13 comes closer and closer to the label data of the single target-domain sample image.
In one possible design, the training process of the discrimination module 15 specifically includes:
the first saliency prediction map comprises first label data for representing saliency features of the first image; the second saliency prediction map comprises second label data, and the second label data is used for representing the saliency feature of the second image;
the distinguishing module 15 distinguishes the first saliency map, and the distinguishing module 15 distinguishes the second saliency map, and judges whether the generated first saliency map and the generated second saliency map meet the saliency prediction standard, including:
the judging module 15 judges whether the first saliency map meets the saliency prediction standard according to the first label data in the first saliency prediction map, and if the judging module judges that the first saliency map can correctly map the saliency features marked by the first label data, the judging module considers that the first saliency map meets the saliency prediction standard; if the judging module 15 judges that the first saliency map cannot correctly map the saliency features of the first label data markers, the first saliency map is considered to be not in accordance with the saliency prediction standard;
further, the first distinguishing module calculates a deviation function value between the first significance map and the first label data, and if the deviation function value is smaller than or equal to a preset deviation threshold value, the first significance map is considered to be in accordance with the significance prediction standard; and if the deviation function value is larger than the preset deviation threshold value, the first significance map is considered not to meet the significance prediction standard.
In the embodiment of the invention, a first distinguishing module is used for calculating a deviation function value between the first significance map and the first label data, and comparing the relation between the deviation function value and a preset deviation threshold value to judge whether the first significance map meets the significance prediction standard or not. Whether the first saliency map meets the saliency prediction standard or not is judged by setting a preset deviation threshold, so that the first saliency map can be closer to the first label data, namely important region characteristics of the first image are better reflected.
The judging module 15 judges whether the second saliency map meets the saliency prediction standard according to the second label data in the second saliency prediction map, and if the judging module 15 judges that the second saliency map can correctly map the saliency features marked by the second label data, the second saliency map is considered to meet the saliency prediction standard; and if the judging module judges that the second significance map cannot correctly map the significant features of the second label data mark, the second significance map is considered to be not in accordance with the significance prediction standard.
Further, the second judging module calculates a deviation function value between the second significance map and the second label data, and if the deviation function value is smaller than or equal to a preset deviation threshold, the second significance map is considered to be in accordance with the significance prediction standard; and if the deviation function value is larger than the preset deviation threshold value, the second significance map is considered not to meet the significance prediction standard.
In the embodiment of the invention, a second distinguishing module is used for calculating a deviation function value between the second significance map and the second label data, and comparing the relation between the deviation function value and a preset deviation threshold value to judge whether the second significance map meets the significance prediction standard. Whether the second saliency map meets the saliency prediction standard or not is judged by setting a preset deviation threshold, so that the second saliency map can be closer to the second label data, namely the important area features of the second image are better reflected.
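The threshold test described above can be expressed in a few lines (PyTorch). Using MSE as the deviation function and 0.01 as the preset threshold are illustrative assumptions; the patent does not fix either choice.

```python
import torch.nn.functional as F

def meets_saliency_standard(pred_saliency, label_saliency, threshold=0.01):
    """True when the deviation between the generated saliency map and its label data
    is smaller than or equal to the preset deviation threshold."""
    deviation = F.mse_loss(pred_saliency, label_saliency).item()
    return deviation <= threshold
```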
In one possible design, the process of identifying the first saliency map by the identification module 15 includes:
for the first image: sending the first saliency map to the judging module, the judging module judging whether the salient features of the first saliency map correctly map the salient features marked by the first label data, and if the salient features of the first saliency map cannot be correctly mapped, performing the following operations:
the judging module 15 updates the judging weight of the judging module according to the deviation between the generated salient features of the first saliency map and the first label data, and repeats the training processes of the first feature extraction module 11, the second feature extraction module 12, the first saliency map generation module 13, the second saliency map generation module 14 and the judging module 15 at the same time until the judging module 15 judges that the salient features of the first saliency map correctly map the salient features marked by the first label data.
For the second image: sending the second saliency map to the discrimination module 15, the discrimination module 15 determining whether the salient features of the second saliency map the salient features of the second label data label, and if the discrimination module 15 determines that the salient features of the second saliency map cannot be correctly mapped, performing the following operations:
the judging module 15 updates the judging weight of the judging module 15 according to the deviation between the generated salient features of the second saliency map and the second label data, and repeats the training processes of the first feature extraction module 11, the second feature extraction module 12, the first saliency map generation module 13, the second saliency map generation module 14 and the judging module 15 at the same time until the judging module 15 judges that the salient features of the second saliency map correctly map the salient features of the second label data labels.
In the embodiment of the invention, the discrimination weight of the discrimination module is updated according to the deviation between the salient features of the generated first saliency map and the first label data and the deviation between the salient features of the generated second saliency map and the second label data, so that the discrimination module can meet the requirement of saliency discrimination in the repeated training process.
In one possible design, random noise information is added in the process of generating the first saliency map from the first saliency map generation module 13 of the target domain in the twin network. Random noise information is added in the process of generating the second saliency map from the third feature information by the second saliency map generation module 14 in the source domain in the twin network.
In the embodiment of the present invention, the same random noise information is added in the training process of the first saliency map generation module 13 and the second saliency map generation module 14, so that the same influence factors in the two training processes can be ensured, and the influence factors on the two generated saliency maps are consistent.
In a possible design, the same parameters are added in the repeated training process of the first feature extraction module 11, the second feature extraction module 12, the first saliency map generation module 13, the second saliency map generation module 14 and the discrimination module 15, so that the generated saliency map is prevented from being inaccurate due to overfitting of the training process of a certain single module.
In one possible design, the label data for a single sample image of the target domain in the twin network shows similar salient features to the label data for all images in the target image test set in the twin network.
The saliency map generation method provided by the embodiment of the invention is applied to research on generating the saliency of the sun; the target image test set in the twin network is a set of solar image samples, and the requirements on the target-domain single sample image selected from the target image test set in the twin network for generating the saliency map are as follows:
(1) the solar image in the single sample image of the target domain should contain as much solar activity as possible on its surface, and the surface of the solar image should be as complex as possible, including active-region maps under various complex conditions;
(2) the label of the single sample image of the target domain should be as similar as possible to the distribution of the labels showing the sun selected in the target image test set in the twin network, i.e., the image whose label deviates as little as possible from the labels of the target image test set in the twin network is selected.
In the embodiment of the invention, the distribution of the label of the single sample image of the target domain and the label which is selected in the target image test set in the twin network and displays the sun is required to be similar as much as possible, so that the significance map generated by the significance map generation model provided by the embodiment of the invention can be ensured to reflect the important information of the target image more accurately.
Fig. 5 is a schematic diagram of a hardware structure of a terminal device according to an embodiment of the present invention. As shown in fig. 5, the terminal device 50 of the present embodiment includes: a processor 51 and a memory 52; wherein
A memory 52 for storing computer-executable instructions;
the processor 51 is configured to execute the computer-executable instructions stored in the memory to implement the steps performed by the receiving device in the above embodiments. Reference may be made in particular to the description relating to the method embodiments described above.
Alternatively, the memory 52 may be separate or integrated with the processor 51.
When the memory 52 is provided separately, the terminal device further includes a bus 53 for connecting the memory 52 and the processor 51.
The embodiment of the present invention further provides a computer-readable storage medium, in which computer-executable instructions are stored, and when a processor executes the computer-executable instructions, the method for generating a saliency image as described above is implemented.
In the embodiments provided in the present invention, it should be understood that the disclosed apparatus and method may be implemented in other ways. For example, the above-described device embodiments are merely illustrative, and for example, the division of the modules is only one logical division, and other divisions may be realized in practice, for example, a plurality of modules may be combined or integrated into another system, or some features may be omitted, or not executed. In addition, the shown or discussed mutual coupling or direct coupling or communication connection may be an indirect coupling or communication connection through some interfaces, devices or modules, and may be in an electrical, mechanical or other form.
The modules described as separate parts may or may not be physically separate, and parts displayed as modules may or may not be physical units, may be located in one place, or may be distributed on a plurality of network units. Some or all of the modules may be selected according to actual needs to achieve the purpose of the solution of the present embodiment.
In addition, functional modules in the embodiments of the present invention may be integrated into one processing unit, or each module may exist alone physically, or two or more modules are integrated into one unit. The unit formed by the modules can be realized in a hardware form, and can also be realized in a form of hardware and a software functional unit.
The integrated module implemented in the form of a software functional module may be stored in a computer-readable storage medium. The software functional module is stored in a storage medium and includes several instructions for enabling a computer device (which may be a personal computer, a server, or a network device) or a processor (processor) to execute some steps of the methods according to the embodiments of the present application.
It should be understood that the Processor may be a Central Processing Unit (CPU), other general purpose Processor, a Digital Signal Processor (DSP), an Application Specific Integrated Circuit (ASIC), etc. A general purpose processor may be a microprocessor or the processor may be any conventional processor or the like. The steps of a method disclosed in connection with the present invention may be embodied directly in a hardware processor, or in a combination of the hardware and software modules within the processor.
The memory may comprise a high-speed RAM memory, and may further comprise a non-volatile storage NVM, such as at least one disk memory, and may also be a usb disk, a removable hard disk, a read-only memory, a magnetic or optical disk, etc.
The bus may be an Industry Standard Architecture (ISA) bus, a Peripheral Component Interconnect (PCI) bus, an Extended ISA (enhanced Industry Standard Architecture) bus, or the like. The bus may be divided into an address bus, a data bus, a control bus, etc. For ease of illustration, the buses in the figures of the present application are not limited to only one bus or one type of bus.
The storage medium may be implemented by any type or combination of volatile or non-volatile memory devices, such as Static Random Access Memory (SRAM), electrically erasable programmable read-only memory (EEPROM), erasable programmable read-only memory (EPROM), programmable read-only memory (PROM), read-only memory (ROM), magnetic memory, flash memory, magnetic or optical disks. A storage media may be any available media that can be accessed by a general purpose or special purpose computer.
An exemplary storage medium is coupled to the processor such that the processor can read information from, and write information to, the storage medium. Of course, the storage medium may also be integral to the processor. The processor and the storage medium may reside in an Application Specific Integrated Circuit (ASIC). Of course, the processor and the storage medium may also reside as discrete components in an electronic device or host device.
Those of ordinary skill in the art will understand that: all or a portion of the steps of implementing the above-described method embodiments may be performed by hardware associated with program instructions. The program may be stored in a computer-readable storage medium. When executed, the program performs steps comprising the method embodiments described above; and the aforementioned storage medium includes: various media that can store program codes, such as ROM, RAM, magnetic or optical disks.
Finally, it should be noted that: the above embodiments are only used to illustrate the technical solution of the present invention, and not to limit the same; while the invention has been described in detail and with reference to the foregoing embodiments, it will be understood by those skilled in the art that: the technical solutions described in the foregoing embodiments may still be modified, or some or all of the technical features may be equivalently replaced; and the modifications or the substitutions do not make the essence of the corresponding technical solutions depart from the scope of the technical solutions of the embodiments of the present invention.

Claims (11)

1. A saliency image generation method applied to a twin network, comprising:
selecting a single sample image of a target domain in the twin network as a first image, and selecting any image in a source domain image data set in the twin network as a second image;
the training process of the first feature extraction module and the second feature extraction module is as follows: extracting, by a first feature extraction module in a target domain in the twin network, feature information of the first image as first feature information, extracting, by a second feature extraction module in a source domain in the twin network, feature information of the second image as second feature information, and hierarchically fusing the first feature information into the second feature information to generate third feature information;
the training process of the first saliency map generation module and the second saliency map generation module is as follows: generating, by a first saliency map generation module in a target domain in the twin network, a first saliency map from the first feature information; generating, by a second saliency map generation module in a source domain in the twin network, a second saliency map from the third feature information;
the training process of the discrimination module is as follows: discriminating the first saliency map through the discrimination module, discriminating the second saliency map through the discrimination module, and judging whether the generated first saliency map and the generated second saliency map meet the saliency prediction standard;
if the first saliency map and the second saliency map do not accord with the saliency prediction standard, taking a single sample image of a target domain as a first image again, selecting any image in the source domain image data set in the twin network as a second image, and repeating the training processes of the first feature extraction module, the second feature extraction module, the first saliency map generation module, the second saliency map generation module and the discrimination module until the generated first saliency map and the generated second saliency map accord with the saliency prediction standard;
if the first saliency map and the second saliency map meet the saliency prediction standard, a first feature extraction module and a first saliency map generation module form a saliency image generation model;
and after the target image is input into the saliency image generation model, generating a saliency image of the target image.
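For illustration only, the following is a minimal PyTorch sketch of the twin-network layout described in claim 1. The concrete layer choices, channel widths, the use of channel concatenation for feature fusion, and all variable names are assumptions made for the sketch, not part of the claimed method.

```python
import torch
import torch.nn as nn

class FeatureExtractor(nn.Module):
    """Stand-in for the first/second feature extraction modules (architecture assumed)."""
    def __init__(self):
        super().__init__()
        self.conv = nn.Sequential(
            nn.Conv2d(3, 16, 3, padding=1), nn.ReLU(),
            nn.Conv2d(16, 32, 3, padding=1), nn.ReLU(),
        )
    def forward(self, x):
        return self.conv(x)

class SaliencyGenerator(nn.Module):
    """Stand-in for the first/second saliency map generation modules."""
    def __init__(self, in_channels=32):
        super().__init__()
        self.head = nn.Conv2d(in_channels, 1, 1)
    def forward(self, feats):
        return torch.sigmoid(self.head(feats))

class Discriminator(nn.Module):
    """Stand-in for the discrimination module."""
    def __init__(self):
        super().__init__()
        self.net = nn.Sequential(
            nn.Conv2d(1, 8, 3, padding=1), nn.ReLU(),
            nn.AdaptiveAvgPool2d(1), nn.Flatten(), nn.Linear(8, 1),
        )
    def forward(self, saliency_map):
        return torch.sigmoid(self.net(saliency_map))

# Target-domain branch (single sample image) and source-domain branch (source dataset).
extract_t, extract_s = FeatureExtractor(), FeatureExtractor()
gen_t, gen_s = SaliencyGenerator(32), SaliencyGenerator(64)
disc = Discriminator()

first_image = torch.rand(1, 3, 64, 64)   # single sample image of the target domain
second_image = torch.rand(1, 3, 64, 64)  # any image from the source-domain dataset

f1 = extract_t(first_image)                   # first feature information
f2 = extract_s(second_image)                  # second feature information
f3 = torch.cat([f2, f1], dim=1)               # hierarchical fusion, collapsed to one level here
sal_1 = gen_t(f1)                             # first saliency map
sal_2 = gen_s(f3)                             # second saliency map
score_1, score_2 = disc(sal_1), disc(sal_2)   # discrimination results

# Once both maps meet the saliency prediction standard, only the target-domain
# branch is kept as the saliency image generation model:
saliency_model = nn.Sequential(extract_t, gen_t)
target_saliency = saliency_model(torch.rand(1, 3, 64, 64))
```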
2. The saliency image generation method according to claim 1, characterized in that the first saliency prediction map includes first label data representing the saliency features of the first image, and the second saliency prediction map includes second label data representing the saliency features of the second image;
the discriminating of the first saliency map through the discrimination module, the discriminating of the second saliency map through the discrimination module, and the judging of whether the generated first saliency map and the generated second saliency map meet the saliency prediction standard comprise:
judging, by the discrimination module according to the first label data in the first saliency prediction map, whether the first saliency map meets the saliency prediction standard: if the discrimination module judges that the first saliency map correctly maps the saliency features marked by the first label data, the first saliency map is considered to meet the saliency prediction standard; if the discrimination module judges that the first saliency map cannot correctly map the saliency features marked by the first label data, the first saliency map is considered not to meet the saliency prediction standard;
and judging, by the discrimination module according to the second label data in the second saliency prediction map, whether the second saliency map meets the saliency prediction standard: if the discrimination module judges that the second saliency map correctly maps the saliency features marked by the second label data, the second saliency map is considered to meet the saliency prediction standard; if the discrimination module judges that the second saliency map cannot correctly map the saliency features marked by the second label data, the second saliency map is considered not to meet the saliency prediction standard.
3. The saliency image generation method according to claim 2, characterized in that the discriminating of the first saliency map through the discrimination module comprises:
sending the first saliency map to the discrimination module, where the discrimination module judges whether the salient features of the first saliency map correctly map the salient features marked by the first label data, and if the discrimination module judges that the salient features of the first saliency map cannot correctly map the salient features marked by the first label data, performing the following operations:
the discrimination module updates its discrimination weights according to the deviation between the salient features of the generated first saliency map and the first label data, and meanwhile the training processes of the first feature extraction module, the second feature extraction module, the first saliency map generation module, the second saliency map generation module and the discrimination module are repeated until the discrimination module judges that the salient features of the first saliency map correctly map the salient features marked by the first label data.
4. The saliency image generation method according to claim 2, characterized in that the discriminating of the second saliency map through the discrimination module comprises:
sending the second saliency map to the discrimination module, where the discrimination module judges whether the salient features of the second saliency map correctly map the salient features marked by the second label data, and if the discrimination module judges that the salient features of the second saliency map cannot correctly map the salient features marked by the second label data, performing the following operations:
the discrimination module updates its discrimination weights according to the deviation between the salient features of the generated second saliency map and the second label data, and meanwhile the training processes of the first feature extraction module, the second feature extraction module, the first saliency map generation module, the second saliency map generation module and the discrimination module are repeated until the discrimination module judges that the salient features of the second saliency map correctly map the salient features marked by the second label data.
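A hedged sketch of the discrimination-weight update described in claims 3 and 4, reusing the Discriminator stand-in from the sketch after claim 1. The binary cross-entropy deviation and the Adam optimizer are illustrative assumptions; the claims only require that the discrimination weights be updated from the deviation between the generated saliency map and the label data.

```python
import torch
import torch.nn.functional as F

def update_discriminator(disc, disc_opt, saliency_map, label_data):
    """One discriminator update step; it is repeated alongside the other modules
    until the discrimination module judges that the generated map reproduces the
    labelled salient features."""
    score_fake = disc(saliency_map.detach())   # judgement on the generated saliency map
    score_real = disc(label_data)              # judgement on the label data (ground truth map)
    deviation = (F.binary_cross_entropy(score_fake, torch.zeros_like(score_fake)) +
                 F.binary_cross_entropy(score_real, torch.ones_like(score_real)))
    disc_opt.zero_grad()
    deviation.backward()    # gradients of the discrimination weights
    disc_opt.step()         # discrimination weights updated from the deviation
    return deviation.item()

# Example wiring (disc, sal_1 taken from the sketch after claim 1; label map assumed 1-channel):
# disc_opt = torch.optim.Adam(disc.parameters(), lr=1e-4)
# update_discriminator(disc, disc_opt, sal_1, first_label_data)
```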
5. The saliency image generation method according to claim 1, characterized in that random noise information is added in the process of generating the first saliency map from the first feature information by the first saliency map generation module in the target domain in the twin network;
and random noise information is added in the process of generating the second saliency map from the third feature information by the second saliency map generation module in the source domain in the twin network.
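One possible way to realize the noise injection of claim 5, again for illustration only: a Gaussian noise map is concatenated to the feature information before the saliency map generation module is applied. Concatenation along the channel dimension is an assumption; the claim does not fix how the random noise information is mixed in.

```python
import torch

def with_noise(features, noise_channels=1):
    """Append a Gaussian noise map to the feature information before it is passed
    to a saliency map generation module (channel concatenation is assumed)."""
    n, _, h, w = features.shape
    noise = torch.randn(n, noise_channels, h, w, device=features.device)
    return torch.cat([features, noise], dim=1)

# e.g. sal_1 = gen_t(with_noise(f1)); the generator's input channel count must
# then be enlarged by `noise_channels`.
```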
6. The saliency image generation method according to claim 2, characterized in that judging that the first saliency map meets the saliency prediction standard when the discrimination module determines that the first saliency map can correctly map the saliency features marked by the first label data, and judging that the first saliency map does not meet the saliency prediction standard when the discrimination module determines that the first saliency map cannot correctly map the saliency features marked by the first label data, comprises:
the discrimination module calculates a deviation function value between the first saliency map and the first label data; if the deviation function value is smaller than or equal to a preset deviation threshold, the first saliency map is considered to meet the saliency prediction standard, and if the deviation function value is larger than the preset deviation threshold, the first saliency map is considered not to meet the saliency prediction standard;
and judging that the second saliency map meets the saliency prediction standard when the discrimination module determines that the second saliency map can correctly map the saliency features marked by the second label data, and judging that the second saliency map does not meet the saliency prediction standard when the discrimination module determines that the second saliency map cannot correctly map the saliency features marked by the second label data, comprises:
the discrimination module calculates a deviation function value between the second saliency map and the second label data; if the deviation function value is smaller than or equal to the preset deviation threshold, the second saliency map is considered to meet the saliency prediction standard, and if the deviation function value is larger than the preset deviation threshold, the second saliency map is considered not to meet the saliency prediction standard.
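A minimal sketch of the threshold test in claim 6. Binary cross-entropy as the deviation function and 0.3 as the preset deviation threshold are illustrative assumptions.

```python
import torch
import torch.nn.functional as F

def meets_saliency_standard(saliency_map, label_data, threshold=0.3):
    """Returns True when the deviation function value is at or below the preset threshold."""
    deviation = F.binary_cross_entropy(saliency_map, label_data)
    return deviation.item() <= threshold

saliency_map = torch.rand(1, 1, 64, 64)                # predicted values in [0, 1]
label_data = (torch.rand(1, 1, 64, 64) > 0.5).float()  # binary ground-truth saliency
print(meets_saliency_standard(saliency_map, label_data))
```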
7. The saliency image generation method according to claim 1, wherein, if the first saliency map and the second saliency map do not meet the saliency prediction standard, the step of taking the single sample image of the target domain as the first image again, selecting any image in the source domain image data set in the twin network as the second image, and repeating the training processes of the first feature extraction module, the second feature extraction module, the first saliency map generation module, the second saliency map generation module and the discrimination module until the generated first saliency map and the generated second saliency map meet the saliency prediction standard comprises:
calculating deviation function values of the first feature extraction module, the second feature extraction module, the first saliency map generation module and the second saliency map generation module respectively;
calculating a gradient of each module through the deviation function values of the first feature extraction module, the second feature extraction module, the first saliency map generation module and the second saliency map generation module;
updating the weights of the modules according to the gradients of the first feature extraction module, the second feature extraction module, the first saliency map generation module and the second saliency map generation module;
and performing, with each module's updated weights, the next round of the training process of each module until the discrimination module judges that the generated first saliency map and the generated second saliency map meet the saliency prediction standard.
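A hedged sketch of one training round under claim 7, reusing the modules and images from the sketch after claim 1. The particular loss terms, the Adam optimizer, and the assumption that label_1 and label_2 are ground-truth saliency maps with values in [0, 1] are illustrative; the claim only requires that deviation function values be computed, gradients derived from them, and each module's weights updated before the next round.

```python
import itertools
import torch
import torch.nn.functional as F

# extract_t, extract_s, gen_t, gen_s, disc, first_image, second_image come from
# the sketch after claim 1; one optimizer covers all four generator-side modules.
gen_opt = torch.optim.Adam(
    itertools.chain(extract_t.parameters(), extract_s.parameters(),
                    gen_t.parameters(), gen_s.parameters()), lr=1e-4)

def generator_update(label_1, label_2):
    """One round: deviation function values -> gradients -> weight update."""
    f1 = extract_t(first_image)
    f2 = extract_s(second_image)
    sal_1 = gen_t(f1)
    sal_2 = gen_s(torch.cat([f2, f1], dim=1))
    d1, d2 = disc(sal_1), disc(sal_2)
    # Deviation of each branch against its label data, plus an adversarial term
    # from the discrimination module's judgement (both forms are assumptions).
    deviation = (F.binary_cross_entropy(sal_1, label_1) +
                 F.binary_cross_entropy(sal_2, label_2) +
                 F.binary_cross_entropy(d1, torch.ones_like(d1)) +
                 F.binary_cross_entropy(d2, torch.ones_like(d2)))
    gen_opt.zero_grad()
    deviation.backward()   # gradients of all four modules come from this one value
    gen_opt.step()         # each module's weights are updated from its gradient
    return deviation.item()
```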
8. The saliency image generation method according to claim 1, characterized in that said first feature extraction module comprises a first extraction sub-block, a second extraction sub-block, a third extraction sub-block and a fourth extraction sub-block; the second feature extraction module comprises a fifth extraction sub-block, a sixth extraction sub-block, a seventh extraction sub-block and an eighth extraction sub-block;
the extracting, by a first feature extraction module in a target domain in the twin network, feature information of a first image from the first image as first feature information, extracting, by a second feature extraction module in a source domain in the twin network, feature information of a second image as second feature information, and hierarchically fusing the first feature information into the second feature information to generate third feature information includes:
the first image obtains a first sub-feature through a first extraction sub-block, the first sub-feature obtains a second sub-feature through a second extraction sub-block, the second sub-feature obtains a third sub-feature through a third extraction sub-block, the third sub-feature obtains a fourth sub-feature through a fourth extraction sub-block, and the fourth sub-feature is the first feature information;
the second image obtains a fifth sub-feature through the fifth extraction sub-block, the fifth sub-feature is fused with the first sub-feature and then passes through the sixth extraction sub-block to obtain a sixth sub-feature, the sixth sub-feature is fused with the second sub-feature and then passes through the seventh extraction sub-block to obtain a seventh sub-feature, the seventh sub-feature is fused with the third sub-feature and then passes through the eighth extraction sub-block to obtain an eighth sub-feature, wherein the eighth sub-feature is the second feature information, and the eighth sub-feature is fused with the fourth sub-feature to obtain the third feature information.
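A minimal sketch of the level-by-level fusion described in claim 8. The channel widths and the use of concatenation for "fusion" are assumptions; the point is only to show how the four target-domain sub-features are folded into the source-domain branch stage by stage.

```python
import torch
import torch.nn as nn

def block(c_in, c_out):
    return nn.Sequential(nn.Conv2d(c_in, c_out, 3, padding=1), nn.ReLU())

# First feature extraction module: four extraction sub-blocks.
t1, t2, t3, t4 = block(3, 16), block(16, 32), block(32, 64), block(64, 64)
# Second feature extraction module: four sub-blocks whose inputs also carry the
# fused target-domain features, hence the widened input channels after level 1.
s5, s6, s7, s8 = block(3, 16), block(32, 32), block(64, 64), block(128, 64)

first_image, second_image = torch.rand(1, 3, 64, 64), torch.rand(1, 3, 64, 64)

f_1 = t1(first_image)    # first sub-feature
f_2 = t2(f_1)            # second sub-feature
f_3 = t3(f_2)            # third sub-feature
f_4 = t4(f_3)            # fourth sub-feature = first feature information

f_5 = s5(second_image)                        # fifth sub-feature
f_6 = s6(torch.cat([f_5, f_1], dim=1))        # fused with first sub-feature
f_7 = s7(torch.cat([f_6, f_2], dim=1))        # fused with second sub-feature
f_8 = s8(torch.cat([f_7, f_3], dim=1))        # fused with third sub-feature = second feature information
third_feature = torch.cat([f_8, f_4], dim=1)  # fused with fourth sub-feature = third feature information
```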
9. The saliency image generation method according to any one of claims 1 to 8, characterized in that the label data of a single sample image of a target domain in said twin network shows similar saliency features to the label data of all images in a test set of target images in said twin network.
10. A terminal device, comprising: at least one processor, a memory;
the memory stores computer-executable instructions;
the at least one processor executes the computer-executable instructions stored in the memory, causing the at least one processor to perform the saliency image generation method of any one of claims 1 to 9.
11. A computer storage medium having computer executable instructions stored thereon which, when executed by a processor, implement the saliency image generation method of any one of claims 1 to 9.
CN201910840719.9A 2019-09-06 2019-09-06 Saliency image generation method and equipment Active CN110570490B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201910840719.9A CN110570490B (en) 2019-09-06 2019-09-06 Saliency image generation method and equipment

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201910840719.9A CN110570490B (en) 2019-09-06 2019-09-06 Saliency image generation method and equipment

Publications (2)

Publication Number Publication Date
CN110570490A CN110570490A (en) 2019-12-13
CN110570490B true CN110570490B (en) 2021-07-30

Family

ID=68778093

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201910840719.9A Active CN110570490B (en) 2019-09-06 2019-09-06 Saliency image generation method and equipment

Country Status (1)

Country Link
CN (1) CN110570490B (en)

Families Citing this family (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN112487225B (en) * 2020-12-11 2022-07-08 联通(浙江)产业互联网有限公司 Saliency image generation method and device and server
CN112633222B * 2020-12-30 2023-04-28 民航成都电子技术有限责任公司 Gait recognition method, device, equipment and medium based on adversarial network

Family Cites Families (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20160350336A1 (en) * 2015-05-31 2016-12-01 Allyke, Inc. Automated image searching, exploration and discovery
US9875429B2 (en) * 2015-10-06 2018-01-23 Adobe Systems Incorporated Font attributes for font recognition and similarity
US9965717B2 (en) * 2015-11-13 2018-05-08 Adobe Systems Incorporated Learning image representation by distilling from multi-task networks
CN108764308B (en) * 2018-05-16 2021-09-14 中国人民解放军陆军工程大学 Pedestrian re-identification method based on convolutional recurrent network
CN109063649B (en) * 2018-08-03 2021-05-14 中国矿业大学 Pedestrian re-identification method based on twin pedestrian alignment residual network
CN109272530B (en) * 2018-08-08 2020-07-21 北京航空航天大学 Target tracking method and device for space-based monitoring scene

Patent Citations (7)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
WO2017168125A1 (en) * 2016-03-31 2017-10-05 Queen Mary University Of London Sketch based search methods
CN107562805A (en) * 2017-08-08 2018-01-09 浙江大华技术股份有限公司 A method and device for searching images by image
CN107452025A (en) * 2017-08-18 2017-12-08 成都通甲优博科技有限责任公司 Target tracking method, device and electronic equipment
CN108090918A (en) * 2018-02-12 2018-05-29 天津天地伟业信息系统集成有限公司 Real-time face tracking method based on a deep fully-convolutional twin network
CN108711138A (en) * 2018-06-06 2018-10-26 北京印刷学院 A grayscale image colorization method based on a generative adversarial network
CN108846358A (en) * 2018-06-13 2018-11-20 浙江工业大学 A target tracking method performing feature fusion based on a twin network
CN108961236A (en) * 2018-06-29 2018-12-07 国信优易数据有限公司 Training method and device, and detection method and device, for a circuit board defect detection model

Non-Patent Citations (4)

* Cited by examiner, † Cited by third party
Title
"Learning dynamicSiamesenetwork for visual object tracking";Qing Guo 等;《Proceedings of the IEEE International Conference on Computer Vision》;20171231;第1764-1771页 *
"Learning fully convolutional network for visual tracking with multi-layer feature fusion";YANGLIU KUAI等;《IEEE Access》;20180308;第25915-25923页 *
"基于Siamese卷积神经网络的指静脉识别";戴庆华 等;《电子测量技术》;20181231;第41卷(第24期);第51-55页 *
"基于孪生卷积网络的人脸追踪";吴汉钊;《计算机工程与应用》;20181231;第54卷(第14期);第175-179页 *

Also Published As

Publication number Publication date
CN110570490A (en) 2019-12-13

Similar Documents

Publication Publication Date Title
CN111476284B (en) Image recognition model training and image recognition method and device and electronic equipment
CN109086811B (en) Multi-label image classification method and device and electronic equipment
CN109522908A (en) Image significance detection method based on area label fusion
CN111950643B (en) Image classification model training method, image classification method and corresponding device
CN110598541B (en) Method and equipment for extracting road edge information
CN110570490B (en) Saliency image generation method and equipment
CN111723815A (en) Model training method, image processing method, device, computer system, and medium
CN116071089B (en) Fraud identification method and device, electronic equipment and storage medium
US20220215679A1 (en) Method of determining a density of cells in a cell image, electronic device, and storage medium
CN108537270A (en) Image labeling method, terminal device and storage medium based on multi-tag study
CN112200193B (en) Distributed license plate recognition method, system and device based on multi-attribute fusion
CN104077765B (en) Image segmentation device, image partition method
CN113223011B (en) Small sample image segmentation method based on guide network and full-connection conditional random field
CN112668633B (en) Adaptive graph migration learning method based on fine granularity field
CN116861262B (en) Perception model training method and device, electronic equipment and storage medium
CN109784404A (en) A kind of the multi-tag classification prototype system and method for fusion tag information
CN113704276A (en) Map updating method and device, electronic equipment and computer readable storage medium
CN115482436B (en) Training method and device for image screening model and image screening method
CN114741697B (en) Malicious code classification method and device, electronic equipment and medium
CN111144466A (en) Image sample self-adaptive depth measurement learning method
CN113591881B (en) Intention recognition method and device based on model fusion, electronic equipment and medium
CN115909079A (en) Crack detection method combining depth feature and self-attention model and related equipment
CN116304341A (en) Fraud discrimination method and system based on user network big data
CN114037912A (en) Method and device for detecting change of remote sensing image and computer readable storage medium
CN113706551A (en) Image segmentation method, device, equipment and storage medium

Legal Events

Date Code Title Description
PB01 Publication
PB01 Publication
SE01 Entry into force of request for substantive examination
SE01 Entry into force of request for substantive examination
GR01 Patent grant
GR01 Patent grant