CN113052786B - Sonar image synthesis method and device - Google Patents

Info

Publication number
CN113052786B
Authority
CN
China
Prior art keywords
image
network
content
style
position information
Prior art date
Legal status
Active
Application number
CN202110604552.3A
Other languages
Chinese (zh)
Other versions
CN113052786A (en
Inventor
张庆港
Current Assignee
Beijing Startest Tec Co Ltd
Original Assignee
Beijing Startest Tec Co Ltd
Priority date
Filing date
Publication date
Application filed by Beijing Startest Tec Co Ltd filed Critical Beijing Startest Tec Co Ltd
Priority to CN202110604552.3A priority Critical patent/CN113052786B/en
Publication of CN113052786A publication Critical patent/CN113052786A/en
Application granted granted Critical
Publication of CN113052786B publication Critical patent/CN113052786B/en
Active legal-status Critical Current
Anticipated expiration legal-status Critical

Classifications

    • G06T5/50: Image enhancement or restoration using two or more images, e.g. averaging or subtraction
    • G06N3/045: Combinations of networks
    • G06N3/08: Learning methods
    • G06T5/70: Denoising; Smoothing
    • G06T2207/20081: Training; Learning
    • G06T2207/20084: Artificial neural networks [ANN]
    • G06T2207/20221: Image fusion; Image merging


Abstract

The application discloses a sonar image synthesis method and device. The method comprises the following steps: extracting features from an input content image to be synthesized and an input style image to obtain the features of the content image and the features of the style image; performing whitening processing on the features of the content image to remove the style features in the content image and obtain whitened features; fusing the whitened features with the features of the style image to obtain target features after style conversion of the content image; and reconstructing an image based on the target features, an image reconstruction network and random noise to obtain a synthetic sonar image. The image feature extraction network and the image reconstruction network are deep learning networks obtained by training on sample images, the two networks are mirror images of each other, and the sample images are optical images. By introducing random noise to smooth the reconstructed image during image reconstruction, the reconstructed synthetic sonar image is made closer to a real sonar image, which can alleviate the problems of high sonar image acquisition cost and small sample numbers.

Description

Sonar image synthesis method and device
Technical Field
The application relates to the technical field of computers, in particular to a sonar image synthesis method and device.
Background
The side scan sonar is an electronic device that exploits the propagation characteristics of sound waves under water to complete underwater detection and communication tasks through electroacoustic conversion and information processing. A side scan sonar image is a two-dimensional image obtained by scanning an underwater object with a side scan sonar. With the development of artificial intelligence, machine-learning research on sonar images has become increasingly important; however, the experimental cost of acquiring sonar images is high, so sonar image samples are scarce and machine-learning research and development on sonar images progress slowly. Synthesizing sonar images from optical images has therefore become a feasible approach.
At present, image synthesis technology is mainly applied to optical images. For example, PhotoWCT is an optical image synthesis scheme with a very good synthesis effect: it transfers the style of a reference image onto a content photo so as to synthesize an image with photo-level fidelity. Since the purpose of this scheme is to synthesize a photorealistic image, it performs very well on image details. A sonar image, being formed from acoustic returns, shows outlines rather than specific details, so the difference between a side-scan sonar image synthesized this way and a real side-scan sonar image is obvious, and the method is not well suited to synthesizing side-scan sonar images.
Therefore, a sonar image synthesis scheme closer to a real side-scan sonar image needs to be provided urgently to solve the problems of high sonar image acquisition cost and small sample number.
Disclosure of Invention
The embodiment of the application provides a sonar image synthesis method and device to synthesize a sonar image closer to a real side-scan sonar image, so that the problems of high sonar image acquisition cost and small sample number are solved.
In a first aspect, an embodiment of the present application provides a sonar image synthesis method, including:
inputting a content image to be synthesized and a style image into an image feature extraction network to obtain the features of the content image and the features of the style image, wherein the content image is an optical image, the style image is a real sonar image, the feature extraction network is a deep learning network obtained based on sample image training, and the sample image is an optical image;
whitening processing is carried out on the characteristics of the content image so as to remove style characteristics in the content image and obtain whitening characteristics of the content image;
fusing the whitening characteristic and the characteristic of the style image to obtain a target characteristic after the style conversion is carried out on the content image;
and reconstructing an image based on the target feature, an image reconstruction network and random noise to obtain a synthetic sonar image, wherein the synthetic sonar image has the content of the content image and the style of the style image, the image reconstruction network is a deep learning network obtained based on the sample image training, the image reconstruction network is a mirror image network of the feature extraction network, and the feature extraction network and the image reconstruction network are obtained based on the optimization strategy training of the reconstruction loss minimization of the sample image.
In a second aspect, an embodiment of the present application further provides a sonar image synthesizing apparatus, including:
a feature extraction module, used for inputting a content image to be synthesized and a style image into an image feature extraction network to obtain the features of the content image and the features of the style image, wherein the content image is an optical image, the style image is a real sonar image, the feature extraction network is a deep learning network obtained based on sample image training, and the sample image is an optical image;
the whitening processing module is used for whitening the characteristics of the content image to remove the style characteristics in the content image to obtain the whitening characteristics of the content image;
the feature fusion module is used for fusing the whitening features and the features of the style images to obtain target features after the style conversion is carried out on the content images;
the image reconstruction module is used for reconstructing images based on the target features, an image reconstruction network and random noise to obtain a synthetic sonar image, wherein the synthetic sonar image has the content of the content image and the style of the style image, the image reconstruction network is a deep learning network obtained based on the training of the sample image, the image reconstruction network is a mirror image network of the feature extraction network, and the feature extraction network and the image reconstruction network are obtained based on the training of an optimization strategy for minimizing the reconstruction loss of the sample image.
In a fourth aspect, an embodiment of the present application further provides an electronic device, including: a memory, a processor and a computer program stored on the memory and executable on the processor, the computer program, when executed by the processor, implementing the steps of the method according to the first aspect.
In a fifth aspect, the present application further provides a computer-readable storage medium, on which a computer program is stored, where the computer program, when executed by a processor, implements the steps of the method according to the first aspect.
According to the at least one technical scheme, when image reconstruction is performed on the basis of the target features and the image reconstruction network after the style conversion is performed on the content image, random noise is introduced to smooth the reconstructed image, so that the reconstructed synthetic sonar image is closer to a real side-scan sonar image, and the problems of high sonar image acquisition cost and small sample number can be solved.
Drawings
The accompanying drawings, which are included to provide a further understanding of the application and are incorporated in and constitute a part of this application, illustrate embodiment(s) of the application and together with the description serve to explain the application and not to limit the application. In the drawings:
fig. 1 is a schematic view of a deep learning network training process before sonar image synthesis is performed by a sonar image synthesis method provided in an embodiment of the present application;
FIG. 2 is a schematic structural diagram of a deep learning network provided in an embodiment of the present application;
fig. 3A is a schematic structural diagram of an image feature extraction network provided in an embodiment of the present application;
FIG. 3B is a schematic diagram of the structure of the first residual block in the convolution block 2_ x in FIG. 3A;
FIG. 4 is a schematic diagram illustrating an image reconstruction principle in a deep learning network training process according to an embodiment of the present disclosure;
FIG. 5 is a schematic flow chart of a sonar image synthesis method provided in an embodiment of the present application;
fig. 6A is a schematic diagram illustrating a sonar image synthesis method according to an embodiment of the present disclosure;
FIG. 6B is a schematic diagram of the structure of the hybrid pooling layer of FIG. 6A;
FIG. 7 is a schematic diagram of an image reconstruction part in a sonar image synthesis method according to an embodiment of the present application;
FIG. 8A is a schematic diagram of the effect of a sonar image synthesized using PhotoWCT;
fig. 8B is a schematic view showing the effect of a sonar image synthesized when the random noise intensity is level 1 by using the sonar image synthesis method provided in the embodiment of the present application;
fig. 8C is a schematic view of the effect of a sonar image synthesized when the random noise intensity is level 2 by using the sonar image synthesis method provided in the embodiment of the present application;
fig. 8D is a schematic view of an effect of a sonar image synthesized when the random noise intensity is level 3 by using the sonar image synthesis method provided in the embodiment of the present application;
fig. 8E is a schematic diagram illustrating the effect of a sonar image synthesized when the random noise intensity is level 4 by using the sonar image synthesis method provided in the embodiment of the present application;
fig. 8F is a schematic view of an effect of a sonar image synthesized when the random noise intensity is level 5 by using the sonar image synthesis method provided in the embodiment of the present application;
fig. 9 is a schematic structural diagram of a sonar image synthesizing apparatus according to an embodiment of the present invention;
fig. 10 is a detailed structural schematic diagram of a network training module in a sonar image synthesizing apparatus according to an embodiment of the present application;
fig. 11 is a schematic structural diagram of an electronic device according to an embodiment of the present application.
Detailed Description
In order to make the objects, technical solutions and advantages of the present application more apparent, the technical solutions of the present application will be described in detail and completely with reference to the following specific embodiments of the present application and the accompanying drawings. It should be apparent that the described embodiments are only some of the embodiments of the present application, and not all of the embodiments. All other embodiments, which can be derived by a person skilled in the art from the embodiments given herein without making any creative effort, shall fall within the protection scope of the present application.
In order to synthesize a sonar image closer to a real side-scan sonar image, so as to solve the problems of high sonar image acquisition cost and small sample number, embodiments of the present application provide a sonar image synthesis method and apparatus, where the method may be executed by an electronic device, such as a terminal device or a server, or the method may be executed by software installed in the electronic device. Wherein the terminal device includes but is not limited to: any one of smart terminal devices such as smart phones, Personal Computers (PCs), notebook computers, tablet computers, electronic readers, web tvs, and wearable devices; the server may be a background server device of an insurance company, and the server includes but is not limited to: a single server, a server cluster, a cloud server or a cloud server cluster, and the like.
The sonar image synthesis method provided by the embodiment of the application can comprise two stages, wherein the first stage is a training stage of a deep learning network and aims to train the deep learning network with good performance, and the deep learning network comprises an image feature extraction network and an image reconstruction network; the second stage is a stage of image synthesis based on the deep learning network trained in the first stage. The first stage may be regarded as a preparation stage before synthesizing a sonar image, and the preparation stage is generally executed once, that is, when a sonar image synthesis method provided in the embodiment of the present application is executed, the step of the first stage is not required to be executed every time. Or, before the step of the second stage, the step of the first stage is performed to train the image feature extraction network and the image reconstruction network based on the sample image.
The first stage, i.e., the training process of the deep learning network, is described with reference to fig. 1 and 2. Fig. 1 is a schematic view of a deep learning network training process before sonar image synthesis is performed by a sonar image synthesis method according to an embodiment of the present application. Fig. 2 is a schematic structural diagram of a deep learning network provided in an embodiment of the present application. As shown in fig. 2, the deep learning network provided in the embodiment of the present application includes an image feature extraction network 21 and an image reconstruction network 22, and the image reconstruction network 22 is a mirror network of the image feature extraction network 21.
As shown in fig. 1, a training process of a deep learning network in a sonar image synthesis method provided in an embodiment of the present application may include:
step 101, inputting a sample image into an image feature extraction network to obtain features of the sample image.
The sample images are from a training set, wherein the images in the training set are optical images collected in advance, the optical images are content images containing preset content information, such as contents of vehicles, ships, airplanes and the like, and the optical images can include, but are not limited to, images of objects such as vehicles, ships, airplanes and the like in public data sets DIOR, DOTA-v1.5, NWPU VHR-10, LEVIR, HRSID.
Optionally, the sample image may be preprocessed before being input into the image feature extraction network. The pre-processed content may include, but is not limited to, one or more of the following processes: resizing (resize) the sample image, such as reshaping the sample image to 512 x 512; normalizing the sample image; converting the color channels of the sample image, and so on.
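The preprocessing options listed above might be sketched as follows. The nearest-neighbour resize and the HWC-to-CHW channel conversion are illustrative choices: the text fixes only the 512 × 512 target size, not a resize method, normalization constants, or channel order.

```python
import numpy as np

def preprocess(image, size=(512, 512)):
    """Resize to 512x512, scale pixel values to [0, 1], and reorder channels.

    image: H x W x 3 uint8 array (assumed layout). Returns a float32 array
    of shape (3, 512, 512)."""
    h, w, _ = image.shape
    # Nearest-neighbour resize via index sampling, a minimal stand-in for a
    # library resize such as PIL's Image.resize.
    rows = np.arange(size[0]) * h // size[0]
    cols = np.arange(size[1]) * w // size[1]
    resized = image[rows][:, cols]
    # Normalize pixel values to [0, 1].
    normalized = resized.astype(np.float32) / 255.0
    # Convert channel layout from HWC to CHW, as deep learning frameworks expect.
    return normalized.transpose(2, 0, 1)
```

The same routine would be reused unchanged for the content and style images in the synthesis stage, since the text states their preprocessing is identical.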
As shown in fig. 2, the image feature extraction network 21 includes a plurality of convolution modules (convolution) and a plurality of Max pooling layers (Max pooling layers), one convolution module includes at least one convolution layer, and one convolution module is followed by one Max pooling layer. After a series of convolution and maximum pooling downsampling are performed on an input sample image, the characteristics of the sample image can be obtained.
Alternatively, as shown in fig. 2, when the feature matrix input to a max pooling layer in the image feature extraction network 21 is downsampled, the position information of each value of the sampling result within the feature matrix (i.e., the positions of the maxima) is recorded; this is recorded as the first position information.
Fig. 3A shows a schematic structural diagram of an image feature extraction network provided in an embodiment of the present application. As shown in fig. 3A, the image feature extraction network includes convolution module 1, convolution module 2_x, convolution module 3_x and convolution module 4_x, each followed by a max pooling layer, where convolution modules 2_x, 3_x and 4_x are residual-structure modules; 2_1 denotes the first residual module of the 2nd convolution module, and so on.
Fig. 3B shows a schematic structural diagram of module 2_1 (x = 1) in fig. 3A; specifically, the portion within block 32 in fig. 3B is the structure of 2_1, which includes two convolutional layers: Conv2_1_1 and Conv2_1_2. As shown in fig. 3B, 2_1 first receives the max-pooled downsampling result 31 from the preceding layer, then performs two convolutions with a 3 × 3 kernel, a stride of 1 and a padding of 1, and finally adds the input from the preceding layer to the output of the two convolutions; the sum serves as the output of this module and the input of the next layer.
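The residual module just described (two 3 × 3 convolutions with stride 1 and padding 1, plus a skip connection that adds the module input to the convolution output) can be sketched in plain numpy. This is a minimal single-channel illustration; the ReLU between the two convolutions is an assumed activation, since the text does not name one.

```python
import numpy as np

def conv3x3(x, kernel):
    # x: (H, W) feature map, kernel: (3, 3); stride 1, zero padding 1 keeps the size.
    h, w = x.shape
    padded = np.pad(x, 1)
    out = np.zeros_like(x, dtype=np.float64)
    for i in range(h):
        for j in range(w):
            out[i, j] = np.sum(padded[i:i+3, j:j+3] * kernel)
    return out

def residual_block(x, k1, k2):
    """Two 3x3 convolutions (stride 1, padding 1) with a skip connection,
    mirroring the Conv2_1_1 / Conv2_1_2 structure described for module 2_1.
    The ReLU between the convolutions is an illustrative assumption."""
    y = np.maximum(conv3x3(x, k1), 0.0)   # first conv + assumed ReLU
    y = conv3x3(y, k2)                    # second conv
    return x + y                          # add the module input (skip path)
```

With an identity first kernel and an all-zero second kernel the block reduces to the identity map, which makes the skip connection easy to verify.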
And 102, reconstructing the sample image based on the characteristics of the sample image and the image reconstruction network to obtain a reconstructed image of the sample image.
As shown in fig. 2, the sample image is reconstructed based on the features of the sample image extracted by the image feature extraction network 21 and the image reconstruction network 22, and the output of the image reconstruction network 22 is used as a reconstructed image of the sample image.
As described earlier, the image reconstruction network 22 is a mirror network of the image feature extraction network 21. As an example, if the position information of the values in the downsampling result within the feature matrix (the positions of the maxima, i.e., the first position information) was recorded when the feature matrix of the input sample image was downsampled by the max pooling layers in the image feature extraction network 21, step 102 may specifically include: upsampling the feature matrix input to each unpooling layer (un-pooling layer) of the image reconstruction network 22, based on that layer's corresponding first position information, to obtain a reconstructed image of the sample image, wherein each unpooling layer in the image reconstruction network 22 is the mirror image of the corresponding max pooling layer in the image feature extraction network 21.
Fig. 4 shows a schematic diagram of the image reconstruction principle in the deep learning network training process according to an embodiment of the present application. Referring to fig. 4, in the process of extracting the features of the sample image through the image feature extraction network, while max-pooled downsampling is performed on the image feature matrix input to the max pooling layer 41 to obtain the downsampling result 42, the position information of each value of the downsampling result 42 within the image feature matrix, i.e., the position information 43 of the maxima, is also retained; accordingly, in the process of reconstructing the sample image with the image reconstruction network, the output 44 of the preceding layer is upsampled in the unpooling layer 45 by writing each value back to the corresponding (filled) position recorded in the maximum-value position information 43, so as to complete the reconstruction of the sample image.
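The pooling/unpooling pairing in Fig. 4 can be sketched as follows: max pooling records the flat position of each maximum (the "first position information"), and unpooling writes each pooled value back to that recorded position, leaving zeros elsewhere. This is a minimal single-channel numpy sketch; the 2 × 2 window size is an assumption for illustration.

```python
import numpy as np

def max_pool_with_indices(x, k=2):
    """k x k max pooling that also records where each maximum came from,
    matching the "first position information" kept during training."""
    h, w = x.shape
    pooled = np.zeros((h // k, w // k))
    indices = np.zeros((h // k, w // k), dtype=np.int64)  # flat positions in x
    for i in range(h // k):
        for j in range(w // k):
            window = x[i*k:(i+1)*k, j*k:(j+1)*k]
            r, c = np.unravel_index(np.argmax(window), window.shape)
            pooled[i, j] = window[r, c]
            indices[i, j] = (i*k + r) * w + (j*k + c)
    return pooled, indices

def unpool(pooled, indices, shape):
    """Upsample by writing each value back to its recorded position; all
    other positions stay zero."""
    out = np.zeros(shape)
    out.flat[indices.ravel()] = pooled.ravel()
    return out
```

In stage two (described later), the zero positions left by `unpool` are exactly the slots where random noise is injected.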
And 103, determining the reconstruction loss of the sample image based on the reconstructed image and the sample image.
As an example, the image feature extraction network and the image reconstruction network may be optimized with the goal of minimizing the Euclidean distance between the sample image and the reconstructed image. Accordingly, the loss function that calculates the reconstruction loss may be:

$L_{rec} = \sum_i (x_i - \hat{x}_i)^2$

where $L_{rec}$ denotes the reconstruction loss, $x_i$ denotes the pixel value of the i-th pixel of the sample image, and $\hat{x}_i$ denotes the pixel value of the i-th pixel of the reconstructed image.
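The reconstruction loss above (the squared Euclidean distance between the sample image and the reconstructed image, summed over pixels) amounts to a one-liner in numpy:

```python
import numpy as np

def reconstruction_loss(sample, reconstructed):
    """Sum of squared per-pixel differences, i.e. the squared Euclidean
    distance the training objective minimizes."""
    diff = sample.astype(np.float64) - reconstructed.astype(np.float64)
    return np.sum(diff ** 2)
```

Gradient descent on this loss over the sample set trains both the feature extraction network and its mirror reconstruction network jointly.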
And 104, optimizing the image feature extraction network and the image reconstruction network based on the optimization strategy of reconstruction loss minimization to obtain the optimized image feature extraction network and the optimized image reconstruction network.
It can be understood that in the present stage, an optical content image is taken as a data set, the image reconstruction loss is taken as a target, an image feature extraction network and an image reconstruction network are trained, and a reliable network structure is provided for sonar image synthesis in the next stage.
It can be understood that the deep learning network is used as the image feature extraction network and the image reconstruction network, so that the features with better robustness can be extracted from the input image, and accordingly, the more vivid reconstructed image can be restored by using the features with better robustness.
Next, a second stage of the sonar image synthesis method according to the embodiment of the present invention will be described with reference to fig. 5.
As shown in fig. 5, a sonar image synthesis method according to an embodiment of the present invention may include:
step 501, inputting a content image to be synthesized and a style image into an image feature extraction network to obtain the features of the content image and the features of the style image.
The images input at this stage are divided into two parts, a content image and a style image; the style of the content image is replaced with the style of the style image, thereby synthesizing an image that has the style of the style image and the content of the content image. The content image is an optical image, and the style image is a real sonar image (a real side-scan sonar image), so that a sonar image having the content of the optical image and the style of the real sonar image is synthesized.
The feature extraction network is a deep learning network obtained based on sample image training in the above, and the sample image is an optical image.
Optionally, before inputting the content image and the style image to be combined into the image feature extraction network, the method shown in fig. 5 may further include: preprocessing the content image and the style image that are to be input into the image feature extraction network. The preprocessing is identical to that applied to the sample image described above.
Alternatively, when the feature matrix input to a max pooling layer in the image feature extraction network is downsampled, the position information of each value of the sampling result within the feature matrix (i.e., the positions of the maxima) is recorded; this is recorded as the second position information. It can be understood that each max pooling layer corresponds to one downsampling, and each downsampling corresponds to one recording of second position information.
Step 502, whitening the feature of the content image to remove the style feature in the content image, so as to obtain the whitened feature of the content image.
As an example, the feature of the content image may be subjected to whitening processing by the following formula:
$\hat{f}_c = E_c D_c^{-1/2} E_c^{\top} f_c$

where $f_c$ denotes the feature of the content image, $\hat{f}_c$ denotes the whitening feature of the content image, $D_c$ denotes the diagonal matrix formed by the eigenvalues of the covariance matrix $f_c f_c^{\top}$, and $E_c$ denotes the corresponding orthogonal matrix, so that $f_c f_c^{\top} = E_c D_c E_c^{\top}$.
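A minimal numpy sketch of the whitening transform $\hat{f}_c = E_c D_c^{-1/2} E_c^{\top} f_c$, treating the extracted feature map as a (channels × pixels) matrix. The mean-centering and the small eigenvalue floor are common practical additions from the WCT literature rather than details stated in the text.

```python
import numpy as np

def whiten(fc):
    """Whiten content features so their channel covariance becomes identity.

    fc: array of shape (C, H*W), one row per feature channel. Eigendecompose
    the covariance f_c f_c^T = E_c D_c E_c^T and apply E_c D_c^{-1/2} E_c^T."""
    fc = fc - fc.mean(axis=1, keepdims=True)      # center (assumed, as in WCT)
    cov = fc @ fc.T / (fc.shape[1] - 1)
    d, e = np.linalg.eigh(cov)                    # D_c eigenvalues, E_c orthogonal
    d = np.maximum(d, 1e-8)                       # guard near-zero eigenvalues
    return e @ np.diag(d ** -0.5) @ e.T @ fc
```

After this step the features keep the spatial arrangement of the content image but carry no channel correlations, which is what "removing the style features" refers to.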
and 503, fusing the whitening characteristic and the characteristic of the style image to obtain a target characteristic after the style conversion is carried out on the content image.
As an example, the above target feature can be obtained by the following formula:
$f_{cs} = E_s D_s^{1/2} E_s^{\top} \hat{f}_c$

where $f_s$ denotes the feature of the style image, $f_{cs}$ denotes the target feature, $D_s$ denotes the diagonal matrix formed by the eigenvalues of the covariance matrix $f_s f_s^{\top}$, and $E_s$ denotes the corresponding orthogonal matrix, so that $f_s f_s^{\top} = E_s D_s E_s^{\top}$.
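The fusion (coloring) transform $f_{cs} = E_s D_s^{1/2} E_s^{\top} \hat{f}_c$ re-applies the style image's feature covariance to the whitened content features. A numpy sketch under the same (channels × pixels) layout; re-adding the style mean is an assumption carried over from the usual WCT formulation.

```python
import numpy as np

def colorize(fc_whitened, fs):
    """Impose the style features' covariance (and mean) on whitened content
    features. fc_whitened: (C, H_c*W_c) whitened content features;
    fs: (C, H_s*W_s) style image features."""
    mu_s = fs.mean(axis=1, keepdims=True)
    fs_c = fs - mu_s
    cov_s = fs_c @ fs_c.T / (fs_c.shape[1] - 1)
    d, e = np.linalg.eigh(cov_s)                  # D_s eigenvalues, E_s orthogonal
    d = np.maximum(d, 1e-8)
    return e @ np.diag(d ** 0.5) @ e.T @ fc_whitened + mu_s
```

The output has the second-order statistics of the style (sonar) features while keeping the spatial content layout, i.e., the "target feature after style conversion".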
the processes described in the above step 502 and the above step 503 may be regarded as a feature conversion process, which will be described in detail below with reference to fig. 6A and 6B, and will not be described again here.
And step 504, reconstructing an image based on the target characteristics, the image reconstruction network and random noise to obtain a synthetic sonar image.
The synthetic sonar image has the content of a content image and the style of a style image, the image reconstruction network is a deep learning network obtained based on sample image training in the above, the image reconstruction network is a mirror image network of the feature extraction network, and the feature extraction network and the image reconstruction network are obtained based on optimization strategy training of reconstruction loss minimization of the sample image.
As in the training stage, image reconstruction uses unpooling with the recorded position information, but in the second stage mixed pooling (MixPooling) is adopted: random noise is additionally introduced during image reconstruction. The purpose is to blur or smooth the details in the reconstructed synthetic sonar image so that it is closer to a real side-scan sonar image, thereby alleviating the problems of high sonar image acquisition cost and small sample number. This is described in detail below with reference to fig. 6A and fig. 6B. Fig. 6A is a schematic diagram illustrating the principle of a sonar image synthesis method according to an embodiment of the present application. Fig. 6B is a schematic structural diagram of the hybrid pooling layer of fig. 6A.
As shown in fig. 6A, after the content image and the style image are input and the image feature extraction network 61 obtains the features of both, feature conversion 62 is performed to obtain the target feature of the content image (the content image's features fused with the style of the style image); the target feature is then taken as the input of the image reconstruction network 63, upsampled, and smoothed by adding random noise, so that the synthetic sonar image is output.
Referring to fig. 6A, the image reconstruction network 63 is not completely identical to the image reconstruction network in fig. 2; the difference is that the image reconstruction network 63 in fig. 6A employs a mixed pooling layer. With continued reference to fig. 6B, the mixed pooling layer is a superposition of the anti-pooling layer of fig. 2 and random noise.
As an example, if, when the feature matrix of the input content image is down-sampled by a maximum pooling layer in the image feature extraction network, the position information in the feature matrix of each value in the down-sampling result is recorded (the position information of the maximum values, that is, the second position information), then step 504 may specifically include: based on an anti-pooling layer of the image reconstruction network and the corresponding second position information, up-sampling the feature matrix of the content image input to that layer, and superimposing random noise on the up-sampling result (mixed pooling) to obtain the synthetic sonar image.
More specifically, step 504 may include: the anti-pooling layer of the image reconstruction network up-samples the feature matrix of the content image input to that layer, placing each value at the corresponding position recorded in the corresponding second position information, and adds random noise at the positions in the up-sampling result other than those recorded in the second position information, to obtain the synthetic sonar image.
Fig. 7 is a schematic diagram of the principle of the image reconstruction part of a sonar image synthesis method provided by an embodiment of the present application. Referring to fig. 7, in the process of extracting the features of the content image through the image feature extraction network, while maximum pooling down-sampling is performed on the image feature matrix input to the maximum pooling layer 71 to obtain the down-sampling result 72, the position information of each value of the down-sampling result 72 in the image feature matrix, that is, the position information 73 of the maximum values, is also retained. Accordingly, when reconstructing the image with the image reconstruction network, the output 74 of the preceding layer may be up-sampled in the hybrid pooling layer 76 at the corresponding (filled) positions recorded in the position information 73, while random noise 75 is added at the other positions, to complete the reconstruction and obtain the synthetic sonar image.
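The mixed pooling described above (anti-pooling that restores values at the recorded maximum positions, with random noise filling the remaining positions) can be sketched in NumPy as follows. This is an illustrative sketch, not part of the claimed embodiment; the 2x2 window, the Gaussian noise model, and all function names are assumptions made for the example:

```python
import numpy as np

def max_pool_with_indices(x, k=2):
    """k-by-k max pooling that also records each maximum's flat position
    in the input feature matrix (the "second position information")."""
    h, w = x.shape
    out = np.zeros((h // k, w // k))
    idx = np.zeros((h // k, w // k), dtype=int)
    for i in range(h // k):
        for j in range(w // k):
            patch = x[i * k:(i + 1) * k, j * k:(j + 1) * k]
            r, c = np.unravel_index(np.argmax(patch), patch.shape)
            out[i, j] = patch[r, c]
            idx[i, j] = (i * k + r) * w + (j * k + c)  # flat position in the input
    return out, idx

def mixed_unpool(pooled, idx, shape, noise_sigma=0.1, rng=None):
    """Anti-pooling plus noise: restore each pooled value at its recorded
    position and fill every other position with random noise, instead of
    the zeros a plain anti-pooling layer would use."""
    if rng is None:
        rng = np.random.default_rng(0)
    flat = rng.normal(0.0, noise_sigma, size=shape[0] * shape[1])
    flat[idx.ravel()] = pooled.ravel()  # recorded positions keep the true values
    return flat.reshape(shape)

x = np.arange(16, dtype=float).reshape(4, 4)  # toy feature matrix
pooled, idx = max_pool_with_indices(x)        # down-sampling, positions recorded
recon = mixed_unpool(pooled, idx, x.shape)    # up-sampling with mixed pooling
```

In a framework implementation the same effect could be obtained with pooling layers that return indices (e.g. max pooling with recorded argmax positions) followed by unpooling, with noise added where the unpooled map would otherwise be zero.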
Optionally, step 504 may further include: adding random noise of multiple intensities at the positions in the up-sampling result other than those recorded in the second position information, to obtain multiple synthetic sonar images under multiple noise intensities, where each noise intensity corresponds to one synthetic sonar image. Correspondingly, the method shown in fig. 5 may further include: comparing the multiple synthetic sonar images with a real side-scan sonar image to select, from among them, the target synthetic sonar image closest to the real side-scan sonar image; and determining the random noise intensity corresponding to the target synthetic sonar image as the reference noise intensity for subsequent sonar image synthesis.
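The selection of a reference noise intensity can be sketched as follows. This is purely illustrative: the five intensity levels, the L2 distance as the "closest" criterion, and the stand-in arrays are all assumptions, since the embodiment does not specify a particular comparison metric:

```python
import numpy as np

rng = np.random.default_rng(0)
real = rng.normal(0.0, 0.3, size=(8, 8))  # stand-in for a real side-scan sonar image
recon = np.zeros((8, 8))                  # stand-in for the noiseless reconstruction

# one candidate synthetic image per noise intensity level
levels = {s: recon + rng.normal(0.0, s, size=recon.shape)
          for s in (0.1, 0.2, 0.3, 0.4, 0.5)}

# reference intensity: the candidate closest to the real image (L2 distance here)
best_sigma = min(levels, key=lambda s: np.linalg.norm(levels[s] - real))
```

The selected `best_sigma` would then be reused as the noise intensity for later syntheses.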
According to the sonar image synthesis method provided by the embodiment of the present application, when image reconstruction is performed based on the target features and the image reconstruction network after style conversion of the content image, random noise is introduced to smooth the reconstructed image. The reconstructed synthetic sonar image is therefore closer to a real (more vivid) side-scan sonar image, which can solve the problems of high sonar image acquisition cost and small sample size, and can provide good data support for deep-learning-based sonar image research.
Fig. 8A shows the effect of a sonar image synthesized using PhotoWCT in the related art. Figs. 8B to 8F show the effect of sonar images synthesized by the sonar image synthesis method according to an embodiment of the present application when the introduced noise intensity is level 1, level 2, level 3, level 4 and level 5, respectively, where a smaller level corresponds to a weaker noise intensity. Comparison of figs. 8A to 8F shows that the details of the sonar image synthesized with PhotoWCT in the related art are too conspicuous, whereas the scheme provided by the embodiment of the present application can synthesize sonar images at different noise intensities, and the synthesis effect at each intensity is superior to the prior art. The most appropriate noise intensity can then be selected by comparison, so that a highly reliable synthetic sonar image is generated.
Corresponding to the above method embodiment, the present application embodiment further provides a sonar image synthesizing apparatus, which is described below.
As shown in fig. 9, a sonar image synthesizing apparatus 900 according to an embodiment of the present application may include: a feature extraction module 901, a whitening processing module 902, a feature fusion module 903 and an image reconstruction module 904.
The feature extraction module 901 is configured to input a content image to be synthesized and a style image into an image feature extraction network to obtain features of the content image and features of the style image, where the content image is an optical image, the style image is a real sonar image, the feature extraction network is a deep learning network obtained based on sample image training, and the sample image is an optical image.
A whitening processing module 902, configured to perform whitening processing on the feature of the content image to remove the style feature in the content image, so as to obtain a whitening feature of the content image.
A feature fusion module 903, configured to fuse the whitening feature and the feature of the style image to obtain a target feature after performing style conversion on the content image.
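The whitening and fusion steps performed by modules 902 and 903 can be sketched in the spirit of a whitening-coloring transform (WCT), assuming the feature maps are flattened to C x (H*W) matrices. The eigendecomposition route, the function names, and the toy feature sizes are illustrative assumptions rather than the patent's exact procedure:

```python
import numpy as np

def whiten(fc):
    """Whitening: remove the content features' second-order (style)
    statistics so that only content structure remains."""
    fc = fc - fc.mean(axis=1, keepdims=True)
    cov = fc @ fc.T / (fc.shape[1] - 1)
    w, v = np.linalg.eigh(cov)
    return v @ np.diag(np.clip(w, 1e-12, None) ** -0.5) @ v.T @ fc

def color(fw, fs):
    """Fusion (coloring): impose the style features' covariance and mean
    on the whitened content features, yielding the target feature."""
    mu = fs.mean(axis=1, keepdims=True)
    fs0 = fs - mu
    cov = fs0 @ fs0.T / (fs0.shape[1] - 1)
    w, v = np.linalg.eigh(cov)
    return v @ np.diag(np.clip(w, 0.0, None) ** 0.5) @ v.T @ fw + mu

rng = np.random.default_rng(0)
f_content = rng.normal(size=(4, 500))             # C x (H*W) content feature map
f_style = 2.0 * rng.normal(size=(4, 500)) + 1.0   # style features, different statistics
target = color(whiten(f_content), f_style)        # target feature after style conversion
```

After whitening, the content features have (numerically) identity covariance; after coloring, the target features carry the style features' covariance and mean.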
An image reconstruction module 904, configured to perform image reconstruction based on the target feature, an image reconstruction network, and random noise to obtain a synthetic sonar image, where the synthetic sonar image has content of the content image and a style of the style image, the image reconstruction network is a deep learning network obtained based on the sample image training, the image reconstruction network is a mirror image network of the feature extraction network, and the feature extraction network and the image reconstruction network are obtained based on an optimization strategy training that minimizes reconstruction loss of the sample image.
Optionally, the sonar image synthesizing device 900 according to the embodiment of the present application may further include a network training module, configured to train the image feature extraction network and the image reconstruction network based on a sample image before inputting the content image and the style image to be synthesized into the image feature extraction network and obtaining the features of the content image and the features of the style image.
Optionally, as shown in fig. 10, the network training module 1005 may include: a feature extraction sub-module 1001, an image reconstruction sub-module 1002, a loss determination sub-module 1003 and a network optimization sub-module 1004.
The feature extraction sub-module 1001 is configured to input a sample image into the image feature extraction network to obtain features of the sample image.
An image reconstruction submodule 1002, configured to reconstruct the sample image based on the features of the sample image and the image reconstruction network, so as to obtain a reconstructed image of the sample image.
A loss determination submodule 1003 configured to determine a reconstruction loss of the sample image based on the reconstructed image and the sample image.
A network optimization sub-module 1004 configured to optimize the image feature extraction network and the image reconstruction network based on the optimization strategy for minimizing reconstruction loss.
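The training flow of sub-modules 1001 through 1004 (extract features, reconstruct, measure reconstruction loss, optimize both networks to minimize it) can be illustrated with a toy linear encoder/decoder pair. This is a sketch only: the real networks are convolutional with pooling layers, and the sizes, learning rate, and plain gradient descent used here are assumptions:

```python
import numpy as np

rng = np.random.default_rng(0)
X = rng.normal(size=(64, 8))              # stand-in for flattened sample images

E = rng.normal(scale=0.1, size=(8, 4))    # toy encoder (feature extraction network)
D = rng.normal(scale=0.1, size=(4, 8))    # toy decoder (mirror reconstruction network)

def recon_loss(E, D):
    """Mean squared reconstruction loss of the sample images."""
    return float(np.mean((X @ E @ D - X) ** 2))

loss_before = recon_loss(E, D)
lr = 0.02
for _ in range(3000):
    Z = X @ E                             # step 1: extract features
    err = Z @ D - X                       # step 2: reconstruct, step 3: error
    gD = Z.T @ err / len(X)               # step 4: descent directions proportional
    gE = X.T @ (err @ D.T) / len(X)       #         to the loss gradients
    E -= lr * gE
    D -= lr * gD
loss_after = recon_loss(E, D)
```

Both the encoder and the decoder are updated jointly, matching the described strategy of optimizing the feature extraction network and the image reconstruction network to minimize the reconstruction loss.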
Optionally, the image feature extraction network includes a plurality of convolution modules and a plurality of maximum pooling layers, where each convolution module includes at least one convolution layer and one maximum pooling layer is arranged after each convolution module. The apparatus 900 may further include: a first position information recording module, configured to, before the sample image is reconstructed based on the features of the sample image and the image reconstruction network to obtain the reconstructed image of the sample image, record, when the feature matrix of the sample image input to a maximum pooling layer is down-sampled based on that layer, the position information in the feature matrix of each value in the sampling result, denoted the first position information, where each maximum pooling layer corresponds to one down-sampling, and each down-sampling corresponds to one recording of first position information. Accordingly, the image reconstruction sub-module 1002 may be configured to: based on an anti-pooling layer in the image reconstruction network and the corresponding first position information, up-sample the feature matrix of the sample image input to that layer to obtain a reconstructed image of the sample image, where the anti-pooling layer is a mirror image of the corresponding maximum pooling layer.
On this basis, optionally, the apparatus 900 may further include: a second position information recording module, configured to, before whitening processing is performed on the features of the content image, record, when the feature matrix of the content image input to a maximum pooling layer is down-sampled based on that layer, the position information in the feature matrix of each value in the sampling result, denoted the second position information, where each maximum pooling layer corresponds to one down-sampling, and each down-sampling corresponds to one item of second position information. Accordingly, the image reconstruction module 904 may specifically be configured to: based on the anti-pooling layer and the corresponding second position information, up-sample the feature matrix of the content image input to that layer, and superimpose random noise on the up-sampling result to obtain a synthetic sonar image.
More specifically, the image reconstruction module 904 may be configured to: and upsampling the feature matrix of the content image input into the layer at the corresponding position recorded in the corresponding second position information based on the anti-pooling layer, and adding random noise at the position except the position recorded in the second position information in the upsampling result to obtain a synthetic sonar image.
Optionally, the image reconstruction module 904 may be configured to: random noise of various intensities is added to the position except the position recorded in the second position information in the up-sampling result to obtain a plurality of synthetic sonar images under various noise intensities, and one noise intensity corresponds to one synthetic sonar image. Accordingly, the apparatus 900 may further include: the device comprises a selection module and a reference noise strength determination module.
And the selection module is used for comparing the plurality of synthetic sonar images with the real side-scan sonar images so as to select the target synthetic sonar image which is closest to the real side-scan sonar images from the plurality of synthetic sonar images.
And the reference noise intensity determining module is used for determining the random noise intensity corresponding to the target synthetic sonar image as the reference noise intensity of the synthetic sonar image.
Optionally, the apparatus 900 may further include: the first preprocessing module is used for preprocessing the sample image before inputting the sample image into the image feature extraction network to obtain the features of the sample image.
Optionally, the apparatus 900 may further include: and the second preprocessing module is used for preprocessing the content image and the style image before inputting the content image and the style image to be synthesized into an image feature extraction network to obtain the features of the content image and the features of the style image.
An embodiment of the present application provides a sonar image synthesizing apparatus. Because random noise is introduced to smooth the reconstructed image when image reconstruction is performed based on the target features and the image reconstruction network after style conversion of the content image, the reconstructed synthetic sonar image is closer to a real (more vivid) side-scan sonar image. This can solve the problems of high sonar image acquisition cost and small sample size, and can provide good data support for deep-learning-based sonar image research.
It should be noted that, since the device embodiments are executed in a manner similar to the method embodiments, the device embodiments are described in a simplified manner, and reference is made to the method embodiments for relevant points.
Fig. 11 shows a schematic structural diagram of an electronic device according to an embodiment of the present application. Referring to fig. 11, at the hardware level, the electronic device includes a processor, and optionally further includes an internal bus, a network interface and a memory. The memory may include a volatile memory, such as a random-access memory (RAM), and may further include a non-volatile memory, such as at least one disk storage. Of course, the electronic device may also include hardware required for other services.
The processor, the network interface, and the memory may be connected to each other via an internal bus, which may be an ISA (Industry Standard Architecture) bus, a PCI (Peripheral Component Interconnect) bus, an EISA (Extended Industry Standard Architecture) bus, or the like. The bus may be divided into an address bus, a data bus, a control bus, etc. For ease of illustration, only one double-headed arrow is shown in FIG. 11, but that does not indicate only one bus or one type of bus.
The memory is used for storing a program. In particular, the program may include program code comprising computer operating instructions. The memory may include internal memory and non-volatile storage, and provides instructions and data to the processor.
The processor reads a corresponding computer program from the nonvolatile memory into the memory and then runs the computer program to form the sonar image synthesis device on a logic level. And a processor for executing the program stored in the memory and specifically for executing the sonar image synthesis method provided by the embodiment of the present application.
The method executed by the sonar image synthesizing device according to the embodiment shown in fig. 5 of the present application can be applied to, or implemented by, a processor. The processor may be an integrated circuit chip having signal processing capabilities. In implementation, the steps of the above method may be performed by integrated logic circuits of hardware in the processor or by instructions in the form of software. The processor may be a general-purpose processor, including a central processing unit (CPU), a network processor (NP), and the like; it may also be a digital signal processor (DSP), an application-specific integrated circuit (ASIC), a field-programmable gate array (FPGA) or another programmable logic device, a discrete gate or transistor logic device, or discrete hardware components. The various methods, steps and logic blocks disclosed in the embodiments of the present application may be implemented or performed. A general-purpose processor may be a microprocessor, or the processor may be any conventional processor, and so on. The steps of the method disclosed in connection with the embodiments of the present application may be directly implemented by a hardware decoding processor, or implemented by a combination of hardware and software modules in a decoding processor. The software module may be located in a storage medium well known in the art, such as a RAM, a flash memory, a ROM, a PROM or EPROM, or a register. The storage medium is located in the memory, and the processor reads the information in the memory and completes the steps of the above method in combination with its hardware.
The embodiment of the present application also provides a computer-readable storage medium, which stores one or more programs, where the one or more programs include instructions, which when executed by an electronic device including a plurality of application programs, enable the electronic device to execute the method performed by the sonar image synthesizing apparatus in the embodiment shown in fig. 5, and are specifically used for executing the sonar image synthesizing method provided in the embodiment of the present application.
As will be appreciated by one skilled in the art, embodiments of the present application may be provided as a method, system, or computer program product. Accordingly, the present application may take the form of an entirely hardware embodiment, an entirely software embodiment or an embodiment combining software and hardware aspects. Furthermore, the present application may take the form of a computer program product embodied on one or more computer-usable storage media (including, but not limited to, disk storage, CD-ROM, optical storage, and the like) having computer-usable program code embodied therein.
The present application is described with reference to flowchart illustrations and/or block diagrams of methods, apparatus (systems), and computer program products according to embodiments of the application. It will be understood that each flow and/or block of the flow diagrams and/or block diagrams, and combinations of flows and/or blocks in the flow diagrams and/or block diagrams, can be implemented by computer program instructions. These computer program instructions may be provided to a processor of a general purpose computer, special purpose computer, embedded processor, or other programmable data processing apparatus to produce a machine, such that the instructions, which execute via the processor of the computer or other programmable data processing apparatus, create means for implementing the functions specified in the flowchart flow or flows and/or block diagram block or blocks.
These computer program instructions may also be stored in a computer-readable memory that can direct a computer or other programmable data processing apparatus to function in a particular manner, such that the instructions stored in the computer-readable memory produce an article of manufacture including instruction means which implement the function specified in the flowchart flow or flows and/or block diagram block or blocks.
These computer program instructions may also be loaded onto a computer or other programmable data processing apparatus to cause a series of operational steps to be performed on the computer or other programmable apparatus to produce a computer implemented process such that the instructions which execute on the computer or other programmable apparatus provide steps for implementing the functions specified in the flowchart flow or flows and/or block diagram block or blocks.
It should be noted that all the embodiments in the present application are described in a related manner, and the same and similar parts among the embodiments are referred to each other, and each embodiment focuses on the differences from the other embodiments. In particular, as for the apparatus embodiment, since it is substantially similar to the method embodiment, the description is relatively simple, and for the relevant points, reference may be made to the partial description of the method embodiment.
It should also be noted that the terms "comprises", "comprising", or any other variation thereof, are intended to cover a non-exclusive inclusion, such that a process, method, article or apparatus that comprises a list of elements includes not only those elements but may also include other elements not expressly listed or inherent to such process, method, article or apparatus. Without further limitation, an element defined by the phrase "comprising a ..." does not exclude the presence of other identical elements in the process, method, article or apparatus that comprises the element.
The above are merely examples of the present application and are not intended to limit the present application. Various modifications and changes may occur to those skilled in the art. Any modification, equivalent replacement, improvement, etc. made within the spirit and principle of the present application should be included in the scope of the claims of the present application.

Claims (18)

1. A sonar image synthesis method, comprising:
inputting a content image to be synthesized and a style image into an image feature extraction network to obtain the features of the content image and the features of the style image, wherein the content image is an optical image, the style image is a real sonar image, the feature extraction network is a deep learning network obtained based on sample image training, the sample image is an optical image, the image feature extraction network comprises a plurality of convolution modules and a plurality of maximum pooling layers, one convolution module comprises at least one convolution layer, and one maximum pooling layer is arranged after one convolution module;
when the feature matrix of the content image input to the maximum pooling layer is downsampled based on the maximum pooling layer, recording position information of a value in a sampling result in the feature matrix, and recording the position information as second position information, wherein one maximum pooling layer corresponds to downsampling once, and one downsampling corresponds to recording the second position information once;
whitening processing is carried out on the characteristics of the content image so as to remove style characteristics in the content image and obtain whitening characteristics of the content image;
fusing the whitening characteristic and the characteristic of the style image to obtain a target characteristic after the style conversion is carried out on the content image;
carrying out image reconstruction based on the target features, an image reconstruction network and random noise to obtain a synthetic sonar image, wherein the synthetic sonar image has the content of the content image and the style of the style image, the image reconstruction network is a deep learning network obtained based on the sample image training, the image reconstruction network is a mirror image network of the feature extraction network, and the feature extraction network and the image reconstruction network are obtained based on the optimization strategy training of the reconstruction loss minimization of the sample image;
wherein, carry out image reconstruction based on target feature, image reconstruction network and random noise, obtain synthetic sonar image, include:
based on the anti-pooling layer and corresponding second position information in the image reconstruction network, the feature matrix of the content image input to the layer is up-sampled, and random noise is superimposed on the up-sampling result to obtain a synthetic sonar image.
2. The method according to claim 1, wherein before the inputting of the content image to be synthesized and the style image into the image feature extraction network, the method further comprises:
training the image feature extraction network and the image reconstruction network based on sample images.
3. The method of claim 2, wherein training the image feature extraction network and the image reconstruction network based on sample images comprises:
inputting a sample image into the image feature extraction network to obtain the features of the sample image;
reconstructing the sample image based on the characteristics of the sample image and the image reconstruction network to obtain a reconstructed image of the sample image;
determining a reconstruction loss of the sample image based on the reconstructed image and the sample image;
and optimizing the image feature extraction network and the image reconstruction network based on the optimization strategy for minimizing the reconstruction loss.
4. The method of claim 3, wherein before the reconstructing the sample image based on the features of the sample image and the image reconstruction network to obtain the reconstructed image of the sample image, the method further comprises:
when the feature matrix of the sample image input into the layer is downsampled based on the maximum pooling layer, recording position information of a value in a sampling result in the feature matrix, and recording the position information as first position information, wherein one maximum pooling layer corresponds to downsampling once, and one downsampling corresponds to recording the first position information once;
reconstructing the sample image based on the characteristics of the sample image and the image reconstruction network to obtain a reconstructed image of the sample image, including:
and based on an anti-pooling layer in the image reconstruction network and corresponding first position information, up-sampling the characteristic matrix of the sample image input into the layer to obtain a reconstructed image of the sample image, wherein the anti-pooling layer is a mirror image of a corresponding maximum pooling layer.
5. The method according to claim 1, wherein the up-sampling, based on the anti-pooling layer in the image reconstruction network and the corresponding second position information, of the feature matrix of the content image input to the layer, and the superimposing of random noise on the up-sampling result to obtain a synthetic sonar image, comprises:
and upsampling the feature matrix of the content image input into the layer at the corresponding position recorded in the corresponding second position information based on the anti-pooling layer, and adding random noise at the position except the position recorded in the second position information in the upsampling result to obtain a synthetic sonar image.
6. The method according to claim 5, wherein the intensity of the random noise is adjustable, and wherein adding the random noise to the up-sampling result at a position other than the position recorded in the second position information to obtain the synthetic sonar image comprises:
random noise with various intensities is added to the positions except the position recorded in the second position information in the up-sampling result so as to obtain a plurality of synthetic sonar images under various noise intensities, wherein one noise intensity corresponds to one synthetic sonar image; and
the method further comprises the following steps:
comparing the plurality of synthetic sonar images with real side-scan sonar images to select a target synthetic sonar image which is closest to the real side-scan sonar images from the plurality of synthetic sonar images;
and determining the random noise intensity corresponding to the target synthetic sonar image as the reference noise intensity of the synthetic sonar image.
7. The method of claim 3 or 4, wherein before inputting the sample image into the image feature extraction network to obtain the features of the sample image, the method further comprises:
and preprocessing the sample image.
8. The method according to claim 7, wherein before the inputting of the content image to be synthesized and the style image into the image feature extraction network to obtain the features of the content image and the features of the style image, the method further comprises:
and preprocessing the content image and the style image.
9. A sonar image synthesizing apparatus, comprising:
the system comprises a feature extraction module, a data processing module and a data processing module, wherein the feature extraction module is used for inputting a content image to be synthesized and a style image into an image feature extraction network to obtain the features of the content image and the features of the style image, the content image is an optical image, the style image is a real sonar image, the feature extraction network is a deep learning network obtained based on sample image training, the sample image is an optical image, the image feature extraction network comprises a plurality of convolution modules and a plurality of maximum pooling layers, one convolution module comprises at least one convolution layer, and one maximum pooling layer is arranged behind one convolution module;
a second position information recording module, configured to, before whitening processing is performed on the features of the content image, record position information of a value in a sampling result in a feature matrix of the content image input to the layer when the feature matrix is downsampled based on the maximum pooling layer, and record the position information as second position information, where one maximum pooling layer corresponds to one downsampling, and one downsampling corresponds to one second position information;
the whitening processing module is used for whitening the characteristics of the content image to remove the style characteristics in the content image to obtain the whitening characteristics of the content image;
the feature fusion module is used for fusing the whitening features and the features of the style images to obtain target features after the style conversion is carried out on the content images;
the image reconstruction module is used for reconstructing an image based on the target feature, an image reconstruction network and random noise to obtain a synthetic sonar image, wherein the synthetic sonar image has the content of the content image and the style of the style image, the image reconstruction network is a deep learning network obtained based on the training of the sample image, the image reconstruction network is a mirror image network of the feature extraction network, and the feature extraction network and the image reconstruction network are obtained based on the training of an optimization strategy for minimizing the reconstruction loss of the sample image;
wherein the image reconstruction module is specifically configured to:
based on the anti-pooling layer and corresponding second position information in the image reconstruction network, the feature matrix of the content image input to the layer is up-sampled, and random noise is superimposed on the up-sampling result to obtain a synthetic sonar image.
10. The apparatus of claim 9, further comprising:
and the network training module is used for training the image feature extraction network and the image reconstruction network based on the sample image before inputting the content image to be synthesized and the style image into the image feature extraction network to obtain the features of the content image and the style image.
11. The apparatus of claim 10, wherein the network training module comprises:
the feature extraction submodule is configured to input the sample image into the image feature extraction network to obtain features of the sample image;
the image reconstruction submodule is configured to reconstruct the sample image based on the features of the sample image and the image reconstruction network to obtain a reconstructed image of the sample image;
the loss determination submodule is configured to determine the reconstruction loss of the sample image based on the reconstructed image and the sample image;
the network optimization submodule is configured to optimize the image feature extraction network and the image reconstruction network with the optimization strategy that minimizes the reconstruction loss.
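Purely as an illustration of the training loop these sub-modules describe (extract features, reconstruct, measure reconstruction loss, optimize both networks jointly), here is a toy NumPy sketch with a linear encoder standing in for the feature extraction network and a linear decoder for the image reconstruction network; every shape, learning rate, and step count is an assumption of the sketch:

```python
import numpy as np

rng = np.random.default_rng(0)
X = rng.normal(size=(64, 8))            # 64 flattened toy "sample images"
W_enc = 0.1 * rng.normal(size=(8, 4))   # stand-in for the feature extraction network
W_dec = 0.1 * rng.normal(size=(4, 8))   # stand-in for the image reconstruction network
lr, losses = 0.5, []
for _ in range(1000):
    Z = X @ W_enc                       # features of the sample images
    R = Z @ W_dec                       # reconstructed images
    err = R - X
    losses.append(float((err ** 2).mean()))  # reconstruction loss
    # gradients of the mean-squared reconstruction loss w.r.t. both networks
    g_dec = Z.T @ err * (2 / err.size)
    g_enc = X.T @ (err @ W_dec.T) * (2 / err.size)
    W_enc -= lr * g_enc                 # optimize both networks jointly
    W_dec -= lr * g_dec
```

The loss-minimization structure is the point here; the patent's actual networks are deep convolutional mirrors, not linear maps.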
12. The apparatus of claim 11, further comprising:
the first position information recording module is configured to: before the sample image is reconstructed based on the features of the sample image and the image reconstruction network to obtain the reconstructed image of the sample image, when the feature matrix of the sample image input to a maximum pooling layer is down-sampled based on that layer, record, as first position information, the positions in the feature matrix of the values retained in the sampling result, wherein each maximum pooling layer corresponds to one down-sampling, and each down-sampling corresponds to one recording of first position information;
wherein the image reconstruction submodule is specifically configured to:
up-sample, based on an unpooling layer in the image reconstruction network and the corresponding first position information, the feature matrix of the sample image input to that layer to obtain the reconstructed image of the sample image, wherein the unpooling layer is a mirror of its corresponding maximum pooling layer.
13. The apparatus of claim 9, wherein the image reconstruction module is specifically configured to:
up-sample, based on the unpooling layer and the corresponding second position information, the feature matrix of the content image input to that layer, placing each value at the position recorded in the second position information, and add random noise at the positions in the up-sampling result other than those recorded in the second position information, to obtain the synthetic sonar image.
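For illustration only: the pooling/unpooling bookkeeping described in claims 12–13 — record where each maximum came from, put the pooled values back at those recorded positions, and inject noise only everywhere else — can be sketched in NumPy as follows (the 2×2 window, the Gaussian noise model, and the function names are assumptions of the sketch):

```python
import numpy as np

def max_pool_with_indices(x, k=2):
    """k x k max pooling that also records, for each window, the
    position of the maximum inside the input feature matrix
    (the 'position information' of the claims)."""
    h, w = x.shape
    ph, pw = h // k, w // k
    out = np.zeros((ph, pw))
    idx = np.zeros((ph, pw, 2), dtype=int)
    for i in range(ph):
        for j in range(pw):
            win = x[i*k:(i+1)*k, j*k:(j+1)*k]
            r, c = np.unravel_index(np.argmax(win), win.shape)
            out[i, j] = win[r, c]
            idx[i, j] = (i*k + r, j*k + c)
    return out, idx

def unpool_with_noise(pooled, idx, shape, noise_std=0.1, rng=None):
    """Unpooling: place each pooled value back at its recorded position,
    then add random noise only at the positions NOT recorded in the
    position information."""
    rng = rng or np.random.default_rng(0)
    up = np.zeros(shape)
    mask = np.zeros(shape, dtype=bool)
    ph, pw = pooled.shape
    for i in range(ph):
        for j in range(pw):
            r, c = idx[i, j]
            up[r, c] = pooled[i, j]
            mask[r, c] = True
    up[~mask] += noise_std * rng.normal(size=(~mask).sum())
    return up
```

Keeping the recorded positions noise-free preserves the strong content responses, while the noise elsewhere imitates the speckled background of real sonar imagery.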
14. The apparatus of claim 13, wherein the image reconstruction module is specifically configured to:
add random noise of multiple intensities at the positions in the up-sampling result other than those recorded in the second position information, so as to obtain multiple synthetic sonar images under the multiple noise intensities, wherein each noise intensity corresponds to one synthetic sonar image; and
the device further comprises:
the selection module is configured to compare the multiple synthetic sonar images with a real side-scan sonar image so as to select, from the multiple synthetic sonar images, a target synthetic sonar image that is closest to the real side-scan sonar image;
the reference noise intensity determination module is configured to determine the random noise intensity corresponding to the target synthetic sonar image as the reference noise intensity for synthesizing sonar images.
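A hedged sketch of the intensity sweep in claim 14: synthesize one candidate per noise intensity, compare each against a real side-scan sonar image, and keep the intensity of the closest candidate. The mean-absolute-error metric, the intensity grid, and the function name are illustrative assumptions; the patent does not fix a particular similarity measure:

```python
import numpy as np

def select_reference_intensity(upsampled, real_sonar, intensities, rng=None):
    """Return the candidate image closest to the real side-scan sonar
    image and the noise intensity that produced it."""
    rng = rng or np.random.default_rng(0)
    best, best_sigma, best_err = None, None, np.inf
    for sigma in intensities:
        candidate = upsampled + sigma * rng.normal(size=upsampled.shape)
        err = np.abs(candidate - real_sonar).mean()  # closeness (assumed metric)
        if err < best_err:
            best, best_sigma, best_err = candidate, sigma, err
    return best, best_sigma
```

The selected intensity then serves as the reference noise intensity for subsequent synthesis, so only one calibration pass against real data is needed.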
15. The apparatus of claim 11 or 12, further comprising:
the first preprocessing module is configured to preprocess the sample image before the sample image is input into the image feature extraction network to obtain the features of the sample image.
16. The apparatus of claim 15, further comprising:
the second preprocessing module is configured to preprocess the content image and the style image before the content image to be synthesized and the style image are input into the image feature extraction network to obtain the features of the content image and the features of the style image.
17. An electronic device, comprising:
a processor; and
a memory arranged to store computer-executable instructions that, when executed, cause the processor to:
input a content image to be synthesized and a style image into an image feature extraction network to obtain features of the content image and features of the style image, wherein the content image is an optical image, the style image is a real sonar image, the feature extraction network is a deep learning network trained on a sample image, the sample image is an optical image, the image feature extraction network comprises a plurality of convolution modules and a plurality of maximum pooling layers, each convolution module comprises at least one convolution layer, and each convolution module is followed by one maximum pooling layer;
when the feature matrix of the content image input to a maximum pooling layer is down-sampled based on that layer, record, as second position information, the positions in the feature matrix of the values retained in the sampling result, wherein each maximum pooling layer corresponds to one down-sampling, and each down-sampling corresponds to one recording of second position information;
whiten the features of the content image so as to remove the style features from the content image and obtain whitened features of the content image;
fuse the whitened features with the features of the style image to obtain target features of the content image after style conversion;
perform image reconstruction based on the target features, an image reconstruction network, and random noise to obtain a synthetic sonar image, wherein the synthetic sonar image has the content of the content image and the style of the style image, the image reconstruction network is a deep learning network trained on the sample image and is a mirror network of the feature extraction network, and the feature extraction network and the image reconstruction network are trained with an optimization strategy that minimizes the reconstruction loss of the sample image;
wherein performing image reconstruction based on the target features, the image reconstruction network, and random noise to obtain the synthetic sonar image comprises:
up-sampling, based on an unpooling layer in the image reconstruction network and the corresponding second position information, the feature matrix of the content image input to that layer, and superimposing random noise on the up-sampling result to obtain the synthetic sonar image.
18. A computer-readable storage medium storing one or more programs that, when executed by an electronic device including a plurality of application programs, cause the electronic device to:
input a content image to be synthesized and a style image into an image feature extraction network to obtain features of the content image and features of the style image, wherein the content image is an optical image, the style image is a real sonar image, the feature extraction network is a deep learning network trained on a sample image, the sample image is an optical image, the image feature extraction network comprises a plurality of convolution modules and a plurality of maximum pooling layers, each convolution module comprises at least one convolution layer, and each convolution module is followed by one maximum pooling layer;
when the feature matrix of the content image input to a maximum pooling layer is down-sampled based on that layer, record, as second position information, the positions in the feature matrix of the values retained in the sampling result, wherein each maximum pooling layer corresponds to one down-sampling, and each down-sampling corresponds to one recording of second position information;
whiten the features of the content image so as to remove the style features from the content image and obtain whitened features of the content image;
fuse the whitened features with the features of the style image to obtain target features of the content image after style conversion;
perform image reconstruction based on the target features, an image reconstruction network, and random noise to obtain a synthetic sonar image, wherein the synthetic sonar image has the content of the content image and the style of the style image, the image reconstruction network is a deep learning network trained on the sample image and is a mirror network of the feature extraction network, and the feature extraction network and the image reconstruction network are trained with an optimization strategy that minimizes the reconstruction loss of the sample image;
wherein performing image reconstruction based on the target features, the image reconstruction network, and random noise to obtain the synthetic sonar image comprises:
up-sampling, based on an unpooling layer in the image reconstruction network and the corresponding second position information, the feature matrix of the content image input to that layer, and superimposing random noise on the up-sampling result to obtain the synthetic sonar image.
CN202110604552.3A 2021-05-31 2021-05-31 Sonar image synthesis method and device Active CN113052786B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202110604552.3A CN113052786B (en) 2021-05-31 2021-05-31 Sonar image synthesis method and device

Publications (2)

Publication Number Publication Date
CN113052786A CN113052786A (en) 2021-06-29
CN113052786B (en) 2021-09-03

Family

ID=76518614

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202110604552.3A Active CN113052786B (en) 2021-05-31 2021-05-31 Sonar image synthesis method and device

Country Status (1)

Country Link
CN (1) CN113052786B (en)

Citations (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN109308679A (en) * 2018-08-13 2019-02-05 深圳市商汤科技有限公司 Image style conversion method and device, equipment, and storage medium
CN110991516A (en) * 2019-11-28 2020-04-10 哈尔滨工程大学 Side-scan sonar image target classification method based on style migration
CN111582403A (en) * 2020-05-18 2020-08-25 哈尔滨工程大学 Zero-sample side-scan sonar image target classification method

Family Cites Families (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US11315221B2 (en) * 2019-04-01 2022-04-26 Canon Medical Systems Corporation Apparatus and method for image reconstruction using feature-aware deep learning


Similar Documents

Publication Publication Date Title
CN111476719B (en) Image processing method, device, computer equipment and storage medium
CN115457021A (en) Skin disease image segmentation method and system based on joint attention convolution neural network
CN111914654B (en) Text layout analysis method, device, equipment and medium
CN107103585B (en) Image super-resolution system
CN112700460B (en) Image segmentation method and system
CN113486890A (en) Text detection method based on attention feature fusion and cavity residual error feature enhancement
CN113111867B (en) Sonar image recognition method and device
CN111709879B (en) Image processing method, image processing device and terminal equipment
CN114359035A (en) Human body style migration method, device and medium based on generation of confrontation network
CN113052786B (en) Sonar image synthesis method and device
CN113487530A (en) Infrared and visible light fusion imaging method based on deep learning
CN113538254A (en) Image restoration method and device, electronic equipment and computer readable storage medium
CN117252890A (en) Carotid plaque segmentation method, device, equipment and medium
CN112686267A (en) Image semantic segmentation method and device
CN112529949A (en) Method and system for generating DWI image based on T2 image
CN110544256A (en) Deep learning image segmentation method and device based on sparse features
CN117593633B (en) Ocean scene-oriented image recognition method, system, equipment and storage medium
CN117611852A (en) Image matching method, device, equipment and computer storage medium
CN115526775B (en) Image data processing method and device
CN117593619B (en) Image processing method, device, electronic equipment and storage medium
US20240169479A1 (en) Video generation with latent diffusion models
CN116563526A (en) Image semantic segmentation method and device
CN116402683A (en) Image super-resolution reconstruction method and device
CN116740376A (en) Pyramid integration and attention enhancement-based target detection method and device
CN117893908A (en) Remote sensing image ground facility change detection method, system, equipment and medium

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant