WO2021249233A1

WO2021249233A1 - Image processing method, target recognition model training method, and target recognition method

Info

Publication number: WO2021249233A1
Application number: PCT/CN2021/097562
Authority: WO
Inventors: 刘道学; 耿天宝; 杨铭; 于健; 胡伟; 肖丽娜; 张尧尘
Original assignee: 中铁四局集团有限公司; 安徽数智建造研究院有限公司
Priority date: 2020-06-10
Filing date: 2021-06-01
Publication date: 2021-12-16
Also published as: CN111784642A; CN111784642B

Abstract

An image processing method, which belongs to the technical field of target recognition. The method comprises the following steps: extracting an edge feature of an image to be processed, so as to obtain a gradient a priori map Fg; extracting a texture feature of said image, so as to obtain a texture feature map Fv; extracting a structural feature of said image, so as to obtain a structural feature map Fs; performing contact splicing on the gradient a priori map Fg, the texture feature map Fv and the structural feature map Fs, so as to obtain a multi-feature splicing map Fc; and performing a convolution fusion operation on the multi-feature splicing map Fc, so as to obtain a multi-feature fusion map Ff. By means of the method, edge feature enhancement is first performed on an image to be processed, and a texture feature and a structural feature are then fused, such that a target feature in the image can be recognized more accurately.

Description

An image processing method, target recognition model training method and target recognition method

Technical field

The invention belongs to the technical field of target recognition, and more specifically, relates to an image processing method, a target recognition model training method and a target recognition method.

Background technique

With the development of information technology, image-based target detection technology is more and more widely used, such as facial recognition and vehicle obstacle recognition. Due to different application scenarios, the same target recognition model is difficult to be used in all areas that require target recognition. For example, in the construction of tunnel engineering, it is necessary to quickly and accurately detect and locate the voids and voids that may appear inside the tunnel's transverse hole. At present, the methods for detecting abnormal geological structures in tunnels are mainly based on human subjective manual frame selection and mark counting from the geological radar scanning spectrogram. Such mark counting method requires high human judgment ability, and the difficulty lies in The radar spectrogram contains a large amount of interference information. Without training, it is difficult for people to quickly distinguish the abnormal state of the structure inside the tunnel such as voids and cavities. Therefore, the traditional method has the disadvantage of time-consuming and labor-consuming. There are no target recognition models and methods that have been applied to tunnel geological defect recognition in the technology.

After searching, Chinese invention patent, publication number: 102254165A, publication date: November 23, 2011, disclosed a hand vein recognition method based on the fusion of structure coding features and texture coding features, including step one, image acquisition and image pre-processing. Processing; step two, extract structure coding features; step three, extract hand vein texture coding features; step four, structure coding features and texture coding features fusion; step five, identify through the classifier, and get the result. Among them, the image is first processed for feature enhancement, and then the target is recognized by the classifier. The original image collected in this recognition method is the photo of the back of the hand taken by optical imaging, but the target imaging that needs to be detected in the tunnel detection is through the radar detection band signal Then the digital information obtained by analog-to-digital conversion is presented in the form of images. Therefore, the above recognition methods cannot be simply applied to the field of tunnel defect recognition. Due to the wide variety of underground objects, the wide distribution and the complex and changeable geological conditions, the radar spectrogram There are a lot of fringes and noise. These interference information greatly affect the recognition of geological defects.

Therefore, there is a need for a method for processing the geological radar spectrum image obtained in the tunnel structure abnormality detection, and identifying the geological defects of the tunnel in the map.

Summary of the invention

1. The problem to be solved

Aiming at the problem that the radar spectrogram in the existing tunnel geological defect detection has a large number of stripes and noises, which greatly affects the recognition of geological defects, the present invention provides an image processing method, a target recognition model training method and a target recognition method , Used to identify the geological defects in the radar spectrogram.

2. Technical scheme

To solve the above-mentioned problems, the present invention adopts the following technical solutions.

An image processing method includes the following steps:

Extract the edge features of the image to be processed to obtain the gradient prior image Fg;

Extract the texture feature of the image to be processed to obtain the texture feature map Fv;

Extract the structural features of the image to be processed to obtain the structural feature map Fs;

Connect the gradient prior map Fg, the texture feature map Fv and the structural feature map Fs to contact stitching to obtain a multi-feature stitching image Fc;

The convolution fusion operation is performed on the multi-feature mosaic image Fc to obtain the multi-feature fusion image Ff.

Further, the gradient transformation algorithm is used to extract the edge features of the image to be processed, and the method of the gradient transformation algorithm to extract the edge features is:

Let f(x,y) be the gray value of the (x,y) point on the image to be processed, G(x) is the image gray value of the edge detection in the x direction of the image to be processed, and G(y) is the image to be processed The image gray value of edge detection in the y direction, a is the convolution factor of G(x), b is the convolution factor of G(y), then the image gray value of each pixel on the gradient prior image Fg is :

The edge feature on the gradient priori map Fg is presented by the gray value of each pixel.

Further, the following steps are used when extracting texture features of the image to be processed:

S1: Perform n consecutive convolution operations on the image to be processed, and obtain convolutional layer v1, convolutional layer v2, ..., convolutional layer vn in sequence;

S2: Perform another convolution operation on the convolutional layer v1 to obtain the convolutional layer v1';

S3: The texture feature map Fv is obtained by adding the convolutional layer v1 and the convolutional layer v1'.

Further, the following steps are used when extracting structural features of the image to be processed:

S11: Perform one convolution operation on the image to be processed to obtain the convolution layer s1;

S12: Perform two consecutive m-1 convolution operations on the convolutional layer s1 to obtain the convolutional layer s2, convolutional layer s3,..., convolutional layer sm, convolutional layer s2', and convolutional layer s3, respectively ',..., convolutional layer sm';

S13: The convolutional layer sm and the convolutional layer sm' in step S12 are spliced through contact to obtain the convolutional layer sm+1;

S14: The convolutional layer sm+1 in step S13 and the convolutional layer sm-1', convolutional layer sm-2',..., convolutional layer s2' in step S12 are continuously spliced through contact, and finally the convolutional layer is obtained. Build-up s2m-1;

S15: Join the convolutional layer s1 in S11 and the convolutional layer s2m-1 in step S14 through contact to obtain the convolutional layer s2m, and perform a convolution operation on the convolutional layer s2m to obtain Fs.

Further, in the step S1, the first convolution operation is performed on the image to be processed, and the convolution kernel is a 3*3 matrix when the convolution layer v1 is obtained, and the convolution kernel of the convolution operation in the step S2 is 5* 5 Matrix.

Further, in the step S15, the convolution kernel is a 1*1 matrix for the convolution operation of the convolution layer s2m.

Because the radar spectrogram has a lot of redundant information such as noise, this interference information greatly affects the recognition of geological defects. Therefore, in order to improve the recognition accuracy, the original image needs to be feature-enhanced, taking into account the image of the radar spectrogram Features and the shape and structure of geological defects. Here, the edge feature, texture feature and structural feature in the image are separately extracted and merged into a single image for recognition operation, which greatly improves the recognition efficiency of geological defects.

The present invention provides a method for training a target recognition model, which includes the following steps:

S21: Cut several images obtained by the ground penetrating radar to obtain a tunnel scan image I;

S22: Use the tunnel scan image I as a to-be-processed image to obtain a multi-feature fusion image Ff through the above image processing method;

S23: Perform a down-sampling operation on the multi-feature fusion map Ff to obtain the corresponding down-sampled feature map Ff';

S24: Select a priori boxes with different aspect ratios on Ff and Ff', and use the position information (Xi, Yi, Wi, Hi) of each a priori box in the figure and the corresponding abnormal state type Ci as a label mark;

S25: Input a priori box with a label as a training sample into a pre-recognition model for training to obtain a target recognition model.

Further, the pre-identification model is a machine self-learning model, including one or more combinations of neural network, convolutional neural network, deep neural network, and feedback neural network.

Further, in the step S23, the multi-feature fusion map Ff is down-sampled and doubled to obtain Ff'.

After the feature enhancement processing of the image, the target recognition model needs to be imported to recognize the geological defects. The target recognition model requires a lot of training in the early stage. The training samples are the images that have been labeled with the defect location and type. The size is different, so here we take two different sizes of the multi-feature fusion map Ff and the down-sampled feature map Ff' for recognition training. In the pre-recognition model, the target feature is classified and recognized by the classifier, and then non-maximum The value suppression calculation is large to realize the determination of the most approximate a priori box.

The present invention also provides a target recognition method, which includes the following steps:

Receive the image to be processed;

A target recognition model is used to recognize the target object in the image to be recognized; wherein, the target recognition model is pre-trained using the above-mentioned target recognition model training method.

The target recognition method proposed here is mainly based on the target recognition model trained in this technical solution, which can perform target recognition on the original image similar to the radar spectrogram, compared with the Faster-R-CNN, SSD and Yolov3 in the prior art. And other algorithm models have faster recognition speed.

3. Beneficial effects

Compared with the prior art, the beneficial effects of the present invention are:

(1) The present invention obtains the gradient prior map Fg, the texture feature map Fv and the structural feature map Fs by feature extraction of the image to be processed, and then stitches the above three images and convolution fusion to obtain the multi-dimensional feature fusion map Ff, which can reduce The influence of noise in the radar spectrogram can also display the edges of the fringe more clearly, and it can also include texture features that are greatly affected by the size of the image, such as fringes in the radar spectrogram, and structural features that are less affected by size changes. , Help to improve the accuracy of target recognition;

(2) The present invention uses a 5*5 convolution kernel to perform convolution operation on conv_v1 in the texture feature extraction to obtain conv_v1', and add conv_v1' and conv_vn to obtain the texture feature map Fv, where conv_v1' retains more The original features of, and the receptive field is larger than conv_v1, which is convenient to better retain the texture features in the original image and improve the accuracy of target recognition; and in the process of structural feature extraction, the original image is first convolved to obtain conv_s1, Then conv_s1 is processed with an hourglass structure that shrinks and then expands, which better preserves the structural features in the original image file, and further improves the accuracy of target recognition.

Description of the drawings

Figure 1 is a schematic flow diagram of an image processing method;

Figure 2 is a schematic diagram of the process of texture feature extraction in an image processing method;

Fig. 3 is a schematic diagram of the flow of structure feature extraction in an image processing method;

Figure 4 is a schematic flow diagram of a target recognition method;

Figure 5 is a schematic diagram of a bad cavity output in the identification of geological defects in tunnels by a target recognition method;

Fig. 6 is a schematic diagram of poor voids output by a target recognition method in the recognition of geological defects in tunnels.

detailed description

The present invention will be further described below in conjunction with specific embodiments.

Example 1

As shown in Figure 1 and Figure 4, before performing geological defect recognition, the image is feature-enhanced. Here, the edge feature, texture feature, and structural feature are enhanced by combining the image features of the radar spectrogram. These three features The extraction enhancement can be advanced in parallel without any order. The following respectively introduces the selection of edge feature, texture feature and structure feature extraction.

When extracting the edge features of the image to be processed, the gradient transformation algorithm is used, and the obtained gradient a priori map is named Fg, where the gradient transformation algorithm is the sobel operator. Let f(x,y) be (x,y) on the image to be processed ) Point gray value, G(x) is the image gray value of the edge detection in the x direction of the image to be processed, G(y) is the image gray value of the edge detection in the y direction of the image to be processed, a is G(x The convolution factor of ), b is the convolution factor of G(y), then the image gray value of each point on the gradient prior image Fg is:

In this technical solution, the values of a and b can be selected as a 3*3 matrix, which is specifically:

The method used when extracting texture features in the image to be processed is:

S1: Perform n consecutive convolution operations on the image to be processed to obtain convolutional layers conv_v1, conv_v2, ..., conv_vn;

S2: Perform another convolution operation on the convolutional layer conv_v1 to obtain conv_v1';

S3: The texture feature map Fv is obtained by adding conv_vn and conv_v1';

Referring to Figure 2, in this technical solution, in order to better extract the original detailed texture features in the image, a scale-invariant multi-layer CNN module based on transitional convolution is designed, because the details of the texture and the The resolution has a strong matching relationship. When the resolution is different, the texture feature must be different. Therefore, the resolution is unchanged when the texture feature is extracted in this solution. The image to be processed is the tunnel scan image I. In the deep convolutional neural network, as the number of convolutions increases, the gradient disappears, and the more the number of convolutions, the slower the calculation efficiency. In this technical solution, n= 5. Since the original image output by the geological radar is a three-channel black-and-white image, before inputting this model, the original image needs to be converted into a single-channel grayscale image. When performing convolution operations on the converted single-channel grayscale image , Using a 3*3 convolution kernel, in order to ensure the resolution is unchanged, the step size of the convolution is 1, and then while ensuring the resolution, the texture feature extraction is performed. The specific information of each convolution is as follows:

Table 1 Texture feature extraction convolution information table

卷积层Convolutional layer	步长Step size	卷积核Convolution kernel	通道数Number of channels
conv_v1conv_v1	11	3333	4848
conv_v2conv_v2	11	3333	4848
conv_v3conv_v3	11	3333	4848
conv_v4conv_v4	11	3333	4848
conv_v5conv_v5	11	3333	4848
conv_v1＇conv_v1'	11	5555	4848

Since a short branch single convolution operation is performed in the conv_v1 conv_v1 and the obtained conv_v1' convolution layer is used as a transitional convolution layer, the conv_v1' convolution factor parameter used when obtaining conv_v1' is a convolution kernel of 5*5, Therefore, conv_v1' has a larger receptive field than conv_v1, and can learn original features different from conv_v1 in it, and conv_v1' is convolved from conv_v1. Compared with conv_v1' through other convolution layers, conv_v1' More original features are retained. When adding them to conv_vn, the texture features in the original image can be better retained, thereby improving the accuracy of the recognition of geological defects. The texture feature map obtained by the above method is named Fv.

The method used when extracting the structural features in the image to be processed is:

S11: Perform a convolution operation on the image to be processed to obtain conv_s1;

S12: Perform two consecutive m-1 convolution operations on conv_s1 to obtain conv_s2, conv_s3,..., conv_sm and conv_s2', conv_s3',..., conv_sm' respectively;

S13: splicing conv_sm and conv_sm' in step S12 through contact to obtain conv_sm+1;

S14: Conv_sm+1 in step S13 and conv_sm-1', conv_sm-2',..., conv_s2' in step S12 are continuously spliced through contact, and finally conv_s2m-1 is obtained;

S15: splicing conv_s1 in S11 with conv_s2m-1 in step S14 through contact to obtain conv_s2m, and perform a convolution operation on conv_s2m to obtain Fs.

3, in this technical solution, the image to be processed is also a tunnel scan image I that has undergone gray-scale processing, where m=3, and the specific information of each convolution is as follows:

Table 2 Structure feature extraction convolution information table

卷积层Convolutional layer	步长Step size	卷积核Convolution kernel	通道数Number of channels
conv_s1conv_s1	11	3333	3232
conv_s2conv_s2	22	3333	6464
conv_s3conv_s3	22	3333	128128
conv_s2＇conv_s2'	22	3333	256256
conv_s3＇conv_s3'	22	3333	128128
conv_s4conv_s4	22	3333	6464
conv_s5conv_s5	22	3333	3232
conv_s6conv_s6	11	3333	3232

The structural feature has a certain scale invariance, that is, when the scale changes within a certain range, its structural feature remains basically unchanged. Since structural features of different sizes may be easily identified on different convolutional layers, if only conv_sm or conv_sm' is used for identification, information will be lost, and structural features of different sizes cannot be well recognized. The Fs obtained by the structural feature extraction method in the scheme not only retains the information of all convolutional layers, but also the size of the original input image, which is convenient for subsequent better identification of structural features. When performing convolution operation on conv_s2m in step S15, a 1*1 matrix convolution kernel is used to convert the multi-channel convolution layer into a single channel. The structural feature map obtained by the above method is named Fs.

Through the above operations, we have obtained the gradient prior map Fg after the edge feature enhancement, as well as the texture feature map Fv and the structural feature map Fs. The above three feature maps are extracted and enhanced for different features in the original image, in order to be more accurate To identify geological defects, the above feature maps need to be spliced and fused. Here, the contact algorithm is used to splice the three feature maps to obtain a multi-feature spliced map Fc, where Fc is defined as: Fc=Concat((Fg,Fv,Fs), 0). When performing convolution and fusion operations on Fc, the following method is used to sequentially perform the same number of down-sampling convolution operations and up-sampling convolution operations on Fc to obtain Ff, in order to make full use of the three feature information extracted previously ，Ff here is designed to be the same three-channel as Fc. In this technical solution, because the image features of the abnormal target have multiple size levels, in order to detect targets of different sizes, the Fc is continuously down-sampled twice, and then the Fc is continuously up-sampled twice. To restore, the convolution kernel is a 3*3 matrix, the convolution step size is 2, and the information of each convolution is:

Table 3 Feature Fusion Convolution Information Table

卷积层Convolutional layer	步长Step size	卷积核Convolution kernel	通道数Number of channels
conv_1conv_1	22	3333	6464
conv_2conv_2	22	3333	128128
conv_3conv_3	22	3333	128128
conv_4conv_4	22	3333	6464

After the above convolution operation, a large amount of redundant information in the original image is discarded, and the edge feature, texture feature and structural feature are all enhanced, which improves the reliability of subsequent target recognition and also improves the confidence in the image. The noise ratio is conducive to improving the efficiency of subsequent target recognition.

Before identifying the geological defects in the tunnel, the target recognition model must be trained. The target recognition model training method in this technical solution includes the following steps:

S21: Cut several images obtained by the geological radar to obtain the tunnel scan image I. In order to facilitate the calculation of splicing and other requirements, a fixed-size crop of equal proportions is used here;

S23: The multi-feature fusion map Ff is down-sampled and doubled to obtain the corresponding down-sampled feature map Ff'. Because the image features of the abnormal target have multiple size levels, in order to detect targets of different sizes, we will The feature map of the detection regression search is searched on two scales;

S24: Select a priori boxes with different aspect ratios on Ff and Ff', and use the position information (Xi, Yi, Wi, Hi) of each a priori box in the figure and the corresponding abnormal state type Ci as a label Mark; where i is the sequence number of the picture, X and Y are the position coordinates of the a priori frame, and W and H are the width and height of the a priori frame;

In this technical solution, the pre-identification model is a self-learning machine model, including one or more combinations of neural networks, convolutional neural networks, deep neural networks, and feedback neural networks. Due to the different sizes of geological defects, two different sizes of image files of multi-feature fusion map Ff and down-sampling feature map Ff' are used for recognition training. During training, the six sets of a priori boxes are respectively detected and regression under the corresponding scale feature map to obtain the target detection frame coordinates, the corresponding state digital type and the confidence level, and finally the results of the detection on different scales are passed through the NMS The non-maximum suppression screening obtains the final target detection result and outputs it.

After the target recognition model training is completed, the geological defect can be automatically recognized and detected on the radar spectrogram that has been cropped in equal proportions. The target recognition method here includes the following steps:

Receive the image to be processed, that is, the tunnel scan image I cropped in equal proportions;

A target recognition model is used to recognize the target object in the image to be recognized; wherein, the target recognition model is pre-trained using the above method. Referring to Figures 5 and 6, the tunnel scan map after image processing has greatly reduced noise, and edge features, texture features, and structural features are presented more clearly, so the speed and accuracy of recognition can be improved.

At present, in the field of target detection, deep learning-based methods basically use a single network for feature extraction. Although this approach can solve the problem of general scenarios to a certain extent, it is for target detection in special scenarios, such as tunnel detection in this technical solution. In the image, the target imaging that needs to be detected is not obtained by optical imaging, but the digital information obtained by detecting the band signal by radar and then using analog-to-digital conversion is presented in the form of an image. There are a lot of stripes and noise in the picture. Therefore, from it Combining image characteristics at the source of the signal, this technical solution designs a set of multi-dimensional feature extraction, which extracts and merges the gradient prior map Fg, texture feature map Fv, and structural feature map Fs respectively. From Table 4 below, compare the detection results It can be seen from the above that the multi-dimensional feature fusion decision has a significant performance improvement on the detection effect in the special scene of tunnel anomaly detection.

Table 4 Comparison of performance results

The creation of the present invention and its implementation are described schematically above. The description is not restrictive. The present invention can be implemented in other specific forms without departing from the spirit or basic characteristics of the present invention. What is shown in the drawings is only one of the embodiments created by the present invention, and the actual structure is not limited to this, and any reference signs in the claims should not limit the related claims. Therefore, if a person of ordinary skill in the art receives its enlightenment, and does not deviate from the purpose of this creation, without creative design, structural methods and embodiments similar to the technical solution should fall within the scope of protection of this patent. In addition, the word "comprising" does not exclude other elements or steps, and the word "a" before an element does not exclude the inclusion of "plurality" of the element. Multiple elements stated in the product claims can also be implemented by one element through software or hardware. Words such as first and second are used to denote names, but do not denote any specific order.

Claims

An image processing method, characterized in that it comprises the following steps:

Extract the edge features of the image to be processed to obtain the gradient prior image Fg;

Extract the texture feature of the image to be processed to obtain the texture feature map Fv;

Extract the structural features of the image to be processed to obtain the structural feature map Fs;

Connect the gradient prior map Fg, the texture feature map Fv and the structural feature map Fs to contact stitching to obtain a multi-feature stitching image Fc;

The convolution fusion operation is performed on the multi-feature mosaic image Fc to obtain the multi-feature fusion image Ff.
An image processing method according to claim 1, wherein the gradient transformation algorithm is used to extract the edge features of the image to be processed, and the method of the gradient transformation algorithm to extract the edge features is:

Let f(x,y) be the gray value of the (x,y) point on the image to be processed, G(x) is the image gray value of the edge detection in the x direction of the image to be processed, and G(y) is the image to be processed The image gray value of edge detection in the y direction, a is the convolution factor of G(x), b is the convolution factor of G(y), then the image gray value of each pixel on the gradient prior image Fg is :

The edge feature on the gradient priori map Fg is presented by the gray value of each pixel.
An image processing method according to claim 2, wherein the following steps are used when extracting texture features of the image to be processed:

S1: Perform n consecutive convolution operations on the image to be processed, and obtain convolutional layer v1, convolutional layer v2, ..., convolutional layer vn in sequence;

S2: Perform another convolution operation on the convolutional layer v1 to obtain the convolutional layer v1';

S3: The texture feature map Fv is obtained by adding the convolutional layer v1 and the convolutional layer v1'.
An image processing method according to claim 3, wherein the following steps are used when extracting structural features of the image to be processed:

S11: Perform one convolution operation on the image to be processed to obtain the convolution layer s1;

S12: Perform two consecutive m-1 convolution operations on the convolutional layer s1 to obtain the convolutional layer s2, convolutional layer s3,..., convolutional layer sm, convolutional layer s2', and convolutional layer s3, respectively ',..., convolutional layer sm';

S13: The convolutional layer sm and the convolutional layer sm' in step S12 are spliced through contact to obtain the convolutional layer sm+1;

S14: The convolutional layer sm+1 in step S13 and the convolutional layer sm-1', convolutional layer sm-2',..., convolutional layer s2' in step S12 are continuously spliced through contact, and finally the convolution is obtained Build-up s2m-1;

S15: Join the convolutional layer s1 in S11 and the convolutional layer s2m-1 in step S14 through contact to obtain the convolutional layer s2m, and perform a convolution operation on the convolutional layer s2m to obtain Fs.
An image processing method according to claim 3 or 4, characterized in that, in the step S1, the image to be processed is subjected to the first convolution operation, and the convolution kernel is a 3*3 matrix when the convolution layer v1 is obtained , The convolution kernel of the convolution operation in the step S2 is a 5*5 matrix.
An image processing method according to claim 4, wherein the convolution operation of the convolution layer s2m in the step S15 adopts a convolution kernel as a 1*1 matrix.
A method for training a target recognition model is characterized in that it comprises the following steps:

S21: Cut several images obtained by the ground penetrating radar to obtain a tunnel scan image I;

S22: Use the tunnel scan image I as a to-be-processed image to obtain a multi-feature fusion image Ff through the image processing method of any one of claims 1-6;

S23: Perform a down-sampling operation on the multi-feature fusion map Ff to obtain the corresponding down-sampled feature map Ff';

S24: Select a priori boxes with different aspect ratios on Ff and Ff', and mark the position information of each a priori box in the figure and the corresponding abnormal state type as a label;

S25: Input a priori box with a label as a training sample into a pre-recognition model for training to obtain a target recognition model.
The method for training a target recognition model according to claim 7, wherein the pre-recognition model is a machine self-learning model, including one of a neural network, a convolutional neural network, a deep neural network, and a feedback neural network, or Multiple combinations.
A target recognition model training method according to claim 7, characterized in that, in the step S23, the multi-feature fusion map Ff is down-sampled and doubled to obtain Ff'.
A target recognition method is characterized in that it comprises the following steps:

Receive the image to be processed;

A target recognition model is used to recognize the target object in the image to be recognized; wherein the target recognition model is obtained by pre-training using the method of any one of claims 7-9.