WO2023060575A1 - Image recognition method and apparatus, and electronic device and storage medium - Google Patents


Info

Publication number
WO2023060575A1
Authority
WO
WIPO (PCT)
Prior art keywords
image
similarity
block
recognized
information
Prior art date
Application number
PCT/CN2021/124169
Other languages
French (fr)
Chinese (zh)
Inventor
许震宇
张锲石
程俊
康宇航
任子良
Original Assignee
中国科学院深圳先进技术研究院
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by 中国科学院深圳先进技术研究院 filed Critical 中国科学院深圳先进技术研究院
Priority to PCT/CN2021/124169 priority Critical patent/WO2023060575A1/en
Publication of WO2023060575A1 publication Critical patent/WO2023060575A1/en

Classifications

    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06VIMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V10/00Arrangements for image or video recognition or understanding
    • G06V10/70Arrangements for image or video recognition or understanding using pattern recognition or machine learning
    • G06V10/74Image or video pattern matching; Proximity measures in feature spaces

Definitions

  • The present application belongs to the technical field of image processing, and in particular relates to an image recognition method and apparatus, an electronic device, and a storage medium.
  • the image recognition process includes: collecting an image corresponding to an area currently to be recognized; and determining an image recognition result according to the similarity between the collected image and the pre-stored image in the image library.
  • the robustness and accuracy of current image recognition are low.
  • Embodiments of the present application provide an image recognition method, device, electronic device, and storage medium, so as to solve the problem of low robustness and low accuracy of image recognition in the prior art.
  • the first aspect of the embodiments of the present application provides an image recognition method, including:
  • dividing the image to be processed into a preset number of block images, wherein the image to be processed includes an image to be recognized and a reference image; the block images corresponding to the image to be recognized are block images to be recognized, and the block images corresponding to the reference image are reference block images; each block image to be recognized has a reference block image with one-to-one corresponding position information;
  • performing feature extraction processing on each block image to obtain block feature information corresponding to each block image;
  • determining a similarity weight between the image to be recognized and the reference image according to the position information and corresponding block feature information of each block image;
  • determining a target similarity between the image to be recognized and the reference image according to each piece of block feature information and the similarity weight; and
  • determining a recognition result of the image to be recognized according to the target similarity.
  • the second aspect of the embodiments of the present application provides an image recognition device, including:
  • a segmentation unit configured to divide the image to be processed into a preset number of block images, wherein the image to be processed includes an image to be recognized and a reference image; the block images corresponding to the image to be recognized are block images to be recognized, and the block images corresponding to the reference image are reference block images; each block image to be recognized has a reference block image with one-to-one corresponding position information;
  • a feature extraction unit configured to perform feature extraction processing on each block image to obtain block feature information corresponding to each block image;
  • a similarity weight determining unit configured to determine a similarity weight between the image to be recognized and the reference image according to the position information and corresponding block feature information of each block image;
  • a target similarity determining unit configured to determine a target similarity between the image to be recognized and the reference image according to each piece of block feature information and the similarity weight; and
  • a recognition result determining unit configured to determine a recognition result of the image to be recognized according to the target similarity.
  • the third aspect of the embodiments of the present application provides an electronic device, including a memory, a processor, and a computer program stored in the memory and operable on the processor.
  • the processor executes the computer program
  • the electronic device is made to implement the steps of the image recognition method.
  • The fourth aspect of the embodiments of the present application provides a computer-readable storage medium storing a computer program; when the computer program is executed by a processor, the electronic device implements the steps of the image recognition method.
  • a fifth aspect of the embodiments of the present application provides a computer program product, which, when the computer program product is run on an electronic device, causes the electronic device to execute the image recognition method described in the first aspect above.
  • Since the image to be processed is first segmented into block images and feature extraction is then performed, the detailed feature information of the image can be extracted more accurately, so that the subsequent similarity calculation can be performed more accurately according to the feature information of each block, thereby improving the accuracy of image recognition.
  • Since the block feature information of block images located at different positions in the image to be processed can represent the features of the image to be processed under different viewing angles, the similarity weight determined based on the position information and block feature information of each block image can reflect the similarity information of the image to be recognized and the reference image under different viewing angles. Therefore, the target similarity obtained based on the similarity weight is a similarity that is robust to viewing-angle changes, and the recognition result obtained according to the target similarity overcomes the influence of viewing-angle changes caused by the shooting angle of the camera, so that the robustness and accuracy of image recognition can be improved.
  • FIG. 1 is a schematic diagram of an implementation flow of an image recognition method provided in an embodiment of the present application
  • FIG. 2 is a schematic diagram of a similarity weight construction process provided by an embodiment of the present application.
  • FIG. 3 is a schematic diagram of a matching process between an image to be recognized and a reference image provided in an embodiment of the present application
  • FIG. 4 is a schematic flow diagram of training a target model based on a triplet method provided by an embodiment of the present application
  • FIG. 5 is a schematic diagram of an image recognition device provided by an embodiment of the present application.
  • FIG. 6 is a schematic diagram of an electronic device provided by an embodiment of the present application.
  • FIG. 1 is a schematic flowchart of an image recognition method provided by an embodiment of the present application.
  • The execution subject of the image recognition method in this embodiment is an electronic device, which includes but is not limited to computing devices such as smart phones, tablet computers, desktop computers, and servers.
  • The image recognition method shown in FIG. 1 includes:
  • The image to be processed is divided into a preset number of block images, wherein the image to be processed includes an image to be recognized and a reference image; the block image corresponding to the image to be recognized is a block image to be recognized, the block image corresponding to the reference image is a reference block image, and each block image to be recognized has a reference block image with one-to-one corresponding position information.
  • The image to be recognized is an unknown image that currently needs to be recognized (that is, the entity information contained in the image is unknown), and the reference image is a known image pre-stored in the image library (that is, the entity information contained in the image is known).
  • The recognition process for the current image to be recognized can be summarized as follows: compare the features of the image to be recognized with those of the reference image to determine whether they match; if they match, use the known entity information of the reference image as the identification information of the image to be recognized.
  • an image to be recognized and a reference image may be obtained and combined into a pair of images to be processed.
  • the image to be recognized can be received from the shooting device, and the reference image can be obtained from a preset image library.
  • The preset number is a preset value used for segmentation; for example, the preset number may be 4.
  • the above-mentioned division is specifically equal division. That is, for each image to be processed, it is equally divided according to a preset number to obtain a preset number of equally divided block images corresponding to the image to be processed.
  • The image to be recognized is equally divided into a preset number of parts to obtain a preset number of block images corresponding to the image to be recognized, and each block image corresponding to the image to be recognized is called a block image to be recognized.
  • the reference image is equally divided into a preset number of parts to obtain a preset number of block images corresponding to the reference image, and the block image corresponding to the reference image is called a reference block image. For each block image to be recognized in the image to be recognized, there is a reference block image with one-to-one correspondence of position information in the reference image.
  • Feature extraction processing is performed on each block image to be identified to obtain the corresponding feature information, which is referred to as the feature information of the block to be identified.
  • Feature extraction processing is performed on each reference block image to obtain the corresponding feature information, which is referred to as reference block feature information.
  • the feature extraction processing of the segmented image can be realized by a pre-trained neural network model.
  • each segmented image obtained by segmentation may be subjected to feature extraction processing one by one in sequence.
  • the feature extraction process can be performed on more than one (or even all the block images) at a time, thereby improving the efficiency of the feature extraction process.
  • the location information of the block image refers to information about the location of the block image in the original image to be processed.
  • The location information can be represented by (j, k), where j indicates that the block image is located in the jth row of the image to be processed, and k indicates that the block image is located in the kth column of the image to be processed.
  • the block images obtained by segmenting the image to be processed are numbered sequentially from left to right and then from top to bottom, and the number information i is used as the position information of the block images.
  • the image to be processed is divided into four divided images, and the position information of each divided image is represented by 1, 2, 3, and 4 in sequence.
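As an illustrative sketch of the equal division and the left-to-right, top-to-bottom numbering described above (assuming a 2×2 grid, i.e. a preset number of 4; the function name is hypothetical):

```python
import numpy as np

def split_into_blocks(image, rows=2, cols=2):
    """Equally divide an H x W x C image into rows * cols block images,
    numbered 1..rows*cols from left to right and then top to bottom."""
    h, w = image.shape[:2]
    bh, bw = h // rows, w // cols
    blocks = {}
    for j in range(rows):          # row index of the block
        for k in range(cols):      # column index of the block
            i = j * cols + k + 1   # position number used as location information
            blocks[i] = image[j * bh:(j + 1) * bh, k * bw:(k + 1) * bw]
    return blocks

image = np.zeros((224, 224, 3), dtype=np.uint8)   # image to be processed
blocks = split_into_blocks(image)                 # preset number = 4
```

Applying the same segmentation to the image to be recognized and to the reference image guarantees that blocks with the same number have one-to-one corresponding position information.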
  • According to the position information of each block image in the original image to be processed and the block feature information corresponding to each block image, the similarities between the block images to be identified and the reference block images, between the block images to be identified themselves, and between the reference block images themselves are obtained, and these similarities are combined to obtain the similarity relationship between the block images.
  • Based on this similarity relationship, the similarity weight between the image to be recognized and the reference image can be determined; the similarity weight can represent the similarity relationship information between the block images.
  • The block feature information includes the feature information of the blocks to be identified and the feature information of the reference blocks; the similarity calculation between the image to be identified and the reference image is performed by combining the feature information of each block to be identified, the feature information of each reference block, and the similarity weight to obtain the target similarity.
  • a recognition result of the image to be recognized is determined according to the target similarity.
  • the recognition result of the image to be recognized is determined according to the value of the target similarity. In one embodiment, if the target similarity is less than a preset similarity threshold, it is determined that the recognition result of the image to be recognized is a recognition failure. In another embodiment, if the target similarity is greater than or equal to the preset similarity threshold, it is determined that the recognition result of the image to be recognized is a successful recognition, and according to the known entity information in the reference image corresponding to the image to be recognized , to determine the identification information of the image to be identified.
  • For example, if the entity information of the matched reference image is "puppy", the identification information of the image to be recognized is determined to be "puppy"; similarly, if the entity information of the reference image is "place A" (that is, the reference image is an image corresponding to place A), the identification information of the image to be recognized is "place A".
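The threshold decision described above amounts to a simple rule; a minimal sketch (the 0.8 threshold and the function name are placeholders, not values from the application):

```python
def recognize(target_similarity, reference_entity_info, threshold=0.8):
    """Return the identification information on success, or None on failure."""
    if target_similarity < threshold:
        return None                    # recognition failure
    return reference_entity_info       # e.g. "place A" or "puppy"
```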
  • Since the image to be processed is first segmented into block images and feature extraction is then performed, the detailed feature information of the image can be extracted more accurately, so that the subsequent similarity calculation can be performed more accurately according to the feature information of each block, thereby improving the accuracy of image recognition.
  • Since the block feature information of block images located at different positions in the image to be processed can represent the features of the image to be processed under different viewing angles, the similarity weight determined based on the position information and block feature information of each block image can reflect the similarity information of the image to be recognized and the reference image under different viewing angles. Therefore, the target similarity obtained based on the similarity weight is a similarity that is robust to viewing-angle changes, and the recognition result obtained according to the target similarity overcomes the influence of viewing-angle changes caused by the shooting angle of the camera, so that the robustness and accuracy of image recognition can be improved.
  • step S102 includes:
  • For each of the block images, inputting the block image into a trained convolutional neural network for processing to obtain initial feature information corresponding to the block image;
  • feature extraction processing is performed on the segmented image through a trained convolutional neural network.
  • Each block image is sequentially input into the trained convolutional neural network, and the initial feature information corresponding to the block image is obtained through the convolution processing of each layer of the convolutional neural network.
  • the trained convolutional neural network may be AlexNet.
  • The image size is scaled to 3×224×224 to obtain an image I m in RGB (Red, Green, Blue) color mode, and the image I m is input into AlexNet for feature extraction processing. Since the features output by the fifth convolutional layer (Conv_5) of AlexNet give the best location recognition effect, the feature with a dimension of 256×6×6 output by Conv_5 is extracted as the initial feature information to improve the accuracy of feature extraction.
  • dimensionality reduction processing is performed on the initial feature information to obtain feature information with a lower dimension as the block feature information of the block image.
  • This dimensionality reduction process can be implemented by operations such as pooling or downsampling.
  • Since the features of the block image can be accurately extracted through the convolutional neural network and the block feature information can be obtained through dimensionality reduction processing, the accuracy of feature extraction can be ensured while reducing the system resource consumption of subsequent calculations and improving image recognition efficiency. Moreover, since the block feature information after dimensionality reduction is more robust than feature information of excessively high dimensionality, the success rate of image recognition can be improved based on this robustness.
  • the pooling operation can reduce the dimensionality of feature information through simple calculations, while ensuring the accuracy of subsequent recognition.
  • the average pooling operation captures all the details of the entire scene of the image. Compared with the maximum pooling operation, it can better extract the features of the image scene.
  • The Adaptive Average Pooling (AAP) layer is a processing layer that adaptively performs an average pooling operation on input feature information to produce an output of a specified size. Therefore, inputting the initial feature information into the adaptive average pooling layer can reduce the dimensionality of the initial feature information.
  • the dimensionality reduction autoencoder in the embodiment of the present application is specifically an autoencoder for reducing the dimension of feature information.
  • An autoencoder (AE) is a type of artificial neural network (ANN) used in semi-supervised learning and unsupervised learning; its function is to perform representation learning on the input information by using the input information itself as the learning target.
  • The autoencoder consists of two parts: an encoder (Encoder) and a decoder (Decoder).
  • the encoder can encode the input feature information to achieve information compression and reduce the dimension of the feature information; and the decoder in the autoencoder can restore the feature information compressed by the encoder.
  • The encoder part of the dimensionality reduction autoencoder is used to realize dimensionality reduction processing of the feature information, so that image recognition based on the feature information output by the encoder can be more robust and efficient.
  • the initial feature information of the block image can be input into the adaptive average pooling layer for processing, and then further input into the dimensionality reduction autoencoder for processing, so as to obtain the block feature information corresponding to the block image with a lower dimension .
  • The initial feature information with a feature dimension of 256×6×6 is input into the adaptive average pooling layer for average pooling processing and then flattened into a one-dimensional vector by the flatten function Flatten(), obtaining a pooled feature with a dimension of 1×4096.
  • The pooled feature is input into the dimensionality reduction autoencoder for processing, and the feature information with a dimension of 1×256 output by the encoder of the dimensionality reduction autoencoder is obtained as the block feature information of the block image.
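Because 256 × 4 × 4 = 4096, the 1×4096 pooled feature implies an adaptive pooling target of 4×4 (an inference from the dimensions, not stated explicitly above). A NumPy sketch of the pipeline, with random untrained weights standing in for the trained encoder:

```python
import numpy as np

def adaptive_avg_pool2d(x, out_h, out_w):
    """NumPy re-implementation of adaptive average pooling for a C x H x W map."""
    c, h, w = x.shape
    out = np.zeros((c, out_h, out_w))
    for i in range(out_h):
        h0, h1 = (i * h) // out_h, ((i + 1) * h + out_h - 1) // out_h
        for j in range(out_w):
            w0, w1 = (j * w) // out_w, ((j + 1) * w + out_w - 1) // out_w
            out[:, i, j] = x[:, h0:h1, w0:w1].mean(axis=(1, 2))
    return out

rng = np.random.default_rng(0)
initial = rng.random((256, 6, 6))            # initial feature information
pooled = adaptive_avg_pool2d(initial, 4, 4)  # adaptive average pooling to 256 x 4 x 4
flat = pooled.reshape(1, -1)                 # Flatten() -> pooled feature, 1 x 4096

# Hypothetical (untrained) weights standing in for the trained
# dimensionality-reduction autoencoder's encoder.
W_enc = rng.standard_normal((4096, 256)) * 0.01
block_feature = flat @ W_enc                 # block feature information, 1 x 256
```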
  • the dimension of the initial feature information can be efficiently and accurately reduced, and the efficiency and accuracy of subsequent image recognition can be improved.
  • In an embodiment, before inputting each block image into the trained convolutional neural network to obtain the corresponding initial feature information, the method further includes: training the dimensionality reduction autoencoder.
  • The trained dimensionality reduction autoencoder is obtained by training the dimensionality reduction autoencoder to be trained.
  • a preset number of preset sample feature information may be obtained, and the preset sample feature information is input into the dimensionality reduction autoencoder to be trained, and the training of the dimensionality reduction autoencoder to be trained is started.
  • the preset sample characteristic information is the characteristic information of the block sample image obtained by performing segmentation and feature extraction processing on the sample image in advance.
  • the decoding feature information output by the decoder part of the dimensionality reduction autoencoder is obtained.
  • the decoded feature information is feature information obtained by decoding and restoring the encoded feature information output by the encoder of the dimensionality-reduced self-encoder.
  • According to a preset mean square error loss function, the mean square error between the input preset sample feature information and the decoded feature information restored by the decoder is calculated, and the parameters of the dimensionality reduction autoencoder to be trained are adjusted by back-propagation according to the calculation result. When the mean square error between the preset sample feature information and the feature information output by the decoder of the dimensionality reduction autoencoder is less than a preset threshold, the current training is stopped and the trained dimensionality reduction autoencoder is obtained.
  • the dimensionality reduction autoencoder is trained in advance, and an accurate trained dimensionality reduction autoencoder can be obtained, so that the subsequent dimensionality reduction processing can be accurately performed on the initial feature information according to the dimensionality reduction autoencoder, Improve the accuracy of image recognition.
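A minimal sketch of this training loop (the 4096→256 dimensions follow the text; the linear layers, Adam optimizer, learning rate, and random batch are assumptions):

```python
import torch
from torch import nn

torch.manual_seed(0)
encoder = nn.Linear(4096, 256)   # encoder of the dimensionality reduction AE
decoder = nn.Linear(256, 4096)   # decoder restores the compressed features
opt = torch.optim.Adam(list(encoder.parameters()) + list(decoder.parameters()),
                       lr=1e-3)
mse = nn.MSELoss()               # preset mean square error loss function

samples = torch.rand(32, 4096)   # preset sample feature information
initial_loss = None
for step in range(100):
    decoded = decoder(encoder(samples))  # decoded feature information
    loss = mse(decoded, samples)         # mean square error vs. the input
    if initial_loss is None:
        initial_loss = loss.item()
    opt.zero_grad()
    loss.backward()                      # back-propagate to adjust parameters
    opt.step()
# training would stop once the loss falls below the preset threshold
```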
  • step S103 includes:
  • A paired similarity feature vector and an unpaired similarity feature vector are determined, wherein the paired similarity feature vector includes similarity information between block images to be identified and reference block images whose position information corresponds, and the unpaired similarity feature vector includes similarity information between block images whose position information does not correspond;
  • the similarity weight is determined according to the paired similarity feature vector and the non-paired similarity feature vector.
  • Since the image to be recognized and the reference image are divided into a preset number of block images according to the same segmentation method, for each block image to be recognized of the image to be recognized, there is a reference block image with corresponding position information in the reference image.
  • For example, the image to be recognized I 1 is divided into four block images to be recognized I 11 , I 12 , I 13 , and I 14 ,
  • and the reference image I 2 is divided into four corresponding reference block images I 21 , I 22 , I 23 , and I 24 .
  • The reference block image whose position information corresponds to the block image to be identified I 11 is I 21 , that corresponding to I 12 is I 22 , and that corresponding to I 13 is I 23 ;
  • the reference block image whose position information corresponds to the block image to be identified I 14 is I 24 .
  • The first subscript of I distinguishes which image to be processed the current block image belongs to; the second subscript distinguishes which block of that image it is, and thus reflects the position information of the block image in the image to be processed.
  • For each block image to be identified and the reference block image whose position information corresponds to it, the similarity calculation is performed according to the block feature information of the two, and the pairwise similarity corresponding to each group of position-corresponding block images is obtained. The pairwise similarities are combined to obtain the pairwise similarity feature vector.
  • the similarity between block images whose positions do not correspond is called non-pairwise similarity.
  • Specifically, the non-pairwise similarities include: the similarity between each block image to be identified and each reference block image whose position information does not correspond (for example, between the above second block image to be identified and a reference block image with a different subscript); the similarity between block images to be recognized at different positions (that is, different block images obtained by segmenting the same image to be recognized); and the similarity between reference block images at different positions (that is, different block images obtained by segmenting the same reference image).
  • These unpaired similarities can be combined to obtain unpaired similarity feature vectors.
  • The similarity between two block images can be calculated by the preset formula (1), which is as follows:
  • C(x, y) = (x·y) / (‖x‖ * ‖y‖) (1)
  • where C(x, y) represents the normalized cosine similarity calculated from the block feature information x of the first block image and the block feature information y of the second block image;
  • ‖·‖ indicates the norm operation;
  • * indicates the multiplication sign;
  • the value range of the similarity obtained by solving the above formula (1) is [0,1].
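Formula (1) can be sketched directly (assuming non-negative block features, as produced after ReLU-style activations, the value falls in [0, 1]):

```python
import numpy as np

def normalized_cosine_similarity(x, y):
    """C(x, y) = (x . y) / (||x|| * ||y||)."""
    x, y = np.ravel(x), np.ravel(y)
    return float(np.dot(x, y) / (np.linalg.norm(x) * np.linalg.norm(y)))

print(normalized_cosine_similarity([1.0, 0.0], [1.0, 0.0]))  # 1.0 (identical)
print(normalized_cosine_similarity([1.0, 0.0], [0.0, 1.0]))  # 0.0 (orthogonal)
```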
  • Since the similarity relationship vector contains the similarities between block images in different positional relationships of the images to be processed, the similarity relationship vector can represent the similarity relationship between the block images corresponding to the image to be recognized and the reference image.
  • A weighting operation is performed according to the values of each element of the similarity relationship vector to obtain the similarity weight between the image to be recognized and the reference image.
  • the similarity weight can accurately represent the impact of visual changes between the image to be recognized and the reference image on the image similarity.
  • Since the similarity relationship between the position-based block images can be accurately expressed through the paired similarity feature vector and the unpaired similarity feature vector, the similarity weight applicable to the image similarity calculation can be solved based on this similarity relationship, improving the accuracy of the final similarity calculation and thereby the accuracy of image recognition.
  • the determining the similarity weight according to the paired similarity feature vector and the non-paired similarity feature vector includes:
  • the similarity weight is determined according to the paired similarity feature vector, the non-paired similarity feature vector, and a preset weight autoencoder.
  • the weighted autoencoder is an autoencoder trained in advance for determining the weights of the similarity relationship vectors.
  • The similarity relationship vector V is input into the preset weight autoencoder WAE for processing to obtain the weight WAE(V) corresponding to the similarity relationship vector.
  • the similarity relationship vectors are weighted and summed and then normalized to obtain a similarity weight with a value range of [0,1].
  • the similarity weight L can be obtained by the following formula (2):
  • V is the similarity relationship vector obtained by splicing
  • WAE(V) is the weight of the similarity relationship vector V output by the weight self-encoder
  • e is the natural base
  • w and t are preset parameter values, where t is based on the above
  • the current similarity weight can be accurately determined through the preset weight autoencoder, thereby improving the accuracy of subsequent image recognition.
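The exact form of formula (2) is not reproduced above; the following is only a plausible sketch consistent with the stated ingredients (a weighted sum of V by WAE(V), the natural base e, preset parameters w and t, and an output in [0, 1]). The vector length and all numeric values are placeholders:

```python
import numpy as np

rng = np.random.default_rng(0)
V = rng.random(28)         # spliced similarity relationship vector (length illustrative)
wae_of_v = rng.random(28)  # stand-in for WAE(V), the weight autoencoder output

w, t = 1.0, 0.0            # preset parameters (placeholder values)
s = float(np.dot(wae_of_v, V))            # weighted sum of the relationship vector
L = 1.0 / (1.0 + np.exp(-(w * s + t)))    # normalized into (0, 1) via the base e
```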
  • In an embodiment, the unpaired similarity feature vector includes a first similarity feature vector and a second similarity feature vector;
  • the first similarity feature vector includes similarity information between block images to be identified and reference block images whose position information does not correspond, and the second similarity feature vector includes similarity information between different block images in the same image to be processed.
  • Determining the similarity weight according to the paired similarity feature vector, the unpaired similarity feature vector, and the preset weight autoencoder includes:
  • determining a first similarity weight according to the paired similarity feature vector, the first similarity feature vector, and a preset first weight autoencoder; and
  • determining a second similarity weight according to the paired similarity feature vector, the second similarity feature vector, and a preset second weight autoencoder.
  • the aforementioned unpaired similarity feature vectors include a first similarity feature vector and a second similarity feature vector.
  • The first similarity feature vector includes similarity information between block images to be identified and reference block images whose position information does not correspond. For each block image to be recognized of the image to be recognized, any reference block image whose position number does not correspond to that of the block image to be identified can be selected from the reference image to form a group of images; similarity calculation is performed on each group of combined images to obtain the first similarities, and these unpaired first similarities are combined to obtain the first similarity feature vector. Due to the change of the shooting angle, the image position of the same actual physical location or actual object in the image to be recognized differs from its image position in a reference image taken from another shooting angle.
  • For example, the image region corresponding to building B is at the upper left of the image to be recognized, and after image segmentation, the image region corresponding to building B is in the first block image to be recognized I 11 .
  • the image area corresponding to the building B is at the lower right of the reference image, and after image segmentation, the image area corresponding to the building B is in the fourth reference block image I 24 .
  • the paired similarity eigenvectors calculated according to the position correspondence cannot reflect the similarity between images when the viewing angle changes.
  • Therefore, the first similarity feature vector can represent the similarity between the block images to be recognized and the reference block images when the viewing angle changes.
  • Any two different block images in the same image to be processed can be combined to obtain a group of images.
  • each second similarity can be obtained, and these second similarities can be combined to obtain a second similarity feature vector.
  • The image area occupied by the same actual physical location or actual object in the image to be recognized may be relatively large, and thus may span different block images.
  • For example, the image area corresponding to building D may simultaneously occupy the second block image to be recognized I 12 and the third block image to be recognized I 13 in the image to be recognized; similarly, the image area corresponding to building D may simultaneously occupy the second reference block image I 22 and the third reference block image I 23 in the reference image.
  • Therefore, the above-mentioned second similarity feature vector can represent the similarity between different image blocks in the same image to be processed and maintain the continuity of features between different image blocks.
  • suppose the to-be-identified block feature information corresponding to the to-be-identified block image numbered i in the image to be identified is x_i,
  • the reference block feature information corresponding to the reference block image numbered j in a reference image is y_j;
  • for example, the reference block feature information corresponding to the above-mentioned reference block image I_21 is y_1.
  • the above-mentioned paired similarity feature vector V_a can be expressed by formula (3): V_a = {C(x_i, y_j)} (i = j), where C(x_i, y_j) is the similarity, calculated by formula (1), between the to-be-identified block feature information and the reference block feature information whose position information corresponds, and { } represents a set operation.
  • Formula (3) indicates that the paired similarities corresponding to the image pairs formed by to-be-identified block images and reference block images with the same number are combined to obtain the paired similarity feature vector V_a.
  • the above-mentioned first similarity feature vector V b can be expressed by the following formula (4):
  • V_b = {C(x_i, y_j)} (i ≠ j)
  • C(x_i, y_j) is the similarity, calculated by formula (1), between to-be-identified block feature information and reference block feature information whose position information does not correspond (that is, i ≠ j), and { } represents a set operation.
  • formula (4) indicates that the similarities corresponding to the groups of images formed by to-be-identified block images and reference block images at different positions are combined to obtain the first similarity feature vector V_b.
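As a concrete illustration, the paired vector V_a of formula (3) and the unpaired vector V_b of formula (4) can be sketched as below. The document does not reproduce formula (1) here, so cosine similarity is assumed for C(·,·), consistent with the [-1, 1] similarity range mentioned later in the visual position recognition discussion; the function names are illustrative only.

```python
import numpy as np

def cosine(a, b):
    # Assumed form of formula (1): cosine similarity, values in [-1, 1].
    return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b)))

def paired_and_unpaired_vectors(x, y):
    """x[i]: feature of the i-th to-be-recognized block image,
    y[j]: feature of the j-th reference block image.
    Returns V_a = {C(x_i, y_j)} with i == j (formula (3)) and
            V_b = {C(x_i, y_j)} with i != j (formula (4))."""
    n = len(x)
    v_a = np.array([cosine(x[i], y[i]) for i in range(n)])
    v_b = np.array([cosine(x[i], y[j])
                    for i in range(n) for j in range(n) if i != j])
    return v_a, v_b
```

For N = 4 block images this yields 4 paired entries in V_a and 4 × 3 = 12 unpaired entries in V_b.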
  • the above-mentioned second similarity feature vector V_c can be expressed by the following formula (5):
  • V_c = {C(x_i, x_j), C(y_i, y_j)} (i ≠ j)
  • C(x_i, x_j) is the similarity, calculated by formula (1), between to-be-recognized block feature information x_i and x_j whose position information does not correspond (that is, i ≠ j) within the same image to be recognized.
  • C(y_i, y_j) is the similarity, calculated by formula (1), between reference block feature information y_i and y_j at different positions (that is, i ≠ j) within the same reference image.
  • the values of i and j are positive integers between 1 and N, where N represents the above-mentioned preset number (that is, the number of block images into which an image to be processed is divided).
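The intra-image vector V_c of formula (5) can be sketched in the same style, again assuming C(·,·) of formula (1) is cosine similarity (an assumption, since formula (1) is not reproduced here):

```python
import numpy as np

def cosine(a, b):
    # Assumed form of formula (1): cosine similarity.
    return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b)))

def second_similarity_vector(x, y):
    """Formula (5): V_c collects the similarities between blocks at
    different positions (i != j) within the same image, both for the
    to-be-recognized image (features x) and the reference image (y)."""
    n = len(x)
    intra_x = [cosine(x[i], x[j]) for i in range(n) for j in range(n) if i != j]
    intra_y = [cosine(y[i], y[j]) for i in range(n) for j in range(n) if i != j]
    return np.array(intra_x + intra_y)
```

For N blocks per image, V_c has 2 × N × (N − 1) entries, capturing the feature continuity between neighbouring blocks of each image.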
  • the paired similarity feature vector V_a and the first similarity feature vector V_b can be combined to obtain the first similarity relationship vector V_1, which can be expressed by the following formula (6):
  • V_1 = {V_a, V_b}
  • similarly, the paired similarity feature vector V_a and the second similarity feature vector V_c can be combined to obtain the second similarity relationship vector V_2, which can be expressed by the following formula (7): V_2 = {V_a, V_c}
  • the similarity weight may include a first similarity weight and a second similarity weight.
  • the first similarity weight Alpha is specifically determined according to the first similarity relationship vector V_1, formed by combining the paired similarity feature vector V_a and the first similarity feature vector V_b, and a preset first weight autoencoder.
  • the first similarity weight Alpha can be obtained by the following formula (8): Alpha = 1 / (1 + e^(-WAE_1(V_1)))
  • V 1 is the first similarity relationship vector
  • WAE_1(V_1) is the weight of the first similarity relationship vector V_1 obtained through processing by the first weight autoencoder
  • e is the natural base
  • the value range of the first similarity weight Alpha obtained by formula (8) is [0,1].
  • the second similarity weight Beta is specifically determined according to the second similarity relationship vector V_2, formed by combining the paired similarity feature vector V_a and the second similarity feature vector V_c, and a preset second weight autoencoder.
  • the second similarity weight Beta can be obtained by the following formula (9): Beta = 1 / (1 + e^(-WAE_2(V_2)))
  • V 2 is the second similarity relationship vector
  • WAE_2(V_2) is the weight of the second similarity relationship vector V_2 obtained through processing by the second weight autoencoder
  • e is the natural base
  • the value range of the second similarity weight Beta obtained by formula (9) is [0,1].
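Under the sigmoid reading of formulas (8) and (9) implied by the natural base e and the [0, 1] value range, a similarity weight can be sketched as follows. Here `weight_autoencoder` stands in for the trained WAE_1 or WAE_2 and is an assumed callable returning a scalar:

```python
import math

def similarity_weight(v, weight_autoencoder):
    # Assumed form of formulas (8)/(9): Weight = 1 / (1 + e^(-WAE(V))),
    # which maps the autoencoder output into the range [0, 1].
    w = weight_autoencoder(v)
    return 1.0 / (1.0 + math.exp(-w))
```

With WAE(V) = 0 the weight is 0.5; large positive outputs push it toward 1 and large negative outputs toward 0.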
  • the first similarity weight obtained by the above method can represent the similarity between the image to be recognized and the reference image under different viewing angles
  • the second similarity weight can represent the continuity between the block images of the image to be processed
  • the target similarity subsequently calculated based on the first similarity weight and the second similarity weight can therefore comprehensively account for changes in the image shooting angle, thereby improving the robustness and accuracy of image recognition.
  • step S104 includes:
  • the initial similarity is multiplied by the similarity weight to obtain a target similarity between the image to be recognized and the reference image.
  • specifically, for each to-be-identified block image, the reference block image corresponding to its position is obtained and combined with it into a group of images, and the similarity between the to-be-identified block feature information and the corresponding reference block feature information is calculated for each group of images.
  • the preset number of similarities determined based on the block images of the image to be recognized and of the reference image are then averaged to obtain the initial similarity between the image to be recognized and the reference image. After that, the initial similarity is multiplied by the similarity weight, and the result is used as the target similarity between the image to be recognized and the reference image.
  • the target similarity Similarity can be obtained by the following formula (10): Similarity = Alpha · Beta · (1/N) · Σ C(x_i, y_i), with the sum taken over i = 1, ..., N
  • N is the preset number that the image is divided into
  • C(x_i, y_i) is the similarity, calculated by formula (1), between the to-be-identified block feature information and the reference block feature information corresponding to the same position information;
  • Alpha is the above-mentioned first similarity weight;
  • Beta is the above-mentioned second similarity weight.
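The target similarity calculation can be sketched as follows, under the assumed reading of formula (10) that the N paired similarities are averaged into the initial similarity and then scaled by both weights; `cosine` again stands in for the unspecified formula (1):

```python
import numpy as np

def cosine(a, b):
    # Assumed form of formula (1): cosine similarity.
    return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b)))

def target_similarity(x, y, alpha, beta):
    """Assumed formula (10):
    Similarity = Alpha * Beta * (1/N) * sum_i C(x_i, y_i),
    i.e. the average paired similarity scaled by both weights."""
    n = len(x)
    initial = sum(cosine(x[i], y[i]) for i in range(n)) / n
    return alpha * beta * initial
```

When both weights equal 1, the target similarity reduces to the plain average of the paired block similarities.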
  • the image to be recognized is a scene image to be recognized
  • the reference image is a reference scene image
  • the recognition result of the image to be recognized includes a visual position recognition result of the scene image to be recognized.
  • the image recognition method in the embodiment of the present application is specifically a visual position recognition method.
  • the essence of visual location recognition is to judge whether two images indicate the same location. This problem can be transformed into the problem of computing the similarity between the two images: when the two images are sufficiently similar, the similarity is close to 1 and the two images indicate the same location; conversely, if the similarity between the two images is close to -1, the two images indicate different locations.
  • visual position recognition has very important application value and can be applied to various application scenarios such as positioning, remote monitoring, and vehicle navigation. However, because image recognition is affected by appearance changes caused by lighting, weather, and seasonal changes, and by perspective changes caused by camera shooting angles, most current visual position recognition methods cannot accurately perform visual position recognition when unmanned systems encounter drastic environmental changes.
  • the image of the scene to be recognized obtained by shooting the scene to be recognized is used as the image to be recognized, and the pre-stored reference scene image is used as the reference image.
  • feature extraction and feature dimensionality reduction improve robustness to changes in image appearance, while the similarity weights ensure robustness to changes in viewpoint, so that the reference scene image matching the scene image to be recognized can be accurately determined.
  • the location information carried by that reference scene image is then used as the location information corresponding to the scene image to be recognized, yielding the visual position recognition result of the scene image to be recognized. That is, based on the above-mentioned image recognition method described in steps S101 to S105, the robustness requirements of visual position recognition under complex environmental changes can be met, and the accuracy of visual position recognition can be improved.
  • FIG. 2 shows a schematic diagram of the construction process of the above-mentioned similarity weight, and the construction process of the similarity weight corresponds to the above-mentioned step S101 to step S104.
  • the process of forming the similarity weight is described in detail as follows:
  • A1 First, each image to be processed is divided into four corresponding block images. Specifically, the image to be identified I_1 is divided into four to-be-identified block images I_11, I_12, I_13, and I_14, and the reference image I_2 is divided into four corresponding reference block images I_21, I_22, I_23, and I_24.
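The four-way split can be sketched as a simple 2×2 crop; the row/column layout and the left-to-right, top-to-bottom numbering (I_11 ... I_14) are assumptions consistent with the building-B example earlier in the text:

```python
import numpy as np

def split_into_blocks(img, rows=2, cols=2):
    """Divide an H x W image into rows*cols block images, numbered
    left-to-right, top-to-bottom (I_11 ... I_14 for a 2x2 split)."""
    h, w = img.shape[:2]
    bh, bw = h // rows, w // cols
    return [img[r * bh:(r + 1) * bh, c * bw:(c + 1) * bw]
            for r in range(rows) for c in range(cols)]
```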
  • A2 Next, each block image is input into AlexNet for feature extraction processing to obtain the initial feature information corresponding to each block image, which is then input into the dimensionality-reduction autoencoder for dimensionality reduction; the reduced block feature information is obtained from the output layer of the encoder in the middle of the dimensionality-reduction autoencoder.
  • the block feature information output by the encoder of the dimensionality-reduction autoencoder specifically includes the to-be-identified block feature information x_1, x_2, x_3, x_4, and the reference block feature information y_1, y_2, y_3, y_4 corresponding respectively to the reference block images I_21, I_22, I_23, I_24.
  • A3 After that, the similarity relationships between the block images are determined according to the block feature information. Specifically, the upper part of the similarity relationships shown in Figure 2 means: for each to-be-identified block image, the similarity information between it and the reference block image at the same corresponding position is calculated, and these similarities are combined to obtain the paired similarity feature vector V_a; and for each to-be-identified block image, the similarity information between it and the reference block images at different positions is calculated, and these similarities are combined to obtain the first similarity feature vector V_b. After that, the paired similarity feature vector V_a and the first similarity feature vector V_b are expanded and combined into the first similarity relationship vector V_1.
  • A4 The first similarity relationship vector V_1 is input into the first weight autoencoder among the weight autoencoders; after the weight WAE_1(V_1) corresponding to the first similarity relationship vector is obtained, the first similarity weight Alpha is calculated according to the above formula (8).
  • FIG. 3 shows a schematic diagram of a matching process between an image to be recognized and a reference image in image recognition.
  • the image to be recognized Q_i and each reference image R_i are sequentially formed into pairs of images to be processed, and each pair is passed through the above steps S101 to S105 to determine the target similarity between the image to be recognized and the reference image in that pair (for example, with the preset number of block images segmented from the image to be processed set to 4).
  • with n images to be recognized and n reference images, n*n pairs of images to be processed can be composed, and the n*n target similarities corresponding to the n*n pairs can be combined into a similarity matrix as shown in FIG. 3.
  • the reference image corresponding to the column in which the maximum target similarity of a row is located is the best match for the corresponding image to be recognized, and the entity information corresponding to the best match may be used as the identification information of the image to be identified.
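The row-wise matching over the similarity matrix of FIG. 3 can be sketched as:

```python
import numpy as np

def best_matches(similarity_matrix):
    """Row i holds the target similarities between query image Q_i and
    every reference image; the column with the maximum value in that
    row indexes the best-matching reference image for Q_i."""
    return np.argmax(similarity_matrix, axis=1)
```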
  • the above-mentioned image recognition process can be realized by processing with a target model.
  • the target model can include the above-mentioned AlexNet, dimensionality reduction autoencoder and weight autoencoder.
  • the AlexNet, the dimensionality reduction autoencoder and the weight autoencoder in the target model can be jointly trained based on the sample images to obtain the trained target model.
  • the block images are input into the trained target model, and the initial feature information can be extracted through the trained AlexNet.
  • the dimensionality reduction processing is performed on the initial feature information through the trained dimensionality reduction encoder to obtain the block feature information corresponding to the block image.
  • the similarity weight between the image to be recognized and the reference image is determined according to the position information and block feature information of each block image, together with the trained weight encoder. Based on the similarity weight, the target similarity between the image to be recognized and the reference image can be determined, and then the recognition result of the image to be recognized can be determined.
  • the above-mentioned target model can be trained based on triplets.
  • the triplet-based training method specifically uses triplet sample images as training sample images input to the target model.
  • the triplet sample image includes an anchor point image (anchor), a positive sample image (pos) and a negative sample image (neg).
  • the anchor image is the reference target image in the image similarity calculation process;
  • the positive sample image refers to an image that represents the same entity information as the anchor image (for example, an image corresponding to the same location) but under different environmental conditions (including appearance conditions such as lighting, shooting angle, etc.); the negative sample image refers to an image that represents entity information different from that of the anchor image (for example, an image corresponding to a different location).
  • the schematic diagram of the training process of the target model is shown in Figure 4, and the details are as follows:
  • Formula (11) indicates that, among the triplet sample images, the positive-sample similarity corresponding to the positive sample image, which represents the same entity information as the anchor image, should be greater than the negative-sample similarity corresponding to the negative sample image.
  • margin is a value preset in advance.
  • Formula (12) indicates that in triplet sample images, the difference between positive sample similarity and negative sample similarity should not be too large.
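One possible reading of the filtering conditions (11) and (12) is: keep only triplets whose positive-sample similarity exceeds the negative-sample similarity, but not by more than the preset margin, so training focuses on informative (non-trivial) triplets. The exact inequality is an assumption, since the text gives only the verbal description:

```python
def keep_triplet(s_pos, s_neg, margin):
    # Formula (11): s_pos must be greater than s_neg.
    # Formula (12): the gap s_pos - s_neg must not be too large,
    # assumed here to mean it stays below the preset margin.
    return s_neg < s_pos < s_neg + margin
```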
  • B2 Segment the filtered triplet sample images and input them into the target model, and perform feature extraction processing through AlexNet of the target model to obtain the initial feature information corresponding to the block images of each sample image.
  • B3 Each piece of initial feature information is sequentially input into the adaptive average pooling layer (AAP) of the target model for average pooling and expanded by the flatten function Flatten(·) to obtain the pooled feature f.
  • B4 Each pooled feature f is input into the dimensionality-reduction autoencoder for dimensionality reduction; the feature information output by the encoder part of the dimensionality-reduction autoencoder is used as the block feature information w, and the feature information output by the decoder part is used as the decoding feature information z.
  • B5 Based on the block feature information of each block image corresponding to the negative sample image and the anchor image, and on the weight autoencoder of the target model, the similarity weight between the negative sample image and the anchor image is determined; based on that similarity weight, the negative-sample similarity S_neg between the negative sample image and the anchor image is calculated.
  • the mean square error value L_Mse corresponding to the dimensionality-reduction encoder is obtained by calculation using the preset formula (13) as a loss function. Formula (13) is as follows: L_Mse = ||f_an - z_an||_2^2 + ||f_pos - z_pos||_2^2 + ||f_neg - z_neg||_2^2
  • f_an and z_an represent the pooled feature and decoding feature information corresponding to the anchor image, respectively; f_pos and z_pos represent those corresponding to the positive sample image; f_neg and z_neg represent those corresponding to the negative sample image; ||·||_2 indicates the two-norm operator.
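A sketch of the reconstruction loss described by formula (13), assumed here to be the sum of squared two-norms of the differences between each pooled feature f and its decoded feature z over the anchor, positive and negative images:

```python
import numpy as np

def mse_loss(f_an, z_an, f_pos, z_pos, f_neg, z_neg):
    # Assumed form of formula (13): sum of squared two-norms of the
    # autoencoder reconstruction errors for all three triplet images.
    return (np.sum((f_an - z_an) ** 2)
            + np.sum((f_pos - z_pos) ** 2)
            + np.sum((f_neg - z_neg) ** 2))
```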
  • the smaller the L_Mse calculated by formula (13), the closer the decoding feature information output by the decoder part of the dimensionality-reduction autoencoder is to the input pooled feature, indicating that the block feature information obtained by the encoder of the dimensionality-reduction autoencoder can more accurately represent the features of the image.
  • the triplet network loss value L_Triplet is calculated by using the preset formula (14) as a loss function. Formula (14) is as follows: L_Triplet = max(S_neg - S_pos + margin, 0)
  • the smaller the L_Triplet calculated by formula (14), the more the positive-sample similarity exceeds the negative-sample similarity, meaning that the target model better distinguishes images that represent the same entity information from those that represent different entity information.
  • λ1 and λ2 are used as hyperparameters of the target model, and their actual values are set in advance according to experience.
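The training objective combining the two losses with the hyperparameters λ1 and λ2 can be sketched as follows; the hinge form of L_Triplet and the weighted sum are assumptions consistent with standard triplet training, since the explicit formulas are not reproduced in the text:

```python
def triplet_loss(s_pos, s_neg, margin):
    # Assumed form of formula (14): a hinge that becomes zero once the
    # positive similarity beats the negative one by at least the margin.
    return max(0.0, s_neg - s_pos + margin)

def total_loss(l_mse, l_triplet, lambda1, lambda2):
    # Assumed combination: the reconstruction loss and the triplet loss
    # weighted by the hyperparameters lambda1 and lambda2 and summed.
    return lambda1 * l_mse + lambda2 * l_triplet
```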
  • through the above process, the training of the target model can be accurately completed and the trained target model obtained, so that subsequent image recognition can be realized based on the trained AlexNet, dimensionality-reduction encoder and weight encoder in the trained target model, improving the accuracy of image recognition.
  • FIG. 5 shows a schematic structural diagram of an image recognition device provided by the embodiment of the present application. For the convenience of description, only the parts related to the embodiment of the present application are shown:
  • the image recognition device includes: a segmentation unit 51, a feature extraction unit 52, a similarity weight determination unit 53, a target similarity determination unit 54 and a recognition result determination unit 55. Wherein:
  • a segmentation unit 51, configured to divide the image to be processed into a preset number of block images; wherein the image to be processed includes an image to be identified and a reference image, the block image corresponding to the image to be identified is a to-be-identified block image, and the block image corresponding to the reference image is a reference block image; each to-be-identified block image has a reference block image with one-to-one corresponding position information.
  • the feature extraction unit 52 is configured to perform feature extraction processing on each of the block images to obtain block feature information corresponding to each of the block images.
  • the similarity weight determination unit 53 is configured to determine the similarity weight between the image to be recognized and the reference image according to the position information of each of the block images and the corresponding block feature information.
  • the target similarity determination unit 54 is configured to determine the target similarity between the image to be recognized and the reference image according to each of the block feature information and the similarity weight.
  • the recognition result determination unit 55 is configured to determine the recognition result of the image to be recognized according to the target similarity.
  • the feature extraction unit includes:
  • the initial feature information determination module is used for, for each of the block images, inputting the block images into a trained convolutional neural network for processing to obtain initial feature information corresponding to the block images;
  • a dimensionality reduction module configured to perform dimensionality reduction processing on the initial feature information to obtain block feature information corresponding to the block image.
  • the dimensionality reduction module is specifically configured to input the initial feature information into the adaptive average pooling layer and/or the trained dimensionality reduction autoencoder for dimensionality reduction processing, to obtain the corresponding Block feature information.
  • the image recognition device also includes:
  • a dimensionality reduction autoencoder training unit configured to obtain preset sample feature information, and input the preset sample feature information into the dimensionality reduction autoencoder to be trained; adjust the parameters of the dimensionality reduction autoencoder to be trained, In order to make the mean square error between the preset sample feature information and the decoding feature information output by the decoder of the dimensionality reduction autoencoder smaller than a preset threshold, a trained dimensionality reduction autoencoder is obtained.
  • the similarity weight determination unit includes:
  • a similarity feature vector determination module configured to determine a paired similarity feature vector and an unpaired similarity feature vector according to the position information of each of the block images and the corresponding block feature information; wherein, the paired A pair of similarity feature vectors includes similarity information between the block image to be identified corresponding to the position information and the reference block image; the unpaired similarity feature vector includes the block image not corresponding to the position information Similarity information between block images;
  • a similarity weight determining module configured to determine the similarity weight according to the paired similarity feature vector and the non-paired similarity feature vector.
  • the similarity weight determination module is specifically configured to determine the similarity weight according to the paired similarity feature vector, the non-paired similarity feature vector, and a preset weight autoencoder.
  • the unpaired similarity feature vector includes a first similarity feature vector and a second similarity feature vector
  • the first similarity feature vector includes similarity information between the to-be-identified block image and the reference block image whose position information does not correspond;
  • the second similarity feature vector includes similarity information between different block images in the same image to be processed;
  • the similarity weight includes a first similarity weight and a second similarity weight
  • the similarity weight determination module is specifically configured to determine the first similarity weight according to the paired similarity feature vector, the first similarity feature vector and the preset first weight autoencoder; and to determine the second similarity weight according to the paired similarity feature vector, the second similarity feature vector and the preset second weight autoencoder.
  • the target similarity determination unit is specifically configured to determine an initial similarity between the image to be recognized and the reference image according to each of the block feature information; combine the initial similarity with the The similarity weight is multiplied to obtain the target similarity between the image to be recognized and the reference image.
  • the image to be recognized is a scene image to be recognized
  • the reference image is a reference scene image
  • the recognition result of the image to be recognized includes a visual position recognition result of the scene image to be recognized.
  • Fig. 6 is a schematic diagram of an electronic device provided by an embodiment of the present application.
  • the electronic device 6 of this embodiment includes: a processor 60 , a memory 61 , and a computer program 62 stored in the memory 61 and operable on the processor 60 , such as an image recognition program.
  • when the processor 60 executes the computer program 62, the steps in the above-mentioned image recognition method embodiments are implemented, for example, steps S101 to S105 shown in FIG. 1.
  • alternatively, when the processor 60 executes the computer program 62, the functions of the modules/units in the above-mentioned device embodiments are realized, for example, the functions of the segmentation unit 51 to the recognition result determination unit 55 shown in FIG. 5.
  • the computer program 62 can be divided into one or more modules/units, and the one or more modules/units are stored in the memory 61 and executed by the processor 60 to complete this application.
  • the one or more modules/units may be a series of computer program instruction segments capable of accomplishing specific functions, and the instruction segments are used to describe the execution process of the computer program 62 in the electronic device 6 .
  • the electronic device 6 may be a computing device such as a desktop computer, a notebook computer, a palmtop computer, or a cloud server.
  • the electronic device may include, but not limited to, a processor 60 and a memory 61 .
  • FIG. 6 is only an example of the electronic device 6 and does not constitute a limitation on the electronic device 6; it may include more or fewer components than shown, or combine some components, or have different components. For example, the electronic device may also include input and output devices, network access devices, a bus, and the like.
  • the so-called processor 60 can be a central processing unit (Central Processing Unit, CPU), and can also be other general-purpose processors, digital signal processors (Digital Signal Processor, DSP), application specific integrated circuits (Application Specific Integrated Circuit, ASIC), Field-Programmable Gate Array (Field-Programmable Gate Array, FPGA) or other programmable logic devices, discrete gate or transistor logic devices, discrete hardware components, etc.
  • a general-purpose processor may be a microprocessor, or the processor may be any conventional processor, or the like.
  • the memory 61 may be an internal storage unit of the electronic device 6, such as a hard disk or memory of the electronic device 6.
  • the memory 61 can also be an external storage device of the electronic device 6, such as a plug-in hard disk equipped on the electronic device 6, a smart memory card (Smart Media Card, SMC), a secure digital (Secure Digital, SD) card, flash card (Flash Card), etc. Further, the memory 61 may also include both an internal storage unit of the electronic device 6 and an external storage device.
  • the memory 61 is used to store the computer program and other programs and data required by the electronic device.
  • the memory 61 can also be used to temporarily store data that has been output or will be output.


Abstract

An image recognition method, which is applicable to the technical field of image processing and comprises: segmenting an image to be processed into a preset number of block images (S101), wherein the image to be processed comprises an image to be recognized and a reference image; performing feature extraction processing on each block image to obtain block feature information corresponding to each block image (S102); determining a similarity weight between the image to be recognized and the reference image according to position information of each block image and the corresponding block feature information (S103); determining a target similarity between the image to be recognized and the reference image according to each piece of block feature information and the similarity weight (S104); and determining, according to the target similarity, a recognition result of the image to be recognized (S105). The present invention can improve the robustness and accuracy of image recognition.

Description

图像识别方法、装置、电子设备及存储介质Image recognition method, device, electronic device and storage medium 技术领域technical field
本申请属于图像处理技术领域,尤其涉及一种图像识别方法、装置、电子设备及存储介质。The present application belongs to the technical field of image processing, and in particular relates to an image recognition method, device, electronic equipment and storage medium.
背景技术Background technique
图像识别作为人工智能的一个重要领域,被广泛应用于无人系统中。通常,图像识别过程包括:采集当前需要识别的区域对应的图像;根据该采集到的图像与图像库的预存图像之间的相似度,确定图像识别结果。然而,受拍摄条件的影响,目前图像识别的鲁棒性和准确率较低。As an important field of artificial intelligence, image recognition is widely used in unmanned systems. Generally, the image recognition process includes: collecting an image corresponding to an area currently to be recognized; and determining an image recognition result according to the similarity between the collected image and the pre-stored image in the image library. However, due to the impact of shooting conditions, the robustness and accuracy of current image recognition are low.
技术问题technical problem
本申请实施例提供了一种图像识别方法、装置、电子设备及存储介质,以解决现有技术中图像识别的鲁棒性和准确率较低的问题。Embodiments of the present application provide an image recognition method, device, electronic device, and storage medium, so as to solve the problem of low robustness and low accuracy of image recognition in the prior art.
技术解决方案technical solution
为解决上述技术问题,本申请实施例采用的技术方案是:In order to solve the above-mentioned technical problems, the technical solution adopted in the embodiment of the present application is:
本申请实施例的第一方面提供了一种图像识别方法,包括:The first aspect of the embodiments of the present application provides an image recognition method, including:
将待处理图像分割为预设数目个分块图像;其中,所述待处理图像包括待识别图像和参考图像,所述待识别图像对应的所述分块图像为待识别分块图像,所述参考图像对应的所述分块图像为参考分块图像;每个所述待识别分块图像均存在位置信息一一对应的所述参考分块图像;Dividing the image to be processed into a preset number of block images; wherein, the image to be processed includes an image to be identified and a reference image, the block image corresponding to the image to be identified is a block image to be identified, the The block image corresponding to the reference image is a reference block image; each of the block images to be identified has a one-to-one corresponding position information of the reference block image;
分别对每个所述分块图像进行特征提取处理,得到每个所述分块图像分别对应的分块特征信息;performing feature extraction processing on each of the block images respectively, to obtain block feature information corresponding to each of the block images;
根据各个所述分块图像的位置信息和对应的所述分块特征信息,确定所述待识别图像和所述参考图像之间的相似度权重;determining a similarity weight between the image to be identified and the reference image according to the position information of each of the block images and the corresponding block feature information;
根据各个所述分块特征信息以及所述相似度权重,确定所述待识别图像和所述参考图像之间的目标相似度;determining the target similarity between the image to be recognized and the reference image according to each of the block feature information and the similarity weight;
根据所述目标相似度,确定所述待识别图像的识别结果。A recognition result of the image to be recognized is determined according to the target similarity.
A second aspect of the embodiments of the present application provides an image recognition apparatus, including:
a segmentation unit, configured to divide an image to be processed into a preset number of block images, where the image to be processed includes an image to be recognized and a reference image, the block images corresponding to the image to be recognized are to-be-recognized block images, and the block images corresponding to the reference image are reference block images; each to-be-recognized block image has a reference block image whose position information corresponds to it one-to-one;
a feature extraction unit, configured to perform feature extraction processing on each block image to obtain block feature information corresponding to each block image;
a similarity weight determination unit, configured to determine a similarity weight between the image to be recognized and the reference image according to the position information of each block image and the corresponding block feature information;
a target similarity determination unit, configured to determine a target similarity between the image to be recognized and the reference image according to each piece of block feature information and the similarity weight; and
a recognition result determination unit, configured to determine a recognition result of the image to be recognized according to the target similarity.
A third aspect of the embodiments of the present application provides an electronic device, including a memory, a processor, and a computer program stored in the memory and executable on the processor, where the processor, when executing the computer program, causes the electronic device to implement the steps of the image recognition method described above.
A fourth aspect of the embodiments of the present application provides a computer-readable storage medium storing a computer program, where the computer program, when executed by a processor, causes an electronic device to implement the steps of the image recognition method described above.
A fifth aspect of the embodiments of the present application provides a computer program product, which, when run on an electronic device, causes the electronic device to execute the image recognition method described in the first aspect above.
Beneficial Effects
Because the image to be processed is first divided into block images before feature extraction, the detailed feature information of the image can be extracted more accurately, so that subsequent similarity computation based on the feature information of each block is more accurate, improving the accuracy of image recognition. Moreover, because the block feature information of block images at different positions in the image to be processed can represent features of that image under different viewing angles, the similarity weight determined from the position information and block feature information of the block images reflects the similarity between the image to be recognized and the reference image under different viewing angles. The target similarity obtained from this similarity weight is therefore a similarity that is robust to viewing-angle changes, and the recognition result obtained from it overcomes the influence of viewing-angle changes caused by the camera's shooting angle, improving both the robustness and the accuracy of image recognition.
Brief Description of the Drawings
To describe the technical solutions in the embodiments of the present application more clearly, the following briefly introduces the drawings required for describing the embodiments or the prior art.
FIG. 1 is a schematic flowchart of an image recognition method provided by an embodiment of the present application;
FIG. 2 is a schematic diagram of a similarity weight construction process provided by an embodiment of the present application;
FIG. 3 is a schematic diagram of a matching process between an image to be recognized and a reference image provided by an embodiment of the present application;
FIG. 4 is a schematic flowchart of training a target model in a triplet manner provided by an embodiment of the present application;
FIG. 5 is a schematic diagram of an image recognition apparatus provided by an embodiment of the present application;
FIG. 6 is a schematic diagram of an electronic device provided by an embodiment of the present application.
Embodiments of the Present Application
In the following description, specific details such as particular system structures and technologies are presented for the purpose of illustration rather than limitation, so as to provide a thorough understanding of the embodiments of the present application. However, it will be apparent to those skilled in the art that the present application may also be practiced in other embodiments without these specific details. In other instances, detailed descriptions of well-known systems, apparatuses, circuits, and methods are omitted so that unnecessary detail does not obscure the description of the present application.
Embodiment 1:
Referring to FIG. 1, FIG. 1 is a schematic flowchart of an image recognition method provided by an embodiment of the present application. The method is executed by an electronic device, which includes but is not limited to computing devices such as smartphones, tablet computers, desktop computers, and servers. The image recognition method shown in FIG. 1 includes the following steps.
In S101, an image to be processed is divided into a preset number of block images, where the image to be processed includes an image to be recognized and a reference image, the block images corresponding to the image to be recognized are to-be-recognized block images, and the block images corresponding to the reference image are reference block images; each to-be-recognized block image has a reference block image whose position information corresponds to it one-to-one.
In the embodiments of the present application, the image to be recognized is an unknown image that currently needs to be recognized (that is, the entity information it contains is unknown), and the reference image is a known image pre-stored in an image library (that is, the entity information it contains is known). The recognition process of the current image to be recognized can be summarized as follows: the features of the image to be recognized are compared with those of the reference image to determine whether the two match; if they match, the known entity information of the reference image is used as the recognition information of the image to be recognized. In the embodiments of the present application, one image to be recognized and one reference image may be obtained and combined into a pair of images to be processed, where the image to be recognized may be received from a capture device, and the reference image may be obtained from a preset image library.
In the embodiments of the present application, the preset number is a preconfigured value; for example, the preset number may be 4. In one embodiment, the division is specifically an equal division; that is, each image to be processed is divided into the preset number of equal parts, yielding the preset number of equally sized block images corresponding to that image.
In one embodiment, the image to be recognized is equally divided into the preset number of parts to obtain the preset number of block images, called to-be-recognized block images. The reference image is likewise equally divided into the preset number of parts to obtain the preset number of block images, called reference block images. For each to-be-recognized block image in the image to be recognized, there is a reference block image in the reference image whose position information corresponds to it one-to-one.
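The equal-division step can be sketched as follows. This is a minimal illustration, not the application's mandated implementation: the 2×2 grid (preset number 4), the left-to-right, top-to-bottom numbering, and the assumption that the image dimensions divide evenly are all illustrative choices.

```python
import numpy as np

def split_into_blocks(image, rows=2, cols=2):
    """Equally divide an image into rows*cols block images.

    Returns a list of (position_index, block) pairs, where position_index
    numbers the blocks left to right, then top to bottom, starting at 1.
    Assumes the image height/width are divisible by the grid shape.
    """
    h, w = image.shape[:2]
    bh, bw = h // rows, w // cols
    blocks = []
    for j in range(rows):
        for k in range(cols):
            block = image[j * bh:(j + 1) * bh, k * bw:(k + 1) * bw]
            blocks.append((j * cols + k + 1, block))
    return blocks

# Splitting the image to be recognized and the reference image the same way
# gives block pairs whose position information corresponds one-to-one.
query = np.zeros((224, 224, 3))
reference = np.ones((224, 224, 3))
query_blocks = split_into_blocks(query)
ref_blocks = split_into_blocks(reference)
pairs = list(zip(query_blocks, ref_blocks))  # matched by position index
```

Because both images pass through the same `split_into_blocks` call, the i-th entry of each list always refers to the same spatial position.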
In S102, feature extraction processing is performed on each block image to obtain block feature information corresponding to each block image.
In the embodiments of the present application, after the block images are obtained by division, feature extraction processing is performed on each to-be-recognized block image to obtain its corresponding feature information, referred to as to-be-recognized block feature information. Feature extraction processing is performed on each reference block image to obtain its corresponding feature information, referred to as reference block feature information.
In one embodiment, the feature extraction processing of the block images may be implemented by a neural network model trained in advance.
In one embodiment, the block images obtained by division may undergo feature extraction one by one in sequence. In another embodiment, multiple threads may be used to perform feature extraction on more than one block image (or even on all block images) at the same time, thereby improving the efficiency of the feature extraction processing.
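The sequential and multithreaded variants can be sketched with Python's standard `concurrent.futures`. The extractor below is a placeholder assumption (a channel-mean descriptor standing in for the trained network); only the scheduling pattern is the point of the example.

```python
import numpy as np
from concurrent.futures import ThreadPoolExecutor

def extract_features(block):
    # Placeholder for the trained network: a per-channel mean stands in
    # for the real block feature information.
    return block.reshape(-1, block.shape[-1]).mean(axis=0)

blocks = [np.full((112, 112, 3), v, dtype=float) for v in (0.0, 0.25, 0.5, 0.75)]

# Variant 1: process the block images one by one in sequence.
features_seq = [extract_features(b) for b in blocks]

# Variant 2: process all block images concurrently; map preserves order,
# so block i still yields feature i.
with ThreadPoolExecutor(max_workers=4) as pool:
    features_par = list(pool.map(extract_features, blocks))
```

With a real network, the concurrent variant mainly helps when extraction releases the GIL (e.g. native inference calls); otherwise batching the blocks through the network in one forward pass is the more common optimization.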
In S103, a similarity weight between the image to be recognized and the reference image is determined according to the position information of each block image and the corresponding block feature information.
In the embodiments of the present application, the position information of a block image refers to the position of that block image within the original image to be processed. In one embodiment, the position information may be expressed as (j, k), where j indicates that the block image lies in the j-th row of the image to be processed and k indicates that it lies in the k-th column. In another embodiment, the block images obtained by dividing the image to be processed are numbered sequentially from left to right and then from top to bottom, and the number i is used as the position information of each block image. For example, when the image to be processed is divided into four block images, their position information is 1, 2, 3, and 4 in sequence.
For the block images obtained by division, according to each block image's position information in the original image to be processed and its corresponding block feature information, the following are computed: the similarities between to-be-recognized block images and reference block images, the similarities among the to-be-recognized block images, and the similarities among the reference block images. These similarities are combined to obtain the similarity relationship among the block images. According to this similarity relationship, the similarity weight between the image to be recognized and the reference image can be determined; this similarity weight expresses the similarity relations among the block images.
In S104, a target similarity between the image to be recognized and the reference image is determined according to each piece of block feature information and the similarity weight.
The block feature information includes the to-be-recognized block feature information and the reference block feature information. The similarity between the image to be recognized and the reference image is computed by combining each piece of to-be-recognized block feature information, each piece of reference block feature information, and the similarity weight, yielding the target similarity.
In S105, a recognition result of the image to be recognized is determined according to the target similarity.
After the target similarity is obtained, the recognition result of the image to be recognized is determined according to its value. In one embodiment, if the target similarity is less than a preset similarity threshold, the recognition result is determined to be a recognition failure. In another embodiment, if the target similarity is greater than or equal to the preset similarity threshold, the recognition result is determined to be a successful recognition, and the recognition information of the image to be recognized is determined from the known entity information of the corresponding reference image. For example, if the entity information of the reference image is "puppy" (that is, the reference image is an image of a puppy), the recognition information of the image to be recognized is "puppy"; if the entity information of the reference image is "place A" (that is, the reference image is an image corresponding to place A), the recognition information of the image to be recognized is "place A".
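The decision rule of S105 reduces to a threshold comparison. In this sketch the threshold value 0.8 and the entity labels are illustrative assumptions; the application only requires that some preset similarity threshold exist.

```python
def recognize(target_similarity, reference_entity, threshold=0.8):
    """Map the target similarity to a recognition result.

    Below the preset threshold, recognition fails; at or above it, the
    reference image's known entity information becomes the recognition
    information of the image to be recognized.
    """
    if target_similarity < threshold:
        return {"success": False, "entity": None}
    return {"success": True, "entity": reference_entity}

high = recognize(0.91, "place A")  # match: inherits the reference's entity
low = recognize(0.42, "place A")   # below threshold: recognition failure
```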
In the embodiments of the present application, because the image to be processed is first divided into block images before feature extraction, the detailed feature information of the image can be extracted more accurately, so that subsequent similarity computation based on the feature information of each block is more accurate, improving the accuracy of image recognition. Moreover, because the block feature information of block images at different positions in the image to be processed can represent features of that image under different viewing angles, the similarity weight determined from the position information and block feature information of the block images reflects the similarity between the image to be recognized and the reference image under different viewing angles. The target similarity obtained from this similarity weight is therefore a similarity that is robust to viewing-angle changes, and the recognition result obtained from it overcomes the influence of viewing-angle changes caused by the camera's shooting angle, improving both the robustness and the accuracy of image recognition.
Optionally, the above step S102 includes:
for each block image, inputting the block image into a trained convolutional neural network for processing to obtain initial feature information corresponding to the block image; and
performing dimensionality reduction processing on the initial feature information to obtain the block feature information corresponding to the block image.
In the embodiments of the present application, feature extraction is performed on the block images by a trained convolutional neural network. Each block image is input in turn into the trained convolutional neural network, and after the convolution processing of its convolutional layers, the initial feature information corresponding to the block image is obtained.
In one embodiment, the trained convolutional neural network may be AlexNet. First, a block image is scaled to a size of 3×224×224 to obtain an image I_m in the RGB (Red, Green, Blue) color mode, and the image I_m is input into AlexNet for feature extraction. Because the features output by the fifth convolutional layer (Conv_5) of AlexNet give the best place recognition performance, the 256×6×6 features output by Conv_5 are extracted as the initial feature information, improving the accuracy of feature extraction.
Because the initial feature information extracted by the convolutional neural network is usually of high dimensionality, performing subsequent similarity computation and comparison directly on it would consume a large amount of computing resources and memory. Therefore, after the initial feature information is obtained, dimensionality reduction processing is performed on it to obtain lower-dimensional feature information as the block feature information of the block image. The dimensionality reduction may be implemented by operations such as pooling or downsampling.
In the embodiments of the present application, because the features of a block image can be accurately extracted by the convolutional neural network and the block feature information can then be obtained by dimensionality reduction, the accuracy of feature extraction is preserved while the system resource consumption of subsequent computation is reduced and the efficiency of image recognition is improved. In addition, because the reduced-dimension block feature information is more robust than excessively high-dimensional feature information, the success rate of image recognition can be maintained even when changes in illumination, weather, or season cause subtle changes in the appearance of the image.
Optionally, the performing dimensionality reduction processing on the initial feature information to obtain the block feature information corresponding to the block image includes:
inputting the initial feature information into an adaptive average pooling layer and/or a trained dimensionality reduction autoencoder for dimensionality reduction processing, to obtain the block feature information corresponding to the block image.
In general, a pooling operation can reduce the dimensionality of feature information with simple computation while preserving the accuracy of subsequent recognition. Compared with max pooling, average pooling captures details across the entire image scene and therefore extracts scene features better. In the embodiments of the present application, the adaptive average pooling (AAP) layer is a processing layer that adaptively performs an average pooling operation on the input feature information to produce an output of a specified size; inputting the initial feature information into this adaptive average pooling layer therefore reduces its dimensionality.
The dimensionality reduction autoencoder in the embodiments of the present application is an autoencoder used to reduce the dimensionality of feature information. An autoencoder (AE) is a class of artificial neural networks (ANNs) used in semi-supervised and unsupervised learning; its function is representation learning, using the input information itself as the learning target. An autoencoder consists of two parts, an encoder and a decoder. The encoder compresses the input feature information by encoding, reducing its dimensionality, while the decoder restores the feature information compressed by the encoder. During image recognition processing, only the encoder part of the dimensionality reduction autoencoder is used to perform the dimensionality reduction, so that recognition based on the encoder's output features is more robust and more efficient.
In one embodiment, the initial feature information of a block image may first be processed by the adaptive average pooling layer and then further processed by the dimensionality reduction autoencoder to obtain lower-dimensional block feature information. For example, initial feature information of dimension 256×6×6 is input into the adaptive average pooling layer for average pooling and expanded into a one-dimensional vector by the flatten function Flatten(·), yielding a pooled feature of dimension 1×4096. The pooled feature is then input into the dimensionality reduction autoencoder, and the 1×256 feature output by its encoder is taken as the block feature information of the block image.
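The dimension bookkeeping of this example (256×6×6 → adaptive average pooling → a 1×4096 flattened vector → a 1×256 encoder output) can be traced with NumPy. The pooling bins follow the usual adaptive-pooling convention (floor/ceil bin edges), and the random weight matrix is only a stand-in assumption for the trained encoder.

```python
import numpy as np

def adaptive_avg_pool2d(x, out_h, out_w):
    """Average-pool a C×H×W array to C×out_h×out_w with adaptive bins."""
    c, h, w = x.shape
    out = np.empty((c, out_h, out_w))
    for i in range(out_h):
        h0, h1 = (i * h) // out_h, -(-((i + 1) * h) // out_h)   # floor, ceil
        for j in range(out_w):
            w0, w1 = (j * w) // out_w, -(-((j + 1) * w) // out_w)
            out[:, i, j] = x[:, h0:h1, w0:w1].mean(axis=(1, 2))
    return out

rng = np.random.default_rng(0)
initial = rng.standard_normal((256, 6, 6))    # Conv_5 initial feature information
pooled = adaptive_avg_pool2d(initial, 4, 4)   # 256×4×4 (256*4*4 = 4096 values)
flat = pooled.reshape(1, -1)                  # Flatten(.): 1×4096 pooled feature

# Stand-in for the trained encoder: a linear map from 4096 to 256 dimensions.
W_enc = rng.standard_normal((4096, 256)) * 0.01
block_feature = flat @ W_enc                  # 1×256 block feature information
```

Note the 4×4 pooling output is what makes the flattened vector 1×4096 (256·4·4), matching the dimensions stated in the paragraph above.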
In the embodiments of the present application, the adaptive average pooling layer and/or the dimensionality reduction autoencoder can reduce the dimensionality of the initial feature information efficiently and accurately, improving the efficiency and accuracy of subsequent image recognition.
Optionally, before the inputting, for each block image, the block image into the trained convolutional neural network to obtain the initial feature information corresponding to the block image, the method further includes:
obtaining preset sample feature information and inputting the preset sample feature information into a dimensionality reduction autoencoder to be trained; and
adjusting parameters of the dimensionality reduction autoencoder to be trained so that the mean square error between the preset sample feature information and the decoded feature information output by the decoder of the dimensionality reduction autoencoder is less than a preset threshold, to obtain the trained dimensionality reduction autoencoder.
In the embodiments of the present application, the trained dimensionality reduction autoencoder used to reduce the dimensionality of the initial feature information is obtained by training a dimensionality reduction autoencoder to be trained.
Specifically, a preset amount of preset sample feature information may be obtained and input into the dimensionality reduction autoencoder to be trained to begin its training, where the preset sample feature information is the feature information of block sample images obtained in advance by dividing sample images and performing feature extraction on them.
After the preset sample feature information is input into the autoencoder to be trained, the decoded feature information output by its decoder part is obtained. The decoded feature information is the feature information obtained by decoding and restoring the encoded feature information output by the encoder of the dimensionality reduction autoencoder. Using a preset mean-square-error loss function, the mean square error between the input preset sample feature information and the decoded feature information restored by the decoder is computed, and the parameters of the autoencoder to be trained are adjusted by backpropagation according to the result, until the mean square error between the preset sample feature information and the decoder's output is less than the preset threshold, at which point training stops and the trained dimensionality reduction autoencoder is obtained.
In the embodiments of the present application, training the dimensionality reduction autoencoder in advance yields an accurate trained dimensionality reduction autoencoder, so that the initial feature information can subsequently be dimensionality-reduced accurately, improving the accuracy of image recognition.
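The training loop above can be sketched with a minimal linear encoder/decoder pair trained by gradient descent on the MSE reconstruction loss, stopping once the loss drops below the preset threshold. All dimensions, the learning rate, and the threshold are illustrative assumptions; the application does not prescribe them, and a real implementation would use a deep-learning framework rather than hand-written gradients.

```python
import numpy as np

rng = np.random.default_rng(1)
# Toy "preset sample feature information": rank-4 data, so a 4-dim code
# can in principle reconstruct it exactly.
latent = rng.standard_normal((64, 4))
samples = latent @ (rng.standard_normal((4, 16)) * 0.5)

d_in, d_code = 16, 4
W_enc = rng.standard_normal((d_in, d_code)) * 0.1   # encoder weights
W_dec = rng.standard_normal((d_code, d_in)) * 0.1   # decoder weights

def mse(a, b):
    return float(np.mean((a - b) ** 2))

lr, threshold = 1.0, 0.05
losses = []
for step in range(2000):
    code = samples @ W_enc        # encoder compresses the features
    decoded = code @ W_dec        # decoder restores them
    loss = mse(decoded, samples)  # MSE between input and reconstruction
    losses.append(loss)
    if loss < threshold:          # stop once the MSE is below the threshold
        break
    # Gradients of the MSE loss with respect to both weight matrices.
    grad_out = 2.0 * (decoded - samples) / samples.size
    grad_W_dec = code.T @ grad_out
    grad_W_enc = samples.T @ (grad_out @ W_dec.T)
    W_dec -= lr * grad_W_dec
    W_enc -= lr * grad_W_enc
```

After training, only `W_enc` (the encoder part) is kept for the recognition pipeline; `W_dec` exists solely to supply the reconstruction target during training.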
Optionally, the above step S103 includes:
determining a paired similarity feature vector and an unpaired similarity feature vector according to the position information of each block image and the corresponding block feature information, where the paired similarity feature vector contains the similarity information between to-be-recognized block images and reference block images whose position information corresponds, and the unpaired similarity feature vector contains the similarity information between block images whose position information does not correspond; and
determining the similarity weight according to the paired similarity feature vector and the unpaired similarity feature vector.
本申请实施例中,由于待识别图像和参考图像均按照同样的分割方式分割为预设数目个分块图像,因此,对于待识别图像的每个待识别分块图像,均存在其对应位置相同的参考分块图像。例如,设待识别图像I 1被分割为四个待识别分块图像I 11、I 12、I 13、I 14,参考图像I 2被分割为对应的四个参考分块图像I 21、I 22、I 23、I 24,则待识别分块图像I 11对应的位置信息相对应的参考分块图像为I 21,待识别分块图像I 12对应的位置信息相对应的参考分块图像为I 22,待识别分块图像I 13对应的位置信息相对应的参考分块图像为I 23,待识别分块图像I 14对应的位置信息相对应的参考分块图像为I 24。其中,I的第一个下标用于区分当前该分块图像属于哪个待处理图像;I的第二个下标用于区分当前该分块图像为该待处理图像的第几个分块图像,可以体现该分块图像在该待处理图像中的位置信息。 In the embodiment of the present application, since the image to be recognized and the reference image are divided into a preset number of block images according to the same segmentation method, for each block image to be recognized of the image to be recognized, there are The reference tiled image for . For example, assume that the image to be recognized I 1 is divided into four block images to be recognized I 11 , I 12 , I 13 , and I 14 , and the reference image I 2 is divided into four corresponding reference block images I 21 , I 22 . _ _ _ _ 22 , the reference block image corresponding to the position information corresponding to the block image I 13 to be identified is I 23 , and the reference block image corresponding to the position information corresponding to the block image I 14 to be identified is I 24 . Among them, the first subscript of I is used to distinguish which image to be processed the current block image belongs to; the second subscript of I is used to distinguish which block image the current block image is the image to be processed , which may reflect the location information of the block image in the image to be processed.
对于上述的位置信息对应相同的每组待识别分块图像和参考分块图像,分别根据二者的分块特征信息进行相似度计算,可以得到各组位置信息相对应的分块图像分别对应的各个成对相似度。各个成对相似度组合得到成对相似度特征向量。For each group of block images to be identified and reference block images corresponding to the same position information above, the similarity calculation is performed according to the block feature information of the two, and the corresponding block images corresponding to each group of position information can be obtained. Each pairwise similarity. Each pairwise similarity is combined to obtain a pairwise similarity feature vector.
Apart from the position-corresponding combinations of block images described above, the similarity between block images whose positions do not correspond is referred to as non-pairwise similarity. Specifically, the non-pairwise similarity includes: the similarity between each block image to be recognized and the reference block images whose position information differs (for example, block images to be recognized and reference block images whose second subscripts differ, as described above); the similarity between block images to be recognized at different positions (for example, different block images obtained by segmenting the same image to be recognized); and the similarity between reference block images at different positions (for example, different reference block images obtained by segmenting the same reference image). These non-pairwise similarities can be combined to obtain a non-pairwise similarity feature vector.
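As an illustration (not part of the patent text), the index bookkeeping behind the pairwise and non-pairwise similarities can be sketched as follows, assuming blocks are numbered 1 to N and the labels "x" / "y" stand for the image to be recognized and the reference image:

```python
# Sketch: which block pairs contribute to the pairwise and non-pairwise
# similarities when each image is split into N blocks.
N = 4  # hypothetical preset number of blocks per image

# Pairwise: block i of the query image vs. block i of the reference image.
paired = [(("x", i), ("y", i)) for i in range(1, N + 1)]

# Non-pairwise, part 1: cross-image pairs at different positions.
cross = [(("x", i), ("y", j)) for i in range(1, N + 1)
         for j in range(1, N + 1) if i != j]

# Non-pairwise, part 2: within-image pairs of the query and of the reference.
within_query = [(("x", i), ("x", j)) for i in range(1, N + 1)
                for j in range(i + 1, N + 1)]
within_ref = [(("y", i), ("y", j)) for i in range(1, N + 1)
              for j in range(i + 1, N + 1)]

print(len(paired), len(cross), len(within_query) + len(within_ref))
# 4 paired, 12 cross-position, and 6 + 6 within-image pairs when N = 4
```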
In one embodiment, for the block feature information x and the block feature information y corresponding to two different block images, the similarity between the two block images can be calculated using the preset formula (1), shown as follows:
C(x, y) = (x·y / (||x|| * ||y||) + 1) / 2
where C(x, y) represents the normalized cosine similarity calculated from the block feature information x of the first block image and the block feature information y of the second block image, || || denotes the norm operation, and * denotes multiplication. The value range of the similarity obtained from formula (1) is [0, 1]. With this similarity calculation method, the similarity values between block images are normalized to the interval from 0 to 1, which facilitates subsequent operations.
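A minimal sketch of this similarity calculation, assuming formula (1) is the usual rescaling of cosine similarity from [-1, 1] to [0, 1] (the exact published formula is rendered as an image in the source):

```python
import math

def norm_cosine(x, y):
    """Cosine similarity of feature vectors x and y, rescaled to [0, 1]."""
    dot = sum(a * b for a, b in zip(x, y))
    nx = math.sqrt(sum(a * a for a in x))
    ny = math.sqrt(sum(b * b for b in y))
    cos = dot / (nx * ny)       # plain cosine similarity, in [-1, 1]
    return (cos + 1.0) / 2.0    # normalized to [0, 1]

print(norm_cosine([1.0, 0.0], [1.0, 0.0]))   # identical direction -> 1.0
print(norm_cosine([1.0, 0.0], [-1.0, 0.0]))  # opposite direction -> 0.0
```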
After the pairwise similarity feature vector and the non-pairwise similarity feature vector are determined, they are combined, and the resulting vector is referred to as the similarity relationship vector. Since the similarity relationship vector contains the similarities between block images in the different positional relationships of the images to be processed, it can represent the similarity relationship between the corresponding block images of the image to be recognized and the reference image.
After the similarity relationship vector is determined, a weighting operation is performed according to the values of its elements to obtain the similarity weight between the image to be recognized and the reference image. This similarity weight can accurately represent the impact of visual changes between the image to be recognized and the reference image on the image similarity.
In the embodiment of the present application, since the similarity relationship between position-based block images can be accurately represented by the pairwise similarity feature vector and the non-pairwise similarity feature vector, a similarity weight suitable for image similarity calculation can be obtained from this similarity relationship, improving the accuracy of the final similarity calculation and thereby the accuracy of image recognition.
Optionally, the determining of the similarity weight according to the pairwise similarity feature vector and the non-pairwise similarity feature vector includes:
determining the similarity weight according to the pairwise similarity feature vector, the non-pairwise similarity feature vector, and a preset weight autoencoder.
In the embodiment of the present application, the weight autoencoder is an autoencoder trained in advance for determining the weights of the similarity relationship vector.
In one embodiment, the pairwise similarity feature vector and the non-pairwise similarity feature vector may be concatenated into a similarity relationship vector V, which is then input into a preset weight autoencoder WAE for processing, yielding the weight WAE(V) corresponding to the similarity relationship vector. A weighted summation of the similarity relationship vector is performed according to the weight WAE(V), followed by normalization, to obtain a similarity weight with a value range of [0, 1]. Exemplarily, the similarity weight L can be obtained by the following formula (2):
L = 1 / (1 + e^(w * WAE(V)·V / t))
where V is the similarity relationship vector obtained by concatenation, WAE(V) is the weight of the similarity relationship vector V output by the weight autoencoder, e is the natural base, and w and t are preset parameter values, where t is determined according to the aforementioned preset number N, for example, t = N².
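A sketch of this weighting step, under the assumption that the normalization is a sigmoid applied to the weighted sum WAE(V)·V scaled by t; the weight autoencoder itself is replaced here by a stand-in vector of fixed weights:

```python
import math

def similarity_weight(v, wae_v, w=-10.0, t=None):
    """Weighted sum of V by WAE(V), squashed to [0, 1].
    Sigmoid form assumed from the description of formula (2)."""
    if t is None:
        t = len(v)  # stand-in for t = N**2 when V has N**2 entries
    s = sum(wi * vi for wi, vi in zip(wae_v, v))  # weighted sum WAE(V)·V
    return 1.0 / (1.0 + math.exp(w * s / t))

v = [0.9, 0.8, 0.2, 0.1]      # hypothetical similarity relationship vector
wae_v = [0.5, 0.5, 0.5, 0.5]  # stand-in autoencoder output
L = similarity_weight(v, wae_v)
print(0.0 <= L <= 1.0)  # True: the weight always lands in [0, 1]
```

With w negative (for example w = -10, as in formulas (8) and (9) below), larger weighted similarity sums push the weight toward 1.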
In the embodiment of the present application, the current similarity weight can be accurately determined by means of the preset weight autoencoder, thereby improving the accuracy of subsequent image recognition.
Optionally, the non-pairwise similarity feature vector includes a first similarity feature vector and a second similarity feature vector; the first similarity feature vector contains similarity information between block images to be recognized and reference block images whose position information does not correspond, and the second similarity feature vector contains similarity information between different block images within the same image to be processed.
Correspondingly, the determining of the similarity weight according to the pairwise similarity feature vector, the non-pairwise similarity feature vector, and the preset weight autoencoder includes:
determining a first similarity weight according to the pairwise similarity feature vector, the first similarity feature vector, and a preset first weight autoencoder; and
determining a second similarity weight according to the pairwise similarity feature vector, the second similarity feature vector, and a preset second weight autoencoder.
In the embodiment of the present application, the aforementioned non-pairwise similarity feature vector includes a first similarity feature vector and a second similarity feature vector.
The first similarity feature vector contains similarity information between block images to be recognized and reference block images whose position information does not correspond. For each block image to be recognized of the image to be recognized, any reference block image whose position number does not correspond to that block image may be selected from the reference image to form a group of images; performing similarity calculation on each group combined in this manner yields the respective first similarities, and these non-pairwise first similarities are combined to obtain the first similarity feature vector. Due to changes in the shooting angle, the image position that the same physical location or object occupies in the image to be recognized differs from its image position in a reference image taken from another angle. For example, the image region corresponding to building B may lie in the upper left of the image to be recognized, so that after image segmentation it falls in the first block image to be recognized I11; in the reference image, the image region corresponding to building B may lie in the lower right, so that after image segmentation it falls in the fourth reference block image I24. The pairwise similarity feature vector calculated according to the position correspondence cannot reflect the similarity between images when the viewing angle changes; therefore, the first similarity feature vector, determined from block images to be recognized and reference block images whose position information does not correspond, can represent the similarity between the block images to be recognized and the reference block images when the viewing angle changes.
In addition, for any two block images at different positions within the same image to be processed (that is, the same image to be recognized or the same reference image), a group of images can be formed; performing similarity calculation on each group combined in this manner yields the respective second similarities, and these second similarities are combined to obtain the second similarity feature vector. The image region occupied by the same physical location or object in the image to be recognized may be relatively large, spanning different block images. For example, the image region corresponding to building D may occupy both the second block image to be recognized I12 and the third block image to be recognized I13; similarly, it may occupy both the second reference block image I22 and the third reference block image I23 in the reference image. To prevent image segmentation from harming the feature integrity of the original image, in the embodiment of the present application the second similarity feature vector can represent the similarity between different image blocks within the same image to be processed, preserving the continuity of features across blocks.
Exemplarily, assume that the block feature information to be recognized corresponding to the block image to be recognized numbered i in an image to be recognized is xi (for example, the block feature information corresponding to the aforementioned block image to be recognized I11 is x1), and that the reference block feature information corresponding to the reference block image numbered j in a reference image is yj (for example, the reference block feature information corresponding to the aforementioned reference block image I21 is y1). Then the aforementioned pairwise similarity feature vector Va can be expressed by the following formula (3):
V a={C(x i,y j)}(i=j) V a ={C(x i ,y j )}(i=j)
where C(xi, yj) is the similarity, calculated by formula (1), between block feature information to be recognized and reference block feature information whose position information corresponds (i.e., i = j), and {} denotes a set operation. Formula (3) indicates that the pairwise similarities corresponding to the image pairs formed by block images to be recognized and reference block images with the same number are combined to obtain the pairwise similarity feature vector Va.
The aforementioned first similarity feature vector Vb can be expressed by the following formula (4):
V b={C(x i,y j)}(i≠j) V b ={C(x i ,y j )}(i≠j)
where C(xi, yj) is the similarity, calculated by formula (1), between block feature information to be recognized and reference block feature information whose position information does not correspond (i.e., i ≠ j), and {} denotes a set operation. Formula (4) indicates that the similarities corresponding to the image groups formed by block images to be recognized and reference block images at different positions are combined to obtain the first similarity feature vector Vb.
The aforementioned second similarity feature vector Vc can be expressed by the following formula (5):
V c={C(x i,x j),C(y i,y j)}(i≠j) V c ={C(x i ,x j ),C(y i ,y j )}(i≠j)
where C(xi, xj) is the similarity, calculated by formula (1), between block feature information to be recognized xi and xj whose position information does not correspond (i.e., i ≠ j) within the same image to be recognized, and C(yi, yj) is the similarity, calculated by formula (1), between reference block feature information yi and yj at different positions (i.e., i ≠ j) within the same reference image. Combining the values C(xi, xj) and C(yi, yj) yields the aforementioned second similarity feature vector Vc.
In formulas (3) to (5) above, i and j take positive integer values from 1 to N, where N denotes the aforementioned preset number (i.e., the number of block images into which an image to be processed is divided).
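Formulas (3) to (5) can be sketched as follows; C is a stand-in for the formula (1) similarity, and the toy 2-D block features are purely illustrative:

```python
import math

def C(x, y):
    """Normalized cosine similarity in [0, 1] (assumed form of formula (1))."""
    dot = sum(a * b for a, b in zip(x, y))
    nx = math.sqrt(sum(a * a for a in x))
    ny = math.sqrt(sum(b * b for b in y))
    return (dot / (nx * ny) + 1.0) / 2.0

def build_vectors(xs, ys):
    """xs, ys: block features x1..xN and y1..yN of the two images."""
    n = len(xs)
    va = [C(xs[i], ys[i]) for i in range(n)]                              # formula (3), i = j
    vb = [C(xs[i], ys[j]) for i in range(n) for j in range(n) if i != j]  # formula (4), i != j
    vc = ([C(xs[i], xs[j]) for i in range(n) for j in range(i + 1, n)] +
          [C(ys[i], ys[j]) for i in range(n) for j in range(i + 1, n)])   # formula (5)
    return va, vb, vc

xs = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0], [0.5, 0.2]]  # toy 2-D block features
ys = [[1.0, 0.1], [0.1, 1.0], [0.9, 1.1], [0.4, 0.3]]
va, vb, vc = build_vectors(xs, ys)
print(len(va), len(vb), len(vc))  # 4, 12, 12 for N = 4
```

Concatenating va with vb then gives V1 = {Va, Vb}, and va with vc gives V2 = {Va, Vc}, as in formulas (6) and (7).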
Afterwards, the pairwise similarity feature vector Va and the first similarity feature vector Vb can be combined to obtain the first similarity relationship vector V1, which can be expressed by the following formula (6):
V 1={V a,V b} V 1 ={V a ,V b }
The pairwise similarity feature vector Va and the second similarity feature vector Vc are combined to obtain the second similarity relationship vector V2, which can be expressed by the following formula (7):
V 2={V a,V c} V 2 ={V a ,V c }
Correspondingly, the similarity weight may include a first similarity weight and a second similarity weight. The first similarity weight Alpha is determined specifically according to the first similarity relationship vector V1, formed by combining the pairwise similarity feature vector Va and the first similarity feature vector Vb, and the preset first weight autoencoder. Exemplarily, the first similarity weight Alpha can be obtained by the following formula (8):
Alpha = 1 / (1 + e^(w1 * WAE1(V1)·V1 / t))
where V1 is the first similarity relationship vector, WAE1(V1) is the weight of the first similarity relationship vector V1 obtained through processing by the first weight autoencoder, e is the natural base, w1 is a preset value, for example w1 = -10, and t is a preset parameter value determined according to the aforementioned preset number N, with t = N². When the preset number is 4, that is, when each image to be processed is divided into 4 block images, t = 16 in formula (8). The value range of the first similarity weight Alpha obtained by formula (8) is [0, 1].
The second similarity weight Beta is determined specifically according to the second similarity relationship vector V2, formed by combining the pairwise similarity feature vector Va and the second similarity feature vector Vc, and the preset second weight autoencoder. Exemplarily, the second similarity weight Beta can be obtained by the following formula (9):
Beta = 1 / (1 + e^(w2 * WAE2(V2)·V2 / t))
where V2 is the second similarity relationship vector, WAE2(V2) is the weight of the second similarity relationship vector V2 obtained through processing by the second weight autoencoder, e is the natural base, w2 is a preset value, for example w2 = -10, and t is a preset parameter value determined according to the aforementioned preset number N, with t = N². When the preset number is 4, that is, when each image to be processed is divided into 4 block images, t = 16 in formula (9). The value range of the second similarity weight Beta obtained by formula (9) is [0, 1].
The first similarity weight obtained by the above method can represent the similarity between the image to be recognized and the reference image under different viewing angles, and the second similarity weight can represent the continuity between the block images of an image to be processed, so that the target similarity subsequently calculated based on the first and second similarity weights takes robustness to the image shooting angle into account, thereby improving the robustness and accuracy of image recognition.
Optionally, the above step S104 includes:
determining an initial similarity between the image to be recognized and the reference image according to each piece of block feature information; and
multiplying the initial similarity by the similarity weight to obtain a target similarity between the image to be recognized and the reference image.
After the similarity weight is determined, for each block image to be recognized, the reference block image corresponding to the position of that block image is obtained to form a group of images, and the similarity between the block feature information to be recognized of the block image to be recognized and the block feature information of its corresponding reference block image is calculated for each group. Next, the similarities of the preset number of image groups determined from the block images of the image to be recognized and of the reference image are averaged to obtain the initial similarity between the image to be recognized and the reference image. The initial similarity is then multiplied by the similarity weight, and the result is taken as the target similarity between the image to be recognized and the reference image.
Exemplarily, the target similarity Similarity can be obtained by the following formula (10):
Similarity = Alpha * Beta * (1/N) * Σ C(xi, yi), i = 1, ..., N
where N is the preset number into which each image is divided, C(xi, yi) is the similarity, calculated by formula (1), between block feature information to be recognized and reference block feature information corresponding to the same position information, (1/N) * Σ C(xi, yi) is the aforementioned initial similarity, Alpha is the aforementioned first similarity weight, and Beta is the aforementioned second similarity weight.
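A sketch of formula (10) under this reading, with Alpha, Beta, and the per-position similarities C(xi, yi) supplied as plain numbers:

```python
def target_similarity(paired_sims, alpha, beta):
    """Similarity = Alpha * Beta * (1/N) * sum of C(xi, yi) over matched blocks."""
    n = len(paired_sims)
    initial = sum(paired_sims) / n  # initial similarity: mean pairwise similarity
    return alpha * beta * initial

sims = [0.9, 0.8, 0.85, 0.95]  # hypothetical C(xi, yi) values for N = 4
print(round(target_similarity(sims, alpha=0.9, beta=0.8), 4))  # 0.63
```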
With the above method, a target similarity that integrates information from different viewing angles can be obtained, improving the accuracy of image recognition.
Optionally, the image to be recognized is a scene image to be recognized, and the reference image is a reference scene image; the recognition result of the image to be recognized includes a visual place recognition result of the scene image to be recognized.
The image recognition method in the embodiment of the present application is specifically a visual place recognition method. The essence of visual place recognition is to judge whether two images indicate the same place; this problem can be transformed into the problem of computing the similarity between the two images. When the two images are sufficiently similar, their similarity approaches 1, and the two images indicate the same place. Conversely, if the similarity between the two images approaches -1, the two images indicate different places.
At present, visual place recognition has significant application value in unmanned systems and can be applied in various scenarios such as localization, remote monitoring, and vehicle navigation. Owing to the impact on image recognition of appearance changes caused by illumination, weather, and seasonal variation, and of viewpoint changes caused by changes in the camera shooting angle, most current visual place recognition methods cannot perform recognition accurately when an unmanned system encounters drastic environmental changes.
In the embodiment of the present application, a scene image to be recognized obtained by photographing the scene is used as the image to be recognized, and a pre-stored reference scene image is used as the reference image. Through the above steps S101 to S105, robustness to changes in image appearance is improved through feature extraction and feature dimensionality reduction on the block images, robustness to visual changes in the image is ensured through the similarity weight, and the reference scene image matching the scene image to be recognized is thereby accurately determined. The place information carried by that reference scene image is then taken as the place information corresponding to the scene image to be recognized, yielding the visual place recognition result of the scene image to be recognized. That is, the image recognition method described in steps S101 to S105 can meet the robustness requirements of visual place recognition under complex environmental changes and improve the accuracy of visual place recognition.
Exemplarily, FIG. 2 shows a schematic diagram of the construction process of the aforementioned similarity weight, which corresponds to the above steps S101 to S104. The similarity weight construction process is detailed as follows:
A1: As shown in FIG. 2, after the image to be recognized I1 and the reference image I2 are obtained as images to be processed, each image to be processed is divided into 4 corresponding block images. Specifically, the image to be recognized I1 is divided into four block images to be recognized I11, I12, I13 and I14, and the reference image I2 is divided into four corresponding reference block images I21, I22, I23 and I24.
A2: Next, each block image is input in turn into AlexNet for feature extraction to obtain the initial feature information corresponding to each block image, which is then input into the dimensionality-reduction autoencoder for dimensionality reduction; the block feature information obtained by dimensionality reduction is taken from the encoder output layer in the middle of the dimensionality-reduction autoencoder. The block feature information output by the encoder of the dimensionality-reduction autoencoder specifically includes the block feature information to be recognized x1, x2, x3, x4 corresponding to the block images to be recognized I11, I12, I13, I14, and the reference block feature information y1, y2, y3, y4 corresponding to the reference block images I21, I22, I23, I24.
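The dimensionality-reduction step of A2 can be sketched abstractly as follows; the trained AlexNet and autoencoder are replaced by a stand-in linear encoder, so the numbers are purely illustrative:

```python
def encode(initial_feature, weights):
    """Stand-in for the encoder half of the dimensionality-reduction
    autoencoder: a linear map from the initial (high-dimensional) feature
    to a shorter block feature vector. A real system would use the trained
    AlexNet output and trained encoder weights instead."""
    return [sum(w * f for w, f in zip(row, initial_feature)) for row in weights]

initial = [0.2, 0.5, 0.1, 0.7, 0.3, 0.9]  # toy 6-D "initial feature"
W = [[0.1] * 6, [0.2] * 6]                # toy 6 -> 2 projection matrix
block_feature = encode(initial, W)
print(len(block_feature))  # 2: the reduced dimensionality
```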
A3: Afterwards, the similarity relationships between the block images are determined according to the block feature information. Specifically, the upper part of the similarity relationships shown in FIG. 2 represents the following: for each block image to be recognized, the similarity information between that block image and the reference block image at the same corresponding position is calculated, and these similarities are combined to obtain the pairwise similarity feature vector Va; and for each block image to be recognized, the similarity information between that block image and the reference block images at different positions is calculated, and these similarities are combined to obtain the first similarity feature vector Vb. The pairwise similarity feature vector Va and the first similarity feature vector Vb are then merged and expanded into the first similarity relationship vector V1 by the flattening function Flatten(·). The lower part of the similarity relationships shown in FIG. 2 represents the following: for each block image to be recognized, the similarity information between that block image and the reference block image at the same corresponding position is calculated, and these similarities are combined to obtain the pairwise similarity feature vector Va; and the similarity information between every two block images to be recognized, and between every two reference block images, is calculated, and these similarities are combined to obtain the second similarity feature vector Vc. The pairwise similarity feature vector Va and the second similarity feature vector Vc are then merged and expanded into the second similarity relationship vector V2 by the flattening function Flatten(·).
A4: The first similarity relationship vector V1 is input into the first weight autoencoder among the weight autoencoders to obtain the weight WAE1(V1) corresponding to the first similarity relationship vector, and the first similarity weight Alpha is then calculated according to formula (8) above. The second similarity relationship vector V2 is input into the second weight autoencoder among the weight autoencoders to obtain the weight WAE2(V2) corresponding to the second similarity relationship vector, and the second similarity weight Beta is then calculated according to formula (9) above.
Exemplarily, FIG. 3 shows a schematic diagram of the matching process between images to be recognized and reference images during image recognition. For each image to be recognized Qi, at recognition time, the image Qi is paired in turn with each reference image Ri to form a pair of images to be processed, and through the processing of the above steps S101 to S105, the target similarity between the image to be recognized and the reference image in that pair is determined. Exemplarily, when the preset number of block images segmented from each image to be processed is 4, the target similarity is Similarity = Alpha * Beta * (1/4) * Σ C(xi, yi). For n images to be recognized and n reference images, n*n pairs of images to be processed can be formed, and the n*n target similarities corresponding to these pairs can be combined into a similarity matrix as shown in FIG. 3. In the row of target similarities corresponding to each image to be recognized Qi in the similarity matrix, the reference image corresponding to the column containing the largest target similarity is the best match for that image to be recognized, and the entity information corresponding to the best match can serve as the recognition information of the image to be recognized.
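The row-wise best-match selection over the similarity matrix can be sketched as follows (the 3x3 matrix is hypothetical):

```python
def best_matches(sim_matrix):
    """For each query row, return the index of the reference image with the
    largest target similarity, i.e., the best match described above."""
    return [max(range(len(row)), key=lambda j: row[j]) for row in sim_matrix]

# Hypothetical 3x3 similarity matrix: rows = queries Qi, columns = references Ri.
S = [[0.91, 0.40, 0.12],
     [0.35, 0.88, 0.20],
     [0.15, 0.30, 0.79]]
print(best_matches(S))  # [0, 1, 2]: each query matches its diagonal reference
```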
In one embodiment, the above image recognition process can be implemented through the processing of a target model. The target model may include the aforementioned AlexNet, the dimensionality-reduction autoencoder, and the weight autoencoders. Before the above step S101, the AlexNet, the dimensionality-reduction autoencoder, and the weight autoencoders in the target model can be jointly trained based on sample images to obtain a trained target model. Afterwards, once a pair of images to be processed containing the image to be recognized and the reference image is obtained and segmented into block images, the block images are input into the trained target model: the trained AlexNet extracts the initial feature information, and the trained dimensionality-reduction encoder performs dimensionality reduction on the initial feature information to obtain the block feature information corresponding to each block image. The similarity weight between the image to be recognized and the reference image is then determined from the position information and block feature information of each block image together with the trained weight autoencoders. Based on this similarity weight, the target similarity between the image to be recognized and the reference image can be determined, and the recognition result of the image to be recognized can then be determined.
In one embodiment, the above target model may be trained with triplets. The triplet-based training method uses triplet sample images as the training samples input to the target model. A triplet sample image includes an anchor image (anchor), a positive sample image (pos) and a negative sample image (neg). The anchor image is the reference target image in the image similarity calculation process; the positive sample image is an image that represents the same entity information as the anchor image (for example, an image corresponding to the same location) but was captured under different environmental conditions (including appearance conditions such as lighting, as well as shooting angle, etc.); the negative sample image is an image that represents entity information different from that of the anchor image (for example, images corresponding to different locations). A schematic diagram of the training process of the target model is shown in FIG. 4 and detailed below:
B1: Screening of triplet sample images. First, suitable triplet sample images are screened from the image library as training samples for the target model. Exemplarily, let the similarity between the positive sample image and the anchor image (referred to as the positive sample similarity) be S pos, and the similarity between the negative sample image and the anchor image (referred to as the negative sample similarity) be S neg; then, in the screened triplet sample images, the positive sample similarity and the negative sample similarity need to satisfy the two constraints shown in formula (11) and formula (12):
Formula (11): S_pos > S_neg
Formula (12): S_pos - margin < S_neg
Formula (11) indicates that, in a triplet sample image, the positive sample similarity corresponding to the positive sample image (which represents the same entity information as the anchor image) should be greater than the negative sample similarity corresponding to the negative sample image. In formula (12), margin is a preset value; formula (12) indicates that, in a triplet sample image, the difference between the positive sample similarity and the negative sample similarity should not be too large. This constraint prevents the target model from learning from trivially easy triplet image samples (for example, triplets whose positive sample similarity is 1 or almost 1 and whose negative sample similarity is 0 or almost 0), which would lead to a poor training effect or even overfitting and model collapse.
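The two screening constraints of formulas (11) and (12) can be sketched as a single predicate; `margin` is the preset value mentioned above:

```python
def is_valid_triplet(s_pos: float, s_neg: float, margin: float) -> bool:
    """A triplet is kept only if it satisfies both constraints:
    formula (11): s_pos > s_neg          (the positive matches better), and
    formula (12): s_pos - margin < s_neg (but not trivially better).
    The two conditions combine into: s_neg < s_pos < s_neg + margin.
    """
    return s_neg < s_pos < s_neg + margin
```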
B2: The screened triplet sample images are segmented and input into the target model, and feature extraction is performed by the AlexNet of the target model to obtain the initial feature information corresponding to the block images of each sample image.
B2: Each piece of initial feature information is input in turn into the adaptive average pooling layer AAP of the target model for average pooling, and is flattened by the flatten function Flatten(·) to obtain a pooled feature f.
B3: Each pooled feature f is input into the dimensionality-reduction autoencoder for dimensionality reduction; the feature information output by the encoder part of the dimensionality-reduction autoencoder is taken as the block feature information w, and the feature information output by the decoder part of the dimensionality-reduction autoencoder is taken as the decoded feature information z.
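A minimal numeric sketch of step B3, with randomly initialized linear maps standing in for the trained dimensionality-reduction autoencoder; the dimensions 256 and 32 are assumptions, not values from this application:

```python
import numpy as np

rng = np.random.default_rng(0)
d_in, d_code = 256, 32  # assumed pooled-feature and code dimensions

# Linear encoder/decoder stand-ins for the dimensionality-reduction autoencoder.
W_enc = rng.standard_normal((d_in, d_code)) * 0.05
W_dec = rng.standard_normal((d_code, d_in)) * 0.05

f = rng.standard_normal(d_in)  # pooled feature of one block image (step B2)
w = f @ W_enc                  # block feature information (encoder output)
z = w @ W_dec                  # decoded feature information (decoder output)
```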
B4: Based on the block feature information of each block image corresponding to the positive sample image and the anchor image, and on the weight autoencoder of the target model, the similarity weight between the positive sample image and the anchor image is determined, and based on this similarity weight, the positive sample similarity S pos between the positive sample image and the anchor image is calculated.
B5: Based on the block feature information of each block image corresponding to the negative sample image and the anchor image, and on the weight autoencoder of the target model, the similarity weight between the negative sample image and the anchor image is determined, and based on this similarity weight, the negative sample similarity S neg between the negative sample image and the anchor image is calculated.
B6: Based on the pooled feature f input into the dimensionality-reduction autoencoder in step B3 and the decoded feature information z output by the decoder part of the dimensionality-reduction autoencoder, the mean square error value L Mse corresponding to the dimensionality-reduction encoder is calculated using the preset formula (13) as the loss function. Formula (13) is as follows:
L_Mse = ||f_an - z_an||_2^2 + ||f_pos - z_pos||_2^2 + ||f_neg - z_neg||_2^2
Here, f an and z an denote the pooled feature and the decoded feature information corresponding to the anchor image; f pos and z pos denote the pooled feature and the decoded feature information corresponding to the positive sample image; f neg and z neg denote the pooled feature and the decoded feature information corresponding to the negative sample image; and || · ||_2 denotes the two-norm operator. The smaller the mean square error value L Mse calculated by formula (13), the closer the decoded feature information output by the decoder part of the dimensionality-reduction autoencoder is to the input pooled feature, indicating that the block feature information produced by the encoder of the dimensionality-reduction autoencoder represents the image features more accurately.
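Reading formula (13) as the sum of squared two-norm reconstruction errors over the three images of the triplet (an interpretation of the formula image, which may differ from the original by an averaging factor), a sketch is:

```python
import numpy as np

def mse_loss(f_an, z_an, f_pos, z_pos, f_neg, z_neg):
    """Sum of squared two-norm errors between each pooled feature f and
    its decoded feature information z (anchor, positive, negative)."""
    return (np.sum((f_an - z_an) ** 2)
            + np.sum((f_pos - z_pos) ** 2)
            + np.sum((f_neg - z_neg) ** 2))
```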
B7: Based on the positive sample similarity S pos calculated in step B4 and the negative sample similarity S neg calculated in step B5, the triplet network loss value L Triplet is calculated using the preset formula (14) as the loss function. Formula (14) is as follows:
L_Triplet = (1/M) Σ ln(1 + e^((S_neg - S_pos + margin) / temper))
In formula (14), M is the number of triplet sample images; ln is the natural logarithm, e is the natural base, and margin and temper are hyperparameters set in advance. The smaller the triplet network loss value L Triplet calculated by formula (14), the closer the current positive sample similarity is to 1 and the closer the negative sample similarity is to 0, thereby effectively distinguishing positive samples from negative samples.
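One plausible reading of formula (14) (whose exact expression is only available as an image in this text) is a smoothed triplet loss averaged over the M triplets; the default margin and temper values below are assumptions:

```python
import numpy as np

def triplet_loss(s_pos, s_neg, margin=0.2, temper=0.1):
    """ln(1 + e^((S_neg - S_pos + margin) / temper)), averaged over M
    triplets; the loss is small when S_pos is well above S_neg."""
    s_pos = np.asarray(s_pos, dtype=float)
    s_neg = np.asarray(s_neg, dtype=float)
    return float(np.mean(np.log1p(np.exp((s_neg - s_pos + margin) / temper))))
```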
B8: From the mean square error value L Mse obtained in step B6 and the triplet network loss value L Triplet obtained in step B7, the total loss function value L total of the target model is calculated by formula (15). Formula (15) is as follows:
L_total = λ_1 L_Mse + λ_2 L_Triplet
In formula (15), λ 1 and λ 2 are hyperparameters of the target model, whose actual values are set in advance according to experience.
B9: The total loss function value L total of the target model is calculated, the network parameters of each neural network in the target model (the AlexNet, the dimensionality-reduction autoencoder, the weight autoencoder, etc.) are iteratively updated, and training of the target model continues through the above steps B2 to B9 until the final calculated total loss function value L total is less than a preset loss value, at which point training stops and the trained target model is obtained.
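The overall loop of steps B2 to B9 can be sketched as follows, with a stub model in place of the real AlexNet and autoencoders; the loss values, the weights λ 1 = λ 2 = 1 and the stopping threshold are all assumed for illustration:

```python
class StubModel:
    """Stand-in for the target model; its losses halve on every update,
    imitating convergence of the real networks."""
    def __init__(self):
        self.scale = 1.0

    def losses(self, triplet):
        return 0.4 * self.scale, 0.6 * self.scale  # (L_Mse, L_Triplet)

    def step(self):
        self.scale *= 0.5  # stand-in for one round of parameter updates

def train(model, triplets, lam1=1.0, lam2=1.0, threshold=0.01, max_iters=100):
    """Iterate steps B2-B9: accumulate L_total = lam1*L_Mse + lam2*L_Triplet
    (formula (15)) over all triplets, update the model, and stop once the
    mean total loss falls below the preset loss value."""
    for it in range(max_iters):
        total = 0.0
        for t in triplets:
            l_mse, l_triplet = model.losses(t)
            total += lam1 * l_mse + lam2 * l_triplet  # formula (15)
        model.step()
        if total / len(triplets) < threshold:
            return it + 1  # number of iterations used
    return max_iters
```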
Through the above training steps, training of the target model can be completed accurately to obtain the trained target model, so that image recognition can subsequently be performed accurately based on the trained AlexNet, dimensionality-reduction encoder and weight encoder in the trained target model, improving the accuracy of image recognition.
It should be understood that the sequence numbers of the steps in the above embodiments do not imply an order of execution; the execution order of the processes should be determined by their functions and internal logic, and should not constitute any limitation on the implementation of the embodiments of the present application.
Embodiment Two:
FIG. 5 shows a schematic structural diagram of an image recognition apparatus provided by an embodiment of the present application. For ease of description, only the parts related to the embodiment of the present application are shown:
The image recognition apparatus includes: a segmentation unit 51, a feature extraction unit 52, a similarity weight determination unit 53, a target similarity determination unit 54 and a recognition result determination unit 55, wherein:
The segmentation unit 51 is configured to divide an image to be processed into a preset number of block images, wherein the image to be processed includes an image to be recognized and a reference image; the block images corresponding to the image to be recognized are block images to be recognized, and the block images corresponding to the reference image are reference block images; for each block image to be recognized, there is a reference block image whose position information corresponds to it one-to-one.
The feature extraction unit 52 is configured to perform feature extraction on each block image to obtain the block feature information corresponding to each block image.
The similarity weight determination unit 53 is configured to determine the similarity weight between the image to be recognized and the reference image according to the position information of each block image and the corresponding block feature information.
The target similarity determination unit 54 is configured to determine the target similarity between the image to be recognized and the reference image according to each piece of block feature information and the similarity weight.
The recognition result determination unit 55 is configured to determine the recognition result of the image to be recognized according to the target similarity.
Optionally, the feature extraction unit includes:
an initial feature information determination module, configured to, for each block image, input the block image into a trained convolutional neural network for processing to obtain the initial feature information corresponding to the block image; and
a dimensionality reduction module, configured to perform dimensionality reduction on the initial feature information to obtain the block feature information corresponding to the block image.
Optionally, the dimensionality reduction module is specifically configured to input the initial feature information into an adaptive average pooling layer and/or a trained dimensionality-reduction autoencoder for dimensionality reduction, to obtain the block feature information corresponding to the block image.
Optionally, the image recognition apparatus further includes:
a dimensionality-reduction autoencoder training unit, configured to obtain preset sample feature information and input the preset sample feature information into a dimensionality-reduction autoencoder to be trained, and to adjust the parameters of the dimensionality-reduction autoencoder to be trained so that the mean square error between the preset sample feature information and the decoded feature information output by the decoder of the dimensionality-reduction autoencoder is less than a preset threshold, thereby obtaining a trained dimensionality-reduction autoencoder.
Optionally, the similarity weight determination unit includes:
a similarity feature vector determination module, configured to determine a paired similarity feature vector and an unpaired similarity feature vector according to the position information of each block image and the corresponding block feature information, wherein the paired similarity feature vector contains the similarity information between block images to be recognized and reference block images whose position information corresponds, and the unpaired similarity feature vector contains the similarity information between block images whose position information does not correspond; and
a similarity weight determination module, configured to determine the similarity weight according to the paired similarity feature vector and the unpaired similarity feature vector.
Optionally, the similarity weight determination module is specifically configured to determine the similarity weight according to the paired similarity feature vector, the unpaired similarity feature vector and a preset weight autoencoder.
Optionally, the unpaired similarity feature vector includes a first similarity feature vector and a second similarity feature vector; the first similarity feature vector contains the similarity information between block images to be recognized and reference block images whose position information does not correspond, and the second similarity feature vector contains the similarity information between different block images in the same image to be processed.
Correspondingly, the similarity weight includes a first similarity weight and a second similarity weight, and the similarity weight determination module is specifically configured to determine the first similarity weight according to the paired similarity feature vector, the first similarity feature vector and a preset first weight autoencoder, and to determine the second similarity weight according to the paired similarity feature vector, the second similarity feature vector and a preset second weight autoencoder.
Optionally, the target similarity determination unit is specifically configured to determine an initial similarity between the image to be recognized and the reference image according to each piece of block feature information, and to multiply the initial similarity by the similarity weight to obtain the target similarity between the image to be recognized and the reference image.
Optionally, the image to be recognized is a scene image to be recognized, and the reference image is a reference scene image; the recognition result of the image to be recognized includes a visual position recognition result of the scene image to be recognized.
It should be noted that, since the information exchange between the above apparatuses/units, their execution processes and the like are based on the same concept as the method embodiments of the present application, their specific functions and technical effects can be found in the method embodiment section and will not be repeated here.
Embodiment Three:
FIG. 6 is a schematic diagram of an electronic device provided by an embodiment of the present application. As shown in FIG. 6, the electronic device 6 of this embodiment includes: a processor 60, a memory 61, and a computer program 62, such as an image recognition program, stored in the memory 61 and executable on the processor 60. When executing the computer program 62, the processor 60 implements the steps in each of the above image recognition method embodiments, for example steps S101 to S105 shown in FIG. 1. Alternatively, when executing the computer program 62, the processor 60 implements the functions of each module/unit in the above apparatus embodiments, for example the functions of the segmentation unit 51 to the recognition result determination unit 55 shown in FIG. 5.
Exemplarily, the computer program 62 may be divided into one or more modules/units, which are stored in the memory 61 and executed by the processor 60 to complete the present application. The one or more modules/units may be a series of computer program instruction segments capable of accomplishing specific functions, and the instruction segments are used to describe the execution process of the computer program 62 in the electronic device 6.
The electronic device 6 may be a computing device such as a desktop computer, a notebook, a palmtop computer or a cloud server. The electronic device may include, but is not limited to, the processor 60 and the memory 61. Those skilled in the art will understand that FIG. 6 is merely an example of the electronic device 6 and does not constitute a limitation on the electronic device 6, which may include more or fewer components than shown, or combine certain components, or include different components; for example, the electronic device may further include input/output devices, a network access device, a bus and the like.
The processor 60 may be a central processing unit (CPU), or another general-purpose processor, a digital signal processor (DSP), an application-specific integrated circuit (ASIC), a field-programmable gate array (FPGA) or another programmable logic device, a discrete gate or transistor logic device, a discrete hardware component, etc. A general-purpose processor may be a microprocessor, or the processor may be any conventional processor or the like.
The memory 61 may be an internal storage unit of the electronic device 6, for example a hard disk or memory of the electronic device 6. The memory 61 may also be an external storage device of the electronic device 6, for example a plug-in hard disk, a smart media card (SMC), a secure digital (SD) card or a flash card provided on the electronic device 6. Further, the memory 61 may include both an internal storage unit and an external storage device of the electronic device 6. The memory 61 is used to store the computer program and other programs and data required by the electronic device. The memory 61 may also be used to temporarily store data that has been output or is to be output.
Those skilled in the art will clearly understand that, for convenience and brevity of description, the division into the above functional units and modules is used only as an example; in practical applications, the above functions may be assigned to different functional units and modules as needed, that is, the internal structure of the apparatus may be divided into different functional units or modules to complete all or part of the functions described above. The functional units and modules in the embodiments may be integrated into one processing unit, or each unit may exist physically on its own, or two or more units may be integrated into one unit; the integrated units may be implemented in the form of hardware or in the form of software functional units. In addition, the specific names of the functional units and modules are only for ease of mutual distinction and are not used to limit the protection scope of the present application. For the specific working processes of the units and modules in the above system, reference may be made to the corresponding processes in the foregoing method embodiments, which will not be repeated here.
The above embodiments are only used to illustrate the technical solutions of the present application, not to limit them. Although the present application has been described in detail with reference to the foregoing embodiments, those of ordinary skill in the art should understand that they may still modify the technical solutions described in the foregoing embodiments, or make equivalent replacements for some of the technical features therein; such modifications or replacements do not cause the essence of the corresponding technical solutions to depart from the spirit and scope of the technical solutions of the embodiments of the present application, and shall all be included within the protection scope of the present application.

Claims (20)

  1. An image recognition method, characterized by comprising:
    dividing an image to be processed into a preset number of block images, wherein the image to be processed includes an image to be recognized and a reference image, the block images corresponding to the image to be recognized are block images to be recognized, and the block images corresponding to the reference image are reference block images, and wherein for each block image to be recognized there is a reference block image whose position information corresponds to it one-to-one;
    performing feature extraction on each of the block images to obtain block feature information corresponding to each of the block images;
    determining a similarity weight between the image to be recognized and the reference image according to the position information of each of the block images and the corresponding block feature information;
    determining a target similarity between the image to be recognized and the reference image according to each piece of the block feature information and the similarity weight; and
    determining a recognition result of the image to be recognized according to the target similarity.
  2. The image recognition method according to claim 1, characterized in that performing feature extraction on each of the block images to obtain the block feature information corresponding to each of the block images comprises:
    for each of the block images, inputting the block image into a trained convolutional neural network for processing to obtain initial feature information corresponding to the block image; and
    performing dimensionality reduction on the initial feature information to obtain the block feature information corresponding to the block image.
  3. The image recognition method according to claim 2, characterized in that performing dimensionality reduction on the initial feature information to obtain the block feature information corresponding to the block image comprises:
    inputting the initial feature information into an adaptive average pooling layer and/or a trained dimensionality-reduction autoencoder for dimensionality reduction to obtain the block feature information corresponding to the block image.
  4. The image recognition method according to claim 3, characterized in that, before inputting each block image into the trained convolutional neural network for processing to obtain the initial feature information corresponding to the block image, the method further comprises:
    obtaining preset sample feature information and inputting the preset sample feature information into a dimensionality-reduction autoencoder to be trained; and
    adjusting parameters of the dimensionality-reduction autoencoder to be trained so that the mean square error between the preset sample feature information and the decoded feature information output by the decoder of the dimensionality-reduction autoencoder is less than a preset threshold, thereby obtaining the trained dimensionality-reduction autoencoder.
  5. The image recognition method according to claim 1, characterized in that determining the similarity weight between the image to be recognized and the reference image according to the position information of each of the block images and the corresponding block feature information comprises:
    determining a paired similarity feature vector and an unpaired similarity feature vector according to the position information of each of the block images and the corresponding block feature information, wherein the paired similarity feature vector contains similarity information between block images to be recognized and reference block images whose position information corresponds, and the unpaired similarity feature vector contains similarity information between block images whose position information does not correspond; and
    determining the similarity weight according to the paired similarity feature vector and the unpaired similarity feature vector.
  6. The image recognition method according to claim 5, characterized in that determining the similarity weight according to the paired similarity feature vector and the unpaired similarity feature vector comprises:
    determining the similarity weight according to the paired similarity feature vector, the unpaired similarity feature vector and a preset weight autoencoder.
  7. The image recognition method according to claim 6, characterized in that the unpaired similarity feature vector includes a first similarity feature vector and a second similarity feature vector; the first similarity feature vector contains similarity information between block images to be recognized and reference block images whose position information does not correspond, and the second similarity feature vector contains similarity information between different block images in the same image to be processed;
    correspondingly, determining the similarity weight according to the paired similarity feature vector, the unpaired similarity feature vector and the preset weight autoencoder comprises:
    determining a first similarity weight according to the paired similarity feature vector, the first similarity feature vector and a preset first weight autoencoder; and
    determining a second similarity weight according to the paired similarity feature vector, the second similarity feature vector and a preset second weight autoencoder.
  8. The image recognition method according to claim 1, wherein determining the target similarity between the image to be recognized and the reference image according to each piece of block feature information and the similarity weight comprises:
    determining an initial similarity between the image to be recognized and the reference image according to each piece of block feature information; and
    multiplying the initial similarity by the similarity weight to obtain the target similarity between the image to be recognized and the reference image.
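The two steps of claim 8 — an initial similarity computed from the block feature information, then a multiplication by the similarity weight — can be sketched as follows. This is a minimal illustration, not the patent's implementation: the choice of a mean of block-wise cosine similarities for the initial similarity, and all names, are assumptions.

```python
import numpy as np

def cosine(a, b):
    """Cosine similarity between two block feature vectors."""
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b)))

def target_similarity(query_feats, ref_feats, weight):
    """Initial similarity: mean similarity over position-matched block pairs.
    Target similarity: initial similarity scaled by the learned weight."""
    initial = np.mean([cosine(q, r) for q, r in zip(query_feats, ref_feats)])
    return float(initial * weight)

# Toy example: 4 blocks with 8-dim features; the reference is a near-copy.
rng = np.random.default_rng(0)
query = [rng.standard_normal(8) for _ in range(4)]
ref = [v + 0.01 * rng.standard_normal(8) for v in query]
score = target_similarity(query, ref, weight=0.9)
```

Since cosine similarity is bounded by 1, the weight acts as a per-pair cap on the target similarity, which is how the down-weighting of unreliable matches takes effect.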
  9. The image recognition method according to any one of claims 1 to 8, wherein the image to be recognized is a scene image to be recognized and the reference image is a reference scene image, and the recognition result of the image to be recognized comprises a visual position recognition result of the scene image to be recognized.
  10. An image recognition apparatus, comprising:
    a segmentation unit, configured to divide an image to be processed into a preset number of block images, wherein the image to be processed includes an image to be recognized and a reference image; the block images corresponding to the image to be recognized are block images to be recognized, and the block images corresponding to the reference image are reference block images; and each block image to be recognized has a reference block image whose position information corresponds to it one-to-one;
    a feature extraction unit, configured to perform feature extraction on each block image to obtain block feature information corresponding to each block image;
    a similarity weight determination unit, configured to determine a similarity weight between the image to be recognized and the reference image according to the position information of each block image and the corresponding block feature information;
    a target similarity determination unit, configured to determine a target similarity between the image to be recognized and the reference image according to each piece of block feature information and the similarity weight; and
    a recognition result determination unit, configured to determine a recognition result of the image to be recognized according to the target similarity.
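The segmentation unit of claim 10 divides an image to be processed into a preset number of block images, each tagged with position information. A minimal sketch, assuming a regular grid whose cell counts divide the image dimensions (the patent does not fix the segmentation scheme, and all names below are illustrative):

```python
import numpy as np

def split_into_blocks(image, rows, cols):
    """Divide an H x W image into rows*cols block images, each paired with
    its (row, col) position information. Assumes H % rows == 0 and
    W % cols == 0 for simplicity."""
    h, w = image.shape[0] // rows, image.shape[1] // cols
    return [((i, j), image[i * h:(i + 1) * h, j * w:(j + 1) * w])
            for i in range(rows) for j in range(cols)]

# A 16x16 toy "image" split into a preset number (4 x 4 = 16) of blocks.
img = np.arange(16 * 16).reshape(16, 16)
blocks = split_into_blocks(img, 4, 4)
```

Because the image to be recognized and the reference image are segmented the same way, block (i, j) of one has exactly one position-matched counterpart in the other, as the claim requires.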
  11. The image recognition apparatus according to claim 10, wherein the feature extraction unit comprises:
    an initial feature information determination module, configured to, for each block image, input the block image into a trained convolutional neural network to obtain initial feature information corresponding to the block image; and
    a dimensionality reduction module, configured to perform dimensionality reduction on the initial feature information to obtain the block feature information corresponding to the block image.
  12. The image recognition apparatus according to claim 11, wherein the dimensionality reduction module is specifically configured to input the initial feature information into an adaptive average pooling layer and/or a trained dimensionality reduction autoencoder for dimensionality reduction, to obtain the block feature information corresponding to the block image.
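Claim 12 reduces dimensionality with an adaptive average pooling layer and/or a trained autoencoder. Below is a NumPy sketch of adaptive average pooling, following the floor/ceil binning convention used by common deep learning frameworks; this is an illustration of the layer type, not code from the patent:

```python
import numpy as np

def adaptive_avg_pool2d(feat, out_h, out_w):
    """Adaptive average pooling: each output cell averages its bin of input
    cells, producing a fixed out_h x out_w map regardless of input size."""
    in_h, in_w = feat.shape
    out = np.empty((out_h, out_w))
    for i in range(out_h):
        r0 = (i * in_h) // out_h
        r1 = ((i + 1) * in_h + out_h - 1) // out_h   # ceil((i+1)*in_h/out_h)
        for j in range(out_w):
            c0 = (j * in_w) // out_w
            c1 = ((j + 1) * in_w + out_w - 1) // out_w
            out[i, j] = feat[r0:r1, c0:c1].mean()
    return out

# A 4x4 feature map reduced to a fixed 2x2 output.
pooled = adaptive_avg_pool2d(np.arange(16.0).reshape(4, 4), 2, 2)
```

The fixed output size is what makes the layer useful here: every block image yields block feature information of the same dimensionality, whatever the block's resolution.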
  13. The image recognition apparatus according to claim 12, further comprising:
    a dimensionality reduction autoencoder training unit, configured to obtain preset sample feature information and input it into a dimensionality reduction autoencoder to be trained, and to adjust the parameters of the dimensionality reduction autoencoder to be trained until the mean square error between the preset sample feature information and the decoded feature information output by the decoder of the dimensionality reduction autoencoder is less than a preset threshold, thereby obtaining the trained dimensionality reduction autoencoder.
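The training unit of claim 13 adjusts the autoencoder's parameters until the mean square error between the sample features and the decoder output falls below a preset threshold. A minimal linear-autoencoder sketch trained by gradient descent — the patent does not specify the architecture, optimizer, or data, so everything below (including the 2-unit bottleneck and the synthetic samples) is an illustrative assumption:

```python
import numpy as np

rng = np.random.default_rng(0)

# Synthetic "preset sample feature information": 64 four-dimensional vectors
# lying in a 2-D subspace, so a 2-unit bottleneck can reconstruct them.
X = rng.standard_normal((64, 2)) @ rng.standard_normal((2, 4))

k, d, n = 2, 4, X.shape[0]
W_enc = 0.3 * rng.standard_normal((k, d))   # encoder parameters
W_dec = 0.3 * rng.standard_normal((d, k))   # decoder parameters
lr, threshold = 0.01, 1e-3

losses = []
for _ in range(8000):                        # adjust parameters ...
    Z = X @ W_enc.T                          # encode
    X_hat = Z @ W_dec.T                      # decode
    E = X_hat - X
    mse = float(np.mean(E ** 2))
    losses.append(mse)
    if mse < threshold:                      # ... until MSE < preset threshold
        break
    grad_dec = (2.0 / n) * E.T @ Z           # gradient of reconstruction loss
    grad_enc = (2.0 / n) * W_dec.T @ E.T @ X
    W_dec -= lr * grad_dec
    W_enc -= lr * grad_enc
```

After training, the encoder half (`W_enc` here) is what the dimensionality reduction module of claim 12 would actually apply to the initial feature information.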
  14. The image recognition apparatus according to claim 10, wherein the similarity weight determination unit comprises:
    a similarity feature vector determination module, configured to determine a paired similarity feature vector and a non-paired similarity feature vector according to the position information of each block image and the corresponding block feature information, wherein the paired similarity feature vector contains similarity information between block images to be recognized and reference block images whose position information corresponds, and the non-paired similarity feature vector contains similarity information between block images whose position information does not correspond; and
    a similarity weight determination module, configured to determine the similarity weight according to the paired similarity feature vector and the non-paired similarity feature vector.
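Claim 14 distinguishes a paired similarity feature vector (position-matched query/reference blocks) from a non-paired one (position-mismatched blocks). A sketch using cosine similarity — the patent does not name the similarity measure or the exact enumeration of mismatched pairs, so both are assumptions here:

```python
import numpy as np

def cos(a, b):
    """Cosine similarity between two block feature vectors."""
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b)))

def similarity_vectors(query_blocks, ref_blocks):
    """Paired vector: similarities of position-matched block pairs.
    Non-paired vector: similarities of every position-mismatched
    query/reference pair."""
    n = len(query_blocks)
    paired = np.array([cos(query_blocks[i], ref_blocks[i]) for i in range(n)])
    non_paired = np.array([cos(query_blocks[i], ref_blocks[j])
                           for i in range(n) for j in range(n) if i != j])
    return paired, non_paired

rng = np.random.default_rng(1)
q = [rng.standard_normal(8) for _ in range(4)]
r = [rng.standard_normal(8) for _ in range(4)]
p, u = similarity_vectors(q, r)
```

The intuition behind the split: a genuinely matching image pair should score high on the paired vector but low on the non-paired one; when the two look alike, the weight autoencoder can discount the match.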
  15. The image recognition apparatus according to claim 14, wherein the similarity weight determination module is specifically configured to determine the similarity weight according to the paired similarity feature vector, the non-paired similarity feature vector, and a preset weight autoencoder.
  16. The image recognition apparatus according to claim 15, wherein the non-paired similarity feature vector comprises a first similarity feature vector and a second similarity feature vector; the first similarity feature vector contains similarity information between block images to be recognized and reference block images whose position information does not correspond, and the second similarity feature vector contains similarity information between different block images within the same image to be processed;
    correspondingly, the similarity weight includes a first similarity weight and a second similarity weight, and the similarity weight determination module is specifically configured to determine the first similarity weight according to the paired similarity feature vector, the first similarity feature vector, and a preset first weight autoencoder, and to determine the second similarity weight according to the paired similarity feature vector, the second similarity feature vector, and a preset second weight autoencoder.
  17. The image recognition apparatus according to claim 10, wherein the target similarity determination unit is specifically configured to determine an initial similarity between the image to be recognized and the reference image according to each piece of block feature information, and to multiply the initial similarity by the similarity weight to obtain the target similarity between the image to be recognized and the reference image.
  18. The image recognition apparatus according to any one of claims 10 to 17, wherein the image to be recognized is a scene image to be recognized and the reference image is a reference scene image, and the recognition result of the image to be recognized comprises a visual position recognition result of the scene image to be recognized.
  19. An electronic device, comprising a memory, a processor, and a computer program stored in the memory and executable on the processor, wherein when the processor executes the computer program, the electronic device implements the steps of the method according to any one of claims 1 to 9.
  20. A computer-readable storage medium storing a computer program, wherein when the computer program is executed by a processor, an electronic device is caused to implement the steps of the method according to any one of claims 1 to 9.
PCT/CN2021/124169 2021-10-15 2021-10-15 Image recognition method and apparatus, and electronic device and storage medium WO2023060575A1 (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
PCT/CN2021/124169 WO2023060575A1 (en) 2021-10-15 2021-10-15 Image recognition method and apparatus, and electronic device and storage medium

Publications (1)

Publication Number Publication Date
WO2023060575A1 2023-04-20

Family ID: 85987992



Citations (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN108875572A (en) * 2018-05-11 2018-11-23 电子科技大学 The pedestrian's recognition methods again inhibited based on background
CN109271870A (en) * 2018-08-21 2019-01-25 平安科技(深圳)有限公司 Pedestrian recognition methods, device, computer equipment and storage medium again
CN109829448A (en) * 2019-03-07 2019-05-31 苏州市科远软件技术开发有限公司 Face identification method, device and storage medium
US20190171908A1 (en) * 2017-12-01 2019-06-06 The University Of Chicago Image Transformation with a Hybrid Autoencoder and Generative Adversarial Network Machine Learning Architecture


Cited By (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN117218456A (en) * 2023-11-07 2023-12-12 杭州灵西机器人智能科技有限公司 Image labeling method, system, electronic equipment and storage medium
CN117218456B (en) * 2023-11-07 2024-02-02 杭州灵西机器人智能科技有限公司 Image labeling method, system, electronic equipment and storage medium


Legal Events

121 EP: the EPO has been informed by WIPO that EP was designated in this application
Ref document number: 21960306
Country of ref document: EP
Kind code of ref document: A1