WO2021036028A1 - Image feature extraction and network training method, apparatus and device (图像特征提取及网络的训练方法、装置和设备) - Google Patents

Image feature extraction and network training method, apparatus and device

Info

Publication number
WO2021036028A1
Authority
WO
WIPO (PCT)
Prior art keywords
image, feature, neighbor, node, training
Application number
PCT/CN2019/120028
Other languages
English (en)
French (fr)
Inventor
李岁缠
陈大鹏
赵瑞
Original Assignee
深圳市商汤科技有限公司
Application filed by 深圳市商汤科技有限公司
Priority to JP2022500674A (published as JP2022539423A)
Priority to KR1020227000630A (published as KR20220017497A)
Publication of WO2021036028A1
Priority to US17/566,740 (published as US20220122343A1)

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/50 Information retrieval; Database structures therefor; File system structures therefor of still image data
    • G06F16/58 Retrieval characterised by using metadata, e.g. metadata not derived from the content or metadata generated manually
    • G06F16/583 Retrieval characterised by using metadata, e.g. metadata not derived from the content or metadata generated manually using metadata automatically derived from the content
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V10/00 Arrangements for image or video recognition or understanding
    • G06V10/40 Extraction of image or video features
    • G06V10/42 Global feature extraction by analysis of the whole pattern, e.g. using frequency domain transformations or autocorrelation
    • G06V10/422 Global feature extraction by analysis of the whole pattern, e.g. using frequency domain transformations or autocorrelation for representing the structure of the pattern or shape of an object therefor
    • G06V10/426 Graphical representations
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00 Pattern recognition
    • G06F18/20 Analysing
    • G06F18/21 Design or setup of recognition systems or techniques; Extraction of features in feature space; Blind source separation
    • G06F18/214 Generating training patterns; Bootstrap methods, e.g. bagging or boosting
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00 Pattern recognition
    • G06F18/20 Analysing
    • G06F18/22 Matching criteria, e.g. proximity measures
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V10/00 Arrangements for image or video recognition or understanding
    • G06V10/40 Extraction of image or video features
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V10/00 Arrangements for image or video recognition or understanding
    • G06V10/70 Arrangements for image or video recognition or understanding using pattern recognition or machine learning
    • G06V10/77 Processing image or video features in feature spaces; using data integration or data reduction, e.g. principal component analysis [PCA] or independent component analysis [ICA] or self-organising maps [SOM]; Blind source separation
    • G06V10/7715 Feature extraction, e.g. by transforming the feature space, e.g. multi-dimensional scaling [MDS]; Mappings, e.g. subspace methods
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V10/00 Arrangements for image or video recognition or understanding
    • G06V10/70 Arrangements for image or video recognition or understanding using pattern recognition or machine learning
    • G06V10/77 Processing image or video features in feature spaces; using data integration or data reduction, e.g. principal component analysis [PCA] or independent component analysis [ICA] or self-organising maps [SOM]; Blind source separation
    • G06V10/774 Generating sets of training patterns; Bootstrap methods, e.g. bagging or boosting

Definitions

  • the present disclosure relates to computer vision technology, in particular to an image feature extraction and network training method, device and equipment.
  • Image retrieval can include text-based image retrieval and content-based image retrieval (CBIR, Content-Based Image Retrieval) according to different ways of describing image content.
  • content-based image retrieval technology has broad application prospects in industrial fields such as e-commerce, leather and fabric, copyright protection, medical diagnosis, public safety, and street-view maps.
  • the present disclosure provides at least one image feature extraction and network training method, device and equipment.
  • an image feature extraction method includes:
  • the first association graph including a master node and at least one neighbor node, where the node value of the master node represents the image feature of the target image, the node value of each neighbor node represents the image feature of a neighbor image, and a neighbor image is an image similar to the target image;
  • the first association graph is input to a feature update network, and the feature update network updates the node value of the master node according to the node values of the neighbor nodes in the first association graph to obtain the updated image feature of the target image.
  • before the first association graph is obtained, the method further includes: obtaining neighbor images similar to the target image from an image library according to the target image.
  • obtaining neighbor images similar to the target image from an image library includes: separately obtaining the image features of the target image and of each library image in the image library through a feature extraction network; and determining, based on the feature similarity between the image feature of the target image and the image features of the respective library images, neighbor images similar to the target image from the image library.
  • determining a neighbor image similar to the target image includes: sorting the feature similarities between the target image and the respective library images in descending order of the similarity values; and selecting the library images corresponding to the first preset number of feature similarities as the neighbor images similar to the target image.
  • determining a neighbor image similar to the target image from the image library includes: obtaining, according to the feature similarity between the image feature of the target image and the image features of the respective library images, a first image similar to the target image from the library images; obtaining, according to the feature similarity between the image feature of the first image and the image features of the respective library images, a second image similar to the first image from the library images; and using both the first image and the second image as neighbor images of the target image.
  • the number of feature update networks is one, or N stacked in sequence, where N is an integer greater than 1; when the number is N, the input of the i-th feature update network is the updated first association graph output by the (i-1)-th feature update network, where i is an integer greater than 1 and not greater than N.
  • the feature update network updating the node value of the master node according to the node values of the neighbor nodes in the first association graph to obtain the updated image feature of the target image includes: determining the weight between the master node and each neighbor node in the first association graph; combining the image features of the neighbor nodes according to the weights to obtain the weighted feature of the master node; and obtaining the updated image feature of the target image according to the image feature of the master node and the weighted feature.
  • combining the image features of the neighbor nodes according to the weights to obtain the weighted feature of the master node includes: performing a weighted summation of the image features of the neighbor nodes according to the weights to obtain the weighted feature of the master node.
  • obtaining the updated image feature of the target image according to the image feature of the master node and the weighted feature includes: splicing (concatenating) the image feature of the master node with the weighted feature; and performing nonlinear mapping on the spliced feature to obtain the updated image feature of the target image.
  • determining the weight between the master node and a neighbor node in the first association graph includes: linearly mapping the master node and the neighbor node; determining the inner product of the linearly mapped master node and neighbor node; and determining the weight between the master node and the neighbor node according to the inner product after nonlinear processing.
  • the target image includes: a query image to be retrieved and each library image in an image library; after the updated image features of the target images are obtained, the method further includes: obtaining a retrieval result from the library images according to the feature similarity between the updated image feature of the query image and the updated image features of the respective library images.
  • a method for training a feature update network where the feature update network is used to update image features of an image; the method includes:
  • the second association graph is obtained, the second association graph including a training master node and at least one training neighbor node, where the node value of the training master node represents the image feature of the sample image, the node value of each training neighbor node represents the image feature of a training neighbor image, and a training neighbor image is an image similar to the sample image;
  • the second association graph is input to a feature update network, and the feature update network updates the node value of the training master node according to the node values of the training neighbor nodes in the second association graph to obtain the updated image feature of the sample image;
  • before the second association graph is obtained, the method further includes: obtaining the training neighbor images similar to the sample image from a training image library according to the sample image.
  • before the training neighbor images similar to the sample image are acquired from the training image library according to the sample image, the method further includes: extracting image features of a training image through a feature extraction network; obtaining prediction information of the training image according to the image features of the training image; and adjusting the network parameters of the feature extraction network based on the prediction information and label information of the training image.
  • obtaining the training neighbor images similar to the sample image from a training image library includes: obtaining, through the feature extraction network, the image features of the sample image and the image features of each library image in the training image library; and determining, based on the feature similarity between the image feature of the sample image and the image feature of each library image, the training neighbor images similar to the sample image.
  • an image feature extraction device includes:
  • the graph acquisition module is configured to acquire a first association graph, the first association graph including a master node and at least one neighbor node, where the node value of the master node represents the image feature of the target image, the node value of each neighbor node represents the image feature of a neighbor image, and a neighbor image is an image similar to the target image;
  • the feature update module is configured to input the first association graph into a feature update network, and the feature update network updates the node value of the master node according to the node values of the neighbor nodes in the first association graph to obtain the updated image feature of the target image.
  • the device further includes: a neighbor acquisition module, configured to acquire neighbor images similar to the target image from the image library according to the target image before the graph acquisition module acquires the first association graph.
  • the neighbor acquisition module is configured to: separately acquire, through a feature extraction network, the image features of the target image and of each library image in the image library; and determine, based on the feature similarity between the image feature of the target image and the image features of the respective library images, neighbor images similar to the target image from the image library.
  • the neighbor acquisition module is further configured to: sort the feature similarities between the target image and the respective library images in descending order of feature similarity; and select the library images corresponding to the first preset number of feature similarities as the neighbor images similar to the target image.
  • the neighbor acquisition module is further configured to: obtain, according to the feature similarity between the image feature of the target image and the image features of the respective library images, a first image similar to the target image from the library images; obtain, according to the feature similarity between the image feature of the first image and the image features of the respective library images, a second image similar to the first image from the library images; and use the first image and the second image as neighbor images of the target image.
  • the number of feature update networks is one, or N stacked in sequence, where N is an integer greater than 1; when the number is N, the input of the i-th feature update network is the updated first association graph output by the (i-1)-th feature update network, where i is an integer greater than 1 and not greater than N.
  • the feature update module is configured to: determine the weight between the master node and each neighbor node in the first association graph; combine the image features of the neighbor nodes according to the weights to obtain the weighted feature of the master node; and obtain the updated image feature of the target image according to the image feature of the master node and the weighted feature.
  • the feature update module is further configured to: perform a weighted summation of the image features of each neighbor node according to the weight to obtain the weighted feature of the master node.
  • the feature update module is further configured to: stitch the image features of the master node with the weighted features; perform nonlinear mapping on the stitched features to obtain the updated image features of the target image.
  • the feature update module is further configured to: perform linear mapping on the master node and the neighbor node; determine the inner product of the linearly mapped master node and neighbor node; and determine, according to the inner product after nonlinear processing, the weight between the master node and the neighbor node.
  • a training device for a feature update network includes:
  • the association graph obtaining module is configured to obtain a second association graph, the second association graph including a training master node and at least one training neighbor node, the node value of the training master node represents the image feature of the sample image, and the training neighbor The node value of the node represents the image feature of the training neighbor image, and the training neighbor image is an image similar to the sample image;
  • the update processing module is configured to input the second association graph into a feature update network, and the feature update network updates the node value of the training master node according to the node values of the training neighbor nodes in the second association graph to obtain the updated image feature of the sample image;
  • the parameter adjustment module is configured to obtain prediction information of the sample image according to the image characteristics of the updated sample image; adjust the network parameters of the feature update network according to the prediction information.
  • the device further includes: an image acquisition module, configured to acquire the training neighbor images similar to the sample image from the training image library according to the sample image before the association graph acquisition module acquires the second association graph.
  • the device further includes: a pre-training module, configured to extract image features of a training image through a feature extraction network; obtain prediction information of the training image based on the image features of the training image; and adjust the network parameters of the feature extraction network based on the prediction information and label information of the training image; the training image is an image used to train the feature extraction network, and the sample image is an image used to train the feature update network after training of the feature extraction network is completed.
  • the image acquisition module is configured to: separately acquire, through the feature extraction network, the image features of the sample image and the image features of each library image in the training image library; and determine, based on the feature similarity between the image feature of the sample image and the image feature of each library image, the training neighbor images similar to the sample image.
  • in a fifth aspect, an electronic device includes a memory and a processor; the memory is used to store computer instructions runnable on the processor; and the processor is used to implement, when executing the computer instructions, the image feature extraction method described in any embodiment of the present disclosure, or the method for training a feature update network described in any embodiment of the present disclosure.
  • a computer-readable storage medium has a computer program stored thereon; when the program is executed by a processor, the image feature extraction method according to any embodiment of the present disclosure, or the method for training a feature update network according to any embodiment of the present disclosure, is implemented.
  • a computer program is provided, which causes a processor to execute the image feature extraction method according to any embodiment of the present disclosure, or the method for training a feature update network according to any embodiment of the present disclosure.
  • FIG. 1 is an image feature extraction method provided by at least one embodiment of the present disclosure.
  • FIG. 2 is a processing flow of a feature update network provided by at least one embodiment of the present disclosure.
  • FIG. 3 is a method for training a feature update network provided by at least one embodiment of the present disclosure.
  • FIG. 4 is a method for training a feature update network provided by at least one embodiment of the present disclosure.
  • FIG. 5 is a schematic diagram of acquired neighbor images provided by at least one embodiment of the present disclosure.
  • FIG. 6 is a schematic diagram of an association graph provided by at least one embodiment of the present disclosure.
  • FIG. 7 is an image retrieval method provided by at least one embodiment of the present disclosure.
  • FIG. 8 is a schematic diagram of a sample image and a library image provided by at least one embodiment of the present disclosure.
  • FIG. 9 is a schematic diagram of a neighbor image search provided by at least one embodiment of the present disclosure.
  • FIG. 10 is a network structure of a feature update network provided by at least one embodiment of the present disclosure.
  • FIG. 11 is an image feature extraction device provided by at least one embodiment of the present disclosure.
  • FIG. 12 is an image feature extraction device provided by at least one embodiment of the present disclosure.
  • FIG. 13 is a training device for a feature update network provided by at least one embodiment of the present disclosure.
  • FIG. 14 is a training device for a feature update network provided by at least one embodiment of the present disclosure.
  • image retrieval can include text-based image retrieval and content-based image retrieval.
  • when performing content-based image retrieval, a computer may be used to extract image features, establish image feature vector descriptions, and store them in an image feature database.
  • during retrieval, the same feature extraction method can be used to extract the image feature of the query image to obtain a query vector; the similarity between the query vector and each image feature in the image feature library is then calculated under a similarity measurement criterion; finally, the results are sorted by similarity and the corresponding images are output in order.
  • FIG. 1 is an image feature extraction method provided by at least one embodiment of the present disclosure. As shown in FIG. 1, the method can include the following processing:
  • a first association graph is obtained; the first association graph includes a master node and at least one neighbor node, the node value of the master node represents the image feature of the target image, the node value of each neighbor node represents the image feature of a neighbor image, and a neighbor image is an image similar to the target image.
  • the target image is an image whose image features are to be extracted.
  • the image can be an image in different application scenarios; for example, it can be an image to be retrieved in an image retrieval application, in which case the image library mentioned below can be the retrieval image library in that application.
  • the neighbor images may be obtained from the image library according to the target image before the first association graph is obtained.
  • neighbor images can be determined according to an image feature similarity measurement criterion: for example, the image features of the target image and of each library image in the image library are obtained through a feature extraction network, and neighbor images similar to the target image are determined from the image library based on the feature similarity between the image feature of the target image and the image features of the respective library images.
  • for example, the feature similarities between the target image and the respective library images can be sorted in descending order of their values, and the library images corresponding to the top N feature similarities taken as the neighbor images similar to the target image, where N is a preset number, for example the first 10.
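The descending-sort selection above can be sketched as follows; cosine similarity is assumed as the feature-similarity measure (the disclosure does not fix a particular metric), and `top_k_neighbors` is an illustrative name, not one from the disclosure:

```python
import numpy as np

def top_k_neighbors(query_feat, library_feats, k=10):
    """Return indices of the k library images most similar to the query,
    in descending order of feature similarity (cosine similarity here)."""
    q = query_feat / np.linalg.norm(query_feat)
    lib = library_feats / np.linalg.norm(library_feats, axis=1, keepdims=True)
    sims = lib @ q                 # one similarity value per library image
    order = np.argsort(-sims)      # sort similarities in descending order
    return order[:k], sims[order[:k]]

# Toy library of 4 feature vectors; the top-2 most similar become neighbors.
library = np.array([[1.0, 0.0], [0.0, 1.0], [0.9, 0.1], [0.5, 0.5]])
idx, sims = top_k_neighbors(np.array([0.85, 0.15]), library, k=2)
```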
  • alternatively, a first image similar to the target image may be acquired according to the similarity between image features, then a second image similar to the first image may be acquired, and both the first image and the second image may be regarded as neighbor images of the target image.
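The first-image/second-image expansion can be sketched as a two-hop lookup over a pairwise similarity matrix. This is illustrative only: `two_hop_neighbors` and the matrix layout are assumptions, and self-similarity is taken to be each row's maximum:

```python
import numpy as np

def two_hop_neighbors(query_idx, sim, k):
    """First take the k images most similar to the query (first images),
    then each first image's k most similar images (second images); the
    union of both, excluding the query itself, is the neighbor set.
    Assumes sim[i, i] is each row's maximum, so [1:k+1] skips self."""
    first = np.argsort(-sim[query_idx])[1:k + 1]
    neighbors = set(first.tolist())
    for f in first:
        second = np.argsort(-sim[f])[1:k + 1]
        neighbors.update(second.tolist())
    neighbors.discard(query_idx)
    return sorted(neighbors)

sim = np.array([[1.0, 0.9, 0.1, 0.2],
                [0.9, 1.0, 0.8, 0.1],
                [0.1, 0.8, 1.0, 0.7],
                [0.2, 0.1, 0.7, 1.0]])
result = two_hop_neighbors(0, sim, k=2)
```

Note how image 2, which is not directly similar to the query, enters the neighbor set through the second hop.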
  • In step 102, the first association graph is input to a feature update network, and the feature update network updates the node value of the master node according to the node values of the neighbor nodes in the first association graph to obtain the updated image feature of the target image.
  • the feature update network may be an attention-based graph convolution module (AGCN for short), or it may be another module; this is not limited.
  • the graph convolution module in this step can update the node value of the master node according to the node values of the neighbor nodes. For example, it can determine the weight between the master node and each neighbor node in the first association graph, combine the image features of the neighbor nodes according to the weights to obtain the weighted feature of the master node, and obtain the updated image feature of the target image according to the image feature of the master node and the weighted feature.
  • the flow shown in FIG. 2 exemplarily describes the specific process by which the graph convolution module updates the node value of the master node.
  • the number of the graph convolution module may be one, or multiple stacked in sequence.
  • when multiple modules are stacked, the first association graph is input to the first graph convolution module, which updates the image feature of the master node according to the image features of the neighbor nodes and outputs the updated first association graph; the updated first association graph is then input to the second graph convolution module, which again updates the image feature of the master node according to the image features of the neighbor nodes and outputs a further-updated first association graph; and so on for any subsequent modules.
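The sequential stacking can be sketched as follows. The per-module update here is a simple mean of the master and neighbor features, standing in for the learned attention-based update described below; the function names and the toy update rule are assumptions, not the disclosed module:

```python
import numpy as np

def module_update(master, neighbors):
    """Toy stand-in for one graph convolution module: pull the master
    feature toward the mean of its neighbor features."""
    return (master + neighbors.mean(axis=0)) / 2.0

def stacked_update(master, neighbors, n_modules):
    """Module i consumes the association graph whose master-node value
    was already updated by module i-1, as described above."""
    for _ in range(n_modules):
        master = module_update(master, neighbors)
    return master

master = np.ones(4)
neighbors = np.zeros((2, 4))        # two neighbor features
out = stacked_update(master, neighbors, n_modules=2)
```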
  • the first association graph in this embodiment includes multiple nodes (for example, a master node, a neighbor node), and the node value of each node represents the image feature of the image represented by the node.
  • each node in the first association graph can serve as the master node, and the image feature of the image corresponding to that node is updated by the method described in FIG. 1 of this embodiment: when a node acts as the master node, the first association graph with that node as the master node is obtained and input into the feature update network to update the image feature of the node.
  • the image feature extraction method of this embodiment uses the feature update network of the embodiments of the present disclosure to update and extract image features. Because the feature update network updates the image feature of the master node according to the image features of the master node's neighbor nodes, the updated image feature of the target image expresses the target image more accurately, making it more robust and discriminative in the image recognition process.
  • Fig. 2 illustrates the processing flow of the feature update network in an embodiment, which describes how the feature update network updates the image features of the image input to the network.
  • the processing flow of the feature update network may include the following steps 200-204.
  • In step 200, the weight between the master node and each neighbor node is determined according to the image features of the master node and the neighbor nodes.
  • in the network application stage, the master node may correspond to the target image, and the neighbor nodes may correspond to the neighbor images of the target image.
  • the weight between the master node and the neighbor nodes can be determined as follows (see formula (1)): the image feature z_u of the master node and the image feature z_vi of each neighbor node are first linearly transformed, where v_i denotes one of the master node's neighbor nodes, k denotes the number of neighbor nodes, and W_i and W_u are the coefficients of the linear transformation.
  • the inner product can then be determined for the linearly transformed image features of the master node and each neighbor node.
  • the inner product can be calculated by the function F.
  • the nonlinear transformation is realized through ReLU (Rectified Linear Unit), and the weights are finally obtained after a softmax operation; the weight a_i is the weight between the master node u and the neighbor node v_i.
  • the calculation of the weight between the master node and a neighbor node in this step is not limited to the above formula (1); for example, the similarity between the image features of the master node and the neighbor node may also be used as the weight between the two.
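A sketch of the formula (1) computation under the descriptions above (linear maps W_u and W_i, inner product, ReLU, softmax). The feature dimension, matrix shapes, and random initialisation are assumptions; in the network, W_u and W_i are learned:

```python
import numpy as np

rng = np.random.default_rng(0)
d = 4                                  # feature dimension (assumed)
W_u = rng.standard_normal((d, d))      # linear-transform coefficients
W_i = rng.standard_normal((d, d))      # (learned in the real network)

def attention_weights(z_u, z_v):
    """Formula (1) sketch: a_i = softmax_i(ReLU(<W_u z_u, W_i z_vi>))."""
    scores = np.array([max((W_u @ z_u) @ (W_i @ z), 0.0) for z in z_v])
    e = np.exp(scores - scores.max())  # numerically stable softmax
    return e / e.sum()

z_u = rng.standard_normal(d)           # master-node image feature
z_v = rng.standard_normal((3, d))      # k = 3 neighbor features
a = attention_weights(z_u, z_v)
```

The softmax guarantees the k weights are non-negative and sum to 1, so the subsequent combination of neighbor features is a convex mixture.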
  • In step 202, a weighted summation of the image features of the neighbor nodes is performed according to the weights to obtain the weighted feature of the master node.
  • the image features of each neighbor node of the master node can first be nonlinearly mapped, and the weights obtained in step 200 can then be used to perform a weighted summation of the nonlinearly mapped image features; the resulting feature can be called the weighted feature.
  • as shown in the following formula (2):
  • n_u is the weighted feature
  • z_vi is the image feature of neighbor node v_i
  • a_i is the weight calculated in step 200
  • Q and q are the coefficients of the nonlinear mapping.
  • In step 204, the updated image feature of the target image is obtained according to the image feature of the master node and the weighted feature.
  • z_u is the image feature of the master node in the association graph
  • n_u is the weighted feature
  • the nonlinear mapping is performed through ReLU
  • W and w are the coefficients of the nonlinear mapping.
  • with this, the node value of the master node in the first association graph is updated, and the updated image feature of the master node is obtained.
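The splice-and-map step can be sketched as follows, assuming the mapping is ReLU(W[z_u; n_u] + w) over the concatenated (spliced) feature; the shapes of W and w are assumptions, and both are learned in the network:

```python
import numpy as np

rng = np.random.default_rng(2)
d = 4
W = rng.standard_normal((d, 2 * d))    # nonlinear-mapping coefficients
w = rng.standard_normal(d)             # (learned in the real network)

def update_master(z_u, n_u):
    """Splice (concatenate) z_u with n_u, then nonlinearly map the
    spliced feature to get the updated master-node image feature."""
    spliced = np.concatenate([z_u, n_u])
    return np.maximum(W @ spliced + w, 0.0)   # ReLU nonlinearity

z_new = update_master(rng.standard_normal(d), rng.standard_normal(d))
```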
  • in the processing flow of the feature update network of this embodiment, the graph convolution module performs a weighted summation of the image features of the master node's neighbor nodes to determine the weighted feature of the master node, so that the image features of the target image itself and of its associated neighbor images can be considered together; the updated image feature of the target image is therefore more robust and discriminative, and the accuracy of image retrieval is improved.
  • Fig. 3 is a method for training a feature update network provided by at least one embodiment of the present disclosure. As shown in Fig. 3, the method describes the training process of a feature update network and may include the following processing:
  • In step 300, training neighbor images similar to the sample image are obtained from the training image library according to the sample image used to train the feature update network.
  • the word “training” merely indicates that these concepts apply in the training phase of the network, as counterparts of the neighbor images mentioned in the network application phase; it does not constitute any restriction on the image library itself. Likewise, the “training master node” and “training neighbor node” mentioned in the following description differ only in name from the same concepts in the network application stage and have no restrictive effect.
• Batch training can be used.
• For example, the training samples can be divided into multiple image subsets (batches). Each training iteration inputs one image subset to the feature update network, and the loss values of the sample images included in the subset are used to adjust the network parameters by backpropagation. After one iteration of training is completed, the next image subset can be input to the feature update network for the next iteration.
• Each image in an image subset (batch) can be referred to as a sample image; each sample image undergoes the processing of steps 300 to 306, and its loss value is obtained according to the prediction information and the label information.
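• The batch training described above can be sketched as the following loop; the network, graph construction, and loss function are stand-in stubs with hypothetical names, and only the iteration structure (one parameter adjustment per image subset, steps 300 to 306 per sample image) mirrors the text.

```python
# Minimal sketch of the batch training loop; all components are stubs.

class StubFeatureUpdateNet:
    def __init__(self):
        self.param = 1.0
    def forward(self, graph):
        # Stand-in prediction from the association-graph features
        return self.param * sum(graph) / len(graph)
    def backward(self, loss, lr):
        # Stand-in for backpropagation-based parameter adjustment
        self.param -= lr * loss

def build_association_graph(sample):
    # Stand-in: features of the sample image and its training neighbors
    return sample["features"]

def loss_fn(prediction, label):
    return (prediction - label) ** 2

def train(net, batches, lr=0.1):
    for batch in batches:                     # one iteration per image subset
        batch_loss = 0.0
        for sample in batch:                  # steps 300-306 for each sample image
            graph = build_association_graph(sample)
            pred = net.forward(graph)
            batch_loss += loss_fn(pred, sample["label"])
        net.backward(batch_loss, lr)          # adjust parameters before next subset
    return net

batches = [
    [{"features": [1.0, 2.0], "label": 1.5}],
    [{"features": [2.0, 4.0], "label": 3.0}],
]
net = train(StubFeatureUpdateNet(), batches)
print(net.param)
```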
  • the training image database may be a retrieval image database, that is, an image similar to a sample image will be retrieved from the retrieval image database.
• The similarity may mean containing the same object as the sample image, or belonging to the same category as the sample image.
  • an image similar to the sample image can be called a "training neighbor image”.
  • the method for obtaining the training neighbor image may be, for example, determining an image with a higher similarity as the training neighbor image according to the feature similarity between the images.
• In step 302, a second association graph is obtained.
  • the second association graph includes a training master node and at least one training neighbor node.
  • the node value of the training master node represents the image feature of the sample image.
• The node value of the training neighbor node represents the image feature of the training neighbor image, which is an image similar to the sample image.
• The association graph in the network training phase may be called the second association graph, while the association graph that appeared in the network application phase above may be called the first association graph.
  • the second association graph may include multiple nodes.
  • the nodes in the second association graph may include: a training master node and at least one training neighbor node.
  • the training master node represents a sample image
  • each training neighbor node represents a training neighbor image determined in step 300.
  • the node value of each node is an image feature.
  • the node value of the training master node is the image feature of the sample image
  • the node value of the training neighbor node is the image feature of the training neighbor image.
  • step 304 the second association graph is input to a feature update network, and the feature update network updates the node value of the training master node according to the node value of the training neighbor node in the second association graph.
  • the feature update network may be a graph convolution module or other types of modules, which is not limited here.
• For example, the graph convolution module is an attention-based graph convolution network (AGCN), which updates the image feature of the training master node according to the image features of the training neighbor nodes in the second association graph.
• For example, the image feature of the training master node can be updated after a weighted summation of the image features of each training neighbor node.
  • the number of the graph convolution module may be one, or multiple stacked in sequence.
• For example, when the number of graph convolution modules is two, the second association graph is input to the first graph convolution module, which updates the image feature of the training master node according to the image features of each training neighbor node; in the second association graph output by the first graph convolution module, the image feature of the training master node has been updated, yielding the updated second association graph.
• The updated second association graph is then input to the second graph convolution module, which continues to update the image feature of the training master node according to the image features of each training neighbor node and outputs the again-updated image feature of the training master node.
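• The sequential stacking can be sketched as follows; the averaging update inside the stand-in module is an illustrative placeholder for the attention-based convolution, and only the module-to-module data flow (the second module consuming the graph updated by the first) mirrors the text.

```python
import numpy as np

def graph_conv_module(master, neighbors):
    # Stand-in update: mix the master feature with the neighbor mean
    return 0.5 * master + 0.5 * neighbors.mean(axis=0)

def stacked_update(master, neighbors, num_modules=2):
    # Each module receives the association graph output by the previous one;
    # here only the master-node feature changes between modules.
    for _ in range(num_modules):
        master = graph_conv_module(master, neighbors)
    return master

master = np.array([0.0, 0.0])
neighbors = np.array([[2.0, 2.0], [4.0, 4.0]])
print(stacked_update(master, neighbors))  # master pulled toward neighbor mean (3, 3)
```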
• In step 306, the prediction information of the sample image is obtained according to the image feature of the sample image extracted by the feature update network.
• For example, the prediction information of the sample image can be further determined according to the image features extracted by the graph convolution module.
  • the graph convolution module can be connected to a classifier, and the classifier obtains the probability that the sample image belongs to each preset category according to the image feature.
• In step 308, the network parameters of the feature update network are adjusted according to the prediction information.
• For example, the loss value corresponding to the sample image can be determined according to the difference between the prediction information output by the feature update network and the label information.
  • the network parameters of the graph convolution module can be adjusted by backpropagation according to the loss value of each sample image in a batch. This enables the graph convolution module to extract image features more accurately according to the adjusted network parameters.
• For example, the coefficients W_i, W_u, Q, q, W, w, and others of the graph convolution module mentioned in the description of FIG. 2 can be adjusted according to the loss value.
• The training method of the feature update network of this embodiment updates the image features of the sample image by combining similar images of the sample image when training the network, so that the image feature of the sample image itself and the image features of the associated training neighbor images can be comprehensively considered. The image features obtained by the trained feature update network are therefore more robust and discriminative, improving the accuracy of image retrieval; for example, even under changes in illumination, scale, and viewing angle, a relatively accurate image feature can still be obtained.
  • Figure 4 illustrates another embodiment of the feature update network training method.
• Image features can be extracted through a pre-trained network for feature extraction (which can be called a feature extraction network), and similarity measurement can be performed based on the image features to obtain training neighbor images similar to the sample image from the training image library.
  • the method may include:
  • step 400 a network for feature extraction is pre-trained using the training set.
  • the pre-trained network used to extract features can be called a feature extraction network, including but not limited to: Convolutional Neural Network (CNN), BP (Back Propagation) neural network, discrete Hopfield Network, etc.
  • the images in the training set can be called training images.
  • the training process of the feature extraction network may include: extracting the image features of the training image through the feature extraction network; obtaining the prediction information of the training image according to the image features of the training image; and based on the prediction information and labels of the training image Information, adjust the network parameters of the feature extraction network.
• The above-mentioned training image refers to the image used to train the feature extraction network, while the aforementioned sample image refers to the image applied in the training process of the feature update network after the training of the feature extraction network is completed.
• For example, through the pre-trained feature extraction network, the image features of the sample image and of each library image in the training image library are first extracted; after the association graph is generated, it is input to the feature update network to update the image features. The input image used in the feature update network training process is the sample image.
• The sample image and the training image can be the same or different.
  • step 402 the image features of each library image in the sample image and the training image library are respectively obtained through the feature extraction network.
  • step 404 according to the feature similarity between the image features of the sample image and each library image, a first image similar to the sample image is obtained from each library image.
  • the library image is the image in the search image library.
• The feature similarity between the image feature of the sample image and the image feature of each library image can be calculated separately, and the library images can be sorted by similarity, for example in order from high to low. Then, the library images ranked in the top K are selected from the sorting result as the first images of the sample image.
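• The ranking step can be sketched as follows, assuming cosine similarity as the feature similarity measure (one common choice; the disclosure does not fix a specific metric) and made-up feature values.

```python
import numpy as np

def top_k_neighbors(sample_feat, library_feats, k):
    """Return indices and similarities of the top-K most similar library images."""
    sample = sample_feat / np.linalg.norm(sample_feat)
    lib = library_feats / np.linalg.norm(library_feats, axis=1, keepdims=True)
    sims = lib @ sample                  # cosine similarity per library image
    order = np.argsort(-sims)            # sorted from high to low similarity
    return order[:k], sims[order[:k]]

sample = np.array([1.0, 0.0])
library = np.array([[1.0, 0.1],   # very similar to the sample
                    [0.0, 1.0],   # orthogonal, dissimilar
                    [0.9, 0.2]])  # similar
idx, sims = top_k_neighbors(sample, library, k=2)
print(idx)  # indices of the two most similar library images
```

The same routine serves for finding the first images of the sample image and, applied again to each first image, for finding the second images of step 406.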
  • node 31 represents a sample image
  • the library images represented by node 32, node 33, and node 34 are all first images that are similar to the sample image.
  • step 406 a second image similar to the first image is obtained from the library image according to the feature similarity between the image features of the first image and the library image.
  • the feature similarity between the image features of the first image and the library image can be calculated, and a library image similar to the first image is obtained from the library image as the second image.
  • nodes 35 to 37 are library images similar to node 32, and nodes 35 to 37 are second images similar to node 31.
  • nodes 38 to 40 similar to node 34 are also second images similar to node 31.
  • FIG. 5 is an example situation.
• In some cases, only the first images similar to the master node corresponding to the sample image are found, and the search for neighbor images stops there.
• In other cases, further layers of neighbor images, such as third images or fourth images, can also be found.
• How many layers of neighbor images to search can be determined according to actual test results in different application scenarios.
  • the above-mentioned first image, second image, etc. can all be called neighbor images.
• In the network training stage, they can be called training neighbor images; in the network application stage, they can be called neighbor images.
  • the neighbor image can also be obtained in other ways than the example in this step.
  • a similarity threshold can be set, and all or part of the library images whose feature similarity is higher than the threshold are directly used as neighbor images of the sample image.
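• The layered neighbor search (first images, then second images similar to each first image) can be sketched as a breadth-first expansion; `neighbors_of` is a made-up lookup standing in for the per-image similarity search.

```python
def expand_neighbors(sample, neighbors_of, hops=2):
    """Collect neighbor images up to `hops` layers away from the sample."""
    found, frontier = [], [sample]
    for _ in range(hops):
        next_frontier = []
        for image in frontier:
            for nb in neighbors_of.get(image, []):
                if nb != sample and nb not in found:
                    found.append(nb)
                    next_frontier.append(nb)
        frontier = next_frontier   # next layer: neighbors of this layer's images
    return found

# Mirrors the node numbering of FIG. 6: node 31 is the sample image,
# nodes 32-34 are first images, 35-37 neighbor node 32, 38-40 neighbor node 34.
neighbors_of = {31: [32, 33, 34], 32: [35, 36, 37], 34: [38, 39, 40]}
print(expand_neighbors(31, neighbors_of))  # first and second images together
```

Setting `hops=3` would additionally collect the third images mentioned above.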
• In step 408, a second association graph is generated based on the sample image and the neighbor images.
• The nodes in the second association graph include: a training master node representing the sample image and at least one training neighbor node representing a neighbor image. The node value of the training master node is an image feature of the sample image, and the node value of a training neighbor node is an image feature of the corresponding neighbor image.
  • the neighbor image in this step includes the first image obtained in step 404 and the second image obtained in step 406.
• The second association graph generated in this step is a graph including multiple nodes; refer to the example in FIG. 6.
  • the node 31 in FIG. 6 is the training master node, and all other nodes are training neighbor nodes.
  • the node value may be an image feature of the image represented by the node, and the image feature may be extracted in step 402, for example.
• In step 410, the second association graph is input to a feature update network. The feature update network updates the image feature of the training master node according to the image features of the training neighbor nodes in the second association graph to obtain the updated image feature of the sample image, and the prediction information of the sample image is obtained according to the updated image feature.
  • step 412 according to the prediction information of the sample image, the network parameters of the feature update network and the network parameters of the feature extraction network are adjusted.
• The network parameter adjustment in this step may or may not include adjusting the network parameters of the feature extraction network, which can be determined according to the actual training situation.
• The training method of the feature update network of this embodiment updates the image features of the sample image by combining similar images of the sample image when training the network, so that the image feature of the sample image itself and the image features of other images associated with it can be comprehensively considered. The image features obtained by the trained feature update network are therefore more robust and discriminative, improving the accuracy of image retrieval. Moreover, using the feature extraction network to extract image features not only improves the efficiency of image feature extraction, which in turn speeds up network training, but also allows the network parameters of the feature extraction network to be adjusted according to the loss value, so that the extracted image features are more accurate.
  • the embodiment of the present disclosure also provides an image retrieval method, which is to retrieve an image similar to the target image from an image database.
  • the method may include the following processing:
  • step 700 the target image to be retrieved is acquired.
• For example, the image M may be referred to as a target image; that is, images that have a certain association with the target image are retrieved from the image library. The association may include containing the same object or belonging to the same category.
  • step 702 image features of the target image are extracted.
  • the image feature extraction method described in any embodiment of the present disclosure can be used.
  • step 704 the image features of each library image in the image library are extracted.
  • the image features of each library image in the image library can be extracted according to the image feature extraction method described in any embodiment of the present disclosure, for example, the extraction method shown in FIG. 1.
  • step 706 based on the feature similarity between the image features of the target image and the image features of the respective library images, a similar image of the target image is obtained as a retrieval result.
  • the feature similarity measurement can be performed between the image features of the target image and the image features of the respective library images, so that similar library images are used as the search result.
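• Step 706 can be sketched as follows; the cosine similarity measure and the threshold value are illustrative assumptions (a top-K selection as in the earlier steps would work equally well).

```python
import numpy as np

def retrieve(target_feat, library_feats, threshold=0.8):
    """Return indices of library images whose similarity to the target exceeds the threshold."""
    t = target_feat / np.linalg.norm(target_feat)
    lib = library_feats / np.linalg.norm(library_feats, axis=1, keepdims=True)
    sims = lib @ t                                   # similarity per library image
    return [i for i, s in enumerate(sims) if s >= threshold]

target = np.array([0.0, 1.0])
library = np.array([[0.1, 1.0],    # similar to the target
                    [1.0, 0.0],    # dissimilar
                    [0.2, 0.9]])   # similar
print(retrieve(target, library))
```

In practice both the target feature and the library features would be the updated features produced by the feature update network, which is what makes the retrieval robust.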
  • Image retrieval can be applied to a variety of scenarios, such as medical diagnosis, street view maps, intelligent video analysis, security monitoring, etc.
• Taking the person search in security monitoring as an example, the following describes how to apply the method of the embodiments of the present disclosure to train the network used for retrieval, and how to use the network to perform image retrieval. In the following description, network training and its application are explained separately.
• A batch training method can be used. For example, the training samples can be divided into multiple image subsets (batches); in each training iteration, the sample images in one batch are input one by one to the feature update network to be trained, and finally the network parameters of the feature update network are adjusted according to the loss values of the sample images included in the image subset.
  • the following uses one of the sample images as an example to describe how to obtain the loss value corresponding to the sample image.
  • the sample image 81 includes a pedestrian 82.
  • the goal of the pedestrian search in this embodiment is to search for library images that include the same pedestrian 82 from the search image library.
  • a CNN network can be called a feature extraction network.
• The image features of the sample image 81 and of each library image in the image library are respectively extracted through the feature extraction network. The feature similarity between the sample image 81 and each library image is then calculated, the library images are sorted by similarity from high to low, and a preset number of top-ranked images (for example, the top 10) are selected.
• These images, being similar to the sample image 81, may be referred to as neighbor images of the sample image 81.
  • the library image 83, the library image 84 and the library image 85 are all neighbor images.
• The pedestrians included in these neighbor images may indeed be the same as the pedestrian 82, or they may be different but very similar to the pedestrian 82.
• Next, library images similar to each neighbor image are searched in the image library.
• Taking the library image 83 as an example, according to the similarity measure of image features, the top ten library images in the similarity ranking are selected from the library images as the ten neighbor images of the library image 83.
  • the set 91 includes ten library images, and these images are the ten neighbor images of the library image 83.
  • ten neighbor images similar to the library image 84 can be searched again, that is, the set 92 in FIG. 9.
• In the same way, ten neighbor images are searched for each of the library image 83, the library image 84, and the library image 85; this will not be described in detail.
  • the above library image 83, library image 84, etc. can be referred to as the first image similar to the sample image 81, and the library images in the set 91 and the set 92 can all be referred to as the second image similar to the sample image 81.
  • This embodiment takes the first image and the second image as examples. In other application examples, it is also possible to continue to search for a third image similar to the second image.
  • an association graph can be generated.
  • the association graph is similar to that shown in Fig. 6, which includes a master node and multiple neighbor nodes.
  • the master node represents the sample image 81
  • each neighbor node represents a neighbor image
  • these neighbor nodes include the first image and the second image.
  • the node value of each node is the image feature of the image it represents.
• The image feature is the one extracted and used for feature similarity comparison when obtaining neighbor images; for example, it can be the image feature extracted through the above-mentioned feature extraction network.
  • FIG. 10 illustrates the network structure of the feature update network for extracting image features.
• The network structure can include a feature extraction network 1001. Through the feature extraction network 1001, the image features 1002 of the sample image and of each library image in the image library are respectively extracted and processed by similarity comparison, and finally the association graph 1003 is obtained (the figure shows only some neighbor nodes; the number of neighbor nodes in actual use can be larger).
  • the correlation graph 1003 can be input to a graph convolutional network 1004.
• The graph convolutional network 1004 includes a stack of multiple graph convolution modules 1005, and each graph convolution module 1005 can update the image feature of the master node according to the process shown in FIG. 2.
• The graph convolutional network 1004 can output the final updated image feature of the master node as the updated image feature of the sample image; the prediction information corresponding to the sample image can then be determined based on the updated image feature, and the loss value corresponding to the sample image is calculated according to the prediction information and the label information of the sample image.
  • the loss value of each sample image can be calculated according to the above processing procedure, and finally the network parameters of the feature update network can be adjusted according to the loss value of these sample images, for example, including the parameters in the graph convolution module and the parameters of the feature extraction network.
  • the network structure shown in FIG. 10 may not include the feature extraction network, but the association graph can be obtained in other ways.
  • the target image is a pedestrian image.
• The image features of the target image can be extracted through the feature update network in the following manner:
• Image features of the target image are likewise extracted through the feature extraction network 1001 in FIG. 10.
  • the association graph may include a master node representing the target image and multiple neighbor nodes representing neighbor images.
• The association graph is input into the graph convolutional network 1004 in FIG. 10, the image feature of the master node representing the target image is updated by the graph convolution modules 1005, and the finally obtained image feature of the master node is the updated image feature of the target image.
  • the image retrieval method of this embodiment combines the image features of neighboring images associated with the target image when performing image feature extraction, so that the image features learned by the updated network using the trained features are more robust and discriminative.
• Moreover, the graph convolution modules can be stacked in multiple layers, which gives good scalability; in batch training, each sample image in a batch can be computed in parallel using a deep learning framework and hardware, making network training more efficient.
  • Fig. 11 provides an image feature extraction device, which can be used to execute the image feature extraction method of any embodiment of the present disclosure.
  • the device may include: a graph acquisition module 1101 and a feature update module 1102.
  • the graph acquisition module 1101 is configured to acquire a first association graph, the first association graph including a master node and at least one neighbor node, the node value of the master node represents the image feature of the target image, and the node value of the neighbor node represents the neighbor The image feature of the image, and the neighbor image is an image similar to the target image.
  • the feature update module 1102 is configured to input the first association graph into a feature update network, and the feature update network updates the node value of the master node according to the node value of the neighbor node in the first association graph, and obtains the updated The image characteristics of the target image.
  • the device further includes: a neighbor obtaining module 1103, configured to obtain a neighboring image from an image library according to the target image before the image obtaining module obtains the first correlation diagram. Neighbor images that are similar to the target image.
  • the neighbor acquisition module 1103 is configured to: separately acquire the image features of the target image and the image features of each library image in the image library through a feature extraction network; based on the image features and image features of the target image Based on the feature similarity between the image features of each of the library images in the library, a neighbor image similar to the target image is determined from the image library.
• The neighbor acquisition module 1103 is further used to: sort the feature similarities between the target image and each of the library images in descending order; and take the library images corresponding to the top preset number of feature similarities as the neighbor images similar to the target image.
  • the neighbor acquisition module 1103 is further configured to: according to the feature similarity between the target image and the image features of each of the library images, obtain from each library image similar to the target image The first image; according to the feature similarity between the image features of the first image and the image features of each of the library images, a second image similar to the first image is obtained from each of the library images; The first image and the second image are used as neighbor images of the target image.
• The number of the feature update networks is one, or N stacked in sequence, where N is an integer greater than 1; when the number of the feature update networks is N, the input of the i-th feature update network is the updated first association graph output by the (i-1)-th feature update network, where i is an integer greater than 1 and less than or equal to N.
• The feature update module 1102 is configured to: determine the weight between the master node and each of the neighbor nodes in the first association graph; merge the image features of each neighbor node according to the weights to obtain the weighted feature of the master node; and obtain the updated image feature of the target image according to the image feature of the master node and the weighted feature.
  • the feature update module 1102 is further configured to: perform a weighted summation of the image features of each neighbor node according to the weight to obtain the weighted feature of the master node.
  • the feature update module 1102 is further configured to: stitch the image features of the master node with the weighted features; perform nonlinear mapping on the stitched features to obtain the updated image features of the target image.
• The feature update module 1102 is further configured to: perform linear mapping on the master node and the neighbor nodes; determine the inner product of the linearly mapped master node and neighbor nodes; and determine the weight between the master node and a neighbor node according to the inner product.
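• The weight computation just described (linear mappings, then inner products) can be sketched as follows; the softmax normalization of the inner products into weights is an assumption added for illustration, since the text only states that the inner product determines the weight.

```python
import numpy as np

def attention_weights(z_u, neighbor_feats, W_u, W_i):
    """Attention weights between the master node and each neighbor node.

    z_u: (d,) master-node feature; neighbor_feats: (k, d) neighbor features.
    W_u, W_i: (d, d) linear-mapping coefficients for master and neighbors.
    """
    q = W_u @ z_u                    # linearly mapped master-node feature
    keys = neighbor_feats @ W_i.T    # linearly mapped neighbor features
    scores = keys @ q                # inner product per neighbor
    scores -= scores.max()           # numerical stability before exponentiation
    e = np.exp(scores)
    return e / e.sum()               # softmax: weights over the neighbors

rng = np.random.default_rng(1)
d = 4
z_u = rng.normal(size=d)
neighbors = rng.normal(size=(3, d))
W_u, W_i = rng.normal(size=(d, d)), rng.normal(size=(d, d))
a = attention_weights(z_u, neighbors, W_u, W_i)
print(a.sum())  # weights sum to (approximately) 1
```

The resulting weights are what the feature update module uses to merge neighbor features into the weighted feature of the master node.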
  • FIG. 13 provides a training device for a feature update network, which can be used to execute the training method for a feature update network according to any embodiment of the present disclosure.
  • the apparatus may include: an association graph obtaining module 1301, an update processing module 1302, and a parameter adjustment module 1303.
  • the association graph obtaining module 1301 is configured to obtain a second association graph, the second association graph including a training master node and at least one training neighbor node, the node value of the training master node represents the image feature of the sample image, and the training The node value of the neighbor node represents the image feature of the training neighbor image, and the training neighbor image is an image similar to the sample image;
• The update processing module 1302 is configured to input the second association graph into a feature update network, and the feature update network updates the node value of the training master node according to the node values of the training neighbor nodes in the second association graph to obtain the updated image feature of the sample image;
  • the parameter adjustment module 1303 is configured to obtain prediction information of the sample image according to the image feature of the updated sample image; adjust the network parameter of the feature update network according to the prediction information.
  • the device further includes: an image acquisition module 1304, configured to acquire from the training image library according to the sample image before the correlation map acquisition module acquires the second correlation map The training neighbor image similar to the sample image.
  • the device further includes: a pre-training module 1305.
  • the pre-training module 1305 is used to extract the image features of the training image through the feature extraction network; obtain the prediction information of the training image according to the image features of the training image; adjust based on the prediction information and label information of the training image Network parameters of the feature extraction network; the training image is an image used to train the feature extraction network, and the sample image is an image used to train the feature update network after the training of the feature extraction network is completed;
  • the image acquisition module 1304 is configured to: separately acquire the image features of the sample image and the image features of each library image in the training image library through the feature extraction network; and based on the image features of the sample image and each library The feature similarity between the image features of the image determines the training neighbor image similar to the sample image.
  • the functions or modules contained in the device provided in the embodiments of the present disclosure can be used to execute the methods described in the above method embodiments.
  • At least one embodiment of the present disclosure provides an electronic device.
  • the device may include a memory and a processor.
  • the memory is used to store computer instructions that can be run on the processor.
• The processor is configured to implement, when executing the computer instructions, the image feature extraction method or the feature update network training method described in any embodiment of the present disclosure.
• At least one embodiment of the present disclosure provides a computer-readable storage medium on which a computer program is stored; when the program is executed by a processor, the image feature extraction method or the feature update network training method described in any embodiment of the present disclosure is implemented.
  • At least one embodiment of the present disclosure provides a computer program for causing a processor to execute the steps of the method for extracting image features or the steps of the method for training a feature update network according to any embodiment of the present disclosure.
• One or more embodiments of the present disclosure may be provided as a method, a system, or a computer program product. Therefore, one or more embodiments of the present disclosure may take the form of a complete hardware embodiment, a complete software embodiment, or an embodiment combining software and hardware. Moreover, one or more embodiments of the present disclosure may take the form of a computer program product implemented on one or more computer-usable storage media (including but not limited to disk storage, CD-ROM, optical storage, etc.) containing computer-usable program code.
  • the embodiments of the present disclosure also provide a computer-readable storage medium, on which a computer program may be stored, and when the program is executed by a processor, the steps of the method for extracting image features described in any of the embodiments of the present disclosure are implemented, and /Or, implement the steps of the feature update network training method described in any embodiment of the present disclosure.
  • the "and/or" means having at least one of the two; for example, "A and/or B" covers three schemes: A alone, B alone, and "A and B".
  • embodiments of the subject matter described in the present disclosure can be implemented as one or more computer programs, that is, one or more modules of computer program instructions encoded on a tangible non-transitory program carrier for execution by, or to control the operation of, a data processing apparatus.
  • alternatively or additionally, the program instructions may be encoded on an artificially generated propagated signal, such as a machine-generated electrical, optical, or electromagnetic signal, which is generated to encode information for transmission to a suitable receiver apparatus for execution by a data processing apparatus.
  • the computer storage medium may be a machine-readable storage device, a machine-readable storage substrate, a random or serial access memory device, or a combination of one or more of them.
  • the processing and logic flow described in the present disclosure can be executed by one or more programmable computers executing one or more computer programs to perform corresponding functions by operating according to input data and generating output.
  • the processing and logic flow can also be executed by a dedicated logic circuit, such as FPGA (Field Programmable Gate Array) or ASIC (Application Specific Integrated Circuit), and the device can also be implemented as a dedicated logic circuit.
  • FPGA Field Programmable Gate Array
  • ASIC Application Specific Integrated Circuit
  • Computers suitable for executing computer programs include, for example, general-purpose and/or special-purpose microprocessors, or any other type of central processing unit.
  • the central processing unit will receive instructions and data from a read-only memory and/or a random access memory.
  • the basic components of a computer include a central processing unit for implementing or executing instructions and one or more memory devices for storing instructions and data.
  • the computer will also include, or be operatively coupled to receive data from and/or transfer data to, one or more mass storage devices for storing data, such as magnetic disks, magneto-optical disks, or optical disks.
  • however, a computer does not have to have such devices.
  • the computer can be embedded in another device, such as a mobile phone, a personal digital assistant (PDA), a mobile audio or video player, a game console, a global positioning system (GPS) receiver, or a portable storage device such as a universal serial bus (USB) flash drive, to name a few.
  • PDA personal digital assistant
  • GPS global positioning system
  • USB universal serial bus
  • Computer-readable media suitable for storing computer program instructions and data include all forms of non-volatile memory, media, and memory devices, including, for example, semiconductor memory devices (such as EPROM, EEPROM, and flash memory devices), magnetic disks (such as internal hard disks or removable disks), magneto-optical disks, and CD-ROM and DVD-ROM disks.
  • semiconductor memory devices such as EPROM, EEPROM, and flash memory devices
  • magnetic disks such as internal hard disks or Removable disks
  • magneto-optical disks, CD-ROM and DVD-ROM disks.
  • the processor and the memory can be supplemented by or incorporated into a dedicated logic circuit.

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Physics & Mathematics (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Data Mining & Analysis (AREA)
  • Multimedia (AREA)
  • Artificial Intelligence (AREA)
  • Evolutionary Computation (AREA)
  • Databases & Information Systems (AREA)
  • General Engineering & Computer Science (AREA)
  • Health & Medical Sciences (AREA)
  • Computing Systems (AREA)
  • General Health & Medical Sciences (AREA)
  • Medical Informatics (AREA)
  • Software Systems (AREA)
  • Library & Information Science (AREA)
  • Evolutionary Biology (AREA)
  • Bioinformatics & Computational Biology (AREA)
  • Bioinformatics & Cheminformatics (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)
  • Image Analysis (AREA)

Abstract

A method, apparatus, and device for image feature extraction and network training. The method includes: acquiring a first association graph, the first association graph including a main node and at least one neighbor node, where the node value of the main node represents an image feature of a target image, the node value of each neighbor node represents an image feature of a neighbor image, and a neighbor image is an image similar to the target image (100); and inputting the first association graph into a feature update network, which updates the node value of the main node according to the node values of the neighbor nodes in the first association graph to obtain an updated image feature of the target image (102).

Description

图像特征提取及网络的训练方法、装置和设备
相关申请的交叉引用
本专利申请要求于2019年8月23日提交的、申请号为201910782629.9、发明名称为“图像特征提取及网络的训练方法、装置和设备”的中国专利申请的优先权,该申请的全文以引用的方式并入本文中。
技术领域
本公开涉及计算机视觉技术,具体涉及一种图像特征提取及网络的训练方法、装置和设备。
背景技术
图像检索按照描述图像内容方式的不同,可以包括基于文本的图像检索和基于内容的图像检索(CBIR,Content Based Image Retrieval)。其中,基于内容的图像检索技术在电子商务、皮革布料、版权保护、医疗诊断、公共安全、街景地图等工业领域具有广阔的应用前景。
发明内容
有鉴于此,本公开至少提供一种图像特征提取及网络的训练方法、装置和设备。
第一方面,提供一种图像特征的提取方法,所述方法包括:
获取第一关联图,所述第一关联图中包括主节点以及至少一个邻居节点,所述主节点的节点值表示目标图像的图像特征,所述邻居节点的节点值表示邻居图像的图像特征,所述邻居图像是与所述目标图像相似的图像;
将所述第一关联图输入特征更新网络,所述特征更新网络根据所述第一关联图中的邻居节点的节点值更新所述主节点的节点值,以得到更新后的目标图像的图像特征。
在一些实施例中,获取第一关联图之前,所述方法还包括:根据所述目标图像,由图像库中获取与所述目标图像相似的邻居图像。
在一些实施例中,根据所述目标图像,由图像库中获取与所述目标图像相似的邻居图像,包括:通过特征提取网络分别获取所述目标图像的图像特征和图像库中的各个库图像的图像特征;基于所述目标图像的图像特征和图像库中的各个所述库图像的图像特征之间的特征相似度,从所述图像库中确定与所述目标图像相似的邻居图像。
在一些实施例中,基于所述目标图像的图像特征和图像库中的各个库图像的图像特征之间的特征相似度,确定与所述目标图像相似的邻居图像,包括:将所述目标图像与各个所述库图像之间的特征相似度,按照特征相似度的数值由大到小的顺序进行排序;选取前预设位数的特征相似度对应的库图像,作为与所述目标图像相似的邻居图像。
在一些实施例中,基于所述目标图像的图像特征和图像库中的各个所述库图像的图像特征之间的特征相似度,从所述图像库中确定与所述目标图像相似的邻居图像,包括:根据所述目标图像的图像特征和各个所述库图像的图像特征之间的特征相似度,由所述各个所述库图像中获得与所述目标图像相似的第一图像;根据所述第一图像的图像特征与各个所述库图像的图像特征之间的特征相似度,由各个所述库图像中获得与所述第一图像相似的第二图像;将所述第一图像和所述第二图像,作为所述目标图像的邻居图像。
在一些实施例中，所述特征更新网络的数量为一个，或者依次堆积的N个，其中N是大于1的整数；当所述特征更新网络的数量为N个时：其中第i特征更新网络的输入，是第i-1特征更新网络输出的更新后的第一关联图，其中i是大于1且小于或等于N的整数。
在一些实施例中,所述特征更新网络根据所述第一关联图中的邻居节点的节点值更新所述主节点的节点值,得到更新后的目标图像的图像特征,包括:确定所述第一关联图中的所述主节点和各所述邻居节点之间的权重;根据所述权重将各所述邻居节点的图像特征合并,得到所述主节点的加权特征;根据所述主节点的图像特征和所述加权特征,得到所述更新后的目标图像的图像特征。
在一些实施例中,根据所述权重将各所述邻居节点的图像特征合并,得到所述主节点的加权特征,包括:根据所述权重,将各所述邻居节点的图像特征进行加权求和,得到所述主节点的加权特征。
在一些实施例中,根据所述主节点的图像特征和所述加权特征,得到所述更新后的目标图像的图像特征,包括:将主节点的图像特征与所述加权特征拼接;对拼接后的特征进行非线性映射,得到更新后的目标图像的图像特征。
在一些实施例中,确定所述第一关联图中的所述主节点和所述邻居节点之间的权重,包括:对所述主节点和所述邻居节点进行线性映射;对线性映射后的所述主节点和所述邻居节点确定内积;根据非线性处理后的所述内积,确定所述主节点与所述邻居节点之间的权重。
在一些实施例中,所述目标图像包括:待检索的查询图像以及图像库中各个库图像;在得到所述更新后的所述目标图像的图像特征之后,所述方法还包括:基于更新后的目标图像的图像特征和所述各个库图像的图像特征之间的特征相似度,由所述库图像中获得所述目标图像的相似图像作为检索结果。
第二方面,提供一种特征更新网络的训练方法,所述特征更新网络用于更新图像的图像特征;所述方法包括:
获取第二关联图,所述第二关联图中包括训练主节点以及至少一个训练邻居节点,所述训练主节点的节点值表示样本图像的图像特征,所述训练邻居节点的节点值表示训练邻居图像的图像特征,所述训练邻居图像为与所述样本图像相似的图像;
将所述第二关联图输入特征更新网络,所述特征更新网络根据所述第二关联图中的训练邻居节点的节点值更新所述主节点的节点值,得到更新后的样本图像的图像特征;
根据更新后的样本图像的图像特征,得到所述样本图像的预测信息;
根据所述预测信息调整所述特征更新网络的网络参数。
在一些实施例中,获取第二关联图之前,所述方法还包括:根据所述样本图像,由训练图像库中获取与所述样本图像相似的所述训练邻居图像。
在一些实施例中,根据所述样本图像,由训练图像库中获取与所述样本图像相似的所述训练邻居图像之前,所述方法还包括:通过特征提取网络,提取训练图像的图像特征;根据所述训练图像的图像特征,获得所述训练图像的预测信息;基于所述训练图像的预测信息和标签信息,调整所述特征提取网络的网络参数。在一些实施例中,根据所述样本图像,由训练图像库中获取与所述样本图像相似的所述训练邻居图像,包括:通过所述特征提取网络分别获取所述样本图像的图像特征和训练图像库中的各个库图像的图像特征;并基于所述样本图像的图像特征和各个库图像的图像特征之间的特征相似度,确定与所述样本图像相似的所述训练邻居图像。
第三方面,提供一种图像特征的提取装置,所述装置包括:
图获取模块,用于获取第一关联图,所述第一关联图中包括主节点以及至少一个邻居节点,所述主节点的节点值表示目标图像的图像特征,所述邻居节点的节点值表示邻居图像的图像特征,所述邻居图像是与目标图像相似的图像;
特征更新模块，用于将所述第一关联图输入特征更新网络，所述特征更新网络根据所述第一关联图中的邻居节点的节点值更新所述主节点的节点值，以得到更新后的目标图像的图像特征。
在一些实施例中,所述装置还包括:邻居获取模块,用于在所述图获取模块获取第一关联图之前,根据所述目标图像,由图像库中获取与所述目标图像相似的邻居图像。
在一些实施例中,所述邻居获取模块,用于:通过特征提取网络分别获取所述目标图像的图像特征和图像库中的各个库图像的图像特征;基于所述目标图像的图像特征和图像库中的各个所述库图像的图像特征之间的特征相似度,从所述图像库中确定与所述目标图像相似的邻居图像。
在一些实施例中，所述邻居获取模块还用于：将目标图像与各个所述库图像之间的特征相似度，按照特征相似度的数值由大到小的顺序进行排序；选取前预设位数的特征相似度对应的库图像，作为与所述目标图像相似的邻居图像。
在一些实施例中,所述邻居获取模块还用于:根据所述目标图像的图像特征和各个所述库图像的图像特征之间的特征相似度,由所述各个所述库图像中获得与所述目标图像相似的第一图像;根据第一图像的图像特征与各个所述库图像的图像特征之间的特征相似度,由各个所述库图像中获得与所述第一图像相似的第二图像;将所述第一图像和所述第二图像,作为所述目标图像的邻居图像。
在一些实施例中,所述特征更新网络的数量为一个,或者依次堆积的N个,其中N是大于1的整数;当所述特征更新网络的数量为N个时:其中第i特征更新网络的输入,是第i-1特征更新网络输出的更新后的第一关联图,其中i是大于1且小于或等于N的整数。
在一些实施例中,所述特征更新模块,用于:确定所述第一关联图中的所述主节点和各所述邻居节点之间的权重;根据所述权重将各所述邻居节点的图像特征合并,得到所述主节点的加权特征;根据所述主节点的图像特征和所述加权特征,得到所述更新后的目标图像的图像特征。
在一些实施例中,所述特征更新模块还用于:根据所述权重,将各所述邻居节点的图像特征进行加权求和,得到所述主节点的加权特征。
在一些实施例中,所述特征更新模块还用于:将所述主节点的图像特征与所述加权特征拼接;对拼接后的特征进行非线性映射,得到更新后的目标图像的图像特征。
在一些实施例中,所述特征更新模块还用于:对所述主节点和所述邻居节点进行线性映射;对线性映射后的所述主节点和所述邻居节点确定内积;根据非线性处理后的所述内积,确定所述主节点与所述邻居节点之间的权重。
第四方面,提供一种特征更新网络的训练装置,所述装置包括:
关联图获得模块,用于获取第二关联图,所述第二关联图中包括训练主节点以及至少一个训练邻居节点,所述训练主节点的节点值表示样本图像的图像特征,所述训练邻居节点的节点值表示训练邻居图像的图像特征,所述训练邻居图像为与所述样本图像相似的图像;
更新处理模块,用于将所述第二关联图输入特征更新网络,所述特征更新网络根据所述第二关联图中的训练邻居节点的节点值更新所述主节点的节点值,得到更新后的样本图像的图像特征;
参数调整模块,用于根据更新后的样本图像的图像特征,得到所述样本图像的预测信息;根据所述预测信息调整所述特征更新网络的网络参数。
在一些实施例中,所述装置还包括:图像获取模块,用于在所述关联图获得模块获取第二关联图之前,根据所述样本图像,由训练图像库中获取与所述样本图像相似的所述训练邻居图像。
在一些实施例中，所述装置还包括：预训练模块，用于通过特征提取网络，提取训练图像的图像特征；根据所述训练图像的图像特征，获得所述训练图像的预测信息；基于所述训练图像的预测信息和标签信息，调整所述特征提取网络的网络参数；所述训练图像是用于训练所述特征提取网络所使用的图像，所述样本图像是特征提取网络训练完成之后用于训练所述特征更新网络的图像。在一些实施例中，所述图像获取模块，用于：通过所述特征提取网络分别获取所述样本图像的图像特征和训练图像库中的各个库图像的图像特征；并基于所述样本图像的图像特征和各个库图像的图像特征之间的特征相似度，确定与所述样本图像相似的所述训练邻居图像。
第五方面,提供一种电子设备,所述设备包括存储器、处理器,所述存储器用于存储可在处理器上运行的计算机指令,所述处理器用于在执行所述计算机指令时实现本公开任一实施例所述的图像特征的提取方法,或者实现本公开任一实施例所述的特征更新网络的训练方法。
第六方面,提供一种计算机可读存储介质,其上存储有计算机程序,所述程序被处理器执行时实现本公开任一实施例所述的图像特征的提取方法,或者实现本公开任一实施例所述的特征更新网络的训练方法。
第七方面,提供一种计算机程序,所述计算机程序用于使处理器执行本公开任一实施例所述的图像特征的提取方法,或者本公开任一实施例所述的特征更新网络的训练方法。
附图说明
为了更清楚地说明本公开一个或多个实施例或相关技术中的技术方案,下面将对实施例或相关技术描述中所需要使用的附图作简单地介绍,显而易见地,下面描述中的附图仅仅是本公开一个或多个实施例中记载的一些实施例,对于本领域普通技术人员来讲,在不付出创造性劳动性的前提下,还可以根据这些附图获得其他的附图。
图1为本公开至少一个实施例提供的一种图像特征的提取方法;
图2为本公开至少一个实施例提供的一种特征更新网络的处理流程;
图3为本公开至少一个实施例提供的一种特征更新网络的训练方法;
图4为本公开至少一个实施例提供的一种特征更新网络的训练方法;
图5为本公开至少一个实施例提供的获取的邻居图像示意图;
图6为本公开至少一个实施例提供的关联图示意图;
图7为本公开至少一个实施例提供的一种图像检索方法;
图8为本公开至少一个实施例提供的一种样本图像和库图像的示意图;
图9为本公开至少一个实施例提供的一种邻居图像搜索示意图;
图10为本公开至少一个实施例提供的一种特征更新网络的网络结构;
图11为本公开至少一个实施例提供的一种图像特征的提取装置;
图12为本公开至少一个实施例提供的一种图像特征的提取装置;
图13为本公开至少一个实施例提供的一种特征更新网络的训练装置;
图14为本公开至少一个实施例提供的一种特征更新网络的训练装置。
具体实施方式
为了使本技术领域的人员更好地理解本公开一个或多个实施例中的技术方案，下面将结合本公开一个或多个实施例中的附图，对本公开一个或多个实施例中的技术方案进行清楚、完整地描述，显然，所描述的实施例仅仅是本公开一部分实施例，而不是全部的实施例。基于本公开一个或多个实施例，本领域普通技术人员在没有作出创造性劳动前提下所获得的所有其他实施例，都应当属于本公开保护的范围。
图像检索按照描述图像内容方式的不同,可以包括基于文本的图像检索和基于内容的图像检索。在一实施例中,在基于内容进行图像检索时,可以利用计算机提取图像特征,建立图像特征矢量描述并存入图像特征库。当用户输入一张查询图像时,可以用相同的特征提取方法提取查询图像的图像特征得到查询向量,然后在相似性度量准则下计算查询向量到图像特征库中各个图像特征的相似性大小,最后按相似性大小进行排序并顺序输出对应的图片。在该实施例中,可以发现在检索目标对象时,容易受拍摄环境的影响,比如,光照变化、尺度变化、视角变化、遮挡以及背景的杂乱均可能影响检索结果。
有鉴于此,为提高图像检索的准确性,本公开实施例提供了一种图像特征的提取方法,图1是本公开至少一个实施例提供的图像特征的提取方法,如图1所示,该方法可以包括如下处理:
在步骤100中,获取第一关联图,所述第一关联图中包括主节点以及至少一个邻居节点,所述主节点的节点值表示目标图像的图像特征,所述邻居节点的节点值表示邻居图像的图像特征,所述邻居图像是与目标图像相似的图像。
本步骤中,所述目标图像,是待提取图像特征的图像,该图像可以是不同的应用场景中的图像,示例性的,可以是图像检索应用中的待检索图像,下述的图像库可以是图像检索应用中的检索图像库。
例如,邻居图像的获得,可以是在获取第一关联图之前,根据目标图像,由图像库中获取与所述目标图像相似的邻居图像。示例性的,可以根据图像特征相似度量准则确定邻居图像,比如,通过特征提取网络分别获取所述目标图像的图像特征和图像库中的各个库图像的图像特征,基于所述目标图像的图像特征和图像库中的各个所述库图像的图像特征之间的特征相似度,从所述图像库中确定与所述目标图像相似的邻居图像。
在一个实施例中，可以将所述目标图像与各个所述库图像之间的特征相似度，按照特征相似度的数值由大到小的顺序进行排序，选取前N位的特征相似度对应的库图像，作为与所述目标图像相似的邻居图像。该N是预设位数，比如前10位。
在另一个实施例中,还可以先根据图像特征之间的相似度获取与目标图像相似的第一图像,再获取与第一图像相似的第二图像,并将该第一图像和第二图像都作为目标图像的邻居图像。
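As a minimal sketch of the top-N neighbor selection described above, the snippet below ranks gallery (library) images by feature similarity in descending order and keeps the first preset number. Cosine similarity and the function name `top_k_neighbors` are illustrative assumptions; the disclosure does not fix a particular similarity metric.

```python
import numpy as np

def top_k_neighbors(query_feat, gallery_feats, k=10):
    """Pick the k gallery images most similar to the query feature.

    Sorts feature similarities from largest to smallest and keeps the
    top preset number of gallery images as neighbor images.
    """
    q = np.asarray(query_feat, float)
    g = np.asarray(gallery_feats, float)
    q = q / np.linalg.norm(q)
    g = g / np.linalg.norm(g, axis=1, keepdims=True)
    sims = g @ q                   # similarity of the query to each gallery image
    order = np.argsort(-sims)      # descending by similarity value
    return order[:k], sims[order[:k]]
```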
在步骤102中,将所述第一关联图输入特征更新网络,所述特征更新网络根据所述第一关联图中的邻居节点的节点值更新所述主节点的节点值,得到更新后的目标图像的图像特征。
例如,特征更新网络可以是基于注意力的图卷积模块(Attention-based Graph convolution,简称:AGCN),也可以是其他模块,不做限制。
以特征更新网络是图卷积模块为例,本步骤中的图卷积模块可以根据邻居节点的节点值更新主节点的节点值,比如,可以确定第一关联图中的所述主节点和各所述邻居节点之间的权重,根据该权重将各所述邻居节点的图像特征合并,得到所述主节点的加权特征,根据所述主节点的图像特征和所述加权特征,得到所述更新后的目标图像的图像特征。后续的图2所示的流程示例性的描述了图卷积模块的更新主节点的节点值的具体过程。
实际实施中,所述图卷积模块的数量可以是一个,或者,依次堆积的多个。示例性的,当图卷积模块的数量是两个时,第一关联图输入第一个图卷积模块,该第一个图卷积模块根据各个邻居节点的图像特征更新主节点的图像特征,该第一个图卷积模块输出的第一关联图中,主节点的图像特征已经更新,是更新后的第一关联图。该更新的第一关联图继续输入第二个图卷积模块,由该第二个图卷积模块继续根据各个邻居节点的图像特征更新主节点的图像特征,输出再次更新后的第一关联图,其中的主节点的图像特征也已经再次更新。
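The stacking described above, where the i-th module consumes the association graph output by the (i-1)-th, can be sketched as a plain sequential loop; each element of `modules` is a hypothetical callable standing in for one graph convolution module:

```python
def stacked_update(main_feat, neighbor_feats, modules):
    """Apply N stacked feature-update modules in sequence.

    Each module receives the current main-node feature together with the
    neighbor features and returns the updated main-node feature, which
    then becomes the input of the next module.
    """
    for update in modules:
        main_feat = update(main_feat, neighbor_feats)
    return main_feat
```

With two stacked modules, the first updates the main node once and the second updates it again, matching the two-module example in the text.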
本实施例中的第一关联图中包括多个节点(如,主节点、邻居节点),其中每个节点的节点值表示该节点所代表的图像的图像特征。并且,第一关联图中的每个节点都可以作为主节点,通过本实施例的图1所述的方法来更新该节点对应的图像的图像特征,比如,当该节点作为主节点时,获取以该节点作为主节点的第一关联图,并将该第一关联图输入特征更新网络进行该节点的图像特征的更新。
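One possible in-memory form of such an association graph is a record holding the main node and its neighbor nodes, each node's value being the image feature of the image it represents. The dict layout below is an illustrative assumption, not a structure fixed by the disclosure:

```python
def build_association_graph(main_id, neighbor_ids, feats):
    """Build a first association graph: one main node plus neighbor nodes.

    `feats` maps an image id to its image feature; each node's value is
    the feature of the image the node represents.
    """
    return {
        "main": (main_id, feats[main_id]),
        "neighbors": [(i, feats[i]) for i in neighbor_ids],
    }
```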
本实施例的图像特征的提取方法,通过使用本公开实施例的特征更新网络更新提取图像特征,由于该特征更新网络根据主节点的邻居节点的图像特征更新主节点的图像特征,使得更新后的目标图像的图像特征能更准确地表达目标图像,从而在图像识别过程中更具鲁棒性和判别能力。
图2示例了一实施例中的特征更新网络的处理流程,该流程描述特征更新网络如何更新输入到该网络的图像的图像特征。如图2所示,以该特征更新网络是图卷积模块为例,该特征更新网络的处理流程可以包括如下步骤200-204。
在步骤200中,根据主节点和邻居节点的图像特征,确定主节点和各邻居节点之间权重。
本步骤中,主节点可以是网络应用阶段的目标图像,邻居节点可以是该目标图像的邻居图像。
例如，可以按照如下方式确定主节点与邻居节点之间的权重，参见公式(1)所示：

a_i = softmax_i( ReLU( F(W_u·z_u, W_i·z_vi) ) ).................(1)

首先，可以对主节点的图像特征z_u和邻居节点的图像特征z_vi进行线性变换，其中，vi表示主节点的其中一个邻居节点，k表示邻居节点的数量，W_i和W_u是线性变换的系数。接着，可以通过函数F对线性变换后的主节点和邻居节点的图像特征计算内积，然后通过ReLU（Rectified Linear Unit）实现非线性变换，最后进行softmax操作后得到权重。如公式(1)所示，权重a_i是主节点u与邻居节点vi之间的权重。
此外,本步骤中的主节点和邻居节点之间权重的计算,不局限于上述的公式(1),例如,还可以是将主节点和邻居节点之间的图像特征的相似度的取值,作为两者之间的权重。
在步骤202中,根据所述权重,对所述邻居节点的图像特征加权求和,得到所述主节点的加权特征。
例如，可以对主节点的每个邻居节点的图像特征进行非线性映射，然后利用步骤200中得到的权重，对非线性映射后的各个邻居节点的图像特征进行加权求和，得到的特征可以称为加权特征，如公式(2)所示：

n_u = ∑_{i=1..k} a_i·ReLU(Q·z_vi + q).................(2)

在公式(2)中，n_u即为加权特征，z_vi是邻居节点的图像特征，a_i是步骤200计算得到的所述权重，Q和q是非线性映射的系数。
在步骤204中,根据所述主节点的图像特征和所述加权特征,得到更新后的目标图像的更新特征。
本步骤中，可以将最初得到的关联图中的主节点的图像特征与加权特征拼接（concat）在一起，然后进行非线性映射，如公式(3)所示（此处以h_u表示拼接并映射后的特征）：

h_u = ReLU( W·[z_u; n_u] + w ).................(3)

其中，z_u是关联图中的主节点的图像特征，n_u即为加权特征，[z_u; n_u]表示二者的拼接，通过ReLU进行非线性映射，W和w是非线性映射的系数。

最后再对公式(3)得到的特征h_u进行规范化（normalization），如公式(4)所示，得到最终的更新后的主节点的图像特征：

z'_u = h_u / ||h_u||_2.................(4)
通过上述的步骤200至204,第一关联图中的主节点的节点值得到更新,获得了更新后的主节点的图像特征。
本实施例的特征更新网络的处理流程,通过图卷积模块对主节点的邻居节点的图像特征进行加权求和,确定主节点的加权特征,使得能够综合考虑目标图像本身的图像特征及其关联的邻居图像的图像特征,从而更新后的目标图像的图像特征更具鲁棒性和判别能力,提高图像检索的准确性。
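Steps 200-204 can be sketched end to end as follows. The parameter names (W_u, W_i, Q, q, W, w) follow formulas (1)-(4), while the shapes and the exact attention form are assumptions reconstructed from the surrounding description rather than the patented implementation:

```python
import numpy as np

def relu(x):
    return np.maximum(x, 0.0)

def softmax(x):
    e = np.exp(x - np.max(x))
    return e / e.sum()

def agcn_update(z_u, Z_v, W_u, W_i, Q, q, W, w):
    """One attention-based graph-convolution update of the main node.

    (1) attention weight a_i from the inner product of linearly mapped
        main-node and neighbor features, through ReLU and softmax;
    (2) weighted feature n_u as the attention-weighted sum of nonlinearly
        mapped neighbor features;
    (3) concatenate z_u with n_u and apply a nonlinear map;
    (4) L2-normalize to get the updated main-node feature.
    """
    scores = relu(np.array([(W_u @ z_u) @ (W_i @ zv) for zv in Z_v]))
    a = softmax(scores)                                         # formula (1)
    n_u = sum(ai * relu(Q @ zv + q) for ai, zv in zip(a, Z_v))  # formula (2)
    h = relu(W @ np.concatenate([z_u, n_u]) + w)                # formula (3)
    return h / (np.linalg.norm(h) + 1e-12)                      # formula (4)
```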
图3是本公开至少一个实施例提供的特征更新网络的训练方法,如图3所示,该方法描述特征更新网络的训练过程,可以包括如下处理:
在步骤300中,根据用于训练所述特征更新网络的样本图像,由训练图像库中获取与所述样本图像相似的训练邻居图像。
需要说明的是,本实施例中的“训练图像库”、“训练邻居图像”,其中的“训练”是用于表示这是应用在网络的训练阶段,并与网络应用阶段提到的邻居图像和图像库在名称上进行区分,并不构成任何限制作用。同理,下文描述中提到的“训练主节点”、“训练邻居节点”也同样仅仅是名称上与网络应用阶段出现的相同概念区分,并不构成任何限制作用。
在训练特征更新网络时,可以采用分组训练方式。例如,可以将训练样本分成多个图像子集(batch),每次迭代训练向特征更新网络输入一个图像子集,结合该图像子集包括的各个样本图像的损失值,用损失值反向回传网络的方式来调整网络参数。一次迭代训练完成后,可以向特征更新网络输入下一个图像子集,以进行下一次迭代训练。
本步骤中,一个图像子集batch中的每一个图像可以称为一个样本图像,且每个样本图像都可以执行步骤300至306的处理,且可以根据预测信息和标签信息获得损失值loss。
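The grouped training described above — one image subset per iteration, parameters adjusted from that subset's combined loss — can be sketched as a plain loop. The `predict`, `label_of`, and `step` callables are hypothetical stand-ins for the network's forward pass, the label lookup, and the back-propagation step:

```python
def train_grouped(batches, predict, label_of, step):
    """Iterate over image subsets (batches); for each, average the
    per-sample losses and hand the batch loss to `step`, which stands
    in for adjusting the network parameters by back-propagation.
    """
    history = []
    for batch in batches:
        losses = [(predict(x) - label_of(x)) ** 2 for x in batch]  # per-sample loss
        batch_loss = sum(losses) / len(losses)
        step(batch_loss)            # parameter adjustment for this subset
        history.append(batch_loss)
    return history
```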
示例性的,在图像检索的应用场景中,所述的训练图像库可以是检索图像库,即将由该检索图像库中检索获得与样本图像相似的图像。所述的相似可以是与样本图像包括同一物体,或者与样本图像属于同一类别。
本步骤中,与样本图像相似的图像可以称为“训练邻居图像”。
该训练邻居图像的获得方式,例如,可以是根据图像之间的特征相似度,将相似度较高的图像,确定为所述训练邻居图像。
在步骤302中,获取第二关联图,所述第二关联图中包括训练主节点以及至少一个训练邻居节点,所述训练主节点的节点值表示样本图像的图像特征,所述训练邻居节点的节点值表示训练邻居图像的图像特征,所述训练邻居图像为与所述样本图像相似的图像。
例如,网络训练阶段的关联图可以称为第二关联图,而前文在网络应用阶段出现的关联图可以称为第一关联图。
本步骤中,第二关联图上可以包括多个节点。
其中,所述第二关联图中的节点可以包括:一个训练主节点、以及至少一个训练邻居节点。该训练主节点代表样本图像,每一个训练邻居节点代表步骤300中确定的一个训练邻居图像。每个节点的节点值是图像特征,例如,训练主节点的节点值是样本图像的图像特征,训练邻居节点的节点值是训练邻居图像的图像特征。
在步骤304中,将第二关联图输入特征更新网络,所述特征更新网络根据所述第二关联图中的训练邻居节点的节点值更新训练主节点的节点值。
例如,该特征更新网络可以是图卷积模块,也可以是其他类型的模块,在此不做限定。本步骤中,所述图卷积模块是基于注意力的图卷积模块(Attention-based Graph convolution,简称:AGCN),该模块用于根据第二关联图中的训练邻居节点的图像特征更新训练主节点的图像特征,例如,可以根据各个训练邻居节点的图像特征加权求和 后更新训练主节点的图像特征。
实际实施中,所述图卷积模块的数量可以是一个,或者,依次堆积的多个。示例性的,当图卷积模块的数量是两个时,第二关联图输入第一个图卷积模块,该第一个图卷积模块根据各个训练邻居节点的图像特征更新训练主节点的图像特征,该第一个图卷积模块输出的第二关联图中,训练主节点的图像特征已经更新,是更新后的第二关联图。该更新的第二关联图继续输入第二个图卷积模块,由该第二个图卷积模块继续根据各个训练邻居节点的图像特征更新训练主节点的图像特征,输出再次更新后的训练主节点的图像特征。
在步骤306中,根据特征更新网络提取的样本图像的图像特征,得到所述样本图像的预测信息。
本步骤中,可以根据图卷积模块提取的图像特征,进一步确定样本图像的预测信息。例如,图卷积模块之后可以连接分类器,由分类器根据该图像特征得到样本图像分别属于各个预设类别的概率。
在步骤308中,根据所述预测信息,调整特征更新网络的网络参数。
本步骤中,可以根据特征更新网络输出的预测信息与标签信息的差异,确定样本图像对应的损失值loss。如前所述,以图卷积模块为例,在多个batch分组训练的方式中,可以综合根据一个batch中的各个样本图像的损失值,反向传播调整图卷积模块的网络参数,以使得图卷积模块根据调整后的网络参数更准确的提取图像特征。
例如,在根据损失值loss反向传播调整图卷积模块的网络参数时,可以调整图2流程描述中提到的图卷积模块的W i、W u、Q、q、W和w等系数。
本实施例的特征更新网络的训练方法,通过在训练网络时,结合样本图像的相似图像来更新样本图像的图像特征,使得能够综合考虑样本图像本身的图像特征及其关联的训练邻居图像的图像特征,从而利用训练后的特征更新网络得到的样本图像的图像特征更具鲁棒性和判别能力,以提高图像检索的准确性,例如,即使受到光照变化、尺度变化、视角变化等影响,仍然能够得到相对准确的图像特征。
图4示例了另一实施例的特征更新网络的训练方法,该方法中,可以通过预先训练的用于提取特征的网络(可以称为特征提取网络)提取图像特征,并根据图像特征进行相似性度量,由训练图像库中获取与样本图像相似的训练邻居图像。如图4所示,该方法可以包括:
在步骤400中,使用训练集预训练一个用于提取特征的网络。
例如,该预训练的用于提取特征的网络,可以称为特征提取网络,包括但不局限于:卷积神经网络CNN(Convolutional Neural Network)、BP(Back Propagation,逆向传播)神经网络、离散Hopfield网络等。
训练集中的图像可以称为训练图像。该特征提取网络的训练过程可以包括:通过特征提取网络,提取训练图像的图像特征;根据所述训练图像的图像特征,获得所述训练图像的预测信息;基于所述训练图像的预测信息和标签信息,调整所述特征提取网络的网络参数。
其中,需要说明的是,上述的训练图像是指用于训练所述特征提取网络所使用的图像,而之前提到的所述样本图像是指,该特征提取网络训练完成之后将应用于特征更新网络的训练过程,比如,通过预训练的特征提取网络先提取样本图像和训练图像库中各个库图像的图像特征,再生成关联图之后输入特征更新网络进行图像特征更新,在该特征更新网络训练过程中使用的输入图像即样本图像。样本图像和训练图像可以相同也可以不同。
在步骤402中,通过特征提取网络,分别获取所述样本图像和训练图像库中各个库图像的图像特征。
在步骤404中,根据所述样本图像和各个库图像的图像特征之间的特征相似度,由各个库图像中获得与所述样本图像相似的第一图像。
本步骤中,所述的库图像是检索图像库中的图像。
示例性的,可以分别计算样本图像的图像特征与各个库图像的图像特征之间的特征相似度,并根据该相似度对各个库图像进行排序,如,按照相似度由高到低的顺序排序。再由排序结果中选择排位在前K位的库图像,作为样本图像的第一图像。例如,请参见图5所示,节点31代表样本图像,节点32、节点33和节点34代表的库图像都是与该样本图像相似的第一图像。
在步骤406中,根据第一图像和库图像的图像特征之间的特征相似度,由所述库图像中获得与所述第一图像相似的第二图像。
本步骤中,可以接着计算第一图像和库图像的图像特征之间的特征相似度,由库图像中获得与第一图像相似的库图像,作为第二图像。例如,请参见图5所示,通过图像特征的相似度度量,节点35至节点37是与节点32相似的库图像,该节点35至节点37是与节点31相似的第二图像。同样的,与节点34相似的节点38至节点40也是与节点31相似的第二图像。
此外,图5是示例的情况。实际实施中,可以找到与样本图像对应的主节点相似的第一图像,就停止继续寻找邻居图像。或者,还可以找到第三图像、或第四图像等更多数量的邻居图像。具体查找几层邻居图像可以根据不同应用场景中实际测试的效果确定。上述的第一图像、第二图像等都可以称为邻居图像,在网络训练阶段,可以称为训练邻居图像;在网络应用阶段,可以称为邻居图像。
还需要说明的是,邻居图像的获得也可以采用本步骤示例之外的其他方式。例如,可以设定相似度阈值,将特征相似度高于该阈值的库图像的全部或部分直接作为样本图像的邻居图像。又例如,还可以不采用特征提取网络提取图像特征,而是通过根据图像多个维度的取值确定图像特征。
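The layered neighbor search above (first images of the sample, then second images of each first image, and so on) can be sketched as a bounded breadth-first expansion over a precomputed pairwise similarity matrix. Here `sims`, `k`, and `hops` are illustrative assumptions, and how many layers to search is, as the text notes, decided by testing per application:

```python
import numpy as np

def expand_neighbors(sims, seed, k=3, hops=2):
    """Collect neighbor images of `seed` over `hops` similarity levels.

    Level 1 takes the k most similar images of the seed (first images);
    level 2 takes the k most similar images of each first image (second
    images); all collected nodes form the seed's neighbor set.
    """
    frontier, neighbors = {seed}, set()
    for _ in range(hops):
        nxt = set()
        for node in frontier:
            order = np.argsort(-np.asarray(sims[node], float))
            picked = [j for j in order if j not in (node, seed)][:k]
            nxt.update(picked)
        neighbors |= nxt
        frontier = nxt
    return neighbors
```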
在步骤408中,根据样本图像和邻居图像,生成第二关联图,所述第二关联图中的节点包括:用于代表所述样本图像的训练主节点、以及用于代表邻居图像的至少一个训练邻居节点,所述训练主节点的节点值是所述样本图像的图像特征,所述训练邻居节点的节点值是所述邻居图像的图像特征。在一个实施例中,本步骤中的邻居图像包括步骤404中得到的第一图像和步骤406中得到的第二图像。
本步骤中生成的第二关联图,是包括多个节点的图,可以参见图6的示例。图6中的节点31是训练主节点,其他所有的节点都是训练邻居节点。节点值可以是该节点代表的图像的图像特征,该图像特征例如可以是由步骤402中提取得到。
在步骤410中,将所述第二关联图输入特征更新网络,所述特征更新网络根据所述第二关联图中的训练邻居节点的图像特征更新训练主节点的图像特征,得到更新后的样本图像的图像特征,并根据该更新后的图像特征得到样本图像的预测信息。
在步骤412中,根据样本图像的预测信息,调整所述特征更新网络的网络参数以及特征提取网络的网络参数。
本步骤的网络参数调整,可以调整特征提取网络的网络参数,也可以不调整特征提取网络的网络参数,可以根据实际训练情况确定。
本实施例的特征更新网络的训练方法,通过在训练网络时,结合样本图像的相似图像来更新样本图像的图像特征,使得能够综合考虑样本图像本身的图像特征及其关联的其他图像的图像特征,从而利用训练后的特征更新网络得到的样本图像的图像特征更具鲁棒性和判别能力,以提高图像检索的准确性;并且,通过采用特征提取网络提取图像特征,不仅可以提高图像特征的提取效率,进而提高网络训练速度,还可以根据损失值调整特征提取网络的网络参数,使得特征提取网络提取的图像特征更准确。
本公开实施例还提供了一种图像检索方法,该方法要由图像库中检索与目标图像相似的图像。如图7所示,该方法可以包括如下处理:
在步骤700中,获取待检索的目标图像。
例如,假设要由图像库中检索与图像M中包括的物体相同的图像,那么可以将图像M称为目标图像。即要由图像库中检索与目标图像有某种关联的图像,这种关联可以是包括相同的物体,或者属于相同的类别。
在步骤702中,提取得到所述目标图像的图像特征。
本步骤中，可以根据本公开任一实施例所述的图像特征的提取方法，例如图1所示的提取方法，提取所述目标图像的图像特征。
在步骤704中,提取得到所述图像库中各个库图像的图像特征。
本步骤中,可以根据本公开任一实施例所述的图像特征的提取方法,例如,图1所示的提取方法,提取图像库中各个库图像的图像特征。
在步骤706中,基于所述目标图像的图像特征和所述各个库图像的图像特征之间的特征相似度,获得所述目标图像的相似图像作为检索结果。
本步骤中,可以将目标图像的图像特征和所述各个库图像的图像特征之间进行特征相似度量,从而将相似的库图像作为检索结果。
本实施例的图像检索方法，由于提取到的目标图像的图像特征更具鲁棒性和判别能力，从而提高了检索结果的准确性。
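The retrieval flow of steps 700-706 can be condensed into the sketch below; the `update` callable is an assumption standing in for the trained feature update network, and cosine similarity is one possible choice of similarity measure:

```python
import numpy as np

def retrieve(query_feat, gallery_feats, update, top=3):
    """Refine the query and gallery features with `update`, then rank
    gallery images by feature similarity to the refined query.
    """
    g = np.asarray(gallery_feats, float)
    refined_g = np.vstack([update(f, g) for f in g])
    refined_q = update(np.asarray(query_feat, float), g)
    sims = refined_g @ refined_q / (
        np.linalg.norm(refined_g, axis=1) * np.linalg.norm(refined_q) + 1e-12)
    return np.argsort(-sims)[:top]   # indices of the most similar gallery images
```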
图像检索可以应用于多种场景,例如,医疗诊断、街景地图、智能视频分析、安防监控等。如下以安防监控中的行人检索(person search)为例,描述如何应用本公开实施例的方法训练检索使用的网络、以及如何利用该网络进行图像检索。如下的描述中,将分别说明网络训练及其应用。
网络训练
该网络在训练时,可以采用分组训练方式,例如,可以将训练样本分成多个图像子集(batch),每次迭代训练向待训练的特征更新网络逐个输入一个batch中的各个样本图像,并最终结合图像子集包括的各个样本图像的损失值调整特征更新网络的网络参数。
下面以其中一个样本图像为例,描述如何得到该样本图像对应的损失值。
请参见图8所示,样本图像81中包括一个行人82,本实施例的行人检索的目标是由检索图像库中搜索包括相同行人82的库图像。
假设已经预训练完成一个用于提取图像特征的网络,例如,CNN网络,可以称为特征提取网络。通过该特征提取网络分别提取样本图像81和图像库中的各个库图像的图像特征。然后计算样本图像81与各个库图像的特征相似度,并根据相似度排序,选择排位在前预设数量(例如,按照相似度由高到低排序,且排序结果在前10位)的库图像,作为与样本图像81相似的图像,可以称为样本图像81的邻居图像。请参见图8,库图像83、库图像84直至库图像85都是邻居图像。这些邻居图像中包括的行人,可以的确与行人82相同,也可以与行人82不同但非常相似。
接着,以包括库图像83、库图像84直至库图像85的十个邻居图像为基础,再去图像库中检索分别与每一个邻居图像相似的库图像。示例性的,以库图像83为例,根据图像特征的相似度度量,由库图像中选择相似度排序前十位的库图像作为该库图像83的十个邻居图像。请参见图9所示,集合91中包括十个库图像,这些图像是库图像83的十个邻居图像。同样的方式,可以再检索与库图像84相似的十个邻居图像,即图9中的集合92。库图像83、库图像84直至库图像85的十个邻居图像都要进行同样的相似图像再搜索,不再详述。如上的库图像83、库图像84等,可以称为与样本图像81相似的第一图像,而集合91、集合92中的库图像都可以称为与样本图像81相似的第二图像。本实施例以第一图像和第二图像为例,在其他的应用例子中,还可以继续检索与第二图像相似的第三图像。
然后,根据样本图像以及检索得到的邻居图像,可以生成关联图。该关联图类似于图6所示,图中包括一个主节点和多个邻居节点。其中,该主节点代表样本图像81,每一个邻居节点代表一个邻居图像,这些邻居节点中包括第一图像,也包括第二图像。每个节点的节点值是其代表的图像的图像特征,该图像特征即在获取邻居图像进行特征相似度比较时提取使用的图像特征,例如,可以是通过上述的特征提取网络提取到的图像特征。
请参见图10,图10示例了用于提取图像特征的特征更新网络的网络结构。该网络结构中可以包括特征提取网络1001,通过特征提取网络1001分别提取了样本图像和图像库中各个库图像的图像特征1002,并根据图像特征的相似比较等处理,最终得到了关联图1003(图中示意了部分邻居节点,实际使用中的邻居节点数量可以更多)。该关联图1003可以输入图卷积网络1004,该图卷积网络1004包括堆积(stack)的多个图卷积模块1005,每一个图卷积模块1005都可以按照图2所示的流程对主节点的图像特征进行更新。
图卷积网络1004可以输出主节点的最终更新的图像特征,作为该样本图像的更新后的图像特征,并且,可以继续根据该更新后的图像特征确定样本图像对应的预测信息,根据预测信息和所述样本图像的标签信息计算样本图像对应的损失值loss。
每一个样本图像都可以按照上述的处理流程计算得到损失值,最后可以根据这些样本图像的损失值调整特征更新网络的网络参数,例如包括图卷积模块中的参数以及特征提取网络的参数。在其他的实施例中,图10所示的网络结构中也可以不包括特征提取网络,而是采用其他方式获取到关联图。
利用训练完成的特征更新网络进行行人检索
1):以图10的网络结构为例,例如,可以通过图10中的特征提取网络1001提取图像库中的各个库图像的图像特征,并保存这些提取的图像特征。
2):当接收到一个待检索的目标图像时,例如,该目标图像是一个行人图像。可以按照下述方式由特征更新网络提取目标图像的图像特征:
首先,将该目标图像也通过图10中的特征提取网络1001提取到图像特征。
接着,基于目标图像的图像特征和所述各个库图像的图像特征之间的特征相似度,获得目标图像的邻居图像。根据目标图像及其邻居图像可以得到关联图,该关联图中可以包括代表目标图像的主节点、以及代表邻居图像的多个邻居节点。关联图输入图10中的图卷积网络1004,经过图卷积模块1005对目标图像中主节点的图像特征更新,最终得到的主节点的图像特征即为更新后的目标图像的图像特征。
3):对于每一个库图像,也可以按照与2)相同的处理方式,获得最终由图卷积网络1004输出的更新后的各个库图像的图像特征。
4):计算更新后的目标图像的图像特征与更新后的各个库图像的图像特征之间的特征相似度,并根据相似度排序,得到最终的检索结果。例如,可以将相似度较高的几个库图像作为检索结果。
本实施例的图像检索方法,通过在进行图像特征提取时,结合考虑了与目标图像关联的邻居图像的图像特征,使得利用训练后的特征更新网络学习到的图像特征更加具有鲁棒性和判别能力,从而提高图像检索准确率;并且,图卷积模块可以堆积多层,具有很好的可扩展能力;在分组训练时,一个batch中的各个样本图像可以利用深度学习框架和硬件进行并行计算,网络训练的效率较高。
图11提供了一种图像特征的提取装置,该装置可以用于执行本公开任一实施例的图像特征提取方法。如图11所示,该装置可以包括:图获取模块1101和特征更新模块1102。
图获取模块1101,用于获取第一关联图,所述第一关联图中包括主节点以及至少一个邻居节点,所述主节点的节点值表示目标图像的图像特征,邻居节点的节点值表示邻居图像的图像特征,所述邻居图像是与目标图像相似的 图像。
特征更新模块1102,用于将所述第一关联图输入特征更新网络,所述特征更新网络根据所述第一关联图中的邻居节点的节点值更新所述主节点的节点值,得到更新后的目标图像的图像特征。
在一个例子中,如图12所示,所述装置还包括:邻居获取模块1103,用于在所述图获取模块获取第一关联图之前,根据所述目标图像,由图像库中获取与所述目标图像相似的邻居图像。
在一个例子中,所述邻居获取模块1103,用于:通过特征提取网络分别获取所述目标图像的图像特征和图像库中的各个库图像的图像特征;基于所述目标图像的图像特征和图像库中的各个所述库图像的图像特征之间的特征相似度,从所述图像库中确定与所述目标图像相似的邻居图像。
在一个例子中,邻居获取模块1103,还用于:将所述目标图像与各个所述库图像之间的特征相似度,按照特征相似度的数值由大到小的顺序进行排序;选取前预设位数的特征相似度对应的库图像,作为所述目标图像相似的邻居图像。
在一个例子中，所述邻居获取模块1103还用于：根据所述目标图像的图像特征和各个所述库图像的图像特征之间的特征相似度，由各个所述库图像中获得与所述目标图像相似的第一图像；根据所述第一图像的图像特征与各个所述库图像的图像特征之间的特征相似度，由各个所述库图像中获得与所述第一图像相似的第二图像；将所述第一图像和所述第二图像，作为所述目标图像的邻居图像。
在一个例子中,所述特征更新网络的数量为一个,或者依次堆积的N个,其中N是大于1的整数;当所述特征更新网络的数量为N个时:其中第i特征更新网络的输入,是第i-1特征更新网络输出的更新后的第一关联图,其中i是大于1且小于或等于N的整数。
在一个例子中,所述特征更新模块1102,用于:确定所述第一关联图中的所述主节点和各所述邻居节点之间的权重;根据所述权重将各所述邻居节点的图像特征合并,得到所述主节点的加权特征;根据所述主节点的图像特征和所述加权特征,得到所述更新后的目标图像的图像特征。
在一个例子中,所述特征更新模块1102,还用于:根据所述权重,将各所述邻居节点的图像特征进行加权求和,得到所述主节点的加权特征。
在一个例子中,所述特征更新模块1102还用于:将所述主节点的图像特征与所述加权特征拼接;对拼接后的特征进行非线性映射,得到更新后的目标图像的图像特征。
在一个例子中,所述特征更新模块1102还用于:对所述主节点和邻居节点进行线性映射;对线性映射后的所述主节点和邻居节点确定内积;根据非线性处理后的所述内积,确定所述主节点与所述邻居节点之间的权重。
图13提供了一种特征更新网络的训练装置,该装置可以用于执行本公开任一实施例的特征更新网络的训练方法。如图13所示,该装置可以包括:关联图获得模块1301、更新处理模块1302和参数调整模块1303。
关联图获得模块1301,用于获取第二关联图,所述第二关联图中包括训练主节点以及至少一个训练邻居节点,所述训练主节点的节点值表示样本图像的图像特征,所述训练邻居节点的节点值表示训练邻居图像的图像特征,所述训练邻居图像为与所述样本图像相似的图像;
更新处理模块1302,用于将所述第二关联图输入特征更新网络,所述特征更新网络根据所述第二关联图中的训练邻居节点的节点值更新所述主节点的节点值,得到更新后的样本图像的图像特征;
参数调整模块1303,用于根据更新后的样本图像的图像特征,得到所述样本图像的预测信息;根据所述预测信息调整所述特征更新网络的网络参数。
在一个例子中,如图14所示,所述装置还包括:图像获取模块1304,用于在所述关联图获得模块获取第二关联图之前,根据所述样本图像,由训练图像库中获取与所述样本图像相似的所述训练邻居图像。
在一个例子中,如图14所示,所述装置还包括:预训练模块1305。
预训练模块1305,用于通过特征提取网络,提取训练图像的图像特征;根据所述训练图像的图像特征,获得所述训练图像的预测信息;基于所述训练图像的预测信息和标签信息,调整所述特征提取网络的网络参数;所述训练图像是用于训练所述特征提取网络所使用的图像,所述样本图像是特征提取网络训练完成之后用于训练所述特征更新网络的图像;
所述图像获取模块1304,用于:通过所述特征提取网络分别获取所述样本图像的图像特征和训练图像库中的各个库图像的图像特征;并基于所述样本图像的图像特征和各个库图像的图像特征之间的特征相似度,确定与所述样本图像相似的所述训练邻居图像。
在一些实施例中,本公开实施例提供的装置具有的功能或包含的模块可以用于执行上文方法实施例描述的方法,其具体实现可以参照上文方法实施例的描述,为了简洁,这里不再赘述。
本公开至少一个实施例提供一种电子设备,该设备可以包括存储器、处理器,所述存储器用于存储可在处理器上运行的计算机指令,所述处理器用于在执行所述计算机指令时实现本公开任一实施例所述的图像特征的提取方法或者特征更新网络的训练方法。
本公开至少一个实施例提供一种计算机可读存储介质,其上存储有计算机程序,所述程序被处理器执行时实现本公开任一实施例所述的图像特征的提取方法或者特征更新网络的训练方法。
本公开至少一个实施例提供一种计算机程序,该计算机程序用于使处理器执行本公开任一实施例所述的图像特征的提取方法的步骤或者特征更新网络的训练方法的步骤。
本领域技术人员应明白,本公开一个或多个实施例可提供为方法、系统或计算机程序产品。因此,本公开一个或多个实施例可采用完全硬件实施例、完全软件实施例或结合软件和硬件方面的实施例的形式。而且,本公开一个或多个实施例可采用在一个或多个其中包含有计算机可用程序代码的计算机可用存储介质(包括但不限于磁盘存储器、CD-ROM、光学存储器等)上实施的计算机程序产品的形式。
本公开实施例还提供一种计算机可读存储介质,该存储介质上可以存储有计算机程序,所述程序被处理器执行时实现本公开任一实施例描述的图像特征的提取方法的步骤,和/或,实现本公开任一实施例描述的特征更新网络的训练方法的步骤。其中,所述的“和/或”表示至少具有两者中的其中一个,例如,“A和/或B”包括三种方案:A、B、以及“A和B”。
本公开中的各个实施例均采用递进的方式描述,各个实施例之间相同相似的部分互相参见即可,每个实施例重点说明的都是与其他实施例的不同之处。尤其,对于数据处理设备实施例而言,由于其基本相似于方法实施例,所以描述的比较简单,相关之处参见方法实施例的部分说明即可。
上述对本公开特定实施例进行了描述。其它实施例在所附权利要求书的范围内。在一些情况下,在权利要求书中记载的行为或步骤可以按照不同于实施例中的顺序来执行并且仍然可以实现期望的结果。另外,在附图中描绘的过程不一定要求示出的特定顺序或者连续顺序才能实现期望的结果。在某些实施方式中,多任务处理和并行处理也是可以的或者可能是有利的。
本公开中描述的主题及功能操作的实施例可以在以下中实现：数字电子电路、有形体现的计算机软件或固件、包括本公开中公开的结构及其结构性等同物的计算机硬件、或者它们中的一个或多个的组合。本公开中描述的主题的实施例可以实现为一个或多个计算机程序，即编码在有形非暂时性程序载体上以被数据处理装置执行或控制数据处理装置的操作的计算机程序指令中的一个或多个模块。可替代地或附加地，程序指令可以被编码在人工生成的传播信号上，例如机器生成的电、光或电磁信号，该信号被生成以将信息编码并传输到合适的接收机装置以由数据处理装置执行。计算机存储介质可以是机器可读存储设备、机器可读存储基板、随机或串行存取存储器设备、或它们中的一个或多个的组合。
本公开中描述的处理及逻辑流程可以由执行一个或多个计算机程序的一个或多个可编程计算机执行,以通过根据输入数据进行操作并生成输出来执行相应的功能。所述处理及逻辑流程还可以由专用逻辑电路—例如FPGA(现场可编程门阵列)或ASIC(专用集成电路)来执行,并且装置也可以实现为专用逻辑电路。
适合用于执行计算机程序的计算机包括,例如通用和/或专用微处理器,或任何其他类型的中央处理单元。通常,中央处理单元将从只读存储器和/或随机存取存储器接收指令和数据。计算机的基本组件包括用于实施或执行指令的中央处理单元以及用于存储指令和数据的一个或多个存储器设备。通常,计算机还将包括用于存储数据的一个或多个大容量存储设备,例如磁盘、磁光盘或光盘等,或者计算机将可操作地与此大容量存储设备耦接以从其接收数据或向其传送数据,抑或两种情况兼而有之。然而,计算机不是必须具有这样的设备。此外,计算机可以嵌入在另一设备中,例如移动电话、个人数字助理(PDA)、移动音频或视频播放器、游戏操纵台、全球定位系统(GPS)接收机、或例如通用串行总线(USB)闪存驱动器的便携式存储设备,仅举几例。
适合于存储计算机程序指令和数据的计算机可读介质包括所有形式的非易失性存储器、媒介和存储器设备,例如包括半导体存储器设备(例如EPROM、EEPROM和闪存设备)、磁盘(例如内部硬盘或可移动盘)、磁光盘以及CD ROM和DVD-ROM盘。处理器和存储器可由专用逻辑电路补充或并入专用逻辑电路中。
虽然本公开包含许多具体实施细节,但是这些不应被解释为限制任何公开的范围或所要求保护的范围,而是主要用于描述特定公开的具体实施例的特征。本公开内在多个实施例中描述的某些特征也可以在单个实施例中被组合实施。另一方面,在单个实施例中描述的各种特征也可以在多个实施例中分开实施或以任何合适的子组合来实施。此外,虽然特征可以如上所述在某些组合中起作用并且甚至最初如此要求保护,但是来自所要求保护的组合中的一个或多个特征在一些情况下可以从该组合中去除,并且所要求保护的组合可以指向子组合或子组合的变型。
类似地,虽然在附图中以特定顺序描绘了操作,但是这不应被理解为要求这些操作以所示的特定顺序执行或顺次执行、或者要求所有例示的操作被执行,以实现期望的结果。在某些情况下,多任务和并行处理可能是有利的。此外,上述实施例中的各种系统模块和组件的分离不应被理解为在所有实施例中均需要这样的分离,并且应当理解,所描述的程序组件和系统通常可以一起集成在单个软件产品中,或者封装成多个软件产品。
由此,主题的特定实施例已被描述。其他实施例在所附权利要求书的范围以内。在某些情况下,权利要求书中记载的动作可以以不同的顺序执行并且仍实现期望的结果。此外,附图中描绘的处理并非必需所示的特定顺序或顺次顺序,以实现期望的结果。在某些实现中,多任务和并行处理可能是有利的。
以上所述仅为本公开一个或多个实施例的较佳实施例而已,并不用以限制本公开一个或多个实施例,凡在本公开一个或多个实施例的精神和原则之内,所做的任何修改、等同替换、改进等,均应包含在本公开一个或多个实施例保护的范围之内。

Claims (30)

  1. 一种图像特征的提取方法,包括:
    获取第一关联图,所述第一关联图中包括主节点以及至少一个邻居节点,所述主节点的节点值表示目标图像的图像特征,所述邻居节点的节点值表示邻居图像的图像特征,所述邻居图像是与所述目标图像相似的图像;
    将所述第一关联图输入特征更新网络,所述特征更新网络根据所述第一关联图中的邻居节点的节点值更新所述主节点的节点值,以得到更新后的目标图像的图像特征。
  2. 根据权利要求1所述的方法,其特征在于,获取所述第一关联图之前,所述方法还包括:
    根据所述目标图像,由图像库中获取与所述目标图像相似的邻居图像。
  3. 根据权利要求2所述的方法,其特征在于,根据所述目标图像,由所述图像库中获取与所述目标图像相似的邻居图像,包括:
    通过特征提取网络分别获取所述目标图像的图像特征和所述图像库中的各个库图像的图像特征;
    基于所述目标图像的图像特征和所述图像库中的各个所述库图像的图像特征之间的特征相似度,从所述图像库中确定与所述目标图像相似的邻居图像。
  4. 根据权利要求3所述的方法,其特征在于,基于所述目标图像的图像特征和所述图像库中的各个所述库图像的图像特征之间的特征相似度,确定与所述目标图像相似的邻居图像,包括:
    将所述目标图像与各个所述库图像之间的特征相似度,按照特征相似度的数值由大到小的顺序进行排序;
    选取前预设位数的特征相似度对应的库图像,作为与所述目标图像相似的邻居图像。
  5. 根据权利要求3所述的方法,其特征在于,基于所述目标图像的图像特征和所述图像库中的各个所述库图像的图像特征之间的特征相似度,从所述图像库中确定与所述目标图像相似的邻居图像,包括:
    根据所述目标图像的图像特征和各个所述库图像的图像特征之间的特征相似度,由各个所述库图像中获得与所述目标图像相似的第一图像;
    根据所述第一图像的图像特征与各个所述库图像的图像特征之间的特征相似度,由各个所述库图像中获得与所述第一图像相似的第二图像;
    将所述第一图像和所述第二图像,作为所述目标图像的邻居图像。
  6. 根据权利要求1所述的方法,其特征在于,所述特征更新网络的数量为一个,或者依次堆积的N个,其中N是大于1的整数;
    当所述特征更新网络的数量为N个时:其中第i特征更新网络的输入,是第i-1特征更新网络输出的更新后的第一关联图,其中i是大于1且小于或等于N的整数。
  7. 根据权利要求1所述的方法,其特征在于,所述特征更新网络根据所述第一关联图中的邻居节点的节点值更新所述主节点的节点值,得到更新后的目标图像的图像特征,包括:
    确定所述第一关联图中的所述主节点和各所述邻居节点之间的权重;
    根据所述权重将各所述邻居节点的图像特征合并,得到所述主节点的加权特征;
    根据所述主节点的图像特征和所述加权特征,得到所述更新后的目标图像的图像特征。
  8. 根据权利要求7所述的方法,其特征在于,根据所述权重将各所述邻居节点的图像特征合并,得到所述主节点的加权特征,包括:
    根据所述权重,将各所述邻居节点的图像特征进行加权求和,得到所述主节点的加权特征。
  9. 根据权利要求7所述的方法,其特征在于,根据所述主节点的图像特征和所述加权特征,得到所述更新后的目标图像的图像特征,包括:
    将所述主节点的图像特征与所述加权特征拼接;
    对拼接后的特征进行非线性映射,得到所述更新后的目标图像的图像特征。
  10. 根据权利要求7所述的方法，其特征在于，确定所述第一关联图中的所述主节点和所述邻居节点之间的权重，包括：
    对所述主节点和所述邻居节点进行线性映射;
    对线性映射后的所述主节点和所述邻居节点确定内积;
    根据非线性处理后的所述内积,确定所述主节点与所述邻居节点之间的权重。
  11. 根据权利要求1~10任一所述的方法,其特征在于,所述目标图像包括:待检索的查询图像以及图像库中各个库图像;
    在得到所述更新后的目标图像的图像特征之后,所述方法还包括:
    基于所述更新后的目标图像的图像特征和所述各个库图像的图像特征之间的特征相似度,由所述库图像中获得所述目标图像的相似图像作为检索结果。
  12. 一种特征更新网络的训练方法,其特征在于,所述特征更新网络用于更新图像的图像特征;所述方法包括:
    获取第二关联图,所述第二关联图中包括训练主节点以及至少一个训练邻居节点,所述训练主节点的节点值表示样本图像的图像特征,所述训练邻居节点的节点值表示训练邻居图像的图像特征,所述训练邻居图像为与所述样本图像相似的图像;
    将所述第二关联图输入所述特征更新网络,所述特征更新网络根据所述第二关联图中的训练邻居节点的节点值更新所述主节点的节点值,得到更新后的样本图像的图像特征;
    根据所述更新后的样本图像的图像特征,得到所述样本图像的预测信息;
    根据所述预测信息调整所述特征更新网络的网络参数。
  13. 根据权利要求12所述的方法,其特征在于,获取所述第二关联图之前,所述方法还包括:
    根据所述样本图像,由训练图像库中获取与所述样本图像相似的所述训练邻居图像。
  14. 根据权利要求13所述的方法,其特征在于,根据所述样本图像,由所述训练图像库中获取与所述样本图像相似的所述训练邻居图像之前,所述方法还包括:
    通过特征提取网络,提取训练图像的图像特征;
    根据所述训练图像的图像特征,获得所述训练图像的预测信息;
    基于所述训练图像的预测信息和标签信息,调整所述特征提取网络的网络参数;
    根据所述样本图像,由所述训练图像库中获取与所述样本图像相似的所述训练邻居图像,包括:
    通过所述特征提取网络分别获取所述样本图像的图像特征和所述训练图像库中的各个库图像的图像特征;并
    基于所述样本图像的图像特征和各个所述库图像的图像特征之间的特征相似度,确定与所述样本图像相似的所述训练邻居图像。
  15. 一种图像特征的提取装置,包括:
    图获取模块,用于获取第一关联图,所述第一关联图中包括主节点以及至少一个邻居节点,所述主节点的节点值表示目标图像的图像特征,所述邻居节点的节点值表示邻居图像的图像特征,所述邻居图像是与目标图像相似的图像;
    特征更新模块,用于将所述第一关联图输入特征更新网络,所述特征更新网络根据所述第一关联图中的邻居节点的节点值更新所述主节点的节点值,以得到更新后的目标图像的图像特征。
  16. 根据权利要求15所述的装置,还包括:
    邻居获取模块,用于在所述图获取模块获取所述第一关联图之前,根据所述目标图像,由图像库中获取与所述目标图像相似的邻居图像。
  17. 根据权利要求16所述的装置,其特征在于,所述邻居获取模块,用于:
    通过特征提取网络分别获取所述目标图像的图像特征和所述图像库中的各个库图像的图像特征;
    基于所述目标图像的图像特征和所述图像库中的各个所述库图像的图像特征之间的特征相似度，从所述图像库中确定与所述目标图像相似的邻居图像。
  18. 根据权利要求17所述的装置,其特征在于,所述邻居获取模块还用于:
    将所述目标图像与各个所述库图像之间的特征相似度,按照特征相似度的数值由大到小的顺序进行排序;
    选取前预设位数的特征相似度对应的库图像,作为所述目标图像相似的邻居图像。
  19. 根据权利要求17所述的装置,其特征在于,所述邻居获取模块还用于:
    根据所述目标图像的图像特征和各个所述库图像的图像特征之间的特征相似度,由各个所述库图像中获得与所述目标图像相似的第一图像;
    根据所述第一图像的图像特征与各个所述库图像的图像特征之间的特征相似度,由各个所述库图像中获得与所述第一图像相似的第二图像;
    将所述第一图像和所述第二图像,作为所述目标图像的邻居图像。
  20. 根据权利要求15所述的装置,其特征在于,所述特征更新网络的数量为一个,或者依次堆积的N个,其中N是大于1的整数;
    当所述特征更新网络的数量为N个时:其中第i特征更新网络的输入,是第i-1特征更新网络输出的更新后的第一关联图,其中i是大于1且小于或等于N的整数。
  21. 根据权利要求15所述的装置,其特征在于,所述特征更新模块,用于:
    确定所述第一关联图中的所述主节点和各所述邻居节点之间的权重;
    根据所述权重将各所述邻居节点的图像特征合并,得到所述主节点的加权特征;
    根据所述主节点的图像特征和所述加权特征,得到所述更新后的目标图像的图像特征。
  22. 根据权利要求21所述的装置,其特征在于,所述特征更新模块还用于:
    根据所述权重,将各所述邻居节点的图像特征进行加权求和,得到所述主节点的加权特征。
  23. 根据权利要求21所述的装置,其特征在于,所述特征更新模块还用于:
    将所述主节点的图像特征与所述加权特征拼接;
    对拼接后的特征进行非线性映射,得到所述更新后的目标图像的图像特征。
  24. 根据权利要求21所述的装置,其特征在于,所述特征更新模块还用于:
    对所述主节点和所述邻居节点进行线性映射;
    对线性映射后的所述主节点和所述邻居节点确定内积;
    根据非线性处理后的所述内积,确定所述主节点与所述邻居节点之间的权重。
  25. 一种特征更新网络的训练装置,包括:
    关联图获得模块,用于获取第二关联图,所述第二关联图中包括训练主节点以及至少一个训练邻居节点,所述训练主节点的节点值表示样本图像的图像特征,所述训练邻居节点的节点值表示训练邻居图像的图像特征,所述训练邻居图像为与所述样本图像相似的图像;
    更新处理模块,用于将所述第二关联图输入特征更新网络,所述特征更新网络根据所述第二关联图中的训练邻居节点的节点值更新所述主节点的节点值,得到更新后的样本图像的图像特征;
    参数调整模块,用于根据所述更新后的样本图像的图像特征,得到所述样本图像的预测信息;根据所述预测信息调整所述特征更新网络的网络参数。
  26. 根据权利要求25所述的装置,还包括:
    图像获取模块,用于在所述关联图获得模块获取所述第二关联图之前,根据所述样本图像,由训练图像库中获取与所述样本图像相似的所述训练邻居图像。
  27. 根据权利要求25所述的装置,还包括:
    预训练模块,用于:
    通过特征提取网络,提取训练图像的图像特征;
    根据所述训练图像的图像特征,获得所述训练图像的预测信息;
    基于所述训练图像的预测信息和标签信息,调整所述特征提取网络的网络参数;其中,所述训练图像是用于训练所述特征提取网络所使用的图像,所述样本图像是特征提取网络训练完成之后用于训练所述特征更新网络的图像;
    所述图像获取模块,用于:
    通过所述特征提取网络分别获取所述样本图像的图像特征和训练图像库中的各个库图像的图像特征;并
    基于所述样本图像的图像特征和各个所述库图像的图像特征之间的特征相似度,确定与所述样本图像相似的所述训练邻居图像。
  28. 一种电子设备,其特征在于,所述设备包括存储器、处理器,所述存储器用于存储可在处理器上运行的计算机指令,所述处理器用于在执行所述计算机指令时实现权利要求1至11任一所述的方法,或者实现权利要求12至14任一所述的方法。
  29. 一种计算机可读存储介质,其上存储有计算机程序,其特征在于,所述程序被处理器执行时实现权利要求1至11任一所述的方法,或,实现权利要求12至14任一所述的方法。
  30. 一种计算机程序,其特征在于,所述计算机程序用于使处理器执行权利要求1至11任一所述的方法的步骤,或,执行权利要求12至14任一所述的方法的步骤。
PCT/CN2019/120028 2019-08-23 2019-11-21 图像特征提取及网络的训练方法、装置和设备 WO2021036028A1 (zh)

Priority Applications (3)

Application Number Priority Date Filing Date Title
JP2022500674A JP2022539423A (ja) 2019-08-23 2019-11-21 画像特徴抽出及びネットワークの訓練方法、装置並びに機器
KR1020227000630A KR20220017497A (ko) 2019-08-23 2019-11-21 이미지 특징 추출 및 네트워크의 훈련 방법, 장치 및 기기
US17/566,740 US20220122343A1 (en) 2019-08-23 2021-12-31 Image feature extraction and network training method, apparatus, and device

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
CN201910782629.9A CN110502659B (zh) 2019-08-23 2019-08-23 图像特征提取及网络的训练方法、装置和设备
CN201910782629.9 2019-08-23

Related Child Applications (1)

Application Number Title Priority Date Filing Date
US17/566,740 Continuation US20220122343A1 (en) 2019-08-23 2021-12-31 Image feature extraction and network training method, apparatus, and device

Publications (1)

Publication Number Publication Date
WO2021036028A1 true WO2021036028A1 (zh) 2021-03-04

Family

ID=68589288

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/CN2019/120028 WO2021036028A1 (zh) 2019-08-23 2019-11-21 图像特征提取及网络的训练方法、装置和设备

Country Status (6)

Country Link
US (1) US20220122343A1 (zh)
JP (1) JP2022539423A (zh)
KR (1) KR20220017497A (zh)
CN (1) CN110502659B (zh)
TW (1) TWI747114B (zh)
WO (1) WO2021036028A1 (zh)

Cited By (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN115221976A (zh) * 2022-08-18 2022-10-21 抖音视界有限公司 一种基于图神经网络的模型训练方法及装置

Families Citing this family (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
DE102020111456B4 (de) 2020-04-27 2023-11-16 Ebner Industrieofenbau Gmbh Vorrichtung und Verfahren zum Erwärmen mehrerer Tiegel
CN111985616B (zh) * 2020-08-13 2023-08-08 沈阳东软智能医疗科技研究院有限公司 一种图像特征提取方法、图像检索方法、装置及设备
CN113850179A (zh) * 2020-10-27 2021-12-28 深圳市商汤科技有限公司 图像检测方法及相关模型的训练方法、装置、设备、介质

Citations (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20160196479A1 (en) * 2015-01-05 2016-07-07 Superfish Ltd. Image similarity as a function of weighted descriptor similarities derived from neural networks
CN108985190A (zh) * 2018-06-28 2018-12-11 Beijing SenseTime Technology Development Co., Ltd. Target recognition method and apparatus, electronic device, storage medium, and program product
CN109829433A (zh) * 2019-01-31 2019-05-31 Beijing SenseTime Technology Development Co., Ltd. Face image recognition method and apparatus, electronic device, and storage medium
CN109934826A (zh) * 2019-02-28 2019-06-25 Southeast University Image feature segmentation method based on graph convolutional network
CN109934261A (zh) * 2019-01-31 2019-06-25 Sun Yat-sen University Knowledge-driven parameter propagation model and few-shot learning method thereof
CN110111325A (zh) * 2019-05-14 2019-08-09 Shenzhen University Neuroimage classification method, computer terminal, and computer-readable storage medium

Family Cites Families (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN104008165B (zh) * 2014-05-29 2017-05-24 East China Normal University Community detection method based on network topology and node attributes
US20180013658A1 (en) * 2016-07-06 2018-01-11 Agt International Gmbh Method of communicating between nodes in a computerized network and system thereof
CN113536019A (zh) * 2017-09-27 2021-10-22 Shenzhen SenseTime Technology Co., Ltd. Image retrieval method, apparatus, and computer-readable storage medium
CN109657533B (zh) * 2018-10-27 2020-09-25 Shenzhen Harzone Technology Co., Ltd. Pedestrian re-identification method and related products

Non-Patent Citations (2)

* Cited by examiner, † Cited by third party
Title
2016XING: "Graph Neural Networks overview", 14 August 2019 (2019-08-14), pages 1 - 11, XP009526515, Retrieved from the Internet <URL:http://www.360doc.com/content/19/0814/02/37048517_854721596.shtml> *
WEN, WEN; HUANG, JIAMING; CAI, RUICHU; HAO, ZHIFENG; WANG, LIJUAN: "Graph Embedding by Incorporating Prior Knowledge on Vertex Information", JOURNAL OF SOFTWARE, vol. 29, no. 3, 31 March 2018 (2018-03-31), pages 786-798, XP055787278, ISSN: 1000-9825, DOI: 10.13328/j.cnki.jos.005437 *

Cited By (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN115221976A (zh) * 2022-08-18 2022-10-21 Douyin Vision Co., Ltd. Model training method and apparatus based on graph neural network
CN115221976B (zh) * 2022-08-18 2024-05-24 Douyin Vision Co., Ltd. Model training method and apparatus based on graph neural network

Also Published As

Publication number Publication date
JP2022539423A (ja) 2022-09-08
CN110502659A (zh) 2019-11-26
CN110502659B (zh) 2022-07-15
TW202109312A (zh) 2021-03-01
US20220122343A1 (en) 2022-04-21
TWI747114B (zh) 2021-11-21
KR20220017497A (ko) 2022-02-11

Similar Documents

Publication Publication Date Title
WO2021036028A1 (zh) Image feature extraction and network training method, apparatus, and device
US11949964B2 (en) Generating action tags for digital videos
CN111523621B (zh) Image recognition method, apparatus, computer device, and storage medium
CN107480261B (zh) Fast fine-grained face image retrieval method based on deep learning
WO2024021394A1 (zh) Pedestrian re-identification method and apparatus fusing global features and stepped local features
CN110851645B (zh) Image retrieval method based on similarity preservation under deep metric learning
KR102305568B1 (ko) Method for finding k extreme values within a constant processing time
CN108664526B (zh) Retrieval method and device
CN111709311A (zh) Pedestrian re-identification method based on multi-scale convolutional feature fusion
CN112463976B (zh) Knowledge graph construction method centered on crowdsensing tasks
CN113393474B (zh) Classification and segmentation method for three-dimensional point clouds based on feature fusion
JP7430243B2 (ja) Visual positioning method and related apparatus
CN113297369B (zh) Intelligent question-answering system based on knowledge graph subgraph retrieval
CN113033507B (zh) Scene recognition method, apparatus, computer device, and storage medium
CN105183746A (zh) Image retrieval method by mining salient features from multiple related images
Barman et al. A graph-based approach for making consensus-based decisions in image search and person re-identification
Jung et al. Few-shot metric learning: Online adaptation of embedding for retrieval
Negi et al. End-to-end residual learning-based deep neural network model deployment for human activity recognition
CN106557533B (zh) Method and apparatus for joint retrieval of multiple images of a single target
CN114579794A (zh) Multi-scale fusion landmark image retrieval method and system based on feature-consistency proposals
CN113779287A (zh) Cross-domain multi-view target retrieval method and apparatus based on a multi-stage classifier network
Wu et al. Visual loop closure detection by matching binary visual features using locality sensitive hashing
CN110750672A (zh) Image retrieval method based on deep metric learning and structure-distribution learning loss
Barros et al. TReR: A Lightweight Transformer Re-Ranking Approach for 3D LiDAR Place Recognition
Wang et al. A G2G similarity guided pedestrian Re-identification algorithm

Legal Events

Date Code Title Description
121 Ep: the epo has been informed by wipo that ep was designated in this application

Ref document number: 19943375

Country of ref document: EP

Kind code of ref document: A1

ENP Entry into the national phase

Ref document number: 2022500674

Country of ref document: JP

Kind code of ref document: A

ENP Entry into the national phase

Ref document number: 20227000630

Country of ref document: KR

Kind code of ref document: A

NENP Non-entry into the national phase

Ref country code: DE

122 Ep: pct application non-entry in european phase

Ref document number: 19943375

Country of ref document: EP

Kind code of ref document: A1

32PN Ep: public notification in the ep bulletin as address of the addressee cannot be established

Free format text: NOTING OF LOSS OF RIGHTS PURSUANT TO RULE 112(1) EPC (EPO FORM 1205A DATED 09.08.2022)
