CN109960742B - Local information searching method and device - Google Patents

Local information searching method and device

Info

Publication number
CN109960742B
CN109960742B (application CN201910120165.5A)
Authority
CN
China
Prior art keywords
picture
local
local area
information
queried
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN201910120165.5A
Other languages
Chinese (zh)
Other versions
CN109960742A (en)
Inventor
龚迅
肖潇
晋兆龙
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Suzhou Keda Technology Co Ltd
Original Assignee
Suzhou Keda Technology Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Suzhou Keda Technology Co Ltd filed Critical Suzhou Keda Technology Co Ltd
Priority to CN201910120165.5A priority Critical patent/CN109960742B/en
Publication of CN109960742A publication Critical patent/CN109960742A/en
Application granted granted Critical
Publication of CN109960742B publication Critical patent/CN109960742B/en
Active legal-status Critical Current
Anticipated expiration legal-status Critical

Classifications

    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06TIMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T7/00Image analysis
    • G06T7/10Segmentation; Edge detection
    • G06T7/11Region-based segmentation
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06TIMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T7/00Image analysis
    • G06T7/10Segmentation; Edge detection
    • G06T7/194Segmentation; Edge detection involving foreground-background segmentation
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06TIMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T2207/00Indexing scheme for image analysis or image enhancement
    • G06T2207/10Image acquisition modality
    • G06T2207/10004Still image; Photographic image
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06TIMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T2207/00Indexing scheme for image analysis or image enhancement
    • G06T2207/20Special algorithmic details
    • G06T2207/20081Training; Learning
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06TIMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T2207/00Indexing scheme for image analysis or image enhancement
    • G06T2207/20Special algorithmic details
    • G06T2207/20084Artificial neural networks [ANN]

Abstract

The invention relates to the technical field of image processing, and in particular to a method and a device for searching local information. The method comprises the following steps: acquiring a picture to be queried carrying annotation information, and the local information of each picture in a picture library; inputting the picture to be queried to a detection model to obtain the position of each local area in the picture to be queried, the corresponding classification, and the instance segmentation result within each local area; determining the classification of the user's region of interest by using the annotation information and the positions of the local areas in the picture to be queried; extracting, from the picture to be queried and from the picture library respectively, the instance segmentation results in the local area corresponding to the determined classification; and searching the picture library based on the instance segmentation result in the local area extracted from the picture to be queried, so as to obtain the search result of the picture to be queried. The idea of instance segmentation is applied to local information search, so a targeted search can be performed based on the region the user is interested in, and the accuracy of image search is improved.

Description

Local information searching method and device
Technical Field
The invention relates to the technical field of image processing, in particular to a method and a device for searching local information.
Background
Searching images by image is an important topic in the field of computer vision; its main task is to find, in an image library, images similar to a given query image. It involves computer vision, image processing, pattern recognition, information retrieval and other technical fields, and mature applications in the surveillance field, such as face retrieval, web image retrieval, and license plate and vehicle retrieval, have required a large investment of manpower.
In recent years, image search has been widely applied in fields such as intelligent video surveillance, autonomous driving and robot environment perception. For example, in a public security big-data system, a search-by-image is performed: a vehicle image provided by a user is used as the image of a target vehicle, and the driving records of the target vehicle are searched for in massive checkpoint or electronic-police data records.
From the application perspective, most existing image retrieval applications perform the search based on global features, that is, they search a massive image database for images similar to the target image. However, an image retrieval method based on global features ignores local detail features in favor of an overall understanding of the image, so the accuracy of the retrieved images is low, and it is difficult to provide fast, targeted clues for security monitoring.
Disclosure of Invention
In view of this, embodiments of the present invention provide a method and an apparatus for searching local information, so as to solve the problem of low search accuracy caused by global search.
According to a first aspect, an embodiment of the present invention provides a method for searching local information, including:
acquiring a picture to be inquired with labeling information and each local information of each picture in a picture library, wherein the local information comprises the position of a local area, corresponding classification and an example segmentation result in the local area; the annotation information is used for representing the position information of the region of interest of the user;
inputting the picture to be queried to the detection model to obtain the position of each local area in the picture to be queried, the corresponding classification and an example segmentation result in each local area;
determining the classification of the user interested region by using the labeling information and the position of each local region in the picture to be inquired;
respectively extracting example segmentation results in a local area corresponding to the determined classification from the picture to be queried and each picture in the picture library;
and searching the segmented results of the instances in the local area extracted from each picture in the picture library based on the segmented results of the instances in the local area extracted from the picture to be queried so as to obtain the search result of the picture to be queried.
According to the local information searching method provided by the embodiment of the invention, the concept of instance segmentation is applied to local information search, and the obtained local instances of the picture to be queried are used to search among the local instances of each picture in the picture library, so that a targeted search can be performed based on the region the user is interested in and the accuracy of image search is improved.
With reference to the first aspect, in a first implementation manner of the first aspect, the searching, based on the local-area-internal-instance segmentation result extracted from the picture to be queried, in the local-area-internal-instance segmentation result extracted from each picture in the picture library, includes:
respectively extracting example foreground characteristics from the image to be inquired and the extracted example segmentation result in the local area from each image in the image library so as to construct a first local area characteristic vector and a plurality of second local area characteristic vectors; the first local area feature vector corresponds to the picture to be queried, and the second local area feature vector corresponds to each picture in the picture library;
calculating a similarity based on the first local region feature vector and each of the second local region feature vectors;
and extracting the corresponding picture from the picture library based on the calculation result of the similarity.
According to the local information searching method provided by the embodiment of the invention, the example foreground characteristics are extracted from the example segmentation result in the local area, so that the influence of the background on the local search is reduced, and the searching accuracy is improved.
With reference to the first embodiment of the first aspect, in the second embodiment of the first aspect, the extracting of instance foreground features from the segmented results of the local areas extracted from the picture to be queried and each picture in the picture library to construct a first local area feature vector and a plurality of second local area feature vectors includes:
setting to zero the pixel values of the regions, in the first instance segmentation result and the second instance segmentation result, whose pixels are segmented as background; the first instance segmentation result is the local-area instance segmentation result extracted from the picture to be queried, and the second instance segmentation result is the local-area instance segmentation result extracted from each picture in the picture library;
pooling the first example segmentation result and the second example segmentation result after zero setting to obtain the first local area feature vector and the second local area feature vector with the same size.
According to the local information searching method provided by the embodiment of the invention, local region feature vectors of the same size are obtained through pooling, that is, floating-point image pixel values are obtained through pooling, which provides the conditions for the subsequent local information search.
With reference to the first embodiment of the first aspect, in the third embodiment of the first aspect, the similarity is calculated by using the following formula:
Similarity = (Feat1 · Feat2) / (||Feat1|| × ||Feat2||)
here, Feat1 is the first local region feature vector, and Feat2 is the second local region feature vector.
With reference to the first aspect, in a fourth implementation manner of the first aspect, the determining, by using the labeling information and the positions of the local regions in the picture to be queried, the classification of the region of interest of the user includes:
calculating the intersection ratio of the user interested region and each local region in the picture to be inquired;
determining the position of a local area corresponding to the user interested area based on the intersection ratio of the user interested area and each local area in the picture to be inquired;
and extracting the classification of the position corresponding to the determined local area to obtain the classification of the user interested area.
According to the local information searching method provided by the embodiment of the invention, the position of the local area corresponding to the interested area is determined through the intersection ratio of the interested area of the user and each local area in the picture to be inquired, and the intersection ratio can reflect the area occupation ratio of the local area in the interested area of the user to a great extent, so that the accuracy of determining the position of the local area corresponding to the interested area of the user is improved by utilizing the intersection ratio.
With reference to the first aspect, in a fifth implementation manner of the first aspect, the training process of the detection model includes the following steps:
initializing parameters of the neural network;
inputting the sample image into the neural network, and outputting the position of a local region and the corresponding classification through forward propagation;
performing instance segmentation on the local area by using a mask branch;
and comparing the position of the local area, the corresponding classification and the example segmentation result with the labeled value of the sample image to optimize the parameters of the neural network.
The method for searching the local information provided by the embodiment of the invention reduces the interference of background noise by using the mask branch, and provides a favorable clue for searching the local area.
With reference to the fifth implementation manner of the first aspect, in the sixth implementation manner of the first aspect, the loss function of the neural network is a sum of the loss function of the local region position, the loss function of the corresponding classification, and the loss function of the example segmentation.
According to a second aspect, an embodiment of the present invention provides an apparatus for searching local information, including:
the first acquisition module is used for acquiring the picture to be inquired with the labeling information and each piece of local information of each picture in the picture library, wherein the local information comprises the position of a local area, corresponding classification and an example segmentation result in the local area; the annotation information is used for representing the position information of the region of interest of the user;
the input module is used for inputting the picture to be inquired into the detection model to obtain the position of each local area in the picture to be inquired, the corresponding classification and the example segmentation result in each local area;
the determining module is used for determining the classification of the user interested region by utilizing the marking information and the position of each local region in the picture to be inquired;
the extraction module is used for respectively extracting example segmentation results in a local area corresponding to the determined classification from the picture to be inquired and each picture in the picture library;
and the searching module is used for searching the segmented results of the instances in the local area extracted from each picture in the picture library based on the segmented results of the instances in the local area extracted from the picture to be inquired so as to obtain the searching result of the picture to be inquired.
The local information searching device provided by the embodiment of the invention applies the concept of instance segmentation to local information search, and uses the obtained local instances of the picture to be queried to search among the local instances of each picture in the picture library, so that a targeted search can be performed based on the region the user is interested in and the accuracy of image search is improved.
According to a third aspect, an embodiment of the present invention provides an electronic device, including: a memory and a processor, the memory and the processor being communicatively connected to each other, the memory storing therein computer instructions, and the processor executing the computer instructions to perform the local information searching method according to the first aspect or any one of the embodiments of the first aspect.
According to a fourth aspect, an embodiment of the present invention provides a computer-readable storage medium, which stores computer instructions for causing a computer to execute the method for searching for local information described in the first aspect or any one of the implementation manners of the first aspect.
Drawings
In order to more clearly illustrate the embodiments of the present invention or the technical solutions in the prior art, the drawings used in the description of the embodiments or the prior art will be briefly described below, and it is obvious that the drawings in the following description are some embodiments of the present invention, and other drawings can be obtained by those skilled in the art without creative efforts.
Fig. 1 is a flowchart of a local information search method according to an embodiment of the present invention;
fig. 2 is a flowchart of a local information search method according to an embodiment of the present invention;
fig. 3 is a flowchart of a local information search method according to an embodiment of the present invention;
FIG. 4 is a flow chart of a training method of a detection model according to an embodiment of the invention;
FIG. 5 is a block diagram of a training method of a detection model according to an embodiment of the invention;
fig. 6 is a block diagram of a local information search method according to an embodiment of the present invention;
FIG. 7 is a flowchart of a method of a training phase and a search phase according to an embodiment of the invention;
FIG. 8 is a diagram illustrating search results for local information, according to an embodiment of the present invention;
fig. 9 is a block diagram of a structure of a local information search apparatus according to an embodiment of the present invention;
fig. 10 is a schematic diagram of a hardware structure of an electronic device according to an embodiment of the present invention.
Detailed Description
In order to make the objects, technical solutions and advantages of the embodiments of the present invention clearer, the technical solutions in the embodiments of the present invention will be clearly and completely described below with reference to the drawings in the embodiments of the present invention, and it is obvious that the described embodiments are some, but not all, embodiments of the present invention. All other embodiments, which can be derived by a person skilled in the art from the embodiments given herein without making any creative effort, shall fall within the protection scope of the present invention.
In accordance with an embodiment of the present invention, there is provided an embodiment of a method for searching for local information, it is noted that the steps illustrated in the flowchart of the drawings may be performed in a computer system such as a set of computer executable instructions, and that while a logical order is illustrated in the flowchart, in some cases the steps illustrated or described may be performed in an order different than that presented herein.
In this embodiment, a method for searching local information is provided, which can be used in the above-mentioned electronic devices, such as a mobile phone, a tablet computer, and the like, and fig. 1 is a flowchart of a method for searching local information according to an embodiment of the present invention, as shown in fig. 1, the flowchart includes the following steps:
s11, obtaining the picture to be inquired with the label information and the local information of each picture in the picture library.
The annotation information is used to represent the position information of the user's region of interest. For example, the picture to be queried is a picture of a person, and the user's region of interest is a shoe with a Nike logo. Then, before the local information search, the user needs to mark the Nike-logo shoe on the person picture as the annotation information of that picture. Specifically, during annotation, an image feature labeling tool (e.g., LabelImg) can be used to manually annotate the picture to be queried; the annotation information is (Xp, Yp, Lp, Wp), where (Xp, Yp) is the coordinate of the top-left corner of the feature and (Lp, Wp) is the length and width, in pixels, occupied by the feature. Other manual labeling approaches may also be used.
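A minimal sketch of this annotation format, assuming the LabelImg-style tuple (Xp, Yp, Lp, Wp) described above (top-left corner plus the length and width in pixels); the helper name and the example pixel values are illustrative, not taken from the patent.

```python
def annotation_to_corners(xp, yp, lp, wp):
    """Convert (top-left x, top-left y, length, width) to (x1, y1, x2, y2) corners."""
    return xp, yp, xp + lp, yp + wp

# e.g. a user-marked shoe region (hypothetical pixel values)
roi_box = annotation_to_corners(120, 340, 60, 45)
```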
Further, the picture library comprises a plurality of pictures, the local information of each picture in the picture library acquired by the electronic device is obtained by inputting each picture in the picture library to a detection model, and the detection model is obtained by training a plurality of sample images by utilizing a neural network.
Specifically, the input of the detection model is a picture, and the output is the position of each local area in the picture, the corresponding classification, and the instance segmentation result within each local area. The corresponding classifications are preset when the detection model is trained, and the number of preset classes can be set to K. For example, when a picture of a person is input, the output local areas include the limbs, head, upper body, lower body, and so on. At output time, the detection model marks each local area with a regression box (rectangular box), indicates the classification of each local area, and outputs the instance segmentation result within each local area (i.e., if several persons appear in one picture, each person is segmented individually).
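The patent does not prescribe a concrete data layout for this output; the following is a minimal sketch, assuming a detector that returns boxes, class labels and per-region masks, of one way the per-picture local information could be bundled. All field names are illustrative assumptions.

```python
from dataclasses import dataclass
import numpy as np

@dataclass
class LocalRegion:
    box: np.ndarray        # (x1, y1, x2, y2) position of the local area
    label: int             # index into the K preset classes (e.g. head, upper body, ...)
    mask: np.ndarray       # H x W binary instance-segmentation mask for the region

def parse_detection_output(boxes, labels, masks):
    """Bundle raw detector outputs into one record per local area."""
    return [LocalRegion(box=b, label=int(l), mask=m)
            for b, l, m in zip(boxes, labels, masks)]
```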
The local information of each picture in the picture library may be obtained by inputting a detection model in advance, or may be obtained by inputting a detection model when the local information needs to be searched.
And S12, inputting the picture to be inquired into the detection model, and obtaining the position of each local area in the picture to be inquired, the corresponding classification and the example segmentation result in each local area.
After the electronic equipment acquires the picture to be queried with the labeling information, the picture to be queried is input to the detection model, and then the local information of the picture to be queried can be obtained, wherein the local information comprises the position of each local area, the corresponding classification and the example segmentation result in each local area.
And S13, determining the classification of the user interested region by using the marking information and the position of each local region in the picture to be inquired.
Because the annotation information input to the detection model is used for representing the position information which is interesting to the user, and meanwhile, the position of each local region can be obtained after the picture to be queried passes through the detection model, the local region corresponding to the annotation information can be determined by comparing the position information represented by the annotation information with the position of each local region, so that the classification of the local region corresponding to the annotation information (namely, the classification of the region of interest of the user) is determined.
The local area corresponding to the annotation information can be determined by comparing the center point of the position indicated by the annotation information with the center point of the position of each local area; alternatively, the local area corresponding to the annotation information may be determined by computing, in turn, the intersection ratio between the position indicated by the annotation information and the position of each local area.
After the electronic device determines the position of the local area corresponding to the user interest area in the picture to be queried, the determined position of the local area is utilized to determine the classification of the local area because the position of the local area corresponds to the classification of the local area. Therefore, the electronic device can determine the classification corresponding to the user interest region in the picture to be queried.
S14, extracting the example segmentation result in the local area corresponding to the determined classification from the picture to be queried and each picture in the picture library.
After determining the classification of the user interest region in the picture to be queried, the electronic device can extract example segmentation results in the local region corresponding to the classification from example segmentation results in each local region in the picture to be queried; in the same way, the electronic device may extract, from each picture of the picture library, an instance segmentation result in the local area corresponding to the classification.
And S15, searching the segmented results of the instances in the local area extracted from each picture in the picture library based on the segmented results of the instances in the local area extracted from the picture to be queried to obtain the search result of the picture to be queried.
On the basis of the local in-region example segmentation result corresponding to the user interest region obtained in S14, the electronic device searches the local in-region example segmentation result extracted from each picture in the picture library to obtain a search result of the picture to be queried. The method comprises the steps of representing a local area internal example segmentation result of a user interested area and a local area internal example segmentation result extracted from each picture in a picture library in a characteristic vector mode, and extracting a search result of a picture to be inquired from the picture library by utilizing the similarity through calculating the similarity between characteristic vectors; or carrying out foreground feature extraction on the local area internal example segmentation result of the user interested area and the segmented local area internal example segmentation result extracted from each picture in the picture library, and then adopting feature vector representation to calculate the similarity; or the search of the picture to be queried can be carried out in other ways.
The method for searching local information provided by this embodiment applies the concept of instance segmentation to local information search, and uses the obtained local instances of the picture to be queried to search among the local instances of each picture in the picture library, so that a targeted search can be performed based on the region the user is interested in and the accuracy of image search is improved.
This embodiment also provides a local information searching method, which can be used in the above electronic devices, such as a mobile phone or a tablet computer. Fig. 2 is a flowchart of a local information searching method according to an embodiment of the present invention; as shown in fig. 2, the flow includes the following steps:
s21, obtaining the picture to be inquired with the label information and the local information of each picture in the picture library.
The local information comprises the position of each local area, corresponding classification and an example segmentation result in the local area; the annotation information is used for representing the position information of the region of interest of the user; the local information is obtained by inputting each picture in the picture library to a detection model; the detection model is obtained by training a plurality of sample images by utilizing a neural network.
Please refer to S11 in fig. 1, which is not described herein again.
And S22, inputting the picture to be inquired into the detection model, and obtaining the position of each local area in the picture to be inquired, the corresponding classification and the example segmentation result in each local area.
Please refer to S12 in fig. 1, which is not described herein again.
And S23, determining the classification of the user interested region by using the marking information and the position of each local region in the picture to be inquired.
Please refer to S13 in fig. 1, which is not described herein again.
S24, extracting the example segmentation result in the local area corresponding to the determined classification from the picture to be queried and each picture in the picture library.
Please refer to S14 in fig. 1, which is not described herein again.
And S25, searching the segmented results of the instances in the local area extracted from each picture in the picture library based on the segmented results of the instances in the local area extracted from the picture to be queried to obtain the search result of the picture to be queried.
The electronic equipment searches the picture to be queried by adopting a mode of calculating the similarity of the feature vectors. Specifically, the method comprises the following steps:
and S251, extracting example foreground characteristics from the segmented results of the local area examples extracted from the picture to be inquired and each picture in the picture library respectively to construct a first local area characteristic vector and a plurality of second local area characteristic vectors.
The first local area feature vector corresponds to a picture to be inquired, and the second local area feature vector corresponds to each picture in the picture library.
After the picture is input into the detection model, the position of each local area of the picture, the corresponding classification and the example segmentation result in each local area are output, and the global feature corresponding to the picture can be obtained. The electronic equipment can determine which pixel points belong to the background and which pixel points belong to the foreground by using the global characteristics of the picture.
Therefore, when the electronic device extracts the instance foreground features from the to-be-queried picture and the segmented result of the instances in the local region extracted from each picture in the picture library, the method may include the following steps:
(1) Setting to zero the pixel values of the regions, in the first instance segmentation result and the second instance segmentation result, whose pixels are segmented as background. The first instance segmentation result is the local-area instance segmentation result extracted from the picture to be queried, and the second instance segmentation result is the local-area instance segmentation result extracted from each picture in the picture library.
(2) And pooling the zero-set first example segmentation result and the second example segmentation result to obtain a first local area feature vector and a second local area feature vector which have the same size.
Bilinear interpolation may be performed via region pooling (ROI Align Pooling), so as to obtain a first local region feature vector and second local region feature vectors of the same size. Specifically, the first local region feature vector corresponds to the picture to be queried; the number of second local region feature vectors is the same as the number of pictures in the picture library, that is, each picture in the picture library corresponds to one second local region feature vector.
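A hedged sketch of steps (1) and (2) above, assuming the whole-picture feature map, the binary instance mask and the local-area box are available as tensors; torchvision.ops.roi_align is used here as a stand-in for the ROI Align Pooling described in the text, and the 7 × 7 output size is an assumption.

```python
import torch
from torchvision.ops import roi_align

def local_region_feature(feature_map, mask, box, output_size=(7, 7)):
    """Zero the background, then pool the local area to a fixed-size vector."""
    # feature_map: 1 x C x H x W tensor for the whole picture
    # mask:        H x W binary instance mask (1 = foreground, 0 = background)
    # box:         (x1, y1, x2, y2) of the local area, in feature-map coordinates
    masked = feature_map * mask[None, None, :, :].float()     # step (1): background set to zero
    rois = torch.tensor([[0.0, *box]], dtype=torch.float32)   # batch index 0 + box
    pooled = roi_align(masked, rois, output_size)             # step (2): 1 x C x 7 x 7
    return pooled.flatten()                                   # fixed-length feature vector
```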
S252, a similarity is calculated based on the first local region feature vector and each of the second local region feature vectors.
Wherein, the distance between two feature vectors can be calculated, and the distance is used for representing the similarity; the similarity may also be calculated by the following formula:
Similarity = (Feat1 · Feat2) / (||Feat1|| × ||Feat2||)
where Similarity is the similarity, Feat1 is the first local region feature vector, and Feat2 is the second local region feature vector.
When the similarity is calculated by using the above formula, the similarity between the first local region feature vector and each second local region feature vector needs to be calculated in sequence to obtain the similarity with the same number of pictures in the picture library.
And S253, extracting a corresponding picture from the picture library based on the calculation result of the similarity.
After obtaining a plurality of similarities, the electronic device may rank the similarities, and select a plurality of pictures with the highest similarities as search results of the pictures to be queried.
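As a sketch of S252 and S253, assuming the pooled features are plain NumPy vectors, the cosine similarity above and the final top-k ranking could look as follows; the small epsilon guarding against division by zero is an addition, not part of the patent.

```python
import numpy as np

def cosine_similarity(feat1, feat2):
    """Cosine similarity between two local-region feature vectors."""
    return float(np.dot(feat1, feat2) /
                 (np.linalg.norm(feat1) * np.linalg.norm(feat2) + 1e-12))

def search_top_k(query_feat, gallery_feats, k=10):
    """Rank gallery pictures by similarity to the query's local-region feature."""
    sims = [cosine_similarity(query_feat, g) for g in gallery_feats]
    order = np.argsort(sims)[::-1][:k]              # highest similarity first
    return [(int(i), sims[int(i)]) for i in order]  # (gallery index, similarity)
```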
Compared with the embodiment shown in fig. 1, the method for searching local information provided by the embodiment performs example foreground feature extraction on the example segmentation result in the local area, so that the influence of the background on local search is reduced, and the search accuracy is improved.
This embodiment also provides a local information searching method, which can be used in the above electronic devices, such as a mobile phone or a tablet computer. Fig. 3 is a flowchart of a local information searching method according to an embodiment of the present invention; as shown in fig. 3, the flow includes the following steps:
s31, obtaining the picture to be inquired with the label information and the local information of each picture in the picture library.
The local information comprises the position of each local area, corresponding classification and an example segmentation result in the local area; the annotation information is used for representing the position information of the region of interest of the user; the local information is obtained by inputting each picture in the picture library to a detection model; the detection model is obtained by training a plurality of sample images by utilizing a neural network.
Please refer to S21 in fig. 2 for details, which are not described herein.
And S32, inputting the picture to be inquired into the detection model, and obtaining the position of each local area in the picture to be inquired, the corresponding classification and the example segmentation result in each local area.
Please refer to S22 in fig. 2 for details, which are not described herein.
And S33, determining the classification of the user interested region by using the marking information and the position of each local region in the picture to be inquired.
The electronic device determines the classification of the user's region of interest by means of the intersection ratio (intersection over union). Specifically, the method comprises the following steps:
and S331, calculating the intersection ratio of the region of interest of the user and the position of each local region in the picture to be inquired.
Since the annotation information is (Xp, Yp, Lp, Wp), it can be used to represent the user's region of interest; the position information of each local region is output by the detection model and can represent each local region. The electronic device calculates the Intersection over Union (IoU) between the user's region of interest and the position of each local region in the picture to be queried, so as to subsequently determine the position of the local region corresponding to the user's region of interest.
S332, determining the position of the local area corresponding to the user interested area based on the intersection ratio of the user interested area and each local area in the picture to be inquired.
The intersection ratio for each local region is calculated in turn, and the magnitudes of the calculated values are used to determine the position of the local region corresponding to the user's region of interest (a small sketch of this calculation is given after S333 below).
Or, optionally, when determining the position of the local area corresponding to the user interest area, the center point of the user interest area and the center point of the position of each local area may also be combined. For example, a part of the local area may be excluded by using the central point, and then the intersection ratio may be used to determine the position of the local area corresponding to the user interested area.
S333, extracting the classification of the position corresponding to the determined local area to obtain the classification of the user interested area.
After the electronic device determines the position of the local area corresponding to the user interest area, the classification of the position of the local area can be determined by using the corresponding relation between the position of the local area and the corresponding classification, so that the classification of the user interest area can be obtained.
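The sketch below, assuming every region is an axis-aligned box given as (x1, y1, x2, y2), shows one straightforward reading of S331-S333: compute the intersection ratio of the user's region of interest against every detected local area and take the classification of the best-overlapping one. The max-IoU choice is an assumption; the text only says the magnitudes of the ratios are used.

```python
def iou(box_a, box_b):
    """Intersection over union of two (x1, y1, x2, y2) boxes."""
    ax1, ay1, ax2, ay2 = box_a
    bx1, by1, bx2, by2 = box_b
    ix1, iy1 = max(ax1, bx1), max(ay1, by1)
    ix2, iy2 = min(ax2, bx2), min(ay2, by2)
    inter = max(0, ix2 - ix1) * max(0, iy2 - iy1)
    union = (ax2 - ax1) * (ay2 - ay1) + (bx2 - bx1) * (by2 - by1) - inter
    return inter / union if union > 0 else 0.0

def classify_roi(roi_box, local_regions):
    """local_regions: list of (box, class_label) pairs from the detection model."""
    best_box, best_label = max(local_regions, key=lambda r: iou(roi_box, r[0]))
    return best_label   # classification of the user's region of interest
```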
S34, extracting the example segmentation result in the local area corresponding to the determined classification from the picture to be queried and each picture in the picture library.
Please refer to S24 in fig. 2 for details, which are not described herein.
And S35, searching the segmented results of the instances in the local area extracted from each picture in the picture library based on the segmented results of the instances in the local area extracted from the picture to be queried to obtain the search result of the picture to be queried.
Please refer to S25 in fig. 2 for details, which are not described herein.
Compared with the embodiment shown in fig. 2, in the local information searching method provided by the embodiment, the position of the local area corresponding to the region of interest is determined by the intersection ratio of the region of interest of the user and each local area in the picture to be queried, and since the intersection ratio can greatly reflect the area ratio of the local area in the region of interest of the user, the accuracy of determining the position of the local area corresponding to the region of interest of the user is improved by using the intersection ratio.
As an optional implementation manner of this embodiment, as shown in fig. 4, the training process of the detection model includes the following steps:
and S41, initializing parameters of the neural network.
The constructed neural network can be Mask R-CNN; each parameter in the network is initialized with an initial value, and the number of preset classes of the network is set to K. The constructed neural network is shown in fig. 5.
S42, the sample image is input to a neural network, and the position of the local region and the corresponding classification are output by forward propagation.
A data set is formed from a plurality of sample images, divided into a training set, a validation set and a test set, and annotated. Data augmentation is performed, including horizontal flipping, rotation, noise addition, translation, and brightness/contrast adjustment; each input image is scaled to 224 × 224 pixels, and a uniform mean-subtraction operation is applied to the images.
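A minimal sketch of this preprocessing, assuming torchvision transforms; the jitter strengths, flip probability and mean values are assumptions, and only a subset of the augmentations listed above is shown.

```python
import torchvision.transforms as T

train_transform = T.Compose([
    T.RandomHorizontalFlip(p=0.5),                     # horizontal flipping
    T.ColorJitter(brightness=0.2, contrast=0.2),       # brightness/contrast adjustment
    T.Resize((224, 224)),                              # scale to 224 x 224 pixels
    T.ToTensor(),
    T.Normalize(mean=[0.485, 0.456, 0.406],            # uniform mean subtraction
                std=[1.0, 1.0, 1.0]),
])
```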
Specifically, first, the global depth features of the input image are extracted using ResNet-50 as the basic Convolutional Neural Network (CNN). Second, as in Faster R-CNN, an RPN structure is used to generate anchors and produce region proposals for the target. Third, combining the global depth features, an ROI Align Pooling operation is performed to generate a pooled feature map of the same size for each proposed region. Finally, based on the fixed-size features obtained by ROI Align Pooling, the position and category of the specific local-area classification are generated through the designed Fully Convolutional Network (FCN).
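The patent's exact network is not reproduced here; as a hedged stand-in, torchvision's Mask R-CNN with a ResNet-50 backbone follows the same pipeline (CNN backbone, RPN proposals, ROI Align pooling, box/class heads and a mask branch). The value of K and the use of this particular constructor are assumptions.

```python
import torchvision

K = 4  # e.g. limbs, head, upper body, lower body (illustrative preset classes)

# num_classes counts the K local-area classes plus background.
model = torchvision.models.detection.maskrcnn_resnet50_fpn(num_classes=K + 1)

# In training mode the model returns a dictionary of losses; its classification,
# box-regression and mask terms correspond to Lcls, Lbox and Lmask summed in S44
# below (the torchvision implementation also adds RPN losses).
```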
S43, the local area is instance-divided using the mask branch.
The detection model generates a pixel segmentation result based on a Mask branch (Mask branch), namely an example segmentation result corresponding to a local area.
And S44, comparing the position of the local area, the corresponding classification and the example segmentation result with the labeled value of the sample image, and optimizing the parameters of the neural network.
A back-propagation algorithm is used to train the deep learning network based on the comparison of the local region position, the corresponding classification and the instance segmentation result with the ground-truth annotated values. The network has three loss functions: a local region classification loss function Lcls, a local region location loss function Lbox, and an instance segmentation loss function Lmask. The total loss function is the sum of the three, expressed as:
L = Lcls + Lbox + Lmask
the method for searching the local information provided by the embodiment of the invention reduces the interference of background noise by using the mask branch, and provides a favorable clue for searching the local area.
As a specific implementation manner of this embodiment, for example, as shown in fig. 7, a left side in fig. 7 is a flowchart of a detection model training stage, and a right side in fig. 7 is a flowchart of a search stage, where a search method of local information is specifically described as follows:
step (1), local area information output: and outputting the pre-classification of each local area, the corresponding position of each local area and the example segmentation result in the local area based on the trained detection network.
Step (2), extraction of the foreground features of the class to be searched: for the picture to be queried, determine which preset classification it corresponds to according to the region of interest specified by the user and the positions of the bounding boxes of all preset classes (i.e., the positions of all local regions).
And (3) extracting the characteristics of the example foreground of the image to be inquired and the picture library image corresponding to the classification.
Specifically, within the local region corresponding to the user's region of interest, the pixels segmented as background are all set to zero, and the features after background zeroing are taken as the local region feature vector of the picture to be queried, as shown in the search part of the schematic diagram in fig. 6.
Step (4), similarity calculation: based on the global features and the related information of the local regions, the local features are cropped out. Local feature extraction uses bilinear interpolation to obtain floating-point pixel values, referred to as ROI Align Pooling. The cosine similarity between the query image and each picture in the picture library is then calculated in turn:
Similarity = (Feat1 · Feat2) / (||Feat1|| × ||Feat2||)
and (5) outputting a local information detection search result: and according to the similarity of the extracted features of the query image, arranging the pictures in the picture library in a descending order as output.
An example of the effect of the method for searching local information of persons according to the present invention is shown in fig. 8. As can be seen from the figure, for each query picture on the left (the local information corresponding to the first row is the tail of an electric bicycle, that of the second row is a pedestrian's jacket, and that of the third row is a pedestrian's lower body), the 10 most similar pictures from the picture library are given on the right, achieving the expected effect.
In this embodiment, a local information search apparatus is further provided, and the apparatus is used to implement the foregoing embodiments and preferred embodiments, and details are not repeated after the description is given. As used below, the term "module" may be a combination of software and/or hardware that implements a predetermined function. Although the means described in the embodiments below are preferably implemented in software, an implementation in hardware, or a combination of software and hardware is also possible and contemplated.
The present embodiment provides a local information search apparatus, as shown in fig. 9, including:
the first obtaining module 61 is configured to obtain a picture to be queried with label information and local information of each picture in the picture library, where the local information includes a position of a local area, a corresponding classification, and an instance segmentation result in the local area; the annotation information is used for representing the position information of the region of interest of the user; the local information is obtained by inputting each picture in the picture library to a detection model; the detection model is obtained by training a plurality of sample images by utilizing a neural network.
And the input module 62 is configured to input the picture to be queried to the detection model, so as to obtain a position of each local region in the picture to be queried, a corresponding classification, and an example segmentation result in each local region.
And the determining module 63 is configured to determine the classification of the user region of interest by using the labeling information and the positions of the local regions in the picture to be queried.
An extracting module 64, configured to extract, from the picture to be queried and each picture in the picture library, an instance segmentation result in the local area corresponding to the determined classification.
And the searching module 65 is configured to search the segmented results of the instances in the local area extracted from each picture in the picture library based on the segmented results of the instances in the local area extracted from the picture to be queried, so as to obtain a search result of the picture to be queried.
The local information search device provided by this embodiment applies the concept of instance segmentation to local information search, and uses the obtained local instances of the picture to be queried to search among the local instances of each picture in the picture library, so that a targeted search can be performed based on the region the user is interested in and the accuracy of image search is improved.
The local information search means in this embodiment is presented in the form of a functional unit, where the unit refers to an ASIC circuit, a processor and a memory executing one or more software or fixed programs, and/or other devices that can provide the above-described functionality.
Further functional descriptions of the modules are the same as those of the corresponding embodiments, and are not repeated herein.
An embodiment of the present invention further provides an electronic device, which has the local information search apparatus shown in fig. 9.
Referring to fig. 10, fig. 10 is a schematic structural diagram of an electronic device according to an alternative embodiment of the present invention, as shown in fig. 10, the electronic device may include: at least one processor 71, such as a CPU (Central Processing Unit), at least one communication interface 73, memory 74, at least one communication bus 72. Wherein a communication bus 72 is used to enable the connection communication between these components. The communication interface 73 may include a Display (Display) and a Keyboard (Keyboard), and the optional communication interface 73 may also include a standard wired interface and a standard wireless interface. The Memory 74 may be a high-speed RAM Memory (volatile Random Access Memory) or a non-volatile Memory (non-volatile Memory), such as at least one disk Memory. The memory 74 may alternatively be at least one memory device located remotely from the processor 71. Wherein the processor 71 may be in connection with the apparatus described in fig. 9, an application program is stored in the memory 74, and the processor 71 calls the program code stored in the memory 74 for performing any of the above-mentioned method steps.
The communication bus 72 may be a Peripheral Component Interconnect (PCI) bus or an Extended Industry Standard Architecture (EISA) bus. The communication bus 72 may be divided into an address bus, a data bus, a control bus, and the like. For ease of illustration, only one thick line is shown in FIG. 10, but this is not intended to represent only one bus or type of bus.
The memory 74 may include a volatile memory, such as a random-access memory (RAM); the memory may also include a non-volatile memory, such as a flash memory, a hard disk drive (HDD) or a solid-state drive (SSD); the memory 74 may also comprise a combination of the above types of memory.
The processor 71 may be a Central Processing Unit (CPU), a Network Processor (NP), or a combination of CPU and NP.
The processor 71 may further include a hardware chip. The hardware chip may be an application-specific integrated circuit (ASIC), a Programmable Logic Device (PLD), or a combination thereof. The PLD may be a Complex Programmable Logic Device (CPLD), a field-programmable gate array (FPGA), a General Array Logic (GAL), or any combination thereof.
Optionally, the memory 74 is also used for storing program instructions. Processor 71 may invoke program instructions to implement a method of searching for local information as shown in the embodiments of fig. 1-4 of the present application.
The embodiment of the invention also provides a non-transitory computer storage medium, wherein the computer storage medium stores computer-executable instructions, and the computer-executable instructions can execute the local information searching method in any of the method embodiments. The storage medium may be a magnetic disk, an optical disc, a read-only memory (ROM), a random access memory (RAM), a flash memory, a hard disk drive (HDD), a solid-state drive (SSD), or the like; the storage medium may also comprise a combination of the above types of memory.
Although the embodiments of the present invention have been described in conjunction with the accompanying drawings, those skilled in the art may make various modifications and variations without departing from the spirit and scope of the invention, and such modifications and variations fall within the scope defined by the appended claims.

Claims (10)

1. A method for searching local information, comprising:
acquiring a picture to be inquired with labeling information and local information of each picture in a picture library, wherein the local information comprises the position of each local area, corresponding classification and an example segmentation result in the local area; the annotation information is used for representing the position information of the region of interest of the user;
inputting the picture to be queried to a detection model obtained by pre-training to obtain the position, the corresponding classification and the example segmentation result in each local area in the picture to be queried, wherein when the picture to be queried comprises at least one person, each person is individually segmented, and each local area comprises four limbs, a head, an upper half body or a lower half body;
determining the classification of the user interested region by using the labeling information and the position of each local region in the picture to be inquired;
respectively extracting example segmentation results in a local area corresponding to the determined classification from the picture to be queried and each picture in the picture library;
and searching the segmented results of the instances in the local area extracted from each picture in the picture library based on the segmented results of the instances in the local area extracted from the picture to be queried so as to obtain the search result of the picture to be queried.
2. The method as claimed in claim 1, wherein the searching in the local-area-inside-instance segmentation result extracted from each picture in the picture library based on the local-area-inside-instance segmentation result extracted from the picture to be queried comprises:
respectively extracting example foreground characteristics from the image to be inquired and the extracted example segmentation result in the local area from each image in the image library so as to construct a first local area characteristic vector and a plurality of second local area characteristic vectors; the first local area feature vector corresponds to the picture to be queried, and the second local area feature vector corresponds to each picture in the picture library;
calculating a similarity based on the first local region feature vector and each of the second local region feature vectors;
and extracting the corresponding picture from the picture library based on the calculation result of the similarity.
3. The method according to claim 2, wherein the extracting instance foreground features from the segmented results of the instances in the local regions extracted from the picture to be queried and each picture in the picture library to construct a first local region feature vector and a plurality of second local region feature vectors respectively comprises:
setting the pixel value of the area of the first example segmentation result and the second example segmentation result, wherein the pixels are segmented into the background, to zero; the first example segmentation result is a local area example segmentation result extracted from the picture to be queried, and the second example segmentation result is a local area example segmentation result extracted from each picture in the picture library;
pooling the first example segmentation result and the second example segmentation result after zero setting to obtain the first local area feature vector and the second local area feature vector with the same size.
4. The method of claim 2, wherein the similarity is calculated using the following formula:
Similarity = (Feat1 · Feat2) / (||Feat1|| × ||Feat2||)
here, Feat1 is the first local region feature vector, and Feat2 is the second local region feature vector.
5. The method according to claim 1, wherein the determining the classification of the user's region of interest by using the labeling information and the positions of the local regions in the picture to be queried comprises:
calculating the intersection ratio of the user interested region and each local region in the picture to be inquired;
determining the position of a local area corresponding to the user interested area based on the intersection ratio of the user interested area and each local area in the picture to be inquired;
and extracting the classification of the position corresponding to the determined local area to obtain the classification of the user interested area.
6. The method of claim 1, wherein the training process of the detection model comprises the steps of:
initializing parameters of a neural network;
inputting the sample image into the neural network, and outputting the position of the local area and the corresponding classification through forward propagation;
performing instance segmentation on the local area by using a mask branch;
and comparing the position of the local area, the corresponding classification and the example segmentation result with the labeled value of the sample image to optimize the parameters of the neural network.
7. The method of claim 6, wherein the loss function of the neural network is a sum of a loss function of a local region location, a loss function of a corresponding class, and a loss function of an instance segmentation.
8. An apparatus for searching local information, comprising:
the first acquisition module is used for acquiring the picture to be inquired with the labeling information and each piece of local information of each picture in the picture library, wherein the local information comprises the position of a local area, corresponding classification and an example segmentation result in the local area; the annotation information is used for representing the position information of the region of interest of the user;
the image query module is used for inputting the image to be queried to a detection model to obtain the position of each local area in the image to be queried, the corresponding classification and the example segmentation result in each local area, wherein when the image to be queried comprises at least one person, each person is individually segmented, and each local area comprises four limbs, a head, an upper half body or a lower half body;
the determining module is used for determining the classification of the user interested region by utilizing the marking information and the position of each local region in the picture to be inquired;
the extraction module is used for respectively extracting example segmentation results in a local area corresponding to the determined classification from the picture to be inquired and each picture in the picture library;
and the searching module is used for searching the segmented results of the instances in the local area extracted from each picture in the picture library based on the segmented results of the instances in the local area extracted from the picture to be inquired so as to obtain the searching result of the picture to be inquired.
9. An electronic device, comprising:
a memory and a processor, the memory and the processor being communicatively connected to each other, the memory having stored therein computer instructions, the processor executing the computer instructions to perform the local information search method according to any one of claims 1 to 7.
10. A computer-readable storage medium storing computer instructions for causing a computer to execute the local information search method according to any one of claims 1 to 7.
CN201910120165.5A 2019-02-18 2019-02-18 Local information searching method and device Active CN109960742B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201910120165.5A CN109960742B (en) 2019-02-18 2019-02-18 Local information searching method and device

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201910120165.5A CN109960742B (en) 2019-02-18 2019-02-18 Local information searching method and device

Publications (2)

Publication Number Publication Date
CN109960742A CN109960742A (en) 2019-07-02
CN109960742B true CN109960742B (en) 2021-11-05

Family

ID=67023836

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201910120165.5A Active CN109960742B (en) 2019-02-18 2019-02-18 Local information searching method and device

Country Status (1)

Country Link
CN (1) CN109960742B (en)

Families Citing this family (11)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN110532414B (en) * 2019-08-29 2022-06-21 深圳市商汤科技有限公司 Picture retrieval method and device
CN115270907A (en) * 2019-09-05 2022-11-01 腾讯音乐娱乐科技(深圳)有限公司 Picture content similarity analysis method and device and storage medium
CN110689527B (en) * 2019-09-18 2021-08-24 北京航空航天大学 Method, device and equipment for detecting installation state of aircraft cable bracket
CN110659372A (en) * 2019-09-24 2020-01-07 阿里巴巴集团控股有限公司 Picture input and access method, device and equipment
CN110866532B (en) * 2019-11-07 2022-12-30 浙江大华技术股份有限公司 Object matching method and device, storage medium and electronic device
CN112308106A (en) * 2019-11-15 2021-02-02 北京京邦达贸易有限公司 Image labeling method and system
CN111352814A (en) * 2020-02-24 2020-06-30 广州视源电子科技股份有限公司 Operation abnormal behavior recognition method and device, storage medium and electronic equipment
CN111531546B (en) * 2020-05-22 2023-06-16 山东浪潮科学研究院有限公司 Robot pose estimation method, device, equipment and storage medium
CN112084364A (en) * 2020-09-11 2020-12-15 苏州科达科技股份有限公司 Object analysis method, local image search method, device, and storage medium
CN112907605B (en) * 2021-03-19 2023-11-17 南京大学 Data enhancement method for instance segmentation
WO2023272659A1 (en) * 2021-06-30 2023-01-05 东莞市小精灵教育软件有限公司 Method and apparatus for recognizing cover image, storage medium, and recognition device

Citations (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US9031948B1 (en) * 2011-07-06 2015-05-12 Shawn B. Smith Vehicle prediction and association tool based on license plate recognition
CN105320705A (en) * 2014-08-05 2016-02-10 北京大学 Retrieval method and device for similar vehicle
CN105373586A (en) * 2015-10-14 2016-03-02 浙江宇视科技有限公司 Vehicle inquiry method and device
CN108921850A (en) * 2018-04-16 2018-11-30 博云视觉(北京)科技有限公司 A kind of extracting method of the image local feature based on image Segmentation Technology

Family Cites Families (8)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
KR100450793B1 (en) * 2001-01-20 2004-10-01 삼성전자주식회사 Apparatus for object extraction based on the feature matching of region in the segmented images and method therefor
CN100578508C (en) * 2008-01-14 2010-01-06 上海博康智能信息技术有限公司 Interactive type image search system and method
CN102129700A (en) * 2011-03-11 2011-07-20 上海海事大学 Infrared simulated image platform under ocean background and image generation method thereof
CN102968799B (en) * 2012-12-12 2014-12-10 北京航空航天大学 Integral image-based quick ACCA-CFAR SAR (Automatic Censored Cell Averaging-Constant False Alarm Rate Synthetic Aperture Radar) image target detection method
US9460518B2 (en) * 2013-04-17 2016-10-04 Yahoo! Inc. Visual clothing retrieval
CN107341805B (en) * 2016-08-19 2018-11-23 北京市商汤科技开发有限公司 Background segment and network model training, image processing method and device before image
CN106682092A (en) * 2016-11-29 2017-05-17 深圳市华尊科技股份有限公司 Target retrieval method and terminal
CN108846314A (en) * 2018-05-08 2018-11-20 天津大学 A kind of food materials identification system and food materials discrimination method based on deep learning

Patent Citations (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US9031948B1 (en) * 2011-07-06 2015-05-12 Shawn B. Smith Vehicle prediction and association tool based on license plate recognition
CN105320705A (en) * 2014-08-05 2016-02-10 北京大学 Retrieval method and device for similar vehicle
CN105373586A (en) * 2015-10-14 2016-03-02 浙江宇视科技有限公司 Vehicle inquiry method and device
CN108921850A (en) * 2018-04-16 2018-11-30 博云视觉(北京)科技有限公司 A kind of extracting method of the image local feature based on image Segmentation Technology

Also Published As

Publication number Publication date
CN109960742A (en) 2019-07-02

Similar Documents

Publication Publication Date Title
CN109960742B (en) Local information searching method and device
CN110738207B (en) Character detection method for fusing character area edge information in character image
US11200424B2 (en) Space-time memory network for locating target object in video content
WO2020228446A1 (en) Model training method and apparatus, and terminal and storage medium
Xie et al. Multilevel cloud detection in remote sensing images based on deep learning
CN109583483B (en) Target detection method and system based on convolutional neural network
CN114202672A (en) Small target detection method based on attention mechanism
CN109902548B (en) Object attribute identification method and device, computing equipment and system
CN112052839A (en) Image data processing method, apparatus, device and medium
AU2018202767B2 (en) Data structure and algorithm for tag less search and svg retrieval
CN110909651A (en) Video subject person identification method, device, equipment and readable storage medium
US20230076266A1 (en) Data processing system, object detection method, and apparatus thereof
CN111680678B (en) Target area identification method, device, equipment and readable storage medium
CN106407978B (en) Method for detecting salient object in unconstrained video by combining similarity degree
CN111179270A (en) Image co-segmentation method and device based on attention mechanism
CN114219855A (en) Point cloud normal vector estimation method and device, computer equipment and storage medium
CN112101195A (en) Crowd density estimation method and device, computer equipment and storage medium
CN114168768A (en) Image retrieval method and related equipment
JP4570995B2 (en) MATCHING METHOD, MATCHING DEVICE, AND PROGRAM
CN110472092B (en) Geographical positioning method and system of street view picture
CN112241736A (en) Text detection method and device
Geng et al. SANet: A novel segmented attention mechanism and multi-level information fusion network for 6D object pose estimation
CN117011566A (en) Target detection method, detection model training method, device and electronic equipment
CN111709377B (en) Feature extraction method, target re-identification method and device and electronic equipment
CN112084874A (en) Object detection method and device and terminal equipment

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant