CN111553372B - Training image recognition network, image recognition search method and related device

Info

Publication number
CN111553372B
Authority
CN
China
Prior art keywords
image
training
training image
network
original
Prior art date
Legal status
Active
Application number
CN202010332194.0A
Other languages
Chinese (zh)
Other versions
CN111553372A
Inventor
章书豪
夏雄尉
谢泽华
周泽南
苏雪峰
Current Assignee
Beijing Sogou Technology Development Co Ltd
Original Assignee
Beijing Sogou Technology Development Co Ltd
Priority date
Filing date
Publication date
Application filed by Beijing Sogou Technology Development Co Ltd filed Critical Beijing Sogou Technology Development Co Ltd
Priority to CN202010332194.0A
Publication of CN111553372A
Application granted
Publication of CN111553372B
Status: Active
Anticipated expiration

Classifications

    • G - PHYSICS
    • G06 - COMPUTING; CALCULATING OR COUNTING
    • G06V - IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V10/00 - Arrangements for image or video recognition or understanding
    • G06V10/40 - Extraction of image or video features
    • G06V10/46 - Descriptors for shape, contour or point-related descriptors, e.g. scale invariant feature transform [SIFT] or bags of words [BoW]; salient regional features
    • G06V10/462 - Salient features, e.g. scale invariant feature transforms [SIFT]
    • G - PHYSICS
    • G06 - COMPUTING; CALCULATING OR COUNTING
    • G06F - ELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00 - Pattern recognition
    • G06F18/20 - Analysing
    • G06F18/21 - Design or setup of recognition systems or techniques; extraction of features in feature space; blind source separation
    • G06F18/214 - Generating training patterns; bootstrap methods, e.g. bagging or boosting
    • G - PHYSICS
    • G06 - COMPUTING; CALCULATING OR COUNTING
    • G06T - IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T7/00 - Image analysis
    • G06T7/10 - Segmentation; edge detection
    • G06T7/11 - Region-based segmentation
    • Y - GENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
    • Y02 - TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
    • Y02D - CLIMATE CHANGE MITIGATION TECHNOLOGIES IN INFORMATION AND COMMUNICATION TECHNOLOGIES [ICT], I.E. INFORMATION AND COMMUNICATION TECHNOLOGIES AIMING AT THE REDUCTION OF THEIR OWN ENERGY USE
    • Y02D10/00 - Energy efficient computing, e.g. low power processors, power management or thermal management

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Physics & Mathematics (AREA)
  • General Physics & Mathematics (AREA)
  • Data Mining & Analysis (AREA)
  • Artificial Intelligence (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Multimedia (AREA)
  • Bioinformatics & Cheminformatics (AREA)
  • Bioinformatics & Computational Biology (AREA)
  • Evolutionary Biology (AREA)
  • Evolutionary Computation (AREA)
  • General Engineering & Computer Science (AREA)
  • Image Analysis (AREA)
  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)

Abstract

The application discloses a method for training an image recognition network, an image recognition search method and a related device. The training method comprises: dividing an original training image into a plurality of training image blocks and labeling each block with an index; shuffling the plurality of training image blocks according to an image salient region detection result of the original training image to obtain a rearranged training image of the original training image; and training the image recognition network, using as training data the original training image, the rearranged training image and the corresponding annotation data comprising a coarse-granularity image category label, a fine-granularity image category label, an image preprocessing category label and the training image block index sequence, to obtain an image recognition model. The image recognition search method comprises: acquiring an image to be identified; inputting the image to be identified into the image recognition model and outputting its target features and target category; and searching an image database for similar images using the target features and target category of the image to be identified.

Description

Training image recognition network, image recognition search method and related device
Technical Field
The present disclosure relates to the field of image processing technologies, and in particular to a method for training an image recognition network, a method for image recognition search, and a related device.
Background
With the rapid development of science and technology, a user can, in daily life, photograph an item of interest at hand and quickly obtain links to the same or similar items by searching with the item image, satisfying the user's need to find items of interest; searching with the item image is, in effect, performing an image recognition search on the item image.
At present, image recognition search methods generally use a deep learning model to extract global features of the item image for recognition and search. However, for an item image with a complex scene, for example one in which the item region is relatively small, the deep learning model can extract only global features of the item image, and the subsequent image recognition search attends only to those global features; important features of the item image are easily missed, greatly reducing the accuracy of image recognition search and thus degrading the user experience.
Disclosure of Invention
The technical problem to be solved by the application is to provide a method for training an image recognition network, an image recognition search method and a related device, so that the image recognition network attends to local image features and an image recognition model with enhanced local feature perception is obtained; even for images to be identified with complex scenes, the accuracy of image recognition search can be effectively improved, thereby improving the user experience of image recognition search.
In a first aspect, an embodiment of the present application provides a method for training an image recognition network, the method including:
dividing an original training image to obtain a plurality of training image blocks and labeling each block with an index;
shuffling the plurality of training image blocks based on an image salient region detection result of the original training image to obtain a rearranged training image of the original training image;
training an image recognition network based on the original training image, the rearranged training image and the corresponding annotation data to obtain an image recognition model; the annotation data comprise a coarse-granularity image category label, a fine-granularity image category label, an image preprocessing category label and the training image block index sequence, wherein the image preprocessing category label is either an original label or a rearranged label.
Optionally, the shuffling of the plurality of training image blocks based on the image salient region detection result of the original training image to obtain the rearranged training image of the original training image includes:
detecting the image salient region of the original training image by using an attention heat map model to obtain an attention heat map of the original training image;
and shuffling the plurality of training image blocks based on the heat values of the attention heat map to obtain the rearranged training image of the original training image.
Optionally, the shuffling of the plurality of training image blocks based on the image salient region detection result of the original training image includes:
shuffling such that, in the image salient region detection result of the original training image, training image blocks at positions of higher saliency are disturbed less and training image blocks at positions of lower saliency are disturbed more.
Optionally, the training of the image recognition network based on the original training image, the rearranged training image and the corresponding annotation data to obtain the image recognition model includes:
obtaining training features from the original training image and the rearranged training image by using a feature extraction network in the image recognition network;
obtaining prediction data from the training features by using a recognition network in the image recognition network, the prediction data comprising a predicted coarse-granularity image category, a predicted fine-granularity image category and a predicted image preprocessing category;
and training network parameters of the image recognition network with a network loss function based on the prediction data and the annotation data to obtain the image recognition model.
Optionally, the network loss function comprises a coarse-granularity image category classification loss, a fine-granularity image category classification loss, an image preprocessing category classification loss, and a loss for restoring the rearranged training image to the original training image.
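The four loss terms named above can be combined into a single training objective as a weighted sum. The following is a minimal sketch of such a combined loss, not the patent's actual implementation; the function names, the use of cross-entropy for the three classification terms, and the equal default weights are all assumptions.

```python
import math

def cross_entropy(probs, target):
    """Cross-entropy of one sample given predicted class probabilities."""
    return -math.log(max(probs[target], 1e-12))

def total_loss(p_coarse, y_coarse, p_fine, y_fine, p_pre, y_pre,
               restore_err, weights=(1.0, 1.0, 1.0, 1.0)):
    """Weighted sum of the four terms: coarse-granularity, fine-granularity
    and preprocessing-category classification losses, plus a restoration
    error for recovering the original block order from the rearranged
    image. Equal default weights are an assumption."""
    w1, w2, w3, w4 = weights
    return (w1 * cross_entropy(p_coarse, y_coarse)
            + w2 * cross_entropy(p_fine, y_fine)
            + w3 * cross_entropy(p_pre, y_pre)
            + w4 * restore_err)
```

With perfect predictions on all three classification heads and zero restoration error, the total loss is zero, which is the sanity check one would expect of any such composite objective.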
In a second aspect, an embodiment of the present application provides a method for image recognition searching, using the image recognition model according to any one of the first aspect, where the method includes:
acquiring an image to be identified;
obtaining target characteristics and target categories of the image to be identified by using the image identification model;
and searching an image database for images similar to the image to be identified based on the target features and the target category.
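As a sketch of how the target category and target features might be combined during the search, one can first filter the database by category and then rank the remaining entries by cosine similarity of the features. This is an illustrative NumPy implementation under an assumed data layout (parallel lists of feature vectors and category labels), not the patent's actual retrieval system.

```python
import numpy as np

def search_similar(query_feat, query_cat, db_feats, db_cats, top_k=5):
    """Return indices of the top_k database images whose category matches
    the query category, ranked by cosine similarity of features."""
    feats = np.asarray(db_feats, dtype=float)
    cats = np.asarray(db_cats)
    idx = np.where(cats == query_cat)[0]          # category filter
    q = np.asarray(query_feat, dtype=float)
    sims = feats[idx] @ q / (
        np.linalg.norm(feats[idx], axis=1) * np.linalg.norm(q) + 1e-12)
    order = idx[np.argsort(-sims)]                # most similar first
    return order[:top_k].tolist()
```

Filtering by category before ranking keeps the similarity comparison inside the predicted class, which is one plausible way to use both outputs of the recognition model together.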
In a third aspect, an embodiment of the present application provides an apparatus for training an image recognition network, the apparatus including:
the segmentation obtaining unit is used for segmenting the original training image to obtain a plurality of training image blocks and labeling each block with an index;
a rearrangement obtaining unit, configured to shuffle the plurality of training image blocks based on an image salient region detection result of the original training image to obtain a rearranged training image of the original training image;
a training obtaining unit, configured to train an image recognition network based on the original training image, the rearranged training image and the corresponding annotation data to obtain an image recognition model; the annotation data comprise a coarse-granularity image category label, a fine-granularity image category label, an image preprocessing category label and the training image block index sequence, wherein the image preprocessing category label is either an original label or a rearranged label.
Optionally, the rearrangement obtaining unit includes:
the detection obtaining subunit is used for detecting the image salient region of the original training image by using an attention heat map model to obtain an attention heat map of the original training image;
and the rearrangement obtaining subunit is used for shuffling the plurality of training image blocks based on the heat values of the attention heat map to obtain a rearranged training image of the original training image.
Optionally, the rearrangement obtaining unit is specifically configured to:
shuffle the plurality of training image blocks such that, in the image salient region detection result of the original training image, training image blocks at positions of higher saliency are disturbed less and training image blocks at positions of lower saliency are disturbed more.
Optionally, the training obtaining unit includes:
a first obtaining subunit, configured to obtain training features by using a feature extraction network in the image recognition network based on the original training image and the rearranged training image;
a second obtaining subunit, configured to obtain, based on the training feature, prediction data by using an identification network in the image identification network, where the prediction data includes a predicted coarse-granularity image category, a predicted fine-granularity image category, and a predicted image preprocessing category;
and the training obtaining subunit is used for training the network parameters of the image recognition network by using a network loss function based on the prediction data and the labeling data to obtain the image recognition model.
Optionally, the network loss function comprises a coarse-granularity image category classification loss, a fine-granularity image category classification loss, an image preprocessing category classification loss, and a loss for restoring the rearranged training image to the original training image.
In a fourth aspect, an embodiment of the present application provides an apparatus for image recognition search, using the image recognition model according to any one of the first aspect, where the apparatus includes:
the acquisition unit is used for acquiring the image to be identified;
The obtaining unit is used for obtaining target characteristics and target categories of the image to be identified by utilizing the image identification model;
and the searching unit is used for searching an image database for images similar to the image to be identified based on the target features and the target category.
In a fifth aspect, embodiments of the present application provide an apparatus for training an image recognition network, the apparatus comprising a memory, and one or more programs, wherein the one or more programs are stored in the memory and configured to be executed by one or more processors, the one or more programs comprising instructions for:
dividing an original training image to obtain a plurality of training image blocks and labeling each block with an index;
shuffling the plurality of training image blocks based on an image salient region detection result of the original training image to obtain a rearranged training image of the original training image;
training an image recognition network based on the original training image, the rearranged training image and the corresponding annotation data to obtain an image recognition model; the annotation data comprise a coarse-granularity image category label, a fine-granularity image category label, an image preprocessing category label and the training image block index sequence, wherein the image preprocessing category label is either an original label or a rearranged label.
In a sixth aspect, embodiments of the present application provide an apparatus for image recognition searching, the apparatus comprising a memory, and one or more programs, wherein the one or more programs are stored in the memory and configured to be executed by one or more processors, the one or more programs comprising instructions for:
acquiring an image to be identified;
obtaining target characteristics and target categories of the image to be identified by using the image identification model;
and searching an image database for images similar to the image to be identified based on the target features and the target category.
In a seventh aspect, embodiments of the present application provide a machine-readable medium having instructions stored thereon which, when executed by one or more processors, cause an apparatus to perform the method of training an image recognition network according to any one of the first aspect; or cause the apparatus to perform the method of image recognition search according to the second aspect.
Compared with the prior art, the application has at least the following advantages:
By adopting the technical solution of the embodiments of the application, an original training image is first divided into a plurality of training image blocks, each labeled with an index; then, according to the image salient region detection result of the original training image, the plurality of training image blocks are shuffled to obtain a rearranged training image of the original training image; finally, the original training image, the rearranged training image and the corresponding annotation data are used as training data to train an image recognition network and obtain an image recognition model. The annotation data comprise a coarse-granularity image category label, a fine-granularity image category label, an image preprocessing category label and the training image block index sequence, wherein the image preprocessing category label is either an original label or a rearranged label. In this way, the image salient region detection result of the original training image guides how the segmented training image blocks are shuffled into the rearranged training image; using the original training image together with the rearranged training image as network input makes the image recognition network attend to local image features, so that training yields an image recognition model with enhanced local feature perception.
In addition, by adopting the technical solution of the embodiments of the application, an image to be identified is first acquired; the image to be identified is then input into the image recognition model, which outputs its target features and target category; finally, similar images are searched for in the image database using the target features and target category of the image to be identified. Because the target features obtained by the image recognition model capture not only global but also local image features, important features of the image to be identified are not missed; combining the target features with the target category in the search effectively improves the accuracy of image recognition search even for images to be identified with complex scenes, thereby improving the user experience of image recognition search.
Drawings
In order to more clearly illustrate the technical solutions of the embodiments of the present application, the drawings that are needed in the description of the embodiments of the present application will be briefly described below, and it is obvious that the drawings in the following description are only some embodiments described in the present application, and other drawings may be obtained according to these drawings without inventive effort for a person of ordinary skill in the art.
Fig. 1 is a schematic diagram of a system framework for an application scenario in an embodiment of the present application;
fig. 2 is a flowchart of a method for training an image recognition network according to an embodiment of the present application;
FIG. 3 is an illustration of an original training image and an attention heat map of the original training image provided in an embodiment of the present application;
FIG. 4 is a schematic diagram of an original training image and a rearranged training image of the original training image according to an embodiment of the present disclosure;
fig. 5 is a flowchart of a method for image recognition search according to an embodiment of the present application;
fig. 6 is a schematic structural diagram of an apparatus for training an image recognition network according to an embodiment of the present application;
fig. 7 is a schematic structural diagram of an apparatus for image recognition search according to an embodiment of the present application;
FIG. 8 is a schematic diagram of an apparatus for training an image recognition network or image recognition search according to an embodiment of the present application;
fig. 9 is a schematic structural diagram of a server according to an embodiment of the present application.
Detailed Description
In order to make the present application solution better understood by those skilled in the art, the following description will clearly and completely describe the technical solution in the embodiments of the present application with reference to the accompanying drawings in the embodiments of the present application, and it is apparent that the described embodiments are only some embodiments of the present application, not all embodiments. All other embodiments, which can be made by one of ordinary skill in the art without undue burden from the present disclosure, are within the scope of the present disclosure.
Searching with an item image is actually performing an image recognition search on the item image. In the prior art, image recognition search methods generally use a deep learning model to extract global features of the item image for recognition and search. However, the inventors found that, for an item image with a complex scene, the deep learning model can extract only global features of the item image, and the subsequent image recognition search attends only to those global features, so important features of the item image are easily missed, the accuracy of image recognition search is low, and the user experience of image recognition search is affected.
To solve this problem, in the embodiments of the present application, the original training image is divided into a plurality of training image blocks, each labeled with an index; the plurality of training image blocks are shuffled according to the image salient region detection result of the original training image to obtain a rearranged training image of the original training image; and the original training image, the rearranged training image and the corresponding annotation data are used as training data to train an image recognition network and obtain an image recognition model. The annotation data comprise a coarse-granularity image category label, a fine-granularity image category label, an image preprocessing category label and the training image block index sequence, wherein the image preprocessing category label is either an original label or a rearranged label. In this way, the image salient region detection result of the original training image is used to shuffle the segmented training image blocks in a targeted manner into a rearranged training image; using the original training image together with the rearranged training image as network input makes the image recognition network attend to local image features, and training yields an image recognition model with enhanced local feature perception.
In addition, in the embodiments of the present application, an image to be identified is acquired; the image to be identified is input into the image recognition model, which outputs its target features and target category; and similar images are searched for in the image database using the target features and target category of the image to be identified. Because the target features obtained by the image recognition model capture not only global but also local image features, important features of the image to be identified are not missed; combining the target features with the target category in the search effectively improves the accuracy of image recognition search even for images to be identified with complex scenes, thereby improving the user experience of image recognition search.
For example, one scenario of the embodiments of the present application is shown in fig. 1, which includes a terminal device 101, a processor 102 and an image database 103; the terminal device 101 may be a personal computer or another terminal such as a mobile phone or tablet computer. The terminal device 101 collects a large number of original training images to form a training set; the processor 102 acquires the original training images from the terminal device 101 and obtains an image recognition model by the method for training an image recognition network in the embodiments of the application. After the terminal device 101 sends an image to be identified to the processor 102, the processor 102 searches the image database 103 for images similar to the image to be identified by the image recognition search method in the embodiments of the application.
It will be appreciated that, although in the above application scenario the actions of the embodiments of the present application are described as being performed by the processor 102, the present application does not restrict the executing entity, as long as the actions disclosed in the embodiments of the present application are performed.
It is understood that the above scenario is only one example of a scenario provided in the embodiments of the present application, and the embodiments of the present application are not limited to this scenario.
Specific implementation manners of the training image recognition network, the image recognition searching method and the related devices in the embodiments of the present application are described in detail below by way of embodiments with reference to the accompanying drawings.
Exemplary method
Referring to fig. 2, a flowchart of a method for training an image recognition network in an embodiment of the present application is shown. In this embodiment, the method may include, for example, the steps of:
step 201: the original training image is segmented, a plurality of training image blocks are obtained, and labels are marked.
It should be noted that, in the prior art, the deep learning model learns only from the original training image and mainly attends to global image features; for images with complex scenes it therefore extracts only global features, attends only to them during subsequent recognition and search, and easily misses important local features. In the embodiments of the application, the original training image is instead divided into a plurality of training image blocks and recombined into a new training image, so that learning from the new training image attends to local image features on top of the global features learned from the original training image. The original training image is therefore first segmented into a plurality of training image blocks, and each block is labeled with an index, so that any subsequent recombination of the blocks defines the index sequence of the training image blocks in the new training image. The number of training image blocks may be preset according to the segmentation requirements of a specific scene, for example 9, 16, 25 or 36.
As an example, with the preset number of training image blocks set to 9, the original training image is uniformly segmented into 9 training image blocks, which are labeled 1, 2, 3, 4, 5, 6, 7, 8 and 9 in sequence.
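The segmentation and labeling just described can be sketched as follows: a minimal NumPy example assuming a uniform 3x3 grid, row-major indices 1..9, and an image whose sides are divisible by the grid size (the function name and image size are illustrative, not from the patent).

```python
import numpy as np

def split_into_blocks(image: np.ndarray, grid: int = 3):
    """Split an H x W x C image into grid*grid equal training image blocks.

    Returns the blocks in row-major order together with their index
    labels 1..grid*grid. Assumes H and W are divisible by `grid`.
    """
    h, w = image.shape[0] // grid, image.shape[1] // grid
    blocks, labels = [], []
    for r in range(grid):
        for c in range(grid):
            blocks.append(image[r * h:(r + 1) * h, c * w:(c + 1) * w])
            labels.append(r * grid + c + 1)
    return blocks, labels

# A 96x96 RGB image yields nine 32x32 blocks labeled 1..9.
img = np.zeros((96, 96, 3), dtype=np.uint8)
blocks, labels = split_into_blocks(img)
```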
Step 202: shuffle the plurality of training image blocks based on the image salient region detection result of the original training image to obtain a rearranged training image of the original training image.
It should be noted that, after the plurality of training image blocks are obtained in step 201, they are recombined in a shuffled order into a new training image, namely the rearranged training image of the original training image. Relative to the original training image, the rearranged training image presents the salient image region more prominently and delimits it more clearly, so that subsequent learning on the new training image can focus on the features of the salient region. In the embodiments of the application, the new training image is obtained by shuffling the plurality of training image blocks according to the image salient region detection result of the original training image, and the result is recorded as the rearranged training image of the original training image.
As an example, continuing the case of an original training image divided into 9 training image blocks labeled 1, 2, 3, 4, 5, 6, 7, 8 and 9, shuffling them according to the image salient region detection result of the original training image may yield a rearranged training image whose block index sequence is 1, 3, 5, 7, 2, 4, 6, 8 and 9.
When step 202 is implemented, the image salient region detection result of the original training image is obtained first, usually by performing image salient region detection on the original training image; the plurality of training image blocks are then shuffled according to that detection result to obtain the rearranged training image of the original training image. The shuffling principle may be: in the image salient region detection result of the original training image, training image blocks at positions of higher saliency are disturbed less, and training image blocks at positions of lower saliency are disturbed more.
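One way to realize this saliency-controlled disturbance is to jitter each block's sort position by noise whose amplitude shrinks as the block's saliency grows, then reorder the blocks by the jittered positions. This is a hypothetical sketch, not the patent's method; the jitter scheme and all parameter names are assumptions.

```python
import numpy as np

def saliency_weighted_shuffle(labels, saliency, max_jitter=4.0, seed=0):
    """Reorder block labels so that highly salient blocks move little.

    labels   : block indices in their original (row-major) order.
    saliency : per-block saliency in [0, 1]; higher means more salient.
    Each block gets a sort key equal to its original position plus
    uniform noise scaled by (1 - saliency); reordering by the keys
    disturbs low-saliency blocks more than high-saliency ones.
    """
    rng = np.random.default_rng(seed)
    keys = [i + rng.uniform(-1.0, 1.0) * max_jitter * (1.0 - s)
            for i, s in enumerate(saliency)]
    order = np.argsort(keys, kind="stable")
    return [labels[i] for i in order]
```

With saliency 1.0 everywhere the noise amplitude is zero and the order is unchanged, which matches the stated principle at its extreme.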
Thus, in an optional implementation of the embodiment of the present application, step 202, in which the plurality of training image blocks are shuffled and rearranged based on the image salient region detection result of the original training image to obtain the rearranged training image of the original training image, may include, but is not limited to, the following steps:
Step A: perform image salient region detection on the original training image to obtain the image salient region detection result.
Step B: shuffle and rearrange the plurality of training image blocks based on the image salient region detection result to obtain the rearranged training image of the original training image.
It should be further noted that the attention heat map model is a convolutional neural network visualization tool: inputting an image into the attention heat map model outputs an attention heat map that shows the image salient regions of that image prominently and clearly, and the key regions of the image can be identified by inspecting the attention heat map. Thus, for step A in the embodiments of the present application, the original training image may be input into the attention heat map model to output the attention heat map of the original training image. That is, in an optional implementation of the embodiment of the present application, step A, performing image salient region detection on the original training image to obtain the image salient region detection result, may specifically be, for example: performing image salient region detection on the original training image by using an attention heat map model to obtain the attention heat map of the original training image. Of course, in the embodiment of the present application, image salient region detection modes other than the attention heat map model may also be adopted, in which case the obtained image salient region detection result is correspondingly a detection result other than an attention heat map.
As an example, fig. 3 shows an original training image and the attention heat map of the original training image. The left image is the original training image, and the right image is the attention heat map of the left image, obtained by inputting the left image into the attention heat map model. The right image shows the image salient region of the left image prominently and clearly, and the key region of the left image can be identified by inspecting the right image.
Correspondingly, when the image salient region detection result is specifically an attention heat map, the general principle is that the higher the heat of a position, the lower the degree of shuffling applied to the corresponding training image block, and the lower the heat, the higher the degree of shuffling; the rearranged training image is then obtained from the plurality of training image blocks shuffled and rearranged according to the heat of the attention heat map. Therefore, in an optional implementation of the embodiment of the present application, step B, shuffling and rearranging the plurality of training image blocks based on the image salient region detection result to obtain the rearranged training image of the original training image, may be, for example: shuffling and rearranging the plurality of training image blocks based on the heat of the attention heat map to obtain the rearranged training image of the original training image.
As an example, on the basis of fig. 3, fig. 4 shows an original training image and a schematic diagram of the rearranged training image of the original training image. The left image is the original training image, and the right image is the rearranged training image of the left image, obtained by dividing the left image into a plurality of training image blocks and then shuffling and rearranging the blocks according to the right image in fig. 3.
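The heat-guided shuffle described above can be sketched as follows. This is a non-authoritative sketch: the embodiment does not specify how heat is mapped to a degree of shuffling, so here each block's sort key is its original index plus random noise scaled by (1 - normalized block heat), which keeps hot blocks close to their original positions and displaces cold blocks more strongly; the function name and parameters are illustrative:

```python
import numpy as np

def heat_weighted_shuffle(heat_map, grid=3, strength=2.0, seed=0):
    """Return a shuffled block order in which high-heat blocks move little.

    heat_map: 2-D array of per-pixel attention heat, same size as the image.
    Each block's heat is the mean heat over its pixels; a block's sort key
    is its original index plus noise scaled by (1 - normalized heat), so
    the degree of shuffling is low for hot blocks and high for cold ones.
    """
    rng = np.random.default_rng(seed)
    h, w = heat_map.shape
    bh, bw = h // grid, w // grid
    heat = np.array([heat_map[r*bh:(r+1)*bh, c*bw:(c+1)*bw].mean()
                     for r in range(grid) for c in range(grid)])
    heat = (heat - heat.min()) / (np.ptp(heat) + 1e-8)  # normalize to [0, 1]
    noise = rng.standard_normal(grid * grid)
    keys = np.arange(grid * grid) + strength * (1.0 - heat) * noise
    return np.argsort(keys)  # block indices in their rearranged order

# A heat map whose center block is hottest: the center block (index 4)
# receives no noise at all, while the cold border blocks are shuffled.
heat_map = np.zeros((6, 6))
heat_map[2:4, 2:4] = 1.0
order = heat_weighted_shuffle(heat_map)
```

Any monotone mapping from heat to displacement would satisfy the stated principle; the noisy-key sort is just one convenient choice.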
Step 203: train an image recognition network to obtain an image recognition model based on the original training image, the rearranged training image and the corresponding annotation data; the annotation data comprises a coarse-granularity image category label, a fine-granularity image category label, an image preprocessing category label and a training image block labeling sequence, wherein the image preprocessing category label comprises an original label or a rearranged label.
It should be noted that, after the rearranged training image of the original training image is obtained in steps 201-202, not only the original training image but also the rearranged training image is used as input to the image recognition network. In this way, on the basis of learning the original training image and focusing on its global features, the image recognition network can also learn the rearranged training image and focus on its local features, so that the perceptibility of the obtained image recognition model to local image features is enhanced. For an original training image or a rearranged training image, the corresponding annotation data comprises a coarse-granularity image category label, a fine-granularity image category label, an image preprocessing category label and a training image block labeling sequence. The coarse-granularity image category label is obtained by classifying images at coarse granularity, and the fine-granularity image category label is obtained by classifying images at fine granularity; that is, the image category represented by the fine-granularity label is smaller and finer in granularity than that represented by the coarse-granularity label. The image preprocessing category label comprises an original label or a rearranged label.
In an embodiment of the present application, the image recognition network includes a feature extraction network and a recognition network. When step 203 is implemented, firstly, the original training image and the rearranged training image are input into the feature extraction network to output training features; then, the training features are input into the recognition network to output a predicted coarse-granularity image category, a predicted fine-granularity image category and a predicted image preprocessing category as prediction data; finally, the network parameters of the image recognition network are trained by gradient back-propagation using the network loss function over the prediction data and the annotation data until training is completed, and the trained image recognition network is taken as the image recognition model. That is, in an optional implementation of the embodiment of the present application, step 203, training the image recognition network to obtain the image recognition model based on the original training image, the rearranged training image and the corresponding annotation data, may include, for example, the following steps C-E:
Step C: based on the original training image and the rearranged training image, obtain training features by using the feature extraction network in the image recognition network.
Step D: based on the training features, obtain prediction data by using the recognition network in the image recognition network, wherein the prediction data comprises a predicted coarse-granularity image category, a predicted fine-granularity image category and a predicted image preprocessing category.
Step E: train the network parameters of the image recognition network by using a network loss function based on the prediction data and the annotation data to obtain the image recognition model.
In the embodiment of the present application, coarse-granularity image classification and fine-granularity image classification need to be performed on the original training image and the rearranged training image, whether each image belongs to the original category or the rearranged category needs to be determined, and the rearranged training image needs to be reordered to restore the original training image. Therefore, 4 loss functions, namely a coarse-granularity image category classification loss function, a fine-granularity image category classification loss function, an image preprocessing category classification loss function, and a loss function for restoring the rearranged training image to the original training image, are combined to form the network loss function of the image recognition network. That is, in an optional implementation of the embodiment of the present application, the network loss function includes a coarse-granularity image category classification loss function, a fine-granularity image category classification loss function, an image preprocessing category classification loss function, and a loss function for restoring the rearranged training image to the original training image.
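The combined network loss can be sketched as follows. The embodiment does not give the exact form of each term; as an assumption, this sketch uses softmax cross-entropy for the three classification heads and a per-position cross-entropy over predicted block indices for the loss of restoring the rearranged training image to the original training image, summed with illustrative weights (all names here are ours):

```python
import numpy as np

def cross_entropy(logits, label):
    """Softmax cross-entropy for a single example."""
    z = logits - logits.max()
    log_probs = z - np.log(np.exp(z).sum())
    return -log_probs[label]

def network_loss(coarse_logits, coarse_label,
                 fine_logits, fine_label,
                 prep_logits, prep_label,
                 order_logits, block_order,
                 weights=(1.0, 1.0, 1.0, 1.0)):
    """Sum of the four loss terms described above.

    order_logits: (num_blocks, num_blocks) scores, row i being the
    predicted original index of the block now at position i; block_order
    is the annotated training image block labeling sequence used to
    restore the original training image.
    """
    restore = sum(cross_entropy(order_logits[i], block_order[i])
                  for i in range(len(block_order)))
    terms = (cross_entropy(coarse_logits, coarse_label),
             cross_entropy(fine_logits, fine_label),
             cross_entropy(prep_logits, prep_label),
             restore)
    return sum(w * t for w, t in zip(weights, terms))

# Smoke check with made-up logits: a correctly restored block order
# yields a smaller combined loss than a wrong one.
loss = network_loss(np.array([2.0, 0.1]), 0,
                    np.array([0.2, 1.5, 0.3]), 1,
                    np.array([1.0, -1.0]), 0,
                    np.eye(4) * 5.0, [0, 1, 2, 3])
```

In practice each term would be averaged over a batch and the weights tuned; the equal weighting above is purely illustrative.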
Through the various implementations provided by this embodiment, first, an original training image is divided into a plurality of training image blocks, which are labeled; then, the plurality of training image blocks are shuffled and rearranged according to the image salient region detection result of the original training image to obtain a rearranged training image of the original training image; finally, the original training image, the rearranged training image and the corresponding annotation data are used as training data to train an image recognition network to obtain an image recognition model. The annotation data comprises a coarse-granularity image category label, a fine-granularity image category label, an image preprocessing category label and a training image block labeling sequence, wherein the image preprocessing category label comprises an original label or a rearranged label. In this way, the image salient region detection result of the original training image is used to shuffle and rearrange, in a targeted manner, the plurality of training image blocks obtained by dividing the original training image, so as to obtain the rearranged training image; the original training image and the rearranged training image are combined as input to the image recognition network, so that the image recognition network focuses on the local features of the image, and an image recognition model with enhanced local feature perception capability is obtained through training.
It should be noted that, on the basis of the above embodiment, for an image to be identified with a relatively complex scene, important features of the image are easily missed. To avoid this, after the image to be identified is acquired, it may be input into the image recognition model; even if the scene of the image to be identified is relatively complex, the image recognition model can focus both on its global features and on its local features, so as to obtain the target features and the target category of the image to be identified. To effectively improve the accuracy of image recognition search, similar images of the image to be identified may then be searched in an image database by using the target features and the target category.
Referring to fig. 5, a flowchart of a method for image recognition search in an embodiment of the present application is shown. In an embodiment of the present application, with the image recognition model described in the foregoing embodiment, the method may include, for example, the following steps:
Step 501: acquire an image to be identified.
Step 502: obtain the target features and the target category of the image to be identified by using the image recognition model.
In the embodiment of the present application, the image to be identified is first input into the feature extraction network in the image recognition model to obtain the target features of the image to be identified; the target features are then input into the recognition network in the image recognition model to obtain the target category of the image to be identified.
Step 503: search the image database for similar images of the image to be identified based on the target features and the target category.
In the embodiment of the present application, for example, an image set corresponding to the target category may be determined in the image database, the similarity between the target features and the features of each image in the image set may be calculated, and the similar images of the image to be identified may be determined based on the calculated similarities.
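The category-filtered similarity search can be sketched as follows. This is a minimal sketch assuming the image database is organized as a mapping from category to (image id, feature vector) pairs and that cosine similarity is used; both assumptions are ours, as the embodiment only requires computing a similarity within the target category's image set:

```python
import numpy as np

def search_similar(target_feature, target_category, database, top_k=3):
    """Search the image set of target_category for the most similar images.

    database: dict mapping category -> list of (image_id, feature) pairs.
    Similarity is cosine similarity between feature vectors.
    """
    candidates = database.get(target_category, [])

    def cosine(a, b):
        return float(np.dot(a, b) /
                     (np.linalg.norm(a) * np.linalg.norm(b) + 1e-8))

    scored = sorted(((cosine(target_feature, f), img_id)
                     for img_id, f in candidates), reverse=True)
    return [img_id for _, img_id in scored[:top_k]]

# Toy database: two categories, three images in the "dog" image set.
db = {"dog": [("dog_1", np.array([1.0, 0.0])),
              ("dog_2", np.array([0.9, 0.1])),
              ("dog_3", np.array([0.0, 1.0]))],
      "cat": [("cat_1", np.array([0.5, 0.5]))]}
result = search_similar(np.array([1.0, 0.05]), "dog", db, top_k=2)
print(result)  # ['dog_1', 'dog_2'] -- nearest first
```

Restricting the search to the target category's image set is what lets the category prediction narrow the candidate pool before any feature comparison is done.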
Through the various implementations provided in this embodiment, first, an image to be identified is acquired; then, the image to be identified is input into the image recognition model, which outputs the target features and the target category of the image to be identified; finally, similar images are searched in the image database by using the target features and the target category of the image to be identified. In this way, the target features obtained by the image recognition model reflect not only the global features of the image but also its local features, so that important features of the image to be identified are not missed; and searching for similar images by combining the target features with the target category effectively improves the accuracy of image recognition search even for an image to be identified with a complex scene, thereby improving the user experience of image recognition search.
Exemplary apparatus
Referring to fig. 6, a schematic structural diagram of an apparatus for training an image recognition network in an embodiment of the present application is shown. In the embodiment of the present application, the apparatus may specifically include:
a segmentation obtaining unit 601, configured to segment an original training image, obtain a plurality of training image blocks, and mark labels;
a rearrangement obtaining unit 602, configured to shuffle and rearrange the plurality of training image blocks based on an image salient region detection result of the original training image, to obtain a rearranged training image of the original training image;
a training obtaining unit 603, configured to train an image recognition network to obtain an image recognition model based on the original training image, the rearranged training image and the corresponding annotation data; the annotation data comprises a coarse-granularity image category label, a fine-granularity image category label, an image preprocessing category label and a training image block labeling sequence, wherein the image preprocessing category label comprises an original label or a rearranged label.
In an alternative implementation manner of the embodiment of the present application, the rearrangement obtaining unit 602 includes:
the detection obtaining subunit is used for detecting the image salient region of the original training image by using an attention heat map model to obtain an attention heat map of the original training image;
And the rearrangement obtaining subunit is used for carrying out disorder rearrangement on the plurality of training image blocks based on the heat degree of the attention heat map to obtain a rearranged training image of the original training image.
In an alternative implementation manner of the embodiment of the present application, the rearrangement obtaining unit 602 is specifically configured to:
shuffle and rearrange the plurality of training image blocks based on the image salient region detection result of the original training image, wherein the higher the salience of a position in the image salient region detection result, the lower the degree of shuffling of the training image block corresponding to that position, and the lower the salience, the higher the degree of shuffling.
In an optional implementation manner of the embodiment of the present application, the training obtaining unit 603 includes:
a first obtaining subunit, configured to obtain training features by using a feature extraction network in the image recognition network based on the original training image and the rearranged training image;
a second obtaining subunit, configured to obtain, based on the training feature, prediction data by using an identification network in the image identification network, where the prediction data includes a predicted coarse-granularity image category, a predicted fine-granularity image category, and a predicted image preprocessing category;
and the training obtaining subunit is used for training the network parameters of the image recognition network by using a network loss function based on the prediction data and the labeling data to obtain the image recognition model.
In an alternative implementation manner of the embodiment of the present application, the network loss function includes a coarse-granularity image class classification loss function, a fine-granularity image class classification loss function, an image preprocessing class classification loss function, and a rearrangement training image restoration to an original training image loss function.
Through the various implementations provided by this embodiment, first, an original training image is divided into a plurality of training image blocks, which are labeled; then, the plurality of training image blocks are shuffled and rearranged according to the image salient region detection result of the original training image to obtain a rearranged training image of the original training image; finally, the original training image, the rearranged training image and the corresponding annotation data are used as training data to train an image recognition network to obtain an image recognition model. The annotation data comprises a coarse-granularity image category label, a fine-granularity image category label, an image preprocessing category label and a training image block labeling sequence, wherein the image preprocessing category label comprises an original label or a rearranged label. In this way, the image salient region detection result of the original training image is used to shuffle and rearrange, in a targeted manner, the plurality of training image blocks obtained by dividing the original training image, so as to obtain the rearranged training image; the original training image and the rearranged training image are combined as input to the image recognition network, so that the image recognition network focuses on the local features of the image, and an image recognition model with enhanced local feature perception capability is obtained through training.
Referring to fig. 7, a schematic structural diagram of an apparatus for image recognition search in an embodiment of the present application is shown. In an embodiment of the present application, using the image recognition model described in the foregoing embodiment, the apparatus may specifically include:
an acquisition unit 701 for acquiring an image to be recognized;
an obtaining unit 702, configured to obtain a target feature and a target class of the image to be identified using the image identification model;
a searching unit 703, configured to search the image database for similar images of the image to be identified based on the target feature and the target category.
Through the various implementations provided in this embodiment, first, an image to be identified is acquired; then, the image to be identified is input into the image recognition model, which outputs the target features and the target category of the image to be identified; finally, similar images are searched in the image database by using the target features and the target category of the image to be identified. In this way, the target features obtained by the image recognition model reflect not only the global features of the image but also its local features, so that important features of the image to be identified are not missed; and searching for similar images by combining the target features with the target category effectively improves the accuracy of image recognition search even for an image to be identified with a complex scene, thereby improving the user experience of image recognition search.
Fig. 8 is a block diagram illustrating an apparatus 800 for training an image recognition network or image recognition search, according to an example embodiment. For example, apparatus 800 may be a mobile phone, computer, digital broadcast terminal, messaging device, game console, tablet device, medical device, exercise device, personal digital assistant, or the like.
Referring to fig. 8, apparatus 800 may include one or more of the following components: a processing component 802, a memory 804, a power component 806, a multimedia component 808, an audio component 810, an input/output (I/O) interface 812, a sensor component 814, and a communication component 816.
The processing component 802 generally controls overall operation of the apparatus 800, such as operations associated with display, telephone calls, data communications, camera operations, and recording operations. The processing component 802 may include one or more processors 820 to execute instructions to perform all or part of the steps of the methods described above. Further, the processing component 802 can include one or more modules that facilitate interactions between the processing component 802 and other components. For example, the processing component 802 may include a multimedia module to facilitate interaction between the multimedia component 808 and the processing component 802.
The memory 804 is configured to store various types of data to support operations at the device 800. Examples of such data include instructions for any application or method operating on the device 800, contact data, phonebook data, messages, pictures, videos, and the like. The memory 804 may be implemented by any type or combination of volatile or nonvolatile memory devices such as Static Random Access Memory (SRAM), electrically erasable programmable read-only memory (EEPROM), erasable programmable read-only memory (EPROM), programmable read-only memory (PROM), read-only memory (ROM), magnetic memory, flash memory, magnetic or optical disk.
The power supply component 806 provides power to the various components of the device 800. The power components 806 may include a power management system, one or more power sources, and other components associated with generating, managing, and distributing power for the device 800.
The multimedia component 808 includes a screen that provides an output interface between the apparatus 800 and the user. In some embodiments, the screen may include a Liquid Crystal Display (LCD) and a Touch Panel (TP). If the screen includes a touch panel, the screen may be implemented as a touch screen to receive input signals from the user. The touch panel includes one or more touch sensors to sense touches, swipes, and gestures on the touch panel. The touch sensor may sense not only the boundary of a touch or swipe action, but also the duration and pressure associated with the touch or swipe operation. In some embodiments, the multimedia component 808 includes a front camera and/or a rear camera. The front camera and/or the rear camera may receive external multimedia data when the apparatus 800 is in an operational mode, such as a shooting mode or a video mode. Each front camera and rear camera may be a fixed optical lens system or have focal length and optical zoom capability.
The audio component 810 is configured to output and/or input audio signals. For example, the audio component 810 includes a Microphone (MIC) configured to receive external audio signals when the device 800 is in an operational mode, such as a call mode, a recording mode, and a voice recognition mode. The received audio signals may be further stored in the memory 804 or transmitted via the communication component 816. In some embodiments, audio component 810 further includes a speaker for outputting audio signals.
The I/O interface 812 provides an interface between the processing component 802 and peripheral interface modules, which may be a keyboard, click wheel, buttons, etc. These buttons may include, but are not limited to: homepage button, volume button, start button, and lock button.
The sensor assembly 814 includes one or more sensors for providing status assessments of various aspects of the apparatus 800. For example, the sensor assembly 814 may detect an on/off state of the apparatus 800 and the relative positioning of components, such as the display and keypad of the apparatus 800; the sensor assembly 814 may also detect a change in position of the apparatus 800 or of one component of the apparatus 800, the presence or absence of user contact with the apparatus 800, the orientation or acceleration/deceleration of the apparatus 800, and a change in temperature of the apparatus 800. The sensor assembly 814 may include a proximity sensor configured to detect the presence of nearby objects without any physical contact. The sensor assembly 814 may also include a light sensor, such as a CMOS or CCD image sensor, for use in imaging applications. In some embodiments, the sensor assembly 814 may also include an acceleration sensor, a gyroscopic sensor, a magnetic sensor, a pressure sensor, or a temperature sensor.
The communication component 816 is configured to facilitate wired or wireless communication between the apparatus 800 and other devices. The apparatus 800 may access a wireless network based on a communication standard, such as WiFi, 2G or 3G, or a combination thereof. In one exemplary embodiment, the communication component 816 receives a broadcast signal or broadcast-related information from an external broadcast management system via a broadcast channel. In one exemplary embodiment, the communication component 816 further includes a Near Field Communication (NFC) module to facilitate short-range communication. For example, the NFC module may be implemented based on Radio Frequency Identification (RFID) technology, Infrared Data Association (IrDA) technology, Ultra Wideband (UWB) technology, Bluetooth (BT) technology, and other technologies.
In an exemplary embodiment, the apparatus 800 may be implemented by one or more Application Specific Integrated Circuits (ASICs), digital Signal Processors (DSPs), digital Signal Processing Devices (DSPDs), programmable Logic Devices (PLDs), field Programmable Gate Arrays (FPGAs), controllers, microcontrollers, microprocessors, or other electronic components for executing the methods described above.
In an exemplary embodiment, a non-transitory computer readable storage medium is also provided, such as memory 804 including instructions executable by processor 820 of apparatus 800 to perform the above-described method. For example, the non-transitory computer readable storage medium may be ROM, random Access Memory (RAM), CD-ROM, magnetic tape, floppy disk, optical data storage device, etc.
A non-transitory computer readable storage medium stores instructions that, when executed by a processor of a mobile terminal, cause the mobile terminal to perform a method of training an image recognition network, the method comprising:
dividing an original training image to obtain a plurality of training image blocks and marking labels;
shuffling and rearranging the plurality of training image blocks based on the image salient region detection result of the original training image to obtain a rearranged training image of the original training image;
training an image recognition network based on the original training image, the rearranged training image and the corresponding annotation data to obtain an image recognition model; the annotation data comprises a coarse-granularity image category label, a fine-granularity image category label, an image preprocessing category label and a training image block labeling sequence, wherein the image preprocessing category label comprises an original label or a rearranged label;
alternatively, the instructions cause the mobile terminal to perform a method of image recognition search, the method comprising:
acquiring an image to be identified;
obtaining target characteristics and target categories of the image to be identified by using the image identification model;
And searching similar images of the images to be identified in an image database based on the target characteristics and the target categories.
Fig. 9 is a schematic structural diagram of a server in an embodiment of the present application. The server 900 may vary considerably in configuration or performance and may include one or more central processing units (central processing units, CPU) 922 (e.g., one or more processors) and memory 932, one or more storage media 930 (e.g., one or more mass storage devices) storing applications 942 or data 944. Wherein the memory 932 and the storage medium 930 may be transitory or persistent. The program stored in the storage medium 930 may include one or more modules (not shown), each of which may include a series of instruction operations on a server. Still further, the central processor 922 may be arranged to communicate with a storage medium 930 to execute a series of instruction operations in the storage medium 930 on the server 900.
The server 900 may also include one or more power supplies 926, one or more wired or wireless network interfaces 950, one or more input/output interfaces 958, one or more keyboards 956, and/or one or more operating systems 941, such as Windows Server™, Mac OS X™, Unix™, Linux™, FreeBSD™, and the like.
In the present specification, the embodiments are described in a progressive manner; each embodiment focuses on its differences from the other embodiments, and for identical or similar parts the embodiments may refer to one another. Since the apparatus disclosed in an embodiment corresponds to the method disclosed in an embodiment, its description is relatively brief, and the relevant points can be found in the description of the method.
Those of skill would further appreciate that the various illustrative elements and algorithm steps described in connection with the embodiments disclosed herein may be implemented as electronic hardware, computer software, or combinations of both, and that the various illustrative elements and steps are described above generally in terms of functionality in order to clearly illustrate the interchangeability of hardware and software. Whether such functionality is implemented as hardware or software depends upon the particular application and design constraints imposed on the solution. Skilled artisans may implement the described functionality in varying ways for each particular application, but such implementation decisions should not be interpreted as causing a departure from the scope of the present application.
It is noted that relational terms such as first and second are used solely to distinguish one entity or action from another entity or action, without necessarily requiring or implying any actual such relationship or order between such entities or actions. The terms "comprises," "comprising," or any other variation thereof are intended to cover a non-exclusive inclusion, such that a process, method, article, or apparatus that comprises a list of elements includes not only those elements but may also include other elements not expressly listed or inherent to such a process, method, article, or apparatus. Without further limitation, an element defined by the phrase "comprising a …" does not exclude the presence of other like elements in a process, method, article, or apparatus that comprises the element.
The foregoing description covers only preferred embodiments of the present application and is not intended to limit it in any way. Although the present application has been described with reference to the preferred embodiments, any person skilled in the art may, using the methods and technical content disclosed above, make possible variations and modifications to the technical solution of the present application, or modify it into equivalent embodiments, without departing from the scope of the technical solution. Therefore, any simple modification, equivalent variation, or alteration of the above embodiments in accordance with the technical substance of the present application, which does not depart from the content of the technical solution, still falls within the scope of protection of the technical solution of the present application.

Claims (13)

1. A method of training an image recognition network, comprising:
segmenting an original training image to obtain a plurality of training image blocks, and marking the training image blocks with labels;
detecting the image salient region of the original training image by using an attention heat map model, to obtain an attention heat map of the original training image;
scrambling and rearranging the plurality of training image blocks based on the heat levels of the attention heat map, to obtain a rearranged training image of the original training image, wherein training image blocks corresponding to positions with higher heat in the attention heat map are scrambled to a lesser degree, and training image blocks corresponding to positions with lower heat are scrambled to a greater degree;
training an image recognition network based on the original training image, the rearranged training image, and the corresponding annotation data, to obtain an image recognition model; the annotation data comprises a coarse-granularity image category label, a fine-granularity image category label, an image preprocessing category label, and a training image block marking sequence, wherein the image preprocessing category label is either an original label or a rearranged label.
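The heat-guided scrambling of claim 1 can be sketched as follows. This is a minimal illustration, not the patented implementation: the function name, the 4×4 grid, and the quantile threshold are assumptions, and the claim's graded scrambling degree is simplified to keeping high-heat blocks fixed while permuting low-heat blocks among themselves. The returned block order plays the role of the claimed training image block marking sequence.

```python
import numpy as np

def scramble_by_heat(image, heat_map, grid=4, keep_quantile=0.75, rng=None):
    """Scramble image blocks: blocks whose mean attention heat is high
    stay in place; low-heat blocks are permuted among themselves.

    Returns the rearranged image and the block order recording which
    original block was placed at each grid position."""
    if rng is None:
        rng = np.random.default_rng(0)
    h, w = image.shape[:2]
    bh, bw = h // grid, w // grid
    # Mean heat per block, flattened row-major into grid*grid entries.
    heat = np.array([heat_map[r*bh:(r+1)*bh, c*bw:(c+1)*bw].mean()
                     for r in range(grid) for c in range(grid)])
    order = np.arange(grid * grid)
    # Blocks at or below the heat threshold are shuffled among themselves;
    # the rest keep their original positions.
    low = np.where(heat <= np.quantile(heat, keep_quantile))[0]
    order[low] = rng.permutation(order[low])
    out = np.empty_like(image)
    for dst, src in enumerate(order):
        r_d, c_d = divmod(dst, grid)
        r_s, c_s = divmod(int(src), grid)
        out[r_d*bh:(r_d+1)*bh, c_d*bw:(c_d+1)*bw] = \
            image[r_s*bh:(r_s+1)*bh, c_s*bw:(c_s+1)*bw]
    return out, order
```

A graded variant could instead permute each block within a radius inversely proportional to its heat; the keep/shuffle split above is merely the simplest version of the claimed idea.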
2. The method of claim 1, wherein training the image recognition network based on the original training image, the rearranged training image, and the corresponding annotation data to obtain the image recognition model comprises:
based on the original training image and the rearranged training image, training features are obtained by utilizing a feature extraction network in the image recognition network;
based on the training characteristics, obtaining prediction data by utilizing an identification network in the image identification network, wherein the prediction data comprises a prediction coarse-granularity image category, a prediction fine-granularity image category and a prediction image preprocessing category;
and training network parameters of the image recognition network by using a network loss function based on the prediction data and the labeling data to obtain the image recognition model.
3. The method of claim 2, wherein the network loss function comprises a coarse-granularity image category classification loss function, a fine-granularity image category classification loss function, an image preprocessing category classification loss function, and a loss function for restoring the rearranged training image to the original training image.
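The four-part network loss of claim 3 can be sketched as a weighted sum. This is an assumed formulation: the dictionary keys, the unit weights, and the squared-error form of the restoration term are illustrative choices, not taken from the patent.

```python
import numpy as np

def softmax_ce(logits, label):
    """Numerically stable cross-entropy for a single sample."""
    z = logits - logits.max()
    logp = z - np.log(np.exp(z).sum())
    return -logp[label]

def network_loss(preds, targets, weights=(1.0, 1.0, 1.0, 1.0)):
    """Composite loss: coarse-category CE + fine-category CE
    + preprocessing-type CE (original vs. rearranged)
    + a restoration term comparing the predicted block order against
    the marked original order."""
    l_coarse = softmax_ce(preds["coarse_logits"], targets["coarse"])
    l_fine = softmax_ce(preds["fine_logits"], targets["fine"])
    l_prep = softmax_ce(preds["prep_logits"], targets["prep"])
    l_restore = np.mean((preds["order"] - targets["order"]) ** 2)
    w = weights
    return w[0]*l_coarse + w[1]*l_fine + w[2]*l_prep + w[3]*l_restore
```

In practice the restoration term would more likely be a per-block classification over positions; the squared error above just keeps the sketch short.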
4. A method of image recognition searching, comprising:
acquiring an image to be identified;
obtaining target features and a target category of the image to be identified by using an image recognition model, the image recognition model being trained using the method of training an image recognition network as claimed in any one of claims 1 to 3;
and searching an image database for images similar to the image to be identified, based on the target features and the target category.
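The search step of claim 4 can be illustrated as category filtering followed by feature ranking. This is a hedged sketch: cosine similarity is chosen as one plausible metric, and the function and parameter names are assumptions, since the patent does not fix either.

```python
import numpy as np

def search_similar(query_feat, query_cat, db_feats, db_cats, top_k=5):
    """Return database indices of the top_k images sharing the query's
    target category, ranked by cosine similarity of target features."""
    idx = np.where(db_cats == query_cat)[0]
    if idx.size == 0:
        return []
    # Normalize so the dot product equals cosine similarity.
    q = query_feat / np.linalg.norm(query_feat)
    f = db_feats[idx] / np.linalg.norm(db_feats[idx], axis=1, keepdims=True)
    sims = f @ q
    best = idx[np.argsort(-sims)[:top_k]]
    return best.tolist()
```

Filtering on the predicted category first shrinks the candidate set, which is the practical benefit of recognizing both features and a category before ranking.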
5. An apparatus for training an image recognition network, comprising:
a segmentation obtaining unit, configured to segment the original training image to obtain a plurality of training image blocks and mark them with labels;
a detection obtaining subunit, configured to detect the image salient region of the original training image by using an attention heat map model, to obtain an attention heat map of the original training image;
a rearrangement obtaining subunit, configured to scramble and rearrange the plurality of training image blocks based on the heat levels of the attention heat map, to obtain a rearranged training image of the original training image, wherein training image blocks corresponding to positions with higher heat in the attention heat map are scrambled to a lesser degree, and training image blocks corresponding to positions with lower heat are scrambled to a greater degree;
and a training obtaining unit, configured to train an image recognition network based on the original training image, the rearranged training image, and the corresponding annotation data, to obtain an image recognition model; the annotation data comprises a coarse-granularity image category label, a fine-granularity image category label, an image preprocessing category label, and a training image block marking sequence, wherein the image preprocessing category label is either an original label or a rearranged label.
6. The apparatus of claim 5, wherein the training obtaining unit comprises:
a first obtaining subunit, configured to obtain training features by using a feature extraction network in the image recognition network, based on the original training image and the rearranged training image;
a second obtaining subunit, configured to obtain prediction data by using a recognition network in the image recognition network, based on the training features, where the prediction data comprises a predicted coarse-granularity image category, a predicted fine-granularity image category, and a predicted image preprocessing category;
and a training obtaining subunit, configured to train the network parameters of the image recognition network by using a network loss function, based on the prediction data and the annotation data, to obtain the image recognition model.
7. The apparatus of claim 6, wherein the network loss function comprises a coarse-granularity image category classification loss function, a fine-granularity image category classification loss function, an image preprocessing category classification loss function, and a loss function for restoring the rearranged training image to the original training image.
8. An apparatus for image recognition search, comprising:
the acquisition unit is used for acquiring the image to be identified;
an obtaining unit, configured to obtain target features and a target category of the image to be identified by using an image recognition model, the image recognition model being trained using the method of training an image recognition network according to any one of claims 1 to 3;
and a searching unit, configured to search an image database for images similar to the image to be identified, based on the target features and the target category.
9. An apparatus for training an image recognition network, comprising a memory, and one or more programs, wherein the one or more programs are stored in the memory and configured to be executed by one or more processors, the one or more programs comprising instructions for:
segmenting an original training image to obtain a plurality of training image blocks, and marking the training image blocks with labels;
detecting the image salient region of the original training image by using an attention heat map model, to obtain an attention heat map of the original training image;
scrambling and rearranging the plurality of training image blocks based on the heat levels of the attention heat map, to obtain a rearranged training image of the original training image, wherein training image blocks corresponding to positions with higher heat in the attention heat map are scrambled to a lesser degree, and training image blocks corresponding to positions with lower heat are scrambled to a greater degree;
training an image recognition network based on the original training image, the rearranged training image, and the corresponding annotation data, to obtain an image recognition model; the annotation data comprises a coarse-granularity image category label, a fine-granularity image category label, an image preprocessing category label, and a training image block marking sequence, wherein the image preprocessing category label is either an original label or a rearranged label.
10. The apparatus of claim 9, wherein training the image recognition network based on the original training image, the rearranged training image, and the corresponding annotation data to obtain the image recognition model comprises:
obtaining training features by using a feature extraction network in the image recognition network, based on the original training image and the rearranged training image;
obtaining prediction data by using a recognition network in the image recognition network, based on the training features, wherein the prediction data comprises a predicted coarse-granularity image category, a predicted fine-granularity image category, and a predicted image preprocessing category;
and training network parameters of the image recognition network by using a network loss function, based on the prediction data and the annotation data, to obtain the image recognition model.
11. The apparatus of claim 10, wherein the network loss function comprises a coarse-granularity image category classification loss function, a fine-granularity image category classification loss function, an image preprocessing category classification loss function, and a loss function for restoring the rearranged training image to the original training image.
12. An apparatus for image recognition searching, comprising a memory, and one or more programs, wherein the one or more programs are stored in the memory and configured to be executed by one or more processors, the one or more programs comprising instructions for:
Acquiring an image to be identified;
obtaining target features and a target category of the image to be identified by using an image recognition model, the image recognition model being trained using the method of training an image recognition network as claimed in any one of claims 1 to 3;
and searching an image database for images similar to the image to be identified, based on the target features and the target category.
13. A machine-readable medium having instructions stored thereon which, when executed by one or more processors, cause an apparatus to perform the method of training an image recognition network of any one of claims 1 to 3, or cause the apparatus to perform the method of image recognition search of claim 4.
CN202010332194.0A 2020-04-24 2020-04-24 Training image recognition network, image recognition searching method and related device Active CN111553372B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202010332194.0A CN111553372B (en) 2020-04-24 2020-04-24 Training image recognition network, image recognition searching method and related device


Publications (2)

Publication Number Publication Date
CN111553372A CN111553372A (en) 2020-08-18
CN111553372B true CN111553372B (en) 2023-08-08

Family

ID=72003970

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202010332194.0A Active CN111553372B (en) 2020-04-24 2020-04-24 Training image recognition network, image recognition searching method and related device

Country Status (1)

Country Link
CN (1) CN111553372B (en)

Families Citing this family (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN112364918B (en) * 2020-11-10 2024-04-02 深圳力维智联技术有限公司 Abnormality recognition method, terminal, and computer-readable storage medium
CN112561893A (en) * 2020-12-22 2021-03-26 平安银行股份有限公司 Picture matching method and device, electronic equipment and storage medium
CN112633276A (en) * 2020-12-25 2021-04-09 北京百度网讯科技有限公司 Training method, recognition method, device, equipment and medium
CN113256621B (en) * 2021-06-25 2021-11-02 腾讯科技(深圳)有限公司 Image processing method, image processing device, computer equipment and storage medium
CN113793323A (en) * 2021-09-16 2021-12-14 云从科技集团股份有限公司 Component detection method, system, equipment and medium

Citations (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN102567300A (en) * 2011-12-29 2012-07-11 方正国际软件有限公司 Picture document processing method and device
CN106445939A (en) * 2015-08-06 2017-02-22 阿里巴巴集团控股有限公司 Image retrieval, image information acquisition and image identification methods and apparatuses, and image identification system
CN107515895A (en) * 2017-07-14 2017-12-26 中国科学院计算技术研究所 A kind of sensation target search method and system based on target detection
CN109871461A (en) * 2019-02-13 2019-06-11 华南理工大学 The large-scale image sub-block search method to be reordered based on depth Hash network and sub-block
CN110059769A (en) * 2019-04-30 2019-07-26 福州大学 The semantic segmentation method and system rebuild are reset based on pixel for what streetscape understood
CN110263912A (en) * 2019-05-14 2019-09-20 杭州电子科技大学 A kind of image answering method based on multiple target association depth reasoning

Family Cites Families (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
JP6222285B1 (en) * 2016-06-01 2017-11-01 富士電機株式会社 Data processing apparatus, data processing method, and program


Also Published As

Publication number Publication date
CN111553372A (en) 2020-08-18

Similar Documents

Publication Publication Date Title
CN111553372B (en) Training image recognition network, image recognition searching method and related device
CN106557768B (en) Method and device for recognizing characters in picture
CN106776890B (en) Method and device for adjusting video playing progress
CN110517185B (en) Image processing method, device, electronic equipment and storage medium
CN105845124B (en) Audio processing method and device
CN108227950B (en) Input method and device
EP2998960A1 (en) Method and device for video browsing
CN112672208B (en) Video playing method, device, electronic equipment, server and system
CN106409317B (en) Method and device for extracting dream speech
CN113676671B (en) Video editing method, device, electronic equipment and storage medium
CN111523346B (en) Image recognition method and device, electronic equipment and storage medium
CN110764627B (en) Input method and device and electronic equipment
CN109034106B (en) Face data cleaning method and device
CN112464031A (en) Interaction method, interaction device, electronic equipment and storage medium
CN110781842A (en) Image processing method and device, electronic equipment and storage medium
CN110019907B (en) Image retrieval method and device
CN104715007A (en) User identification method and device
CN111629270A (en) Candidate item determination method and device and machine-readable medium
CN112784151B (en) Method and related device for determining recommended information
CN111526380B (en) Video processing method, video processing device, server, electronic equipment and storage medium
CN112784858B (en) Image data processing method and device and electronic equipment
CN111382367B (en) Search result ordering method and device
CN110175293B (en) Method and device for determining news venation and electronic equipment
CN113870195A (en) Target map detection model training and map detection method and device
CN112036247A (en) Expression package character generation method and device and storage medium

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant