CN111553372B - Training image recognition network, image recognition search method and related device

Info

Publication number
CN111553372B
Authority
CN
China
Prior art keywords
image
training
training image
network
original
Prior art date
Legal status
Active
Application number
CN202010332194.0A
Other languages
Chinese (zh)
Other versions
CN111553372A
Inventor
章书豪
夏雄尉
谢泽华
周泽南
苏雪峰
Current Assignee
Beijing Sogou Technology Development Co Ltd
Original Assignee
Beijing Sogou Technology Development Co Ltd
Priority date
Filing date
Publication date
Application filed by Beijing Sogou Technology Development Co Ltd filed Critical Beijing Sogou Technology Development Co Ltd
Priority to CN202010332194.0A
Publication of CN111553372A
Application granted
Publication of CN111553372B
Status: Active
Anticipated expiration

Classifications

    • G - PHYSICS
    • G06 - COMPUTING; CALCULATING OR COUNTING
    • G06V - IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V10/00 - Arrangements for image or video recognition or understanding
    • G06V10/40 - Extraction of image or video features
    • G06V10/46 - Descriptors for shape, contour or point-related descriptors, e.g. scale invariant feature transform [SIFT] or bags of words [BoW]; salient regional features
    • G06V10/462 - Salient features, e.g. scale invariant feature transforms [SIFT]
    • G - PHYSICS
    • G06 - COMPUTING; CALCULATING OR COUNTING
    • G06F - ELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00 - Pattern recognition
    • G06F18/20 - Analysing
    • G06F18/21 - Design or setup of recognition systems or techniques; extraction of features in feature space; blind source separation
    • G06F18/214 - Generating training patterns; bootstrap methods, e.g. bagging or boosting
    • G - PHYSICS
    • G06 - COMPUTING; CALCULATING OR COUNTING
    • G06T - IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T7/00 - Image analysis
    • G06T7/10 - Segmentation; edge detection
    • G06T7/11 - Region-based segmentation
    • Y - GENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
    • Y02 - TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
    • Y02D - CLIMATE CHANGE MITIGATION TECHNOLOGIES IN INFORMATION AND COMMUNICATION TECHNOLOGIES [ICT], I.E. INFORMATION AND COMMUNICATION TECHNOLOGIES AIMING AT THE REDUCTION OF THEIR OWN ENERGY USE
    • Y02D10/00 - Energy efficient computing, e.g. low power processors, power management or thermal management

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Physics & Mathematics (AREA)
  • General Physics & Mathematics (AREA)
  • Data Mining & Analysis (AREA)
  • Artificial Intelligence (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Multimedia (AREA)
  • Bioinformatics & Cheminformatics (AREA)
  • Bioinformatics & Computational Biology (AREA)
  • Evolutionary Biology (AREA)
  • Evolutionary Computation (AREA)
  • General Engineering & Computer Science (AREA)
  • Image Analysis (AREA)
  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)

Abstract

The application discloses a method for training an image recognition network, an image recognition search method and a related device. The training method comprises: dividing an original training image into a plurality of training image blocks and labeling each block with an index; shuffling the plurality of training image blocks according to an image salient region detection result of the original training image to obtain a rearranged training image of the original training image; and training the image recognition network, using as training data the original training image, the rearranged training image and the corresponding annotation data comprising a coarse-granularity image category label, a fine-granularity image category label, an image preprocessing category label and the training image block index sequence, to obtain an image recognition model. The image recognition search method comprises: acquiring an image to be identified; inputting the image to be identified into the image recognition model and outputting its target features and target category; and searching an image database for similar images using the target features and target category of the image to be identified.

Description

Training image recognition network, image recognition search method and related device
Technical Field
The present disclosure relates to the field of image processing technologies, and in particular to a method for training an image recognition network, a method for image recognition search, and a related device.
Background
With the rapid development of science and technology, a user can, in daily life, photograph an item of interest at hand and quickly obtain links to the same or similar items by searching with the item image, satisfying the user's need to find items of interest; searching with the item image is, in effect, performing an image recognition search on the item image.
At present, image recognition search methods generally use a deep learning model to extract global features of the item image for recognition and search. However, for an item image with a complex scene, for example one in which the item region is relatively small, the deep learning model can extract only global features of the item image, and the subsequent image recognition search attends only to those global features; important features of the item image are easily missed, greatly reducing the accuracy of image recognition search and thus degrading the user experience.
Disclosure of Invention
The technical problem to be solved by the application is to provide a method for training an image recognition network, an image recognition search method and a related device, so that the image recognition network attends to local image features and an image recognition model with enhanced local feature perception is obtained; even for images to be identified with complex scenes, the accuracy of image recognition search can be effectively improved, thereby improving the user experience of image recognition search.
In a first aspect, an embodiment of the present application provides a method for training an image recognition network, the method including:
dividing an original training image to obtain a plurality of training image blocks and labeling each block with an index;
shuffling the plurality of training image blocks based on an image salient region detection result of the original training image to obtain a rearranged training image of the original training image;
training an image recognition network based on the original training image, the rearranged training image and the corresponding annotation data to obtain an image recognition model; the annotation data comprise a coarse-granularity image category label, a fine-granularity image category label, an image preprocessing category label and the training image block index sequence, wherein the image preprocessing category label is either an original label or a rearranged label.
Optionally, the shuffling of the plurality of training image blocks based on the image salient region detection result of the original training image to obtain the rearranged training image of the original training image includes:
detecting the image salient region of the original training image by using an attention heat map model to obtain an attention heat map of the original training image;
and shuffling the plurality of training image blocks based on the heat values of the attention heat map to obtain the rearranged training image of the original training image.
Optionally, the shuffling of the plurality of training image blocks based on the image salient region detection result of the original training image includes:
shuffling such that, in the image salient region detection result of the original training image, training image blocks at positions of higher saliency are disturbed less and training image blocks at positions of lower saliency are disturbed more.
Optionally, the training of the image recognition network based on the original training image, the rearranged training image and the corresponding annotation data to obtain the image recognition model includes:
obtaining training features from the original training image and the rearranged training image by using a feature extraction network in the image recognition network;
obtaining prediction data from the training features by using a recognition network in the image recognition network, the prediction data comprising a predicted coarse-granularity image category, a predicted fine-granularity image category and a predicted image preprocessing category;
and training network parameters of the image recognition network with a network loss function based on the prediction data and the annotation data to obtain the image recognition model.
Optionally, the network loss function comprises a coarse-granularity image category classification loss, a fine-granularity image category classification loss, an image preprocessing category classification loss, and a loss for restoring the rearranged training image to the original training image.
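The four loss terms named above can be combined into a single training objective as a weighted sum. The following is a minimal sketch of such a combined loss, not the patent's actual implementation; the function names, the use of cross-entropy for the three classification terms, and the equal default weights are all assumptions.

```python
import math

def cross_entropy(probs, target):
    """Cross-entropy of one sample given predicted class probabilities."""
    return -math.log(max(probs[target], 1e-12))

def total_loss(p_coarse, y_coarse, p_fine, y_fine, p_pre, y_pre,
               restore_err, weights=(1.0, 1.0, 1.0, 1.0)):
    """Weighted sum of the four terms: coarse-granularity, fine-granularity
    and preprocessing-category classification losses, plus a restoration
    error for recovering the original block order from the rearranged
    image. Equal default weights are an assumption."""
    w1, w2, w3, w4 = weights
    return (w1 * cross_entropy(p_coarse, y_coarse)
            + w2 * cross_entropy(p_fine, y_fine)
            + w3 * cross_entropy(p_pre, y_pre)
            + w4 * restore_err)
```

With perfect predictions on all three classification heads and zero restoration error, the total loss is zero, which is the sanity check one would expect of any such composite objective.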
In a second aspect, an embodiment of the present application provides a method for image recognition searching, using the image recognition model according to any one of the first aspect, where the method includes:
acquiring an image to be identified;
obtaining target characteristics and target categories of the image to be identified by using the image identification model;
and searching an image database for images similar to the image to be identified based on the target features and the target category.
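As a sketch of how the target category and target features might be combined during the search, one can first filter the database by category and then rank the remaining entries by cosine similarity of the features. This is an illustrative NumPy implementation under an assumed data layout (parallel lists of feature vectors and category labels), not the patent's actual retrieval system.

```python
import numpy as np

def search_similar(query_feat, query_cat, db_feats, db_cats, top_k=5):
    """Return indices of the top_k database images whose category matches
    the query category, ranked by cosine similarity of features."""
    feats = np.asarray(db_feats, dtype=float)
    cats = np.asarray(db_cats)
    idx = np.where(cats == query_cat)[0]          # category filter
    q = np.asarray(query_feat, dtype=float)
    sims = feats[idx] @ q / (
        np.linalg.norm(feats[idx], axis=1) * np.linalg.norm(q) + 1e-12)
    order = idx[np.argsort(-sims)]                # most similar first
    return order[:top_k].tolist()
```

Filtering by category before ranking keeps the similarity comparison inside the predicted class, which is one plausible way to use both outputs of the recognition model together.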
In a third aspect, an embodiment of the present application provides an apparatus for training an image recognition network, the apparatus including:
the segmentation obtaining unit is used for segmenting the original training image to obtain a plurality of training image blocks and labeling each block with an index;
a rearrangement obtaining unit, configured to shuffle the plurality of training image blocks based on an image salient region detection result of the original training image to obtain a rearranged training image of the original training image;
a training obtaining unit, configured to train an image recognition network based on the original training image, the rearranged training image and the corresponding annotation data to obtain an image recognition model; the annotation data comprise a coarse-granularity image category label, a fine-granularity image category label, an image preprocessing category label and the training image block index sequence, wherein the image preprocessing category label is either an original label or a rearranged label.
Optionally, the rearrangement obtaining unit includes:
the detection obtaining subunit is used for detecting the image salient region of the original training image by using an attention heat map model to obtain an attention heat map of the original training image;
and the rearrangement obtaining subunit is used for shuffling the plurality of training image blocks based on the heat values of the attention heat map to obtain a rearranged training image of the original training image.
Optionally, the rearrangement obtaining unit is specifically configured to:
shuffle the plurality of training image blocks such that, in the image salient region detection result of the original training image, training image blocks at positions of higher saliency are disturbed less and training image blocks at positions of lower saliency are disturbed more.
Optionally, the training obtaining unit includes:
a first obtaining subunit, configured to obtain training features by using a feature extraction network in the image recognition network based on the original training image and the rearranged training image;
a second obtaining subunit, configured to obtain, based on the training feature, prediction data by using an identification network in the image identification network, where the prediction data includes a predicted coarse-granularity image category, a predicted fine-granularity image category, and a predicted image preprocessing category;
and the training obtaining subunit is used for training the network parameters of the image recognition network by using a network loss function based on the prediction data and the labeling data to obtain the image recognition model.
Optionally, the network loss function comprises a coarse-granularity image category classification loss, a fine-granularity image category classification loss, an image preprocessing category classification loss, and a loss for restoring the rearranged training image to the original training image.
In a fourth aspect, an embodiment of the present application provides an apparatus for image recognition search, using the image recognition model according to any one of the first aspect, where the apparatus includes:
the acquisition unit is used for acquiring the image to be identified;
The obtaining unit is used for obtaining target characteristics and target categories of the image to be identified by utilizing the image identification model;
and the searching unit is used for searching an image database for images similar to the image to be identified based on the target features and the target category.
In a fifth aspect, embodiments of the present application provide an apparatus for training an image recognition network, the apparatus comprising a memory, and one or more programs, wherein the one or more programs are stored in the memory and configured to be executed by one or more processors, the one or more programs comprising instructions for:
dividing an original training image to obtain a plurality of training image blocks and labeling each block with an index;
shuffling the plurality of training image blocks based on an image salient region detection result of the original training image to obtain a rearranged training image of the original training image;
training an image recognition network based on the original training image, the rearranged training image and the corresponding annotation data to obtain an image recognition model; the annotation data comprise a coarse-granularity image category label, a fine-granularity image category label, an image preprocessing category label and the training image block index sequence, wherein the image preprocessing category label is either an original label or a rearranged label.
In a sixth aspect, embodiments of the present application provide an apparatus for image recognition searching, the apparatus comprising a memory, and one or more programs, wherein the one or more programs are stored in the memory and configured to be executed by one or more processors, the one or more programs comprising instructions for:
acquiring an image to be identified;
obtaining target characteristics and target categories of the image to be identified by using the image identification model;
and searching an image database for images similar to the image to be identified based on the target features and the target category.
In a seventh aspect, embodiments of the present application provide a machine-readable medium having instructions stored thereon which, when executed by one or more processors, cause an apparatus to perform the method of training an image recognition network according to any one of the first aspect; or cause the apparatus to perform the method of image recognition search according to the second aspect.
Compared with the prior art, the application has at least the following advantages:
By adopting the technical solution of the embodiments of the application, an original training image is first divided into a plurality of training image blocks, each labeled with an index; then, according to the image salient region detection result of the original training image, the plurality of training image blocks are shuffled to obtain a rearranged training image of the original training image; finally, the original training image, the rearranged training image and the corresponding annotation data are used as training data to train an image recognition network and obtain an image recognition model. The annotation data comprise a coarse-granularity image category label, a fine-granularity image category label, an image preprocessing category label and the training image block index sequence, wherein the image preprocessing category label is either an original label or a rearranged label. In this way, the image salient region detection result of the original training image guides how the segmented training image blocks are shuffled into the rearranged training image; using the original training image together with the rearranged training image as network input makes the image recognition network attend to local image features, so that training yields an image recognition model with enhanced local feature perception.
In addition, by adopting the technical solution of the embodiments of the application, an image to be identified is first acquired; the image to be identified is then input into the image recognition model, which outputs its target features and target category; finally, similar images are searched for in the image database using the target features and target category of the image to be identified. Because the target features obtained by the image recognition model capture not only global but also local image features, important features of the image to be identified are not missed; combining the target features with the target category in the search effectively improves the accuracy of image recognition search even for images to be identified with complex scenes, thereby improving the user experience of image recognition search.
Drawings
In order to more clearly illustrate the technical solutions of the embodiments of the present application, the drawings that are needed in the description of the embodiments of the present application will be briefly described below, and it is obvious that the drawings in the following description are only some embodiments described in the present application, and other drawings may be obtained according to these drawings without inventive effort for a person of ordinary skill in the art.
Fig. 1 is a schematic diagram of a system framework for an application scenario in an embodiment of the present application;
fig. 2 is a flowchart of a method for training an image recognition network according to an embodiment of the present application;
FIG. 3 is an illustration of an original training image and an attention heat map of the original training image provided in an embodiment of the present application;
FIG. 4 is a schematic diagram of an original training image and a rearranged training image of the original training image according to an embodiment of the present disclosure;
fig. 5 is a flowchart of a method for image recognition search according to an embodiment of the present application;
fig. 6 is a schematic structural diagram of an apparatus for training an image recognition network according to an embodiment of the present application;
fig. 7 is a schematic structural diagram of an apparatus for image recognition search according to an embodiment of the present application;
FIG. 8 is a schematic diagram of an apparatus for training an image recognition network or image recognition search according to an embodiment of the present application;
fig. 9 is a schematic structural diagram of a server according to an embodiment of the present application.
Detailed Description
In order to make the present application solution better understood by those skilled in the art, the following description will clearly and completely describe the technical solution in the embodiments of the present application with reference to the accompanying drawings in the embodiments of the present application, and it is apparent that the described embodiments are only some embodiments of the present application, not all embodiments. All other embodiments, which can be made by one of ordinary skill in the art without undue burden from the present disclosure, are within the scope of the present disclosure.
Searching with an item image is actually performing an image recognition search on the item image. In the prior art, image recognition search methods generally use a deep learning model to extract global features of the item image for recognition and search. However, the inventors found that, for an item image with a complex scene, the deep learning model can extract only global features of the item image, and the subsequent image recognition search attends only to those global features, so important features of the item image are easily missed, the accuracy of image recognition search is low, and the user experience of image recognition search is affected.
To solve this problem, in the embodiments of the present application, the original training image is divided into a plurality of training image blocks, each labeled with an index; the plurality of training image blocks are shuffled according to the image salient region detection result of the original training image to obtain a rearranged training image of the original training image; and the original training image, the rearranged training image and the corresponding annotation data are used as training data to train an image recognition network and obtain an image recognition model. The annotation data comprise a coarse-granularity image category label, a fine-granularity image category label, an image preprocessing category label and the training image block index sequence, wherein the image preprocessing category label is either an original label or a rearranged label. In this way, the image salient region detection result of the original training image is used to shuffle the segmented training image blocks in a targeted manner into a rearranged training image; using the original training image together with the rearranged training image as network input makes the image recognition network attend to local image features, and training yields an image recognition model with enhanced local feature perception.
In addition, in the embodiments of the present application, an image to be identified is acquired; the image to be identified is input into the image recognition model, which outputs its target features and target category; and similar images are searched for in the image database using the target features and target category of the image to be identified. Because the target features obtained by the image recognition model capture not only global but also local image features, important features of the image to be identified are not missed; combining the target features with the target category in the search effectively improves the accuracy of image recognition search even for images to be identified with complex scenes, thereby improving the user experience of image recognition search.
For example, one scenario of the embodiments of the present application is shown in fig. 1, which includes a terminal device 101, a processor 102 and an image database 103; the terminal device 101 may be a personal computer or another terminal such as a mobile phone or tablet computer. The terminal device 101 collects a large number of original training images to form a training set; the processor 102 acquires the original training images from the terminal device 101 and obtains an image recognition model by the method for training an image recognition network in the embodiments of the application. After the terminal device 101 sends an image to be identified to the processor 102, the processor 102 searches the image database 103 for images similar to the image to be identified by the image recognition search method in the embodiments of the application.
It will be appreciated that, although in the above application scenario the actions of the embodiments of the present application are described as being performed by the processor 102, the present application does not restrict the executing entity, as long as the actions disclosed in the embodiments of the present application are performed.
It is understood that the above scenario is only one example of a scenario provided in the embodiments of the present application, and the embodiments of the present application are not limited to this scenario.
Specific implementation manners of the training image recognition network, the image recognition searching method and the related devices in the embodiments of the present application are described in detail below by way of embodiments with reference to the accompanying drawings.
Exemplary method
Referring to fig. 2, a flowchart of a method for training an image recognition network in an embodiment of the present application is shown. In this embodiment, the method may include, for example, the steps of:
step 201: the original training image is segmented, a plurality of training image blocks are obtained, and labels are marked.
It should be noted that, in the prior art, the deep learning model learns only from the original training image and mainly attends to global image features; for images with complex scenes it therefore extracts only global features, attends only to them during subsequent recognition and search, and easily misses important local features. In the embodiments of the application, the original training image is instead divided into a plurality of training image blocks and recombined into a new training image, so that learning from the new training image attends to local image features on top of the global features learned from the original training image. The original training image is therefore first segmented into a plurality of training image blocks, and each block is labeled with an index, so that any subsequent recombination of the blocks defines the index sequence of the training image blocks in the new training image. The number of training image blocks may be preset according to the segmentation requirements of a specific scene, for example 9, 16, 25 or 36.
As an example, with the preset number of training image blocks set to 9, the original training image is uniformly segmented into 9 training image blocks, which are labeled 1, 2, 3, 4, 5, 6, 7, 8 and 9 in sequence.
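The segmentation and labeling just described can be sketched as follows: a minimal NumPy example assuming a uniform 3x3 grid, row-major indices 1..9, and an image whose sides are divisible by the grid size (the function name and image size are illustrative, not from the patent).

```python
import numpy as np

def split_into_blocks(image: np.ndarray, grid: int = 3):
    """Split an H x W x C image into grid*grid equal training image blocks.

    Returns the blocks in row-major order together with their index
    labels 1..grid*grid. Assumes H and W are divisible by `grid`.
    """
    h, w = image.shape[0] // grid, image.shape[1] // grid
    blocks, labels = [], []
    for r in range(grid):
        for c in range(grid):
            blocks.append(image[r * h:(r + 1) * h, c * w:(c + 1) * w])
            labels.append(r * grid + c + 1)
    return blocks, labels

# A 96x96 RGB image yields nine 32x32 blocks labeled 1..9.
img = np.zeros((96, 96, 3), dtype=np.uint8)
blocks, labels = split_into_blocks(img)
```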
Step 202: shuffle the plurality of training image blocks based on the image salient region detection result of the original training image to obtain a rearranged training image of the original training image.
It should be noted that, after the plurality of training image blocks are obtained in step 201, they are recombined in a shuffled order into a new training image, namely the rearranged training image of the original training image. Relative to the original training image, the rearranged training image presents the salient image region more prominently and delimits it more clearly, so that subsequent learning on the new training image can focus on the features of the salient region. In the embodiments of the application, the new training image is obtained by shuffling the plurality of training image blocks according to the image salient region detection result of the original training image, and the result is recorded as the rearranged training image of the original training image.
As an example, continuing the case of an original training image divided into 9 training image blocks labeled 1, 2, 3, 4, 5, 6, 7, 8 and 9, shuffling them according to the image salient region detection result of the original training image may yield a rearranged training image whose block index sequence is 1, 3, 5, 7, 2, 4, 6, 8 and 9.
When step 202 is implemented, the image salient region detection result of the original training image is obtained first, usually by performing image salient region detection on the original training image; the plurality of training image blocks are then shuffled according to that detection result to obtain the rearranged training image of the original training image. The shuffling principle may be: in the image salient region detection result of the original training image, training image blocks at positions of higher saliency are disturbed less, and training image blocks at positions of lower saliency are disturbed more.
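One way to realize this saliency-controlled disturbance is to jitter each block's sort position by noise whose amplitude shrinks as the block's saliency grows, then reorder the blocks by the jittered positions. This is a hypothetical sketch, not the patent's method; the jitter scheme and all parameter names are assumptions.

```python
import numpy as np

def saliency_weighted_shuffle(labels, saliency, max_jitter=4.0, seed=0):
    """Reorder block labels so that highly salient blocks move little.

    labels   : block indices in their original (row-major) order.
    saliency : per-block saliency in [0, 1]; higher means more salient.
    Each block gets a sort key equal to its original position plus
    uniform noise scaled by (1 - saliency); reordering by the keys
    disturbs low-saliency blocks more than high-saliency ones.
    """
    rng = np.random.default_rng(seed)
    keys = [i + rng.uniform(-1.0, 1.0) * max_jitter * (1.0 - s)
            for i, s in enumerate(saliency)]
    order = np.argsort(keys, kind="stable")
    return [labels[i] for i in order]
```

With saliency 1.0 everywhere the noise amplitude is zero and the order is unchanged, which matches the stated principle at its extreme.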
Thus, in an optional implementation of the embodiment of the present application, step 202, in which the plurality of training image blocks are shuffled and rearranged based on the image salient region detection result of the original training image to obtain the rearranged training image of the original training image, may include, but is not limited to, the following steps:
Step A: perform image salient region detection on the original training image to obtain the image salient region detection result.
Step B: shuffle and rearrange the plurality of training image blocks based on the image salient region detection result to obtain the rearranged training image of the original training image.
It should be further noted that the attention heat map model is a convolutional neural network visualization tool: inputting an image into the attention heat map model outputs an attention heat map that shows the image salient regions of that image prominently and clearly, and the key regions of the image can be identified by inspecting the attention heat map. Thus, for step A in the embodiments of the present application, the original training image may be input into the attention heat map model to output the attention heat map of the original training image. That is, in an optional implementation of the embodiment of the present application, step A, performing image salient region detection on the original training image to obtain the image salient region detection result, may specifically be, for example: performing image salient region detection on the original training image by using an attention heat map model to obtain the attention heat map of the original training image. Of course, in the embodiment of the present application, image salient region detection modes other than the attention heat map model may also be adopted, in which case the obtained image salient region detection result is correspondingly a detection result other than an attention heat map.
As an example, fig. 3 shows an original training image and the attention heat map of the original training image. The left image is the original training image, and the right image is the attention heat map of the left image, obtained by inputting the left image into the attention heat map model. The right image shows the image salient region of the left image prominently and clearly, and the key region of the left image can be identified by inspecting the right image.
Correspondingly, when the image salient region detection result is specifically an attention heat map, the general principle is that the higher the heat of a position, the lower the degree of shuffling applied to the corresponding training image block, and the lower the heat, the higher the degree of shuffling; the rearranged training image is then obtained from the plurality of training image blocks shuffled and rearranged according to the heat of the attention heat map. Therefore, in an optional implementation of the embodiment of the present application, step B, shuffling and rearranging the plurality of training image blocks based on the image salient region detection result to obtain the rearranged training image of the original training image, may be, for example: shuffling and rearranging the plurality of training image blocks based on the heat of the attention heat map to obtain the rearranged training image of the original training image.
As an example, on the basis of fig. 3, fig. 4 shows an original training image and a schematic diagram of the rearranged training image of the original training image. The left image is the original training image, and the right image is the rearranged training image of the left image, obtained by dividing the left image into a plurality of training image blocks and then shuffling and rearranging the blocks according to the right image in fig. 3.
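The heat-guided shuffle described above can be sketched as follows. This is a non-authoritative sketch: the embodiment does not specify how heat is mapped to a degree of shuffling, so here each block's sort key is its original index plus random noise scaled by (1 - normalized block heat), which keeps hot blocks close to their original positions and displaces cold blocks more strongly; the function name and parameters are illustrative:

```python
import numpy as np

def heat_weighted_shuffle(heat_map, grid=3, strength=2.0, seed=0):
    """Return a shuffled block order in which high-heat blocks move little.

    heat_map: 2-D array of per-pixel attention heat, same size as the image.
    Each block's heat is the mean heat over its pixels; a block's sort key
    is its original index plus noise scaled by (1 - normalized heat), so
    the degree of shuffling is low for hot blocks and high for cold ones.
    """
    rng = np.random.default_rng(seed)
    h, w = heat_map.shape
    bh, bw = h // grid, w // grid
    heat = np.array([heat_map[r*bh:(r+1)*bh, c*bw:(c+1)*bw].mean()
                     for r in range(grid) for c in range(grid)])
    heat = (heat - heat.min()) / (np.ptp(heat) + 1e-8)  # normalize to [0, 1]
    noise = rng.standard_normal(grid * grid)
    keys = np.arange(grid * grid) + strength * (1.0 - heat) * noise
    return np.argsort(keys)  # block indices in their rearranged order

# A heat map whose center block is hottest: the center block (index 4)
# receives no noise at all, while the cold border blocks are shuffled.
heat_map = np.zeros((6, 6))
heat_map[2:4, 2:4] = 1.0
order = heat_weighted_shuffle(heat_map)
```

Any monotone mapping from heat to displacement would satisfy the stated principle; the noisy-key sort is just one convenient choice.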
Step 203: train an image recognition network to obtain an image recognition model based on the original training image, the rearranged training image and the corresponding annotation data; the annotation data comprises a coarse-granularity image category label, a fine-granularity image category label, an image preprocessing category label and a training image block labeling sequence, wherein the image preprocessing category label comprises an original label or a rearranged label.
It should be noted that, after the rearranged training image of the original training image is obtained in steps 201-202, not only the original training image but also the rearranged training image is used as input to the image recognition network. In this way, on the basis of learning the original training image and focusing on its global features, the image recognition network can also learn the rearranged training image and focus on its local features, so that the perceptibility of the obtained image recognition model to local image features is enhanced. For an original training image or a rearranged training image, the corresponding annotation data comprises a coarse-granularity image category label, a fine-granularity image category label, an image preprocessing category label and a training image block labeling sequence. The coarse-granularity image category label is obtained by classifying images at coarse granularity, and the fine-granularity image category label is obtained by classifying images at fine granularity; that is, the image category represented by the fine-granularity label is smaller and finer in granularity than that represented by the coarse-granularity label. The image preprocessing category label comprises an original label or a rearranged label.
In an embodiment of the present application, the image recognition network includes a feature extraction network and a recognition network. When step 203 is implemented, firstly, the original training image and the rearranged training image are input into the feature extraction network to output training features; then, the training features are input into the recognition network to output a predicted coarse-granularity image category, a predicted fine-granularity image category and a predicted image preprocessing category as prediction data; finally, the network parameters of the image recognition network are trained by gradient back-propagation using the network loss function over the prediction data and the annotation data until training is completed, and the trained image recognition network is taken as the image recognition model. That is, in an optional implementation of the embodiment of the present application, step 203, training the image recognition network to obtain the image recognition model based on the original training image, the rearranged training image and the corresponding annotation data, may include, for example, the following steps C-E:
Step C: based on the original training image and the rearranged training image, obtain training features by using the feature extraction network in the image recognition network.
Step D: based on the training features, obtain prediction data by using the recognition network in the image recognition network, wherein the prediction data comprises a predicted coarse-granularity image category, a predicted fine-granularity image category and a predicted image preprocessing category.
Step E: train the network parameters of the image recognition network by using a network loss function based on the prediction data and the annotation data to obtain the image recognition model.
In the embodiment of the present application, coarse-granularity image classification and fine-granularity image classification need to be performed on the original training image and the rearranged training image, whether each image belongs to the original category or the rearranged category needs to be determined, and the rearranged training image needs to be reordered to restore the original training image. Therefore, 4 loss functions, namely a coarse-granularity image category classification loss function, a fine-granularity image category classification loss function, an image preprocessing category classification loss function, and a loss function for restoring the rearranged training image to the original training image, are combined to form the network loss function of the image recognition network. That is, in an optional implementation of the embodiment of the present application, the network loss function includes a coarse-granularity image category classification loss function, a fine-granularity image category classification loss function, an image preprocessing category classification loss function, and a loss function for restoring the rearranged training image to the original training image.
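The combined network loss can be sketched as follows. The embodiment does not give the exact form of each term; as an assumption, this sketch uses softmax cross-entropy for the three classification heads and a per-position cross-entropy over predicted block indices for the loss of restoring the rearranged training image to the original training image, summed with illustrative weights (all names here are ours):

```python
import numpy as np

def cross_entropy(logits, label):
    """Softmax cross-entropy for a single example."""
    z = logits - logits.max()
    log_probs = z - np.log(np.exp(z).sum())
    return -log_probs[label]

def network_loss(coarse_logits, coarse_label,
                 fine_logits, fine_label,
                 prep_logits, prep_label,
                 order_logits, block_order,
                 weights=(1.0, 1.0, 1.0, 1.0)):
    """Sum of the four loss terms described above.

    order_logits: (num_blocks, num_blocks) scores, row i being the
    predicted original index of the block now at position i; block_order
    is the annotated training image block labeling sequence used to
    restore the original training image.
    """
    restore = sum(cross_entropy(order_logits[i], block_order[i])
                  for i in range(len(block_order)))
    terms = (cross_entropy(coarse_logits, coarse_label),
             cross_entropy(fine_logits, fine_label),
             cross_entropy(prep_logits, prep_label),
             restore)
    return sum(w * t for w, t in zip(weights, terms))

# Smoke check with made-up logits: a correctly restored block order
# yields a smaller combined loss than a wrong one.
loss = network_loss(np.array([2.0, 0.1]), 0,
                    np.array([0.2, 1.5, 0.3]), 1,
                    np.array([1.0, -1.0]), 0,
                    np.eye(4) * 5.0, [0, 1, 2, 3])
```

In practice each term would be averaged over a batch and the weights tuned; the equal weighting above is purely illustrative.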
Through the various implementations provided by this embodiment, first, an original training image is divided into a plurality of training image blocks, which are labeled; then, the plurality of training image blocks are shuffled and rearranged according to the image salient region detection result of the original training image to obtain a rearranged training image of the original training image; finally, the original training image, the rearranged training image and the corresponding annotation data are used as training data to train an image recognition network to obtain an image recognition model. The annotation data comprises a coarse-granularity image category label, a fine-granularity image category label, an image preprocessing category label and a training image block labeling sequence, wherein the image preprocessing category label comprises an original label or a rearranged label. In this way, the image salient region detection result of the original training image is used to shuffle and rearrange, in a targeted manner, the plurality of training image blocks obtained by dividing the original training image, so as to obtain the rearranged training image; the original training image and the rearranged training image are combined as input to the image recognition network, so that the image recognition network focuses on the local features of the image, and an image recognition model with enhanced local feature perception capability is obtained through training.
It should be noted that, on the basis of the above embodiment, for an image to be identified with a relatively complex scene, important features of the image are easily missed. To avoid this, after the image to be identified is acquired, it may be input into the image recognition model; even if the scene of the image to be identified is relatively complex, the image recognition model can focus both on its global features and on its local features, so as to obtain the target features and the target category of the image to be identified. To effectively improve the accuracy of image recognition search, similar images of the image to be identified may then be searched in an image database by using the target features and the target category.
Referring to fig. 5, a flowchart of a method for image recognition search in an embodiment of the present application is shown. In an embodiment of the present application, with the image recognition model described in the foregoing embodiment, the method may include, for example, the following steps:
Step 501: acquire an image to be identified.
Step 502: obtain the target features and the target category of the image to be identified by using the image recognition model.
In the embodiment of the present application, the image to be identified is first input into the feature extraction network in the image recognition model to obtain the target features of the image to be identified; the target features are then input into the recognition network in the image recognition model to obtain the target category of the image to be identified.
Step 503: search the image database for similar images of the image to be identified based on the target features and the target category.
In the embodiment of the present application, for example, an image set corresponding to the target category may be determined in the image database, the similarity between the target features and the features of each image in the image set may be calculated, and the similar images of the image to be identified may be determined based on the calculated similarities.
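The category-filtered similarity search can be sketched as follows. This is a minimal sketch assuming the image database is organized as a mapping from category to (image id, feature vector) pairs and that cosine similarity is used; both assumptions are ours, as the embodiment only requires computing a similarity within the target category's image set:

```python
import numpy as np

def search_similar(target_feature, target_category, database, top_k=3):
    """Search the image set of target_category for the most similar images.

    database: dict mapping category -> list of (image_id, feature) pairs.
    Similarity is cosine similarity between feature vectors.
    """
    candidates = database.get(target_category, [])

    def cosine(a, b):
        return float(np.dot(a, b) /
                     (np.linalg.norm(a) * np.linalg.norm(b) + 1e-8))

    scored = sorted(((cosine(target_feature, f), img_id)
                     for img_id, f in candidates), reverse=True)
    return [img_id for _, img_id in scored[:top_k]]

# Toy database: two categories, three images in the "dog" image set.
db = {"dog": [("dog_1", np.array([1.0, 0.0])),
              ("dog_2", np.array([0.9, 0.1])),
              ("dog_3", np.array([0.0, 1.0]))],
      "cat": [("cat_1", np.array([0.5, 0.5]))]}
result = search_similar(np.array([1.0, 0.05]), "dog", db, top_k=2)
print(result)  # ['dog_1', 'dog_2'] -- nearest first
```

Restricting the search to the target category's image set is what lets the category prediction narrow the candidate pool before any feature comparison is done.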
Through the various implementations provided in this embodiment, first, an image to be identified is acquired; then, the image to be identified is input into the image recognition model, which outputs the target features and the target category of the image to be identified; finally, similar images are searched in the image database by using the target features and the target category of the image to be identified. In this way, the target features obtained by the image recognition model reflect not only the global features of the image but also its local features, so that important features of the image to be identified are not missed; and searching for similar images by combining the target features with the target category effectively improves the accuracy of image recognition search even for an image to be identified with a complex scene, thereby improving the user experience of image recognition search.
Exemplary apparatus
Referring to fig. 6, a schematic structural diagram of an apparatus for training an image recognition network in an embodiment of the present application is shown. In the embodiment of the present application, the apparatus may specifically include:
a segmentation obtaining unit 601, configured to segment an original training image, obtain a plurality of training image blocks, and mark labels;
a rearrangement obtaining unit 602, configured to shuffle and rearrange the plurality of training image blocks based on an image salient region detection result of the original training image, to obtain a rearranged training image of the original training image;
a training obtaining unit 603, configured to train an image recognition network to obtain an image recognition model based on the original training image, the rearranged training image and the corresponding annotation data; the annotation data comprises a coarse-granularity image category label, a fine-granularity image category label, an image preprocessing category label and a training image block labeling sequence, wherein the image preprocessing category label comprises an original label or a rearranged label.
In an alternative implementation manner of the embodiment of the present application, the rearrangement obtaining unit 602 includes:
the detection obtaining subunit is used for detecting the image salient region of the original training image by using an attention heat map model to obtain an attention heat map of the original training image;
And the rearrangement obtaining subunit is used for carrying out disorder rearrangement on the plurality of training image blocks based on the heat degree of the attention heat map to obtain a rearranged training image of the original training image.
In an alternative implementation manner of the embodiment of the present application, the rearrangement obtaining unit 602 is specifically configured to:
shuffle and rearrange the plurality of training image blocks based on the image salient region detection result of the original training image, wherein the higher the salience of a position in the image salient region detection result, the lower the degree of shuffling of the training image block corresponding to that position, and the lower the salience, the higher the degree of shuffling.
In an optional implementation manner of the embodiment of the present application, the training obtaining unit 603 includes:
a first obtaining subunit, configured to obtain training features by using a feature extraction network in the image recognition network based on the original training image and the rearranged training image;
a second obtaining subunit, configured to obtain, based on the training feature, prediction data by using an identification network in the image identification network, where the prediction data includes a predicted coarse-granularity image category, a predicted fine-granularity image category, and a predicted image preprocessing category;
and the training obtaining subunit is used for training the network parameters of the image recognition network by using a network loss function based on the prediction data and the labeling data to obtain the image recognition model.
In an alternative implementation manner of the embodiment of the present application, the network loss function includes a coarse-granularity image class classification loss function, a fine-granularity image class classification loss function, an image preprocessing class classification loss function, and a rearrangement training image restoration to an original training image loss function.
Through the various implementations provided by this embodiment, first, an original training image is divided into a plurality of training image blocks, which are labeled; then, the plurality of training image blocks are shuffled and rearranged according to the image salient region detection result of the original training image to obtain a rearranged training image of the original training image; finally, the original training image, the rearranged training image and the corresponding annotation data are used as training data to train an image recognition network to obtain an image recognition model. The annotation data comprises a coarse-granularity image category label, a fine-granularity image category label, an image preprocessing category label and a training image block labeling sequence, wherein the image preprocessing category label comprises an original label or a rearranged label. In this way, the image salient region detection result of the original training image is used to shuffle and rearrange, in a targeted manner, the plurality of training image blocks obtained by dividing the original training image, so as to obtain the rearranged training image; the original training image and the rearranged training image are combined as input to the image recognition network, so that the image recognition network focuses on the local features of the image, and an image recognition model with enhanced local feature perception capability is obtained through training.
Referring to fig. 7, a schematic structural diagram of an apparatus for image recognition search in an embodiment of the present application is shown. In an embodiment of the present application, using the image recognition model described in the foregoing embodiment, the apparatus may specifically include:
an acquisition unit 701 for acquiring an image to be recognized;
an obtaining unit 702, configured to obtain a target feature and a target class of the image to be identified using the image identification model;
a searching unit 703, configured to search the image database for similar images of the image to be identified based on the target feature and the target category.
Through the various implementations provided in this embodiment, first, an image to be identified is acquired; then, the image to be identified is input into the image recognition model, which outputs the target features and the target category of the image to be identified; finally, similar images are searched in the image database by using the target features and the target category of the image to be identified. In this way, the target features obtained by the image recognition model reflect not only the global features of the image but also its local features, so that important features of the image to be identified are not missed; and searching for similar images by combining the target features with the target category effectively improves the accuracy of image recognition search even for an image to be identified with a complex scene, thereby improving the user experience of image recognition search.
Fig. 8 is a block diagram illustrating an apparatus 800 for training an image recognition network or image recognition search, according to an example embodiment. For example, apparatus 800 may be a mobile phone, computer, digital broadcast terminal, messaging device, game console, tablet device, medical device, exercise device, personal digital assistant, or the like.
Referring to fig. 8, apparatus 800 may include one or more of the following components: a processing component 802, a memory 804, a power component 806, a multimedia component 808, an audio component 810, an input/output (I/O) interface 812, a sensor component 814, and a communication component 816.
The processing component 802 generally controls overall operation of the apparatus 800, such as operations associated with display, telephone calls, data communications, camera operations, and recording operations. The processing component 802 may include one or more processors 820 to execute instructions to perform all or part of the steps of the methods described above. Further, the processing component 802 can include one or more modules that facilitate interactions between the processing component 802 and other components. For example, the processing component 802 may include a multimedia module to facilitate interaction between the multimedia component 808 and the processing component 802.
The memory 804 is configured to store various types of data to support operations at the device 800. Examples of such data include instructions for any application or method operating on the device 800, contact data, phonebook data, messages, pictures, videos, and the like. The memory 804 may be implemented by any type or combination of volatile or nonvolatile memory devices such as Static Random Access Memory (SRAM), electrically erasable programmable read-only memory (EEPROM), erasable programmable read-only memory (EPROM), programmable read-only memory (PROM), read-only memory (ROM), magnetic memory, flash memory, magnetic or optical disk.
The power supply component 806 provides power to the various components of the device 800. The power components 806 may include a power management system, one or more power sources, and other components associated with generating, managing, and distributing power for the device 800.
The multimedia component 808 includes a screen that provides an output interface between the apparatus 800 and the user. In some embodiments, the screen may include a Liquid Crystal Display (LCD) and a Touch Panel (TP). If the screen includes a touch panel, the screen may be implemented as a touch screen to receive input signals from the user. The touch panel includes one or more touch sensors to sense touches, swipes, and gestures on the touch panel. The touch sensor may sense not only the boundary of a touch or swipe action, but also the duration and pressure associated with the touch or swipe operation. In some embodiments, the multimedia component 808 includes a front camera and/or a rear camera. The front camera and/or the rear camera may receive external multimedia data when the apparatus 800 is in an operational mode, such as a shooting mode or a video mode. Each front camera and rear camera may be a fixed optical lens system or have focal length and optical zoom capability.
The audio component 810 is configured to output and/or input audio signals. For example, the audio component 810 includes a Microphone (MIC) configured to receive external audio signals when the device 800 is in an operational mode, such as a call mode, a recording mode, and a voice recognition mode. The received audio signals may be further stored in the memory 804 or transmitted via the communication component 816. In some embodiments, audio component 810 further includes a speaker for outputting audio signals.
The I/O interface 812 provides an interface between the processing component 802 and peripheral interface modules, which may be a keyboard, click wheel, buttons, etc. These buttons may include, but are not limited to: homepage button, volume button, start button, and lock button.
The sensor assembly 814 includes one or more sensors for providing status assessments of various aspects of the apparatus 800. For example, the sensor assembly 814 may detect an on/off state of the apparatus 800 and the relative positioning of components, such as the display and keypad of the apparatus 800; the sensor assembly 814 may also detect a change in position of the apparatus 800 or of one component of the apparatus 800, the presence or absence of user contact with the apparatus 800, the orientation or acceleration/deceleration of the apparatus 800, and a change in temperature of the apparatus 800. The sensor assembly 814 may include a proximity sensor configured to detect the presence of nearby objects without any physical contact. The sensor assembly 814 may also include a light sensor, such as a CMOS or CCD image sensor, for use in imaging applications. In some embodiments, the sensor assembly 814 may also include an acceleration sensor, a gyroscopic sensor, a magnetic sensor, a pressure sensor, or a temperature sensor.
The communication component 816 is configured to facilitate wired or wireless communication between the apparatus 800 and other devices. The apparatus 800 may access a wireless network based on a communication standard, such as WiFi, 2G or 3G, or a combination thereof. In one exemplary embodiment, the communication component 816 receives a broadcast signal or broadcast-related information from an external broadcast management system via a broadcast channel. In one exemplary embodiment, the communication component 816 further includes a Near Field Communication (NFC) module to facilitate short-range communication. For example, the NFC module may be implemented based on Radio Frequency Identification (RFID) technology, Infrared Data Association (IrDA) technology, Ultra Wideband (UWB) technology, Bluetooth (BT) technology, and other technologies.
In an exemplary embodiment, the apparatus 800 may be implemented by one or more Application Specific Integrated Circuits (ASICs), digital Signal Processors (DSPs), digital Signal Processing Devices (DSPDs), programmable Logic Devices (PLDs), field Programmable Gate Arrays (FPGAs), controllers, microcontrollers, microprocessors, or other electronic components for executing the methods described above.
In an exemplary embodiment, a non-transitory computer readable storage medium is also provided, such as memory 804 including instructions executable by processor 820 of apparatus 800 to perform the above-described method. For example, the non-transitory computer readable storage medium may be ROM, random Access Memory (RAM), CD-ROM, magnetic tape, floppy disk, optical data storage device, etc.
A non-transitory computer readable storage medium stores instructions that, when executed by a processor of a mobile terminal, cause the mobile terminal to perform a method of training an image recognition network, the method comprising:
dividing an original training image to obtain a plurality of training image blocks and marking labels;
shuffling and rearranging the plurality of training image blocks based on the image salient region detection result of the original training image to obtain a rearranged training image of the original training image;
training an image recognition network based on the original training image, the rearranged training image and the corresponding annotation data to obtain an image recognition model; the annotation data comprises a coarse-granularity image category label, a fine-granularity image category label, an image preprocessing category label and a training image block labeling sequence, wherein the image preprocessing category label comprises an original label or a rearranged label;
alternatively, the instructions cause the mobile terminal to perform a method of image recognition search, the method comprising:
acquiring an image to be identified;
obtaining target characteristics and target categories of the image to be identified by using the image identification model;
And searching similar images of the images to be identified in an image database based on the target characteristics and the target categories.
Fig. 9 is a schematic structural diagram of a server in an embodiment of the present application. The server 900 may vary considerably in configuration or performance and may include one or more central processing units (central processing units, CPU) 922 (e.g., one or more processors) and memory 932, one or more storage media 930 (e.g., one or more mass storage devices) storing applications 942 or data 944. Wherein the memory 932 and the storage medium 930 may be transitory or persistent. The program stored in the storage medium 930 may include one or more modules (not shown), each of which may include a series of instruction operations on a server. Still further, the central processor 922 may be arranged to communicate with a storage medium 930 to execute a series of instruction operations in the storage medium 930 on the server 900.
The server 900 may also include one or more power supplies 926, one or more wired or wireless network interfaces 950, one or more input/output interfaces 958, one or more keyboards 956, and/or one or more operating systems 941, such as Windows Server™, Mac OS X™, Unix™, Linux™, FreeBSD™, and the like.
In the present specification, the embodiments are described in a progressive manner; each embodiment focuses on its differences from the other embodiments, and for identical or similar parts the embodiments may refer to one another. Since the apparatus disclosed in an embodiment corresponds to the method disclosed in an embodiment, its description is relatively brief, and the relevant points can be found in the description of the method.
Those of skill would further appreciate that the various illustrative elements and algorithm steps described in connection with the embodiments disclosed herein may be implemented as electronic hardware, computer software, or combinations of both, and that the various illustrative elements and steps are described above generally in terms of functionality in order to clearly illustrate the interchangeability of hardware and software. Whether such functionality is implemented as hardware or software depends upon the particular application and design constraints imposed on the solution. Skilled artisans may implement the described functionality in varying ways for each particular application, but such implementation decisions should not be interpreted as causing a departure from the scope of the present application.
It is noted that relational terms such as first and second are used solely to distinguish one entity or action from another entity or action, without necessarily requiring or implying any actual such relationship or order between such entities or actions. The terms "comprises," "comprising," or any other variation thereof are intended to cover a non-exclusive inclusion, such that a process, method, article, or apparatus that comprises a list of elements includes not only those elements but may also include other elements not expressly listed or inherent to such a process, method, article, or apparatus. Without further limitation, an element defined by the phrase "comprising a …" does not exclude the presence of other like elements in a process, method, article, or apparatus that comprises the element.
The foregoing description covers only preferred embodiments of the present application and is not intended to limit it in any way. Although the present application has been described with reference to the preferred embodiments, any person skilled in the art may, using the methods and technical content disclosed above, make possible variations and modifications to the technical solution of the present application, or modify it into equivalent embodiments, without departing from the scope of the technical solution. Therefore, any simple modification, equivalent variation, or alteration of the above embodiments in accordance with the technical substance of the present application, which does not depart from the content of the technical solution, still falls within the scope of protection of the technical solution of the present application.

Claims (13)

1. A method of training an image recognition network, comprising:
segmenting an original training image to obtain a plurality of training image blocks, and marking the training image blocks with labels;
detecting the image salient region of the original training image by using an attention heat map model, to obtain an attention heat map of the original training image;
scrambling and rearranging the plurality of training image blocks based on the heat levels of the attention heat map, to obtain a rearranged training image of the original training image, wherein training image blocks corresponding to positions with higher heat in the attention heat map are scrambled to a lesser degree, and training image blocks corresponding to positions with lower heat are scrambled to a greater degree;
training an image recognition network based on the original training image, the rearranged training image, and the corresponding annotation data, to obtain an image recognition model; the annotation data comprises a coarse-granularity image category label, a fine-granularity image category label, an image preprocessing category label, and a training image block marking sequence, wherein the image preprocessing category label is either an original label or a rearranged label.
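The heat-guided scrambling of claim 1 can be sketched as follows. This is a minimal illustration, not the patented implementation: the function name, the 4×4 grid, and the quantile threshold are assumptions, and the claim's graded scrambling degree is simplified to keeping high-heat blocks fixed while permuting low-heat blocks among themselves. The returned block order plays the role of the claimed training image block marking sequence.

```python
import numpy as np

def scramble_by_heat(image, heat_map, grid=4, keep_quantile=0.75, rng=None):
    """Scramble image blocks: blocks whose mean attention heat is high
    stay in place; low-heat blocks are permuted among themselves.

    Returns the rearranged image and the block order recording which
    original block was placed at each grid position."""
    if rng is None:
        rng = np.random.default_rng(0)
    h, w = image.shape[:2]
    bh, bw = h // grid, w // grid
    # Mean heat per block, flattened row-major into grid*grid entries.
    heat = np.array([heat_map[r*bh:(r+1)*bh, c*bw:(c+1)*bw].mean()
                     for r in range(grid) for c in range(grid)])
    order = np.arange(grid * grid)
    # Blocks at or below the heat threshold are shuffled among themselves;
    # the rest keep their original positions.
    low = np.where(heat <= np.quantile(heat, keep_quantile))[0]
    order[low] = rng.permutation(order[low])
    out = np.empty_like(image)
    for dst, src in enumerate(order):
        r_d, c_d = divmod(dst, grid)
        r_s, c_s = divmod(int(src), grid)
        out[r_d*bh:(r_d+1)*bh, c_d*bw:(c_d+1)*bw] = \
            image[r_s*bh:(r_s+1)*bh, c_s*bw:(c_s+1)*bw]
    return out, order
```

A graded variant could instead permute each block within a radius inversely proportional to its heat; the keep/shuffle split above is merely the simplest version of the claimed idea.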
2. The method of claim 1, wherein training the image recognition network based on the original training image, the rearranged training image, and the corresponding annotation data to obtain the image recognition model comprises:
based on the original training image and the rearranged training image, training features are obtained by utilizing a feature extraction network in the image recognition network;
based on the training characteristics, obtaining prediction data by utilizing an identification network in the image identification network, wherein the prediction data comprises a prediction coarse-granularity image category, a prediction fine-granularity image category and a prediction image preprocessing category;
and training network parameters of the image recognition network by using a network loss function based on the prediction data and the labeling data to obtain the image recognition model.
3. The method of claim 2, wherein the network loss function comprises a coarse-granularity image category classification loss function, a fine-granularity image category classification loss function, an image preprocessing category classification loss function, and a loss function for restoring the rearranged training image to the original training image.
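The four-part network loss of claim 3 can be sketched as a weighted sum. This is an assumed formulation: the dictionary keys, the unit weights, and the squared-error form of the restoration term are illustrative choices, not taken from the patent.

```python
import numpy as np

def softmax_ce(logits, label):
    """Numerically stable cross-entropy for a single sample."""
    z = logits - logits.max()
    logp = z - np.log(np.exp(z).sum())
    return -logp[label]

def network_loss(preds, targets, weights=(1.0, 1.0, 1.0, 1.0)):
    """Composite loss: coarse-category CE + fine-category CE
    + preprocessing-type CE (original vs. rearranged)
    + a restoration term comparing the predicted block order against
    the marked original order."""
    l_coarse = softmax_ce(preds["coarse_logits"], targets["coarse"])
    l_fine = softmax_ce(preds["fine_logits"], targets["fine"])
    l_prep = softmax_ce(preds["prep_logits"], targets["prep"])
    l_restore = np.mean((preds["order"] - targets["order"]) ** 2)
    w = weights
    return w[0]*l_coarse + w[1]*l_fine + w[2]*l_prep + w[3]*l_restore
```

In practice the restoration term would more likely be a per-block classification over positions; the squared error above just keeps the sketch short.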
4. A method of image recognition searching, comprising:
acquiring an image to be identified;
obtaining target features and a target category of the image to be identified by using an image recognition model, the image recognition model being trained using the method of training an image recognition network as claimed in any one of claims 1 to 3;
and searching an image database for images similar to the image to be identified, based on the target features and the target category.
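The search step of claim 4 can be illustrated as category filtering followed by feature ranking. This is a hedged sketch: cosine similarity is chosen as one plausible metric, and the function and parameter names are assumptions, since the patent does not fix either.

```python
import numpy as np

def search_similar(query_feat, query_cat, db_feats, db_cats, top_k=5):
    """Return database indices of the top_k images sharing the query's
    target category, ranked by cosine similarity of target features."""
    idx = np.where(db_cats == query_cat)[0]
    if idx.size == 0:
        return []
    # Normalize so the dot product equals cosine similarity.
    q = query_feat / np.linalg.norm(query_feat)
    f = db_feats[idx] / np.linalg.norm(db_feats[idx], axis=1, keepdims=True)
    sims = f @ q
    best = idx[np.argsort(-sims)[:top_k]]
    return best.tolist()
```

Filtering on the predicted category first shrinks the candidate set, which is the practical benefit of recognizing both features and a category before ranking.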
5. An apparatus for training an image recognition network, comprising:
a segmentation obtaining unit, configured to segment the original training image to obtain a plurality of training image blocks and mark them with labels;
a detection obtaining subunit, configured to detect the image salient region of the original training image by using an attention heat map model, to obtain an attention heat map of the original training image;
a rearrangement obtaining subunit, configured to scramble and rearrange the plurality of training image blocks based on the heat levels of the attention heat map, to obtain a rearranged training image of the original training image, wherein training image blocks corresponding to positions with higher heat in the attention heat map are scrambled to a lesser degree, and training image blocks corresponding to positions with lower heat are scrambled to a greater degree;
and a training obtaining unit, configured to train an image recognition network based on the original training image, the rearranged training image, and the corresponding annotation data, to obtain an image recognition model; the annotation data comprises a coarse-granularity image category label, a fine-granularity image category label, an image preprocessing category label, and a training image block marking sequence, wherein the image preprocessing category label is either an original label or a rearranged label.
6. The apparatus of claim 5, wherein the training obtaining unit comprises:
a first obtaining subunit, configured to obtain training features by using a feature extraction network in the image recognition network, based on the original training image and the rearranged training image;
a second obtaining subunit, configured to obtain prediction data by using a recognition network in the image recognition network, based on the training features, where the prediction data comprises a predicted coarse-granularity image category, a predicted fine-granularity image category, and a predicted image preprocessing category;
and a training obtaining subunit, configured to train the network parameters of the image recognition network by using a network loss function, based on the prediction data and the annotation data, to obtain the image recognition model.
7. The apparatus of claim 6, wherein the network loss function comprises a coarse-granularity image category classification loss function, a fine-granularity image category classification loss function, an image preprocessing category classification loss function, and a loss function for restoring the rearranged training image to the original training image.
8. An apparatus for image recognition search, comprising:
the acquisition unit is used for acquiring the image to be identified;
an obtaining unit, configured to obtain target features and a target category of the image to be identified by using an image recognition model, the image recognition model being trained using the method of training an image recognition network according to any one of claims 1 to 3;
and a searching unit, configured to search an image database for images similar to the image to be identified, based on the target features and the target category.
9. An apparatus for training an image recognition network, comprising a memory, and one or more programs, wherein the one or more programs are stored in the memory and configured to be executed by one or more processors, the one or more programs comprising instructions for:
segmenting an original training image to obtain a plurality of training image blocks, and marking the training image blocks with labels;
detecting the image salient region of the original training image by using an attention heat map model, to obtain an attention heat map of the original training image;
scrambling and rearranging the plurality of training image blocks based on the heat levels of the attention heat map, to obtain a rearranged training image of the original training image, wherein training image blocks corresponding to positions with higher heat in the attention heat map are scrambled to a lesser degree, and training image blocks corresponding to positions with lower heat are scrambled to a greater degree;
training an image recognition network based on the original training image, the rearranged training image, and the corresponding annotation data, to obtain an image recognition model; the annotation data comprises a coarse-granularity image category label, a fine-granularity image category label, an image preprocessing category label, and a training image block marking sequence, wherein the image preprocessing category label is either an original label or a rearranged label.
10. The apparatus of claim 9, wherein training the image recognition network based on the original training image, the rearranged training image, and the corresponding annotation data to obtain the image recognition model comprises:
obtaining training features by using a feature extraction network in the image recognition network, based on the original training image and the rearranged training image;
obtaining prediction data by using a recognition network in the image recognition network, based on the training features, wherein the prediction data comprises a predicted coarse-granularity image category, a predicted fine-granularity image category, and a predicted image preprocessing category;
and training network parameters of the image recognition network by using a network loss function, based on the prediction data and the annotation data, to obtain the image recognition model.
11. The apparatus of claim 10, wherein the network loss function comprises a coarse-granularity image category classification loss function, a fine-granularity image category classification loss function, an image preprocessing category classification loss function, and a loss function for restoring the rearranged training image to the original training image.
12. An apparatus for image recognition searching, comprising a memory, and one or more programs, wherein the one or more programs are stored in the memory and configured to be executed by one or more processors, the one or more programs comprising instructions for:
Acquiring an image to be identified;
obtaining target features and a target category of the image to be identified by using an image recognition model, the image recognition model being trained using the method of training an image recognition network as claimed in any one of claims 1 to 3;
and searching an image database for images similar to the image to be identified, based on the target features and the target category.
13. A machine-readable medium having instructions stored thereon which, when executed by one or more processors, cause an apparatus to perform the method of training an image recognition network of any one of claims 1 to 3, or cause the apparatus to perform the method of image recognition search of claim 4.
CN202010332194.0A 2020-04-24 2020-04-24 Training image recognition network, image recognition searching method and related device Active CN111553372B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202010332194.0A CN111553372B (en) 2020-04-24 2020-04-24 Training image recognition network, image recognition searching method and related device


Publications (2)

Publication Number Publication Date
CN111553372A CN111553372A (en) 2020-08-18
CN111553372B true CN111553372B (en) 2023-08-08

Family

ID=72003970

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202010332194.0A Active CN111553372B (en) 2020-04-24 2020-04-24 Training image recognition network, image recognition searching method and related device

Country Status (1)

Country Link
CN (1) CN111553372B (en)

Families Citing this family (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN112364918B (en) * 2020-11-10 2024-04-02 深圳力维智联技术有限公司 Abnormality recognition method, terminal, and computer-readable storage medium
CN112561893A (en) * 2020-12-22 2021-03-26 平安银行股份有限公司 Picture matching method and device, electronic equipment and storage medium
CN112633276A (en) * 2020-12-25 2021-04-09 北京百度网讯科技有限公司 Training method, recognition method, device, equipment and medium
CN113256621B (en) * 2021-06-25 2021-11-02 腾讯科技(深圳)有限公司 Image processing method, image processing device, computer equipment and storage medium
CN113793323A (en) * 2021-09-16 2021-12-14 云从科技集团股份有限公司 Component detection method, system, equipment and medium

Citations (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN102567300A (en) * 2011-12-29 2012-07-11 方正国际软件有限公司 Picture document processing method and device
CN106445939A (en) * 2015-08-06 2017-02-22 阿里巴巴集团控股有限公司 Image retrieval, image information acquisition and image identification methods and apparatuses, and image identification system
CN107515895A (en) * 2017-07-14 2017-12-26 中国科学院计算技术研究所 A kind of sensation target search method and system based on target detection
CN109871461A (en) * 2019-02-13 2019-06-11 华南理工大学 The large-scale image sub-block search method to be reordered based on depth Hash network and sub-block
CN110059769A (en) * 2019-04-30 2019-07-26 福州大学 The semantic segmentation method and system rebuild are reset based on pixel for what streetscape understood
CN110263912A (en) * 2019-05-14 2019-09-20 杭州电子科技大学 A kind of image answering method based on multiple target association depth reasoning

Family Cites Families (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
JP6222285B1 (en) * 2016-06-01 2017-11-01 富士電機株式会社 Data processing apparatus, data processing method, and program


Also Published As

Publication number Publication date
CN111553372A (en) 2020-08-18

Similar Documents

Publication Publication Date Title
CN111553372B (en) Training image recognition network, image recognition searching method and related device
CN106557768B (en) Method and device for recognizing characters in picture
CN106776890B (en) Method and device for adjusting video playing progress
CN110517185B (en) Image processing method, device, electronic equipment and storage medium
CN105845124B (en) Audio processing method and device
CN108227950B (en) Input method and device
EP2998960A1 (en) Method and device for video browsing
CN112672208B (en) Video playing method, device, electronic equipment, server and system
CN106409317B (en) Method and device for extracting dream speech
CN113676671B (en) Video editing method, device, electronic equipment and storage medium
CN111523346B (en) Image recognition method and device, electronic equipment and storage medium
CN110764627B (en) Input method and device and electronic equipment
CN109034106B (en) Face data cleaning method and device
CN112464031A (en) Interaction method, interaction device, electronic equipment and storage medium
CN110781842A (en) Image processing method and device, electronic equipment and storage medium
CN110019907B (en) Image retrieval method and device
CN104715007A (en) User identification method and device
CN111629270A (en) Candidate item determination method and device and machine-readable medium
CN112784151B (en) Method and related device for determining recommended information
CN111526380B (en) Video processing method, video processing device, server, electronic equipment and storage medium
CN112784858B (en) Image data processing method and device and electronic equipment
CN111382367B (en) Search result ordering method and device
CN110175293B (en) Method and device for determining news venation and electronic equipment
CN113870195A (en) Target map detection model training and map detection method and device
CN112036247A (en) Expression package character generation method and device and storage medium

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant