CN113869336A

CN113869336A - Image identification searching method and related device

Info

Publication number: CN113869336A
Application number: CN202010616301.2A
Authority: CN
Inventors: 章书豪; 夏雄尉; 谢泽华; 周泽南; 苏雪峰; 陈炜鹏; 许静芳
Original assignee: Beijing Sogou Technology Development Co Ltd
Current assignee: Beijing Sogou Technology Development Co Ltd
Priority date: 2020-06-30
Filing date: 2020-06-30
Publication date: 2021-12-31
Anticipated expiration: 2040-06-30

Abstract

The application discloses an image recognition and search method and a related device. The method comprises: first, determining through an image intention classification model whether the intention category of an input image is a commodity category or a non-commodity category; then, when the intention category of the input image is the commodity category, extracting image features of the input image through an image recognition model; and finally, searching a commodity image database for at least one commodity image similar to the input image according to the image features of the input image, to form a similar image set of the input image. Because the image-based commodity search is performed only when the image intention classification model determines that the input image belongs to the commodity category, input images of the non-commodity category are filtered out and no search is performed on them, which avoids wasting computing resources.

Description

Image identification searching method and related device

Technical Field

The present application relates to the field of image processing technologies, and in particular, to a method and a related apparatus for image recognition and search.

Background

With the rapid development of image recognition technology, a user can photograph clothes, cosmetics, daily necessities and other items of interest in daily life and then use the resulting image to search for commodities. In effect, image recognition and search are performed on the image to quickly obtain links to the same or similar commodities, which meets the user's need to quickly find commodities of interest.

At present, the specific implementation process of searching for a commodity with an image is as follows: the image features of the input image are extracted by utilizing a deep learning model, and then the distance between the image features and the image features of each commodity image in the commodity image database is calculated so as to return a commodity image similar to the input image, thereby obtaining the link of the same-style commodity or the similar commodity.

However, the inventors have found through research that input images are of various types: an input image may be a commodity image or a non-commodity image, and there is no need to perform image-based commodity search on a non-commodity image. In the prior art, the above search operation is performed on every input image, including non-commodity images, which is very likely to waste computing resources.

Disclosure of Invention

The technical problem to be solved by the present application is to provide an image recognition and search method and a related apparatus in which image-based commodity search is not performed on input images of the non-commodity category, thereby avoiding a waste of computing resources.

In a first aspect, an embodiment of the present application provides a method for image recognition search, where the method includes:

performing intention classification on an input image by using an image intention classification model to obtain an intention category of the input image; the intent categories include a commodity category and a non-commodity category;

if the intention type of the input image is the commodity type, extracting the image characteristics of the input image by using an image recognition model;

and searching at least one commodity image similar to the input image in a commodity image database based on the image characteristics of the input image to obtain a similar image set of the input image.

Optionally, the commodity category includes an apparel commodity category and other commodity categories; correspondingly, the image recognition model comprises a clothing commodity recognition model and other commodity recognition models.

Optionally, if the commodity category of the input image is specifically the clothing commodity category, the extracting, by using the image recognition model, the image feature of the input image includes:

detecting the input image by using a detection network to obtain a plurality of clothing area images in the input image;

extracting the image characteristics of each clothing region image by using the clothing commodity identification model;

or, if the commodity category of the input image is specifically the other commodity category, the extracting, by using the image recognition model, the image feature of the input image is specifically:

and directly extracting the image characteristics of the input image by using the other commodity identification models.

Optionally, the obtaining of the image intention classification model includes:

acquiring a plurality of image samples labeled with intention category labels; the image samples labeled with the intention category labels comprise a first image sample labeled with a commodity category label and a second image sample labeled with a non-commodity category label;

and pre-training a neural network model by taking the image sample as input and the intention category label as output to obtain the image intention classification model.

Optionally, after the obtaining the set of similar images of the input image, the method further includes:

obtaining a rank value corresponding to each similar image based on the distance between the image feature of the input image and the image feature of each similar image in the similar image set;

and filtering similar images with rank values smaller than a preset rank value threshold in the similar image set based on the rank value corresponding to each similar image and the preset rank value threshold to obtain a first target similar image set.

Optionally, after the obtaining the set of similar images of the input image, the method further includes:

if a plurality of similar images in the similar image set correspond to a plurality of image categories, acquiring an entropy value of each image category based on the number of similar images corresponding to each image category in the similar image set;

and filtering the similar images corresponding to the image categories of which the entropy values are greater than the preset entropy value threshold value in the similar image set based on the entropy value of each image category and the preset entropy value threshold value to obtain a second target similar image set.

Optionally, the method further includes:

determining image categories corresponding to a plurality of similar images in the similar image set;

and determining the sorting order of the plurality of similar images in the similar image set based on the rank value corresponding to each similar image and the image category.

In a second aspect, an embodiment of the present application provides an apparatus for image recognition search, where the apparatus includes:

an intention category obtaining unit, which is used for carrying out intention classification on an input image by utilizing an image intention classification model to obtain an intention category of the input image; the intent categories include a commodity category and a non-commodity category;

an image feature extraction unit configured to extract an image feature of the input image using an image recognition model if the intention type of the input image is the commodity type;

and the similar image set obtaining unit is used for searching at least one commodity image similar to the input image in a commodity image database based on the image characteristics of the input image to obtain a similar image set of the input image.

Optionally, if the commodity category of the input image is specifically the clothing commodity category, the image feature extraction unit includes:

a clothing region image obtaining subunit, configured to detect the input image by using a detection network, and obtain multiple clothing region images in the input image;

the image characteristic extraction subunit is used for extracting the image characteristics of each clothing region image by using the clothing commodity identification model;

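or, if the commodity category of the input image is specifically the other commodity category, the image feature extraction unit is specifically configured to:

directly extract the image features of the input image by using the other commodity identification models.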

Optionally, the apparatus further includes an image intention classification model obtaining unit, where the image intention classification model obtaining unit includes:

the image sample acquiring subunit is used for acquiring a plurality of image samples marked with intention category labels; the image samples marked with the intention category labels comprise a first image sample marked with a commodity category label and a second image sample marked with a non-commodity category label;

and the image intention classification model obtaining subunit is used for pre-training a neural network model by taking the image sample as input and the intention category label as output to obtain the image intention classification model.

Optionally, the apparatus further comprises:

a rank value obtaining unit, configured to obtain a rank value corresponding to each similar image in the set of similar images based on a distance between an image feature of the input image and an image feature of each similar image;

and the first target similar image set obtaining unit is used for filtering similar images of which the rank values are smaller than a preset rank value threshold value in the similar image set based on the rank value corresponding to each similar image and the preset rank value threshold value to obtain a first target similar image set.

Optionally, the apparatus further comprises:

an entropy obtaining unit, configured to, if a plurality of similar images in the similar image set correspond to a plurality of image categories, obtain an entropy of each image category based on a number of similar images corresponding to each image category in the similar image set;

and the second target similar image set obtaining unit is used for filtering the similar images corresponding to the image categories of which the entropy values are greater than the preset entropy value threshold in the similar image set based on the entropy value and the preset entropy value threshold of each image category to obtain a second target similar image set.

Optionally, the apparatus further comprises:

the image type determining unit is used for determining image types corresponding to a plurality of similar images in the similar image set;

and the sorting order determining unit is used for determining the sorting order of the plurality of similar images in the similar image set based on the rank value corresponding to each similar image and the image category.

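In a third aspect, an embodiment of the present application provides an apparatus for image recognition search, the apparatus comprising a memory and one or more programs, wherein the one or more programs are stored in the memory and configured to be executed by one or more processors, and the one or more programs include instructions for:

performing intention classification on an input image by using an image intention classification model to obtain an intention category of the input image; the intention categories include a commodity category and a non-commodity category;

if the intention category of the input image is the commodity category, extracting the image features of the input image by using an image recognition model;

and searching at least one commodity image similar to the input image in a commodity image database based on the image features of the input image to obtain a similar image set of the input image.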

In a fourth aspect, embodiments of the present application provide a machine-readable medium having stored thereon instructions, which when executed by one or more processors, cause an apparatus to perform the method of image recognition search according to any one of the first aspect above.

Compared with the prior art, the method has the advantages that:

According to the technical solution of the embodiments of the application, first, an image intention classification model determines whether the intention category of an input image is a commodity category or a non-commodity category; then, when the intention category of the input image is the commodity category, image features of the input image are extracted through an image recognition model; and finally, at least one commodity image similar to the input image is searched for in the commodity image database according to the image features of the input image, to form a similar image set of the input image. Because the image-based commodity search is performed only when the image intention classification model determines that the input image belongs to the commodity category, input images of the non-commodity category are filtered out and no search is performed on them, which avoids wasting computing resources.

Drawings

In order to illustrate the technical solutions of the embodiments of the present application more clearly, the drawings used in the description of the embodiments are briefly introduced below. The drawings described below show only some embodiments of the present application, and those skilled in the art can obtain other drawings from them without creative effort.

FIG. 1 is a schematic diagram of a system framework related to an application scenario in an embodiment of the present application;

FIG. 2 is a schematic flowchart of a method for image recognition search according to an embodiment of the present disclosure;

FIG. 3 is a schematic flowchart of another method for image recognition search according to an embodiment of the present disclosure;

FIG. 4 is a schematic flowchart of another method for image recognition search according to an embodiment of the present disclosure;

FIG. 5 is a schematic flowchart of another method for image recognition search according to an embodiment of the present disclosure;

FIG. 6 is a schematic flowchart of another method for image recognition search according to an embodiment of the present disclosure;

FIG. 7 is a schematic structural diagram of an apparatus for image recognition search according to an embodiment of the present application;

FIG. 8 is a schematic structural diagram of an apparatus for image recognition search according to an embodiment of the present disclosure;

FIG. 9 is a schematic structural diagram of a server according to an embodiment of the present application.

Detailed Description

In order to make the technical solutions of the present application better understood, the technical solutions in the embodiments of the present application will be clearly and completely described below with reference to the drawings in the embodiments of the present application, and it is obvious that the described embodiments are only a part of the embodiments of the present application, and not all of the embodiments. All other embodiments, which can be derived by a person skilled in the art from the embodiments given herein without making any creative effort, shall fall within the protection scope of the present application.

The inventors have found that, at present, image-based commodity search generally extracts image features directly from the input image uploaded by a user and searches for commodity images similar to that input image. However, the input images uploaded by users vary: they may be commodity images or non-commodity images, and there is no need to perform image-based commodity search on non-commodity images. In other words, the search operation is currently performed on every input image uploaded by a user, which is very likely to waste computing resources.

To solve this problem, in the embodiment of the present application, an image intention classification model determines whether the intention category of an input image is a commodity category or a non-commodity category; when the intention category of the input image is the commodity category, image features of the input image are extracted through an image recognition model; and at least one commodity image similar to the input image is searched for in the commodity image database according to those image features, to form a similar image set of the input image. Because the image-based commodity search is performed only when the image intention classification model determines that the input image belongs to the commodity category, input images of the non-commodity category are filtered out and no search is performed on them, which avoids wasting computing resources.

For example, the embodiments of the present application may be applied to the scenario shown in FIG. 1. This scenario includes a user terminal 101 and a processor 102, where the user terminal 101 may be a device with a camera function, such as a smartphone, a tablet or a personal computer. A user photographs an article of interest with the user terminal 101 to obtain an article image, and the article image is uploaded to the processor 102 as an input image. The processor 102 stores the pre-trained image intention classification model and image recognition model. Using the method of the embodiments of the present application, the processor 102 filters out input images of the non-commodity category, obtains the similar image set of an input image of the commodity category, and returns the similar image set to the user terminal 101, so that the user terminal 101 displays it to the user.

It is to be understood that, in the above application scenarios, although the actions of the embodiments of the present application are described as being performed by the processor 102, the present application is not limited in terms of the subject of execution as long as the actions disclosed in the embodiments of the present application are performed.

It is to be understood that the above scenario is only one example of a scenario provided in the embodiment of the present application, and the embodiment of the present application is not limited to this scenario.

The following describes in detail a specific implementation manner of the image recognition search method and the related apparatus in the embodiments of the present application with reference to the drawings.

Exemplary method

Referring to FIG. 2, a flowchart of a method for image recognition search in an embodiment of the present application is shown. In this embodiment, the method may include, for example, the following steps:

Step 201: performing intention classification on an input image by using an image intention classification model to obtain an intention category of the input image; the intention categories include a commodity category and a non-commodity category.

It should be noted that the input images uploaded by users vary: an input image may be a commodity image or a non-commodity image, and in most cases the input images uploaded by users are non-commodity images. Searching for commodities with a non-commodity image has no practical significance, and performing image-based commodity search on a large number of non-commodity images wastes considerable computing resources. Therefore, to save computing resources by not performing the search on non-commodity images, the embodiment of the present application first determines whether an input image uploaded by a user is a commodity image or a non-commodity image. Specifically, a neural network model is pre-trained on a large number of image samples labeled with intention category labels to obtain an image intention classification model, which determines whether the intention category of the input image is the commodity category or the non-commodity category. The image samples comprise a plurality of first image samples labeled with commodity category labels and a plurality of second image samples labeled with non-commodity category labels.

That is, in an optional implementation of the embodiment of the present application, the step of obtaining the image intention classification model may include, for example: acquiring a plurality of image samples labeled with intention category labels, where the image samples comprise a first image sample labeled with a commodity category label and a second image sample labeled with a non-commodity category label; and pre-training a neural network model by taking the image samples as input and the intention category labels as output, to obtain the image intention classification model.
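The training step described above can be sketched as follows. This is a minimal illustration only: the application does not specify the network architecture, so a single logistic unit stands in for the neural network model, and the two-dimensional feature vectors, the function names `train_intent_classifier` and `classify_intent`, and the hyperparameters are all hypothetical.

```python
import math

def train_intent_classifier(samples, labels, epochs=200, lr=0.5):
    """Fit a single logistic unit (label 1 = commodity, 0 = non-commodity)."""
    weights = [0.0] * len(samples[0])
    bias = 0.0
    for _ in range(epochs):
        for x, y in zip(samples, labels):
            z = sum(w * xi for w, xi in zip(weights, x)) + bias
            p = 1.0 / (1.0 + math.exp(-z))      # predicted probability of "commodity"
            err = p - y                          # gradient of the log loss w.r.t. z
            weights = [w - lr * err * xi for w, xi in zip(weights, x)]
            bias -= lr * err
    return weights, bias

def classify_intent(model, x):
    """Return the intention category predicted for one sample."""
    weights, bias = model
    z = sum(w * xi for w, xi in zip(weights, x)) + bias
    return "commodity" if z > 0 else "non-commodity"

# Toy stand-ins for labeled image samples (hypothetical 2-d feature vectors).
first_samples = [[0.9, 0.1], [0.8, 0.2], [0.7, 0.3]]    # commodity category label
second_samples = [[0.1, 0.9], [0.2, 0.8], [0.3, 0.7]]   # non-commodity category label
model = train_intent_classifier(first_samples + second_samples, [1, 1, 1, 0, 0, 0])
```

In practice the input would be image pixels and the model a deep network; the sketch only shows the pairing of labeled first and second image samples with a binary intention output.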

Step 202: if the intention category of the input image is the commodity category, extracting the image features of the input image by using an image recognition model.

After the intention category of the input image is obtained, an input image of the non-commodity category does not need to undergo image-based commodity search, which avoids wasting computing resources; that is, input images of the non-commodity category are filtered out before the search. An input image of the commodity category does need to undergo the search, and its image features must be extracted first. Therefore, in the embodiment of the present application, the input image of the commodity category is input into an image recognition model pre-trained on a large number of commodity image samples, so as to extract the image features of that input image.

In the embodiment of the present application, input images of the commodity category may be divided into input images of the clothing commodity category and input images of other commodity categories, where the latter are input images of the commodity category that do not belong to the clothing commodity category. Correspondingly, an input image of the clothing commodity category is input into a clothing commodity identification model pre-trained on a large number of clothing commodity image samples to extract its image features, and an input image of another commodity category is input into an other commodity identification model pre-trained on a large number of other commodity image samples to extract its image features. Therefore, in an optional implementation of the embodiment of the present application, the commodity category includes a clothing commodity category and other commodity categories; correspondingly, the image recognition model comprises a clothing commodity identification model and other commodity identification models.

The following describes a specific implementation of step 202 when the commodity category of the input image is classified into a clothing commodity category and another commodity category:

First, when the commodity category of the input image is the clothing commodity category, the input image may contain several garments, such as a coat, trousers and shoes. If such an input image is identified directly by a single model, the model tends to select only one garment in the image, so only the image features of that one garment are extracted and the subsequent search returns only images similar to that one garment, which degrades the image-based commodity search. Therefore, in the embodiment of the present application, for an input image of the clothing commodity category, a detection network first selects a plurality of clothing region images with bounding boxes, and the clothing commodity identification model then extracts the image features of each clothing region image. That is, in an optional implementation of this embodiment of the present application, if the commodity category of the input image is specifically the clothing commodity category, step 202 may include, for example, the following steps:

Step A: detecting the input image by using a detection network to obtain a plurality of clothing region images in the input image;

Step B: extracting the image features of each clothing region image by using the clothing commodity identification model.

As an example, the input image of the clothing commodity category contains a coat, trousers and shoes. In the prior art, an image recognition network extracts only the image features of the coat in the input image. In the embodiment of the application, the input image is detected by using a detection network to obtain a coat region image, a trousers region image and a shoes region image; the clothing commodity identification model then extracts the image features of the coat region image, of the trousers region image and of the shoes region image respectively.

Through the implementation of steps A and B, the detection network detects a plurality of clothing region images in an input image of the clothing commodity category, and the image features of each clothing region image are extracted. This avoids the situation in which direct identification extracts the image features of only one garment in the input image, so that similar images can subsequently be searched based on the image features of every clothing region image, which improves the image-based commodity search.
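Steps A and B can be sketched as a detect, crop and extract loop. The sketch below is illustrative only: `stub_detection_network` and `stub_clothing_model` are hypothetical stand-ins for the trained detection network and clothing commodity identification model, the image is a plain nested list of pixel values, and the bounding-box format (top, left, bottom, right) is an assumption.

```python
def crop(image, box):
    """Cut a region out of an image stored as a list of pixel rows."""
    top, left, bottom, right = box
    return [row[left:right] for row in image[top:bottom]]

def stub_detection_network(image):
    """Hypothetical detector: reports the upper and lower halves as two garments."""
    height, width = len(image), len(image[0])
    return {
        "coat": (0, 0, height // 2, width),
        "trousers": (height // 2, 0, height, width),
    }

def stub_clothing_model(region):
    """Hypothetical feature extractor: [mean pixel value, maximum pixel value]."""
    pixels = [p for row in region for p in row]
    return [sum(pixels) / len(pixels), max(pixels)]

def extract_clothing_features(image, detect=stub_detection_network,
                              model=stub_clothing_model):
    # Step A: obtain one clothing region image per detected bounding box.
    regions = {label: crop(image, box) for label, box in detect(image).items()}
    # Step B: extract image features for every clothing region image.
    return {label: model(region) for label, region in regions.items()}

image = [[1, 2], [3, 4], [10, 20], [30, 40]]
features = extract_clothing_features(image)
```

Each region yields its own feature vector, so the subsequent search can be run once per garment rather than once per input image.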

Second, when the commodity category of the input image is the other commodity category, the image features of the input image may be extracted directly by the other commodity identification model. Therefore, in an optional implementation of this embodiment of the present application, if the commodity category of the input image is specifically the other commodity category, step 202 may specifically be: directly extracting the image features of the input image by using the other commodity identification models.

Step 203: searching at least one commodity image similar to the input image in a commodity image database based on the image features of the input image to obtain a similar image set of the input image.

After the image features of the input image of the commodity category are obtained, the distance between those image features and the image features of each commodity image in the commodity image database may be calculated. A smaller distance indicates a greater similarity between the commodity image and the input image, and a larger distance indicates a smaller similarity. Based on each distance and a preset distance threshold, each commodity image whose distance is smaller than the preset distance threshold is determined to be a similar image of the input image, and these images form the similar image set of the input image.
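The threshold-based search can be sketched as a brute-force scan. The application does not name the distance metric, so Euclidean distance is assumed here; the database contents, the identifiers `sku_a` to `sku_c`, and the threshold value are hypothetical.

```python
import math

def euclidean_distance(a, b):
    """Assumed metric; the application only refers to 'the distance'."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def search_similar(query_feature, database, distance_threshold):
    """Return (image_id, distance) pairs below the threshold, nearest first."""
    hits = []
    for image_id, feature in database.items():
        distance = euclidean_distance(query_feature, feature)
        if distance < distance_threshold:
            hits.append((image_id, distance))
    hits.sort(key=lambda hit: hit[1])
    return hits

# Hypothetical commodity image database: image id -> stored image feature.
database = {"sku_a": [1.0, 0.0], "sku_b": [0.0, 1.0], "sku_c": [0.6, 0.8]}
similar = search_similar([1.0, 0.0], database, distance_threshold=1.0)
similar_ids = [image_id for image_id, _ in similar]
```

A production system would more likely use an approximate nearest-neighbor index than a linear scan; the scan is used here only to keep the thresholding logic visible.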

In the various implementations provided by this embodiment, first, an image intention classification model determines whether the intention category of an input image is a commodity category or a non-commodity category; then, when the intention category of the input image is the commodity category, image features of the input image are extracted through an image recognition model; and finally, at least one commodity image similar to the input image is searched for in the commodity image database according to the image features of the input image, to form a similar image set of the input image. Because the image-based commodity search is performed only when the image intention classification model determines that the input image belongs to the commodity category, input images of the non-commodity category are filtered out and no search is performed on them, which avoids wasting computing resources.
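The gating behavior of steps 201 to 203 can be sketched as below. The three callables are hypothetical stand-ins for the trained models and the database lookup, and returning an empty list for a non-commodity image is an assumption, since this passage does not state what is returned in that case.

```python
def image_recognition_search(image, classify_intent, extract_features, search_database):
    """Run steps 201 to 203, skipping 202 and 203 for non-commodity input."""
    intent = classify_intent(image)          # step 201
    if intent != "commodity":
        return []                            # assumption: empty result, no search
    feature = extract_features(image)        # step 202
    return search_database(feature)          # step 203

# Hypothetical stand-ins that record whether feature extraction was invoked.
extract_calls = []

def stub_classify(image):
    return image["intent"]

def stub_extract(image):
    extract_calls.append(image["name"])
    return image["feature"]

def stub_search(feature):
    return ["commodity_near_%s" % feature]

selfie = {"name": "selfie", "intent": "non-commodity", "feature": 7}
kettle = {"name": "kettle", "intent": "commodity", "feature": 3}

selfie_result = image_recognition_search(selfie, stub_classify, stub_extract, stub_search)
kettle_result = image_recognition_search(kettle, stub_classify, stub_extract, stub_search)
```

After both calls, `extract_calls` contains only the commodity image, which reflects the saving described above: feature extraction and database search are not run for the non-commodity input.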

It should be further noted that, after the similar image set of the input image is obtained by searching the commodity image database in the above embodiment, the similar image set may be further screened for the images most similar to the input image. Therefore, on the basis of the above embodiment, a rank value corresponding to each similar image may first be calculated from the known distance between the image features of the input image and the image features of that similar image, where a larger rank value indicates a greater similarity between the similar image and the input image and a smaller rank value indicates a smaller similarity. Similar images whose rank value is smaller than a preset rank value threshold are then filtered out of the similar image set, and similar images whose rank value is greater than or equal to the threshold are retained; that is, the similar image set is screened for the images more similar to the input image, which form a first target similar image set.

Referring now to FIG. 3, a flowchart of another method for image recognition search in an embodiment of the present application is shown. In this embodiment, the method may include, for example, the following steps:

Step 301: performing intention classification on an input image by using an image intention classification model to obtain an intention category of the input image; the intention categories include a commodity category and a non-commodity category.

Step 302: if the intention category of the input image is the commodity category, extracting the image features of the input image by using an image recognition model.

Step 303: searching at least one commodity image similar to the input image in a commodity image database based on the image features of the input image to obtain a similar image set of the input image.

In this embodiment, steps 301 to 303 are the same as steps 201 to 202 in the above embodiment, and specific implementation manners of steps 301 to 303 may refer to the specific implementation manners of steps 201 to 202 in the above embodiment, which is not described herein again.
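
The gating behaviour of steps 301 to 303 can be illustrated with a minimal Python sketch. Everything named here is an assumption made for illustration: `image_search`, `classify_intent`, `extract_features`, the dictionary-based database, the `top_k` parameter and the use of Euclidean distance are hypothetical stand-ins, since the text above does not specify the models, the index structure or the distance measure.

```python
import math


def image_search(image, classify_intent, extract_features, database, top_k=3):
    """Return ids of up to top_k nearest commodity images, or [] for non-commodity input."""
    # Step 301: decide the intention category first.
    if classify_intent(image) != "commodity":
        # Non-commodity input: skip feature extraction and search entirely.
        return []
    # Step 302: extract features only for commodity input.
    query = extract_features(image)
    # Step 303: brute-force nearest-neighbour search (assumed; a real system
    # would more likely use an approximate index).
    by_distance = sorted(database.items(), key=lambda kv: math.dist(query, kv[1]))
    return [image_id for image_id, _ in by_distance[:top_k]]


# Toy stand-ins for the two models (hypothetical).
def classify_intent(image):
    return image["intent"]


def extract_features(image):
    return image["feature"]


database = {"cola": [1.0, 0.0], "cup": [0.8, 0.6], "shoe": [0.0, 1.0]}

hits = image_search({"intent": "commodity", "feature": [1.0, 0.0]},
                    classify_intent, extract_features, database, top_k=2)
skipped = image_search({"intent": "non-commodity", "feature": [1.0, 0.0]},
                       classify_intent, extract_features, database, top_k=2)
print(hits, skipped)
```

In this sketch the non-commodity input returns an empty result without calling the feature extractor, which is the source of the computing-resource saving described in this embodiment.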

Step 304: obtaining a rank value corresponding to each similar image based on the distance between the image feature of the input image and the image feature of each similar image in the similar image set.

In the embodiment of the present application, the rank value corresponding to a similar image may specifically be calculated as: rank = 2 - distance, where distance represents the distance between the image feature of the input image and the image feature of the similar image.

Step 305: filtering out the similar images whose rank value is smaller than a preset rank value threshold from the similar image set, based on the rank value corresponding to each similar image and the preset rank value threshold, to obtain a first target similar image set.

As an example, assuming that the preset rank value threshold is 1.08, filtering similar images with rank values smaller than 1.08 in the similar image set to obtain a first target similar image set.
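
Steps 304 and 305 can be sketched as follows. The formula rank = 2 - distance and the threshold 1.08 come from the text above; the choice of Euclidean distance, the unit-length toy feature vectors, and the names `rank_value` and `filter_by_rank` are assumptions made only for illustration.

```python
import math


def rank_value(query, candidate):
    """rank = 2 - distance; Euclidean distance is an assumption, not stated in the source."""
    return 2.0 - math.dist(query, candidate)


def filter_by_rank(query, similar, threshold=1.08):
    """Keep (image_id, rank) pairs whose rank value is >= threshold."""
    kept = []
    for image_id, feature in similar.items():
        r = rank_value(query, feature)
        if r >= threshold:
            kept.append((image_id, r))
    return kept


query = [1.0, 0.0]
similar = {
    "a": [1.0, 0.0],  # distance 0.0, so rank 2.0
    "b": [0.6, 0.8],  # distance about 0.894, so rank about 1.106
    "c": [0.0, 1.0],  # distance about 1.414, so rank about 0.586
}

first_target = filter_by_rank(query, similar)
print([image_id for image_id, _ in first_target])
```

With unit-length features the Euclidean distance lies between 0 and 2, so the rank value also lies between 0 and 2; this is one setting in which a threshold such as 1.08 is meaningful.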

According to the various implementations provided by this embodiment, first, whether the intention category of an input image is a commodity category or a non-commodity category is determined by an image intention classification model; second, when the intention category of the input image is the commodity category, the image features of the input image are extracted by an image recognition model; third, at least one commodity image similar to the input image is searched for in a commodity image database according to the image features of the input image, to form a similar image set of the input image; fourth, a rank value corresponding to each similar image is calculated according to the distance between the image feature of the input image and the image feature of each similar image in the similar image set; and finally, the similar images whose rank value is smaller than a preset rank value threshold are filtered out of the similar image set to obtain a first target similar image set. In this way, the intention category of the input image is determined by the image intention classification model, and the operation of searching for commodities by image is performed only when the result is the commodity category. Input images of the non-commodity category are thereby filtered out, that is, no search for commodities by image is performed on them, which avoids wasting computing resources. In addition, screening the similar image set by the rank value of each similar image keeps the images that are more similar to the input image, which improves the effect of searching for commodities by image and thus the user's search experience.

It should be further noted that, after the similar image set of the input image is obtained by searching the commodity image database as in the above embodiment, different similar images in the similar image set may belong to different image categories, that is, the plurality of similar images in the similar image set may correspond to a plurality of image categories. The similar images corresponding to some of these image categories may have a large uncertainty and need to be filtered out. Therefore, on the basis of the above embodiment, an entropy value of each image category may first be calculated according to the number of similar images corresponding to each image category in the similar image set, where a larger entropy value indicates a larger uncertainty of the similar images corresponding to the image category and a smaller entropy value indicates a smaller uncertainty. The similar images corresponding to image categories whose entropy value is greater than a preset entropy value threshold are then filtered out of the similar image set, and the similar images corresponding to image categories whose entropy value is less than or equal to the preset entropy value threshold are retained, to form a second target similar image set.

Referring now to fig. 4, shown is a flow chart of another method for image recognition search in an embodiment of the present application. In this embodiment, the method may include, for example, the steps of:

Step 401: performing intention classification on an input image by using an image intention classification model to obtain an intention category of the input image; the intention categories include a commodity category and a non-commodity category.

Step 402: if the intention category of the input image is the commodity category, extracting the image features of the input image by using an image recognition model.

Step 403: searching a commodity image database for at least one commodity image similar to the input image based on the image features of the input image, to obtain a similar image set of the input image.

In this embodiment, steps 401 to 403 are the same as steps 201 to 202 in the above embodiment, and specific implementation manners of steps 401 to 403 may refer to specific implementation manners of steps 201 to 202 in the above embodiment, which is not described herein again.

Step 404: if the plurality of similar images in the similar image set correspond to a plurality of image categories, acquiring an entropy value of each image category based on the number of the similar images corresponding to each image category in the similar image set.

As an example, suppose the input image is a Coca-Cola image, the similar image set contains 30 similar images, and these similar images correspond to 3 image categories: a beverage category, a cup category and a toy category. If 20 of the similar images correspond to the beverage category, 8 to the cup category and 2 to the toy category, the entropy values of the beverage category, the cup category and the toy category are calculated from these counts.

Step 405: filtering out, from the similar image set, the similar images corresponding to image categories whose entropy value is greater than a preset entropy value threshold, based on the entropy value of each image category and the preset entropy value threshold, to obtain a second target similar image set.

As an example, on the basis of the above example, if the entropy value of the toy category is greater than the preset entropy threshold, filtering the similar images corresponding to the toy category in the similar image set, retaining the similar images corresponding to the beverage category and the similar images corresponding to the cup category in the similar image set, and obtaining a second target similar image set, where the number of the similar images in the second target similar image set is 28.
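
The example above can be reproduced with a short Python sketch. The text does not give the formula used for the entropy value of an image category, so this sketch assumes the self-information -log2(n/N), where n is the number of similar images in the category and N is the size of the similar image set. This is only one choice that is consistent with the example, in which the rarest category (toy) receives the largest value. The threshold 3.0 and the names `category_entropy` and `filter_by_entropy` are likewise hypothetical.

```python
import math
from collections import Counter


def category_entropy(categories):
    """Per-category value -log2(n / N); an assumed formula, not taken from the source."""
    total = len(categories)
    counts = Counter(categories)
    return {c: -math.log2(n / total) for c, n in counts.items()}


def filter_by_entropy(images, threshold):
    """images: list of (image_id, category). Drop categories whose value exceeds threshold."""
    entropy = category_entropy([c for _, c in images])
    if len(entropy) <= 1:
        # Only one image category: step 404 does not apply, keep everything.
        return list(images)
    return [(i, c) for i, c in images if entropy[c] <= threshold]


images = ([(f"drink{i}", "beverage") for i in range(20)]
          + [(f"cup{i}", "cup") for i in range(8)]
          + [(f"toy{i}", "toy") for i in range(2)])

# Hypothetical threshold chosen so that only the toy category is removed.
second_target = filter_by_entropy(images, threshold=3.0)
print(len(second_target))
```

Under this assumed formula the beverage, cup and toy categories receive values of roughly 0.58, 1.91 and 3.91, so a threshold of 3.0 removes the 2 toy images and leaves the 28 images of the example.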

According to the various implementations provided by this embodiment, first, whether the intention category of an input image is a commodity category or a non-commodity category is determined by an image intention classification model; second, when the intention category of the input image is the commodity category, the image features of the input image are extracted by an image recognition model; third, at least one commodity image similar to the input image is searched for in a commodity image database according to the image features of the input image, to form a similar image set of the input image; fourth, the entropy value of each image category is calculated based on the number of similar images corresponding to each image category in the similar image set; and finally, the similar images corresponding to image categories whose entropy value is greater than a preset entropy value threshold are filtered out of the similar image set to obtain a second target similar image set. In this way, the intention category of the input image is determined by the image intention classification model, and the operation of searching for commodities by image is performed only when the result is the commodity category. Input images of the non-commodity category are thereby filtered out, that is, no search for commodities by image is performed on them, which avoids wasting computing resources. In addition, filtering by the entropy values of the image categories to which the similar images belong removes the similar images with high uncertainty from the similar image set, which improves the effect of searching for commodities by image and thus the user's search experience.

It should be further noted that the sorting order of the plurality of similar images in the similar image set may also be determined, so that the plurality of similar images sorted in the similar image set are displayed to the user, and the user search experience is improved. The ordering order of the plurality of similar images depends on the rank value corresponding to each similar image in the similar image set and the image categories corresponding to the plurality of similar images in the similar image set.

Referring now to fig. 5, shown is a flow chart of another method for image recognition search in an embodiment of the present application. In this embodiment, the method may include, for example, the steps of:

Step 501: performing intention classification on an input image by using an image intention classification model to obtain an intention category of the input image; the intention categories include a commodity category and a non-commodity category.

Step 502: if the intention category of the input image is the commodity category, extracting the image features of the input image by using an image recognition model.

Step 503: searching a commodity image database for at least one commodity image similar to the input image based on the image features of the input image, to obtain a similar image set of the input image.

In this embodiment, steps 501 to 503 are the same as steps 201 to 202 in the above embodiment, and specific implementation manners of steps 501 to 503 may refer to specific implementation manners of steps 201 to 202 in the above embodiment, which are not described herein again.

Step 504: obtaining a rank value corresponding to each similar image based on the distance between the image feature of the input image and the image feature of each similar image in the similar image set.

In this embodiment, step 504 is the same as step 304 in the above embodiment, and the specific implementation manner of step 504 may refer to the specific implementation manner of step 304 in the above embodiment, which is not described herein again.

Step 505: determining the image categories corresponding to the plurality of similar images in the similar image set.

In this embodiment, step 505 means obtaining the image category to which each similar image in the similar image set belongs, so as to determine the image categories corresponding to the plurality of similar images in the similar image set.

Step 506: determining the sorting order of the plurality of similar images in the similar image set based on the rank value corresponding to each similar image and the image categories.

In this embodiment, first, a plurality of similar images in the similar image set may be ranked based on the rank value corresponding to each similar image to obtain a ranking position value of each similar image, and then, a ranking score of each image category may be calculated and obtained based on the ranking position value of the similar image corresponding to each image category; and finally, adjusting the sequence of the plurality of similar images in the similar image set based on the sequence score of each image category to obtain the sequence of the plurality of similar images in the similar image set.

For example, the number of similar images corresponding to an image category is k, and the calculation formula of the ranking score of the image category is as follows:

where score represents the ranking score of the image category, and X_i indicates the ranking position value of the i-th similar image corresponding to the image category.

Of course, in this embodiment, if the plurality of similar images in the similar image set all belong to the same image category, that is, the plurality of similar images in the similar image set correspond to one image category, the plurality of similar images in the similar image set are sorted directly based on the rank value corresponding to each similar image, and the sorting order of the plurality of similar images in the similar image set is determined.
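
The ordering procedure of step 506 can be sketched in Python under explicit assumptions. The formula for the ranking score of an image category is not reproduced in the text above, so the sketch takes the scoring function as a parameter and, purely as a placeholder, defaults to the mean of the ranking position values X_1 to X_k. Grouping the images by category and placing lower-scoring categories first is also an assumed reading of "adjusting the sequence". The function name `order_by_category` and the sample data are hypothetical.

```python
from collections import defaultdict


def order_by_category(images, category_score=None):
    """images: list of (image_id, rank_value, category). Returns the reordered list."""
    if category_score is None:
        # Placeholder score: mean ranking position (an assumption, not the source formula).
        category_score = lambda positions: sum(positions) / len(positions)

    # Sort by rank value, largest (most similar) first, to get position values 1..n.
    ranked = sorted(images, key=lambda item: item[1], reverse=True)
    positions = defaultdict(list)
    for position, (_, _, category) in enumerate(ranked, start=1):
        positions[category].append(position)

    scores = {c: category_score(p) for c, p in positions.items()}
    # sorted() is stable, so images keep their rank order inside each category.
    return sorted(ranked, key=lambda item: scores[item[2]])


images = [
    ("a", 1.9, "beverage"),
    ("b", 1.5, "cup"),
    ("c", 1.7, "beverage"),
    ("d", 1.2, "cup"),
    ("e", 1.8, "cup"),
]

ordered = order_by_category(images)
print([image_id for image_id, _, _ in ordered])
```

When all similar images belong to one image category, every image shares the same category score and the stable sort leaves the pure rank-value order unchanged, which matches the single-category case described above.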

According to the various implementations provided by this embodiment, first, whether the intention category of an input image is a commodity category or a non-commodity category is determined by an image intention classification model; second, when the intention category of the input image is the commodity category, the image features of the input image are extracted by an image recognition model; third, at least one commodity image similar to the input image is searched for in a commodity image database according to the image features of the input image, to form a similar image set of the input image; fourth, a rank value corresponding to each similar image is calculated according to the distance between the image feature of the input image and the image feature of each similar image in the similar image set; fifth, the image category to which each similar image in the similar image set belongs is determined; and finally, the plurality of similar images in the similar image set are sorted based on the rank value and the image category corresponding to each similar image. In this way, the intention category of the input image is determined by the image intention classification model, and the operation of searching for commodities by image is performed only when the result is the commodity category. Input images of the non-commodity category are thereby filtered out, that is, no search for commodities by image is performed on them, which avoids wasting computing resources. In addition, sorting the plurality of similar images according to the rank value of each similar image and the image category to which it belongs improves the effect of searching for commodities by image and thus the user's search experience.

It should be noted that, after the similar image set of the input image is obtained by searching the commodity image database as in the above embodiment, steps 304 to 305, 404 to 405, and 504 to 506 may be combined to improve the effect of searching for commodities by image, thereby improving the user's search experience.

Referring now to fig. 6, shown is a flow chart of another method for image recognition search in an embodiment of the present application. In this embodiment, the method may include, for example, the steps of:

Step 601: performing intention classification on an input image by using an image intention classification model to obtain an intention category of the input image; the intention categories include a commodity category and a non-commodity category.

Step 602: if the intention category of the input image is the commodity category, extracting the image features of the input image by using an image recognition model.

Step 603: searching a commodity image database for at least one commodity image similar to the input image based on the image features of the input image, to obtain a similar image set of the input image.

In this embodiment, steps 601 to 603 are the same as steps 201 to 202 in the above embodiment, and specific implementation manners of steps 601 to 603 may refer to the specific implementation manners of steps 201 to 202 in the above embodiment, which is not described herein again.

Step 604: obtaining a rank value corresponding to each similar image based on the distance between the image feature of the input image and the image feature of each similar image in the similar image set.

Step 605: filtering out the similar images whose rank value is smaller than a preset rank value threshold from the similar image set, based on the rank value corresponding to each similar image and the preset rank value threshold, to obtain a first target similar image set.

In this embodiment, steps 604 to 605 are the same as steps 304 to 305 in the above embodiment, and specific implementation manners of steps 604 to 605 may refer to specific implementation manners of steps 304 to 305 in the above embodiment, which are not described herein again.

Step 606: if the plurality of similar images in the first target similar image set correspond to a plurality of image categories, acquiring an entropy value of each image category based on the number of similar images corresponding to each image category in the first target similar image set.

Step 607: filtering out, from the first target similar image set, the similar images corresponding to image categories whose entropy value is greater than a preset entropy value threshold, based on the entropy value of each image category and the preset entropy value threshold, to obtain a second target similar image set.

Step 608: determining the sorting order of the plurality of similar images in the second target similar image set based on the rank value and the image category corresponding to each similar image in the second target similar image set.

Through various implementation manners provided by the embodiment, the intention type is judged on the input image by using the image intention classification model, and the operation of searching for the commodity by using the image is performed only when the judgment result is the commodity type, so that the aim of filtering the input image of the non-commodity type is fulfilled, namely, the operation of searching for the commodity by using the image is not performed on the input image of the non-commodity type, so that the waste of computing resources is avoided, and the computing resources are greatly saved; in addition, screening similar images which are more similar to the input image in the similar image set through a rank value corresponding to each similar image to obtain a first target similar image set; secondly, filtering similar images with high uncertainty in the first target similar image set through entropy values of image categories to which a plurality of similar images in the first target similar image set belong to obtain a second target similar image set; and finally, sequencing the plurality of similar images through the rank value corresponding to each similar image in the second target similar image set and the image category to which the similar image belongs, so that the effect of searching for the commodity by the image is improved, and the user searching experience is improved.

Exemplary devices

Referring to fig. 7, a schematic structural diagram of an apparatus for image recognition search in an embodiment of the present application is shown. In this embodiment, the apparatus may specifically include:

an intention category obtaining unit 701, configured to perform intention classification on an input image by using an image intention classification model, and obtain an intention category of the input image; the intent categories include a commodity category and a non-commodity category;

an image feature extraction unit 702, configured to extract an image feature of the input image by using an image recognition model if the intention type of the input image is the commodity type;

a similar image set obtaining unit 703, configured to search, in a commodity image database, at least one commodity image similar to the input image based on an image feature of the input image, and obtain a similar image set of the input image.

In an optional implementation manner of the embodiment of the present application, the commodity category includes an apparel commodity category and other commodity categories; correspondingly, the image recognition model comprises a clothing commodity recognition model and other commodity recognition models.

In an optional implementation manner of this embodiment of the present application, if the commodity category of the input image is specifically the clothing commodity category, the image feature extraction unit 702 includes:

alternatively, if the intention type of the input image is the other commodity type, the image feature extraction unit 702 is specifically configured to:

In an optional implementation manner of the embodiment of the present application, the apparatus further includes an image intention classification model obtaining unit, where the image intention classification model obtaining unit includes:

In an optional implementation manner of the embodiment of the present application, the apparatus further includes:

FIG. 8 is a block diagram illustrating an apparatus 800 for image recognition searching, according to an example embodiment. For example, the apparatus 800 may be a mobile phone, a computer, a digital broadcast terminal, a messaging device, a game console, a tablet device, a medical device, an exercise device, a personal digital assistant, and the like.

Referring to fig. 8, the apparatus 800 may include one or more of the following components: processing component 802, memory 804, power component 806, multimedia component 808, audio component 810, input/output (I/O) interface 812, sensor component 814, and communication component 816.

The processing component 802 generally controls overall operation of the device 800, such as operations associated with display, telephone calls, data communications, camera operations, and recording operations. The processing components 802 may include one or more processors 820 to execute instructions to perform all or a portion of the steps of the methods described above. Further, the processing component 802 can include one or more modules that facilitate interaction between the processing component 802 and other components. For example, the processing component 802 can include a multimedia module to facilitate interaction between the multimedia component 808 and the processing component 802.

The memory 804 is configured to store various types of data to support operation at the device 800. Examples of such data include instructions for any application or method operating on device 800, contact data, phonebook data, messages, images, videos, and so forth. The memory 804 may be implemented by any type or combination of volatile or non-volatile memory devices such as Static Random Access Memory (SRAM), electrically erasable programmable read-only memory (EEPROM), erasable programmable read-only memory (EPROM), programmable read-only memory (PROM), read-only memory (ROM), magnetic memory, flash memory, magnetic or optical disks.

Power components 806 provide power to the various components of device 800. The power components 806 may include a power management system, one or more power supplies, and other components associated with generating, managing, and distributing power for the apparatus 800.

The multimedia component 808 includes a screen that provides an output interface between the device 800 and a user. In some embodiments, the screen may include a Liquid Crystal Display (LCD) and a Touch Panel (TP). If the screen includes a touch panel, the screen may be implemented as a touch screen to receive an input signal from a user. The touch panel includes one or more touch sensors to sense touch, slide, and gestures on the touch panel. The touch sensor may not only sense the boundary of a touch or slide action, but also detect the duration and pressure correlated to the touch or slide operation. In some embodiments, the multimedia component 808 includes a front facing camera and/or a rear facing camera. The front-facing camera and/or the rear-facing camera may receive external multimedia data when the device 800 is in an operating mode, such as a shooting mode or a video mode. Each front camera and rear camera may be a fixed optical lens system or have a focal length and optical zoom capability.

The audio component 810 is configured to output and/or input audio signals. For example, the audio component 810 includes a Microphone (MIC) configured to receive external audio signals when the apparatus 800 is in an operational mode, such as a call mode, a recording mode, and a voice recognition mode. The received audio signals may further be stored in the memory 804 or transmitted via the communication component 816. In some embodiments, audio component 810 also includes a speaker for outputting audio signals.

The I/O interface 812 provides an interface between the processing component 802 and peripheral interface modules, which may be keyboards, click wheels, buttons, etc. These buttons may include, but are not limited to: a home button, a volume button, a start button, and a lock button.

The sensor assembly 814 includes one or more sensors for providing various aspects of state assessment for the device 800. For example, the sensor assembly 814 may detect the open/closed state of the device 800 and the relative positioning of components, such as the display and keypad of the apparatus 800. The sensor assembly 814 may also detect a change in position of the apparatus 800 or a component of the apparatus 800, the presence or absence of user contact with the apparatus 800, the orientation or acceleration/deceleration of the apparatus 800, and a change in temperature of the apparatus 800. The sensor assembly 814 may include a proximity sensor configured to detect the presence of a nearby object without any physical contact. The sensor assembly 814 may also include a light sensor, such as a CMOS or CCD image sensor, for use in imaging applications. In some embodiments, the sensor assembly 814 may also include an acceleration sensor, a gyroscope sensor, a magnetic sensor, a pressure sensor, or a temperature sensor.

The communication component 816 is configured to facilitate communications between the apparatus 800 and other devices in a wired or wireless manner. The device 800 may access a wireless network based on a communication standard, such as WiFi, 2G or 3G, or a combination thereof. In an exemplary embodiment, the communication component 816 receives a broadcast signal or broadcast-associated information from an external broadcast management system via a broadcast channel. In an exemplary embodiment, the communication component 816 further includes a Near Field Communication (NFC) module to facilitate short-range communications. For example, the NFC module may be implemented based on Radio Frequency Identification (RFID) technology, Infrared Data Association (IrDA) technology, Ultra Wideband (UWB) technology, Bluetooth (BT) technology, and other technologies.

In an exemplary embodiment, the apparatus 800 may be implemented by one or more Application Specific Integrated Circuits (ASICs), Digital Signal Processors (DSPs), Digital Signal Processing Devices (DSPDs), Programmable Logic Devices (PLDs), Field Programmable Gate Arrays (FPGAs), controllers, micro-controllers, microprocessors or other electronic components for performing the above-described methods.

In an exemplary embodiment, a non-transitory computer-readable storage medium comprising instructions, such as the memory 804 comprising instructions, executable by the processor 820 of the device 800 to perform the above-described method is also provided. For example, the non-transitory computer readable storage medium may be a ROM, a Random Access Memory (RAM), a CD-ROM, a magnetic tape, a floppy disk, an optical data storage device, and the like.

A non-transitory computer readable storage medium having instructions therein which, when executed by a processor of a mobile terminal, enable the mobile terminal to perform a method of image recognition searching, the method comprising:

Fig. 9 is a schematic structural diagram of a server in the embodiment of the present application. The server 900 may vary widely in configuration or performance and may include one or more Central Processing Units (CPUs) 922 (e.g., one or more processors) and memory 932, one or more storage media 930 (e.g., one or more mass storage devices) storing applications 942 or data 944. Memory 932 and storage media 930 can be, among other things, transient storage or persistent storage. The program stored on the storage medium 930 may include one or more modules (not shown), each of which may include a series of instruction operations for the server. Still further, a central processor 922 may be provided in communication with the storage medium 930 to execute a series of instruction operations in the storage medium 930 on the server 900.

The server 900 may also include one or more power supplies 926, one or more wired or wireless network interfaces 950, one or more input-output interfaces 958, one or more keyboards 956, and/or one or more operating systems 941, such as Windows Server™, Mac OS X™, Unix™, Linux™, FreeBSD™, etc.

The embodiments in this description are described in a progressive manner: each embodiment focuses on its differences from the other embodiments, and for the parts that are the same or similar the embodiments may be referred to one another. Because the device disclosed in the embodiments corresponds to the method disclosed in the embodiments, its description is brief, and the relevant details can be found in the description of the method.

Those of skill would further appreciate that the various illustrative elements and algorithm steps described in connection with the embodiments disclosed herein may be implemented as electronic hardware, computer software, or combinations of both, and that the various illustrative components and steps have been described above generally in terms of their functionality in order to clearly illustrate this interchangeability of hardware and software. Whether such functionality is implemented as hardware or software depends upon the particular application and design constraints imposed on the implementation. Skilled artisans may implement the described functionality in varying ways for each particular application, but such implementation decisions should not be interpreted as causing a departure from the scope of the present application.

It is noted that, herein, relational terms such as first and second, and the like may be used solely to distinguish one entity or action from another entity or action without necessarily requiring or implying any actual such relationship or order between such entities or actions. The terms "comprises," "comprising," or any other variation thereof, are intended to cover a non-exclusive inclusion, such that a process, method, article, or apparatus that comprises a list of elements does not include only those elements but may include other elements not expressly listed or inherent to such process, method, article, or apparatus. Without further limitation, an element defined by the phrase "comprising a ..." does not exclude the presence of other identical elements in a process, method, article, or apparatus that comprises the element.

The foregoing is merely a preferred embodiment of the present application and is not intended to limit the present application in any way. Although the present application has been disclosed above by way of preferred embodiments, these embodiments are not intended to limit it. Those skilled in the art may, without departing from the scope of the technical solution of the present application, use the methods and technical content disclosed above to make many possible variations and modifications to the technical solution, or to modify it into equivalent embodiments. Therefore, any simple modification, equivalent change, or refinement made to the above embodiments according to the technical essence of the present application, without departing from the content of its technical solution, still falls within the protection scope of the technical solution of the present application.

Claims

1. A method of image recognition searching, comprising:

2. The method of claim 1, wherein the commodity category includes a clothing commodity category and an other commodity category; correspondingly, the image recognition model comprises a clothing commodity recognition model and an other commodity recognition model.

3. The method of claim 2, wherein if the commodity category of the input image is specifically the clothing commodity category, the extracting the image feature of the input image by using the image recognition model comprises:

4. The method according to any one of claims 1 to 3, wherein the obtaining of the image intent classification model comprises:

5. The method of claim 1, further comprising, after the obtaining the set of similar images for the input image:

6. The method of claim 1, further comprising, after the obtaining the set of similar images for the input image:

7. The method of claim 1, further comprising, after the obtaining the set of similar images for the input image:

8. An apparatus for image recognition search, comprising:

9. An apparatus for image recognition searching, comprising a memory and one or more programs, wherein the one or more programs are stored in the memory and configured to be executed by one or more processors, the one or more programs comprising instructions for:

10. A machine-readable medium having stored thereon instructions, which when executed by one or more processors, cause an apparatus to perform the method of image recognition searching of any of claims 1-7.