WO2023185787A1 - Article matching method and related device - Google Patents

Article matching method and related device

Info

Publication number
WO2023185787A1
Authority
WO
WIPO (PCT)
Prior art keywords
items
image
item
candidate
information
Prior art date
Application number
PCT/CN2023/084241
Other languages
French (fr)
Chinese (zh)
Inventor
邓一萌
杨坚鑫
李继忠
曹朝
Original Assignee
Huawei Technologies Co., Ltd.
Priority date
Filing date
Publication date
Application filed by Huawei Technologies Co., Ltd.
Publication of WO2023185787A1


Classifications

    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06F: ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00: Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/50: Information retrieval of still image data
    • G06F16/53: Querying
    • G06F16/532: Query formulation, e.g. graphical querying
    • G06F16/55: Clustering; Classification
    • G06F16/58: Retrieval characterised by using metadata, e.g. metadata not derived from the content or metadata generated manually
    • G06F16/583: Retrieval characterised by using metadata automatically derived from the content
    • G06F16/5838: Retrieval characterised by using metadata automatically derived from the content, using colour
    • G06F16/5862: Retrieval characterised by using metadata automatically derived from the content, using texture

Definitions

  • This application relates to the field of artificial intelligence, and in particular to a method of matching items and related equipment.
  • Common item search solutions in the industry include photo search. Specifically, users can take photos of the items they want to search for, and then search for similar items based on the input pictures.
  • the embodiments of the present application provide an item matching method and related equipment.
  • A complex image to be processed, that is, an image including at least two items.
  • A target category of items that has a matching relationship with the image, which greatly expands the application scenarios of this solution and is conducive to improving its user stickiness.
  • embodiments of the present application provide an item matching method, which can apply artificial intelligence technology to the field of item search.
  • The method includes: the client device obtains an image to be processed input by the user, where the image to be processed contains a background and at least two items; the server or the client device obtains, through the first neural network and based on the characteristic information of the image to be processed and the characteristic information of the at least two items in it, a target category of items that has a matching relationship with the image to be processed; and the client device shows the user items of the aforementioned target category.
  • The user can provide an image of the scene in which the item to be searched will be used (that is, the above-mentioned image to be processed); a target category that has a matching relationship with the entire image to be processed is then obtained through the first neural network, and the target items corresponding to that target category are displayed to the user. Through the above solution, the user can search for items to match by providing the image to be processed, and even when the user inputs a complex image to be processed (that is, an image including at least two items), a target category of items that has a matching relationship with the entire image can still be obtained, which greatly expands the application scenarios of this solution and is conducive to improving its user stickiness. In addition, a target category that has a matching relationship with the entire image to be processed is determined based on both the characteristic information of the entire image and the characteristic information of the items in the image to be processed; that is, not only the information of the entire image but also the information of each item in it is considered.
  • The method further includes: the server or the client device inputs the image to be processed into a third neural network, so as to perform feature extraction on the image to be processed through the third neural network and obtain target feature information corresponding to the image to be processed. The target feature information includes the feature information of the at least two items in the image to be processed and the feature information of the image to be processed.
  • The characteristic information of the image to be processed includes the characteristic information of the whole composed of the background and the at least two items. That is, the characteristic information of the image to be processed refers to the features obtained by treating the image to be processed as a whole and extracting features from that whole.
  • Feature information of the image to be processed may include texture information, color information, contour information, style information, scene information, or other types of feature information; the characteristic information of the at least two items in the image to be processed can also be called the semantic label set of the image to be processed.
  • the characteristic information of at least two items in the image to be processed can include attribute information of each item.
  • The attribute information of each item includes any one or more of the following: the category of the item, the color of the item, and the location information of the item in the image to be processed; optionally, it can also include the style of the item, the material of the item, the pattern of the item, or other feature information.
  • The feature information of the image to be processed refers to the feature information obtained by treating the image to be processed as a whole and extracting features from that whole, while the feature information of the at least two items can include the attribute information of each item. This further refines the concepts of the feature information of the image to be processed and the feature information of the at least two items, which is conducive to a clearer distinction between them. The attribute information of each item includes information such as the category of the item, the color of the item, the style of the item, the material of the item, or the pattern of the item; in this way the information of the objects in the image to be processed is fully considered, which is beneficial to improving the accuracy of the determined target category.
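To make the two kinds of feature information concrete, the attribute fields listed above can be sketched as a small data structure. The class and field names below are illustrative only; the patent does not specify a concrete schema:

```python
from dataclasses import dataclass, field

@dataclass
class ItemAttributes:
    # Attribute information of one item in the image to be processed
    category: str                 # e.g. "sofa"
    color: str                    # e.g. "beige"
    bbox: tuple = (0, 0, 0, 0)    # location information (x, y, w, h)
    style: str = ""               # optional: style information
    material: str = ""            # optional: material of the item
    pattern: str = ""             # optional: pattern of the item

@dataclass
class ImageFeatureInfo:
    # Feature information of the image treated as a whole
    scene: str = ""                                 # scene information
    texture: list = field(default_factory=list)     # texture features
    colors: list = field(default_factory=list)      # color features
    # Attribute information for each of the >= 2 detected items
    items: list = field(default_factory=list)

info = ImageFeatureInfo(scene="living room",
                        items=[ItemAttributes("sofa", "beige"),
                               ItemAttributes("rug", "grey")])
assert len(info.items) >= 2  # the image to be processed contains at least two items
```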
  • The server or the client device obtains, through the first neural network and based on the characteristic information of the image to be processed and the characteristic information of the at least two items, a target category that has a matching relationship with the image to be processed. This includes: the server or client device generates M candidate intentions corresponding to the image to be processed through the first neural network, where M is an integer greater than or equal to 2 and each candidate intention indicates a category of items that has a collocation relationship with the image to be processed; the client device displays the M candidate intentions to the user to obtain the feedback operations corresponding to the M candidate intentions; and the client device determines a target category that has a matching relationship with the image to be processed based on the feedback operations for the M candidate intentions.
  • The "feedback operation" may be a selection operation on one of the M candidate intentions, or it may be the user manually inputting a new search intention, etc.
  • M candidate intentions are first generated through the first neural network, and then a target category that has a matching relationship with the image to be processed is determined based on the feedback operation input by the user for the M candidate intentions. That is, an interactive method is used to guide the user's search intention, which is conducive to improving the accuracy of the determined target category.
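The interactive flow described above can be sketched as follows. The function and callback names are hypothetical stand-ins, not names from the patent; the first neural network and the user interface are replaced by simple callables:

```python
def interactive_target_category(image_features, item_features,
                                generate_intentions, get_user_feedback):
    """generate_intentions: stands in for the first neural network; returns
    M >= 2 candidate intentions, each naming a category of items that has a
    matching relationship with the image.
    get_user_feedback: returns either an index into the candidates (a
    selection) or a new string (a manually entered search intention)."""
    candidates = generate_intentions(image_features, item_features)
    assert len(candidates) >= 2          # M is an integer >= 2
    feedback = get_user_feedback(candidates)
    if isinstance(feedback, int):        # user selected one of the M intentions
        return candidates[feedback]
    return feedback                      # user typed a new search intention

# Toy stand-ins for the two callbacks:
chosen = interactive_target_category(
    image_features={}, item_features=[],
    generate_intentions=lambda img, items: ["curtains", "rug", "lamp"],
    get_user_feedback=lambda cands: 1)
assert chosen == "rug"
```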
  • The method further includes: the client device obtains target text information input by the user, where the target text information is used to indicate the user's search intention; the server or the client device inputs the target text information into the fourth neural network, so as to perform feature extraction on the text information through the fourth neural network and obtain the feature information of the text information.
  • The server or client device obtains, through the first neural network and based on the characteristic information of the image to be processed and the characteristic information of the at least two items, a target category that has a matching relationship with the image to be processed; this includes inputting the characteristic information of the image to be processed, the characteristic information of the at least two items, and the characteristic information of the text information into the first neural network, so as to obtain through the first neural network a target category that has a matching relationship with the image to be processed.
  • the target text information input by the user can also be obtained.
  • The target text information is used to indicate the user's search intention, and the target feature information and the feature information of the target text information are input into the third neural network together; that is, when obtaining the candidate intentions, the text information used to indicate the user's search intention can be combined to further improve the accuracy of the determined candidate intentions.
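One minimal way to combine the image, item, and text features before they enter the network that produces candidate intentions is plain concatenation of feature vectors. This is an assumption for illustration only, since the patent does not fix a fusion scheme:

```python
def fuse_features(image_feat, item_feats, text_feat):
    # Concatenate: whole-image features, then each item's features,
    # then the target-text-information features.
    fused = list(image_feat)
    for f in item_feats:
        fused.extend(f)
    fused.extend(text_feat)
    return fused

fused = fuse_features([0.1] * 128,               # whole-image feature vector
                      [[0.2] * 64, [0.3] * 64],  # features of the >= 2 items
                      [0.4] * 32)                # text feature vector
assert len(fused) == 128 + 64 + 64 + 32
```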
  • The client device obtains items of a target category that has a collocation relationship with the image to be processed through the first neural network. This includes: the server obtains, through the first neural network, N candidate items that have a collocation relationship with the image to be processed, where each candidate item is of a target category and N is an integer greater than 1; the server generates, through the second neural network, a target score corresponding to each of the N candidate items, where the target score indicates the matching degree between the candidate item and the image to be processed, that is, an aesthetic score for the matching rendering of the candidate item and the image; and the server selects K target items from the N candidate items based on the target scores corresponding to the N candidate items, where K is an integer greater than or equal to 1. Displaying items of the target category on the client device includes displaying the K target items.
  • Scores corresponding to the N candidate items are generated through a neural network; the scores indicate the matching degree between each candidate item and the image to be processed, and the target items finally displayed to the user are selected from the N candidate items based on this matching degree. That is to say, the aesthetics of matching each candidate item with the image to be processed is scored quantitatively, and this aesthetic quality is taken into account when selecting the target items, so that the matching renderings of the target items and the image to be processed that are provided to the user look better, which helps improve the user stickiness of this solution.
  • Generating the target scores corresponding to the N candidate items through the second neural network includes: inputting the image of each candidate item, the semantic label of each candidate item, the image to be processed, and the semantic labels corresponding to the items in the image to be processed into the second neural network, and obtaining the target score corresponding to each candidate item output by the second neural network.
  • the semantic labels of the items in the image to be processed can also be called the feature information of the items in the image to be processed.
  • the semantic label of the candidate item may include at least one attribute information of the candidate item.
  • The semantic label of the candidate item may include any one or more of the following: the category of the candidate item, the style of the candidate item, the shape of the candidate item, or other attributes of the candidate item.
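The scoring-and-selection step above can be sketched as follows, with `score_fn` standing in for the second neural network (which, per the description, sees each candidate's image and semantic label together with the image to be processed and its items' semantic labels). The function name is illustrative:

```python
def select_top_k(candidates, score_fn, k):
    # Generate a target score per candidate, sort best match first,
    # and keep the K target items out of the N candidates.
    scored = [(score_fn(c), c) for c in candidates]
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return [c for _, c in scored[:k]]

# Toy example: the "network" just reads a precomputed aesthetic score.
cands = [{"name": "lamp A", "score": 0.4},
         {"name": "lamp B", "score": 0.9},
         {"name": "lamp C", "score": 0.7}]
top = select_top_k(cands, lambda c: c["score"], k=2)
assert [c["name"] for c in top] == ["lamp B", "lamp C"]
```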
  • the client device displays items of a target category to the user, including: the client device displays to the user a rendering of a combination of the items of the target category and the image to be processed.
  • the aforementioned matching renderings can be in pure image format, renderings after VR modeling, renderings after AR modeling, or other formats, etc.
  • The client device can also display to the user any one or more of the following information about the items of each target category: access links, names, prices, target scores, or other types of information; there is no limit here.
  • The user is shown the matching renderings of the items of each target category with the image to be processed, so that the user can more intuitively experience the effect of applying the items of the target category to the image to be processed, which is conducive to improving the user stickiness of the solution.
  • embodiments of the present application provide an item matching method, which can apply artificial intelligence technology in the field of item search.
  • The method includes: the client device obtains an image to be processed input by the user, where the image to be processed contains a background and at least two items; receives items of a target category, sent by the server, that has a matching relationship with the image to be processed, where the items of the target category are obtained by the server based on the feature information of the image and the feature information of the at least two items; and displays the items of the target category.
  • The feature information of the image to be processed includes the feature information of the whole composed of the background and the at least two items; the feature information of the at least two items includes the attribute information of each item; and the attribute information of each item includes any one or more of the following: the category of the item, the color of the item, the style of the item, the material of the item, or the pattern of the item.
  • The client device receives M candidate intentions corresponding to the image to be processed sent by the server and displays the M candidate intentions to the user, where M is an integer greater than or equal to 2 and each candidate intention indicates a category of items that has a matching relationship with the image to be processed; the client device obtains the feedback operations corresponding to the M candidate intentions, determines, based on the feedback operations, a target category that has a matching relationship with the image to be processed, and sends the target category to the server.
  • the client device can also be used to perform the steps performed by the client device in the first aspect and each possible implementation manner of the first aspect.
  • For the specific implementation of the steps and the meanings of the terms in each possible implementation manner of the second aspect, please refer to the first aspect; they are not repeated here.
  • embodiments of the present application provide an item matching method, which can apply artificial intelligence technology to the field of item search.
  • The method includes: the server obtains, through the first neural network and based on the characteristic information of the image to be processed and the characteristic information of at least two items, a target category that has a matching relationship with the image to be processed, where there is a background and at least two items in the image to be processed; the server sends information about items of the target category to the client device.
  • the server obtains a target category that has a matching relationship with the image to be processed through the first neural network based on the characteristic information of the image to be processed and the characteristic information of at least two items, including:
  • the server generates M candidate intentions corresponding to the image to be processed through the first neural network, M is an integer greater than or equal to 2, and each candidate intention indicates a category of items that has a collocation relationship with the image to be processed;
  • The server sends the M candidate intentions to the client device, where the M candidate intentions are used by the client device to obtain a target category that has a matching relationship with the image to be processed; the server then receives the target category sent by the client device.
  • the server can also be used to execute the steps performed by the server in the first aspect and each possible implementation of the first aspect.
  • Embodiments of the present application provide an item matching device that can apply artificial intelligence technology to the field of item search.
  • the item matching device is applied to client equipment in an item matching system.
  • the item matching system also includes a server.
  • the item matching device includes: an acquisition module, used to obtain an image input by the user, in which there is a background and at least two items; a receiving module, used to receive a target category of items sent by the server that has a matching relationship with the image, The items of the target category are obtained by the server based on the feature information of the image and the feature information of at least two items; the display module is used to display the items of the target category.
  • the item matching device can also be used to perform the steps performed by the client device in the second aspect and each possible implementation manner of the second aspect.
  • For the specific implementation of the steps, the meanings of the terms, and the beneficial effects in each possible implementation manner of the fourth aspect, please refer to the second aspect; they are not repeated here.
  • embodiments of the present application provide an item matching device that can apply artificial intelligence technology to the field of item search.
  • the item matching device is applied to a server in an item matching system.
  • the item matching system also includes a client device.
  • The item matching device includes: an acquisition module, configured to obtain, through the first neural network and based on the feature information of the image and the feature information of at least two items, a target category of items that has a matching relationship with the image, where there is a background and at least two items in the image; and a sending module, used to send information about items of the target category to the client device.
  • the item matching device can also be used to perform the steps performed by the server in the third aspect and each possible implementation of the third aspect.
  • Embodiments of the present application provide a computer program product. The computer program product includes a program; when the program is run on a computer, it causes the computer to execute the item matching method described in the second aspect or the third aspect.
  • embodiments of the present application provide a computer-readable storage medium.
  • A computer program is stored in the computer-readable storage medium; when the program is run on a computer, it causes the computer to execute the item matching method of the second aspect or the third aspect.
  • embodiments of the present application provide a client device, including a processor and a memory.
  • the processor is coupled to the memory.
  • The memory is used to store a program; the processor is used to execute the program in the memory, so that the client device executes the methods performed by the client device in the above aspects.
  • embodiments of the present application provide a server, including a processor and a memory.
  • the processor is coupled to the memory.
  • The memory is used to store a program; the processor is used to execute the program in the memory, so that the server executes the methods performed by the server in the above aspects.
  • The present application provides a chip system, which includes a processor for supporting a terminal device or communication device in implementing the functions involved in the above aspects, for example, sending or processing the data and/or information involved in the above methods.
  • the chip system also includes a memory, which is used to store necessary program instructions and data for the terminal device or communication device.
  • the chip system may be composed of chips, or may include chips and other discrete devices.
  • Figure 1a is a schematic structural diagram of the artificial intelligence main framework provided by the embodiment of the present application.
  • Figure 1b is an application scenario diagram of the item matching method provided by the embodiment of the present application.
  • Figure 2a is a system architecture diagram of the item matching system provided by the embodiment of the present application.
  • Figure 2b is a schematic flowchart of a method for matching items provided by an embodiment of the present application.
  • Figure 3 is a schematic flowchart of a method for matching items provided by an embodiment of the present application.
  • Figure 4 is a schematic diagram of an interface for obtaining the image to be processed and target text information in the item matching method provided by the embodiment of the present application;
  • Figure 5 is a schematic diagram of the first feature extraction network in the item matching method provided by the embodiment of the present application.
  • Figure 6 is a schematic diagram showing M candidate intentions in the item matching method provided by the embodiment of the present application.
  • Figure 7 is a schematic flowchart of obtaining a target category in the item matching method provided by the embodiment of the present application.
  • Figure 8 is a schematic diagram of the target score in the item matching method provided by the embodiment of the present application.
  • Figure 9 is a schematic diagram of the second neural network in the item matching method provided by the embodiment of the present application.
  • Figure 10 is a schematic diagram of the matching effect diagram of the target item and the image to be processed in the item matching method provided by the embodiment of the present application;
  • Figure 11 is a schematic flowchart of a method for matching items provided by an embodiment of the present application.
  • Figure 12 is a schematic flowchart of a method for matching items provided by an embodiment of the present application.
  • Figure 13 is a schematic flowchart of a method for matching items provided by an embodiment of the present application.
  • Figure 14 is a schematic structural diagram of an item matching device provided by an embodiment of the present application.
  • Figure 15 is a schematic structural diagram of an item matching device provided by an embodiment of the present application.
  • Figure 16 is a schematic structural diagram of a client device provided by an embodiment of the present application.
  • Figure 17 is a schematic structural diagram of a server provided by an embodiment of the present application.
  • Figure 18 is a schematic structural diagram of a chip provided by an embodiment of the present application.
  • Figure 1a shows a structural schematic diagram of the artificial intelligence main framework.
  • The above artificial intelligence framework is elaborated below along two dimensions: the "intelligent information chain" (horizontal axis) and the "IT value chain" (vertical axis).
  • the "intelligent information chain” reflects a series of processes from data acquisition to processing. For example, it can be the general process of intelligent information perception, intelligent information representation and formation, intelligent reasoning, intelligent decision-making, intelligent execution and output. In this process, the data has gone through the condensation process of "data-information-knowledge-wisdom".
  • The "IT value chain" reflects the value that artificial intelligence brings to the information technology industry, from the underlying infrastructure and information (the provision and processing technology implementations) of artificial intelligence to the systemic industrial ecology.
  • Infrastructure provides computing power support for artificial intelligence systems, enables communication with the external world, and supports it through basic platforms.
  • Computing power is provided by smart chips, which can specifically be hardware acceleration chips such as a central processing unit (CPU), a neural-network processing unit (NPU), a graphics processing unit (GPU), an application-specific integrated circuit (ASIC), or a field-programmable gate array (FPGA);
  • The basic platform includes related platform guarantees and support such as a distributed computing framework and networks, and can include cloud storage and computing, interconnection networks, etc.
  • sensors communicate with the outside world to obtain data, which are provided to smart chips in the distributed computing system provided by the basic platform for calculation.
  • Data at the layer above the infrastructure represents the data sources of the artificial intelligence field. The data involves graphics, images, voice, and text, and also involves IoT data from traditional devices, including business data of existing systems and sensory data such as force, displacement, liquid level, temperature, and humidity.
  • Data processing usually includes data training, machine learning, deep learning, search, reasoning, decision-making and other methods.
  • machine learning and deep learning can perform symbolic and formal intelligent information modeling, extraction, preprocessing, training, etc. on data.
  • Reasoning refers to the process of simulating human intelligent reasoning in computers or intelligent systems, using formal information to perform machine thinking and problem solving based on reasoning control strategies. Typical functions are search and matching.
  • Decision-making refers to the process of decision-making after intelligent information is reasoned, and usually provides functions such as classification, sorting, and prediction.
  • Based on the results of further data processing, some general capabilities can be formed, such as algorithms or a general system, for example translation, text analysis, computer vision processing, speech recognition, image recognition, etc.
  • Intelligent products and industry applications refer to the products and applications of artificial intelligence systems in various fields. They are the encapsulation of overall artificial intelligence solutions, productizing intelligent information decision-making and realizing practical applications. Their application fields mainly include intelligent terminals, intelligent manufacturing, smart transportation, smart home, smart healthcare, smart security, autonomous driving, smart city, etc.
  • Figure 1b is an application scenario diagram of the item matching method provided by the embodiment of the present application. As shown in Figure 1b, when using a shopping application, the user can click an icon and input an image to be processed, so as to search for and purchase items of a category that has a matching relationship with the image to be processed.
  • Similarly, when using a decoration design application, the user can input an image to be processed to search for items of a category that has a matching relationship with it. It should be understood that the embodiments of the present application can also be applied in other scenarios in which items that have a matching relationship with an image to be processed are obtained; other application scenarios are not listed one by one here.
  • Figure 2a is a system architecture diagram of the item matching system provided by the embodiment of the present application.
  • The item matching system 200 includes a training device 210, a database 220, an execution device 230, a data storage system 240, and a client device 250.
  • the execution device 230 includes a computing module 231.
  • the first training data set is stored in the database 220
  • The training device 210 generates the first model/rule 201, and uses the first training data set in the database 220 to iteratively train the first model/rule 201 so as to obtain the trained first model/rule 201.
  • The first model/rule 201 may be embodied as a neural network model or as a model that is not a neural network.
  • In the embodiments of the present application, the case where the first model/rule 201 is a first neural network is taken as an example for description.
  • the execution device 230 can call data, codes, etc. in the data storage system 240, and can also store data, instructions, etc. in the data storage system 240.
  • The data storage system 240 may be placed in the execution device 230, or the data storage system 240 may be an external memory relative to the execution device 230.
  • The trained first model/rule 201 obtained by the training device 210 may be deployed in the execution device 230, and the execution device 230 may appear as a server corresponding to the application program deployed on the client device 250.
  • The computing module 231 of the execution device 230 may obtain, through the first model/rule 201, a target category that has a matching relationship with the image to be processed, where the image to be processed is obtained through the client device 250, and the target category indicates a category of items that has a matching (collocation) relationship with the image to be processed.
  • the client device 250 can be represented by various forms of terminal devices, such as mobile phones, tablets, laptops, virtual reality (VR) devices or augmented reality (AR) devices, etc.
  • the execution device 230 and the client device 250 may be independent devices.
  • the execution device 230 is configured with an input/output (I/O) interface for data interaction with the client device 250.
  • The "user" can input the image to be processed to the I/O interface through the client device 250, and the execution device 230 returns items of the target category that have a matching relationship with the image to be processed to the client device 250 through the I/O interface, so as to provide them to the user.
  • Figure 2a is only a schematic architectural diagram of the item matching system provided by an embodiment of the present application, and the positional relationship between the devices, components, modules, etc. shown in the figure does not constitute any limitation.
  • the execution device 230 and the client device 250 can also be integrated into the same device, which is not limited here.
  • Figure 2b is a schematic flow chart of the item matching method provided by an embodiment of the present application.
  • S1. Obtain the image to be processed input by the user, where the image to be processed includes a background and at least two items.
  • S2. Based on the characteristic information of the image to be processed and the characteristic information of the at least two items, obtain an item of a target category that has a matching relationship with the image to be processed through the first neural network.
  • S3. Display items of the target category.
  • In this way, users can not only search for items they want to match by providing an image to be processed; even when the user inputs a complex image to be processed (that is, an image including at least two items), the user can still obtain items of a target category that match the entire image to be processed, which greatly expands the application scenarios of this solution and is conducive to improving its user stickiness.
  • the item matching system may include a client device and a server.
  • The process of "obtaining a target category that has a matching relationship with the image to be processed" may include two parts: feature extraction of the image to be processed, and determination of the target category based on the extracted features.
  • In one implementation, the aforementioned two parts can be performed entirely by the server, that is, the execution device of the first neural network and the client device are separated; in another implementation, the operations of the aforementioned two parts can be performed entirely by the client device, that is, the execution device of the first neural network and the client device are integrated in the same device; in yet another implementation, the feature extraction operation can be performed on the client device while the server performs the operation of determining the target category, in which case the execution device of the first neural network and the client device are also separated. Since the specific implementation processes of the above three implementations are different, they are described separately below.
  • Figure 3 is a schematic flowchart of a method of matching items provided by an embodiment of the present application.
  • the method of matching items provided by an embodiment of the present application may include:
  • the client device obtains the image to be processed input by the user.
  • the user can input the image to be processed through the client device.
  • the client device obtains the image to be processed input by the user to search for items that have a matching relationship with the image to be processed.
  • The image to be processed can be an image selected by the user from images stored locally on the client device, an image captured by the user using the camera on the client device, or an image downloaded by the user using a browser, etc.; no limitation is made here.
  • the client device obtains the target text information input by the user, and the target text information is used to indicate the user's search intention.
  • the client device can also obtain target text information input by the user, and the target text information is used to indicate the user's search intention. Further, the item indicated by the target text information may be an item in the image to be processed, or may not be an item in the image to be processed.
  • Figure 4 is a schematic diagram of an interface for obtaining the image to be processed and the target text information in the item matching method provided by the embodiment of the present application.
  • Figure 4 includes two sub-schematic diagrams (a) and (b).
  • Inputting the image can trigger entry into sub-diagram (b) of Figure 4, that is, the user is prompted to input the target text information through sub-diagram (b) of Figure 4.
  • The interface shown in the schematic diagram can be flexibly set according to the actual product form, and is not limited here.
  • the server inputs the image to be processed into the third neural network to extract features of the image to be processed through the third neural network to obtain target feature information corresponding to the image to be processed.
  • The target feature information includes feature information of the items in the image to be processed and feature information of the image to be processed.
  • Specifically, the client can send the image to be processed to the server, and the server can input the received image to be processed into the third neural network, so that the third neural network performs feature extraction on the entire image to be processed to obtain the feature information of the image to be processed.
  • The feature information of the image to be processed includes the overall feature information composed of the background of the image to be processed and the at least two items; the server also uses the third neural network to identify each item area in the image to be processed and perform feature extraction on the items in the image to be processed, so as to obtain feature information of the at least two items in the image to be processed, where the feature information of the at least two items includes attribute information of each item.
  • the target feature information includes feature information of at least two items in the image to be processed and feature information of the image to be processed.
  • The aforementioned feature information of the image to be processed refers to the feature information obtained after feature extraction is performed on the image to be processed as a whole (that is, on the background of the image to be processed together with the at least two items); as an example, the feature information of the image to be processed may include texture information, color information, contour information, style information, scene information or other types of feature information of the image to be processed.
  • the characteristic information of at least two items in the image to be processed can also be called the set of semantic tags corresponding to the image to be processed.
  • the characteristic information of the at least two items can include attribute information of each item.
  • The attribute information of each item includes any one or more of the following types of information: the position information of the item in the image to be processed, the category information of the item, and the color information of the item; optionally, it can also include the style information of each item, the material of the item, the pattern of the item, or other feature information.
  • the characteristic information of items of different categories may include different information.
  • As an example, the characteristic information of the bed may include the position information of the bed in the image to be processed, the category information of the bed, the color of the bed and the style of the bed.
  • As another example, the characteristic information of the top may include the position information of the top in the image to be processed, the category information of the top, the color of the top, the shape of the top and the material of the top. It should be understood that the examples here are only used to facilitate understanding of this solution and are not used to limit this solution.
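  • As an illustrative sketch (not part of the embodiments themselves), the per-item attribute information described above could be represented as a simple record; the class and field names here are assumptions for illustration only, not the actual data format of the solution.

```python
from dataclasses import dataclass
from typing import Optional, Tuple

@dataclass
class ItemAttributes:
    # Bounding box (x, y, width, height) of the item in the image to be processed
    position: Tuple[int, int, int, int]
    category: str                   # category information, e.g. "bed", "top"
    color: Optional[str] = None     # color information of the item
    style: Optional[str] = None     # optional attributes vary by category
    material: Optional[str] = None
    pattern: Optional[str] = None

# Example: characteristic information of a bed and of a top, as in the text above
bed = ItemAttributes(position=(10, 40, 200, 120), category="bed",
                     color="white", style="modern")
top = ItemAttributes(position=(5, 5, 60, 80), category="top",
                     color="blue", material="cotton")
```

  • Different categories carry different optional fields, matching the observation that characteristic information of items of different categories may include different information.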
  • the third neural network can specifically be embodied as a convolutional neural network or other neural networks used for feature extraction. Further, the third neural network may include two different feature extraction networks: a first feature extraction network and a second feature extraction network.
  • the first feature extraction network is used to generate feature information of at least two items in the image to be processed, and the second feature extraction network is used to generate feature information of the entire image to be processed.
  • The first feature extraction network can be part of a neural network used for target recognition in images; that is, the training device can use the training data to iteratively train the neural network used for target recognition in images until the convergence conditions are met, and after the trained neural network is obtained, the trained first feature extraction network is obtained from it.
  • As an example, a neural network used for object recognition in an image can identify coffee tables, sideboards, storage cabinets, shoe cabinets and flower racks in the image; that is, the first feature extraction network in the embodiments of the present application can be used for feature extraction at a finer granularity.
  • Figure 5 is a schematic diagram of the first feature extraction network in the item matching method provided by the embodiment of the present application.
  • As shown in Figure 5, the first feature extraction network can identify the three item areas in the image to be processed and generate feature information of the items in the image to be processed. It should be understood that the example in Figure 5 is only for convenience of understanding this solution and is not used to limit this solution.
  • The second feature extraction network can be part of a neural network used to classify the entire image; that is, the training device can use the training data to iteratively train the neural network used to classify the entire image until the convergence conditions are met, and after the trained neural network is obtained, the trained second feature extraction network is obtained from it.
  • The characteristic information of the image to be processed refers to the characteristic information obtained by treating the image to be processed as a whole and extracting features from the image to be processed.
  • The characteristic information of the at least two items in the image to be processed may include the attribute information of each item; this further refines the concepts of the feature information of the image to be processed and the feature information of the at least two items, which is conducive to a clearer distinction between the two; and
  • since the feature information of each item includes information such as the category of the item, the color of the item, the style of the item, the material of the item, or the pattern of the item, the information of the items in the image to be processed is fully considered, which is beneficial to improving the accuracy of the identified target category.
  • the server inputs the text information into the fourth neural network to extract features of the text information through the fourth neural network to obtain feature information of the text information.
  • the server can also input text information into a fourth neural network to extract features of the text information through the fourth neural network to obtain feature information of the text information.
  • Step 302 is an optional step. If step 302 is executed, the text information input into the fourth neural network refers to the target text information obtained in step 302; if step 302 is not executed, the text information input into the fourth neural network may be the characteristic information of the items in the image to be processed obtained in step 303, that is, the text information input into the fourth neural network may be the set of semantic labels of the image to be processed.
  • the fourth neural network is a neural network that extracts features from text information. It can be embodied as a recurrent neural network or other types of neural networks, etc., and is not exhaustive here.
  • step 304 is also an optional step. If step 304 is not executed, step 302 does not need to be executed. After step 303 is executed, step 305 can be executed directly.
  • Based on the characteristic information of the image to be processed and the characteristic information of the at least two items, the server obtains, through the first neural network, a target category that has a matching relationship with the image to be processed.
  • The server may obtain a target category that has a matching relationship with the image to be processed through the first neural network based on the characteristic information of the image to be processed and the characteristic information of the at least two items. Specifically, in one implementation, if steps 303 and 304 are executed, the server can input the target feature information and the feature information of the text information into the first neural network, so that the first neural network generates M candidate intentions corresponding to the image to be processed, where each candidate intention indicates a category of items that has a matching relationship with the image to be processed, and M is an integer greater than or equal to 1. Further, when there are at least two items in the image to be processed, M is an integer greater than or equal to 2.
  • the first neural network can also output M first scores that correspond one-to-one to the M candidate intentions, and each first score is used to indicate the probability that a candidate intention is consistent with the user's search intention.
  • For different images to be processed, the number of candidate intentions output by the first neural network may be the same or different; that is, the first neural network can determine the number of candidate intentions to output according to the actual situation.
  • the server sends the M candidate intentions to the client device to present the M candidate intentions to the user through the display interface of the client device; wherein the client device can present the M candidate intentions to the user in text, images, or other forms.
  • The server can also send the M first scores to the client device, and the client device can sort the M candidate intentions according to the first score corresponding to each candidate intention: the higher the first score, the higher the ranking position.
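  • The sorting step above can be sketched in a few lines; the function name `rank_candidate_intents` and the sample intents are illustrative assumptions only.

```python
def rank_candidate_intents(intents, first_scores):
    """Sort the M candidate intentions by their first scores, descending:
    the higher the first score, the higher the ranking position."""
    return [intent for intent, _ in sorted(zip(intents, first_scores),
                                           key=lambda pair: pair[1],
                                           reverse=True)]

# Toy example using the candidate intentions of Figure 6
intents = ["decorative painting", "pendant", "lighting"]
scores = [0.62, 0.81, 0.45]   # hypothetical first scores
ranked = rank_candidate_intents(intents, scores)
```

  • The client device would then present `ranked` to the user in this order, in text, image, or other forms.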
  • FIG. 6 is a schematic diagram showing M candidate intentions in the item matching method provided by the embodiment of the present application.
  • As an example, if the image to be processed contains three main areas, namely a bed, a wardrobe and a wall, and the text information is "wall decoration", then the target feature information can include the feature information of the bed, the feature information of the wardrobe, the feature information of the wall and the characteristic information of the entire image to be processed, and the M candidate intentions may include the decorative paintings, pendants and lighting in Figure 6. It should be understood that the examples in Figure 6 are only for convenience of understanding this solution and are not used to limit this solution.
  • After the client device displays the M candidate intentions to the user, in one case, if the client device obtains a feedback operation corresponding to the M candidate intentions, it can determine, based on the feedback operation, a target category that has a matching relationship with the image to be processed, and send that target category to the server. Correspondingly, if the server obtains the aforementioned target category sent by the client device within a target time period, it can determine the target category corresponding to the image to be processed.
  • the "feedback operation” can be a selection operation for one of the M candidate intentions, or the “feedback operation” can also be the user manually inputting a new search intention, etc.
  • the specific implementation of the "feedback operation” is not mentioned here. List in the form.
  • the target category may be one of the M candidate intentions, or may be other search intentions other than the M candidate intentions.
  • Figure 7 is a schematic flowchart of obtaining a target category in the item matching method provided by an embodiment of the present application.
  • E1. The server inputs the target feature information and the feature information of the text information into the first neural network, and the first neural network generates M candidate intentions corresponding to the image to be processed.
  • E2. The server sends M candidate intentions to the client device.
  • E3. The client device displays the M candidate intentions to the user.
  • E4. The client device determines a target category based on the feedback operation input by the user for the M candidate intentions.
  • E5. The client device sends the target category to the server, and accordingly, the server receives the target category.
  • the example in Figure 7 is only for convenience of understanding this solution and is not used to limit this solution.
  • In another case, the client device may not send any feedback information to the server, or the client device may send first feedback information to the server, where the first feedback information is used to inform the server that the feedback operation input by the user has not been received.
  • If the server does not receive feedback information from the client device within the target time period, or receives the first feedback information sent by the client device, the candidate intention with the highest first score among the M candidate intentions can be determined as the target category.
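  • Both branches of the logic above (user feedback present versus fallback to the highest-scoring candidate) can be sketched as follows; the function name `resolve_target_category` is a hypothetical illustration.

```python
def resolve_target_category(candidates, first_scores, feedback=None):
    """If a feedback operation was received, it determines the target
    category (it may be one of the M candidates or a newly input search
    intention); otherwise, fall back to the candidate intention with the
    highest first score."""
    if feedback is not None:
        return feedback
    best = max(range(len(candidates)), key=lambda i: first_scores[i])
    return candidates[best]

cands = ["decorative painting", "pendant", "lighting"]
scores = [0.62, 0.81, 0.45]   # hypothetical first scores
no_feedback = resolve_target_category(cands, scores)
with_feedback = resolve_target_category(cands, scores, feedback="lighting")
```

  • Note that in practice the fallback is gated on the target time period elapsing, which a real server would implement with a timer; that detail is elided here.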
  • In another implementation, if step 303 is executed but step 304 is not executed, the server can input the target feature information into the first neural network, so that the first neural network generates M candidate intentions corresponding to the image to be processed.
  • The server sends the M candidate intentions to the client device to display them to the user through the display interface of the client device, and feedback operations corresponding to the M candidate intentions are obtained through the display interface; based on the feedback operation, the client device determines a target category that has a matching relationship with the image to be processed, and sends the target category to the server.
  • In this way, when performing feature extraction on the image to be processed, not only the feature information of the entire image to be processed but also the feature information of the items in the image can be obtained; based on both, M categories of items that have a matching relationship with the entire image to be processed are then generated. That is, not only is the information of the entire image to be processed considered, but each item in the image to be processed is also fully considered, which is beneficial to improving the accuracy of the determined candidate intentions.
  • In addition, the target text information input by the user can also be obtained, where the target text information is used to indicate the user's search intention, and the target feature information and the feature information of the target text information are input into the first neural network together. That is, in the process of obtaining the category that has a matching relationship with the image to be processed, not only can the information in the image to be processed be fully obtained, but the text information used to indicate the user's search intention can also be combined, so as to further improve the accuracy of the determined candidate intentions.
  • In another implementation, the server can input the image to be processed into the first neural network, and perform feature extraction on the image to be processed through the first neural network to obtain the feature information of the entire image to be processed; based on the feature information of the entire image to be processed, M candidate intentions corresponding to the image to be processed are generated through the first neural network.
  • The server sends the M candidate intentions to the client device to display them to the user through the display interface of the client device, and feedback operations corresponding to the M candidate intentions are obtained through the display interface; based on the feedback operation, the client device determines a target category that has a matching relationship with the image to be processed, and sends the target category to the server.
  • In this way, M candidate intentions are first generated through the first neural network, and then a target category that has a matching relationship with the image to be processed is determined based on the feedback operation input by the user for the M candidate intentions; that is, an interactive method is used to elicit the user's search intention, which is conducive to improving the accuracy of the determined target category.
  • In another implementation, the server can also input the target feature information and the feature information of the text information into the first neural network to obtain a target category, generated by the first neural network, that has a matching relationship with the image to be processed.
  • In another implementation, if step 303 is executed but step 304 is not executed, the server can also input the target feature information into the first neural network to obtain a target category, generated by the first neural network, that has a matching relationship with the image to be processed.
  • In another implementation, the server can input the image to be processed into the first neural network, and perform feature extraction on the image to be processed through the first neural network to obtain the feature information of the entire image to be processed; according to the characteristic information of the entire image to be processed, a target category that has a matching relationship with the image to be processed is generated through the first neural network.
  • The server obtains N candidate items, each of which is an item of the target category.
  • After the server determines a target category that has a matching relationship with the image to be processed, it can obtain N candidate items corresponding to the target category from the item library stored on the server; that is, the server can obtain N candidate items of the target category from the item library, where N is an integer greater than 1.
  • the server generates target scores corresponding to the N candidate items through the second neural network.
  • A target score indicates the matching degree between a candidate item and the image to be processed.
  • the server can generate a target score corresponding to each of the N candidate items through a second neural network, where a target score indicates the matching degree between a candidate item and the image to be processed, That is, it is used to indicate the aesthetic score of the matching effect of a candidate item and the image to be processed.
  • Figure 8 is a schematic diagram of the target score in the item matching method provided by the embodiment of the present application.
  • Figure 8 includes three sub-schematic diagrams (a), (b) and (c).
  • Sub-schematic diagram (a) of Figure 8 shows the three items in the image to be processed; the candidate item shown in sub-schematic diagram (b) of Figure 8 is sofa one, and the score of the matching effect diagram of sofa one and the image to be processed is 0.956; the candidate item shown in sub-schematic diagram (c) of Figure 8 is sofa two, and the score of the matching effect diagram of sofa two and the image to be processed is 0.425. This means that the matching degree between sofa one and the image to be processed is higher than that between sofa two and the image to be processed. It should be understood that the example in Figure 8 is only for convenience of understanding this solution and is not used to limit this solution.
  • In one implementation, the server can input the feature information of each candidate item and the target feature information into the second neural network to obtain the target score corresponding to each candidate item output by the second neural network; by performing the foregoing operation on each of the N candidate items, the server can generate a target score corresponding to each of the N candidate items.
  • In another implementation, the server can also input the image of each candidate item and the image to be processed into the second neural network to obtain the target score corresponding to each candidate item output by the second neural network; by performing the foregoing operation on each of the N candidate items, the server can generate the target score corresponding to each candidate item.
  • In another implementation, the server can also input the image of each candidate item, the semantic label of each candidate item, the image to be processed, and the semantic labels of the items in the image to be processed into the second neural network, to obtain the target score corresponding to each candidate item output by the second neural network.
  • the second neural network may be a convolutional neural network or other types of neural networks.
  • the semantic labels of the items in the image to be processed can also be called the feature information of the items in the image to be processed.
  • the semantic label of the candidate item may include at least one attribute information of the candidate item.
  • As an example, the semantic label of the candidate item may include any one or more of the following: the category of the candidate item, the style of the candidate item, the shape of the candidate item, or other attributes of the candidate item, etc.; these are not exhaustively listed here.
  • Figure 9 is a schematic diagram of the second neural network in the item matching method provided by the embodiment of the present application.
  • After the server inputs the image of each candidate item and the semantic label of each candidate item into the second neural network, it performs feature extraction on the image of the candidate item through the second neural network to obtain the feature information of the image of the candidate item, and performs feature extraction on the semantic label of the candidate item to obtain the feature information of the semantic label of the candidate item; the server then fuses, through the second neural network, the feature information of the image of the candidate item and the feature information of the semantic label of the candidate item, and convolves the fused feature information to obtain the feature information corresponding to the candidate item.
  • After the server inputs the image to be processed and the semantic labels of the items in the image to be processed into the second neural network, it performs feature extraction on the image to be processed through the second neural network to obtain the feature information of the image to be processed, and performs feature extraction on the semantic labels of the items in the image to be processed to obtain the feature information of those semantic labels; the server then fuses, through the second neural network, the feature information of the image to be processed and the feature information of the semantic labels, and convolves the fused feature information to obtain the feature information corresponding to the image to be processed.
  • The server performs the above-mentioned multiplication, fusion and other operations through the second neural network, and then outputs a target score for the matching effect of the candidate item and the image to be processed. It should be understood that the example in Figure 9 is only for convenience of understanding this solution and is not used to limit this solution.
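  • The two-branch fuse-then-compare structure described above can be sketched with simple stand-ins: concatenation in place of the learned fusion/convolution, and cosine similarity in place of the second neural network's final scoring head. The names `fuse` and `target_score` and the vector dimensions are assumptions for illustration, not the actual network.

```python
import numpy as np

def fuse(image_feat: np.ndarray, label_feat: np.ndarray) -> np.ndarray:
    """Stand-in for the fusion + convolution step: combine the image
    feature information with the semantic-label feature information."""
    return np.concatenate([image_feat, label_feat])

def target_score(cand_img_feat, cand_label_feat,
                 scene_img_feat, scene_label_feat):
    """Stand-in for the second neural network's output: cosine similarity
    between the fused candidate-item vector and the fused vector of the
    image to be processed, mapped to [0, 1] as an aesthetic matching score."""
    a = fuse(cand_img_feat, cand_label_feat)
    b = fuse(scene_img_feat, scene_label_feat)
    cos = float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b) + 1e-12))
    return 0.5 * (cos + 1.0)

# Identical candidate and scene vectors give the maximum score
score = target_score(np.ones(4), np.ones(2), np.ones(4), np.ones(2))
```

  • A real second neural network would learn the fusion and scoring jointly from training data; this sketch only mirrors the data flow of Figure 9.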
  • a training data set may be stored on the training device, and each training data may include an image to be processed, feature information of items in the image to be processed, images of at least two candidate items, and semantic labels corresponding to each candidate item.
  • the expected result corresponding to the training data is the one of the aforementioned at least two candidate items that is most suitable for the image to be processed.
  • The training device can form a set of target data by combining the image to be processed, the characteristic information of the items in the image to be processed, the image of one candidate item, and the semantic label corresponding to the image of that candidate item; the training device can thus obtain at least two sets of target data corresponding one-to-one to the at least two candidate items.
  • the training device inputs each set of target data into the second neural network to obtain a target score output by the second neural network; after the training device performs the foregoing operation on each of the at least two sets of target data through the second neural network, at least two target scores in one-to-one correspondence with the at least two sets of target data are obtained, that is, at least two target scores in one-to-one correspondence with the at least two candidate items.
  • according to the at least two target scores, the training device selects, from the at least two candidate items, the item that best matches the image to be processed, and uses the selected item as the prediction result corresponding to the training data.
  • the training device generates the value of the loss function based on the prediction result and the expected result corresponding to the training data, and reversely updates the weight parameters of the second neural network, thereby completing one training iteration of the second neural network.
  • the training device uses multiple pieces of data in the training data set to iteratively train the second neural network until the convergence condition is met, thereby obtaining the trained second neural network.
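The training loop in the steps above can be sketched as follows; this is a toy illustration that assumes a linear scoring model and a softmax cross-entropy loss over the candidates (the patent does not specify a particular model or loss function):

```python
import numpy as np

def train_step(weights: np.ndarray, target_data: np.ndarray,
               expected_index: int, lr: float = 0.1):
    """One training iteration: score every set of target data, compare the
    prediction with the expected result, and reversely update the weights.

    target_data: one (already fused) feature vector per candidate, shape (N, D).
    expected_index: index of the candidate that best matches the image.
    """
    scores = target_data @ weights                 # one target score per candidate
    probs = np.exp(scores - scores.max())
    probs /= probs.sum()                           # softmax over the candidates
    predicted_index = int(np.argmax(scores))       # prediction result
    loss = -np.log(probs[expected_index])          # loss vs. the expected result
    # Gradient of the loss, used to reversely update the weight parameters
    grad = target_data.T @ (probs - np.eye(len(probs))[expected_index])
    return weights - lr * grad, float(loss), predicted_index
```

Iterating `train_step` over the training data set until the loss stops improving corresponds to training until the convergence condition is met.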
  • the server obtains K target items corresponding to the target category, and the category of each target item is the target category.
  • steps 306 and 307 are both optional steps. If steps 306 and 307 are executed, step 308 may include: the server selects K target items from the N candidate items based on the target scores corresponding to the N candidate items, where K is an integer greater than or equal to 1; a candidate item with a higher target score has a greater probability of being selected.
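The selection rule, under which a candidate with a higher target score has a greater probability of being selected (rather than a strict top-K cut), can be sketched as softmax-weighted sampling without replacement; the temperature parameter below is an illustrative assumption, not something the patent specifies:

```python
import numpy as np

def select_k_targets(candidate_ids, target_scores, k, temperature=1.0, seed=0):
    """Pick K target items from N candidates; higher-scored candidates are
    more likely to be chosen, but selection is not strictly top-K."""
    scores = np.asarray(target_scores, dtype=float) / temperature
    probs = np.exp(scores - scores.max())
    probs /= probs.sum()                           # selection probabilities
    rng = np.random.default_rng(seed)
    # Sample K distinct indices, weighted by the softmax of the target scores
    chosen = rng.choice(len(candidate_ids), size=k, replace=False, p=probs)
    return [candidate_ids[i] for i in chosen]
```

Lowering the temperature makes the selection behave more like a deterministic top-K; raising it makes lower-scored candidates more likely to appear.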
  • target scores corresponding to the N candidate items are generated through a neural network.
  • the target score indicates the matching degree between a candidate item and the image to be processed; based on the matching degree between each candidate item and the image to be processed, the target items finally displayed to the user are selected from the N candidate items. That is to say, the aesthetic quality of the match between each candidate item and the image to be processed is quantitatively scored, and this quality is taken into account when selecting the target items, so that the matching renderings of the target items and the image to be processed presented to the user look better, which helps improve the user stickiness of this solution.
  • the server can also directly obtain K target items corresponding to the target category from the item library, and the category of each target item is the target category.
  • the server sends the information of the target item to the client device.
  • the server may acquire the information of each target item among the K target items, and send the information of each target item to the client device.
  • the information of each target item may include the image corresponding to the target item; optionally, the information of each target item may also include any one or more of the following: the access link, name, price, or target rating of the target item, or other types of information, which is not limited here.
  • the image corresponding to the target item may be an image of the target item itself, or it may be a matching rendering of the target item and the image to be processed generated by the server using a neural network.
  • the aforementioned matching renderings can be in a pure image format, a rendering after VR modeling, a rendering after AR modeling, or another format; this is not limited here.
  • Figure 10 is a schematic diagram of matching renderings of the target item and the image to be processed in the item matching method provided by an embodiment of the present application.
  • the sub-schematic diagram on the left shows the image to be processed
  • the two sub-schematic diagrams on the right respectively show matching renderings of two different target items with the image to be processed. It should be understood that the examples in Figure 10 are only for convenience of understanding this solution and are not used to limit this solution.
  • the client device displays K target items corresponding to one target category to the user.
  • after acquiring the information of each of the K target items sent by the server, the client device displays the K target items corresponding to the target category to the user.
  • the client device can show the user the image corresponding to each target item; the image corresponding to a target item can be an image of the target item itself, or a matching rendering of the target item and the image to be processed.
  • the client device can display to the user the matching rendering of each target item and the image to be processed, so that the user can more intuitively experience the effect of items of the target category applied to the image to be processed, which helps improve the user stickiness of this solution.
  • the client device can also display to the user any one or more of the following information about each target item: the access link, name, price, or target rating of the item, or other types of information, which is not limited here.
  • FIG. 11 is a schematic flowchart of a method for matching items provided by an embodiment of the present application.
  • the client device displays three candidate intentions to the user, namely the decorative paintings, pendants, and lighting in Figure 11; the client device sends feedback information to the server based on the user's selection operation on the candidate intention "decorative painting", and the aforementioned feedback information is used to indicate to the server that the target category is "decorative painting".
  • based on the target category "decorative painting", the server sends information about two different decorative paintings (that is, target items) to the client device.
  • the information of each decorative painting includes the matching renderings of the decorative painting and the image to be processed, the name of the decorative painting, the price of the decorative painting, and the size of the decorative painting.
  • the example in Figure 11 shows the implementation process of the item matching method from the perspective of the client device. The example in Figure 11 is only for convenience of understanding this solution and is not used to limit this solution.
  • Figure 12 is a schematic flowchart of a method of matching items provided by an embodiment of the present application.
  • the method of matching items provided by an embodiment of the present application may include:
  • the client device obtains the image to be processed input by the user.
  • the client device obtains the target text information input by the user, and the target text information is used to indicate the user's search intention.
  • the client device inputs the image to be processed into the third neural network to perform feature extraction on the image to be processed through the third neural network to obtain target feature information corresponding to the image to be processed.
  • the target feature information includes at least the feature information of the items in the image to be processed and the feature information of the image to be processed.
  • the client device inputs the text information into the fourth neural network to extract features of the text information through the fourth neural network to obtain feature information of the text information.
  • based on the feature information of the image to be processed and the feature information of the at least two items, the client device obtains, through the first neural network, a target category that has a matching relationship with the image to be processed.
  • for the specific implementation of steps 1201 to 1205, refer to the description of steps 301 to 305 in the embodiment corresponding to Figure 3. The difference is that in the embodiment corresponding to Figure 3, steps 303 to 305 are executed by the server, whereas in the embodiment corresponding to Figure 12, steps 1203 to 1205 are executed by the client device; details are not repeated here.
  • the client device sends the target category to the server.
  • the server obtains N candidate items, and the category of each candidate item is the target category.
  • the server generates target scores corresponding to the N candidate items through the second neural network.
  • the target scores indicate the matching degree between the candidate items and the image to be processed.
  • the server obtains K target items corresponding to the target category, and the category of each target item is the target category.
  • the server sends the information of the target item to the client device.
  • the client device displays K target items corresponding to one target category to the user.
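The division of work in the Figure 12 flow, where the client determines the target category and the server scores and returns the target items, can be sketched as follows. All names here are hypothetical, and a plain dot product stands in for the second neural network's target score:

```python
class Server:
    """Toy server: holds an item library grouped by category and returns
    the K target items for a given target category (steps 1207 to 1210)."""
    def __init__(self, item_library):
        self.item_library = item_library

    def match_items(self, target_category, image_features, k=2):
        # Obtain the N candidate items whose category is the target category
        candidates = self.item_library.get(target_category, [])
        # Target score: a dot product stands in for the second neural network
        scored = sorted(candidates,
                        key=lambda item: -sum(a * b for a, b in
                                              zip(item["features"], image_features)))
        return scored[:k]   # the K target items sent back to the client

def client_flow(image_features, target_category, server):
    """Client side of Figure 12: the target category has already been chosen
    on the client; fetch the target items and return what would be displayed."""
    target_items = server.match_items(target_category, image_features)
    return [item["name"] for item in target_items]
```

In the Figure 13 variant, the only change is that the category selection would also move to the server, with the client sending feature information instead of a category.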
  • Figure 13 is a schematic flowchart of a method of matching items provided by an embodiment of the present application.
  • the method of matching items provided by an embodiment of the present application may include:
  • the client device obtains the image to be processed input by the user.
  • the client device obtains the target text information input by the user, and the target text information is used to indicate the user's search intention.
  • the client device inputs the image to be processed into the third neural network to perform feature extraction on the image to be processed through the third neural network to obtain target feature information corresponding to the image to be processed.
  • the target feature information includes the feature information of the items in the image to be processed and the feature information of the image to be processed.
  • the client device inputs the text information into the fourth neural network to extract features of the text information through the fourth neural network to obtain feature information of the text information.
  • for the specific implementation of steps 1301 to 1304, refer to the description of steps 301 to 304 in the embodiment corresponding to Figure 3. The difference is that in the embodiment corresponding to Figure 3, steps 303 and 304 are executed by the server, whereas in the embodiment corresponding to Figure 13, steps 1303 and 1304 are executed by the client device; details are not repeated here.
  • the client device may send the target feature information to the server; optionally, the client device sends both the target feature information and the feature information of the text information to the server.
  • based on the feature information of the image to be processed and the feature information of the at least two items, the server obtains, through the first neural network, a target category that has a matching relationship with the image to be processed.
  • the server obtains N candidate items corresponding to the target category, and the category of each candidate item is the target category.
  • the server generates target scores corresponding to the N candidate items through the second neural network.
  • the target scores indicate the matching degree between the candidate items and the image to be processed.
  • the server obtains K target items corresponding to the target category, and the category of each target item is the target category.
  • the server sends the information of the target item to the client device.
  • the client device displays K target items corresponding to one target category to the user.
  • for the specific implementation of steps 1305 to 1310, refer to the description of steps 305 to 310 in the embodiment corresponding to Figure 3; details are not repeated here.
  • the user can provide an image of the scene in which the item to be searched will be used (that is, the above-mentioned image to be processed); a target category that has a matching relationship with the entire image to be processed can then be obtained through the first neural network, and items of the target category are displayed to the user. Through the above solution, the user can search for items to match simply by providing the image to be processed, and even when the user inputs a complex image to be processed (that is, an image including at least two items), a target category of items that has a matching relationship with the entire image can still be obtained, which greatly expands the application scenarios of this solution and helps improve its user stickiness. In addition, the target category that has a matching relationship with the entire image to be processed is determined based on both the feature information of the entire image and the feature information of the items in it; that is, not only the information of the entire image is considered, but also the information of each item in the image to be processed.
  • FIG 14 is a schematic structural diagram of an item matching device provided by an embodiment of the present application.
  • the item matching device 1400 is applied to the client device in the item matching system.
  • the item matching system also includes a server.
  • the matching device 1400 includes: an acquisition module 1401, used to acquire an image input by a user, in which there is a background and at least two items; a receiving module 1402, used to receive a target category of items, sent by the server, that has a matching relationship with the image, where the target category of items is obtained by the server based on the feature information of the image and the feature information of the at least two items; and a display module 1403, used to display the items of the target category.
  • the characteristic information of the image includes the overall characteristic information composed of the background and at least two items.
  • the characteristic information of the at least two items includes attribute information of each item.
  • the attribute information of each item includes any one or more of the following types of information: the category of the item, the color of the item, the style of the item, the material of the item, or the pattern of the item.
  • the receiving module 1402 is also used to receive M candidate intentions corresponding to the image sent by the server.
  • M is an integer greater than or equal to 2.
  • each candidate intention indicates a category of items that has a collocation relationship with the image; the display module 1403 is also used to display the M candidate intentions; the acquisition module 1401 is also used to obtain the feedback operations corresponding to the M candidate intentions, and to determine, based on the feedback operations for the M candidate intentions, a target category that has a collocation relationship with the image.
  • the display module 1403 is specifically used to display matching renderings of the items of the target category and the image.
  • Figure 15 is a schematic structural diagram of an item matching device provided by an embodiment of the present application.
  • the item matching device 1500 is applied to the server in the item matching system.
  • the item matching system also includes client equipment.
  • the matching device 1500 includes: an acquisition module 1501, configured to acquire a target category of items that has a matching relationship with the image through a first neural network based on the feature information of the image and the feature information of at least two items, where there is a background in the image and at least two items; a sending module 1502 configured to send items of the target category to the client device.
  • the characteristic information of the image includes the overall characteristic information composed of the background and at least two items.
  • the characteristic information of the at least two items includes attribute information of each item.
  • the attribute information of each item includes any one or more of the following types of information: the category of the item, the color of the item, the style of the item, the material of the item, or the pattern of the item.
  • the acquisition module 1501 is specifically used for:
  • M candidate intentions corresponding to the image are generated through the first neural network, M is an integer greater than or equal to 2, and each candidate intention indicates a category of items that has a matching relationship with the image; the M candidate intentions are sent to the client device and are used by the client device to obtain a target category that has a matching relationship with the image; the target category sent by the client device is then received.
  • the acquisition module 1501 is specifically used for:
  • N candidate items that have a matching relationship with the image are obtained.
  • the category of each candidate item is the target category, and N is an integer greater than 1;
  • target scores corresponding to the N candidate items are generated, and the scores indicate the matching degree between the candidate items and the image;
  • the sending module is specifically used to send K target items to the client device.
  • FIG. 16 is a schematic structural diagram of a client device provided by an embodiment of the present application.
  • the client device 1600 can be embodied as a mobile phone, a tablet, a notebook computer, a smart wearable device, a smart robot, a smart home device, or the like; this is not limited here.
  • the client device 1600 includes: a receiver 1601, a transmitter 1602, a processor 1603, and a memory 1604 (the number of processors 1603 in the client device 1600 may be one or more; one processor is taken as an example in Figure 16), where the processor 1603 may include an application processor 16031 and a communication processor 16032.
  • the receiver 1601, the transmitter 1602, the processor 1603, and the memory 1604 may be connected by a bus or other means.
  • Memory 1604 may include read-only memory and random access memory and provides instructions and data to processor 1603 .
  • a portion of memory 1604 may also include non-volatile random access memory (NVRAM).
  • the memory 1604 stores operating instructions executable by the processor, executable modules or data structures, or a subset or an extended set thereof, where the operating instructions may include various operating instructions for implementing various operations.
  • Processor 1603 controls the operation of the client device.
  • the various components of the client device are coupled together through a bus system.
  • the bus system may also include a power bus, a control bus, a status signal bus, etc.
  • for clarity of description, the various buses are all referred to as the bus system in the figure.
  • the methods disclosed in the above embodiments of the present application can be applied to the processor 1603 or implemented by the processor 1603.
  • the processor 1603 may be an integrated circuit chip with signal processing capabilities. During implementation, each step of the above method can be completed by an integrated logic circuit of hardware in the processor 1603 or by instructions in the form of software.
  • the above-mentioned processor 1603 may be a general-purpose processor, a digital signal processor (DSP), a microprocessor, or a microcontroller, and may further include an application-specific integrated circuit (ASIC), a field-programmable gate array (FPGA) or another programmable logic device, a discrete gate or transistor logic device, or a discrete hardware component.
  • the processor 1603 can implement or execute each method, step and logical block diagram disclosed in the embodiment of this application.
  • a general-purpose processor may be a microprocessor or the processor may be any conventional processor, etc.
  • the steps of the method disclosed in conjunction with the embodiments of the present application can be directly implemented by a hardware decoding processor, or executed by a combination of hardware and software modules in the decoding processor.
  • the software module may be located in a storage medium mature in the art, such as a random access memory, a flash memory, a read-only memory, a programmable read-only memory, an electrically erasable programmable memory, or a register.
  • the storage medium is located in the memory 1604.
  • the processor 1603 reads the information in the memory 1604 and completes the steps of the above method in combination with its hardware.
  • the receiver 1601 may be used to receive input numeric or character information and generate signal inputs related to relevant settings and functional controls of the client device.
  • the transmitter 1602 can be used to output numeric or character information through the first interface; the transmitter 1602 can also be used to send instructions to the disk group through the first interface to modify the data in the disk group; the transmitter 1602 can also include a display device such as a display screen.
  • the processor 1603 is used to execute the item matching method executed by the client device in the corresponding embodiment of FIG. 2b to FIG. 13 .
  • the application processor 16031 is used to obtain an image input by the user, in which there is a background and at least two items; to receive a target category of items, sent by the server, that has a matching relationship with the image, where the target category of items is obtained by the server based on the feature information of the image and the feature information of the at least two items; and to display items of the target category.
  • FIG. 17 is a schematic structural diagram of the server provided by the embodiment of the present application.
  • the server 1700 is implemented by one or more servers.
  • the server 1700 may vary greatly due to different configurations or performance, and may include one or more central processing units (CPU) 1722 (for example, one or more processors), memory 1732, and one or more storage media 1730 (for example, one or more mass storage devices) storing application programs 1742 or data 1744.
  • the memory 1732 and the storage medium 1730 may be short-term storage or persistent storage.
  • the program stored in the storage medium 1730 may include one or more modules (not shown in the figure), and each module may include a series of instruction operations on the server.
  • the central processor 1722 may be configured to communicate with the storage medium 1730 and execute a series of instruction operations in the storage medium 1730 on the server 1700 .
  • Server 1700 may also include one or more power supplies 1726, one or more wired or wireless network interfaces 1750, one or more input and output interfaces 1758, and/or, one or more operating systems 1741, such as Windows ServerTM, Mac OS XTM, UnixTM, LinuxTM, FreeBSDTM and so on.
  • the central processing unit 1722 is used to execute the item matching method executed by the server in the corresponding embodiment of FIGS. 2b to 13 .
  • the central processor 1722 is configured to obtain, through the first neural network and based on the feature information of the image and the feature information of at least two items, a target category of items that has a collocation relationship with the image, where there is a background and at least two items in the image; and to send items of the target category to the client device.
  • Embodiments of the present application also provide a computer program product.
  • the computer program product includes a program.
  • when the program is run on a computer, it causes the computer to execute the steps performed by the client device in the methods described in the embodiments shown in Figures 2b to 13, or causes the computer to execute the steps performed by the server in those methods.
  • Embodiments of the present application also provide a computer-readable storage medium.
  • the computer-readable storage medium stores a program.
  • when the program is run on a computer, it causes the computer to execute the steps performed by the client device in the methods described in the embodiments shown in Figures 2b to 13, or causes the computer to execute the steps performed by the server in those methods.
  • the client device, server or item matching device provided by the embodiment of the present application may specifically be a chip.
  • the chip includes: a processing unit and a communication unit.
  • the processing unit may be, for example, a processor, and the communication unit may be, for example, an input/output interface, a pin, or a circuit.
  • the processing unit can execute computer execution instructions stored in the storage unit, so that the chip executes the matching method of items described in the embodiments shown in FIGS. 2b to 13 .
  • the storage unit is a storage unit within the chip, such as a register, cache, etc.
  • the storage unit may also be a storage unit located outside the chip in the wireless access device, such as a read-only memory (ROM) or another type of static storage device that can store static information and instructions, a random access memory (RAM), or the like.
  • Figure 18 is a structural schematic diagram of a chip provided by an embodiment of the present application.
  • the chip can be represented as a neural network processor NPU 180.
  • the NPU 180 serves as a co-processor and is mounted on the host CPU (Host CPU), which allocates tasks to it.
  • the core part of the NPU is the arithmetic circuit 1803.
  • the arithmetic circuit 1803 is controlled by the controller 1804 to extract the matrix data in the memory and perform multiplication operations.
  • the computing circuit 1803 includes multiple processing units (Process Engine, PE).
  • arithmetic circuit 1803 is a two-dimensional systolic array.
  • the arithmetic circuit 1803 may also be a one-dimensional systolic array or other electronic circuit capable of performing mathematical operations such as multiplication and addition.
  • arithmetic circuit 1803 is a general-purpose matrix processor.
  • the arithmetic circuit obtains the corresponding data of matrix B from the weight memory 1802 and caches it on each PE in the arithmetic circuit.
  • the operation circuit takes the data of matrix A from the input memory 1801, performs a matrix operation with matrix B, and stores the partial or final result of the matrix in an accumulator 1808.
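The described data flow, matrix B cached from the weight memory into the PEs, matrix A streamed from the input memory, and partial results gathered in the accumulator, can be illustrated with a plain accumulating matrix multiply (a functional sketch, not a cycle-accurate model of the systolic array):

```python
def systolic_matmul(a, b):
    """C = A @ B computed as a running accumulation, mirroring how partial
    results are collected in the accumulator before the final result."""
    rows, inner, cols = len(a), len(b), len(b[0])
    acc = [[0.0] * cols for _ in range(rows)]        # accumulator 1808
    for k in range(inner):                           # stream matrix A, step by step
        for i in range(rows):
            for j in range(cols):
                # each PE multiplies one element of A by its cached element of B
                # and adds the product to the running partial result
                acc[i][j] += a[i][k] * b[k][j]
    return acc
```

After the last step of the stream, the accumulator holds the final matrix result.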
  • the unified memory 1806 is used to store input data and output data.
  • the weight data is transferred directly to the weight memory 1802 through the direct memory access controller (DMAC) 1805.
  • Input data is also transferred to unified memory 1806 via DMAC.
  • the BIU is the bus interface unit 1810, which is used for the interaction between the AXI bus and both the DMAC and the instruction fetch buffer (IFB) 1809.
  • the bus interface unit 1810 (Bus Interface Unit, BIU for short) is used by the instruction fetch buffer 1809 to obtain instructions from the external memory, and is also used by the storage unit access controller 1805 to obtain the original data of the input matrix A or the weight matrix B from the external memory.
  • DMAC is mainly used to transfer the input data in the external memory DDR to the unified memory 1806 or the weight data to the weight memory 1802 or the input data to the input memory 1801 .
  • the vector calculation unit 1807 includes multiple arithmetic processing units, and if necessary, further processes the output of the arithmetic circuit, such as vector multiplication, vector addition, exponential operation, logarithmic operation, size comparison, etc.
  • vector calculation unit 1807 can store the processed output vectors to unified memory 1806 .
  • the vector calculation unit 1807 can apply a linear function and/or a nonlinear function to the output of the operation circuit 1803, such as linear interpolation on the feature plane extracted by the convolution layer, or a vector of accumulated values, to generate an activation value.
  • vector calculation unit 1807 generates normalized values, pixel-wise summed values, or both.
  • the processed output vector can be used as an activation input to the arithmetic circuit 1803, such as for use in a subsequent layer in a neural network.
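The vector calculation unit's post-processing can be sketched as follows, using ReLU as the example nonlinear activation and a softmax-style normalization; the patent leaves the concrete functions open, so both choices here are illustrative:

```python
import math

def vector_unit(values, normalize=False):
    """Post-process the arithmetic circuit's output vector: apply a nonlinear
    activation and, optionally, normalization, as the vector calculation
    unit 1807 does before the result is stored or fed to the next layer."""
    activated = [max(0.0, v) for v in values]        # ReLU as an example activation
    if not normalize:
        return activated
    total = sum(math.exp(v) for v in activated)
    return [math.exp(v) / total for v in activated]  # normalized values, sum to 1
```

The returned vector could either be written back to the unified memory or used as the activation input to a subsequent layer.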
  • the instruction fetch buffer 1809 connected to the controller 1804 is used to store instructions used by the controller 1804;
  • the unified memory 1806, the input memory 1801, the weight memory 1802 and the fetch memory 1809 are all On-Chip memories. External memory is private to the NPU hardware architecture.
  • each layer in the first neural network, the second neural network, the third neural network and the fourth neural network shown in the method embodiments corresponding to Figures 2b to 13 can be performed by the operation circuit 1803 or the vector calculation unit 1807 implement.
  • the processor mentioned in any of the above places may be a general central processing unit, a microprocessor, an ASIC, or one or more integrated circuits used to control program execution of the method of the first aspect.
  • the device embodiments described above are only illustrative.
  • the units described as separate components may or may not be physically separated, and the components shown as units may or may not be physical units.
  • the physical unit can be located in one place, or it can be distributed across multiple network units. Some or all of the modules can be selected according to actual needs to achieve the purpose of the solution of this embodiment.
  • the connection relationship between modules indicates that there are communication connections between them, which can be specifically implemented as one or more communication buses or signal lines.
  • the present application can be implemented by software plus the necessary general-purpose hardware, and can certainly also be implemented by dedicated hardware, including application-specific integrated circuits, dedicated CPUs, dedicated memories, dedicated components, and the like. In general, any function performed by a computer program can easily be implemented with corresponding hardware, and the specific hardware structures used to implement the same function can be diverse, for example, analog circuits, digital circuits, or dedicated circuits. However, for this application, a software program implementation is the better implementation in most cases. Based on this understanding, the technical solutions of the present application may, in essence or in the part contributing to the prior art, be embodied in the form of a software product.
  • the computer software product is stored in a readable storage medium, such as a floppy disk, USB flash drive, removable hard disk, ROM, RAM, magnetic disk, or optical disc, and includes several instructions that cause a computer device (which may be a personal computer, training device, network device, etc.) to execute the methods described in the various embodiments of this application.
  • the computer program product includes one or more computer instructions.
  • the computer may be a general-purpose computer, a special-purpose computer, a computer network, or other programmable device.
  • the computer instructions may be stored in a computer-readable storage medium or transmitted from one computer-readable storage medium to another; for example, the computer instructions may be transmitted from one website, computer, training device, or data center to another website, computer, training device, or data center by wired means (such as coaxial cable, optical fiber, or digital subscriber line (DSL)) or wireless means (such as infrared, radio, or microwave).
  • the computer-readable storage medium may be any available medium that a computer can store, or a data storage device, such as a training device or a data center, that integrates one or more available media.
  • the available media may be magnetic media (e.g., floppy disk, hard disk, magnetic tape), optical media (e.g., DVD), or semiconductor media (e.g., a solid state disk (SSD)).

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Library & Information Science (AREA)
  • Physics & Mathematics (AREA)
  • Data Mining & Analysis (AREA)
  • Databases & Information Systems (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Mathematical Physics (AREA)
  • Image Analysis (AREA)

Abstract

An article matching method and a related device, by means of which artificial intelligence technology can be applied to the field of article search. The method comprises: acquiring an image input by a user, wherein a background and at least two articles are present in the image; on the basis of feature information of the image and feature information of the at least two articles, acquiring, by means of a first neural network, a target category having a matching relationship with the image; and displaying a target article of the target category. An article to be matched can be searched for by providing an image, and even when a complex image is input, articles of a target category having a matching relationship with the whole image can still be acquired, thereby greatly expanding the application scenarios of the present solution and helping to improve user stickiness.

Description

An article matching method and related device
This application claims priority to Chinese patent application No. 202210333006.5, filed with the China Patent Office on March 31, 2022 and entitled "An article matching method and related device", the entire content of which is incorporated herein by reference.
Technical Field
This application relates to the field of artificial intelligence, and in particular to an article matching method and related device.
Background
Artificial intelligence (AI) is a branch of computer science that attempts to understand the essence of intelligence and to produce a new kind of intelligent machine that can respond in a manner similar to human intelligence. With the development of AI technology, item search is a common application of AI.
A common item search solution in the industry is photo-based search: a user takes a photo of the item to be searched for, and similar items are then retrieved based on the input picture.
However, researchers have found that in most scenarios a user with a search need cannot obtain an image of an item similar to the one sought, so the search need cannot be met through photo-based search. Conversely, the user can often obtain the items with which the sought item is to be matched; for example, the user may want to search for pants that match a given top.
Summary
Embodiments of this application provide an article matching method and related devices. Even when the user inputs a complex image to be processed (that is, an image containing at least two items), a target category of items that matches the entire image can still be obtained, which greatly expands the application scenarios of this solution and helps improve user stickiness.
To solve the above technical problem, embodiments of this application provide the following technical solutions:
In a first aspect, an embodiment of this application provides an article matching method that applies artificial intelligence technology to the field of item search. The method includes: a client device obtains an image to be processed that is input by a user, where the image to be processed contains a background and at least two items; a server or the client device obtains, through a first neural network and based on feature information of the image to be processed and feature information of the at least two items in it, a target category of items that has a matching relationship with the image to be processed; and the client device displays items of the target category to the user.
In this implementation, the user can provide an image of the scene in which the sought item will be used (that is, the image to be processed), a target category that matches the entire image to be processed is obtained through the first neural network, and target items of that category are then displayed to the user. With this solution, the user can search for items to match simply by providing an image, and even when the input is a complex image (that is, an image containing at least two items), a target category of items that matches the entire image can still be obtained, which greatly expands the application scenarios of this solution and helps improve user stickiness. In addition, the target category is determined based on both the feature information of the entire image to be processed and the feature information of the items in it; because not only the information of the whole image but also every object in it is taken into account, the accuracy of the determined target category is improved.
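The first-aspect flow described above (obtain the image, run the first neural network on whole-image and per-item features, display items of the matching category) can be illustrated with a minimal sketch. The "networks" below are simple hand-written stand-ins, not the trained models of the embodiments, and all names and rules are illustrative assumptions:

```python
# Illustrative sketch of the first-aspect pipeline: the "networks" are
# stand-ins (simple rule tables), not the patent's trained models.

def extract_features(image):
    """Stand-in for the third neural network: returns whole-image
    features plus per-item feature information."""
    return {
        "image_features": {"style": image["style"], "scene": image["scene"]},
        "item_features": [
            {"category": it["category"], "color": it["color"]}
            for it in image["items"]
        ],
    }

def predict_target_category(features):
    """Stand-in for the first neural network: picks one category that
    matches the whole image, here via a hand-written rule table."""
    rules = {
        ("living room", "sofa"): "coffee table",
        ("outfit", "top"): "pants",
    }
    scene = features["image_features"]["scene"]
    for item in features["item_features"]:
        if (scene, item["category"]) in rules:
            return rules[(scene, item["category"])]
    return "decorative item"  # fallback category

def display(category):
    return f"Recommended category: {category}"

# A complex image to be processed: a background plus at least two items.
image = {
    "style": "modern",
    "scene": "living room",
    "items": [
        {"category": "sofa", "color": "gray"},
        {"category": "lamp", "color": "white"},
    ],
}
result = display(predict_target_category(extract_features(image)))
print(result)  # Recommended category: coffee table
```

Note that the category prediction consumes both the whole-image features and the per-item features, mirroring the two inputs the first aspect feeds to the first neural network.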
In a possible implementation of the first aspect, the method further includes: the server or the client device inputs the image to be processed into a third neural network, which performs feature extraction on the image to obtain target feature information corresponding to the image. The target feature information includes feature information of the at least two items in the image and feature information of the image itself. The feature information of the image to be processed is that of the whole formed by the background and the at least two items, that is, the feature information obtained by treating the image as a single whole during feature extraction; for example, it may include texture information, color information, contour information, style information, scene information, or other types of feature information of the image. The feature information of the at least two items in the image may also be called the semantic label set of the image and may include attribute information of each item, where the attribute information of each item includes any one or more of the following: the category of the item, the color of the item, and the position of the item in the image; optionally, it may further include the style of the item, the material of the item, the pattern of the item, or other feature information.
In this implementation, the feature information of the image to be processed refers to the features extracted by treating the image as a whole, while the feature information of the at least two items may include attribute information of each item. This further refines the two concepts and makes them easier to distinguish. Moreover, since the feature information of each item includes information such as the item's category, color, style, material, or pattern, the information about the objects in the image is fully considered during feature extraction, which helps improve the accuracy of the determined target category.
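The split described above (whole-image features on one side, a per-item semantic label set on the other) can be made concrete with a small container type. The field names below are assumptions chosen for illustration, not the patent's data format:

```python
from dataclasses import dataclass, field

@dataclass
class ItemLabel:
    """Attribute information (semantic label) of one detected item."""
    category: str
    color: str
    bbox: tuple          # position of the item in the image: (x, y, w, h)
    style: str = ""      # optional extra attributes
    material: str = ""

@dataclass
class TargetFeatureInfo:
    """Output of the feature-extraction step: whole-image features plus
    the semantic label set of the items in the image."""
    image_features: dict = field(default_factory=dict)  # texture, color, style, scene...
    item_labels: list = field(default_factory=list)     # one ItemLabel per item

info = TargetFeatureInfo(
    image_features={"style": "nordic", "scene": "bedroom"},
    item_labels=[
        ItemLabel("bed", "white", (10, 40, 200, 120)),
        ItemLabel("curtain", "beige", (0, 0, 60, 180), material="linen"),
    ],
)
assert len(info.item_labels) >= 2  # a complex image has at least two items
```

Keeping the two kinds of feature information in separate fields mirrors the distinction the implementation draws between the image as a whole and the objects inside it.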
In a possible implementation of the first aspect, the step in which the server or the client device obtains, through the first neural network and based on the feature information of the image to be processed and the feature information of the at least two items, a target category that matches the image includes: the server or the client device generates, through the first neural network, M candidate intents corresponding to the image, where M is an integer greater than or equal to 2 and each candidate intent indicates a category of items that matches the image; the client device displays the M candidate intents to the user to obtain a feedback operation on them; and the client device determines, according to the feedback operation on the M candidate intents, a target category that matches the image. The "feedback operation" may be a selection of one of the M candidate intents, or the user may manually enter a new search intent.
In this implementation, M candidate intents are first generated through the first neural network, and the target category that matches the image is then determined based on the feedback operation the user inputs for those candidate intents; that is, the user's search intent is guided interactively, which helps improve the accuracy of the determined target category.
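The interactive guidance above (generate M candidate intents, collect a feedback operation, settle on one target category) might look like the following sketch. The candidate lists and the feedback encoding are illustrative assumptions:

```python
def generate_candidate_intents(image_desc, m=3):
    """Stand-in for the first neural network's intent generation:
    returns M candidate categories that could match the image."""
    catalog = {
        "living room": ["coffee table", "rug", "floor lamp"],
        "outfit": ["pants", "shoes", "bag"],
    }
    return catalog.get(image_desc, ["decorative item"])[:m]

def resolve_target_category(candidates, feedback):
    """The feedback operation is either the index of a chosen candidate
    or a manually typed new search intent (a string)."""
    if isinstance(feedback, int):
        return candidates[feedback]
    return feedback  # user typed a new intent

cands = generate_candidate_intents("living room")
assert cands == ["coffee table", "rug", "floor lamp"]
assert resolve_target_category(cands, 1) == "rug"          # user selected a candidate
assert resolve_target_category(cands, "wall art") == "wall art"  # user typed a new intent
```

Both feedback paths end in a single target category, which is what the downstream retrieval step consumes.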
In a possible implementation of the first aspect, the method further includes: the client device obtains target text information input by the user, where the target text information indicates the user's search intent; and the server or the client device inputs the text information into a fourth neural network, which performs feature extraction on it to obtain feature information of the text. The step in which the server or the client device obtains, through the first neural network and based on the feature information of the image and of the at least two items, a target category that matches the image then includes: inputting the feature information of the image, the feature information of the at least two items, and the feature information of the text into the first neural network to obtain, through the first neural network, a target category that matches the image.
In this implementation, target text information indicating the user's search intent can also be obtained, and its feature information is input into the neural network together with the target feature information. In other words, when determining what matches the image to be processed, the solution not only makes full use of the information in the image but also incorporates the text information indicating the user's search intent, which further improves the accuracy of the determined candidate intents.
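One simple reading of "inputting the image features, the item features, and the text features together" is to combine them into a single input vector before the category prediction head. The dimensions and the averaging of per-item vectors below are assumptions, since the patent does not fix a fusion scheme:

```python
def fuse_features(image_vec, item_vecs, text_vec):
    """Concatenate whole-image features, a pooled (averaged) item
    feature vector, and text features into one input vector."""
    n = len(item_vecs)
    dim = len(item_vecs[0])
    pooled = [sum(v[i] for v in item_vecs) / n for i in range(dim)]
    return image_vec + pooled + text_vec

image_vec = [0.2, 0.8]                # whole-image features
item_vecs = [[1.0, 0.0], [0.0, 1.0]]  # one feature vector per detected item
text_vec = [0.5]                      # encoded search-intent text

fused = fuse_features(image_vec, item_vecs, text_vec)
assert fused == [0.2, 0.8, 0.5, 0.5, 0.5]
```

Averaging keeps the fused vector a fixed length regardless of how many items the image contains, which is one common way to handle a variable number of detected objects.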
In a possible implementation of the first aspect, the step in which the client device obtains, through the first neural network, items of a target category that match the image includes: the server obtains, through the first neural network, N candidate items that match the image, where each candidate item belongs to the target category and N is an integer greater than 1; the server generates, through a second neural network, a target score for each of the N candidate items, where the target score indicates the degree of match between the candidate item and the image, that is, an aesthetic score for the rendering that combines the candidate item with the image to be processed; and the server selects K target items from the N candidate items according to their target scores, where K is an integer greater than or equal to 1. Displaying items of the target category on the client device then includes: the client device displays the K target items.
In this implementation, a neural network generates a score for each of the N candidate items that indicates the degree of match between the candidate and the image to be processed, and the target items finally shown to the user are selected from the N candidates according to these scores. In other words, the aesthetic quality of combining each candidate item with the image is quantified and taken into account when selecting the target items, so the combined renderings of the target items and the image presented to the user look better, which helps improve user stickiness.
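Selecting K target items from N scored candidates is a plain top-K by score. The scoring function below is a toy stand-in for the second neural network (it just rewards style agreement); only the selection logic is the point:

```python
def score_candidate(candidate, image_style):
    """Toy stand-in for the second neural network's aesthetic score:
    reward style agreement between candidate item and image."""
    return 1.0 if candidate["style"] == image_style else 0.3

def select_top_k(candidates, image_style, k):
    """Score all N candidates, then keep the K best."""
    scored = [(score_candidate(c, image_style), c["name"]) for c in candidates]
    scored.sort(key=lambda t: t[0], reverse=True)  # stable sort preserves input order on ties
    return [name for _, name in scored[:k]]

candidates = [
    {"name": "glass table", "style": "modern"},
    {"name": "oak table", "style": "rustic"},
    {"name": "steel table", "style": "modern"},
]
top = select_top_k(candidates, "modern", k=2)
assert top == ["glass table", "steel table"]
```

In the embodiments the score would come from a trained network fed with the candidate images, their semantic labels, the image to be processed, and its item labels; the top-K step afterwards is the same.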
In a possible implementation of the first aspect, generating the target scores for the N candidate items through the second neural network includes: inputting the image of each candidate item, the semantic label of each candidate item, the image to be processed, and the semantic labels of the items in the image into the second neural network, and obtaining the target score for each candidate item output by the second neural network. The semantic labels of the items in the image may also be called the feature information of those items. The semantic label of a candidate item may include at least one piece of attribute information of the candidate item; for example, it may include any one or more of the following: the category of the candidate item, the style of the candidate item, the shape of the candidate item, or other attributes of the candidate item.
In a possible implementation of the first aspect, displaying the items of the target category to the user includes: the client device displays to the user a rendering that combines the target-category items with the image to be processed. The rendering may be a plain image, a VR-modeled rendering, an AR-modeled rendering, or another format. Optionally, the client device may also display any one or more of the following for each target-category item: its access link, name, price, target score, or other information; this is not limited here.
In this implementation, the user is shown a rendering that combines each target-category item with the image to be processed, so the user can more intuitively see how the item would look when applied to the scene in the image, which helps improve user stickiness.
In a second aspect, an embodiment of this application provides an article matching method that applies artificial intelligence technology to the field of item search. The method includes: a client device obtains an image to be processed that is input by a user, where the image contains a background and at least two items; the client device receives, from a server, items of a target category that match the image, where the target-category items are obtained by the server based on feature information of the image and feature information of the at least two items; and the client device displays the items of the target category.
In a possible implementation of the second aspect, the feature information of the image includes feature information of the whole formed by the background and the at least two items; the feature information of the at least two items includes attribute information of each item, and the attribute information of each item includes any one or more of the following: the category of the item, the color of the item, the style of the item, the material of the item, or the pattern of the item.
In a possible implementation of the second aspect, the client device receives, from the server, M candidate intents corresponding to the image to be processed and displays them to the user, where M is an integer greater than or equal to 2 and each candidate intent indicates a category of items that matches the image; the client device obtains a feedback operation on the M candidate intents, determines, according to that feedback, a target category that matches the image, and sends the target category to the server.
In the second aspect of this application, the client device may also be configured to perform the steps performed by the client device in the first aspect and its possible implementations. For the specific implementation of the steps, the meaning of the terms, and the resulting beneficial effects in each possible implementation of the second aspect, refer to the first aspect; details are not repeated here.
In a third aspect, an embodiment of this application provides an article matching method that applies artificial intelligence technology to the field of item search. The method includes: a server obtains, through a first neural network and based on feature information of an image to be processed and feature information of at least two items, a target category that matches the image, where the image contains a background and the at least two items; and the server sends information about items of the target category to a client device.
In a possible implementation of the third aspect, the step in which the server obtains, through the first neural network and based on the feature information of the image and of the at least two items, a target category that matches the image includes: the server generates, through the first neural network, M candidate intents corresponding to the image, where M is an integer greater than or equal to 2 and each candidate intent indicates a category of items that matches the image; the server sends the M candidate intents to the client device, which uses them to determine a target category that matches the image; and the server receives the target category sent by the client device.
In the third aspect of this application, the server may also be configured to perform the steps performed by the server in the first aspect and its possible implementations. For the specific implementation of the steps, the meaning of the terms, and the resulting beneficial effects in each possible implementation of the third aspect, refer to the first aspect; details are not repeated here.
In a fourth aspect, an embodiment of this application provides an article matching apparatus that applies artificial intelligence technology to the field of item search. The apparatus is applied to a client device in an article matching system that also includes a server, and includes: an acquisition module configured to obtain an image input by a user, where the image contains a background and at least two items; a receiving module configured to receive, from the server, items of a target category that match the image, where the target-category items are obtained by the server based on feature information of the image and feature information of the at least two items; and a display module configured to display the items of the target category.
In the fourth aspect of this application, the article matching apparatus may also be configured to perform the steps performed by the client device in the second aspect and its possible implementations. For the specific implementation of the steps, the meaning of the terms, and the resulting beneficial effects in each possible implementation of the fourth aspect, refer to the second aspect; details are not repeated here.
In a fifth aspect, an embodiment of this application provides an article matching apparatus that applies artificial intelligence technology to the field of item search. The apparatus is applied to a server in an article matching system that also includes a client device, and includes: an acquisition module configured to obtain, through a first neural network and based on feature information of an image and feature information of at least two items, items of a target category that match the image, where the image contains a background and the at least two items; and a sending module configured to send information about the target-category items to the client device.
In the fifth aspect of this application, the article matching apparatus may also be configured to perform the steps performed by the server in the third aspect and its possible implementations. For the specific implementation of the steps, the meaning of the terms, and the resulting beneficial effects in each possible implementation of the fifth aspect, refer to the third aspect; details are not repeated here.
In a sixth aspect, an embodiment of this application provides a computer program product that includes a program; when the program runs on a computer, it causes the computer to execute the article matching method of the second or third aspect.
In a seventh aspect, an embodiment of this application provides a computer-readable storage medium storing a computer program; when the program runs on a computer, it causes the computer to execute the article matching method of the second or third aspect.
In an eighth aspect, an embodiment of this application provides a client device including a processor and a memory, where the processor is coupled to the memory; the memory is configured to store a program, and the processor is configured to execute the program in the memory so that the client device performs the methods performed by the client device in the above aspects.
In a ninth aspect, an embodiment of this application provides a server including a processor and a memory, where the processor is coupled to the memory; the memory is configured to store a program, and the processor is configured to execute the program in the memory so that the server performs the methods performed by the server in the above aspects.
In a tenth aspect, this application provides a chip system including a processor, configured to support a terminal device or communication device in implementing the functions involved in the above aspects, for example, sending or processing the data and/or information involved in the above methods. In a possible design, the chip system further includes a memory configured to store the program instructions and data necessary for the terminal device or communication device. The chip system may consist of a chip, or may include a chip and other discrete components.
Description of Drawings
Figure 1a is a schematic structural diagram of the artificial intelligence main framework according to an embodiment of this application;

Figure 1b is a diagram of an application scenario of the article matching method according to an embodiment of this application;

Figure 2a is a system architecture diagram of the article matching system according to an embodiment of this application;

Figure 2b is a schematic flowchart of the article matching method according to an embodiment of this application;

Figure 3 is a schematic flowchart of the article matching method according to an embodiment of this application;

Figure 4 is a schematic diagram of an interface for obtaining the image to be processed and the target text information in the article matching method according to an embodiment of this application;

Figure 5 is a schematic diagram of the first feature extraction network in the article matching method according to an embodiment of this application;

Figure 6 is a schematic diagram of displaying M candidate intents in the article matching method according to an embodiment of this application;

Figure 7 is a schematic flowchart of obtaining the target category in the article matching method according to an embodiment of this application;

Figure 8 is a schematic diagram of the target score in the article matching method according to an embodiment of this application;

Figure 9 is a schematic diagram of the second neural network in the article matching method according to an embodiment of this application;

Figure 10 is a schematic diagram of a rendering that combines the target item with the image to be processed in the article matching method according to an embodiment of this application;

Figure 11 is a schematic flowchart of the article matching method according to an embodiment of this application;

Figure 12 is a schematic flowchart of the article matching method according to an embodiment of this application;
图13为本申请实施例提供的物品的搭配方法的一种流程示意图;Figure 13 is a schematic flowchart of a method for matching items provided by an embodiment of the present application;
图14为本申请实施例提供的物品的搭配装置的一种结构示意图;Figure 14 is a schematic structural diagram of an item matching device provided by an embodiment of the present application;
图15为本申请实施例提供的物品的搭配装置的一种结构示意图;Figure 15 is a schematic structural diagram of an item matching device provided by an embodiment of the present application;
图16为本申请实施例提供的客户设备的一种结构示意图;Figure 16 is a schematic structural diagram of a client device provided by an embodiment of the present application;
图17是本申请实施例提供的服务器一种结构示意图;Figure 17 is a schematic structural diagram of a server provided by an embodiment of the present application;
图18为本申请实施例提供的芯片的一种结构示意图。Figure 18 is a schematic structural diagram of a chip provided by an embodiment of the present application.
具体实施方式Detailed description of embodiments
本申请的说明书和权利要求书及上述附图中的术语“第一”、“第二”等是用于区别类似的对象，而不必用于描述特定的顺序或先后次序。应该理解这样使用的术语在适当情况下可以互换，这仅仅是描述本申请的实施例中对相同属性的对象在描述时所采用的区分方式。此外，术语“包括”和“具有”以及他们的任何变形，意图在于覆盖不排他的包含，以便包含一系列单元的过程、方法、系统、产品或设备不必限于那些单元，而是可包括没有清楚地列出的或对于这些过程、方法、产品或设备固有的其它单元。The terms "first", "second", etc. in the description and claims of this application and in the above drawings are used to distinguish similar objects and are not necessarily used to describe a specific order or sequence. It should be understood that the terms so used are interchangeable under appropriate circumstances; this is merely the way of distinguishing objects with the same attributes when describing the embodiments of the present application. Furthermore, the terms "include" and "have" and any variations thereof are intended to cover non-exclusive inclusion, so that a process, method, system, product or device comprising a series of elements is not necessarily limited to those elements, but may include other elements not explicitly listed or inherent to such processes, methods, products or devices.
下面结合附图,对本申请的实施例进行描述。本领域普通技术人员可知,随着技术的发展和新场景的出现,本申请实施例提供的技术方案对于类似的技术问题,同样适用。The embodiments of the present application are described below with reference to the accompanying drawings. Persons of ordinary skill in the art know that with the development of technology and the emergence of new scenarios, the technical solutions provided in the embodiments of this application are also applicable to similar technical problems.
首先对人工智能系统总体工作流程进行描述，请参见图1a，图1a示出的为人工智能主体框架的一种结构示意图，下面从“智能信息链”（水平轴）和“IT价值链”（垂直轴）两个维度对上述人工智能主体框架进行阐述。其中，“智能信息链”反映从数据的获取到处理的一系列过程。举例来说，可以是智能信息感知、智能信息表示与形成、智能推理、智能决策、智能执行与输出的一般过程。在这个过程中，数据经历了“数据—信息—知识—智慧”的凝练过程。“IT价值链”从人工智能的底层基础设施、信息（提供和处理技术实现）到系统的产业生态过程，反映人工智能为信息技术产业带来的价值。First, the overall workflow of the artificial intelligence system is described. Please refer to Figure 1a, which shows a schematic structural diagram of the artificial intelligence main framework. This framework is described below along two dimensions: the "intelligent information chain" (horizontal axis) and the "IT value chain" (vertical axis). The "intelligent information chain" reflects a series of processes from data acquisition to processing, for example the general process of intelligent information perception, intelligent information representation and formation, intelligent reasoning, intelligent decision-making, and intelligent execution and output. In this process, the data goes through the condensation process of "data - information - knowledge - wisdom". The "IT value chain" reflects the value that artificial intelligence brings to the information technology industry, from the underlying infrastructure and information (providing and processing technology implementations) of artificial intelligence to the industrial ecological process of the system.
(1)基础设施(1)Infrastructure
基础设施为人工智能系统提供计算能力支持，实现与外部世界的沟通，并通过基础平台实现支撑。通过传感器与外部沟通；计算能力由智能芯片提供，该智能芯片具体可以采用中央处理器(central processing unit，CPU)、嵌入式神经网络处理器(neural-network processing unit，NPU)、图形处理器(graphics processing unit，GPU)、专用集成电路(application specific integrated circuit，ASIC)或现场可编程门阵列(field programmable gate array，FPGA)等硬件加速芯片；基础平台包括分布式计算框架及网络等相关的平台保障和支持，可以包括云存储和计算、互联互通网络等。举例来说，传感器和外部沟通获取数据，这些数据提供给基础平台提供的分布式计算系统中的智能芯片进行计算。Infrastructure provides computing power support for the artificial intelligence system, enables communication with the external world, and provides support through a basic platform. Communication with the outside is performed through sensors. Computing power is provided by smart chips, which may specifically be hardware acceleration chips such as a central processing unit (CPU), a neural-network processing unit (NPU), a graphics processing unit (GPU), an application specific integrated circuit (ASIC) or a field programmable gate array (FPGA). The basic platform includes related platform assurance and support such as a distributed computing framework and networks, and may include cloud storage and computing, interconnection networks, and so on. For example, sensors communicate with the outside world to obtain data, and the data is provided to the smart chips in the distributed computing system provided by the basic platform for computation.
(2)数据(2)Data
基础设施的上一层的数据用于表示人工智能领域的数据来源。数据涉及到图形、图像、语音、文本，还涉及到传统设备的物联网数据，包括已有系统的业务数据以及力、位移、液位、温度、湿度等感知数据。Data in the layer above the infrastructure is used to represent the data sources in the field of artificial intelligence. The data involves graphics, images, voice and text, and also involves Internet-of-Things data from traditional devices, including business data of existing systems and sensed data such as force, displacement, liquid level, temperature and humidity.
(3)数据处理(3)Data processing
数据处理通常包括数据训练,机器学习,深度学习,搜索,推理,决策等方式。Data processing usually includes data training, machine learning, deep learning, search, reasoning, decision-making and other methods.
其中,机器学习和深度学习可以对数据进行符号化和形式化的智能信息建模、抽取、预处理、训练等。Among them, machine learning and deep learning can perform symbolic and formal intelligent information modeling, extraction, preprocessing, training, etc. on data.
推理是指在计算机或智能系统中,模拟人类的智能推理方式,依据推理控制策略,利用形式化的信息进行机器思维和求解问题的过程,典型的功能是搜索与匹配。Reasoning refers to the process of simulating human intelligent reasoning in computers or intelligent systems, using formal information to perform machine thinking and problem solving based on reasoning control strategies. Typical functions are search and matching.
决策是指智能信息经过推理后进行决策的过程,通常提供分类、排序、预测等功能。Decision-making refers to the process of decision-making after intelligent information is reasoned, and usually provides functions such as classification, sorting, and prediction.
(4)通用能力(4) General ability
对数据经过上面提到的数据处理后，进一步基于数据处理的结果可以形成一些通用的能力，比如可以是算法或者一个通用系统，例如，翻译，文本的分析，计算机视觉的处理，语音识别，图像的识别等等。After the data undergoes the data processing mentioned above, some general capabilities can further be formed based on the results of the data processing, for example an algorithm or a general system, such as translation, text analysis, computer vision processing, speech recognition, image recognition, and so on.
(5)智能产品及行业应用(5) Intelligent products and industry applications
智能产品及行业应用指人工智能系统在各领域的产品和应用，是对人工智能整体解决方案的封装，将智能信息决策产品化、实现落地应用，其应用领域主要包括：智能终端、智能制造、智能交通、智能家居、智能医疗、智能安防、自动驾驶、智慧城市等。Intelligent products and industry applications refer to the products and applications of the artificial intelligence system in various fields; they are the encapsulation of the overall artificial intelligence solution, productizing intelligent information decision-making and realizing practical applications. The application fields mainly include intelligent terminals, intelligent manufacturing, intelligent transportation, smart home, intelligent healthcare, intelligent security, autonomous driving, smart cities, and so on.
本申请实施例可以应用于人工智能领域的各个应用场景中，具体的，可以应用于利用图片进行物品搜索的应用场景中。作为示例，请参阅图1b，图1b为本申请实施例提供的物品的搭配方法的一种应用场景图，如图1b所示，用户在使用购物类应用程序时，当用户点击A1示出的图标时，可以输入待处理图像，以搜索并购买与待处理图像具有搭配关系的一种类别的物品。The embodiments of this application can be applied to various application scenarios in the field of artificial intelligence, and specifically to application scenarios in which pictures are used to search for items. As an example, please refer to Figure 1b, which is an application scenario diagram of the item matching method provided by an embodiment of the present application. As shown in Figure 1b, when using a shopping application, the user can click the icon indicated by A1 to input an image to be processed, in order to search for and purchase items of a category that has a matching relationship with the image to be processed.
作为另一示例，例如用户在使用装修设计类的应用程序时，可以输入待处理图像，以搜索与待处理图像具有搭配关系的一种类别的物品等，应理解，本申请实施例还可以应用于其他获取与待处理图像具有搭配关系的物品的场景中，此处不再对其他应用场景进行一一列举。As another example, when using a decoration design application, the user can input an image to be processed to search for items of a category that has a matching relationship with the image to be processed. It should be understood that the embodiments of the present application can also be applied to other scenarios of obtaining items that have a matching relationship with an image to be processed; other application scenarios are not listed one by one here.
结合上述说明，先对本申请实施例提供的物品的搭配系统进行描述，请参阅图2a，图2a为本申请实施例提供的物品的搭配系统的一种系统架构图，物品的搭配系统200包括训练设备210、数据库220、执行设备230、数据存储系统240和客户设备250，执行设备230中包括计算模块231。In conjunction with the above description, the item matching system provided by the embodiments of the present application is described first. Please refer to Figure 2a, which is a system architecture diagram of the item matching system provided by an embodiment of the present application. The item matching system 200 includes a training device 210, a database 220, an execution device 230, a data storage system 240 and a client device 250; the execution device 230 includes a computing module 231.
其中，数据库220中存储有第一训练数据集合，训练设备210生成第一模型/规则201，并利用数据库中的第一训练数据集合对第一模型/规则201进行迭代训练，得到成熟的第一模型/规则201。第一模型/规则201可以具体表现为第一神经网络或非神经网络形式的模型，本申请实施例中仅以第一模型/规则201为第一神经网络为例进行说明。A first training data set is stored in the database 220. The training device 210 generates the first model/rule 201 and uses the first training data set in the database to iteratively train the first model/rule 201 to obtain a mature first model/rule 201. The first model/rule 201 may be embodied as a first neural network or as a model in a non-neural-network form; in the embodiments of this application, the case where the first model/rule 201 is a first neural network is taken as an example for description.
执行设备230可以调用数据存储系统240中的数据、代码等,也可以将数据、指令等存入数据存储系统240中。数据存储系统240可以置于执行设备230中,也可以为数据存储系统240相对执行设备230是外部存储器。 The execution device 230 can call data, codes, etc. in the data storage system 240, and can also store data, instructions, etc. in the data storage system 240. The data storage system 240 may be placed in the execution device 230 , or the data storage system 240 may be an external memory relative to the execution device 230 .
训练设备210得到的训练后的第一模型/规则201可以部署于执行设备230中，执行设备230可以表现为与客户设备250上部署的应用程序对应的服务器。执行设备230的计算模块231可以通过第一模型/规则201获取与待处理图像具有匹配关系的一种目标类别，其中，待处理图像是通过客户设备250得到的，该目标类别指示与待处理图像具有搭配关系的一种目标类别的物品。The trained first model/rule 201 obtained by the training device 210 can be deployed in the execution device 230, and the execution device 230 can be embodied as a server corresponding to the application program deployed on the client device 250. The computing module 231 of the execution device 230 can obtain, through the first model/rule 201, a target category that has a matching relationship with the image to be processed, where the image to be processed is obtained through the client device 250, and the target category indicates a category of items that has a matching relationship with the image to be processed.
客户设备250可以表现为各种形态的终端设备,例如手机、平板、笔记本电脑、虚拟现实(virtual reality,VR)设备或增强现实(augmented reality,AR)设备等等。The client device 250 can be represented by various forms of terminal devices, such as mobile phones, tablets, laptops, virtual reality (VR) devices or augmented reality (AR) devices, etc.
本申请的一些实施例中,请参阅图2a,执行设备230和客户设备250可以为分别独立的设备,执行设备230配置有输入/输出(I/O)接口,与客户设备250进行数据交互,“用户”可以通过客户设备250向I/O接口输入待处理图像,执行设备230通过I/O接口将与待处理图像具有搭配关系的目标类别的物品返回给客户设备250,提供给用户。In some embodiments of the present application, please refer to Figure 2a. The execution device 230 and the client device 250 may be independent devices. The execution device 230 is configured with an input/output (I/O) interface for data interaction with the client device 250. The "user" can input the image to be processed to the I/O interface through the client device 250, and the execution device 230 returns the items of the target category that have a matching relationship with the image to be processed to the client device 250 through the I/O interface, and provides them to the user.
值得注意的，图2a仅是本发明实施例提供的两种物品的搭配系统的架构示意图，图中所示设备、器件、模块等之间的位置关系不构成任何限制。例如，在本申请的另一些实施例中，执行设备230也可以和客户设备250集成于同一设备中，此处不做限定。It is worth noting that Figure 2a is only a schematic architecture diagram of the two item matching systems provided by the embodiments of the present invention, and the positional relationships between the devices, components, modules, etc. shown in the figure do not constitute any limitation. For example, in other embodiments of the present application, the execution device 230 and the client device 250 may also be integrated into the same device, which is not limited here.
在图2a示出的物品的搭配系统中,请继续参阅图2b,图2b为本申请实施例提供的物品的搭配方法的一种流程示意图。S1、获取用户输入的待处理图像,待处理图像中存在背景与至少两个物品。S2、基于待处理图像的特征信息和该至少两个物品的特征信息,通过第一神经网络获取与待处理图像具有搭配关系的一种目标类别的物品。S3、展示该目标类别的物品。In the item matching system shown in Figure 2a, please continue to refer to Figure 2b. Figure 2b is a schematic flow chart of the item matching method provided by an embodiment of the present application. S1. Obtain the image to be processed input by the user. There is a background and at least two items in the image to be processed. S2. Based on the characteristic information of the image to be processed and the characteristic information of the at least two items, obtain an item of a target category that has a matching relationship with the image to be processed through the first neural network. S3. Display items of the target category.
本申请实施例中，用户不仅可以通过提供待处理图像来搜索想要搭配的物品，且当用户输入的是复杂的待处理图像（也即包括至少两个物品的图像）时，依旧能够获取到与整个待处理图像匹配的一种目标类别的物品，大大扩展了本方案的应用场景，有利于提高本方案的用户粘度。In the embodiments of this application, the user can search for items to match by providing an image to be processed, and even when the user inputs a complex image to be processed (that is, an image including at least two items), an item of a target category that matches the entire image to be processed can still be obtained. This greatly expands the application scenarios of this solution and helps improve its user stickiness.
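For ease of understanding, the S1-S3 flow above can be sketched in a few lines of Python. This is a hypothetical illustration only: the dictionary-based "features" and the scoring rule inside match_category are stand-ins for the trained neural networks described in this application, not the claimed implementation.

```python
def extract_features(image):
    """Placeholder for S2's feature extraction: returns whole-image feature
    information plus the category of each detected item (assumed fields)."""
    return {"image": image["style"],
            "items": [it["category"] for it in image["items"]]}

def match_category(features, candidate_categories):
    """S2: choose the candidate category that best matches the whole image.
    The toy rule below (prefer categories absent from the image) merely
    stands in for the first neural network's learned scoring."""
    def score(cat):
        return 0.0 if cat in features["items"] else 1.0
    return max(candidate_categories, key=score)

# S1: the image to be processed contains a background style and at least two items
image = {"style": "modern", "items": [{"category": "bed"}, {"category": "desk"}]}
features = extract_features(image)
target = match_category(features, ["bed", "nightstand", "desk"])  # S2
print(target)  # S3: display the target category, here 'nightstand'
```

In a real system both functions would be replaced by the trained networks; the sketch only shows the shape of the data flowing between steps S1, S2 and S3.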
结合上述描述，下面开始对本申请实施例提供的第一神经网络的推理阶段的具体实现流程进行描述。本申请实施例中，物品的搭配系统可以包括客户设备和服务器，“获取与待处理图像具有匹配关系的一种目标类别”的过程中，可以包括对图像进行特征提取和根据提取到的特征确定目标类别两个部分。In conjunction with the above description, the specific implementation process of the inference phase of the first neural network provided by the embodiments of this application is described below. In the embodiments of this application, the item matching system may include a client device and a server, and the process of "obtaining a target category that has a matching relationship with the image to be processed" may include two parts: performing feature extraction on the image, and determining the target category according to the extracted features.
具体的，在一种实现方式中，前述两个部分可以完全由服务器主导，也即第一神经网络的执行设备和客户设备是分离的；在另一种实现方式中，前述两个部分的操作可以完全由客户设备主导，也即第一神经网络的执行设备和客户设备集成于同一设备；在另一种实现方式中，可以在客户设备上执行特征提取操作，并且由服务器主导确定目标类别的操作，则第一神经网络的执行设备和客户设备也是分离的。由于前述三种实现方式的具体实现流程有所不同，以下分别进行描述。Specifically, in one implementation, the aforementioned two parts may be led entirely by the server, that is, the execution device of the first neural network and the client device are separate; in another implementation, the operations of the aforementioned two parts may be led entirely by the client device, that is, the execution device of the first neural network and the client device are integrated into the same device; in yet another implementation, the feature extraction operation may be performed on the client device while the server leads the operation of determining the target category, in which case the execution device of the first neural network and the client device are also separate. Since the specific implementation processes of the foregoing three implementations differ, they are described separately below.
(一)特征提取和确定目标类别这两个部分均由服务器主导(1) Feature extraction and target category determination are both dominated by the server.
本申请实施例中,请参阅图3,图3为本申请实施例提供的物品的搭配方法的一种流程示意图,本申请实施例提供的物品的搭配方法可以包括:In the embodiment of the present application, please refer to Figure 3. Figure 3 is a schematic flowchart of a method of matching items provided by an embodiment of the present application. The method of matching items provided by an embodiment of the present application may include:
301、客户设备获取用户输入的待处理图像。 301. The client device obtains the image to be processed input by the user.
本申请实施例中,用户可以通过客户设备输入待处理图像,对应的,客户设备获取用户输入的待处理图像,以搜索与该待处理图像具有搭配关系的物品。In the embodiment of the present application, the user can input the image to be processed through the client device. Correspondingly, the client device obtains the image to be processed input by the user to search for items that have a matching relationship with the image to be processed.
其中，该待处理图像中可以存在一个或多个物品。进一步地，该待处理图像可以为用户从客户设备本地存储的图像中选取的一个图像，也可以为用户利用客户设备上的摄像机拍摄的一个图像，也可以为用户利用浏览器下载的图像等等，此处不做限定。One or more items may exist in the image to be processed. Further, the image to be processed may be an image selected by the user from images stored locally on the client device, an image captured by the user using a camera on the client device, an image downloaded by the user using a browser, etc., which is not limited here.
302、客户设备获取用户输入的目标文本信息,该目标文本信息用于指示用户的搜索意图。302. The client device obtains the target text information input by the user, and the target text information is used to indicate the user's search intention.
本申请的一些实施例中,客户端设备还可以获取用户输入的目标文本信息,该目标文本信息用于指示用户的搜索意图。进一步地,该目标文本信息所指示的物品可以为待处理图像中的物品,也可以不是待处理图像中的物品。In some embodiments of the present application, the client device can also obtain target text information input by the user, and the target text information is used to indicate the user's search intention. Further, the item indicated by the target text information may be an item in the image to be processed, or may not be an item in the image to be processed.
为了更直观地理解本方案，请参阅图4，图4为本申请实施例提供的物品的搭配方法中获取待处理图像和目标文本信息的一种界面示意图。图4包括(a)和(b)两个子示意图，在图4的(a)子示意图中，在用户通过图4的(a)子示意图中A1所指向的图标输入待处理图像后，可以触发进入图4的(b)子示意图，也即提示用户通过图4的(b)子示意图输入目标文本信息，应理解，图4中的示例仅为方便理解本方案，具体采用什么样的界面示意图可以结合实际产品形态灵活设定，此处不做限定。To understand this solution more intuitively, please refer to Figure 4, which is a schematic diagram of an interface for obtaining the image to be processed and the target text information in the item matching method provided by an embodiment of the present application. Figure 4 includes two sub-diagrams (a) and (b). In sub-diagram (a) of Figure 4, after the user inputs the image to be processed through the icon indicated by A1, entry into sub-diagram (b) of Figure 4 can be triggered, that is, the user is prompted to input the target text information through sub-diagram (b) of Figure 4. It should be understood that the example in Figure 4 is only for ease of understanding this solution; the specific interface used can be flexibly set according to the actual product form and is not limited here.
303、服务器将待处理图像输入第三神经网络，以通过第三神经网络对待处理图像进行特征提取，得到与待处理图像对应的目标特征信息，目标特征信息包括待处理图像中的物品的特征信息和待处理图像的特征信息。303. The server inputs the image to be processed into the third neural network, so as to perform feature extraction on the image to be processed through the third neural network and obtain target feature information corresponding to the image to be processed; the target feature information includes the feature information of the items in the image to be processed and the feature information of the image to be processed.
本申请的一些实施例中，客户端在获取到用户输入的待处理图像之后，可以向服务器发送该待处理图像，服务器可以将接收到的待处理图像输入第三神经网络，以通过第三神经网络对整个待处理图像进行特征提取，得到待处理图像的特征信息，待处理图像的特征信息包括由待处理图像的背景和至少两个物品构成的整体的特征信息；服务器还通过第三神经网络识别待处理图像中的各个物品区域，并对待处理图像中的物品进行特征提取，得到待处理图像中的至少两个物品的特征信息，至少两个物品的特征信息包括每个物品的属性信息。In some embodiments of this application, after obtaining the image to be processed input by the user, the client can send the image to the server, and the server can input the received image into the third neural network, so as to perform feature extraction on the entire image through the third neural network and obtain the feature information of the image to be processed, which includes the feature information of the whole composed of the background of the image and the at least two items. The server also identifies each item region in the image through the third neural network and performs feature extraction on the items in the image, obtaining the feature information of at least two items in the image, which includes the attribute information of each item.
其中，目标特征信息包括待处理图像中的至少两个物品的特征信息和待处理图像的特征信息。前述待处理图像的特征信息指的是将该待处理图像视为一个整体（也即待处理图像的背景和至少两个物品构成的整体），对待处理图像进行特征提取后得到的特征信息；作为示例，例如待处理图像的特征信息可以包括待处理图像的纹理信息、颜色信息、轮廓信息、风格信息、场景信息或其他类型的特征信息等。The target feature information includes the feature information of at least two items in the image to be processed and the feature information of the image to be processed. The aforementioned feature information of the image to be processed refers to the feature information obtained by performing feature extraction on the image while treating it as a whole (that is, the whole composed of the background of the image and the at least two items); as an example, the feature information of the image to be processed may include texture information, color information, contour information, style information, scene information or other types of feature information of the image.
待处理图像中至少两种物品的特征信息也可以称为待处理图像所对应的语义标签集合，至少两种物品的特征信息可以包括每个物品的属性信息，每个物品的属性信息包括如下任一种或多种信息：物品在待处理图像中的位置信息、物品的类别信息和物品的颜色信息；可选地，还可以包括每个物品的风格信息、物品的材质、物品的图案或其他特征信息。The feature information of the at least two items in the image to be processed can also be called the semantic label set corresponding to the image. The feature information of the at least two items can include the attribute information of each item, and the attribute information of each item includes any one or more of the following: the position information of the item in the image to be processed, the category information of the item, and the color information of the item; optionally, it may also include the style information of each item, the material of the item, the pattern of the item, or other feature information.
可选地，不同类别的物品的特征信息所包括的信息可以不同，作为示例，例如若待处理图像中包括床和上衣，床的特征信息可以包括床在待处理图像中的位置信息、床这一类别信息、床的颜色和床的风格，上衣的特征信息可以包括上衣在待处理图像中的位置信息、上衣这一类别信息、上衣的颜色、上衣的形状和上衣的材质，应理解，此处举例仅为方便理解本方案，不用于限定本方案。Optionally, the feature information of items of different categories may include different information. As an example, if the image to be processed includes a bed and a top, the feature information of the bed may include the position information of the bed in the image, the category information "bed", the color of the bed and the style of the bed, while the feature information of the top may include the position information of the top in the image, the category information "top", the color of the top, the shape of the top and the material of the top. It should be understood that the example here is only for ease of understanding this solution and is not used to limit it.
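As an illustrative aid (not part of the claimed method), the per-item attribute information described above can be pictured as a record with required and optional fields. The field names below are assumptions chosen for this sketch, not names defined by the application:

```python
from dataclasses import dataclass
from typing import Optional, Tuple

@dataclass
class ItemFeature:
    """One entry of the semantic label set: attribute information of one item."""
    bbox: Tuple[int, int, int, int]   # position of the item in the image (x1, y1, x2, y2)
    category: str                     # category information of the item
    color: str                        # color information of the item
    style: Optional[str] = None       # optional: style information
    material: Optional[str] = None    # optional: material of the item

# Example semantic label set for an image containing a bed and a top
bed = ItemFeature(bbox=(10, 40, 200, 180), category="bed", color="white", style="nordic")
top = ItemFeature(bbox=(220, 60, 300, 140), category="top", color="blue", material="cotton")
semantic_labels = [bed, top]
print([it.category for it in semantic_labels])  # ['bed', 'top']
```

The optional fields mirror the text above: different item categories may carry different attribute fields (the bed record has a style, the top record has a material).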
第三神经网络具体可以表现为卷积神经网络或其他用于进行特征提取的神经网络等。进一步地,第三神经网络可以包括第一特征提取网络和第二特征提取网络这两个不同的特征提取网络。第一特征提取网络用于生成待处理图像中的至少两个物品的特征信息,第二特征提取网络用于生成整个待处理图像的特征信息。The third neural network can specifically be embodied as a convolutional neural network or other neural networks used for feature extraction. Further, the third neural network may include two different feature extraction networks: a first feature extraction network and a second feature extraction network. The first feature extraction network is used to generate feature information of at least two items in the image to be processed, and the second feature extraction network is used to generate feature information of the entire image to be processed.
在第一特征提取网络的训练阶段，第一特征提取网络可以作为用于对图像进行目标识别的神经网络中的一部分，也即训练设备可以利用训练数据，对用于对图像进行目标识别的神经网络进行迭代训练直至满足收敛条件，在得到训练后的神经网络后，从中获取训练后的第一特征提取网络。In the training phase of the first feature extraction network, the first feature extraction network can be part of a neural network used for target recognition in images; that is, the training device can use training data to iteratively train the neural network for target recognition in images until a convergence condition is met, and after obtaining the trained neural network, obtain the trained first feature extraction network from it.
作为示例，例如用于对图像进行目标识别的神经网络可以识别图像中的茶几、餐边柜、收纳柜、鞋柜和花架，也即本申请实施例中的第一特征提取网络能够在更细粒度的层级进行特征提取。As an example, a neural network used for target recognition in images can identify coffee tables, sideboards, storage cabinets, shoe cabinets and flower stands in an image; that is, the first feature extraction network in the embodiments of the present application can perform feature extraction at a finer-grained level.
为了更直观地理解本方案，请参阅图5，图5为本申请实施例提供的物品的搭配方法中第一特征提取网络的一种示意图。如图5所示，在将待处理图像输入第一特征提取网络后，第一特征提取网络可以识别出待处理图像中的三个物品区域，并生成待处理图像中物品的特征信息，应理解，图5中的示例仅为方便理解本方案，不用于限定本方案。To understand this solution more intuitively, please refer to Figure 5, which is a schematic diagram of the first feature extraction network in the item matching method provided by an embodiment of the present application. As shown in Figure 5, after the image to be processed is input into the first feature extraction network, the first feature extraction network can identify the three item regions in the image and generate the feature information of the items in the image. It should be understood that the example in Figure 5 is only for ease of understanding this solution and is not used to limit it.
在第二特征提取网络的训练阶段，第二特征提取网络可以作为用于对整个图像进行分类的神经网络中的一部分，也即训练设备可以利用训练数据，对用于对整个图像进行分类的神经网络进行迭代训练直至满足收敛条件，在得到训练后的神经网络后，从中获取训练后的第二特征提取网络。In the training phase of the second feature extraction network, the second feature extraction network can be part of a neural network used to classify the entire image; that is, the training device can use training data to iteratively train the neural network for classifying the entire image until a convergence condition is met, and after obtaining the trained neural network, obtain the trained second feature extraction network from it.
本申请实施例中，待处理图像的特征信息指的是将该待处理图像视为一个整体，对待处理图像进行特征提取后得到的特征信息，待处理图像中至少两个物品的特征信息可以包括每个物品的属性信息，进一步细化了待处理图像的特征信息和至少两个物品的特征信息的概念，有利于更清楚地区分待处理图像的特征信息和至少两个物品的特征信息；且每个物品的特征信息中包括物品的类别、物品的颜色、物品的风格、物品的材质或物品的图案等信息，在特征提取过程中充分考虑了待处理图像中的物体的信息，有利于提高确定的目标类别的准确度。In the embodiments of this application, the feature information of the image to be processed refers to the feature information obtained by performing feature extraction on the image while treating it as a whole, and the feature information of the at least two items in the image may include the attribute information of each item. This further refines the concepts of the feature information of the image to be processed and the feature information of the at least two items, and helps distinguish the two more clearly. Moreover, the feature information of each item includes information such as the category, color, style, material or pattern of the item, so the information of the objects in the image is fully considered in the feature extraction process, which helps improve the accuracy of the determined target category.
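The division of labor between the two feature extraction networks described above can be sketched as follows. Both extractors here are toy stand-ins (assumptions for illustration), since the application does not fix a concrete network architecture; the sketch only shows how their outputs combine into the target feature information:

```python
def first_extraction(item_categories):
    """Item-level extractor (stand-in for the first feature extraction network):
    one feature record per detected item region, at a fine-grained level."""
    return [{"category": c, "region": i} for i, c in enumerate(item_categories)]

def second_extraction(item_categories):
    """Image-level extractor (stand-in for the second feature extraction network):
    a single feature record for the image treated as a whole."""
    return {"num_items": len(item_categories), "scene": "bedroom"}  # toy whole-image feature

def third_network(item_categories):
    """The third neural network's output: target feature information combining
    the per-item features and the whole-image features."""
    return {"items": first_extraction(item_categories),
            "image": second_extraction(item_categories)}

target_features = third_network(["bed", "desk", "lamp"])
print(target_features["image"]["num_items"])  # 3
```

The key point mirrored from the text is structural: the target feature information carries two distinct kinds of features, one list keyed per item and one record for the whole image.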
304、服务器将文本信息输入第四神经网络,以通过第四神经网络对文本信息进行特征提取,得到文本信息的特征信息。304. The server inputs the text information into the fourth neural network to extract features of the text information through the fourth neural network to obtain feature information of the text information.
本申请的一些实施例中,服务器还可以将文本信息输入第四神经网络中,以通过第四神经网络对文本信息进行特征提取,得到文本信息的特征信息。In some embodiments of the present application, the server can also input text information into a fourth neural network to extract features of the text information through the fourth neural network to obtain feature information of the text information.
其中，步骤302为可选步骤，若执行步骤302，则输入第四神经网络中的文本信息指的是步骤302中获取到的目标文本信息；若不执行步骤302，则输入第四神经网络中的文本信息可以为步骤303中获取到的待处理图像中的物品的特征信息，也即输入第四神经网络中的文本信息可以为待处理图像的语义标签集合。Step 302 is an optional step. If step 302 is executed, the text information input into the fourth neural network refers to the target text information obtained in step 302; if step 302 is not executed, the text information input into the fourth neural network may be the feature information of the items in the image to be processed obtained in step 303, that is, it may be the semantic label set of the image to be processed.
第四神经网络为对文本信息进行特征提取的神经网络，具体可以表现为循环神经网络或其他类型的神经网络等，此处不做穷举。The fourth neural network is a neural network that performs feature extraction on text information; it may specifically be embodied as a recurrent neural network or another type of neural network, which are not exhaustively listed here.
需要说明的是,步骤304也是可选步骤,若不执行步骤304,则不需要执行步骤302,则在执行完步骤303之后,可以直接执行步骤305。It should be noted that step 304 is also an optional step. If step 304 is not executed, step 302 does not need to be executed. After step 303 is executed, step 305 can be executed directly.
305. Based on the feature information of the image to be processed and the feature information of the at least two items, the server obtains, through the first neural network, one target category that has a matching relationship with the image to be processed.

In this embodiment of the present application, the server may obtain, through the first neural network, one target category that has a matching relationship with the image to be processed, based on the feature information of the image to be processed and the feature information of the at least two items. Specifically, in one implementation, if steps 303 and 304 are performed, the server may input the target feature information and the feature information of the text information into the first neural network, and the first neural network generates M candidate intents corresponding to the image to be processed, where each candidate intent indicates one category of items that has a matching relationship with the image to be processed.

M is an integer greater than or equal to 1; further, when there are at least two items in the image to be processed, M is an integer greater than or equal to 2.

Optionally, the first neural network may also output M first scores in one-to-one correspondence with the M candidate intents, each first score indicating the probability that a candidate intent is consistent with the user's search intent.

Further optionally, when the information input into the first neural network changes, the number of candidate intents output by the first neural network may be the same or different; that is, the first neural network may determine the number of output candidate intents according to the actual situation.

The server sends the M candidate intents to the client device so that the M candidate intents are presented to the user through the display interface of the client device; the client device may present the M candidate intents to the user in text, images, or other forms.

Optionally, if the first neural network also outputs M first scores, the server may also send the M first scores to the client device, and the client device may rank the M candidate intents according to the first score corresponding to each candidate intent: the higher a candidate intent's first score, the earlier it is ranked.
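The ranking step just described can be sketched as follows. The intent names and score values are made-up examples, not data from the embodiment.

```python
# Hypothetical sketch of ranking the M candidate intents by their first
# scores: the higher a candidate intent's first score, the earlier it ranks.

def rank_candidate_intents(intents, first_scores):
    """Return intents ordered by descending first score."""
    paired = sorted(zip(intents, first_scores), key=lambda p: p[1], reverse=True)
    return [intent for intent, _ in paired]

ranked = rank_candidate_intents(
    ["decorative painting", "pendant", "lighting"],
    [0.71, 0.18, 0.11],
)
print(ranked)  # the intent with the highest first score comes first
```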
To understand this solution more intuitively, refer to Figure 6, which is a schematic diagram of presenting the M candidate intents in the item matching method provided by this embodiment of the present application. For example, if the image to be processed contains three main regions, namely a bed, a wardrobe, and a wall, and the text information is "wall decoration", then the target feature information may include the feature information of the bed, the feature information of the wardrobe, the feature information of the wall, and the feature information of the entire image to be processed, and the M candidate intents may include the decorative paintings, pendants, and lighting shown in Figure 6. It should be understood that the example in Figure 6 is only for ease of understanding and is not intended to limit this solution.

After the client device presents the M candidate intents to the user, in one case, if the client device obtains a feedback operation corresponding to the M candidate intents, it may determine, according to the feedback operation on the M candidate intents, one target category that has a matching relationship with the image to be processed, and send that target category to the server. Correspondingly, if the server obtains the aforementioned target category sent by the client device within a target time period, it may determine that target category as the one corresponding to the image to be processed.

The "feedback operation" may be a selection operation on one of the M candidate intents, or it may be the user manually entering a new search intent, etc.; the specific implementation forms of the "feedback operation" are not enumerated here. Correspondingly, the target category may be one of the M candidate intents, or it may be a search intent other than the M candidate intents.

To understand this solution more intuitively, refer to Figure 7, which is a schematic flowchart of obtaining the target category in the item matching method provided by this embodiment of the present application. E1: the server inputs the target feature information and the feature information of the text information into the first neural network, and the first neural network generates M candidate intents corresponding to the image to be processed. E2: the server sends the M candidate intents to the client device. E3: the client device presents the M candidate intents to the user. E4: the client device determines one target category based on the feedback operation input by the user for the M candidate intents. E5: the client device sends the target category to the server, and correspondingly, the server receives the target category. It should be understood that the example in Figure 7 is only for ease of understanding and is not intended to limit this solution.

In another case, if the client device does not obtain a feedback operation corresponding to the M candidate intents within the target time period, the client device may send no feedback information to the server, or it may send first feedback information to the server, the first feedback information being used to inform the server that no feedback operation input by the user was received. Correspondingly, if the server receives no feedback information from the client device within the target time period, or receives the first feedback information sent by the client device, it may determine the candidate intent with the highest first score among the M candidate intents as the target category.
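The fallback behavior above can be sketched as a small function: use the user's feedback if it arrived, otherwise fall back to the highest-scored candidate intent. Names and values are illustrative assumptions only.

```python
# Hypothetical sketch of resolving the target category: user feedback wins;
# if no feedback operation was received within the target time period, the
# candidate intent with the highest first score is chosen.

def resolve_target_category(candidates, first_scores, user_feedback=None):
    """user_feedback is the category chosen (or manually entered) by the user,
    or None when no feedback arrived within the target time period."""
    if user_feedback is not None:
        return user_feedback  # may also lie outside the M candidate intents
    best_index = max(range(len(candidates)), key=lambda i: first_scores[i])
    return candidates[best_index]

print(resolve_target_category(["painting", "pendant"], [0.7, 0.3]))          # painting
print(resolve_target_category(["painting", "pendant"], [0.7, 0.3], "lamp"))  # lamp
```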
In another implementation, if step 303 is performed but step 304 is not, the server may input the target feature information into the first neural network, and the first neural network generates M candidate intents corresponding to the image to be processed.

The server sends the M candidate intents to the client device so that they are presented to the user through the display interface of the client device, and the client device obtains, through its display interface, a feedback operation corresponding to the M candidate intents; based on the feedback operation, the client device determines one target category that has a matching relationship with the image to be processed, and sends that target category to the server.

It should be noted that the difference between this implementation and the previous one is that in the previous implementation, "the target feature information and the feature information of the text information" are input into the first neural network, whereas in this implementation only "the target feature information" is input into the first neural network. For the specific details of this implementation, refer to the description of the previous implementation, which is not repeated here.

In this embodiment of the present application, when performing feature extraction on the image to be processed, not only the feature information of the entire image to be processed but also the feature information of the items in the image can be obtained, and then, based on the feature information of the entire image and the feature information of the items in it, M categories of items that have a matching relationship with the entire image are generated. That is, not only the information of the entire image to be processed but also every item in the image is fully considered, which helps improve the accuracy of the determined candidate intents.

Optionally, the target text information input by the user may also be obtained, the target text information being used to indicate the user's search intent, and the target feature information may be input into the third neural network together with the feature information of the target text information. That is, in the process of obtaining categories that have a matching relationship with the image to be processed, not only can the information in the image be fully exploited, but the text information indicating the user's search intent can also be combined, further improving the accuracy of the determined candidate intents.

In another implementation, if neither step 303 nor step 304 is performed, the server may input the image to be processed into the first neural network, perform feature extraction on it through the first neural network to obtain the feature information of the entire image, and then generate, through the first neural network, M candidate intents corresponding to the image to be processed according to the feature information of the entire image.

The server sends the M candidate intents to the client device so that they are presented to the user through the display interface of the client device, and the client device obtains, through its display interface, a feedback operation corresponding to the M candidate intents; based on the feedback operation, the client device determines one target category that has a matching relationship with the image to be processed and sends it to the server. For the specific implementation of the foregoing steps, refer to the above description.

In this embodiment of the present application, M candidate intents are first generated through the first neural network, and then one target category that has a matching relationship with the image to be processed is determined based on the feedback operation input by the user for the M candidate intents; that is, the user's search intent is guided in an interactive manner, which helps improve the accuracy of the determined target category.

In another implementation, if steps 303 and 304 are performed, the server may also input the target feature information and the feature information of the text information into the first neural network, and the first neural network generates one target category that has a matching relationship with the image to be processed.

In another implementation, if step 303 is performed but step 304 is not, the server may also input the target feature information into the first neural network, and the first neural network generates one target category that has a matching relationship with the image to be processed.

In another implementation, if neither step 303 nor step 304 is performed, the server may input the image to be processed into the first neural network, perform feature extraction on it through the first neural network to obtain the feature information of the entire image, and generate, according to that feature information, one target category that has a matching relationship with the image to be processed through the first neural network.
306. The server obtains N candidate items, each of which belongs to the target category.

In some embodiments of the present application, after determining one target category that has a matching relationship with the image to be processed, the server may obtain N candidate items corresponding to the target category from an item library stored on the server; that is, the server may obtain N candidate items of the target category from the item library, where N is an integer greater than 1.

307. The server generates, through the second neural network, target scores corresponding to the N candidate items, where a target score indicates the degree of match between a candidate item and the image to be processed.

In some embodiments of the present application, the server may generate, through the second neural network, a target score corresponding to each of the N candidate items, where one target score indicates the degree of match between one candidate item and the image to be processed, that is, an aesthetic score of the rendering of the candidate item combined with the image to be processed.

To understand this solution more intuitively, refer to Figure 8, which is a schematic diagram of the target score in the item matching method provided by this embodiment of the present application. Figure 8 includes three sub-diagrams (a), (b), and (c). Sub-diagram (a) of Figure 8 shows three items in the image to be processed; in sub-diagram (b), the candidate item is sofa one, and the rendering of sofa one combined with the image to be processed scores 0.956; in sub-diagram (c), the candidate item is sofa two, and the rendering of sofa two combined with the image to be processed scores 0.425. This means the degree of match between sofa one and the image to be processed is higher than that between sofa two and the image to be processed. It should be understood that the example in Figure 8 is only for ease of understanding and is not intended to limit this solution.
Specifically, in one implementation, the server may input the feature information of each candidate item together with the target feature information into the second neural network to obtain the target score output by the second neural network for that candidate item; by performing the foregoing operation for each of the N candidate items, the server can generate the target score corresponding to each of the N candidate items.

In another implementation, the server may also input the image of each candidate item and the image to be processed into the second neural network to obtain the target score output by the second neural network for that candidate item; by performing the foregoing operation for each of the N candidate items, the server can generate the target score corresponding to each candidate item.

In another implementation, the server may also input the image of each candidate item, the semantic label of each candidate item, the image to be processed, and the semantic labels of the items in the image to be processed into the second neural network to obtain the target score output by the second neural network for each candidate item.

The second neural network may be a convolutional neural network or another type of neural network. The semantic labels of the items in the image to be processed may also be called the feature information of the items in the image to be processed. The semantic label of a candidate item may include at least one piece of attribute information of the candidate item; as an example, the semantic label of a candidate item may include any one or more of the following: the category of the candidate item, the style of the candidate item, the shape of the candidate item, or other attributes of the candidate item, which are not exhaustively listed here.

To understand this solution more intuitively, refer to Figure 9, which is a schematic diagram of the second neural network in the item matching method provided by this embodiment of the present application. As shown in Figure 9, after the server inputs the image of each candidate item and the semantic label of each candidate item into the second neural network, the second neural network performs feature extraction on the image of the candidate item to obtain the feature information of the candidate item's image, and performs feature extraction on the semantic label of the candidate item to obtain the feature information of the candidate item's semantic label; the server then fuses, through the second neural network, the feature information of the candidate item's image with the feature information of its semantic label, and convolves the fused feature information to obtain the feature information corresponding to the candidate item.

After the server inputs the image to be processed and the semantic labels of the items in the image to be processed into the second neural network, the second neural network performs feature extraction on the image to be processed to obtain its feature information, and performs feature extraction on the semantic labels of the items in the image to obtain the feature information of those semantic labels; the server then fuses, through the second neural network, the feature information of the image to be processed with the feature information of the semantic labels, and convolves the fused feature information to obtain the feature information corresponding to the image to be processed.

As shown in Figure 9, based on the feature information corresponding to the candidate item and the feature information corresponding to the image to be processed, the server performs the above-mentioned product, fusion, and other operations through the second neural network, and outputs one target score for the matching effect of the candidate item and the image to be processed. It should be understood that the example in Figure 9 is only for ease of understanding and is not intended to limit this solution.
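The two-branch structure just described can be sketched in miniature. This is a stand-in, not the claimed network: real feature extractors and the "fusion + convolution + product" layers are replaced by concatenation, normalization, and cosine similarity, and every feature vector below is a made-up toy value.

```python
# Hypothetical sketch of the two-branch scoring: one branch fuses the candidate
# item's image features with its semantic-label features, the other fuses the
# scene image's features with its items' label features, and the two fused
# representations are combined into one target score in [0, 1].
import math

def fuse(image_feat, label_feat):
    # Stand-in for "fuse then convolve": concatenate and L2-normalize.
    fused = list(image_feat) + list(label_feat)
    norm = math.sqrt(sum(x * x for x in fused))
    return [x / norm for x in fused]

def target_score(cand_img, cand_label, scene_img, scene_labels):
    cand_repr = fuse(cand_img, cand_label)
    scene_repr = fuse(scene_img, scene_labels)
    cosine = sum(a * b for a, b in zip(cand_repr, scene_repr))
    return (cosine + 1.0) / 2.0  # map to [0, 1], like the scores in Figure 8

score = target_score([1.0, 0.0], [0.3, 0.7], [0.9, 0.1], [0.2, 0.8])
print(round(score, 3))  # a value between 0 and 1
```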
Regarding the training phase of the second neural network: specifically, a training data set may be stored on the training device, where each piece of training data may include an image to be processed, the feature information of the items in the image to be processed, the images of at least two candidate items, and the semantic label corresponding to each candidate item; the expected result corresponding to the training data is the one of the aforementioned at least two candidate items that best matches the image to be processed.

The training device may combine the image to be processed, the feature information of the items in the image to be processed, the image of each candidate item, and the semantic label corresponding to each candidate item's image into one group of target data, thereby obtaining at least two groups of target data in one-to-one correspondence with the at least two candidate items.

The training device inputs each group of target data into the second neural network to obtain one target score output by the second neural network; by performing the foregoing operation for each of the at least two groups of target data through the second neural network, the training device obtains at least two target scores in one-to-one correspondence with the at least two groups of target data, that is, at least two target scores in one-to-one correspondence with the at least two candidate items.

According to the at least two target scores, the training device selects, from the at least two candidate items, the one item that best matches the image to be processed, and takes the selected item as the predicted result corresponding to the training data.

The training device generates the function value of the loss function according to the predicted result and the expected result corresponding to the training data, and back-propagates to update the weight parameters of the second neural network, thereby completing one training iteration of the second neural network. The training device iteratively trains the second neural network with multiple pieces of data in the training data set until a convergence condition is met, obtaining the trained second neural network.
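The loop above (score candidates, pick the predicted result, compare to the expected result, update weights) can be illustrated in miniature. A stand-in linear scorer replaces the second neural network, a simple pairwise hinge loss replaces the unspecified loss function, and all data is made up; this is a sketch of the training idea, not the patented training procedure.

```python
# Hypothetical illustration of the training loop: push the expected (best-
# matching) candidate's score above the other candidates' scores.

def score(w, feats):
    return sum(wi * fi for wi, fi in zip(w, feats))

def train_step(w, candidates, expected_idx, lr=0.1):
    """One iteration: compute a pairwise loss against the expected result
    and update the stand-in weight parameters w by gradient descent."""
    loss = 0.0
    grad = [0.0] * len(w)
    pos = candidates[expected_idx]
    for i, neg in enumerate(candidates):
        if i == expected_idx:
            continue
        margin = 1.0 - (score(w, pos) - score(w, neg))  # hinge on the pair
        if margin > 0:
            loss += margin
            for j in range(len(w)):
                grad[j] += neg[j] - pos[j]
    return [wj - lr * gj for wj, gj in zip(w, grad)], loss

# Two toy candidate feature vectors; candidate 0 is the expected result.
candidates = [[1.0, 0.2], [0.1, 1.0]]
w = [0.0, 0.0]
for _ in range(50):
    w, loss = train_step(w, candidates, expected_idx=0)
predicted = max(range(len(candidates)), key=lambda i: score(w, candidates[i]))
print(predicted)  # 0: after training, the predicted result matches the expected one
```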
308. The server obtains K target items corresponding to the target category, each of which belongs to the target category.

In this embodiment of the present application, steps 306 and 307 are both optional steps. If steps 306 and 307 are performed, step 308 may include: the server selects K target items from the N candidate items according to the target scores corresponding to the N candidate items, where K is an integer greater than or equal to 1. A candidate item with a higher target score has a higher probability of being selected.
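The selection in step 308 can be sketched as follows. Since the text only says that higher-scored candidates are more likely to be chosen, a real system might sample in proportion to the scores; deterministic top-K, shown here with made-up item names and scores, is the simplest reading.

```python
# Hypothetical sketch of step 308: keep the K candidate items with the
# highest target scores.

def select_target_items(candidates, target_scores, k):
    order = sorted(range(len(candidates)),
                   key=lambda i: target_scores[i], reverse=True)
    return [candidates[i] for i in order[:k]]

items = select_target_items(
    ["sofa one", "sofa two", "sofa three"], [0.956, 0.425, 0.610], k=2)
print(items)  # ['sofa one', 'sofa three']
```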
In this embodiment of the present application, target scores corresponding to the N candidate items are generated through a neural network, the target score indicating the degree of match between a candidate item and the image to be processed; and the target items ultimately presented to the user are selected from the N candidate items according to the degree of match between each candidate item and the image to be processed. That is, the aesthetic quality of combining a candidate item with the image to be processed is quantitatively scored, and the aesthetics of the combined rendering is taken into account when selecting the target items, so that the renderings of the target items combined with the image to be processed that are provided to the user will look better, which helps improve the user stickiness of this solution.

If steps 306 and 307 are not performed, the server may also directly obtain, from the item library, K target items corresponding to the target category, where each target item belongs to the category indicated by the target category.

309. The server sends the information of the target items to the client device.

In this embodiment of the present application, after obtaining the K target items corresponding to the target category, the server may obtain the information of each of the K target items and send the information of each target item to the client device.

The information of each target item may include the image corresponding to the target item; optionally, the information of each target item may also include any one or more of the following: the access link, name, price, or target score of the target item, or other types of information about the item, which are not limited here.

Further, the image corresponding to a target item may be the image of the target item itself, or it may be a rendering, generated by the server using a neural network, of the target item combined with the image to be processed. The aforementioned rendering may be in a pure image format, a rendering after VR modeling, a rendering after AR modeling, or another format, which is not limited here.

To understand this solution more intuitively, refer to Figure 10, which is a schematic diagram of renderings of a target item combined with the image to be processed in the item matching method provided by this embodiment of the present application. As shown in Figure 10, the sub-diagram on the left shows the image to be processed, and the two sub-diagrams on the right respectively show two renderings of two different target items combined with the image to be processed. It should be understood that the example in Figure 10 is only for ease of understanding and is not intended to limit this solution.
310、客户设备向用户展示与一种目标类别对应的K个目标物品。310. The client device displays K target items corresponding to one target category to the user.
本申请实施例中,客户设备在获取到服务器发送的K个目标物品中每个物品的信息之后,会向用户展示与该一个目标类别对应的K个目标物品。In this embodiment of the present application, after acquiring the information of each of the K target items sent by the server, the client device will display the K target items corresponding to the one target category to the user.
具体的,客户设备可以向用户展示每个目标物品所对应的图像;目标物品所对应的图像可以为目标物品的图像,也可以为每个目标物品与待处理图像的搭配效果图;对于搭配效果图的展示方式的进一步理解可以参阅上一步骤中的描述,此处不做赘述。Specifically, the client device can show the user the image corresponding to each target item; the image corresponding to the target item can be an image of the target item, or a matching effect diagram of each target item and the image to be processed; for the matching effect For a further understanding of the display method of the picture, please refer to the description in the previous step and will not be repeated here.
本申请实施例中,客户设备可以向用户展示每个目标类别的物品与待处理图像的搭配效果图,从而用户可以更直观地体会到目标类别的物品应用于待处理图像中的搭配效果, 有利于提高本方案的用户粘度。In the embodiment of the present application, the client device can display to the user the matching effect diagram of the items of each target category and the image to be processed, so that the user can more intuitively experience the matching effect of the items of the target category applied to the image to be processed. It will help improve the user stickiness of this program.
可选地,客户设备还可以向用户展示每个目标物品的如下任一种或多种信息:每个目标类别的物品的访问链接、名称、价格、目标评分或物品的其它类型的信息等,此处不做限定。Optionally, the client device can also display any one or more of the following information about each target item to the user: access links, names, prices, target ratings or other types of information about the items in each target category, etc., There are no limitations here.
For a more intuitive understanding of this solution, refer to FIG. 11, which is a schematic flowchart of an item matching method provided by an embodiment of the present application. As shown in FIG. 11, after obtaining the image to be processed and the text information (namely "wall decoration" in FIG. 11) input by the user, the client device displays three candidate intents to the user, namely "decorative painting", "pendant", and "lighting" in FIG. 11. Based on the user's selection of the candidate intent "decorative painting", the client device sends feedback information to the server, where the feedback information indicates to the server that the target category is "decorative painting".
Based on the target category "decorative painting", the server sends the client device information about two different decorative paintings (namely, target items). The information for each decorative painting includes a composite rendering of the painting matched with the image to be processed, as well as the painting's name, price, and size. It should be understood that the example in FIG. 11 shows the implementation flow of the item matching method from the perspective of the client device; the example is provided only to facilitate understanding of this solution and is not intended to limit it.
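The exchange in FIG. 11 can be sketched as two messages. The field names below (`target_category`, `rendering`, and so on) are illustrative assumptions for this sketch only; the disclosure does not fix a message format:

```python
# Hypothetical sketch of the client-server exchange in FIG. 11.
# All field names are illustrative assumptions, not a disclosed protocol.

def choose_intent(candidate_intents, selected_index):
    """Turn the user's selection into the feedback message for the server."""
    return {"target_category": candidate_intents[selected_index]}

candidates = ["decorative painting", "pendant", "lighting"]
feedback = choose_intent(candidates, 0)  # user taps "decorative painting"

# The server then answers with target-item records of that category, e.g.:
server_reply = [
    {"name": "Abstract Canvas A", "price": 39.0, "size": "40x60cm",
     "rendering": "composite_0.png"},
    {"name": "Landscape Print B", "price": 55.0, "size": "50x70cm",
     "rendering": "composite_1.png"},
]
```

Each record carries the composite rendering plus the name, price, and size, matching the per-item information described for FIG. 11.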
(2) Both the feature extraction part and the target-category determination part are led by the client
In this embodiment of the present application, refer to FIG. 12, which is a schematic flowchart of an item matching method provided by an embodiment of the present application. The item matching method provided by this embodiment may include the following steps.
1201. The client device obtains an image to be processed input by the user.
1202. The client device obtains target text information input by the user, where the target text information indicates the user's search intent.
1203. The client device inputs the image to be processed into a third neural network to perform feature extraction on it, obtaining target feature information corresponding to the image to be processed, where the target feature information includes feature information of at least two items in the image to be processed and feature information of the image to be processed as a whole.
1204. The client device inputs the text information into a fourth neural network to perform feature extraction on it, obtaining feature information of the text information.
1205. Based on the feature information of the image to be processed and the feature information of the at least two items, the client device obtains, through a first neural network, a target category that has a matching relationship with the image to be processed.
In this embodiment of the present application, for the specific implementation of steps 1201 to 1205, refer to the description of steps 301 to 305 in the embodiment corresponding to FIG. 3. The difference is that in the embodiment corresponding to FIG. 3, steps 303 to 305 are performed by the server, whereas in the embodiment corresponding to FIG. 12, steps 1203 to 1205 are performed by the client device; details are not repeated here.
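Steps 1203 to 1205 amount to an on-device multimodal pipeline: encode the image, encode the text, and classify the concatenated features. A minimal sketch follows; the linear layers and all dimensions are stand-ins chosen for illustration, since the disclosure does not specify the architectures of the first, third, and fourth neural networks:

```python
import numpy as np

rng = np.random.default_rng(0)

# Illustrative stand-ins for the three networks (assumed shapes, not disclosed):
W_img = rng.standard_normal((3 * 32 * 32, 64))  # "third NN": image encoder
W_txt = rng.standard_normal((16, 64))           # "fourth NN": text encoder
W_cls = rng.standard_normal((128, 5))           # "first NN": 5 candidate categories

image = rng.standard_normal(3 * 32 * 32)  # flattened image to be processed
text = rng.standard_normal(16)            # embedded target text information

img_feat = image @ W_img                  # step 1203: image feature information
txt_feat = text @ W_txt                   # step 1204: text feature information
logits = np.concatenate([img_feat, txt_feat]) @ W_cls  # step 1205
target_category = int(np.argmax(logits))  # index of the matching target category
```

Only the `target_category` index then needs to travel to the server (step 1206), which is what distinguishes this variant from the FIG. 3 flow.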
1206. The client device sends the target category to the server.
1207. The server obtains N candidate items, each of which belongs to the target category.
1208. The server generates, through a second neural network, target scores corresponding to the N candidate items, where a target score indicates the degree of matching between a candidate item and the image to be processed.
1209. The server obtains K target items corresponding to the target category, each of which belongs to the target category.
1210. The server sends information about the target items to the client device.
1211. The client device displays to the user the K target items corresponding to the target category.
In this embodiment of the present application, for the specific implementation of steps 1207 to 1211, refer to the description of steps 306 to 310 in the embodiment corresponding to FIG. 3; details are not repeated here.
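Steps 1207 to 1209 reduce to scoring N candidates and keeping the best K. The sketch below replaces the second neural network with a placeholder scoring function (an assumption, since the disclosure does not fix its architecture):

```python
import heapq

def top_k_items(candidates, score_fn, k):
    """Score each candidate against the image and keep the k best (step 1209)."""
    scored = [(score_fn(c), c) for c in candidates]  # step 1208: one score per item
    best = heapq.nlargest(k, scored, key=lambda pair: pair[0])
    return [item for _, item in best]

# Placeholder for the second neural network's item-image matching score.
def match_score(item):
    return item["style_similarity"]

candidates = [
    {"name": "item-a", "style_similarity": 0.91},
    {"name": "item-b", "style_similarity": 0.42},
    {"name": "item-c", "style_similarity": 0.77},
]
targets = top_k_items(candidates, match_score, k=2)  # item-a and item-c survive
```

With K smaller than N, only the items whose target scores indicate the strongest match with the image to be processed are sent back in step 1210.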
(3) The feature extraction operation is performed by the client, and the target-category determination part is led by the server
In this embodiment of the present application, refer to FIG. 13, which is a schematic flowchart of an item matching method provided by an embodiment of the present application. The item matching method provided by this embodiment may include the following steps.
1301. The client device obtains an image to be processed input by the user.
1302. The client device obtains target text information input by the user, where the target text information indicates the user's search intent.
1303. The client device inputs the image to be processed into a third neural network to perform feature extraction on it, obtaining target feature information corresponding to the image to be processed, where the target feature information includes feature information of the items in the image to be processed and feature information of the image to be processed as a whole.
1304. The client device inputs the text information into a fourth neural network to perform feature extraction on it, obtaining feature information of the text information.
In this embodiment of the present application, for the specific implementation of steps 1301 to 1304, refer to the description of steps 301 to 304 in the embodiment corresponding to FIG. 3. The difference is that in the embodiment corresponding to FIG. 3, steps 303 and 304 are performed by the server, whereas in the embodiment corresponding to FIG. 13, steps 1303 and 1304 are performed by the client device; details are not repeated here.
The client device may send the target feature information to the server; optionally, the client device sends both the target feature information and the feature information of the text information to the server.
1305. Based on the feature information of the image to be processed and the feature information of the at least two items, the server obtains, through the first neural network, a target category that has a matching relationship with the image to be processed.
1306. The server obtains N candidate items corresponding to the target category, each of which belongs to the target category.
1307. The server generates, through the second neural network, target scores corresponding to the N candidate items, where a target score indicates the degree of matching between a candidate item and the image to be processed.
1308. The server obtains K target items corresponding to the target category, each of which belongs to the target category.
1309. The server sends information about the target items to the client device.
1310. The client device displays to the user the K target items corresponding to the target category.
In this embodiment of the present application, for the specific implementation of steps 1305 to 1310, refer to the description of steps 305 to 310 in the embodiment corresponding to FIG. 3; details are not repeated here.
In this embodiment of the present application, the user can provide an image of the scene in which the desired item will be used (namely, the image to be processed); a target category that has a matching relationship with the entire image to be processed can then be obtained through the first neural network, and items of that target category are displayed to the user. With the foregoing solution, the user can search for items to match simply by providing an image to be processed; moreover, even when the user inputs a complex image to be processed (namely, one containing at least two items), a target category of items that matches the entire image can still be obtained, which greatly broadens the application scenarios of this solution and helps improve user engagement. In addition, the target category that matches the entire image to be processed is determined based on both the feature information of the entire image and the feature information of the items within it; that is, not only the information of the whole image but also each object in it is fully considered, which helps improve the accuracy of the determined target category.
On the basis of the embodiments corresponding to FIG. 1a to FIG. 13, in order to better implement the foregoing solutions of the embodiments of the present application, related devices for implementing those solutions are provided below. Referring specifically to FIG. 14, FIG. 14 is a schematic structural diagram of an item matching apparatus provided by an embodiment of the present application. The item matching apparatus 1400 is applied to a client device in an item matching system, and the item matching system further includes a server. The item matching apparatus 1400 includes: an obtaining module 1401, configured to obtain an image input by a user, where the image contains a background and at least two items; a receiving module 1402, configured to receive, from the server, items of a target category that has a matching relationship with the image, where the target-category items are obtained by the server based on feature information of the image and feature information of the at least two items; and a display module 1403, configured to display the target-category items.
In a possible design, the feature information of the image includes feature information of the whole formed by the background and the at least two items, and the feature information of the at least two items includes attribute information of each item, where the attribute information of each item includes any one or more of the following: the item's category, color, style, material, or pattern.
In a possible design, the receiving module 1402 is further configured to receive, from the server, M candidate intents corresponding to the image, where M is an integer greater than or equal to 2 and each candidate intent indicates one category of items that has a matching relationship with the image; the display module 1403 is further configured to display the M candidate intents; and the obtaining module 1401 is further configured to obtain feedback operations corresponding to the M candidate intents and, based on the feedback operations on the M candidate intents, determine one target category that has a matching relationship with the image.
In a possible design, the display module 1403 is specifically configured to display composite renderings of the target-category items matched with the image.
It should be noted that the information exchange and execution processes among the modules/units of the item matching apparatus 1400 are based on the same concept as the method embodiments corresponding to FIG. 2b to FIG. 13 of this application; for details, refer to the descriptions in the foregoing method embodiments of this application, which are not repeated here.
Referring to FIG. 15, FIG. 15 is a schematic structural diagram of an item matching apparatus provided by an embodiment of the present application. The item matching apparatus 1500 is applied to a server in an item matching system, and the item matching system further includes a client device. The item matching apparatus 1500 includes: an obtaining module 1501, configured to obtain, through a first neural network and based on feature information of an image and feature information of at least two items, items of a target category that has a matching relationship with the image, where the image contains a background and the at least two items; and a sending module 1502, configured to send the target-category items to the client device.
In a possible design, the feature information of the image includes feature information of the whole formed by the background and the at least two items, and the feature information of the at least two items includes attribute information of each item, where the attribute information of each item includes any one or more of the following: the item's category, color, style, material, or pattern.
In a possible design, the obtaining module 1501 is specifically configured to: generate, through the first neural network and based on the feature information of the image and the feature information of the at least two items, M candidate intents corresponding to the image, where M is an integer greater than or equal to 2 and each candidate intent indicates one category of items that has a matching relationship with the image; send the M candidate intents to the client device, where the M candidate intents are used by the client device to obtain one target category that has a matching relationship with the image; and receive the target category sent by the client device.
In a possible design, the obtaining module 1501 is specifically configured to: obtain, through the first neural network, N candidate items that have a matching relationship with the image, where each candidate item belongs to the target category and N is an integer greater than 1; generate, through the second neural network, scores corresponding to the N candidate items, where a score indicates the degree of matching between a candidate item and the image; and select K target items from the N candidate items according to the scores corresponding to the N candidate items, where K is an integer greater than or equal to 1. The sending module is specifically configured to send the K target items to the client device.
It should be noted that the information exchange and execution processes among the modules/units of the item matching apparatus 1500 are based on the same concept as the method embodiments corresponding to FIG. 2b to FIG. 13 of this application; for details, refer to the descriptions in the foregoing method embodiments of this application, which are not repeated here.
Next, a client device provided by an embodiment of the present application is described. Referring to FIG. 16, FIG. 16 is a schematic structural diagram of a client device provided by an embodiment of the present application. The client device 1600 may specifically be embodied as a mobile phone, a tablet, a laptop computer, a smart wearable device, a smart robot, a smart home device, or the like; no limitation is imposed here. Specifically, the client device 1600 includes: a receiver 1601, a transmitter 1602, a processor 1603, and a memory 1604 (the client device 1600 may contain one or more processors 1603; one processor is taken as an example in FIG. 16), where the processor 1603 may include an application processor 16031 and a communication processor 16032. In some embodiments of the present application, the receiver 1601, the transmitter 1602, the processor 1603, and the memory 1604 may be connected by a bus or in other manners.
The memory 1604 may include a read-only memory and a random access memory, and provides instructions and data to the processor 1603. A portion of the memory 1604 may further include a non-volatile random access memory (NVRAM). The memory 1604 stores operation instructions, executable modules, or data structures, or a subset or an extended set thereof, where the operation instructions may include various operation instructions for implementing various operations.
The processor 1603 controls the operation of the client device. In a specific application, the components of the client device are coupled together through a bus system, where the bus system may include, in addition to a data bus, a power bus, a control bus, a status signal bus, and the like. For clarity, however, the various buses are all referred to as the bus system in the figure.
The methods disclosed in the foregoing embodiments of this application may be applied to the processor 1603 or implemented by the processor 1603. The processor 1603 may be an integrated circuit chip with signal processing capability. During implementation, the steps of the foregoing methods may be completed by integrated logic circuits of hardware in the processor 1603 or by instructions in the form of software. The processor 1603 may be a general-purpose processor, a digital signal processor (DSP), a microprocessor, or a microcontroller, and may further include an application-specific integrated circuit (ASIC), a field-programmable gate array (FPGA) or another programmable logic device, a discrete gate or transistor logic device, or a discrete hardware component. The processor 1603 can implement or perform the methods, steps, and logical block diagrams disclosed in the embodiments of this application. The general-purpose processor may be a microprocessor, or the processor may be any conventional processor or the like. The steps of the methods disclosed with reference to the embodiments of this application may be directly embodied as being performed by a hardware decoding processor, or performed by a combination of hardware and software modules in a decoding processor. The software module may be located in a storage medium mature in the art, such as a random access memory, a flash memory, a read-only memory, a programmable read-only memory, an electrically erasable programmable memory, or a register. The storage medium is located in the memory 1604; the processor 1603 reads the information in the memory 1604 and completes the steps of the foregoing methods in combination with its hardware.
The receiver 1601 may be configured to receive input digit or character information and to generate signal inputs related to relevant settings and function control of the client device. The transmitter 1602 may be configured to output digit or character information through a first interface; the transmitter 1602 may further be configured to send instructions to a disk group through the first interface to modify data in the disk group; the transmitter 1602 may further include a display device such as a display screen.
In this embodiment of the present application, the processor 1603 is configured to perform the item matching method performed by the client device in the embodiments corresponding to FIG. 2b to FIG. 13. Specifically, the application processor 16031 is configured to: obtain an image input by a user, where the image contains a background and at least two items; receive, from the server, items of a target category that has a matching relationship with the image, where the target-category items are obtained by the server based on feature information of the image and feature information of the at least two items; and display the target-category items.
It should be noted that the specific manner in which the application processor 16031 performs the foregoing steps is based on the same concept as the method embodiments corresponding to FIG. 2b to FIG. 13 of this application and brings the same technical effects; for details, refer to the descriptions in the foregoing method embodiments of this application, which are not repeated here.
An embodiment of the present application further provides a server. Referring to FIG. 17, FIG. 17 is a schematic structural diagram of a server provided by an embodiment of the present application. Specifically, the server 1700 is implemented by one or more servers and may vary considerably depending on configuration or performance. It may include one or more central processing units (CPUs) 1722 (for example, one or more processors), a memory 1732, and one or more storage media 1730 (for example, one or more mass storage devices) storing application programs 1742 or data 1744. The memory 1732 and the storage medium 1730 may provide temporary or persistent storage. The program stored in the storage medium 1730 may include one or more modules (not shown in the figure), and each module may include a series of instruction operations on the server. Furthermore, the central processing unit 1722 may be configured to communicate with the storage medium 1730 and to execute, on the server 1700, the series of instruction operations in the storage medium 1730.
The server 1700 may further include one or more power supplies 1726, one or more wired or wireless network interfaces 1750, one or more input/output interfaces 1758, and/or one or more operating systems 1741, such as Windows Server™, Mac OS X™, Unix™, Linux™, or FreeBSD™.
In this embodiment of the present application, the central processing unit 1722 is configured to perform the item matching method performed by the server in the embodiments corresponding to FIG. 2b to FIG. 13. Specifically, the central processing unit 1722 is configured to: obtain, through the first neural network and based on feature information of an image and feature information of at least two items, items of a target category that has a matching relationship with the image, where the image contains a background and the at least two items; and send the target-category items to the client device.
It should be noted that the specific manner in which the central processing unit 1722 performs the foregoing steps is based on the same concept as the method embodiments corresponding to FIG. 2b to FIG. 13 of this application and brings the same technical effects; for details, refer to the descriptions in the foregoing method embodiments of this application, which are not repeated here.
An embodiment of the present application further provides a computer program product. The computer program product includes a program that, when run on a computer, causes the computer to perform the steps performed by the client device in the methods described in the embodiments shown in FIG. 2b to FIG. 13, or causes the computer to perform the steps performed by the server in those methods.
An embodiment of the present application further provides a computer-readable storage medium. The computer-readable storage medium stores a program that, when run on a computer, causes the computer to perform the steps performed by the client device in the methods described in the embodiments shown in FIG. 2b to FIG. 13, or causes the computer to perform the steps performed by the server in those methods.
The client device, server, or item matching apparatus provided by the embodiments of the present application may specifically be a chip. The chip includes a processing unit and a communication unit; the processing unit may be, for example, a processor, and the communication unit may be, for example, an input/output interface, a pin, or a circuit. The processing unit can execute computer-executable instructions stored in a storage unit, so that the chip performs the item matching method described in the embodiments shown in FIG. 2b to FIG. 13. Optionally, the storage unit is a storage unit within the chip, such as a register or a cache; the storage unit may alternatively be a storage unit located outside the chip within the wireless access device, such as a read-only memory (ROM) or another type of static storage device capable of storing static information and instructions, or a random access memory (RAM).
Specifically, referring to FIG. 18, FIG. 18 is a schematic structural diagram of a chip provided by an embodiment of the present application. The chip may be embodied as a neural-network processing unit, NPU 180. The NPU 180 is mounted to a host CPU as a coprocessor, and the host CPU allocates tasks to it. The core part of the NPU is an operation circuit 1803; a controller 1804 controls the operation circuit 1803 to extract matrix data from memory and perform multiplication operations.
In some implementations, the operation circuit 1803 internally includes multiple processing engines (PEs). In some implementations, the operation circuit 1803 is a two-dimensional systolic array. The operation circuit 1803 may alternatively be a one-dimensional systolic array or another electronic circuit capable of performing mathematical operations such as multiplication and addition. In some implementations, the operation circuit 1803 is a general-purpose matrix processor.
For example, suppose there are an input matrix A, a weight matrix B, and an output matrix C. The operation circuit fetches the data corresponding to matrix B from a weight memory 1802 and caches it on each PE in the operation circuit. The operation circuit fetches matrix A data from an input memory 1801 and performs a matrix operation with matrix B; partial or final results of the resulting matrix are stored in an accumulator 1808.
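The A x B computation described above can be sketched in plain Python. This mimics only the accumulate-partial-products behavior that the accumulator 1808 performs over the PE outputs, not the actual systolic dataflow or the NPU's memory interfaces:

```python
def matmul_with_accumulator(A, B):
    """Multiply A (m x k) by B (k x n), accumulating partial products
    the way the accumulator collects per-PE multiply results."""
    m, k, n = len(A), len(B), len(B[0])
    C = [[0] * n for _ in range(m)]  # accumulator state, one cell per output
    for i in range(m):
        for j in range(n):
            for p in range(k):       # each PE contributes one partial product
                C[i][j] += A[i][p] * B[p][j]
    return C

A = [[1, 2], [3, 4]]  # input matrix from input memory 1801
B = [[5, 6], [7, 8]]  # weight matrix cached from weight memory 1802
C = matmul_with_accumulator(A, B)  # [[19, 22], [43, 50]]
```

In hardware the k partial products per output cell arrive in a pipelined fashion across the PE array; the triple loop here is only the arithmetic equivalent.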
A unified memory 1806 is used to store input data and output data. Weight data is transferred to the weight memory 1802 through a direct memory access controller (DMAC) 1805. Input data is likewise transferred to the unified memory 1806 through the DMAC.
The BIU, i.e., the bus interface unit 1810, handles the interaction between the AXI bus and both the DMAC and the instruction fetch buffer (IFB) 1809.
The bus interface unit 1810 (BIU) enables the instruction fetch buffer 1809 to obtain instructions from external memory, and enables the direct memory access controller 1805 to obtain the source data of input matrix A or weight matrix B from external memory.
The DMAC is mainly used to transfer input data from the external memory (DDR) to the unified memory 1806, to transfer weight data to the weight memory 1802, or to transfer input data to the input memory 1801.
The vector calculation unit 1807 includes multiple arithmetic processing units. Where needed, it further processes the output of the arithmetic circuit, performing operations such as vector multiplication, vector addition, exponentiation, logarithm, and magnitude comparison. It is mainly used for computations of non-convolutional/non-fully-connected layers in a neural network, such as batch normalization, pixel-wise summation, and upsampling of feature planes.
In some implementations, the vector calculation unit 1807 can store processed output vectors in the unified memory 1806. For example, the vector calculation unit 1807 may apply a linear and/or nonlinear function to the output of the arithmetic circuit 1803, for instance performing linear interpolation on feature planes extracted by a convolutional layer, or applying a function to a vector of accumulated values to generate activation values. In some implementations, the vector calculation unit 1807 generates normalized values, pixel-wise summed values, or both. In some implementations, the processed output vector can be used as activation input to the arithmetic circuit 1803, for example for use in subsequent layers of the neural network.
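The vector-unit role described above — normalizing the matrix output accumulated by the arithmetic circuit and then applying a nonlinear function to produce activation values — can be sketched as follows. The function and parameter names are illustrative assumptions, and ReLU stands in for whichever nonlinear function a given layer uses.

```python
import numpy as np

def vector_unit_postprocess(acc_output: np.ndarray,
                            gamma: float = 1.0, beta: float = 0.0) -> np.ndarray:
    """Illustrative sketch of the vector-unit stage: a batch-norm-like
    normalization of the accumulated output, followed by a nonlinear
    activation, yielding values ready to feed the next layer."""
    mean = acc_output.mean()
    var = acc_output.var()
    normalized = gamma * (acc_output - mean) / np.sqrt(var + 1e-5) + beta
    activated = np.maximum(normalized, 0.0)   # ReLU as the example nonlinearity
    return activated
```

In hardware these element-wise steps run outside the matrix-multiply datapath, which is why they are assigned to a separate vector unit rather than the systolic array.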
The instruction fetch buffer 1809 connected to the controller 1804 stores the instructions used by the controller 1804.
The unified memory 1806, the input memory 1801, the weight memory 1802, and the instruction fetch buffer 1809 are all on-chip memories. The external memory is private to this NPU hardware architecture.
The operations of each layer in the first neural network, the second neural network, the third neural network, and the fourth neural network shown in the method embodiments corresponding to Figures 2b to 13 may be performed by the arithmetic circuit 1803 or the vector calculation unit 1807.
The processor mentioned in any of the above places may be a general-purpose central processing unit, a microprocessor, an ASIC, or one or more integrated circuits configured to control execution of the program of the method of the first aspect described above.
In addition, it should be noted that the apparatus embodiments described above are merely illustrative. The units described as separate components may or may not be physically separate, and components shown as units may or may not be physical units; that is, they may be located in one place or distributed across multiple network units. Some or all of the modules may be selected according to actual needs to achieve the purpose of the solution of this embodiment. Furthermore, in the drawings of the apparatus embodiments provided in this application, the connection relationship between modules indicates that they have communication connections between them, which may be specifically implemented as one or more communication buses or signal lines.
Through the above description of the embodiments, those skilled in the art can clearly understand that this application can be implemented by software plus the necessary general-purpose hardware, and of course also by dedicated hardware including application-specific integrated circuits, dedicated CPUs, dedicated memories, dedicated components, and the like. In general, any function performed by a computer program can easily be implemented with corresponding hardware, and the specific hardware structures used to implement the same function can be diverse, such as analog circuits, digital circuits, or dedicated circuits. However, for this application, a software implementation is the better choice in most cases. Based on this understanding, the technical solution of this application, in essence, or the part that contributes to the prior art, can be embodied in the form of a software product. The computer software product is stored in a readable storage medium, such as a computer floppy disk, USB flash drive, removable hard disk, ROM, RAM, magnetic disk, or optical disc, and includes several instructions to cause a computer device (which may be a personal computer, a training device, a network device, or the like) to perform the methods described in the various embodiments of this application.
The above embodiments may be implemented in whole or in part by software, hardware, firmware, or any combination thereof. When implemented in software, they may be implemented in whole or in part in the form of a computer program product.
The computer program product includes one or more computer instructions. When the computer program instructions are loaded and executed on a computer, the processes or functions described in the embodiments of this application are produced in whole or in part. The computer may be a general-purpose computer, a special-purpose computer, a computer network, or another programmable apparatus. The computer instructions may be stored in a computer-readable storage medium or transmitted from one computer-readable storage medium to another; for example, the computer instructions may be transmitted from one website, computer, training device, or data center to another website, computer, training device, or data center by wired (e.g., coaxial cable, optical fiber, digital subscriber line (DSL)) or wireless (e.g., infrared, radio, microwave) means. The computer-readable storage medium may be any available medium that a computer can store, or a data storage device such as a training device or data center that integrates one or more available media. The available media may be magnetic media (e.g., floppy disks, hard disks, magnetic tapes), optical media (e.g., DVDs), or semiconductor media (e.g., solid-state drives (SSDs)).

Claims (23)

  1. A method for matching items, characterized in that the method comprises:
    obtaining an image input by a user, wherein a background and at least two items are present in the image;
    obtaining, based on feature information of the image and feature information of the at least two items, an item of a target category that has a matching relationship with the image through a first neural network; and
    displaying the item of the target category.
  2. The method according to claim 1, characterized in that the feature information of the image comprises feature information of a whole formed by the background and the at least two items, the feature information of the at least two items comprises attribute information of each item, and the attribute information of each item comprises any one or more of the following: a category of the item, a color of the item, a style of the item, a material of the item, or a pattern of the item.
  3. The method according to claim 1 or 2, characterized in that the obtaining, based on the feature information of the image and the feature information of the at least two items, an item of a target category that has a matching relationship with the image through a first neural network comprises:
    generating, based on the feature information of the image and the feature information of the at least two items, M candidate intents corresponding to the image through the first neural network, wherein M is an integer greater than or equal to 2, and each candidate intent indicates a category of items that has a matching relationship with the image;
    displaying the M candidate intents to obtain feedback operations corresponding to the M candidate intents; and
    determining, according to the feedback operations for the M candidate intents, the target category that has a matching relationship with the image to be processed, and obtaining items of the target category.
  4. The method according to claim 1 or 2, characterized in that the obtaining an item of a target category that has a matching relationship with the image through a first neural network comprises:
    obtaining, through the first neural network, N candidate items that have a matching relationship with the image, wherein each candidate item belongs to the target category, and N is an integer greater than 1;
    generating scores corresponding to the N candidate items through a second neural network, wherein the scores indicate a matching degree between the candidate items and the image; and
    selecting K target items from the N candidate items according to the scores corresponding to the N candidate items, wherein K is an integer greater than or equal to 1;
    wherein the displaying items of the target category comprises: displaying the K target items.
  5. The method according to claim 1 or 2, characterized in that the displaying items of the target category comprises: displaying an effect diagram showing the items of the target category matched with the image.
  6. A method for matching items, characterized in that the method is applied to a client device in an item matching system, the item matching system further comprises a server, and the method comprises:
    obtaining an image input by a user, wherein a background and at least two items are present in the image;
    receiving, from the server, an item of a target category that has a matching relationship with the image, wherein the item of the target category is obtained by the server based on feature information of the image and feature information of the at least two items; and
    displaying the item of the target category.
  7. The method according to claim 6, characterized in that the feature information of the image comprises feature information of a whole formed by the background and the at least two items, the feature information of the at least two items comprises attribute information of each item, and the attribute information of each item comprises any one or more of the following: a category of the item, a color of the item, a style of the item, a material of the item, or a pattern of the item.
  8. The method according to claim 6 or 7, characterized in that the method further comprises:
    receiving, from the server, M candidate intents corresponding to the image, and displaying the M candidate intents, wherein M is an integer greater than or equal to 2, and each candidate intent indicates a category of items that has a matching relationship with the image; and
    obtaining feedback operations corresponding to the M candidate intents, and determining, according to the feedback operations for the M candidate intents, the target category that has a matching relationship with the image.
  9. A method for matching items, characterized in that the method is applied to a server in an item matching system, the item matching system further comprises a client device, and the method comprises:
    obtaining, based on feature information of an image and feature information of at least two items, an item of a target category that has a matching relationship with the image through a first neural network, wherein a background and the at least two items are present in the image; and
    sending information about the item of the target category to the client device.
  10. The method according to claim 9, characterized in that the feature information of the image comprises feature information of a whole formed by the background and the at least two items, the feature information of the at least two items comprises attribute information of each item, and the attribute information of each item comprises any one or more of the following: a category of the item, a color of the item, a style of the item, a material of the item, or a pattern of the item.
  11. The method according to claim 9 or 10, characterized in that the obtaining, based on the feature information of the image and the feature information of the at least two items, an item of a target category that has a matching relationship with the image through a first neural network comprises:
    generating, based on the feature information of the image and the feature information of the at least two items, M candidate intents corresponding to the image through the first neural network, wherein M is an integer greater than or equal to 2, and each candidate intent indicates a category of items that has a matching relationship with the image;
    sending the M candidate intents to the client device, wherein the M candidate intents are used by the client device to obtain the target category that has a matching relationship with the image; and
    receiving the target category sent by the client device.
  12. An item matching apparatus, characterized in that the apparatus is applied to a client device in an item matching system, the item matching system further comprises a server, and the apparatus comprises:
    an obtaining module, configured to obtain an image input by a user, wherein a background and at least two items are present in the image;
    a receiving module, configured to receive, from the server, an item of a target category that has a matching relationship with the image, wherein the item of the target category is obtained by the server based on feature information of the image and feature information of the at least two items; and
    a display module, configured to display the item of the target category.
  13. The apparatus according to claim 12, characterized in that the feature information of the image comprises feature information of a whole formed by the background and the at least two items, the feature information of the at least two items comprises attribute information of each item, and the attribute information of each item comprises any one or more of the following: a category of the item, a color of the item, a style of the item, a material of the item, or a pattern of the item.
  14. The apparatus according to claim 12 or 13, characterized in that:
    the receiving module is further configured to receive, from the server, M candidate intents corresponding to the image, wherein M is an integer greater than or equal to 2, and each candidate intent indicates a category of items that has a matching relationship with the image;
    the display module is further configured to display the M candidate intents; and
    the obtaining module is further configured to obtain feedback operations corresponding to the M candidate intents, and determine, according to the feedback operations for the M candidate intents, the target category that has a matching relationship with the image.
  15. The apparatus according to claim 12 or 13, characterized in that the display module is specifically configured to display an effect diagram showing the items of the target category matched with the image.
  16. An item matching apparatus, characterized in that the apparatus is applied to a server in an item matching system, the item matching system further comprises a client device, and the apparatus comprises:
    an obtaining module, configured to obtain, based on feature information of an image and feature information of at least two items, an item of a target category that has a matching relationship with the image through a first neural network, wherein a background and the at least two items are present in the image; and
    a sending module, configured to send information about the item of the target category to the client device.
  17. The apparatus according to claim 16, characterized in that the feature information of the image comprises feature information of a whole formed by the background and the at least two items, the feature information of the at least two items comprises attribute information of each item, and the attribute information of each item comprises any one or more of the following: a category of the item, a color of the item, a style of the item, a material of the item, or a pattern of the item.
  18. The apparatus according to claim 16 or 17, characterized in that the obtaining module is specifically configured to:
    generate, based on the feature information of the image and the feature information of the at least two items, M candidate intents corresponding to the image through the first neural network, wherein M is an integer greater than or equal to 2, and each candidate intent indicates a category of items that has a matching relationship with the image;
    send the M candidate intents to the client device, wherein the M candidate intents are used by the client device to obtain the target category that has a matching relationship with the image; and
    receive the target category sent by the client device.
  19. The apparatus according to claim 16 or 17, characterized in that the obtaining module is specifically configured to:
    obtain, through the first neural network, N candidate items that have a matching relationship with the image, wherein each candidate item belongs to the target category, and N is an integer greater than 1;
    generate scores corresponding to the N candidate items through a second neural network, wherein the scores indicate a matching degree between the candidate items and the image; and
    select K target items from the N candidate items according to the scores corresponding to the N candidate items, wherein K is an integer greater than or equal to 1;
    wherein the sending module is specifically configured to send the K target items to the client device.
  20. A computer program product, characterized in that the computer program product comprises a program which, when run on a computer, causes the computer to perform the method according to any one of claims 1 to 5, or causes the computer to perform the method according to any one of claims 6 to 8, or causes the computer to perform the method according to any one of claims 9 to 11.
  21. A computer-readable storage medium, characterized in that a program is stored in the computer-readable storage medium, and when the program is run on a computer, it causes the computer to perform the method according to any one of claims 1 to 5, or causes the computer to perform the method according to any one of claims 6 to 8, or causes the computer to perform the method according to any one of claims 9 to 11.
  22. A client device, characterized in that it comprises a processor and a memory, the processor being coupled to the memory, wherein:
    the memory is configured to store a program; and
    the processor is configured to execute the program in the memory, so that the client device performs the method according to any one of claims 6 to 8.
  23. A server, characterized in that it comprises a processor and a memory, the processor being coupled to the memory, wherein:
    the memory is configured to store a program; and
    the processor is configured to execute the program in the memory, so that the server performs the method according to any one of claims 9 to 11.
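The candidate-ranking flow recited in claims 4 and 19 — obtain N candidate items, score each with a second neural network, and select the K highest-scoring target items — can be sketched as follows. The scoring network itself is not reproduced here; the scores are taken as given, and all names are illustrative assumptions rather than the applicant's implementation.

```python
import numpy as np

def select_target_items(candidate_scores, k):
    """Given scores for N candidate items (the matching degree between
    each candidate and the image), return the indices of the K
    highest-scoring target items, best first."""
    order = np.argsort(candidate_scores)[::-1]   # indices sorted by descending score
    return [int(i) for i in order[:k]]

scores = [0.31, 0.92, 0.55, 0.78]    # hypothetical second-network outputs for N=4
assert select_target_items(scores, 2) == [1, 3]
```

The selection is a plain top-K over the scores; any tie-breaking or diversity re-ranking would sit on top of this step.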
PCT/CN2023/084241 2022-03-31 2023-03-28 Article matching method and related device WO2023185787A1 (en)

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
CN202210333006.5A CN116932804A (en) 2022-03-31 2022-03-31 Matching method of articles and related equipment
CN202210333006.5 2022-03-31

Publications (1)

Publication Number Publication Date
WO2023185787A1 true WO2023185787A1 (en) 2023-10-05

Family

ID=88199121

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/CN2023/084241 WO2023185787A1 (en) 2022-03-31 2023-03-28 Article matching method and related device

Country Status (2)

Country Link
CN (1) CN116932804A (en)
WO (1) WO2023185787A1 (en)

Citations (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN105095362A (en) * 2015-06-25 2015-11-25 深圳码隆科技有限公司 Image display method and device based on target object
CN109583514A (en) * 2018-12-19 2019-04-05 成都西纬科技有限公司 A kind of image processing method, device and computer storage medium
CN110909746A (en) * 2018-09-18 2020-03-24 深圳云天励飞技术有限公司 Clothing recommendation method, related device and equipment
CN111401306A (en) * 2020-04-08 2020-07-10 青岛海尔智能技术研发有限公司 Method, device and equipment for recommending clothes putting on
US20210303914A1 (en) * 2020-11-11 2021-09-30 Beijing Baidu Netcom Science And Technology Co., Ltd. Clothing collocation


Also Published As

Publication number Publication date
CN116932804A (en) 2023-10-24

Similar Documents

Publication Publication Date Title
WO2021238631A1 (en) Article information display method, apparatus and device and readable storage medium
US10032072B1 (en) Text recognition and localization with deep learning
US9875258B1 (en) Generating search strings and refinements from an image
US11232324B2 (en) Methods and apparatus for recommending collocating dress, electronic devices, and storage media
US10346893B1 (en) Virtual dressing room
US9607010B1 (en) Techniques for shape-based search of content
US9990557B2 (en) Region selection for image match
US20180181569A1 (en) Visual category representation with diverse ranking
CN114391160A (en) Hand pose estimation from stereo camera
US9830534B1 (en) Object recognition
US20190012717A1 (en) Appratus and method of providing online sales information of offline product in augmented reality
CN110249304A (en) The Visual intelligent management of electronic equipment
CN110348572A (en) The processing method and processing device of neural network model, electronic equipment, storage medium
US11475500B2 (en) Device and method for item recommendation based on visual elements
US10776417B1 (en) Parts-based visual similarity search
WO2021097750A1 (en) Human body posture recognition method and apparatus, storage medium, and electronic device
US10379721B1 (en) Interactive interfaces for generating annotation information
Zhou et al. A lightweight hand gesture recognition in complex backgrounds
CN111414915B (en) Character recognition method and related equipment
CN112905889A (en) Clothing searching method and device, electronic equipment and medium
US20210166058A1 (en) Image generation method and computing device
KR102444498B1 (en) System and method for providing image-based service to sell and buy product
WO2022179603A1 (en) Augmented reality method and related device thereof
Magassouba et al. Predicting and attending to damaging collisions for placing everyday objects in photo-realistic simulations
CN113627421A (en) Image processing method, model training method and related equipment

Legal Events

Date Code Title Description
121 Ep: the epo has been informed by wipo that ep was designated in this application

Ref document number: 23778152

Country of ref document: EP

Kind code of ref document: A1