WO2023185787A1 - Article matching method and related device - Google Patents

Article matching method and related device

Info

Publication number
WO2023185787A1
Authority
WO
WIPO (PCT)
Prior art keywords
items
image
item
candidate
information
Prior art date
Application number
PCT/CN2023/084241
Other languages
French (fr)
Chinese (zh)
Inventor
邓一萌
杨坚鑫
李继忠
曹朝
Original Assignee
Huawei Technologies Co., Ltd.
Priority date
Filing date
Publication date
Application filed by Huawei Technologies Co., Ltd.
Publication of WO2023185787A1


Classifications

    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06F: ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00: Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/50: Information retrieval of still image data
    • G06F16/53: Querying
    • G06F16/532: Query formulation, e.g. graphical querying
    • G06F16/55: Clustering; Classification
    • G06F16/58: Retrieval characterised by using metadata, e.g. metadata not derived from the content or metadata generated manually
    • G06F16/583: Retrieval characterised by using metadata automatically derived from the content
    • G06F16/5838: Retrieval characterised by using metadata automatically derived from the content, using colour
    • G06F16/5862: Retrieval characterised by using metadata automatically derived from the content, using texture

Definitions

  • This application relates to the field of artificial intelligence, and in particular to a method of matching items and related equipment.
  • Common item search solutions in the industry include photo search. Specifically, users can take photos of the items they want to search for, and then search for similar items based on the input pictures.
  • the embodiments of the present application provide an item matching method and related equipment.
  • A complex image to be processed, that is, an image including at least two items.
  • A target category of items that has a matching relationship with the image, which greatly expands the application scenarios of this solution and is conducive to improving its user stickiness.
  • embodiments of the present application provide an item matching method, which can apply artificial intelligence technology to the field of item search.
  • The method includes: the client device obtains an image to be processed input by the user, where the image to be processed contains a background and at least two items; the server or the client device obtains, through the first neural network and based on the characteristic information of the image to be processed and the characteristic information of the at least two items in it, a target category of items that has a matching relationship with the image to be processed; and the client device shows the user items of the aforementioned target category.
  • The user can provide an image of the scene in which the item to be searched will be used (that is, the above-mentioned image to be processed); a target category that has a matching relationship with the entire image to be processed is then obtained through the first neural network, and the target items corresponding to that target category are displayed to the user. Through the above solution, the user can search for items to match by providing the image to be processed, and even when the user inputs a complex image to be processed (that is, an image including at least two items), a target category of items that has a matching relationship with the entire image can still be obtained, which greatly expands the application scenarios of this solution and is conducive to improving its user stickiness. In addition, a target category that has a matching relationship with the entire image to be processed is determined based on both the characteristic information of the entire image and the characteristic information of the items in the image to be processed; that is, not only the information of the entire image but also the information of each item in it is considered.
  • The method further includes: the server or the client device inputs the image to be processed into a third neural network, so as to perform feature extraction on the image to be processed through the third neural network and obtain target feature information corresponding to the image to be processed. The target feature information includes the feature information of the at least two items in the image to be processed and the feature information of the image to be processed.
  • The characteristic information of the image to be processed includes the characteristic information of the whole composed of the background and the at least two items. That is, the characteristic information of the image to be processed refers to the features obtained by treating the image to be processed as a whole and extracting features from that whole.
  • Feature information of the image to be processed may include texture information, color information, contour information, style information, scene information, or other types of feature information; the characteristic information of the at least two items in the image to be processed can also be called the semantic label set of the image to be processed.
  • the characteristic information of at least two items in the image to be processed can include attribute information of each item.
  • The attribute information of each item includes any one or more of the following: the category of the item, the color of the item, and the location information of the item in the image to be processed; optionally, it can also include the style of the item, the material of the item, the pattern of the item, or other feature information.
  • The feature information of the image to be processed refers to the feature information obtained by treating the image to be processed as a whole and extracting features from that whole, while the feature information of the at least two items can include the attribute information of each item. This further refines the concepts of the feature information of the image to be processed and the feature information of the at least two items, which is conducive to a clearer distinction between them. The attribute information of each item includes information such as the category of the item, the color of the item, the style of the item, the material of the item, or the pattern of the item; in this way the information of the objects in the image to be processed is fully considered, which is beneficial to improving the accuracy of the determined target category.
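To make the two kinds of feature information concrete, the attribute fields listed above can be sketched as a small data structure. The class and field names below are illustrative only; the patent does not specify a concrete schema:

```python
from dataclasses import dataclass, field

@dataclass
class ItemAttributes:
    # Attribute information of one item in the image to be processed
    category: str                 # e.g. "sofa"
    color: str                    # e.g. "beige"
    bbox: tuple = (0, 0, 0, 0)    # location information (x, y, w, h)
    style: str = ""               # optional: style information
    material: str = ""            # optional: material of the item
    pattern: str = ""             # optional: pattern of the item

@dataclass
class ImageFeatureInfo:
    # Feature information of the image treated as a whole
    scene: str = ""                                 # scene information
    texture: list = field(default_factory=list)     # texture features
    colors: list = field(default_factory=list)      # color features
    # Attribute information for each of the >= 2 detected items
    items: list = field(default_factory=list)

info = ImageFeatureInfo(scene="living room",
                        items=[ItemAttributes("sofa", "beige"),
                               ItemAttributes("rug", "grey")])
assert len(info.items) >= 2  # the image to be processed contains at least two items
```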
  • The server or the client device obtains, through the first neural network and based on the characteristic information of the image to be processed and the characteristic information of the at least two items, a target category that has a matching relationship with the image to be processed. This includes: the server or client device generates M candidate intentions corresponding to the image to be processed through the first neural network, where M is an integer greater than or equal to 2 and each candidate intention indicates a category of items that has a collocation relationship with the image to be processed; the client device displays the M candidate intentions to the user to obtain the feedback operations corresponding to the M candidate intentions; and the client device determines a target category that has a matching relationship with the image to be processed based on the feedback operations for the M candidate intentions.
  • The "feedback operation" may be a selection operation on one of the M candidate intentions, or it may be the user manually inputting a new search intention, etc.
  • M candidate intentions are first generated through the first neural network, and then a target category that has a matching relationship with the image to be processed is determined based on the feedback operation input by the user for the M candidate intentions. That is, an interactive method is used to guide the user's search intention, which is conducive to improving the accuracy of the determined target category.
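The interactive flow described above can be sketched as follows. The function and callback names are hypothetical stand-ins, not names from the patent; the first neural network and the user interface are replaced by simple callables:

```python
def interactive_target_category(image_features, item_features,
                                generate_intentions, get_user_feedback):
    """generate_intentions: stands in for the first neural network; returns
    M >= 2 candidate intentions, each naming a category of items that has a
    matching relationship with the image.
    get_user_feedback: returns either an index into the candidates (a
    selection) or a new string (a manually entered search intention)."""
    candidates = generate_intentions(image_features, item_features)
    assert len(candidates) >= 2          # M is an integer >= 2
    feedback = get_user_feedback(candidates)
    if isinstance(feedback, int):        # user selected one of the M intentions
        return candidates[feedback]
    return feedback                      # user typed a new search intention

# Toy stand-ins for the two callbacks:
chosen = interactive_target_category(
    image_features={}, item_features=[],
    generate_intentions=lambda img, items: ["curtains", "rug", "lamp"],
    get_user_feedback=lambda cands: 1)
assert chosen == "rug"
```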
  • The method further includes: the client device obtains target text information input by the user, where the target text information is used to indicate the user's search intention; the server or the client device inputs the target text information into the fourth neural network, so as to perform feature extraction on the text information through the fourth neural network and obtain the feature information of the text information.
  • The server or client device obtains, through the first neural network and based on the characteristic information of the image to be processed and the characteristic information of the at least two items, a target category that has a matching relationship with the image to be processed; this includes inputting the characteristic information of the image to be processed, the characteristic information of the at least two items, and the characteristic information of the text information into the first neural network, so as to obtain through the first neural network a target category that has a matching relationship with the image to be processed.
  • the target text information input by the user can also be obtained.
  • The target text information is used to indicate the user's search intention, and the target feature information and the feature information of the target text information are input into the third neural network together; that is, when obtaining the candidate intentions, the text information used to indicate the user's search intention can be combined to further improve the accuracy of the determined candidate intentions.
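One minimal way to combine the image, item, and text features before they enter the network that produces candidate intentions is plain concatenation of feature vectors. This is an assumption for illustration only, since the patent does not fix a fusion scheme:

```python
def fuse_features(image_feat, item_feats, text_feat):
    # Concatenate: whole-image features, then each item's features,
    # then the target-text-information features.
    fused = list(image_feat)
    for f in item_feats:
        fused.extend(f)
    fused.extend(text_feat)
    return fused

fused = fuse_features([0.1] * 128,               # whole-image feature vector
                      [[0.2] * 64, [0.3] * 64],  # features of the >= 2 items
                      [0.4] * 32)                # text feature vector
assert len(fused) == 128 + 64 + 64 + 32
```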
  • The client device obtains items of a target category that has a collocation relationship with the image to be processed through the first neural network. This includes: the server obtains, through the first neural network, N candidate items that have a collocation relationship with the image to be processed, where each candidate item is of a target category and N is an integer greater than 1; the server generates, through the second neural network, a target score corresponding to each of the N candidate items, where the target score indicates the matching degree between the candidate item and the image to be processed, that is, an aesthetic score for the matching rendering of the candidate item and the image; and the server selects K target items from the N candidate items based on the target scores corresponding to the N candidate items, where K is an integer greater than or equal to 1. Displaying items of the target category on the client device includes displaying the K target items.
  • Scores corresponding to the N candidate items are generated through a neural network; the scores indicate the matching degree between each candidate item and the image to be processed, and the target items finally displayed to the user are selected from the N candidate items based on this matching degree. That is to say, the aesthetics of matching each candidate item with the image to be processed is scored quantitatively, and this aesthetic quality is taken into account when selecting the target items, so that the matching renderings of the target items and the image to be processed that are provided to the user look better, which helps improve the user stickiness of this solution.
  • Generating the target scores corresponding to the N candidate items through the second neural network includes: inputting the image of each candidate item, the semantic label of each candidate item, the image to be processed, and the semantic labels corresponding to the items in the image to be processed into the second neural network, and obtaining the target score corresponding to each candidate item output by the second neural network.
  • the semantic labels of the items in the image to be processed can also be called the feature information of the items in the image to be processed.
  • the semantic label of the candidate item may include at least one attribute information of the candidate item.
  • The semantic label of the candidate item may include any one or more of the following: the category of the candidate item, the style of the candidate item, the shape of the candidate item, or other attributes of the candidate item.
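The scoring-and-selection step above can be sketched as follows, with `score_fn` standing in for the second neural network (which, per the description, sees each candidate's image and semantic label together with the image to be processed and its items' semantic labels). The function name is illustrative:

```python
def select_top_k(candidates, score_fn, k):
    # Generate a target score per candidate, sort best match first,
    # and keep the K target items out of the N candidates.
    scored = [(score_fn(c), c) for c in candidates]
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return [c for _, c in scored[:k]]

# Toy example: the "network" just reads a precomputed aesthetic score.
cands = [{"name": "lamp A", "score": 0.4},
         {"name": "lamp B", "score": 0.9},
         {"name": "lamp C", "score": 0.7}]
top = select_top_k(cands, lambda c: c["score"], k=2)
assert [c["name"] for c in top] == ["lamp B", "lamp C"]
```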
  • the client device displays items of a target category to the user, including: the client device displays to the user a rendering of a combination of the items of the target category and the image to be processed.
  • the aforementioned matching renderings can be in pure image format, renderings after VR modeling, renderings after AR modeling, or other formats, etc.
  • The client device can also display to the user any one or more of the following information about the items of each target category: access links, names, prices, target scores, or other types of information; there is no limit here.
  • The user is shown the matching renderings of the items of each target category with the image to be processed, so that the user can more intuitively experience the effect of applying the items of the target category to the image to be processed, which is conducive to improving the user stickiness of the solution.
  • embodiments of the present application provide an item matching method, which can apply artificial intelligence technology in the field of item search.
  • The method includes: the client device obtains an image to be processed input by the user, where the image to be processed contains a background and at least two items; receives items of a target category, sent by the server, that has a matching relationship with the image to be processed, where the items of the target category are obtained by the server based on the feature information of the image and the feature information of the at least two items; and displays the items of the target category.
  • The feature information of the image to be processed includes the feature information of the whole composed of the background and the at least two items; the feature information of the at least two items includes the attribute information of each item; and the attribute information of each item includes any one or more of the following: the category of the item, the color of the item, the style of the item, the material of the item, or the pattern of the item.
  • The client device receives M candidate intentions corresponding to the image to be processed sent by the server and displays the M candidate intentions to the user, where M is an integer greater than or equal to 2 and each candidate intention indicates a category of items that has a matching relationship with the image to be processed; the client device obtains the feedback operations corresponding to the M candidate intentions, determines, based on the feedback operations, a target category that has a matching relationship with the image to be processed, and sends the target category to the server.
  • the client device can also be used to perform the steps performed by the client device in the first aspect and each possible implementation manner of the first aspect.
  • For the specific implementation of the steps and the meanings of the terms in each possible implementation manner of the second aspect, please refer to the first aspect; they are not repeated here.
  • embodiments of the present application provide an item matching method, which can apply artificial intelligence technology to the field of item search.
  • The method includes: the server obtains, through the first neural network and based on the characteristic information of the image to be processed and the characteristic information of at least two items, a target category that has a matching relationship with the image to be processed, where there is a background and at least two items in the image to be processed; the server sends information about items of the target category to the client device.
  • the server obtains a target category that has a matching relationship with the image to be processed through the first neural network based on the characteristic information of the image to be processed and the characteristic information of at least two items, including:
  • the server generates M candidate intentions corresponding to the image to be processed through the first neural network, M is an integer greater than or equal to 2, and each candidate intention indicates a category of items that has a collocation relationship with the image to be processed;
  • The server sends the M candidate intentions to the client device, where the M candidate intentions are used by the client device to obtain a target category that has a matching relationship with the image to be processed; the server then receives the target category sent by the client device.
  • the server can also be used to execute the steps performed by the server in the first aspect and each possible implementation of the first aspect.
  • Embodiments of the present application provide an item matching device that can apply artificial intelligence technology to the field of item search.
  • the item matching device is applied to client equipment in an item matching system.
  • the item matching system also includes a server.
  • the item matching device includes: an acquisition module, used to obtain an image input by the user, in which there is a background and at least two items; a receiving module, used to receive a target category of items sent by the server that has a matching relationship with the image, The items of the target category are obtained by the server based on the feature information of the image and the feature information of at least two items; the display module is used to display the items of the target category.
  • the item matching device can also be used to perform the steps performed by the client device in the second aspect and each possible implementation manner of the second aspect.
  • For the specific implementation of the steps, the meanings of the terms, and the beneficial effects in each possible implementation manner of the fourth aspect, please refer to the second aspect; they are not repeated here.
  • embodiments of the present application provide an item matching device that can apply artificial intelligence technology to the field of item search.
  • the item matching device is applied to a server in an item matching system.
  • the item matching system also includes a client device.
  • The item matching device includes: an acquisition module, configured to obtain, through the first neural network and based on the feature information of the image and the feature information of at least two items, a target category of items that has a matching relationship with the image, where there is a background and at least two items in the image; and a sending module, used to send information about items of the target category to the client device.
  • the item matching device can also be used to perform the steps performed by the server in the third aspect and each possible implementation of the third aspect.
  • Embodiments of the present application provide a computer program product. The computer program product includes a program; when the program is run on a computer, it causes the computer to execute the item matching method described in the second aspect or the third aspect.
  • embodiments of the present application provide a computer-readable storage medium.
  • A computer program is stored in the computer-readable storage medium; when the program is run on a computer, it causes the computer to execute the item matching method of the second aspect or the third aspect.
  • embodiments of the present application provide a client device, including a processor and a memory.
  • the processor is coupled to the memory.
  • The memory is used to store a program; the processor is used to execute the program in the memory, so that the client device executes the methods performed by the client device in the above aspects.
  • embodiments of the present application provide a server, including a processor and a memory.
  • the processor is coupled to the memory.
  • The memory is used to store a program; the processor is used to execute the program in the memory, so that the server executes the methods performed by the server in the above aspects.
  • The present application provides a chip system, which includes a processor for supporting a terminal device or communication device in implementing the functions involved in the above aspects, for example, sending or processing the data and/or information involved in the above methods.
  • the chip system also includes a memory, which is used to store necessary program instructions and data for the terminal device or communication device.
  • the chip system may be composed of chips, or may include chips and other discrete devices.
  • Figure 1a is a schematic structural diagram of the artificial intelligence main framework provided by the embodiment of the present application.
  • Figure 1b is an application scenario diagram of the item matching method provided by the embodiment of the present application.
  • Figure 2a is a system architecture diagram of the item matching system provided by the embodiment of the present application.
  • Figure 2b is a schematic flowchart of a method for matching items provided by an embodiment of the present application.
  • Figure 3 is a schematic flowchart of a method for matching items provided by an embodiment of the present application.
  • Figure 4 is a schematic diagram of an interface for obtaining the image to be processed and target text information in the item matching method provided by the embodiment of the present application;
  • Figure 5 is a schematic diagram of the first feature extraction network in the item matching method provided by the embodiment of the present application.
  • Figure 6 is a schematic diagram showing M candidate intentions in the item matching method provided by the embodiment of the present application.
  • Figure 7 is a schematic flowchart of obtaining a target category in the item matching method provided by the embodiment of the present application.
  • Figure 8 is a schematic diagram of the target score in the item matching method provided by the embodiment of the present application.
  • Figure 9 is a schematic diagram of the second neural network in the item matching method provided by the embodiment of the present application.
  • Figure 10 is a schematic diagram of the matching effect diagram of the target item and the image to be processed in the item matching method provided by the embodiment of the present application;
  • Figure 11 is a schematic flowchart of a method for matching items provided by an embodiment of the present application.
  • Figure 12 is a schematic flowchart of a method for matching items provided by an embodiment of the present application.
  • Figure 13 is a schematic flowchart of a method for matching items provided by an embodiment of the present application.
  • Figure 14 is a schematic structural diagram of an item matching device provided by an embodiment of the present application.
  • Figure 15 is a schematic structural diagram of an item matching device provided by an embodiment of the present application.
  • Figure 16 is a schematic structural diagram of a client device provided by an embodiment of the present application.
  • Figure 17 is a schematic structural diagram of a server provided by an embodiment of the present application.
  • Figure 18 is a schematic structural diagram of a chip provided by an embodiment of the present application.
  • Figure 1a shows a structural schematic diagram of the artificial intelligence main framework.
  • The above artificial intelligence framework is elaborated below along two dimensions: the "intelligent information chain" (horizontal axis) and the "IT value chain" (vertical axis).
  • the "intelligent information chain” reflects a series of processes from data acquisition to processing. For example, it can be the general process of intelligent information perception, intelligent information representation and formation, intelligent reasoning, intelligent decision-making, intelligent execution and output. In this process, the data has gone through the condensation process of "data-information-knowledge-wisdom".
  • The "IT value chain" reflects the value that artificial intelligence brings to the information technology industry, from the underlying infrastructure and information (the provision and processing technology implementations) of artificial intelligence to the systemic industrial ecology.
  • Infrastructure provides computing power support for artificial intelligence systems, enables communication with the external world, and supports it through basic platforms.
  • Computing power is provided by smart chips, which can specifically be hardware acceleration chips such as a central processing unit (CPU), a neural-network processing unit (NPU), a graphics processing unit (GPU), an application-specific integrated circuit (ASIC), or a field-programmable gate array (FPGA);
  • The basic platform includes related platform guarantees and support such as a distributed computing framework and networks, and can include cloud storage and computing, interconnection networks, etc.
  • sensors communicate with the outside world to obtain data, which are provided to smart chips in the distributed computing system provided by the basic platform for calculation.
  • Data at the layer above the infrastructure represents the data sources of the artificial intelligence field. The data involves graphics, images, voice, and text, and also involves IoT data from traditional devices, including business data of existing systems and sensory data such as force, displacement, liquid level, temperature, and humidity.
  • Data processing usually includes data training, machine learning, deep learning, search, reasoning, decision-making and other methods.
  • machine learning and deep learning can perform symbolic and formal intelligent information modeling, extraction, preprocessing, training, etc. on data.
  • Reasoning refers to the process of simulating human intelligent reasoning in computers or intelligent systems, using formal information to perform machine thinking and problem solving based on reasoning control strategies. Typical functions are search and matching.
  • Decision-making refers to the process of decision-making after intelligent information is reasoned, and usually provides functions such as classification, sorting, and prediction.
  • Based on the results of further data processing, some general capabilities can be formed, such as algorithms or a general system, for example translation, text analysis, computer vision processing, speech recognition, image recognition, etc.
  • Intelligent products and industry applications refer to the products and applications of artificial intelligence systems in various fields. They are the encapsulation of overall artificial intelligence solutions, productizing intelligent information decision-making and realizing practical applications. Their application fields mainly include intelligent terminals, intelligent manufacturing, smart transportation, smart home, smart healthcare, smart security, autonomous driving, smart city, etc.
  • Figure 1b is an application scenario diagram of the item matching method provided by the embodiment of the present application. As shown in Figure 1b, when using a shopping application, the user can click an icon and input an image to be processed, so as to search for and purchase items of a category that has a matching relationship with the image to be processed.
  • Similarly, when using a decoration design application, the user can input an image to be processed to search for items of a category that has a matching relationship with it. It should be understood that the embodiments of the present application can also be applied in other scenarios in which items that have a matching relationship with an image to be processed are obtained; other application scenarios are not listed one by one here.
  • Figure 2a is a system architecture diagram of the item matching system provided by the embodiment of the present application.
  • The item matching system 200 includes a training device 210, a database 220, an execution device 230, a data storage system 240, and a client device 250.
  • the execution device 230 includes a computing module 231.
  • the first training data set is stored in the database 220
  • The training device 210 generates the first model/rule 201, and uses the first training data set in the database 220 to iteratively train the first model/rule 201 so as to obtain the trained first model/rule 201.
  • The first model/rule 201 may be embodied as a neural network model or as a model that is not a neural network.
  • In the embodiments of the present application, the case where the first model/rule 201 is a first neural network is taken as an example for description.
  • the execution device 230 can call data, codes, etc. in the data storage system 240, and can also store data, instructions, etc. in the data storage system 240.
  • The data storage system 240 may be placed in the execution device 230, or the data storage system 240 may be an external memory relative to the execution device 230.
  • The trained first model/rule 201 obtained by the training device 210 may be deployed in the execution device 230, and the execution device 230 may appear as a server corresponding to the application program deployed on the client device 250.
  • The computing module 231 of the execution device 230 may obtain, through the first model/rule 201, a target category that has a matching relationship with the image to be processed, where the image to be processed is obtained through the client device 250, and the target category indicates a category of items that has a matching (collocation) relationship with the image to be processed.
  • the client device 250 can be represented by various forms of terminal devices, such as mobile phones, tablets, laptops, virtual reality (VR) devices or augmented reality (AR) devices, etc.
  • the execution device 230 and the client device 250 may be independent devices.
  • the execution device 230 is configured with an input/output (I/O) interface for data interaction with the client device 250.
  • The "user" can input the image to be processed to the I/O interface through the client device 250, and the execution device 230 returns items of the target category that have a matching relationship with the image to be processed to the client device 250 through the I/O interface, so as to provide them to the user.
  • Figure 2a is only a schematic architectural diagram of the item matching system provided by an embodiment of the present application, and the positional relationship between the devices, components, modules, etc. shown in the figure does not constitute any limitation.
  • the execution device 230 and the client device 250 can also be integrated into the same device, which is not limited here.
  • Figure 2b is a schematic flow chart of the item matching method provided by an embodiment of the present application.
  • S1. Obtain the image to be processed input by the user, where the image to be processed includes a background and at least two items.
  • S2. Based on the characteristic information of the image to be processed and the characteristic information of the at least two items, obtain an item of a target category that has a matching relationship with the image to be processed through the first neural network.
  • S3. Display items of the target category.
  • In this way, users can not only search for items they want to match by providing an image to be processed; even when the user inputs a complex image to be processed (that is, an image including at least two items), the user can still obtain items of a target category that match the entire image to be processed, which greatly expands the application scenarios of this solution and is conducive to improving its user stickiness.
  • the item matching system may include a client device and a server.
  • The process of "obtaining a target category that has a matching relationship with the image to be processed" may include two parts: feature extraction of the image to be processed, and determination of the target category based on the extracted features.
  • In one implementation, the aforementioned two parts can be performed entirely by the server, that is, the execution device of the first neural network and the client device are separated; in another implementation, the operations of the aforementioned two parts can be performed entirely by the client device, that is, the execution device of the first neural network and the client device are integrated in the same device; in yet another implementation, the feature extraction operation can be performed on the client device while the server performs the operation of determining the target category, in which case the execution device of the first neural network and the client device are also separated. Since the specific implementation processes of the above three implementations are different, they are described separately below.
  • Figure 3 is a schematic flowchart of a method of matching items provided by an embodiment of the present application.
  • the method of matching items provided by an embodiment of the present application may include:
  • the client device obtains the image to be processed input by the user.
  • the user can input the image to be processed through the client device.
  • the client device obtains the image to be processed input by the user to search for items that have a matching relationship with the image to be processed.
  • The image to be processed can be an image selected by the user from images stored locally on the client device, an image captured by the user using the camera on the client device, or an image downloaded by the user using a browser, etc.; no limitation is made here.
  • the client device obtains the target text information input by the user, and the target text information is used to indicate the user's search intention.
  • the client device can also obtain target text information input by the user, and the target text information is used to indicate the user's search intention. Further, the item indicated by the target text information may be an item in the image to be processed, or may not be an item in the image to be processed.
  • Figure 4 is a schematic diagram of an interface for obtaining the image to be processed and the target text information in the item matching method provided by the embodiment of the present application.
  • Figure 4 includes two sub-schematic diagrams (a) and (b).
  • Inputting the image can trigger entry into sub-diagram (b) of Figure 4, that is, the user is prompted to input the target text information through sub-diagram (b) of Figure 4.
  • The interface shown in the schematic diagram can be flexibly set according to the actual product form, and is not limited here.
  • the server inputs the image to be processed into the third neural network to extract features of the image to be processed through the third neural network to obtain target feature information corresponding to the image to be processed.
  • The target feature information includes feature information of the items in the image to be processed and feature information of the image to be processed.
  • Specifically, the client can send the image to be processed to the server, and the server can input the received image to be processed into the third neural network, so that the third neural network performs feature extraction on the entire image to be processed to obtain the feature information of the image to be processed.
  • The feature information of the image to be processed includes the overall feature information composed of the background of the image to be processed and the at least two items; the server also uses the third neural network to identify each item area in the image to be processed and perform feature extraction on the items in the image to be processed, so as to obtain feature information of the at least two items in the image to be processed, where the feature information of the at least two items includes attribute information of each item.
  • the target feature information includes feature information of at least two items in the image to be processed and feature information of the image to be processed.
  • The aforementioned feature information of the image to be processed refers to the feature information obtained after feature extraction is performed on the image to be processed as a whole (that is, on the background of the image to be processed together with the at least two items); as an example, the feature information of the image to be processed may include texture information, color information, contour information, style information, scene information or other types of feature information of the image to be processed.
  • the characteristic information of at least two items in the image to be processed can also be called the set of semantic tags corresponding to the image to be processed.
  • the characteristic information of the at least two items can include attribute information of each item.
  • The attribute information of each item includes any one or more of the following types of information: the position information of the item in the image to be processed, the category information of the item, and the color information of the item; optionally, it can also include the style information of each item, the material of the item, the pattern of the item, or other feature information.
  • the characteristic information of items of different categories may include different information.
  • As an example, the characteristic information of the bed may include the position information of the bed in the image to be processed, the category information of the bed, the color of the bed and the style of the bed.
  • As another example, the characteristic information of the top may include the position information of the top in the image to be processed, the category information of the top, the color of the top, the shape of the top and the material of the top. It should be understood that the examples here are only used to facilitate understanding of this solution and are not used to limit this solution.
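  • As an illustrative sketch (not part of the embodiments themselves), the per-item attribute information described above could be represented as a simple record; the class and field names here are assumptions for illustration only, not the actual data format of the solution.

```python
from dataclasses import dataclass
from typing import Optional, Tuple

@dataclass
class ItemAttributes:
    # Bounding box (x, y, width, height) of the item in the image to be processed
    position: Tuple[int, int, int, int]
    category: str                   # category information, e.g. "bed", "top"
    color: Optional[str] = None     # color information of the item
    style: Optional[str] = None     # optional attributes vary by category
    material: Optional[str] = None
    pattern: Optional[str] = None

# Example: characteristic information of a bed and of a top, as in the text above
bed = ItemAttributes(position=(10, 40, 200, 120), category="bed",
                     color="white", style="modern")
top = ItemAttributes(position=(5, 5, 60, 80), category="top",
                     color="blue", material="cotton")
```

  • Different categories carry different optional fields, matching the observation that characteristic information of items of different categories may include different information.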
  • the third neural network can specifically be embodied as a convolutional neural network or other neural networks used for feature extraction. Further, the third neural network may include two different feature extraction networks: a first feature extraction network and a second feature extraction network.
  • the first feature extraction network is used to generate feature information of at least two items in the image to be processed, and the second feature extraction network is used to generate feature information of the entire image to be processed.
  • The first feature extraction network can be part of a neural network used for target recognition in images; that is, the training device can use the training data to iteratively train the neural network used for target recognition in images until the convergence conditions are met, and after the trained neural network is obtained, the trained first feature extraction network is obtained from it.
  • As an example, a neural network used for object recognition in an image can identify coffee tables, sideboards, storage cabinets, shoe cabinets and flower racks in the image; that is, the first feature extraction network in the embodiments of the present application can be used for feature extraction at a finer granularity.
  • Figure 5 is a schematic diagram of the first feature extraction network in the item matching method provided by the embodiment of the present application.
  • As shown in Figure 5, the first feature extraction network can identify the three item areas in the image to be processed and generate feature information of the items in the image to be processed. It should be understood that the example in Figure 5 is only for convenience of understanding this solution and is not used to limit this solution.
  • The second feature extraction network can be part of a neural network used to classify the entire image; that is, the training device can use the training data to iteratively train the neural network used to classify the entire image until the convergence conditions are met, and after the trained neural network is obtained, the trained second feature extraction network is obtained from it.
  • The characteristic information of the image to be processed refers to the characteristic information obtained by treating the image to be processed as a whole and extracting features from the image to be processed.
  • The characteristic information of the at least two items in the image to be processed may include the attribute information of each item; this further refines the concepts of the feature information of the image to be processed and the feature information of the at least two items, which is conducive to a clearer distinction between the two; and
  • since the feature information of each item includes information such as the category of the item, the color of the item, the style of the item, the material of the item, or the pattern of the item, the information of the items in the image to be processed is fully considered, which is beneficial to improving the accuracy of the identified target category.
  • the server inputs the text information into the fourth neural network to extract features of the text information through the fourth neural network to obtain feature information of the text information.
  • the server can also input text information into a fourth neural network to extract features of the text information through the fourth neural network to obtain feature information of the text information.
  • Step 302 is an optional step. If step 302 is executed, the text information input into the fourth neural network refers to the target text information obtained in step 302; if step 302 is not executed, the text information input into the fourth neural network may be the characteristic information of the items in the image to be processed obtained in step 303, that is, the text information input into the fourth neural network may be the set of semantic labels of the image to be processed.
  • the fourth neural network is a neural network that extracts features from text information. It can be embodied as a recurrent neural network or other types of neural networks, etc., and is not exhaustive here.
  • step 304 is also an optional step. If step 304 is not executed, step 302 does not need to be executed. After step 303 is executed, step 305 can be executed directly.
  • Based on the characteristic information of the image to be processed and the characteristic information of the at least two items, the server obtains, through the first neural network, a target category that has a matching relationship with the image to be processed.
  • The server may obtain a target category that has a matching relationship with the image to be processed through the first neural network based on the characteristic information of the image to be processed and the characteristic information of the at least two items. Specifically, in one implementation, if steps 303 and 304 are executed, the server can input the target feature information and the feature information of the text information into the first neural network, so that the first neural network generates M candidate intentions corresponding to the image to be processed, where each candidate intention indicates a category of items that has a matching relationship with the image to be processed, and M is an integer greater than or equal to 1. Further, when there are at least two items in the image to be processed, M is an integer greater than or equal to 2.
  • the first neural network can also output M first scores that correspond one-to-one to the M candidate intentions, and each first score is used to indicate the probability that a candidate intention is consistent with the user's search intention.
  • For different images to be processed, the number of candidate intentions output by the first neural network may be the same or different; that is, the first neural network can determine the number of candidate intentions to output according to the actual situation.
  • the server sends the M candidate intentions to the client device to present the M candidate intentions to the user through the display interface of the client device; wherein the client device can present the M candidate intentions to the user in text, images, or other forms.
  • The server can also send the M first scores to the client device, and the client device can sort the M candidate intentions according to the first score corresponding to each candidate intention: the higher the first score, the higher the ranking position.
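  • The sorting step above can be sketched in a few lines; the function name `rank_candidate_intents` and the sample intents are illustrative assumptions only.

```python
def rank_candidate_intents(intents, first_scores):
    """Sort the M candidate intentions by their first scores, descending:
    the higher the first score, the higher the ranking position."""
    return [intent for intent, _ in sorted(zip(intents, first_scores),
                                           key=lambda pair: pair[1],
                                           reverse=True)]

# Toy example using the candidate intentions of Figure 6
intents = ["decorative painting", "pendant", "lighting"]
scores = [0.62, 0.81, 0.45]   # hypothetical first scores
ranked = rank_candidate_intents(intents, scores)
```

  • The client device would then present `ranked` to the user in this order, in text, image, or other forms.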
  • FIG. 6 is a schematic diagram showing M candidate intentions in the item matching method provided by the embodiment of the present application.
  • As an example, if the image to be processed contains three main areas, namely a bed, a wardrobe and a wall, and the text information is "wall decoration", then the target feature information can include the feature information of the bed, the feature information of the wardrobe, the feature information of the wall and the characteristic information of the entire image to be processed, and the M candidate intentions may include the decorative paintings, pendants and lighting in Figure 6. It should be understood that the examples in Figure 6 are only for convenience of understanding this solution and are not used to limit this solution.
  • After the client device displays the M candidate intentions to the user, in one case, if the client device obtains a feedback operation corresponding to the M candidate intentions, it can determine, based on the feedback operation, a target category that has a matching relationship with the image to be processed, and send that target category to the server. Correspondingly, if the server obtains the aforementioned target category sent by the client device within a target time period, it can determine the target category corresponding to the image to be processed.
  • the "feedback operation” can be a selection operation for one of the M candidate intentions, or the “feedback operation” can also be the user manually inputting a new search intention, etc.
  • the specific implementation of the "feedback operation” is not mentioned here. List in the form.
  • the target category may be one of the M candidate intentions, or may be other search intentions other than the M candidate intentions.
  • Figure 7 is a schematic flowchart of obtaining a target category in the item matching method provided by an embodiment of the present application.
  • E1. The server inputs the target feature information and the feature information of the text information into the first neural network, and the first neural network generates M candidate intentions corresponding to the image to be processed.
  • E2. The server sends M candidate intentions to the client device.
  • E3. The client device displays the M candidate intentions to the user.
  • E4. The client device determines a target category based on the feedback operation input by the user for the M candidate intentions.
  • E5. The client device sends the target category to the server, and accordingly, the server receives the target category.
  • the example in Figure 7 is only for convenience of understanding this solution and is not used to limit this solution.
  • In another case, the client device may not send any feedback information to the server, or the client device may send first feedback information to the server, where the first feedback information is used to inform the server that the feedback operation input by the user has not been received.
  • If the server does not receive feedback information from the client device within the target time period, or receives the first feedback information sent by the client device, the candidate intention with the highest first score among the M candidate intentions can be determined as the target category.
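  • Both branches of the logic above (user feedback present versus fallback to the highest-scoring candidate) can be sketched as follows; the function name `resolve_target_category` is a hypothetical illustration.

```python
def resolve_target_category(candidates, first_scores, feedback=None):
    """If a feedback operation was received, it determines the target
    category (it may be one of the M candidates or a newly input search
    intention); otherwise, fall back to the candidate intention with the
    highest first score."""
    if feedback is not None:
        return feedback
    best = max(range(len(candidates)), key=lambda i: first_scores[i])
    return candidates[best]

cands = ["decorative painting", "pendant", "lighting"]
scores = [0.62, 0.81, 0.45]   # hypothetical first scores
no_feedback = resolve_target_category(cands, scores)
with_feedback = resolve_target_category(cands, scores, feedback="lighting")
```

  • Note that in practice the fallback is gated on the target time period elapsing, which a real server would implement with a timer; that detail is elided here.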
  • In another implementation, if step 303 is executed but step 304 is not executed, the server can input the target feature information into the first neural network, so that the first neural network generates M candidate intentions corresponding to the image to be processed.
  • The server sends the M candidate intentions to the client device to display them to the user through the display interface of the client device, and feedback operations corresponding to the M candidate intentions are obtained through the display interface; based on the feedback operation, the client device determines a target category that has a matching relationship with the image to be processed, and sends the target category to the server.
  • In this way, when performing feature extraction on the image to be processed, not only the feature information of the entire image to be processed but also the feature information of the items in the image can be obtained; based on both, M categories of items that have a matching relationship with the entire image to be processed are then generated. That is, not only is the information of the entire image to be processed considered, but each item in the image to be processed is also fully considered, which is beneficial to improving the accuracy of the determined candidate intentions.
  • In addition, the target text information input by the user can also be obtained, where the target text information is used to indicate the user's search intention, and the target feature information and the feature information of the target text information are input into the first neural network together. That is, in the process of obtaining the category that has a matching relationship with the image to be processed, not only can the information in the image to be processed be fully obtained, but the text information used to indicate the user's search intention can also be combined, so as to further improve the accuracy of the determined candidate intentions.
  • In another implementation, the server can input the image to be processed into the first neural network, and perform feature extraction on the image to be processed through the first neural network to obtain the feature information of the entire image to be processed; based on the feature information of the entire image to be processed, M candidate intentions corresponding to the image to be processed are generated through the first neural network.
  • The server sends the M candidate intentions to the client device to display them to the user through the display interface of the client device, and feedback operations corresponding to the M candidate intentions are obtained through the display interface; based on the feedback operation, the client device determines a target category that has a matching relationship with the image to be processed, and sends the target category to the server.
  • In this way, M candidate intentions are first generated through the first neural network, and then a target category that has a matching relationship with the image to be processed is determined based on the feedback operation input by the user for the M candidate intentions; that is, an interactive method is used to elicit the user's search intention, which is conducive to improving the accuracy of the determined target category.
  • In another implementation, the server can also input the target feature information and the feature information of the text information into the first neural network to obtain a target category, generated by the first neural network, that has a matching relationship with the image to be processed.
  • In another implementation, if step 303 is executed but step 304 is not executed, the server can also input the target feature information into the first neural network to obtain a target category, generated by the first neural network, that has a matching relationship with the image to be processed.
  • In another implementation, the server can input the image to be processed into the first neural network, and perform feature extraction on the image to be processed through the first neural network to obtain the feature information of the entire image to be processed; according to the characteristic information of the entire image to be processed, a target category that has a matching relationship with the image to be processed is generated through the first neural network.
  • The server obtains N candidate items, each of which is an item of the target category.
  • After the server determines a target category that has a matching relationship with the image to be processed, it can obtain N candidate items corresponding to the target category from the item library stored on the server; that is, the server can obtain N candidate items of the target category from the item library, where N is an integer greater than 1.
  • the server generates target scores corresponding to the N candidate items through the second neural network.
  • A target score indicates the matching degree between a candidate item and the image to be processed.
  • the server can generate a target score corresponding to each of the N candidate items through a second neural network, where a target score indicates the matching degree between a candidate item and the image to be processed, That is, it is used to indicate the aesthetic score of the matching effect of a candidate item and the image to be processed.
  • Figure 8 is a schematic diagram of the target score in the item matching method provided by the embodiment of the present application.
  • Figure 8 includes three sub-schematic diagrams (a), (b) and (c).
  • Sub-schematic diagram (a) of Figure 8 shows the three items in the image to be processed; the candidate item shown in sub-schematic diagram (b) of Figure 8 is sofa one, and the score of the matching effect diagram of sofa one and the image to be processed is 0.956; the candidate item shown in sub-schematic diagram (c) of Figure 8 is sofa two, and the score of the matching effect diagram of sofa two and the image to be processed is 0.425. This means that the matching degree between sofa one and the image to be processed is higher than that between sofa two and the image to be processed. It should be understood that the example in Figure 8 is only for convenience of understanding this solution and is not used to limit this solution.
  • In one implementation, the server can input the feature information of each candidate item and the target feature information into the second neural network to obtain the target score corresponding to each candidate item output by the second neural network; by performing the foregoing operation on each of the N candidate items, the server can generate a target score corresponding to each of the N candidate items.
  • In another implementation, the server can also input the image of each candidate item and the image to be processed into the second neural network to obtain the target score corresponding to each candidate item output by the second neural network; by performing the foregoing operation on each of the N candidate items, the server can generate the target score corresponding to each candidate item.
  • In another implementation, the server can also input the image of each candidate item, the semantic label of each candidate item, the image to be processed, and the semantic labels of the items in the image to be processed into the second neural network, to obtain the target score corresponding to each candidate item output by the second neural network.
  • the second neural network may be a convolutional neural network or other types of neural networks.
  • the semantic labels of the items in the image to be processed can also be called the feature information of the items in the image to be processed.
  • the semantic label of the candidate item may include at least one attribute information of the candidate item.
  • As an example, the semantic label of the candidate item may include any one or more of the following: the category of the candidate item, the style of the candidate item, the shape of the candidate item, or other attributes of the candidate item, etc.; these are not exhaustively listed here.
  • Figure 9 is a schematic diagram of the second neural network in the item matching method provided by the embodiment of the present application.
  • After the server inputs the image of each candidate item and the semantic label of each candidate item into the second neural network, it performs feature extraction on the image of the candidate item through the second neural network to obtain the feature information of the image of the candidate item, and performs feature extraction on the semantic label of the candidate item to obtain the feature information of the semantic label of the candidate item; the server then fuses, through the second neural network, the feature information of the image of the candidate item and the feature information of the semantic label of the candidate item, and convolves the fused feature information to obtain the feature information corresponding to the candidate item.
  • After the server inputs the image to be processed and the semantic labels of the items in the image to be processed into the second neural network, it performs feature extraction on the image to be processed through the second neural network to obtain the feature information of the image to be processed, and performs feature extraction on the semantic labels of the items in the image to be processed to obtain the feature information of those semantic labels; the server then fuses, through the second neural network, the feature information of the image to be processed and the feature information of the semantic labels, and convolves the fused feature information to obtain the feature information corresponding to the image to be processed.
  • The server performs the above-mentioned multiplication, fusion and other operations through the second neural network, and then outputs a target score for the matching effect of the candidate item and the image to be processed. It should be understood that the example in Figure 9 is only for convenience of understanding this solution and is not used to limit this solution.
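  • The two-branch fuse-then-compare structure described above can be sketched with simple stand-ins: concatenation in place of the learned fusion/convolution, and cosine similarity in place of the second neural network's final scoring head. The names `fuse` and `target_score` and the vector dimensions are assumptions for illustration, not the actual network.

```python
import numpy as np

def fuse(image_feat: np.ndarray, label_feat: np.ndarray) -> np.ndarray:
    """Stand-in for the fusion + convolution step: combine the image
    feature information with the semantic-label feature information."""
    return np.concatenate([image_feat, label_feat])

def target_score(cand_img_feat, cand_label_feat,
                 scene_img_feat, scene_label_feat):
    """Stand-in for the second neural network's output: cosine similarity
    between the fused candidate-item vector and the fused vector of the
    image to be processed, mapped to [0, 1] as an aesthetic matching score."""
    a = fuse(cand_img_feat, cand_label_feat)
    b = fuse(scene_img_feat, scene_label_feat)
    cos = float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b) + 1e-12))
    return 0.5 * (cos + 1.0)

# Identical candidate and scene vectors give the maximum score
score = target_score(np.ones(4), np.ones(2), np.ones(4), np.ones(2))
```

  • A real second neural network would learn the fusion and scoring jointly from training data; this sketch only mirrors the data flow of Figure 9.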
  • a training data set may be stored on the training device, and each training data may include an image to be processed, feature information of items in the image to be processed, images of at least two candidate items, and semantic labels corresponding to each candidate item.
  • the expected result corresponding to the training data is the one of the aforementioned at least two candidate items that is most suitable for the image to be processed.
  • The training device can form a set of target data by combining the image to be processed, the characteristic information of the items in the image to be processed, the image of one candidate item, and the semantic label corresponding to the image of that candidate item; the training device can thus obtain at least two sets of target data corresponding one-to-one to the at least two candidate items.
  • the training device inputs each set of target data into the second neural network to obtain a target score output by the second neural network; after the training device performs the foregoing operation on each of the at least two sets of target data through the second neural network, at least two target scores in one-to-one correspondence with the at least two sets of target data are obtained, that is, at least two target scores in one-to-one correspondence with the at least two candidate items.
  • according to the at least two target scores, the training device selects, from the at least two candidate items, the item that best matches the image to be processed, and uses the selected item as the prediction result corresponding to the training data.
  • the training device generates the value of the loss function based on the prediction result and the expected result corresponding to the training data, and reversely updates the weight parameters of the second neural network, thereby completing one training iteration of the second neural network.
  • the training device uses multiple pieces of data in the training data set to iteratively train the second neural network until the convergence condition is met, thereby obtaining the trained second neural network.
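The training loop in the steps above can be sketched as follows; this is a toy illustration that assumes a linear scoring model and a softmax cross-entropy loss over the candidates (the patent does not specify a particular model or loss function):

```python
import numpy as np

def train_step(weights: np.ndarray, target_data: np.ndarray,
               expected_index: int, lr: float = 0.1):
    """One training iteration: score every set of target data, compare the
    prediction with the expected result, and reversely update the weights.

    target_data: one (already fused) feature vector per candidate, shape (N, D).
    expected_index: index of the candidate that best matches the image.
    """
    scores = target_data @ weights                 # one target score per candidate
    probs = np.exp(scores - scores.max())
    probs /= probs.sum()                           # softmax over the candidates
    predicted_index = int(np.argmax(scores))       # prediction result
    loss = -np.log(probs[expected_index])          # loss vs. the expected result
    # Gradient of the loss, used to reversely update the weight parameters
    grad = target_data.T @ (probs - np.eye(len(probs))[expected_index])
    return weights - lr * grad, float(loss), predicted_index
```

Iterating `train_step` over the training data set until the loss stops improving corresponds to training until the convergence condition is met.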
  • the server obtains K target items corresponding to the target category, and the category of each target item is the target category.
  • steps 306 and 307 are both optional steps. If steps 306 and 307 are executed, step 308 may include: the server selects K target items from the N candidate items based on the target scores corresponding to the N candidate items, where K is an integer greater than or equal to 1; a candidate item with a higher target score has a greater probability of being selected.
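The selection rule, under which a candidate with a higher target score has a greater probability of being selected (rather than a strict top-K cut), can be sketched as softmax-weighted sampling without replacement; the temperature parameter below is an illustrative assumption, not something the patent specifies:

```python
import numpy as np

def select_k_targets(candidate_ids, target_scores, k, temperature=1.0, seed=0):
    """Pick K target items from N candidates; higher-scored candidates are
    more likely to be chosen, but selection is not strictly top-K."""
    scores = np.asarray(target_scores, dtype=float) / temperature
    probs = np.exp(scores - scores.max())
    probs /= probs.sum()                           # selection probabilities
    rng = np.random.default_rng(seed)
    # Sample K distinct indices, weighted by the softmax of the target scores
    chosen = rng.choice(len(candidate_ids), size=k, replace=False, p=probs)
    return [candidate_ids[i] for i in chosen]
```

Lowering the temperature makes the selection behave more like a deterministic top-K; raising it makes lower-scored candidates more likely to appear.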
  • target scores corresponding to the N candidate items are generated through a neural network.
  • the target score indicates the matching degree between a candidate item and the image to be processed; based on the matching degree between each candidate item and the image to be processed, the target items finally displayed to the user are selected from the N candidate items. That is to say, the aesthetic quality of the match between each candidate item and the image to be processed is quantitatively scored, and this quality is taken into account when selecting the target items, so that the matching renderings of the target items and the image to be processed presented to the user look better, which helps improve the user stickiness of this solution.
  • the server can also directly obtain K target items corresponding to the target category from the item library, and the category of each target item is the target category.
  • the server sends the information of the target item to the client device.
  • the server may acquire the information of each target item among the K target items, and send the information of each target item to the client device.
  • the information of each target item may include the image corresponding to the target item; optionally, the information of each target item may also include any one or more of the following: the access link, name, price, or target rating of the target item, or other types of information, which is not limited here.
  • the image corresponding to the target item may be an image of the target item itself, or it may be a matching rendering of the target item and the image to be processed generated by the server using a neural network.
  • the aforementioned matching renderings can be in a pure image format, a rendering after VR modeling, a rendering after AR modeling, or another format; this is not limited here.
  • Figure 10 is a schematic diagram of matching renderings of the target item and the image to be processed in the item matching method provided by an embodiment of the present application.
  • the sub-schematic diagram on the left shows the image to be processed
  • the two sub-schematic diagrams on the right respectively show matching renderings of two different target items with the image to be processed. It should be understood that the examples in Figure 10 are only for convenience of understanding this solution and are not used to limit this solution.
  • the client device displays K target items corresponding to one target category to the user.
  • after acquiring the information of each of the K target items sent by the server, the client device displays the K target items corresponding to the target category to the user.
  • the client device can show the user the image corresponding to each target item; the image corresponding to a target item can be an image of the target item itself, or a matching rendering of the target item and the image to be processed.
  • the client device can display to the user the matching rendering of each target item and the image to be processed, so that the user can more intuitively experience the effect of items of the target category applied to the image to be processed, which helps improve the user stickiness of this solution.
  • the client device can also display to the user any one or more of the following information about each target item: the access link, name, price, or target rating of the item, or other types of information, which is not limited here.
  • FIG. 11 is a schematic flowchart of a method for matching items provided by an embodiment of the present application.
  • the client device displays three candidate intentions to the user, namely the decorative paintings, pendants, and lighting in Figure 11; the client device sends feedback information to the server based on the user's selection operation on the candidate intention "decorative painting", and the aforementioned feedback information is used to indicate to the server that the target category is "decorative painting".
  • based on the target category "decorative painting", the server sends information about two different decorative paintings (that is, target items) to the client device.
  • the information of each decorative painting includes the matching renderings of the decorative painting and the image to be processed, the name of the decorative painting, the price of the decorative painting, and the size of the decorative painting.
  • the example in Figure 11 shows the implementation process of the item matching method from the perspective of the client device. The example in Figure 11 is only for convenience of understanding this solution and is not used to limit this solution.
  • Figure 12 is a schematic flowchart of a method of matching items provided by an embodiment of the present application.
  • the method of matching items provided by an embodiment of the present application may include:
  • the client device obtains the image to be processed input by the user.
  • the client device obtains the target text information input by the user, and the target text information is used to indicate the user's search intention.
  • the client device inputs the image to be processed into the third neural network to perform feature extraction on the image to be processed through the third neural network to obtain target feature information corresponding to the image to be processed.
  • the target feature information includes at least the feature information of the items in the image to be processed and the feature information of the image to be processed.
  • the client device inputs the text information into the fourth neural network to extract features of the text information through the fourth neural network to obtain feature information of the text information.
  • based on the feature information of the image to be processed and the feature information of the at least two items, the client device obtains, through the first neural network, a target category that has a matching relationship with the image to be processed.
  • for the specific implementation of steps 1201 to 1205, refer to the description of steps 301 to 305 in the embodiment corresponding to Figure 3. The difference is that in the embodiment corresponding to Figure 3, steps 303 to 305 are executed by the server, whereas in the embodiment corresponding to Figure 12, steps 1203 to 1205 are executed by the client device; details are not repeated here.
  • the client device sends the target category to the server.
  • the server obtains N candidate items, and the category of each candidate item is the target category.
  • the server generates target scores corresponding to the N candidate items through the second neural network.
  • the target scores indicate the matching degree between the candidate items and the image to be processed.
  • the server obtains K target items corresponding to the target category, and the category of each target item is the target category.
  • the server sends the information of the target item to the client device.
  • the client device displays K target items corresponding to one target category to the user.
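The division of work in the Figure 12 flow, where the client determines the target category and the server scores and returns the target items, can be sketched as follows. All names here are hypothetical, and a plain dot product stands in for the second neural network's target score:

```python
class Server:
    """Toy server: holds an item library grouped by category and returns
    the K target items for a given target category (steps 1207 to 1210)."""
    def __init__(self, item_library):
        self.item_library = item_library

    def match_items(self, target_category, image_features, k=2):
        # Obtain the N candidate items whose category is the target category
        candidates = self.item_library.get(target_category, [])
        # Target score: a dot product stands in for the second neural network
        scored = sorted(candidates,
                        key=lambda item: -sum(a * b for a, b in
                                              zip(item["features"], image_features)))
        return scored[:k]   # the K target items sent back to the client

def client_flow(image_features, target_category, server):
    """Client side of Figure 12: the target category has already been chosen
    on the client; fetch the target items and return what would be displayed."""
    target_items = server.match_items(target_category, image_features)
    return [item["name"] for item in target_items]
```

In the Figure 13 variant, the only change is that the category selection would also move to the server, with the client sending feature information instead of a category.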
  • Figure 13 is a schematic flowchart of a method of matching items provided by an embodiment of the present application.
  • the method of matching items provided by an embodiment of the present application may include:
  • the client device obtains the image to be processed input by the user.
  • the client device obtains the target text information input by the user, and the target text information is used to indicate the user's search intention.
  • the client device inputs the image to be processed into the third neural network to perform feature extraction on the image to be processed through the third neural network to obtain target feature information corresponding to the image to be processed.
  • the target feature information includes the feature information of the items in the image to be processed and the feature information of the image to be processed.
  • the client device inputs the text information into the fourth neural network to extract features of the text information through the fourth neural network to obtain feature information of the text information.
  • for the specific implementation of steps 1301 to 1304, refer to the description of steps 301 to 304 in the embodiment corresponding to Figure 3. The difference is that in the embodiment corresponding to Figure 3, steps 303 and 304 are executed by the server, whereas in the embodiment corresponding to Figure 13, steps 1303 and 1304 are executed by the client device; details are not repeated here.
  • the client device may send the target feature information to the server; optionally, the client device sends both the target feature information and the feature information of the text information to the server.
  • based on the feature information of the image to be processed and the feature information of the at least two items, the server obtains, through the first neural network, a target category that has a matching relationship with the image to be processed.
  • the server obtains N candidate items corresponding to the target category, and the category of each candidate item is the target category.
  • the server generates target scores corresponding to the N candidate items through the second neural network.
  • the target scores indicate the matching degree between the candidate items and the image to be processed.
  • the server obtains K target items corresponding to the target category, and the category of each target item is the target category.
  • the server sends the information of the target item to the client device.
  • the client device displays K target items corresponding to one target category to the user.
  • for the specific implementation of steps 1305 to 1310, refer to the description of steps 305 to 310 in the embodiment corresponding to Figure 3; details are not repeated here.
  • the user can provide an image of the scene in which the item to be searched will be used (that is, the above-mentioned image to be processed); a target category that has a matching relationship with the entire image to be processed can then be obtained through the first neural network, and items of the target category are displayed to the user. Through the above solution, the user can search for items to match simply by providing the image to be processed, and even when the user inputs a complex image to be processed (that is, an image including at least two items), a target category of items that has a matching relationship with the entire image can still be obtained, which greatly expands the application scenarios of this solution and helps improve its user stickiness. In addition, the target category that has a matching relationship with the entire image to be processed is determined based on both the feature information of the entire image and the feature information of the items in it; that is, not only the information of the entire image is considered, but also the information of each item in the image to be processed.
  • FIG 14 is a schematic structural diagram of an item matching device provided by an embodiment of the present application.
  • the item matching device 1400 is applied to the client device in the item matching system.
  • the item matching system also includes a server.
  • the matching device 1400 includes: an acquisition module 1401, used to acquire an image input by a user, in which there is a background and at least two items; a receiving module 1402, used to receive a target category of items, sent by the server, that has a matching relationship with the image, where the target category of items is obtained by the server based on the feature information of the image and the feature information of the at least two items; and a display module 1403, used to display the items of the target category.
  • the characteristic information of the image includes the overall characteristic information composed of the background and at least two items.
  • the characteristic information of the at least two items includes attribute information of each item.
  • the attribute information of each item includes any one or more of the following types of information: the category of the item, the color of the item, the style of the item, the material of the item, or the pattern of the item.
  • the receiving module 1402 is also used to receive M candidate intentions corresponding to the image sent by the server.
  • M is an integer greater than or equal to 2.
  • each candidate intention indicates a category of items that has a collocation relationship with the image; the display module 1403 is also used to display the M candidate intentions; the acquisition module 1401 is also used to obtain the feedback operations corresponding to the M candidate intentions, and to determine, based on the feedback operations for the M candidate intentions, a target category that has a collocation relationship with the image.
  • the display module 1403 is specifically used to display matching renderings of the items of the target category and the image.
  • Figure 15 is a schematic structural diagram of an item matching device provided by an embodiment of the present application.
  • the item matching device 1500 is applied to the server in the item matching system.
  • the item matching system also includes client equipment.
  • the matching device 1500 includes: an acquisition module 1501, configured to acquire a target category of items that has a matching relationship with the image through a first neural network based on the feature information of the image and the feature information of at least two items, where there is a background in the image and at least two items; a sending module 1502 configured to send items of the target category to the client device.
  • the characteristic information of the image includes the overall characteristic information composed of the background and at least two items.
  • the characteristic information of the at least two items includes attribute information of each item.
  • the attribute information of each item includes any one or more of the following types of information: the category of the item, the color of the item, the style of the item, the material of the item, or the pattern of the item.
  • the acquisition module 1501 is specifically used for:
  • M candidate intentions corresponding to the image are generated through the first neural network, M is an integer greater than or equal to 2, and each candidate intention indicates a category of items that has a matching relationship with the image; the M candidate intentions are sent to the client device and are used by the client device to obtain a target category that has a matching relationship with the image; the target category sent by the client device is then received.
  • the acquisition module 1501 is specifically used for:
  • N candidate items that have a matching relationship with the image are obtained.
  • the category of each candidate item is the target category, and N is an integer greater than 1;
  • target scores corresponding to the N candidate items are generated, and the scores indicate the matching degree between the candidate items and the image;
  • the sending module is specifically used to send K target items to the client device.
  • FIG. 16 is a schematic structural diagram of a client device provided by an embodiment of the present application.
  • the client device 1600 can be embodied as a mobile phone, a tablet, a notebook computer, a smart wearable device, a smart robot, a smart home device, or the like; this is not limited here.
  • the client device 1600 includes: a receiver 1601, a transmitter 1602, a processor 1603, and a memory 1604 (the number of processors 1603 in the client device 1600 may be one or more; one processor is taken as an example in Figure 16), where the processor 1603 may include an application processor 16031 and a communication processor 16032.
  • the receiver 1601, the transmitter 1602, the processor 1603, and the memory 1604 may be connected by a bus or other means.
  • Memory 1604 may include read-only memory and random access memory and provides instructions and data to processor 1603 .
  • a portion of memory 1604 may also include non-volatile random access memory (NVRAM).
  • the memory 1604 stores operating instructions executable by the processor, executable modules or data structures, or a subset or an extended set thereof, where the operating instructions may include various operating instructions for implementing various operations.
  • Processor 1603 controls the operation of the client device.
  • the various components of the client device are coupled together through a bus system.
  • the bus system may also include a power bus, a control bus, a status signal bus, etc.
  • for clarity of description, the various buses are all referred to as the bus system in the figure.
  • the methods disclosed in the above embodiments of the present application can be applied to the processor 1603 or implemented by the processor 1603.
  • the processor 1603 may be an integrated circuit chip with signal processing capabilities. During implementation, each step of the above method can be completed by an integrated logic circuit of hardware in the processor 1603 or by instructions in the form of software.
  • the above-mentioned processor 1603 may be a general-purpose processor, a digital signal processor (DSP), a microprocessor, or a microcontroller, and may further include an application-specific integrated circuit (ASIC), a field-programmable gate array (FPGA) or another programmable logic device, a discrete gate or transistor logic device, or a discrete hardware component.
  • the processor 1603 can implement or execute each method, step and logical block diagram disclosed in the embodiment of this application.
  • a general-purpose processor may be a microprocessor or the processor may be any conventional processor, etc.
  • the steps of the method disclosed in conjunction with the embodiments of the present application can be directly implemented by a hardware decoding processor, or executed by a combination of hardware and software modules in the decoding processor.
  • the software module may be located in a storage medium mature in the art, such as a random access memory, a flash memory, a read-only memory, a programmable read-only memory, an electrically erasable programmable memory, or a register.
  • the storage medium is located in the memory 1604.
  • the processor 1603 reads the information in the memory 1604 and completes the steps of the above method in combination with its hardware.
  • the receiver 1601 may be used to receive input numeric or character information and generate signal inputs related to relevant settings and functional controls of the client device.
  • the transmitter 1602 can be used to output numeric or character information through the first interface; the transmitter 1602 can also be used to send instructions to the disk group through the first interface to modify the data in the disk group; the transmitter 1602 can also include a display device such as a display screen.
  • the processor 1603 is used to execute the item matching method executed by the client device in the corresponding embodiment of FIG. 2b to FIG. 13 .
  • the application processor 16031 is used to obtain an image input by the user, in which there is a background and at least two items; to receive a target category of items, sent by the server, that has a matching relationship with the image, where the target category of items is obtained by the server based on the feature information of the image and the feature information of the at least two items; and to display items of the target category.
  • FIG. 17 is a schematic structural diagram of the server provided by the embodiment of the present application.
  • the server 1700 is implemented by one or more servers.
  • the server 1700 may vary greatly due to different configurations or performance, and may include one or more central processing units (CPU) 1722 (for example, one or more processors), memory 1732, and one or more storage media 1730 (for example, one or more mass storage devices) storing application programs 1742 or data 1744.
  • the memory 1732 and the storage medium 1730 may be short-term storage or persistent storage.
  • the program stored in the storage medium 1730 may include one or more modules (not shown in the figure), and each module may include a series of instruction operations on the server.
  • the central processor 1722 may be configured to communicate with the storage medium 1730 and execute a series of instruction operations in the storage medium 1730 on the server 1700 .
  • Server 1700 may also include one or more power supplies 1726, one or more wired or wireless network interfaces 1750, one or more input and output interfaces 1758, and/or, one or more operating systems 1741, such as Windows ServerTM, Mac OS XTM, UnixTM, LinuxTM, FreeBSDTM and so on.
  • the central processing unit 1722 is used to execute the item matching method executed by the server in the corresponding embodiment of FIGS. 2b to 13 .
  • the central processor 1722 is configured to obtain, through the first neural network and based on the feature information of the image and the feature information of at least two items, a target category of items that has a collocation relationship with the image, where there is a background and at least two items in the image; and to send items of the target category to the client device.
  • Embodiments of the present application also provide a computer program product.
  • the computer program product includes a program.
  • when the program is run on a computer, it causes the computer to execute the steps performed by the client device in the methods described in the embodiments shown in Figures 2b to 13, or causes the computer to execute the steps performed by the server in those methods.
  • Embodiments of the present application also provide a computer-readable storage medium.
  • the computer-readable storage medium stores a program.
  • when the program is run on a computer, it causes the computer to execute the steps performed by the client device in the methods described in the embodiments shown in Figures 2b to 13, or causes the computer to execute the steps performed by the server in those methods.
  • the client device, server or item matching device provided by the embodiment of the present application may specifically be a chip.
  • the chip includes: a processing unit and a communication unit.
  • the processing unit may be, for example, a processor, and the communication unit may be, for example, an input/output interface, a pin, or a circuit.
  • the processing unit can execute computer execution instructions stored in the storage unit, so that the chip executes the matching method of items described in the embodiments shown in FIGS. 2b to 13 .
  • the storage unit is a storage unit within the chip, such as a register, cache, etc.
  • the storage unit may also be a storage unit located outside the chip in the wireless access device, such as a read-only memory (ROM) or another type of static storage device that can store static information and instructions, a random access memory (RAM), or the like.
  • Figure 18 is a structural schematic diagram of a chip provided by an embodiment of the present application.
  • the chip can be represented as a neural network processor NPU 180.
  • the NPU 180 serves as a co-processor and is mounted on the host CPU (Host CPU), which allocates tasks to it.
  • the core part of the NPU is the arithmetic circuit 1803.
  • the arithmetic circuit 1803 is controlled by the controller 1804 to extract the matrix data in the memory and perform multiplication operations.
  • the computing circuit 1803 includes multiple processing units (Process Engine, PE).
  • arithmetic circuit 1803 is a two-dimensional systolic array.
  • the arithmetic circuit 1803 may also be a one-dimensional systolic array or other electronic circuit capable of performing mathematical operations such as multiplication and addition.
  • arithmetic circuit 1803 is a general-purpose matrix processor.
  • the arithmetic circuit obtains the corresponding data of matrix B from the weight memory 1802 and caches it on each PE in the arithmetic circuit.
  • the operation circuit takes the data of matrix A from the input memory 1801, performs a matrix operation with matrix B, and stores the partial or final result of the matrix in an accumulator 1808.
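The described data flow, matrix B cached from the weight memory into the PEs, matrix A streamed from the input memory, and partial results gathered in the accumulator, can be illustrated with a plain accumulating matrix multiply (a functional sketch, not a cycle-accurate model of the systolic array):

```python
def systolic_matmul(a, b):
    """C = A @ B computed as a running accumulation, mirroring how partial
    results are collected in the accumulator before the final result."""
    rows, inner, cols = len(a), len(b), len(b[0])
    acc = [[0.0] * cols for _ in range(rows)]        # accumulator 1808
    for k in range(inner):                           # stream matrix A, step by step
        for i in range(rows):
            for j in range(cols):
                # each PE multiplies one element of A by its cached element of B
                # and adds the product to the running partial result
                acc[i][j] += a[i][k] * b[k][j]
    return acc
```

After the last step of the stream, the accumulator holds the final matrix result.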
  • the unified memory 1806 is used to store input data and output data.
  • the weight data is transferred directly to the weight memory 1802 through the direct memory access controller (DMAC) 1805.
  • Input data is also transferred to unified memory 1806 via DMAC.
  • the BIU is the bus interface unit 1810, which is used for the interaction between the AXI bus and both the DMAC and the instruction fetch buffer (IFB) 1809.
  • the bus interface unit 1810 (Bus Interface Unit, BIU for short) is used by the instruction fetch buffer 1809 to obtain instructions from the external memory, and is also used by the storage unit access controller 1805 to obtain the original data of the input matrix A or the weight matrix B from the external memory.
  • DMAC is mainly used to transfer the input data in the external memory DDR to the unified memory 1806 or the weight data to the weight memory 1802 or the input data to the input memory 1801 .
  • the vector calculation unit 1807 includes multiple arithmetic processing units, and if necessary, further processes the output of the arithmetic circuit, such as vector multiplication, vector addition, exponential operation, logarithmic operation, size comparison, etc.
  • vector calculation unit 1807 can store the processed output vectors to unified memory 1806 .
  • the vector calculation unit 1807 can apply a linear function and/or a nonlinear function to the output of the operation circuit 1803, such as linear interpolation on the feature plane extracted by the convolution layer, or a vector of accumulated values, to generate an activation value.
  • vector calculation unit 1807 generates normalized values, pixel-wise summed values, or both.
  • the processed output vector can be used as an activation input to the arithmetic circuit 1803, such as for use in a subsequent layer in a neural network.
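The vector calculation unit's post-processing can be sketched as follows, using ReLU as the example nonlinear activation and a softmax-style normalization; the patent leaves the concrete functions open, so both choices here are illustrative:

```python
import math

def vector_unit(values, normalize=False):
    """Post-process the arithmetic circuit's output vector: apply a nonlinear
    activation and, optionally, normalization, as the vector calculation
    unit 1807 does before the result is stored or fed to the next layer."""
    activated = [max(0.0, v) for v in values]        # ReLU as an example activation
    if not normalize:
        return activated
    total = sum(math.exp(v) for v in activated)
    return [math.exp(v) / total for v in activated]  # normalized values, sum to 1
```

The returned vector could either be written back to the unified memory or used as the activation input to a subsequent layer.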
  • the instruction fetch buffer 1809 connected to the controller 1804 is used to store instructions used by the controller 1804;
  • the unified memory 1806, the input memory 1801, the weight memory 1802 and the fetch memory 1809 are all On-Chip memories. External memory is private to the NPU hardware architecture.
  • each layer in the first neural network, the second neural network, the third neural network and the fourth neural network shown in the method embodiments corresponding to Figures 2b to 13 can be performed by the operation circuit 1803 or the vector calculation unit 1807 implement.
  • the processor mentioned in any of the above places may be a general central processing unit, a microprocessor, an ASIC, or one or more integrated circuits used to control program execution of the method of the first aspect.
  • the device embodiments described above are only illustrative.
  • the units described as separate components may or may not be physically separated, and the components shown as units may or may not be physical units.
  • the physical unit can be located in one place, or it can be distributed across multiple network units. Some or all of the modules can be selected according to actual needs to achieve the purpose of the solution of this embodiment.
  • the connection relationship between modules indicates that there are communication connections between them, which can be specifically implemented as one or more communication buses or signal lines.
  • the present application can be implemented by software plus the necessary general-purpose hardware, and can certainly also be implemented by dedicated hardware, including application-specific integrated circuits, dedicated CPUs, dedicated memories, dedicated components, and the like. In general, any function performed by a computer program can easily be implemented with corresponding hardware, and the specific hardware structures used to implement the same function can be diverse, for example, analog circuits, digital circuits, or dedicated circuits. However, for this application, a software program implementation is the better implementation in most cases. Based on this understanding, the technical solutions of the present application may, in essence or in the part contributing to the prior art, be embodied in the form of a software product.
  • the computer software product is stored in a readable storage medium, such as a floppy disk, USB flash drive, removable hard disk, ROM, RAM, magnetic disk, or optical disc, and includes several instructions that cause a computer device (which may be a personal computer, training device, network device, etc.) to execute the methods described in the various embodiments of this application.
  • the computer program product includes one or more computer instructions.
  • the computer may be a general-purpose computer, a special-purpose computer, a computer network, or other programmable device.
  • the computer instructions may be stored in a computer-readable storage medium or transmitted from one computer-readable storage medium to another; for example, the computer instructions may be transmitted from one website, computer, training device, or data center to another website, computer, training device, or data center by wired means (such as coaxial cable, optical fiber, or digital subscriber line (DSL)) or wireless means (such as infrared, radio, or microwave).
  • the computer-readable storage medium may be any available medium that a computer can store, or a data storage device, such as a training device or a data center, that integrates one or more available media.
  • the available media may be magnetic media (e.g., floppy disk, hard disk, magnetic tape), optical media (e.g., DVD), or semiconductor media (e.g., a solid state disk (SSD)).

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Library & Information Science (AREA)
  • Physics & Mathematics (AREA)
  • Data Mining & Analysis (AREA)
  • Databases & Information Systems (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Mathematical Physics (AREA)
  • Image Analysis (AREA)

Abstract

An article matching method and a related device, by means of which artificial intelligence technology can be applied to the field of article search. The method comprises: acquiring an image input by a user, wherein a background and at least two articles are present in the image; on the basis of feature information of the image and feature information of the at least two articles, acquiring, by means of a first neural network, a target category having a matching relationship with the image; and displaying a target article of the target category. An article to be matched can be searched for by providing an image, and even when a complex image is input, articles of a target category having a matching relationship with the whole image can still be acquired, thereby greatly expanding the application scenarios of the present solution and helping to improve user stickiness.

Description

An article matching method and related device
This application claims priority to Chinese patent application No. 202210333006.5, filed with the China Patent Office on March 31, 2022 and entitled "An article matching method and related device", the entire content of which is incorporated herein by reference.
Technical Field
This application relates to the field of artificial intelligence, and in particular to an article matching method and related device.
Background
Artificial intelligence (AI) is a branch of computer science that attempts to understand the essence of intelligence and to produce a new kind of intelligent machine that can respond in a manner similar to human intelligence. With the development of AI technology, item search is a common application of AI.
A common item search solution in the industry is photo-based search: a user takes a photo of the item to be searched for, and similar items are then retrieved based on the input picture.
However, researchers have found that in most scenarios a user with a search need cannot obtain an image of an item similar to the one sought, so the search need cannot be met through photo-based search. Conversely, the user can often obtain the items with which the sought item is to be matched; for example, the user may want to search for pants that match a given top.
Summary
Embodiments of this application provide an article matching method and related devices. Even when the user inputs a complex image to be processed (that is, an image containing at least two items), a target category of items that matches the entire image can still be obtained, which greatly expands the application scenarios of this solution and helps improve user stickiness.
To solve the above technical problem, embodiments of this application provide the following technical solutions:
In a first aspect, an embodiment of this application provides an article matching method that applies artificial intelligence technology to the field of item search. The method includes: a client device obtains an image to be processed that is input by a user, where the image to be processed contains a background and at least two items; a server or the client device obtains, through a first neural network and based on feature information of the image to be processed and feature information of the at least two items in it, a target category of items that has a matching relationship with the image to be processed; and the client device displays items of the target category to the user.
In this implementation, the user can provide an image of the scene in which the sought item will be used (that is, the image to be processed), a target category that matches the entire image to be processed is obtained through the first neural network, and target items of that category are then displayed to the user. With this solution, the user can search for items to match simply by providing an image, and even when the input is a complex image (that is, an image containing at least two items), a target category of items that matches the entire image can still be obtained, which greatly expands the application scenarios of this solution and helps improve user stickiness. In addition, the target category is determined based on both the feature information of the entire image to be processed and the feature information of the items in it; because not only the information of the whole image but also every object in it is taken into account, the accuracy of the determined target category is improved.
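The first-aspect flow described above (obtain the image, run the first neural network on whole-image and per-item features, display items of the matching category) can be illustrated with a minimal sketch. The "networks" below are simple hand-written stand-ins, not the trained models of the embodiments, and all names and rules are illustrative assumptions:

```python
# Illustrative sketch of the first-aspect pipeline: the "networks" are
# stand-ins (simple rule tables), not the patent's trained models.

def extract_features(image):
    """Stand-in for the third neural network: returns whole-image
    features plus per-item feature information."""
    return {
        "image_features": {"style": image["style"], "scene": image["scene"]},
        "item_features": [
            {"category": it["category"], "color": it["color"]}
            for it in image["items"]
        ],
    }

def predict_target_category(features):
    """Stand-in for the first neural network: picks one category that
    matches the whole image, here via a hand-written rule table."""
    rules = {
        ("living room", "sofa"): "coffee table",
        ("outfit", "top"): "pants",
    }
    scene = features["image_features"]["scene"]
    for item in features["item_features"]:
        if (scene, item["category"]) in rules:
            return rules[(scene, item["category"])]
    return "decorative item"  # fallback category

def display(category):
    return f"Recommended category: {category}"

# A complex image to be processed: a background plus at least two items.
image = {
    "style": "modern",
    "scene": "living room",
    "items": [
        {"category": "sofa", "color": "gray"},
        {"category": "lamp", "color": "white"},
    ],
}
result = display(predict_target_category(extract_features(image)))
print(result)  # Recommended category: coffee table
```

Note that the category prediction consumes both the whole-image features and the per-item features, mirroring the two inputs the first aspect feeds to the first neural network.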
In a possible implementation of the first aspect, the method further includes: the server or the client device inputs the image to be processed into a third neural network, which performs feature extraction on the image to obtain target feature information corresponding to the image. The target feature information includes feature information of the at least two items in the image and feature information of the image itself. The feature information of the image to be processed is that of the whole formed by the background and the at least two items, that is, the feature information obtained by treating the image as a single whole during feature extraction; for example, it may include texture information, color information, contour information, style information, scene information, or other types of feature information of the image. The feature information of the at least two items in the image may also be called the semantic label set of the image and may include attribute information of each item, where the attribute information of each item includes any one or more of the following: the category of the item, the color of the item, and the position of the item in the image; optionally, it may further include the style of the item, the material of the item, the pattern of the item, or other feature information.
In this implementation, the feature information of the image to be processed refers to the features extracted by treating the image as a whole, while the feature information of the at least two items may include attribute information of each item. This further refines the two concepts and makes them easier to distinguish. Moreover, since the feature information of each item includes information such as the item's category, color, style, material, or pattern, the information about the objects in the image is fully considered during feature extraction, which helps improve the accuracy of the determined target category.
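The split described above (whole-image features on one side, a per-item semantic label set on the other) can be made concrete with a small container type. The field names below are assumptions chosen for illustration, not the patent's data format:

```python
from dataclasses import dataclass, field

@dataclass
class ItemLabel:
    """Attribute information (semantic label) of one detected item."""
    category: str
    color: str
    bbox: tuple          # position of the item in the image: (x, y, w, h)
    style: str = ""      # optional extra attributes
    material: str = ""

@dataclass
class TargetFeatureInfo:
    """Output of the feature-extraction step: whole-image features plus
    the semantic label set of the items in the image."""
    image_features: dict = field(default_factory=dict)  # texture, color, style, scene...
    item_labels: list = field(default_factory=list)     # one ItemLabel per item

info = TargetFeatureInfo(
    image_features={"style": "nordic", "scene": "bedroom"},
    item_labels=[
        ItemLabel("bed", "white", (10, 40, 200, 120)),
        ItemLabel("curtain", "beige", (0, 0, 60, 180), material="linen"),
    ],
)
assert len(info.item_labels) >= 2  # a complex image has at least two items
```

Keeping the two kinds of feature information in separate fields mirrors the distinction the implementation draws between the image as a whole and the objects inside it.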
In a possible implementation of the first aspect, the step in which the server or the client device obtains, through the first neural network and based on the feature information of the image to be processed and the feature information of the at least two items, a target category that matches the image includes: the server or the client device generates, through the first neural network, M candidate intents corresponding to the image, where M is an integer greater than or equal to 2 and each candidate intent indicates a category of items that matches the image; the client device displays the M candidate intents to the user to obtain a feedback operation on them; and the client device determines, according to the feedback operation on the M candidate intents, a target category that matches the image. The "feedback operation" may be a selection of one of the M candidate intents, or the user may manually enter a new search intent.
In this implementation, M candidate intents are first generated through the first neural network, and the target category that matches the image is then determined based on the feedback operation the user inputs for those candidate intents; that is, the user's search intent is guided interactively, which helps improve the accuracy of the determined target category.
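The interactive guidance above (generate M candidate intents, collect a feedback operation, settle on one target category) might look like the following sketch. The candidate lists and the feedback encoding are illustrative assumptions:

```python
def generate_candidate_intents(image_desc, m=3):
    """Stand-in for the first neural network's intent generation:
    returns M candidate categories that could match the image."""
    catalog = {
        "living room": ["coffee table", "rug", "floor lamp"],
        "outfit": ["pants", "shoes", "bag"],
    }
    return catalog.get(image_desc, ["decorative item"])[:m]

def resolve_target_category(candidates, feedback):
    """The feedback operation is either the index of a chosen candidate
    or a manually typed new search intent (a string)."""
    if isinstance(feedback, int):
        return candidates[feedback]
    return feedback  # user typed a new intent

cands = generate_candidate_intents("living room")
assert cands == ["coffee table", "rug", "floor lamp"]
assert resolve_target_category(cands, 1) == "rug"          # user selected a candidate
assert resolve_target_category(cands, "wall art") == "wall art"  # user typed a new intent
```

Both feedback paths end in a single target category, which is what the downstream retrieval step consumes.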
In a possible implementation of the first aspect, the method further includes: the client device obtains target text information input by the user, where the target text information indicates the user's search intent; and the server or the client device inputs the text information into a fourth neural network, which performs feature extraction on it to obtain feature information of the text. The step in which the server or the client device obtains, through the first neural network and based on the feature information of the image and of the at least two items, a target category that matches the image then includes: inputting the feature information of the image, the feature information of the at least two items, and the feature information of the text into the first neural network to obtain, through the first neural network, a target category that matches the image.
In this implementation, target text information indicating the user's search intent can also be obtained, and its feature information is input into the neural network together with the target feature information. In other words, when determining what matches the image to be processed, the solution not only makes full use of the information in the image but also incorporates the text information indicating the user's search intent, which further improves the accuracy of the determined candidate intents.
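One simple reading of "inputting the image features, the item features, and the text features together" is to combine them into a single input vector before the category prediction head. The dimensions and the averaging of per-item vectors below are assumptions, since the patent does not fix a fusion scheme:

```python
def fuse_features(image_vec, item_vecs, text_vec):
    """Concatenate whole-image features, a pooled (averaged) item
    feature vector, and text features into one input vector."""
    n = len(item_vecs)
    dim = len(item_vecs[0])
    pooled = [sum(v[i] for v in item_vecs) / n for i in range(dim)]
    return image_vec + pooled + text_vec

image_vec = [0.2, 0.8]                # whole-image features
item_vecs = [[1.0, 0.0], [0.0, 1.0]]  # one feature vector per detected item
text_vec = [0.5]                      # encoded search-intent text

fused = fuse_features(image_vec, item_vecs, text_vec)
assert fused == [0.2, 0.8, 0.5, 0.5, 0.5]
```

Averaging keeps the fused vector a fixed length regardless of how many items the image contains, which is one common way to handle a variable number of detected objects.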
In a possible implementation of the first aspect, the step in which the client device obtains, through the first neural network, items of a target category that match the image includes: the server obtains, through the first neural network, N candidate items that match the image, where each candidate item belongs to the target category and N is an integer greater than 1; the server generates, through a second neural network, a target score for each of the N candidate items, where the target score indicates the degree of match between the candidate item and the image, that is, an aesthetic score for the rendering that combines the candidate item with the image to be processed; and the server selects K target items from the N candidate items according to their target scores, where K is an integer greater than or equal to 1. Displaying items of the target category on the client device then includes: the client device displays the K target items.
In this implementation, a neural network generates a score for each of the N candidate items that indicates the degree of match between the candidate and the image to be processed, and the target items finally shown to the user are selected from the N candidates according to these scores. In other words, the aesthetic quality of combining each candidate item with the image is quantified and taken into account when selecting the target items, so the combined renderings of the target items and the image presented to the user look better, which helps improve user stickiness.
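Selecting K target items from N scored candidates is a plain top-K by score. The scoring function below is a toy stand-in for the second neural network (it just rewards style agreement); only the selection logic is the point:

```python
def score_candidate(candidate, image_style):
    """Toy stand-in for the second neural network's aesthetic score:
    reward style agreement between candidate item and image."""
    return 1.0 if candidate["style"] == image_style else 0.3

def select_top_k(candidates, image_style, k):
    """Score all N candidates, then keep the K best."""
    scored = [(score_candidate(c, image_style), c["name"]) for c in candidates]
    scored.sort(key=lambda t: t[0], reverse=True)  # stable sort preserves input order on ties
    return [name for _, name in scored[:k]]

candidates = [
    {"name": "glass table", "style": "modern"},
    {"name": "oak table", "style": "rustic"},
    {"name": "steel table", "style": "modern"},
]
top = select_top_k(candidates, "modern", k=2)
assert top == ["glass table", "steel table"]
```

In the embodiments the score would come from a trained network fed with the candidate images, their semantic labels, the image to be processed, and its item labels; the top-K step afterwards is the same.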
In a possible implementation of the first aspect, generating the target scores for the N candidate items through the second neural network includes: inputting the image of each candidate item, the semantic label of each candidate item, the image to be processed, and the semantic labels of the items in the image into the second neural network, and obtaining the target score for each candidate item output by the second neural network. The semantic labels of the items in the image may also be called the feature information of those items. The semantic label of a candidate item may include at least one piece of attribute information of the candidate item; for example, it may include any one or more of the following: the category of the candidate item, the style of the candidate item, the shape of the candidate item, or other attributes of the candidate item.
In a possible implementation of the first aspect, displaying the items of the target category to the user includes: the client device displays to the user a rendering that combines the target-category items with the image to be processed. The rendering may be a plain image, a VR-modeled rendering, an AR-modeled rendering, or another format. Optionally, the client device may also display any one or more of the following for each target-category item: its access link, name, price, target score, or other information; this is not limited here.
In this implementation, the user is shown a rendering that combines each target-category item with the image to be processed, so the user can more intuitively see how the item would look when applied to the scene in the image, which helps improve user stickiness.
In a second aspect, an embodiment of this application provides an article matching method that applies artificial intelligence technology to the field of item search. The method includes: a client device obtains an image to be processed that is input by a user, where the image contains a background and at least two items; the client device receives, from a server, items of a target category that match the image, where the target-category items are obtained by the server based on feature information of the image and feature information of the at least two items; and the client device displays the items of the target category.
In a possible implementation of the second aspect, the feature information of the image includes feature information of the whole formed by the background and the at least two items; the feature information of the at least two items includes attribute information of each item, and the attribute information of each item includes any one or more of the following: the category of the item, the color of the item, the style of the item, the material of the item, or the pattern of the item.
In a possible implementation of the second aspect, the client device receives, from the server, M candidate intents corresponding to the image to be processed and displays them to the user, where M is an integer greater than or equal to 2 and each candidate intent indicates a category of items that matches the image; the client device obtains a feedback operation on the M candidate intents, determines, according to that feedback, a target category that matches the image, and sends the target category to the server.
In the second aspect of this application, the client device may also be configured to perform the steps performed by the client device in the first aspect and its possible implementations. For the specific implementation of the steps, the meaning of the terms, and the resulting beneficial effects in each possible implementation of the second aspect, refer to the first aspect; details are not repeated here.
In a third aspect, an embodiment of this application provides an article matching method that applies artificial intelligence technology to the field of item search. The method includes: a server obtains, through a first neural network and based on feature information of an image to be processed and feature information of at least two items, a target category that matches the image, where the image contains a background and the at least two items; and the server sends information about items of the target category to a client device.
In a possible implementation of the third aspect, the step in which the server obtains, through the first neural network and based on the feature information of the image and of the at least two items, a target category that matches the image includes: the server generates, through the first neural network, M candidate intents corresponding to the image, where M is an integer greater than or equal to 2 and each candidate intent indicates a category of items that matches the image; the server sends the M candidate intents to the client device, which uses them to determine a target category that matches the image; and the server receives the target category sent by the client device.
In the third aspect of this application, the server may also be configured to perform the steps performed by the server in the first aspect and its possible implementations. For the specific implementation of the steps, the meaning of the terms, and the resulting beneficial effects in each possible implementation of the third aspect, refer to the first aspect; details are not repeated here.
In a fourth aspect, an embodiment of this application provides an article matching apparatus that applies artificial intelligence technology to the field of item search. The apparatus is applied to a client device in an article matching system that also includes a server, and includes: an acquisition module configured to obtain an image input by a user, where the image contains a background and at least two items; a receiving module configured to receive, from the server, items of a target category that match the image, where the target-category items are obtained by the server based on feature information of the image and feature information of the at least two items; and a display module configured to display the items of the target category.
In the fourth aspect of this application, the article matching apparatus may also be configured to perform the steps performed by the client device in the second aspect and its possible implementations. For the specific implementation of the steps, the meaning of the terms, and the resulting beneficial effects in each possible implementation of the fourth aspect, refer to the second aspect; details are not repeated here.
In a fifth aspect, an embodiment of this application provides an article matching apparatus that applies artificial intelligence technology to the field of item search. The apparatus is applied to a server in an article matching system that also includes a client device, and includes: an acquisition module configured to obtain, through a first neural network and based on feature information of an image and feature information of at least two items, items of a target category that match the image, where the image contains a background and the at least two items; and a sending module configured to send information about the target-category items to the client device.
In the fifth aspect of this application, the article matching apparatus may also be configured to perform the steps performed by the server in the third aspect and its possible implementations. For the specific implementation of the steps, the meaning of the terms, and the resulting beneficial effects in each possible implementation of the fifth aspect, refer to the third aspect; details are not repeated here.
In a sixth aspect, an embodiment of this application provides a computer program product that includes a program; when the program runs on a computer, it causes the computer to execute the article matching method of the second or third aspect.
In a seventh aspect, an embodiment of this application provides a computer-readable storage medium storing a computer program; when the program runs on a computer, it causes the computer to execute the article matching method of the second or third aspect.
In an eighth aspect, an embodiment of this application provides a client device including a processor and a memory, where the processor is coupled to the memory; the memory is configured to store a program, and the processor is configured to execute the program in the memory so that the client device performs the methods performed by the client device in the above aspects.
In a ninth aspect, an embodiment of this application provides a server including a processor and a memory, where the processor is coupled to the memory; the memory is configured to store a program, and the processor is configured to execute the program in the memory so that the server performs the methods performed by the server in the above aspects.
In a tenth aspect, this application provides a chip system including a processor, configured to support a terminal device or communication device in implementing the functions involved in the above aspects, for example, sending or processing the data and/or information involved in the above methods. In a possible design, the chip system further includes a memory configured to store the program instructions and data necessary for the terminal device or communication device. The chip system may consist of a chip, or may include a chip and other discrete components.
Description of Drawings
Figure 1a is a schematic structural diagram of the artificial intelligence main framework according to an embodiment of this application;

Figure 1b is a diagram of an application scenario of the article matching method according to an embodiment of this application;

Figure 2a is a system architecture diagram of the article matching system according to an embodiment of this application;

Figure 2b is a schematic flowchart of the article matching method according to an embodiment of this application;

Figure 3 is a schematic flowchart of the article matching method according to an embodiment of this application;

Figure 4 is a schematic diagram of an interface for obtaining the image to be processed and the target text information in the article matching method according to an embodiment of this application;

Figure 5 is a schematic diagram of the first feature extraction network in the article matching method according to an embodiment of this application;

Figure 6 is a schematic diagram of displaying M candidate intents in the article matching method according to an embodiment of this application;

Figure 7 is a schematic flowchart of obtaining the target category in the article matching method according to an embodiment of this application;

Figure 8 is a schematic diagram of the target score in the article matching method according to an embodiment of this application;

Figure 9 is a schematic diagram of the second neural network in the article matching method according to an embodiment of this application;

Figure 10 is a schematic diagram of a rendering that combines the target item with the image to be processed in the article matching method according to an embodiment of this application;

Figure 11 is a schematic flowchart of the article matching method according to an embodiment of this application;

Figure 12 is a schematic flowchart of the article matching method according to an embodiment of this application;
图13为本申请实施例提供的物品的搭配方法的一种流程示意图;Figure 13 is a schematic flowchart of a method for matching items provided by an embodiment of the present application;
图14为本申请实施例提供的物品的搭配装置的一种结构示意图;Figure 14 is a schematic structural diagram of an item matching device provided by an embodiment of the present application;
图15为本申请实施例提供的物品的搭配装置的一种结构示意图;Figure 15 is a schematic structural diagram of an item matching device provided by an embodiment of the present application;
图16为本申请实施例提供的客户设备的一种结构示意图;Figure 16 is a schematic structural diagram of a client device provided by an embodiment of the present application;
图17是本申请实施例提供的服务器一种结构示意图;Figure 17 is a schematic structural diagram of a server provided by an embodiment of the present application;
图18为本申请实施例提供的芯片的一种结构示意图。Figure 18 is a schematic structural diagram of a chip provided by an embodiment of the present application.
具体实施方式Detailed description of embodiments
本申请的说明书和权利要求书及上述附图中的术语“第一”、“第二”等是用于区别类似的对象，而不必用于描述特定的顺序或先后次序。应该理解这样使用的术语在适当情况下可以互换，这仅仅是描述本申请的实施例中对相同属性的对象在描述时所采用的区分方式。此外，术语“包括”和“具有”以及他们的任何变形，意图在于覆盖不排他的包含，以便包含一系列单元的过程、方法、系统、产品或设备不必限于那些单元，而是可包括没有清楚地列出的或对于这些过程、方法、产品或设备固有的其它单元。The terms "first", "second", etc. in the description and claims of this application and in the above drawings are used to distinguish similar objects and are not necessarily used to describe a specific order or sequence. It should be understood that the terms so used are interchangeable under appropriate circumstances; this is merely the way of distinguishing objects with the same attributes when describing the embodiments of the present application. Furthermore, the terms "include" and "have" and any variations thereof are intended to cover non-exclusive inclusion, so that a process, method, system, product or device comprising a series of elements is not necessarily limited to those elements, but may include other elements not explicitly listed or inherent to such processes, methods, products or devices.
下面结合附图,对本申请的实施例进行描述。本领域普通技术人员可知,随着技术的发展和新场景的出现,本申请实施例提供的技术方案对于类似的技术问题,同样适用。The embodiments of the present application are described below with reference to the accompanying drawings. Persons of ordinary skill in the art know that with the development of technology and the emergence of new scenarios, the technical solutions provided in the embodiments of this application are also applicable to similar technical problems.
首先对人工智能系统总体工作流程进行描述，请参见图1a，图1a示出的为人工智能主体框架的一种结构示意图，下面从“智能信息链”（水平轴）和“IT价值链”（垂直轴）两个维度对上述人工智能主体框架进行阐述。其中，“智能信息链”反映从数据的获取到处理的一系列过程。举例来说，可以是智能信息感知、智能信息表示与形成、智能推理、智能决策、智能执行与输出的一般过程。在这个过程中，数据经历了“数据—信息—知识—智慧”的凝练过程。“IT价值链”从人工智能的底层基础设施、信息（提供和处理技术实现）到系统的产业生态过程，反映人工智能为信息技术产业带来的价值。First, the overall workflow of the artificial intelligence system is described. Please refer to Figure 1a, which shows a schematic structural diagram of the artificial intelligence main framework. This framework is described below along two dimensions: the "intelligent information chain" (horizontal axis) and the "IT value chain" (vertical axis). The "intelligent information chain" reflects a series of processes from data acquisition to processing, for example the general process of intelligent information perception, intelligent information representation and formation, intelligent reasoning, intelligent decision-making, and intelligent execution and output. In this process, the data goes through the condensation process of "data - information - knowledge - wisdom". The "IT value chain" reflects the value that artificial intelligence brings to the information technology industry, from the underlying infrastructure and information (providing and processing technology implementations) of artificial intelligence to the industrial ecological process of the system.
(1)基础设施(1)Infrastructure
基础设施为人工智能系统提供计算能力支持，实现与外部世界的沟通，并通过基础平台实现支撑。通过传感器与外部沟通；计算能力由智能芯片提供，该智能芯片具体可以采用中央处理器(central processing unit，CPU)、嵌入式神经网络处理器(neural-network processing unit，NPU)、图形处理器(graphics processing unit，GPU)、专用集成电路(application specific integrated circuit，ASIC)或现场可编程门阵列(field programmable gate array，FPGA)等硬件加速芯片；基础平台包括分布式计算框架及网络等相关的平台保障和支持，可以包括云存储和计算、互联互通网络等。举例来说，传感器和外部沟通获取数据，这些数据提供给基础平台提供的分布式计算系统中的智能芯片进行计算。Infrastructure provides computing power support for the artificial intelligence system, enables communication with the external world, and provides support through a basic platform. Communication with the outside is performed through sensors. Computing power is provided by smart chips, which may specifically be hardware acceleration chips such as a central processing unit (CPU), a neural-network processing unit (NPU), a graphics processing unit (GPU), an application specific integrated circuit (ASIC) or a field programmable gate array (FPGA). The basic platform includes related platform assurance and support such as a distributed computing framework and networks, and may include cloud storage and computing, interconnection networks, and so on. For example, sensors communicate with the outside world to obtain data, and the data is provided to the smart chips in the distributed computing system provided by the basic platform for computation.
(2)数据(2)Data
基础设施的上一层的数据用于表示人工智能领域的数据来源。数据涉及到图形、图像、语音、文本，还涉及到传统设备的物联网数据，包括已有系统的业务数据以及力、位移、液位、温度、湿度等感知数据。Data in the layer above the infrastructure is used to represent the data sources in the field of artificial intelligence. The data involves graphics, images, voice and text, and also involves Internet-of-Things data from traditional devices, including business data of existing systems and sensed data such as force, displacement, liquid level, temperature and humidity.
(3)数据处理(3)Data processing
数据处理通常包括数据训练,机器学习,深度学习,搜索,推理,决策等方式。Data processing usually includes data training, machine learning, deep learning, search, reasoning, decision-making and other methods.
其中,机器学习和深度学习可以对数据进行符号化和形式化的智能信息建模、抽取、预处理、训练等。Among them, machine learning and deep learning can perform symbolic and formal intelligent information modeling, extraction, preprocessing, training, etc. on data.
推理是指在计算机或智能系统中,模拟人类的智能推理方式,依据推理控制策略,利用形式化的信息进行机器思维和求解问题的过程,典型的功能是搜索与匹配。Reasoning refers to the process of simulating human intelligent reasoning in computers or intelligent systems, using formal information to perform machine thinking and problem solving based on reasoning control strategies. Typical functions are search and matching.
决策是指智能信息经过推理后进行决策的过程,通常提供分类、排序、预测等功能。Decision-making refers to the process of decision-making after intelligent information is reasoned, and usually provides functions such as classification, sorting, and prediction.
(4)通用能力(4) General ability
对数据经过上面提到的数据处理后，进一步基于数据处理的结果可以形成一些通用的能力，比如可以是算法或者一个通用系统，例如，翻译，文本的分析，计算机视觉的处理，语音识别，图像的识别等等。After the data undergoes the data processing mentioned above, some general capabilities can further be formed based on the results of the data processing, for example an algorithm or a general system, such as translation, text analysis, computer vision processing, speech recognition, image recognition, and so on.
(5)智能产品及行业应用(5) Intelligent products and industry applications
智能产品及行业应用指人工智能系统在各领域的产品和应用，是对人工智能整体解决方案的封装，将智能信息决策产品化、实现落地应用，其应用领域主要包括：智能终端、智能制造、智能交通、智能家居、智能医疗、智能安防、自动驾驶、智慧城市等。Intelligent products and industry applications refer to the products and applications of the artificial intelligence system in various fields; they are the encapsulation of the overall artificial intelligence solution, productizing intelligent information decision-making and realizing practical applications. The application fields mainly include intelligent terminals, intelligent manufacturing, intelligent transportation, smart home, intelligent healthcare, intelligent security, autonomous driving, smart cities, and so on.
本申请实施例可以应用于人工智能领域的各个应用场景中，具体的，可以应用于利用图片进行物品搜索的应用场景中。作为示例，请参阅图1b，图1b为本申请实施例提供的物品的搭配方法的一种应用场景图，如图1b所示，用户在使用购物类应用程序时，当用户点击A1示出的图标时，可以输入待处理图像，以搜索并购买与待处理图像具有搭配关系的一种类别的物品。The embodiments of this application can be applied to various application scenarios in the field of artificial intelligence, and specifically to application scenarios in which pictures are used to search for items. As an example, please refer to Figure 1b, which is an application scenario diagram of the item matching method provided by an embodiment of the present application. As shown in Figure 1b, when using a shopping application, the user can click the icon indicated by A1 to input an image to be processed, in order to search for and purchase items of a category that has a matching relationship with the image to be processed.
作为另一示例，例如用户在使用装修设计类的应用程序时，可以输入待处理图像，以搜索与待处理图像具有搭配关系的一种类别的物品等，应理解，本申请实施例还可以应用于其他获取与待处理图像具有搭配关系的物品的场景中，此处不再对其他应用场景进行一一列举。As another example, when using a decoration design application, the user can input an image to be processed to search for items of a category that has a matching relationship with the image to be processed. It should be understood that the embodiments of the present application can also be applied to other scenarios of obtaining items that have a matching relationship with an image to be processed; other application scenarios are not listed one by one here.
结合上述说明，先对本申请实施例提供的物品的搭配系统进行描述，请参阅图2a，图2a为本申请实施例提供的物品的搭配系统的一种系统架构图，物品的搭配系统200包括训练设备210、数据库220、执行设备230、数据存储系统240和客户设备250，执行设备230中包括计算模块231。In conjunction with the above description, the item matching system provided by the embodiments of the present application is described first. Please refer to Figure 2a, which is a system architecture diagram of the item matching system provided by an embodiment of the present application. The item matching system 200 includes a training device 210, a database 220, an execution device 230, a data storage system 240 and a client device 250; the execution device 230 includes a computing module 231.
其中，数据库220中存储有第一训练数据集合，训练设备210生成第一模型/规则201，并利用数据库中的第一训练数据集合对第一模型/规则201进行迭代训练，得到成熟的第一模型/规则201。第一模型/规则201可以具体表现为第一神经网络或非神经网络形式的模型，本申请实施例中仅以第一模型/规则201为第一神经网络为例进行说明。A first training data set is stored in the database 220. The training device 210 generates the first model/rule 201 and uses the first training data set in the database to iteratively train the first model/rule 201 to obtain a mature first model/rule 201. The first model/rule 201 may be embodied as a first neural network or as a model in a non-neural-network form; in the embodiments of this application, the case where the first model/rule 201 is a first neural network is taken as an example for description.
执行设备230可以调用数据存储系统240中的数据、代码等,也可以将数据、指令等存入数据存储系统240中。数据存储系统240可以置于执行设备230中,也可以为数据存储系统240相对执行设备230是外部存储器。 The execution device 230 can call data, codes, etc. in the data storage system 240, and can also store data, instructions, etc. in the data storage system 240. The data storage system 240 may be placed in the execution device 230 , or the data storage system 240 may be an external memory relative to the execution device 230 .
训练设备210得到的训练后的第一模型/规则201可以部署于执行设备230中，执行设备230可以表现为与客户设备250上部署的应用程序对应的服务器。执行设备230的计算模块231可以通过第一模型/规则201获取与待处理图像具有匹配关系的一种目标类别，其中，待处理图像是通过客户设备250得到的，该目标类别指示与待处理图像具有搭配关系的一种目标类别的物品。The trained first model/rule 201 obtained by the training device 210 can be deployed in the execution device 230, and the execution device 230 can be embodied as a server corresponding to the application program deployed on the client device 250. The computing module 231 of the execution device 230 can obtain, through the first model/rule 201, a target category that has a matching relationship with the image to be processed, where the image to be processed is obtained through the client device 250, and the target category indicates a category of items that has a matching relationship with the image to be processed.
客户设备250可以表现为各种形态的终端设备,例如手机、平板、笔记本电脑、虚拟现实(virtual reality,VR)设备或增强现实(augmented reality,AR)设备等等。The client device 250 can be represented by various forms of terminal devices, such as mobile phones, tablets, laptops, virtual reality (VR) devices or augmented reality (AR) devices, etc.
本申请的一些实施例中,请参阅图2a,执行设备230和客户设备250可以为分别独立的设备,执行设备230配置有输入/输出(I/O)接口,与客户设备250进行数据交互,“用户”可以通过客户设备250向I/O接口输入待处理图像,执行设备230通过I/O接口将与待处理图像具有搭配关系的目标类别的物品返回给客户设备250,提供给用户。In some embodiments of the present application, please refer to Figure 2a. The execution device 230 and the client device 250 may be independent devices. The execution device 230 is configured with an input/output (I/O) interface for data interaction with the client device 250. The "user" can input the image to be processed to the I/O interface through the client device 250, and the execution device 230 returns the items of the target category that have a matching relationship with the image to be processed to the client device 250 through the I/O interface, and provides them to the user.
值得注意的，图2a仅是本发明实施例提供的两种物品的搭配系统的架构示意图，图中所示设备、器件、模块等之间的位置关系不构成任何限制。例如，在本申请的另一些实施例中，执行设备230也可以和客户设备250集成于同一设备中，此处不做限定。It is worth noting that Figure 2a is only a schematic architecture diagram of the two item matching systems provided by the embodiments of the present invention, and the positional relationships between the devices, components, modules, etc. shown in the figure do not constitute any limitation. For example, in other embodiments of the present application, the execution device 230 and the client device 250 may also be integrated into the same device, which is not limited here.
在图2a示出的物品的搭配系统中,请继续参阅图2b,图2b为本申请实施例提供的物品的搭配方法的一种流程示意图。S1、获取用户输入的待处理图像,待处理图像中存在背景与至少两个物品。S2、基于待处理图像的特征信息和该至少两个物品的特征信息,通过第一神经网络获取与待处理图像具有搭配关系的一种目标类别的物品。S3、展示该目标类别的物品。In the item matching system shown in Figure 2a, please continue to refer to Figure 2b. Figure 2b is a schematic flow chart of the item matching method provided by an embodiment of the present application. S1. Obtain the image to be processed input by the user. There is a background and at least two items in the image to be processed. S2. Based on the characteristic information of the image to be processed and the characteristic information of the at least two items, obtain an item of a target category that has a matching relationship with the image to be processed through the first neural network. S3. Display items of the target category.
本申请实施例中，用户不仅可以通过提供待处理图像来搜索想要搭配的物品，且当用户输入的是复杂的待处理图像（也即包括至少两个物品的图像）时，依旧能够获取到与整个待处理图像匹配的一种目标类别的物品，大大扩展了本方案的应用场景，有利于提高本方案的用户粘度。In the embodiments of this application, the user can search for items to match by providing an image to be processed, and even when the user inputs a complex image to be processed (that is, an image including at least two items), an item of a target category that matches the entire image to be processed can still be obtained. This greatly expands the application scenarios of this solution and helps improve its user stickiness.
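For ease of understanding, the S1-S3 flow above can be sketched in a few lines of Python. This is a hypothetical illustration only: the dictionary-based "features" and the scoring rule inside match_category are stand-ins for the trained neural networks described in this application, not the claimed implementation.

```python
def extract_features(image):
    """Placeholder for S2's feature extraction: returns whole-image feature
    information plus the category of each detected item (assumed fields)."""
    return {"image": image["style"],
            "items": [it["category"] for it in image["items"]]}

def match_category(features, candidate_categories):
    """S2: choose the candidate category that best matches the whole image.
    The toy rule below (prefer categories absent from the image) merely
    stands in for the first neural network's learned scoring."""
    def score(cat):
        return 0.0 if cat in features["items"] else 1.0
    return max(candidate_categories, key=score)

# S1: the image to be processed contains a background style and at least two items
image = {"style": "modern", "items": [{"category": "bed"}, {"category": "desk"}]}
features = extract_features(image)
target = match_category(features, ["bed", "nightstand", "desk"])  # S2
print(target)  # S3: display the target category, here 'nightstand'
```

In a real system both functions would be replaced by the trained networks; the sketch only shows the shape of the data flowing between steps S1, S2 and S3.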
结合上述描述，下面开始对本申请实施例提供的第一神经网络的推理阶段的具体实现流程进行描述。本申请实施例中，物品的搭配系统可以包括客户设备和服务器，“获取与待处理图像具有匹配关系的一种目标类别”的过程中，可以包括对图像进行特征提取和根据提取到的特征确定目标类别两个部分。In conjunction with the above description, the specific implementation process of the inference phase of the first neural network provided by the embodiments of this application is described below. In the embodiments of this application, the item matching system may include a client device and a server, and the process of "obtaining a target category that has a matching relationship with the image to be processed" may include two parts: performing feature extraction on the image, and determining the target category according to the extracted features.
具体的，在一种实现方式中，前述两个部分可以完全由服务器主导，也即第一神经网络的执行设备和客户设备是分离的；在另一种实现方式中，前述两个部分的操作可以完全由客户设备主导，也即第一神经网络的执行设备和客户设备集成于同一设备；在另一种实现方式中，可以在客户设备上执行特征提取操作，并且由服务器主导确定目标类别的操作，则第一神经网络的执行设备和客户设备也是分离的。由于前述三种实现方式的具体实现流程有所不同，以下分别进行描述。Specifically, in one implementation, the aforementioned two parts may be led entirely by the server, that is, the execution device of the first neural network and the client device are separate; in another implementation, the operations of the aforementioned two parts may be led entirely by the client device, that is, the execution device of the first neural network and the client device are integrated into the same device; in yet another implementation, the feature extraction operation may be performed on the client device while the server leads the operation of determining the target category, in which case the execution device of the first neural network and the client device are also separate. Since the specific implementation processes of the foregoing three implementations differ, they are described separately below.
(一)特征提取和确定目标类别这两个部分均由服务器主导(1) Feature extraction and target category determination are both dominated by the server.
本申请实施例中,请参阅图3,图3为本申请实施例提供的物品的搭配方法的一种流程示意图,本申请实施例提供的物品的搭配方法可以包括:In the embodiment of the present application, please refer to Figure 3. Figure 3 is a schematic flowchart of a method of matching items provided by an embodiment of the present application. The method of matching items provided by an embodiment of the present application may include:
301、客户设备获取用户输入的待处理图像。 301. The client device obtains the image to be processed input by the user.
本申请实施例中,用户可以通过客户设备输入待处理图像,对应的,客户设备获取用户输入的待处理图像,以搜索与该待处理图像具有搭配关系的物品。In the embodiment of the present application, the user can input the image to be processed through the client device. Correspondingly, the client device obtains the image to be processed input by the user to search for items that have a matching relationship with the image to be processed.
其中，该待处理图像中可以存在一个或多个物品。进一步地，该待处理图像可以为用户从客户设备本地存储的图像中选取的一个图像，也可以为用户利用客户设备上的摄像机拍摄的一个图像，也可以为用户利用浏览器下载的图像等等，此处不做限定。One or more items may exist in the image to be processed. Further, the image to be processed may be an image selected by the user from images stored locally on the client device, an image captured by the user using a camera on the client device, an image downloaded by the user using a browser, etc., which is not limited here.
302、客户设备获取用户输入的目标文本信息,该目标文本信息用于指示用户的搜索意图。302. The client device obtains the target text information input by the user, and the target text information is used to indicate the user's search intention.
本申请的一些实施例中,客户端设备还可以获取用户输入的目标文本信息,该目标文本信息用于指示用户的搜索意图。进一步地,该目标文本信息所指示的物品可以为待处理图像中的物品,也可以不是待处理图像中的物品。In some embodiments of the present application, the client device can also obtain target text information input by the user, and the target text information is used to indicate the user's search intention. Further, the item indicated by the target text information may be an item in the image to be processed, or may not be an item in the image to be processed.
为了更直观地理解本方案，请参阅图4，图4为本申请实施例提供的物品的搭配方法中获取待处理图像和目标文本信息的一种界面示意图。图4包括(a)和(b)两个子示意图，在图4的(a)子示意图中，在用户通过图4的(a)子示意图中A1所指向的图标输入待处理图像后，可以触发进入图4的(b)子示意图，也即提示用户通过图4的(b)子示意图输入目标文本信息，应理解，图4中的示例仅为方便理解本方案，具体采用什么样的界面示意图可以结合实际产品形态灵活设定，此处不做限定。To understand this solution more intuitively, please refer to Figure 4, which is a schematic diagram of an interface for obtaining the image to be processed and the target text information in the item matching method provided by an embodiment of the present application. Figure 4 includes two sub-diagrams (a) and (b). In sub-diagram (a) of Figure 4, after the user inputs the image to be processed through the icon indicated by A1, entry into sub-diagram (b) of Figure 4 can be triggered, that is, the user is prompted to input the target text information through sub-diagram (b) of Figure 4. It should be understood that the example in Figure 4 is only for ease of understanding this solution; the specific interface used can be flexibly set according to the actual product form and is not limited here.
303、服务器将待处理图像输入第三神经网络，以通过第三神经网络对待处理图像进行特征提取，得到与待处理图像对应的目标特征信息，目标特征信息包括待处理图像中的物品的特征信息和待处理图像的特征信息。303. The server inputs the image to be processed into the third neural network, so as to perform feature extraction on the image to be processed through the third neural network and obtain target feature information corresponding to the image to be processed; the target feature information includes the feature information of the items in the image to be processed and the feature information of the image to be processed.
本申请的一些实施例中，客户端在获取到用户输入的待处理图像之后，可以向服务器发送该待处理图像，服务器可以将接收到的待处理图像输入第三神经网络，以通过第三神经网络对整个待处理图像进行特征提取，得到待处理图像的特征信息，待处理图像的特征信息包括由待处理图像的背景和至少两个物品构成的整体的特征信息；服务器还通过第三神经网络识别待处理图像中的各个物品区域，并对待处理图像中的物品进行特征提取，得到待处理图像中的至少两个物品的特征信息，至少两个物品的特征信息包括每个物品的属性信息。In some embodiments of this application, after obtaining the image to be processed input by the user, the client can send the image to the server, and the server can input the received image into the third neural network, so as to perform feature extraction on the entire image through the third neural network and obtain the feature information of the image to be processed, which includes the feature information of the whole composed of the background of the image and the at least two items. The server also identifies each item region in the image through the third neural network and performs feature extraction on the items in the image, obtaining the feature information of at least two items in the image, which includes the attribute information of each item.
其中，目标特征信息包括待处理图像中的至少两个物品的特征信息和待处理图像的特征信息。前述待处理图像的特征信息指的是将该待处理图像视为一个整体（也即待处理图像的背景和至少两个物品构成的整体），对待处理图像进行特征提取后得到的特征信息；作为示例，例如待处理图像的特征信息可以包括待处理图像的纹理信息、颜色信息、轮廓信息、风格信息、场景信息或其他类型的特征信息等。The target feature information includes the feature information of at least two items in the image to be processed and the feature information of the image to be processed. The aforementioned feature information of the image to be processed refers to the feature information obtained by performing feature extraction on the image while treating it as a whole (that is, the whole composed of the background of the image and the at least two items); as an example, the feature information of the image to be processed may include texture information, color information, contour information, style information, scene information or other types of feature information of the image.
待处理图像中至少两种物品的特征信息也可以称为待处理图像所对应的语义标签集合，至少两种物品的特征信息可以包括每个物品的属性信息，每个物品的属性信息包括如下任一种或多种信息：物品在待处理图像中的位置信息、物品的类别信息和物品的颜色信息；可选地，还可以包括每个物品的风格信息、物品的材质、物品的图案或其他特征信息。The feature information of the at least two items in the image to be processed can also be called the semantic label set corresponding to the image. The feature information of the at least two items can include the attribute information of each item, and the attribute information of each item includes any one or more of the following: the position information of the item in the image to be processed, the category information of the item, and the color information of the item; optionally, it may also include the style information of each item, the material of the item, the pattern of the item, or other feature information.
可选地，不同类别的物品的特征信息所包括的信息可以不同，作为示例，例如若待处理图像中包括床和上衣，床的特征信息可以包括床在待处理图像中的位置信息、床这一类别信息、床的颜色和床的风格，上衣的特征信息可以包括上衣在待处理图像中的位置信息、上衣这一类别信息、上衣的颜色、上衣的形状和上衣的材质，应理解，此处举例仅为方便理解本方案，不用于限定本方案。Optionally, the feature information of items of different categories may include different information. As an example, if the image to be processed includes a bed and a top, the feature information of the bed may include the position information of the bed in the image, the category information "bed", the color of the bed and the style of the bed, while the feature information of the top may include the position information of the top in the image, the category information "top", the color of the top, the shape of the top and the material of the top. It should be understood that the example here is only for ease of understanding this solution and is not used to limit it.
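As an illustrative aid (not part of the claimed method), the per-item attribute information described above can be pictured as a record with required and optional fields. The field names below are assumptions chosen for this sketch, not names defined by the application:

```python
from dataclasses import dataclass
from typing import Optional, Tuple

@dataclass
class ItemFeature:
    """One entry of the semantic label set: attribute information of one item."""
    bbox: Tuple[int, int, int, int]   # position of the item in the image (x1, y1, x2, y2)
    category: str                     # category information of the item
    color: str                        # color information of the item
    style: Optional[str] = None       # optional: style information
    material: Optional[str] = None    # optional: material of the item

# Example semantic label set for an image containing a bed and a top
bed = ItemFeature(bbox=(10, 40, 200, 180), category="bed", color="white", style="nordic")
top = ItemFeature(bbox=(220, 60, 300, 140), category="top", color="blue", material="cotton")
semantic_labels = [bed, top]
print([it.category for it in semantic_labels])  # ['bed', 'top']
```

The optional fields mirror the text above: different item categories may carry different attribute fields (the bed record has a style, the top record has a material).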
第三神经网络具体可以表现为卷积神经网络或其他用于进行特征提取的神经网络等。进一步地,第三神经网络可以包括第一特征提取网络和第二特征提取网络这两个不同的特征提取网络。第一特征提取网络用于生成待处理图像中的至少两个物品的特征信息,第二特征提取网络用于生成整个待处理图像的特征信息。The third neural network can specifically be embodied as a convolutional neural network or other neural networks used for feature extraction. Further, the third neural network may include two different feature extraction networks: a first feature extraction network and a second feature extraction network. The first feature extraction network is used to generate feature information of at least two items in the image to be processed, and the second feature extraction network is used to generate feature information of the entire image to be processed.
在第一特征提取网络的训练阶段，第一特征提取网络可以作为用于对图像进行目标识别的神经网络中的一部分，也即训练设备可以利用训练数据，对用于对图像进行目标识别的神经网络进行迭代训练直至满足收敛条件，在得到训练后的神经网络后，从中获取训练后的第一特征提取网络。In the training phase of the first feature extraction network, the first feature extraction network can be part of a neural network used for target recognition in images; that is, the training device can use training data to iteratively train the neural network for target recognition in images until a convergence condition is met, and after obtaining the trained neural network, obtain the trained first feature extraction network from it.
作为示例，例如用于对图像进行目标识别的神经网络可以识别图像中的茶几、餐边柜、收纳柜、鞋柜和花架，也即本申请实施例中的第一特征提取网络能够在更细粒度的层级进行特征提取。As an example, a neural network used for target recognition in images can identify coffee tables, sideboards, storage cabinets, shoe cabinets and flower stands in an image; that is, the first feature extraction network in the embodiments of the present application can perform feature extraction at a finer-grained level.
为了更直观地理解本方案，请参阅图5，图5为本申请实施例提供的物品的搭配方法中第一特征提取网络的一种示意图。如图5所示，在将待处理图像输入第一特征提取网络后，第一特征提取网络可以识别出待处理图像中的三个物品区域，并生成待处理图像中物品的特征信息，应理解，图5中的示例仅为方便理解本方案，不用于限定本方案。To understand this solution more intuitively, please refer to Figure 5, which is a schematic diagram of the first feature extraction network in the item matching method provided by an embodiment of the present application. As shown in Figure 5, after the image to be processed is input into the first feature extraction network, the first feature extraction network can identify the three item regions in the image and generate the feature information of the items in the image. It should be understood that the example in Figure 5 is only for ease of understanding this solution and is not used to limit it.
在第二特征提取网络的训练阶段，第二特征提取网络可以作为用于对整个图像进行分类的神经网络中的一部分，也即训练设备可以利用训练数据，对用于对整个图像进行分类的神经网络进行迭代训练直至满足收敛条件，在得到训练后的神经网络后，从中获取训练后的第二特征提取网络。In the training phase of the second feature extraction network, the second feature extraction network can be part of a neural network used to classify the entire image; that is, the training device can use training data to iteratively train the neural network for classifying the entire image until a convergence condition is met, and after obtaining the trained neural network, obtain the trained second feature extraction network from it.
本申请实施例中，待处理图像的特征信息指的是将该待处理图像视为一个整体，对待处理图像进行特征提取后得到的特征信息，待处理图像中至少两个物品的特征信息可以包括每个物品的属性信息，进一步细化了待处理图像的特征信息和至少两个物品的特征信息的概念，有利于更清楚地区分待处理图像的特征信息和至少两个物品的特征信息；且每个物品的特征信息中包括物品的类别、物品的颜色、物品的风格、物品的材质或物品的图案等信息，在特征提取过程中充分考虑了待处理图像中的物体的信息，有利于提高确定的目标类别的准确度。In the embodiments of this application, the feature information of the image to be processed refers to the feature information obtained by performing feature extraction on the image while treating it as a whole, and the feature information of the at least two items in the image may include the attribute information of each item. This further refines the concepts of the feature information of the image to be processed and the feature information of the at least two items, and helps distinguish the two more clearly. Moreover, the feature information of each item includes information such as the category, color, style, material or pattern of the item, so the information of the objects in the image is fully considered in the feature extraction process, which helps improve the accuracy of the determined target category.
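The division of labor between the two feature extraction networks described above can be sketched as follows. Both extractors here are toy stand-ins (assumptions for illustration), since the application does not fix a concrete network architecture; the sketch only shows how their outputs combine into the target feature information:

```python
def first_extraction(item_categories):
    """Item-level extractor (stand-in for the first feature extraction network):
    one feature record per detected item region, at a fine-grained level."""
    return [{"category": c, "region": i} for i, c in enumerate(item_categories)]

def second_extraction(item_categories):
    """Image-level extractor (stand-in for the second feature extraction network):
    a single feature record for the image treated as a whole."""
    return {"num_items": len(item_categories), "scene": "bedroom"}  # toy whole-image feature

def third_network(item_categories):
    """The third neural network's output: target feature information combining
    the per-item features and the whole-image features."""
    return {"items": first_extraction(item_categories),
            "image": second_extraction(item_categories)}

target_features = third_network(["bed", "desk", "lamp"])
print(target_features["image"]["num_items"])  # 3
```

The key point mirrored from the text is structural: the target feature information carries two distinct kinds of features, one list keyed per item and one record for the whole image.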
304、服务器将文本信息输入第四神经网络,以通过第四神经网络对文本信息进行特征提取,得到文本信息的特征信息。304. The server inputs the text information into the fourth neural network to extract features of the text information through the fourth neural network to obtain feature information of the text information.
本申请的一些实施例中,服务器还可以将文本信息输入第四神经网络中,以通过第四神经网络对文本信息进行特征提取,得到文本信息的特征信息。In some embodiments of the present application, the server can also input text information into a fourth neural network to extract features of the text information through the fourth neural network to obtain feature information of the text information.
其中，步骤302为可选步骤，若执行步骤302，则输入第四神经网络中的文本信息指的是步骤302中获取到的目标文本信息；若不执行步骤302，则输入第四神经网络中的文本信息可以为步骤303中获取到的待处理图像中的物品的特征信息，也即输入第四神经网络中的文本信息可以为待处理图像的语义标签集合。Step 302 is an optional step. If step 302 is executed, the text information input into the fourth neural network refers to the target text information obtained in step 302; if step 302 is not executed, the text information input into the fourth neural network may be the feature information of the items in the image to be processed obtained in step 303, that is, it may be the semantic label set of the image to be processed.
第四神经网络为对文本信息进行特征提取的神经网络，具体可以表现为循环神经网络或其他类型的神经网络等，此处不做穷举。The fourth neural network is a neural network that performs feature extraction on text information; it may specifically be embodied as a recurrent neural network or another type of neural network, which are not exhaustively listed here.
需要说明的是,步骤304也是可选步骤,若不执行步骤304,则不需要执行步骤302,则在执行完步骤303之后,可以直接执行步骤305。It should be noted that step 304 is also an optional step. If step 304 is not executed, step 302 does not need to be executed. After step 303 is executed, step 305 can be executed directly.
305. Based on the feature information of the image to be processed and the feature information of the at least two items, the server obtains, through the first neural network, one target category that has a matching relationship with the image to be processed.

In this embodiment of the present application, the server may obtain, through the first neural network, one target category that has a matching relationship with the image to be processed, based on the feature information of the image to be processed and the feature information of the at least two items. Specifically, in one implementation, if steps 303 and 304 are performed, the server may input the target feature information and the feature information of the text information into the first neural network, and the first neural network generates M candidate intents corresponding to the image to be processed, where each candidate intent indicates one category of items that has a matching relationship with the image to be processed.

M is an integer greater than or equal to 1; further, when there are at least two items in the image to be processed, M is an integer greater than or equal to 2.

Optionally, the first neural network may also output M first scores in one-to-one correspondence with the M candidate intents, each first score indicating the probability that a candidate intent is consistent with the user's search intent.

Further optionally, when the information input into the first neural network changes, the number of candidate intents output by the first neural network may be the same or different; that is, the first neural network may determine the number of output candidate intents according to the actual situation.

The server sends the M candidate intents to the client device so that the M candidate intents are presented to the user through the display interface of the client device; the client device may present the M candidate intents to the user in text, images, or other forms.

Optionally, if the first neural network also outputs M first scores, the server may also send the M first scores to the client device, and the client device may rank the M candidate intents according to the first score corresponding to each candidate intent: the higher a candidate intent's first score, the earlier it is ranked.
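The ranking step just described can be sketched as follows. The intent names and score values are made-up examples, not data from the embodiment.

```python
# Hypothetical sketch of ranking the M candidate intents by their first
# scores: the higher a candidate intent's first score, the earlier it ranks.

def rank_candidate_intents(intents, first_scores):
    """Return intents ordered by descending first score."""
    paired = sorted(zip(intents, first_scores), key=lambda p: p[1], reverse=True)
    return [intent for intent, _ in paired]

ranked = rank_candidate_intents(
    ["decorative painting", "pendant", "lighting"],
    [0.71, 0.18, 0.11],
)
print(ranked)  # the intent with the highest first score comes first
```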
To understand this solution more intuitively, refer to Figure 6, which is a schematic diagram of presenting the M candidate intents in the item matching method provided by this embodiment of the present application. For example, if the image to be processed contains three main regions, namely a bed, a wardrobe, and a wall, and the text information is "wall decoration", then the target feature information may include the feature information of the bed, the feature information of the wardrobe, the feature information of the wall, and the feature information of the entire image to be processed, and the M candidate intents may include the decorative paintings, pendants, and lighting shown in Figure 6. It should be understood that the example in Figure 6 is only for ease of understanding and is not intended to limit this solution.

After the client device presents the M candidate intents to the user, in one case, if the client device obtains a feedback operation corresponding to the M candidate intents, it may determine, according to the feedback operation on the M candidate intents, one target category that has a matching relationship with the image to be processed, and send that target category to the server. Correspondingly, if the server obtains the aforementioned target category sent by the client device within a target time period, it may determine that target category as the one corresponding to the image to be processed.

The "feedback operation" may be a selection operation on one of the M candidate intents, or it may be the user manually entering a new search intent, etc.; the specific implementation forms of the "feedback operation" are not enumerated here. Correspondingly, the target category may be one of the M candidate intents, or it may be a search intent other than the M candidate intents.

To understand this solution more intuitively, refer to Figure 7, which is a schematic flowchart of obtaining the target category in the item matching method provided by this embodiment of the present application. E1: the server inputs the target feature information and the feature information of the text information into the first neural network, and the first neural network generates M candidate intents corresponding to the image to be processed. E2: the server sends the M candidate intents to the client device. E3: the client device presents the M candidate intents to the user. E4: the client device determines one target category based on the feedback operation input by the user for the M candidate intents. E5: the client device sends the target category to the server, and correspondingly, the server receives the target category. It should be understood that the example in Figure 7 is only for ease of understanding and is not intended to limit this solution.

In another case, if the client device does not obtain a feedback operation corresponding to the M candidate intents within the target time period, the client device may send no feedback information to the server, or it may send first feedback information to the server, the first feedback information being used to inform the server that no feedback operation input by the user was received. Correspondingly, if the server receives no feedback information from the client device within the target time period, or receives the first feedback information sent by the client device, it may determine the candidate intent with the highest first score among the M candidate intents as the target category.
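The fallback behavior above can be sketched as a small function: use the user's feedback if it arrived, otherwise fall back to the highest-scored candidate intent. Names and values are illustrative assumptions only.

```python
# Hypothetical sketch of resolving the target category: user feedback wins;
# if no feedback operation was received within the target time period, the
# candidate intent with the highest first score is chosen.

def resolve_target_category(candidates, first_scores, user_feedback=None):
    """user_feedback is the category chosen (or manually entered) by the user,
    or None when no feedback arrived within the target time period."""
    if user_feedback is not None:
        return user_feedback  # may also lie outside the M candidate intents
    best_index = max(range(len(candidates)), key=lambda i: first_scores[i])
    return candidates[best_index]

print(resolve_target_category(["painting", "pendant"], [0.7, 0.3]))          # painting
print(resolve_target_category(["painting", "pendant"], [0.7, 0.3], "lamp"))  # lamp
```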
In another implementation, if step 303 is performed but step 304 is not, the server may input the target feature information into the first neural network, and the first neural network generates M candidate intents corresponding to the image to be processed.

The server sends the M candidate intents to the client device so that they are presented to the user through the display interface of the client device, and the client device obtains, through its display interface, a feedback operation corresponding to the M candidate intents; based on the feedback operation, the client device determines one target category that has a matching relationship with the image to be processed, and sends that target category to the server.

It should be noted that the difference between this implementation and the previous one is that in the previous implementation, "the target feature information and the feature information of the text information" are input into the first neural network, whereas in this implementation only "the target feature information" is input into the first neural network. For the specific details of this implementation, refer to the description of the previous implementation, which is not repeated here.

In this embodiment of the present application, when performing feature extraction on the image to be processed, not only the feature information of the entire image to be processed but also the feature information of the items in the image can be obtained, and then, based on the feature information of the entire image and the feature information of the items in it, M categories of items that have a matching relationship with the entire image are generated. That is, not only the information of the entire image to be processed but also every item in the image is fully considered, which helps improve the accuracy of the determined candidate intents.

Optionally, the target text information input by the user may also be obtained, the target text information being used to indicate the user's search intent, and the target feature information may be input into the third neural network together with the feature information of the target text information. That is, in the process of obtaining categories that have a matching relationship with the image to be processed, not only can the information in the image be fully exploited, but the text information indicating the user's search intent can also be combined, further improving the accuracy of the determined candidate intents.

In another implementation, if neither step 303 nor step 304 is performed, the server may input the image to be processed into the first neural network, perform feature extraction on it through the first neural network to obtain the feature information of the entire image, and then generate, through the first neural network, M candidate intents corresponding to the image to be processed according to the feature information of the entire image.

The server sends the M candidate intents to the client device so that they are presented to the user through the display interface of the client device, and the client device obtains, through its display interface, a feedback operation corresponding to the M candidate intents; based on the feedback operation, the client device determines one target category that has a matching relationship with the image to be processed and sends it to the server. For the specific implementation of the foregoing steps, refer to the above description.

In this embodiment of the present application, M candidate intents are first generated through the first neural network, and then one target category that has a matching relationship with the image to be processed is determined based on the feedback operation input by the user for the M candidate intents; that is, the user's search intent is guided in an interactive manner, which helps improve the accuracy of the determined target category.

In another implementation, if steps 303 and 304 are performed, the server may also input the target feature information and the feature information of the text information into the first neural network, and the first neural network generates one target category that has a matching relationship with the image to be processed.

In another implementation, if step 303 is performed but step 304 is not, the server may also input the target feature information into the first neural network, and the first neural network generates one target category that has a matching relationship with the image to be processed.

In another implementation, if neither step 303 nor step 304 is performed, the server may input the image to be processed into the first neural network, perform feature extraction on it through the first neural network to obtain the feature information of the entire image, and generate, according to that feature information, one target category that has a matching relationship with the image to be processed through the first neural network.
306. The server obtains N candidate items, each of which belongs to the target category.

In some embodiments of the present application, after determining one target category that has a matching relationship with the image to be processed, the server may obtain N candidate items corresponding to the target category from an item library stored on the server; that is, the server may obtain N candidate items of the target category from the item library, where N is an integer greater than 1.

307. The server generates, through the second neural network, target scores corresponding to the N candidate items, where a target score indicates the degree of match between a candidate item and the image to be processed.

In some embodiments of the present application, the server may generate, through the second neural network, a target score corresponding to each of the N candidate items, where one target score indicates the degree of match between one candidate item and the image to be processed, that is, an aesthetic score of the rendering of the candidate item combined with the image to be processed.

To understand this solution more intuitively, refer to Figure 8, which is a schematic diagram of the target score in the item matching method provided by this embodiment of the present application. Figure 8 includes three sub-diagrams (a), (b), and (c). Sub-diagram (a) of Figure 8 shows three items in the image to be processed; in sub-diagram (b), the candidate item is sofa one, and the rendering of sofa one combined with the image to be processed scores 0.956; in sub-diagram (c), the candidate item is sofa two, and the rendering of sofa two combined with the image to be processed scores 0.425. This means the degree of match between sofa one and the image to be processed is higher than that between sofa two and the image to be processed. It should be understood that the example in Figure 8 is only for ease of understanding and is not intended to limit this solution.
Specifically, in one implementation, the server may input the feature information of each candidate item together with the target feature information into the second neural network to obtain the target score output by the second neural network for that candidate item; by performing the foregoing operation for each of the N candidate items, the server can generate the target score corresponding to each of the N candidate items.

In another implementation, the server may also input the image of each candidate item and the image to be processed into the second neural network to obtain the target score output by the second neural network for that candidate item; by performing the foregoing operation for each of the N candidate items, the server can generate the target score corresponding to each candidate item.

In another implementation, the server may also input the image of each candidate item, the semantic label of each candidate item, the image to be processed, and the semantic labels of the items in the image to be processed into the second neural network to obtain the target score output by the second neural network for each candidate item.

The second neural network may be a convolutional neural network or another type of neural network. The semantic labels of the items in the image to be processed may also be called the feature information of the items in the image to be processed. The semantic label of a candidate item may include at least one piece of attribute information of the candidate item; as an example, the semantic label of a candidate item may include any one or more of the following: the category of the candidate item, the style of the candidate item, the shape of the candidate item, or other attributes of the candidate item, which are not exhaustively listed here.

To understand this solution more intuitively, refer to Figure 9, which is a schematic diagram of the second neural network in the item matching method provided by this embodiment of the present application. As shown in Figure 9, after the server inputs the image of each candidate item and the semantic label of each candidate item into the second neural network, the second neural network performs feature extraction on the image of the candidate item to obtain the feature information of the candidate item's image, and performs feature extraction on the semantic label of the candidate item to obtain the feature information of the candidate item's semantic label; the server then fuses, through the second neural network, the feature information of the candidate item's image with the feature information of its semantic label, and convolves the fused feature information to obtain the feature information corresponding to the candidate item.

After the server inputs the image to be processed and the semantic labels of the items in the image to be processed into the second neural network, the second neural network performs feature extraction on the image to be processed to obtain its feature information, and performs feature extraction on the semantic labels of the items in the image to obtain the feature information of those semantic labels; the server then fuses, through the second neural network, the feature information of the image to be processed with the feature information of the semantic labels, and convolves the fused feature information to obtain the feature information corresponding to the image to be processed.

As shown in Figure 9, based on the feature information corresponding to the candidate item and the feature information corresponding to the image to be processed, the server performs the above-mentioned product, fusion, and other operations through the second neural network, and outputs one target score for the matching effect of the candidate item and the image to be processed. It should be understood that the example in Figure 9 is only for ease of understanding and is not intended to limit this solution.
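The two-branch structure just described can be sketched in miniature. This is a stand-in, not the claimed network: real feature extractors and the "fusion + convolution + product" layers are replaced by concatenation, normalization, and cosine similarity, and every feature vector below is a made-up toy value.

```python
# Hypothetical sketch of the two-branch scoring: one branch fuses the candidate
# item's image features with its semantic-label features, the other fuses the
# scene image's features with its items' label features, and the two fused
# representations are combined into one target score in [0, 1].
import math

def fuse(image_feat, label_feat):
    # Stand-in for "fuse then convolve": concatenate and L2-normalize.
    fused = list(image_feat) + list(label_feat)
    norm = math.sqrt(sum(x * x for x in fused))
    return [x / norm for x in fused]

def target_score(cand_img, cand_label, scene_img, scene_labels):
    cand_repr = fuse(cand_img, cand_label)
    scene_repr = fuse(scene_img, scene_labels)
    cosine = sum(a * b for a, b in zip(cand_repr, scene_repr))
    return (cosine + 1.0) / 2.0  # map to [0, 1], like the scores in Figure 8

score = target_score([1.0, 0.0], [0.3, 0.7], [0.9, 0.1], [0.2, 0.8])
print(round(score, 3))  # a value between 0 and 1
```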
Regarding the training phase of the second neural network: specifically, a training data set may be stored on the training device, where each piece of training data may include an image to be processed, the feature information of the items in the image to be processed, the images of at least two candidate items, and the semantic label corresponding to each candidate item; the expected result corresponding to the training data is the one of the aforementioned at least two candidate items that best matches the image to be processed.

The training device may combine the image to be processed, the feature information of the items in the image to be processed, the image of each candidate item, and the semantic label corresponding to each candidate item's image into one group of target data, thereby obtaining at least two groups of target data in one-to-one correspondence with the at least two candidate items.

The training device inputs each group of target data into the second neural network to obtain one target score output by the second neural network; by performing the foregoing operation for each of the at least two groups of target data through the second neural network, the training device obtains at least two target scores in one-to-one correspondence with the at least two groups of target data, that is, at least two target scores in one-to-one correspondence with the at least two candidate items.

According to the at least two target scores, the training device selects, from the at least two candidate items, the one item that best matches the image to be processed, and takes the selected item as the predicted result corresponding to the training data.

The training device generates the function value of the loss function according to the predicted result and the expected result corresponding to the training data, and back-propagates to update the weight parameters of the second neural network, thereby completing one training iteration of the second neural network. The training device iteratively trains the second neural network with multiple pieces of data in the training data set until a convergence condition is met, obtaining the trained second neural network.
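The loop above (score candidates, pick the predicted result, compare to the expected result, update weights) can be illustrated in miniature. A stand-in linear scorer replaces the second neural network, a simple pairwise hinge loss replaces the unspecified loss function, and all data is made up; this is a sketch of the training idea, not the patented training procedure.

```python
# Hypothetical illustration of the training loop: push the expected (best-
# matching) candidate's score above the other candidates' scores.

def score(w, feats):
    return sum(wi * fi for wi, fi in zip(w, feats))

def train_step(w, candidates, expected_idx, lr=0.1):
    """One iteration: compute a pairwise loss against the expected result
    and update the stand-in weight parameters w by gradient descent."""
    loss = 0.0
    grad = [0.0] * len(w)
    pos = candidates[expected_idx]
    for i, neg in enumerate(candidates):
        if i == expected_idx:
            continue
        margin = 1.0 - (score(w, pos) - score(w, neg))  # hinge on the pair
        if margin > 0:
            loss += margin
            for j in range(len(w)):
                grad[j] += neg[j] - pos[j]
    return [wj - lr * gj for wj, gj in zip(w, grad)], loss

# Two toy candidate feature vectors; candidate 0 is the expected result.
candidates = [[1.0, 0.2], [0.1, 1.0]]
w = [0.0, 0.0]
for _ in range(50):
    w, loss = train_step(w, candidates, expected_idx=0)
predicted = max(range(len(candidates)), key=lambda i: score(w, candidates[i]))
print(predicted)  # 0: after training, the predicted result matches the expected one
```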
308. The server obtains K target items corresponding to the target category, each of which belongs to the target category.

In this embodiment of the present application, steps 306 and 307 are both optional steps. If steps 306 and 307 are performed, step 308 may include: the server selects K target items from the N candidate items according to the target scores corresponding to the N candidate items, where K is an integer greater than or equal to 1. A candidate item with a higher target score has a higher probability of being selected.
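The selection in step 308 can be sketched as follows. Since the text only says that higher-scored candidates are more likely to be chosen, a real system might sample in proportion to the scores; deterministic top-K, shown here with made-up item names and scores, is the simplest reading.

```python
# Hypothetical sketch of step 308: keep the K candidate items with the
# highest target scores.

def select_target_items(candidates, target_scores, k):
    order = sorted(range(len(candidates)),
                   key=lambda i: target_scores[i], reverse=True)
    return [candidates[i] for i in order[:k]]

items = select_target_items(
    ["sofa one", "sofa two", "sofa three"], [0.956, 0.425, 0.610], k=2)
print(items)  # ['sofa one', 'sofa three']
```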
In this embodiment of the present application, target scores corresponding to the N candidate items are generated through a neural network, the target score indicating the degree of match between a candidate item and the image to be processed; and the target items ultimately presented to the user are selected from the N candidate items according to the degree of match between each candidate item and the image to be processed. That is, the aesthetic quality of combining a candidate item with the image to be processed is quantitatively scored, and the aesthetics of the combined rendering is taken into account when selecting the target items, so that the renderings of the target items combined with the image to be processed that are provided to the user will look better, which helps improve the user stickiness of this solution.

If steps 306 and 307 are not performed, the server may also directly obtain, from the item library, K target items corresponding to the target category, where each target item belongs to the category indicated by the target category.

309. The server sends the information of the target items to the client device.

In this embodiment of the present application, after obtaining the K target items corresponding to the target category, the server may obtain the information of each of the K target items and send the information of each target item to the client device.

The information of each target item may include the image corresponding to the target item; optionally, the information of each target item may also include any one or more of the following: the access link, name, price, or target score of the target item, or other types of information about the item, which are not limited here.

Further, the image corresponding to a target item may be the image of the target item itself, or it may be a rendering, generated by the server using a neural network, of the target item combined with the image to be processed. The aforementioned rendering may be in a pure image format, a rendering after VR modeling, a rendering after AR modeling, or another format, which is not limited here.

To understand this solution more intuitively, refer to Figure 10, which is a schematic diagram of renderings of a target item combined with the image to be processed in the item matching method provided by this embodiment of the present application. As shown in Figure 10, the sub-diagram on the left shows the image to be processed, and the two sub-diagrams on the right respectively show two renderings of two different target items combined with the image to be processed. It should be understood that the example in Figure 10 is only for ease of understanding and is not intended to limit this solution.
310、客户设备向用户展示与一种目标类别对应的K个目标物品。310. The client device displays K target items corresponding to one target category to the user.
本申请实施例中,客户设备在获取到服务器发送的K个目标物品中每个物品的信息之后,会向用户展示与该一个目标类别对应的K个目标物品。In this embodiment of the present application, after acquiring the information of each of the K target items sent by the server, the client device will display the K target items corresponding to the one target category to the user.
具体的,客户设备可以向用户展示每个目标物品所对应的图像;目标物品所对应的图像可以为目标物品的图像,也可以为每个目标物品与待处理图像的搭配效果图;对于搭配效果图的展示方式的进一步理解可以参阅上一步骤中的描述,此处不做赘述。Specifically, the client device can show the user the image corresponding to each target item; the image corresponding to the target item can be an image of the target item, or a matching effect diagram of each target item and the image to be processed; for the matching effect For a further understanding of the display method of the picture, please refer to the description in the previous step and will not be repeated here.
本申请实施例中,客户设备可以向用户展示每个目标类别的物品与待处理图像的搭配效果图,从而用户可以更直观地体会到目标类别的物品应用于待处理图像中的搭配效果, 有利于提高本方案的用户粘度。In the embodiment of the present application, the client device can display to the user the matching effect diagram of the items of each target category and the image to be processed, so that the user can more intuitively experience the matching effect of the items of the target category applied to the image to be processed. It will help improve the user stickiness of this program.
可选地,客户设备还可以向用户展示每个目标物品的如下任一种或多种信息:每个目标类别的物品的访问链接、名称、价格、目标评分或物品的其它类型的信息等,此处不做限定。Optionally, the client device can also display any one or more of the following information about each target item to the user: access links, names, prices, target ratings or other types of information about the items in each target category, etc., There are no limitations here.
For a more intuitive understanding of this solution, refer to FIG. 11, which is a schematic flowchart of an item matching method provided by an embodiment of the present application. As shown in FIG. 11, after obtaining the image to be processed and the text information (namely "wall decoration" in FIG. 11) input by the user, the client device displays three candidate intents to the user, namely "decorative painting", "pendant", and "lighting" in FIG. 11. Based on the user's selection of the candidate intent "decorative painting", the client device sends feedback information to the server, where the feedback information indicates to the server that the target category is "decorative painting".
Based on the target category "decorative painting", the server sends the client device information about two different decorative paintings (namely, target items). The information for each decorative painting includes a composite rendering of the painting matched with the image to be processed, as well as the painting's name, price, and size. It should be understood that the example in FIG. 11 shows the implementation flow of the item matching method from the perspective of the client device; the example is provided only to facilitate understanding of this solution and is not intended to limit it.
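The exchange in FIG. 11 can be sketched as two messages. The field names below (`target_category`, `rendering`, and so on) are illustrative assumptions for this sketch only; the disclosure does not fix a message format:

```python
# Hypothetical sketch of the client-server exchange in FIG. 11.
# All field names are illustrative assumptions, not a disclosed protocol.

def choose_intent(candidate_intents, selected_index):
    """Turn the user's selection into the feedback message for the server."""
    return {"target_category": candidate_intents[selected_index]}

candidates = ["decorative painting", "pendant", "lighting"]
feedback = choose_intent(candidates, 0)  # user taps "decorative painting"

# The server then answers with target-item records of that category, e.g.:
server_reply = [
    {"name": "Abstract Canvas A", "price": 39.0, "size": "40x60cm",
     "rendering": "composite_0.png"},
    {"name": "Landscape Print B", "price": 55.0, "size": "50x70cm",
     "rendering": "composite_1.png"},
]
```

Each record carries the composite rendering plus the name, price, and size, matching the per-item information described for FIG. 11.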
(2) Both the feature extraction part and the target-category determination part are led by the client
In this embodiment of the present application, refer to FIG. 12, which is a schematic flowchart of an item matching method provided by an embodiment of the present application. The item matching method provided by this embodiment may include the following steps.
1201. The client device obtains an image to be processed input by the user.
1202. The client device obtains target text information input by the user, where the target text information indicates the user's search intent.
1203. The client device inputs the image to be processed into a third neural network to perform feature extraction on it, obtaining target feature information corresponding to the image to be processed, where the target feature information includes feature information of at least two items in the image to be processed and feature information of the image to be processed as a whole.
1204. The client device inputs the text information into a fourth neural network to perform feature extraction on it, obtaining feature information of the text information.
1205. Based on the feature information of the image to be processed and the feature information of the at least two items, the client device obtains, through a first neural network, a target category that has a matching relationship with the image to be processed.
In this embodiment of the present application, for the specific implementation of steps 1201 to 1205, refer to the description of steps 301 to 305 in the embodiment corresponding to FIG. 3. The difference is that in the embodiment corresponding to FIG. 3, steps 303 to 305 are performed by the server, whereas in the embodiment corresponding to FIG. 12, steps 1203 to 1205 are performed by the client device; details are not repeated here.
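Steps 1203 to 1205 amount to an on-device multimodal pipeline: encode the image, encode the text, and classify the concatenated features. A minimal sketch follows; the linear layers and all dimensions are stand-ins chosen for illustration, since the disclosure does not specify the architectures of the first, third, and fourth neural networks:

```python
import numpy as np

rng = np.random.default_rng(0)

# Illustrative stand-ins for the three networks (assumed shapes, not disclosed):
W_img = rng.standard_normal((3 * 32 * 32, 64))  # "third NN": image encoder
W_txt = rng.standard_normal((16, 64))           # "fourth NN": text encoder
W_cls = rng.standard_normal((128, 5))           # "first NN": 5 candidate categories

image = rng.standard_normal(3 * 32 * 32)  # flattened image to be processed
text = rng.standard_normal(16)            # embedded target text information

img_feat = image @ W_img                  # step 1203: image feature information
txt_feat = text @ W_txt                   # step 1204: text feature information
logits = np.concatenate([img_feat, txt_feat]) @ W_cls  # step 1205
target_category = int(np.argmax(logits))  # index of the matching target category
```

Only the `target_category` index then needs to travel to the server (step 1206), which is what distinguishes this variant from the FIG. 3 flow.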
1206. The client device sends the target category to the server.
1207. The server obtains N candidate items, each of which belongs to the target category.
1208. The server generates, through a second neural network, target scores corresponding to the N candidate items, where a target score indicates the degree of matching between a candidate item and the image to be processed.
1209. The server obtains K target items corresponding to the target category, each of which belongs to the target category.
1210. The server sends information about the target items to the client device.
1211. The client device displays to the user the K target items corresponding to the target category.
In this embodiment of the present application, for the specific implementation of steps 1207 to 1211, refer to the description of steps 306 to 310 in the embodiment corresponding to FIG. 3; details are not repeated here.
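Steps 1207 to 1209 reduce to scoring N candidates and keeping the best K. The sketch below replaces the second neural network with a placeholder scoring function (an assumption, since the disclosure does not fix its architecture):

```python
import heapq

def top_k_items(candidates, score_fn, k):
    """Score each candidate against the image and keep the k best (step 1209)."""
    scored = [(score_fn(c), c) for c in candidates]  # step 1208: one score per item
    best = heapq.nlargest(k, scored, key=lambda pair: pair[0])
    return [item for _, item in best]

# Placeholder for the second neural network's item-image matching score.
def match_score(item):
    return item["style_similarity"]

candidates = [
    {"name": "item-a", "style_similarity": 0.91},
    {"name": "item-b", "style_similarity": 0.42},
    {"name": "item-c", "style_similarity": 0.77},
]
targets = top_k_items(candidates, match_score, k=2)  # item-a and item-c survive
```

With K smaller than N, only the items whose target scores indicate the strongest match with the image to be processed are sent back in step 1210.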
(3) The feature extraction operation is performed by the client, and the target-category determination part is led by the server
In this embodiment of the present application, refer to FIG. 13, which is a schematic flowchart of an item matching method provided by an embodiment of the present application. The item matching method provided by this embodiment may include the following steps.
1301. The client device obtains an image to be processed input by the user.
1302. The client device obtains target text information input by the user, where the target text information indicates the user's search intent.
1303. The client device inputs the image to be processed into a third neural network to perform feature extraction on it, obtaining target feature information corresponding to the image to be processed, where the target feature information includes feature information of the items in the image to be processed and feature information of the image to be processed as a whole.
1304. The client device inputs the text information into a fourth neural network to perform feature extraction on it, obtaining feature information of the text information.
In this embodiment of the present application, for the specific implementation of steps 1301 to 1304, refer to the description of steps 301 to 304 in the embodiment corresponding to FIG. 3. The difference is that in the embodiment corresponding to FIG. 3, steps 303 and 304 are performed by the server, whereas in the embodiment corresponding to FIG. 13, steps 1303 and 1304 are performed by the client device; details are not repeated here.
The client device may send the target feature information to the server; optionally, the client device sends both the target feature information and the feature information of the text information to the server.
1305. Based on the feature information of the image to be processed and the feature information of the at least two items, the server obtains, through the first neural network, a target category that has a matching relationship with the image to be processed.
1306. The server obtains N candidate items corresponding to the target category, each of which belongs to the target category.
1307. The server generates, through the second neural network, target scores corresponding to the N candidate items, where a target score indicates the degree of matching between a candidate item and the image to be processed.
1308. The server obtains K target items corresponding to the target category, each of which belongs to the target category.
1309. The server sends information about the target items to the client device.
1310. The client device displays to the user the K target items corresponding to the target category.
In this embodiment of the present application, for the specific implementation of steps 1305 to 1310, refer to the description of steps 305 to 310 in the embodiment corresponding to FIG. 3; details are not repeated here.
In this embodiment of the present application, the user can provide an image of the scene in which the desired item will be used (namely, the image to be processed); a target category that has a matching relationship with the entire image to be processed can then be obtained through the first neural network, and items of that target category are displayed to the user. With the foregoing solution, the user can search for items to match simply by providing an image to be processed; moreover, even when the user inputs a complex image to be processed (namely, one containing at least two items), a target category of items that matches the entire image can still be obtained, which greatly broadens the application scenarios of this solution and helps improve user engagement. In addition, the target category that matches the entire image to be processed is determined based on both the feature information of the entire image and the feature information of the items within it; that is, not only the information of the whole image but also each object in it is fully considered, which helps improve the accuracy of the determined target category.
On the basis of the embodiments corresponding to FIG. 1a to FIG. 13, in order to better implement the foregoing solutions of the embodiments of the present application, related devices for implementing those solutions are provided below. Referring specifically to FIG. 14, FIG. 14 is a schematic structural diagram of an item matching apparatus provided by an embodiment of the present application. The item matching apparatus 1400 is applied to a client device in an item matching system, and the item matching system further includes a server. The item matching apparatus 1400 includes: an obtaining module 1401, configured to obtain an image input by a user, where the image contains a background and at least two items; a receiving module 1402, configured to receive, from the server, items of a target category that has a matching relationship with the image, where the target-category items are obtained by the server based on feature information of the image and feature information of the at least two items; and a display module 1403, configured to display the target-category items.
In a possible design, the feature information of the image includes feature information of the whole formed by the background and the at least two items, and the feature information of the at least two items includes attribute information of each item, where the attribute information of each item includes any one or more of the following: the item's category, color, style, material, or pattern.
In a possible design, the receiving module 1402 is further configured to receive, from the server, M candidate intents corresponding to the image, where M is an integer greater than or equal to 2 and each candidate intent indicates one category of items that has a matching relationship with the image; the display module 1403 is further configured to display the M candidate intents; and the obtaining module 1401 is further configured to obtain feedback operations corresponding to the M candidate intents and, based on the feedback operations on the M candidate intents, determine one target category that has a matching relationship with the image.
In a possible design, the display module 1403 is specifically configured to display composite renderings of the target-category items matched with the image.
It should be noted that the information exchange and execution processes among the modules/units of the item matching apparatus 1400 are based on the same concept as the method embodiments corresponding to FIG. 2b to FIG. 13 of this application; for details, refer to the descriptions in the foregoing method embodiments of this application, which are not repeated here.
Referring to FIG. 15, FIG. 15 is a schematic structural diagram of an item matching apparatus provided by an embodiment of the present application. The item matching apparatus 1500 is applied to a server in an item matching system, and the item matching system further includes a client device. The item matching apparatus 1500 includes: an obtaining module 1501, configured to obtain, through a first neural network and based on feature information of an image and feature information of at least two items, items of a target category that has a matching relationship with the image, where the image contains a background and the at least two items; and a sending module 1502, configured to send the target-category items to the client device.
In a possible design, the feature information of the image includes feature information of the whole formed by the background and the at least two items, and the feature information of the at least two items includes attribute information of each item, where the attribute information of each item includes any one or more of the following: the item's category, color, style, material, or pattern.
In a possible design, the obtaining module 1501 is specifically configured to: generate, through the first neural network and based on the feature information of the image and the feature information of the at least two items, M candidate intents corresponding to the image, where M is an integer greater than or equal to 2 and each candidate intent indicates one category of items that has a matching relationship with the image; send the M candidate intents to the client device, where the M candidate intents are used by the client device to obtain one target category that has a matching relationship with the image; and receive the target category sent by the client device.
In a possible design, the obtaining module 1501 is specifically configured to: obtain, through the first neural network, N candidate items that have a matching relationship with the image, where each candidate item belongs to the target category and N is an integer greater than 1; generate, through the second neural network, scores corresponding to the N candidate items, where a score indicates the degree of matching between a candidate item and the image; and select K target items from the N candidate items according to the scores corresponding to the N candidate items, where K is an integer greater than or equal to 1. The sending module is specifically configured to send the K target items to the client device.
It should be noted that the information exchange and execution processes among the modules/units of the item matching apparatus 1500 are based on the same concept as the method embodiments corresponding to FIG. 2b to FIG. 13 of this application; for details, refer to the descriptions in the foregoing method embodiments of this application, which are not repeated here.
Next, a client device provided by an embodiment of the present application is described. Referring to FIG. 16, FIG. 16 is a schematic structural diagram of a client device provided by an embodiment of the present application. The client device 1600 may specifically be embodied as a mobile phone, a tablet, a laptop computer, a smart wearable device, a smart robot, a smart home device, or the like; no limitation is imposed here. Specifically, the client device 1600 includes: a receiver 1601, a transmitter 1602, a processor 1603, and a memory 1604 (the client device 1600 may contain one or more processors 1603; one processor is taken as an example in FIG. 16), where the processor 1603 may include an application processor 16031 and a communication processor 16032. In some embodiments of the present application, the receiver 1601, the transmitter 1602, the processor 1603, and the memory 1604 may be connected by a bus or in other manners.
The memory 1604 may include a read-only memory and a random access memory, and provides instructions and data to the processor 1603. A portion of the memory 1604 may further include a non-volatile random access memory (NVRAM). The memory 1604 stores operation instructions, executable modules, or data structures, or a subset or an extended set thereof, where the operation instructions may include various operation instructions for implementing various operations.
The processor 1603 controls the operation of the client device. In a specific application, the components of the client device are coupled together through a bus system, where the bus system may include, in addition to a data bus, a power bus, a control bus, a status signal bus, and the like. For clarity, however, the various buses are all referred to as the bus system in the figure.
The methods disclosed in the foregoing embodiments of this application may be applied to the processor 1603 or implemented by the processor 1603. The processor 1603 may be an integrated circuit chip with signal processing capability. During implementation, the steps of the foregoing methods may be completed by integrated logic circuits of hardware in the processor 1603 or by instructions in the form of software. The processor 1603 may be a general-purpose processor, a digital signal processor (DSP), a microprocessor, or a microcontroller, and may further include an application-specific integrated circuit (ASIC), a field-programmable gate array (FPGA) or another programmable logic device, a discrete gate or transistor logic device, or a discrete hardware component. The processor 1603 can implement or perform the methods, steps, and logical block diagrams disclosed in the embodiments of this application. The general-purpose processor may be a microprocessor, or the processor may be any conventional processor or the like. The steps of the methods disclosed with reference to the embodiments of this application may be directly embodied as being performed by a hardware decoding processor, or performed by a combination of hardware and software modules in a decoding processor. The software module may be located in a storage medium mature in the art, such as a random access memory, a flash memory, a read-only memory, a programmable read-only memory, an electrically erasable programmable memory, or a register. The storage medium is located in the memory 1604; the processor 1603 reads the information in the memory 1604 and completes the steps of the foregoing methods in combination with its hardware.
The receiver 1601 may be configured to receive input digit or character information and to generate signal inputs related to relevant settings and function control of the client device. The transmitter 1602 may be configured to output digit or character information through a first interface; the transmitter 1602 may further be configured to send instructions to a disk group through the first interface to modify data in the disk group; the transmitter 1602 may further include a display device such as a display screen.
In this embodiment of the present application, the processor 1603 is configured to perform the item matching method performed by the client device in the embodiments corresponding to FIG. 2b to FIG. 13. Specifically, the application processor 16031 is configured to: obtain an image input by a user, where the image contains a background and at least two items; receive, from the server, items of a target category that has a matching relationship with the image, where the target-category items are obtained by the server based on feature information of the image and feature information of the at least two items; and display the target-category items.
It should be noted that the specific manner in which the application processor 16031 performs the foregoing steps is based on the same concept as the method embodiments corresponding to FIG. 2b to FIG. 13 of this application and brings the same technical effects; for details, refer to the descriptions in the foregoing method embodiments of this application, which are not repeated here.
An embodiment of the present application further provides a server. Referring to FIG. 17, FIG. 17 is a schematic structural diagram of a server provided by an embodiment of the present application. Specifically, the server 1700 is implemented by one or more servers and may vary considerably depending on configuration or performance. It may include one or more central processing units (CPUs) 1722 (for example, one or more processors), a memory 1732, and one or more storage media 1730 (for example, one or more mass storage devices) storing application programs 1742 or data 1744. The memory 1732 and the storage medium 1730 may provide temporary or persistent storage. The program stored in the storage medium 1730 may include one or more modules (not shown in the figure), and each module may include a series of instruction operations on the server. Furthermore, the central processing unit 1722 may be configured to communicate with the storage medium 1730 and to execute, on the server 1700, the series of instruction operations in the storage medium 1730.
The server 1700 may further include one or more power supplies 1726, one or more wired or wireless network interfaces 1750, one or more input/output interfaces 1758, and/or one or more operating systems 1741, such as Windows Server™, Mac OS X™, Unix™, Linux™, or FreeBSD™.
In this embodiment of the present application, the central processing unit 1722 is configured to perform the item matching method performed by the server in the embodiments corresponding to FIG. 2b to FIG. 13. Specifically, the central processing unit 1722 is configured to: obtain, through the first neural network and based on feature information of an image and feature information of at least two items, items of a target category that has a matching relationship with the image, where the image contains a background and the at least two items; and send the target-category items to the client device.
It should be noted that the specific manner in which the central processing unit 1722 performs the foregoing steps is based on the same concept as the method embodiments corresponding to FIG. 2b to FIG. 13 of this application and brings the same technical effects; for details, refer to the descriptions in the foregoing method embodiments of this application, which are not repeated here.
An embodiment of the present application further provides a computer program product. The computer program product includes a program that, when run on a computer, causes the computer to perform the steps performed by the client device in the methods described in the embodiments shown in FIG. 2b to FIG. 13, or causes the computer to perform the steps performed by the server in those methods.
An embodiment of the present application further provides a computer-readable storage medium. The computer-readable storage medium stores a program that, when run on a computer, causes the computer to perform the steps performed by the client device in the methods described in the embodiments shown in FIG. 2b to FIG. 13, or causes the computer to perform the steps performed by the server in those methods.
The client device, server, or item matching apparatus provided by the embodiments of the present application may specifically be a chip. The chip includes a processing unit and a communication unit; the processing unit may be, for example, a processor, and the communication unit may be, for example, an input/output interface, a pin, or a circuit. The processing unit can execute computer-executable instructions stored in a storage unit, so that the chip performs the item matching method described in the embodiments shown in FIG. 2b to FIG. 13. Optionally, the storage unit is a storage unit within the chip, such as a register or a cache; the storage unit may alternatively be a storage unit located outside the chip within the wireless access device, such as a read-only memory (ROM) or another type of static storage device capable of storing static information and instructions, or a random access memory (RAM).
Specifically, referring to FIG. 18, FIG. 18 is a schematic structural diagram of a chip provided by an embodiment of the present application. The chip may be embodied as a neural-network processing unit, NPU 180. The NPU 180 is mounted to a host CPU as a coprocessor, and the host CPU allocates tasks to it. The core part of the NPU is an operation circuit 1803; a controller 1804 controls the operation circuit 1803 to extract matrix data from memory and perform multiplication operations.
In some implementations, the operation circuit 1803 internally includes multiple processing engines (PEs). In some implementations, the operation circuit 1803 is a two-dimensional systolic array. The operation circuit 1803 may alternatively be a one-dimensional systolic array or another electronic circuit capable of performing mathematical operations such as multiplication and addition. In some implementations, the operation circuit 1803 is a general-purpose matrix processor.
For example, suppose there are an input matrix A, a weight matrix B, and an output matrix C. The operation circuit fetches the data corresponding to matrix B from a weight memory 1802 and caches it on each PE in the operation circuit. The operation circuit fetches matrix A data from an input memory 1801 and performs a matrix operation with matrix B; partial or final results of the resulting matrix are stored in an accumulator 1808.
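The A x B computation described above can be sketched in plain Python. This mimics only the accumulate-partial-products behavior that the accumulator 1808 performs over the PE outputs, not the actual systolic dataflow or the NPU's memory interfaces:

```python
def matmul_with_accumulator(A, B):
    """Multiply A (m x k) by B (k x n), accumulating partial products
    the way the accumulator collects per-PE multiply results."""
    m, k, n = len(A), len(B), len(B[0])
    C = [[0] * n for _ in range(m)]  # accumulator state, one cell per output
    for i in range(m):
        for j in range(n):
            for p in range(k):       # each PE contributes one partial product
                C[i][j] += A[i][p] * B[p][j]
    return C

A = [[1, 2], [3, 4]]  # input matrix from input memory 1801
B = [[5, 6], [7, 8]]  # weight matrix cached from weight memory 1802
C = matmul_with_accumulator(A, B)  # [[19, 22], [43, 50]]
```

In hardware the k partial products per output cell arrive in a pipelined fashion across the PE array; the triple loop here is only the arithmetic equivalent.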
A unified memory 1806 is used to store input data and output data. Weight data is transferred to the weight memory 1802 through a direct memory access controller (DMAC) 1805. Input data is likewise transferred to the unified memory 1806 through the DMAC.
The BIU, i.e., the bus interface unit 1810, handles the interaction between the AXI bus and both the DMAC and the instruction fetch buffer (IFB) 1809.
The bus interface unit 1810 (BIU) enables the instruction fetch buffer 1809 to obtain instructions from external memory, and enables the direct memory access controller 1805 to obtain the source data of input matrix A or weight matrix B from external memory.
The DMAC is mainly used to transfer input data from the external memory (DDR) to the unified memory 1806, to transfer weight data to the weight memory 1802, or to transfer input data to the input memory 1801.
The vector calculation unit 1807 includes multiple arithmetic processing units. Where needed, it further processes the output of the arithmetic circuit, performing operations such as vector multiplication, vector addition, exponentiation, logarithm, and magnitude comparison. It is mainly used for computations of non-convolutional/non-fully-connected layers in a neural network, such as batch normalization, pixel-wise summation, and upsampling of feature planes.
In some implementations, the vector calculation unit 1807 can store processed output vectors in the unified memory 1806. For example, the vector calculation unit 1807 may apply a linear and/or nonlinear function to the output of the arithmetic circuit 1803, for instance performing linear interpolation on feature planes extracted by a convolutional layer, or applying a function to a vector of accumulated values to generate activation values. In some implementations, the vector calculation unit 1807 generates normalized values, pixel-wise summed values, or both. In some implementations, the processed output vector can be used as activation input to the arithmetic circuit 1803, for example for use in subsequent layers of the neural network.
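The vector-unit role described above — normalizing the matrix output accumulated by the arithmetic circuit and then applying a nonlinear function to produce activation values — can be sketched as follows. The function and parameter names are illustrative assumptions, and ReLU stands in for whichever nonlinear function a given layer uses.

```python
import numpy as np

def vector_unit_postprocess(acc_output: np.ndarray,
                            gamma: float = 1.0, beta: float = 0.0) -> np.ndarray:
    """Illustrative sketch of the vector-unit stage: a batch-norm-like
    normalization of the accumulated output, followed by a nonlinear
    activation, yielding values ready to feed the next layer."""
    mean = acc_output.mean()
    var = acc_output.var()
    normalized = gamma * (acc_output - mean) / np.sqrt(var + 1e-5) + beta
    activated = np.maximum(normalized, 0.0)   # ReLU as the example nonlinearity
    return activated
```

In hardware these element-wise steps run outside the matrix-multiply datapath, which is why they are assigned to a separate vector unit rather than the systolic array.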
The instruction fetch buffer 1809 connected to the controller 1804 stores the instructions used by the controller 1804.
The unified memory 1806, the input memory 1801, the weight memory 1802, and the instruction fetch buffer 1809 are all on-chip memories. The external memory is private to this NPU hardware architecture.
The operations of each layer in the first neural network, the second neural network, the third neural network, and the fourth neural network shown in the method embodiments corresponding to Figures 2b to 13 may be performed by the arithmetic circuit 1803 or the vector calculation unit 1807.
The processor mentioned in any of the above places may be a general-purpose central processing unit, a microprocessor, an ASIC, or one or more integrated circuits configured to control execution of the program of the method of the first aspect described above.
In addition, it should be noted that the apparatus embodiments described above are merely illustrative. The units described as separate components may or may not be physically separate, and components shown as units may or may not be physical units; that is, they may be located in one place or distributed across multiple network units. Some or all of the modules may be selected according to actual needs to achieve the purpose of the solution of this embodiment. Furthermore, in the drawings of the apparatus embodiments provided in this application, the connection relationship between modules indicates that they have communication connections between them, which may be specifically implemented as one or more communication buses or signal lines.
Through the above description of the embodiments, those skilled in the art can clearly understand that this application can be implemented by software plus the necessary general-purpose hardware, and of course also by dedicated hardware including application-specific integrated circuits, dedicated CPUs, dedicated memories, dedicated components, and the like. In general, any function performed by a computer program can easily be implemented with corresponding hardware, and the specific hardware structures used to implement the same function can be diverse, such as analog circuits, digital circuits, or dedicated circuits. However, for this application, a software implementation is the better choice in most cases. Based on this understanding, the technical solution of this application, in essence, or the part that contributes to the prior art, can be embodied in the form of a software product. The computer software product is stored in a readable storage medium, such as a computer floppy disk, USB flash drive, removable hard disk, ROM, RAM, magnetic disk, or optical disc, and includes several instructions to cause a computer device (which may be a personal computer, a training device, a network device, or the like) to perform the methods described in the various embodiments of this application.
The above embodiments may be implemented in whole or in part by software, hardware, firmware, or any combination thereof. When implemented in software, they may be implemented in whole or in part in the form of a computer program product.
The computer program product includes one or more computer instructions. When the computer program instructions are loaded and executed on a computer, the processes or functions described in the embodiments of this application are produced in whole or in part. The computer may be a general-purpose computer, a special-purpose computer, a computer network, or another programmable apparatus. The computer instructions may be stored in a computer-readable storage medium or transmitted from one computer-readable storage medium to another; for example, the computer instructions may be transmitted from one website, computer, training device, or data center to another website, computer, training device, or data center by wired (e.g., coaxial cable, optical fiber, digital subscriber line (DSL)) or wireless (e.g., infrared, radio, microwave) means. The computer-readable storage medium may be any available medium that a computer can store, or a data storage device such as a training device or data center that integrates one or more available media. The available media may be magnetic media (e.g., floppy disks, hard disks, magnetic tapes), optical media (e.g., DVDs), or semiconductor media (e.g., solid-state drives (SSDs)).

Claims (23)

  1. A method for matching items, characterized in that the method comprises:
    obtaining an image input by a user, wherein a background and at least two items are present in the image;
    obtaining, based on feature information of the image and feature information of the at least two items, an item of a target category that has a matching relationship with the image through a first neural network; and
    displaying the item of the target category.
  2. The method according to claim 1, characterized in that the feature information of the image comprises feature information of a whole formed by the background and the at least two items, the feature information of the at least two items comprises attribute information of each item, and the attribute information of each item comprises any one or more of the following: a category of the item, a color of the item, a style of the item, a material of the item, or a pattern of the item.
  3. The method according to claim 1 or 2, characterized in that the obtaining, based on the feature information of the image and the feature information of the at least two items, an item of a target category that has a matching relationship with the image through a first neural network comprises:
    generating, based on the feature information of the image and the feature information of the at least two items, M candidate intents corresponding to the image through the first neural network, wherein M is an integer greater than or equal to 2, and each candidate intent indicates a category of items that has a matching relationship with the image;
    displaying the M candidate intents to obtain feedback operations corresponding to the M candidate intents; and
    determining, according to the feedback operations for the M candidate intents, the target category that has a matching relationship with the image to be processed, and obtaining items of the target category.
  4. The method according to claim 1 or 2, characterized in that the obtaining an item of a target category that has a matching relationship with the image through a first neural network comprises:
    obtaining, through the first neural network, N candidate items that have a matching relationship with the image, wherein each candidate item belongs to the target category, and N is an integer greater than 1;
    generating scores corresponding to the N candidate items through a second neural network, wherein the scores indicate a matching degree between the candidate items and the image; and
    selecting K target items from the N candidate items according to the scores corresponding to the N candidate items, wherein K is an integer greater than or equal to 1;
    wherein the displaying items of the target category comprises: displaying the K target items.
  5. The method according to claim 1 or 2, characterized in that the displaying items of the target category comprises: displaying an effect diagram showing the items of the target category matched with the image.
  6. A method for matching items, characterized in that the method is applied to a client device in an item matching system, the item matching system further comprises a server, and the method comprises:
    obtaining an image input by a user, wherein a background and at least two items are present in the image;
    receiving, from the server, an item of a target category that has a matching relationship with the image, wherein the item of the target category is obtained by the server based on feature information of the image and feature information of the at least two items; and
    displaying the item of the target category.
  7. The method according to claim 6, characterized in that the feature information of the image comprises feature information of a whole formed by the background and the at least two items, the feature information of the at least two items comprises attribute information of each item, and the attribute information of each item comprises any one or more of the following: a category of the item, a color of the item, a style of the item, a material of the item, or a pattern of the item.
  8. The method according to claim 6 or 7, characterized in that the method further comprises:
    receiving, from the server, M candidate intents corresponding to the image, and displaying the M candidate intents, wherein M is an integer greater than or equal to 2, and each candidate intent indicates a category of items that has a matching relationship with the image; and
    obtaining feedback operations corresponding to the M candidate intents, and determining, according to the feedback operations for the M candidate intents, the target category that has a matching relationship with the image.
  9. A method for matching items, characterized in that the method is applied to a server in an item matching system, the item matching system further comprises a client device, and the method comprises:
    obtaining, based on feature information of an image and feature information of at least two items, an item of a target category that has a matching relationship with the image through a first neural network, wherein a background and the at least two items are present in the image; and
    sending information about the item of the target category to the client device.
  10. The method according to claim 9, characterized in that the feature information of the image comprises feature information of a whole formed by the background and the at least two items, the feature information of the at least two items comprises attribute information of each item, and the attribute information of each item comprises any one or more of the following: a category of the item, a color of the item, a style of the item, a material of the item, or a pattern of the item.
  11. The method according to claim 9 or 10, characterized in that the obtaining, based on the feature information of the image and the feature information of the at least two items, an item of a target category that has a matching relationship with the image through a first neural network comprises:
    generating, based on the feature information of the image and the feature information of the at least two items, M candidate intents corresponding to the image through the first neural network, wherein M is an integer greater than or equal to 2, and each candidate intent indicates a category of items that has a matching relationship with the image;
    sending the M candidate intents to the client device, wherein the M candidate intents are used by the client device to obtain the target category that has a matching relationship with the image; and
    receiving the target category sent by the client device.
  12. An item matching apparatus, characterized in that the apparatus is applied to a client device in an item matching system, the item matching system further comprises a server, and the apparatus comprises:
    an obtaining module, configured to obtain an image input by a user, wherein a background and at least two items are present in the image;
    a receiving module, configured to receive, from the server, an item of a target category that has a matching relationship with the image, wherein the item of the target category is obtained by the server based on feature information of the image and feature information of the at least two items; and
    a display module, configured to display the item of the target category.
  13. The apparatus according to claim 12, characterized in that the feature information of the image comprises feature information of a whole formed by the background and the at least two items, the feature information of the at least two items comprises attribute information of each item, and the attribute information of each item comprises any one or more of the following: a category of the item, a color of the item, a style of the item, a material of the item, or a pattern of the item.
  14. The apparatus according to claim 12 or 13, characterized in that:
    the receiving module is further configured to receive, from the server, M candidate intents corresponding to the image, wherein M is an integer greater than or equal to 2, and each candidate intent indicates a category of items that has a matching relationship with the image;
    the display module is further configured to display the M candidate intents; and
    the obtaining module is further configured to obtain feedback operations corresponding to the M candidate intents, and determine, according to the feedback operations for the M candidate intents, the target category that has a matching relationship with the image.
  15. The apparatus according to claim 12 or 13, characterized in that the display module is specifically configured to display an effect diagram showing the items of the target category matched with the image.
  16. An item matching apparatus, characterized in that the apparatus is applied to a server in an item matching system, the item matching system further comprises a client device, and the apparatus comprises:
    an obtaining module, configured to obtain, based on feature information of an image and feature information of at least two items, an item of a target category that has a matching relationship with the image through a first neural network, wherein a background and the at least two items are present in the image; and
    a sending module, configured to send information about the item of the target category to the client device.
  17. The apparatus according to claim 16, characterized in that the feature information of the image comprises feature information of a whole formed by the background and the at least two items, the feature information of the at least two items comprises attribute information of each item, and the attribute information of each item comprises any one or more of the following: a category of the item, a color of the item, a style of the item, a material of the item, or a pattern of the item.
  18. The apparatus according to claim 16 or 17, characterized in that the obtaining module is specifically configured to:
    generate, based on the feature information of the image and the feature information of the at least two items, M candidate intents corresponding to the image through the first neural network, wherein M is an integer greater than or equal to 2, and each candidate intent indicates a category of items that has a matching relationship with the image;
    send the M candidate intents to the client device, wherein the M candidate intents are used by the client device to obtain the target category that has a matching relationship with the image; and
    receive the target category sent by the client device.
  19. The apparatus according to claim 16 or 17, characterized in that the obtaining module is specifically configured to:
    obtain, through the first neural network, N candidate items that have a matching relationship with the image, wherein each candidate item belongs to the target category, and N is an integer greater than 1;
    generate scores corresponding to the N candidate items through a second neural network, wherein the scores indicate a matching degree between the candidate items and the image; and
    select K target items from the N candidate items according to the scores corresponding to the N candidate items, wherein K is an integer greater than or equal to 1;
    wherein the sending module is specifically configured to send the K target items to the client device.
  20. A computer program product, characterized in that the computer program product comprises a program which, when run on a computer, causes the computer to perform the method according to any one of claims 1 to 5, or causes the computer to perform the method according to any one of claims 6 to 8, or causes the computer to perform the method according to any one of claims 9 to 11.
  21. A computer-readable storage medium, characterized in that a program is stored in the computer-readable storage medium, and when the program is run on a computer, it causes the computer to perform the method according to any one of claims 1 to 5, or causes the computer to perform the method according to any one of claims 6 to 8, or causes the computer to perform the method according to any one of claims 9 to 11.
  22. A client device, characterized in that it comprises a processor and a memory, the processor being coupled to the memory, wherein:
    the memory is configured to store a program; and
    the processor is configured to execute the program in the memory, so that the client device performs the method according to any one of claims 6 to 8.
  23. A server, characterized in that it comprises a processor and a memory, the processor being coupled to the memory, wherein:
    the memory is configured to store a program; and
    the processor is configured to execute the program in the memory, so that the server performs the method according to any one of claims 9 to 11.
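The candidate-ranking flow recited in claims 4 and 19 — obtain N candidate items, score each with a second neural network, and select the K highest-scoring target items — can be sketched as follows. The scoring network itself is not reproduced here; the scores are taken as given, and all names are illustrative assumptions rather than the applicant's implementation.

```python
import numpy as np

def select_target_items(candidate_scores, k):
    """Given scores for N candidate items (the matching degree between
    each candidate and the image), return the indices of the K
    highest-scoring target items, best first."""
    order = np.argsort(candidate_scores)[::-1]   # indices sorted by descending score
    return [int(i) for i in order[:k]]

scores = [0.31, 0.92, 0.55, 0.78]    # hypothetical second-network outputs for N=4
assert select_target_items(scores, 2) == [1, 3]
```

The selection is a plain top-K over the scores; any tie-breaking or diversity re-ranking would sit on top of this step.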
PCT/CN2023/084241 2022-03-31 2023-03-28 Article matching method and related device WO2023185787A1 (en)

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
CN202210333006.5A CN116932804A (en) 2022-03-31 2022-03-31 Matching method of articles and related equipment
CN202210333006.5 2022-03-31

Publications (1)

Publication Number Publication Date
WO2023185787A1 true WO2023185787A1 (en) 2023-10-05

Family

ID=88199121

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/CN2023/084241 WO2023185787A1 (en) 2022-03-31 2023-03-28 Article matching method and related device

Country Status (2)

Country Link
CN (1) CN116932804A (en)
WO (1) WO2023185787A1 (en)

Citations (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN105095362A (en) * 2015-06-25 2015-11-25 深圳码隆科技有限公司 Image display method and device based on target object
CN109583514A (en) * 2018-12-19 2019-04-05 成都西纬科技有限公司 A kind of image processing method, device and computer storage medium
CN110909746A (en) * 2018-09-18 2020-03-24 深圳云天励飞技术有限公司 Clothing recommendation method, related device and equipment
CN111401306A (en) * 2020-04-08 2020-07-10 青岛海尔智能技术研发有限公司 Method, device and equipment for recommending clothes putting on
US20210303914A1 (en) * 2020-11-11 2021-09-30 Beijing Baidu Netcom Science And Technology Co., Ltd. Clothing collocation


Also Published As

Publication number Publication date
CN116932804A (en) 2023-10-24

Similar Documents

Publication Publication Date Title
WO2021238631A1 (en) Article information display method, apparatus and device and readable storage medium
US10032072B1 (en) Text recognition and localization with deep learning
US9875258B1 (en) Generating search strings and refinements from an image
US11232324B2 (en) Methods and apparatus for recommending collocating dress, electronic devices, and storage media
US10346893B1 (en) Virtual dressing room
US9607010B1 (en) Techniques for shape-based search of content
US9990557B2 (en) Region selection for image match
US20180181569A1 (en) Visual category representation with diverse ranking
CN114391160A (en) Hand pose estimation from stereo camera
US9830534B1 (en) Object recognition
US20190012717A1 (en) Appratus and method of providing online sales information of offline product in augmented reality
CN110249304A (en) The Visual intelligent management of electronic equipment
CN110348572A (en) The processing method and processing device of neural network model, electronic equipment, storage medium
US11475500B2 (en) Device and method for item recommendation based on visual elements
US10776417B1 (en) Parts-based visual similarity search
WO2021097750A1 (en) Human body posture recognition method and apparatus, storage medium, and electronic device
US10379721B1 (en) Interactive interfaces for generating annotation information
Zhou et al. A lightweight hand gesture recognition in complex backgrounds
CN111414915B (en) Character recognition method and related equipment
CN112905889A (en) Clothing searching method and device, electronic equipment and medium
US20210166058A1 (en) Image generation method and computing device
KR102444498B1 (en) System and method for providing image-based service to sell and buy product
WO2022179603A1 (en) Augmented reality method and related device thereof
Magassouba et al. Predicting and attending to damaging collisions for placing everyday objects in photo-realistic simulations
CN113627421A (en) Image processing method, model training method and related equipment

Legal Events

Date Code Title Description
121 Ep: the epo has been informed by wipo that ep was designated in this application

Ref document number: 23778152

Country of ref document: EP

Kind code of ref document: A1