WO2022156525A1 - Object matching method, apparatus and device - Google Patents

Object matching method, apparatus and device

Info

Publication number
WO2022156525A1
Authority
WO
WIPO (PCT)
Prior art keywords
objects
features
information
image
matching
Prior art date
Application number
PCT/CN2022/070030
Other languages
English (en)
French (fr)
Inventor
赵成
Original Assignee
北京沃东天骏信息技术有限公司
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by 北京沃东天骏信息技术有限公司
Publication of WO2022156525A1

Classifications

    • G06F18/253 — Physics; Computing; Electric digital data processing; Pattern recognition; Analysing; Fusion techniques of extracted features
    • G06F18/22 — Physics; Computing; Electric digital data processing; Pattern recognition; Analysing; Matching criteria, e.g. proximity measures
    • G06N3/08 — Physics; Computing; Computing arrangements based on specific computational models; Computing arrangements based on biological models; Neural networks; Learning methods
    • G06Q30/0601 — Physics; Computing; ICT specially adapted for administrative, commercial, financial, managerial or supervisory purposes; Commerce; Buying, selling or leasing transactions; Electronic shopping [e-shopping]

Definitions

  • the present application relates to the technical field of artificial intelligence, and in particular, to an object matching method, apparatus and device.
  • typically, an object has title information, and the title information includes a plurality of keywords for describing the object.
  • the title information can be used to determine the matching relationship between objects. Specifically, for object A and object B, the similarity between the title information of object A and the title information of object B is calculated, and if the similarity is greater than or equal to a preset threshold, it is determined that object A and object B match each other. However, the matching relationship determined in this way may be inaccurate.
  • the present application provides an object matching method, apparatus and device, which are used to improve the accuracy of object matching results.
  • in a first aspect, an embodiment of the present application provides an object matching method, including: acquiring object information of at least two objects, wherein the object information of one object includes the image and text description information of the object; for each of the objects, acquiring the graphic and text features of the object according to the image and text description information of the object; and determining the matching relationship between the at least two objects according to the graphic and text features of the at least two objects.
  • in a possible implementation manner, acquiring the graphic and text features of the object according to the image and text description information of the object includes: performing target detection on the image of the object to obtain at least one region of interest in the image, and respectively acquiring the features of each of the regions of interest; acquiring the feature of each character in the text description information of the object; and fusing the features of each of the regions of interest and the features of each of the characters to obtain the graphic and text features of the object.
  • in a possible implementation manner, acquiring the features of each of the regions of interest includes: performing feature extraction on the region of interest to obtain first feature information; obtaining second feature information according to the position information of the region of interest in the image; and acquiring the feature of the region of interest according to the first feature information and the second feature information.
  • in a possible implementation manner, fusing the features of each of the regions of interest and the features of each of the characters to obtain the graphic and text features of the object includes: projecting and embedding, according to at least one embedding manner, the features of each of the regions of interest and the features of each of the characters into different dimensions of the same feature vector to obtain the graphic and text features of the object; wherein the at least one embedding manner includes one or more of the following: language embedding, segment embedding and sequence embedding.
  • in a possible implementation manner, determining the matching relationship between the at least two objects according to the graphic and text features of the at least two objects includes: inputting the graphic and text features of a first object and the graphic and text features of a second object into a trained matching model, so that the matching model predicts the matching degree between the first object and the second object; if the matching degree is greater than or equal to a preset threshold, determining that the first object and the second object match, and if the matching degree is less than the preset threshold, determining that the first object and the second object do not match; wherein the first object and the second object are any two objects among the at least two objects.
  • in a possible implementation manner, before acquiring the graphic and text features of the object, the method further includes: acquiring, for each of the objects, the category corresponding to the object; and determining that the categories corresponding to the at least two objects are the same.
  • in a possible implementation manner, before acquiring the graphic and text features of the object, the method further includes: acquiring, for each of the objects, the brand attribute corresponding to the object according to the object information of the object; and determining that the brand attributes corresponding to the at least two objects are the same.
  • in a possible implementation manner, the text description information includes at least one of object title information and object attribute information.
  • in a second aspect, an embodiment of the present application provides an object matching apparatus, including:
  • a first acquisition module configured to acquire object information of at least two objects, wherein the object information of one object includes the image and text description information of the object;
  • a second acquisition module, configured to acquire, for each of the objects, the graphic and text features of the object according to the image and text description information of the object;
  • a determination module configured to determine the matching relationship between the at least two objects according to the graphic and text features of the at least two objects.
  • in a possible implementation manner, the second acquisition module is specifically configured to: perform target detection on the image of the object to obtain at least one region of interest in the image, and respectively acquire the features of each of the regions of interest; acquire the feature of each character in the text description information of the object; and fuse the features of each of the regions of interest and the features of each of the characters to obtain the graphic and text features of the object.
  • in a possible implementation manner, the second acquisition module is specifically configured to: perform feature extraction on the region of interest to obtain first feature information; obtain second feature information according to the position information of the region of interest in the image; and acquire the feature of the region of interest according to the first feature information and the second feature information.
  • in a possible implementation manner, the second acquisition module is specifically configured to: project and embed, according to at least one embedding manner, the features of each of the regions of interest and the features of each of the characters into different dimensions of the same feature vector to obtain the graphic and text features of the object; wherein the at least one embedding manner includes one or more of the following: language embedding, segment embedding and sequence embedding.
  • in a possible implementation manner, the determining module is specifically configured to: input the graphic and text features of a first object and the graphic and text features of a second object into a trained matching model, so that the matching model predicts the matching degree between the first object and the second object; if the matching degree is greater than or equal to a preset threshold, determine that the first object and the second object match, and if the matching degree is less than the preset threshold, determine that the first object and the second object do not match; wherein the first object and the second object are any two objects among the at least two objects.
  • in a possible implementation manner, the determining module is further configured to: acquire, for each of the objects, the category corresponding to the object, and determine that the categories corresponding to the at least two objects are the same.
  • in a possible implementation manner, the determining module is further configured to: acquire, for each of the objects, the brand attribute corresponding to the object according to the object information of the object, and determine that the brand attributes corresponding to the at least two objects are the same.
  • in a possible implementation manner, the text description information includes at least one of object title information and object attribute information.
  • in a third aspect, an embodiment of the present application provides an electronic device, including: a memory and a processor, where the memory is used to store a computer program, and the processor runs the computer program to implement the method according to any one of the first aspect.
  • in a fourth aspect, an embodiment of the present application provides a computer-readable storage medium, including: a computer program, which implements the method according to any one of the first aspect when the computer program is executed by a processor.
  • in a fifth aspect, an embodiment of the present application provides a computer program product, including: a computer program, which implements the method according to any one of the first aspect when the computer program is executed by a processor.
  • embodiments of the present application provide an object matching method, apparatus, and device.
  • the method includes: acquiring object information of at least two objects, wherein the object information of one object includes the image and text description information of the object; for each object, acquiring the graphic and text features of the object according to the image and text description information of the object; and then determining the matching relationship between the at least two objects according to the graphic and text features of each object.
  • FIG. 1 is a schematic diagram of a possible application scenario provided by an embodiment of the present application
  • FIG. 2 is a schematic diagram of another possible application scenario provided by an embodiment of the present application.
  • FIG. 3 is a schematic flowchart of an object matching method provided by an embodiment of the present application;
  • FIG. 4 is a schematic diagram of object information provided by an embodiment of the present application;
  • FIG. 5 is a schematic flowchart of a method for acquiring the graphic and text features of an object provided by an embodiment of the present application;
  • FIG. 6 is a schematic diagram of a target detection process provided by an embodiment of the present application.
  • FIG. 7 is a schematic diagram of an image-text feature extraction model provided by an embodiment of the present application.
  • FIG. 8 is a schematic diagram of an object matching process provided by an embodiment of the present application.
  • FIG. 9 is a schematic flowchart of another object matching method provided by an embodiment of the present application.
  • FIG. 10 is a schematic structural diagram of an object matching apparatus provided by an embodiment of the present application.
  • FIG. 11 is a schematic structural diagram of an electronic device provided by an embodiment of the present application.
  • the matching relationship of objects refers to whether multiple objects are the same object (objects of the same style), or whether multiple objects are similar objects.
  • the "object” in the embodiments of the present application refers to something existing in the objective world and the network world. Objects can be tangible objects, intangible objects, real objects, or virtual objects.
  • typically, an object has title information, and the title information includes a plurality of keywords for describing the object.
  • in some implementations, the title information can be used to determine the matching relationship between objects. Specifically, for object A and object B, the similarity between the title information of object A and the title information of object B is calculated, and if the similarity is greater than or equal to a preset threshold, it is determined that object A and object B match each other.
  • however, in the process of implementing the present application, it was found that the matching relationship between objects determined in the above manner may be inaccurate.
  • the embodiments of the present application provide an object matching method, apparatus, and device, which aim to solve the above-mentioned technical problems.
  • the embodiments of the present application can be used to match objects with image and text description information.
  • in the embodiments of the present application, object information of at least two objects is acquired, wherein the object information of one object includes the image and text description information of the object; for each object, the graphic and text features of the object are acquired according to the image and text description information of the object, and then the matching relationship between the at least two objects is determined according to the graphic and text features of each object. Since the matching uses the multimodal features of the objects, that is, both the text description information and the images of the objects are considered, the determined matching relationship is more accurate.
  • it should be noted that objects can refer to different things in different application scenarios. In some scenarios, objects can refer to users; in other scenarios, objects can refer to products; in still other scenarios, objects can refer to commodities. This embodiment does not limit this.
  • FIG. 1 is a schematic diagram of a possible application scenario provided by an embodiment of the present application.
  • the application scenario includes: user equipment and e-commerce server.
  • the user equipment and the e-commerce server are connected through a network.
  • the e-commerce server is provided with a commodity database and a commodity matching engine.
  • An e-commerce client is installed in the user device, and the user can access the e-commerce server through the e-commerce client.
  • when the user equipment needs to search for a certain commodity, it sends a search request to the e-commerce server.
  • after receiving the search request, the e-commerce server matches the search request with the commodities in the commodity database through the commodity matching engine to obtain search results, which may include one or more target commodities.
  • the e-commerce server returns the search result to the user device.
  • in this scenario, the user equipment and the e-commerce server adopt an interactive design: the user inputs the commodity to be searched, and the e-commerce server uses the commodity matching method to obtain, from the commodity database, target commodities that meet the user's needs and recommends them to the user, so that the user can quickly obtain the commodities they need.
  • FIG. 2 is a schematic diagram of another possible application scenario provided by an embodiment of the present application.
  • the application scenario includes: a first e-commerce server and a second e-commerce server.
  • the first e-commerce server and the second e-commerce server are connected through a network.
  • the first e-commerce server may acquire commodities provided by the second e-commerce server through interaction with the second e-commerce server.
  • the first e-commerce server may use a web crawling technology to obtain the commodities provided by the second e-commerce server by crawling the web page content provided by the second e-commerce server.
  • further, the first e-commerce server can use the commodity matching engine to match the commodities obtained from the second e-commerce server against the commodities in its own commodity database, so as to obtain the matching relationship between the commodities provided by the first e-commerce server and the commodities provided by the second e-commerce server.
  • in this scenario, through the commodity matching method, the first e-commerce server can determine which commodities in the first e-commerce server and which commodities in the second e-commerce server are the same or similar commodities, thereby establishing the matching relationship between the commodities in the first e-commerce server and the commodities in the second e-commerce server.
  • the e-commerce server can be implemented by an independent server or a server cluster composed of multiple physical servers.
  • the e-commerce server can also adopt a distributed architecture.
  • the e-commerce server can also be a cloud server.
  • FIG. 1 and FIG. 2 are only two possible illustrations, and there may be other more application scenarios, which are not limited in this embodiment of the present application.
  • FIG. 3 is a schematic flowchart of an object matching method provided by an embodiment of the present application. As shown in Figure 3, the method of this embodiment includes:
  • S301 Acquire object information of at least two objects, wherein the object information of one object includes image and text description information of the object.
  • the at least two objects are objects whose matching relationship is to be determined.
  • a matching relationship may be determined for two objects, or a matching relationship may be determined for a larger number of objects.
  • the object information refers to some information used to describe the object.
  • the object of this embodiment has image and text description information. Therefore, the object information of an object includes the image and text description information of the object.
  • the text description information may include one or more sentences, or one or more keywords.
  • the number of images each object has can be one or more.
  • FIG. 4 is a schematic diagram of object information provided by an embodiment of the present application.
  • the object shown in Figure 4 is a commodity.
  • the commodity has one or more images, and the commodity also has text description information.
  • the textual description information may include, but is not limited to, product title, product attribute information, and the like.
  • the commodity attribute information includes, but is not limited to, color attributes, shape attributes, size attributes, material attributes, and the like. It can be seen that the commodity information of the commodity shown in FIG. 4 includes: commodity image, commodity title, commodity attribute information and so on.
  • S302 For each of the objects, acquire the graphic and text features of the object according to the image and text description information of the object.
  • the image and text features of the object can be obtained by extracting features from the image and textual description information of the object.
  • the image and text description information of the object is input into the feature extraction model to obtain the image and text features of the object.
  • the graphic and text features of the object include both features extracted from an image of the object and features extracted from textual description information of the object. That is, in this embodiment, the multimodal features of the object are extracted.
  • in the aforementioned implementation, the matching relationship between objects is generally determined according to the similarity between the title information of the objects.
  • however, in some scenarios, for example when the objects are commodities, the title information of some commodities (such as clothing commodities) may have no distinctive features, making it impossible to judge whether the commodities match, or making the matching result inaccurate.
  • in the embodiments of the present application, by introducing the image of the object, not only the text description information such as the title information but also the image of the object is considered when extracting the features of the object, so the extracted features are more comprehensive. Therefore, the accuracy of the matching result can be improved, and the recall rate of matching can be improved.
  • S303 Determine the matching relationship between the at least two objects according to the graphic and text features of the at least two objects.
  • the matching relationship between the objects can be determined according to the matching degree between the graphic and text features of the objects.
  • the matching relationship between the two objects may indicate whether the two objects are the same object (or the same item), or indicate whether the two objects are similar objects.
  • in a possible implementation manner, for any two objects among the at least two objects (for convenience of description, referred to as the first object and the second object respectively), the matching relationship between the first object and the second object can be determined in the following manner: the graphic and text features of the first object and the graphic and text features of the second object are input into a trained matching model, so that the matching model predicts the matching degree between the first object and the second object; if the matching degree is greater than or equal to a preset threshold, it is determined that the first object and the second object match, and if the matching degree is less than the preset threshold, it is determined that the first object and the second object do not match.
  • the matching model may also directly output a binary result for indicating whether the first object and the second object match. For example, a matching model output of 1 indicates that the first object and the second object match, and an output of 0 indicates that the first object and the second object do not match.
  • the object matching method provided by this embodiment includes: acquiring object information of at least two objects, wherein the object information of one object includes the image and text description information of the object; for each object, acquiring the graphic and text features of the object according to the image and text description information of the object; and then determining the matching relationship between the at least two objects according to the graphic and text features of each object.
  • in the above matching process, the multimodal features of the objects are used for matching, that is to say, both the text description information of the objects and the images of the objects are considered, so the accuracy of the matching result can be improved, and the recall rate of matching can be improved.
  • FIG. 5 is a schematic flowchart of a method for acquiring graphic and text features of an object according to an embodiment of the present application. As shown in Figure 5, the method of this embodiment includes:
  • S501 Perform target detection on an image of an object to obtain at least one region of interest in the image, and obtain features of each region of interest respectively.
  • in this embodiment, considering that there may be a lot of interference information in the image of an object (for example, taking a commodity as an example, there may be interference information such as background and promotional text in the image of the commodity), feature extraction is not performed directly on the entire image. Instead, target detection is first performed on the image to obtain at least one region of interest (ROI), and then the features of each region of interest are obtained separately, thereby avoiding the influence of the interference information on the matching results.
  • FIG. 6 is a schematic diagram of a target detection process provided by an embodiment of the present application.
  • the image of the object can be input into the target detection model, and the target detection model can perform target detection on the image to obtain at least one region of interest.
  • the target detection model uses a rectangular frame to mark two regions of interest in the image, and also identifies the category (Box Label) of each region of interest. For example, one region of interest has the category "shampoo" and another region of interest has the category "text”.
  • the target detection model can be a pre-trained machine learning model, which can use Faster-RCNN (full name: Faster Region Convolutional Neural Network), YOLO (full name: You Only Look Once), Mask R-CNN (full name: Mask Region Convolutional Neural Network) and other models, which are not limited in this embodiment.
  • in a possible implementation manner, the features of each region of interest can be obtained in the following manner:
  • (1) Feature extraction is performed on the region of interest to obtain first feature information. Exemplarily, the following linear mapping may be used to extract features from the region of interest; the resulting first feature information may also be referred to as an image embedding (Image Embedding) vector:

    v_i = W_v · f_i + b_v

  • where f_i represents the i-th region of interest, v_i represents the first feature information corresponding to the i-th region of interest, W_v is the slope (weight) of the linear mapping, and b_v is the intercept (bias) of the linear mapping.
  • (2) Second feature information is obtained according to the position information of the region of interest in the image. Exemplarily, a five-dimensional vector encoded from the position information of the region of interest in the image may be used as the second feature information, which may also be referred to as a position embedding (Position Embedding) vector. The five-dimensional vector can be represented as:

    c_i = ( x_tl / W, y_tl / H, x_br / W, y_br / H, (x_br − x_tl)(y_br − y_tl) / (W · H) )

  • where c_i represents the second feature information of the i-th region of interest; (x_tl, y_tl) and (x_br, y_br) respectively represent the coordinates of the upper-left corner and the lower-right corner of the rectangular frame of the i-th region of interest; W and H represent the width and height of the image; and the fifth component represents the ratio of the area of the region of interest to the area of the entire image.
  • (3) The feature of the region of interest is acquired according to the first feature information and the second feature information. Exemplarily, the second feature information may be embedded into the first feature information to obtain the feature of the region of interest.
  • in this embodiment, the feature of each region of interest takes into account not only the image embedding vector of the region of interest but also the position embedding vector of the region of interest. In this way, the regions of interest can not only provide the language part with visual contexts of the entire image, but can also be associated with specific terms through detailed position information, making the characterization of the regions of interest more comprehensive.
  • specifically, the feature of each character in the text description information can be obtained according to a vector table (vocab). The vector table is stored in the form of a file, and the word vectors corresponding to different characters are recorded in the vector table. By looking up the vector table, the word vector corresponding to each character in the text description information can be obtained, and the feature of the character can then be obtained from the word vector.
  • in a possible implementation manner, the text description information is input into a BERT (Bidirectional Encoder Representations from Transformers) model, and the BERT model can obtain the word vectors, text vector and position vectors of the text description information.
  • the word vectors are obtained by looking up the vector table, that is, each character in the text description information is converted into its corresponding word vector through a lookup in the vector table.
  • the text vector is learned automatically during model training and is used to describe the global semantic information of the text and fuse it with the semantic information of individual characters. Since the semantic information carried by characters at different positions in the text description information differs, the BERT model attaches a different position vector to characters at different positions to distinguish them. Further, according to the above word vectors, text vector and position vectors, the BERT model outputs, for each character in the text description information, a vector representation fused with the full-text semantic information.
  • S503 Fuse the features of each of the regions of interest and the features of each of the characters to obtain the graphic and text features of the object.
  • specifically, the features of each region of interest and the features of each character can be projected into the same feature vector, and the finally obtained feature vector is the graphic and text feature of the object.
  • in a possible implementation manner, the fusion processing may be performed in the following manner: according to at least one embedding manner, the features of each of the regions of interest and the features of each of the characters are projected and embedded into different dimensions of the same feature vector to obtain the graphic and text features of the object; wherein the at least one embedding manner includes one or more of the following: language embedding, segment embedding and sequence embedding.
  • FIG. 7 is a schematic diagram of an image-text feature extraction model provided by an embodiment of the present application.
  • the image and text feature extraction model can perform fusion processing on the features of each region of interest and the features of each character to obtain the image and text features of the object.
  • the image-text feature extraction model includes an image embedding layer, a position embedding layer, a language embedding layer, a segment embedding layer and a sequence embedding layer.
  • the input of the image-text feature extraction model includes the text description information of the object and the images of the object (for example, FIG. 7 takes the object "X shampoo" as an example; image 1 and image 2 are two images corresponding to X shampoo, and the picture content of image 2 is omitted).
  • the output of the image-text feature extraction model is the graphic and text features of the object.
  • the language embedding layer can process the text description information to obtain the characteristics of each character.
  • the image of the object is input into the image embedding layer, and the image embedding layer performs object detection on the image, obtains multiple regions of interest, and obtains the first feature information (image embedding vector) corresponding to each region of interest.
  • then, the second feature information (position embedding vector) corresponding to each region of interest is obtained through the position embedding layer.
  • the first feature information and the second feature information are projected and embedded into the same dimension through the language embedding layer. In this way, the regions of interest can not only provide the language part with visual contexts of the entire image, but can also be associated with specific terms through detailed position information.
  • for each region of interest in the image, its corresponding image embedding, segment embedding, position embedding and sequence embedding are projected into one vector, as follows:

    e_(i) = LN( v_(i) + s_(i) + p_(i) + q_(i) )

  • where e_(i) represents the final feature vector of the i-th region of interest, v_(i) represents the image embedding vector, s_(i) represents the segment embedding vector, p_(i) represents the position embedding vector, q_(i) represents the sequence embedding vector, and LN(·) represents the Layer Normalization operation.
  • at the segment embedding layer, each image and each piece of text description information corresponds to a segment.
  • the image corresponds to segment A
  • the text description information corresponds to segment B.
  • the features of each region of interest in the image are projected and embedded in segment A, and the features of each character in the text description information are projected and embedded in segment B.
  • in this way, the information of the segment embedding layer reflects the source of a feature, that is, which image or which piece of text the feature comes from.
  • at the sequence embedding layer, since the regions of interest in an image have no inherent order, each region of interest in the same image corresponds to the same sequence number.
  • the characters in each text, however, are sequentially ordered. Therefore, according to the order of the characters in the text, each character corresponds to a sequence number, and different characters correspond to different sequence numbers. In this way, the information of the sequence embedding layer reflects the order of the features.
  • when extracting the features of an object, the image-text feature extraction model of this embodiment considers both the features of the text description information and the features of the image, that is, the multimodal features of the object. This enhances the expressive ability of the model, helps to improve the accuracy of object matching results, and improves the recall rate of matching.
  • FIG. 8 is a schematic diagram of an object matching process provided by an embodiment of the present application.
  • taking object A and object B as an example, as shown in FIG. 8, the image and text description information of object A are input into the image-text feature extraction model to obtain the graphic and text features of object A, and the image and text description information of object B are input into the image-text feature extraction model to obtain the graphic and text features of object B.
  • the image and text feature extraction model may adopt the model shown in FIG. 7 .
  • the graphic features of object A and the graphic features of object B are input into the matching model, and the matching degree between object A and object B is obtained. If the matching degree is greater than or equal to the preset threshold, it is determined that object A and object B match, and if the matching degree is less than the preset threshold, it is determined that object A and object B do not match.
  • the matching model may include: one or more Transformer layers, one or more fully connected layers, an activation function layer, a batch normalization layer, and a loss function layer.
  • the loss function layer can use a cross-entropy loss or a triplet loss.
  • FIG. 9 is a schematic flowchart of another object matching method provided by an embodiment of the present application.
  • the matching process of the first object and the second object is used as an example for description.
  • the method of this embodiment includes:
  • S901 Acquire object information of a first object, and acquire object information of a second object, wherein the object information of each object includes an image and text description information of the object.
  • S902 Acquire the category of the first object, and acquire the category of the second object.
  • the category refers to the class to which an object belongs. The category of an object can include one or more levels.
  • when the category of an object includes multiple levels, the last-level category of the object may be acquired in S902.
  • taking commodities as an example, to facilitate commodity management, e-commerce platforms divide commodities into multiple levels of categories. For example, mother-and-baby is a first-level category, which includes multiple second-level categories such as milk powder, diapers, and feeding bottles.
  • the second-level category of milk powder in turn includes multiple third-level categories such as infant milk powder and milk powder for pregnant women.
  • S903 Determine whether the category of the first object and the category of the second object are the same.
  • if they are the same, the subsequent process continues to be executed; if they are different, S910 is executed to determine that the first object and the second object do not match.
  • it should be understood that, when the category of an object includes multiple levels, the last-level category of the object can be obtained in this embodiment, and it is determined whether the last-level categories of the two objects are the same, so as to narrow the matching range as much as possible.
  • it should be noted that, when this embodiment is applied to commodity matching across e-commerce platforms, different e-commerce platforms divide commodity categories in different ways. Therefore, "the same category" described in this embodiment should be understood in a broad sense; that is, the same category means that the categories are equivalent, rather than strictly identical.
  • for example, the infant milk powder category of e-commerce platform A and the formula milk powder category of e-commerce platform B should be understood as the same category, and the mobile phone category of e-commerce platform A and the digital communication category of e-commerce platform B should also be understood as the same category.
  • specifically, assuming that the first object comes from e-commerce platform A and the second object comes from e-commerce platform B, the category of the second object in e-commerce platform B can first be mapped to a category in e-commerce platform A. Then, it is determined whether the category of the first object in e-commerce platform A is the same as the category of the second object after it is mapped to e-commerce platform A.
  • the category of the second object in e-commerce platform B can be mapped to a category in e-commerce platform A in the following two possible implementation manners:
  • in one possible implementation manner, multiple groups of samples are acquired, and each group of samples includes a commodity from e-commerce platform A and a commodity from e-commerce platform B.
  • the categories of the two commodities in their respective e-commerce platforms are known, and the matching relationship between the two commodities in each group of samples is manually annotated.
  • in this way, from the matching relationship between the two commodities in a sample, the matching relationship between the categories to which the two commodities belong can be inferred, and the category mapping relationship between e-commerce platform A and e-commerce platform B can then be obtained.
  • an election (voting) method can be used in the inference process. Further, after the category mapping relationship between e-commerce platform A and e-commerce platform B is determined, the category of the second object in e-commerce platform B can be mapped to a category in e-commerce platform A according to the category mapping relationship.
  • in another possible implementation manner, based on the above samples, features can be extracted from the text description information of the commodity from e-commerce platform B in each group of samples, and the category of the commodity from e-commerce platform A in the same group of samples can be used as the label corresponding to those features, so as to train a category discrimination model.
  • further, after the category discrimination model is obtained by training, the text description information of the second object is input into the category discrimination model, and the category discrimination model outputs the category of the second object in e-commerce platform A.
  • S904 Obtain the brand attribute of the first object according to the object information of the first object, and obtain the brand attribute of the second object according to the object information of the second object.
  • the attribute information of each commodity usually includes a brand attribute. Therefore, the brand attribute can be obtained according to the attribute information of the commodity.
  • S905 Determine whether the brand attribute of the first object is the same as the brand attribute of the second object.
  • if they are the same, the subsequent matching process continues; if they are different, S910 is executed to determine that the first object and the second object do not match. Continuing the subsequent matching process only when the brand attributes are the same narrows the matching scope, reduces the amount of calculation, and improves the matching efficiency.
  • S906 Acquire graphic and textual features of the first object according to the image and text description information of the first object, and acquire graphic and textual features of the second object according to the image and textual description information of the second object.
  • S907 Input the graphic and text features of the first object and the graphic and text features of the second object into the matching model, and obtain the matching degree between the first object and the second object.
  • S908 Determine whether the matching degree is greater than or equal to a preset threshold.
  • S909 Determine that the first object and the second object match.
  • S910 Determine that the first object and the second object do not match.
  • FIG. 10 is a schematic structural diagram of an object matching apparatus provided by an embodiment of the present application.
  • the apparatus of this embodiment may be in the form of software and/or hardware.
  • the object matching apparatus 1000 provided in this embodiment may include: a first obtaining module 1001 , a second obtaining module 1002 , and a determining module 1003 .
  • the first acquisition module 1001 is configured to acquire object information of at least two objects, wherein the object information of one object includes the image and text description information of the object;
  • the second acquisition module 1002 is configured to, for each of the objects, acquire the graphic and textual features of the object according to the image and text description information of the object;
  • the determining module 1003 is configured to determine the matching relationship between the at least two objects according to the graphic and text features of the at least two objects.
  • in a possible implementation manner, the second acquisition module 1002 is specifically configured to: perform target detection on the image of the object to obtain at least one region of interest in the image, and respectively acquire the features of each of the regions of interest; acquire the feature of each character in the text description information of the object; and fuse the features of each of the regions of interest and the features of each of the characters to obtain the graphic and text features of the object.
  • in a possible implementation manner, the second acquisition module 1002 is specifically configured to: perform feature extraction on the region of interest to obtain first feature information; obtain second feature information according to the position information of the region of interest in the image; and acquire the feature of the region of interest according to the first feature information and the second feature information.
  • in a possible implementation manner, the second acquisition module 1002 is specifically configured to: project and embed, according to at least one embedding manner, the features of each of the regions of interest and the features of each of the characters into different dimensions of the same feature vector to obtain the graphic and text features of the object; wherein the at least one embedding manner includes one or more of the following: language embedding, segment embedding and sequence embedding.
  • the determining module 1003 is specifically used for:
  • the matching degree is greater than or equal to a preset threshold, it is determined that the first object and the second object match, and if the matching degree is less than the preset threshold, it is determined that the first object and the second object are matched. The two objects do not match;
  • the first object and the second object are any two objects among the at least two objects.
  • in a possible implementation manner, the determining module 1003 is further configured to: acquire, for each of the objects, the category corresponding to the object, and determine that the categories corresponding to the at least two objects are the same.
  • in a possible implementation manner, the determining module 1003 is further configured to: acquire, for each of the objects, the brand attribute corresponding to the object according to the object information of the object, and determine that the brand attributes corresponding to the at least two objects are the same.
  • in a possible implementation manner, the text description information includes at least one of object title information and object attribute information.
  • the object matching apparatus provided in this embodiment can be used to execute the object matching method in any of the above method embodiments, and the implementation principle and technical effect thereof are similar, and are not repeated here.
  • FIG. 11 is a schematic structural diagram of an electronic device according to an embodiment of the present application. As shown in FIG. 11 , the electronic device 1100 in this embodiment includes: a processor 1101 and a memory 1102 .
  • the memory 1102 is used for storing computer programs; the processor 1101 is used for executing the computer programs stored in the memory, so as to realize the object matching method in the above-mentioned embodiment.
  • the memory 1102 may be independent or integrated with the processor 1101 .
  • the electronic device 1100 may further include a communication component 1103 for communicating with other devices.
  • the electronic device 1100 may further include: a bus 1104 for connecting the memory 1102 and the processor 1101 .
  • embodiments of the present application further provide a computer-readable storage medium, where the computer-readable storage medium includes a computer program, and the computer program is used to implement the object matching method in any of the above method embodiments; its implementation principles and technical effects are similar and will not be repeated here.
  • an embodiment of the present application further provides a chip, including: a memory, a processor, and a computer program, where the computer program is stored in the memory, and the processor runs the computer program to execute the object matching method in any of the foregoing method embodiments; its implementation principles and technical effects are similar and will not be repeated here.
  • embodiments of the present application also provide a computer program product, including a computer program, which implements the object matching method in any of the above method embodiments when executed by a processor; its implementation principles and technical effects are similar and will not be repeated here.
  • the disclosed apparatus and method may be implemented in other manners.
  • the device embodiments described above are only illustrative.
  • the division of the modules is only a logical function division, and there may be other division manners in actual implementation; for example, multiple modules may be combined or integrated into another system, or some features may be ignored or not implemented.
  • the shown or discussed mutual coupling or direct coupling or communication connection may be through some interfaces, indirect coupling or communication connection of devices or modules, and may be in electrical, mechanical or other forms.
  • modules described as separate components may or may not be physically separated, and the components shown as modules may or may not be physical units, that is, may be located in one place, or may be distributed to multiple network units. Some or all of the modules may be selected according to actual needs to achieve the purpose of the solution in this embodiment.
  • each functional module in each embodiment of the present application may be integrated in one processing unit, or each module may exist physically alone, or two or more modules may be integrated in one unit.
  • the units formed by the above modules can be implemented in the form of hardware, or can be implemented in the form of hardware plus software functional units.
  • the above-mentioned integrated modules implemented in the form of software functional modules may be stored in a computer-readable storage medium.
  • the above-mentioned software function modules are stored in a storage medium and include several instructions to enable a computer device (which may be a personal computer, a server, or a network device, etc.) or a processor to execute part of the steps of the methods described in the various embodiments of the present application.
  • the processor may be a central processing unit (CPU), another general-purpose processor, a digital signal processor (DSP), an application-specific integrated circuit (ASIC), and so on.
  • a general-purpose processor may be a microprocessor, or the processor may be any conventional processor or the like. The steps of the method disclosed in connection with the present application can be directly embodied as being executed by a hardware processor, or executed by a combination of hardware and software modules in the processor.
  • the memory may include high-speed RAM memory, and may also include non-volatile memory (NVM), such as at least one magnetic disk memory; it may also be a USB flash drive, a removable hard disk, a read-only memory, a magnetic disk, an optical disk, or the like.
  • the bus may be an Industry Standard Architecture (ISA) bus, a Peripheral Component Interconnect (PCI) bus, an Extended Industry Standard Architecture (EISA) bus, or the like.
  • the bus can be divided into address bus, data bus, control bus and so on.
  • the buses in the drawings of the present application are not limited to only one bus or one type of bus.
  • the above-mentioned storage medium may be implemented by any type of volatile or non-volatile storage device or a combination thereof, such as static random access memory (SRAM), electrically erasable programmable read-only memory (EEPROM), erasable programmable read-only memory (EPROM), programmable read-only memory (PROM), read-only memory (ROM), magnetic memory, flash memory, a magnetic disk, or an optical disk.
  • a storage medium can be any available medium that can be accessed by a general purpose or special purpose computer.
  • An exemplary storage medium is coupled to the processor, such that the processor can read information from, and write information to, the storage medium.
  • the storage medium can also be an integral part of the processor.
  • alternatively, the processor and the storage medium may be located in an application-specific integrated circuit (ASIC).
  • the processor and the storage medium may also exist in the electronic device or the host device as discrete components.

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Data Mining & Analysis (AREA)
  • Physics & Mathematics (AREA)
  • General Physics & Mathematics (AREA)
  • Evolutionary Computation (AREA)
  • General Engineering & Computer Science (AREA)
  • Artificial Intelligence (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Business, Economics & Management (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Evolutionary Biology (AREA)
  • Bioinformatics & Computational Biology (AREA)
  • Bioinformatics & Cheminformatics (AREA)
  • Accounting & Taxation (AREA)
  • Finance (AREA)
  • Development Economics (AREA)
  • Economics (AREA)
  • Marketing (AREA)
  • Strategic Management (AREA)
  • General Business, Economics & Management (AREA)
  • Health & Medical Sciences (AREA)
  • Biomedical Technology (AREA)
  • Biophysics (AREA)
  • Computational Linguistics (AREA)
  • General Health & Medical Sciences (AREA)
  • Molecular Biology (AREA)
  • Computing Systems (AREA)
  • Mathematical Physics (AREA)
  • Software Systems (AREA)
  • Character Input (AREA)
  • Traffic Control Systems (AREA)
  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)
  • Image Analysis (AREA)

Abstract

Embodiments of the present application provide an object matching method, apparatus and device. The method includes: acquiring object information of at least two objects, wherein the object information of one object includes the image and text description information of the object; for each object, acquiring the graphic and text features of the object according to the image and text description information of the object; and then determining the matching relationship between the at least two objects according to the graphic and text features of each object.

Description

Object matching method, apparatus and device
This application claims priority to the Chinese patent application No. 2021100964492, filed with the China National Intellectual Property Administration on January 25, 2021 and entitled "Object matching method, apparatus and device", the entire contents of which are incorporated herein by reference.
Technical Field
The present application relates to the technical field of artificial intelligence, and in particular, to an object matching method, apparatus and device.
Background
In some scenarios, it is necessary to determine the matching relationship between objects. For example, taking an e-commerce application scenario as an example, commodities may need to be matched to determine whether two commodities are the same commodity or similar commodities.
Typically, an object has title information, and the title information includes a plurality of keywords for describing the object. In some implementations, the title information can be used to determine the matching relationship between objects. Specifically, for object A and object B, the similarity between the title information of object A and the title information of object B is calculated, and if the similarity is greater than or equal to a preset threshold, it is determined that object A and object B match each other.
However, in the process of implementing the present application, it was found that the matching relationship between objects determined in the above manner may be inaccurate.
Summary
The present application provides an object matching method, apparatus and device, which are used to improve the accuracy of object matching results.
In a first aspect, an embodiment of the present application provides an object matching method, including:
acquiring object information of at least two objects, wherein the object information of one object includes the image and text description information of the object;
for each of the objects, acquiring the graphic and text features of the object according to the image and text description information of the object; and
determining the matching relationship between the at least two objects according to the graphic and text features of the at least two objects.
In a possible implementation manner, acquiring the graphic and text features of the object according to the image and text description information of the object includes:
performing target detection on the image of the object to obtain at least one region of interest in the image, and respectively acquiring the features of each of the regions of interest;
acquiring the feature of each character in the text description information of the object; and
fusing the features of each of the regions of interest and the features of each of the characters to obtain the graphic and text features of the object.
In a possible implementation manner, acquiring the features of each of the regions of interest includes:
performing feature extraction on the region of interest to obtain first feature information;
obtaining second feature information according to the position information of the region of interest in the image; and
acquiring the feature of the region of interest according to the first feature information and the second feature information.
In a possible implementation manner, fusing the features of each of the regions of interest and the features of each of the characters to obtain the graphic and text features of the object includes:
projecting and embedding, according to at least one embedding manner, the features of each of the regions of interest and the features of each of the characters into different dimensions of the same feature vector to obtain the graphic and text features of the object;
wherein the at least one embedding manner includes one or more of the following: language embedding, segment embedding and sequence embedding.
In a possible implementation manner, determining the matching relationship between the at least two objects according to the graphic and text features of the at least two objects includes:
inputting the graphic and text features of a first object and the graphic and text features of a second object into a trained matching model, so that the matching model predicts the matching degree between the first object and the second object;
if the matching degree is greater than or equal to a preset threshold, determining that the first object and the second object match, and if the matching degree is less than the preset threshold, determining that the first object and the second object do not match;
wherein the first object and the second object are any two objects among the at least two objects.
In a possible implementation manner, before acquiring, for each of the objects, the graphic and text features of the object according to the image and text description information of the object, the method further includes:
acquiring, for each of the objects, the category corresponding to the object; and
determining that the categories corresponding to the at least two objects are the same.
In a possible implementation manner, before acquiring, for each of the objects, the graphic and text features of the object according to the image and text description information of the object, the method further includes:
acquiring, for each of the objects, the brand attribute corresponding to the object according to the object information of the object; and
determining that the brand attributes corresponding to the at least two objects are the same.
In a possible implementation manner, the text description information includes at least one of object title information and object attribute information.
In a second aspect, an embodiment of the present application provides an object matching apparatus, including:
a first acquisition module, configured to acquire object information of at least two objects, wherein the object information of one object includes the image and text description information of the object;
a second acquisition module, configured to acquire, for each of the objects, the graphic and text features of the object according to the image and text description information of the object; and
a determining module, configured to determine the matching relationship between the at least two objects according to the graphic and text features of the at least two objects.
In a possible implementation manner, the second acquisition module is specifically configured to:
perform target detection on the image of the object to obtain at least one region of interest in the image, and respectively acquire the features of each of the regions of interest;
acquire the feature of each character in the text description information of the object; and
fuse the features of each of the regions of interest and the features of each of the characters to obtain the graphic and text features of the object.
In a possible implementation manner, the second acquisition module is specifically configured to:
perform feature extraction on the region of interest to obtain first feature information;
obtain second feature information according to the position information of the region of interest in the image; and
acquire the feature of the region of interest according to the first feature information and the second feature information.
In a possible implementation manner, the second acquisition module is specifically configured to:
project and embed, according to at least one embedding manner, the features of each of the regions of interest and the features of each of the characters into different dimensions of the same feature vector to obtain the graphic and text features of the object;
wherein the at least one embedding manner includes one or more of the following: language embedding, segment embedding and sequence embedding.
In a possible implementation manner, the determining module is specifically configured to:
input the graphic and text features of a first object and the graphic and text features of a second object into a trained matching model, so that the matching model predicts the matching degree between the first object and the second object;
if the matching degree is greater than or equal to a preset threshold, determine that the first object and the second object match, and if the matching degree is less than the preset threshold, determine that the first object and the second object do not match;
wherein the first object and the second object are any two objects among the at least two objects.
In a possible implementation manner, the determining module is further configured to:
acquire, for each of the objects, the category corresponding to the object; and
determine that the categories corresponding to the at least two objects are the same.
In a possible implementation manner, the determining module is further configured to:
acquire, for each of the objects, the brand attribute corresponding to the object according to the object information of the object; and
determine that the brand attributes corresponding to the at least two objects are the same.
In a possible implementation manner, the text description information includes at least one of object title information and object attribute information.
In a third aspect, an embodiment of the present application provides an electronic device, including: a memory and a processor, where the memory is used to store a computer program, and the processor runs the computer program to implement the method according to any one of the first aspect.
In a fourth aspect, an embodiment of the present application provides a computer-readable storage medium, including: a computer program, which implements the method according to any one of the first aspect when the computer program is executed by a processor.
In a fifth aspect, an embodiment of the present application provides a computer program product, including: a computer program, which implements the method according to any one of the first aspect when the computer program is executed by a processor.
Embodiments of the present application provide an object matching method, apparatus and device. The method includes: acquiring object information of at least two objects, wherein the object information of one object includes the image and text description information of the object; for each object, acquiring the graphic and text features of the object according to the image and text description information of the object; and then determining the matching relationship between the at least two objects according to the graphic and text features of each object.
Brief Description of the Drawings
To describe the technical solutions in the embodiments of the present application or in the prior art more clearly, the accompanying drawings required in the description of the embodiments or the prior art are briefly introduced below. Obviously, the accompanying drawings in the following description show only some embodiments of the present application, and a person of ordinary skill in the art may still derive other drawings from these accompanying drawings without creative effort.
FIG. 1 is a schematic diagram of a possible application scenario provided by an embodiment of the present application;
FIG. 2 is a schematic diagram of another possible application scenario provided by an embodiment of the present application;
FIG. 3 is a schematic flowchart of an object matching method provided by an embodiment of the present application;
FIG. 4 is a schematic diagram of object information provided by an embodiment of the present application;
FIG. 5 is a schematic flowchart of a method for acquiring the graphic and text features of an object provided by an embodiment of the present application;
FIG. 6 is a schematic diagram of a target detection process provided by an embodiment of the present application;
FIG. 7 is a schematic diagram of an image-text feature extraction model provided by an embodiment of the present application;
FIG. 8 is a schematic diagram of an object matching process provided by an embodiment of the present application;
FIG. 9 is a schematic flowchart of another object matching method provided by an embodiment of the present application;
FIG. 10 is a schematic structural diagram of an object matching apparatus provided by an embodiment of the present application;
FIG. 11 is a schematic structural diagram of an electronic device provided by an embodiment of the present application.
Detailed Description
The technical solutions in the embodiments of the present application will be described clearly and completely below with reference to the accompanying drawings in the embodiments of the present application. Obviously, the described embodiments are only some rather than all of the embodiments of the present application. All other embodiments obtained by a person of ordinary skill in the art based on the embodiments of the present application without creative effort shall fall within the protection scope of the present application.
The terms "first", "second", "third", "fourth" and the like (if any) in the specification, the claims and the above accompanying drawings of the present application are used to distinguish similar objects, and are not necessarily used to describe a specific order or sequence. It should be understood that the data used in this way are interchangeable under appropriate circumstances, so that the embodiments of the present application described herein can be implemented, for example, in orders other than those illustrated or described herein. Furthermore, the terms "include" and "have" and any variations thereof are intended to cover non-exclusive inclusion; for example, a process, method, system, product or device that includes a series of steps or units is not necessarily limited to the steps or units expressly listed, but may include other steps or units that are not expressly listed or that are inherent to such process, method, product or device.
In some scenarios, it is necessary to determine the matching relationship between objects. The matching relationship of objects refers to whether multiple objects are the same object (objects of the same style), or whether multiple objects are similar objects. The "object" in the embodiments of the present application refers to things existing in the objective world and the network world. An object can be a tangible object or an intangible object, a real object or a virtual object.
Typically, an object has title information, and the title information includes a plurality of keywords for describing the object. In some implementations, the title information can be used to determine the matching relationship between objects. Specifically, for object A and object B, the similarity between the title information of object A and the title information of object B is calculated, and if the similarity is greater than or equal to a preset threshold, it is determined that object A and object B match each other. However, in the process of implementing the present application, it was found that the matching relationship between objects determined in this manner may be inaccurate.
To this end, the embodiments of the present application provide an object matching method, apparatus and device, which aim to solve the above technical problem. The embodiments of the present application can be used to match objects that have image and text description information.
In the embodiments of the present application, object information of at least two objects is acquired, wherein the object information of one object includes the image and text description information of the object; for each object, the graphic and text features of the object are acquired according to the image and text description information of the object, and then the matching relationship between the at least two objects is determined according to the graphic and text features of each object. In the above matching process, since the multimodal features of the objects are used for matching, that is to say, both the text description information of the objects and the images of the objects are considered, the determined object matching relationship is more accurate.
It should be noted that objects can refer to different things in different application scenarios. In some scenarios, objects can refer to users; in other scenarios, objects can refer to products; in still other scenarios, objects can refer to commodities. This embodiment does not limit this.
For convenience of description, the subsequent examples in the embodiments of the present application all take the e-commerce application scenario as an example and describe the matching process of commodities. It should be understood that, when applied to other scenarios, the specific matching principle and matching process are similar, and the embodiments of the present application will not repeat them.
Possible application scenarios of the embodiments of the present application are described below with reference to FIG. 1 and FIG. 2.
FIG. 1 is a schematic diagram of a possible application scenario provided by an embodiment of the present application. As shown in FIG. 1, the application scenario includes a user equipment and an e-commerce server. The user equipment and the e-commerce server are connected through a network. The e-commerce server is provided with a commodity database and a commodity matching engine. An e-commerce client is installed in the user equipment, and the user can access the e-commerce server through the e-commerce client. When the user equipment needs to search for a certain commodity, it sends a search request to the e-commerce server. After receiving the search request, the e-commerce server matches the search request against the commodities in the commodity database through the commodity matching engine to obtain a search result, and the search result may include one or more target commodities. The e-commerce server returns the search result to the user equipment.
In this scenario, the user equipment and the e-commerce server adopt an interactive design: the user inputs the commodity to be searched, and the e-commerce server uses the commodity matching method to obtain, from the commodity database, target commodities that meet the user's needs and recommends them to the user, so that the user can quickly obtain the commodities they need.
FIG. 2 is a schematic diagram of another possible application scenario provided by an embodiment of the present application. As shown in FIG. 2, the application scenario includes a first e-commerce server and a second e-commerce server. The first e-commerce server and the second e-commerce server are connected through a network. The first e-commerce server may acquire the commodities provided by the second e-commerce server through interaction with the second e-commerce server. Exemplarily, the first e-commerce server may use a web crawling technology to obtain the commodities provided by the second e-commerce server by crawling the content of the web pages provided by the second e-commerce server. Further, the first e-commerce server may use the commodity matching engine to match the commodities obtained from the second e-commerce server against the commodities in its own commodity database, so as to obtain the matching relationship between the commodities provided by the first e-commerce server and the commodities provided by the second e-commerce server.
In this scenario, through the commodity matching method, the first e-commerce server can determine which commodities in the first e-commerce server and which commodities in the second e-commerce server are the same or similar commodities, thereby establishing the matching relationship between the commodities in the first e-commerce server and the commodities in the second e-commerce server.
In the above scenarios, the e-commerce server can be implemented by an independent server or by a server cluster composed of multiple physical servers. The e-commerce server can also adopt a distributed architecture. In some application scenarios, the e-commerce server can also be a cloud server.
It should be understood that the application scenarios shown in FIG. 1 and FIG. 2 are only two possible illustrations, and there may be other application scenarios, which are not limited in the embodiments of the present application.
The technical solutions of the present application are described in detail below with specific embodiments. The following specific embodiments may be combined with each other, and the same or similar concepts or processes may not be repeated in some embodiments.
FIG. 3 is a schematic flowchart of an object matching method provided by an embodiment of the present application. As shown in FIG. 3, the method of this embodiment includes:
S301: Acquire object information of at least two objects, wherein the object information of one object includes the image and text description information of the object.
In the embodiments of the present application, the at least two objects are objects whose matching relationship is to be determined. In practical applications, a matching relationship may be determined for two objects, or for a larger number of objects.
The object information refers to information used to describe an object. An object in this embodiment has an image and text description information; therefore, the object information of an object includes the image and text description information of the object. The text description information may include one or more sentences, or one or more keywords. The number of images each object has can be one or more.
FIG. 4 is a schematic diagram of object information provided by an embodiment of the present application. Taking the e-commerce application scenario as an example, the object illustrated in FIG. 4 is a commodity. As shown in FIG. 4, the commodity has one or more images, and the commodity also has text description information. The text description information may include, but is not limited to, a commodity title, commodity attribute information, and the like, where the commodity attribute information includes, but is not limited to, a color attribute, a shape attribute, a size attribute, a material attribute, and the like. It can be seen that the commodity information of the commodity shown in FIG. 4 includes the commodity image, the commodity title, the commodity attribute information, and so on.
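As a concrete illustration of the object information described above, the following Python sketch models one object's object information as a simple data structure. It is only a minimal sketch: the class and field names (ObjectInfo, images, title, attributes) and the sample values are illustrative assumptions, not names taken from the patent.

    from dataclasses import dataclass, field
    from typing import Dict, List

    @dataclass
    class ObjectInfo:
        # One or more images of the object (here: file paths).
        images: List[str]
        # Text description information: title plus attribute key/value pairs.
        title: str
        attributes: Dict[str, str] = field(default_factory=dict)

    info = ObjectInfo(
        images=["x_shampoo_front.jpg", "x_shampoo_back.jpg"],
        title="X shampoo 500ml anti-dandruff",
        attributes={"brand": "X", "volume": "500ml", "color": "white"},
    )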
S302: For each of the objects, acquire the graphic and text features of the object according to the image and text description information of the object.
Exemplarily, the graphic and text features of an object can be obtained by performing feature extraction on the image and text description information of the object. For example, the image and text description information of the object are input into a feature extraction model to obtain the graphic and text features of the object.
In this embodiment, the graphic and text features of an object include both the features extracted from the image of the object and the features extracted from the text description information of the object. That is to say, this embodiment extracts the multimodal features of the object.
In the aforementioned implementation, the matching relationship between objects is generally determined according to the similarity between the title information of the objects. However, in some scenarios, for example when the objects are commodities, the title information of some commodities (such as clothing commodities) may have no distinctive features, making it impossible to judge whether the commodities match, or making the matching result inaccurate. In the embodiments of the present application, by introducing the image of the object, not only the text description information such as the title information but also the image of the object is considered when extracting the features of the object, so that the extracted features are more comprehensive. Therefore, the accuracy of the matching result can be improved, and the recall rate of matching can be improved.
S303: Determine the matching relationship between the at least two objects according to the graphic and text features of the at least two objects.
Specifically, the matching relationship between objects can be determined according to the matching degree between the graphic and text features of the objects. In this embodiment, the matching relationship between two objects may indicate whether the two objects are the same object (that is, objects of the same style), or indicate whether the two objects are similar objects.
In a possible implementation manner, for any two objects among the at least two objects (for convenience of description, referred to as the first object and the second object respectively), the matching relationship between the first object and the second object can be determined in the following manner:
The graphic and text features of the first object and the graphic and text features of the second object are input into a trained matching model, so that the matching model predicts the matching degree between the first object and the second object; if the matching degree is greater than or equal to a preset threshold, it is determined that the first object and the second object match, and if the matching degree is less than the preset threshold, it is determined that the first object and the second object do not match.
It should be understood that, in practical applications, the matching model may also directly output a binary result indicating whether the first object and the second object match. For example, the matching model outputting 1 indicates that the first object and the second object match, and outputting 0 indicates that the first object and the second object do not match.
The object matching method provided by this embodiment includes: acquiring object information of at least two objects, wherein the object information of one object includes the image and text description information of the object; for each object, acquiring the graphic and text features of the object according to the image and text description information of the object; and then determining the matching relationship between the at least two objects according to the graphic and text features of each object. In the above matching process, since the multimodal features of the objects are used for matching, that is to say, both the text description information of the objects and the images of the objects are considered, the accuracy of the matching result can be improved, and the recall rate of matching can be improved.
On the basis of any of the above embodiments, possible implementations of S302 are described in more detail below with reference to specific embodiments.
FIG. 5 is a schematic flowchart of a method for acquiring the graphic and text features of an object provided by an embodiment of the present application. As shown in FIG. 5, the method of this embodiment includes:
S501: Perform target detection on the image of the object to obtain at least one region of interest in the image, and respectively acquire the features of each region of interest.
In this embodiment, considering that there may be a lot of interference information in the image of an object (for example, taking a commodity as an example, there may be interference information such as background and promotional text in the image of the commodity), feature extraction is not performed directly on the entire image. Instead, target detection is first performed on the image to obtain at least one region of interest (Region of Interest, ROI), and then the features of each region of interest are obtained separately, thereby avoiding the influence of the interference information on the matching results.
FIG. 6 is a schematic diagram of a target detection process provided by an embodiment of the present application. As shown in FIG. 6, the image of the object can be input into a target detection model, and the target detection model performs target detection on the image to obtain at least one region of interest. Referring to FIG. 6, taking the X shampoo commodity as an example, the target detection model marks two regions of interest in the image with rectangular frames, and also identifies the category (Box Label) of each region of interest. For example, the category of one region of interest is "shampoo", and the category of the other region of interest is "text".
It should be understood that the target detection model can be a pre-trained machine learning model, and models such as Faster-RCNN (Faster Region Convolutional Neural Network), YOLO (You Only Look Once) and Mask R-CNN (Mask Region Convolutional Neural Network) can be used, which is not limited in this embodiment.
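Since the patent leaves the choice of detector open (Faster-RCNN, YOLO, Mask R-CNN, etc.), the following sketch uses an off-the-shelf torchvision Faster R-CNN purely to illustrate the step "image in, regions of interest plus box labels out"; the confidence threshold of 0.7 and the file name are assumptions of this sketch.

    import torch
    import torchvision
    from torchvision.transforms.functional import to_tensor
    from PIL import Image

    # Pre-trained Faster R-CNN detector standing in for the patent's
    # unspecified target detection model.
    model = torchvision.models.detection.fasterrcnn_resnet50_fpn(weights="DEFAULT")
    model.eval()

    image = to_tensor(Image.open("x_shampoo.jpg").convert("RGB"))
    with torch.no_grad():
        detections = model([image])[0]

    # Keep confident detections as the regions of interest.
    keep = detections["scores"] > 0.7
    rois = detections["boxes"][keep]     # (x_tl, y_tl, x_br, y_br) per region
    labels = detections["labels"][keep]  # category id per region (the "Box Label")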
In a possible implementation manner, the features of each region of interest can be obtained in the following manner:
(1) Feature extraction is performed on the region of interest to obtain first feature information.
Exemplarily, the following linear mapping may be used to extract features from the region of interest to obtain the first feature information, and the first feature information may also be referred to as an image embedding (Image Embedding) vector:
v_i = W_v · f_i + b_v
where f_i represents the i-th region of interest, v_i represents the first feature information corresponding to the i-th region of interest, W_v is the slope (weight) of the linear mapping, and b_v is the intercept (bias) of the linear mapping.
(2) Second feature information is obtained according to the position information of the region of interest in the image.
Exemplarily, a five-dimensional vector encoded from the position information of the region of interest in the image may be used as the second feature information. The second feature information may also be referred to as a position embedding (Position Embedding) vector. The five-dimensional vector can be represented as:
c_i = ( x_tl / W, y_tl / H, x_br / W, y_br / H, (x_br − x_tl)(y_br − y_tl) / (W · H) )
where c_i represents the second feature information of the i-th region of interest; (x_tl, y_tl) and (x_br, y_br) respectively represent the coordinates of the upper-left corner and the lower-right corner of the rectangular frame of the i-th region of interest; W and H represent the width and height of the image; and the fifth component represents the ratio of the area of the region of interest to the area of the entire image.
(3) The feature of the region of interest is acquired according to the first feature information and the second feature information.
Exemplarily, the second feature information may be embedded into the first feature information to obtain the feature of the region of interest.
In this embodiment, the feature of each region of interest takes into account not only the image embedding vector of the region of interest but also the position embedding vector of the region of interest. In this way, the regions of interest can not only provide the language part with visual contexts of the entire image, but can also be associated with specific terms through detailed position information, making the characterization of the regions of interest more comprehensive.
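The two formulas above can be made concrete with a few lines of PyTorch. This is a minimal sketch under stated assumptions: the 2048-dimensional pooled ROI feature, the 768-dimensional embedding width, and the additive way of "embedding the second feature information into the first" are all illustrative choices, since the patent fixes none of them.

    import torch
    import torch.nn as nn

    d = 768  # embedding width (assumption)

    # v_i = W_v * f_i + b_v : linear mapping from a pooled ROI feature f_i
    # to the image embedding vector v_i.
    image_embed = nn.Linear(2048, d)

    def position_vector(box, W, H):
        # Five-dimensional position descriptor c_i of one ROI box.
        x_tl, y_tl, x_br, y_br = box
        area_ratio = (x_br - x_tl) * (y_br - y_tl) / (W * H)
        return torch.tensor([x_tl / W, y_tl / H, x_br / W, y_br / H, area_ratio])

    # Project c_i to the same width and add it to v_i.
    pos_embed = nn.Linear(5, d)

    f_i = torch.randn(2048)  # placeholder for the detector's ROI feature
    c_i = position_vector((40.0, 60.0, 360.0, 420.0), W=500.0, H=500.0)
    roi_feature = image_embed(f_i) + pos_embed(c_i)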
S502: Acquire the feature of each character in the text description information of the object.
Specifically, the feature of each character in the text description information can be obtained according to a vector table (vocab). The vector table is stored in the form of a file, and the word vectors corresponding to different characters are recorded in the vector table. By looking up the vector table, the word vector corresponding to each character in the text description information can be obtained, and the feature of the character can then be obtained from the word vector.
In a possible implementation manner, the text description information is input into a BERT (Bidirectional Encoder Representations from Transformers) model, and the BERT model can obtain the word vectors, the text vector and the position vectors of the text description information. The word vectors are obtained by looking up the vector table, that is, each character in the text description information is converted into its corresponding word vector through a lookup in the vector table. The text vector is learned automatically during model training, and is used to describe the global semantic information of the text and fuse it with the semantic information of individual characters. Since the semantic information carried by characters at different positions in the text description information differs, the BERT model attaches a different position vector to characters at different positions to distinguish them. Further, according to the above word vectors, text vector and position vectors, the BERT model outputs, for each character in the text description information, a vector representation fused with the full-text semantic information.
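The per-character features can be obtained, for example, with a public BERT checkpoint via the Hugging Face transformers library; bert-base-chinese tokenizes Chinese text character by character, which matches the per-character description above. The checkpoint choice and the sample text are assumptions of this sketch, not requirements of the patent.

    import torch
    from transformers import BertModel, BertTokenizer

    tokenizer = BertTokenizer.from_pretrained("bert-base-chinese")
    model = BertModel.from_pretrained("bert-base-chinese")
    model.eval()

    text = "X洗发水 500ml 去屑"  # commodity title + attributes as one string
    inputs = tokenizer(text, return_tensors="pt")
    with torch.no_grad():
        outputs = model(**inputs)

    # One contextual vector per character/token; word, text (segment) and
    # position embeddings are already fused inside BERT.
    char_features = outputs.last_hidden_state[0]  # (seq_len, hidden_size)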
S503: Fuse the features of each of the regions of interest and the features of each of the characters to obtain the graphic and text features of the object.
Specifically, the features of each region of interest and the features of each character can be projected into the same feature vector, and the finally obtained feature vector is the graphic and text feature of the object.
In a possible implementation manner, the fusion processing may be performed in the following manner: according to at least one embedding manner, the features of each of the regions of interest and the features of each of the characters are projected and embedded into different dimensions of the same feature vector to obtain the graphic and text features of the object; wherein the at least one embedding manner includes one or more of the following: language embedding, segment embedding and sequence embedding.
FIG. 7 is a schematic diagram of an image-text feature extraction model provided by an embodiment of the present application. In this embodiment, the image-text feature extraction model can fuse the features of each region of interest and the features of each character to obtain the graphic and text features of the object.
As shown in FIG. 7, the image-text feature extraction model includes an image embedding layer, a position embedding layer, a language embedding layer, a segment embedding layer and a sequence embedding layer. The input of the image-text feature extraction model includes the text description information of the object and the images of the object (for example, FIG. 7 takes the object "X shampoo" as an example; image 1 and image 2 are two images corresponding to X shampoo, and the picture content of image 2 is omitted). The output of the image-text feature extraction model is the graphic and text features of the object.
Referring to FIG. 7, the language embedding layer can process the text description information to obtain the feature of each character. The image of the object is input into the image embedding layer, which performs target detection on the image, obtains multiple regions of interest, and obtains the first feature information (image embedding vector) corresponding to each region of interest. Then, the second feature information (position embedding vector) corresponding to each region of interest is obtained through the position embedding layer. The first feature information and the second feature information are projected and embedded into the same dimension through the language embedding layer. In this way, the regions of interest can not only provide the language part with visual contexts of the entire image, but can also be associated with specific terms through detailed position information.
For each region of interest in the image, its corresponding image embedding, segment embedding, position embedding and sequence embedding are projected into one vector, as follows:
e_(i) = LN( v_(i) + s_(i) + p_(i) + q_(i) )
where e_(i) represents the final feature vector of the i-th region of interest, v_(i) represents the image embedding vector, s_(i) represents the segment embedding vector, p_(i) represents the position embedding vector, q_(i) represents the sequence embedding vector, and LN(·) represents the Layer Normalization operation.
Continuing to refer to FIG. 7, at the segment embedding layer, each image and each piece of text description information corresponds to a segment. For example, in the illustration of FIG. 7, the image corresponds to segment A, and the text description information corresponds to segment B. The features of each region of interest in the image are projected and embedded into segment A, and the features of each character in the text description information are projected and embedded into segment B. In this way, the information of the segment embedding layer reflects the source of a feature, that is, which image or which piece of text the feature comes from.
Continuing to refer to FIG. 7, at the sequence embedding layer, since the regions of interest in an image have no inherent order, each region of interest in the same image corresponds to the same sequence number. The characters in each text, however, are sequentially ordered; therefore, according to the order of the characters in the text, each character corresponds to a sequence number, and different characters correspond to different sequence numbers. In this way, the information of the sequence embedding layer reflects the order of the features.
It should be understood that the processing of image 2 in FIG. 7 is similar to that of image 1 and is not described in detail here. It should also be understood that, in FIG. 7, the first symbol of the input sequence is [CLS], which is used to separate the object information of different objects, and different texts are separated by the separator [SEP].
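A minimal sketch of the fusion formula e_(i) = LN(v_(i) + s_(i) + p_(i) + q_(i)) is shown below. The table sizes, the 768-dimensional width, and the assumption that the content vector of an ROI already carries its position embedding from the earlier step are all illustrative choices of this sketch.

    import torch
    import torch.nn as nn

    d = 768
    layer_norm = nn.LayerNorm(d)

    segment_embed = nn.Embedding(2, d)     # segment A = image, segment B = text
    sequence_embed = nn.Embedding(512, d)  # sequence number (shared per image)

    def fuse(content_vec, segment_id, order_id):
        # e = LN(content + segment + sequence), mirroring the formula above;
        # content_vec is v + p for an ROI, or the BERT vector for a character.
        e = (content_vec
             + segment_embed(torch.tensor(segment_id))
             + sequence_embed(torch.tensor(order_id)))
        return layer_norm(e)

    roi_token = fuse(torch.randn(d), segment_id=0, order_id=1)   # one ROI
    char_token = fuse(torch.randn(d), segment_id=1, order_id=5)  # one character
    joint_sequence = torch.stack([roi_token, char_token])        # image-text sequence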
When extracting the features of an object, the image-text feature extraction model of this embodiment considers both the features of the text description information and the features of the image, that is, the multimodal features of the object. This enhances the expressive ability of the model, helps to improve the accuracy of object matching results, and improves the recall rate of matching.
On the basis of the above embodiments, the object matching process is described below with reference to a specific example.
FIG. 8 is a schematic diagram of an object matching process provided by an embodiment of the present application. Taking object A and object B as an example, as shown in FIG. 8, the image and text description information of object A are input into the image-text feature extraction model to obtain the graphic and text features of object A, and the image and text description information of object B are input into the image-text feature extraction model to obtain the graphic and text features of object B. The image-text feature extraction model may adopt the model shown in FIG. 7. The graphic and text features of object A and the graphic and text features of object B are input into the matching model to obtain the matching degree between object A and object B. If the matching degree is greater than or equal to the preset threshold, it is determined that object A and object B match; if the matching degree is less than the preset threshold, it is determined that object A and object B do not match.
Optionally, the matching model may include one or more Transformer layers, one or more fully connected layers, an activation function layer, a batch normalization layer and a loss function layer, where the loss function layer can use a cross-entropy loss or a triplet loss.
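The following sketch shows one way such a matching model could be assembled in PyTorch: a Transformer encoder over the two objects' fused token sequences, followed by fully connected layers, an activation function and batch normalization, with a sigmoid score compared against the preset threshold. Layer counts, widths, the mean pooling, and the 0.5 threshold are assumptions; the patent only names the layer types.

    import torch
    import torch.nn as nn

    class MatchingModel(nn.Module):
        def __init__(self, d=768, n_layers=2, n_heads=8):
            super().__init__()
            layer = nn.TransformerEncoderLayer(d_model=d, nhead=n_heads,
                                               batch_first=True)
            self.encoder = nn.TransformerEncoder(layer, num_layers=n_layers)
            self.head = nn.Sequential(
                nn.Linear(d, d), nn.ReLU(), nn.BatchNorm1d(d), nn.Linear(d, 1))

        def forward(self, tokens_a, tokens_b):
            # Concatenate the two objects' graphic-text token sequences.
            x = self.encoder(torch.cat([tokens_a, tokens_b], dim=1))
            return torch.sigmoid(self.head(x.mean(dim=1))).squeeze(-1)

    model = MatchingModel().eval()
    with torch.no_grad():
        score = model(torch.randn(1, 10, 768), torch.randn(1, 12, 768))
    matched = score.item() >= 0.5  # preset threshold
    # At training time, nn.BCELoss() on the score (or a triplet loss over
    # anchor/positive/negative objects) would fill the loss function layer's role.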
FIG. 9 is a schematic flowchart of another object matching method provided by an embodiment of the present application. In this embodiment, the matching process between a first object and a second object is taken as an example. As shown in FIG. 9, the method of this embodiment includes:
S901: Acquire object information of the first object and object information of the second object, where the object information of each object includes the image and text description information of that object.
S902: Acquire the category of the first object and the category of the second object.
A category refers to the class to which an object belongs, and the category of an object may include one or more levels. When the category of an object includes multiple levels, the last-level category of the object may be acquired in S902. Taking commodities as an example, an e-commerce platform divides commodities into categories of multiple levels for ease of management. For example, mother-and-baby is a first-level category, which includes multiple second-level categories such as milk powder, diapers and feeding bottles; the milk powder second-level category in turn includes multiple third-level categories such as infant milk powder and milk powder for pregnant women.
S903: Determine whether the category of the first object is the same as the category of the second object.
If they are the same, the subsequent process continues to be executed; if they are different, S910 is executed and it is determined that the first object and the second object do not match.
It should be understood that when the category of an object includes multiple levels, the last-level category of the object may be acquired in this embodiment, and it is determined whether the last-level categories of the two objects are the same, so as to narrow the matching scope as much as possible.
In this embodiment, if the categories of the two objects are different, it is directly determined that the two objects do not match. Only when the categories of the two objects are the same is the subsequent matching process continued. This narrows the matching scope, reduces the amount of computation and improves matching efficiency.
It should be noted that when this embodiment is applied to commodity matching across e-commerce platforms, since different e-commerce platforms divide commodity categories in different ways, "the same category" in this embodiment should be understood in a broad sense; that is, the categories are equivalent rather than strictly identical. For example, the infant milk powder category of e-commerce platform A and the formula milk powder category of e-commerce platform B should be understood as the same category, and the mobile phone category of e-commerce platform A and the digital communication category of e-commerce platform B should also be understood as the same category.
Specifically, assuming that the first object comes from e-commerce platform A and the second object comes from e-commerce platform B, the category of the second object on e-commerce platform B may first be mapped to a category on e-commerce platform A, and it is then determined whether the category of the first object on e-commerce platform A is the same as the category of the second object as mapped to e-commerce platform A.
Mapping the category of the second object on e-commerce platform B to a category on e-commerce platform A may adopt either of the following two possible implementations:
In one possible implementation, multiple groups of samples are acquired, each group including one commodity from e-commerce platform A and one commodity from e-commerce platform B, where the categories of the two commodities on their respective platforms are known. The matching relationship between the two commodities in each group is manually annotated. In this way, from the matching relationships between the commodities in the samples, the matching relationships between the categories to which the commodities belong can be inferred, and the category mapping relationship between e-commerce platform A and e-commerce platform B is thus obtained. A voting method may be used in the inference process, as sketched below. Further, after the category mapping relationship between e-commerce platform A and e-commerce platform B is determined, the category of the second object on e-commerce platform B can be mapped to a category on e-commerce platform A according to the category mapping relationship.
In another possible implementation, based on the above samples, features may be extracted from the text description information of the commodity from e-commerce platform B in each group of samples, the category of the commodity from e-commerce platform A in that group is used as the label corresponding to the features, and a category discrimination model is trained. Further, after the category discrimination model is trained, the text description information of the second object is input into the category discrimination model, and the category discrimination model outputs the category of the second object on e-commerce platform A.
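The voting step of the first implementation can be sketched as a majority vote over the manually matched pairs; the sample data and category names below are hypothetical:

```python
from collections import Counter, defaultdict

# (category on platform B, category on platform A) for each matched pair.
matched_pairs = [
    ("配方奶粉", "婴儿奶粉"),
    ("配方奶粉", "婴儿奶粉"),
    ("配方奶粉", "孕妇奶粉"),
    ("数码通信", "手机"),
]

votes = defaultdict(Counter)
for cat_b, cat_a in matched_pairs:
    votes[cat_b][cat_a] += 1

# Map each platform-B category to the platform-A category with the most votes.
category_map = {cat_b: c.most_common(1)[0][0] for cat_b, c in votes.items()}
print(category_map)  # {'配方奶粉': '婴儿奶粉', '数码通信': '手机'}
```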
S904: Acquire the brand attribute of the first object according to the object information of the first object, and acquire the brand attribute of the second object according to the object information of the second object.
Taking the e-commerce application scenario as an example, the attribute information of each commodity usually includes a brand attribute, so the brand attribute can be acquired from the attribute information of the commodity.
S905: Determine whether the brand attribute of the first object is the same as the brand attribute of the second object.
If they are the same, the subsequent process continues to be executed; if they are different, S910 is executed and it is determined that the first object and the second object do not match.
In this embodiment, if the brand attributes of the two objects are different, it is directly determined that the two objects do not match. Only when the brand attributes of the two objects are the same is the subsequent matching process continued. This narrows the matching scope, reduces the amount of computation and improves matching efficiency.
It should be noted that, in practical applications, either or both of the category-based filtering process (S902-S903) and the brand-attribute-based filtering process (S904-S905) of this embodiment may be executed, which is not limited in this embodiment.
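Both filters reduce to a simple early-exit predicate; a minimal sketch, with hypothetical field names:

```python
def passes_prefilter(obj_a: dict, obj_b: dict,
                     use_category: bool = True, use_brand: bool = True) -> bool:
    """Return False as soon as a filter rules the pair out (S910)."""
    if use_category and obj_a["category"] != obj_b["category"]:
        return False  # different categories: determined not to match
    if use_brand and obj_a["brand"] != obj_b["brand"]:
        return False  # different brand attributes: determined not to match
    return True       # proceed to graphic and text feature matching
```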
S906: Acquire the graphic and text features of the first object according to the image and text description information of the first object, and acquire the graphic and text features of the second object according to the image and text description information of the second object.
S907: Input the graphic and text features of the first object and the graphic and text features of the second object into the matching model to acquire the matching degree between the first object and the second object.
It should be understood that the specific implementations of S906 and S907 are similar to those of the above embodiments and are not repeated here.
S908: Determine whether the matching degree is greater than or equal to a preset threshold.
If yes, S909 is executed; if no, S910 is executed.
S909: Determine that the first object and the second object match.
S910: Determine that the first object and the second object do not match.
In this embodiment, by filtering with the categories and/or brand attributes of objects, two objects are determined not to match when their categories and/or brand attributes are different, thereby reducing the amount of matching computation and improving matching efficiency.
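Putting the pieces together, the flow of FIG. 9 can be sketched as follows, reusing passes_prefilter and a matching model like the one sketched earlier; all names are illustrative:

```python
def match_objects(obj_a: dict, obj_b: dict, extract_features, matching_model,
                  threshold: float = 0.5) -> bool:
    if not passes_prefilter(obj_a, obj_b):          # S903 / S905 -> S910
        return False
    feats_a = extract_features(obj_a["image"], obj_a["text"])  # S906
    feats_b = extract_features(obj_b["image"], obj_b["text"])
    degree = matching_model(feats_a, feats_b).item()           # S907
    return degree >= threshold                                 # S908 - S910
```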
FIG. 10 is a schematic structural diagram of an object matching apparatus provided by an embodiment of the present application. The apparatus of this embodiment may be in the form of software and/or hardware. As shown in FIG. 10, the object matching apparatus 1000 provided by this embodiment may include a first acquisition module 1001, a second acquisition module 1002 and a determination module 1003.
The first acquisition module 1001 is configured to acquire object information of at least two objects, where the object information of one object includes the image and text description information of that object;
the second acquisition module 1002 is configured to, for each of the objects, acquire the graphic and text features of the object according to the image and text description information of the object;
the determination module 1003 is configured to determine the matching relationship between the at least two objects according to the graphic and text features of the at least two objects.
In a possible implementation, the second acquisition module 1002 is specifically configured to:
perform target detection on the image of the object to obtain at least one region of interest in the image, and separately acquire the features of each of the regions of interest;
acquire the features of each character in the text description information of the object; and
fuse the features of each of the regions of interest and the features of each of the characters to obtain the graphic and text features of the object.
In a possible implementation, the second acquisition module 1002 is specifically configured to:
perform feature extraction on the region of interest to obtain first feature information;
obtain second feature information according to the position information of the region of interest in the image; and
acquire the feature of the region of interest according to the first feature information and the second feature information.
In a possible implementation, the second acquisition module 1002 is specifically configured to:
project and embed, according to at least one embedding manner, the features of each of the regions of interest and the features of each of the characters into different dimensions of the same feature vector to obtain the graphic and text features of the object;
where the at least one embedding manner includes one or more of the following: language embedding, segment embedding and sequence embedding.
In a possible implementation, the determination module 1003 is specifically configured to:
input the graphic and text features of a first object and the graphic and text features of a second object into a trained matching model, so that the matching model predicts the matching degree between the first object and the second object; and
if the matching degree is greater than or equal to a preset threshold, determine that the first object and the second object match, and if the matching degree is smaller than the preset threshold, determine that the first object and the second object do not match;
where the first object and the second object are any two objects among the at least two objects.
In a possible implementation, the determination module 1003 is further configured to:
for each of the objects, acquire the category corresponding to the object; and
determine that the categories corresponding to the at least two objects are the same.
In a possible implementation, the determination module 1003 is further configured to:
for each of the objects, acquire the brand attribute corresponding to the object according to the object information of the object; and
determine that the brand attributes corresponding to the at least two objects are the same.
In a possible implementation, the text description information includes at least one of object title information and object attribute information.
The object matching apparatus provided by this embodiment can be used to execute the object matching method in any of the above method embodiments; its implementation principles and technical effects are similar and are not repeated here.
FIG. 11 is a schematic structural diagram of an electronic device provided by an embodiment of the present application. As shown in FIG. 11, the electronic device 1100 of this embodiment includes a processor 1101 and a memory 1102.
The memory 1102 is configured to store a computer program, and the processor 1101 is configured to execute the computer program stored in the memory to implement the object matching method in the above embodiments. For details, reference may be made to the relevant descriptions in the foregoing method embodiments; the implementation principles and technical effects are similar and are not repeated here.
Optionally, the memory 1102 may be independent or integrated with the processor 1101.
Optionally, the electronic device 1100 may further include a communication component 1103 configured to communicate with other devices.
When the memory 1102 is a device independent of the processor 1101, the electronic device 1100 may further include a bus 1104 for connecting the memory 1102 and the processor 1101.
An embodiment of the present application further provides a computer-readable storage medium including a computer program, where the computer program is used to implement the object matching method in any of the above method embodiments; the implementation principles and technical effects are similar and are not repeated here.
An embodiment of the present application further provides a chip including a memory, a processor and a computer program, where the computer program is stored in the memory, and the processor runs the computer program to execute the object matching method in any of the above method embodiments; the implementation principles and technical effects are similar and are not repeated here.
An embodiment of the present application further provides a computer program product including a computer program, where the computer program, when executed by a processor, implements the object matching method in any of the above method embodiments; the implementation principles and technical effects are similar and are not repeated here.
In the several embodiments provided in the present application, it should be understood that the disclosed devices and methods may be implemented in other ways. For example, the device embodiments described above are merely illustrative; the division of the modules is only a division of logical functions, and there may be other division manners in actual implementation. For example, multiple modules may be combined or integrated into another system, or some features may be ignored or not executed. In addition, the mutual couplings, direct couplings or communication connections shown or discussed may be indirect couplings or communication connections through some interfaces, apparatuses or modules, and may be in electrical, mechanical or other forms.
The modules described as separate components may or may not be physically separated, and the components displayed as modules may or may not be physical units; that is, they may be located in one place or distributed over multiple network units. Some or all of the modules may be selected according to actual needs to achieve the purpose of the solutions of the embodiments.
In addition, the functional modules in the embodiments of the present application may be integrated into one processing unit, each module may exist physically alone, or two or more modules may be integrated into one unit. The unit composed of the above modules may be implemented in the form of hardware, or in the form of hardware plus software functional units.
The above integrated module implemented in the form of a software functional module may be stored in a computer-readable storage medium. The above software functional module is stored in a storage medium and includes several instructions for causing a computer device (which may be a personal computer, a server, a network device, etc.) or a processor to execute some of the steps of the methods described in the embodiments of the present application.
It should be understood that the above processor may be a central processing unit (CPU), or another general-purpose processor, a digital signal processor (DSP), an application-specific integrated circuit (ASIC), etc. The general-purpose processor may be a microprocessor, or the processor may be any conventional processor. The steps of the methods disclosed in the present application may be directly embodied as being executed by a hardware processor, or executed by a combination of hardware and software modules in the processor.
The memory may include a high-speed RAM memory, and may also include a non-volatile memory (NVM), such as at least one magnetic disk memory; it may also be a USB flash drive, a removable hard disk, a read-only memory, a magnetic disk or an optical disc.
The bus may be an Industry Standard Architecture (ISA) bus, a Peripheral Component Interconnect (PCI) bus, an Extended Industry Standard Architecture (EISA) bus, etc. The bus may be divided into an address bus, a data bus, a control bus, etc. For ease of representation, the bus in the drawings of the present application is not limited to only one bus or one type of bus.
The above storage medium may be implemented by any type of volatile or non-volatile storage device or a combination thereof, such as a static random-access memory (SRAM), an electrically erasable programmable read-only memory (EEPROM), an erasable programmable read-only memory (EPROM), a programmable read-only memory (PROM), a read-only memory (ROM), a magnetic memory, a flash memory, a magnetic disk or an optical disc. The storage medium may be any available medium that can be accessed by a general-purpose or special-purpose computer.
An exemplary storage medium is coupled to the processor so that the processor can read information from, and write information to, the storage medium. Of course, the storage medium may also be a component of the processor. The processor and the storage medium may be located in an application-specific integrated circuit (ASIC). Of course, the processor and the storage medium may also exist as discrete components in an electronic device or a main control device.
Persons of ordinary skill in the art may understand that all or some of the steps for implementing the above method embodiments may be completed by hardware related to program instructions. The aforementioned program may be stored in a computer-readable storage medium; when the program is executed, the steps including those of the above method embodiments are executed. The aforementioned storage medium includes various media that can store program code, such as a ROM, a RAM, a magnetic disk or an optical disc.
Finally, it should be noted that the above embodiments are only used to illustrate the technical solutions of the present application and not to limit them. Although the present application has been described in detail with reference to the foregoing embodiments, persons of ordinary skill in the art should understand that they may still modify the technical solutions recorded in the foregoing embodiments, or make equivalent replacements to some or all of the technical features therein, and these modifications or replacements do not cause the essence of the corresponding technical solutions to depart from the scope of the technical solutions of the embodiments of the present application.

Claims (12)

  1. An object matching method, comprising:
    acquiring object information of at least two objects, wherein the object information of one object comprises an image and text description information of that object;
    for each of the objects, acquiring graphic and text features of the object according to the image and text description information of the object; and
    determining a matching relationship between the at least two objects according to the graphic and text features of the at least two objects.
  2. The method according to claim 1, wherein acquiring the graphic and text features of the object according to the image and text description information of the object comprises:
    performing target detection on the image of the object to obtain at least one region of interest in the image, and separately acquiring features of each of the regions of interest;
    acquiring features of each character in the text description information of the object; and
    fusing the features of each of the regions of interest and the features of each of the characters to obtain the graphic and text features of the object.
  3. The method according to claim 2, wherein acquiring the features of each of the regions of interest comprises:
    performing feature extraction on the region of interest to obtain first feature information;
    obtaining second feature information according to position information of the region of interest in the image; and
    acquiring the feature of the region of interest according to the first feature information and the second feature information.
  4. The method according to claim 2 or 3, wherein fusing the features of each of the regions of interest and the features of each of the characters to obtain the graphic and text features of the object comprises:
    projecting and embedding, according to at least one embedding manner, the features of each of the regions of interest and the features of each of the characters into different dimensions of a same feature vector to obtain the graphic and text features of the object;
    wherein the at least one embedding manner comprises one or more of the following: language embedding, segment embedding and sequence embedding.
  5. The method according to any one of claims 1 to 4, wherein determining the matching relationship between the at least two objects according to the graphic and text features of the at least two objects comprises:
    inputting graphic and text features of a first object and graphic and text features of a second object into a trained matching model, so that the matching model predicts a matching degree between the first object and the second object; and
    if the matching degree is greater than or equal to a preset threshold, determining that the first object and the second object match, and if the matching degree is smaller than the preset threshold, determining that the first object and the second object do not match;
    wherein the first object and the second object are any two objects among the at least two objects.
  6. The method according to any one of claims 1 to 5, wherein before acquiring, for each of the objects, the graphic and text features of the object according to the image and text description information of the object, the method further comprises:
    for each of the objects, acquiring a category corresponding to the object; and
    determining that the categories corresponding to the at least two objects are the same.
  7. The method according to any one of claims 1 to 6, wherein before acquiring, for each of the objects, the graphic and text features of the object according to the image and text description information of the object, the method further comprises:
    for each of the objects, acquiring a brand attribute corresponding to the object according to the object information of the object; and
    determining that the brand attributes corresponding to the at least two objects are the same.
  8. The method according to any one of claims 1 to 7, wherein the text description information comprises at least one of object title information and object attribute information.
  9. An object matching apparatus, comprising:
    a first acquisition module, configured to acquire object information of at least two objects, wherein the object information of one object comprises an image and text description information of that object;
    a second acquisition module, configured to, for each of the objects, acquire graphic and text features of the object according to the image and text description information of the object; and
    a determination module, configured to determine a matching relationship between the at least two objects according to the graphic and text features of the at least two objects.
  10. An electronic device, comprising a memory and a processor, wherein the memory is configured to store a computer program, and the processor runs the computer program to implement the method according to any one of claims 1 to 8.
  11. A computer-readable storage medium, comprising a computer program, wherein the computer program, when executed by a processor, implements the method according to any one of claims 1 to 8.
  12. A computer program product, comprising a computer program, wherein the computer program, when executed by a processor, implements the method according to any one of claims 1 to 8.
PCT/CN2022/070030 2021-01-25 2022-01-04 Object matching method, apparatus and device WO2022156525A1 (zh)

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
CN202110096449.2A CN113762309B (zh) 2021-01-25 2021-01-25 Object matching method, apparatus and device
CN202110096449.2 2021-01-25

Publications (1)

Publication Number Publication Date
WO2022156525A1 true WO2022156525A1 (zh) 2022-07-28

Family

ID=78786441

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/CN2022/070030 WO2022156525A1 (zh) 2021-01-25 2022-01-04 对象匹配方法、装置及设备

Country Status (2)

Country Link
CN (1) CN113762309B (zh)
WO (1) WO2022156525A1 (zh)

Families Citing this family (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN113762309B (zh) 2021-01-25 2023-06-27 Object matching method, apparatus and device

Patent Citations (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN107861972A (zh) * 2017-09-15 2018-03-30 Method and device for displaying full commodity results after a user enters commodity information
CN109118336A (zh) * 2018-08-24 2019-01-01 Information recommendation method, apparatus, computer device and storage medium
CN111581510A (zh) * 2020-05-07 2020-08-25 Shared content processing method, apparatus, computer device and storage medium
CN113297891A (zh) * 2020-11-13 2021-08-24 Video information processing method, apparatus and electronic device
CN113762309A (zh) * 2021-01-25 2021-12-07 Object matching method, apparatus and device

Cited By (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN115861603A (zh) * 2022-12-29 2023-03-28 Region-of-interest locking method, apparatus, device and storage medium
CN115861603B (zh) 2022-12-29 2023-09-26 Region-of-interest locking method, apparatus, device and medium in infant care scenarios
CN116563573A (zh) * 2023-01-12 2023-08-08 Method, apparatus, device and storage medium for matching commodities with price tags
CN116563573B (zh) 2023-01-12 2023-10-13 Method, apparatus, device and storage medium for matching commodities with price tags
CN116108282A (zh) * 2023-04-12 2023-05-12 Training method for an information recommendation model, information recommendation method and device
CN116108282B (zh) 2023-04-12 2023-08-29 Training method for an information recommendation model, information recommendation method and device

Also Published As

Publication number Publication date
CN113762309B (zh) 2023-06-27
CN113762309A (zh) 2021-12-07

Similar Documents

Publication Publication Date Title
WO2022156525A1 (zh) Object matching method, apparatus and device
CN108416776B (zh) Image recognition method, image recognition apparatus, computer product and readable storage medium
WO2022116537A1 (zh) Information recommendation method and apparatus, electronic device and storage medium
US9411849B2 (en) Method, system and computer storage medium for visual searching based on cloud service
Wilkinson et al. Neural Ctrl-F: segmentation-free query-by-string word spotting in handwritten manuscript collections
CN111212303B (zh) Video recommendation method, server and computer-readable storage medium
CN111931859B (zh) Multi-label image recognition method and apparatus
CN113221882B (zh) Image-text aggregation method and system for the course domain
CN113011186A (zh) Named entity recognition method, apparatus, device and computer-readable storage medium
CN111666766A (zh) Data processing method, apparatus and device
WO2022227218A1 (zh) Drug name recognition method, apparatus, computer device and storage medium
WO2022160736A1 (zh) Image annotation method and apparatus, electronic device, storage medium and program
WO2023202268A1 (zh) Text information extraction method, target model acquisition method, apparatus and device
CN109426831A (zh) Image similarity matching and model training method, apparatus and computer device
CN116958957A (zh) Training method for a multimodal feature extraction network and three-dimensional feature representation method
US11829710B2 (en) Deriving global intent from a composite document to facilitate editing of the composite document
CN114328798B (zh) Search text processing method, apparatus, device, storage medium and program product
CN113761377B (zh) False information detection method and apparatus based on attention-mechanism multi-feature fusion, electronic device and storage medium
CN115131811A (zh) Target recognition and model training method, apparatus, device and storage medium
CN113255787B (zh) Few-shot object detection method and system based on semantic features and metric learning
CN108717436B (zh) Saliency-detection-based fast commodity target retrieval method
Wajid et al. Neutrosophic-CNN-based image and text fusion for multimodal classification
CN113869371A (zh) Model training method, fine-grained clothing segmentation method and related apparatus
CN111814481B (zh) Shopping intention recognition method, apparatus, terminal device and storage medium
CN112163415A (zh) User intention recognition method and apparatus for feedback content, and electronic device

Legal Events

Date Code Title Description
121 Ep: the epo has been informed by wipo that ep was designated in this application

Ref document number: 22742004

Country of ref document: EP

Kind code of ref document: A1

NENP Non-entry into the national phase

Ref country code: DE

32PN Ep: public notification in the ep bulletin as address of the addressee cannot be established

Free format text: NOTING OF LOSS OF RIGHTS PURSUANT TO RULE 112(1) EPC (EPO FORM 1205A DATED 16.11.2023)