WO2023024413A1 - Information matching method and apparatus, computer device, and readable storage medium - Google Patents

Information matching method and apparatus, computer device, and readable storage medium

Info

Publication number
WO2023024413A1
WO2023024413A1 (PCT/CN2022/071445, CN2022071445W)
Authority
WO
WIPO (PCT)
Prior art keywords
object information
vector
vectors
modality
embedded
Prior art date
Application number
PCT/CN2022/071445
Other languages
English (en)
French (fr)
Inventor
谯轶轩
陈浩
Original Assignee
平安科技(深圳)有限公司
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by 平安科技(深圳)有限公司 filed Critical 平安科技(深圳)有限公司
Publication of WO2023024413A1 publication Critical patent/WO2023024413A1/zh

Links

Images

Classifications

    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06F: ELECTRIC DIGITAL DATA PROCESSING
    • G06F 40/00: Handling natural language data
    • G06F 40/10: Text processing
    • G06F 40/194: Calculation of difference between files
    • G06F 18/00: Pattern recognition
    • G06F 18/20: Analysing
    • G06F 18/21: Design or setup of recognition systems or techniques; Extraction of features in feature space; Blind source separation
    • G06F 18/214: Generating training patterns; Bootstrap methods, e.g. bagging or boosting
    • G06F 18/22: Matching criteria, e.g. proximity measures
    • G06F 18/23: Clustering techniques
    • G06F 18/24: Classification techniques
    • G06F 18/25: Fusion techniques
    • G06F 18/253: Fusion techniques of extracted features
    • G06F 40/20: Natural language analysis
    • G06F 40/279: Recognition of textual entities
    • G06F 40/30: Semantic analysis

Definitions

  • The present application relates to the technical field of artificial intelligence, and in particular to an information matching method and apparatus, a computer device, and a readable storage medium.
  • At present, the matching of object information mainly includes image matching and text-description matching.
  • Image matching usually detects similar images with a locality-sensitive hashing algorithm and then matches similar objects to the target object.
  • Text-description matching usually uses a short-text matching algorithm, retrieving similar text descriptions with cosine similarity or text edit distance.
  • However, that approach is generally intended for information retrieval or question-answering scenarios; for text descriptions assembled from tag phrases, the accuracy of the matched object information is low.
  • The present application provides an information matching method and apparatus, a computer device, and a readable storage medium, the main purpose of which is to solve the prior-art problem that object information matched on the basis of images and text descriptions has low accuracy.
  • In one aspect, an information matching method includes:
  • acquiring object information represented by different modalities; for the object information of each modality, calling a feature extraction model pre-trained for the corresponding modality representation to perform feature extraction, and obtaining embedded vectors with different modality attributes,
  • where the feature extraction model is trained with an additive angular margin loss function and is used to extract embedded vectors with modality attributes from the object information of each modality representation;
  • updating the embedded vectors with different modality attributes by means of a neighboring-vector mixing algorithm to obtain object information vectors fused with the features of neighboring vectors; and
  • calculating the similarity between the object information vectors fused with neighboring-vector features, and determining the degree of matching between pieces of object information according to the similarity results.
  • In another aspect, an information matching device includes:
  • an acquisition unit configured to acquire object information represented by different modalities;
  • a calling unit configured to, for the object information of each modality, call a feature extraction model pre-trained for the corresponding modality representation to perform feature extraction and obtain embedded vectors with different modality attributes,
  • where the feature extraction model is trained with an additive angular margin loss function and is used to extract embedded vectors with modality attributes from the object information of each modality representation;
  • an update unit configured to update the embedded vectors with different modality attributes by means of a neighboring-vector mixing algorithm to obtain object information vectors fused with the features of neighboring vectors; and
  • a calculation unit configured to calculate the similarity between the object information vectors fused with neighboring-vector features and determine the degree of matching between pieces of object information according to the similarity results.
  • a computer device including a memory and a processor, the memory stores computer-readable instructions, and the processor implements the steps of the information matching method when executing the computer-readable instructions.
  • a readable storage medium on which computer-readable instructions are stored, and the steps of the information matching method are implemented when the computer-readable instructions are executed by a processor.
  • When matching information, the present application can extract embedded vectors that reflect object feature information and fuse the embedded vectors carrying modality attributes, so that the object information incorporates information features from different modalities; object information matching is then performed with the fused, modality-aware object information vectors, improving the accuracy of the matched object information.
  • FIG. 1 shows a schematic flowchart of an information matching method provided by an embodiment of the present application
  • FIG. 2 shows a schematic flowchart of another information matching method provided by the embodiment of the present application
  • FIG. 3 shows a schematic flow diagram of updating an embedding vector with an adjacent relationship provided by an embodiment of the present application
  • FIG. 4 shows a schematic structural diagram of an information matching device provided by an embodiment of the present application
  • FIG. 5 shows a schematic structural diagram of another information matching device provided by the embodiment of the present application.
  • Artificial intelligence (AI) is the theory, method, technology, and application system that uses digital computers, or machines controlled by digital computers, to simulate, extend, and expand human intelligence, perceive the environment, acquire knowledge, and use that knowledge to obtain optimal results.
  • Artificial intelligence basic technologies generally include technologies such as sensors, dedicated artificial intelligence chips, cloud computing, distributed storage, big data processing technology, operation/interaction systems, and mechatronics.
  • Artificial intelligence software technology mainly includes computer vision technology, robotics technology, biometrics technology, speech processing technology, natural language processing technology, and machine learning/deep learning.
  • An embodiment of the present application provides an information matching method in which the feature extraction model can extract embedded vectors of object information under different modality representations, improving the accuracy of the matched object information. As shown in FIG. 1, the method includes the following steps.
  • Here, an object can be a target resource abstracted from an online page; the target object can be a commodity sold on a network platform, information displayed on an enterprise platform, or news released on a news platform.
  • Because of the diversity of such resources, the object information represented by different modalities can include object information in picture form, in text form, in video form, in link form, and so on. Picture-form object information can appear as overall views, detail views, and material views of the object; text-form object information can appear as the object name, object description, object function, and the like; and video-form object information can appear as object introduction videos, object display videos, and object usage videos.
  • It can be understood that, for every object, its object information under different modality representations can be acquired. Since the object information under one modality may take multiple forms, the multiple forms belonging to the same modality can be aggregated as the object information under that modality representation; for example, for an object in picture form, the overall view, detail view, and material view can be aggregated as the object information under the picture representation. Alternatively, a characteristic form of the object information belonging to the same modality can be selected as the object information under that modality representation; for example, for an object in text form, the object name and object description can be selected and aggregated as the object information under the text representation.
  • In the embodiments of the present application, the execution subject may be an information matching device, specifically deployed on the server side. In the prior art, matching object information through the object information of a single modality representation is rather one-sided, and similar object information is difficult to match accurately.
  • In the present application, the object information of different modality representations is fused, so that the matching process takes the differences between different information contents into account, achieving a better matching effect and improving the accuracy of the matched object information.
  • The above server can be an independent server, or a cloud server providing basic cloud computing services such as cloud services, cloud databases, cloud computing, cloud functions, cloud storage, network services, cloud communications, middleware services, domain name services, security services, content delivery networks (CDN), big data, and artificial intelligence platforms.
  • Since the object information of each modality representation can be used for offline model training in advance to obtain a model with a feature extraction function, after the object information of each modality is acquired, the feature extraction model pre-trained for the corresponding modality representation is called. The feature extraction model can train a network model with artificial intelligence machine-learning algorithms; by extracting the features of the corresponding modality attributes from the object information of each modality representation, embedded vectors with modality attributes are obtained.
  • For example, object information under the image representation is passed through the feature extraction model trained for the image modality to output an embedded vector with image attributes, and object information under the text representation is passed through the feature extraction model trained for the text modality to output an embedded vector with text attributes.
  • To obtain better feature extraction, the network model used to train the feature extraction model can be chosen according to the modality representation of the object information. For example, the feature extraction model for the image modality can use an image encoder such as eca_nfnet_l1 from the timm algorithm library, and the feature extraction model for the text modality can use a text encoder such as xlm-roberta-large from the huggingface algorithm library; the ArcFace loss function is used to train the model during parameter adjustment. (A minimal instantiation sketch follows below.)
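As a hedged illustration only (not the application's own training code), the snippet below shows how the two encoders named above might be instantiated; the model identifiers follow the text, while everything else is an assumption.

```python
# Illustrative assumption of how the encoders named above could be obtained;
# the concrete training code of the application is not shown here.
import timm
from transformers import AutoModel, AutoTokenizer

# Image encoder from the timm library (classification head removed).
image_encoder = timm.create_model("eca_nfnet_l1", pretrained=True, num_classes=0)

# Text encoder from the huggingface hub.
text_tokenizer = AutoTokenizer.from_pretrained("xlm-roberta-large")
text_encoder = AutoModel.from_pretrained("xlm-roberta-large")
```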
  • It can be understood that the loss function guides the optimization of the whole network model. In this application the feature extraction model is trained with the additive angular margin loss function, which is used to extract embedded vectors with modality attributes from the object information of each modality representation, so that the extracted embedded vectors characterize the object features under the corresponding modality more accurately.
  • Using a neighboring-vector mixing algorithm, the embedded vectors with different modality attributes are updated to obtain object information vectors fused with the features of neighboring vectors.
  • The neighboring-vector mixing algorithm in this application uses embedded vectors with different modality attributes for matching. In a thresholded KNN classification algorithm it must be ensured that each query has at least two matches, so compared with a conventional KNN classifier the threshold is set relatively high. Rather than feeding the learned embedded vectors directly into a KNN classifier, here each embedded vector is updated with its neighboring vectors to achieve better information fusion.
  • Specifically, when the neighboring-vector mixing algorithm updates the embedded vectors with different modality attributes, the cosine distances between the embedded vectors with different modality attributes are computed separately; if a cosine distance is greater than a preset threshold, the embedded vectors are determined to have an adjacent relationship, and the embedded vectors with an adjacent relationship are then updated with an update strength mapped from the cosine distance.
  • It can be understood that the update may be applied only to embedded vectors of a single modality attribute, for example updating the embedded vectors with image modality attributes or those with text modality attributes, or it may be applied to embedded vectors formed by mixing embedded vectors with different modality attributes.
  • As one implementation scenario, since the neighboring vectors of an embedded vector do not necessarily share its modality attribute, and considering the fusion between different modality attributes, vectors with different modality attributes can be used to update the embedded vector. Specifically, for the modality attribute of the current embedded vector, the embedded vectors that are adjacent to it but belong to a different modality attribute are queried as its neighboring embedded vectors; whether two embedded vectors are adjacent can be judged by whether the distance value between them reaches the threshold, and the current embedded vector is then updated with those neighboring embedded vectors. For example, when updating an embedded vector with image modality attributes, the adjacent embedded vectors with text modality attributes and/or video modality attributes can be used.
  • When updating an embedded vector, the distance value between embedded vectors can determine the update strength: the closer two embedded vectors with different modality attributes are, the higher their similarity, so a higher update strength can be used for that neighboring embedded vector, while a lower update strength can be used for a more distant neighboring embedded vector.
  • In this application, an object information vector fused with neighboring-vector features carries multimodally fused features; by taking the characteristics of different modality representations into account, the differences between different pieces of information are reduced, the representation of the object information vector becomes more accurate, and the subsequent matching precision of object information is improved.
  • Calculating the similarity between object information vectors fused with neighboring-vector features amounts to calculating the distance between vectors, and the distance can be computed in many ways, for example cosine similarity, Euclidean distance, Manhattan distance, or the Pearson correlation coefficient (a small sketch follows below).
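For concreteness, here is a small sketch of the similarity and distance measures listed above applied to two fused object information vectors; the second vector and the use of numpy are illustrative assumptions, not values from the application.

```python
# Illustrative only: the distance/similarity measures mentioned above,
# computed between two fused object-information vectors with numpy.
import numpy as np

def similarity_report(u: np.ndarray, v: np.ndarray) -> dict:
    cosine = float(u @ v / (np.linalg.norm(u) * np.linalg.norm(v)))
    euclidean = float(np.linalg.norm(u - v))
    manhattan = float(np.abs(u - v).sum())
    pearson = float(np.corrcoef(u, v)[0, 1])
    return {"cosine": cosine, "euclidean": euclidean,
            "manhattan": manhattan, "pearson": pearson}

# Example: the first vector is E_A from the worked example later in the text,
# the second is a made-up comparison vector.
print(similarity_report(np.array([-0.588, 0.784, 0.196]),
                        np.array([-0.530, 0.800, 0.280])))
```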
  • It should be noted that the degree of matching between pieces of object information reflects, to a certain extent, how similar they are: the higher the similarity value, the closer the object information. Based on the matching degree, similar objects can then be pushed to users, or the display of similar objects can be blocked.
  • In the information matching method provided by the embodiments of the present application, object information represented by different modalities is acquired; for the object information of each modality, a feature extraction model pre-trained for the corresponding modality representation is called for feature extraction, obtaining embedded vectors with different modality attributes, the feature extraction model being trained with an additive angular margin loss function to extract embedded vectors with modality attributes from the object information of each modality representation; the embedded vectors with different modality attributes are then updated with the neighboring-vector mixing algorithm to obtain object information vectors fused with neighboring-vector features; and the similarity between the fused object information vectors is calculated, the degree of matching between pieces of object information being determined from the similarity results.
  • Compared with prior-art matching of object information based on images and text descriptions, the present application extracts embedded vectors that reflect object feature information and fuses the embedded vectors carrying modality attributes, so that the object information incorporates information features from different modalities; object information matching is performed with the fused, modality-aware object information vectors, improving the accuracy of the matched object information.
  • An embodiment of the present application provides another information matching method in which the feature extraction model can extract embedded vectors of object information under different modality representations, improving the accuracy of the matched object information. As shown in FIG. 2, the method includes the following steps.
  • Considering that object information represented by different modalities can have different attribute representations in the same attribute dimension, for example different colors in the color dimension or different sizes in the size dimension, the object information can also be preprocessed based on its attribute features in the same attribute dimension, so that the object information of different modality representations shares the same attribute representation. To avoid the object information being affected by differing attribute representations, an arbitrary attribute representation can be chosen, a representative attribute feature can be selected, or the attribute feature of the best-selling variant of the object can be selected.
  • In the present application, the feature extraction model under each modality representation can be constructed by training a network model with pre-collected sample sets of object information under the different modality representations, using machine learning or deep learning. In this construction process, the network model processes the object information sample sets of the different modality representations separately to obtain embedded vectors of the object information under each modality representation, the sample sets carrying object category labels; then, for the object information samples of each modality representation, the additive angular margin loss function is used to perturb the angle obtained from the dot product of the embedded vector and the weight matrix, and a target feature vector is output according to the perturbed angle; finally, a classification function is used to predict the category label of the object information from the target feature vector, so that a feature extraction model is constructed under each modality representation.
  • When the network model processes the object information sample sets of the different modality representations to obtain the embedded vectors, the sample sets are first vectorized to obtain object vectors under the different modality representations; the pooling layer of the network model then performs feature aggregation on these object vectors to obtain object feature vectors under the different modality representations; finally, batch normalization over the sample dimension and regularization over the feature dimension standardize the aggregated object feature vectors, yielding the embedded vectors of the object information under the different modality representations.
  • As an example of processing the object information sample set of the image modality to obtain embedded vectors: an image is first converted into a [256, 256, 3] array, where 3 denotes the RGB channels and each element takes a value in [0, 255], digitizing (vectorizing) the image. The [256, 256, 3] image is then fed into the eca_nfnet_l1 model, which outputs a feature map of size [8, 8, 1792] representing the image, realizing feature extraction. Global average pooling (GAP) averages the 1792 feature maps of size [8, 8] into a 1792-dimensional vector, realizing feature aggregation. Finally, batch normalization over the sample dimension and regularization over the feature dimension are applied to the 1792-dimensional vector, giving the standardized vector representation, i.e. the embedded vector of the object information under the image modality representation (a minimal sketch follows below).
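A minimal sketch of this image branch, assuming PyTorch and timm, and assuming that batch normalization plus L2 normalization realize the sample-dimension and feature-dimension standardization described above; the module is illustrative, not the patented implementation.

```python
# Hedged sketch of the image branch: 256x256 RGB image -> eca_nfnet_l1 feature
# map -> global average pooling -> batch norm (sample dimension) -> L2
# normalization (feature dimension). The text above cites 1792 feature channels.
import timm
import torch
import torch.nn as nn
import torch.nn.functional as F

class ImageEmbedder(nn.Module):
    def __init__(self):
        super().__init__()
        # num_classes=0 and global_pool="avg" keep only the backbone plus GAP.
        self.backbone = timm.create_model(
            "eca_nfnet_l1", pretrained=True, num_classes=0, global_pool="avg")
        # Batch normalization over the sample dimension of the pooled features.
        self.bn = nn.BatchNorm1d(self.backbone.num_features)

    def forward(self, images: torch.Tensor) -> torch.Tensor:
        # images: (batch, 3, 256, 256), originally [0, 255] pixel values.
        feats = self.backbone(images)           # (batch, feature_dim) after GAP
        feats = self.bn(feats)                  # sample-dimension standardization
        return F.normalize(feats, p=2, dim=1)   # feature-dimension regularization
```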
  • As an example of processing the object information sample set of the text modality to obtain embedded vectors: the object text is first split into tokens at whitespace, written [t1, t2, t3, ..., tn], and the tokenized sequence is fed into the xlm-roberta-large model to obtain the updated vector representation sequence [h1, h2, ..., hn], each vector having 1024 dimensions; this converts the text from words to vectors that carry more of the text's semantics. A pooling operation averages this vector sequence into a 1024-dimensional vector, realizing feature aggregation. Finally, batch normalization over the sample dimension and regularization over the feature dimension are applied to the 1024-dimensional vector, giving the standardized vector representation, i.e. the embedded vector of the object information under the text modality representation (a minimal sketch follows below).
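A matching sketch for the text branch, assuming PyTorch and the huggingface transformers API; mean pooling over token vectors and the normalization layers follow the description, while the wrapper class itself is an assumption.

```python
# Hedged sketch of the text branch: tokenized object text -> xlm-roberta-large
# -> mean pooling over the 1024-d token vectors -> batch norm over samples ->
# L2 normalization over features.
import torch
import torch.nn as nn
import torch.nn.functional as F
from transformers import AutoModel, AutoTokenizer

class TextEmbedder(nn.Module):
    def __init__(self, model_name: str = "xlm-roberta-large"):
        super().__init__()
        self.tokenizer = AutoTokenizer.from_pretrained(model_name)
        self.encoder = AutoModel.from_pretrained(model_name)
        self.bn = nn.BatchNorm1d(self.encoder.config.hidden_size)  # 1024 here

    def forward(self, texts: list[str]) -> torch.Tensor:
        batch = self.tokenizer(texts, padding=True, truncation=True,
                               return_tensors="pt")
        hidden = self.encoder(**batch).last_hidden_state   # (batch, seq, 1024)
        mask = batch["attention_mask"].unsqueeze(-1).float()
        pooled = (hidden * mask).sum(1) / mask.sum(1)       # mean over tokens
        pooled = self.bn(pooled)                            # sample-dim batch norm
        return F.normalize(pooled, p=2, dim=1)              # feature-dim normalization
```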
  • The object information sample set carries object category labels. For the object information samples of each modality representation, the additive angular margin loss function is used to perturb the angle obtained from the dot product of the embedded vector and the weight matrix, and the target feature vector is output according to the perturbed angle. Specifically, the additive angular margin loss function takes the dot product of the embedded vector and the weight matrix, both regularized, to obtain a cosine value; the angle obtained by inverting this cosine value is perturbed by adding the angular margin, and the cosine of the perturbed angle is computed as the target feature vector.
  • It should be noted that different angular margins can be used for the network models of different modality representations: a margin of 0.8 to 1.0 is suitable for the image model, and 0.6 to 0.8 for the text model. When increasing the angular margin, one can start from 0.2 and raise the image model's margin to 1.0 and the text model's margin to 0.8. (A hedged loss sketch follows below.)
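As a hedged sketch of the additive angular margin (ArcFace-style) perturbation described above: the embedded vector and the weight matrix are normalized, their dot product gives cos(theta), the margin is added to the ground-truth class angle, and the rescaled cosine is fed to a softmax cross-entropy. The logit scale and the numerical clamping are assumptions not specified in the text; the default margin follows the image-model range above.

```python
# Sketch of an additive angular margin loss; the scale factor is an assumed
# hyperparameter, and margins follow the ranges given above
# (image roughly 0.8-1.0, text roughly 0.6-0.8).
import torch
import torch.nn as nn
import torch.nn.functional as F

class AdditiveAngularMarginLoss(nn.Module):
    def __init__(self, feature_dim: int, num_classes: int,
                 margin: float = 0.8, scale: float = 30.0):
        super().__init__()
        self.weight = nn.Parameter(torch.randn(num_classes, feature_dim))
        self.margin = margin   # angular margin added to the ground-truth angle
        self.scale = scale     # assumed logit scale, not specified in the text

    def forward(self, embeddings: torch.Tensor, labels: torch.Tensor) -> torch.Tensor:
        # Dot product of the normalized embedding and normalized weights -> cos(theta).
        cosine = F.linear(F.normalize(embeddings), F.normalize(self.weight))
        # Inverse operation on the cosine to recover the angle, with clamping.
        theta = torch.acos(cosine.clamp(-1.0 + 1e-7, 1.0 - 1e-7))
        one_hot = F.one_hot(labels, num_classes=theta.size(1)).float()
        # Perturb only the ground-truth class angle by adding the angular margin.
        logits = torch.cos(theta + one_hot * self.margin) * self.scale
        return F.cross_entropy(logits, labels)  # prediction of object category labels
```

As the description suggests, the margin could start near 0.2 and be raised during training toward 1.0 for the image model or 0.8 for the text model.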
  • Further, to ensure the training precision of the feature extraction model, after the feature extraction model under each modality representation is constructed, a preset loss function can be used, together with the category labels predicted for the object information and the category labels of the object information sample set, to adjust the parameters of the feature extraction model under each modality representation and update the feature extraction model.
  • Further, the embedded vectors with different modality attributes can be used to match pieces of object information: the embedded vectors of the image modality can match object images, and the embedded vectors of the text modality can match object texts.
  • To present the matching results better, the matching results under the modality representations formed by the embedded vectors with different modality attributes can also be merged; specifically, the embedded vectors with image modality attributes and text modality attributes can be merged before the matching process is performed, giving the matching result for the merged object information.
  • In that case, the final tendency of the model falls into the following situations for the object information to be matched: the text modality outputs an object A that is very similar to the target object; the image modality outputs an object G that is very similar to the target object; and the fusion of the text and image modalities outputs objects D, E, and F that are similar to the target object. The objects most similar to the target object therefore fall among A, D, E, F, and G (see the sketch below).
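The candidate merging described above can be read as a simple union of the per-modality and fused result sets; the sketch below only restates the A/D/E/F/G example and is not part of the patented method.

```python
# Illustrative union of the three candidate sources described above.
text_candidates = {"A"}            # very similar under the text modality
image_candidates = {"G"}           # very similar under the image modality
fused_candidates = {"D", "E", "F"} # similar after text/image fusion

final_candidates = text_candidates | image_candidates | fused_candidates
# -> {"A", "D", "E", "F", "G"}: the objects most similar to the target object
```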
  • Here the distance value is a metric that characterizes the vectors; it may be a cosine distance or a Manhattan distance, which is not limited here.
  • The update strength mapped from the distance value can serve as the weight for updating the embedded vectors that have an adjacent relationship. Specifically, each time an embedded vector is updated, the embedded vectors with an adjacent relationship, weighted by the corresponding update strength, are added to the original embedded vector, so that the updated embedded vector carries richer object information.
  • In a practical application scenario, the process of updating embedded vectors with an adjacent relationship is shown in FIG. 3. Taking object A as an example, the embedded vector E_A of object A is [-0.588, 0.784, 0.196], and objects B, C, and D likewise have embedded vectors E_B, E_C, and E_D. The cosine distances between E_A and the nodes of objects B, C, and D are computed as 0.53, 0.93, and 0.94 respectively. A solid line indicates that the cosine distance between the two vectors exceeds the preset threshold (which can be set to 0.5), meaning the embedded vectors have an adjacent relationship; a dotted line indicates that it does not exceed the threshold and there is no adjacent relationship. Each embedded vector is updated with the embedded vectors adjacent to it, the update strength being given by the cosine distance value.
  • The specific update of the embedded vectors can proceed as follows:
  • E_A = normalize(E_A × 1 + E_D × 0.94 + E_B × 0.93 + E_C × 0.53)
  • E_B = normalize(E_B × 1 + E_A × 0.93)
  • E_C = normalize(E_C × 1 + E_A × 0.53)
  • E_D = normalize(E_D × 1 + E_A × 0.94)
  • Here normalize denotes normalizing the embedded vector. Each embedded vector updates itself from the embedded vectors with which it has an adjacent relationship, weighted by the cosine values, and this process can be iterated until the evaluation metric of the network model no longer improves (a small sketch follows below).
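The update above can be written as one round of a neighboring-vector mixing step. In the sketch below only E_A and the adjacency threshold come from the text; the vectors for B, C, and D are placeholders, so the computed weights will not reproduce 0.53, 0.93, and 0.94 exactly.

```python
# Hedged sketch of one round of the neighboring-vector mixing update: each
# embedded vector is mixed with its adjacent vectors, weighted by the cosine
# distance value, and then re-normalized.
import numpy as np

def normalize(v: np.ndarray) -> np.ndarray:
    return v / np.linalg.norm(v)

def mix_neighbors(vectors: dict, threshold: float = 0.5) -> dict:
    """One round of the neighboring-vector mixing update."""
    names = list(vectors)
    cos = {a: {b: float(normalize(vectors[a]) @ normalize(vectors[b]))
               for b in names if b != a} for a in names}
    updated = {}
    for a in names:
        acc = vectors[a] * 1.0               # keep the original vector (weight 1)
        for b, weight in cos[a].items():
            if weight > threshold:           # adjacency: cosine distance above the preset threshold
                acc = acc + vectors[b] * weight  # update strength given by the cosine distance value
        updated[a] = normalize(acc)
    return updated

E = {"A": np.array([-0.588, 0.784, 0.196]),   # from the worked example above
     "B": np.array([0.1, 0.9, 0.4]),          # placeholder, not from the text
     "C": np.array([0.3, 0.2, 0.9]),          # placeholder, not from the text
     "D": np.array([-0.5, 0.7, 0.5])}         # placeholder, not from the text
E = mix_neighbors(E)   # can be iterated until the evaluation metric stops improving
```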
  • Considering that the degree of matching between pieces of object information reflects object similarity, after the degree of matching is determined, a requirement to push or block similar information for target object information can be handled as follows: in response to an instruction to push or block information similar to the target object information, the object information whose matching degree with the target object information ranks above a preset cutoff is selected as similar object information, and that similar object information is pushed to, or blocked from, the user.
  • Further, to reduce the amount of similarity computation, the object information vectors in the object library can be classified in advance: multiple object classifications are preset, each with its own classification features, the object information vectors are clustered according to those features, and vectors with the same classification features are grouped into the same object classification, giving object information vectors under multiple object classifications. For a selected object it is then only necessary to determine its object classification first and compute the similarity between the embedded vectors within that classification to obtain object information similar to the target object information (a sketch follows below).
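A hedged sketch combining the two ideas above: restricting the similarity search to the query's object classification and taking the top-ranked matches as the candidates to push or block. The category assignment and the ranking cutoff are illustrative assumptions.

```python
# Illustrative only: search within the query's object classification and
# return the top-ranked matches as candidates to push or block.
import numpy as np

def top_similar(query: np.ndarray,
                library: dict,          # object_id -> (category, vector)
                query_category: str,
                top_k: int = 5) -> list:
    """Return the top_k object ids most similar to the query within its category."""
    scored = []
    for object_id, (category, vector) in library.items():
        if category != query_category:          # only compare within the same classification
            continue
        cosine = float(query @ vector /
                       (np.linalg.norm(query) * np.linalg.norm(vector)))
        scored.append((cosine, object_id))
    scored.sort(reverse=True)                   # higher similarity = closer object information
    return [object_id for _, object_id in scored[:top_k]]  # candidates to push or block
```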
  • In the present application, the above object information matching can be carried out through a network platform, and objects can be recommended to or blocked from users according to the matching results. Specifically, a "find similar" button or a "block similar" button can be provided on the network platform for users to choose according to their browsing needs; after similar objects are found, further filter dimensions can be provided, for example filtering by price, by shipping location, or by rating.
  • Further, as a specific implementation of the method in FIG. 1, an embodiment of the present application provides an information matching device. As shown in FIG. 4, the device includes an acquisition unit 31, a calling unit 32, an update unit 33, and a calculation unit 34.
  • The acquisition unit 31 can be used to acquire object information represented by different modalities;
  • the calling unit 32 can be used to, for the object information of each modality, call the feature extraction model pre-trained for the corresponding modality representation to perform feature extraction and obtain embedded vectors with different modality attributes, the feature extraction model being trained with an additive angular margin loss function to extract embedded vectors with modality attributes from the object information of each modality representation;
  • the update unit 33 can be used to update the embedded vectors with different modality attributes by means of the neighboring-vector mixing algorithm to obtain object information vectors fused with the features of neighboring vectors;
  • the calculation unit 34 can be configured to calculate the similarity between the object information vectors fused with neighboring-vector features and determine the degree of matching between pieces of object information according to the similarity results.
  • The information matching device provided by the embodiments of the present application acquires object information represented by different modalities; for the object information of each modality, it calls a feature extraction model pre-trained for the corresponding modality representation for feature extraction, obtaining embedded vectors with different modality attributes, the feature extraction model being trained with an additive angular margin loss function to extract embedded vectors with modality attributes from the object information of each modality representation; the embedded vectors with different modality attributes are then updated with the neighboring-vector mixing algorithm to obtain object information vectors fused with neighboring-vector features; and the similarity between the fused object information vectors is calculated, the degree of matching between pieces of object information being determined from the similarity results.
  • Compared with prior-art matching of object information based on images and text descriptions, the present application extracts embedded vectors that reflect object feature information and fuses the embedded vectors carrying modality attributes, so that the object information incorporates information features from different modalities; object information matching is performed with the fused, modality-aware object information vectors, improving the accuracy of the matched object information.
  • FIG. 5 is a schematic structural diagram of another information matching device according to an embodiment of the present application. As shown in FIG. 5, the device further includes:
  • The processing unit 35 can be used to, before the feature extraction model pre-trained for the corresponding modality representation is called to extract features from the object information of each modality and embedded vectors with different modality attributes are obtained, use the network model to process the object information sample sets of the different modality representations separately, obtaining embedded vectors of the object information under each modality representation, the object information sample sets carrying object category labels;
  • the perturbation unit 36 can be used to, for the object information samples of each modality representation, use the additive angular margin loss function to perturb the angle obtained from the dot product of the embedded vector and the weight matrix, and output the target feature vector according to the perturbed angle;
  • the construction unit 37 may be configured to use a classification function to perform category label prediction of object information on the target feature vector, and construct a feature extraction model under each modality representation.
  • the processing unit 35 includes:
  • the vectorization module 351 can be used to vectorize the object information sample sets represented by different modalities to obtain object vectors represented by different modalities;
  • the aggregation module 352 can be used to perform feature aggregation on the object vectors represented by different modalities by using the pooling layer of the network model, to obtain object feature vectors represented by different modalities;
  • the standardization module 353 can be used to standardize the object feature vectors of feature clusters through batch normalization based on the sample dimension and regularization based on the feature dimension, so as to obtain embedded vectors of object information represented by different modalities.
  • the perturbation unit 36 includes:
  • The dot product module 361 can be used to, for the object sample information of each modality representation, use the additive angular margin loss function to take the dot product of the embedded vector and the weight matrix, both regularized, obtaining a cosine value;
  • the perturbation module 362 may be configured to perform perturbation by adding an angle interval to the angle obtained by performing an inverse operation on the cosine value, and calculate the cosine value of the perturbed angle as the target feature vector.
  • the device further includes:
  • The adjustment unit 38 can be used to, after the classification function is used to predict the category label of the object information from the target feature vector and the feature extraction model under each modality representation is constructed, use a preset loss function together with the category labels predicted for the object information and the category labels of the object information sample set to adjust the parameters of the feature extraction model under each modality representation and update the feature extraction model.
  • the update unit 33 includes:
  • the calculation module 331 can be used to separately calculate the distance value between the embedded vectors with different modal attributes, and if the distance value is greater than a preset threshold, it is determined that there is an adjacent relationship between the embedded vectors;
  • the update module 332 may be configured to update the embedded vector with the neighbor relationship at least once by using the update strength of the distance value mapping.
  • the device further includes:
  • The push unit 39 can be configured to, after the similarity between the object information vectors fused with neighboring-vector features is calculated and the degree of matching between pieces of object information is determined from the similarity results, respond to an instruction to push or block information similar to the target object information by selecting the object information whose matching degree with the target object information ranks above a preset cutoff as similar object information, and pushing that similar object information to the user or blocking it.
  • Based on the methods shown in FIG. 1 and FIG. 2, this embodiment also provides a readable storage medium, which may be non-volatile or volatile and stores computer-readable instructions; when the computer-readable instructions are executed by a processor, the information matching method shown in FIG. 1 and FIG. 2 is implemented.
  • Based on this understanding, the technical solution of the present application can be embodied in the form of a software product, which can be stored in a non-volatile storage medium (such as a CD-ROM, a USB flash drive, or a removable hard disk) and includes several instructions for causing a computer device (which may be a personal computer, a server, a network device, etc.) to execute the methods described in the various implementation scenarios of the present application.
  • Based on the methods shown in FIG. 1 and FIG. 2 and the virtual apparatus embodiments shown in FIG. 4 and FIG. 5, an embodiment of the present application also provides a computer device, which can specifically be a personal computer, a server, a network device, or the like. The physical device includes a readable storage medium and a processor; the readable storage medium is used to store computer-readable instructions, and the processor is used to execute the computer-readable instructions to implement the information matching method shown in FIG. 1 and FIG. 2.
  • the computer device may also include a user interface, a network interface, a camera, a radio frequency (Radio Frequency, RF) circuit, a sensor, an audio circuit, a WI-FI module, and the like.
  • the user interface may include a display screen (Display), an input unit such as a keyboard (Keyboard), and the like, and optional user interfaces may also include a USB interface, a card reader interface, and the like.
  • the network interface may include a standard wired interface, a wireless interface (such as a Bluetooth interface, a WI-FI interface) and the like.
  • Those skilled in the art can understand that the physical device structure of the information matching device provided in this embodiment does not constitute a limitation on the physical device; it may include more or fewer components, combine certain components, or use a different component arrangement.
  • the readable storage medium may also include an operating system and a network communication module.
  • the operating system is a program that manages the hardware and software resources of the above-mentioned computer equipment, and supports the operation of information processing programs and other software and/or programs.
  • the network communication module is used to realize the communication between the various components inside the readable storage medium, as well as the communication with other hardware and software in the entity device.
  • From the description of the above embodiments, those skilled in the art can clearly understand that the present application can be implemented by software plus a necessary general-purpose hardware platform, or by hardware.
  • By applying the technical solution of the present application, compared with the current prior art, the present application can extract embedded vectors that reflect object feature information and fuse the embedded vectors carrying modality attributes, so that the object information incorporates information features from different modalities; object information matching is performed with the fused, modality-aware object information vectors, improving the accuracy of the matched object information.
  • Those skilled in the art can understand that the accompanying drawings are only schematic diagrams of preferred implementation scenarios, and the modules or processes in the drawings are not necessarily required for implementing the present application.
  • the modules in the devices in the implementation scenario can be distributed among the devices in the implementation scenario according to the description of the implementation scenario, or can be located in one or more devices different from the implementation scenario according to corresponding changes.
  • the modules of the above implementation scenarios can be combined into one module, or can be further split into multiple sub-modules.

Abstract

An information matching method and apparatus, a computer device, and a readable storage medium, relating to the technical field of artificial intelligence. The method comprises: acquiring object information represented by different modalities; for the object information of each modality representation, calling a feature extraction model pre-trained under the corresponding modality representation to perform feature extraction, obtaining embedded vectors with different modality attributes, the feature extraction model being trained with an additive angular margin loss function and used to extract embedded vectors with modality attributes from the object information of each modality representation; updating the embedded vectors with different modality attributes by means of a neighboring-vector mixing algorithm to obtain object information vectors fused with the features of neighboring vectors; and calculating the similarity between the object information vectors fused with neighboring-vector features, and determining the degree of matching between pieces of object information according to the similarity results. The method can combine the embedded vectors of object information under different modality representations for object information matching, improving the accuracy of the matched object information.

Description

信息的匹配方法、装置、计算机设备及可读存储介质
本申请要求与2021年8月25日提交中国专利局、申请号为202110980655.X、申请名称为“信息的匹配方法及装置”的中国专利申请的优先权,其全部内容通过引用结合在申请中。
技术领域
本申请涉及人工智能技术领域,尤其是涉及到信息的匹配方法、装置、计算机设备及可读存储介质。
背景技术
随着互联网的不断发展,人们每天产生与接收的信息量成爆炸式增长,无形中造成了信息过载的问题。在大量数据集中寻找相似重复的数据是许多网络平台的重要业务,以网络平台中对象为例,商家可以在网络平台上传对象的图片以及简短的对象文字表述。但是不同商家对同一对象所上传的图片可能差异很大,文字描述也存在很大的区别,使得相似对象信息从图片和文字描述方面上很难被辨别,不利于对对象信息进行相似匹配。
目前,针对对象信息的匹配主要包括图片匹配和文字描述匹配两种,基于给定的目标对象信息,图片匹配通常使用局部敏感哈希算法对近似的图片进行检测,进而匹配出与目标对象相似的图片,然而,发明人意识到该方式仅从图片本身出发,并未考虑图片中对象的本质,使得匹配到的对象信息准确率较低;文字描述匹配通常使用短文本匹配算法,加入余弦相似度或文本编辑距离等对近似的文字描述进行检索,然而,该方式一般应用于用于信息检索或问答场景,针对标签短语拼凑的文字描述,使得匹配到的对象信息准确率较低。
发明内容
有鉴于此,本申请提供了一种信息的匹配方法、装置、计算机设备及可读存储介质,主要目的在于解决现有技术中基于图片和文字描述匹配得到的对象信息准确率较低的问题。
依据本申请一个方面,提供了一种信息的匹配方法,该方法包括:
获取不同模态表征的对象信息;
针对每种模态表征的对象信息,调用相应模态表征下预先训练的特征取模型进行特征提取,得到具有不同模态属性的嵌入式向量,所述特征提取模型使用加性角度间隔损失函数进行训练,用于从模态表征的对象信息中提取具有模态属性的嵌入式向量;
利用邻近向量混合算法,对所述具有不同模态属性的嵌入式向量进行更新,得到融合有相邻向量特征的对象信息向量;
计算所述融合有相邻向量特征的对象信息向量之间的相似度,并根据相似度计算结果确定对象信息之间的匹配程度。
依据本申请另一个方面,提供了一种信息的匹配装置,所述装置包括:
获取单元,用于获取不同模态表征的对象信息;
调用单元,用于针对每种模态表征的对象信息,调用相应模态表征下预先训练的特征取模型进行特征提取,得到具有不同模态属性的嵌入式向量,所述特征提取模型使用加性角度间隔损失函数进行训练,用于从模态表征的对象信息中提取具有模态属性的嵌入式向量;
更新单元,用于利用邻近向量混合算法,对所述具有不同模态属性的嵌入式向量进行更新,得到融合有相邻向量特征的对象信息向量;
计算单元,用于计算所述融合有相邻向量特征的对象信息向量之间的相似度,并根据相似度计算结果确定对象信息之间的匹配程度。
依据本申请又一个方面,提供了一种计算机设备,包括存储器和处理器,所述存储器存储有计算机可读指令,所述处理器执行所述计算机可读指令时实现信息的匹配方法的步骤。
依据本申请再一个方面,提供了一种可读存储介质,其上存储有计算机可读指令,所述计算机可读指令被处理器执行时实现信息的匹配方法的步骤。
本申请在进行信息的匹配时能够提取出反映对象特征信息的嵌入式向量,并针对具有模态属性的嵌入式向量进行融合,使得对象信息能够融合不同模态间的信息特征,并结合融合有模态表征下的对象信息向量进行对象信息匹配,提高匹配到对象信息的准确率。
附图说明
通过阅读下文优选实施方式的详细描述,各种其他的优点和益处对于本领域普通技术人员将变得清楚明了。附图仅用于示出优选实施方式的目的,而并不认为是对本申请的限制。而且在整个附图中,用相同的参考符号表示相同的部件。在附图中:
图1示出了本申请实施例提供的一种信息的匹配方法的流程示意图;
图2示出了本申请实施例提供的另一种信息的匹配方法的流程示意图;
图3示出了本申请实施例提供的对具有相邻关系的嵌入向量进行更新的流程示意图;
图4示出了本申请实施例提供的一种信息的匹配装置的结构示意图;
图5示出了本申请实施例提供的另一种信息的匹配装置的结构示意图。
具体实施方式
下面将参照附图更详细地描述本公开的示例性实施例。虽然附图中显示了本公开的示 例性实施例,然而应当理解,可以以各种形式实现本公开而不应被这里阐述的实施例所限制。相反,提供这些实施例是为了能够更透彻地理解本公开,并且能够将本公开的范围完整的传达给本领域的技术人员。
本申请实施例可以基于人工智能技术对相关的数据进行获取和处理。其中,人工智能(Artificial Intelligence,AI)是利用数字计算机或者数字计算机控制的机器模拟、延伸和扩展人的智能,感知环境、获取知识并使用知识获得最佳结果的理论、方法、技术及应用系统。
人工智能基础技术一般包括如传感器、专用人工智能芯片、云计算、分布式存储、大数据处理技术、操作/交互系统、机电一体化等技术。人工智能软件技术主要包括计算机视觉技术、机器人技术、生物识别技术、语音处理技术、自然语言处理技术以及机器学习/深度学习等几大方向。
本申请实施例提供了一种信息的匹配方法,该特征提取模型能够提取出对象信息在不同模态表征下的嵌入式向量,提高匹配到对象信息的准确率,如图1所示,该方法包括:
101、获取不同模态表征的对象信息。
其中,对象可以为线上页面中抽象出来的目标资源,该目标实物可以为网络平台中售卖的商品,还可以为企业平台中展示的信息,还可以为新闻平台中发布的消息等,由于目标资源的多样性,不同模态表征的对象信息可以包括图片形式的对象信息、文本形式的对象信息、视频形式的对象信息、链接形式的对象信息等,图片形式的对象信息可以表现为对象的整体图、细节图以及材质图等,文本形式的对象信息可以表现为对象名称、对象描述、对象功效等,视频形式的对象信息可以表现为对象介绍视频、对象实物展示视频以及对象使用视频等。
可以理解的是,针对每一种对象都能够获取其在不同模态表征的对象信息,由于每种模态表征下的对象信息可能具有多个表现形式,这里可以通过将属于同一模态表征的对象信息多个表现形式进行汇总,作为该模态表征下的对象信息,例如,图片形式的对象可以将对象的整体图、细节图以及材质图汇总后作为图片表征下的对象信息,还可以选取属于同一模态表征的对象信息中具有特点的表现形式,作为该模态表征下的对象信息,例如,文字形式的对象可以选取对象名称和对象描述汇总后作为文字表征下的对象信息。
在本申请实施例中,执行主体可以为信息的匹配装置,具体应用在服务器端,现有技术中通过单一模态表征的对象信息来实现对象信息的匹配过程比较片面,很难准确匹配到相似的对象信息。本申请通过将不同模态表征的对象信息进行融合,使得对象信息的匹配过程考虑到不同信息内容下的差异性,能够达到更好的匹配效果,提高匹配到对象信息的准确率。
上述服务器可以是独立的服务器,也可以是提供云服务、云数据库、云计算、云函数、云存储、网络服务、云通信、中间件服务、域名服务、安全服务、内容分发网络(Content Delivery Network,CDN)、以及大数据和人工智能平台等基础云计算服务的云服务器。
102、针对每种模态表征的对象信息,调用相应模态表征下预先训练的特征取模型进行特征提取,得到具有不同模态属性的嵌入式向量。
其中,由于每种模态表征的对象信息可以预先进行离线的模型训练,从而得到具有特征提取功能的模型,因此,在获取每种模态表征的对象信息后,调取相应模态表征下预先训练的特征提取模型,这里特征提取模型可使用人工智能的机器算法对网络模型进行训练,通过对每种模态表征的对象信息进行相应模态属性的特征提取,得到具有模态属性的嵌入式向量,例如,图片表征下的对象信息经过针对图片模态训练的特征提取模型可以输出具有图片属性的嵌入式向量,文本表征下的对象信息经过针对文本模态训练的特征提取模型可以输出具有文本属性的嵌入式向量。
为了进一步获取更准确的特征提取效果,用于训练特征提取模型的网络模型可以根据对象信息的模态表征的进行选取,例如,针对图像模态的特征提取模型可以使用图像编码器,可以使用timm算法库下的eca_nfne_11,针对文本模态的特征提取模型可以使用文本编码器,可以使用huggingface算法库下的xlm-roberta-large等算法,并在模型参数调整过程中使用ArcFace损失函数来训练模型。
可以理解的是,损失函数会对整个网络模型的优化有着导向性作用,本申请中特征提取模型使用加性角度间隔损失函数进行训练,用于从模态表征的对象信息中提取具有模态属性的嵌入式向量,使得提取到嵌入式向量能更准确表征相应模态表征下的对象特征。
103、利用邻近向量混合算法,对所述具有不同模态属性的嵌入式向量进行更新,得到融合有相邻向量特征的对象信息向量。
本申请中邻近向量混合算法需要利用不同模态属性的嵌入式向量进行匹配,在阈值处理的KNN分类算法中,需要保证每个查询至少有两个匹配项,与常规KNN分类算法相比,阈值的设置会比较高,相比于直接将学习到的嵌入式向量输入至KNN分类算法相比,这里利用每个嵌入式向量的相邻向量对其自身进行更新,以实现更好的信息融合。
具体利用邻近向量混合算法,对具有不同模态属性的嵌入式向量进行更新的过程中,可以分别计算具有不同模态属性的嵌入式向量之间的余弦距离,若余弦距离大于预设阈值,则确定嵌入式向量之间具有相邻关系,进一步利用余弦距离所映射的更新力度,对具有相邻关系的嵌入式向量进行更新。
可以理解的是,在更新嵌入式向量的过程中,可以仅针对单一模态属性的嵌入式向量进行更新,如对具有图片模态属性的嵌入式向量进行更新,或者对具有文本模态属性的嵌入式向量进行更新,还可以针对不同模态属性的嵌入式向量混合后形成的嵌入式向量。
作为一种实施场景,由于每个嵌入式向量的相邻向量可能并非是具有相同模态属性,考虑到不同模态属性之间的相互融合,这里可以使用不同模态属性的向量对嵌入式向量进行更新,具体针对当前嵌入式向量对应的模态属性,通过查询与其相邻且属不同模态属性的嵌入式向量作为相邻嵌入式向量,这里可以使用向量之间距离值是否达到阈值来判断两个嵌入式向量是否相邻,进一步利用该相邻嵌入式向量对当前嵌入式向量进行更新,例如, 对于具有图片模态属性的嵌入式向量进行更新过程中,可以使用相邻的文本模态属性的嵌入式向量和/或视频属性的嵌入式向量进行更新。
具体在更新嵌入式向量过程中,可以使用嵌入式向量之间的距离值作为更新力度的确定方式,对于距离越近的具有不同模态属性的两个嵌入式向量,说明两个嵌入式向量之间具有更高的相似性,可在更新时针对该相邻嵌入式向量使用较高的更新力度,而对于距离较远的相邻嵌入式向量可使用较低的更新力度。
104、计算所述融合有相邻向量特征的对象信息向量之间的相似度,并根据相似度计算结果确定对象信息之间的匹配程度。
本申请中,对于融合有相邻向量特征的对象信息向量,该对象信息向量具有多模态融合后的特征,考虑到不同模态表征的特征,能够减小不同信息之间的差异性,使得对象向量信息的表征更准确,提高后续对象信息的匹配精度。具体计算融合有相邻向量特征的对象信息向量之间的相似度的过程相当于计算向量之间的距离,距离计算可以有多种方式,例如,余弦相似度、欧氏距离、曼哈顿距离、皮尔逊相关系数等。
需要说明的是,对象信息的匹配程度能够从一定程度上反映多个对象信息之间相似情况,相似度数值越高,说明对象信息越相近,进一步根据对象信息的匹配程度可以向用户推送相似对象,还可以屏蔽相似对象的展示。
本申请实施例提供的一种信息的匹配方法,通过获取不同模态表征的对象信息,并针对每种模态表征的对象信息,调用相应模态表征下预先训练的特征取模型进行特征提取,得到具有不同模态属性的嵌入式向量,该特征提取模型使用加性角度间隔损失函数进行训练,用于从模态表征的对象信息中提取具有模态属性的嵌入式向量,进一步利用邻近向量混合算法,对具有不同模态属性的嵌入式向量进行更新,得到融合有相邻向量特征的对象信息向量,进而计算融合有相邻向量特征的对象信息向量之间的相似度,并根据相似度计算结果确定对象信息之间的匹配程度。与现有技术中基于图片和文字描述进行的对象信息匹配的方式相比,本申请能够提取出反映对象特征信息的嵌入式向量,并针对具有模态属性的嵌入式向量进行融合,使得对象信息能够融合不同模态间的信息特征,并结合融合有模态表征下的对象信息向量进行对象信息匹配,提高匹配到对象信息的准确率。
本申请实施例提供了另一种信息的匹配方法,该特征提取模型能够提取出对象信息在不同模态表征下的嵌入式向量,提高匹配到对象信息的准确率,如图2所示,所述方法包括:
201、获取不同模态表征的对象信息。
考虑到不同模态表征的对象信息在相同属性维度上具有不同属性表征,例如,颜色维度上可以表现为不同颜色,尺码维度上可以表现为不同尺码,为了避免不同模态表征对对象信息受到不同属性的表征,还可以基于对象信息在相同属性维度上的属性特征,对对象信息进行预处理,以使得不同模态表征的对象信息具有相同的属性表征,这里可以选取任选属性表征,还可以选取具有代表性的属性特征,还可以选取对象销量最高的属性特征。
202、针对每种模态表征的对象信息,调用相应模态表征下预先训练的特征取模型进行特征提取,得到具有不同模态属性的嵌入式向量。
在本申请中,每种模态表征下的特征提取模型的构建可以利用预先收集不同模态表征的对象信息样本集对网络模型进行训练,该网络模型可以使用机器学习或者深度学习的方式进行训练,具体构建每种模态表征下的特征提取模型的过程中,可以利用网络模型分别对不同模态表征的对象信息样本集进行处理,得到不同模态表征下对象信息的嵌入式向量,这里对象信息样本集中携带有对象类别标签,然后针对不同模态表征的对象信息样本,使用加性角度间隔损失函数对所述嵌入式向量与权重矩阵点乘得到的角度进行扰动,并根据扰动后的角度输出的目标特征向量,进一步使用分类函数对目标特征向量进行对象信息的类别标签预测,构建每种模态表征下的特征提取模型。
具体利用网络模型分别对不同模态表征的对象信息样本集进行处理,得到不同模态表征下对象信息的嵌入式向量过程中,可以将不同模态表征的对象信息样本集进行向量化,得到不同模态表征的对象向量,然后利用网络模型的池化层分别对不同模态表征的对象向量进行特征聚合,得到不同模态表征的对象特征向量,进一步基于样本维度的批标准化和基于特征维度的正则化对特征聚类的对象特征向量进行标准化处理,得到不同模态表征下对象信息的嵌入式向量。
示例性的,针对图片模态表征的对象信息样本集进行处理,得到嵌入式向量的应用场景,首先将一张图片转换成[256,256,3]的数组,3代表RGB三色值,每个元素值为[0,255]之间的某个数值,实现了图片数字化或者向量化的功能;然后将[256,256,3]代表的图片输入到eca_nfnet_l1模型中,输出代表图片的特征,大小[8,8,1792],实现了特征提取的功能,通过GAP(全局池化层,global average pooling)对1792个[8,8]的特征层求平均,得到1792维的向量,实现了特征聚合的功能,最后应用基于样本维度的批标准化和基于特征维度的正则化作用于1792维的向量上,得到标准化后的向量表示,即图片模态表征下对象信息的嵌入式向量。
示例性的,针对文本模态表征的对象信息样本集进行处理,得到嵌入式向量的应用场景,首先将对象文本按照空格进行分词,简记为[t1,t2,t3…,tn],将分词后的序列输入到xlm-roberta-large模型,得到每个词更新后的向量表示序列[h1,h2…hn],每个向量1024维,实现了文本从单词到向量的转化,向量富含了文本中更多的语义,池化操作对上述向量序列取平均,得到1024维向量,这一步实现了特征聚合,最后应用基于样本维度的批标准化和基于特征维度的正则化作用于1024维的向量上,得到标准化后的向量表示,即文本模态表征下对象信息的嵌入式向量。
这里对象信息样本集携带有对象类别标签,具体针对不同模态表征的对象信息样本,使用加性角度间隔损失函数对所述嵌入式向量与权重矩阵点乘得到的角度进行扰动,并根据扰动后的角度输出的目标特征向量过程中,可以针对不同模态表征的对象样本信息,使用加性角度间隔损失函数将嵌入式向量与嵌入式向量正则化后的权重矩阵进行点乘,得到 余弦值,进一步通过对余弦值进行反操作得到的角度加上角度间隔进行扰动,并计算扰动后角度的余弦值作为目标特征向量。应说明的是,这里针对不同模态表征的网络模型可以使用不同的角度间隔,例如,图像模型的角度间隔以0.8~1.0为宜,文本模型的角度间隔以0.6~0.8为宜。在使用增加角度间隔时,可以从0.2开始,将图像模型的角度间隔增加到1.0,文本模型的角度间隔增加到0.8。
进一步地,为了保证特征提取模型的训练精度,还可以在构建每种模态表征下的特征提取模型之后,利用预先设置的损失函数,结合对象信息预测的类别标签与对象信息样本集的类别标签对每种模态表征下的特征提取模型进行参数调整,更新所述特征提取模型。
进一步地,利用上述具有不同模态属性的嵌入式向量可以进行对象信息之间的匹配,具体可以使用图像模态的嵌入式向量进行对象图像之间的匹配,可以使用文本模态的嵌入式向量进行对象文本之间的匹配。为了能够更好展示对象信息之间的匹配结果,还可以将具有不同模态属性的嵌入式向量所形成模态表征下的匹配结果进行合并,具体可将具有图像模态属性和文本模态属性的嵌入式向量合并后在执行匹配过程,以得到合并后对象信息之间的匹配结果。此时,模型最终的倾向可以分以下几种情况,针对待匹配的对象信息,一种是文本模态下输出与目标对象很相似的对象A,一种是图像模态下输出与目标对象很相似的对象G,还有一种是文本模态和图像模态融合后输出与目标对象相似的对象D、E、F,所以最终与目标对象很相似的对象落在A、D、E、F、G之中。
203、分别计算所述具有不同模态属性的嵌入式向量之间的距离值,若所述距离值大于预设阈值,则确定所述嵌入式向量之间具有相邻关系。
这里距离值为能够表征向量之间的度量值,可以为余弦值距离,还可以为买哈顿距离,在此不进行限定。
204、利用所述距离值映射的更新力度,对所述具有相邻关系的嵌入式向量进行至少一次更新。
这里距离值映射的更新力度可作为对具有相邻关系的嵌入式向量进行更新的权重值,具体每次更新嵌入式向量时可以在原有嵌入式向量的基础上,加上在相应更新力度上具有相邻关系的嵌入式向量,以使得更新后嵌入式向量具有更丰富的对象信息内容。
具体在实际应用场景中,对具有相邻关系的嵌入向量进行更新的过程如图3所示:以对象A为例,对象A的嵌入式向量E A为[-0.588,0.784,0.196],同理还有对象B、C、D的嵌入式向量E B、E C、E D,计算对象A的嵌入式向量E A与对象B、C、D节点之间的余弦距离分别是0.53、0.93、0.94,实线表示两者距离的预设阈值(预设阈值可以设置为0.5)内表征嵌入式向量之间属于具有相邻关系,虚线表示在阈值外,不属于相邻关系。对每个嵌入式向量,利用其相邻关系的嵌入式向量对其自身进行更新,更新力度由余弦距离值给定。具体的更新嵌入式向量的过程可以如下:
E A=normalize(E A×1+E D×0.94+E B×0.93+E C×0.53)
E B=normalize(E B×1+E A×0.93)
E C=normalize(E C×1+E A×0.53)
E D=normalize(E D×1+E A×0.94)
其中,normalize为对嵌入向量标准化的过程。上述更新后的嵌入式向量以及各节点之间的关系变化具体如图3中右侧图,具体更新过程如上文所示的公式,每个嵌入式向量根据具有相邻关系的嵌入式向量及余弦值对自身进行更新。这个过程可以重复迭代下去,直到在网络模型的评估指标上不再改善。
205、计算所述融合有相邻向量特征的对象信息向量之间的相似度,并根据相似度计算结果确定对象信息之间的匹配程度。
考虑到对象信息之间的匹配程度能够反映对象相似度,这里对于目标对象信息的相似推送需求或者屏蔽需求,还可以在确定对象信息之间的匹配程度之后,响应于对目标对象信息进行相似推送或屏蔽的指令,选取与目标对象信息之间的匹配程度排名在预设数值之前的对象信息作为相似对象信息,向用户推送或屏蔽相似对象信息。
进一步地,为了节省相似度计算量,还可以预先针对对象库中对象信息向量进行分类,预先设置多个对象分类,每个对象分类具有相应的分类特征,并根据分类特征对对象信息向量进行聚类,将具有相同分类特征的对象信息向量汇总到相同对象分类中,从而得到多个对象分类下的对象信息向量,进一步针对选定对象只需要先确定对象分类后,再针对对象分类下对象信息的嵌入式向量之间的相似度进行计算,以获取与目标对象信息相似的对象信息。
本申请中,可通过网络平台来执行上述对象信息的匹配过程,并根据匹配结果向用户推荐对象或者屏蔽对象,具体可以在网络平台中设置相似查找按钮或者相似屏蔽按钮,用户可根据实际浏览需求来选取,当然还可以在查找相似对象后,进一步设置更多的筛选维度,例如,按照价格筛选,按照发货地点筛选,按照评分筛选等。
进一步地,作为图1所述方法的具体实现,本申请实施例提供了一种信息的匹配装置,如图4所示,所述装置包括:获取单元31、调用单元32、更新单元33、计算单元34。
获取单元31,可以用于获取不同模态表征的对象信息;
调用单元32,可以用于针对每种模态表征的对象信息,调用相应模态表征下预先训练的特征取模型进行特征提取,得到具有不同模态属性的嵌入式向量,所述特征提取模型使用加性角度间隔损失函数进行训练,用于从模态表征的对象信息中提取具有模态属性的嵌入式向量;
更新单元33,可以用于利用邻近向量混合算法,对所述具有不同模态属性的嵌入式向量进行更新,得到融合有相邻向量特征的对象信息向量;
计算单元34,可以用于计算所述融合有相邻向量特征的对象信息向量之间的相似度,并根据相似度计算结果确定对象信息之间的匹配程度。
本申请实施例提供的一种信息的匹配装置,通过获取不同模态表征的对象信息,并针对每种模态表征的对象信息,调用相应模态表征下预先训练的特征取模型进行特征提取, 得到具有不同模态属性的嵌入式向量,该特征提取模型使用加性角度间隔损失函数进行训练,用于从模态表征的对象信息中提取具有模态属性的嵌入式向量,进一步利用邻近向量混合算法,对具有不同模态属性的嵌入式向量进行更新,得到融合有相邻向量特征的对象信息向量,进而计算融合有相邻向量特征的对象信息向量之间的相似度,并根据相似度计算结果确定对象信息之间的匹配程度。与现有技术中基于图片和文字描述进行的对象信息匹配的方式相比,本申请能够提取出反映对象特征信息的嵌入式向量,并针对具有模态属性的嵌入式向量进行融合,使得对象信息能够融合不同模态间的信息特征,并结合融合有模态表征下的对象信息向量进行对象信息匹配,提高匹配到对象信息的准确率。
作为图4中所示信息的匹配装置的进一步说明,图5是根据本申请实施例另一种信息的匹配装置的结构示意图,如图5所示,所述装置还包括:
处理单元35,可以用于在所述针对每种模态表征的对象信息,调用相应模态表征下预先训练的特征取模型进行特征提取,得到具有不同模态属性的嵌入式向量之前,利用网络模型分别对不同模态表征的对象信息样本集进行处理,得到不同模态表征下对象信息的嵌入式向量,所述对象信息样本集中携带有对象类别标签;
扰动单元36,可以用于针对不同模态表征的对象信息样本,使用加性角度间隔损失函数对所述嵌入式向量与权重矩阵点乘得到的角度进行扰动,并根据扰动后的角度输出的目标特征向量;
构建单元37,可以用于使用分类函数对所述目标特征向量进行对象信息的类别标签预测,构建每种模态表征下的特征提取模型。
在具体应用场景中,如图5所示,所述处理单元35包括:
向量化模块351,可以用于将所述不同模态表征的对象信息样本集进行向量化,得到不同模态表征的对象向量;
聚合模块352,可以用于利用网络模型的池化层分别对所述不同模态表征的对象向量进行特征聚合,得到不同模态表征的对象特征向量;
标准化模块353,可以用于基于样本维度的批标准化和基于特征维度的正则化对特征聚类的对象特征向量进行标准化处理,得到不同模态表征下对象信息的嵌入式向量。
在具体应用场景中,如图5所示,所述扰动单元36包括:
点乘模块361,可以用于针对不同模态表征的对象样本信息,使用加性角度间隔损失函数将所述嵌入式向量与所述嵌入式向量正则化后的权重矩阵进行点乘,得到余弦值;
扰动模块362,可以用于通过对所述余弦值进行反操作得到的角度加上角度间隔进行扰动,并计算扰动后角度的余弦值作为目标特征向量。
在具体应用场景中,如图5所示,所述装置还包括:
调整单元38,可以用于在所述使用分类函数对所述目标特征向量进行对象信息的类别标签预测,构建每种模态表征下的特征提取模型之后,利用预先设置的损失函数,结合对象信息预测的类别标签与对象信息样本集的类别标签对每种模态表征下的特征提取模 型进行参数调整,更新所述特征提取模型。
在具体应用场景中,如图5所示,所述更新单元33包括:
计算模块331,可以用于分别计算所述具有不同模态属性的嵌入式向量之间的距离值,若所述距离值大于预设阈值,则确定所述嵌入式向量之间具有相邻关系;
更新模块332,可以用于利用所述距离值映射的更新力度,对所述具有相邻关系的嵌入式向量进行至少一次更新。
在具体应用场景中,如图5所示,所述装置还包括:
推送单元39,可以用于在所述计算所述融合有相邻向量特征的对象信息向量之间的相似度,并根据相似度计算结果确定对象信息之间的匹配程度之后,响应于对目标对象信息进行相似推送或屏蔽的指令,选取与所述目标对象信息之间的匹配程度排名在预设数值之前的对象信息作为相似对象信息,向用户推送或屏蔽所述相似对象信息。
需要说明的是,本实施例提供的一种信息的匹配装置所涉及各功能单元的其他相应描述,可以参考图1、图2中的对应描述,在此不再赘述。
基于上述如图1、图2所示方法,相应的,本实施例还提供了一种可读存储介质,所述可读存储介质可以是非易失性的,也可以是易失性的,其上存储有计算机可读指令,该计算机可读指令被处理器执行时实现上述如图1、图2所示的信息的匹配方法。
基于这样的理解,本申请的技术方案可以以软件产品的形式体现出来,该软件产品可以存储在一个非易失性存储介质(可以是CD-ROM,U盘,移动硬盘等)中,包括若干指令用以使得一台计算机设备(可以是个人计算机,服务器,或者网络设备等)执行本申请各个实施场景所述的方法。
基于上述如图1、图2所示的方法,以及图4、图5所示的虚拟装置实施例,为了实现上述目的,本申请实施例还提供了一种计算机设备,具体可以为个人计算机、服务器、网络设备等,该实体设备包括可读存储介质和处理器;可读存储介质,用于存储计算机可读指令;处理器,用于执行计算机可读指令以实现上述如图1、图2所示的信息的匹配方法
可选地,该计算机设备还可以包括用户接口、网络接口、摄像头、射频(Radio Frequency,RF)电路,传感器、音频电路、WI-FI模块等等。用户接口可以包括显示屏(Display)、输入单元比如键盘(Keyboard)等,可选用户接口还可以包括USB接口、读卡器接口等。网络接口可选的可以包括标准的有线接口、无线接口(如蓝牙接口、WI-FI接口)等。
本领域技术人员可以理解,本实施例提供的信息的匹配装置的实体设备结构并不构成对该实体设备的限定,可以包括更多或更少的部件,或者组合某些部件,或者不同的部件布置。
可读存储介质中还可以包括操作系统、网络通信模块。操作系统是管理上述计算机设备硬件和软件资源的程序,支持信息处理程序以及其它软件和/或程序的运行。网络通信模 块用于实现可读存储介质内部各组件之间的通信,以及与该实体设备中其它硬件和软件之间通信。
通过以上的实施方式的描述,本领域的技术人员可以清楚地了解到本申请可以借助软件加必要的通用硬件平台的方式来实现,也可以通过硬件实现。通过应用本申请的技术方案,与目前现有技术相比,本申请中能够提取出反映对象特征信息的嵌入式向量,并针对具有模态属性的嵌入式向量进行融合,使得对象信息能够融合不同模态间的信息特征,并结合融合有模态表征下的对象信息向量进行对象信息匹配,提高匹配到对象信息的准确率。
本领域技术人员可以理解附图只是一个优选实施场景的示意图,附图中的模块或流程并不一定是实施本申请所必须的。本领域技术人员可以理解实施场景中的装置中的模块可以按照实施场景描述进行分布于实施场景的装置中,也可以进行相应变化位于不同于本实施场景的一个或多个装置中。上述实施场景的模块可以合并为一个模块,也可以进一步拆分成多个子模块。
上述本申请序号仅仅为了描述,不代表实施场景的优劣。以上公开的仅为本申请的几个具体实施场景,但是,本申请并非局限于此,任何本领域的技术人员能思之的变化都应落入本申请的保护范围。

Claims (20)

  1. 一种信息的匹配方法,其中,所述方法包括:
    获取不同模态表征的对象信息;
    针对每种模态表征的对象信息,调用相应模态表征下预先训练的特征取模型进行特征提取,得到具有不同模态属性的嵌入式向量,所述特征提取模型使用加性角度间隔损失函数进行训练,用于从模态表征的对象信息中提取具有模态属性的嵌入式向量;
    利用邻近向量混合算法,对所述具有不同模态属性的嵌入式向量进行更新,得到融合有相邻向量特征的对象信息向量;
    计算所述融合有相邻向量特征的对象信息向量之间的相似度,并根据相似度计算结果确定对象信息之间的匹配程度。
  2. 根据权利要求1所述的方法,其中,在所述针对每种模态表征的对象信息,调用相应模态表征下预先训练的特征取模型进行特征提取,得到具有不同模态属性的嵌入式向量之前,所述方法还包括:
    利用网络模型分别对不同模态表征的对象信息样本集进行处理,得到不同模态表征下对象信息的嵌入式向量,所述对象信息样本集中携带有对象类别标签;
    针对不同模态表征的对象信息样本,使用加性角度间隔损失函数对所述嵌入式向量与权重矩阵点乘得到的角度进行扰动,并根据扰动后的角度输出的目标特征向量;
    使用分类函数对所述目标特征向量进行对象信息的类别标签预测,构建每种模态表征下的特征提取模型。
  3. 根据权利要求2所述的方法,其中,所述利用网络模型分别对不同模态表征的对象信息样本集进行处理,得到不同模态表征下对象信息的嵌入式向量,具体包括:
    将所述不同模态表征的对象信息样本集进行向量化,得到不同模态表征的对象向量;
    利用网络模型的池化层分别对所述不同模态表征的对象向量进行特征聚合,得到不同模态表征的对象特征向量;
    基于样本维度的批标准化和基于特征维度的正则化对特征聚类的对象特征向量进行标准化处理,得到不同模态表征下对象信息的嵌入式向量。
  4. 根据权利要求2所述的方法,其中,所述针对不同模态表征的对象信息样本,使用加性角度间隔损失函数对所述嵌入式向量与权重矩阵点乘得到的角度进行扰动,并根据扰动后的角度输出目标特征向量,具体包括:
    针对不同模态表征的对象样本信息,使用加性角度间隔损失函数将所述嵌入式向量与所述嵌入式向量正则化后的权重矩阵进行点乘,得到余弦值;
    通过对所述余弦值进行反操作得到的角度加上角度间隔进行扰动,并计算扰动后角度的余弦值作为目标特征向量。
  5. 根据权利要求2所述的方法,其中,在所述使用分类函数对所述目标特征向量进 行对象信息的类别标签预测,构建每种模态表征下的特征提取模型之后,所述方法还包括:
    利用预先设置的损失函数,结合对象信息预测的类别标签与对象信息样本集的类别标签对每种模态表征下的特征提取模型进行参数调整,更新所述特征提取模型。
  6. 根据权利要求1所述的方法,其中,所述利用邻近向量混合算法,对所述具有不同模态属性的嵌入式向量进行更新,得到融合有相邻向量特征的对象信息向量,具体包括:
    分别计算所述具有不同模态属性的嵌入式向量之间的距离值,若所述距离值大于预设阈值,则确定所述嵌入式向量之间具有相邻关系;
    利用所述距离值映射的更新力度,对所述具有相邻关系的嵌入式向量进行至少一次更新。
  7. 根据权利要求1-6中任一项所述的方法,其中,在所述计算所述融合有相邻向量特征的对象信息向量之间的相似度,并根据相似度计算结果确定对象信息之间的匹配程度之后,所述方法还包括:
    响应于对目标对象信息进行相似推送或屏蔽的指令,选取与所述目标对象信息之间的匹配程度排名在预设数值之前的对象信息作为相似对象信息,向用户推送或屏蔽所述相似对象信息。
  8. 一种信息的匹配装置,其中,所述装置包括:
    获取单元,用于获取不同模态表征的对象信息;
    调用单元,用于针对每种模态表征的对象信息,调用相应模态表征下预先训练的特征取模型进行特征提取,得到具有不同模态属性的嵌入式向量,所述特征提取模型使用加性角度间隔损失函数进行训练,用于从模态表征的对象信息中提取具有模态属性的嵌入式向量;
    更新单元,用于利用邻近向量混合算法,对所述具有不同模态属性的嵌入式向量进行更新,得到融合有相邻向量特征的对象信息向量;
    计算单元,用于计算所述融合有相邻向量特征的对象信息向量之间的相似度,并根据相似度计算结果确定对象信息之间的匹配程度。
  9. 一种计算机设备,包括存储器和处理器,所述存储器存储有计算机可读指令,其中,所述处理器执行所述计算机可读指令时实现信息的匹配方法的步骤,包括:
    获取不同模态表征的对象信息;
    针对每种模态表征的对象信息,调用相应模态表征下预先训练的特征取模型进行特征提取,得到具有不同模态属性的嵌入式向量,所述特征提取模型使用加性角度间隔损失函数进行训练,用于从模态表征的对象信息中提取具有模态属性的嵌入式向量;
    利用邻近向量混合算法,对所述具有不同模态属性的嵌入式向量进行更新,得到融合有相邻向量特征的对象信息向量;
    计算所述融合有相邻向量特征的对象信息向量之间的相似度,并根据相似度计算结果确定对象信息之间的匹配程度。
  10. 根据权利要求9所述的计算机设备,其中,在所述针对每种模态表征的对象信息,调用相应模态表征下预先训练的特征取模型进行特征提取,得到具有不同模态属性的嵌入式向量之前,所述方法还包括:
    利用网络模型分别对不同模态表征的对象信息样本集进行处理,得到不同模态表征下对象信息的嵌入式向量,所述对象信息样本集中携带有对象类别标签;
    针对不同模态表征的对象信息样本,使用加性角度间隔损失函数对所述嵌入式向量与权重矩阵点乘得到的角度进行扰动,并根据扰动后的角度输出的目标特征向量;
    使用分类函数对所述目标特征向量进行对象信息的类别标签预测,构建每种模态表征下的特征提取模型。
  11. 根据权利要求10所述的计算机设备,其中,所述利用网络模型分别对不同模态表征的对象信息样本集进行处理,得到不同模态表征下对象信息的嵌入式向量,具体包括:
    将所述不同模态表征的对象信息样本集进行向量化,得到不同模态表征的对象向量;
    利用网络模型的池化层分别对所述不同模态表征的对象向量进行特征聚合,得到不同模态表征的对象特征向量;
    基于样本维度的批标准化和基于特征维度的正则化对特征聚类的对象特征向量进行标准化处理,得到不同模态表征下对象信息的嵌入式向量。
  12. 根据权利要求10所述的计算机设备,其中,所述针对不同模态表征的对象信息样本,使用加性角度间隔损失函数对所述嵌入式向量与权重矩阵点乘得到的角度进行扰动,并根据扰动后的角度输出目标特征向量,具体包括:
    针对不同模态表征的对象样本信息,使用加性角度间隔损失函数将所述嵌入式向量与所述嵌入式向量正则化后的权重矩阵进行点乘,得到余弦值;
    通过对所述余弦值进行反操作得到的角度加上角度间隔进行扰动,并计算扰动后角度的余弦值作为目标特征向量。
  13. 根据权利要求10所述的计算机设备,其中,在所述使用分类函数对所述目标特征向量进行对象信息的类别标签预测,构建每种模态表征下的特征提取模型之后,所述方法还包括:
    利用预先设置的损失函数,结合对象信息预测的类别标签与对象信息样本集的类别标签对每种模态表征下的特征提取模型进行参数调整,更新所述特征提取模型。
  14. 根据权利要求9所述的计算机设备,其中,所述利用邻近向量混合算法,对所述具有不同模态属性的嵌入式向量进行更新,得到融合有相邻向量特征的对象信息向量,具体包括:
    分别计算所述具有不同模态属性的嵌入式向量之间的距离值,若所述距离值大于预设阈值,则确定所述嵌入式向量之间具有相邻关系;
    利用所述距离值映射的更新力度,对所述具有相邻关系的嵌入式向量进行至少一次更新。
  15. 一种可读存储介质,其上存储有计算机可读指令,其中,所述计算机可读指令被处理器执行时实现信息的匹配方法的步骤,包括:
    获取不同模态表征的对象信息;
    针对每种模态表征的对象信息,调用相应模态表征下预先训练的特征取模型进行特征提取,得到具有不同模态属性的嵌入式向量,所述特征提取模型使用加性角度间隔损失函数进行训练,用于从模态表征的对象信息中提取具有模态属性的嵌入式向量;
    利用邻近向量混合算法,对所述具有不同模态属性的嵌入式向量进行更新,得到融合有相邻向量特征的对象信息向量;
    计算所述融合有相邻向量特征的对象信息向量之间的相似度,并根据相似度计算结果确定对象信息之间的匹配程度。
  16. 根据权利要求15所述的可读存储介质,其中,在所述针对每种模态表征的对象信息,调用相应模态表征下预先训练的特征取模型进行特征提取,得到具有不同模态属性的嵌入式向量之前,所述方法还包括:
    利用网络模型分别对不同模态表征的对象信息样本集进行处理,得到不同模态表征下对象信息的嵌入式向量,所述对象信息样本集中携带有对象类别标签;
    针对不同模态表征的对象信息样本,使用加性角度间隔损失函数对所述嵌入式向量与权重矩阵点乘得到的角度进行扰动,并根据扰动后的角度输出的目标特征向量;
    使用分类函数对所述目标特征向量进行对象信息的类别标签预测,构建每种模态表征下的特征提取模型。
  17. 根据权利要求16所述的可读存储介质,其中,所述利用网络模型分别对不同模态表征的对象信息样本集进行处理,得到不同模态表征下对象信息的嵌入式向量,具体包括:
    将所述不同模态表征的对象信息样本集进行向量化,得到不同模态表征的对象向量;
    利用网络模型的池化层分别对所述不同模态表征的对象向量进行特征聚合,得到不同模态表征的对象特征向量;
    基于样本维度的批标准化和基于特征维度的正则化对特征聚类的对象特征向量进行标准化处理,得到不同模态表征下对象信息的嵌入式向量。
  18. 根据权利要求16所述的可读存储介质,其中,所述针对不同模态表征的对象信息样本,使用加性角度间隔损失函数对所述嵌入式向量与权重矩阵点乘得到的角度进行扰动,并根据扰动后的角度输出目标特征向量,具体包括:
    针对不同模态表征的对象样本信息,使用加性角度间隔损失函数将所述嵌入式向量与所述嵌入式向量正则化后的权重矩阵进行点乘,得到余弦值;
    通过对所述余弦值进行反操作得到的角度加上角度间隔进行扰动,并计算扰动后角度的余弦值作为目标特征向量。
  19. 根据权利要求16所述的可读存储介质,其中,在所述使用分类函数对所述目标 特征向量进行对象信息的类别标签预测,构建每种模态表征下的特征提取模型之后,所述方法还包括:
    利用预先设置的损失函数,结合对象信息预测的类别标签与对象信息样本集的类别标签对每种模态表征下的特征提取模型进行参数调整,更新所述特征提取模型。
  20. 根据权利要求15所述的可读存储介质,其中,所述利用邻近向量混合算法,对所述具有不同模态属性的嵌入式向量进行更新,得到融合有相邻向量特征的对象信息向量,具体包括:
    分别计算所述具有不同模态属性的嵌入式向量之间的距离值,若所述距离值大于预设阈值,则确定所述嵌入式向量之间具有相邻关系;
    利用所述距离值映射的更新力度,对所述具有相邻关系的嵌入式向量进行至少一次更新。
PCT/CN2022/071445 2021-08-25 2022-01-11 信息的匹配方法、装置、计算机设备及可读存储介质 WO2023024413A1 (zh)

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
CN202110980655.XA CN113657087B (zh) 2021-08-25 2021-08-25 信息的匹配方法及装置
CN202110980655.X 2021-08-25

Publications (1)

Publication Number Publication Date
WO2023024413A1 true WO2023024413A1 (zh) 2023-03-02

Family

ID=78492816

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/CN2022/071445 WO2023024413A1 (zh) 2021-08-25 2022-01-11 信息的匹配方法、装置、计算机设备及可读存储介质

Country Status (2)

Country Link
CN (1) CN113657087B (zh)
WO (1) WO2023024413A1 (zh)

Cited By (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN116862626A (zh) * 2023-09-05 2023-10-10 广州数说故事信息科技有限公司 一种多模态商品对齐方法

Families Citing this family (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN113657087B (zh) * 2021-08-25 2023-12-15 平安科技(深圳)有限公司 信息的匹配方法及装置

Citations (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN108763325A (zh) * 2018-05-04 2018-11-06 北京达佳互联信息技术有限公司 一种网络对象处理方法及装置
CN111368870A (zh) * 2019-10-31 2020-07-03 杭州电子科技大学 一种基于模态内间协同多线性池化的视频时序定位方法
CN111563551A (zh) * 2020-04-30 2020-08-21 支付宝(杭州)信息技术有限公司 一种多模态信息融合方法、装置及电子设备
US20200279156A1 (en) * 2017-10-09 2020-09-03 Intel Corporation Feature fusion for multi-modal machine learning analysis
CN112148916A (zh) * 2020-09-28 2020-12-29 华中科技大学 一种基于监督的跨模态检索方法、装置、设备及介质
CN113657087A (zh) * 2021-08-25 2021-11-16 平安科技(深圳)有限公司 信息的匹配方法及装置

Family Cites Families (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN109840530A (zh) * 2017-11-24 2019-06-04 华为技术有限公司 训练多标签分类模型的方法和装置
CN112487822A (zh) * 2020-11-04 2021-03-12 杭州电子科技大学 一种基于深度学习的跨模态检索方法
CN112784092B (zh) * 2021-01-28 2022-03-25 电子科技大学 一种混合融合模型的跨模态图像文本检索方法

Patent Citations (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20200279156A1 (en) * 2017-10-09 2020-09-03 Intel Corporation Feature fusion for multi-modal machine learning analysis
CN108763325A (zh) * 2018-05-04 2018-11-06 北京达佳互联信息技术有限公司 一种网络对象处理方法及装置
CN111368870A (zh) * 2019-10-31 2020-07-03 杭州电子科技大学 一种基于模态内间协同多线性池化的视频时序定位方法
CN111563551A (zh) * 2020-04-30 2020-08-21 支付宝(杭州)信息技术有限公司 一种多模态信息融合方法、装置及电子设备
CN112148916A (zh) * 2020-09-28 2020-12-29 华中科技大学 一种基于监督的跨模态检索方法、装置、设备及介质
CN113657087A (zh) * 2021-08-25 2021-11-16 平安科技(深圳)有限公司 信息的匹配方法及装置

Non-Patent Citations (1)

* Cited by examiner, † Cited by third party
Title
QIAO YIXUAN, CHEN HAO, WANG JUN, CHEN YIHAO, YE XIANBIN, LI ZILIANG, QI XIANBIAO, GAO PENG, XIE GUOTONG: "Team Mia at TextVQA Challenge 2021: Vision-and-Language Representation Learning with Pre-trained Sequence-to-Sequence Model", ARXIV:2106.15332V1, 24 June 2021 (2021-06-24), XP093038494, Retrieved from the Internet <URL:https://arxiv.org/pdf/2106.15332.pdf> [retrieved on 20230412], DOI: 10.48550/arxiv.2106.15332 *

Cited By (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN116862626A (zh) * 2023-09-05 2023-10-10 广州数说故事信息科技有限公司 一种多模态商品对齐方法
CN116862626B (zh) * 2023-09-05 2023-12-05 广州数说故事信息科技有限公司 一种多模态商品对齐方法

Also Published As

Publication number Publication date
CN113657087A (zh) 2021-11-16
CN113657087B (zh) 2023-12-15

Similar Documents

Publication Publication Date Title
EP3267362B1 (en) Machine learning image processing
CN112364204B (zh) 视频搜索方法、装置、计算机设备及存储介质
WO2023024413A1 (zh) 信息的匹配方法、装置、计算机设备及可读存储介质
US20230017667A1 (en) Data recommendation method and apparatus, computer device, and storage medium
TW201504829A (zh) 圖像搜尋、獲取圖像文字資訊的方法及裝置
WO2021155691A1 (zh) 用户画像生成方法、装置、存储介质及设备
WO2020233432A1 (zh) 一种信息推荐方法及装置
CN111814620A (zh) 人脸图像质量评价模型建立方法、优选方法、介质及装置
WO2022028147A1 (zh) 图像分类模型训练方法、装置、计算机设备及存储介质
CN113761359B (zh) 数据包推荐方法、装置、电子设备和存储介质
CN111831826A (zh) 跨领域的文本分类模型的训练方法、分类方法以及装置
CN113641797A (zh) 数据处理方法、装置、设备、存储介质及计算机程序产品
CN113128526B (zh) 图像识别方法、装置、电子设备和计算机可读存储介质
CN117271818B (zh) 视觉问答方法、系统、电子设备及存储介质
CN114329004A (zh) 数字指纹生成、数据推送方法、装置和存储介质
CN116630630B (zh) 语义分割方法、装置、计算机设备及计算机可读存储介质
WO2023213157A1 (zh) 数据处理方法、装置、程序产品、计算机设备和介质
CN116955707A (zh) 内容标签的确定方法、装置、设备、介质及程序产品
CN112650869B (zh) 图像检索重排序方法、装置、电子设备及存储介质
CN115620019A (zh) 商品侵权检测方法及其装置、设备、介质、产品
CN114282622A (zh) 训练样本排查方法及其装置、设备、介质、产品
CN116645700B (zh) 特征提取模型处理方法、装置和特征提取方法、装置
WO2023168997A1 (zh) 一种跨模态搜索方法及相关设备
KR20190027560A (ko) 컨텐츠에 포함되는 객체를 분류하는 방법, 장치 및 컴퓨터 프로그램
WO2022262603A1 (zh) 多媒体资源的推荐方法、装置、设备、存储介质及计算机程序产品

Legal Events

Date Code Title Description
121 Ep: the epo has been informed by wipo that ep was designated in this application

Ref document number: 22859774

Country of ref document: EP

Kind code of ref document: A1

NENP Non-entry into the national phase

Ref country code: DE