CN117011865A - Model training method, object matching device, electronic equipment and storage medium


Info

Publication number
CN117011865A
CN117011865A (application number CN202310842046.7A)
Authority
CN
China
Prior art keywords
training
matching degree
feature extraction
matching
test
Prior art date
Legal status
Pending
Application number
CN202310842046.7A
Other languages
Chinese (zh)
Inventor
张立平
单新媛
王秋霖
曹俊豪
Current Assignee
Jingdong Technology Holding Co Ltd
Original Assignee
Jingdong Technology Holding Co Ltd
Priority date
Filing date
Publication date
Application filed by Jingdong Technology Holding Co Ltd filed Critical Jingdong Technology Holding Co Ltd
Priority to CN202310842046.7A
Publication of CN117011865A

Classifications

    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06VIMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V30/00Character recognition; Recognising digital ink; Document-oriented image-based pattern recognition
    • G06V30/10Character recognition
    • G06V30/19Recognition using electronic means
    • G06V30/19007Matching; Proximity measures
    • G06V30/19013Comparing pixel values or logical combinations thereof, or feature values having positional relevance, e.g. template matching
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06NCOMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00Computing arrangements based on biological models
    • G06N3/02Neural networks
    • G06N3/04Architecture, e.g. interconnection topology
    • G06N3/045Combinations of networks
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06NCOMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00Computing arrangements based on biological models
    • G06N3/02Neural networks
    • G06N3/08Learning methods
    • G06N3/0895Weakly supervised learning, e.g. semi-supervised or self-supervised learning
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06VIMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V10/00Arrangements for image or video recognition or understanding
    • G06V10/70Arrangements for image or video recognition or understanding using pattern recognition or machine learning
    • G06V10/82Arrangements for image or video recognition or understanding using pattern recognition or machine learning using neural networks
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06VIMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V20/00Scenes; Scene-specific elements
    • G06V20/60Type of objects
    • G06V20/62Text, e.g. of license plates, overlay texts or captions on TV images
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06VIMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V30/00Character recognition; Recognising digital ink; Document-oriented image-based pattern recognition
    • G06V30/10Character recognition
    • G06V30/18Extraction of features or characteristics of the image
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06VIMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V30/00Character recognition; Recognising digital ink; Document-oriented image-based pattern recognition
    • G06V30/10Character recognition
    • G06V30/19Recognition using electronic means
    • G06V30/191Design or setup of recognition systems or techniques; Extraction of features in feature space; Clustering techniques; Blind source separation
    • G06V30/19147Obtaining sets of training patterns; Bootstrap methods, e.g. bagging or boosting
    • YGENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
    • Y02TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
    • Y02PCLIMATE CHANGE MITIGATION TECHNOLOGIES IN THE PRODUCTION OR PROCESSING OF GOODS
    • Y02P90/00Enabling technologies with a potential contribution to greenhouse gas [GHG] emissions mitigation
    • Y02P90/30Computing systems specially adapted for manufacturing

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • General Physics & Mathematics (AREA)
  • Multimedia (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Evolutionary Computation (AREA)
  • Software Systems (AREA)
  • Health & Medical Sciences (AREA)
  • Artificial Intelligence (AREA)
  • Computing Systems (AREA)
  • General Health & Medical Sciences (AREA)
  • Biomedical Technology (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Biophysics (AREA)
  • Computational Linguistics (AREA)
  • Data Mining & Analysis (AREA)
  • Molecular Biology (AREA)
  • General Engineering & Computer Science (AREA)
  • Mathematical Physics (AREA)
  • Medical Informatics (AREA)
  • Databases & Information Systems (AREA)
  • Image Analysis (AREA)

Abstract

The embodiment of the invention discloses a model training method, an article matching method, an apparatus, electronic equipment and a storage medium. The model training method comprises the following steps: taking a training image and a training text of a training article as a group of training samples, and training an original feature extraction model based on the obtained multiple groups of training samples to obtain an intermediate feature extraction model; taking a test image and a test text of a test article as a group of test samples, and inputting the obtained multiple groups of test samples into the intermediate feature extraction model to obtain a first matching degree set; inputting the multiple groups of training samples into the intermediate feature extraction model to obtain a second matching degree set; and screening the multiple groups of training samples based on the first matching degree set and the second matching degree set, and training the original feature extraction model based on the screened training samples to obtain a target feature extraction model. The technical scheme provided by the embodiment of the invention can improve the accuracy of article matching.

Description

Model training method, object matching device, electronic equipment and storage medium
Technical Field
The embodiment of the invention relates to the technical field of computer application, in particular to a method and a device for model training and article matching, electronic equipment and a storage medium.
Background
Article matching technology has wide application scenarios on e-commerce platforms, such as article price comparison, identical-style article identification, and photo-based shopping. The accuracy of article matching is therefore critical to an e-commerce platform.
In the process of realizing the invention, the inventor found that the prior art has the following technical problem: the existing article matching technology cannot achieve good matching precision and needs to be improved.
Disclosure of Invention
The embodiment of the invention provides a model training method, an article matching method, an apparatus, electronic equipment and a storage medium, so as to improve the accuracy of article matching.
According to an aspect of the present invention, there is provided a model training method, which may include:
taking a training image and a training text of a training article as a group of training samples, and training an original feature extraction model based on the obtained multiple groups of training samples to obtain an intermediate feature extraction model;
taking a test image and a test text of a test article as a group of test samples, and inputting the obtained multiple groups of test samples into the intermediate feature extraction model to obtain a first matching degree set;
inputting the multiple groups of training samples into the intermediate feature extraction model to obtain a second matching degree set;
and screening the multiple groups of training samples based on the first matching degree set and the second matching degree set, and training the original feature extraction model based on the screened training samples to obtain a target feature extraction model;
the first matching degree set is used for representing the matching degree between each test image and each test text in the plurality of groups of test samples, and the second matching degree set is used for representing the matching degree between the training images and the training texts in each group of training samples in the plurality of groups of training samples.
According to another aspect of the present invention, there is provided an item matching method, which may include:
obtaining information to be matched of an article to be matched, and a target feature extraction model trained according to the model training method provided by any embodiment of the invention;
inputting information to be matched into a target feature extraction model to obtain features to be matched, wherein the features to be matched comprise image features to be matched and/or text features to be matched;
and matching the features to be matched with the candidate features of at least one candidate article respectively, so as to determine, from the at least one candidate article, a target article matched with the article to be matched, wherein the candidate features comprise candidate image features and/or candidate text features.
According to another aspect of the present invention, there is provided a model training apparatus, which may include:
the coarse training module is used for taking training images and training texts of training objects as a group of training samples, and training the original feature extraction model based on the obtained multiple groups of training samples to obtain an intermediate feature extraction model;
the first input module is used for taking a test image and a test text of a test object as a group of test samples, and inputting the obtained groups of test samples into the intermediate feature extraction model to obtain a first matching degree set;
the second input module is used for inputting a plurality of groups of training samples into the intermediate feature extraction model so as to obtain a second matching degree set;
the fine training module is used for screening a plurality of groups of training samples based on the first matching degree set and the second matching degree set, training an original feature extraction model based on the screened training samples, and obtaining a target feature extraction model;
the first matching degree set is used for representing the matching degree between each test image and each test text in the plurality of groups of test samples, and the second matching degree set is used for representing the matching degree between the training images and the training texts in each group of training samples in the plurality of groups of training samples.
According to another aspect of the present invention, there is provided an article matching device, which may include:
the model acquisition module is used for acquiring information to be matched of the object to be matched and a target feature extraction model obtained by training according to the model training method provided by any embodiment of the invention;
the model application module is used for inputting the information to be matched into the target feature extraction model to obtain the feature to be matched, wherein the feature to be matched comprises the image feature to be matched and/or the text feature to be matched;
and the article matching module is used for matching the to-be-matched features with the candidate features of at least one candidate article respectively so as to determine a target article matched with the to-be-matched article from the at least one candidate article, wherein the candidate features comprise candidate image features and/or candidate text features.
According to another aspect of the present invention, there is provided an electronic device, which may include:
at least one processor; and
a memory communicatively coupled to the at least one processor; wherein,
the memory stores a computer program executable by the at least one processor, and the computer program, when executed by the at least one processor, causes the at least one processor to implement the model training method or the article matching method provided by any embodiment of the present invention.
According to another aspect of the present invention, there is provided a computer readable storage medium having stored thereon computer instructions for causing a processor to implement the model training method or the item matching method provided by any embodiment of the present invention when executed.
According to the technical scheme, training images and training texts of training objects are used as a group of training samples, and an original feature extraction model is trained based on the obtained multiple groups of training samples, so that an intermediate feature extraction model is obtained; then, taking a test image and a test text of the test object as a group of test samples, and inputting the obtained groups of test samples into an intermediate feature extraction model to obtain a first matching degree set; inputting a plurality of groups of training samples into the intermediate feature extraction model to obtain a second matching degree set; further, based on the first matching degree set and the second matching degree set, a plurality of groups of training samples are screened, and based on the screened training samples, an original feature extraction model is trained, so that a target feature extraction model which is finally applied in the article matching process is obtained. According to the technical scheme, through mutual matching of the coarse and fine training and the automatic screening of the high-quality training samples, the influence of the low-quality training samples on the model effect can be effectively avoided, the effectiveness of article characterization is improved, and the accuracy of the follow-up application model in article matching is further guaranteed.
It should be understood that the description in this section is not intended to identify key or critical features of the embodiments of the invention, nor is it intended to be used to limit the scope of the invention. Other features of the present invention will become apparent from the description that follows.
Drawings
In order to more clearly illustrate the technical solutions of the embodiments of the present invention, the drawings required for the description of the embodiments will be briefly described below, and it is apparent that the drawings in the following description are only some embodiments of the present invention, and other drawings may be obtained according to these drawings without inventive effort for a person skilled in the art.
FIG. 1 is a flow chart of a model training method provided in accordance with an embodiment of the present invention;
FIG. 2 is a flow chart of another model training method provided in accordance with an embodiment of the present invention;
FIG. 3 is a flow chart of another model training method provided in accordance with an embodiment of the present invention;
FIG. 4 is a schematic diagram of an example model training in another model training method provided in accordance with an embodiment of the present invention;
FIG. 5 is a flow chart of a method of item matching provided in accordance with an embodiment of the present invention;
FIG. 6 is a schematic diagram of an example of a model application in an item matching method provided in accordance with an embodiment of the present invention;
FIG. 7 is a block diagram of a model training apparatus provided in accordance with an embodiment of the present invention;
fig. 8 is a block diagram of an article matching device according to an embodiment of the present invention;
fig. 9 is a schematic structural diagram of an electronic device implementing a model training method or an item matching method according to an embodiment of the present invention.
Detailed Description
In order that those skilled in the art will better understand the present invention, the technical solutions in the embodiments of the present invention will be clearly and completely described below with reference to the accompanying drawings. It is apparent that the described embodiments are only some embodiments of the present invention, not all of them. All other embodiments obtained by those skilled in the art based on the embodiments of the present invention without inventive effort shall fall within the scope of the present invention.
It should be noted that the terms "first," "second," and the like in the description and the claims of the present invention and the above figures are used for distinguishing between similar objects and not necessarily for describing a particular sequential or chronological order. It is to be understood that the data so used may be interchanged where appropriate such that the embodiments of the invention described herein may be implemented in sequences other than those illustrated or otherwise described herein. The cases of "target", "original", etc. are similar and will not be described in detail herein. Furthermore, the terms "comprises," "comprising," and "having," and any variations thereof, are intended to cover a non-exclusive inclusion, such that a process, method, system, article, or apparatus that comprises a list of steps or elements is not necessarily limited to those steps or elements expressly listed but may include other steps or elements not expressly listed or inherent to such process, method, article, or apparatus.
Fig. 1 is a flowchart of a model training method according to an embodiment of the present invention. The embodiment is suitable for training a target feature extraction model that can be used to extract multi-modal features, and is particularly suitable for obtaining such a model through two rounds of multi-modal training (coarse and fine) combined with an automatic screening technique for low-quality training samples. The method may be performed by the model training apparatus provided by an embodiment of the present invention, where the apparatus may be implemented by software and/or hardware and may be integrated on an electronic device, and the electronic device may be any of various user terminals or servers.
Referring to fig. 1, the method of the embodiment of the present invention specifically includes the following steps:
s110, training images and training texts of the training objects are used as a group of training samples, and the original feature extraction model is trained based on the obtained multiple groups of training samples, so that an intermediate feature extraction model is obtained.
A training article can be understood as an article used for model training and, on that basis, also for model verification. A training image can be understood as an image used for characterizing the training article, i.e. an image containing the training article, in particular with the training article as its subject. A training text can be understood as text used for characterizing the training article, for example an article title or article attributes. A training image and a training text are taken as a group of training samples, so that multiple groups of training samples can be obtained; the multiple groups of training samples comprise the training images and training texts of a plurality of training articles.
The original feature extraction model may be understood as a model to be trained for implementing a multi-modal feature extraction function, specifically image feature extraction and text feature extraction; "multi-modal" is thus understood here as covering the image modality and the text modality. The original feature extraction model is trained based on the multiple groups of training samples to obtain an intermediate feature extraction model. This round of training may be referred to as coarse training.
It should be noted that, in practical applications, one or more low-quality training samples may exist in the multiple sets of training samples, where the low-quality training samples may be understood as training samples in which the training image and/or the training text cannot accurately represent the training object. This is often easy to happen on e-commerce platforms, because training images and training text are often edited by merchants for training items, where there may be erroneous editing content. According to practical experience, the low-quality training samples can directly influence the model effect of the trained intermediate feature extraction model, so that the accuracy of article matching is influenced when the model is applied to article matching.
In order to solve the above problem, the intermediate feature extraction model may be checked through the following steps; then, according to the obtained check results, high-quality training samples, i.e. training samples whose training image and training text accurately represent the training article, are screened out from the multiple groups of training samples; further, model training is performed based on the screened high-quality training samples, so as to obtain a target feature extraction model capable of extracting multi-modal features that effectively represent the training articles. Specifically:
S120, taking the test image and the test text of the test article as a group of test samples, and inputting the obtained multiple groups of test samples into the intermediate feature extraction model to obtain a first matching degree set.
Wherein the first set of matches is used to characterize the matches between each test image and each test text in the plurality of sets of test samples.
A test article is understood to be an article used for checking the model effect of the intermediate feature extraction model. A test image is understood to be an image used for characterizing the test article, i.e. an image containing the test article, in particular with the test article as its subject. A test text is understood to be text used for characterizing the test article, for example an article title or article attributes. A test image and a test text are taken as a group of test samples, whereby multiple groups of test samples are obtained; the multiple groups of test samples comprise the test images and test texts of a plurality of test articles. In practical application, optionally, to ensure the screening accuracy for the training samples, the test images and test texts in the test samples should accurately represent the test articles; in addition, the plurality of test articles and the plurality of training articles may partially overlap or be completely different, which may be set according to the actual situation and is not specifically limited herein.
And inputting a plurality of groups of test samples into the intermediate feature extraction model, so that a first matching degree set can be obtained according to the output result of the intermediate feature extraction model, namely, the feature extraction result aiming at the plurality of groups of test samples.
It should be noted that the first matching degree set may be used to characterize the matching degree between each test image and each test text in the plurality of sets of test samples. By way of example, the plurality of sets of test samples may include test sample 1, test sample 2, and test sample 3, test sample 1 may include test image 1 and test text 1 of test article 1, test sample 2 may include test image 2 and test text 2 of test article 2, and test sample 3 may include test image 3 and test text 3 of test article 3, then six first matches in the first set of matches may be used to characterize the matches between test image 1 and test text 1, the matches between test image 1 and test text 2, the matches between test image 1 and test text 3, the matches between test image 2 and test text 2, the matches between test image 2 and test text 3, and the matches between test image 3 and test text 3, respectively.
S130, inputting a plurality of groups of training samples into the intermediate feature extraction model to obtain a second matching degree set.
The second matching degree set is used for representing matching degrees between training images and training texts in each group of training samples in the plurality of groups of training samples.
In order to screen out high-quality training samples from the plurality of groups of training samples, the plurality of groups of training samples can be input into the intermediate feature extraction model, so that a second matching degree set is obtained according to the output result of the intermediate feature extraction model, namely, the feature extraction result aiming at the plurality of groups of training samples.
It should be noted that the second matching degree set may be used to characterize the matching degree between the training image and the training text in each of the multiple groups of training samples. Illustratively, similar to the above example, the multiple groups of training samples may include training sample 1, training sample 2, and training sample 3, where training sample 1 includes training image 1 and training text 1 of training article 1, training sample 2 includes training image 2 and training text 2 of training article 2, and training sample 3 includes training image 3 and training text 3 of training article 3; the three second matching degrees in the second matching degree set may then be used to characterize, respectively, the matching degree between training image 1 and training text 1, between training image 2 and training text 2, and between training image 3 and training text 3.
And S140, screening a plurality of groups of training samples based on the first matching degree set and the second matching degree set, and training an original feature extraction model based on the screened training samples to obtain a target feature extraction model.
From the above description, it can be seen that the first matching degree set can characterize both the matching degree between matched test images and test texts (i.e. those corresponding to the same test article) and the matching degree between mismatched test images and test texts (i.e. those corresponding to different test articles); by comparing the two, the basic level of the matching degree between a matched test image and test text can be obtained.
On the basis, as the second matching degree set can reflect the matching degree between the matched training image and the training text, by combining the first matching degree set and the second matching degree set, high-quality training samples can be screened out from multiple groups of training samples, for example, the training samples with the matching degree between the training image and the training text being above the basic level are used as the high-quality training samples.
Further, based on the screened training samples, the original feature extraction model is retrained, so that a target feature extraction model capable of extracting multi-mode features effectively representing the training object is obtained, and the accuracy of object matching by the follow-up application model is ensured. This round of training may be referred to as fine training.
According to the technical scheme, training images and training texts of training objects are used as a group of training samples, and an original feature extraction model is trained based on the obtained multiple groups of training samples, so that an intermediate feature extraction model is obtained; then, taking a test image and a test text of the test object as a group of test samples, and inputting the obtained groups of test samples into an intermediate feature extraction model to obtain a first matching degree set; inputting a plurality of groups of training samples into the intermediate feature extraction model to obtain a second matching degree set; further, based on the first matching degree set and the second matching degree set, a plurality of groups of training samples are screened, and based on the screened training samples, an original feature extraction model is trained, so that a target feature extraction model which is finally applied in the article matching process is obtained. According to the technical scheme, through mutual matching of the coarse and fine training and the automatic screening of the high-quality training samples, the influence of the low-quality training samples on the model effect can be effectively avoided, the effectiveness of article characterization is improved, and the accuracy of the follow-up application model in article matching is further guaranteed.
An optional technical solution, the above model training method further includes:
For each target category among a plurality of article categories associated with the article library, sampling the articles under the target category in the article library to obtain sampled articles, and selecting the training articles corresponding to the multiple groups of training samples from all the obtained sampled articles.
A target category can be understood as an active category among the plurality of article categories to which all articles stored in the article library belong; in practical application, the categories that are active among all categories may be selected as the target categories. The number of target categories may be one, two or more, which depends on the actual situation and is not specifically limited herein.
For each target category, the articles under that target category are sampled to obtain one or more sampled articles, thereby obtaining sampled articles under the different target categories. Further, the training articles corresponding to the multiple groups of training samples are selected from all the obtained sampled articles; for example, a preset proportion of the sampled articles may be selected as training articles to construct the training samples. On this basis, optionally, the sampled articles other than the training articles may be used as test articles to construct the test samples.
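As an illustration of this sampling scheme, a minimal Python sketch follows; the `items_by_category` mapping, the per-category cap `n_per_category`, and the split ratio are assumptions introduced for illustration only, not details fixed by the embodiment.

```python
import random

def sample_and_split(items_by_category, n_per_category=100, train_ratio=0.8, seed=0):
    """For each target category, sample up to n articles; then split all sampled
    articles into training articles and test articles (assumed split scheme)."""
    rng = random.Random(seed)
    sampled = []
    for category, items in items_by_category.items():
        # Categories with fewer than n articles contribute all of their articles.
        sampled.extend(rng.sample(items, min(n_per_category, len(items))))
    rng.shuffle(sampled)
    cut = int(len(sampled) * train_ratio)
    return sampled[:cut], sampled[cut:]  # (training articles, test articles)
```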
According to this technical scheme, articles are sampled under all target categories to construct the training samples for model training, which can essentially cover most article categories, so the target feature extraction model obtained by training has good universality. In addition, this universality means the model does not need to be retrained when new categories appear later: the trained target feature extraction model can still effectively represent the multi-modal features of articles under the new categories, so the scheme has good extensibility and can meet actual business requirements.
FIG. 2 is a flow chart of another model training method provided in an embodiment of the present invention. The present embodiment is optimized based on the above technical solutions. In this embodiment, optionally, screening multiple sets of training samples based on the first matching degree set and the second matching degree set includes: determining a matching degree screening threshold based on each first matching degree in the first matching degree set; and screening the plurality of groups of training samples based on the matching degree screening threshold and the second matching degree respectively corresponding to each group of training samples in the second matching degree set. Wherein, the explanation of the same or corresponding terms as the above embodiments is not repeated herein.
Referring to fig. 2, the method of this embodiment may specifically include the following steps:
s210, training images and training texts of training objects are used as a group of training samples, and the original feature extraction model is trained based on the obtained multiple groups of training samples, so that an intermediate feature extraction model is obtained.
S220, taking the test image and the test text of the test article as a group of test samples, and inputting the obtained multiple groups of test samples into the intermediate feature extraction model to obtain a first matching degree set.
Wherein the first set of matches is used to characterize the matches between each test image and each test text in the plurality of sets of test samples.
S230, inputting a plurality of groups of training samples into the intermediate feature extraction model to obtain a second matching degree set.
The second matching degree set is used for representing matching degrees between training images and training texts in each group of training samples in the plurality of groups of training samples.
S240, determining a matching degree screening threshold based on each first matching degree in the first matching degree set.
As can be appreciated from the foregoing, based on the first matching degree set, a basic level of the matching degree between matched test images and test texts may be obtained; this is referred to herein as the matching degree screening threshold, i.e. a threshold for screening the multiple groups of training samples from the matching degree perspective.
S250, screening a plurality of groups of training samples based on a matching degree screening threshold and second matching degrees respectively corresponding to each group of training samples in a second matching degree set.
The second matching degree can be used for representing the matching degree between the training image and the training text of the same training article, so the multiple groups of training samples can be screened based on the matching degree screening threshold and each second matching degree in the second matching degree set. In practical application, optionally, for the second matching degree corresponding to each group of training samples in the second matching degree set, the training sample may be retained when the second matching degree is greater than the matching degree screening threshold, thereby realizing the screening of the multiple groups of training samples. That is, when the second matching degree is greater than the matching degree screening threshold, the training image and the training text in the corresponding training sample correspond, with high probability, to the same training article, i.e. they can accurately represent the training article; such a training sample therefore belongs to the high-quality training samples and can be retained.
And S260, training the original feature extraction model based on the screened training samples to obtain a target feature extraction model.
According to the technical scheme, the matching degree screening threshold is determined through the first matching degree set, and then a plurality of groups of training samples are screened based on the matching degree screening threshold and the second matching degree set, so that the effect of accurately screening high-quality training samples is achieved.
On this basis, in an optional technical scheme, determining the matching degree screening threshold based on each first matching degree in the first matching degree set includes:
for each first matching degree in the first matching degree set, obtaining the matching truth value corresponding to the first matching degree, wherein the matching truth value can be used for representing whether the test article represented by the test image corresponding to the first matching degree and the test article represented by the corresponding test text are the same article;
and obtaining a matching degree screening threshold according to each first matching degree in the first matching degree set and the matching truth value corresponding to each first matching degree in the first matching degree set.
For each first matching degree in the first matching degree set, the matching truth value corresponding to the first matching degree can be used for representing whether the test article represented by the corresponding test image and the test article represented by the corresponding test text are the same test article. Continuing with the above example, the matching truth value of the first matching degree corresponding to test image 1 and test text 1 is 1, since the test articles characterized by test image 1 and test text 1 are both test article 1. By contrast, the matching truth value of the first matching degree corresponding to test image 1 and test text 2 is 0, because the test article characterized by test image 1 is test article 1 while the test article characterized by test text 2 is test article 2, which are not the same test article.
In general, the first matching degree between a test image and a test text corresponding to different test articles is lower than that between a test image and a test text corresponding to the same test article, so the matching degree screening threshold can be obtained from each first matching degree in the first matching degree set together with its corresponding matching truth value. This technical scheme achieves an accurate determination of the matching degree screening threshold.
On the basis, optionally, according to each first matching degree in the first matching degree set and the matching truth value corresponding to each first matching degree in the first matching degree set, obtaining the matching degree screening threshold includes:
sorting the first matching degrees in the first matching degree set to obtain a sorting result;
calculating the cumulative accuracy corresponding to each first matching degree in the sorting result based on the matching truth value corresponding to each first matching degree in the sorting result;
and obtaining the matching degree screening threshold from the first matching degrees in the sorting result based on a preset cumulative accuracy threshold and the cumulative accuracies.
Generally, the higher the first matching degree, the greater the likelihood that its corresponding test image and test text characterize the same test article, i.e. the greater the likelihood that its matching truth value is true. Therefore, the first matching degrees in the first matching degree set can be sorted, and the cumulative accuracy corresponding to each first matching degree can then be calculated from the matching truth values in the sorting result. The matching degree screening threshold is then obtained from the first matching degrees according to the preset cumulative accuracy threshold and the cumulative accuracies. For example, when the first matching degrees are sorted in descending order, the cumulative accuracies may be calculated from top to bottom, and the first matching degree at the first position where the cumulative accuracy falls below the cumulative accuracy threshold is taken as the matching degree screening threshold.
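A minimal sketch of this threshold determination, assuming the first matching degrees and their 0/1 matching truth values are available as plain Python lists:

```python
def matching_degree_threshold(first_degrees, truth_values, acc_threshold=0.95):
    """Sort the first matching degrees in descending order, accumulate the
    accuracy from top to bottom, and return the first matching degree at the
    first position where the cumulative accuracy drops below the threshold."""
    ranked = sorted(zip(first_degrees, truth_values), key=lambda p: p[0], reverse=True)
    hits = 0
    for k, (degree, truth) in enumerate(ranked, start=1):
        hits += truth  # truth is 1 for a matched image/text pair, else 0
        if hits / k < acc_threshold:
            return degree  # the matching degree screening threshold
    return ranked[-1][0]  # the cumulative accuracy never fell below the threshold
```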
FIG. 3 is a flow chart of another model training method provided in an embodiment of the present invention. The present embodiment is optimized based on the above technical solutions. In this embodiment, optionally, the original feature extraction model includes an original image feature extraction network and an original text feature extraction network, and training the original feature extraction model based on the obtained multiple groups of training samples to obtain an intermediate feature extraction model includes: inputting each training image in the obtained multiple groups of training samples into the original image feature extraction network, and inputting each training text into the original text feature extraction network; and performing loss calculation on the output result of the original image feature extraction network and the output result of the original text feature extraction network using a contrastive learning loss, and adjusting the network parameters in the original feature extraction model according to the obtained loss calculation result to obtain the intermediate feature extraction model. Explanations of terms that are the same as or correspond to those in the above embodiments are not repeated herein.
Referring to fig. 3, the method of this embodiment may specifically include the following steps:
s310, acquiring an original feature extraction model, wherein the original feature extraction model comprises an original image feature extraction network and an original text feature extraction network.
The original image feature extraction network is understood to be a network for extracting image features. Similarly, an original text feature extraction network may be understood as a network for extracting text features. The two feature extraction networks are two branches in the original feature extraction model.
S320, taking training images and training texts of training objects as a group of training samples to obtain a plurality of groups of training samples.
S330, inputting each training image in the plurality of groups of training samples into an original image feature extraction network, and inputting each training text into the original text feature extraction network.
Each training image in the plurality of groups of training samples is input to an original image feature extraction network, so that feature extraction is respectively carried out on the training images through the original image feature extraction network.
Similarly, each training text in the plurality of sets of training samples is input to the original text feature extraction network to perform feature extraction on the training texts through the original text feature extraction network.
S340, using the contrastive learning loss, perform loss calculation on the output result of the original image feature extraction network and the output result of the original text feature extraction network, and adjust the network parameters in the original feature extraction model according to the obtained loss calculation result to obtain the intermediate feature extraction model.
The embodiment of the invention performs self-supervised training on the original feature extraction model: the output results of the two feature extraction networks undergo loss calculation with a contrastive learning loss so as to adjust the network parameters of the two networks, thereby obtaining the intermediate feature extraction model through training. In practical applications, the contrastive learning loss may be implemented with InfoNCE, MoCo, SimCLR, or similar contrastive objectives, which is not limited herein.
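As a concrete illustration, a minimal PyTorch sketch of a symmetric InfoNCE-style contrastive loss over a batch of paired image and text features is given below; the temperature value and the use of in-batch negatives are illustrative assumptions rather than details fixed by the embodiment.

```python
import torch
import torch.nn.functional as F

def contrastive_loss(image_feats, text_feats, temperature=0.07):
    """Matched image/text pairs in the batch are positives; every other
    in-batch pairing serves as a negative."""
    img = F.normalize(image_feats, dim=-1)
    txt = F.normalize(text_feats, dim=-1)
    logits = img @ txt.t() / temperature              # (B, B) similarity matrix
    targets = torch.arange(img.size(0), device=img.device)
    # Symmetric loss: image-to-text and text-to-image directions.
    return 0.5 * (F.cross_entropy(logits, targets) +
                  F.cross_entropy(logits.t(), targets))
```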
S350, taking the test image and the test text of the test article as a group of test samples, and inputting the obtained multiple groups of test samples into the intermediate feature extraction model to obtain a first matching degree set.
Wherein the first set of matches is used to characterize the matches between each test image and each test text in the plurality of sets of test samples.
S360, inputting a plurality of groups of training samples into the intermediate feature extraction model to obtain a second matching degree set.
The second matching degree set is used for representing matching degrees between training images and training texts in each group of training samples in the plurality of groups of training samples.
And S370, screening a plurality of groups of training samples based on the first matching degree set and the second matching degree set, and training an original feature extraction model based on the screened training samples to obtain a target feature extraction model.
According to the technical scheme of this embodiment, the original feature extraction model is trained in a self-supervised manner using the existing training images and training texts, so the model training process does not need to rely on labeled data, which reduces the model training cost and improves the model training efficiency. In addition, the training image is used to represent the training article as a whole, i.e. the overall features of the training image are extracted directly, instead of first detecting the subject region where the training article is located in the training image and then representing the training article by that subject region during model training; this avoids the problem that a training article cannot be matched or retrieved due to subject detection errors.
On the basis, an optional technical scheme is that the intermediate feature extraction model comprises an intermediate image feature extraction network and an intermediate text feature extraction network;
inputting the obtained multiple groups of test samples into the intermediate feature extraction model to obtain the first matching degree set includes:
inputting the test images into an intermediate image feature extraction network for each test image in the obtained multiple groups of test samples to obtain test image features;
inputting the test texts into an intermediate text feature extraction network for each test text in a plurality of groups of test samples to obtain test text features;
Calculating a first degree of matching between the test image features and the test text features for each test image feature and each test text feature that have been obtained;
and obtaining a first matching degree set according to the plurality of calculated first matching degrees.
The intermediate image feature extraction network may be understood as the network obtained by training the original image feature extraction network on the multiple groups of training samples. Each test image in the multiple groups of test samples is processed by the intermediate image feature extraction network to obtain the test image feature corresponding to each test image.
Similarly, the intermediate text feature extraction network may be understood as the network obtained by training the original text feature extraction network on the multiple groups of training samples. Each test text in the multiple groups of test samples is processed by the intermediate text feature extraction network to obtain the test text feature corresponding to each test text.
Matching degree cross calculation is performed between each test image feature and each test text feature to obtain a plurality of first matching degrees. Illustratively, continuing with the six-first-matching-degree example above, the following are calculated respectively: the first matching degree between the test image feature of test image 1 and the test text feature of test text 1, between test image 1 and test text 2, between test image 1 and test text 3, between test image 2 and test text 2, between test image 2 and test text 3, and between test image 3 and test text 3. Further, the first matching degree set is obtained from the plurality of calculated first matching degrees.
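A minimal sketch of this cross calculation, assuming the test image and test text features are stacked into NumPy arrays and that cosine similarity serves as the matching degree (the embodiment does not fix the exact measure):

```python
import numpy as np

def cross_matching_degrees(image_feats, text_feats):
    """Return the (num_images, num_texts) matrix of first matching degrees
    between every test image feature and every test text feature."""
    img = image_feats / np.linalg.norm(image_feats, axis=1, keepdims=True)
    txt = text_feats / np.linalg.norm(text_feats, axis=1, keepdims=True)
    return img @ txt.T  # entry (i, j): matching degree of image i and text j
```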
In order to better understand the above-described respective model training schemes as a whole, an exemplary description thereof is given below in connection with specific examples. For example, see fig. 4:
1) Sampling multi-mode data of commodities:
n commodities are randomly sampled under each third-level category of all active commodities in the commodity library; in practical application, for third-level categories with fewer than n commodities, all commodities may be sampled. For each sampled commodity, the commodity main image and the commodity title are extracted to form the commodity's multi-modal data pair.
Assume that N multi-modal data pairs (i.e. samples) are obtained in total; a preset proportion p of the samples (0 < p < 1) is selected as the training sample set, and the remaining samples are used as the test sample set. The training sample set is denoted as T and its size as N_T; the test sample set is denoted as V and its size as N_V.
2) Multi-modal self-supervised coarse training:
The commodity main images and commodity titles of the training sample set T are input into the image feature extraction branch and the text feature extraction branch in FIG. 4 respectively, and dual-stream multi-modal self-supervised pre-training is performed using the contrastive learning loss; this example adopts a dual-stream multi-modal training framework.
3) Matching degree screening threshold statistics:
Matching degree cross calculation is performed on the commodity main images and commodity titles in the test sample set V using the multi-modal model M obtained by coarse training (i.e. the intermediate feature extraction model above). The specific steps are as follows: i: extract an image feature vector from the commodity main image of each commodity in the test sample set V through the image feature extraction branch of the multi-modal model M, giving N_V vectors in total, denoted F_img;
ii: extract a text feature vector from the commodity title of each commodity in the test sample set V through the text feature extraction branch of the multi-modal model M, giving N_V vectors in total, denoted F_txt;
iii: calculate the first matching degree Similarity between each image feature vector and each text feature vector, and record the corresponding matching truth value Value, giving N_V × N_V results in total; the calculation formula is as follows:
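(The formula itself did not survive extraction; a plausible reconstruction, assuming cosine similarity is used as the matching degree, is:)

$$\mathrm{Similarity}(m,n)=\frac{F_{img}^{m}\cdot F_{txt}^{n}}{\lVert F_{img}^{m}\rVert\,\lVert F_{txt}^{n}\rVert},\qquad \mathrm{Value}(m,n)=\begin{cases}1, & m=n\\ 0, & m\neq n\end{cases}$$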
wherein m and n respectively represent the mth commodity and the nth commodity in the test sample set V;
iv: sort all the matching results (i.e. the first matching degree set above) in descending order of the first matching degree, and calculate the cumulative accuracy from top to bottom; the calculation formula is as follows:
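(This formula is also missing from the extraction; under the natural reading, cumulative precision over the top-k sorted results, it would be:)

$$\mathrm{Accuracy}(k)=\frac{1}{k}\sum_{i=1}^{k}\mathrm{Value}_{i}$$

where Value_i is the matching truth value at sorted position i.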
v: select the first position k at which Accuracy falls below the preset cumulative accuracy threshold (set to 0.95 in this example), and take the Similarity at position k as the matching degree screening threshold, denoted s.
4) Training sample set T screening:
i: calculating a first matching degree between each commodity main graph in the training sample set T and the commodity title corresponding to the commodity main graph according to the mode of the step 3);
ii: and selecting training samples with the first matching degree larger than s in the training sample set T to form a new training sample set T'.
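A minimal sketch of this screening step, reusing the threshold s from step 3); the per-sample `image_feat`/`title_feat` fields are hypothetical names for the features produced by the two branches of M:

```python
import numpy as np

def screen_training_set(samples, threshold_s):
    """Keep only training samples whose image-title matching degree exceeds s."""
    kept = []
    for sample in samples:
        img = sample["image_feat"] / np.linalg.norm(sample["image_feat"])
        txt = sample["title_feat"] / np.linalg.norm(sample["title_feat"])
        if float(img @ txt) > threshold_s:
            kept.append(sample)
    return kept  # the new training sample set T'
```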
5) Multi-modal self-supervised fine training:
according to the mode of the step 2), the new training sample set T 'is utilized to carry out multi-mode self-supervision training, so as to obtain a new multi-mode model M'.
Fig. 5 is a flowchart of an article matching method provided in an embodiment of the present invention. The embodiment is applicable to the condition of matching articles. The method may be performed by the article matching device provided by the embodiment of the present invention, where the device may be implemented by software and/or hardware, and the device may be integrated on an electronic device, where the electronic device may be a variety of user terminals or servers.
Referring to fig. 5, the method of the embodiment of the present invention specifically includes the following steps:
s410, obtaining information to be matched of the object to be matched, and training the obtained target feature extraction model according to the model training method provided by any embodiment of the invention.
The article to be matched is understood to be the article for which a match is sought, specifically among at least one candidate article. The information to be matched can be understood as descriptive information of the article to be matched, specifically an image to be matched and/or a text to be matched, which is not specifically limited herein. The information to be matched is obtained, together with the target feature extraction model trained according to any of the technical schemes above.
S420, inputting the information to be matched into a target feature extraction model to obtain features to be matched, wherein the features to be matched comprise image features to be matched and/or text features to be matched.
And inputting the information to be matched into the target feature extraction model to obtain the feature to be matched. In practical application, optionally, in the case that the information to be matched is an image to be matched, the feature to be matched may be an image feature to be matched; alternatively, if the information to be matched is a text to be matched, the feature to be matched may be a text feature to be matched; alternatively, in the case that the information to be matched is the image to be matched and the text to be matched, the feature to be matched may be the feature of the image to be matched and the feature of the text to be matched.
And S430, matching the features to be matched with the candidate features of at least one candidate item respectively to determine a target item matched with the item to be matched from the at least one candidate item, wherein the candidate features comprise candidate image features and/or candidate text features.
For each candidate article in the at least one candidate article, the candidate feature of the candidate article may be obtained by feature extraction on the candidate information of the candidate article, and may include a candidate image feature and/or a candidate text feature. It should be noted that the type of the information to be matched and the type of the candidate information may be completely the same, partially the same, or completely different: because the image and text modalities are aligned during model training, in the model application process an image to be matched can be matched against candidate images and/or candidate texts, and a text to be matched can likewise be matched against candidate images and/or candidate texts.
And matching the features to be matched with the candidate features of the at least one candidate item respectively so as to determine a target item matched with the item to be matched from the at least one candidate item.
According to the technical scheme, the to-be-matched characteristics are obtained by inputting to-be-matched information of the to-be-matched objects into a target characteristic extraction model; and then, matching the features to be matched with the candidate features of the at least one candidate item respectively to determine a target item matched with the item to be matched from the at least one candidate item. According to the technical scheme, the information to be matched is processed by applying the target feature extraction model capable of effectively carrying out object characterization, so that the accuracy of object matching is improved.
In an optional technical solution, the item matching method may further include: for each candidate item in the at least one candidate item, acquiring candidate information of the candidate item, and inputting the candidate information into the target feature extraction model to obtain the candidate feature of the candidate item. Since the information to be matched and the candidate information are processed by the same feature extraction model, consistency of the matching reference is ensured when the obtained features to be matched and candidate features are matched.
In order to better understand the above item matching scheme as a whole, an exemplary description is provided below in connection with a specific example; see fig. 6:
1) Feature extraction and storage for the product main-image base library:
All product main images in the main-image base library are respectively input into the image feature extraction branch of the multimodal model M', and the obtained image features are stored in the product feature base.
2) Image feature extraction and matching for the product to be matched:
The main image of the product to be matched is input into the image feature extraction branch of the multimodal model M', and the obtained image feature is retrieved against the image features in the product feature base. In practical application, a full (exhaustive) search can be used when the number of image features in the product feature base is small; when the number is large, various vector retrieval optimization methods can be adopted instead.
3) Returning the matching result:
The product information corresponding to the Top-N results with the highest matching degree is returned according to actual needs.
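To make steps 1) to 3) concrete, the following sketch assumes a callable image_encoder (a hypothetical stand-in for the image feature extraction branch of the multimodal model M') that maps a product main image to a feature vector; it is an illustrative reading of the example, not the implementation of the invention. With L2-normalized features, cosine similarity reduces to a dot product, so a full search over the feature base is a single matrix-vector product.

```python
import numpy as np

def build_feature_base(image_encoder, base_images):
    """Step 1): encode every main image in the base library into the product feature base."""
    feats = np.stack([image_encoder(img) for img in base_images])
    return feats / np.linalg.norm(feats, axis=1, keepdims=True)  # L2-normalize each row

def match_top_n(image_encoder, query_image, feature_base, n=5):
    """Steps 2) and 3): encode the query image, full-search the base, return the Top-N."""
    q = image_encoder(query_image)
    q = q / np.linalg.norm(q)
    scores = feature_base @ q          # cosine similarity via dot product
    top = np.argsort(-scores)[:n]      # indices of the N highest matching degrees
    return top, scores[top]
```

When the feature base grows large, the exhaustive scan in match_top_n would typically be replaced by an approximate nearest-neighbor index, which is what the vector retrieval optimization methods mentioned above refer to.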
Fig. 7 is a block diagram of a model training apparatus according to an embodiment of the present invention, where the apparatus is configured to perform the model training method according to any of the foregoing embodiments. The device and the model training method of each embodiment belong to the same invention conception, and the detailed content which is not described in detail in the embodiment of the model training device can be referred to the embodiment of the model training method. Referring to fig. 7, the apparatus may specifically include: coarse training module 510, first input module 520, second input module 530, and fine training module 540. Wherein,
The coarse training module 510 is configured to take training images and training texts of training objects as a set of training samples, and train the original feature extraction model based on the obtained multiple sets of training samples to obtain an intermediate feature extraction model;
a first input module 520, configured to take a test image and a test text of a test article as a set of test samples, and input the obtained sets of test samples into an intermediate feature extraction model to obtain a first matching degree set;
a second input module 530, configured to input a plurality of sets of training samples into the intermediate feature extraction model to obtain a second matching degree set;
the fine training module 540 is configured to screen a plurality of sets of training samples based on the first matching degree set and the second matching degree set, and train the original feature extraction model based on the screened training samples to obtain a target feature extraction model;
the first matching degree set is used for representing the matching degree between each test image and each test text in the plurality of groups of test samples, and the second matching degree set is used for representing the matching degree between the training images and the training texts in each group of training samples in the plurality of groups of training samples.
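Read together, the four modules describe a coarse-to-fine training pipeline. The skeleton below paraphrases that control flow in Python; every function name (build_model, coarse_train, score_pairs, and so on) is an illustrative placeholder for the corresponding module, not an API defined by this document.

```python
def train_target_model(train_pairs, test_pairs, build_model,
                       coarse_train, fine_train, score_pairs, select_threshold):
    # 510: coarse-train the original model on all (training image, training text) pairs
    intermediate = coarse_train(build_model(), train_pairs)
    # 520: first matching degree set -- scores between every test image and every test text
    first_set = score_pairs(intermediate, test_pairs, cross=True)
    # 530: second matching degree set -- one score per aligned training pair
    second_set = score_pairs(intermediate, train_pairs, cross=False)
    # 540: derive a screening threshold from the first set, keep only the
    #      high-quality training pairs, and retrain the original model on them
    threshold = select_threshold(first_set)
    kept = [pair for pair, score in zip(train_pairs, second_set) if score > threshold]
    return fine_train(build_model(), kept)
```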
Optionally, the fine training module 540 includes:
the matching degree screening threshold determining submodule is used for determining a matching degree screening threshold based on each first matching degree in the first matching degree set;
And the training sample screening submodule is used for screening a plurality of groups of training samples based on the matching degree screening threshold and the second matching degree which corresponds to each group of training samples in the second matching degree set.
On this basis, optionally, the matching degree screening threshold determining submodule includes:
the matching truth value obtaining unit, used for obtaining, for each first matching degree in the first matching degree set, a matching truth value corresponding to the first matching degree, wherein the matching truth value can be used for representing the identity between the test article represented by the test image corresponding to the first matching degree and the test article represented by the corresponding test text;
and the matching degree screening threshold obtaining unit is used for obtaining the matching degree screening threshold according to the first matching degrees in the first matching degree set and the matching truth values corresponding to the first matching degrees in the first matching degree set.
On the basis, optionally, the matching degree screening threshold obtaining unit is specifically configured to:
sorting the first matching degrees in the first matching degree set to obtain a sorting result;
calculating the cumulative accuracy corresponding to each first matching degree in the sorting result based on the matching truth value corresponding to each first matching degree in the sorting result;
and obtaining the matching degree screening threshold from the first matching degrees in the sorting result based on a preset cumulative accuracy threshold and the cumulative accuracy.
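One plausible reading of this unit (the descending sort order, the tie handling, and the preset value 0.95 are assumptions, not stated in the text) is sketched below: the first matching degrees are sorted from high to low, the cumulative accuracy at each rank is computed from the matching truth values, and the screening threshold is the lowest score whose prefix still meets the preset cumulative accuracy threshold.

```python
import numpy as np

def matching_degree_threshold(first_degrees, truth_values, preset_cum_accuracy=0.95):
    """truth_values[i] is 1 when the test image and test text behind the i-th
    first matching degree depict the same test article, else 0."""
    scores = np.asarray(first_degrees, dtype=float)
    truth = np.asarray(truth_values, dtype=float)
    order = np.argsort(-scores)                       # sort descending by matching degree
    cum_acc = np.cumsum(truth[order]) / np.arange(1, len(order) + 1)
    ok = np.nonzero(cum_acc >= preset_cum_accuracy)[0]
    if len(ok) == 0:                                  # no prefix is accurate enough:
        return scores[order[0]]                       # fall back to the strictest threshold
    return scores[order[ok[-1]]]                      # lowest score with an accurate prefix
```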
Alternatively, the training sample screening submodule is specifically configured to:
for the second matching degree corresponding to each group of training samples in the second matching degree set, retaining the group of training samples if the second matching degree is greater than the matching degree screening threshold, so as to implement the screening of the plurality of groups of training samples.
Alternatively, the original feature extraction model may include an original image feature extraction network and an original text feature extraction network, and the coarse training module 510 may include:
the training sample input sub-module, used for inputting each training image in the obtained groups of training samples into the original image feature extraction network, and inputting each training text into the original text feature extraction network;
the coarse training sub-module is used for carrying out loss calculation on the output result of the original image feature extraction network and the output result of the original text feature extraction network, and adjusting network parameters in the original feature extraction model according to the obtained loss calculation result so as to obtain an intermediate feature extraction model.
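Claim 6 below specifies a contrastive learning loss for this loss calculation. A common symmetric form is sketched here under the assumption of in-batch positives and negatives (the document does not fix the exact formulation): each aligned (image, text) pair in a batch is a positive, and every other in-batch combination is a negative.

```python
import torch
import torch.nn.functional as F

def contrastive_loss(image_feats, text_feats, temperature=0.07):
    """image_feats, text_feats: (B, D) outputs of the image and text feature
    extraction networks for B aligned training pairs."""
    img = F.normalize(image_feats, dim=-1)
    txt = F.normalize(text_feats, dim=-1)
    logits = img @ txt.t() / temperature                 # (B, B) pairwise matching degrees
    targets = torch.arange(img.size(0), device=img.device)
    loss_i2t = F.cross_entropy(logits, targets)          # each image should match its own text
    loss_t2i = F.cross_entropy(logits.t(), targets)      # and each text its own image
    return (loss_i2t + loss_t2i) / 2
```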
On this basis, optionally, the intermediate feature extraction model may include an intermediate image feature extraction network and an intermediate text feature extraction network, and the first input module 520 may include:
the test image input sub-module is used for inputting the test image into the intermediate image feature extraction network for each test image in the obtained multiple groups of test samples to obtain test image features;
the test text input sub-module is used for inputting the test text into the intermediate text feature extraction network for each test text in the plurality of groups of test samples to obtain test text features;
a first matching degree calculation sub-module, used for calculating, for each obtained test image feature and each obtained test text feature, a first matching degree between the test image feature and the test text feature;
the first matching degree set obtaining submodule is used for obtaining the first matching degree set according to the calculated first matching degrees.
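Note that the first matching degree set is a cross product: it scores every test image feature against every test text feature, not only the aligned pairs. Assuming cosine similarity as the matching degree (the document does not mandate a particular measure), the whole set is one matrix multiplication:

```python
import torch
import torch.nn.functional as F

def first_matching_degree_set(test_image_feats, test_text_feats):
    """Returns an (M, N) matrix whose entry (i, j) is the matching degree
    between test image feature i and test text feature j."""
    img = F.normalize(test_image_feats, dim=-1)
    txt = F.normalize(test_text_feats, dim=-1)
    return img @ txt.t()
```

By contrast, the second matching degree set corresponds only to the diagonal of the analogous matrix computed over the training pairs: one score per aligned (training image, training text) pair.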
Optionally, the model training device further includes:
the article sampling module is used for sampling article of the article library, which is positioned under each object article in the plurality of object article articles associated with the article library, to obtain sampled articles, so as to select training articles corresponding to a plurality of groups of training samples from all the obtained sampled articles.
According to the model training device provided by the embodiment of the invention, training images and training texts of training articles are taken as groups of training samples through the coarse training module, and the original feature extraction model is trained based on the obtained groups of training samples to obtain an intermediate feature extraction model; then, through the first input module, a test image and a test text of a test article are taken as a group of test samples, and the obtained groups of test samples are input into the intermediate feature extraction model to obtain a first matching degree set; the groups of training samples are input into the intermediate feature extraction model through the second input module to obtain a second matching degree set; further, through the fine training module, the groups of training samples are screened based on the first matching degree set and the second matching degree set, and the original feature extraction model is trained based on the screened training samples to obtain the target feature extraction model applied in the article matching process. Through the cooperation of coarse and fine training and the automatic screening of high-quality training samples, the device effectively avoids the influence of low-quality training samples on the model effect, thereby improving the effectiveness of article characterization and ensuring the accuracy of subsequent article matching when the model is applied.
The model training device provided by the embodiment of the invention can execute the model training method provided by any embodiment of the invention, and has the corresponding functional modules and beneficial effects of the execution method.
It should be noted that, in the embodiment of the model training apparatus, each unit and module included are only divided according to the functional logic, but are not limited to the above-mentioned division, so long as the corresponding functions can be implemented; in addition, the specific names of the functional units are also only for distinguishing from each other, and are not used to limit the protection scope of the present invention.
Fig. 8 is a block diagram of an article matching device according to an embodiment of the present invention, where the device is configured to perform the article matching method according to any of the foregoing embodiments. The device and the article matching method of the above embodiments belong to the same inventive concept, and reference may be made to the embodiments of the article matching method for details not described in detail in the embodiments of the article matching device. Referring to fig. 8, the apparatus may specifically include: a model acquisition module 610, a model application module 620, and an item matching module 630.
The model obtaining module 610 is configured to obtain information to be matched of an object to be matched, and a target feature extraction model obtained by training according to the model training method provided by any embodiment of the present invention;
The model application module 620 is configured to input information to be matched into the target feature extraction model to obtain features to be matched, where the features to be matched include image features to be matched and/or text features to be matched;
the item matching module 630 is configured to match the feature to be matched with candidate features of at least one candidate item, so as to determine a target item matched with the item to be matched from the at least one candidate item, where the candidate features include candidate image features and/or candidate text features.
Optionally, the article matching device further includes:
the candidate feature obtaining module is used for acquiring, for each candidate item in the at least one candidate item, candidate information of the candidate item, and inputting the candidate information into the target feature extraction model to obtain the candidate feature of the candidate item.
According to the article matching device provided by the embodiment of the invention, the information to be matched of the article to be matched and the target feature extraction model are acquired through the model acquisition module; the information to be matched is input into the target feature extraction model through the model application module to obtain the features to be matched; and the features to be matched are matched with the candidate features of at least one candidate article respectively through the article matching module, so as to determine a target article matched with the article to be matched from the at least one candidate article. By processing the information to be matched with a target feature extraction model capable of effectively characterizing articles, the device improves the accuracy of article matching.
The object matching device provided by the embodiment of the invention can execute the object matching method provided by any embodiment of the invention, and has the corresponding functional modules and beneficial effects of the execution method.
It should be noted that, in the embodiment of the article matching device, each unit and module included are only divided according to the functional logic, but not limited to the above division, so long as the corresponding functions can be implemented; in addition, the specific names of the functional units are also only for distinguishing from each other, and are not used to limit the protection scope of the present invention.
Fig. 9 shows a schematic diagram of an electronic device 10 that may be used to implement an embodiment of the invention. Electronic devices are intended to represent various forms of digital computers, such as laptops, desktops, workstations, personal digital assistants, servers, blade servers, mainframes, and other appropriate computers. Electronic devices may also represent various forms of mobile devices, such as personal digital assistants, cellular telephones, smartphones, wearable devices (e.g., helmets, glasses, watches, etc.), and other similar computing devices. The components shown herein, their connections and relationships, and their functions, are meant to be exemplary only, and are not meant to limit implementations of the invention described and/or claimed herein.
As shown in fig. 9, the electronic device 10 includes at least one processor 11, and a memory, such as a Read Only Memory (ROM) 12, a Random Access Memory (RAM) 13, etc., communicatively connected to the at least one processor 11, in which the memory stores a computer program executable by the at least one processor, and the processor 11 may perform various appropriate actions and processes according to the computer program stored in the Read Only Memory (ROM) 12 or the computer program loaded from the storage unit 18 into the Random Access Memory (RAM) 13. In the RAM 13, various programs and data required for the operation of the electronic device 10 may also be stored. The processor 11, the ROM 12 and the RAM 13 are connected to each other via a bus 14. An input/output (I/O) interface 15 is also connected to bus 14.
Various components in the electronic device 10 are connected to the I/O interface 15, including: an input unit 16 such as a keyboard, a mouse, etc.; an output unit 17 such as various types of displays, speakers, and the like; a storage unit 18 such as a magnetic disk, an optical disk, or the like; and a communication unit 19 such as a network card, modem, wireless communication transceiver, etc. The communication unit 19 allows the electronic device 10 to exchange information/data with other devices via a computer network, such as the internet, and/or various telecommunication networks.
The processor 11 may be a variety of general and/or special purpose processing components having processing and computing capabilities. Some examples of the processor 11 include, but are not limited to, a Central Processing Unit (CPU), a Graphics Processing Unit (GPU), various specialized Artificial Intelligence (AI) computing chips, various processors running machine learning model algorithms, Digital Signal Processors (DSPs), and any suitable processor, controller, microcontroller, etc. The processor 11 performs the various methods and processes described above, such as the model training method or the item matching method.
In some embodiments, the model training method or the item matching method may be implemented as a computer program tangibly embodied on a computer-readable storage medium, such as the storage unit 18. In some embodiments, part or all of the computer program may be loaded and/or installed onto the electronic device 10 via the ROM 12 and/or the communication unit 19. When the computer program is loaded into RAM 13 and executed by processor 11, one or more steps of the model training method or the item matching method described above may be performed. Alternatively, in other embodiments, the processor 11 may be configured to perform the model training method or the item matching method in any other suitable manner (e.g., by means of firmware).
Various implementations of the systems and techniques described here above may be implemented in digital electronic circuitry, integrated circuit systems, Field Programmable Gate Arrays (FPGAs), Application Specific Integrated Circuits (ASICs), Application Specific Standard Products (ASSPs), Systems on Chip (SOCs), Complex Programmable Logic Devices (CPLDs), computer hardware, firmware, software, and/or combinations thereof. These various embodiments may include: implementation in one or more computer programs, which may be executed and/or interpreted on a programmable system including at least one programmable processor; the programmable processor may be a special purpose or general-purpose programmable processor that can receive data and instructions from, and transmit data and instructions to, a storage system, at least one input device, and at least one output device.
A computer program for carrying out methods of the present invention may be written in any combination of one or more programming languages. These computer programs may be provided to a processor of a general purpose computer, special purpose computer, or other programmable data processing apparatus, such that the computer programs, when executed by the processor, cause the functions/acts specified in the flowchart and/or block diagram block or blocks to be implemented. The computer program may execute entirely on the machine, partly on the machine, as a stand-alone software package, partly on the machine and partly on a remote machine or entirely on the remote machine or server.
In the context of the present invention, a computer-readable storage medium may be a tangible medium that can contain, or store a computer program for use by or in connection with an instruction execution system, apparatus, or device. The computer readable storage medium may include, but is not limited to, an electronic, magnetic, optical, electromagnetic, infrared, or semiconductor system, apparatus, or device, or any suitable combination of the foregoing. Alternatively, the computer readable storage medium may be a machine readable signal medium. More specific examples of a machine-readable storage medium would include an electrical connection based on one or more wires, a portable computer diskette, a hard disk, a Random Access Memory (RAM), a read-only memory (ROM), an erasable programmable read-only memory (EPROM or flash memory), an optical fiber, a portable compact disc read-only memory (CD-ROM), an optical storage device, a magnetic storage device, or any suitable combination of the foregoing.
To provide for interaction with a user, the systems and techniques described here can be implemented on an electronic device having: a display device (e.g., a CRT (cathode ray tube) or LCD (liquid crystal display) monitor) for displaying information to a user; and a keyboard and a pointing device (e.g., a mouse or a trackball) through which a user can provide input to the electronic device. Other kinds of devices may also be used to provide for interaction with a user; for example, feedback provided to the user may be any form of sensory feedback (e.g., visual feedback, auditory feedback, or tactile feedback); and input from the user may be received in any form, including acoustic input, speech input, or tactile input.
The systems and techniques described here can be implemented in a computing system that includes a back-end component (e.g., a data server), or that includes a middleware component (e.g., an application server), or that includes a front-end component (e.g., a user computer having a graphical user interface or a web browser through which a user can interact with an implementation of the systems and techniques described here), or any combination of such back-end, middleware, or front-end components. The components of the system can be interconnected by any form or medium of digital data communication (e.g., a communication network). Examples of communication networks include: Local Area Networks (LANs), Wide Area Networks (WANs), blockchain networks, and the Internet.
The computing system may include clients and servers. A client and a server are generally remote from each other and typically interact through a communication network. The relationship of client and server arises by virtue of computer programs running on the respective computers and having a client-server relationship to each other. The server can be a cloud server, also called a cloud computing server or a cloud host, which is a host product in the cloud computing service system and overcomes the defects of high management difficulty and weak service scalability in traditional physical hosts and VPS (Virtual Private Server) services.
It should be appreciated that various forms of the flows shown above may be used to reorder, add, or delete steps. For example, the steps described in the present invention may be performed in parallel, sequentially, or in a different order, so long as the desired results of the technical solution of the present invention are achieved, and the present invention is not limited herein.
The above embodiments do not limit the scope of the present invention. It will be apparent to those skilled in the art that various modifications, combinations, sub-combinations and alternatives are possible, depending on design requirements and other factors. Any modifications, equivalent substitutions and improvements made within the spirit and principles of the present invention should be included in the scope of the present invention.

Claims (14)

1. A method of model training, comprising:
training images and training texts of training objects are used as a group of training samples, and an original feature extraction model is trained based on the obtained groups of training samples, so that an intermediate feature extraction model is obtained;
taking a test image and a test text of a test object as a group of test samples, and inputting the obtained groups of test samples into the intermediate feature extraction model to obtain a first matching degree set;
Inputting the plurality of groups of training samples into the intermediate feature extraction model to obtain a second matching degree set;
screening the plurality of groups of training samples based on the first matching degree set and the second matching degree set, and training the original feature extraction model based on the screened training samples to obtain a target feature extraction model;
the first matching degree set is used for representing the matching degree between each test image and each test text in the plurality of groups of test samples, and the second matching degree set is used for representing the matching degree between the training images and the training texts in each group of training samples in the plurality of groups of training samples.
2. The method of claim 1, wherein the screening the plurality of groups of training samples based on the first matching degree set and the second matching degree set comprises:
determining a matching degree screening threshold based on each first matching degree in the first matching degree set;
and screening the plurality of groups of training samples based on the matching degree screening threshold and the second matching degree respectively corresponding to each group of training samples in the second matching degree set.
3. The method of claim 2, wherein the determining a matching degree screening threshold based on each first matching degree in the first matching degree set comprises:
obtaining, for each first matching degree in the first matching degree set, a matching truth value corresponding to the first matching degree, wherein the matching truth value is used for representing the identity between the test object represented by the test image corresponding to the first matching degree and the test object represented by the corresponding test text;
and obtaining a matching degree screening threshold according to the first matching degrees in the first matching degree set and the matching truth values corresponding to the first matching degrees in the first matching degree set.
4. The method of claim 3, wherein the obtaining a matching degree screening threshold according to each first matching degree in the first matching degree set and the matching truth value corresponding to each first matching degree in the first matching degree set includes:
sorting the first matching degrees in the first matching degree set to obtain a sorting result;
calculating the cumulative accuracy corresponding to each first matching degree in the sorting result based on the matching truth value corresponding to each first matching degree in the sorting result;
and obtaining a matching degree screening threshold from the first matching degrees in the sorting result based on a preset cumulative accuracy threshold and the cumulative accuracy.
5. The method of claim 2, wherein the screening the plurality of groups of training samples based on the matching degree screening threshold and the second matching degree corresponding to each group of training samples in the second matching degree set comprises:
and for the second matching degree corresponding to each group of training samples in the second matching degree set, retaining the group of training samples if the second matching degree is greater than the matching degree screening threshold, so as to implement the screening of the plurality of groups of training samples.
6. The method of claim 1, wherein the original feature extraction model comprises an original image feature extraction network and an original text feature extraction network, the training the original feature extraction model based on the obtained plurality of sets of training samples to obtain an intermediate feature extraction model, comprising:
inputting each training image in the obtained multiple groups of training samples into the original image feature extraction network, and inputting each training text into the original text feature extraction network;
and performing loss calculation on the output result of the original image feature extraction network and the output result of the original text feature extraction network by using a contrastive learning loss, and adjusting network parameters in the original feature extraction model according to the obtained loss calculation result to obtain the intermediate feature extraction model.
7. The method of claim 6, wherein the intermediate feature extraction model comprises an intermediate image feature extraction network and an intermediate text feature extraction network, and the inputting the obtained groups of test samples into the intermediate feature extraction model to obtain a first matching degree set comprises:
inputting the test image into the intermediate image feature extraction network for each test image in the obtained multiple groups of test samples to obtain test image features;
inputting the test text into the intermediate text feature extraction network for each test text in the plurality of groups of test samples to obtain test text features;
calculating a first matching degree between each obtained test image feature and each obtained test text feature;
and obtaining a first matching degree set according to the plurality of calculated first matching degrees.
8. The method as recited in claim 1, further comprising:
and for each target article category in a plurality of article categories associated with an article library, sampling the articles in the article library under the target article category to obtain sampled articles, and selecting training articles corresponding to the plurality of groups of training samples from all the obtained sampled articles.
9. A method of matching items, comprising:
obtaining information to be matched of an article to be matched, and a target feature extraction model obtained by training according to the model training method of any one of claims 1-8;
inputting the information to be matched into the target feature extraction model to obtain features to be matched, wherein the features to be matched comprise image features to be matched and/or text features to be matched;
and respectively matching the features to be matched with candidate features of at least one candidate item to determine a target item matched with the item to be matched from the at least one candidate item, wherein the candidate features comprise candidate image features and/or candidate text features.
10. The method as recited in claim 9, further comprising:
and, for each candidate item in the at least one candidate item, acquiring candidate information of the candidate item, and inputting the candidate information into the target feature extraction model to obtain the candidate feature of the candidate item.
11. A model training device, comprising:
the coarse training module is used for taking training images and training texts of training objects as a group of training samples, and training the original feature extraction model based on the obtained multiple groups of training samples to obtain an intermediate feature extraction model;
The first input module is used for taking a test image and a test text of a test object as a group of test samples, and inputting the obtained groups of test samples into the intermediate feature extraction model to obtain a first matching degree set;
the second input module is used for inputting the plurality of groups of training samples into the intermediate feature extraction model to obtain a second matching degree set;
the fine training module is used for screening the plurality of groups of training samples based on the first matching degree set and the second matching degree set, and training the original feature extraction model based on the screened training samples to obtain a target feature extraction model;
the first matching degree set is used for representing the matching degree between each test image and each test text in the plurality of groups of test samples, and the second matching degree set is used for representing the matching degree between the training images and the training texts in each group of training samples in the plurality of groups of training samples.
12. An article matching device, comprising:
the model acquisition module is used for acquiring information to be matched of an article to be matched, and a target feature extraction model obtained by training according to the model training method of any one of claims 1-8;
The model application module is used for inputting the information to be matched into the target feature extraction model to obtain features to be matched, wherein the features to be matched comprise image features to be matched and/or text features to be matched;
and the article matching module is used for matching the to-be-matched features with candidate features of at least one candidate article respectively so as to determine a target article matched with the to-be-matched article from the at least one candidate article, wherein the candidate features comprise candidate image features and/or candidate text features.
13. An electronic device, comprising:
at least one processor; and
a memory communicatively coupled to the at least one processor; wherein,
the memory stores a computer program executable by the at least one processor to cause the at least one processor to perform the model training method of any one of claims 1-8 or the item matching method of claim 9 or 10.
14. A computer readable storage medium storing computer instructions for causing a processor to implement the model training method of any one of claims 1-8 or the item matching method of claim 9 or 10 when executed.

Similar Documents

Publication Publication Date Title
CN112559800B (en) Method, apparatus, electronic device, medium and product for processing video
US11741094B2 (en) Method and system for identifying core product terms
CN114429633B (en) Text recognition method, training method and device of model, electronic equipment and medium
CN113360711B (en) Model training and executing method, device, equipment and medium for video understanding task
JP7393475B2 (en) Methods, apparatus, systems, electronic devices, computer readable storage media and computer programs for retrieving images
CN113780098A (en) Character recognition method, character recognition device, electronic equipment and storage medium
CN116340831B (en) Information classification method and device, electronic equipment and storage medium
CN117521768A (en) Training method, device, equipment and storage medium of image search model
CN114419327B (en) Image detection method and training method and device of image detection model
CN117011865A (en) Model training method, object matching device, electronic equipment and storage medium
CN113221519B (en) Method, apparatus, device, medium and product for processing form data
CN115359322A (en) Target detection model training method, device, equipment and storage medium
CN117115568B (en) Data screening method, device, equipment and storage medium
CN117746069B (en) Graph searching model training method and graph searching method
CN117333493B (en) Machine vision-based detection system and method for production of display base
CN114911963B (en) Template picture classification method, device, equipment, storage medium and product
CN113128601B (en) Training method of classification model and method for classifying images
CN117710994A (en) Target detection model training method, device, equipment and storage medium
CN115205555B (en) Method for determining similar images, training method, information determining method and equipment
CN118152519A (en) Sample cleaning method and device, electronic equipment and storage medium
CN116225767A (en) Log fault classification model training method, device, equipment and storage medium
CN117670554A (en) Method, device, electronic equipment and storage medium for determining data asset tag
CN114817611A (en) Sketch retrieval method and device, electronic equipment and storage medium
CN115222986A (en) Method, device, equipment and medium for updating article display information
CN117807972A (en) Method, device, equipment and medium for extracting form information in long document

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination