CN116975735A - Training method, device, equipment and storage medium of correlation degree prediction model


Info

Publication number
CN116975735A
CN116975735A (application number CN202310361993.4A)
Authority
CN
China
Prior art keywords
correlation
scene
sample
degree
correlation degree
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202310361993.4A
Other languages
Chinese (zh)
Inventor
彭婷
叶澄灿
周智毅
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Tencent Technology Shenzhen Co Ltd
Original Assignee
Tencent Technology Shenzhen Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Tencent Technology Shenzhen Co Ltd filed Critical Tencent Technology Shenzhen Co Ltd
Priority to CN202310361993.4A priority Critical patent/CN116975735A/en
Publication of CN116975735A publication Critical patent/CN116975735A/en
Pending legal-status Critical Current

Classifications

    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00Pattern recognition
    • G06F18/20Analysing
    • G06F18/24Classification techniques
    • G06F18/241Classification techniques relating to the classification model, e.g. parametric or non-parametric approaches
    • G06F18/2415Classification techniques relating to the classification model, e.g. parametric or non-parametric approaches based on parametric or probabilistic models, e.g. based on likelihood ratio or false acceptance rate versus a false rejection rate
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/70Information retrieval; Database structures therefor; File system structures therefor of video data
    • G06F16/73Querying
    • G06F16/738Presentation of query results
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/70Information retrieval; Database structures therefor; File system structures therefor of video data
    • G06F16/75Clustering; Classification
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/70Information retrieval; Database structures therefor; File system structures therefor of video data
    • G06F16/78Retrieval characterised by using metadata, e.g. metadata not derived from the content or metadata generated manually
    • G06F16/783Retrieval characterised by using metadata, e.g. metadata not derived from the content or metadata generated manually using metadata automatically derived from the content
    • G06F16/7844Retrieval characterised by using metadata, e.g. metadata not derived from the content or metadata generated manually using metadata automatically derived from the content using original textual content or text extracted from visual content or transcript of audio data
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00Pattern recognition
    • G06F18/20Analysing
    • G06F18/21Design or setup of recognition systems or techniques; Extraction of features in feature space; Blind source separation
    • G06F18/214Generating training patterns; Bootstrap methods, e.g. bagging or boosting
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F40/00Handling natural language data
    • G06F40/20Natural language analysis
    • G06F40/279Recognition of textual entities
    • G06F40/289Phrasal analysis, e.g. finite state techniques or chunking
    • G06F40/295Named entity recognition

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Data Mining & Analysis (AREA)
  • Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Multimedia (AREA)
  • Artificial Intelligence (AREA)
  • Databases & Information Systems (AREA)
  • Computational Linguistics (AREA)
  • Evolutionary Computation (AREA)
  • Bioinformatics & Computational Biology (AREA)
  • Bioinformatics & Cheminformatics (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Evolutionary Biology (AREA)
  • Library & Information Science (AREA)
  • Probability & Statistics with Applications (AREA)
  • Health & Medical Sciences (AREA)
  • Audiology, Speech & Language Pathology (AREA)
  • General Health & Medical Sciences (AREA)
  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)

Abstract

The present application relates to a training method, apparatus, computer device, storage medium, and computer program product for a correlation degree prediction model, in the technical field of artificial intelligence. The method comprises: determining sample features from a sample entry, video-related text, and a sample video, and predicting, through a correlation degree prediction model, the probability that a training sample belongs to each preset correlation degree category based on the sample features; determining a correlation degree prediction loss from the probability of the labeled correlation degree category; determining a relevance classification loss from the probabilities of the preset correlation degree categories belonging to the relevance classification corresponding to the labeled category; and combining the correlation degree prediction loss and the relevance classification loss to obtain a trained correlation degree prediction model. The trained model improves the accuracy of ranking videos by correlation degree, and thereby the accuracy of video search.

Description

Training method, device, equipment and storage medium of correlation degree prediction model
Technical Field
The present application relates to the field of machine learning technologies, and in particular, to a method, an apparatus, a computer device, a storage medium, and a computer program product for training a correlation degree prediction model.
Background
With the development of machine learning technology, video platforms push videos to users based on their needs. Specifically, for a search term entered by a user on a video platform, several videos matching the term are screened from the video library, ranked by their degree of correlation with the search term, and then pushed to the user.
In the related art, a correlation degree prediction model is often trained as a binary classification task; the trained model then predicts the degree of correlation between the videos in the video library and the input entry. However, a model trained this way tends to produce polarized correlation scores, so the videos cannot be ranked accurately.
Disclosure of Invention
In view of the foregoing, it is desirable to provide a training method, apparatus, computer device, computer-readable storage medium, and computer program product for a correlation degree prediction model that can improve the accuracy of ranking videos by correlation degree.
In a first aspect, the present application provides a method for training a correlation degree prediction model. The method comprises the following steps:
obtaining training samples, each training sample comprising a sample entry, video-related text of a sample video, and a labeled correlation degree category, wherein the labeled correlation degree category represents the degree of correlation between the sample entry and the sample video and belongs to one of several preset correlation degree categories, each preset correlation degree category corresponding to a relevance classification;
determining sample features from the sample entry, the video-related text, and the sample video, and predicting, through the correlation degree prediction model, the probability that the training sample belongs to each preset correlation degree category based on the sample features;
determining a correlation degree prediction loss from the probability that the training sample belongs to the labeled correlation degree category;
determining a relevance classification loss from the probabilities that the training sample belongs to each preset correlation degree category of the relevance classification corresponding to the labeled correlation degree category;
and training the correlation degree prediction model by combining the correlation degree prediction loss and the relevance classification loss, the trained correlation degree prediction model being used for ranking videos by correlation degree.
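The two losses above can be sketched as follows. This is an illustrative reconstruction, not the patent's actual implementation: it assumes four preset correlation degree categories (e.g. highly relevant, relevant, weakly relevant, irrelevant) grouped into two relevance classifications ("relevant" vs. "irrelevant"); the grouping, category count, and loss weight are all assumptions.

```python
import numpy as np

# Assumed setup: 4 preset correlation degree categories, grouped into
# 2 relevance classifications (categories 0-1 -> "relevant", 2-3 -> "irrelevant").
GROUP_OF_CATEGORY = np.array([0, 0, 1, 1])

def combined_loss(probs, labeled_category, alpha=1.0):
    """probs: predicted probabilities over the preset categories (sums to 1).
    labeled_category: index of the labeled correlation degree category.
    Returns degree prediction loss + alpha * relevance classification loss."""
    # Correlation degree prediction loss: cross entropy on the labeled category,
    # inversely correlated with the probability of that category.
    degree_loss = -np.log(probs[labeled_category])
    # Relevance classification loss: superimpose (sum) the probabilities of all
    # preset categories sharing the labeled category's relevance classification.
    group = GROUP_OF_CATEGORY[labeled_category]
    group_prob = probs[GROUP_OF_CATEGORY == group].sum()
    classification_loss = -np.log(group_prob)
    return degree_loss + alpha * classification_loss

probs = np.array([0.5, 0.3, 0.15, 0.05])
loss = combined_loss(probs, labeled_category=1)
```

With the labeled category being the second "relevant" category, the group probability is 0.5 + 0.3 = 0.8, so the classification term stays small whenever the mass concentrates in the correct relevance group, even if it is split across the group's categories.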
In a second aspect, the present application further provides a training apparatus for the correlation degree prediction model. The apparatus comprises:
an acquisition module, configured to obtain training samples, each training sample comprising a sample entry, video-related text of a sample video, and a labeled correlation degree category, wherein the labeled correlation degree category represents the degree of correlation between the sample entry and the sample video and belongs to one of several preset correlation degree categories, each preset correlation degree category corresponding to a relevance classification;
a probability estimation module, configured to determine sample features from the sample entry, the video-related text, and the sample video, and to predict, through the correlation degree prediction model, the probability that the training sample belongs to each preset correlation degree category based on the sample features;
a correlation degree estimated loss determination module, configured to determine the correlation degree prediction loss from the probability that the training sample belongs to the labeled correlation degree category;
a relevance classification loss determination module, configured to determine the relevance classification loss from the probabilities that the training sample belongs to each preset correlation degree category of the relevance classification corresponding to the labeled correlation degree category;
a training module, configured to train the correlation degree prediction model by combining the correlation degree prediction loss and the relevance classification loss, the trained correlation degree prediction model being used for ranking videos by correlation degree.
In some embodiments, the probability estimation module is configured to determine a semantic matching feature according to the sample entry and the video related text, where the semantic matching feature characterizes a correlation between the sample entry and the video related text; determining text statistical characteristics according to the sample entry and the video related text; determining multi-modal characteristics according to the correlation scores of the sample vocabulary entries and the sample video and the correlation scores of the sample vocabulary entries and the video related texts; and splicing the semantic matching features, the text statistical features and the multi-modal features to obtain sample features.
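The feature-assembly step above can be sketched as a simple concatenation. All feature names and dimensions below are assumptions for illustration; the patent does not specify how each feature group is computed.

```python
import numpy as np

def build_sample_feature(semantic_matching, text_stats, multimodal):
    """Splice the three feature groups into a single sample feature vector.
    Each argument is a 1-D feature vector; contents are placeholders."""
    return np.concatenate([semantic_matching, text_stats, multimodal])

# Hypothetical feature vectors (dimensions chosen arbitrarily).
semantic_matching = np.random.rand(64)  # entry vs. video-related-text similarity
text_stats = np.random.rand(8)          # e.g. term-overlap counts, length ratios
multimodal = np.random.rand(2)          # entry-video and entry-text relevance scores

sample_feature = build_sample_feature(semantic_matching, text_stats, multimodal)
```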
In some embodiments, the probability estimation module is configured to input the sample feature into the correlation degree estimation model; outputting probability density distribution of the training samples belonging to each preset correlation degree category through the correlation degree prediction model; and converting the probability density distribution into probability distribution of the training samples belonging to each preset correlation degree category through an activation function, and obtaining probability of the training samples belonging to each preset correlation degree category according to the probability distribution.
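The conversion from the model's unnormalized outputs to a probability distribution over the preset categories can be done with a softmax activation, as sketched below; the choice of softmax and the example logits are assumptions.

```python
import numpy as np

def softmax(logits):
    """Convert unnormalized scores into a probability distribution."""
    z = logits - logits.max()  # subtract the max for numerical stability
    e = np.exp(z)
    return e / e.sum()

logits = np.array([2.0, 1.0, 0.5, -1.0])  # one raw score per preset category
probs = softmax(logits)                    # probabilities summing to 1
```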
In some embodiments, the probability estimating module is configured to perform feature extraction on the sample features through a multi-scene feature extracting network of the correlation degree estimating model to obtain common features corresponding to a plurality of scenes and characteristic features corresponding to each scene; weighting and summing the common characteristics and the characteristic characteristics corresponding to each scene to obtain multi-scene characteristics; respectively extracting the characteristics of the multi-scene characteristics through an explicit scene characteristic extraction network of the correlation degree estimation model to obtain explicit scene characteristics corresponding to each scene; carrying out weighted summation on the explicit scene characteristics corresponding to each scene to obtain output characteristics; and determining the probability that the training sample belongs to each preset correlation degree category according to the output characteristics.
In some embodiments, the multi-scene feature extraction network comprises a first scene feature extraction network, a second scene feature extraction network, and a multi-scene common feature extraction network; the probability estimating module is used for extracting characteristic features corresponding to the first scene through the first scene feature extracting network; extracting characteristic features corresponding to a second scene through the second scene feature extraction network; and extracting the common characteristics corresponding to the first scene and the second scene through the multi-scene common characteristic extraction network.
In some embodiments, the probability estimation module is configured to determine, through a first gating network of the correlation degree estimation model, feature weights corresponding to the first scene feature extraction network, the second scene feature extraction network, and the multi-scene common feature extraction network according to the sample features; and weighting and summing the characteristic features corresponding to the first scene, the characteristic features corresponding to the second scene and the common features according to the respective corresponding feature weights to obtain multi-scene features.
In some embodiments, the explicit scene feature extraction network comprises a first explicit scene feature extraction network and a second explicit scene feature extraction network; the probability estimating module is used for extracting the explicit scene characteristics corresponding to the first scene through the first explicit scene characteristic extracting network; and extracting the explicit scene characteristics corresponding to the second scene through the second explicit scene characteristic extraction network.
In some embodiments, the probability estimation module is configured to determine, through a second gating network of the correlation degree estimation model, feature weights corresponding to the first explicit scene feature extraction network and the second explicit scene feature extraction network respectively according to the sample features; and weighting and summing the explicit scene characteristics corresponding to the first scene and the explicit scene characteristics corresponding to the second scene according to the respective corresponding characteristic weights to obtain output characteristics.
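The multi-scene structure described above (scene-specific and common feature extraction networks mixed by a first gating network, then explicit scene feature extraction networks mixed by a second gating network) resembles a gated mixture-of-experts. Below is a minimal numpy sketch under strong assumptions: each network is a single random linear layer, dimensions are arbitrary, and gate weights come from a softmax over the sample feature.

```python
import numpy as np

rng = np.random.default_rng(0)
D, H = 16, 8  # assumed sample-feature and hidden dimensions

# Multi-scene feature extraction network: two scene-specific networks plus
# one multi-scene common network, each modeled as one linear layer for brevity.
W_scene1, W_scene2, W_common = (rng.standard_normal((H, D)) for _ in range(3))
# Explicit scene feature extraction networks, one per scene.
W_exp1, W_exp2 = (rng.standard_normal((H, H)) for _ in range(2))
# Gating networks: map the sample feature to mixing weights.
G1 = rng.standard_normal((3, D))  # first gate: scene1 / scene2 / common
G2 = rng.standard_normal((2, D))  # second gate: explicit scene1 / scene2

def softmax(z):
    e = np.exp(z - z.max())
    return e / e.sum()

def forward(x):
    # Characteristic features per scene and the common feature.
    f1, f2, fc = W_scene1 @ x, W_scene2 @ x, W_common @ x
    # First gate: weighted sum -> multi-scene feature.
    w = softmax(G1 @ x)
    multi_scene = w[0] * f1 + w[1] * f2 + w[2] * fc
    # Explicit scene features, each extracted from the multi-scene feature.
    e1, e2 = W_exp1 @ multi_scene, W_exp2 @ multi_scene
    # Second gate: weighted sum -> output feature.
    v = softmax(G2 @ x)
    return v[0] * e1 + v[1] * e2

out = forward(rng.standard_normal(D))
```

A classification head on `out` would then yield the per-category probabilities; that head, like everything above, is omitted here for brevity.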
In some embodiments, the correlation degree estimated loss determination module is configured to calculate, according to a probability that the training sample belongs to the labeled correlation degree category, a cross entropy corresponding to the correlation degree; taking the cross entropy corresponding to the correlation degree as a correlation degree estimated loss; and the correlation degree estimated loss is inversely correlated with the probability that the training sample belongs to the labeling correlation degree category.
In some embodiments, the relevance classification loss determination module is configured to determine a relevance classification of the training sample according to the labeled relevance class; superposing probabilities that the training samples belong to each preset correlation degree category of the correlation classification to obtain correlation probabilities; determining a relevance classification loss according to the relevance probability; the relevance classification loss is inversely related to the relevance probability.
In some embodiments, the training module is configured to superimpose the correlation degree pre-estimation loss and the correlation classification loss to obtain a target loss; and training the correlation degree estimation model by taking the minimization of the target loss as a target to obtain a trained correlation degree estimation model.
In some embodiments, the apparatus further comprises a ranking module configured to obtain an input entry and the video-related text of each candidate video; for each candidate video, determine, through the trained correlation degree prediction model, a correlation degree prediction result between the input entry and the candidate video according to the input entry, the candidate video, and the video-related text of the candidate video, the prediction result comprising the probability that the correlation degree between the input entry and the candidate video falls into each preset correlation degree category; and determine a correlation degree score between the input entry and each candidate video from these probabilities, and rank the candidate videos by their correlation degree scores.
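One plausible way to turn the per-category probabilities into a single ranking score, sketched below, is an expected-value score: a probability-weighted sum over assumed per-category score values. The score values and the expectation-based scoring rule are assumptions; the patent only states that a score is derived from the probabilities.

```python
import numpy as np

# Assumed numeric score for each preset correlation degree category,
# ordered from most to least relevant.
CATEGORY_SCORES = np.array([3.0, 2.0, 1.0, 0.0])

def relevance_score(probs):
    """Expected relevance score: probability-weighted sum of category scores."""
    return float(np.dot(probs, CATEGORY_SCORES))

def rank_videos(prob_table):
    """Sort candidate-video indices by descending relevance score."""
    scores = [relevance_score(p) for p in prob_table]
    return sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)

prob_table = np.array([
    [0.1, 0.2, 0.3, 0.4],  # mass on the low-relevance categories
    [0.6, 0.3, 0.1, 0.0],  # mass on the high-relevance categories
    [0.2, 0.5, 0.2, 0.1],  # moderately relevant
])
order = rank_videos(prob_table)
```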
In a third aspect, the present application also provides a computer device. The computer equipment comprises a memory and a processor, wherein the memory stores a computer program, and the processor realizes the training method of the correlation degree estimation model when executing the computer program.
In a fourth aspect, the present application also provides a computer-readable storage medium. The computer readable storage medium has stored thereon a computer program which, when executed by a processor, implements the training method of the correlation degree estimation model described above.
In a fifth aspect, the present application also provides a computer program product. The computer program product comprises a computer program which, when executed by a processor, implements the training method of the correlation degree estimation model.
With the training method, apparatus, computer device, storage medium, and computer program product above, training samples are obtained, each comprising a sample entry, video-related text of a sample video, and a labeled correlation degree category. The labeled category represents the degree of correlation between the sample entry and the sample video and belongs to one of several preset correlation degree categories, each corresponding to a relevance classification, which increases the distinction between relevant and irrelevant entry-video pairs in the training data. Sample features are determined from the sample entry, the video-related text, and the sample video, and the correlation degree prediction model estimates from them, intuitively and accurately, the probability that the training sample belongs to each preset correlation degree category. A correlation degree prediction loss is determined from the probability of the labeled category, so that training on this loss teaches the model to separate the scores of different correlation degrees.
A relevance classification loss is determined from the probabilities of the preset categories belonging to the relevance classification of the labeled category, so that training on this loss improves the global consistency of the model's scores: samples sharing a relevance classification receive similar scores. The model is trained by combining the two losses, and the trained model is used for ranking videos by correlation degree. The trained correlation degree prediction model therefore improves both the separability of correlation degrees and the global consistency of its scores, improving robustness and ensuring the accuracy of ranking videos by correlation degree.
Drawings
FIG. 1 is an application environment diagram of a training method of a correlation degree estimation model in one embodiment;
FIG. 2 is a flowchart of a training method of a correlation degree estimation model in one embodiment;
FIG. 3 is a schematic diagram of a training sample in one embodiment;
FIG. 4 is a flow chart illustrating the steps for determining sample characteristics in one embodiment;
FIG. 5 is a schematic diagram of probabilities in one embodiment;
FIG. 6 is a flow diagram of the determine probability step in one embodiment;
FIG. 7A is a schematic diagram of a search interface in one embodiment;
FIG. 7B is a schematic diagram of a search interface in another embodiment;
FIG. 8 is a flow chart illustrating a ranking step of candidate videos in one embodiment;
FIG. 9 is a flowchart of a training method of a correlation degree estimation model in one embodiment;
FIG. 10 is a diagram of video dependency ranking in one embodiment;
FIG. 11 is a block diagram of a training apparatus for a correlation degree estimation model in one embodiment;
fig. 12 is an internal structural diagram of a computer device in one embodiment.
Detailed Description
In order to make the objects, technical solutions, and advantages of the present application clearer, the present application is described in further detail below with reference to the drawings and embodiments. It should be understood that the specific embodiments described herein are for purposes of illustration only and are not intended to limit the scope of the application.
The present application provides a training method for a correlation degree prediction model that relates to artificial intelligence (AI) technology. Artificial intelligence is the theory, method, technology, and application system that uses digital computers, or machines controlled by digital computers, to simulate, extend, and expand human intelligence, perceive the environment, acquire knowledge, and use that knowledge to obtain optimal results. In other words, artificial intelligence is a comprehensive branch of computer science that attempts to understand the essence of intelligence and to produce new intelligent machines capable of reacting in ways similar to human intelligence, that is, to study the design principles and implementation methods of various intelligent machines so that they can perceive, reason, and make decisions.
Artificial intelligence is a comprehensive discipline that spans a wide range of fields, involving both hardware-level and software-level technologies. Basic AI technologies generally include sensors, dedicated AI chips, cloud computing, distributed storage, big data processing, operation/interaction systems, and mechatronics. AI software technologies mainly include computer vision, speech processing, natural language processing, and machine learning/deep learning. The training method provided by the embodiments of the present application relates in particular to machine learning.
In the related art, based on a search term entered by a user in a video interactive interface, candidate videos are obtained in sequence through recall and coarse ranking, after which a fine-ranking model re-ranks and screens them; the videos matching the search term are then ranked by their degree of correlation with the term and pushed to the user. The fine-ranking model is usually trained as a binary classification task, and the trained model predicts the degree of correlation between the videos in the video library and the input entry. However, a fine-ranking model trained this way tends to produce polarized correlation scores, so the videos cannot be ranked accurately.
With the training method of the correlation degree prediction model provided by the embodiments of the present application, training samples are obtained, each comprising a sample entry, video-related text of a sample video, and a labeled correlation degree category. The labeled category represents the degree of correlation between the sample entry and the sample video and belongs to one of several preset correlation degree categories, each corresponding to a relevance classification, which increases the distinction between relevant and irrelevant entry-video pairs in the training data. Sample features are determined from the sample entry, the video-related text, and the sample video, and the correlation degree prediction model estimates from them, intuitively and accurately, the probability that the training sample belongs to each preset correlation degree category. A correlation degree prediction loss is determined from the probability of the labeled category, so that training on this loss teaches the model to separate the scores of different correlation degrees.
A relevance classification loss is determined from the probabilities of the preset categories belonging to the relevance classification of the labeled category, so that training on this loss improves the global consistency of the model's scores: samples sharing a relevance classification receive similar scores. The model is trained by combining the two losses, and the trained model is used for ranking videos by correlation degree. The trained correlation degree prediction model therefore improves both the separability of correlation degrees and the global consistency of its scores, improving robustness and ensuring the accuracy of ranking videos by correlation degree.
The training method of the correlation degree prediction model provided by the embodiments of the present application can be applied in the application environment shown in FIG. 1, in which the terminal 102 communicates with the server 104 via a network. A data storage system may store the data that the server 104 needs to process; it may be deployed separately, integrated on the server 104, or hosted on the cloud or another server. The training method may be performed by the terminal 102 or the server 104 alone, or by the terminal 102 and the server 104 in cooperation.
In some embodiments, the method for training the correlation degree estimation model provided in the embodiments of the present application may be performed by the server 104. The server 104 acquires a training sample; each training sample comprises a sample entry, a video related text of a sample video and a labeled degree of correlation class, wherein the labeled degree of correlation class represents the degree of correlation between the sample entry and the sample video, the labeled degree of correlation class belongs to one of various preset degree of correlation classes, and each preset degree of correlation class corresponds to a correlation class; the server 104 determines sample characteristics according to the sample entry, the video related text and the sample video, and predicts the probability that the training samples respectively belong to each preset degree of correlation class according to the sample characteristics through a degree of correlation prediction model; the server 104 determines the estimated correlation loss according to the probability that the training sample belongs to the class of the labeling correlation degree; the server 104 determines a relevance classification loss according to the probability that the training sample belongs to each preset relevance class of the relevance classifications corresponding to the labeling relevance classes; the server 104 combines the relevance degree pre-estimation loss and the relevance classification loss, trains a relevance degree pre-estimation model, and the trained relevance degree pre-estimation model is used for video relevance degree sequencing.
Optionally, the server 104 invokes the trained relevance prediction model to rank the relevance of the candidate videos, so that the server 104 determines a target video from the candidate videos and pushes the target video to the terminal 102. The terminal 102 is provided with a client with a video pushing function, and the client can be an e-commerce client, a video client, a social application client and the like. For example, for an input term input by a target user on a video client, the server 104 sorts the relevance degrees of the candidate videos by calling a trained relevance degree prediction model, determines a plurality of target videos, and pushes the plurality of target videos to the video client in the terminal 102, so that the video client displays the plurality of target videos.
The terminal 102 may be, but not limited to, various personal computers, notebook computers, smart phones, tablet computers, internet of things devices, and portable wearable devices, where the internet of things devices may be smart speakers, smart televisions, smart air conditioners, smart vehicle devices, and the like. The portable wearable device may be a smart watch, smart bracelet, headset, or the like. The server 104 may be implemented by a stand-alone server or a server cluster composed of a plurality of servers, or may be implemented by a cloud server.
In one embodiment, as shown in fig. 2, a training method of a correlation degree estimation model is provided. The method is described, for illustration, as applied to the computer device in fig. 1 (which may be the server 104 or the terminal 102), and includes the following steps:
step 202, obtaining training samples; each training sample comprises a sample entry, the video-related text of a sample video, and a labeled correlation degree category. The labeled correlation degree category represents the degree of correlation between the sample entry and the sample video and belongs to one of a plurality of predetermined correlation degree categories, and each predetermined correlation degree category corresponds to a correlation classification.
The sample entry is a search entry used for model training; it may be a search entry used in historical video searches, i.e. a search sentence submitted for video search. In an actual video search process, sample entries take different forms. For example, the video name alone may serve as the sample entry; to improve search accuracy, the video name together with the name of an actor in the video may serve as the sample entry; or, when multiple versions of a video exist for the same subject, the video name together with the video version may serve as the sample entry.
The sample video is a video to be screened, pre-stored in a video library; it may be a long video or a short video. The long video may be a video whose duration exceeds a preset duration, such as a movie or an episode; when the long video is an episode, it includes a plurality of sub-videos. A short video, by contrast, has a shorter duration than a long video.
The video related text is used to describe video information of the video corresponding thereto, including at least one of a scenario profile, a video title, an alias of the video, an actor name, a video description, a tag of the video, and the like.
The labeled correlation degree category is a labeled category characterizing the degree of correlation between the sample entry and the sample video. The higher the degree of correlation corresponding to the labeled category, the better the sample entry matches the sample video. To facilitate comparing the degrees of correlation between different sample videos and the same sample entry, a plurality of predetermined correlation degree categories are defined in advance, and the different predetermined categories are ordered; for example, when ordered from high to low correlation, the degree of correlation of an earlier predetermined category is higher than that of a later one. It follows that the degrees of correlation of any two predetermined categories are comparable. For example, there may be four predetermined correlation degree categories: very correlated (excellent), comparatively correlated (good), weakly correlated (less) and completely uncorrelated (bad). Among these four, the degree of correlation of very correlated is higher than that of comparatively correlated, comparatively correlated is higher than weakly correlated, and weakly correlated is higher than completely uncorrelated.
The identifications of the four predetermined correlation degree categories are not limited in the embodiments of the present application. For example, very correlated may be represented by the identification (1, 0, 0, 0), comparatively correlated by the identification (0, 1, 0, 0), weakly correlated by the identification (0, 0, 1, 0), and completely uncorrelated by the identification (0, 0, 0, 1).
To further distinguish whether the sample term is related or not related to the video related text, a relevance classification is introduced, which is used to characterize whether the sample term is related or not related to the video related text, i.e. the relevance classification comprises both not related and related. Based on this, each predetermined degree of correlation class may be further categorized, i.e. each predetermined degree of correlation class corresponds to a correlation class. For example, both a very relevant and a comparative relevant belong to a correlation in the relevance classification; weak correlations and complete uncorrelation belong to uncorrelation in the correlation classification.
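The two-level label structure described above can be sketched as a simple mapping. The category identifiers below (`excellent`, `good`, `less`, `bad`) follow the labels given earlier; the dictionary layout itself is only an illustrative assumption, not a structure from the patent:

```python
# The four predetermined correlation degree categories, ordered from the
# highest degree of correlation to the lowest.
DEGREE_CLASSES = ["excellent", "good", "less", "bad"]

# Each predetermined category corresponds to one of the two relevance
# classifications: "related" or "unrelated".
RELEVANCE_OF = {
    "excellent": "related",
    "good": "related",
    "less": "unrelated",
    "bad": "unrelated",
}

def one_hot(category):
    """One-hot identification of a category, e.g. 'excellent' -> (1, 0, 0, 0)."""
    return tuple(1 if c == category else 0 for c in DEGREE_CLASSES)
```

With this mapping, any labeled correlation degree category determines its correlation classification directly, which is what the classification loss in step 208 relies on.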
Optionally, for each sample entry, the computer device arbitrarily selects one sample video from the videos after the coarse-ranking stage screening, and determines the video-related text of the sample video. The computer device determines a tagged relevance class for the sample entry and the sample video. The computer device determines a corresponding training sample based on the sample entry, the video-related text, and the annotation-related degree category.
For each sample entry, the computer device performs word segmentation processing on the sample entry to obtain each word segmentation result corresponding to the sample entry, the computer device randomly selects one sample video from videos obtained in a coarse arrangement stage, determines a sequence code of a video related text of the sample video, queries the video related text of the sample video based on the sequence code, and performs word segmentation processing on the video related text to obtain each word segmentation result corresponding to the video related text. The computer device determines a tagged relevance class for the sample entry and the sample video. The computer device determines a training sample based on the sample term, each word segmentation result corresponding to the sample term, the annotation relevance class, the sequence encoding of the video related text of the sample video, the video related text, and each word segmentation result corresponding to the video related text.
For example, fig. 3 shows a schematic diagram of a training sample in one embodiment. The sample entry (query) is 'Zhang San Xiaoming's Adventures', composed of an actor name and a TV series name. Word segmentation of the sample entry yields word segmentation result 1 for the actor name and word segmentation result 2 for the TV series name. In word segmentation result 1, the word weight corresponding to the actor name 'Zhang San' is a1, and the corresponding idf (inverse document frequency) field is b1. In word segmentation result 2, the word weight corresponding to the TV series name 'Xiaoming's Adventures' is a2, and the corresponding idf field is b2. A sample video is randomly selected based on the sample entry, and the sequence encoding H of the video-related text of the sample video is acquired. The video-related text, which contains video information, is determined based on the sequence encoding H. For example, a scenario profile is written in the video-related text: 'Xiaoming loses his credentials at place M and is forced to begin an adventure journey'. The video-related text also includes video information such as the video title, actor names, and tags of the video.
The computer device performs word segmentation on the video-related text to obtain the corresponding word segmentation results. For example, segmenting 'Xiaoming loses his credentials at place M and is forced to begin an adventure journey' yields the word weight and idf field of each word: 'Xiaoming' has word weight a3 and idf field b3; 'because' has word weight a4 and idf field b4; 'at place M' has word weight a5 and idf field b5; 'lost' has word weight a6 and idf field b6; 'credentials' has word weight a7 and idf field b7; 'and' has word weight a8 and idf field b8; 'forced' has word weight a9 and idf field b9; 'begin' has word weight a10 and idf field b10; 'adventure journey' has word weight a11 and idf field b11. The computer device also obtains a tag for the sample video, which characterizes whether the sample video belongs to the long video category or the short video category: a tag of 1 indicates the long video category, and a tag of 0 indicates the short video category.
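As a sketch, the assembled training sample can be represented as a plain record. The field names and the pre-tokenized `(word, word_weight, idf)` triples are illustrative assumptions, since the patent does not fix a concrete data layout or a specific word-segmentation tool:

```python
def build_training_sample(query, query_tokens, doc_id, doc_text, doc_tokens, label):
    # query_tokens / doc_tokens: lists of (word, word_weight, idf) triples
    # produced by an upstream word-segmentation step (not specified here).
    return {
        "query": query,
        "query_tokens": query_tokens,
        "doc_id": doc_id,        # sequence encoding of the video-related text
        "doc_text": doc_text,
        "doc_tokens": doc_tokens,
        "label": label,          # labeled correlation degree category
    }

sample = build_training_sample(
    "Zhang San Xiaoming's Adventures",
    [("Zhang San", 0.6, 2.1), ("Xiaoming's Adventures", 0.4, 3.5)],
    "H",
    "Xiaoming loses his credentials at place M and is forced to begin an adventure journey",
    [("Xiaoming", 0.3, 1.8)],
    "excellent",
)
```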
It should be noted that the present application is applied in the fine-ranking stage, and the number of samples involved in the fine-ranking stage is smaller than in the coarse-ranking stage. Accordingly, the training samples in the present application are not unsupervised samples: there is no need to determine positive and negative sample pairs from click data or click behavior sequences; only the labeled correlation degree category between the sample entry and the sample video needs to be manually annotated.
And 204, determining sample characteristics according to the sample entry, the video related text and the sample video, and predicting the probability that the training samples respectively belong to each preset degree of correlation class according to the sample characteristics through a degree of correlation prediction model.
The sample features are used for performing correlation degree estimation. The correlation degree prediction model is used for predicting the probability that the training sample belongs to each preset correlation degree category. The correlation degree estimation model is a neural network model.
Optionally, the computer device performs multi-level semantic matching on the sample entry, the video related text and the sample video to determine the sample characteristics. The computer equipment inputs the sample characteristics into a correlation degree prediction model to obtain the probability that the training sample belongs to each preset correlation degree category respectively.
Multi-level semantic matching refers to matching from the perspectives of text statistics, semantic matching, and multi-modal matching. Text statistics may count the entity words of the sample entry and of the video-related text, the category of the video-related text, or the click data between the sample entry and the video-related text. Semantic matching refers to the semantic matching relationship between the sample entry and the video-related text, and multi-modal matching refers to matching across information from modalities such as video and text.
The computer device performs multi-level semantic matching on the sample entry, the video-related text and the sample video to obtain the sample features; it then inputs the sample features into the correlation degree prediction model to obtain the probability P1 that the training sample is very correlated, the probability P2 that it is comparatively correlated, the probability P3 that it is weakly correlated, and the probability P4 that it is completely uncorrelated.
Step 206, determining the estimated correlation loss according to the probability that the training sample belongs to the class of the labeled correlation degree.
The correlation degree prediction loss is used for learning the correlation degree between the entry and the video.
Optionally, for each training sample, the computer device determines the labeled correlation degree category of that training sample and calculates the correlation degree estimated loss based on the probability that each training sample belongs to its labeled category. The estimated loss is negatively correlated with those probabilities.
For each training sample, the computer device determines its labeled correlation degree category and calculates the sub correlation degree estimated loss corresponding to that training sample according to the probability of its labeled category. The computer device accumulates the sub-losses corresponding to all training samples to determine the correlation degree estimated loss.
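A minimal sketch of the loss in step 206, assuming a standard cross-entropy form; the patent only states that the loss is negatively correlated with the probability of the labeled category and that per-sample sub-losses are accumulated:

```python
import math

def correlation_degree_loss(probs, labels):
    """probs: per-sample probability vectors over the 4 predetermined
    categories; labels: index of each sample's labeled category."""
    total = 0.0
    for p, y in zip(probs, labels):
        # Sub-loss falls as the probability of the labeled category rises.
        total += -math.log(p[y])
    return total
```

For example, a single sample whose labeled category has probability 0.5 contributes a sub-loss of ln 2 ≈ 0.693.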
Step 208, determining a relevance classification loss according to the probability that the training sample belongs to each predetermined relevance class of the relevance classifications corresponding to the labeling relevance classes.
Wherein the relevance classification penalty is used to learn the relevance between the term and the video, i.e. whether the term is relevant or not relevant to the video.
Optionally, for each training sample, the computer device determines the correlation classification corresponding to its labeled correlation degree category. The computer device then calculates the sub correlation classification loss corresponding to that training sample based on the probabilities that the training sample belongs to each predetermined correlation degree category within that correlation classification. The computer device accumulates the sub correlation classification losses corresponding to all training samples to obtain the correlation classification loss.
Illustratively, for training sample 1, the labeled degree of correlation category for training sample 1 is very relevant, i.e., the relevance classification is determined to be relevant. Because both the very correlations and the comparative correlations belong to correlations in the correlation classifications, based on this, the computer device determines a sub-correlation classification penalty for training sample 1 based on the probability of the very correlations and the probability of the comparative correlations for training sample 1. For training sample 2, the labeled degree of correlation class of the training sample 2 is a comparison correlation, the correlation class is determined to be a correlation, and because both the very correlation and the comparison correlation belong to correlations in the correlation class, based on this, the computer device determines a sub-correlation class loss corresponding to the training sample 2 based on the probability of the very correlation and the probability of the comparison correlation corresponding to the training sample 2. For training sample n, the labeled degree of correlation class of the training sample n is completely uncorrelated, the correlation classification is determined to be uncorrelated, and because the weak correlation and the complete uncorrelation both belong to the uncorrelation in the correlation classification, based on the correlation classification, the computer equipment determines the sub-correlation classification loss corresponding to the training sample n based on the probability of the weak correlation and the probability of the complete uncorrelation corresponding to the training sample n. And superposing sub-correlation classification losses corresponding to the training samples 1 to n by the computer equipment to obtain the correlation classification loss.
Based on this, the correlation classification loss ensures that scores are as similar as possible when the correlation classifications are the same, ensuring that the correlation scores have a global physical meaning.
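The grouping logic of step 208 can be sketched as follows. Summing the probabilities of the two categories inside the correlation classification and taking a negative log-likelihood is an assumption; the patent only states that the sub-loss is determined from the probabilities of the categories in the group:

```python
import math

RELATED = (0, 1)    # very correlated, comparatively correlated
UNRELATED = (2, 3)  # weakly correlated, completely uncorrelated

def correlation_classification_loss(probs, labels):
    total = 0.0
    for p, y in zip(probs, labels):
        # Pick the group (correlation classification) of the labeled category.
        group = RELATED if y in RELATED else UNRELATED
        # Probability mass the model places on that classification.
        group_prob = sum(p[i] for i in group)
        total += -math.log(group_prob)
    return total
```

With this form, a sample labeled very correlated is penalized only by how little total mass falls on the two related categories, which is exactly the behavior described for training samples 1, 2 and n above.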
Step 210, combining the correlation degree estimated loss and the correlation classification loss to train the correlation degree estimation model; the trained correlation degree estimation model is used for video correlation degree ranking.
Optionally, the computer device combines the correlation degree estimated loss and the correlation classification loss to determine a target loss, and iteratively trains the correlation degree estimation model according to the target loss until the training stop condition is reached, obtaining the trained correlation degree estimation model.
The training stopping condition may be that the target loss reaches a preset value, or the iteration number reaches a preset number, which is not limited in particular.
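Steps 210 and the stopping conditions above can be sketched as a weighted combination and a small training loop. The weights `alpha`/`beta` and the toy `model` interface (`degree_loss`, `class_loss`, `step`) are assumptions for illustration; the patent does not fix how the two losses are combined:

```python
def target_loss(degree_loss, class_loss, alpha=1.0, beta=1.0):
    # Combine the two losses into the target loss; weights are not
    # specified by the patent.
    return alpha * degree_loss + beta * class_loss

def train(model, samples, loss_threshold=1e-3, max_iters=1000):
    for _ in range(max_iters):           # stop: iteration count reaches preset number
        loss = target_loss(model.degree_loss(samples), model.class_loss(samples))
        if loss <= loss_threshold:       # stop: target loss reaches preset value
            break
        model.step(loss)                 # one iterative parameter update
    return model
```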
According to the above training method of the correlation degree estimation model, training samples are obtained, where each training sample comprises a sample entry, the video-related text of a sample video, and a labeled correlation degree category; the labeled correlation degree category represents the degree of correlation between the sample entry and the sample video, belongs to one of a plurality of predetermined correlation degree categories, and each predetermined category corresponds to a correlation classification, which increases the distinction between related and unrelated sample entry-video pairs in the training samples. Sample features are determined from the sample entry, the video-related text and the sample video, and the probabilities that the training sample belongs to each predetermined correlation degree category are estimated intuitively and accurately from the sample features by the correlation degree estimation model. The correlation degree estimated loss is determined from the probability that the training sample belongs to its labeled correlation degree category, so that model training based on this loss enables the model to distinguish scores of different degrees of correlation.
The correlation classification loss is determined from the probabilities that the training sample belongs to each predetermined correlation degree category within the correlation classification corresponding to its labeled category, so that model training based on this loss improves the global physical meaning of the model's scores, i.e. scores are as similar as possible when the correlation classifications are the same. The correlation degree estimated loss and the correlation classification loss are combined to train the correlation degree estimation model, and the trained model is used for video correlation degree ranking. On this basis, the trained correlation degree estimation model improves the global physical meaning of its scores while improving the distinguishability of correlation degrees, improving robustness and thereby ensuring the accuracy of ranking videos by degree of correlation.
In some embodiments, as shown in fig. 4, a flow chart of the step of determining the characteristics of the sample in one embodiment is shown. Determining sample characteristics according to the sample entry, the video related text and the sample video comprises:
step 402, determining semantic matching features according to the sample entry and the video related text, wherein the semantic matching features characterize the correlation between the sample entry and the video related text.
The semantic matching feature can be regarded as a vector representation, and represents a feature on the semantic level.
Optionally, the computer device performs feature engineering calculation of the semantic matching feature based on the sample entry and the video related text to obtain the semantic matching feature representing the correlation between the sample entry and the video related text.
Illustratively, the computer device acquires a semantic matching feature extraction model and inputs the sample entry and the video-related text into it to obtain the semantic matching feature. The semantic matching feature extraction model may be a BERT (Bidirectional Encoder Representations from Transformers) model, which extracts the semantic matching feature bert_embedding.
Step 404, determining text statistics according to the sample entry and the video related text.
Optionally, the computer equipment performs feature engineering calculation of the text statistical feature according to the sample entry and the video related text to obtain the text statistical feature. The text statistics include basic text features, posterior features, knowledge features, and text category features.
The basic text features include the term matching degree between the sample entry and the video-related text, the word weight coverage ratio (covered_query_real_weight_ratio) of the video-related text over the sample entry, the term hit status (hit_status) between the sample entry and the video-related text, and the like. The step of acquiring the basic text features includes: based on the sample entry and the video-related text, the computer device obtains, through probabilistic retrieval calculation, the term matching degree, the word weight coverage ratio of the video-related text over the sample entry, and the term hit status. The probabilistic retrieval calculation may be performed by BM25 (Best Match 25) or by BM25F (Best Match 25 Field), and is not particularly limited.
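A minimal single-field BM25 sketch of the probabilistic retrieval calculation mentioned above; the parameter values (`k1`, `b`, `avgdl`) are the textbook defaults, not values taken from the patent:

```python
import math

def bm25_score(query_terms, doc_terms, idf, k1=1.5, b=0.75, avgdl=10.0):
    """Score one document (e.g. the video-related text) against the query terms."""
    dl = len(doc_terms)              # document length in terms
    tf = {}
    for t in doc_terms:
        tf[t] = tf.get(t, 0) + 1     # term frequency in the document
    score = 0.0
    for t in query_terms:
        f = tf.get(t, 0)
        if f:
            # Saturating tf weight, normalized by document length.
            score += idf.get(t, 0.0) * f * (k1 + 1) / (f + k1 * (1 - b + b * dl / avgdl))
    return score
```

BM25F extends this by scoring several weighted fields (title, alias, scenario profile, …) instead of a single term bag.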
The posterior feature is obtained based on the click relation between the sample entry and the video-related text. For example, the computer device constructs a bipartite graph based on the click relations between sample entries and video-related texts, calculates the click similarity between the sample entry and the video-related text based on the bipartite graph, and determines the posterior feature based on that click similarity and on whether the video-related text hits the intention of the sample entry.
The knowledge features are calculated based on the entity words of the sample entry and of the video-related text, and include the number of entity words of the sample entry, the number of entity words of the video-related text, and the entity word coverage ratio of the sample entry. For example, based on the sample entry and the video-related text, the computer device determines, through named entity recognition (ner), the number of entity words of the sample entry (query_ner_number), the number of entity words of the video-related text (doc_ner_number), and the entity word coverage ratio of the sample entry (covered_query_ner_ratio). Likewise, through knowledge graph recognition, the computer device determines the number of entity words of the sample entry (query_kg_number), the number of entity words of the video-related text (doc_kg_number), and the entity word coverage ratio of the sample entry (covered_query_kg_ratio). The computer device determines the knowledge features from these six quantities.
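The entity word counts and coverage ratio can be computed directly; the function below mirrors the feature names above (query_ner_number, doc_ner_number, covered_query_ner_ratio) but is otherwise an illustrative sketch:

```python
def knowledge_features(query_entities, doc_entities):
    """Entity-word counts and the fraction of query entities covered by the doc."""
    query_ner_number = len(query_entities)
    doc_ner_number = len(doc_entities)
    doc_set = set(doc_entities)
    covered = sum(1 for e in query_entities if e in doc_set)
    covered_query_ner_ratio = covered / query_ner_number if query_ner_number else 0.0
    return query_ner_number, doc_ner_number, covered_query_ner_ratio
```

The same helper applies unchanged to the knowledge graph variant (query_kg_number, doc_kg_number, covered_query_kg_ratio) by passing knowledge-graph entities instead of ner entities.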
The text category feature characterizes whether the video-related text belongs to the long video category or the short video category. The computer device obtains the tag of the sample video from the training sample and uses that tag as the text category feature.
Step 406, determining the multi-modal feature according to the correlation score of the sample entry and the sample video and the correlation score of the sample entry and the video related text.
Optionally, the computer equipment obtains the semantic features of the sample vocabulary entry, the semantic features of the sample video and the semantic features of the video related text through the feature engineering calculation of the multi-mode features according to the sample vocabulary entry, the sample video and the video related text. The computer equipment determines the correlation score of the sample entry and the sample video according to the semantic features of the sample entry and the semantic features of the sample video; and determining the correlation score of the sample entry and the video related text according to the semantic features of the sample entry and the semantic features of the video related text. The computer equipment determines multi-modal characteristics according to the semantic features of the sample vocabulary entries, the semantic features of the sample video, the semantic features of the video related texts, the related scores of the sample vocabulary entries and the sample video and the related scores of the sample vocabulary entries and the video related texts.
The computer device obtains key frames of a plurality of important episodes of the sample video and the cover frame corresponding to the cover, and fuses the key frames and the cover frame to obtain a fused image; the fused image reflects the information of the sample video. Therefore, feature extraction over the entire sample video is not required in the subsequent steps: features are extracted directly from the fused image, and the extracted semantic features serve directly as the semantic features of the sample video.
Based on the above, the computer device acquires a multi-modal model and inputs the sample entry, the fused image and the video-related text into it, obtaining the semantic features of the sample entry (embedding1), of the sample video (embedding2), and of the video-related text (embedding3). It calculates correlation score 1 between the sample entry and the sample video from embedding1 and embedding2, and correlation score 2 between the sample entry and the video-related text from embedding1 and embedding3. The computer device obtains the multi-modal features from embedding1, embedding2, embedding3, correlation score 1 and correlation score 2.
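A sketch of the multi-modal feature assembly; cosine similarity is assumed for the two correlation scores, since the patent does not name a specific similarity measure:

```python
import math

def cosine(u, v):
    dot = sum(a * b for a, b in zip(u, v))
    nu = math.sqrt(sum(a * a for a in u))
    nv = math.sqrt(sum(b * b for b in v))
    return dot / (nu * nv) if nu and nv else 0.0

def multimodal_feature(emb_query, emb_video, emb_text):
    score1 = cosine(emb_query, emb_video)  # correlation score 1: entry vs. video
    score2 = cosine(emb_query, emb_text)   # correlation score 2: entry vs. text
    # Multi-modal feature: the three embeddings plus the two scores.
    return list(emb_query) + list(emb_video) + list(emb_text) + [score1, score2]
```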
In step 408, the semantic matching feature, the text statistics feature, and the multi-modal feature are spliced to obtain sample features.
Optionally, the computer device splices the semantic matching feature, the text statistics feature and the multi-modal feature to obtain a sample feature.
In this embodiment, semantic matching features characterizing the correlation between the sample entry and the video-related text are determined from the sample entry and the video-related text; text statistical features are determined from the sample entry and the video-related text; and multi-modal features are extracted from the sample entry, the sample video and the video-related text. In this way, the correlation information between the sample entry and the sample video can be obtained comprehensively from the semantic matching features, the text statistical features and the multi-modal features, yielding sample features rich in correlation information, so that the probabilities that the training sample belongs to each predetermined correlation degree category can be estimated accurately, and the correlation degree estimated loss and the correlation classification loss can be obtained. On this basis, the two losses are combined to obtain a trained correlation degree estimation model that improves the global physical meaning of its scores while improving their distinguishability, improving robustness and thereby ensuring the accuracy of ranking videos by degree of correlation.
In some embodiments, estimating, by the relevance estimation model, probabilities that the training samples respectively belong to each predetermined relevance class according to the sample characteristics includes: inputting the sample characteristics into a correlation degree estimation model; outputting probability density distribution of the training samples belonging to each preset correlation degree category through a correlation degree prediction model; and converting the probability density distribution into probability distribution of the training samples belonging to each preset correlation degree category through an activation function, and obtaining probability of the training samples belonging to each preset correlation degree category according to the probability distribution.
Optionally, after the computer device inputs the sample features to the correlation degree prediction model, a probability density distribution about the training samples belonging to each predetermined correlation degree category is output. For each predetermined relevance class, the computer device determines a probability distribution for the predetermined relevance class by activating a function based on the probability density distribution for the predetermined relevance class. The computer device obtains a probability that the training sample belongs to the predetermined correlation class based on the probability distribution.
Illustratively, after the computer device inputs the sample features into the correlation degree estimation model, the model outputs the value of a random variable for each predetermined correlation degree category. For each predetermined category, the computer device determines the corresponding probability density distribution based on the value of that category's random variable. The predetermined categories are arranged in order of correlation degree from high to low, and the value of the random variable corresponding to the first category is the smallest. For each predetermined category other than the first, the computer device determines the probability of that category through the activation function, based on the probability density distribution of that category and the probability density distribution of the preceding category. For the first category, the probability is determined directly from its own probability density distribution through the activation function.
For example, there are 4 predetermined correlation degree categories, which are, ordered by correlation degree from high to low, very correlated, comparatively correlated, weakly correlated and completely uncorrelated. For training sample 1, after the sample features of training sample 1 are input into the correlation degree estimation model: the value of the random variable corresponding to very correlated is θ1 − x, and the corresponding probability density distribution is f1(θ1 − x); the value of the random variable corresponding to comparatively correlated is θ2 − x, and the corresponding probability density distribution is f2(θ2 − x); the value of the random variable corresponding to weakly correlated is θ3 − x, and the corresponding probability density distribution is f3(θ3 − x); the value of the random variable corresponding to completely uncorrelated is +∞, and the corresponding probability density distribution is f4(+∞). Here x is a random value in the range 0 ≤ x ≤ 1, and θ1, θ2, θ3 are hyperparameters that can be understood as three dividing points, with θ1, θ2, θ3 increasing in sequence. On this basis, the value of the random variable corresponding to the first predetermined correlation degree category is the smallest. The computer device obtains an activation function and determines, through the activation function, the probability distribution of the training sample belonging to each predetermined correlation degree category. For a value m of the random variable and an interval [a, b], integrating the probability density distribution over the interval yields the probability that m falls within the interval (i.e. the probability that the training sample belongs to a certain predetermined correlation degree category), which can be regarded geometrically as the area bounded by the interval, the curve of the probability density distribution function and the abscissa. Namely:
P(a ≤ m ≤ b) = ∫[a, b] f(t) dt = σ(b) − σ(a)

wherein P(a ≤ m ≤ b) is the probability that the training sample belongs to a certain predetermined correlation degree category, σ(·) is the probability distribution function, and f(·) is the probability density distribution function. Thus, after the values of the respective random variables are determined as described above, value 1 of the probability distribution, corresponding to very correlated, is σ(θ1 − x); value 2 of the probability distribution, corresponding to comparatively correlated, is σ(θ2 − x); value 3 of the probability distribution, corresponding to weakly correlated, is σ(θ3 − x); value 4 of the probability distribution, corresponding to completely uncorrelated, is σ(+∞) = 1. As shown in fig. 5, which is a schematic diagram of probabilities in one embodiment, the curve in fig. 5 is the sigmoid function curve, which serves as the probability density distribution curve. Let the values of θ1, θ2, θ3 be −2.5, 0 and 3, and let x be 0.5; then θ1 − x, θ2 − x and θ3 − x are −3, −0.5 and 2.5 in that order. The intervals in which the training sample is very correlated, comparatively correlated, weakly correlated and completely uncorrelated are [−∞, −3], [−3, −0.5], [−0.5, 2.5] and [2.5, +∞] respectively.
For the case where the training sample is very correlated, interval 1 of the values of the random variable is [−∞, −3]. The probability P1 that the training sample belongs to very correlated is σ(−3) − σ(−∞) = σ(−3), where σ(−∞) = 0; geometrically, P1 can be understood as the area of the shaded region bounded by the x-axis, the sigmoid curve and the straight line x = −3. For the case where the training sample is comparatively correlated, interval 2 of the values of the random variable is [−3, −0.5]; the probability P2 that the training sample belongs to comparatively correlated is σ(−0.5) − σ(−3), and P2 can be understood geometrically as the area of the shaded region bounded by the x-axis, the sigmoid curve, the straight line x = −3 and the straight line x = −0.5. For the case where the training sample is weakly correlated, interval 3 of the values of the random variable is [−0.5, 2.5]; the probability P3 that the training sample belongs to weakly correlated is σ(2.5) − σ(−0.5), and P3 can be understood geometrically as the area of the shaded region bounded by the x-axis, the sigmoid curve, the straight line x = 2.5 and the straight line x = −0.5. For the case where the training sample is completely uncorrelated, interval 4 of the values of the random variable is [2.5, +∞]; the probability P4 that the training sample belongs to completely uncorrelated is σ(+∞) − σ(2.5) = 1 − σ(2.5), where σ(+∞) = 1, and P4 can be understood geometrically as the area of the shaded region bounded by the x-axis, the sigmoid curve, the straight line x = 2.5 and the straight line x = +∞.
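The interval probabilities of this example can be checked with a short sketch. All helper names are hypothetical; the sigmoid is used as the probability distribution function σ, and the cut-points θ1, θ2, θ3 and the value x are the illustrative numbers above:

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def ordinal_probabilities(x, cutpoints):
    """Probability of each correlation degree category from sigmoid cut-points.

    P_k = sigma(theta_k - x) - sigma(theta_{k-1} - x), taking
    sigma(-inf) = 0 before the first category and sigma(+inf) = 1
    after the last one.
    """
    cdf = [sigmoid(theta - x) for theta in cutpoints] + [1.0]  # append sigma(+inf)
    return [cdf[0]] + [cdf[k] - cdf[k - 1] for k in range(1, len(cdf))]

# Illustrative values from the text: theta1, theta2, theta3 = -2.5, 0, 3 and x = 0.5.
probs = ordinal_probabilities(0.5, [-2.5, 0.0, 3.0])
# probs[0] = sigma(-3) = P1, probs[1] = sigma(-0.5) - sigma(-3) = P2, and so on;
# the four probabilities sum to 1 by construction.
```

Because consecutive probabilities are differences of the same cumulative curve, the categories always partition the probability mass, which is what lets the model treat them as an ordered whole rather than four unrelated classes.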
In this embodiment, after the sample features are input into the correlation degree prediction model, the probability density distribution of each predetermined correlation degree category is obtained, and the probability that the training sample belongs to each predetermined correlation degree category can be accurately determined from each probability density distribution and the activation function. The correlation degree pre-estimation loss and the correlation classification loss can then be accurately reflected based on the labeled correlation degree category and the probability of each predetermined correlation degree category. On this basis, the correlation degree pre-estimation loss and the correlation classification loss are combined to obtain the trained correlation degree pre-estimation model, which improves the global physical meaning of the scores while improving their discrimination, thereby improving robustness and ensuring the accuracy of ranking videos by correlation degree.
In some embodiments, as shown in fig. 6, a flow chart of the step of determining the probability in one embodiment is shown. Estimating, through the correlation degree estimation model and according to the sample features, the probability that the training sample belongs to each predetermined correlation degree category includes the following steps:
step 602, extracting features of the sample features through a multi-scene feature extraction network of the correlation degree prediction model to obtain common features corresponding to a plurality of scenes and characteristic features corresponding to each scene.
The scenes include a long video scene and a short video scene. As shown in fig. 7A, which is a schematic diagram of a search interface in an embodiment, during a video search the terminal displays a search interface; a query such as "Xiaoming calendar risk" is entered in the search box of the search interface through a user account, and in response to the input operation a top-quality area is displayed in the search interface, the top-quality area being used for displaying long videos. The top-quality area displays a screening menu bar corresponding to the long videos, the screening menu bar showing screening controls such as "all", "movie", "user", "novel" and "cartoon", as well as an icon of the drama titled "Xiaoming calendar risk" shown in fig. 7A and a plurality of actor names for the drama, such as actor one to actor five. The top-quality area further shows trigger controls for immediate play and for caching, and also displays other related movie icons, such as "parse Xiaoming calendar risk" belonging to the television series category, "Xiaoming calendar risk cartoon" belonging to the cartoon category, and "Xiaoming calendar risk movie" belonging to the movie category.
After the user browses the top-quality area, an integrated area is displayed in the search interface through a sliding operation, as shown in fig. 7B, which is a schematic diagram of the search interface in another embodiment. The integrated area is used to display short videos. The integrated area shows a screening menu bar corresponding to the short videos, which shows screening controls such as "all", "movie", "user", "novel" and "cartoon", and shows individual thematic videos, such as the icons of "Xiaoming calendar risk flower aggregate" and "Xiaoming calendar risk interpretation report" in fig. 7B. The location of each icon in the search interface may be determined based on the play amount, i.e. the higher the play amount, the earlier the icon is presented in the integrated area. The play amount is shown on each icon; the play amounts of "Xiaoming calendar risk flower aggregate" and "Xiaoming calendar risk interpretation report" are 1.60 million and 1.50 million respectively.
The common feature is a feature common to a plurality of scenes, and the characteristic feature represents a characteristic of each scene. The multi-scene feature extraction network is used for extracting common features among a plurality of scenes and characteristic features of each scene. That is, commonalities and characteristics of different scenes can be implicitly learned through the multi-scene feature extraction network.
Optionally, the multi-scene feature extraction network includes a plurality of scene feature extraction networks, and the computer device inputs the sample features to each scene feature extraction network respectively, so as to obtain the common features and the characteristic features.
A scene feature extraction network for extracting common features and a scene feature extraction network for extracting characteristic features may be preset, based on which the computer device extracts common features among a plurality of scenes through the scene feature extraction network for extracting common features based on the sample features.
Based on the sample characteristics, the computer equipment extracts the characteristic characteristics corresponding to each scene through a scene characteristic extraction network for extracting the characteristic characteristics of each scene. That is, the input sample features are processed through one scene feature extraction network, so that the characteristic features corresponding to the respective scenes can be output at a time.
Of course, for the characteristic feature corresponding to each scene, the characteristic feature of the scene may instead be extracted from the sample features through a scene feature extraction network dedicated to extracting the characteristic feature of that scene. That is, one scene feature extraction network for extracting characteristic features is provided per scene, so that the number of such networks coincides with the number of scenes.
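As an illustrative sketch of this parallel extraction (all names and dimensions are hypothetical, and plain linear maps stand in for the actual sub-networks), the per-scene characteristic features and the common feature are obtained from the same sample features:

```python
import numpy as np

rng = np.random.default_rng(0)
feature_dim, hidden_dim, num_scenes = 8, 4, 2  # illustrative sizes

# One extraction network per scene for the characteristic features, plus one
# shared network for the common feature; linear layers stand in for the real
# scene feature extraction networks.
scene_networks = [rng.normal(size=(feature_dim, hidden_dim)) for _ in range(num_scenes)]
common_network = rng.normal(size=(feature_dim, hidden_dim))

sample_features = rng.normal(size=(feature_dim,))

# The same sample features are fed to every network in parallel.
characteristic_features = [sample_features @ W for W in scene_networks]
common_feature = sample_features @ common_network
```

The design point is simply that every network sees the identical input, so the split into "common" versus "characteristic" information comes entirely from how the downstream weighting and losses use each output.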
And step 604, carrying out weighted summation on the common characteristics and the characteristic characteristics corresponding to each scene to obtain the multi-scene characteristics.
Optionally, the computer device determines the common feature and the feature weight of each feature, and performs weighted summation on the common feature and the feature corresponding to each scene according to the common feature and the feature weight of each feature, so as to obtain the multi-scene feature.
For each training sample, the computer device determines the scene to which the training sample belongs, based on the labels of the sample videos that the training sample includes. And the computer equipment determines the common characteristics and the characteristic weights of the characteristic characteristics according to the scene to which the training sample belongs.
For example, if the scene to which the sample video in the training sample belongs is a scene of a long video, the feature weights of the feature features corresponding to the common feature and the scene of the long video are higher than the feature weights of the feature features corresponding to the scene of the short video.
And step 606, respectively carrying out feature extraction on the multi-scene features through an explicit scene feature extraction network of the correlation degree prediction model to obtain explicit scene features corresponding to each scene.
For each scene, the explicit scene feature extraction network corresponding to the scene is used for determining the explicit scene features of that scene. The explicit scene features are used for determining the probability of each predetermined correlation degree category.
Optionally, the computer device performs feature extraction on the multi-scene features through an explicit scene feature extraction network of each scene of the correlation degree prediction model to obtain explicit scene features corresponding to each scene.
For example, the computer device splits according to different scenes to obtain explicit scene feature extraction networks corresponding to the scenes respectively, and inputs the multi-scene features into the explicit scene feature extraction networks respectively to obtain the explicit scene features output by the explicit scene feature extraction networks.
And step 608, carrying out weighted summation on the explicit scene features corresponding to each scene to obtain output features.
Optionally, the computer device obtains weights corresponding to the scenes respectively, and performs weighted summation on explicit scene features corresponding to the scenes according to feature weights corresponding to the scenes respectively to obtain output features.
Step 610, determining the probability that the training sample belongs to each predetermined correlation class according to the output characteristics.
Wherein the output features comprise probability density distributions of training samples belonging to respective predetermined correlation degree categories.
Optionally, the computer device determines, according to the output characteristics, a probability distribution of the training sample belonging to each predetermined degree of correlation class by activating a function, and determines, according to the probability distribution of each predetermined degree of correlation class, a probability of the training sample belonging to each predetermined degree of correlation class.
In this embodiment, feature extraction is performed on the sample features through the multi-scene feature extraction network of the correlation degree prediction model to determine the common features among different scenes and the characteristic features corresponding to each scene. The common features and the characteristic features are then fused by weighted summation to obtain multi-scene features that include both scene characteristic information and commonality information. Feature extraction is further performed on the multi-scene features through the explicit scene feature extraction networks of the correlation degree prediction model to determine the explicit scene features corresponding to each scene, and the explicit scene features of the scenes are weighted and summed to obtain output features matched with the scene to which the sample video belongs, so that the probability that the training sample belongs to each predetermined correlation degree category can be accurately predicted based on the output features. Thus, the correlation degree pre-estimation loss and the correlation classification loss can be accurately reflected. On this basis, the two losses are combined to obtain a trained correlation degree estimation model capable of accurately estimating video correlation degree scores. In this way, the ranking tasks of video correlation degree under different scenes are completed through multi-scene modeling, and the accuracy of video correlation degree ranking is ensured.
In some embodiments, the multi-scene feature extraction network comprises a first scene feature extraction network, a second scene feature extraction network, and a multi-scene common feature extraction network; extracting features of the sample features through a multi-scene feature extraction network of a correlation degree prediction model to obtain common features corresponding to a plurality of scenes and characteristic features corresponding to each scene, wherein the method comprises the following steps: extracting characteristic features corresponding to the first scene through a first scene feature extraction network; extracting characteristic features corresponding to the second scene through a second scene feature extraction network; and extracting the common characteristics corresponding to the first scene and the second scene through a multi-scene common characteristic extraction network.
The first scene feature extraction network is used for extracting characteristic features of the first scene, and the second scene feature extraction network is used for extracting characteristic features of the second scene. The first scene and the second scene are different scenes. The multi-scene common feature extraction network is used for extracting common features of the first scene and the second scene.
Optionally, the computer device acquires a sample feature, inputs the sample feature to the first scene feature extraction network, learns scene knowledge of a first scene in the sample feature through the first scene feature extraction network, and extracts a characteristic feature corresponding to the first scene. And inputting the sample characteristics into a second scene characteristic extraction network, and learning scene knowledge of a second scene in the sample characteristics through the second scene characteristic extraction network to extract characteristic characteristics corresponding to the second scene. The computer equipment inputs the sample characteristics into a multi-scene common characteristic extraction network, and learns scene knowledge shared by different scenes in the sample characteristics through the multi-scene common characteristic extraction network, so as to extract common characteristics corresponding to the first scene and the second scene.
Illustratively, the first scene feature extraction network is used to extract characteristic feature 1 of the long video scene, the second scene feature extraction network is used to extract characteristic feature 2 of the short video scene, and the multi-scene common feature extraction network is used to extract common feature 3 shared by the long video scene and the short video scene.
For each training sample, the computer device respectively inputs the sample features corresponding to the training sample into the first scene feature extraction network, the second scene feature extraction network and the multi-scene common feature extraction network, to obtain characteristic feature 1, characteristic feature 2 and common feature 3 in sequence.
In this embodiment, through the first scene feature extraction network, the characteristic feature matched with the scene information of the first scene can be extracted from the sample feature accurately. Through the second scene feature extraction network, characteristic features matched with scene information of the second scene can be accurately extracted from the sample features. Therefore, by respectively introducing respective scene feature extraction networks for different scenes, the characteristic features of the corresponding scenes can be rapidly and accurately extracted, and the accuracy of the characteristic features is ensured. Meanwhile, the common characteristics corresponding to the two scenes can be extracted rapidly through the multi-scene common characteristic extraction network. Based on the characteristic features and the common features of different scenes, the output features matched with the scenes to which the sample video belongs can be obtained later, and based on the output features, the probability that the training sample belongs to each preset correlation degree category can be estimated accurately. Thus, the estimated correlation degree loss and the classification correlation loss can be accurately reflected. Based on the method, the pre-estimation loss of the correlation degree and the classification loss of the correlation are combined, and a trained correlation degree estimation model capable of accurately estimating the score of the video correlation degree is obtained. 
In this way, through multi-scene joint modeling, the heterogeneity between long and short videos is fully taken into account. The problem of differing video correlation degree distributions in the respective scenes of long and short videos is addressed through the first scene feature extraction network, the second scene feature extraction network and the multi-scene common feature extraction network; the accuracy of correlation degrees and scores for long-tail videos and entities can be improved, and the ranking of videos is thereby further optimized.
In some embodiments, the weighting and summing the common feature and the characteristic feature corresponding to each scene to obtain the multi-scene feature includes: determining the feature weights corresponding to the first scene feature extraction network, the second scene feature extraction network and the multi-scene common feature extraction network respectively according to the sample features through a first gating network of the correlation degree prediction model; and weighting and summing the characteristic features corresponding to the first scene, the characteristic features corresponding to the second scene and the common characteristic according to the respective corresponding characteristic weights to obtain the multi-scene features.
The first gating network is used for learning the characteristic weights of the first scene characteristic extraction network, the second scene characteristic extraction network and the multi-scene commonality characteristic network.
Optionally, the computer device inputs the sample features to the first gating network of the correlation degree estimation model to obtain the respective feature weights of the first scene feature extraction network, the second scene feature extraction network and the multi-scene common feature extraction network. The computer device then performs weighted summation of the characteristic features output by the first scene feature extraction network, the characteristic features output by the second scene feature extraction network and the common features output by the multi-scene common feature extraction network according to their respective feature weights, to obtain the multi-scene features.
It should be noted that, by learning the feature weights of the first scene feature extraction network, the second scene feature extraction network and the multi-scene common feature extraction network through the first gating network, it is ensured that, when the training samples of one scene are sparse, learning can still proceed from the rich training samples of the other scene: information that mutually benefits the scenes is retained, while information that contradicts between other scenes and the scene to which the training sample belongs is discarded, thereby alleviating the problem of sparse training samples for a given scene.
In this embodiment, the feature weights of the first scene feature extraction network, the second scene feature extraction network, and the multi-scene common feature extraction network can be adaptively adjusted through the sample feature and the first gating network. In this way, the multi-scene feature including the scene characteristic information and the commonality information can be obtained based on the respective feature weights. Based on the multi-scene features, the probability that the training sample belongs to each preset correlation degree category is accurately estimated subsequently. Thus, the estimated correlation degree loss and the classification correlation loss can be accurately reflected. Based on the method, the pre-estimation loss of the correlation degree and the classification loss of the correlation are combined to obtain a trained correlation degree pre-estimation model capable of accurately pre-estimating the score of the video correlation degree, so that the accuracy of sequencing the videos according to the correlation degree is ensured.
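A minimal sketch of this gating step follows (names and sizes are hypothetical; a softmax over a linear projection of the sample features is one common way to produce weights that sum to one, though the text does not fix the gate's internal form):

```python
import numpy as np

def softmax(z):
    e = np.exp(z - z.max())  # subtract max for numerical stability
    return e / e.sum()

rng = np.random.default_rng(1)
feature_dim, hidden_dim = 8, 4  # illustrative sizes

sample_features = rng.normal(size=(feature_dim,))
# Stand-ins for characteristic feature 1, characteristic feature 2 and the
# common feature produced by the three extraction networks.
extracted_features = rng.normal(size=(3, hidden_dim))

# First gating network: a linear layer over the sample features, normalised
# into one feature weight per extraction network.
gate_matrix = rng.normal(size=(feature_dim, 3))
feature_weights = softmax(sample_features @ gate_matrix)

# Weighted summation of the three extracted features yields the multi-scene feature.
multi_scene_feature = feature_weights @ extracted_features
```

Because the weights are recomputed from each sample's own features, the mixture adapts per training sample rather than being a fixed average of the three networks.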
In some embodiments, the explicit scene feature extraction network comprises a first explicit scene feature extraction network and a second explicit scene feature extraction network; the method comprises the steps of respectively carrying out feature extraction on the multi-scene features through an explicit scene feature extraction network of a correlation degree prediction model to obtain explicit scene features corresponding to each scene, and comprises the following steps: extracting explicit scene characteristics corresponding to a first scene through a first explicit scene characteristic extraction network; and extracting the explicit scene characteristics corresponding to the second scene through the second explicit scene characteristic extraction network.
Optionally, the computer device inputs the multi-scene features to the first explicit scene feature extraction network and extracts the explicit scene features corresponding to the first scene. The computer device inputs the multi-scene features to the second explicit scene feature extraction network and extracts the explicit scene features corresponding to the second scene.
In this embodiment, explicit scene features matched with the first scene can be obtained through the first explicit scene feature extraction network, and explicit scene features matched with the second scene can be obtained through the second explicit scene feature extraction network. Thus, according to the explicit scene features corresponding to each scene, output features matched with the scene to which the sample video belongs can be obtained, and based on the output features, the probability that the training sample belongs to each predetermined correlation degree category can be accurately estimated, so that the correlation degree pre-estimation loss and the correlation classification loss are determined. On this basis, the two losses are combined to obtain a trained correlation degree estimation model capable of accurately estimating video correlation degree scores, thereby ensuring the accuracy of ranking videos by correlation degree.
In some embodiments, after step 604, the method further comprises: and determining the scene to which the sample video belongs in the training sample, and taking the scene to which the sample video belongs as a target scene. Scene attribute features of the target scene are determined based on the sample entry of the training sample and the video-related text.
The scene attribute features are features unique to the target scene. In contrast to the characteristic features of the scenes described above, which are high-dimensional features learned by the first scene feature extraction network and the second scene feature extraction network, the scene attribute features are explicit, low-dimensional features that actually exist for the target scene.
Optionally, the computer device extracts, for the target scene, scene attribute features under the target scene to which the sample video belongs through the scene attribute feature model based on the sample entry of the training sample and the video related text.
In this embodiment, by determining the target scene corresponding to the current training sample, the scene attribute features of the target scene are extracted in real time, and the unique scene features of the target scene can be accurately reflected based on the scene attribute features.
In some embodiments, after the scene attribute features of the target scene are determined, and where the explicit scene feature extraction network comprises a first explicit scene feature extraction network and a second explicit scene feature extraction network, performing feature extraction on the multi-scene features through the explicit scene feature extraction networks of the correlation degree prediction model to obtain the explicit scene features corresponding to each scene includes: the computer device fuses the scene attribute features and the multi-scene features to obtain fusion features; if the target scene is the first scene, the computer device inputs the fusion features to the first explicit scene feature extraction network to obtain the explicit scene features corresponding to the first scene, and inputs the multi-scene features to the second explicit scene feature extraction network to obtain the explicit scene features corresponding to the second scene.
If the target scene is the second scene, the computer equipment inputs the multi-scene characteristics into the first explicit scene characteristic extraction network to obtain explicit scene characteristics corresponding to the first scene. And the computer equipment inputs the fusion characteristics into a second explicit scene characteristic extraction network to obtain explicit scene characteristics corresponding to the second scene.
In this embodiment, the scene attribute features belonging to the target scene and the multi-scene features are fused, so that scene detail information is enhanced and fusion features are obtained. In this way, the extraction of the corresponding explicit scene features can be custom designed for the target scene, yielding explicit scene features of the target scene with richer scene detail information. Based on the explicit scene features of each scene, output features better matched with the scene to which the sample video belongs can then be obtained, and based on the output features, the probability that the training sample belongs to each predetermined correlation degree category can be accurately estimated. Thus, the correlation degree pre-estimation loss and the correlation classification loss can be accurately reflected. On this basis, the two losses are combined to obtain a trained correlation degree estimation model capable of accurately estimating video correlation degree scores, thereby ensuring the accuracy of ranking videos by correlation degree.
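The scene-conditional routing described above can be sketched as follows (function names and dimensions are hypothetical, and concatenation stands in for the fusion operator, which the text does not fix): only the branch matching the target scene receives the fusion feature, while the other branch keeps the plain multi-scene feature.

```python
import numpy as np

rng = np.random.default_rng(2)
multi_scene_feature = rng.normal(size=(4,))      # result of the earlier weighted summation
scene_attribute_feature = rng.normal(size=(2,))  # explicit low-dimensional attributes

def fuse(multi_scene, scene_attr):
    # Concatenation is one plausible fusion operator; the text leaves it open.
    return np.concatenate([multi_scene, scene_attr])

def explicit_network_inputs(target_scene, multi_scene, scene_attr):
    """Return the inputs of the first and second explicit scene feature
    extraction networks, in that order; only the target scene's branch
    receives the fusion feature."""
    fused = fuse(multi_scene, scene_attr)
    if target_scene == "first":
        return fused, multi_scene
    return multi_scene, fused

first_input, second_input = explicit_network_inputs(
    "first", multi_scene_feature, scene_attribute_feature)
```

Routing the enriched feature to a single branch keeps the scene detail information from leaking into the branch of the other scene, which is the point of making the fusion conditional on the target scene.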
In some embodiments, weighting and summing explicit scene features corresponding to each scene to obtain an output feature, including: determining the feature weights corresponding to the first explicit scene feature extraction network and the second explicit scene feature extraction network respectively according to the sample features through a second gating network of the correlation degree prediction model; and weighting and summing the explicit scene characteristics corresponding to the first scene and the explicit scene characteristics corresponding to the second scene according to the respective corresponding characteristic weights to obtain output characteristics.
The second gating network is used for learning the feature weights of the first explicit scene feature extraction network and the second explicit scene feature extraction network.
Optionally, the computer device inputs the sample feature into the second gating network of the correlation degree estimation model to obtain the respective feature weights of the first explicit scene feature extraction network and the second explicit scene feature extraction network. The computer device then computes a weighted sum of the explicit scene features output by the first explicit scene feature extraction network and those output by the second explicit scene feature extraction network, each multiplied by its corresponding feature weight, to obtain the output features.
It should be noted that, in order to prevent the scene feature information from being gradually submerged in the multi-layer information transmission, the input of the second gating network is a sample feature, that is, the original input of the correlation degree estimation model.
In this embodiment, by inputting the sample feature to the second gating network, it is ensured that the scene feature information is not lost due to multi-layer information transfer, and accuracy of the feature weights of the first explicit scene feature extraction network and the second explicit scene feature extraction network is ensured. Therefore, the probability that the training sample belongs to each preset correlation degree category can be estimated more accurately. Thus, the estimated correlation degree loss and the classification correlation loss can be accurately reflected. Based on the method, the pre-estimation loss of the correlation degree and the classification loss of the correlation are combined to obtain a trained correlation degree pre-estimation model capable of accurately pre-estimating the score of the video correlation degree, so that the accuracy of sequencing the videos according to the correlation degree is ensured.
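The gated weighted summation described above can be sketched as follows. This is a minimal NumPy illustration, assuming two explicit-scene expert networks and a softmax gate; the expert functions, dimensions, and names are illustrative assumptions, not taken from the patent.

```python
import numpy as np

def softmax(x):
    # numerically stable softmax
    e = np.exp(x - x.max())
    return e / e.sum()

def gated_fusion(sample_feature, experts, gate_matrix):
    """Weight expert outputs by a gate driven by the raw sample feature.

    Feeding the ORIGINAL sample feature (not an intermediate layer output)
    to the gate is what the text describes: it keeps scene information from
    being diluted by multi-layer transmission.
    """
    logits = gate_matrix @ sample_feature            # gate logits from model input
    weights = softmax(logits)                        # one weight per expert
    outputs = [f(sample_feature) for f in experts]   # explicit scene features
    return sum(w * o for w, o in zip(weights, outputs))

rng = np.random.default_rng(0)
x = rng.normal(size=8)                 # sample feature (illustrative dim)
W = rng.normal(size=(2, 8))            # gate parameters for two experts
experts = [lambda v: v * 2.0, lambda v: v + 1.0]   # stand-in expert networks
out = gated_fusion(x, experts, W)      # fused output feature
```

Because the gate weights come from a softmax, the output is a convex combination of the expert outputs, so neither scene's explicit features can dominate unboundedly.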
In some embodiments, determining the predicted loss of relevance from the probability that the training sample belongs to the labeled relevance class comprises: calculating cross entropy corresponding to the correlation degree according to the probability that the training sample belongs to the labeling correlation degree category; taking the cross entropy corresponding to the correlation degree as the estimated loss of the correlation degree; the estimated loss of the degree of correlation is inversely related to the probability that the training sample belongs to the class of the degree of correlation of the label.
Optionally, for each training sample, the computer device determines a probability that the training sample belongs to the labeling correlation degree category, and calculates a first sub-cross entropy corresponding to the training sample. And the computer equipment superimposes the first sub-cross entropy corresponding to each training sample to obtain the cross entropy corresponding to the correlation degree. The computer equipment takes the cross entropy corresponding to the correlation degree as the estimated loss of the correlation degree. The estimated correlation loss is inversely correlated with the probability that each training sample belongs to the labeled correlation class.
For each training sample, the computer device illustratively treats the true probability that the training sample belongs to the labeled relevance class as a unit value, and treats the true probability of other predetermined relevance classes that do not belong to the labeled relevance class as zero. The computer equipment calculates cross entropy based on the estimated probability and true probability that the training sample belongs to each preset correlation degree category respectively, obtains a first sub cross entropy corresponding to the training sample, superimposes the first sub cross entropy corresponding to each training sample, obtains cross entropy corresponding to the correlation degree, and takes the cross entropy corresponding to the correlation degree as the estimated correlation degree loss.
Based on the estimated probability and each real probability that the training sample respectively belongs to each predetermined correlation degree category, performing cross entropy calculation to obtain a first sub-cross entropy corresponding to the training sample, including: for each predetermined degree of correlation class, the logarithm of the predetermined degree of correlation class is determined by a log-likelihood function based on the probability of the predetermined degree of correlation class. And calculating the negative number of the product of the logarithm corresponding to the same preset correlation degree category and the true probability, and superposing the negative numbers corresponding to the preset correlation degree categories to obtain the first sub-cross entropy corresponding to the training sample. Because the true probability of other preset correlation degree categories which do not belong to the annotation correlation degree category is regarded as zero, for each training sample, the negative number corresponding to the annotation correlation degree category is directly used as the first sub-cross entropy corresponding to the training sample.
For example, the estimated correlation degree loss L1 is determined by the following formula:

L1 = Σn( −(p̂i · log(pi)) )

where p̂i and pi are, respectively, the true probability and the estimated probability that training sample i belongs to its labeled correlation degree category; p̂i is constant at 1, and n is the number of training samples. The term −(p̂i · log(pi)) is the first sub-cross entropy corresponding to training sample i, and Σn(·) is the summation function over the n training samples.
In this embodiment, according to the probability that the training sample belongs to the labeled correlation degree category, the cross entropy corresponding to the correlation degree is accurately obtained, and the cross entropy corresponding to the correlation degree is directly used as the correlation degree prediction loss. The estimated loss of the degree of correlation is inversely related to the probability that the training sample belongs to the class of the degree of correlation of the label. In this way, model training is performed based on the estimated loss of correlation, so that the model can distinguish scores of different correlation degrees. Therefore, the subsequent joint correlation classification loss and correlation degree prediction loss are used for obtaining a trained correlation degree prediction model capable of accurately predicting the score of the video correlation degree, so that the accuracy of sequencing the videos according to the correlation degree is ensured.
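Because the true distribution is one-hot (probability 1 on the labeled category, 0 elsewhere), each sample's first sub-cross entropy reduces to the negative log of the predicted probability of the labeled category. A minimal NumPy sketch, with illustrative names not taken from the patent:

```python
import numpy as np

def relevance_ce_loss(pred_probs, label_idx):
    """Estimated correlation degree loss L1.

    pred_probs: (n_samples, n_classes) predicted class probabilities.
    label_idx:  (n_samples,) index of each sample's labeled category.
    With one-hot targets, each first sub-cross entropy is -log(p_label),
    and L1 is the sum over all training samples.
    """
    p = pred_probs[np.arange(len(label_idx)), label_idx]
    return float(-np.log(p).sum())

probs = np.array([[0.70, 0.20, 0.05, 0.05],
                  [0.10, 0.60, 0.20, 0.10]])
labels = np.array([0, 1])        # labeled correlation degree categories
loss = relevance_ce_loss(probs, labels)
```

As the text states, the loss is negatively correlated with the probability assigned to the labeled category: raising p_label lowers the loss.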
In some embodiments, determining the relevance classification penalty from the probabilities that the training samples belong to respective predetermined relevance classes of the relevance classifications corresponding to the labeled relevance classes includes: determining the relevance classification of the training sample according to the labeling relevance degree category; superposing probabilities that the training samples belong to each preset correlation degree category of the correlation classification to obtain correlation probabilities; determining a relevance classification loss according to the relevance probability; the relevance classification loss is inversely related to the relevance probability.
Optionally, for each training sample, the computer device determines, according to the label correlation degree category corresponding to the training sample, a correlation class to which the label correlation degree category belongs. The computer equipment determines the probability of each preset correlation degree category belonging to the correlation classification, and superimposes the probabilities of the training samples belonging to each preset correlation degree category of the correlation classification to obtain the correlation probability corresponding to the training samples. And the computer equipment calculates a second sub-cross entropy corresponding to the training sample according to the correlation probability. And superposing a second sub-cross entropy corresponding to each training sample by the computer equipment to obtain the correlation classification loss. The correlation classification loss and the correlation probability corresponding to each training sample are all in negative correlation.
For each training sample, the computer equipment takes the relevance classification of the labeling relevance class as the target relevance classification according to the labeling relevance class corresponding to the training sample. The computer equipment determines the probability that the training sample belongs to each preset correlation degree category of the target correlation classification, and superimposes the probabilities that the training sample belongs to each preset correlation degree category of the target correlation classification to obtain a first correlation probability of the target correlation category corresponding to the training sample. The computer equipment superimposes probabilities of all preset correlation degree categories which do not belong to the target correlation classification, and a second correlation probability corresponding to the training sample is obtained.
The computer equipment calculates the logarithm corresponding to each of the first correlation probability and the second correlation probability through a log likelihood function, calculates the negative number of the product of the real correlation probability corresponding to the first correlation probability and the logarithm, obtains a first negative number, and calculates the negative number of the product of the real correlation probability corresponding to the second correlation probability and the logarithm, and obtains a second negative number. Since the computer device regards the true correlation probability of the target correlation class corresponding to the training sample as a unit value, the true correlation probability that is not the target correlation class is treated as zero. The computer device directly uses the first negative number as a second sub-cross entropy corresponding to the training sample. And superposing a second sub-cross entropy corresponding to each training sample by the computer equipment to obtain the correlation classification loss.
For example, the correlation classification loss L2 is determined using the following formula:

L2 = Σn( −(P̂j · log(Pj)) )

where Pj and P̂j are, respectively, the first correlation probability of training sample j and the true correlation probability corresponding to it; P̂j is constant at 1, and n is the number of training samples. The term −(P̂j · log(Pj)) is the second sub-cross entropy corresponding to training sample j, and Σn(·) is the summation function. For example, for training sample i, the labeled correlation degree category is very relevant, and the corresponding correlation classification is relevant. The computer device superimposes the very relevant probability and the relatively relevant probability of training sample i to obtain the first correlation probability, and determines the second sub-cross entropy for training sample i based on the first correlation probability. The computer device calculates the first sub-cross entropy for training sample i based on the very relevant probability alone.
In this embodiment, the correlation classification of the training sample is determined according to the labeled correlation degree category; the probabilities that the training sample belongs to each predetermined correlation degree category of that correlation classification are superimposed to obtain the correlation probability; and the correlation classification loss is determined from the correlation probability, with which it is negatively correlated. In this way, the correlation classification loss improves the global physical meaning of the model's scores: scores should be as similar as possible when the correlation classifications are the same. The correlation degree estimation loss and the correlation classification loss are therefore combined to train the correlation degree estimation model, and the trained model is used for video correlation degree ranking. On this basis, the trained correlation degree estimation model improves the discriminability of the scores while also improving their global physical meaning and robustness, ensuring the accuracy of ranking videos by correlation degree.
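The classification loss can be sketched in NumPy as follows, assuming a hypothetical grouping of the four predetermined degree categories into two correlation classifications (indices 0 and 1 as relevant, 2 and 3 as irrelevant); the grouping and names are illustrative assumptions:

```python
import numpy as np

# hypothetical grouping: category indices 0,1 -> "relevant", 2,3 -> "irrelevant"
RELEVANT, IRRELEVANT = (0, 1), (2, 3)

def relevance_cls_loss(pred_probs, label_idx):
    """Correlation classification loss L2 (one-hot over the two groups)."""
    total = 0.0
    for probs, y in zip(pred_probs, label_idx):
        group = RELEVANT if y in RELEVANT else IRRELEVANT
        p_group = probs[list(group)].sum()   # first correlation probability
        total += -np.log(p_group)            # second sub-cross entropy
    return float(total)

# label "very relevant" (index 0): group probability is 0.5 + 0.3 = 0.8
probs = np.array([[0.5, 0.3, 0.1, 0.1]])
loss = relevance_cls_loss(probs, np.array([0]))
```

Note that the loss depends only on the summed group probability, which is why scores within the same correlation classification are pushed to behave similarly.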
In some embodiments, combining the degree of correlation pre-estimate loss and the degree of correlation classification loss, training a degree of correlation pre-estimate model, comprises: superposing the correlation degree pre-estimated loss and the correlation classification loss to obtain target loss; and training the correlation degree estimation model by taking the minimized target loss as a target to obtain a trained correlation degree estimation model.
Optionally, the computer device superimposes the correlation degree estimation loss and the correlation classification loss to obtain the target loss, and iteratively trains the correlation degree estimation model according to the target loss until the target loss tends to be minimized, thereby obtaining the trained correlation degree estimation model. Each round of training proceeds as follows: the computer device acquires a batch of training samples and, for each training sample, executes steps 204 to 206 to obtain the loss value of the target loss for the current iteration. If the loss value does not equal the preset value, the target loss is judged not to have reached its minimum; the model parameters are adjusted based on the current loss value to obtain an adjusted model, which serves as the correlation degree estimation model for the next round of training. The step of acquiring a batch of training samples is then repeated until the loss value of the target loss equals the preset value.
The preset value can be regarded as zero or a non-zero value infinitely close to 0, and if the loss value of the target loss corresponding to the current training is zero under the condition that the preset value is zero, the correlation degree estimation model corresponding to the current training is a trained correlation degree estimation model, and the correlation degree between the sample entry and the sample video in the training sample can be accurately estimated based on the trained correlation degree estimation model. For example, the labeled correlation degree category of the training sample 1 is completely correlated, and the correlation degree of the sample entry in the training sample 1 and the sample video can be accurately estimated to be completely correlated based on the correlation degree estimation model corresponding to the current training. Under the condition that the preset value is a non-zero value which is infinitely close to 0, if the loss value of the target loss corresponding to the current time reaches the preset value, the correlation degree prediction model corresponding to the current time training is a trained correlation degree prediction model, and the correlation classification corresponding to the sample entry and the sample video in the training sample can be accurately predicted based on the trained correlation degree prediction model. For example, the labeled degree of correlation class of the training sample 1 is a complete correlation, and the correlation class is a correlation. At this time, the correlation degree estimation model corresponding to the current training may estimate that the correlation degree between the sample entry and the sample video in the training sample 1 is relatively relevant, and although there is an error compared with labeling the correlation degree category, the accuracy of the correlation classification is ensured, that is, the correlation is still relevant.
In the present embodiment, the target loss is obtained by superimposing the correlation degree estimation loss and the correlation classification loss, and training of the correlation degree estimation model is completed with minimizing the target loss as the objective. In this way, the trained correlation degree estimation model optimizes the positive-to-negative ordering ratio metric without reducing the accuracy of correlation classification, improving the discriminability of the scores while also improving their global physical meaning and robustness. This ensures the accuracy of ranking videos by correlation degree and promotes consumption of both long and short videos.
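The superposition of the two losses can be sketched as follows, assuming one-hot degree labels and a hypothetical two-way grouping of the four degree categories (indices 0,1 as relevant, 2,3 as irrelevant); all names and groupings are illustrative:

```python
import numpy as np

REL, IRREL = (0, 1), (2, 3)   # hypothetical grouping into two classifications

def target_loss(pred_probs, labels):
    """Target loss = degree estimation loss L1 + classification loss L2."""
    idx = np.arange(len(labels))
    # L1: one-hot cross entropy over the four degree categories
    l1 = -np.log(pred_probs[idx, labels]).sum()
    # L2: cross entropy over the summed probability of the labeled group
    group_p = np.array([pred_probs[i, list(REL if y in REL else IRREL)].sum()
                        for i, y in zip(idx, labels)])
    l2 = -np.log(group_p).sum()
    return float(l1 + l2)

# a perfect prediction drives the target loss to the preset value of zero
perfect = target_loss(np.array([[1.0, 0.0, 0.0, 0.0]]), np.array([0]))
imperfect = target_loss(np.array([[0.5, 0.3, 0.1, 0.1]]), np.array([0]))
```

In an actual training loop this scalar would drive gradient updates of the model parameters each round, as described above.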
In some embodiments, as shown in fig. 8, a flow chart of the ranking step of candidate videos in one embodiment is shown. The method further comprises the steps of:
step 802, obtaining an input entry and video related text of each candidate video.
Optionally, the computer device obtains an input term input by the user, and according to the input term, the computer device sequentially performs recall operation and coarse-ranking operation based on a plurality of videos stored in advance in the video library, so as to obtain each candidate video. The computer device obtains video related text for each candidate video.
Step 804, for each candidate video, determining a correlation degree estimation result between the input vocabulary entry and the candidate video according to the input vocabulary entry, the candidate video and the video related text of the candidate video by using the trained correlation degree estimation model, where the correlation degree estimation result includes probabilities that the correlation degree between the input vocabulary entry and the candidate video is respectively in each predetermined correlation degree category.
Optionally, the computer device obtains a trained correlation degree estimation model, and for each candidate video, the computer device determines target features corresponding to the input vocabulary entry and the candidate video through multi-level semantic matching based on the input vocabulary entry, the candidate video and video related text of the candidate video. The computer equipment inputs the target feature into a trained correlation degree estimation model to obtain a correlation degree estimation result between the input entry and the candidate video. The correlation degree estimation result comprises probabilities that the correlation degree categories between the input entry and the candidate video are respectively preset correlation degree categories.
For each candidate video, the computer device determines a semantic matching feature corresponding to the candidate video from the video-related text of the input term and the candidate video, the semantic matching feature corresponding to the candidate video characterizing a correlation of the input term and the video-related text of the candidate video. The computer equipment determines the text statistical characteristics corresponding to the candidate video according to the text related to the video of the input entry and the candidate video. The text statistical features corresponding to the candidate video comprise basic text features, posterior features, knowledge features and text category features corresponding to the candidate video.
The computer device determines a multimodal feature corresponding to the candidate video based on a relevance score of the input term and the candidate video, and a relevance score of the input term and video-related text of the candidate video.
And the computer equipment splices the semantic matching features, the text statistical features and the multi-modal features corresponding to the candidate video to obtain target features corresponding to the input entry and the candidate video. The computer equipment inputs the target feature into a trained correlation degree estimation model to obtain a correlation degree estimation result between the input entry and the candidate video. For example, for candidate video 1, the estimated relevance class result 1 between the input term and candidate video 1 includes probabilities that the relevance classes between the input term and candidate video 1 are each estimated relevance classes.
Step 806, determining a correlation degree score between the input vocabulary entry and the candidate videos according to the probability that the correlation degree between the input vocabulary entry and the candidate videos is respectively in each predetermined correlation degree category, and sorting the candidate videos according to the correlation degree score.
For each candidate video, if the correlation degree score between the input entry and the candidate video is higher, the input entry and the candidate video are more relevant, and the corresponding correlation degree is higher.
Optionally, the computer device obtains scoring weights corresponding to each predetermined relevance class. The computer equipment fuses the probabilities of the predetermined correlation degree categories according to the probability that the correlation degree between the input entry and each candidate video is the predetermined correlation degree category and the corresponding scoring weight of the predetermined correlation degree category, so as to obtain a correlation degree score. The degree of correlation characterized by each predetermined degree of correlation class is positively correlated with the scoring weight corresponding to the predetermined degree of correlation. That is, the higher the degree of correlation between the input term characterized by a predetermined degree of correlation class and the candidate video, the higher the scoring weight of the predetermined degree of correlation class. The computer device ranks the candidate videos according to their corresponding relevance scores.
Illustratively, there are four predetermined correlation degree categories. For candidate video 1, after determining the probabilities that the correlation degree between the input term and candidate video 1 falls into each predetermined category, the computer device denotes the probabilities of being very relevant, relatively relevant, weakly relevant, and completely irrelevant as P1, P2, P3, and P4, respectively. The scoring weights for very relevant, relatively relevant, weakly relevant, and completely irrelevant are, in order, 1, 2/3, 1/4, and 0. The computer device computes the weighted sum of the category probabilities using these scoring weights to obtain the correlation degree score between the input term and candidate video 1, namely 1·P1 + (2/3)·P2 + (1/4)·P3 + 0·P4.
It will be appreciated that the higher the correlation between the candidate video 1 and the input term, the higher the probability of being highly correlated, i.e. the higher the probability of being higher than the other three predetermined correlation categories, and the higher the correlation score obtained. If the degree of correlation between the candidate video 1 and the input term is lower, the probability of complete irrelevance is higher, namely, the probability is higher than the probability of other three predetermined correlation degree categories, and the obtained correlation degree score is lower. Based on this, the degree of correlation score is calculated by the scoring weight corresponding to each predetermined degree of correlation class, and it is possible to ensure uniformity of degree of correlation scoring.
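The weighted scoring and ranking above can be sketched as follows; the weights mirror the example in the text (very relevant = 1, relatively relevant = 2/3, weakly relevant = 1/4, completely irrelevant = 0), and the video names are illustrative:

```python
def relevance_score(probs, weights=(1.0, 2/3, 1/4, 0.0)):
    """Weighted sum of per-category probabilities, as in the text's example."""
    return sum(w * p for w, p in zip(weights, probs))

# a mostly-"very relevant" video should outrank a mostly-"irrelevant" one
score_a = relevance_score((0.80, 0.10, 0.05, 0.05))
score_b = relevance_score((0.05, 0.05, 0.10, 0.80))
ranking = sorted({"video_a": score_a, "video_b": score_b}.items(),
                 key=lambda kv: kv[1], reverse=True)
```

Because the weights decrease monotonically with decreasing relevance, probability mass shifted toward more relevant categories always raises the score, which gives the scoring the uniformity the text describes.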
After ranking the candidate videos, the computer device screens out a plurality of first videos from the candidate videos for scenes of the long video. Each first video is a long video. The computer device uses the first video having a correlation degree score greater than a threshold value as an exposure video in a scene of the long video. For scenes of the short video, the computer device screens out a plurality of second videos from the candidate videos. Each of the second videos is a short video. And for each second video, the computer equipment respectively re-scores the second video through each preset scoring model to obtain each preset score between the input entry and the second video. The computer equipment acquires the sorting fusion factors of the preset scoring models and the sorting fusion factors of the relevant degree scores, determines the final score of each second video according to the sorting fusion factors of the preset scoring models, the sorting fusion factors of the relevant degree scores, the preset scores of the preset scoring models and the relevant degree scores, performs rearrangement operation based on the final score of each second video, and obtains the final video after rearrangement operation screening, wherein the final video is used as an exposure video in a scene of a short video.
In this embodiment, by inputting an entry, video-related text for each candidate video of the fine-ranking stage is determined. For each candidate video, the correlation degree estimation result between the input vocabulary entry and the candidate video can be accurately estimated through the trained correlation degree estimation model, so that the probability that the correlation degree categories between the input vocabulary entry and the candidate video are respectively of preset correlation degree categories can be intuitively and accurately reflected, the correlation degree score between the input vocabulary entry and the candidate video can be accurately obtained, and further, the candidate videos corresponding to the same input vocabulary entry can be accurately ranked.
The application also provides an application scene, which applies the training method of the correlation degree estimation model. Specifically, the application of the training method of the correlation degree estimation model in the application scene is as follows: in the episode recommendation scene, aiming at an input entry for episode search input by any user, for pushing out an exposure video with high correlation degree with the input entry for the user, the computer equipment adopts the training method of the correlation degree estimation model provided by the application to train the correlation degree estimation model. Specifically, a computer device acquires a training sample; each training sample comprises a sample entry, a video related text of a sample video and a labeled degree of correlation class, wherein the labeled degree of correlation class represents the degree of correlation between the sample entry and the sample video, the labeled degree of correlation class belongs to one of various preset degree of correlation classes, and each preset degree of correlation class corresponds to a correlation class; determining sample characteristics according to sample entries, video related texts and sample videos, and estimating the probability that training samples respectively belong to each preset degree of correlation category according to the sample characteristics through a degree of correlation estimation model; determining a correlation degree estimated loss according to the probability that the training sample belongs to the labeling correlation degree category; determining a relevance classification loss according to the probability that the training sample belongs to each preset relevance class of the relevance classification corresponding to the labeling relevance class; and combining the correlation degree pre-estimation loss and the correlation classification loss, training a correlation degree pre-estimation model, and obtaining a trained 
correlation degree pre-estimation model. In this way, the computer equipment accurately and rapidly completes the sequencing of the video correlation degree through the trained correlation degree prediction model by acquiring the input entry input by a certain user in real time so as to determine the exposure video finally used for exposure.
Of course, the method for training the correlation degree estimation model is not limited to the method, and the method can be applied to other application scenes, for example, in an audio/video pushing scene, aiming at input entries of a search music video album input by a user, and pushing matched exposure videos for the user through the trained correlation degree estimation model. Based on the method, the trained correlation degree prediction model capable of accurately sequencing the correlation of the music video can be obtained through the training method of the correlation degree prediction model. In this way, the computer equipment sorts the video correlation degree of each candidate audio and video by acquiring an input entry input by a certain user at the audio and video client and using the trained correlation degree prediction model, so as to obtain the exposed audio and video finally used for exposure.
The above application scenario is only illustrative, and it can be understood that the application of the training method of the correlation degree estimation model provided by the embodiments of the present application is not limited to the above scenario.
In a specific embodiment, the application provides a training method of a correlation degree estimation model. The method is performed by a computer device. Fig. 9 is a schematic flow chart of a training method of a correlation degree estimation model in an embodiment, which specifically includes the following steps:
The first step: obtaining a training sample; each training sample comprises a sample entry, a video related text of the sample video and a marked related degree category, the marked related degree category represents the related degree between the sample entry and the sample video, the marked related degree category belongs to one of the preset related degree categories, and the preset related degree categories are correspondingly provided with related classifications.
And a second step of: according to the sample entry and the video related text, semantic matching features are determined using the coding layer of the semantic matching feature extraction model; the semantic matching features characterize the correlation between the sample entry and the video related text. For example, the sample entry and the video related text are each processed into word sequences. Each word sequence is preceded by a special classification symbol [CLS] marking the beginning of the sequence, and a special symbol [SEP] is embedded after its last position marking the end. For example, the word sequence of the sample entry is [CLS] q1 q2 … [SEP], and the word sequence of the video related text is [CLS] d1 d2 … [SEP]. Based on these two word sequences, the coding layer of the semantic matching feature extraction model produces word sequence vector 1 after max pooling and word sequence vector 2 after average pooling. The [CLS] vector, word sequence vector 1, and word sequence vector 2 are spliced to obtain the semantic matching features. Text statistical features are determined according to the sample entry and the video related text. Multi-modal features are determined according to the correlation scores of the sample entry with the sample video and of the sample entry with the video related text. The semantic matching features, text statistical features, and multi-modal features are spliced to obtain the sample features.
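The pooling-and-splicing step can be sketched in NumPy as follows; the vector dimensions and token values are illustrative assumptions, not taken from the patent:

```python
import numpy as np

def semantic_matching_feature(cls_vec, token_vecs):
    """Splice the [CLS] vector with max-pooled and average-pooled token vectors."""
    v_max = token_vecs.max(axis=0)    # word sequence vector 1 (max pooling)
    v_mean = token_vecs.mean(axis=0)  # word sequence vector 2 (average pooling)
    return np.concatenate([cls_vec, v_max, v_mean])

cls_vec = np.zeros(4)                            # stand-in [CLS] embedding
tokens = np.array([[1.0, 2.0, 3.0, 4.0],         # stand-in token embeddings
                   [3.0, 0.0, 1.0, 2.0]])
feat = semantic_matching_feature(cls_vec, tokens)
```

The resulting feature is three times the embedding width, since the three vectors are concatenated rather than summed.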
And a third step of: the multi-scene feature extraction network comprises a first scene feature extraction network, a second scene feature extraction network and a multi-scene common feature extraction network; extracting characteristic features corresponding to the first scene through a first scene feature extraction network; extracting characteristic features corresponding to the second scene through a second scene feature extraction network; and extracting the common characteristics corresponding to the first scene and the second scene through a multi-scene common characteristic extraction network. Determining the feature weights corresponding to the first scene feature extraction network, the second scene feature extraction network and the multi-scene common feature extraction network respectively according to the sample features through a first gating network of the correlation degree prediction model; and weighting and summing the characteristic features corresponding to the first scene, the characteristic features corresponding to the second scene and the common characteristic according to the respective corresponding characteristic weights to obtain the multi-scene features.
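The third step amounts to a gated mixture over scene-specific and shared feature extractors. The following is a hedged sketch under assumed shapes: the linear-plus-tanh experts and the linear gating matrix merely stand in for the learned first and second scene feature extraction networks, the multi-scene common feature extraction network, and the first gating network.

```python
import numpy as np

def softmax(x):
    e = np.exp(x - x.max())
    return e / e.sum()

def multi_scene_feature(sample_feat, experts, gate_w):
    """Weighted sum of expert outputs under a gating network (sketch).

    experts: [W_scene1, W_scene2, W_common] — stand-ins for the two
    scene-specific networks and the common network.
    gate_w: maps sample features to one weight per expert.
    """
    expert_outs = [np.tanh(W @ sample_feat) for W in experts]  # per-network features
    weights = softmax(gate_w @ sample_feat)                    # first gating network
    return sum(w * out for w, out in zip(weights, expert_outs))

rng = np.random.default_rng(0)
sample_feat = rng.normal(size=8)
experts = [rng.normal(size=(6, 8)) for _ in range(3)]
gate_w = rng.normal(size=(3, 8))
ms_feat = multi_scene_feature(sample_feat, experts, gate_w)
print(ms_feat.shape)  # (6,)
```

Because the gating weights are a softmax, the multi-scene feature is a convex combination of the expert outputs.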
Fourth step: the explicit scene feature extraction network comprises a first explicit scene feature extraction network and a second explicit scene feature extraction network; extracting explicit scene characteristics corresponding to a first scene through a first explicit scene characteristic extraction network; and extracting the explicit scene characteristics corresponding to the second scene through the second explicit scene characteristic extraction network. Determining the feature weights corresponding to the first explicit scene feature extraction network and the second explicit scene feature extraction network respectively according to the sample features through a second gating network of the correlation degree prediction model; and weighting and summing the explicit scene characteristics corresponding to the first scene and the explicit scene characteristics corresponding to the second scene according to the respective corresponding characteristic weights to obtain output characteristics. The output features may be considered as probability density distributions for training samples belonging to respective predetermined correlation classes. And converting the probability density distribution into probability distribution of the training samples belonging to each preset correlation degree category through an activation function, and obtaining probability of the training samples belonging to each preset correlation degree category according to the probability distribution.
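The fourth step can be sketched the same way: a second gating network weights the two explicit-scene outputs, and an activation function turns the resulting output features into a probability distribution over the predetermined correlation degree categories. The linear scene networks, the softmax choice of activation, and the four-category setup below are all assumptions for illustration.

```python
import numpy as np

def softmax(x):
    e = np.exp(x - x.max())
    return e / e.sum()

def class_probabilities(ms_feat, scene_nets, gate_w):
    """Second gating stage, then activation to per-category probabilities.

    scene_nets: [W_scene1, W_scene2] mapping multi-scene features to one
    logit per predetermined correlation degree category (illustrative).
    """
    outs = [W @ ms_feat for W in scene_nets]  # explicit scene features
    weights = softmax(gate_w @ ms_feat)       # second gating network
    output_feat = sum(w * o for w, o in zip(weights, outs))
    return softmax(output_feat)               # probability per category

rng = np.random.default_rng(1)
ms_feat = rng.normal(size=6)
scene_nets = [rng.normal(size=(4, 6)) for _ in range(2)]  # 4 categories assumed
gate_w = rng.normal(size=(2, 6))
probs = class_probabilities(ms_feat, scene_nets, gate_w)
print(round(probs.sum(), 6))  # 1.0
```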
Fifth step: calculating cross entropy corresponding to the correlation degree according to the probability that the training sample belongs to the labeling correlation degree category; taking the cross entropy corresponding to the correlation degree as the estimated loss of the correlation degree; the estimated loss of correlation is inversely related to the probability of belonging to the class of annotation correlation. Determining the relevance classification of the training sample according to the labeling relevance degree category; superposing probabilities that the training samples belong to each preset correlation degree category of the correlation classification to obtain correlation probabilities; determining a relevance classification loss according to the relevance probability; the relevance classification loss is inversely related to the relevance probability. Superposing the correlation degree pre-estimated loss and the correlation classification loss to obtain target loss; and training the correlation degree estimation model by taking the minimized target loss as a target to obtain a trained correlation degree estimation model.
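The target loss of the fifth step — cross entropy on the labeled category, plus a classification term built by superposing the probabilities of all categories in the label's relevance classification — can be written out as follows. The four-category setup and the grouping of categories {2, 3} as "relevant" are assumptions; the patent does not fix those numbers.

```python
import numpy as np

# Assumed: 4 predetermined correlation degree categories, with {2, 3}
# forming the "relevant" classification and {0, 1} the "irrelevant" one.
RELEVANT = {2, 3}

def target_loss(probs, label_cat):
    """Correlation degree cross-entropy plus relevance classification loss."""
    # Correlation degree estimated loss: -log P(labeled category)
    degree_loss = -np.log(probs[label_cat])
    # Superpose probabilities of categories sharing the label's classification
    group = RELEVANT if label_cat in RELEVANT else set(range(len(probs))) - RELEVANT
    relevance_prob = sum(probs[c] for c in group)
    class_loss = -np.log(relevance_prob)
    return degree_loss + class_loss  # superposed target loss

probs = np.array([0.1, 0.2, 0.3, 0.4])
loss = target_loss(probs, label_cat=3)
```

Both terms are inversely related to their probabilities, as the text states: the loss shrinks as more mass lands on the labeled category and on its relevance classification.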
After the trained correlation degree estimation model is obtained, as shown in fig. 10, a schematic diagram of video correlation ranking in an embodiment is shown. The target user inputs an entry in a search box of the video client. After acquiring the input entry, the computer device obtains the intention information of the target user through query processing in the entry understanding stage. Based on the intention information, the computer device recalls a plurality of videos from the video library through the recall processing of the recall stage. The recalled videos are screened through the coarse ranking processing of the coarse ranking stage to obtain the video related text of each candidate video corresponding to the input entry. The computer device then determines a correlation degree score for each candidate video and ranks the candidate videos in the fine ranking stage. The fine ranking stage proceeds as follows: for each candidate video, a correlation degree estimation result between the input entry and the candidate video is determined according to the trained correlation degree estimation model, the input entry, the candidate video and the video related text of the candidate video, wherein the correlation degree estimation result comprises the probabilities of belonging to each predetermined correlation degree category; a correlation degree score between the input entry and the candidate video is determined according to those probabilities, and the candidate videos are ranked according to their correlation degree scores. For scenes of long videos, the computer device screens out a plurality of first videos from the candidate videos, each first video being a long video.
The computer device uses the first videos having a correlation degree score greater than a threshold value as exposure videos in the scene of the long video. For scenes of the short video, the computer device screens out a plurality of second videos from the candidate videos, each second video being a short video. The computer device screens the second videos based on their correlation degree scores, and performs a rearrangement operation on the screened second videos; the final videos retained by the rearrangement operation are used as exposure videos in the scene of the short video.
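The ranking step above collapses the per-category probabilities into a single correlation degree score. The patent does not specify the exact reduction; one plausible choice, shown here purely as an assumption, is the probability-weighted expectation over ascending category values.

```python
import numpy as np

def correlation_score(probs, category_values=None):
    """Collapse per-category probabilities into one ranking score.

    The expectation over assumed category values 0..K-1 is one plausible
    reduction; the patent itself does not fix this formula.
    """
    probs = np.asarray(probs, dtype=float)
    if category_values is None:
        category_values = np.arange(len(probs))
    return float(probs @ category_values)

candidates = {
    "video_a": [0.1, 0.2, 0.3, 0.4],  # mass skewed toward relevant categories
    "video_b": [0.5, 0.3, 0.1, 0.1],  # mass skewed toward irrelevant categories
}
ranked = sorted(candidates, key=lambda v: correlation_score(candidates[v]), reverse=True)
print(ranked)  # ['video_a', 'video_b']
```

Under this reduction, a video whose probability mass sits on the higher correlation categories outranks one whose mass sits on the lower ones.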
In this embodiment, training samples are obtained. Each training sample comprises a sample entry, a video related text of a sample video and a labeled correlation degree category; the labeled correlation degree category represents the degree of correlation between the sample entry and the sample video and belongs to one of several predetermined correlation degree categories, and each predetermined correlation degree category corresponds to a relevance classification, which increases the degree of distinction between relevant and irrelevant entry-video pairs in the training samples. Sample features are determined according to the sample entry, the video related text and the sample video, and the probabilities that the training sample belongs to each predetermined correlation degree category are estimated intuitively and accurately from the sample features through the correlation degree estimation model. The correlation degree estimated loss is determined according to the probability that the training sample belongs to the labeled correlation degree category, so that training based on this loss enables the model to assign distinguishable scores to different correlation degrees.
The relevance classification loss is determined according to the probabilities that the training sample belongs to each predetermined correlation degree category of the relevance classification corresponding to the labeled correlation degree category, so that training based on this loss improves the global physical meaning of the model's scores, i.e., scores are as similar as possible when the relevance classifications are the same. The correlation degree estimated loss and the relevance classification loss are combined to train the correlation degree estimation model, and the trained model is used for video correlation degree ranking. On this basis, the trained correlation degree estimation model improves the global physical meaning of the scores while improving the degree of distinction between correlation degrees, and is more robust, thereby ensuring the accuracy of ranking videos by correlation degree, ultimately improving the click-through rate of the overall push results and the user experience.
It should be understood that, although the steps in the flowcharts of the above embodiments are shown sequentially as indicated by the arrows, these steps are not necessarily performed in that order. Unless explicitly stated herein, the order of execution of the steps is not strictly limited, and the steps may be performed in other orders. Moreover, at least some of the steps in those flowcharts may include multiple sub-steps or stages, which are not necessarily performed at the same moment but may be performed at different moments, and which are not necessarily performed sequentially but may be performed in turn or alternately with at least some of the other steps or sub-steps.
Based on the same inventive concept, an embodiment of the application also provides a training apparatus for the correlation degree estimation model, for implementing the above training method of the correlation degree estimation model. The implementation scheme provided by the apparatus is similar to that described in the above method; therefore, for the specific limitations in the one or more embodiments of the training apparatus provided below, reference may be made to the limitations of the training method of the correlation degree estimation model above, which will not be repeated here.
In one embodiment, as shown in fig. 11, a training apparatus 1100 for a correlation degree estimation model is provided, including: an acquisition module 1102, a probability estimation module 1104, a correlation degree estimation loss determination module 1106, a correlation classification loss determination module 1108, and a training module 1110, wherein:
an obtaining module 1102, configured to obtain a training sample; each training sample comprises a sample entry, a video related text of a sample video and a labeled degree of correlation class, wherein the labeled degree of correlation class represents the degree of correlation between the sample entry and the sample video, the labeled degree of correlation class belongs to one of various preset degree of correlation classes, and each preset degree of correlation class corresponds to a correlation class;
the probability estimating module 1104 is configured to determine sample features according to the sample entry, the video related text and the sample video, and estimate, according to the sample features, probabilities that the training samples respectively belong to each predetermined correlation degree category through a correlation degree estimating model;
the correlation degree estimated loss determination module 1106 is configured to determine a correlation degree estimated loss according to a probability that the training sample belongs to the labeled correlation degree category;
a relevance classification loss determination module 1108, configured to determine a relevance classification loss according to probabilities that the training sample belongs to each predetermined relevance class of the relevance classifications corresponding to the labeled relevance classes;
The training module 1110 is configured to combine the correlation degree estimated loss and the relevance classification loss to train the correlation degree estimation model, and to use the trained correlation degree estimation model for video correlation degree ranking.
In some embodiments, the probability estimation module 1104 is configured to determine a semantic matching feature according to the sample entry and the video related text, where the semantic matching feature characterizes a correlation between the sample entry and the video related text; determining text statistical characteristics according to the sample entry and the video related text; determining multi-modal characteristics according to the correlation scores of the sample entries and the sample videos and the correlation scores of the sample entries and the video related texts; and splicing the semantic matching features, the text statistical features and the multi-modal features to obtain sample features.
In some embodiments, the probability estimation module 1104 is configured to input the sample feature into the correlation estimation model; outputting probability density distribution of the training samples belonging to each preset correlation degree category through a correlation degree prediction model; and converting the probability density distribution into probability distribution of the training samples belonging to each preset correlation degree category through an activation function, and obtaining probability of the training samples belonging to each preset correlation degree category according to the probability distribution.
In some embodiments, the probability estimation module 1104 is configured to perform feature extraction on the sample features through a multi-scene feature extraction network of the correlation degree estimation model to obtain common features corresponding to multiple scenes and characteristic features corresponding to each scene; weighting and summing the common characteristics and the characteristic characteristics corresponding to each scene to obtain multi-scene characteristics; respectively carrying out feature extraction on the multi-scene features through an explicit scene feature extraction network of the correlation degree prediction model to obtain explicit scene features corresponding to each scene; carrying out weighted summation on the explicit scene characteristics corresponding to each scene to obtain output characteristics; and determining the probability that the training samples belong to each preset correlation degree category according to the output characteristics.
In some embodiments, the multi-scene feature extraction network comprises a first scene feature extraction network, a second scene feature extraction network, and a multi-scene common feature extraction network; the probability estimating module 1104 is configured to extract, through the first scene feature extraction network, a characteristic feature corresponding to the first scene; extracting characteristic features corresponding to the second scene through a second scene feature extraction network; and extracting the common characteristics corresponding to the first scene and the second scene through a multi-scene common characteristic extraction network.
In some embodiments, the probability estimation module 1104 is configured to determine, through a first gating network of the correlation degree estimation model, feature weights corresponding to the first scene feature extraction network, the second scene feature extraction network, and the multi-scene common feature extraction network according to the sample features; and weighting and summing the characteristic features corresponding to the first scene, the characteristic features corresponding to the second scene and the common characteristic according to the respective corresponding characteristic weights to obtain the multi-scene features.
In some embodiments, the explicit scene feature extraction network comprises a first explicit scene feature extraction network and a second explicit scene feature extraction network; the probability estimation module 1104 is configured to extract, through the first explicit scene feature extraction network, explicit scene features corresponding to the first scene; and extracting the explicit scene characteristics corresponding to the second scene through the second explicit scene characteristic extraction network.
In some embodiments, the probability estimation module 1104 is configured to determine, through a second gating network of the correlation degree estimation model, feature weights corresponding to the first explicit scene feature extraction network and the second explicit scene feature extraction network respectively according to the sample features; and weighting and summing the explicit scene characteristics corresponding to the first scene and the explicit scene characteristics corresponding to the second scene according to the respective corresponding characteristic weights to obtain output characteristics.
In some embodiments, the correlation degree prediction loss determination module 1106 is configured to calculate a cross entropy corresponding to the correlation degree according to a probability that the training sample belongs to the labeled correlation degree class; taking the cross entropy corresponding to the correlation degree as the estimated loss of the correlation degree; the estimated loss of the degree of correlation is inversely related to the probability that the training sample belongs to the class of the degree of correlation of the label.
In some embodiments, the relevance classification loss determination module 1108 is configured to determine a relevance classification of the training sample according to the labeled relevance class; superposing probabilities that the training samples belong to each preset correlation degree category of the correlation classification to obtain correlation probabilities; determining a relevance classification loss according to the relevance probability; the relevance classification loss is inversely related to the relevance probability.
In some embodiments, the training module 1110 is configured to superimpose the correlation degree predicted loss and the correlation classification loss to obtain a target loss; and training the correlation degree estimation model by taking the minimized target loss as a target to obtain a trained correlation degree estimation model.
In some embodiments, the apparatus further comprises a ranking module configured to obtain the input entry and the video related text of each candidate video; for each candidate video, determine a correlation degree estimation result between the input entry and the candidate video according to the trained correlation degree estimation model, the input entry, the candidate video and the video related text of the candidate video, wherein the correlation degree estimation result comprises the probabilities that the correlation degree between the input entry and the candidate video belongs to each predetermined correlation degree category; and determine a correlation degree score between the input entry and each candidate video according to those probabilities, and rank the candidate videos according to the correlation degree scores.
All or part of the modules in the training device of the correlation degree estimation model can be realized by software, hardware and a combination thereof. The above modules may be embedded in hardware or may be independent of a processor in the computer device, or may be stored in software in a memory in the computer device, so that the processor may call and execute operations corresponding to the above modules.
In one embodiment, a computer device is provided, which may be a server or a terminal, and the internal structure of the computer device may be as shown in fig. 12. The computer device includes a processor, a memory, an Input/Output interface (I/O) and a communication interface. The processor, the memory and the input/output interface are connected through a system bus, and the communication interface is connected to the system bus through the input/output interface. Wherein the processor of the computer device is configured to provide computing and control capabilities. The memory of the computer device includes a non-volatile storage medium and an internal memory. The non-volatile storage medium stores an operating system, computer programs, and a database. The internal memory provides an environment for the operation of the operating system and computer programs in the non-volatile storage media. The input/output interface of the computer device is used to exchange information between the processor and the external device. The communication interface of the computer device is used for communicating with an external terminal through a network connection. The computer program, when executed by a processor, implements a method for training a correlation degree estimation model.
It will be appreciated by those skilled in the art that the structure shown in FIG. 12 is merely a block diagram of some of the structures associated with the present inventive arrangements and is not limiting of the computer device to which the present inventive arrangements may be applied, and that a particular computer device may include more or fewer components than shown, or may combine some of the components, or have a different arrangement of components.
In an embodiment, there is also provided a computer device comprising a memory and a processor, the memory having stored therein a computer program, the processor implementing the steps of the method embodiments described above when the computer program is executed.
In one embodiment, a computer-readable storage medium is provided, on which a computer program is stored which, when executed by a processor, carries out the steps of the method embodiments described above.
In an embodiment, a computer program product is provided, comprising a computer program which, when executed by a processor, implements the steps of the method embodiments described above.
It should be noted that, the user information (including but not limited to user equipment information, user personal information, etc.) and the data (including but not limited to data for analysis, stored data, presented data, etc.) related to the present application are information and data authorized by the user or sufficiently authorized by each party, and the collection, use and processing of the related data need to comply with the related laws and regulations and standards of the related country and region.
Those skilled in the art will appreciate that implementing all or part of the above methods may be accomplished by instructing the relevant hardware through a computer program, which may be stored in a non-volatile computer-readable storage medium and, when executed, may include the flows of the embodiments of the above methods. Any reference to memory, database, or other medium used in the embodiments provided herein may include at least one of non-volatile and volatile memory. The non-volatile memory may include Read-Only Memory (ROM), magnetic tape, floppy disk, flash memory, optical memory, high-density embedded non-volatile memory, resistive memory (ReRAM), magnetoresistive random access memory (Magnetoresistive Random Access Memory, MRAM), ferroelectric memory (Ferroelectric Random Access Memory, FRAM), phase change memory (Phase Change Memory, PCM), graphene memory, and the like. The volatile memory may include Random Access Memory (RAM), external cache memory, and the like. By way of illustration and not limitation, the RAM may take many forms, such as Static Random Access Memory (SRAM) or Dynamic Random Access Memory (DRAM). The databases referred to in the embodiments provided herein may include at least one of a relational database and a non-relational database. The non-relational database may include, but is not limited to, a blockchain-based distributed database and the like. The processor referred to in the embodiments provided herein may be, but is not limited to, a general-purpose processor, a central processing unit, a graphics processor, a digital signal processor, a programmable logic device, or a data processing logic device based on quantum computing.
The technical features of the above embodiments may be arbitrarily combined, and all possible combinations of the technical features in the above embodiments are not described for brevity of description, however, as long as there is no contradiction between the combinations of the technical features, they should be considered as the scope of the description.
The foregoing embodiments represent only a few implementations of the application, and although their description is relatively specific and detailed, it should not be construed as limiting the scope of the application. It should be noted that several variations and improvements can be made by those of ordinary skill in the art without departing from the concept of the application, all of which fall within the protection scope of the application. Therefore, the protection scope of the application shall be subject to the appended claims.

Claims (16)

1. The method for training the correlation degree estimation model is characterized by comprising the following steps:
obtaining a training sample; each training sample comprises a sample entry, a video related text of a sample video and a marked related degree category, wherein the marked related degree category represents the related degree between the sample entry and the sample video, the marked related degree category belongs to one of preset related degree categories, and each preset related degree category corresponds to a related classification;
Determining sample characteristics according to the sample entry, the video related text and the sample video, and predicting the probability that the training samples respectively belong to each preset degree of correlation class according to the sample characteristics through the degree of correlation prediction model;
determining a correlation degree estimated loss according to the probability that the training sample belongs to the labeling correlation degree category;
determining a correlation classification loss according to the probability that the training sample belongs to each preset correlation degree category of the correlation classification corresponding to the labeling correlation degree category;
and training the correlation degree estimation model by combining the correlation degree pre-estimation loss and the correlation classification loss, wherein the trained correlation degree estimation model is used for video correlation degree sequencing.
2. The method of claim 1, wherein the determining sample features from the sample entry, the video-related text, and the sample video comprises:
determining semantic matching features according to the sample entry and the video related text, wherein the semantic matching features represent the correlation between the sample entry and the video related text;
determining text statistical characteristics according to the sample entry and the video related text;
Determining multi-modal characteristics according to the correlation scores of the sample vocabulary entries and the sample video and the correlation scores of the sample vocabulary entries and the video related texts;
and splicing the semantic matching features, the text statistical features and the multi-modal features to obtain sample features.
3. The method according to claim 1, wherein estimating, by the correlation degree estimation model, the probability that the training samples respectively belong to each predetermined correlation degree category according to the sample characteristics includes:
inputting the sample characteristics into the correlation degree estimation model;
outputting probability density distribution of the training samples belonging to each preset correlation degree category through the correlation degree prediction model;
and converting the probability density distribution into probability distribution of the training samples belonging to each preset correlation degree category through an activation function, and obtaining probability of the training samples belonging to each preset correlation degree category according to the probability distribution.
4. The method according to claim 1, wherein estimating, by the correlation degree estimation model, the probability that the training samples respectively belong to each predetermined correlation degree category according to the sample characteristics includes:
Extracting features of the sample features through a multi-scene feature extraction network of the correlation degree prediction model to obtain common features corresponding to a plurality of scenes and characteristic features corresponding to each scene;
weighting and summing the common characteristics and the characteristic characteristics corresponding to each scene to obtain multi-scene characteristics;
respectively extracting the characteristics of the multi-scene characteristics through an explicit scene characteristic extraction network of the correlation degree estimation model to obtain explicit scene characteristics corresponding to each scene;
carrying out weighted summation on the explicit scene characteristics corresponding to each scene to obtain output characteristics;
and determining the probability that the training sample belongs to each preset correlation degree category according to the output characteristics.
5. The method of claim 4, wherein the multi-scene feature extraction network comprises a first scene feature extraction network, a second scene feature extraction network, and a multi-scene common feature extraction network;
the extracting the characteristics of the sample by the multi-scene characteristic extracting network of the correlation degree pre-estimation model to obtain the common characteristics corresponding to a plurality of scenes and the characteristic characteristics corresponding to each scene, comprising the following steps:
Extracting characteristic features corresponding to the first scene through the first scene feature extraction network;
extracting characteristic features corresponding to a second scene through the second scene feature extraction network;
and extracting the common characteristics corresponding to the first scene and the second scene through the multi-scene common characteristic extraction network.
6. The method of claim 5, wherein the weighting and summing the common feature and the feature corresponding to each scene to obtain the multi-scene feature comprises:
determining the feature weights corresponding to the first scene feature extraction network, the second scene feature extraction network and the multi-scene common feature extraction network according to the sample features through a first gating network of the correlation degree prediction model;
and weighting and summing the characteristic features corresponding to the first scene, the characteristic features corresponding to the second scene and the common features according to the respective corresponding feature weights to obtain multi-scene features.
7. The method of claim 4, wherein the explicit scene feature extraction network comprises a first explicit scene feature extraction network and a second explicit scene feature extraction network; and
wherein the performing feature extraction on the multi-scene features through the explicit scene feature extraction network of the correlation degree prediction model, to obtain an explicit scene feature corresponding to each scene, comprises:
extracting, through the first explicit scene feature extraction network, the explicit scene feature corresponding to a first scene; and
extracting, through the second explicit scene feature extraction network, the explicit scene feature corresponding to a second scene.
8. The method of claim 7, wherein the performing weighted summation on the explicit scene features corresponding to the respective scenes to obtain output features comprises:
determining, through a second gating network of the correlation degree prediction model and according to the sample features, feature weights respectively corresponding to the first explicit scene feature extraction network and the second explicit scene feature extraction network; and
performing weighted summation on the explicit scene feature corresponding to the first scene and the explicit scene feature corresponding to the second scene according to their respective feature weights, to obtain the output features.
9. The method of claim 1, wherein the determining the correlation degree prediction loss according to the probability that the training sample belongs to the labeled correlation degree category comprises:
calculating a cross entropy for the correlation degree according to the probability that the training sample belongs to the labeled correlation degree category; and
taking the cross entropy as the correlation degree prediction loss, the correlation degree prediction loss being inversely related to the probability that the training sample belongs to the labeled correlation degree category.
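For a single sample with a one-hot label, the cross entropy of claim 9 reduces to the negative log-probability of the labeled category, which is indeed inversely related to that probability. A sketch (the variable names are assumed for illustration):

```python
import numpy as np

def correlation_prediction_loss(probs, label_idx):
    # Cross entropy against a one-hot label: -log p(labeled category).
    # The loss shrinks as the probability of the labeled category grows.
    return -np.log(probs[label_idx])

probs = np.array([0.1, 0.6, 0.2, 0.1])       # predicted category probabilities
loss = correlation_prediction_loss(probs, 1)  # labeled category is index 1
```

Because the loss is -log p, assigning the labeled category higher probability (0.6 above) costs strictly less than assigning it lower probability.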
10. The method of claim 1, wherein the determining the correlation classification loss according to the probabilities that the training sample belongs to the respective preset correlation degree categories of the correlation classification corresponding to the labeled correlation degree category comprises:
determining the correlation classification of the training sample according to the labeled correlation degree category;
summing the probabilities that the training sample belongs to each preset correlation degree category of the correlation classification, to obtain a correlation probability; and
determining the correlation classification loss according to the correlation probability, the correlation classification loss being inversely related to the correlation probability.
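Claim 10 groups the fine-grained correlation degree categories into coarser correlation classifications, superposes (sums) the probability mass inside the class of the labeled category, and penalizes its negative log. A sketch with a hypothetical two-class grouping of four categories (the actual grouping is not specified here):

```python
import numpy as np

# Hypothetical grouping: categories 0-1 form the "relevant" classification,
# categories 2-3 the "irrelevant" one. This mapping is an assumption.
CLASS_OF = {0: "relevant", 1: "relevant", 2: "irrelevant", 3: "irrelevant"}

def correlation_classification_loss(probs, label_idx):
    cls = CLASS_OF[label_idx]
    # Superpose the probabilities of every preset category in the same class.
    p_class = sum(p for i, p in enumerate(probs) if CLASS_OF[i] == cls)
    return -np.log(p_class)   # inversely related to the correlation probability

probs = [0.1, 0.6, 0.2, 0.1]
loss = correlation_classification_loss(probs, 1)   # p_class = 0.1 + 0.6 = 0.7
```

This term rewards the model even when it confuses two fine-grained categories inside the correct coarse class, which is the apparent point of training it alongside the per-category cross entropy.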
11. The method of claim 1, wherein the training the correlation degree prediction model by combining the correlation degree prediction loss and the correlation classification loss comprises:
summing the correlation degree prediction loss and the correlation classification loss to obtain a target loss; and
training the correlation degree prediction model with the objective of minimizing the target loss, to obtain a trained correlation degree prediction model.
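Claim 11's target loss is simply the superposition of the two loss terms, which an optimizer then minimizes. As a one-line sketch (unweighted summation, as the claim states; a weighted combination would be a variant, not what is claimed):

```python
def target_loss(prediction_loss, classification_loss):
    # Claim 11: target loss = correlation degree prediction loss
    # + correlation classification loss; training minimizes this sum.
    return prediction_loss + classification_loss

combined = target_loss(1.0, 2.0)
```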
12. The method according to any one of claims 1 to 11, further comprising:
acquiring an input entry and a video-related text of each candidate video;
for each candidate video, determining, through the trained correlation degree prediction model and according to the input entry, the candidate video, and the video-related text of the candidate video, a correlation degree prediction result between the input entry and the candidate video, the correlation degree prediction result comprising probabilities that the correlation degree between the input entry and the candidate video belongs to each preset correlation degree category; and
determining a correlation degree score between the input entry and each candidate video according to the probabilities that the correlation degree between the input entry and the candidate video belongs to each preset correlation degree category, and ranking the candidate videos according to the correlation degree scores.
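For the ranking in claim 12, the per-category probabilities must be collapsed into a single correlation degree score. One common choice, assumed here because the claim does not fix a formula, is the expected category value under the predicted distribution:

```python
import numpy as np

# Assumed numeric value per preset correlation degree category
# (0 = irrelevant ... 3 = highly relevant); not specified by the claim.
CATEGORY_VALUES = np.array([0.0, 1.0, 2.0, 3.0])

def relevance_score(probs):
    # Expected category value under the predicted distribution.
    return float(probs @ CATEGORY_VALUES)

def rank_candidates(candidate_probs):
    # candidate_probs: {video_id: per-category probability vector}
    scored = {vid: relevance_score(p) for vid, p in candidate_probs.items()}
    return sorted(scored, key=scored.get, reverse=True)

candidates = {
    "v1": np.array([0.7, 0.2, 0.05, 0.05]),   # mass on low-relevance categories
    "v2": np.array([0.05, 0.05, 0.2, 0.7]),   # mass on high-relevance categories
}
ranking = rank_candidates(candidates)
```

With these illustrative distributions, "v2" scores 2.55 and "v1" scores 0.45, so "v2" ranks first; any monotone scoring of the category distribution would serve the same purpose.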
13. A training device for a correlation degree prediction model, the device comprising:
an acquisition module, configured to acquire training samples, each training sample comprising a sample entry, a video-related text of a sample video, and a labeled correlation degree category, wherein the labeled correlation degree category represents the correlation degree between the sample entry and the sample video and belongs to one of preset correlation degree categories, each preset correlation degree category corresponding to a correlation classification;
a probability prediction module, configured to determine sample features according to the sample entry, the video-related text, and the sample video, and to predict, through the correlation degree prediction model and according to the sample features, the probabilities that the training sample belongs to the respective preset correlation degree categories;
a correlation degree prediction loss determination module, configured to determine a correlation degree prediction loss according to the probability that the training sample belongs to the labeled correlation degree category;
a correlation classification loss determination module, configured to determine a correlation classification loss according to the probabilities that the training sample belongs to the respective preset correlation degree categories of the correlation classification corresponding to the labeled correlation degree category; and
a training module, configured to train the correlation degree prediction model by combining the correlation degree prediction loss and the correlation classification loss, the trained correlation degree prediction model being used for video correlation degree ranking.
14. A computer device, comprising a memory and a processor, the memory storing a computer program, characterized in that the processor, when executing the computer program, implements the steps of the method of any one of claims 1 to 12.
15. A computer-readable storage medium, on which a computer program is stored, characterized in that the computer program, when executed by a processor, implements the steps of the method of any one of claims 1 to 12.
16. A computer program product, comprising a computer program, characterized in that the computer program, when executed by a processor, implements the steps of the method of any one of claims 1 to 12.
CN202310361993.4A 2023-03-31 2023-03-31 Training method, device, equipment and storage medium of correlation degree prediction model Pending CN116975735A (en)

Publications (1)

Publication Number Publication Date
CN116975735A (en) 2023-10-31

