CN110309359A

CN110309359A - Video correlation prediction method, device, equipment and storage medium

Info

Publication number: CN110309359A
Application number: CN201910420026.4A
Authority: CN
Inventors: 田永鸿; 李宗贤; 李晟; 薛岚天
Original assignee: Peking University
Current assignee: Peking University
Priority date: 2019-05-20
Filing date: 2019-05-20
Publication date: 2019-10-08
Anticipated expiration: 2039-05-20
Also published as: CN110309359B

Abstract

This application discloses a video correlation prediction method, device, equipment and storage medium. A half-twin neural network model is constructed and trained to learn the relationship between related and unrelated videos. A target source video and at least one video to be predicted are input into the half-twin neural network model, and after analysis by the model the correlation results between the videos to be predicted and the target source video are output. In this way, a list of related videos can be predicted for a video that lacks user behavior information, the degree of correlation of each related video is given, and whether the correlation between related videos is symmetric is reflected, which effectively improves the accuracy of video recommendation.

Description

Video correlation prediction method, device, equipment and storage medium

Technical field

This application relates to the field of video data processing, and in particular to a video correlation prediction method, device, equipment and storage medium.

Background art

With the development of information technology and Internet technology, media data dominated by audio and video has gradually moved from scarcity to information overload. Within massive amounts of data, it is becoming increasingly difficult to find the information a user is interested in, and to use information the user has already found interesting to discover more related information of interest.

As an information filtering platform, a recommender system screens and filters massive data through specific algorithms in order to predict a user's "preference" for information. In general, a search engine, as one effective solution to information overload, requires the user to state a clear demand before it can deliver accurate content, which is a very passive process. A recommender system, by contrast, generally does not need the user to state a clear demand: it analyses and models the user's past habits to provide personalized recommendations for different individuals. Video correlation prediction is one of the most important tasks in online streaming media services and an important part of recommender system algorithm research. Based on a user's viewing or search records, a recommender system can provide personalized recommendations to help the user find more video content of interest. This inevitably brings the "cold start" problem: the system has difficulty providing related recommendations for a new video because no user behavior records are available for it.

Currently there are two approaches to the "cold start" problem: one is user-based collaborative filtering; the other is content-based recommendation. User-based collaborative filtering recommends to a user content similar to what the user liked before, but it does not work from the content itself: it infers that A and B are similar because users who like A also like B. When dealing with "cold start", this approach lacks relevant metadata about the new video content (director, actors), so its recommendations are poor. Content-based recommendation recommends to the user other content similar to the content the user currently likes, so it does not require the user's historical behavior data and can address the "cold start" problem.

However, when current recommender systems recommend potentially interesting videos to a user based on a new video, they often ignore the differences in the degree of correlation between videos and treat all related videos equally, and they also ignore that the correlation between recommended videos can be asymmetric. Therefore, how to improve the accuracy of video recommendation is a technical problem that urgently needs to be solved in this field.

Summary of the invention

The purpose of this application is to provide a video correlation prediction method, device, equipment and storage medium, so as to improve the accuracy of video recommendation.

In a first aspect, an embodiment of this application provides a video correlation prediction method, comprising:

Obtaining a source video sample together with its related-video list and unrelated-video list; the related-video list contains multiple related video samples sorted in descending order of correlation, and the unrelated-video list contains multiple unrelated video samples;

Extracting the primary feature vectors of the source video sample, the related video samples and the unrelated video samples respectively;

Resampling the primary feature vectors of the source video sample, the related video samples and the unrelated video samples to construct sample data pairs;

Dividing the sample data pairs by symmetry to obtain a symmetric data set and an asymmetric data set;

Constructing an initial neural network and training the initial neural network on the symmetric data set to obtain an initial neural network model;

Constructing a half-twin neural network comprising two of the initial neural network models, and training the two models of the half-twin neural network independently on the asymmetric data set to obtain a half-twin neural network model;

Inputting a target source video and at least one video to be predicted into the half-twin neural network model, and outputting, after analysis by the half-twin neural network model, the correlation results between the videos to be predicted and the target source video.

In a second aspect, an embodiment of this application provides a video correlation prediction device, comprising:

A neural network training module, configured to:

A correlation prediction module, configured to:

In a third aspect, an embodiment of this application provides an electronic device, comprising a memory and a processor;

The memory is configured to store a computer program;

The processor executes the computer program in the memory to implement the method described in the first aspect above.

In a fourth aspect, an embodiment of this application provides a computer-readable storage medium in which a computer program is stored; when executed by a processor, the computer program implements the method described in the first aspect above.

Compared with the prior art, the video correlation prediction method, device, equipment and storage medium provided by this application construct and train a half-twin neural network model to learn the relationship between related and unrelated videos. As a result, a list of related videos can be predicted for a video that lacks user behavior information, the degree of correlation of each related video is given, and whether the correlation between related videos is symmetric is reflected, which effectively improves the accuracy of video recommendation.

Brief description of the drawings

Fig. 1 is a flow diagram of the video correlation prediction method provided in Embodiment 1 of this application;

Fig. 2 is a structural diagram of the video correlation prediction device provided in Embodiment 2 of this application;

Fig. 3 is a structural diagram of the electronic device provided in Embodiment 3 of this application.

Detailed description of the embodiments

The specific embodiments of this application are described in detail below with reference to the accompanying drawings. It should be understood that the scope of protection of this application is not limited by the specific embodiments.

Unless otherwise explicitly stated, throughout the specification and claims the term "comprise" and its variants such as "comprises" or "comprising" are to be understood as including the stated elements or components without excluding other elements or components.

Fig. 1 is a flow diagram of the video correlation prediction method provided in Embodiment 1 of this application. As shown in Fig. 1, the method comprises:

S101: Obtain a source video sample together with its related-video list and unrelated-video list. The related-video list contains multiple related video samples sorted in descending order of correlation, and the unrelated-video list contains multiple unrelated video samples.

In practical application, video correlation data actually collected in everyday use is sorted by degree of correlation to obtain the related-video list, in the following format: (source video A: related video 1, related video 2, ..., related video N). The number of videos unrelated to a source video is far larger than the number of related videos.

Specifically, from the obtained video correlation data, a related-video list is obtained for each source video v_i. The index j of an entry in the list represents the importance of that video, and M_i denotes the total length of the related-video list, i.e. the total number of related videos. That is, each source video has its own related-video list, in which videos nearer the front are more strongly correlated with the source video and videos further back are less strongly correlated.

S102: Extract the primary feature vectors of the source video sample, the related video samples and the unrelated video samples respectively.

In practical application, S102 is implemented as follows: a pre-trained convolutional neural network model is used to extract the primary feature vectors of the source video sample, the related video samples and the unrelated video samples respectively.

Specifically, the convolutional neural network model may be a C3D convolutional neural network model trained on a very large dataset (such as the ImageNet image dataset). For each source video v_i, its corresponding primary feature is obtained, with a feature dimension of 512. For the related-video list of v_i, the corresponding list of related primary features is obtained in the same way. For brevity, the primary feature of v_i is written F_i, and the list of related primary features is abbreviated likewise.

S103: Resample the primary feature vectors of the source video sample, the related video samples and the unrelated video samples, and construct sample data pairs.

Specifically, the sample data pairs comprise positive sample data pairs and negative sample data pairs. S103 is implemented as follows: determine the total number of samples N to draw for the primary feature vector of the source video sample, where N is an integer greater than 1; resample the primary feature vectors of the related video samples using the harmonious sampling strategy to construct N/2 positive sample data pairs, each comprising the primary feature vector F_a of the source video sample, the primary feature vector F_b of a related video sample and a "related" label, denoted (F_a,F_b,1); resample the primary feature vectors of the unrelated video samples to construct N/2 negative sample data pairs, each comprising the primary feature vector F_a of the source video sample, the primary feature vector F_b of an unrelated video sample and an "unrelated" label, denoted (F_a,F_b,0). F_a denotes the first sample of a sample data pair and F_b denotes the second sample.

In practical application, for the primary feature F_i corresponding to each video V_i, the corresponding related-video list V^r and list of related features are available. Sample data pairs P are constructed for the primary feature F_i in the following form: (F_i,F_j,y), where y is the correlation label: y is 1 if primary feature F_i is related to primary feature F_j, and 0 otherwise.

Therefore, for each primary feature F_j in the list of related features, the positive sample data pair can be expressed as (F_i,F_j,1).

Because the number of features unrelated to F_i is far larger than the number M_i of related features, N/2 positive sample data pairs (label 1) and N/2 negative sample data pairs (label 0) are constructed for F_i in order to balance the numbers of positive and negative samples.

Further, resampling the primary feature vectors of the related video samples using the harmonious sampling strategy comprises:

Using a frequency calculation formula to compute the frequency with which the primary feature vector of a related video sample needs to be sampled; the number of times that primary feature vector needs to be sampled is then determined from this frequency.

The frequency calculation formula is:

where i denotes the source video sample, j denotes the importance rank of the related video (the smaller j, the more important the video), M_i denotes the total number of related videos, and P is a preset hyperparameter.

That is, for any related primary feature, positive sample data pairs are constructed according to the number of times that feature needs to be resampled, and N/2 features are then taken arbitrarily from all video features unrelated to F_i to form the negative sample data pairs.

Each source video is resampled according to the above harmonious sampling strategy to form the set P of sample data pairs with correlation labels.
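The balanced construction of positive and negative pairs can be sketched as follows. This is a minimal illustration under stated assumptions, not the method of this application: the frequency formula is not reproduced in this text, so the sketch assumes a weight proportional to 1 / j**p for rank j, and it draws the positives by weighted random sampling rather than by fixed per-feature counts. The names `rank_weights`, `build_pairs`, `p` and `seed` are hypothetical.

```python
import random


def rank_weights(m, p=1.0):
    """ASSUMED weighting: rank j (1 = most related) gets weight ~ 1 / j**p."""
    raw = [1.0 / (j ** p) for j in range(1, m + 1)]
    total = sum(raw)
    return [w / total for w in raw]


def build_pairs(source, related, unrelated, n, p=1.0, seed=0):
    """Build n/2 positive and n/2 negative (first, second, label) pairs.

    `related` must be sorted from most to least related.
    """
    rng = random.Random(seed)
    half = n // 2
    weights = rank_weights(len(related), p)
    # Positives: higher-ranked related videos are drawn more often.
    positives = [(source, v, 1)
                 for v in rng.choices(related, weights=weights, k=half)]
    # Negatives: drawn uniformly (with replacement) from the unrelated videos.
    negatives = [(source, v, 0) for v in rng.choices(unrelated, k=half)]
    return positives + negatives


pairs = build_pairs("v0", ["r1", "r2", "r3"],
                    ["u%d" % k for k in range(50)], n=10)
```

Here string identifiers stand in for the 512-dimensional primary feature vectors; the pair layout matches the (F_a,F_b,y) form used in the text.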

S104: Divide the sample data pairs by symmetry to obtain a symmetric data set and an asymmetric data set.

In practical application, S104 is implemented as follows: the sample data pairs are divided by symmetry according to the principle of whether the correlation is symmetric, to obtain a symmetric data set and an asymmetric data set.

The principle of whether the correlation is symmetric is as follows: for any data pair (F_a,F_b,y), if F_b is in the related list of F_a and F_a is also in the related list of F_b, i.e. y_a,b=y_b,a, then (F_a,F_b,y_a,b) and (F_b,F_a,y_b,a) are considered symmetric; conversely, if F_b is in the related list of F_a but F_a is not in the related list of F_b, then (F_a,F_b,y_a,b) and (F_b,F_a,y_b,a) are considered asymmetric.

According to the above principle, the sample data pairs P are divided by symmetry to obtain the symmetric data pair set P1 and the asymmetric data pair set P2.
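The symmetry principle can be illustrated with a short sketch. It assumes the related lists are available as a dictionary keyed by video identifier, and that the reverse label y_b,a is 1 exactly when a appears in the related list of b; the function name `split_by_symmetry` and the data layout are hypothetical.

```python
def split_by_symmetry(pairs, related):
    """Split (a, b, y) pairs into symmetric (P1) and asymmetric (P2) sets.

    `related` maps a video id to the list of ids in its related list.
    A pair is symmetric when the label in the reverse direction equals y.
    """
    symmetric, asymmetric = [], []
    for a, b, y in pairs:
        # Reverse label: is a in the related list of b?
        y_reverse = 1 if a in related.get(b, ()) else 0
        if y == y_reverse:
            symmetric.append((a, b, y))
        else:
            asymmetric.append((a, b, y))
    return symmetric, asymmetric


related = {"A": ["B", "C"], "B": ["A"], "C": []}
pairs = [("A", "B", 1), ("A", "C", 1), ("A", "D", 0), ("C", "A", 0)]
p1, p2 = split_by_symmetry(pairs, related)
```

In this toy input, A and B list each other, so their pair lands in P1, while C appears in the related list of A but not the other way round, so both directions of that pair land in P2.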

S105: Construct an initial neural network and train it on the symmetric data set to obtain an initial neural network model.

S106: Construct a half-twin neural network comprising two of the initial neural network models, and train the two models independently on the asymmetric data set to obtain a half-twin neural network model.

In practical application, a fully connected neural network with 5 hidden layers can be designed. The dimension of the input layer matches the dimension of the initial feature F, for example 512. The first hidden layer has 2048 nodes, the second and third hidden layers have 512 nodes each, the fourth hidden layer has 1100 nodes, and the fifth hidden layer has 800 nodes. A ReLU activation function is applied to the features after the fifth hidden layer. The output layer has 512 nodes, i.e. the output feature is 512-dimensional, and is activated by a tanh function.
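The layer layout described above can be written down as a small pure-Python sketch. Two points are assumptions rather than statements of the text: the sketch applies ReLU after every hidden layer (the text only mentions it explicitly after the fifth), and the weight initialization is arbitrary. The names `LAYER_SIZES`, `init_network`, `forward` and `count_parameters` are hypothetical.

```python
import math
import random

# Input 512, five hidden layers (2048, 512, 512, 1100, 800), output 512.
LAYER_SIZES = [512, 2048, 512, 512, 1100, 800, 512]


def init_network(sizes, seed=0):
    """Create (weights, biases) per layer with small random weights."""
    rng = random.Random(seed)
    layers = []
    for n_in, n_out in zip(sizes, sizes[1:]):
        scale = 1.0 / math.sqrt(n_in)
        weights = [[rng.uniform(-scale, scale) for _ in range(n_in)]
                   for _ in range(n_out)]
        layers.append((weights, [0.0] * n_out))
    return layers


def forward(layers, x):
    """Map a primary feature x to a new feature S."""
    for index, (weights, biases) in enumerate(layers):
        z = [sum(w * v for w, v in zip(row, x)) + b
             for row, b in zip(weights, biases)]
        if index == len(layers) - 1:
            x = [math.tanh(v) for v in z]   # output layer: tanh
        else:
            x = [max(0.0, v) for v in z]    # ASSUMED: ReLU on each hidden layer
    return x


def count_parameters(sizes):
    """Number of weights plus biases for the given layer sizes."""
    return sum(n_in * n_out + n_out for n_in, n_out in zip(sizes, sizes[1:]))
```

Calling `forward(init_network([4, 8, 3]), [0.1, 0.2, 0.3, 0.4])` exercises the same code on a scaled-down network; passing `LAYER_SIZES` builds the full-size layout, which runs but is slow in pure Python.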

The designed fully connected neural network maps the initial feature F into a new feature space, yielding a new feature S.

Further, P1 consists of the symmetric data pairs obtained by the division, i.e. pairs for which (F_a,F_b,1) and (F_b,F_a,1), or (F_a,F_b,0) and (F_b,F_a,0), exist at the same time; their total number is N1. A twin neural network is used to perform supervised training on all sample data pairs in P1, as follows:

(1) The initial neural network M is used to extract features from the N1 data pairs; for F_a and F_b, their feature mappings in the new feature space (512-dimensional) are obtained.

(2) The Euclidean distance between the two new features obtained from each sample data pair is computed, i.e. the L2 norm of the difference between the two feature vectors.

(3) The neural network loss is computed with a contrastive loss function with a feature penalty term, in which m denotes the predefined margin and a further predefined coefficient sets the strength of the penalty on the features.

(4) The neural network parameters are updated and learned with the gradient back-propagation algorithm, and the initial neural network model M_O is obtained once training is complete.
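Steps (2) and (3) can be sketched as below. The exact loss formula is not reproduced in this text, so this is an assumption: the sketch uses the commonly seen contrastive loss (squared distance for related pairs, squared hinge on the margin for unrelated pairs) and, as a stand-in for the feature penalty term, the squared norms of the two features scaled by a coefficient. The names `euclidean`, `contrastive_loss`, `margin` and `penalty` are hypothetical.

```python
import math


def euclidean(u, v):
    """Euclidean (L2) distance between two feature vectors."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(u, v)))


def contrastive_loss(s_a, s_b, y, margin=1.0, penalty=0.0):
    """Contrastive loss for one pair of new features with label y in {0, 1}."""
    d = euclidean(s_a, s_b)
    pull = y * d ** 2                              # related: shrink distance
    push = (1 - y) * max(0.0, margin - d) ** 2     # unrelated: enforce margin
    # ASSUMED form of the feature penalty: squared norms of both features.
    reg = penalty * (sum(a * a for a in s_a) + sum(b * b for b in s_b))
    return pull + push + reg
```

With this form, a related pair is penalized by its squared distance, while an unrelated pair contributes nothing once its distance exceeds the margin m.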

Further, two models identical to the initial neural network model M_O are constructed, denoted M_F and M_L.

P2 consists of the asymmetric data pairs obtained by the division, i.e. pairs for which (F_a,F_b,1) and (F_b,F_a,0), or (F_a,F_b,0) and (F_b,F_a,1), exist at the same time; their total number is N2. The half-twin neural network is used to perform supervised training on all data pairs in P2, as follows:

(1) For the first sample of each sample data pair in P2, M_F is used for feature extraction; for the second sample, M_L is used. For example, for F_a and F_b, their feature mappings in the new feature space (512-dimensional) are obtained.

(2) The Euclidean distance between the two new features obtained from each sample data pair is computed.

(3) The neural network loss is computed with the contrastive loss function with a feature penalty term.

(4) The computed loss is passed back to M_F and M_L separately, and the two models are optimized separately and independently with the gradient back-propagation algorithm. Once M_F and M_L have been trained, the half-twin neural network model is obtained.
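The effect of training two branches independently can be illustrated with a toy sketch. It is not the network of this application: each branch is reduced to a single bias-free linear map, the loss is an assumed contrastive loss without the feature penalty term, and all names (`apply`, `distance`, `train_step`, `lr`, `margin`) are hypothetical.

```python
import copy
import math


def apply(weights, x):
    """Linear branch: one output value per weight row."""
    return [sum(w * v for w, v in zip(row, x)) for row in weights]


def distance(u, v):
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(u, v)))


def train_step(m_f, m_l, first, second, y, margin=2.0, lr=0.1):
    """One gradient step; M_F sees the first sample, M_L the second."""
    s_a, s_b = apply(m_f, first), apply(m_l, second)
    d = distance(s_a, s_b)
    diff = [a - b for a, b in zip(s_a, s_b)]
    if y == 1:
        coeff = 2.0                       # gradient of d**2 w.r.t. s_a
    elif 0.0 < d < margin:
        coeff = -2.0 * (margin - d) / d   # gradient of (margin - d)**2
    else:
        coeff = 0.0                       # unrelated pair beyond the margin
    for i, t in enumerate(diff):
        g = coeff * t                     # dL/ds_a[i]; dL/ds_b[i] is -g
        for k in range(len(first)):
            m_f[i][k] -= lr * g * first[k]    # update M_F only via its input
        for k in range(len(second)):
            m_l[i][k] += lr * g * second[k]   # update M_L only via its input
    return d


m_o = [[1.0, 0.0], [0.0, 1.0]]            # stand-in for the trained M_O
m_f, m_l = copy.deepcopy(m_o), copy.deepcopy(m_o)
f_a, f_b = [1.0, 0.0], [0.0, 1.0]
for _ in range(20):
    train_step(m_f, m_l, f_a, f_b, y=1)   # (F_a, F_b, 1)
    train_step(m_f, m_l, f_b, f_a, y=0)   # (F_b, F_a, 0)
d_ab = distance(apply(m_f, f_a), apply(m_l, f_b))
d_ba = distance(apply(m_f, f_b), apply(m_l, f_a))
```

Because M_F and M_L start as identical copies but are updated separately, the distance becomes direction-dependent: here `d_ab` shrinks towards zero while `d_ba` grows towards the margin, which a single shared-weight twin network with a symmetric distance could not represent.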

S107: Input a target source video and at least one video to be predicted into the half-twin neural network model, and output, after analysis by the half-twin neural network model, the correlation results between the videos to be predicted and the target source video.

In practical application, for the primary features of any number of videos, feature extraction is performed with M_F and with M_L respectively, yielding two feature sets R_F and R_L. When the related-video list of a certain video v_i is needed, the distances between the R_F feature of v_i and the R_L features of all videos other than v_i are computed, and the corresponding videos are sorted by feature distance, giving a related-video list ordered from most to least correlated.
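The ranking step can be sketched as follows, assuming the two feature sets are held as dictionaries from video identifier to feature vector and that a smaller distance means a stronger correlation. The function name `rank_related` and the toy two-dimensional features are hypothetical.

```python
import math


def rank_related(query_id, r_f, r_l):
    """Return the other video ids, ordered from most to least related.

    r_f and r_l map video id -> feature extracted by M_F and M_L.
    """
    query = r_f[query_id]
    scored = []
    for video_id, feature in r_l.items():
        if video_id == query_id:
            continue  # the query video is excluded from its own list
        d = math.sqrt(sum((a - b) ** 2 for a, b in zip(query, feature)))
        scored.append((d, video_id))
    return [video_id for _, video_id in sorted(scored)]


r_f = {"v1": [0.0, 0.0], "v2": [1.0, 1.0], "v3": [5.0, 5.0]}
r_l = {"v1": [0.0, 0.0], "v2": [1.0, 0.0], "v3": [0.5, 0.0]}
ranking = rank_related("v1", r_f, r_l)
```

Since the query side uses R_F and the candidate side uses R_L, the list obtained for v_i need not mirror the lists obtained for the videos it contains.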

Therefore, in this embodiment a half-twin neural network model is constructed and trained to learn the relationship between related and unrelated videos, so that a list of related videos can be predicted for a video that lacks user behavior information, the degree of correlation of each related video is given, and whether the correlation between related videos is symmetric is reflected, which effectively improves the accuracy of video recommendation.

The following are device embodiments of this application, which can be used to carry out the method embodiments of this application. For details not disclosed in the device embodiments, please refer to the method embodiments of this application.

Fig. 2 is a structural diagram of the video correlation prediction device provided in Embodiment 2 of this application. As shown in Fig. 2, the device comprises:

A neural network training module 210, configured to:

A correlation prediction module 220, configured to:

The video correlation prediction device provided in this embodiment constructs and trains a half-twin neural network model to learn the relationship between related and unrelated videos, so that a list of related videos can be predicted for a video that lacks user behavior information, the degree of correlation of each related video is given, and whether the correlation between related videos is symmetric is reflected, which effectively improves the accuracy of video recommendation.

Optionally, the neural network training module is specifically configured to:

Extract the primary feature vectors of the source video sample, the related video samples and the unrelated video samples respectively, using a pre-trained convolutional neural network model.

Optionally, the sample data pairs comprise positive sample data pairs and negative sample data pairs;

Optionally, the neural network training module is specifically configured to:

Determine the total number of samples N to draw for the primary feature vector of the source video sample, where N is an integer greater than 1;

Resample the primary feature vectors of the related video samples using the harmonious sampling strategy to construct N/2 positive sample data pairs; each positive sample data pair comprises the primary feature vector F_a of the source video sample, the primary feature vector F_b of a related video sample and the corresponding label, denoted (F_a,F_b,1);

Resample the primary feature vectors of the unrelated video samples to construct N/2 negative sample data pairs; each negative sample data pair comprises the primary feature vector F_a of the source video sample, the primary feature vector F_b of an unrelated video sample and an "unrelated" label, denoted (F_a,F_b,0).

Optionally, the neural network training module is specifically configured to:

The frequency calculation formula is:

Optionally, the neural network training module is specifically configured to:

Divide the sample data pairs by symmetry according to the principle of whether the correlation is symmetric, to obtain a symmetric data set and an asymmetric data set.

Optionally, the neural network training module is specifically configured to:

Construct an initial neural network M, and use the initial neural network M to extract features from the two samples of each sample data pair in the symmetric data set, obtaining the new features corresponding to the two samples;

Compute a first Euclidean distance between the new features corresponding to the two samples;

According to the first Euclidean distance, compute the neural network loss with a contrastive loss function, update and learn the neural network parameters with the gradient back-propagation algorithm, and obtain the initial neural network model M_O once training is complete.

Optionally, the neural network training module is specifically configured to:

Construct a half-twin neural network comprising two initial neural network models M_O, denoted M_F and M_L respectively;

For the first sample of each sample data pair in the asymmetric data set, use M_F to extract features and obtain the corresponding new feature; for the second sample of each sample data pair in the asymmetric data set, use M_L to extract features and obtain the corresponding new feature;

Compute a second Euclidean distance between the new features corresponding to the first sample and the second sample of each sample data pair;

According to the second Euclidean distance, compute the neural network loss with a contrastive loss function, pass the loss back to M_F and M_L separately, optimize the parameters of M_F and M_L separately with the gradient back-propagation algorithm, and obtain the half-twin neural network model once training is complete.

Fig. 3 is a structural diagram of the electronic device provided in Embodiment 3 of this application. As shown in Fig. 3, the device comprises a memory 301 and a processor 302;

The memory 301 is configured to store a computer program;

The processor 302 executes the computer program in the memory 301 to implement the method provided by each of the method embodiments described above.

In this embodiment, an electronic device is used as an example of the video correlation prediction device provided by this application. The processor may be a central processing unit (CPU) or another form of processing unit with data processing capability and/or instruction execution capability, and may control other components in the electronic device to perform desired functions.

The memory may include one or more computer program products, and the computer program products may include various forms of computer-readable storage media, such as volatile memory and/or non-volatile memory. Volatile memory may include, for example, random access memory (RAM) and/or cache memory. Non-volatile memory may include, for example, read-only memory (ROM), hard disk and flash memory. One or more computer program instructions may be stored on the computer-readable storage medium, and the processor may run the program instructions to implement the methods in the embodiments of this application described above and/or other desired functions. Various contents such as input signals, signal components and noise components may also be stored in the computer-readable storage medium.

Embodiment 4 of this application provides a computer-readable storage medium in which a computer program is stored; when executed by a processor, the computer program implements the method provided by each of the method embodiments described above.

In practical application, the program code of the computer program in this embodiment, which performs the operations of the embodiments of this application, may be written in any combination of one or more programming languages, including object-oriented programming languages such as Java, C++ and Python, as well as conventional procedural programming languages such as the "C" language or similar programming languages. The program code may execute entirely on the user's computing device, partly on the user's device, as a stand-alone software package, partly on the user's computing device and partly on a remote computing device, or entirely on a remote computing device or server.

In practical application, the computer-readable storage medium may be any combination of one or more readable media. A readable medium may be a readable signal medium or a readable storage medium. A readable storage medium may be, for example but not limited to, an electrical, magnetic, optical, electromagnetic, infrared or semiconductor system, apparatus or device, or any combination of the above. More specific examples (a non-exhaustive list) of readable storage media include: an electrical connection with one or more wires, a portable disk, a hard disk, random access memory (RAM), read-only memory (ROM), erasable programmable read-only memory (EPROM or flash memory), optical fiber, portable compact disc read-only memory (CD-ROM), an optical storage device, a magnetic storage device, or any suitable combination of the above.

The foregoing description of specific exemplary embodiments of this application is given for the purposes of explanation and illustration. It is not intended to limit this application to the precise forms disclosed, and many changes and variations are clearly possible in light of the above teaching. The exemplary embodiments were selected and described in order to explain the specific principles of this application and their practical application, so that those skilled in the art can implement and use the various exemplary embodiments of this application as well as various alternatives and modifications. The scope of this application is intended to be defined by the claims and their equivalents.

Claims

1. A video correlation prediction method, characterized by comprising:

Obtaining a source video sample together with its related-video list and unrelated-video list; the related-video list contains multiple related video samples sorted in descending order of correlation, and the unrelated-video list contains multiple unrelated video samples;

Extracting the primary feature vectors of the source video sample, the related video samples and the unrelated video samples respectively;

Resampling the primary feature vectors of the source video sample, the related video samples and the unrelated video samples to construct sample data pairs;

Constructing a half-twin neural network comprising two of the initial neural network models, and training the two models of the half-twin neural network independently on the asymmetric data set to obtain a half-twin neural network model;

Inputting a target source video and at least one video to be predicted into the half-twin neural network model, and outputting, after analysis by the half-twin neural network model, the correlation results between the videos to be predicted and the target source video.

2. The method according to claim 1, characterized in that extracting the primary feature vectors of the source video sample, the related video samples and the unrelated video samples respectively comprises:

Using a pre-trained convolutional neural network model to extract the primary feature vectors of the source video sample, the related video samples and the unrelated video samples respectively.

3. The method according to claim 1, characterized in that the sample data pairs comprise positive sample data pairs and negative sample data pairs;

Resampling the primary feature vectors of the source video sample, the related video samples and the unrelated video samples to construct sample data pairs comprises:

Resampling the primary feature vectors of the related video samples using a harmonious sampling strategy to construct N/2 positive sample data pairs; each positive sample data pair comprises the primary feature vector F_a of the source video sample, the primary feature vector F_b of a related video sample and the corresponding label, denoted (F_a,F_b,1);

Resampling the primary feature vectors of the unrelated video samples to construct N/2 negative sample data pairs; each negative sample data pair comprises the primary feature vector F_a of the source video sample, the primary feature vector F_b of an unrelated video sample and an "unrelated" label, denoted (F_a,F_b,0).

4. The method according to claim 3, characterized in that resampling the primary feature vectors of the related video samples using the harmonious sampling strategy comprises:

Using a frequency calculation formula to compute the frequency with which the primary feature vector of a related video sample needs to be sampled; the number of times that primary feature vector needs to be sampled is then determined from this frequency.

The frequency calculation formula is:

where i denotes the source video sample, j denotes the importance rank of the related video (the smaller j, the more important the video), M_i denotes the total number of related videos, and P is a preset hyperparameter.

5. The method according to claim 4, characterized in that dividing the sample data pairs by symmetry to obtain a symmetric data set and an asymmetric data set comprises:

Dividing the sample data pairs by symmetry according to the principle of whether the correlation is symmetric, to obtain a symmetric data set and an asymmetric data set.

6. The method according to claim 5, characterized in that constructing an initial neural network and training the initial neural network on the symmetric data set to obtain an initial neural network model comprises:

Constructing an initial neural network M, and using the initial neural network M to extract features from the two samples of each sample data pair in the symmetric data set, obtaining the new features corresponding to the two samples;

According to the first Euclidean distance, computing the neural network loss with a contrastive loss function, updating and learning the neural network parameters with the gradient back-propagation algorithm, and obtaining the initial neural network model M_O once training is complete.

7. The method according to claim 6, characterized in that constructing a half-twin neural network comprising two of the initial neural network models and training the half-twin neural network on the asymmetric data set to obtain a half-twin neural network model comprises:

Constructing a half-twin neural network comprising two initial neural network models M_O, denoted M_F and M_L respectively;

For the first sample of each sample data pair in the asymmetric data set, using M_F to extract features and obtain the corresponding new feature; for the second sample of each sample data pair in the asymmetric data set, using M_L to extract features and obtain the corresponding new feature;

Computing a second Euclidean distance between the new features corresponding to the first sample and the second sample of each sample data pair;

According to the second Euclidean distance, computing the neural network loss with a contrastive loss function, passing the loss back to M_F and M_L separately, optimizing the parameters of M_F and M_L separately with the gradient back-propagation algorithm, and obtaining the half-twin neural network model once training is complete.

8. A video correlation prediction device, characterized by comprising:

A neural network training module, configured to:

A correlation prediction module, configured to:

9. An electronic device, comprising a memory and a processor;

The memory is configured to store a computer program;

The processor executes the computer program in the memory to implement the method according to any one of claims 1-7.

10. A computer-readable storage medium, characterized in that a computer program is stored in the computer-readable storage medium, and the computer program, when executed by a processor, implements the method according to any one of claims 1-7.