CN110933503A - Video processing method, electronic device and storage medium - Google Patents

Video processing method, electronic device and storage medium

Info

Publication number
CN110933503A
CN110933503A
Authority
CN
China
Prior art keywords
video
user
preference
preset
historical
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN201911128374.0A
Other languages
Chinese (zh)
Inventor
张进
刘昕
莫东松
马晓琳
张健
赵璐
钟宜峰
马丹
王科
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Migu Cultural Technology Co Ltd
China Mobile Communications Group Co Ltd
Original Assignee
Migu Cultural Technology Co Ltd
China Mobile Communications Group Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Migu Cultural Technology Co Ltd, China Mobile Communications Group Co Ltd filed Critical Migu Cultural Technology Co Ltd
Priority to CN201911128374.0A
Publication of CN110933503A
Legal status: Pending

Classifications

    • HELECTRICITY
    • H04ELECTRIC COMMUNICATION TECHNIQUE
    • H04NPICTORIAL COMMUNICATION, e.g. TELEVISION
    • H04N21/00Selective content distribution, e.g. interactive television or video on demand [VOD]
    • H04N21/40Client devices specifically adapted for the reception of or interaction with content, e.g. set-top-box [STB]; Operations thereof
    • H04N21/45Management operations performed by the client for facilitating the reception of or the interaction with the content or administrating data related to the end-user or to the client device itself, e.g. learning user preferences for recommending movies, resolving scheduling conflicts
    • H04N21/466Learning process for intelligent management, e.g. learning user preferences for recommending movies
    • H04N21/4667Processing of monitored end-user data, e.g. trend analysis based on the log file of viewer selections
    • HELECTRICITY
    • H04ELECTRIC COMMUNICATION TECHNIQUE
    • H04NPICTORIAL COMMUNICATION, e.g. TELEVISION
    • H04N21/00Selective content distribution, e.g. interactive television or video on demand [VOD]
    • H04N21/40Client devices specifically adapted for the reception of or interaction with content, e.g. set-top-box [STB]; Operations thereof
    • H04N21/43Processing of content or additional data, e.g. demultiplexing additional data from a digital video stream; Elementary client operations, e.g. monitoring of home network or synchronising decoder's clock; Client middleware
    • H04N21/44Processing of video elementary streams, e.g. splicing a video clip retrieved from local storage with an incoming video stream or rendering scenes according to encoded video stream scene graphs
    • H04N21/44016Processing of video elementary streams, e.g. splicing a video clip retrieved from local storage with an incoming video stream or rendering scenes according to encoded video stream scene graphs involving splicing one content stream with another content stream, e.g. for substituting a video clip
    • HELECTRICITY
    • H04ELECTRIC COMMUNICATION TECHNIQUE
    • H04NPICTORIAL COMMUNICATION, e.g. TELEVISION
    • H04N21/00Selective content distribution, e.g. interactive television or video on demand [VOD]
    • H04N21/40Client devices specifically adapted for the reception of or interaction with content, e.g. set-top-box [STB]; Operations thereof
    • H04N21/45Management operations performed by the client for facilitating the reception of or the interaction with the content or administrating data related to the end-user or to the client device itself, e.g. learning user preferences for recommending movies, resolving scheduling conflicts
    • H04N21/466Learning process for intelligent management, e.g. learning user preferences for recommending movies
    • H04N21/4662Learning process for intelligent management, e.g. learning user preferences for recommending movies characterized by learning algorithms
    • H04N21/4666Learning process for intelligent management, e.g. learning user preferences for recommending movies characterized by learning algorithms using neural networks, e.g. processing the feedback provided by the user
    • HELECTRICITY
    • H04ELECTRIC COMMUNICATION TECHNIQUE
    • H04NPICTORIAL COMMUNICATION, e.g. TELEVISION
    • H04N21/00Selective content distribution, e.g. interactive television or video on demand [VOD]
    • H04N21/40Client devices specifically adapted for the reception of or interaction with content, e.g. set-top-box [STB]; Operations thereof
    • H04N21/45Management operations performed by the client for facilitating the reception of or the interaction with the content or administrating data related to the end-user or to the client device itself, e.g. learning user preferences for recommending movies, resolving scheduling conflicts
    • H04N21/466Learning process for intelligent management, e.g. learning user preferences for recommending movies
    • H04N21/4668Learning process for intelligent management, e.g. learning user preferences for recommending movies for recommending content, e.g. movies

Landscapes

  • Engineering & Computer Science (AREA)
  • Databases & Information Systems (AREA)
  • Multimedia (AREA)
  • Signal Processing (AREA)
  • Artificial Intelligence (AREA)
  • Evolutionary Computation (AREA)
  • Two-Way Televisions, Distribution Of Moving Picture Or The Like (AREA)

Abstract

Embodiments of the present invention relate to the field of image processing, and in particular to a video processing method, an electronic device, and a storage medium. The video processing method includes: predicting a user's preference for each alternative video according to a preference prediction model preset for that user, the preference prediction model being trained on historical videos watched by the user; and synthesizing the alternative videos according to the user's preference for each alternative video. With embodiments of the invention, a video matching the user's preferences can be synthesized automatically, solving the problem that personalized video-watching requirements cannot otherwise be met.

Description

Video processing method, electronic device and storage medium
Technical Field
Embodiments of the present invention relate to the field of image processing, and in particular, to a video processing method, an electronic device, and a storage medium.
Background
With the development of science and technology, network video, network live streaming, and the like have become new branches of the leisure and entertainment industry; whether for press conferences, sports competitions, classroom teaching, or business meetings, many users watch through network video and live streams. The network video or live stream watched by users is generally selected and switched by a director. However, the inventors found the following problem in the related art: the content of a given network video or live stream is selected uniformly by the director from multiple video streams, yet the director's uniform selection does not necessarily match each user's preferences, and users have no way to individually choose what they watch from among the multiple video streams.
Disclosure of Invention
An object of embodiments of the present invention is to provide a video processing method, an electronic device, and a storage medium that can automatically synthesize a video matching the user's preferences, so as to solve the problem that personalized video-watching requirements cannot be met.
To solve the above technical problem, an embodiment of the present invention provides a video processing method, including: predicting a user's preference for each alternative video according to a preference prediction model preset for the user, the preference prediction model being trained on historical videos watched by the user; and synthesizing the alternative videos according to the user's preference for each alternative video.
An embodiment of the present invention also provides an electronic device, including: at least one processor; and a memory communicatively coupled to the at least one processor; wherein the memory stores instructions executable by the at least one processor to enable the at least one processor to perform the video processing method described above.
Embodiments of the present invention also provide a computer-readable storage medium storing a computer program, which when executed by a processor implements the video processing method described above.
Compared with the prior art, embodiments of the invention predict a user's preference for each alternative video according to a preference prediction model preset for that user, the model being trained on historical videos the user has watched, and synthesize the alternative videos according to those preferences. Because the historical videos watched by a user reflect that user's viewing preferences, and because each user has a dedicated preset prediction model, the trained model can predict the user's preference for each alternative video realistically, accurately, and specifically, so the predicted preferences have high reference value. The finally synthesized video is obtained from the user's preference for each alternative video, meaning it fits the user's tastes: the user's personalized viewing requirements are met, and the subjective limitations of videos manually and uniformly presented to all users are avoided.
In addition, video features and their weights are preset in the preference prediction model, and the model is trained as follows: acquiring a feature vector of the video features in a historical video using a preset self-encoder; predicting the user's preference for the historical video from the feature vector using a preset neural network; and adjusting the weights of the video features according to the historical video's preset label and the predicted preference. This training scheme combines a self-encoder and a neural network to learn the characteristics of videos the user prefers and to predict the user's preference, which safeguards the computational accuracy of the prediction model. Since historical videos watched by the user are used for training, the model's parameters must be adjusted according to the difference between the user's real preference for a historical video and the predicted preference; the real preference is therefore expressed in the form of a preset label.
In addition, before adjusting the weight of a video feature, the method further includes: determining that the change in the weight does not exceed a preset adjustment threshold. That is, if adjusting the weight according to the predicted preference would change it too much, the adjustment is abandoned and that training sample is not used. This accounts for the possibility of error in the predicted preference and keeps training data with large errors from distorting the training process.
In addition, the preset label of a historical video is obtained as follows: acquiring the user's viewing duration for the historical video; marking historical videos whose viewing duration is greater than or equal to a first preset threshold with a first label; and marking historical videos whose viewing duration is less than or equal to a second preset threshold with a second label, where the first preset threshold is greater than the second preset threshold. This provides a way of presetting labels: the longer a user watches a historical video, the higher the user's preference for it, so the first label represents a positive attribute (a video that fits the user's preferences); the shorter the viewing duration, the lower the preference, so the second label represents a negative attribute (a video that does not fit the user's preferences). Such labels intuitively reflect how well a historical video fits the user's preferences and make it easy to compare against the predicted preference when adjusting the feature weights.
In addition, acquiring a feature vector of the video features in a historical video using a preset self-encoder includes: cutting the historical video into slice video segments and acquiring the feature vector of the video features in each segment using the preset self-encoder. Compared with a complete historical video, each slice segment is quicker for the processor to handle, which improves video processing efficiency and reduces the computational load on the self-encoder. Moreover, after slicing, a feature vector is obtained for each segment rather than one per historical video, multiplying the number of feature vectors, that is, enriching the training samples, and thereby improving the training effect of the model.
In addition, the preset video features include: shot type features, shot scene features, video sound features, shot usage features, and video special-effect features. Each video feature is composed of several sub-video features, and the preset weight of each video feature comprises the weights of its sub-video features.
In addition, synthesizing the alternative videos according to the user's preference for each alternative video includes: sorting the alternative videos by the user's preference for each of them, and synthesizing the alternative videos ranked in the top N, where N is an integer smaller than the number of alternative videos. In this way, the alternative videos that best fit the user's preferences are screened out of all the alternative videos and synthesized, while those the user does not favor are discarded, so the finally synthesized video better meets the user's viewing requirements.
In addition, a binding relationship between the user's identity information and the preference prediction model is prestored, and the preference prediction model preset for the user is obtained as follows: acquiring the user's identity information, and acquiring the preference prediction model bound to that identity information according to the binding relationship.
Drawings
One or more embodiments are illustrated by the corresponding figures in the drawings, which are not meant to be limiting.
Fig. 1 is a flowchart of a video processing method in a first embodiment of the present invention;
Fig. 2 is a flowchart of a method for training a preference prediction model according to a first embodiment of the present invention;
Fig. 3 is a schematic diagram illustrating the operation of a preset self-encoder according to a first embodiment of the present invention;
Fig. 4 is a flowchart of a method for training a preference prediction model according to a second embodiment of the present invention;
Fig. 5 is a flowchart of acquiring the preset label of a historical video according to a third embodiment of the present invention;
Fig. 6 is a block diagram showing the configuration of an electronic device according to a fourth embodiment of the present invention.
Detailed Description
In order to make the objects, technical solutions, and advantages of the embodiments of the present invention clearer, the embodiments are described in detail below with reference to the accompanying drawings. Those of ordinary skill in the art will appreciate that numerous technical details are set forth in the embodiments to help the reader understand the present application; the technical solutions claimed herein can nevertheless be implemented without these technical details, and with various changes and modifications based on the following embodiments.
A first embodiment of the present invention relates to a video processing method, and a specific flow is shown in fig. 1, where the method includes:
Step 101, predicting the user's preference for each alternative video according to a preference prediction model preset for the user.
Step 102, synthesizing the alternative videos according to the user's preference for each alternative video.
Implementation details of the video processing method of this embodiment are described below; the following details are provided only to ease understanding and are not necessary for implementing this embodiment.
In this embodiment, the user's preference for each alternative video is predicted according to a preference prediction model preset for the user, the model being trained on historical videos watched by the user, and the alternative videos are synthesized according to the user's preference for each of them. Because the historical videos watched by a user reflect that user's viewing preferences, and because each user has a dedicated preset prediction model, the trained model can predict the user's preference for each alternative video realistically, accurately, and specifically, so the predicted preferences have high reference value. The finally synthesized video is obtained from those preferences, meaning it fits the user's tastes: the personalized viewing requirements are met, and the subjective limitations of videos manually and uniformly presented to all users are avoided.
In step 101, the user's preference for each alternative video is predicted according to a preference prediction model preset for the user, the model being trained on historical videos watched by the user. In other words, each user is assigned in advance a preference prediction model trained on that user's viewing history; an alternative video is fed in, and the model outputs the current user's predicted preference for it. The preset model can be bound to a unique identifier such as the user's ID, so that the bound model can be retrieved from that identifier. The preference can be expressed as a numeric value: the larger the value, the more the current user likes watching the alternative video, and the smaller the value, the less the user likes it.
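For concreteness, the following minimal Python sketch shows this step: the model bound to a user's ID is retrieved and used to score each alternative video. The logistic weighted-sum scorer, the dictionary used as the binding store, and all identifiers (predict_preference, user_weights) are illustrative assumptions, not details from the patent.

```python
import math
from typing import Dict, List

# One value per sub-video feature; the example in Table 1 below uses 34 of them.
FeatureVector = List[float]

def predict_preference(weights: List[float], features: FeatureVector) -> float:
    """Weighted sum of feature values, squashed into [0, 1]: the larger the
    value, the more the user is predicted to like the alternative video."""
    score = sum(w * f for w, f in zip(weights, features))
    return 1.0 / (1.0 + math.exp(-score))  # logistic squashing, an assumption

# Prestored binding between user IDs and preset per-user model weights (cf. claim 8).
user_weights: Dict[str, List[float]] = {"user-42": [0.1] * 34}

def score_alternatives(user_id: str, alternatives: List[FeatureVector]) -> List[float]:
    w = user_weights[user_id]  # retrieve the model bound to this user's ID
    return [predict_preference(w, v) for v in alternatives]
```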
In this embodiment, a specific implementation of training the preference prediction model is provided. A plurality of video features and their weights are preset in the model, and training can be understood as continuously adjusting those weights so that the video features the user prefers carry larger weights. The specific flow of the training method is shown in fig. 2 and includes:
step 201, obtaining a feature vector of video features in a historical video according to a preset self-encoder.
Specifically, a historical video watched by the user is fed into the preset self-encoder, which outputs a feature vector of the preset video features in that video. In this embodiment, the operation of the preset self-encoder can be understood through the schematic diagram in fig. 3. The preset self-encoder comprises an encoder and a decoder. The encoder compresses the input data into a latent-space representation, producing a mean vector and a standard-deviation vector, from which a sampled code is obtained and used as the input of the decoder. The encoder comprises 10 convolutional layers: layer 1 has 7×7 kernels and 256 output channels; layer 2 has 7×7 kernels and 128 output channels; layers 3 through 8 have 5×5 kernels and 512 output channels; and so on. The decoder reconstructs the input data from the latent representation (i.e., the sampled code) and outputs the feature vector of the video features. The decoder comprises 10 convolutional layers and 1 dropout (random deactivation) layer: layers 1 through 6 have 5×5 kernels and 256 output channels; layer 7 has 7×7 kernels and 128 output channels; layer 8 has 7×7 kernels and 256 output channels; and so on. The dropout layer zeroes part of the convolutional layers' output so as to optimize the structure of the self-encoder.
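To make the described self-encoder concrete, here is a heavily reduced sketch, assuming PyTorch as the framework. The 10-layer stacks are collapsed to a few layers; the latent size (64), strides, paddings, dropout rate, and the linear decoder are all assumptions for brevity (the patent's decoder is convolutional). Only the mean/standard-deviation sampling and the dropout structure follow the description above.

```python
import torch
import torch.nn as nn

class PresetAutoEncoder(nn.Module):
    def __init__(self, num_features: int = 34):
        super().__init__()
        # Encoder: compresses input frames into a latent representation.
        self.encoder = nn.Sequential(
            nn.Conv2d(3, 256, kernel_size=7, stride=2, padding=3),    # layer 1: 7x7, 256 channels
            nn.ReLU(),
            nn.Conv2d(256, 128, kernel_size=7, stride=2, padding=3),  # layer 2: 7x7, 128 channels
            nn.ReLU(),
            nn.Conv2d(128, 512, kernel_size=5, stride=2, padding=2),  # layers 3-8 collapsed: 5x5, 512 channels
            nn.ReLU(),
            nn.AdaptiveAvgPool2d(1),
            nn.Flatten(),
        )
        self.to_mean = nn.Linear(512, 64)    # mean vector
        self.to_logstd = nn.Linear(512, 64)  # (log) standard-deviation vector
        # Decoder: reconstructs from the sampled code and emits the feature
        # vector; the dropout layer zeroes part of the preceding output.
        self.decoder = nn.Sequential(
            nn.Linear(64, 256),
            nn.ReLU(),
            nn.Dropout(p=0.5),
            nn.Linear(256, num_features),  # one value per sub-video feature
        )

    def forward(self, frames: torch.Tensor) -> torch.Tensor:
        h = self.encoder(frames)
        mean, logstd = self.to_mean(h), self.to_logstd(h)
        code = mean + torch.randn_like(mean) * logstd.exp()  # sampled code
        return self.decoder(code)

# Usage: a batch of frames yields one 34-dimensional feature vector per sample.
features = PresetAutoEncoder()(torch.randn(2, 3, 128, 128))
```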
In one example, the preset video features and weights in the preference prediction model may be as in Table 1 below. The preset video features include shot type features, shot scene features, video sound features, shot usage features, and video special-effect features; each video feature comprises several sub-video features, the weights of the video features sum to 100, and the weight of each video feature comprises the weights of its sub-video features.
[Table 1: preset video features, their sub-video features, and the corresponding weights; reproduced only as images in the original publication.]
It should be understood that the video features, sub-video features, and weights in Table 1 are given only for ease of understanding and are not limiting. Table 1 contains 34 sub-video features in total, and the self-encoder outputs a feature vector over these 34 sub-video features for the historical video.
Step 202, predicting the user's preference for the historical video according to the feature vector and a preset neural network.
Specifically, the feature vector output by the self-encoder is fed into the preset neural network, which outputs the predicted preference of the user for the historical video. In one example, the preset neural network comprises 15 convolutional layers: layer 1 has 7×7 kernels and 256 output channels; layer 2 has 7×7 kernels and 128 output channels; layers 3 through 13 have 5×5 kernels and 512 output channels; and layers 14 and 15 are a fully connected layer combined with a softmax output layer. The softmax layer converts raw scores into a probability, i.e., the probability that the user likes watching the historical video; since a probability always lies in [0, 1], it reflects the user's preference more intuitively than raw scores.
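A correspondingly reduced PyTorch sketch of the prediction network is shown below; fully connected layers stand in for the 15-layer convolutional stack, and only the fully-connected-plus-softmax head producing a probability in [0, 1] follows the description above. All layer sizes are assumptions.

```python
import torch
import torch.nn as nn

class PreferenceNet(nn.Module):
    def __init__(self, num_features: int = 34):
        super().__init__()
        # Stand-in body for the convolutional stack described above.
        self.body = nn.Sequential(
            nn.Linear(num_features, 512), nn.ReLU(),
            nn.Linear(512, 128), nn.ReLU(),
        )
        self.head = nn.Sequential(
            nn.Linear(128, 2),   # fully connected layer
            nn.Softmax(dim=-1),  # converts scores into probabilities
        )

    def forward(self, feature_vec: torch.Tensor) -> torch.Tensor:
        # Returns P(user likes the video) in [0, 1] for each sample.
        return self.head(self.body(feature_vec))[:, 1]

probs = PreferenceNet()(torch.randn(4, 34))  # four feature vectors in, four probabilities out
```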
Step 203, adjusting the weights of the video features according to the preset labels of the historical videos and the predicted preference.
Specifically, since historical videos watched by the user are used to train the prediction model, the model's parameters must be adjusted according to the difference between the user's real preference for a historical video and the predicted preference. In this embodiment, the real preference is expressed through the historical video's preset label. Labels can be collected by asking the user to mark the historical videos while their viewing history is gathered, for example marking each video as liked, neutral, or disliked, or assigning a probability value as the label, where a higher value means the user likes the video more; labels obtained this way reflect the user's preference for the historical videos most faithfully.
The preset label of a historical video is compared with the predicted preference, and the weights of the video features are adjusted according to the difference between them, so that video features the user prefers gain weight and video features the user does not prefer lose weight. It can be understood that the more prominent video features in a historical video strongly influence the user's preference for it, so when the weights are adjusted, it is the weights of the more prominent features that are adjusted; since more prominent features have more prominent feature-vector values, they can be identified from the feature vector output by the self-encoder. For example, when the label shows that the user likes the historical video and the predicted preference is also high, the weights of the more prominent features can be increased; when the label shows that the user dislikes the historical video but the predicted preference is high, the weights of the more prominent features can be decreased.
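As a rough sketch of this adjustment, assuming a simple additive update rule (the patent does not give a formula), the most prominent feature can be located through the feature vector and its weight nudged by the gap between label and prediction:

```python
from typing import List

def adjust_prominent_weight(weights: List[float], feature_vec: List[float],
                            label: int, predicted: float, lr: float = 5.0) -> None:
    # The most prominent video feature has the largest feature-vector value.
    idx = max(range(len(feature_vec)), key=lambda i: abs(feature_vec[i]))
    # (label - predicted) is positive for a liked video, so its prominent
    # feature gains weight; it is strongly negative when a disliked video
    # was predicted as liked, so the weight is reduced, as described above.
    weights[idx] += lr * (label - predicted)
```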
In addition, when a historical video is processed by the preset self-encoder, the neural network, and so on, it can first be cut into slice video segments, with each segment then processed separately; that is, the feature vector of the video features in each slice segment is obtained by the preset self-encoder, and the user's preference for each slice segment is predicted. Each slice segment is quicker to process than the complete historical video, which improves video processing efficiency and reduces the computational load on the self-encoder. Moreover, after slicing, a feature vector is obtained for each segment rather than one per historical video, multiplying the feature vectors, that is, enriching the training samples, and thereby improving the training effect of the model.
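A minimal sketch of the slicing step follows; the fixed slice length, frame rate, and frame-list representation are assumptions, not values from the patent.

```python
from typing import List, Sequence

def slice_video(frames: Sequence, fps: int = 25, seconds: int = 5) -> List[Sequence]:
    """Cut a historical video (a sequence of frames) into slice video segments."""
    step = fps * seconds
    return [frames[i:i + step] for i in range(0, len(frames), step)]

segments = slice_video(list(range(1000)))  # 1000 frames -> 8 slices of 125 frames
```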
In step 102, the alternative videos are synthesized according to the user's preference for each alternative video. This embodiment does not specifically limit the synthesis method; for example, the alternative videos may be spliced in sequence to obtain the final synthesized video, or displayed simultaneously within the same picture, and so on.
When synthesizing, the alternative videos can be sorted by the user's preference, from highest to lowest; the alternative videos ranked in the top N are then screened out for synthesis, where N is an integer smaller than the number of alternative videos. For example, with 10 alternative videos, the first 5 in the preference ranking are screened out for synthesis and the remaining 5 are discarded. In other words, the alternative videos that best fit the user's tastes are selected from all the alternative videos and synthesized, while those the user likes less are discarded, so the final synthesized video meets the user's viewing requirements to a greater extent.
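This ranking-and-screening logic is simple enough to sketch directly; the tuple representation and function name are illustrative.

```python
from typing import List, Tuple

def select_top_n(scored: List[Tuple[float, str]], n: int) -> List[str]:
    """Rank alternative videos by predicted preference and keep the top N."""
    assert n < len(scored)  # N must be smaller than the number of alternatives
    ranked = sorted(scored, key=lambda pair: pair[0], reverse=True)  # high to low
    return [video for _, video in ranked[:n]]  # the rest are discarded

order = select_top_n([(0.9, "cam1"), (0.2, "cam2"), (0.7, "cam3"), (0.4, "cam4")], n=2)
# order == ["cam1", "cam3"]; these two would then be spliced or composed
```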
In one example, the alternative videos may be multiple live video streams: when a press conference, sports competition, classroom lesson, or the like is broadcast live, multiple cameras shoot simultaneously, and the video from each camera serves as one alternative video stream. When a user enters the live room, the user's ID is acquired and the bound preference prediction model is retrieved from it; the multiple alternative video streams are then synthesized according to that model to obtain the live stream finally played for the user.
Compared with the prior art, this embodiment predicts the user's preference for each alternative video according to a preference prediction model preset for the user; because the model combines a self-encoder, a neural network, and the like and is trained on historical videos watched by the user, the computational accuracy of the prediction model is safeguarded, the user's preference for each alternative video can be predicted realistically, accurately, and specifically, and the predicted preferences have high reference value. The alternative videos are then synthesized according to those preferences, so the synthesized video fits the user's tastes: the personalized viewing requirements are met, and the subjective limitations of videos manually and uniformly presented to all users are avoided.
A second embodiment of the present invention relates to a video processing method and is substantially the same as the first embodiment, except that a weight-adjustment threshold for the video features is preset during training of the prediction model. The flowchart of the video processing method in this embodiment is therefore substantially the same as fig. 1, and the specific flow of the training method of the preference prediction model is shown in fig. 4, described below:
step 301, obtaining a feature vector of video features in a historical video according to a preset self-encoder. This step is substantially the same as step 201, and is not described herein again.
And step 302, predicting the preference of the user to the historical video according to the feature vector and a preset neural network. This step is substantially the same as step 202, and is not described herein again.
Step 303, determining, according to the preset label and the predicted preference of the historical video, whether the change in the adjusted video-feature weight exceeds the preset adjustment threshold; if not, executing step 304; if yes, ending the process.
Specifically, the preset label of the historical video is compared with the predicted preference, and the weights of the video features are adjusted according to the difference, so that features the user prefers gain weight and features the user does not prefer lose weight; the more prominent features are identified from the feature vector output by the self-encoder, and their weights are the ones adjusted. For example, the weights may be adjusted by successive trials, determining the change in weight from the change in the model's output preference, so that the output preference moves closer to the user's real preference for the historical video; if achieving that would require a weight change exceeding the preset adjustment threshold, the adjustment is abandoned, and if the change does not exceed the threshold, step 304 is executed and the weight is adjusted by that amount. As another example, a range of permissible weights may be determined from the preset adjustment threshold, and the model run automatically many times to search for a reasonable weight change within that range; if no run within the permitted range brings the output preference close to the user's real preference, the adjustment is abandoned, and if a reasonable change is found, step 304 is executed and the weight is adjusted accordingly.
This embodiment does not specifically limit how the preset adjustment threshold is determined; for example, it may be set to 50% of the current weight, in which case the threshold is not exceeded when the change in weight lies within [-50% × weight, +50% × weight].
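A sketch of the resulting guard, using the 50% example threshold above:

```python
def apply_weight_update(weight: float, delta: float, ratio: float = 0.5) -> float:
    """Accept a proposed weight change only if it stays within the preset
    adjustment threshold (here ratio * weight); otherwise abandon it."""
    if abs(delta) > ratio * weight:
        return weight      # change too large: this training sample is not used
    return weight + delta  # within [-50% x weight, +50% x weight]: accept

assert apply_weight_update(10.0, 4.0) == 14.0  # accepted
assert apply_weight_update(10.0, 6.0) == 10.0  # rejected: 6.0 exceeds 5.0
```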
Step 304, adjusting the weight of the video features, which is not described herein again.
Compared with the prior art, this embodiment presets a weight-adjustment threshold for the video features: if adjusting a weight according to the predicted preference would change it too much, the adjustment is abandoned and that training sample is not used. This accounts for the possibility of error in the predicted preference and keeps training data with large errors from distorting the training process of the model.
A third embodiment of the present invention relates to a video processing method and is substantially the same as the first embodiment, except that during training of the prediction model, the preset labels of the historical videos are obtained from the user's viewing duration. The flowchart of the video processing method in this embodiment is therefore substantially the same as fig. 1, and the specific flow of acquiring the preset label of a historical video is shown in fig. 5, described below:
Step 401, acquiring the user's viewing duration for the historical video;
Step 402, if the viewing duration is greater than or equal to a first preset threshold, marking the historical video with a first label; if the viewing duration is less than or equal to a second preset threshold, marking the historical video with a second label; wherein the first preset threshold is greater than the second preset threshold.
Specifically, the user's viewing duration for a historical video largely reflects whether the user likes watching it: the longer the viewing duration, the higher the user's preference, and the shorter the duration, the lower the preference. Thresholds can therefore be preset to map viewing durations to preferences. When the viewing duration is greater than or equal to a first preset threshold, the historical video is marked with a first label representing a positive attribute (a video that fits the user's preferences); when the viewing duration is less than or equal to a second preset threshold, it is marked with a second label representing a negative attribute (a video that does not fit the user's preferences). It can be understood that when the viewing duration falls between the second and first thresholds, the historical video could be marked with a third label representing an intermediate attribute (the fit with the user's preferences is unremarkable); alternatively, because such intermediate videos are hard to characterize, those whose viewing duration lies between the two thresholds can simply be discarded, with only positive- and negative-attribute videos used as training samples, ensuring the accuracy of model training, i.e., the training effect of the model.
In one example, the first preset threshold is 60% of the total video duration and the second preset threshold is 40%. The user's viewing duration for the historical video is acquired: if it falls within (60%, 100%] of the total duration, the historical video is marked with the first label, 1; if it falls within [0%, 40%] of the total duration, the historical video is marked with the second label, 0. It can be understood that if the historical video is cut into slice video segments, each segment carries the label marked for the historical video.
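The labelling rule with these example thresholds can be sketched as follows; discarding the middle band is one of the two options described above, and the function name is illustrative.

```python
from typing import Optional

def duration_label(watched: float, total: float,
                   hi: float = 0.60, lo: float = 0.40) -> Optional[int]:
    ratio = watched / total
    if ratio >= hi:
        return 1     # first label: positive attribute
    if ratio <= lo:
        return 0     # second label: negative attribute
    return None      # middle band: discarded rather than used for training

assert duration_label(90, 100) == 1
assert duration_label(30, 100) == 0
assert duration_label(50, 100) is None
```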
Compared with the prior art, this embodiment presets labels for historical videos according to the user's viewing duration: a sufficiently long viewing duration yields a first label representing a positive attribute (a video fitting the user's preferences), while a sufficiently short one yields a second label representing a negative attribute (a video not fitting them). Such labels intuitively reflect how well a historical video fits the user's preferences and make it easy to compare against the predicted preference when adjusting the weights of the video features.
A fourth embodiment of the invention relates to an electronic device, as shown in fig. 6, comprising at least one processor 501; and a memory 502 communicatively coupled to the at least one processor 501; the memory 502 stores instructions executable by the at least one processor 501, and the instructions are executed by the at least one processor 501 to enable the at least one processor 501 to execute the video processing method.
The memory 502 and the processor 501 are coupled by a bus, which may comprise any number of interconnected buses and bridges linking one or more processors 501 and the memory 502 together. The bus may also connect various other circuits, such as peripherals, voltage regulators, and power management circuits, which are well known in the art and therefore not described further here. A bus interface provides an interface between the bus and a transceiver. The transceiver may be one element or several, such as multiple receivers and transmitters, providing a means for communicating with various other apparatus over a transmission medium. Data processed by the processor 501 is transmitted over a wireless medium through an antenna, which also receives data and forwards it to the processor 501.
The processor 501 is responsible for managing the bus and general processing, and may also provide various functions including timing, peripheral interfaces, voltage regulation, power management, and other control functions. The memory 502 may be used to store data used by the processor 501 in performing operations.
A fifth embodiment of the present invention relates to a computer-readable storage medium storing a computer program. The computer program, when executed by a processor, implements the video processing method of the above embodiments.
That is, those skilled in the art will understand that all or part of the steps of the methods in the above embodiments can be implemented by a program instructing the relevant hardware; the program is stored in a storage medium and includes several instructions for causing a device (which may be a microcontroller, a chip, or the like) or a processor to execute all or part of the steps of the methods described in the embodiments of the present application. The aforementioned storage medium includes various media capable of storing program code, such as a USB flash drive, a removable hard disk, a read-only memory (ROM), a random access memory (RAM), a magnetic disk, or an optical disk.
Those of ordinary skill in the art will understand that the above embodiments are specific examples of carrying out the invention and that, in practice, various changes in form and detail may be made to them without departing from the spirit and scope of the invention.

Claims (10)

1. A video processing method, comprising:
predicting a user's preference for each alternative video according to a preference prediction model preset for the user, wherein the preference prediction model is trained on historical videos watched by the user;
and synthesizing the alternative videos according to the user's preference for each alternative video.
2. The video processing method according to claim 1, wherein video features and weights of the video features are preset in the preference prediction model, and the preference prediction model is trained in the following manner:
acquiring a feature vector of the video features in the historical videos according to a preset self-encoder;
predicting the user's preference for the historical videos according to the feature vector and a preset neural network;
and adjusting the weights of the video features according to the preset labels of the historical videos and the predicted preference.
3. The video processing method according to claim 2, further comprising, before said adjusting the weights of the video features:
determining that the change in the weight does not exceed a preset adjustment threshold.
4. The video processing method according to claim 2, wherein the preset label of a historical video is obtained by:
acquiring the user's viewing duration for the historical video;
marking historical videos whose viewing duration is greater than or equal to a first preset threshold with a first label;
marking historical videos whose viewing duration is less than or equal to a second preset threshold with a second label;
wherein the first preset threshold is greater than the second preset threshold.
5. The video processing method according to claim 2, wherein said acquiring a feature vector of the video features in the historical videos according to a preset self-encoder comprises:
cutting the historical video into slice video segments, and acquiring the feature vector of the video features in each slice video segment according to the preset self-encoder.
6. The video processing method according to claim 2, wherein the preset video features comprise: shot type features, shot scene features, video sound features, shot usage features, and video special-effect features; each video feature comprises a plurality of sub-video features, and the preset weight of each video feature comprises the weight of each sub-video feature.
7. The video processing method according to claim 1, wherein said synthesizing the alternative videos according to the user's preference for each alternative video comprises:
sorting the alternative videos according to the user's preference for each alternative video;
synthesizing the alternative videos whose preferences rank in the top N, wherein N is an integer smaller than the number of the alternative videos.
8. The video processing method according to claim 1, wherein a binding relationship between the user's identity information and the preference prediction model is prestored, and the preference prediction model preset for the user is obtained by:
acquiring the identity information of the user;
and acquiring the preference prediction model bound to the user's identity information according to the binding relationship.
9. An electronic device, comprising:
at least one processor; and
a memory communicatively coupled to the at least one processor; wherein
the memory stores instructions executable by the at least one processor to enable the at least one processor to perform the video processing method of any one of claims 1 to 7.
10. A computer-readable storage medium storing a computer program, wherein the computer program, when executed by a processor, implements the video processing method according to any one of claims 1 to 7.
CN201911128374.0A 2019-11-18 2019-11-18 Video processing method, electronic device and storage medium Pending CN110933503A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201911128374.0A CN110933503A (en) 2019-11-18 2019-11-18 Video processing method, electronic device and storage medium

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201911128374.0A CN110933503A (en) 2019-11-18 2019-11-18 Video processing method, electronic device and storage medium

Publications (1)

Publication Number Publication Date
CN110933503A (en) 2020-03-27

Family

ID=69854108

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201911128374.0A Pending CN110933503A (en) 2019-11-18 2019-11-18 Video processing method, electronic device and storage medium

Country Status (1)

Country Link
CN (1) CN110933503A (en)

Patent Citations (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20030007700A1 (en) * 2001-07-03 2003-01-09 Koninklijke Philips Electronics N.V. Method and apparatus for interleaving a user image in an original image sequence
CN106028071A (en) * 2016-05-17 2016-10-12 Tcl集团股份有限公司 Video recommendation method and system
CN106326413A (en) * 2016-08-23 2017-01-11 达而观信息科技(上海)有限公司 Personalized video recommending system and method
CN110446056A (en) * 2019-07-30 2019-11-12 咪咕文化科技有限公司 Video processing method, device and system and computer readable storage medium

Cited By (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
WO2022037393A1 (en) * 2020-08-21 2022-02-24 北京达佳互联信息技术有限公司 Multimedia resource processing method and apparatus
CN113312512A (en) * 2021-06-10 2021-08-27 北京百度网讯科技有限公司 Training method, recommendation device, electronic equipment and storage medium
CN113312512B (en) * 2021-06-10 2023-10-31 北京百度网讯科技有限公司 Training method, recommending device, electronic equipment and storage medium

Similar Documents

Publication Publication Date Title
CN109862388A (en) Generation method, device, server and the storage medium of the live video collection of choice specimens
CN101529467B (en) Method, apparatus and system for generating regions of interest in video content
US11166027B2 (en) Content adaptation for streaming
CN111107395B (en) Video transcoding method, device, server and storage medium
CN105075273B (en) Adaptive streaming transmission technology
US20170289589A1 (en) Live video classification and preview selection
CN112511854B (en) Live video highlight generation method, device, medium and equipment
CN113542867A (en) Content filtering in a media playback device
US20040064207A1 (en) Automated event content processing method and system
CN111327865B (en) Video transmission method, device and equipment
US8949893B2 (en) Method and a system for constructing virtual video channel
CN104683852B (en) The method and apparatus for handling broadcast message
CN111279709A (en) Providing video recommendations
US11037603B1 (en) Computing system with DVE template selection and video content item generation feature
CN110933503A (en) Video processing method, electronic device and storage medium
CN111669627A (en) Method, device, server and storage medium for determining video code rate
US11677796B2 (en) System and method for video encoding optimization and broadcasting
CN110933459B (en) Event video clipping method, device, server and readable storage medium
CN111757174A (en) Method and device for matching video and audio image quality and electronic equipment
CN110996021A (en) Director switching method, electronic device and computer readable storage medium
CN111277827A (en) Video data processing method, device, equipment and readable storage medium
CN110677701A (en) Video stream recommendation method, electronic device and storage medium
CN113949899A (en) Video quality evaluation method and device
CN110493609B (en) Live broadcast method, terminal and computer readable storage medium
EP2902924A1 (en) Method for automatically selecting a real-time video stream among a plurality of available real-time video streams, and associated system

Legal Events

Date Code Title Description
PB01 Publication
PB01 Publication
SE01 Entry into force of request for substantive examination
SE01 Entry into force of request for substantive examination
RJ01 Rejection of invention patent application after publication
RJ01 Rejection of invention patent application after publication

Application publication date: 20200327