CN112399201A - Video aging determination method and device, electronic equipment and medium

Video aging determination method and device, electronic equipment and medium

Info

Publication number: CN112399201A (application CN202011072163.2A)
Authority: CN (China)
Prior art keywords: video, key, vector, word, classification
Legal status: Granted (assumed; Google has not performed a legal analysis)
Application number: CN202011072163.2A
Other languages: Chinese (zh)
Other versions: CN112399201B (en)
Inventor: 朱朝悦
Current Assignee: Tencent Technology (Shenzhen) Co., Ltd. (the listed assignee may be inaccurate; Google has not performed a legal analysis)
Original Assignee: Tencent Technology (Shenzhen) Co., Ltd.
Application filed by Tencent Technology (Shenzhen) Co., Ltd.
Priority to CN202011072163.2A
Publication of CN112399201A
Application granted
Publication of CN112399201B
Legal status: Active

Classifications

    • H04N 21/25891: Management of end-user data being end-user preferences (under H04N 21/25, server-side management operations for content distribution, e.g. VOD)
    • H04N 21/251: Learning process for intelligent management, e.g. learning user preferences for recommending movies (server side)
    • H04N 21/4668: Learning process for recommending content, e.g. movies (client-side management operations)

Abstract

The invention discloses a video timeliness determination method and device, an electronic device, and a medium. The method comprises the following steps: acquiring key information of a target video; determining the vector corresponding to the key information to obtain a key vector; determining a part-of-speech vector for each key word according to its part of speech in the key information; determining a word position vector for each key word according to its position in the key information; obtaining a word vector set corresponding to the key information from the part-of-speech vector and the word position vector of each key word; and inputting the key vector and the word vector set into a video timeliness determination model to obtain the distribution timeliness of the target video. The video timeliness determination scheme provided by the invention draws on artificial intelligence technology, enriches the text-type features of the target video, and improves the prediction accuracy and generalization capability of the video timeliness determination model. The distribution timeliness of the target video is output by the model with high efficiency and high precision.

Description

Video aging determination method and device, electronic equipment and medium
Technical Field
The present application relates to the field of internet communication technologies, and in particular to a video timeliness (aging) determination method and apparatus, an electronic device, and a medium.
Background
With the rapid development of internet communication technology, networks have become an important way for people to acquire and share information. The server can push various information to the client, and the user can obtain various information through the client. For example, the server sends a video to the client, and the user clicks the video through the client to watch the video.
The content of a video has a positive feedback effect for a certain period of time, an effect that can be measured by users' interest in the video content. Video timeliness denotes the time range within which the video can be pushed to users for viewing without causing discomfort. Conversely, pushing the video to users beyond this time range will provoke dislike and bring no positive effect to the related products.
In the related art, on the one hand, the timeliness of a video to be pushed is determined by manual labeling; manual labeling often applies subjective timeliness standards, so labeling quality is unstable, and it is also inefficient and costly. On the other hand, timeliness can be determined by extracting various kinds of text from the video to be pushed (such as subtitle text), determining the timeliness corresponding to each kind of text information, and synthesizing these into the timeliness of the video. Extracting the various text information consumes much time, which hurts the efficiency of determining video timeliness, especially when the number of videos to be pushed is huge. There is therefore a need for an accurate and efficient video timeliness determination scheme.
Disclosure of Invention
In order to solve problems such as the low efficiency of the prior art in determining the timeliness of videos to be pushed, the present application provides a video timeliness determination method and apparatus, an electronic device, and a medium:
according to a first aspect of the present application, there is provided a video aging determination method, the method comprising:
acquiring key information of a target video;
determining a vector corresponding to the key information to obtain a key vector;
determining a part-of-speech vector corresponding to each key word according to the part-of-speech of each key word in the key information;
determining a word position vector corresponding to each key word according to the position of each key word in the key information;
obtaining a word vector set corresponding to the key information according to the part-of-speech vector and the word position vector corresponding to each key word;
and inputting the key vector and the word vector set into a video timeliness determination model to obtain the distribution timeliness of the target video.
According to a second aspect of the present application, there is provided a video aging determination apparatus, the apparatus comprising:
a key information acquisition module, configured to acquire key information of a target video;
a key vector obtaining module, configured to determine the vector corresponding to the key information to obtain a key vector;
a part-of-speech vector determination module, configured to determine a part-of-speech vector for each key word according to its part of speech in the key information;
a word position vector determination module, configured to determine a word position vector for each key word according to its position in the key information;
a word vector set obtaining module, configured to obtain a word vector set corresponding to the key information from the part-of-speech vector and the word position vector of each key word;
a video timeliness obtaining module, configured to input the key vector and the word vector set into a video timeliness determination model to obtain the distribution timeliness of the target video.
According to a third aspect of the present application, there is provided an electronic device comprising a processor and a memory, wherein at least one instruction or at least one program is stored in the memory, and the at least one instruction or the at least one program is loaded and executed by the processor to implement the video aging determination method according to the first aspect.
According to a fourth aspect of the present application, there is provided a computer-readable storage medium having at least one instruction or at least one program stored therein, the at least one instruction or the at least one program being loaded and executed by a processor to implement the video aging determination method according to the first aspect.
According to a fifth aspect of the present application, there is provided a computer program product or computer program comprising computer instructions stored in a computer readable storage medium. The processor of the computer device reads the computer instructions from the computer-readable storage medium, and the processor executes the computer instructions to cause the computer device to execute the video aging determination method of the first aspect.
The video timeliness determination method, device, electronic equipment and medium provided by the application have the following technical effects:
the method and the device for determining the distribution timeliness of the target video determine the key vectors and the word vector sets corresponding to the target video by using the key information of the target video, and then input the key vectors and the word vector sets into the video timeliness determination model to further obtain the distribution timeliness of the target video. The key vectors are vectors corresponding to the key information, the part-of-speech vectors in the word vector set indicate the parts of speech of the key words in the key information, and the word position vectors in the word vector set indicate the positions of the key words in the key information. The method and the device extract the part-of-speech vectors and the word position vectors by using the key information, enrich the text type characteristics of the target video, and improve the prediction accuracy and the generalization capability of the video aging determination model based on the richer text type characteristics. The distribution timeliness of the target video is output by the video timeliness determination model, and the efficiency and the precision are high.
Drawings
In order to more clearly illustrate the embodiments of the present invention or the technical solutions of the prior art, the drawings used in the description of the embodiments or the prior art are briefly introduced below. It is obvious that the drawings in the following description show only some embodiments of the present invention, and those skilled in the art can obtain other drawings from them without creative effort.
FIG. 1 is a schematic diagram of an application environment provided by an embodiment of the invention;
fig. 2 is a schematic flow chart of a video aging determination method according to an embodiment of the present invention;
FIG. 3 is a schematic flow chart of a training-derived video aging determination model according to an embodiment of the present invention;
fig. 4 is a schematic flow chart of a video aging determination method according to an embodiment of the present invention;
FIG. 5 is a schematic flow chart of a method for training a video classification model according to an embodiment of the present invention;
FIG. 6 is a schematic diagram of an application scenario of a video aging determination model according to an embodiment of the present invention;
fig. 7 is a schematic flowchart of a video aging determination method according to an embodiment of the present invention;
FIG. 8 is a schematic diagram of the implementation methods for vector representation of text in the word2vec model (a family of related models for generating word vectors) provided by an embodiment of the present invention;
FIG. 9 is a diagram of a TextCNN (a convolutional neural network for text classification) model provided by an embodiment of the present invention;
FIG. 10 is a diagram of a TextCNN model provided by an embodiment of the present invention;
fig. 11 is a block diagram illustrating a video aging determination apparatus according to an embodiment of the present invention;
fig. 12 is a schematic structural diagram of an electronic device according to an embodiment of the present invention.
Detailed Description
The technical solutions in the embodiments of the present invention will be clearly and completely described below with reference to the drawings in the embodiments of the present invention, and it is obvious that the described embodiments are only a part of the embodiments of the present invention, and not all of the embodiments. All other embodiments, which can be obtained by a person skilled in the art without any inventive step based on the embodiments of the present invention, are within the scope of the present invention.
It should be noted that the terms "comprises" and "comprising," and any variations thereof, in the description and claims of the present invention and the above drawings, are intended to cover a non-exclusive inclusion, such that a process, method, system, article, or apparatus that comprises a list of steps or elements is not necessarily limited to those steps or elements expressly listed, but may include other steps or elements not expressly listed or inherent to such a process, method, article, or apparatus.
Before further describing the embodiments of the present invention in detail, the terms and expressions involved in the embodiments are explained; the following explanations apply to these terms and expressions.
User retention: a user who starts using an application product within a given period and is still using it after some time is regarded as a retained user. The proportion of such users among the new users of that period is the retention rate, which can be computed per unit of time (day, week, month). As the name implies, retention refers to "how many users are left". User retention and the retention rate reflect the quality of an application product and its ability to keep users.
Click rate: the ratio of the number of times a piece of content on a web page is clicked to the number of times it is displayed, i.e., clicks/views, expressed as a percentage. For pushed information, the click rate is the number of clicks divided by the number of exposures, and it reflects the push effect of the information. The click rate may also be referred to herein as the Click-Through-Rate (CTR).
Artificial Intelligence (AI): a theory, method, technique, and application system that uses digital computers, or machines controlled by digital computers, to simulate, extend, and expand human intelligence, perceive the environment, acquire knowledge, and use knowledge to obtain optimal results. In other words, artificial intelligence is a comprehensive branch of computer science that attempts to understand the essence of intelligence and to produce new intelligent machines that can react in a manner similar to human intelligence. Artificial intelligence studies the design principles and implementation methods of various intelligent machines, giving machines the capabilities of perception, reasoning, and decision-making. As a comprehensive discipline, it covers a wide range of fields, at both the hardware level and the software level. Basic artificial intelligence technologies generally include sensors, dedicated AI chips, cloud computing, distributed storage, big data processing, operation/interaction systems, and mechatronics. AI software technologies mainly include computer vision, speech processing, natural language processing, and machine learning/deep learning.
Referring to fig. 1, fig. 1 is a schematic diagram of an application environment according to an embodiment of the present invention. The application environment may include a client 01 and a server 02, which may be directly or indirectly connected through wired or wireless communication. For a target video to be distributed, the server can determine the key vector and word vector set corresponding to the target video from its key information, and then input them into a video timeliness determination model to obtain the distribution timeliness of the target video. The server distributes the target video to the client according to this distribution timeliness. It should be noted that fig. 1 is only an example.
The client may be a physical device such as a smart phone, desktop computer, tablet computer, notebook computer, Augmented Reality (AR)/Virtual Reality (VR) device, digital assistant, smart speaker, or smart wearable device, and may also include software running on the physical device, such as a computer program. The operating system of the client may be Android, iOS (Apple's mobile operating system), Linux, Microsoft Windows, and the like.
The server may be an independent physical server, a server cluster or distributed system formed by multiple physical servers, or a cloud server providing basic cloud computing services such as cloud services, cloud databases, cloud computing, cloud functions, cloud storage, network services, cloud communication, middleware services, domain name services, security services, CDN (Content Delivery Network), and big data and artificial intelligence platforms. It may include a network communication unit, a processor, and memory, among other components, and may provide background services for the client.
In the embodiment of the present invention, the trained video timeliness determination model may use Natural Language Processing (NLP) technology. Natural language processing is an important direction in the fields of computer science and artificial intelligence. It studies theories and methods for effective communication between humans and computers in natural language, and is a science integrating linguistics, computer science, and mathematics. Research in this field involves natural language, i.e., the language people use every day, so it is closely related to linguistics. Natural language processing techniques typically include text processing, semantic understanding, machine translation, question answering, knowledge graphs, and the like.
A specific embodiment of the video timeliness determination method of the present invention is described below. Fig. 2 is a flowchart of a video timeliness determination method according to an embodiment of the present invention. This specification provides the method operation steps as described in the embodiment or the flowchart, but more or fewer steps may be included based on conventional or non-inventive labor. The order of steps recited in the embodiments is merely one of many possible execution orders and does not represent the only one. In practice, a system or server product may execute the steps sequentially or in parallel (for example, in a parallel-processor or multi-threaded environment) according to the embodiments or the methods shown in the figures. Specifically, as shown in fig. 2, the method may include:
s201: acquiring key information of a target video;
in the embodiment of the present invention, the target video to be distributed may come from a video pool. The video pool can include first videos and historical videos: a first video is one whose time in the pool (or time online) is shorter than a preset duration, and a historical video is one whose time in the pool (or time online) exceeds that duration.
As representative information that can efficiently characterize the target video, the key information may include at least one of: a video topic and video tags. Of course, the key information is not limited to these; it may be other information, such as a video synopsis or bullet-screen comments. The video topic of the target video is text describing its subject content, and is often a short text (usually a few dozen bytes, comprising several to dozens of words). A video title may be used as the video topic, such as the title of target video B, "Take notes! How do fund managers help us earn money?", or the title of target video C, "The Wandering Earth". A video tag serves as a representative word describing the target video and is strongly correlated with its content. For example, the video tags of target video C include mainland, 2019, science fiction, disaster, adventure, drama, and theatrical release. Video tags may be generated by at least one of: video creators, video viewers, and staff maintaining the video.
S202: determining a vector corresponding to the key information to obtain a key vector;
in the embodiment of the invention, the key information presented in text form can be converted into a vector to obtain the key vector. When the key information includes a video topic, the video topic can be converted into a topic vector, which is used as the key vector. The video topic may be segmented into at least one topic word. When there is exactly one topic word, it is converted into a vector, which is taken as the topic vector. When there are two or more topic words, each is converted into a vector, and the topic vector is then obtained from the vectors of the individual topic words. The vector conversion steps above can be implemented with the word2vec model (a family of related models for generating word vectors), GloVe (a word embedding tool), the BERT model (Bidirectional Encoder Representations from Transformers), or the GPT model (a generative pre-trained language model).
When the key information includes a video topic and video tags, the vectors corresponding to the video topic and the video tags may be determined to obtain the key vector. A vector 1 corresponding to the video topic and a vector 2 corresponding to the video tags can be obtained respectively, and the combination of vector 1 and vector 2 is then taken as the key vector. Vector 1 is obtained by the word segmentation and vector conversion processes described above, which are not repeated here. Considering that video tags are usually presented as words, a to-be-processed word set can be built from the topic words obtained by segmenting the video topic plus the video tags; each word in the set is then converted into a vector, and the topic vector is obtained from these word vectors. The vector conversion here can likewise be implemented with the word2vec model, GloVe, BERT, or GPT. Using the video topic and the video tags together as key information yields a key vector that fuses topic and tag information; its information content is richer, which improves the accuracy of locating video content from short text.
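For illustration, the following is a minimal Python sketch of this fusion, assuming averaged word embeddings as the combination operator and a toy embedding table standing in for a real pre-trained word2vec/GloVe/BERT encoder (the patent fixes neither choice):

```python
import numpy as np

EMBED_DIM = 128
rng = np.random.default_rng(0)
embedding_table = {}  # word -> vector; toy stand-in for a trained encoder

def embed(word):
    # Hypothetical lookup: unseen words get a random (then cached) vector.
    if word not in embedding_table:
        embedding_table[word] = rng.normal(size=EMBED_DIM)
    return embedding_table[word]

def key_vector(topic_words, tags):
    # Build the to-be-processed word set from topic words plus tags,
    # then average their vectors into a single key vector.
    words = list(topic_words) + list(tags)
    return np.mean([embed(w) for w in words], axis=0)

kv = key_vector(["wandering", "earth"], ["science fiction", "2019"])
print(kv.shape)  # (128,)
```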
The word2vec model mentioned above is described below: word2vec is a word embedding (Word Embedding) method proposed by Mikolov at Google in 2013. CBOW (Continuous Bag-of-Words Model) and Skip-gram are the two methods word2vec uses to produce vector representations of text; see FIG. 8.
The training input of CBOW is the context word vectors of a particular word, and the output is the word vector of that word. Skip-gram reverses this: the input is the word vector of a particular word, and the output is the context word vectors of that word. CBOW is more suitable for small corpora, while Skip-gram performs better in large corpora.
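A short sketch of the two training modes, assuming the gensim (>= 4.0) implementation of word2vec and a toy corpus:

```python
from gensim.models import Word2Vec

# Toy corpus; each sentence is a list of tokens.
corpus = [["fund", "manager", "earn", "money"],
          ["wandering", "earth", "science", "fiction", "movie"]]

# sg=0 selects CBOW, sg=1 selects Skip-gram.
cbow = Word2Vec(corpus, vector_size=100, window=5, min_count=1, sg=0)
skipgram = Word2Vec(corpus, vector_size=100, window=5, min_count=1, sg=1)

print(cbow.wv["earth"].shape)  # (100,)
```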
S203: determining a part-of-speech vector corresponding to each key word according to the part-of-speech of each key word in the key information;
in the embodiment of the invention, part of speech is the grammatical attribute of a word, determined by the word's grammatical function in context; for example, "beautiful" has the grammatical attribute of an adjective. Part of speech can be regarded as a basis for classifying words, and is usually a result of word segmentation and grammatical analysis. In Chinese and other languages, nouns and verbal nouns embody the main information of a text better than words of other parts of speech; for instance, they better express the main idea of an article.
Determining the parts of speech of the different key words in the key information helps mine the semantics of the video topic. Mining part-of-speech vectors from the key information on top of the key vector enriches the feature vectors that the key information, as a short text, can contribute. The key information presented in text form can be segmented into at least one key word. When there is one key word, its part of speech is determined and the part-of-speech information is converted into a vector to obtain the part-of-speech vector of that key word. When there are two or more key words, the part of speech of each is determined and converted into a vector, giving a part-of-speech vector per key word.
In one embodiment, when the key information includes a video topic, the video topic may be segmented into at least one topic word. When there is one topic word, its part of speech is determined and converted into a vector, which serves as the part-of-speech vector of the key word. When there are two or more topic words, the part of speech of each is determined and converted into a vector, and the resulting part-of-speech vectors serve as the part-of-speech vectors of the key words. In practical applications, the video title is used as the video topic to be segmented. The vector conversion here can likewise be implemented with word2vec, GloVe, BERT, or GPT.
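As an illustration, the following sketch derives per-word part-of-speech vectors, assuming jieba's posseg tagger for Chinese segmentation and POS tagging and a toy vector table in place of learned POS embeddings (neither tool nor table is specified by the patent):

```python
import numpy as np
import jieba.posseg as pseg

POS_DIM = 16
rng = np.random.default_rng(1)
pos_table = {}  # POS tag (e.g. 'n' noun, 'v' verb) -> vector

def pos_vector(tag):
    # Toy lookup; in the model these would be learned embeddings.
    if tag not in pos_table:
        pos_table[tag] = rng.normal(size=POS_DIM)
    return pos_table[tag]

title = "流浪地球"  # hypothetical video title
for pair in pseg.cut(title):
    print(pair.word, pair.flag, pos_vector(pair.flag)[:3])
```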
S204: determining a word position vector corresponding to each key word according to the position of each key word in the key information;
in the embodiment of the present invention, the position of a key word in the key information reflects, to some extent, how important the key word is to the key information and whether it can serve as a target keyword (a representative word) for it. When the key information includes a video topic and the video title is used as the topic, topic words appearing in the main title are more important to the video topic than those appearing in the subtitle, and are more likely to be keywords representing it. Determining the positions of the different key words in the key information therefore helps mine the semantics of the key information.
An association between a key word's position in the key information and its part of speech can be established, so that features reflecting short-text semantics are mined from the key information more effectively. Mining part-of-speech vectors and word position vectors on top of the key vector enriches the feature vectors that the key information, as a short text, can contribute.
The key information presented in text form can be segmented into at least one key word. When there is one key word, its position information indicates the single-keyword case, and that position information is converted into a vector to obtain the word position vector of the key word. When there are two or more key words, the position of each in the key information is determined and converted into a vector, giving a word position vector per key word.
In one embodiment, when the key information includes a video topic, the video topic may be segmented into at least one topic word. When there is one topic word, its position information indicates a single-keyword topic; that position information is converted into a vector, and the resulting word position vector serves as the word position vector of the key word. When there are two or more topic words, the position of each in the video topic is determined and converted into a vector, and the resulting word position vectors serve as the word position vectors of the key words. In practical applications, the video title is used as the video topic to be segmented. The vector conversion here can likewise be implemented with word2vec, GloVe, BERT, or GPT.
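A minimal sketch of one possible encoding, assuming a lookup table with one row per position and row 0 reserved for the single-keyword case (the patent leaves the concrete encoding open):

```python
import numpy as np

POSN_DIM = 16
MAX_LEN = 64
rng = np.random.default_rng(2)
# One learnable-style row per position; row 0 is reserved as the
# single-keyword marker, matching the single-keyword case above.
position_table = rng.normal(size=(MAX_LEN, POSN_DIM))

def word_position_vectors(key_words):
    if len(key_words) == 1:
        return [position_table[0]]            # single-keyword information
    return [position_table[i + 1] for i, _ in enumerate(key_words)]

vecs = word_position_vectors(["wandering", "earth"])
print(len(vecs), vecs[0].shape)  # 2 (16,)
```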
S205: obtaining a word vector set corresponding to the key information according to the part-of-speech vector and the word position vector corresponding to each key word;
in this embodiment of the present invention, the word vector set corresponding to the key information is built from the part-of-speech vector of each key word obtained in step S203 and the word position vector of each key word obtained in step S204.
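A sketch of this assembly, assuming concatenation as the way to combine each word's part-of-speech and position vectors (one plausible choice; the patent does not fix the operator):

```python
import numpy as np

def word_vector_set(pos_vecs, position_vecs):
    # One combined vector per key word; concatenation is one plausible
    # combination operator.
    return [np.concatenate([p, q]) for p, q in zip(pos_vecs, position_vecs)]

vecs = word_vector_set([np.ones(16), np.zeros(16)],
                       [np.zeros(16), np.ones(16)])
print(len(vecs), vecs[0].shape)  # 2 (32,)
```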
S206: and inputting the key vectors and the word vector set into a video timeliness determination model to obtain the distribution timeliness of the target video.
In the embodiment of the invention, the video timeliness determination model is obtained by machine learning (deep learning) training on the vectors corresponding to the sample videos in a sample video set; the vectors corresponding to a sample video are its key vector and word vector set, and each sample video carries corresponding distribution timeliness information. With the key vector and word vector set of the target video as input, the video timeliness determination model outputs the distribution timeliness of the target video.
In one embodiment, as shown in fig. 3, the method further includes a process of training the video aging determination model:
s301: acquiring a sample video set, wherein each sample video in the sample video set carries corresponding distribution timeliness information;
the sample video set includes a plurality of sample videos. The corresponding distribution timeliness information carried by each sample video represents timeliness suitable for distribution of the corresponding sample video. The aging suitable for distribution may indicate an aging start time and an aging end time suitable for distribution. Within the time period suitable for distribution, the server can push the video to the lower node according to a preset content distribution strategy.
When labeling sample videos with distribution timeliness, several preset classifications can be created, and a corresponding distribution timeliness is configured for each. In practical applications, the preset classifications and their distribution timeliness may be updated continuously. The parameters consulted to set a preset classification and its distribution timeliness may include at least one of: historical feedback, video type, and the natural aging of the video content. For example, sample video A is a social news video, and sample video B is a clip of a classic film or television work. The content of a social news video usually ages quickly, while a classic clip ages much more slowly. In the historical feedback, if a social news video reports an event at time A and is pushed to users at a later time B, users may find it objectionable as an expired video. When labeling distribution timeliness, one must therefore ensure that videos pushed within the labeled time range will not be objectionable to users.
S302: determining a key vector and a word vector set corresponding to the sample video;
here, the key vectors and word vector sets corresponding to the sample videos may be determined by referring to the key vectors and word vector sets corresponding to the target videos determined in the foregoing steps S201 to S205, and are not described again.
Because the text semantics contained in a title and tags cannot be as rich as those of a long text, the part-of-speech vector and the word position vector are introduced on top of them; this strengthens the text features of the training data, making its quality better and more stable. With a large amount of high-quality training data, the model can learn features better and achieve a better training effect.
It should be noted that when the key vector of a sample video includes the vector corresponding to its video tags, the key vector determined for the target video must likewise include the vector corresponding to the target video's tags.
S303: based on the key vectors and the word vector sets corresponding to the sample videos, performing video aging determination training by using a preset machine learning model, and adjusting model parameters of the preset machine learning model in the training until distribution aging output by the preset machine learning model is matched with distribution aging information carried by the corresponding sample videos;
the preset machine learning model may employ a TextCNN (a convolutional neural network for text classification) model, a fastText (a word vector and text classification tool open based on a word2vec model) model, a random forest model, a TextRNN (a cyclic neural network for text classification) model, a Logistic Regression (LR) model, a GBDT (gradient boosting decision tree) model, and the like. The preset machine learning model can be an initial model or an intermediate model.
In training, the model parameters may be adjusted based on the difference between the intermediate result output by the model (the distribution timeliness of the sample video) and the distribution timeliness information carried by the sample video.
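A minimal training-loop sketch of this parameter adjustment, assuming PyTorch, a linear stand-in for the full network, and cross-entropy over discrete distribution-timeliness classes (all assumptions; the patent does not prescribe them):

```python
import torch
import torch.nn as nn

model = nn.Linear(288, 5)       # stand-in for the full network; 5 timeliness classes
opt = torch.optim.Adam(model.parameters(), lr=1e-3)
loss_fn = nn.CrossEntropyLoss()

features = torch.randn(32, 288)      # fused key vectors + word vector sets
labels = torch.randint(0, 5, (32,))  # distribution-timeliness labels

for _ in range(10):  # iterate until outputs match the labels well enough
    opt.zero_grad()
    loss = loss_fn(model(features), labels)  # gap between output and label
    loss.backward()
    opt.step()
```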
The TextCNN model mentioned above is described below: the TextCNN model was proposed by Yoon Kim in the paper "Convolutional Neural Networks for Sentence Classification" (EMNLP 2014); FIGS. 9 and 10 are schematic diagrams of the TextCNN model. See fig. 10, where:
embedding layer: the first layer is a 7 by 5 sentence matrix at the leftmost side of the figure, each row is a word vector with dimension 5, which can be analogized to the original pixel point in the image.
Volume layer (convolutional layer): then passes through a one-dimensional convolutional layer of kernel _ sizes (2,3,4), each kernel _ size having two output channels.
MaxPooling layer (max pooling): the third layer is a 1-max pooling layer, so that sentences with different lengths can become fixed-length representations after passing through the pooling layer.
Fullconnection and Softmax layer: and finally, a fully-connected softmax (normalized) layer is connected, and the probability of each category is output.
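A compact implementation sketch of these layers, assuming PyTorch and the dimensions of Fig. 10 (7-word sentence, 5-dimensional embeddings, kernel sizes 2/3/4 with two channels each):

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class TextCNN(nn.Module):
    def __init__(self, vocab_size=1000, embed_dim=5, num_classes=2):
        super().__init__()
        self.embedding = nn.Embedding(vocab_size, embed_dim)
        # One 1-D convolution per kernel size, two output channels each.
        self.convs = nn.ModuleList(
            [nn.Conv1d(embed_dim, 2, k) for k in (2, 3, 4)]
        )
        self.fc = nn.Linear(2 * 3, num_classes)

    def forward(self, token_ids):                      # (batch, seq_len)
        x = self.embedding(token_ids).transpose(1, 2)  # (batch, embed, seq)
        # 1-max pooling makes the representation length-independent.
        pooled = [F.relu(conv(x)).max(dim=2).values for conv in self.convs]
        logits = self.fc(torch.cat(pooled, dim=1))
        return F.softmax(logits, dim=1)                # class probabilities

model = TextCNN()
probs = model(torch.randint(0, 1000, (1, 7)))  # a 7-word sentence, as in Fig. 10
print(probs.shape)  # (1, 2)
```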
S304: and taking the preset machine learning model corresponding to the adjusted model parameters as the video aging determination model.
Training the machine learning model yields a video timeliness determination model with strong generalization capability. Using it for prediction improves the adaptability of determining video distribution timeliness, and thus greatly improves the reliability and effectiveness of that determination.
As shown in fig. 6, fig. 6 is a schematic diagram of an application scenario of a video aging determination model according to an embodiment of the present invention. In fig. 6, the training data are key vectors and word vector sets corresponding to sample videos, and each sample video carries corresponding distribution timeliness information; correspondingly, the subsequently trained video timeliness determination model can perform distribution timeliness determination on the target video. In fig. 6, the input video aging determination model is the key vector and word vector set corresponding to the target video, and the output of the video aging determination model is the distribution aging of the target video.
In another embodiment, as shown in fig. 4, after obtaining a word vector set corresponding to the at least one keyword according to the part-of-speech vector and the word position vector corresponding to each keyword, the method further includes:
s2061: inputting the key vectors and the word vector set into a video classification model to obtain the classification of the target video;
as shown in fig. 7, the video classification model is obtained by machine learning (deep learning) training on the vectors corresponding to the sample videos in a sample video set; the vectors corresponding to a sample video are its key vector and word vector set, and each sample video carries corresponding classification information. With the key vector and word vector set of the target video as input, the video classification model outputs the classification of the target video.
Referring to fig. 7, in the embodiment of the present invention, the video timeliness classification model is optimized by combining part-of-speech and word-position features of the text. The part-of-speech features and the corresponding word-position features extracted from the text are vectorized, concatenated (concat) with the embeddings of the title and the tags, and the unified result is fed into the TextCNN classification model for classification; a sketch follows this paragraph. Title and tags, as independent features, can be vectorized through embedding layers. The trained model effectively alleviates the problems of sparse feature information and weak generalization that short texts cause in video timeliness classification.
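The sketch below illustrates the fusion step only, with illustrative dimensions assumed for the title embedding, tag embedding, and pooled part-of-speech/position features (the patent does not give concrete sizes):

```python
import torch

batch = 4
title_emb = torch.randn(batch, 128)      # embedding of the video title
tag_emb = torch.randn(batch, 128)        # embedding of the video tags
pos_feat = torch.randn(batch, 16)        # pooled part-of-speech vectors
word_pos_feat = torch.randn(batch, 16)   # pooled word-position vectors

# Concatenate all features into the unified classifier input.
fused = torch.cat([title_emb, tag_emb, pos_feat, word_pos_feat], dim=1)
print(fused.shape)  # (4, 288) -> input of the TextCNN classification head
```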
Referring to fig. 5, the process of training the obtained video classification model is as follows:
s501: acquiring a sample video set, wherein each sample video in the sample video set carries corresponding classification information;
the sample video set comprises a plurality of sample videos, and each sample video carries corresponding classification information.
S502: determining a key vector and a word vector set corresponding to the sample video;
here, the key vectors and word vector sets corresponding to the sample videos may be determined by referring to the key vectors and word vector sets corresponding to the target videos determined in the foregoing steps S201 to S205, and are not described again.
Because the text semantics contained in a title and tags cannot be as rich as those of a long text, the part-of-speech vector and the word position vector are introduced on top of them; this strengthens the text features of the training data, making its quality better and more stable. With a large amount of high-quality training data, the model can learn features better and achieve a better training effect.
It should be noted that when the key vector of a sample video includes the vector corresponding to its video tags, the key vector determined for the target video must likewise include the vector corresponding to the target video's tags.
S503: taking a key vector and a word vector set corresponding to the sample video as input, and randomly generating an initial value of a model parameter by using a preset machine learning model, wherein the model parameter indicates the weight of a part-of-speech vector;
the pre-programmed machine learning model may employ a TextCNN model, a fastText model, a random forest model, a TextRNN model, a logistic regression model, a GBDT model, or the like. The preset machine learning model can be an initial model or an intermediate model.
Based on the key vectors and the word vector sets, the preset machine learning model randomly generates initial values for preset model parameters, where the preset model parameters point to the parts of speech of key words that contribute to classification in the sample video's key information, and to the contribution those parts of speech make to the classification. The preset model parameters are at least one, but not all, of the model parameters of the preset machine learning model.
Because different parts of speech of different key words, or different parts of speech of the same key word, contribute differently to classification, a corresponding attention value can be configured for each part of speech. The attention value lies in the range 0 to 1; the closer it is to 1, the greater the contribution to classification, and the smaller it is, the smaller the contribution. Initial attention values for the relevant parts of speech can be generated at random by the preset machine learning model from the key vectors and the word vector sets, and the initial values of the preset model parameters are derived from them.
In text classification, parts of speech such as nouns, verbs, and adjectives contribute more to the classification result, but words of different parts of speech obviously reflect the text content to different degrees; that is, part of speech should influence the weight of a feature.
Features of different parts of speech contribute differently to a classification. Nouns generally describe people, things, places, or abstractions, while verbs often describe actions; words describing things, places, and people obviously reflect the text content better than words describing actions. For example, the noun "football" typically appears in sports texts, while the verb "kick" can appear in sports ("kicking a football"), in economics ("house price VS land price, who kicked the developer's buttocks"), and elsewhere; clearly "football" contributes more to the category. There are exceptions, however: the noun "book" can appear in many classes, whereas verbs such as "shoot" appear mostly in military, shooting, and gaming texts, and their contribution to classification is obviously larger than that of weakly discriminative nouns like "book". Therefore, different parts of speech can be weighted to adjust feature weights, yielding a more reasonable text feature vector representation.
The same feature contributes differently to classification when it appears as different parts of speech, because one word may appear in a text as multiple parts of speech; the feature should therefore be weighted differently for each part of speech it takes, yielding a part-of-speech-weighted total value for the feature. Of course, part of speech as a feature-weighting index may be combined with other indexes.
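A toy sketch of such part-of-speech weighting; the attention values here are illustrative constants, whereas in the model they are learnable parameters:

```python
import numpy as np

# Illustrative attention values in (0, 1]: noun, verb, adjective.
pos_attention = {"n": 0.9, "v": 0.6, "a": 0.5}

def weight_features(words, tags, vectors):
    # Scale each word's feature vector by the attention value of its
    # POS tag; unknown tags get a small default weight (an assumption).
    return [pos_attention.get(t, 0.3) * v
            for w, t, v in zip(words, tags, vectors)]

vecs = weight_features(["football", "kick"], ["n", "v"],
                       [np.ones(4), np.ones(4)])
print(vecs[0][:2], vecs[1][:2])  # [0.9 0.9] [0.6 0.6]
```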
S504: based on the key vectors and word vector sets corresponding to the sample videos, performing video classification training by using the preset machine learning model, and adjusting the current values of the model parameters in the training until the classification output by the preset machine learning model is matched with the classification information carried by the corresponding sample videos;
in training, the model parameters may be adjusted based on the difference between the intermediate result output by the model (the classification of the sample video) and the classification information carried by the sample video. The model parameters here need not be limited to the preset model parameters described above.
S505: and taking the preset machine learning model corresponding to the adjusted model parameters as the video classification model.
Training the machine learning model yields a video classification model with strong generalization capability. Using it for prediction improves the adaptability of determining video classification, and thus greatly improves the reliability and effectiveness of that determination.
S2062: and obtaining the distribution timeliness of the target video according to the classification.
Multiple preset classifications may be created, with a corresponding distribution timeliness configured for each. A preset classification matching the predicted classification can then be determined, and the distribution timeliness of the target video is obtained from the distribution timeliness of the matched preset classification.
Specifically, a mapping relation may be established for each of the preset classifications: first, the preset classifications are obtained; then a corresponding distribution timeliness is configured for each according to historical feedback; and finally a mapping relation is established between each preset classification and its distribution timeliness. Creating the preset classifications and configuring their distribution timeliness is described in step S301 and is not repeated here.
The mapping relation is used to query the distribution timeliness indicated by a preset classification. Using the mapping relations, the relationship between preset classifications and their distribution timeliness can be managed better, and the efficiency of querying the distribution timeliness of a target video by its classification is improved. Once the classification of the target video is determined, the matching preset classification is determined first; the mapping relation of the matched preset classification is then looked up as the target mapping relation; and the distribution timeliness of the matched preset classification is read from the target mapping relation and taken as the distribution timeliness of the target video. A minimal sketch follows.
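The sketch below shows such a mapping, with hypothetical class names and illustrative durations (the patent gives neither):

```python
from datetime import timedelta

# Hypothetical preset classification -> distribution timeliness mapping.
DISTRIBUTION_AGE = {
    "social_news": timedelta(days=3),          # short natural aging
    "classic_film_clip": timedelta(days=365),  # long natural aging
}

def distribution_age(predicted_class):
    # Look up the distribution timeliness indicated by the matched
    # preset classification.
    return DISTRIBUTION_AGE[predicted_class]

print(distribution_age("social_news"))  # -> 3 days, 0:00:00
```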
In practical application, target videos to be distributed can come from a video pool. Every day the pool holds a large number of videos to process, whose content has widely varying timeliness, and a video's timeliness directly affects push quality and the release strategy, hence indexes such as user click rate and user dwell time, and even users' impression of the application product. With the steps above, the timeliness of these videos can be judged accurately before distribution. Two points need attention in this process: 1) expired videos should not be pushed to users, which would provoke dislike and negative feedback, directly harm users' impression of the product, and make the content timeliness of the related application product look poor; 2) classic videos should not be taken offline prematurely, which would hurt end-side consumption (e.g., user click rate and consumption duration) and leave too little content in the video pool, making users think the application has too little content and causing churn. The embodiment of the invention optimizes the video timeliness classification model by combining part-of-speech and word-position features of the text, so the respective timeliness of, say, social news videos and classic film clips can be found more accurately, which helps the related application product formulate a corresponding recommendation strategy.
As can be seen from the technical solutions provided by the embodiments of this specification, the key vector and the word vector set of a target video are determined from its key information and then input into the video timeliness determination model to obtain the distribution timeliness of the target video. The key vector is the vector corresponding to the key information; the part-of-speech vectors in the word vector set indicate the parts of speech of the key words, and the word position vectors indicate their positions in the key information. Considering that video text is short and lacks the rich information of long text, extracting part-of-speech and word position vectors from the key information enriches the text-type features of the target video, and these richer features improve the prediction accuracy and generalization capability of the video timeliness determination model. The model outputs distribution timeliness efficiently and precisely, so a suitable push time range can be determined for each target video, neither spoiling users' impression of the application product nor lowering end-side consumption indexes.
An embodiment of the present invention further provides a video aging determination apparatus, as shown in fig. 11, the apparatus includes:
the key information acquisition module 1110, configured to acquire key information of a target video;
the key vector obtaining module 1120, configured to determine the vector corresponding to the key information to obtain a key vector;
the part-of-speech vector determination module 1130, configured to determine a part-of-speech vector for each key word according to its part of speech in the key information;
the word position vector determination module 1140, configured to determine a word position vector for each key word according to its position in the key information;
the word vector set obtaining module 1150, configured to obtain a word vector set corresponding to the key information from the part-of-speech vector and the word position vector of each key word;
the video timeliness obtaining module 1160, configured to input the key vector and the word vector set into a video timeliness determination model to obtain the distribution timeliness of the target video.
It should be noted that the device embodiment and the method embodiment are based on the same inventive concept.
The embodiment of the present invention provides an electronic device, which includes a processor and a memory, where at least one instruction or at least one program is stored in the memory, and the at least one instruction or the at least one program is loaded and executed by the processor to implement the video aging determination method provided by the above method embodiment.
Further, fig. 12 is a schematic diagram illustrating a hardware structure of an electronic device for implementing the video aging determination method according to the embodiment of the present invention; the electronic device may form part of, or include, the video aging determination apparatus according to the embodiment of the present invention. As shown in fig. 12, the electronic device 120 may include one or more processors 1202 (shown here as 1202a, 1202b, …, 1202n; the processors 1202 may include, but are not limited to, a processing device such as a microprocessor (MCU) or a programmable logic device (FPGA)), a memory 1204 for storing data, and a transmitting device 1206 for communication functions. In addition, the electronic device may further include: a display, an input/output interface (I/O interface), a Universal Serial Bus (USB) port (which may be included as one of the ports of the I/O interface), a network interface, a power source, and/or a camera. It will be understood by those skilled in the art that the structure shown in fig. 12 is only an illustration and does not limit the structure of the electronic device. For example, the electronic device 120 may include more or fewer components than shown in fig. 12, or have a different configuration from that shown in fig. 12.
It should be noted that the one or more processors 1202 and/or other data processing circuitry described above may be referred to generally herein as "data processing circuitry". The data processing circuitry may be embodied in whole or in part in software, hardware, firmware, or any combination thereof. Further, the data processing circuitry may be a single stand-alone processing module, or incorporated in whole or in part into any of the other elements in the electronic device 120 (or mobile device). As referred to in the embodiments of the present application, the data processing circuitry may serve as a processor control (for example, selection of a variable resistance termination path connected to an interface).
The memory 1204 may be used for storing software programs and modules of application software, such as program instructions/data storage devices corresponding to the video aging determination method described in the embodiment of the present invention; the processor 1202 executes various functional applications and data processing by running the software programs and modules stored in the memory 1204, thereby implementing the video aging determination method described above. The memory 1204 may include high-speed random access memory, and may also include non-volatile memory, such as one or more magnetic storage devices, flash memory, or other non-volatile solid-state memory. In some examples, the memory 1204 may further include memory located remotely from the processor 1202, which may be connected to the electronic device 120 via a network. Examples of such networks include, but are not limited to, the internet, intranets, local area networks, mobile communication networks, and combinations thereof.
The transmitting device 1206 is used for receiving or sending data via a network. Specific examples of the network described above may include a wireless network provided by a communication provider of the electronic device 120. In one example, the transmitting device 1206 includes a network adapter (NIC) that can be connected to other network devices through a base station to communicate with the internet. In one embodiment, the transmitting device 1206 may be a Radio Frequency (RF) module, which is used for communicating with the internet in a wireless manner.
The display may be, for example, a touch screen type Liquid Crystal Display (LCD) that may enable a user to interact with a user interface of the electronic device 120 (or mobile device).
Embodiments of the present invention also provide a computer-readable storage medium, which may be disposed in an electronic device to store at least one instruction or at least one program for implementing a video aging determination method in the method embodiments, where the at least one instruction or the at least one program is loaded and executed by a processor to implement the video aging determination method provided in the method embodiments.
Alternatively, in this embodiment, the storage medium may be located in at least one network server of a plurality of network servers of a computer network. Optionally, in this embodiment, the storage medium may include, but is not limited to: a USB flash drive, a Read-Only Memory (ROM), a Random Access Memory (RAM), a removable hard disk, a magnetic disk or an optical disk, and other media capable of storing program code.
It should be noted that the above order of the embodiments of the present invention is only for description and does not represent the merits of the embodiments. Specific embodiments have been described above; other embodiments are within the scope of the following claims. In some cases, the actions or steps recited in the claims may be performed in a different order than in the embodiments and still achieve desirable results. In addition, the processes depicted in the accompanying figures do not necessarily require the particular order shown, or a sequential order, to achieve desirable results. In some embodiments, multitasking and parallel processing may also be possible or advantageous.
The embodiments in the present specification are described in a progressive manner, and the same and similar parts among the embodiments are referred to each other, and each embodiment focuses on the differences from the other embodiments. In particular, for the device and electronic apparatus embodiments, since they are substantially similar to the method embodiments, the description is relatively simple, and reference may be made to some descriptions of the method embodiments for relevant points.
It will be understood by those skilled in the art that all or part of the steps for implementing the above embodiments may be implemented by hardware, or may be implemented by a program instructing relevant hardware, where the program may be stored in a computer-readable storage medium, and the above-mentioned storage medium may be a read-only memory, a magnetic disk or an optical disk, etc.
The above description is only for the purpose of illustrating the preferred embodiments of the present invention and is not to be construed as limiting the invention, and any modifications, equivalents, improvements and the like that fall within the spirit and principle of the present invention are intended to be included therein.

Claims (10)

1. A method for video aging determination, the method comprising:
acquiring key information of a target video;
determining a vector corresponding to the key information to obtain a key vector;
determining a part-of-speech vector corresponding to each key word according to the part-of-speech of each key word in the key information;
determining a word position vector corresponding to each key word according to the position of each key word in the key information;
obtaining a word vector set corresponding to the key information according to the part-of-speech vector and the word position vector corresponding to each key word;
and inputting the key vector and the word vector set into a video timeliness determination model to obtain the distribution timeliness of the target video.
2. The method according to claim 1, wherein the obtaining key information of the target video comprises:
acquiring a video theme and a video label of the target video;
taking the video theme and the video label as the key information;
correspondingly, the determining the vector corresponding to the key information to obtain the key vector includes:
and determining vectors corresponding to the video theme and the video label to obtain the key vector.
3. The method according to any one of claims 1 and 2, further comprising a process of training the video aging determination model:
acquiring a sample video set, wherein each sample video in the sample video set carries corresponding distribution timeliness information;
determining a key vector and a word vector set corresponding to the sample video;
based on the key vectors and the word vector sets corresponding to the sample videos, performing video aging determination training by using a preset machine learning model, and adjusting model parameters of the preset machine learning model in the training until the distribution timeliness output by the preset machine learning model matches the distribution timeliness information carried by the corresponding sample videos;
and taking the preset machine learning model corresponding to the adjusted model parameters as the video aging determination model.
4. The method according to any one of claims 1 and 2, wherein after obtaining the word vector set corresponding to the key information according to the part-of-speech vector and the word position vector corresponding to each key word, the method further comprises:
inputting the key vector and the word vector set into a video classification model to obtain the classification of the target video;
and obtaining the distribution timeliness of the target video according to the classification.
5. The method according to claim 4, wherein before obtaining the distribution timeliness of the target video according to the classification, the method further comprises: establishing corresponding mapping relationships for a plurality of preset classifications, respectively, the mapping relationships being used for querying, based on a preset classification, the distribution timeliness indicated by that preset classification;
correspondingly, obtaining the distribution timeliness of the target video according to the classification comprises:
determining a preset classification matched with the classification;
determining a mapping relation corresponding to the matched preset classification to obtain a target mapping relation;
and determining the distribution timeliness corresponding to the matched preset classification according to the target mapping relation, and taking the corresponding distribution timeliness as the distribution timeliness of the target video.
6. The method according to claim 5, wherein establishing the corresponding mapping relationships for the plurality of preset classifications, respectively, comprises:
acquiring the plurality of preset classifications;
configuring corresponding distribution timeliness for each preset classification according to historical feedback;
and respectively establishing a mapping relation between each preset classification and the distribution timeliness corresponding to the preset classification.
7. The method of claim 4, further comprising a process of training the video classification model:
acquiring a sample video set, wherein each sample video in the sample video set carries corresponding classification information;
determining a key vector and a word vector set corresponding to the sample video;
taking a key vector and a word vector set corresponding to the sample video as input, and randomly generating an initial value of a model parameter by using a preset machine learning model, wherein the model parameter indicates the weight of a part-of-speech vector;
based on the key vectors and word vector sets corresponding to the sample videos, performing video classification training by using the preset machine learning model, and adjusting the current values of the model parameters in the training until the classification output by the preset machine learning model is matched with the classification information carried by the corresponding sample videos;
and taking the preset machine learning model corresponding to the adjusted model parameters as the video classification model.
8. An apparatus for video aging determination, the apparatus comprising:
a key information acquisition module, configured to acquire key information of a target video;
a key vector obtaining module, configured to determine a vector corresponding to the key information to obtain a key vector;
a part-of-speech vector determination module, configured to determine a part-of-speech vector corresponding to each key word according to the part of speech of each key word in the key information;
a word position vector determination module, configured to determine a word position vector corresponding to each key word according to the position of each key word in the key information;
a word vector set obtaining module, configured to obtain a word vector set corresponding to the key information according to the part-of-speech vector and the word position vector corresponding to each key word;
and a video aging obtaining module, configured to input the key vector and the word vector set into a video timeliness determination model to obtain the distribution timeliness of the target video.
9. An electronic device comprising a processor and a memory, wherein at least one instruction or at least one program is stored in the memory, and the at least one instruction or the at least one program is loaded and executed by the processor to implement the video aging determination method of any one of claims 1 to 7.
10. A computer-readable storage medium, wherein at least one instruction or at least one program is stored in the storage medium, and the at least one instruction or the at least one program is loaded and executed by a processor to implement the video aging determination method of any one of claims 1 to 7.
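For ease of understanding only (the following sketch is not part of the claims), a minimal PyTorch-style training loop consistent with the procedure recited in claims 3 and 7 might look as follows; the optimizer, batch size, and number of epochs are assumptions, and the model is assumed to take a key vector together with part-of-speech and word position ids, e.g. as in the earlier sketch:

    import torch
    from torch.utils.data import DataLoader

    def train_timeliness_model(model, dataset, epochs=5, lr=1e-3):
        """Adjusts model parameters until the predicted distribution
        timeliness matches the labels carried by the sample videos."""
        loader = DataLoader(dataset, batch_size=32, shuffle=True)
        optimizer = torch.optim.Adam(model.parameters(), lr=lr)
        loss_fn = torch.nn.CrossEntropyLoss()
        for _ in range(epochs):
            for key_vec, pos_ids, loc_ids, label in loader:
                logits = model(key_vec, pos_ids, loc_ids)  # e.g. TimelinessModel above
                loss = loss_fn(logits, label)
                optimizer.zero_grad()
                loss.backward()
                optimizer.step()
        return model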
CN202011072163.2A 2020-10-09 2020-10-09 Video aging determining method and device, electronic equipment and medium Active CN112399201B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202011072163.2A CN112399201B (en) 2020-10-09 2020-10-09 Video aging determining method and device, electronic equipment and medium

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202011072163.2A CN112399201B (en) 2020-10-09 2020-10-09 Video aging determining method and device, electronic equipment and medium

Publications (2)

Publication Number Publication Date
CN112399201A true CN112399201A (en) 2021-02-23
CN112399201B CN112399201B (en) 2023-11-14

Family

ID=74596784

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202011072163.2A Active CN112399201B (en) 2020-10-09 2020-10-09 Video aging determining method and device, electronic equipment and medium

Country Status (1)

Country Link
CN (1) CN112399201B (en)

Citations (7)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
KR20180062003A (en) * 2016-11-30 2018-06-08 한국전자통신연구원 Method of correcting speech recognition errors
CN108241678A (en) * 2016-12-26 2018-07-03 北京搜狗信息服务有限公司 The method for digging and device of interest point data
US20180308487A1 (en) * 2017-04-21 2018-10-25 Go-Vivace Inc. Dialogue System Incorporating Unique Speech to Text Conversion Method for Meaningful Dialogue Response
CN110362684A (en) * 2019-06-27 2019-10-22 腾讯科技(深圳)有限公司 A kind of file classification method, device and computer equipment
WO2019214149A1 (en) * 2018-05-11 2019-11-14 平安科技(深圳)有限公司 Text key information identification method, electronic device, and readable storage medium
WO2020014350A1 (en) * 2018-07-11 2020-01-16 Home Depot International, Inc. Presentation of related and corrected querries for a search engine
CN111400591A (en) * 2020-03-11 2020-07-10 腾讯科技(北京)有限公司 Information recommendation method and device, electronic equipment and storage medium

Also Published As

Publication number Publication date
CN112399201B (en) 2023-11-14

Similar Documents

Publication Publication Date Title
US7921156B1 (en) Methods and apparatus for inserting content into conversations in on-line and digital environments
US11475227B2 (en) Intelligent routing services and systems
Li et al. Residual attention-based LSTM for video captioning
AU2014201827A1 (en) Scoring concept terms using a deep network
CN111241237A (en) Intelligent question and answer data processing method and device based on operation and maintenance service
Nian et al. Learning explicit video attributes from mid-level representation for video captioning
CN110166802A (en) Barrage processing method, device and storage medium
CN112131430A (en) Video clustering method and device, storage medium and electronic equipment
CN112163560A (en) Video information processing method and device, electronic equipment and storage medium
CN113761220A (en) Information acquisition method, device, equipment and storage medium
Li et al. A personalized recommendation framework based on MOOC system integrating deep learning and big data
CN112132075B (en) Method and medium for processing image-text content
CN114281935A (en) Training method, device, medium and equipment for search result classification model
CN112231554B (en) Search recommended word generation method and device, storage medium and computer equipment
CN116956183A (en) Multimedia resource recommendation method, model training method, device and storage medium
CN113741759B (en) Comment information display method and device, computer equipment and storage medium
CN112399201B (en) Video aging determining method and device, electronic equipment and medium
CN112749553B (en) Text information processing method and device for video file and server
CN112749556B (en) Multi-language model training method and device, storage medium and electronic equipment
CN114282528A (en) Keyword extraction method, device, equipment and storage medium
CN112153424A (en) Content pushing method and device, electronic equipment and storage medium
CN111858901A (en) Text recommendation method and system based on semantic similarity
Huang Opinion Mining Algorithm Based on the Evaluation of Online Mathematics Course with Python
Hutchinson Rank and sparsity in language processing
CN116932784A (en) Multimedia information recommendation model training method, recommendation method and device

Legal Events

Date Code Title Description
PB01 Publication
REG Reference to a national code (ref country code: HK; ref legal event code: DE; ref document number: 40038254; country of ref document: HK)
SE01 Entry into force of request for substantive examination
GR01 Patent grant