CN109840291A - Video data processing method and device

Info

Publication number: CN109840291A
Application number: CN201811633133.7A
Authority: CN
Inventors: 王丁南
Original assignee: Netease Media Technology Beijing Co Ltd
Current assignee: Netease Media Technology Beijing Co Ltd
Priority date: 2018-12-29
Filing date: 2018-12-29
Publication date: 2019-06-04

Abstract

Embodiments of the present invention provide a video data processing method and device. The method comprises: obtaining video source data; filtering the video source data to obtain desired video data; extracting label information of the video data; and classifying the video data according to the label information to obtain classified video data. The disclosed technical solution offers an efficient processing workflow and high label accuracy, and can ensure a good user experience.

Description

Video data processing method and device

Technical field

Embodiments of the present invention relate to the field of computer technology, and more specifically to a video data processing method and device.

Background

With the development of the content information industry, many self-media platforms have placed growing emphasis on video content, especially short video content, and the volume of self-media short video data has grown exponentially. However, content information platforms are still at a fairly early exploratory stage in their processing workflows for large-scale self-media short video data.

In the prior art, the processing of video data still relies on manual review. On the one hand, reviewers must manually check whether video data is vulgar, pornographic, clickbait, of poor clarity, or otherwise violates content rules and harms the user experience; on the other hand, they must also manually correct the label information filled in by content publishers, such as category, keywords and timeliness.

Therefore, the prior art suffers from drawbacks such as reliance on manual review, low video data processing efficiency and low label accuracy.

Summary of the invention

Embodiments of the present invention provide a video data processing method and device, which aim to solve the problems of low video data processing efficiency and low label accuracy in the prior art. To give a basic understanding of some aspects of the disclosed embodiments, a brief summary is given below. This summary is not an extensive overview, nor is it intended to identify key or critical elements or to delineate the scope of protection of these embodiments.

In a first aspect of embodiments of the present invention, a video data processing method is provided, comprising:

Obtaining video source data;

Filtering the video source data to obtain desired video data;

Extracting label information of the video data;

Classifying the video data according to the label information to obtain classified video data.

In one embodiment of the invention, the step of filtering the video source data to obtain desired video data includes:

Obtaining pre-built anti-spam policy rules;

Performing the filtering based on the anti-spam policy rules, so as to obtain the desired video data from the video source data.

In one embodiment of the invention, the anti-spam policy rules include a title rule, a content rule and a title-content correspondence rule.

In one embodiment of the invention, the step of performing the filtering based on the anti-spam policy rules, so as to obtain the desired video data from the video source data, includes:

Determining whether the title form and title content of the video source data meet the title rule;

If so, determining whether the content of the video source data meets the content rule;

If so, determining whether the relevance between the title and the content of the video source data meets the title-content correspondence rule;

If so, determining that the video data is desired video data.

In one embodiment of the invention, the step of determining whether the content of the video source data meets the content rule includes:

Extracting the cover and multiple content frames of the video source data;

Recognizing the cover and the multiple content frames;

Confirming, according to the recognition result, whether the cover and the multiple content frames meet the content rule.

In one embodiment of the invention, the step of determining whether the content of the video source data meets the title-content correspondence rule includes:

Extracting multiple content frames of the video source data;

Recognizing the multiple content frames;

Determining, according to the recognition result, whether the video content and the title content of the video source data meet the title-content correspondence rule.

In one embodiment of the invention, the step of determining whether the content of the video source data meets the content rule includes:

Extracting multiple content frames of the video source data;

Recognizing the multiple content frames;

Judging, according to the recognition result, whether the video source data duplicates video data that existed before the video source data;

Determining, according to the judgment result, whether the content of the video source data meets the content rule.

In one embodiment of the invention, the step of extracting the label information of the video data includes:

Obtaining a pre-established video model;

Extracting the label information of the video data based on the video model.

In one embodiment of the invention, after the step of classifying the video data according to the label information, the method further includes:

Obtaining hot-topic information from across the whole network;

Judging, according to the hot-topic information, whether the video data is a hot video;

If so, marking the video data;

Wherein the marked video data will be distributed to users preferentially.

In one embodiment of the invention, the step of judging, according to the hot-topic information, whether the video data is a hot video includes:

Extracting multiple content frames of the video data;

Recognizing the multiple content frames to generate summary information;

Judging whether the summary information matches the hot-topic information.

In one embodiment of the invention, the method further includes:

Obtaining category information of the video data;

Extracting multiple content frames of the video data;

Recognizing the multiple content frames to generate content summary information;

Determining timeliness information of the video data based on the category information and the content summary information;

Determining, according to the timeliness information, whether to distribute the video data to users.

In a second aspect of embodiments of the present invention, a video data processing device is provided, comprising:

A data acquisition module, configured to obtain video source data;

A data filtering module, configured to filter the video source data to obtain processed video data, so as to reject video source data that does not meet a preset standard;

A label extraction module, configured to extract label information of the video data;

A data classification module, configured to classify the video data according to the label information to obtain classified video data.

In one embodiment of the invention, the data filtering module may be further configured to:

Obtain pre-built anti-spam policy rules;

Optionally, the anti-spam policy rules may include a title rule, a content rule and a title-content correspondence rule.

If so, determine that the video data is desired video data.

Extract the cover and multiple content frames of the video source data;

Recognize the cover and the multiple content frames;

Extract multiple content frames of the video source data;

Recognize the multiple content frames;

Extract multiple content frames of the video source data;

Recognize the multiple content frames;

In one embodiment of the invention, the label extraction module may be further configured to:

Obtain a pre-established video model;

Extract the label information of the video data based on the video model.

In one embodiment of the invention, the video data processing device may further include a hot video marking module, configured to:

Obtain hot-topic information from across the whole network;

Judge, according to the hot-topic information, whether the video data is a hot video;

If so, mark the video data; wherein the marked video data will be distributed to users preferentially.

When the video data is a hot video, it can be marked, and hot videos can be recommended during personalized content distribution, which helps hot video content reach users in a more timely and effective manner.

In one embodiment of the invention, the hot video marking module may be further configured to:

Extract multiple content frames of the video data;

Recognize the multiple content frames to generate summary information;

Judge whether the summary information matches the hot-topic information.

In one embodiment of the invention, the video data processing device may further include a timeliness information marking module, configured to:

Obtain category information of the video data;

Extract multiple content frames of the video data;

In a third aspect of embodiments of the present invention, a computer-readable storage medium is provided, which stores program code that, when executed by a processor, performs the following method:

Obtaining video source data;

Filtering the video source data to obtain desired video data;

Extracting label information of the video data;

In a fourth aspect of embodiments of the present invention, a computing device is provided, including a processor and a storage medium storing program code that, when executed by the processor, performs the following method:

Obtaining video source data;

Filtering the video source data to obtain desired video data;

Extracting label information of the video data;

The technical solution disclosed in the embodiments of the present invention can filter out videos that are vulgar, pornographic, clickbait, of poor clarity, or that otherwise violate content rules and harm the user experience, and can extract label information such as category, timeliness and keywords, and then classify the video data. The disclosed technical solution offers an efficient processing workflow and high label accuracy, and can ensure a good user experience.

Brief description of the drawings

The above and other objects, features and advantages of exemplary embodiments of the present invention will become easier to understand by reading the following detailed description with reference to the accompanying drawings. In the drawings, several embodiments of the invention are shown by way of example and not limitation, in which:

Fig. 1 schematically shows a flowchart of a video data processing method according to an embodiment of the present invention;

Fig. 2 schematically shows a schematic diagram of a video data processing device according to another embodiment of the present invention;

Fig. 3 schematically shows a schematic diagram of a computer-readable storage medium according to another embodiment of the present invention;

Fig. 4 schematically shows a schematic diagram of a computing device according to yet another embodiment of the present invention;

In the drawings, identical or corresponding reference numerals denote identical or corresponding parts.

Detailed description of the embodiments

The principle and spirit of the present invention are described below with reference to several illustrative embodiments. It should be understood that these embodiments are given only so that those skilled in the art can better understand and then implement the present invention, and not to limit the scope of the invention in any way. Rather, these embodiments are provided so that this disclosure will be thorough and complete, and can fully convey the scope of the disclosure to those skilled in the art.

Those skilled in the art will appreciate that embodiments of the present invention can be implemented as a system, device, apparatus, method or computer program product. Accordingly, the present disclosure may take the form of complete hardware, complete software (including firmware, resident software, microcode, etc.), or a combination of hardware and software.

With the development of the content information industry, many media platforms have placed growing emphasis on information content, and the volume of information content data has grown exponentially. At the same time, different users have different demands for information content, so the way information content is distributed and pushed directly affects the user experience.

When information content such as video data is distributed and pushed, different information needs to be pushed to different users according to their preferences in order to meet their individual needs, so the video data must first be classified. The technical solution disclosed in the embodiments of the present invention can be used to filter and classify video data.

According to an embodiment of the present invention, a video data processing method is proposed which, as shown in Fig. 1, comprises:

S101, obtaining video source data;

S102, filtering the video source data to obtain desired video data;

S103, extracting label information of the video data;

S104, classifying the video data according to the label information to obtain classified video data.

In S101, the video source data may include short videos, i.e. video content that is played on various new media platforms, is suited to viewing on the move or during brief leisure time, and is pushed at high frequency; a short video typically lasts from a few seconds to a few minutes. The video source data may also include other types of video.

In S102, the video source data is filtered, which includes but is not limited to filtering out videos that are vulgar, pornographic, clickbait, of poor clarity, or that otherwise violate content rules or harm the user experience, to obtain video data that can be distributed to users, i.e. the desired video data.

In S103, label information is extracted from the desired video data obtained in S102. The label information may include category information, timeliness information and keyword information of the video data. In particular, because S103 extracts the label information from the video data itself, it does not depend on the category, timeliness and keyword information supplied by the content publisher, which improves the accuracy of the label information and avoids a poor user experience caused by inaccurate publisher-supplied information. Here, the category information characterizes the content category of the video data, such as news, finance, sports or fashion; the timeliness information characterizes how time-sensitive the content of the video data is; and the keyword information can be extracted from the video data to reflect its content.

In S104, the video data can be classified by its label information, and the classified video data can be used for personalized content distribution to users.
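
As a non-limiting illustration, the flow of S101 to S104 could be sketched in Python as follows. The names `Video`, `process_videos`, `is_acceptable` and `extract_labels` are hypothetical and do not appear in the disclosure; the filter and the label extractor are passed in as callables because the embodiments leave their implementation open.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, Iterable, List


@dataclass
class Video:
    title: str
    labels: Dict[str, str] = field(default_factory=dict)


def process_videos(
    source: Iterable[Video],
    is_acceptable: Callable[[Video], bool],
    extract_labels: Callable[[Video], Dict[str, str]],
) -> Dict[str, List[Video]]:
    """Filter, label and group videos by their 'category' label."""
    classified: Dict[str, List[Video]] = {}
    for video in source:                      # S101: obtain video source data
        if not is_acceptable(video):          # S102: drop videos that fail the filter
            continue
        video.labels = extract_labels(video)  # S103: extract label information
        category = video.labels.get("category", "unknown")
        classified.setdefault(category, []).append(video)  # S104: classify
    return classified
```

In this sketch a video that fails the filter never reaches label extraction, which mirrors the order of S102 and S103.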

The video data processing method disclosed in the embodiments of the present invention can filter out videos that are vulgar, pornographic, clickbait, of poor clarity, or that otherwise violate content rules and harm the user experience, and can extract label information such as category, timeliness and keywords, and then classify the video data. The disclosed technical solution offers an efficient processing workflow and high label accuracy, and can ensure a good user experience.

Optionally, S102 may include:

S1021, obtaining pre-built anti-spam policy rules;

S1022, performing the filtering based on the anti-spam policy rules, so as to obtain the desired video data from the video source data.

In S1021, the anti-spam policy rules include but are not limited to rules that can be used to filter out vulgar content, pornographic content, clickbait, poor clarity and the like. In a specific implementation, those skilled in the art can determine the specific rule types included in the anti-spam policy rules according to the relevant laws and regulations and the actual needs of content distribution.

In S1022, the desired video data can be obtained after the video source data has been filtered based on the anti-spam policy rules.

Illustratively, the title rule can be used to judge whether the video source data contains so-called "three vulgarities" content (vulgar, lowbrow or kitsch content), whether it contains wording likely to cause user discomfort, and so on.

The title rule can also be used to judge whether the title length meets requirements, whether the punctuation in the title is used correctly, whether the proportion of foreign-language text in the title exceeds a preset value, and so on.

For example, the title length should be greater than 10 characters and less than 27 characters, and the foreign-language proportion should be less than 50%.
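
The example thresholds above could be checked as in the following minimal sketch. It rests on two assumptions that the disclosure does not state: title length is counted in characters, and "foreign-language" text is approximated as ASCII letters. The function names are hypothetical.

```python
def foreign_ratio(title: str) -> float:
    """Share of non-space characters that are ASCII letters (assumed proxy for foreign text)."""
    chars = [c for c in title if not c.isspace()]
    if not chars:
        return 0.0
    foreign = sum(1 for c in chars if c.isascii() and c.isalpha())
    return foreign / len(chars)


def title_meets_rule(title: str, min_len: int = 10, max_len: int = 27,
                     max_foreign: float = 0.5) -> bool:
    """Length must be strictly between min_len and max_len; foreign share below max_foreign."""
    length = len(title.strip())
    return min_len < length < max_len and foreign_ratio(title) < max_foreign
```

A production rule would likely also cover the punctuation check mentioned above, which is omitted here because the disclosure gives no concrete criterion for it.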

Illustratively, the content rule can be used to judge whether the video source data contains advertising, "three vulgarities" content (vulgar, lowbrow or kitsch content), violent or gory content, or images likely to cause user discomfort, as well as whether the video source data is silent, whether it meets the duration requirement, whether it contains a large amount of obviously repeated content, and so on.

For example, frames are sampled from the cover and content of the video data, three-vulgarities recognition is performed on them using technologies such as image recognition and machine learning, and videos that exceed the three-vulgarities threshold are filtered out.
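
One possible shape for this frame sampling and threshold check is sketched below. The even-spacing sampling strategy, the score range of 0 to 1, the 0.8 threshold and both function names are assumptions made for illustration; the per-image scores would come from an image recognition model that is not shown.

```python
from typing import List, Sequence


def sample_frame_indices(total_frames: int, count: int) -> List[int]:
    """Return up to `count` evenly spaced frame indices across the video."""
    if total_frames <= 0 or count <= 0:
        return []
    count = min(count, total_frames)
    step = total_frames / count
    return [int(i * step) for i in range(count)]


def exceeds_vulgarity_threshold(scores: Sequence[float], threshold: float = 0.8) -> bool:
    """True if any score (cover plus sampled frames) is above the threshold."""
    return any(score > threshold for score in scores)
```

Taking the maximum over all images means a single offending frame is enough to filter the video; an average would be a more lenient alternative.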

The content rule can also be used to judge whether the clarity, resolution, aspect ratio, cover image size, black-border proportion and so on of the video source data meet requirements.
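
A minimal sketch of two of these checks (resolution and black-border proportion) follows. The disclosure gives no numeric limits, so the minimum resolution, the darkness cut-off and the maximum border share are hypothetical values, and a frame is modelled simply as a 2D list of grayscale intensities.

```python
from typing import List


def black_border_ratio(frame: List[List[int]], dark_threshold: int = 16) -> float:
    """Fraction of rows in a grayscale frame whose pixels are all at or below dark_threshold."""
    if not frame:
        return 0.0
    dark_rows = sum(1 for row in frame if all(p <= dark_threshold for p in row))
    return dark_rows / len(frame)


def meets_quality_rule(width: int, height: int, frame: List[List[int]],
                       min_width: int = 640, min_height: int = 360,
                       max_black: float = 0.3) -> bool:
    """Check resolution and black-border share against assumed limits."""
    return (width >= min_width and height >= min_height
            and black_border_ratio(frame) <= max_black)
```

Only horizontal letterbox bars are measured here; vertical bars could be handled the same way on the transposed frame.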

The content rule can also be used to judge whether newly added video source data duplicates existing video source data. Illustratively, newly added video source data that contains a large amount of content obviously duplicating existing video source data should be filtered out.

Illustratively, the title-content correspondence rule can be used to judge whether the title and the content of the video source data correspond, and to judge whether the video has a clickbait attribute; videos that exceed the clickbait threshold are filtered out.

Here, clickbait (literally "title party") means that the title of the video data is severely exaggerated and the content is usually entirely unrelated, or only loosely related, to the title. Video data with a strong clickbait attribute attracts users by its title alone but usually cannot provide good content matching the title, so the title and the content do not match and the user experience suffers; videos that exceed the clickbait threshold therefore need to be filtered out.
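
The disclosure does not say how the clickbait attribute is scored. Purely as an illustration, the sketch below scores it as the share of title keywords that are absent from the keywords recognized in the video content; this keyword-overlap measure, the 0.5 threshold and the function names are all assumptions.

```python
from typing import Iterable


def clickbait_score(title_keywords: Iterable[str], content_keywords: Iterable[str]) -> float:
    """1.0 means no title keyword appears in the content; 0.0 means all of them do."""
    title_set, content_set = set(title_keywords), set(content_keywords)
    if not title_set:
        return 1.0
    covered = len(title_set & content_set) / len(title_set)
    return 1.0 - covered


def passes_title_content_rule(title_keywords: Iterable[str],
                              content_keywords: Iterable[str],
                              threshold: float = 0.5) -> bool:
    """Videos whose score exceeds the threshold are filtered out."""
    return clickbait_score(title_keywords, content_keywords) <= threshold
```

The content keywords would come from recognizing multiple content frames, as described for the title-content correspondence rule.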

Correspondingly, S1022 may include:

If so, determining that the video data is desired video data.

In general, when the anti-spam policy rules include multiple rules, the desired video data should satisfy all of the rules included in the anti-spam policy rules. To improve filtering efficiency, those skilled in the art can apply the multiple rules one after another. The embodiments of the present invention do not limit the order of the multiple rules. Illustratively, S1022 may instead first determine whether the content of the video source data meets the content rule; if so, determine whether the relevance between the title and the content of the video source data meets the title-content correspondence rule; if so, determine whether the title form and title content of the video source data meet the title rule; and if so, determine that the video data is desired video data.
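
Applying the rules one after another and stopping at the first failure could look like the following sketch. The rule order is simply the order of the list, which reflects the point that the sequence is not fixed; `first_failed_rule` and the `Rule` alias are hypothetical names.

```python
from typing import Callable, Dict, List, Optional, Tuple

# A rule is a (name, check) pair; check returns True when the video satisfies the rule.
Rule = Tuple[str, Callable[[Dict], bool]]


def first_failed_rule(video: Dict, rules: List[Rule]) -> Optional[str]:
    """Apply rules in list order and stop at the first failure.

    Returns the name of the failed rule, or None if the video satisfies every rule.
    """
    for name, check in rules:
        if not check(video):
            return name
    return None
```

Because evaluation stops at the first failure, placing inexpensive checks (such as title length) before expensive ones (such as frame recognition) avoids unnecessary work on videos that would be rejected anyway.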

Further, determining whether the content of the video source data meets the content rule may include:

Extracting the cover and multiple content frames of the video source data;

Recognizing the cover and the multiple content frames;

Judging whether the video source data contains advertising, three-vulgarities content, violent or gory content, or images likely to cause user discomfort requires the video cover to be taken into account.

Further, determining whether the content of the video source data meets the content rule may also include:

Extracting multiple content frames of the video source data;

Recognizing the multiple content frames;

When judging whether newly added video source data contains a large amount of content obviously duplicating existing video source data, the video cover need not be considered.
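
The disclosure does not name a duplicate-detection technique. One commonly used option, shown here only as an assumption, is to compare average hashes of sampled frames: each pixel becomes one bit depending on whether it is brighter than the frame mean, and two frames count as matching when their bit strings differ in at most a few positions. Frames are modelled as small 2D lists of grayscale values, and all names are hypothetical.

```python
from typing import Iterable, List, Tuple

Hash = Tuple[int, ...]


def average_hash(frame: List[List[int]]) -> Hash:
    """One bit per pixel: 1 if the pixel is brighter than the frame mean, else 0."""
    pixels = [p for row in frame for p in row]
    mean = sum(pixels) / len(pixels)
    return tuple(1 if p > mean else 0 for p in pixels)


def hamming(a: Hash, b: Hash) -> int:
    """Number of positions at which two equal-length hashes differ."""
    return sum(x != y for x, y in zip(a, b))


def duplicate_ratio(new_frames: List[List[List[int]]], known_hashes: Iterable[Hash],
                    max_distance: int = 1) -> float:
    """Fraction of new frames whose hash is within max_distance of any known hash."""
    if not new_frames:
        return 0.0
    known = list(known_hashes)
    hits = sum(
        1 for frame in new_frames
        if any(hamming(average_hash(frame), k) <= max_distance for k in known)
    )
    return hits / len(new_frames)
```

A newly added video could then be rejected when `duplicate_ratio` exceeds a chosen limit, which corresponds to containing "a large amount" of duplicated content; the limit itself is left open here.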

Further, determining whether the content of the video source data meets the title-content correspondence rule may include:

Extracting multiple content frames of the video source data;

Recognizing the multiple content frames;

When judging whether the content of the video source data meets the title-content correspondence rule, i.e. whether a clickbait attribute is present, the video cover need not be considered.

In the embodiments of the present invention, when judging whether the video source data meets the anti-spam policy rules, the cover and/or multiple content frames of the video source data can be extracted as needed and then recognized and judged. On the basis of the technical solution disclosed in the embodiments of the present invention, those skilled in the art may, in a specific implementation, also combine the prior art and use other extraction or recognition methods; the present invention places no limitation on this.

Optionally, S103 may include:

S1031, obtaining a pre-established video model;

S1032, extracting the label information of the video data based on the video model.

In S1031, the pre-established video model can extract information such as the category, timeliness and keywords of the video data by means of big-data processing technologies such as image recognition and machine learning, with good reliability, accuracy and efficiency.

In S1032, the label information of the video data can be extracted based on the video model, in preparation for personalized content distribution of the video data.

Optionally, after S103, the method may further include:

S1051, obtaining hot-topic information from across the whole network;

S1052, judging, according to the hot-topic information, whether the video data is a hot video;

S1053, if so, marking the video data; wherein the marked video data will be distributed to users preferentially.

In S1051, the hot-topic information can be obtained from network-wide sources such as the Baidu Index and Weibo hot search.

In S1052, whether the video data is a hot video can be judged according to the obtained hot-topic information.

When the video data is a hot video, it can be marked, and hot videos can be recommended during personalized content distribution, which helps hot video content reach users in a more timely and effective manner.

Further, S1052 may include:

Extracting multiple content frames of the video data;

Recognizing the multiple content frames to generate summary information;

Judging whether the summary information matches the hot-topic information.
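
How the summary information is matched against the hot-topic information is not specified. The sketch below assumes both are available as keyword collections and treats any shared keyword as a match; it also shows one way marked videos could be ordered first for distribution. The function names, the `hot` flag and the `min_overlap` parameter are hypothetical.

```python
from typing import Dict, Iterable, List


def is_hot_video(summary_keywords: Iterable[str],
                 hot_topics: Iterable[Iterable[str]],
                 min_overlap: int = 1) -> bool:
    """True if the summary shares at least min_overlap keywords with any hot topic."""
    summary = {k.lower() for k in summary_keywords}
    for topic in hot_topics:
        if len(summary & {k.lower() for k in topic}) >= min_overlap:
            return True
    return False


def order_for_distribution(videos: List[Dict]) -> List[Dict]:
    """Stable sort that places videos marked 'hot' ahead of the rest."""
    return sorted(videos, key=lambda v: not v.get("hot", False))
```

Because the sort is stable, videos keep their original relative order within the hot and non-hot groups.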

Optionally, after S103, the method may further include:

S1061, obtaining category information of the video data;

S1062, extracting multiple content frames of the video data;

S1063, recognizing the multiple content frames to generate content summary information;

S1064, determining timeliness information of the video data based on the category information and the content summary information;

S1065, determining, according to the timeliness information, whether to distribute the video data to users.

Illustratively, different content stays fresh for different lengths of time. To distribute fresh video content to users in a timely manner, the timeliness information of the video data needs to be determined.

Further, the timeliness information is related to the category and the content of the video data.

Illustratively, some categories of information are highly sensitive to content freshness, such as current politics, finance and sports, and the corresponding effective time can be set shorter, e.g. 24 hours; some categories are not particularly sensitive to content freshness, such as entertainment and technology, and the corresponding effective time can be set longer, e.g. 48 hours; and some categories are insensitive to content freshness, such as food and fashion, and the corresponding effective time can be set longer still, e.g. 72 hours.

Further, within the same category, the effective time of different content may also differ.

Thus, in S1064, the timeliness information of the video data can be determined according to the category information and the content summary information.

In S1065, video data that is still timely can be pushed to users according to the determined timeliness information, thereby avoiding pushing video data that is no longer timely.
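
The example effective times above can be expressed as a lookup table, as in the sketch below. The category keys, the 48-hour default for unlisted categories and the `override_hours` parameter (standing in for a content-specific effective time derived from the content summary) are assumptions; only the 24, 48 and 72 hour values come from the description.

```python
from datetime import datetime, timedelta
from typing import Optional

# Example effective times in hours, following the illustrative values in the description.
EFFECTIVE_HOURS = {
    "politics": 24, "finance": 24, "sport": 24,
    "entertainment": 48, "technology": 48,
    "food": 72, "fashion": 72,
}


def is_still_timely(category: str, published_at: datetime, now: datetime,
                    override_hours: Optional[int] = None,
                    default_hours: int = 48) -> bool:
    """True while the video is within its effective time window."""
    if override_hours is not None:
        hours = override_hours          # content-specific effective time
    else:
        hours = EFFECTIVE_HOURS.get(category, default_hours)
    return now - published_at <= timedelta(hours=hours)
```

For instance, a sports video published 36 hours ago would no longer be distributed, while an entertainment video of the same age still would.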

The technical solution disclosed in the embodiments of the present invention can push timely information to users promptly and reject expired information, further improving the user experience.

An embodiment of the present invention also discloses a video data processing device 20 which, as shown in Fig. 2, comprises:

A data acquisition module 201, configured to obtain video source data;

A data filtering module 202, configured to filter the video source data to obtain processed video data, so as to reject video source data that does not meet a preset standard;

A label extraction module 203, configured to extract label information of the video data;

A data classification module 204, configured to classify the video data according to the label information to obtain classified video data.

The video data processing device disclosed in the embodiments of the present invention can filter out videos that are vulgar, pornographic, clickbait, of poor clarity, or that otherwise violate content rules and harm the user experience, and can extract label information such as category, timeliness and keywords, and then classify the video data. The disclosed technical solution offers an efficient processing workflow and high label accuracy, and can ensure a good user experience.

Optionally, the data filtering module 202 may be further configured to:

Obtain pre-built anti-spam policy rules;

Further optionally, the data filtering module 202 may be further configured to:

If so, determine that the video data is desired video data.

Extract the cover and multiple content frames of the video source data;

Recognize the cover and the multiple content frames;

Extract multiple content frames of the video source data;

Recognize the multiple content frames;

Extract multiple content frames of the video source data;

Recognize the multiple content frames;

Optionally, the label extraction module 203 may be further configured to:

Obtain a pre-established video model;

Extract the label information of the video data based on the video model.

Optionally, the video data processing device 20 may further include a hot video marking module 205, configured to:

Obtain hot-topic information from across the whole network;

Judge, according to the hot-topic information, whether the video data is a hot video;

Further optionally, the hot video marking module 205 may be further configured to:

Extract multiple content frames of the video data;

Recognize the multiple content frames to generate summary information;

Judge whether the summary information matches the hot-topic information.

Optionally, the video data processing device 20 may further include a timeliness information marking module 206, configured to:

Obtain category information of the video data;

Extract multiple content frames of the video data;

The video data processing device disclosed in the embodiments of the present invention can push timely information to users promptly and reject expired information, further improving the user experience.

An embodiment of the present invention also discloses a computer-readable storage medium 30 which, as shown in Fig. 3, stores program code that, when executed by a processor, can be used to perform the following method:

Obtaining video source data;

Filtering the video source data to obtain desired video data;

Extracting label information of the video data;

The computer-readable storage medium 30 can be used for the video data processing method disclosed in the embodiments of the present invention; the relevant content has been described above and is not repeated here.

An embodiment of the present invention also discloses a computing device 40 which, as shown in Fig. 4, includes a processor 401 and a storage medium 402 storing program code that, when executed by the processor 401, can be used to perform the following method:

Obtaining video source data;

Filtering the video source data to obtain desired video data;

Extracting label information of the video data;

The computing device 40 can be used for the video data processing method disclosed in the embodiments of the present invention; the relevant content has been described above and is not repeated here.

By establishing rules and using technologies such as machine learning, image recognition and natural language processing, the technical solution disclosed in the embodiments of the present invention improves the efficiency and accuracy of video source data filtering and can reduce the dependence on manual review.

It should be noted that although several units/modules or sub-units/sub-modules of the video data processing device are mentioned in the detailed description above, this division is merely exemplary and not mandatory. In fact, according to embodiments of the present invention, the features and functions of two or more of the units/modules described above can be embodied in a single unit/module. Conversely, the features and functions of one unit/module described above can be further divided so as to be embodied by multiple units/modules.

In addition, although the operations of the method of the present invention are described in the drawings in a particular order, this does not require or imply that the operations must be performed in that particular order, or that all of the operations shown must be performed to achieve the desired result. Additionally or alternatively, certain steps may be omitted, multiple steps may be merged into one step for execution, and/or one step may be decomposed into multiple steps for execution.

Although the spirit and principles of the present invention have been described with reference to several specific embodiments, it should be understood that the invention is not limited to the specific embodiments disclosed, and the division into aspects does not mean that features in those aspects cannot be combined to advantage; that division is merely for convenience of presentation. The present invention is intended to cover the various modifications and equivalent arrangements included within the spirit and scope of the appended claims.

Claims

1. A video data processing method, comprising:

Obtaining video source data;

Filtering the video source data to obtain desired video data;

Extracting label information of the video data;

Classifying the video data according to the label information to obtain classified video data.

2. The method of claim 1, characterized in that the step of filtering the video source data to obtain desired video data comprises:

Obtaining pre-built anti-spam policy rules;

Performing the filtering based on the anti-spam policy rules, so as to obtain the desired video data from the video source data.

3. The method of claim 2, characterized in that the anti-spam policy rules include a title rule, a content rule and a title-content correspondence rule.

4. The method of claim 3, characterized in that the step of performing the filtering based on the anti-spam policy rules, so as to obtain the desired video data from the video source data, comprises:

Determining whether the title form and title content of the video source data meet the title rule;

If so, determining whether the content of the video source data meets the content rule;

If so, determining whether the relevance between the title and the content of the video source data meets the title-content correspondence rule;

If so, determining that the video data is desired video data.

5. The method of claim 4, characterized in that the step of determining whether the content of the video source data meets the content rule comprises:

Extracting the cover and multiple content frames of the video source data;

Recognizing the cover and the multiple content frames;

Confirming, according to the recognition result, whether the cover and the multiple content frames meet the content rule.

6. The method of claim 4, characterized in that the step of determining whether the content of the video source data meets the title-content correspondence rule comprises:

Extracting multiple content frames of the video source data;

Recognizing the multiple content frames;

Determining, according to the recognition result, whether the video content and the title content of the video source data meet the title-content correspondence rule.

7. The method of claim 4, characterized in that the step of determining whether the content of the video source data meets the content rule comprises:

Extracting multiple content frames of the video source data;

Recognizing the multiple content frames;

Judging, according to the recognition result, whether the video source data duplicates video data that existed before the video source data;

Determining, according to the judgment result, whether the content of the video source data meets the content rule.

8. The method of claim 3, characterized in that the step of extracting the label information of the video data comprises:

Obtaining a pre-established video model;

Extracting the label information of the video data based on the video model.

9. The method of claim 1, characterized in that after the step of classifying the video data according to the label information, the method further comprises:

Obtaining hot-topic information from across the whole network;

Judging, according to the hot-topic information, whether the video data is a hot video;

If so, marking the video data;

Wherein the marked video data will be distributed to users preferentially.

10. A video data processing device, comprising:

A data acquisition module, configured to obtain video source data;

A data filtering module, configured to filter the video source data to obtain processed video data, so as to reject video source data that does not meet a preset standard;

A label extraction module, configured to extract label information of the video data;

A data classification module, configured to classify the video data according to the label information to obtain classified video data.