CN108377417B - Video reviewing method, device, computer equipment and storage medium - Google Patents

Video reviewing method, device, computer equipment and storage medium

Info

Publication number
CN108377417B
Authority
CN
China
Prior art keywords
video
scoring
cover
according
frame image
Prior art date
Application number
CN201810044313.5A
Other languages
Chinese (zh)
Other versions
CN108377417A (en
Inventor
孙昊
刘霄
文石磊
丁二锐
李旭斌
李甫
Original Assignee
百度在线网络技术(北京)有限公司
Priority date
Filing date
Publication date
Application filed by 百度在线网络技术(北京)有限公司
Priority to CN201810044313.5A
Publication of CN108377417A
Application granted
Publication of CN108377417B

Classifications

    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04N PICTORIAL COMMUNICATION, e.g. TELEVISION
    • H04N21/00 Selective content distribution, e.g. interactive television or video on demand [VOD]
    • H04N21/40 Client devices specifically adapted for the reception of or interaction with content, e.g. set-top-box [STB]; Operations thereof
    • H04N21/43 Processing of content or additional data, e.g. demultiplexing additional data from a digital video stream; Elementary client operations, e.g. monitoring of home network, synchronizing decoder's clock; Client middleware
    • H04N21/44 Processing of video elementary streams, e.g. splicing a video clip retrieved from local storage with an incoming video stream, rendering scenes according to MPEG-4 scene graphs
    • H04N21/44008 Processing of video elementary streams, e.g. splicing a video clip retrieved from local storage with an incoming video stream, rendering scenes according to MPEG-4 scene graphs involving operations for analysing video streams, e.g. detecting features or characteristics in the video stream
    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04N PICTORIAL COMMUNICATION, e.g. TELEVISION
    • H04N21/00 Selective content distribution, e.g. interactive television or video on demand [VOD]
    • H04N21/80 Generation or processing of content or additional data by content creator independently of the distribution process; Content per se
    • H04N21/85 Assembly of content; Generation of multimedia applications
    • H04N21/854 Content authoring
    • H04N21/8549 Creating video summaries, e.g. movie trailer

Abstract

The invention discloses a video reviewing method, device, computer equipment and storage medium. The method includes: obtaining the cover of a video to be reviewed; cutting out of the video a video clip centered on the frame where the cover is located; and generating a video content preview from the video clip, so that the video is reviewed on the basis of the video content preview. The scheme of the invention can save labor cost and improve review efficiency.

Description

Video reviewing method, device, computer equipment and storage medium

[Technical Field]

The present invention relates to computer application technology, and in particular to a video reviewing method, device, computer equipment and storage medium.

[Background Art]

The internet has gradually evolved from the text era into the era of video and multimedia. According to published data, the volume of video content newly distributed and produced each day is on the order of hundreds of millions, and this large volume of video content needs to be reviewed and controlled.

At present, videos are reviewed mainly by hand, for example by manually inspecting them frame by frame, which not only consumes a great deal of labor but is also inefficient.

[Summary of the Invention]

In view of this, the present invention provides a video reviewing method, device, computer equipment and storage medium that can save labor cost and improve review efficiency.

The specific technical solution is as follows:

A video reviewing method, comprising:

obtaining the cover of a video to be reviewed;

cutting out of the video a video clip centered on the frame where the cover is located;

generating a video content preview from the video clip, so that the video is reviewed on the basis of the video content preview.

According to a preferred embodiment of the present invention, obtaining the cover of the video to be reviewed includes:

obtaining a manually selected cover;

or obtaining a predetermined score for each frame image in the video and selecting one frame image as the cover according to the scores.

According to a preferred embodiment of the present invention, obtaining a predetermined score for each frame image in the video and selecting one frame image as the cover according to the scores includes:

determining a clarity score and an aesthetics score for each frame image using a pre-trained clarity evaluation model and a pre-trained aesthetics evaluation model;

computing, for each frame image, a weighted sum of its clarity score and aesthetics score, sorting the results in descending order, and selecting the top N images, N being a positive integer greater than one;

determining, using a pre-trained content relevance evaluation model, a content relevance score between each selected frame image and the video;

selecting, from the selected images, the image with the highest content relevance score as the cover; or computing, for each selected frame image, a weighted sum of its clarity score, aesthetics score and content relevance score, sorting the results in descending order, and selecting the first-ranked image as the cover;

or,

determining, using a pre-trained clarity evaluation model, aesthetics evaluation model and content relevance evaluation model, a clarity score, an aesthetics score and a score for content relevance to the video for each frame image;

computing, for each frame image, a weighted sum of its clarity score, aesthetics score and content relevance score, sorting the results in descending order, and selecting the first-ranked image as the cover.

According to a preferred embodiment of the present invention, the method further comprises:

determining, using a pre-trained classification model, the contribution value of each frame image in the video toward classifying the video into the category to which it belongs;

cutting the video into a series of video clips;

selecting one video clip from the clips obtained by cutting, according to the contribution values and cover scores of the frame images in each video clip, and generating the video content preview from the selected video clip;

wherein the cover score of the image selected as the cover is set to a, and the cover score of each image not selected as the cover is set to b, a being greater than b and b being greater than or equal to zero.

According to a preferred embodiment of the present invention, cutting the video into a series of video clips includes:

cutting the video by shot to obtain the video clips corresponding to different shots.

According to a preferred embodiment of the present invention, selecting one video clip from the clips obtained by cutting, according to the contribution values and cover scores of the frame images in each video clip, includes:

for each video clip, computing a weighted sum of the contribution value and the cover score of each frame image in it, and taking the mean of these results as the composite score of the video clip;

selecting the video clip with the highest composite score.

According to a preferred embodiment of the present invention, generating the video content preview from the video clip includes:

using the video clip directly as the video content preview;

or playing the video clip at M times normal speed to obtain the video content preview, M being greater than one.

A video reviewing device, comprising: an acquiring unit and a first generation unit;

the acquiring unit is configured to obtain the cover of a video to be reviewed;

the first generation unit is configured to cut out of the video a video clip centered on the frame where the cover is located, and to generate a video content preview from the video clip, so that the video is reviewed on the basis of the video content preview.

According to a preferred embodiment of the present invention, the acquiring unit obtains a manually selected cover;

or the acquiring unit obtains a predetermined score for each frame image in the video and selects one frame image as the cover according to the scores.

According to a preferred embodiment of the present invention, the acquiring unit selects the cover in the following way:

determining a clarity score and an aesthetics score for each frame image using a pre-trained clarity evaluation model and a pre-trained aesthetics evaluation model;

computing, for each frame image, a weighted sum of its clarity score and aesthetics score, sorting the results in descending order, and selecting the top N images, N being a positive integer greater than one;

determining, using a pre-trained content relevance evaluation model, a content relevance score between each selected frame image and the video;

selecting, from the selected images, the image with the highest content relevance score as the cover; or computing, for each selected frame image, a weighted sum of its clarity score, aesthetics score and content relevance score, sorting the results in descending order, and selecting the first-ranked image as the cover;

or,

determining, using a pre-trained clarity evaluation model, aesthetics evaluation model and content relevance evaluation model, a clarity score, an aesthetics score and a score for content relevance to the video for each frame image;

computing, for each frame image, a weighted sum of its clarity score, aesthetics score and content relevance score, sorting the results in descending order, and selecting the first-ranked image as the cover.

According to a preferred embodiment of the present invention, the device further comprises a second generation unit;

the second generation unit is configured to determine, using a pre-trained classification model, the contribution value of each frame image in the video toward classifying the video into the category to which it belongs; to cut the video into a series of video clips; to select one video clip from the clips obtained by cutting, according to the contribution values and cover scores of the frame images in each video clip; and to generate the video content preview from the selected video clip; wherein the cover score of the image selected as the cover is set to a, and the cover score of each image not selected as the cover is set to b, a being greater than b and b being greater than or equal to zero.

According to a preferred embodiment of the present invention, the second generation unit cuts the video by shot to obtain the video clips corresponding to different shots.

According to a preferred embodiment of the present invention, the second generation unit computes, for each video clip, a weighted sum of the contribution value and the cover score of each frame image in it, takes the mean of these results as the composite score of the video clip, and selects the video clip with the highest composite score.

According to a preferred embodiment of the present invention, the first generation unit and the second generation unit use the obtained video clip directly as the video content preview, or play the obtained video clip at M times normal speed to obtain the video content preview, M being greater than one.

A computer device, comprising a memory, a processor, and a computer program stored in the memory and executable on the processor, wherein the processor implements the method described above when executing the program.

A computer-readable storage medium storing a computer program which, when executed by a processor, implements the method described above.

As can be seen from the above, with the scheme of the present invention the cover of the video to be reviewed is obtained first, a video clip centered on the frame where the cover is located is then cut out of the video, and a video content preview is generated from the video clip so that the video can be reviewed on the basis of the preview. Because the video content preview contains the core content of the video, such as the cover, reviewing the preview can achieve an effect similar to reviewing the complete video. At the same time, the preview contains far less content than the complete video, so reviewing the preview greatly reduces review time, saves labor cost, and correspondingly improves review efficiency.

[Brief Description of the Drawings]

Fig. 1 is a flowchart of a first embodiment of the video reviewing method of the present invention.

Fig. 2 is a flowchart of a second embodiment of the video reviewing method of the present invention.

Fig. 3 is a schematic structural diagram of an embodiment of the video reviewing device of the present invention.

Fig. 4 is a block diagram of an exemplary computer system/server 12 suitable for implementing embodiments of the present invention.

[Detailed Description]

To make the technical solution of the present invention clearer, the scheme of the present invention is further described below with reference to the drawings and embodiments.

Obviously, the described embodiments are some, rather than all, of the embodiments of the present invention. All other embodiments obtained by those skilled in the art on the basis of the embodiments of the present invention without creative effort fall within the scope of protection of the present invention.

Fig. 1 is a flowchart of a first embodiment of the video reviewing method of the present invention. As shown in Fig. 1, it includes the following specific implementation.

In 101, the cover of the video to be reviewed is obtained.

In 102, a video clip centered on the frame where the cover is located is cut out of the video.

In 103, a video content preview is generated from the video clip that was cut out, so that the video is reviewed on the basis of the video content preview.

For a video to be reviewed, its cover can be obtained first.

The cover of a video can be selected manually, for example by the user who uploads the video, who can upload the selected cover together with the video. The cover can usually represent the core content of the video.

Alternatively, a predetermined score can be obtained for each frame image in the video, and one frame image can then be selected as the cover according to the scores.

Generally, the selected cover should have high clarity and high aesthetic quality and should be as relevant as possible to the content of the video, that is, it should match the content or theme of the video as closely as possible. Accordingly, the cover can be selected on the basis of clarity, aesthetics and relevance to the video content.

The ways of doing so include, but are not limited to, the following:

1) Using a pre-trained clarity evaluation model and a pre-trained aesthetics evaluation model, determine a clarity score and an aesthetics score for each frame image; compute, for each frame image, a weighted sum of its clarity score and aesthetics score, sort the results in descending order, and select the top N images, N being a positive integer greater than one; using a pre-trained content relevance evaluation model, determine a content relevance score between each selected frame image and the video; and select, from the selected images, the image with the highest content relevance score as the cover of the video.

The clarity evaluation model and the aesthetics evaluation model can be network models trained with a deep learning algorithm; how to train them is prior art.

Taking the clarity evaluation model as an example, images of different degrees of clarity can be collected as training samples, the clarity score of each image can be annotated manually as the label of that training sample, and the clarity evaluation model can then be trained from the training samples and labels.

Once the clarity evaluation model and the aesthetics evaluation model have been obtained, each frame image in the video can be fed to each of the two models to obtain the clarity score and the aesthetics score of that frame image.

Next, a weighted sum of the clarity score and the aesthetics score can be computed for each frame image, the results can be sorted in descending order, and the top N images can be selected. N is a positive integer greater than one whose specific value can be determined according to actual needs.

For each selected frame image, a content relevance score between that frame image and the video can then be determined using the pre-trained content relevance evaluation model.

The content relevance evaluation model can likewise be a network model trained with a deep learning algorithm; how to train it is also prior art.

Finally, the image with the highest content relevance score can be selected from the selected images as the cover of the video.

2) Using a pre-trained clarity evaluation model and a pre-trained aesthetics evaluation model, determine a clarity score and an aesthetics score for each frame image; compute, for each frame image, a weighted sum of its clarity score and aesthetics score, sort the results in descending order, and select the top N images, N being a positive integer greater than one; using a pre-trained content relevance evaluation model, determine a content relevance score between each selected frame image and the video; and compute, for each selected frame image, a weighted sum of its clarity score, aesthetics score and content relevance score, sort the results in descending order, and select the first-ranked image as the cover of the video.

Compared with mode 1), in this mode, once the content relevance score of each selected frame image has been obtained, the clarity score, aesthetics score and content relevance score of each selected frame image are considered together to choose the image that serves as the cover.

3) Using a pre-trained clarity evaluation model, aesthetics evaluation model and content relevance evaluation model, determine a clarity score, an aesthetics score and a score for content relevance to the video for each frame image; compute, for each frame image, a weighted sum of its clarity score, aesthetics score and content relevance score, sort the results in descending order, and select the first-ranked image as the cover of the video.

In modes 1) and 2), the frame images in the video are first screened preliminarily by clarity and aesthetics, and the content relevance score is computed only for the images that pass the screening. In this mode, the clarity score, the aesthetics score and the score for content relevance to the video are obtained for every frame image in the video, and the image that serves as the cover is chosen directly from the combination of these three scores.

In practical applications, any of the above modes can be used according to actual needs.

Whichever mode is used, once the cover of the video has been obtained, a video clip centered on the frame where the cover is located can be cut out of the video, and a video content preview can then be generated from that clip so that the video is reviewed on the basis of the preview.

The video clip that was cut out can be used directly as the video content preview, or the clip can be played at M times normal speed to obtain the video content preview, M being greater than one.

For example, suppose the video contains 100 frame images in total and the cover is the 20th frame. A 6-second video clip centered on the frame where the cover is located can then be cut out, and this clip is used as the required video content preview.

As another example, suppose the video contains 100 frame images in total and the cover is the 20th frame. A 9-second video clip centered on the frame where the cover is located can be cut out and played at 1.5 times normal speed to obtain the required 6-second video content preview, which allows more video content to be shown.
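The clip boundaries in the two examples above can be computed as in the sketch below. It works in seconds rather than frame indices, and the function name `centered_clip`, its parameters, and the clamping of the clip to the start and end of the video are assumptions made for illustration; the text does not say what happens when the cover lies near either end of the video.

```python
from typing import Tuple


def centered_clip(cover_time_s: float, video_duration_s: float,
                  preview_s: float = 6.0, speedup: float = 1.0) -> Tuple[float, float]:
    """Return (start, end) in seconds of the source clip centered on the cover.

    With speedup M > 1 the source clip is M times longer than the preview,
    so that M-times playback yields a preview of `preview_s` seconds.
    """
    if speedup < 1.0:
        raise ValueError("speedup must be at least 1")
    # Source material needed, capped at the length of the whole video.
    source_len = min(preview_s * speedup, video_duration_s)
    start = cover_time_s - source_len / 2
    # Assumption: shift the window back inside the video instead of truncating it.
    start = max(0.0, min(start, video_duration_s - source_len))
    return (start, start + source_len)
```

For a cover at 30 s in a 120 s video, the default call gives a 6-second window around the cover, and `speedup=1.5` widens the source window to 9 seconds.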

The video can subsequently be reviewed on the basis of the video content preview, for example to check whether it contains pornographic content. Because the video content preview contains the core content of the video, such as the cover, reviewing the preview can achieve an effect similar to reviewing the complete video. At the same time, the preview contains far less content than the complete video, so reviewing the preview greatly reduces review time, saves labor cost, and correspondingly improves review efficiency.

On this basis, to further improve the accuracy of the obtained video content preview, that is, to make the preview better reflect the core content of the video, the following further processing can also be performed.

Using a pre-trained classification model, determine the contribution value of each frame image in the video toward classifying the video into the category to which it belongs; cut the video into a series of video clips; select one video clip from the clips obtained by cutting, according to the contribution values and cover scores of the frame images in each video clip; and generate the video content preview from the selected video clip.

Here, the cover score of the image selected as the cover can be set to a, and the cover score of each image not selected as the cover can be set to b, a being greater than b and b being greater than or equal to zero.

The classification model can be a network model trained with a deep learning algorithm; how to train it is prior art.

Using the classification model, a classification result for the video can be obtained, for example that the video belongs to category A. The scheme of the present invention is mainly concerned with the output of an intermediate layer of the classification model, i.e. an intermediate result, namely the contribution value of each frame image in the video toward classifying the video into the category to which it belongs (such as category A).

For a given video, suppose its category is "football match". Different images in the video help to different degrees in determining that the video is a "football match": images of the pitch and the players help the classification more, while images of the spectators help it less. In other words, different images have different contribution values toward classifying the video into the category to which it belongs.

In the scheme of the present invention, the contribution value of each frame image in the video toward classifying the video into the category to which it belongs can be determined, and a weighted sum of the contribution value and the cover score can be computed for each frame image.

For example, if the video contains 100 frame images in total and the 20th frame image is selected as the cover, the cover score of the 20th frame image can be set to 1 and the cover score of every other frame image can be set to 0.

In addition, the video can be cut by shot to obtain the video clips corresponding to different shots, that is, the video is cut according to scene changes. The resulting video clips may or may not have the same duration, which is usually 7 to 11 seconds.
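The text does not say how shot boundaries are detected. One common approach, substituted here purely for illustration, is to threshold a frame-to-frame difference signal: a new shot starts wherever consecutive frames differ by more than a chosen threshold. In the sketch below, `frame_diffs`, `threshold` and the function name are hypothetical, and how the difference values are computed (for example from color histograms) is left open.

```python
from typing import List, Sequence, Tuple


def split_into_shots(frame_diffs: Sequence[float],
                     threshold: float) -> List[Tuple[int, int]]:
    """Cut a video into shots given per-transition dissimilarities.

    frame_diffs[i] is the dissimilarity between frame i and frame i + 1,
    so a video with n frames has n - 1 entries. Each shot is returned as
    a half-open frame range (start, end).
    """
    shots: List[Tuple[int, int]] = []
    start = 0
    for i, diff in enumerate(frame_diffs):
        if diff > threshold:
            # A large jump between frame i and i + 1 closes the current shot.
            shots.append((start, i + 1))
            start = i + 1
    # The last shot runs to the final frame.
    shots.append((start, len(frame_diffs) + 1))
    return shots
```

The half-open ranges cover every frame exactly once, which keeps the later per-clip averaging simple.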

For each video clip, after the weighted sum of the contribution value and the cover score has been computed for each frame image in it, the mean of these results can be calculated and used as the composite score of the video clip.

For example, if a video clip contains 10 frame images in total, the weighted sum of the contribution value and the cover score is computed for each frame image, giving 10 results, and the mean of these 10 results is used as the composite score of the video clip.
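The composite score described above can be written as a short function. In this sketch, `a` and `b` are the cover scores named in the text, while the function name, the boolean `is_cover` flags and the weight parameters `w_contribution` and `w_cover` are illustrative assumptions.

```python
from typing import Sequence


def composite_score(contributions: Sequence[float], is_cover: Sequence[bool],
                    a: float = 1.0, b: float = 0.0,
                    w_contribution: float = 1.0, w_cover: float = 1.0) -> float:
    """Mean over a clip's frames of the weighted contribution + cover score sum."""
    if not contributions or len(contributions) != len(is_cover):
        raise ValueError("need one cover flag per frame and at least one frame")
    if not (a > b >= 0):
        raise ValueError("cover scores must satisfy a > b >= 0")
    per_frame = [
        w_contribution * c + w_cover * (a if cover else b)
        for c, cover in zip(contributions, is_cover)
    ]
    return sum(per_frame) / len(per_frame)
```

Because the score is a mean, the bonus from containing the cover frame is diluted in longer clips, so a clip without the cover can still score higher if its frames contribute strongly to the classification.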

Once the composite score of each video clip has been obtained, the video clip with the highest composite score can be selected, and the video content preview can be generated from the selected clip.

For example, the selected video clip can be used directly as the video content preview, or the clip can be played at M times normal speed to obtain the video content preview, M being greater than one.

Based on the above, Fig. 2 is a flowchart of a second embodiment of the video reviewing method of the present invention. As shown in Fig. 2, it includes the following specific implementation.

In 201, the cover of the video to be reviewed is obtained.

A manually selected cover can be obtained, or a predetermined score can be obtained for each frame image in the video and one frame image can be selected as the cover according to the scores.

For example, a clarity score and an aesthetics score can first be determined for each frame image in the video using the pre-trained clarity evaluation model and aesthetics evaluation model. A weighted sum of the clarity score and the aesthetics score can then be computed for each frame image, the results can be sorted in descending order, and the top N images can be selected, N being a positive integer greater than one. Next, a content relevance score between each selected frame image and the video can be determined using the pre-trained content relevance evaluation model. Finally, a weighted sum of the clarity score, aesthetics score and content relevance score can be computed for each selected frame image, the results can be sorted in descending order, and the first-ranked image can be selected as the cover of the video.

In 202, the contribution value of each frame image in the video toward classifying the video into the category to which it belongs is determined.

The contribution value of each frame image in the video toward classifying the video into the category to which it belongs can be determined using the pre-trained classification model.

In 203, the video is cut into a series of video clips.

For example, the video can be cut by shot to obtain the video clips corresponding to different shots.

In 204, for each video clip, a weighted sum of the contribution value and the cover score is computed for each frame image in it, and the mean of these results is used as the composite score of the video clip; here the cover score of the image selected as the cover is set to a, and the cover score of each image not selected as the cover is set to b, a being greater than b and b being greater than or equal to zero.

In 205, the video clip with the highest composite score is selected.

In 206, the selected video clip is used as the required video content preview, so that the video is reviewed on the basis of the video content preview.
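Steps 204 to 206 can be sketched end to end as follows, assuming the outputs of steps 201 to 203 are already available: `cover_index` from 201, per-frame `contributions` from 202, and `shots` as half-open frame ranges from 203. The function name and this data layout are assumptions for illustration, and all weights are fixed at 1.

```python
from typing import List, Sequence, Tuple

Shot = Tuple[int, int]  # half-open frame range (start, end); illustrative layout


def pick_preview_clip(contributions: Sequence[float], shots: List[Shot],
                      cover_index: int, a: float = 1.0, b: float = 0.0) -> Shot:
    """Return the shot with the highest composite score (steps 204 and 205)."""
    def composite(shot: Shot) -> float:
        start, end = shot
        # Step 204: contribution value plus cover score, averaged over the shot.
        per_frame = [
            contributions[i] + (a if i == cover_index else b)
            for i in range(start, end)
        ]
        return sum(per_frame) / len(per_frame)

    # Step 205: the selected shot then serves as the preview in step 206.
    return max(shots, key=composite)
```

Note that in this sketch the shot containing the cover is favored but not guaranteed to win: a shot whose frames have consistently high contribution values can outrank it.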

In each of the above method embodiments, the specific values of the weights used in the weighted sums can be determined according to actual needs. In the simplest case they can all be set to 1; taking the weighted sum of each frame image's contribution value and cover score as an example, the contribution value and the cover score are then simply added.

It should be noted that, for simplicity of description, each of the foregoing method embodiments is expressed as a series of combined actions, but those skilled in the art should understand that the present invention is not limited by the described order of actions, because according to the present invention certain steps may be performed in other orders or simultaneously. Those skilled in the art should also understand that the embodiments described in the specification are all preferred embodiments, and the actions and modules involved are not necessarily required by the present invention.

In the above embodiments, the description of each embodiment has its own emphasis; for parts not described in detail in one embodiment, refer to the related descriptions of other embodiments.

The above describes the method embodiments; the scheme of the present invention is further described below through a device embodiment.

Fig. 3 is a schematic structural diagram of an embodiment of the video reviewing device of the present invention. As shown in Fig. 3, it includes an acquiring unit 301 and a first generation unit 302.

The acquiring unit 301 is configured to obtain the cover of the video to be reviewed.

The first generation unit 302 is configured to cut out of the video a video clip centered on the frame where the cover is located, and to generate a video content preview from the video clip, so that the video is reviewed on the basis of the video content preview.
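The division of work between the two units might be modeled as below. This is only a structural sketch: the class names mirror units 301 and 302, but the `locate_cover` hook, the use of timestamps in seconds, the 6-second default and the clamping to the video bounds are all assumptions introduced for the example.

```python
from dataclasses import dataclass
from typing import Callable, Tuple


@dataclass
class AcquiringUnit:
    """Unit 301: obtains the cover position (in seconds) of a video."""
    locate_cover: Callable[[str], float]  # hypothetical hook: manual choice or scoring

    def acquire(self, video_id: str) -> float:
        return self.locate_cover(video_id)


@dataclass
class FirstGenerationUnit:
    """Unit 302: cuts out the clip centered on the cover."""
    clip_seconds: float = 6.0

    def generate(self, cover_s: float, duration_s: float) -> Tuple[float, float]:
        length = min(self.clip_seconds, duration_s)
        # Assumption: keep the window inside the video when the cover is near an end.
        start = max(0.0, min(cover_s - length / 2, duration_s - length))
        return (start, start + length)


def build_preview(video_id: str, duration_s: float,
                  acquiring: AcquiringUnit,
                  generating: FirstGenerationUnit) -> Tuple[float, float]:
    """Wire the two units together: cover first, then the centered clip."""
    return generating.generate(acquiring.acquire(video_id), duration_s)
```

Keeping cover acquisition behind a single hook lets the same generation unit serve both the manually selected cover and the score-based cover.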

The acquiring unit 301 can obtain a manually selected cover, or it can obtain a predetermined score for each frame image in the video and select one frame image as the cover according to the scores.

In general, the selected cover should be reasonably sharp and visually appealing, and should be as relevant as possible to the content of the video, that is, it should match the content/theme of the video as closely as possible. Accordingly, the acquiring unit 301 can select the cover based on clarity, aesthetics, and relevance to the video content.

For example, the acquiring unit 301 can use a pre-trained clarity assessment model and a pre-trained aesthetics assessment model to determine a clarity score and an aesthetics score for each frame image, compute the weighted sum of the clarity score and the aesthetics score of each frame image, sort the results in descending order, and select the top N images after sorting, where N is a positive integer greater than one. It can then use a pre-trained content relevance assessment model to determine, for each selected frame image, a content relevance score with respect to the video, and choose as the cover the selected image with the highest content relevance score. Alternatively, for each selected frame image it can compute the weighted sum of the clarity score, the aesthetics score, and the content relevance score, sort the results in descending order, and choose the first-ranked image as the cover.

Alternatively, the acquiring unit 301 can use a pre-trained clarity assessment model, aesthetics assessment model, and content relevance assessment model to determine, for each frame image, a clarity score, an aesthetics score, and a content relevance score with respect to the video. It can then compute the weighted sum of the three scores of each frame image, sort the results in descending order, and choose the first-ranked image as the cover.
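The two cover selection strategies described above can be sketched as follows. This is a minimal illustration, not the patented implementation: the `FrameScores` record, the function names, and the example scores are all hypothetical, and the per-frame scores are assumed to have been produced already by the three assessment models.

```python
from dataclasses import dataclass


@dataclass
class FrameScores:
    index: int          # frame position in the video
    clarity: float      # assumed output of a clarity assessment model
    aesthetics: float   # assumed output of an aesthetics assessment model
    relevance: float    # assumed output of a content relevance assessment model


def select_cover_two_stage(frames, n=3, w_clarity=1.0, w_aesthetics=1.0):
    """Rank by weighted clarity + aesthetics, keep the top N, then take the most relevant."""
    ranked = sorted(
        frames,
        key=lambda f: w_clarity * f.clarity + w_aesthetics * f.aesthetics,
        reverse=True,
    )
    shortlist = ranked[:n]
    return max(shortlist, key=lambda f: f.relevance).index


def select_cover_single_stage(frames, w_clarity=1.0, w_aesthetics=1.0, w_relevance=1.0):
    """Rank every frame by the weighted sum of all three scores and take the best."""
    def total(f):
        return (w_clarity * f.clarity
                + w_aesthetics * f.aesthetics
                + w_relevance * f.relevance)
    return max(frames, key=total).index


# Hypothetical scores for a four-frame video.
frames = [
    FrameScores(0, clarity=0.9, aesthetics=0.8, relevance=0.1),
    FrameScores(1, clarity=0.7, aesthetics=0.7, relevance=0.9),
    FrameScores(2, clarity=0.2, aesthetics=0.3, relevance=1.0),
    FrameScores(3, clarity=0.6, aesthetics=0.9, relevance=0.5),
]
two_stage_cover = select_cover_two_stage(frames, n=2)
single_stage_cover = select_cover_single_stage(frames)
```

With these example scores the two strategies pick different frames, which shows why the choice matters: the two-stage variant only runs the relevance model on the N shortlisted frames, while the single-stage variant scores every frame with all three models.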

After the cover of the video has been obtained, the first generation unit 302 can cut out of the video a video clip centered on the frame where the cover is located, and can then generate a video content preview from that clip, so that the video can be reviewed based on the video content preview.

For example, the clip can be used directly as the video content preview, or the clip can be played at M-times speed to obtain the video content preview, where M is greater than one.
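As a rough sketch of this step, the clip boundaries and the accelerated playback can be expressed on frame indices. The function names, the clip length parameter, the clamping at the video boundaries, and the restriction to an integer M are assumptions made for illustration; the text above only requires that the clip be centered on the cover frame and that M be greater than one.

```python
def centered_clip(cover_index, total_frames, clip_length):
    """Return (start, end) frame indices, end exclusive, of a clip centered on the cover frame."""
    if not 0 <= cover_index < total_frames:
        raise ValueError("cover_index must be a valid frame index")
    clip_length = min(clip_length, total_frames)
    start = cover_index - clip_length // 2
    # Assumption: shift the window back inside the video when the cover is near either end.
    start = max(0, min(start, total_frames - clip_length))
    return start, start + clip_length


def speed_up(frame_indices, m):
    """Approximate M-times playback by keeping every m-th frame (integer m assumed)."""
    if not isinstance(m, int) or m <= 1:
        raise ValueError("m must be an integer greater than one")
    return list(frame_indices)[::m]


start, end = centered_clip(cover_index=50, total_frames=300, clip_length=21)
preview_frames = speed_up(range(start, end), m=2)
```

Dropping frames is only one way to accelerate playback; rewriting timestamps in the container would achieve the same effect without discarding frames.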

Subsequently, the video can be reviewed based on the video content preview, for example to check whether the video contains pornographic content. Because the video content preview contains the core content of the video, such as the video cover, reviewing the preview can achieve an effect similar to reviewing the complete video. At the same time, the preview contains far less content than the complete video, so reviewing the preview significantly reduces review time, saves labor cost, and correspondingly improves review efficiency.

On this basis, to further improve the accuracy of the obtained video content preview, that is, to make the preview better reflect the core content of the video, the apparatus shown in Fig. 3 may further include a second generation unit 303.

The second generation unit 303 can use a pre-trained classification model to determine, for each frame image in the video, its contribution value toward classifying the video into its category. It splits the video into a series of video clips, selects one clip from the clips obtained by the split according to the contribution value and the cover score of each frame image in each clip, and generates the video content preview from the selected clip. The cover score of the image selected as the cover is set to a, and the cover score of every image not selected as the cover is set to b, where a is greater than b and b is greater than or equal to zero.

Preferably, the second generation unit 303 splits the video by shot, obtaining one video clip per shot.
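The text does not say how shot boundaries are detected. One common heuristic, used here purely as an assumed stand-in, is to start a new shot whenever consecutive frames differ by more than a threshold. The sketch below assumes each frame has already been reduced to a normalized color histogram; the function name, the L1 distance, and the threshold value are all hypothetical.

```python
def split_into_shots(frame_histograms, threshold=0.5):
    """Return a list of (start, end) frame ranges, end exclusive, one per detected shot."""
    if not frame_histograms:
        return []
    boundaries = [0]
    for i in range(1, len(frame_histograms)):
        prev, curr = frame_histograms[i - 1], frame_histograms[i]
        # L1 distance between two normalized histograms lies in [0, 2].
        distance = sum(abs(p - c) for p, c in zip(prev, curr))
        if distance > threshold:
            boundaries.append(i)  # a large jump starts a new shot at frame i
    boundaries.append(len(frame_histograms))
    return list(zip(boundaries[:-1], boundaries[1:]))


# Hypothetical two-bin histograms: the content changes abruptly at frame 2.
histograms = [[1.0, 0.0], [0.9, 0.1], [0.1, 0.9], [0.0, 1.0], [0.0, 1.0]]
shots = split_into_shots(histograms)
```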

For each video clip, the second generation unit 303 can compute the weighted sum of the contribution value and the cover score of each frame image in the clip, take the mean of these results as the comprehensive score of the clip, and select the clip with the highest comprehensive score.
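The clip scoring described above can be sketched as follows, assuming the per-frame contribution values have already been produced by the classification model. The function names and example numbers are hypothetical; the defaults a = 1 and b = 0 are just one choice satisfying a > b >= 0, and both weights default to 1 as in the simplest case mentioned earlier.

```python
def cover_scores(num_frames, cover_index, a=1.0, b=0.0):
    """Assign score a to the cover frame and b to every other frame (a > b >= 0)."""
    if not (a > b >= 0):
        raise ValueError("require a > b >= 0")
    return [a if i == cover_index else b for i in range(num_frames)]


def segment_score(contributions, covers, w_contribution=1.0, w_cover=1.0):
    """Mean over the clip of the weighted sum of contribution value and cover score."""
    totals = [w_contribution * c + w_cover * v for c, v in zip(contributions, covers)]
    return sum(totals) / len(totals)


def best_segment(segments, contributions, covers):
    """Return the (start, end) clip, end exclusive, with the highest comprehensive score."""
    return max(
        segments,
        key=lambda seg: segment_score(contributions[seg[0]:seg[1]], covers[seg[0]:seg[1]]),
    )


# Hypothetical six-frame video split into three clips; frame 1 is the cover.
contributions = [0.2, 0.4, 0.9, 0.8, 0.1, 0.3]
segments = [(0, 2), (2, 4), (4, 6)]
covers = cover_scores(len(contributions), cover_index=1)
best = best_segment(segments, contributions, covers)
```

In this example the clip containing the cover scores 0.8 but loses to the clip with the strongest contribution values (about 0.85), so the cover score acts as a bonus rather than a guarantee. Raising a or the cover weight shifts that balance.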

Afterwards, the second generation unit 303 can use the obtained video clip directly as the video content preview, or it can play the obtained clip at M-times speed to obtain the video content preview, where M is greater than one.

For the specific workflow of the apparatus embodiment shown in Fig. 3, refer to the corresponding description in the foregoing method embodiments; it is not repeated here.

Fig. 4 shows a block diagram of an exemplary computer system/server 12 suitable for implementing embodiments of the present invention. The computer system/server 12 shown in Fig. 4 is only an example and should not impose any limitation on the function or scope of use of the embodiments of the present invention.

As shown in Fig. 4, computer system/server 12 takes the form of a general-purpose computing device. The components of computer system/server 12 can include, but are not limited to: one or more processors (processing units) 16, a memory 28, and a bus 18 connecting the different system components (including memory 28 and processor 16).

Bus 18 represents one or more of several types of bus structures, including a memory bus or memory controller, a peripheral bus, an accelerated graphics port, and a processor or local bus using any of a variety of bus architectures. By way of example, these architectures include, but are not limited to, the Industry Standard Architecture (ISA) bus, the Micro Channel Architecture (MCA) bus, the Enhanced ISA bus, the Video Electronics Standards Association (VESA) local bus, and the Peripheral Component Interconnect (PCI) bus.

Computer system/server 12 typically includes a variety of computer-system-readable media. These media can be any available media that can be accessed by computer system/server 12, including volatile and non-volatile media and removable and non-removable media.

Memory 28 may include computer-system-readable media in the form of volatile memory, such as random access memory (RAM) 30 and/or cache memory 32. Computer system/server 12 may further include other removable/non-removable, volatile/non-volatile computer system storage media. By way of example only, storage system 34 can be used to read from and write to a non-removable, non-volatile magnetic medium (not shown in Fig. 4, commonly referred to as a "hard disk drive"). Although not shown in Fig. 4, a magnetic disk drive for reading from and writing to a removable non-volatile magnetic disk (such as a "floppy disk") and an optical disc drive for reading from and writing to a removable non-volatile optical disc (such as a CD-ROM, DVD-ROM or other optical medium) can also be provided. In these cases, each drive can be connected to bus 18 through one or more data media interfaces. Memory 28 may include at least one program product having a set of (for example, at least one) program modules configured to perform the functions of the embodiments of the present invention.

Program/utility 40, having a set of (at least one) program modules 42, can be stored in, for example, memory 28. Such program modules 42 include, but are not limited to, an operating system, one or more application programs, other program modules, and program data; each of these examples, or some combination of them, may include an implementation of a network environment. Program modules 42 generally perform the functions and/or methods of the embodiments described in the present invention.

Computer system/server 12 can also communicate with one or more external devices 14 (such as a keyboard, a pointing device, display 24, etc.), with one or more devices that enable a user to interact with computer system/server 12, and/or with any device (such as a network card, a modem, etc.) that enables computer system/server 12 to communicate with one or more other computing devices. Such communication can take place through input/output (I/O) interface 22. In addition, computer system/server 12 can communicate with one or more networks (such as a local area network (LAN), a wide area network (WAN) and/or a public network such as the Internet) through network adapter 20. As shown in Fig. 4, network adapter 20 communicates with the other modules of computer system/server 12 through bus 18. It should be understood that, although not shown in the figure, other hardware and/or software modules can be used in conjunction with computer system/server 12, including but not limited to: microcode, device drivers, redundant processing units, external disk drive arrays, RAID systems, tape drives, and data backup storage systems.

By running the programs stored in memory 28, processor 16 performs various functional applications and data processing, for example implementing the method in the embodiments shown in Fig. 1 or Fig. 2.

The present invention also discloses a computer-readable storage medium on which a computer program is stored; when the program is executed by a processor, the method in the embodiments shown in Fig. 1 and Fig. 2 is implemented.

Any combination of one or more computer-readable media can be used. A computer-readable medium can be a computer-readable signal medium or a computer-readable storage medium. A computer-readable storage medium can be, for example, but is not limited to, an electrical, magnetic, optical, electromagnetic, infrared, or semiconductor system, apparatus, or device, or any combination of the above. More specific examples (a non-exhaustive list) of computer-readable storage media include: an electrical connection with one or more wires, a portable computer diskette, a hard disk, random access memory (RAM), read-only memory (ROM), erasable programmable read-only memory (EPROM or flash memory), an optical fiber, a portable compact disc read-only memory (CD-ROM), an optical storage device, a magnetic storage device, or any suitable combination of the above. In this document, a computer-readable storage medium can be any tangible medium that contains or stores a program that can be used by or in connection with an instruction execution system, apparatus, or device.

A computer-readable signal medium may include a data signal propagated in baseband or as part of a carrier wave, carrying computer-readable program code. Such a propagated data signal can take many forms, including, but not limited to, an electromagnetic signal, an optical signal, or any suitable combination of the above. A computer-readable signal medium can also be any computer-readable medium other than a computer-readable storage medium that can send, propagate, or transmit a program for use by or in connection with an instruction execution system, apparatus, or device.

The program code contained on a computer-readable medium can be transmitted over any suitable medium, including, but not limited to, wireless, wire, optical cable, RF, or any suitable combination of the above.

Computer program code for performing the operations of the present invention can be written in one or more programming languages or a combination thereof, including object-oriented programming languages such as Java, Smalltalk, and C++, as well as conventional procedural programming languages such as the "C" language or similar programming languages. The program code can execute entirely on the user's computer, partly on the user's computer, as a stand-alone software package, partly on the user's computer and partly on a remote computer, or entirely on a remote computer or server. Where a remote computer is involved, the remote computer can be connected to the user's computer through any kind of network, including a local area network (LAN) or a wide area network (WAN), or it can be connected to an external computer (for example, through the Internet using an Internet service provider).

In the several embodiments provided by the present invention, it should be understood that the disclosed apparatus, method, and so on can be implemented in other ways. For example, the apparatus embodiments described above are merely illustrative: the division into units is only a division by logical function, and other divisions are possible in an actual implementation.

The units described as separate components may or may not be physically separate, and the components shown as units may or may not be physical units; that is, they can be located in one place or distributed over multiple network units. Some or all of the units can be selected according to actual needs to achieve the purpose of the solution of a given embodiment.

In addition, the functional units in the embodiments of the present invention can be integrated into one processing unit, each unit can exist physically on its own, or two or more units can be integrated into one unit. The integrated unit can be implemented in the form of hardware, or in the form of hardware plus software functional units.

An integrated unit implemented in the form of a software functional unit can be stored in a computer-readable storage medium. The software functional unit is stored in a storage medium and includes a number of instructions that cause a computer device (which can be a personal computer, a server, a network device, etc.) or a processor to perform some of the steps of the methods described in the embodiments of the present invention. The aforementioned storage medium includes various media that can store program code, such as a USB flash drive, a removable hard disk, read-only memory (ROM), random access memory (RAM), a magnetic disk, or an optical disc.

The foregoing describes only preferred embodiments of the present invention and is not intended to limit the present invention. Any modification, equivalent substitution, improvement, and the like made within the spirit and principles of the present invention shall fall within the scope of protection of the present invention.

Claims (14)

1. A video review method, characterized by comprising:
obtaining the cover of a video to be reviewed;
cutting out of the video a video clip centered on the frame where the cover is located;
generating a video content preview from the video clip, so that the video is reviewed based on the video content preview;
the method further comprising: determining, according to a pre-trained classification model, for each frame image in the video, its contribution value toward classifying the video into its category, the category being the class into which the video is classified; splitting the video into a series of video clips; selecting one video clip from the clips obtained by the splitting according to the contribution value and the cover score of each frame image in each video clip, and generating the video content preview from the selected video clip; wherein the cover score of the image selected as the cover is set to a, the cover score of each image not selected as the cover is set to b, a is greater than b, and b is greater than or equal to zero.
2. The method according to claim 1, characterized in that
obtaining the cover of the video to be reviewed comprises:
obtaining the manually selected cover;
or obtaining a predetermined score for each frame image in the video, and selecting one frame image as the cover according to the scores.
3. The method according to claim 2, characterized in that
obtaining a predetermined score for each frame image in the video and selecting one frame image as the cover according to the scores comprises:
determining, according to a pre-trained clarity assessment model and a pre-trained aesthetics assessment model, a clarity score and an aesthetics score for each frame image;
computing the weighted sum of the clarity score and the aesthetics score of each frame image, sorting the results in descending order, and selecting the top N images after sorting, N being a positive integer greater than one;
determining, according to a pre-trained content relevance assessment model, a content relevance score between each selected frame image and the video;
choosing from the selected images the image with the highest content relevance score as the cover, or computing the weighted sum of the clarity score, the aesthetics score, and the content relevance score of each selected frame image, sorting the results in descending order, and choosing the first-ranked image as the cover;
or,
determining, according to a pre-trained clarity assessment model, aesthetics assessment model, and content relevance assessment model, a clarity score, an aesthetics score, and a content relevance score with respect to the video for each frame image;
computing the weighted sum of the clarity score, the aesthetics score, and the content relevance score of each frame image, sorting the results in descending order, and choosing the first-ranked image as the cover.
4. The method according to claim 1, characterized in that
splitting the video into a series of video clips comprises:
splitting the video by shot to obtain the video clips corresponding to the different shots.
5. The method according to claim 1, characterized in that
selecting one video clip from the clips obtained by the splitting according to the contribution value and the cover score of each frame image in each video clip comprises:
for each video clip, computing the weighted sum of the contribution value and the cover score of each frame image in the clip, and taking the mean of the results as the comprehensive score of the video clip;
selecting the video clip with the highest comprehensive score.
6. The method according to claim 1, characterized in that
generating the video content preview from a video clip comprises:
using the video clip directly as the video content preview;
or playing the video clip at M-times speed to obtain the video content preview, M being greater than one.
7. A video review apparatus, characterized by comprising: an acquiring unit and a first generation unit;
the acquiring unit being configured to obtain the cover of a video to be reviewed;
the first generation unit being configured to cut out of the video a video clip centered on the frame where the cover is located, and to generate a video content preview from the video clip, so that the video is reviewed based on the video content preview;
the apparatus further comprising: a second generation unit configured to determine, according to a pre-trained classification model, for each frame image in the video, its contribution value toward classifying the video into its category, the category being the class into which the video is classified; to split the video into a series of video clips; and to select one video clip from the clips obtained by the splitting according to the contribution value and the cover score of each frame image in each video clip, and generate the video content preview from the selected video clip; wherein the cover score of the image selected as the cover is set to a, the cover score of each image not selected as the cover is set to b, a is greater than b, and b is greater than or equal to zero.
8. The apparatus according to claim 7, characterized in that
the acquiring unit obtains the manually selected cover;
or the acquiring unit obtains a predetermined score for each frame image in the video and selects one frame image as the cover according to the scores.
9. The apparatus according to claim 8, characterized in that
the acquiring unit selects the cover in the following way:
determining, according to a pre-trained clarity assessment model and a pre-trained aesthetics assessment model, a clarity score and an aesthetics score for each frame image;
computing the weighted sum of the clarity score and the aesthetics score of each frame image, sorting the results in descending order, and selecting the top N images after sorting, N being a positive integer greater than one;
determining, according to a pre-trained content relevance assessment model, a content relevance score between each selected frame image and the video;
choosing from the selected images the image with the highest content relevance score as the cover, or computing the weighted sum of the clarity score, the aesthetics score, and the content relevance score of each selected frame image, sorting the results in descending order, and choosing the first-ranked image as the cover;
or,
determining, according to a pre-trained clarity assessment model, aesthetics assessment model, and content relevance assessment model, a clarity score, an aesthetics score, and a content relevance score with respect to the video for each frame image;
computing the weighted sum of the clarity score, the aesthetics score, and the content relevance score of each frame image, sorting the results in descending order, and choosing the first-ranked image as the cover.
10. The apparatus according to claim 7, characterized in that
the second generation unit splits the video by shot to obtain the video clips corresponding to the different shots.
11. The apparatus according to claim 7, characterized in that
the second generation unit, for each video clip, computes the weighted sum of the contribution value and the cover score of each frame image in the clip, takes the mean of the results as the comprehensive score of the video clip, and selects the video clip with the highest comprehensive score.
12. The apparatus according to claim 7, characterized in that
the first generation unit and the second generation unit each use the obtained video clip directly as the video content preview, or play the obtained video clip at M-times speed to obtain the video content preview, M being greater than one.
13. A computer device, comprising a memory, a processor, and a computer program stored on the memory and executable on the processor, characterized in that the processor, when executing the program, implements the method according to any one of claims 1 to 6.
14. A computer-readable storage medium on which a computer program is stored, characterized in that the program, when executed by a processor, implements the method according to any one of claims 1 to 6.
CN201810044313.5A 2018-01-17 2018-01-17 Video reviewing method, device, computer equipment and storage medium CN108377417B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201810044313.5A CN108377417B (en) 2018-01-17 2018-01-17 Video reviewing method, device, computer equipment and storage medium

Publications (2)

Publication Number Publication Date
CN108377417A CN108377417A (en) 2018-08-07
CN108377417B true CN108377417B (en) 2019-11-26

Family

ID=63015158

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201810044313.5A CN108377417B (en) 2018-01-17 2018-01-17 Video reviewing method, device, computer equipment and storage medium

Country Status (1)

Country Link
CN (1) CN108377417B (en)

Citations (7)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN104618803A (en) * 2014-02-26 2015-05-13 腾讯科技(深圳)有限公司 Information push method, information push device, terminal and server
CN104822099A (en) * 2015-04-30 2015-08-05 努比亚技术有限公司 Video packaging method and mobile terminal
CN106503693A (en) * 2016-11-28 2017-03-15 北京字节跳动科技有限公司 The offer method and device of video front cover
CN106791480A (en) * 2016-11-30 2017-05-31 努比亚技术有限公司 A kind of terminal and video skimming creation method
CN107077595A (en) * 2014-09-08 2017-08-18 谷歌公司 Selection and presentation representative frame are for video preview
WO2017155685A1 (en) * 2016-03-08 2017-09-14 Flipboard, Inc. Auto video preview within a digital magazine
CN107194419A (en) * 2017-05-10 2017-09-22 百度在线网络技术(北京)有限公司 Video classification methods and device, computer equipment and computer-readable recording medium

Family Cites Families (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN107027051B (en) * 2016-07-26 2019-11-08 中国科学院自动化研究所 A kind of video key frame extracting method based on linear dynamic system

Also Published As

Publication number Publication date
CN108377417A (en) 2018-08-07

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant