CN108377417A - Video reviewing method, device, computer equipment and storage medium - Google Patents

Info

Publication number
CN108377417A
Authority
CN
China
Application number
CN201810044313.5A
Other languages
Chinese (zh)
Other versions
CN108377417B (en
Inventor
孙昊
刘霄
文石磊
丁二锐
李旭斌
李甫
Original Assignee
百度在线网络技术(北京)有限公司
Application filed by 百度在线网络技术(北京)有限公司
Priority to CN201810044313.5A
Publication of CN108377417A
Application granted
Publication of CN108377417B

Classifications

    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04N PICTORIAL COMMUNICATION, e.g. TELEVISION
    • H04N21/00 Selective content distribution, e.g. interactive television or video on demand [VOD]
    • H04N21/40 Client devices specifically adapted for the reception of or interaction with content, e.g. set-top-box [STB]; Operations thereof
    • H04N21/43 Processing of content or additional data, e.g. demultiplexing additional data from a digital video stream; Elementary client operations, e.g. monitoring of home network or synchronising decoder's clock; Client middleware
    • H04N21/44 Processing of video elementary streams, e.g. splicing a video clip retrieved from local storage with an incoming video stream, rendering scenes according to MPEG-4 scene graphs
    • H04N21/44008 Processing of video elementary streams, e.g. splicing a video clip retrieved from local storage with an incoming video stream, rendering scenes according to MPEG-4 scene graphs involving operations for analysing video streams, e.g. detecting features or characteristics in the video stream
    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04N PICTORIAL COMMUNICATION, e.g. TELEVISION
    • H04N21/00 Selective content distribution, e.g. interactive television or video on demand [VOD]
    • H04N21/80 Generation or processing of content or additional data by content creator independently of the distribution process; Content per se
    • H04N21/85 Assembly of content; Generation of multimedia applications
    • H04N21/854 Content authoring
    • H04N21/8549 Creating video summaries, e.g. movie trailer

Abstract

The invention discloses a video review method, a device, computer equipment and a storage medium. The method includes: obtaining the cover of a video to be processed; cutting out of the video a video clip centered on the frame where the cover is located; and generating a video content preview from the video clip, so that the video can be reviewed based on the video content preview. The scheme of the present invention can save labor cost and improve review efficiency.

Description

Video reviewing method, device, computer equipment and storage medium

【Technical field】

The present invention relates to computer application technology, and in particular to a video review method, a device, computer equipment and a storage medium.

【Background】

The internet has gradually evolved from the text era to the video and multimedia era. According to public data, the volume of video content newly distributed and produced each day reaches the hundreds of millions, and this large volume of video content needs to be reviewed and controlled.

At present, videos are reviewed mainly by hand, for example by manual frame-by-frame review, which not only consumes a large amount of labor but is also inefficient.

【Summary of the invention】

In view of this, the present invention provides a video review method, a device, computer equipment and a storage medium, which can save labor cost and improve review efficiency.

The specific technical solution is as follows:

A video review method, including:

Obtaining the cover of a video to be processed;

Cutting out of the video a video clip centered on the frame where the cover is located;

Generating a video content preview from the video clip, so that video review is carried out based on the video content preview.

According to a preferred embodiment of the present invention, obtaining the cover of the video to be processed includes:

Obtaining the manually selected cover;

Or, obtaining a predetermined score for each frame image in the video, and selecting one frame image as the cover according to the scores.

According to a preferred embodiment of the present invention, obtaining the predetermined score for each frame image in the video and selecting one frame image as the cover according to the scores includes:

Determining a clarity score and an aesthetics score for each frame image according to a clarity assessment model and an aesthetics assessment model obtained by training in advance;

Computing, for each frame image, the weighted sum of its clarity score and aesthetics score, sorting the results in descending order, and selecting the images ranked in the top N, N being a positive integer greater than one;

Determining, according to a content relevance assessment model obtained by training in advance, a content relevance score between each selected frame image and the video;

Selecting from the selected images the image with the highest content relevance score as the cover, or computing, for each selected frame image, the weighted sum of its clarity score, aesthetics score and content relevance score, sorting the results in descending order, and selecting the first-ranked image as the cover;

Or,

Determining, according to a clarity assessment model, an aesthetics assessment model and a content relevance assessment model obtained by training in advance, the clarity score, the aesthetics score and the content relevance score with the video of each frame image;

Computing, for each frame image, the weighted sum of its clarity score, aesthetics score and content relevance score, sorting the results in descending order, and selecting the first-ranked image as the cover.

According to a preferred embodiment of the present invention, the method further includes:

Determining, according to a classification model obtained by training in advance, the contribution value of each frame image in the video to classifying the video into the category to which it belongs;

Splitting the video into a series of video clips;

Selecting one video clip from the video clips obtained by splitting, according to the contribution values and cover scores of the frame images in each video clip, and generating the video content preview from the selected video clip;

Wherein the cover score of the image selected as the cover is set to a and the cover score of every image not selected as the cover is set to b, a being greater than b and b being greater than or equal to zero.

According to a preferred embodiment of the present invention, splitting the video into a series of video clips includes:

Splitting the video by shot to obtain the video clips corresponding to the different shots.

According to a preferred embodiment of the present invention, selecting one video clip from the video clips obtained by splitting, according to the contribution values and cover scores of the frame images in each video clip, includes:

For each video clip, computing the weighted sum of the contribution value and the cover score of each frame image in it, and taking the mean of the results as the comprehensive score of the video clip;

Selecting the video clip with the highest comprehensive score.

According to a preferred embodiment of the present invention, generating the video content preview from the video clip includes:

Using the video clip directly as the video content preview;

Or, speeding up playback of the video clip by a factor of M to obtain the video content preview, M being greater than one.

A video review device, including: an acquisition unit and a first generation unit;

The acquisition unit is configured to obtain the cover of a video to be processed;

The first generation unit is configured to cut out of the video a video clip centered on the frame where the cover is located, and to generate a video content preview from the video clip, so that video review is carried out based on the video content preview.

According to a preferred embodiment of the present invention, the acquisition unit obtains the manually selected cover;

Or, the acquisition unit obtains a predetermined score for each frame image in the video and selects one frame image as the cover according to the scores.

According to a preferred embodiment of the present invention, the acquisition unit selects the cover as follows:

Determining a clarity score and an aesthetics score for each frame image according to a clarity assessment model and an aesthetics assessment model obtained by training in advance;

Computing, for each frame image, the weighted sum of its clarity score and aesthetics score, sorting the results in descending order, and selecting the images ranked in the top N, N being a positive integer greater than one;

Determining, according to a content relevance assessment model obtained by training in advance, a content relevance score between each selected frame image and the video;

Selecting from the selected images the image with the highest content relevance score as the cover, or computing, for each selected frame image, the weighted sum of its clarity score, aesthetics score and content relevance score, sorting the results in descending order, and selecting the first-ranked image as the cover;

Or,

Determining, according to a clarity assessment model, an aesthetics assessment model and a content relevance assessment model obtained by training in advance, the clarity score, the aesthetics score and the content relevance score with the video of each frame image;

Computing, for each frame image, the weighted sum of its clarity score, aesthetics score and content relevance score, sorting the results in descending order, and selecting the first-ranked image as the cover.

According to a preferred embodiment of the present invention, the device further includes: a second generation unit;

The second generation unit is configured to determine, according to a classification model obtained by training in advance, the contribution value of each frame image in the video to classifying the video into the category to which it belongs; to split the video into a series of video clips; and to select one video clip from the video clips obtained by splitting, according to the contribution values and cover scores of the frame images in each video clip, and generate the video content preview from the selected video clip; wherein the cover score of the image selected as the cover is set to a and the cover score of every image not selected as the cover is set to b, a being greater than b and b being greater than or equal to zero.

According to a preferred embodiment of the present invention, the second generation unit splits the video by shot to obtain the video clips corresponding to the different shots.

According to a preferred embodiment of the present invention, the second generation unit computes, for each video clip, the weighted sum of the contribution value and the cover score of each frame image in it, takes the mean of the results as the comprehensive score of the video clip, and selects the video clip with the highest comprehensive score.

According to a preferred embodiment of the present invention, the first generation unit and the second generation unit use the obtained video clip directly as the video content preview, or speed up playback of the obtained video clip by a factor of M to obtain the video content preview, M being greater than one.

A computer device, including a memory, a processor, and a computer program stored in the memory and executable on the processor, the processor implementing the method described above when executing the program.

A computer-readable storage medium storing a computer program which, when executed by a processor, implements the method described above.

As can be seen from the above, with the scheme of the present invention the cover of the video to be processed is obtained first, a video clip centered on the frame where the cover is located is then cut out of the video, and a video content preview is generated from the video clip so that video review can be carried out based on it. Because the video content preview contains core content of the video such as the video cover, reviewing the preview can achieve an effect similar to reviewing the complete video. At the same time, the preview contains far less content than the complete video, so reviewing the preview significantly reduces review time, saves labor cost, and correspondingly improves review efficiency.

【Description of the drawings】

Fig. 1 is a flow chart of the first embodiment of the video review method of the present invention.

Fig. 2 is a flow chart of the second embodiment of the video review method of the present invention.

Fig. 3 is a schematic structural diagram of an embodiment of the video review device of the present invention.

Fig. 4 shows a block diagram of an exemplary computer system/server 12 suitable for implementing embodiments of the present invention.

【Detailed description】

To make the technical solution of the present invention clearer, the scheme of the present invention is further described below with reference to the drawings and embodiments.

Obviously, the described embodiments are some, not all, of the embodiments of the present invention. All other embodiments obtained by those skilled in the art based on the embodiments of the present invention without creative effort fall within the scope of protection of the present invention.

Fig. 1 is a flow chart of the first embodiment of the video review method of the present invention. As shown in Fig. 1, it includes the following specific implementation.

In 101, the cover of the video to be processed is obtained.

In 102, a video clip centered on the frame where the cover is located is cut out of the video.

In 103, a video content preview is generated from the video clip that was cut out, so that video review is carried out based on the video content preview.

For a video to be processed, its cover can be obtained first.

The cover of the video can be manually selected. For example, it can be selected by the user who uploads the video and uploaded together with the video. The cover usually represents the core content of the video.

Alternatively, a predetermined score can be obtained for each frame image in the video, and one frame image can then be selected as the cover according to the scores.

In general, the selected cover should have high clarity and high aesthetic quality and should be as relevant as possible to the content of the video, that is, it should match the content/theme of the video as closely as possible. Accordingly, the cover can be selected based on clarity, aesthetics and relevance to the video content.

The ways of doing so include, but are not limited to, the following:

1) According to a clarity assessment model and an aesthetics assessment model obtained by training in advance, determine a clarity score and an aesthetics score for each frame image; compute, for each frame image, the weighted sum of its clarity score and aesthetics score, sort the results in descending order, and select the images ranked in the top N, N being a positive integer greater than one; according to a content relevance assessment model obtained by training in advance, determine a content relevance score between each selected frame image and the video; select from the selected images the image with the highest content relevance score as the cover of the video.

The clarity assessment model and the aesthetics assessment model can be network models trained with a deep learning algorithm; how to train them is prior art.

Taking the clarity assessment model as an example, images of different degrees of clarity can be obtained as training samples, the clarity score of each image can be annotated manually as the label of that training sample, and the clarity assessment model can then be trained from the training samples and labels.

After the clarity assessment model and the aesthetics assessment model are obtained, each frame image in the video can be fed to the clarity assessment model and to the aesthetics assessment model, yielding a clarity score and an aesthetics score for each frame image.

Then, for each frame image, the weighted sum of its clarity score and aesthetics score can be computed, the results sorted in descending order, and the images ranked in the top N selected, N being a positive integer greater than one whose specific value can be set according to actual needs.

For each selected frame image, a content relevance score between that frame image and the video can further be determined according to the content relevance assessment model obtained by training in advance.

The content relevance assessment model can likewise be a network model trained with a deep learning algorithm; how to train it is also prior art.

Finally, the image with the highest content relevance score can be selected from the selected images as the cover of the video.
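
Mode 1) can be sketched as a two-stage selection. The following is a minimal illustration only, assuming the per-frame clarity and aesthetics scores have already been produced by the trained models; the function name `select_cover_two_stage`, the callback `relevance_of` (standing in for the content relevance assessment model), the default weights of 1 and the sample scores are all hypothetical and do not come from the patent.

```python
from typing import Callable, Sequence


def select_cover_two_stage(
    clarity: Sequence[float],
    aesthetics: Sequence[float],
    relevance_of: Callable[[int], float],
    top_n: int = 2,
    w_clarity: float = 1.0,
    w_aesthetics: float = 1.0,
) -> int:
    """Return the index of the frame picked as the cover (mode 1)."""
    if top_n < 2:
        raise ValueError("N must be a positive integer greater than one")
    # Stage 1: weighted sum of clarity and aesthetics for every frame.
    combined = [w_clarity * c + w_aesthetics * a for c, a in zip(clarity, aesthetics)]
    # Sort frame indices by combined score, largest first, and keep the top N.
    ranked = sorted(range(len(combined)), key=lambda i: combined[i], reverse=True)
    shortlist = ranked[:top_n]
    # Stage 2: content relevance is computed only for the shortlisted frames.
    return max(shortlist, key=relevance_of)


# Hypothetical scores for a four-frame video.
clarity_scores = [0.9, 0.4, 0.8, 0.7]
aesthetics_scores = [0.2, 0.9, 0.7, 0.8]
relevance_scores = {0: 0.1, 1: 0.3, 2: 0.6, 3: 0.9}

cover_index = select_cover_two_stage(
    clarity_scores, aesthetics_scores, relevance_scores.__getitem__
)
print(cover_index)
```

With these sample values, frames 2 and 3 form the shortlist and frame 3 is chosen because its relevance score is higher. The relevance callback is invoked only for shortlisted frames, which reflects the preliminary screening that this mode describes.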

2) According to a clarity assessment model and an aesthetics assessment model obtained by training in advance, determine a clarity score and an aesthetics score for each frame image; compute, for each frame image, the weighted sum of its clarity score and aesthetics score, sort the results in descending order, and select the images ranked in the top N, N being a positive integer greater than one; according to a content relevance assessment model obtained by training in advance, determine a content relevance score between each selected frame image and the video; compute, for each selected frame image, the weighted sum of its clarity score, aesthetics score and content relevance score, sort the results in descending order, and select the first-ranked image as the cover of the video.

Compared with mode 1), in this mode, after the content relevance scores of the selected frame images are obtained, the clarity score, aesthetics score and content relevance score of each selected frame image are combined to select the image used as the cover.

3) According to a clarity assessment model, an aesthetics assessment model and a content relevance assessment model obtained by training in advance, determine the clarity score, the aesthetics score and the content relevance score with the video of each frame image; compute, for each frame image, the weighted sum of its clarity score, aesthetics score and content relevance score, sort the results in descending order, and select the first-ranked image as the cover of the video.

In mode 1) and mode 2), the frame images in the video are first screened by clarity and aesthetics, and the content relevance score is then computed only for the images that pass the screening. In this mode, the clarity score, aesthetics score and content relevance score with the video are obtained for every frame image in the video, and the three scores of each frame image are combined directly to select the image used as the cover.
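
Mode 3) reduces to a single weighted sum over three scores per frame. The sketch below is an illustration under assumptions: the function name `select_cover_single_stage`, the tuple layout (clarity, aesthetics, relevance), the weights and the sample scores are hypothetical, and the three scores are assumed to have been computed for every frame beforehand.

```python
from typing import Sequence, Tuple

Scores = Tuple[float, float, float]  # (clarity, aesthetics, content relevance)


def select_cover_single_stage(
    scores: Sequence[Scores],
    weights: Scores = (1.0, 1.0, 1.0),
) -> int:
    """Return the index of the frame with the largest weighted sum (mode 3)."""
    totals = [sum(w * s for w, s in zip(weights, frame)) for frame in scores]
    # Taking the maximum is equivalent to sorting in descending order
    # and picking the first-ranked frame.
    return max(range(len(totals)), key=totals.__getitem__)


# Hypothetical scores for a four-frame video.
frame_scores = [
    (0.9, 0.2, 0.1),
    (0.4, 0.9, 0.3),
    (0.8, 0.7, 0.6),
    (0.7, 0.8, 0.9),
]

equal_weight_cover = select_cover_single_stage(frame_scores)
clarity_heavy_cover = select_cover_single_stage(frame_scores, weights=(3.0, 1.0, 0.0))
print(equal_weight_cover, clarity_heavy_cover)
```

Unlike modes 1) and 2), this variant needs a content relevance score for every frame, so it trades extra model evaluations for a selection that never discards a frame before relevance is considered. The second call shows that the choice of weights can change which frame is picked.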

In practical applications, any of the above modes can be used according to actual needs.

Whichever mode is used, once the cover of the video has been obtained, a video clip centered on the frame where the cover is located can be cut out of the video, and a video content preview can then be generated from that clip so that video review is carried out based on the video content preview.

The video clip that was cut out can be used directly as the video content preview, or playback of the video clip can be sped up by a factor of M to obtain the video content preview, M being greater than one.

For example, suppose the video contains 100 frame images in total and the cover is the 20th frame. A 6-second video clip centered on the frame where the cover is located can be cut out, and the clip that was cut out is used as the required video content preview.

As another example, suppose the video contains 100 frame images in total and the cover is the 20th frame. A 9-second video clip centered on the frame where the cover is located can be cut out and played at 1.5 times speed to obtain the required 6-second video content preview, which allows more video content to be shown.
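
The arithmetic behind the two examples can be sketched as follows: a preview of a given duration played at M times speed needs a source clip M times as long, centered on the cover frame. This is an illustration only. The function name `centered_clip`, the frame rate, the frame counts and the rule of shifting the window when the cover lies near the start or end of the video are assumptions, since the text does not say how that case is handled.

```python
from typing import Tuple


def centered_clip(
    cover_frame: int,
    total_frames: int,
    fps: float,
    preview_seconds: float,
    speedup: float = 1.0,
) -> Tuple[int, int]:
    """Return (start, end) frame indices, end exclusive, of the source clip."""
    if speedup < 1.0:
        raise ValueError("speedup must be at least 1 (1 means no acceleration)")
    # A preview lasting preview_seconds at `speedup` times speed covers
    # preview_seconds * speedup seconds of source video.
    clip_len = min(round(preview_seconds * speedup * fps), total_frames)
    start = cover_frame - clip_len // 2
    # Assumption: shift the window so that it stays inside the video.
    start = max(0, min(start, total_frames - clip_len))
    return start, start + clip_len


# Hypothetical video: 3000 frames at 25 frames per second.
sped_up = centered_clip(cover_frame=500, total_frames=3000, fps=25.0,
                        preview_seconds=6.0, speedup=1.5)
near_start = centered_clip(cover_frame=20, total_frames=3000, fps=25.0,
                           preview_seconds=6.0)
print(sped_up, near_start)
```

The first call selects 225 source frames (9 seconds) around frame 500, matching the 9-second clip played at 1.5 times speed in the second example above. The second call shows the window being clamped to the start of the video when the cover frame is too early for a symmetric clip.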

Subsequently, video review can be carried out based on the video content preview, for example checking whether the video contains pornographic content. Because the video content preview contains core content of the video such as the video cover, reviewing the preview can achieve an effect similar to reviewing the complete video. At the same time, the preview contains far less content than the complete video, so reviewing the preview significantly reduces review time, saves labor cost, and correspondingly improves review efficiency.

On this basis, to further improve the accuracy of the video content preview, that is, to make the preview better reflect the core content of the video, the following processing can also be carried out.

According to a classification model obtained by training in advance, determine the contribution value of each frame image in the video to classifying the video into the category to which it belongs; split the video into a series of video clips; select one video clip from the video clips obtained by splitting, according to the contribution values and cover scores of the frame images in each video clip, and generate the video content preview from the selected video clip.

The cover score of the image selected as the cover can be set to a and the cover score of every image not selected as the cover set to b, a being greater than b and b being greater than or equal to zero.

The classification model can be a network model trained with a deep learning algorithm; how to train it is prior art.

Using the classification model, the classification result of the video can be obtained, for example that the video belongs to category A. The scheme of the present invention is mainly concerned with the output of an intermediate layer of the classification model, that is, an intermediate result: the contribution value of each frame image in the video to classifying the video into the category to which it belongs (such as category A).

For a given video, suppose its category is "football match". Different images in the video help to different degrees in determining that the video is a "football match". For example, images of the pitch and the players help the classification more, while images of the spectators help less. In other words, different images have different contribution values to classifying the video into the category to which it belongs.

In the scheme of the present invention, the contribution value of each frame image in the video to classifying the video into the category to which it belongs can be determined, and the weighted sum of the contribution value and the cover score of each frame image can be computed.

For example, suppose the video contains 100 frame images in total and the 20th frame image is selected as the cover. The cover score of the 20th frame image can be set to 1 and the cover score of every other frame image set to 0.

In addition, the video can be split by shot to obtain the video clips corresponding to the different shots, that is, the video is split according to scene changes. The durations of the resulting video clips may be the same or different, and are usually 7 to 11 seconds.
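
The text does not specify how shot boundaries are detected. One common approach, shown here purely as an assumed stand-in and not as the method of the patent, is to start a new shot wherever the difference between consecutive frame histograms exceeds a threshold. The function name `split_into_shots`, the two-bin histograms and the threshold value are hypothetical.

```python
from typing import List, Sequence, Tuple


def split_into_shots(
    histograms: Sequence[Sequence[float]],
    threshold: float,
) -> List[Tuple[int, int]]:
    """Split frames into shots; each shot is (start, end) with end exclusive."""
    if not histograms:
        return []
    boundaries = [0]
    for i in range(1, len(histograms)):
        # L1 distance between the histograms of consecutive frames.
        diff = sum(abs(x - y) for x, y in zip(histograms[i - 1], histograms[i]))
        if diff > threshold:
            boundaries.append(i)  # a large jump is treated as a scene change
    boundaries.append(len(histograms))
    return list(zip(boundaries[:-1], boundaries[1:]))


# Hypothetical normalized two-bin histograms for five frames.
toy_histograms = [[1.0, 0.0], [1.0, 0.0], [0.0, 1.0], [0.0, 1.0], [0.0, 1.0]]
shots = split_into_shots(toy_histograms, threshold=1.0)
print(shots)
```

On the toy input the histogram changes abruptly between the second and third frames, so the frames are split into two shots. A real system would compute the histograms from decoded frames and tune the threshold, both of which are outside the scope of this sketch.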

For each video clip, after the weighted sum of the contribution value and the cover score has been computed for each frame image in it, the mean of the results can be computed and used as the comprehensive score of the video clip.

For example, suppose a video clip contains 10 frame images in total. The weighted sum of the contribution value and the cover score is computed for each frame image, giving 10 results; the mean of these 10 results is then computed and used as the comprehensive score of the video clip.
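
The comprehensive score and the selection of the best clip can be sketched as below. This is a minimal illustration under assumptions: the function name `best_clip`, the representation of clips as (start, end) frame ranges, the sample contribution values and the default weights of 1 are hypothetical. The defaults a = 1 and b = 0 follow the example given in the description.

```python
from statistics import mean
from typing import Sequence, Tuple

Clip = Tuple[int, int]  # (start, end) frame indices, end exclusive


def best_clip(
    contributions: Sequence[float],
    cover_index: int,
    clips: Sequence[Clip],
    a: float = 1.0,
    b: float = 0.0,
    w_contribution: float = 1.0,
    w_cover: float = 1.0,
) -> Clip:
    """Return the clip with the highest comprehensive score."""
    if not a > b >= 0:
        raise ValueError("cover scores must satisfy a > b >= 0")

    def comprehensive_score(clip: Clip) -> float:
        start, end = clip
        # Per frame: weighted sum of contribution value and cover score,
        # then the mean over all frames in the clip.
        return mean(
            w_contribution * contributions[i]
            + w_cover * (a if i == cover_index else b)
            for i in range(start, end)
        )

    return max(clips, key=comprehensive_score)


# Hypothetical nine-frame video split into three shots; frame 1 is the cover.
frame_contributions = [0.1, 0.2, 0.3, 0.6, 0.6, 0.6, 0.5, 0.5, 0.5]
shot_clips = [(0, 3), (3, 6), (6, 9)]

selected = best_clip(frame_contributions, cover_index=1, clips=shot_clips)
selected_cover_heavy = best_clip(frame_contributions, cover_index=1,
                                 clips=shot_clips, w_cover=2.0)
print(selected, selected_cover_heavy)
```

With equal weights the second shot wins even though it does not contain the cover, because its frames contribute more to the classification. Doubling the cover weight shifts the choice to the shot that contains the cover, which shows how the weights balance the two signals.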

After the comprehensive score of each video clip has been obtained, the video clip with the highest comprehensive score can be selected, and the video content preview generated from the selected video clip.

For example, the selected video clip can be used directly as the video content preview, or playback of the video clip can be sped up by a factor of M to obtain the video content preview, M being greater than one.

Based on the above, Fig. 2 is a flow chart of the second embodiment of the video review method of the present invention. As shown in Fig. 2, it includes the following specific implementation.

In 201, the cover of the video to be processed is obtained.

The manually selected cover can be obtained, or a predetermined score can be obtained for each frame image in the video and one frame image selected as the cover according to the scores.

For example, the clarity score and aesthetics score of each frame image in the video can first be determined according to the clarity assessment model and aesthetics assessment model obtained by training in advance. The weighted sum of the clarity score and aesthetics score of each frame image can then be computed, the results sorted in descending order, and the images ranked in the top N selected, N being a positive integer greater than one. Next, the content relevance score between each selected frame image and the video can be determined according to the content relevance assessment model obtained by training in advance. Finally, the weighted sum of the clarity score, aesthetics score and content relevance score of each selected frame image can be computed, the results sorted in descending order, and the first-ranked image selected as the cover of the video.

In 202, the contribution value of each frame image in the video to classifying the video into the category to which it belongs is determined.

The contribution value of each frame image in the video to classifying the video into the category to which it belongs can be determined according to the classification model obtained by training in advance.

In 203, the video is split into a series of video clips.

For example, the video can be split by shot to obtain the video clips corresponding to the different shots.

In 204, for each video clip, the weighted sum of the contribution value and the cover score of each frame image in it is computed, and the mean of the results is taken as the comprehensive score of the video clip; the cover score of the image selected as the cover is set to a and the cover score of every image not selected as the cover is set to b, a being greater than b and b being greater than or equal to zero.

In 205, the video clip with the highest comprehensive score is selected.

In 206, the selected video clip is used as the required video content preview, so that video review is carried out based on the video content preview.

In each of the above method embodiments, the specific values of the weights used in a weighted sum can be set according to actual needs. In the simplest case they can all be set to 1. Taking the weighted sum of the contribution value and the cover score of each frame image as an example, this means the contribution value and the cover score of each frame image are simply added together.

It should be noted that, for simplicity of description, each of the foregoing method embodiments is expressed as a series of combined actions. Those skilled in the art should understand, however, that the present invention is not limited by the described order of actions, because according to the present invention some steps may be performed in another order or simultaneously. Those skilled in the art should also understand that the embodiments described in this specification are preferred embodiments, and that the actions and modules involved are not necessarily required by the present invention.

In the above embodiments, the description of each embodiment has its own emphasis. For parts not described in detail in one embodiment, refer to the related description of the other embodiments.

The above is the description of the method embodiments. The scheme of the present invention is further explained below through device embodiments.

Fig. 3 is a schematic structural diagram of an embodiment of the video review device of the present invention. As shown in Fig. 3, the device includes an acquiring unit 301 and a first generation unit 302.

The acquiring unit 301 is configured to obtain the cover of a video to be processed.

The first generation unit 302 is configured to cut out of the video a video clip centered on the frame where the cover is located, and to generate a video content preview from the video clip, so that video review can be performed based on the video content preview.

The acquiring unit 301 may obtain a manually selected cover, or it may obtain a predetermined score for each frame image in the video and select one frame image as the cover according to the scores.

In general, the selected cover should have high clarity and high aesthetic quality, and should be as relevant as possible to the content of the video, that is, match the content/theme of the video as closely as possible. Accordingly, the acquiring unit 301 may select the cover based on clarity, aesthetic quality, and relevance to the video content.

For example, the acquiring unit 301 may, according to a clarity evaluation model and an aesthetics evaluation model obtained by training in advance, determine a clarity score and an aesthetics score for each frame image; combine the clarity score and the aesthetics score of each frame image by weighted addition; sort the results in descending order; and select the images ranked in the top N, where N is a positive integer greater than one. It may then, according to a content relevance evaluation model obtained by training in advance, determine for each selected frame image a score for its content relevance to the video, and either select the image with the highest content relevance score as the cover, or combine the clarity score, aesthetics score, and content relevance score of each selected frame image by weighted addition, sort the results in descending order, and select the first-ranked image as the cover.

Alternatively, the acquiring unit 301 may, according to a clarity evaluation model, an aesthetics evaluation model, and a content relevance evaluation model obtained by training in advance, determine for each frame image a clarity score, an aesthetics score, and a score for its content relevance to the video; combine the three scores of each frame image by weighted addition; sort the results in descending order; and select the first-ranked image as the cover.
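The two selection strategies described for the acquiring unit 301 can be sketched as below. The sketch assumes that the per-frame clarity, aesthetics, and content relevance scores have already been produced by the pre-trained models, which are not shown; all function and parameter names are hypothetical, and in the two-stage variant a real implementation would likely compute relevance only for the top-N candidates.

```python
def select_cover_two_stage(clarity, aesthetics, relevance, n=3,
                           w_clarity=1.0, w_aesthetics=1.0):
    """Stage 1: keep the top-N frames by weighted clarity + aesthetics.
    Stage 2: among those, return the index with the highest relevance."""
    quality = [
        w_clarity * c + w_aesthetics * s for c, s in zip(clarity, aesthetics)
    ]
    # Frame indices sorted by quality, largest first, truncated to N.
    top_n = sorted(range(len(quality)), key=quality.__getitem__,
                   reverse=True)[:n]
    return max(top_n, key=relevance.__getitem__)


def select_cover_single_stage(clarity, aesthetics, relevance,
                              weights=(1.0, 1.0, 1.0)):
    """Rank every frame by the weighted sum of all three scores and
    return the index of the first-ranked frame."""
    wc, wa, wr = weights
    total = [
        wc * c + wa * s + wr * r
        for c, s, r in zip(clarity, aesthetics, relevance)
    ]
    return max(range(len(total)), key=total.__getitem__)


# Hypothetical scores for four frames.
clarity = [0.9, 0.5, 0.8, 0.2]
aesthetics = [0.8, 0.9, 0.7, 0.1]
relevance = [0.3, 0.6, 0.9, 1.0]
cover_a = select_cover_two_stage(clarity, aesthetics, relevance, n=2)
cover_b = select_cover_single_stage(clarity, aesthetics, relevance)
```

In this example frame 3 has the highest relevance but is filtered out in stage 1 because it is blurry and unattractive, which is the point of ranking by clarity and aesthetics first.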

After the cover of the video is obtained, the first generation unit 302 may cut out of the video a video clip centered on the frame where the cover is located, and may then generate a video content preview from that clip, so that video review can be performed based on the video content preview.
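One possible way to compute the frame range of a clip centered on the cover frame is sketched below. The clip length parameter and the choice to shift the window back inside the video when the cover lies near the start or end are assumptions; this passage does not specify how such boundary cases are handled.

```python
def centered_clip_range(cover_index, total_frames, clip_len):
    """Return (start, end) frame indices, end exclusive, for a clip of up to
    clip_len frames centered on cover_index and kept inside the video."""
    if not 0 <= cover_index < total_frames:
        raise ValueError("cover_index is outside the video")
    if clip_len <= 0:
        raise ValueError("clip_len must be positive")
    clip_len = min(clip_len, total_frames)
    start = cover_index - clip_len // 2
    # Shift the window so that it does not run past either end of the video.
    start = max(0, min(start, total_frames - clip_len))
    return start, start + clip_len


# Hypothetical 100-frame video with the cover at frame 50 and a 21-frame clip.
start, end = centered_clip_range(50, 100, 21)
```

Near the edges the clip is no longer exactly centered on the cover, but it keeps its full length and still contains the cover frame.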

For example, the cut-out video clip may be used directly as the video content preview, or the video clip may be played back at M times speed to obtain the video content preview, where M is greater than one.
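One simple way to realize M-times accelerated playback is to keep roughly every M-th frame and play the result at the original frame rate, as in the hedged sketch below. This subsampling approach is an assumption made for illustration; the text does not state how the acceleration is implemented, and raising the playback frame rate by a factor of M would be an equally valid reading.

```python
import math


def accelerate(frames, m):
    """Subsample frames so that the clip plays m times faster at an
    unchanged frame rate. m may be any number greater than one."""
    if m <= 1:
        raise ValueError("M must be greater than one")
    out_len = math.ceil(len(frames) / m)
    last = len(frames) - 1
    # Output frame i is taken from source position i * m, rounded down.
    return [frames[min(int(i * m), last)] for i in range(out_len)]


# Hypothetical clip of ten frames, represented here by their indices.
preview = accelerate(list(range(10)), 2)
```

Dropping frames shortens the preview without re-encoding at a higher frame rate; the trade-off is that motion looks less smooth as M grows.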

Subsequently, video review can be performed based on the video content preview, for example to check whether the video contains pornographic content. Because the video content preview contains core content of the video such as the video cover, reviewing the preview can achieve an effect similar to reviewing the complete video. At the same time, the preview contains far less content than the complete video, so reviewing the preview can significantly reduce review time, save labor costs, and correspondingly improve review efficiency.

On this basis, to further improve the accuracy of the obtained video content preview, that is, to make the preview better reflect the core content of the video, the device shown in Fig. 3 may further include a second generation unit 303.

The second generation unit 303 may, according to a classification model obtained by training in advance, determine for each frame image in the video its contribution value toward classifying the video into the category to which the video belongs, and cut the video into a series of video clips. According to the contribution value and the cover score of each frame image in each video clip, it selects one video clip from the clips obtained by cutting and generates the video content preview from the selected clip. Here, the cover score of the image selected as the cover is set to a, and the cover score of every image not selected as the cover is set to b, where a is greater than b and b is greater than or equal to zero.

Preferably, the second generation unit 303 may cut the video by shot to obtain the video clips corresponding to the different shots.

For each video clip, the second generation unit 303 may combine the contribution value and the cover score of each frame image therein by weighted addition, take the mean of the per-frame results as the comprehensive score of the video clip, and select the video clip with the highest comprehensive score.

Thereafter, the second generation unit 303 may use the obtained video clip directly as the video content preview, or may play the obtained video clip back at M times speed to obtain the video content preview, where M is greater than one.

For the specific workflow of the device embodiment shown in Fig. 3, refer to the corresponding description in the foregoing method embodiments; it is not repeated here.

Fig. 4 shows a block diagram of an exemplary computer system/server 12 suitable for implementing embodiments of the present invention. The computer system/server 12 shown in Fig. 4 is only an example and should not impose any limitation on the functions or scope of use of the embodiments of the present invention.

As shown in Fig. 4, the computer system/server 12 takes the form of a general-purpose computing device. The components of the computer system/server 12 may include, but are not limited to, one or more processors (processing units) 16, a memory 28, and a bus 18 connecting the different system components (including the memory 28 and the processor 16).

The bus 18 represents one or more of several types of bus structures, including a memory bus or memory controller, a peripheral bus, an accelerated graphics port, and a processor or local bus using any of a variety of bus architectures. By way of example, these architectures include, but are not limited to, the Industry Standard Architecture (ISA) bus, the Micro Channel Architecture (MCA) bus, the Enhanced ISA (EISA) bus, the Video Electronics Standards Association (VESA) local bus, and the Peripheral Component Interconnect (PCI) bus.

The computer system/server 12 typically includes a variety of computer-system-readable media. These media may be any available media that can be accessed by the computer system/server 12, including volatile and non-volatile media, and removable and non-removable media.

The memory 28 may include computer-system-readable media in the form of volatile memory, such as random access memory (RAM) 30 and/or cache memory 32. The computer system/server 12 may further include other removable/non-removable, volatile/non-volatile computer system storage media. By way of example only, the storage system 34 may be used to read from and write to a non-removable, non-volatile magnetic medium (not shown in Fig. 4, commonly called a "hard disk drive"). Although not shown in Fig. 4, a magnetic disk drive for reading from and writing to a removable non-volatile magnetic disk (such as a "floppy disk") and an optical disc drive for reading from and writing to a removable non-volatile optical disc (such as a CD-ROM, DVD-ROM, or other optical media) may be provided. In these cases, each drive may be connected to the bus 18 through one or more data media interfaces. The memory 28 may include at least one program product having a set of (for example, at least one) program modules configured to perform the functions of the embodiments of the present invention.

A program/utility 40 having a set of (at least one) program modules 42 may be stored, for example, in the memory 28. Such program modules 42 include, but are not limited to, an operating system, one or more application programs, other program modules, and program data; each of these examples, or some combination of them, may include an implementation of a network environment. The program modules 42 generally perform the functions and/or methods of the embodiments described in the present invention.

The computer system/server 12 may also communicate with one or more external devices 14 (such as a keyboard, a pointing device, a display 24, and so on), with one or more devices that enable a user to interact with the computer system/server 12, and/or with any device (such as a network card or modem) that enables the computer system/server 12 to communicate with one or more other computing devices. Such communication may take place through an input/output (I/O) interface 22. The computer system/server 12 may also communicate with one or more networks (such as a local area network (LAN), a wide area network (WAN), and/or a public network such as the Internet) through a network adapter 20. As shown in Fig. 4, the network adapter 20 communicates with the other modules of the computer system/server 12 through the bus 18. It should be understood that, although not shown in the figure, other hardware and/or software modules may be used in conjunction with the computer system/server 12, including but not limited to microcode, device drivers, redundant processing units, external disk drive arrays, RAID systems, tape drives, and data backup storage systems.

The processor 16 runs the programs stored in the memory 28 to perform various functional applications and data processing, for example to implement the method in the embodiment shown in Fig. 1 or 2.

The present invention also discloses a computer-readable storage medium on which a computer program is stored; when executed by a processor, the program implements the method in the embodiment shown in Fig. 1 or 2.

Any combination of one or more computer-readable media may be used. A computer-readable medium may be a computer-readable signal medium or a computer-readable storage medium. A computer-readable storage medium may be, for example, but is not limited to, an electrical, magnetic, optical, electromagnetic, infrared, or semiconductor system, apparatus, or device, or any combination of the above. More specific examples (a non-exhaustive list) of computer-readable storage media include: an electrical connection having one or more wires, a portable computer diskette, a hard disk, random access memory (RAM), read-only memory (ROM), erasable programmable read-only memory (EPROM or flash memory), an optical fiber, a portable compact disc read-only memory (CD-ROM), an optical storage device, a magnetic storage device, or any suitable combination of the above. In this document, a computer-readable storage medium may be any tangible medium that contains or stores a program that can be used by or in connection with an instruction execution system, apparatus, or device.

A computer-readable signal medium may include a data signal propagated in baseband or as part of a carrier wave, carrying computer-readable program code. Such a propagated data signal may take many forms, including but not limited to an electromagnetic signal, an optical signal, or any suitable combination of the above. A computer-readable signal medium may also be any computer-readable medium other than a computer-readable storage medium that can send, propagate, or transmit a program for use by or in connection with an instruction execution system, apparatus, or device.

Program code contained on a computer-readable medium may be transmitted over any suitable medium, including but not limited to wireless, wire, optical cable, RF, and so on, or any suitable combination of the above.

Computer program code for carrying out the operations of the present invention may be written in one or more programming languages or a combination thereof, including object-oriented programming languages such as Java, Smalltalk, and C++, as well as conventional procedural programming languages such as the "C" language or similar programming languages. The program code may execute entirely on the user's computer, partly on the user's computer, as a stand-alone software package, partly on the user's computer and partly on a remote computer, or entirely on a remote computer or server. Where a remote computer is involved, the remote computer may be connected to the user's computer through any kind of network, including a local area network (LAN) or a wide area network (WAN), or may be connected to an external computer (for example, through the Internet using an Internet service provider).

In the several embodiments provided by the present invention, it should be understood that the disclosed devices, methods, and so on may be implemented in other ways. For example, the device embodiments described above are merely illustrative; for instance, the division into units is only a division by logical function, and other divisions are possible in actual implementation.

The units described as separate components may or may not be physically separate, and the components shown as units may or may not be physical units; that is, they may be located in one place or distributed over multiple network elements. Some or all of the units may be selected according to actual needs to achieve the purpose of the solution of this embodiment.

In addition, the functional units in the embodiments of the present invention may be integrated into one processing unit, each unit may exist physically on its own, or two or more units may be integrated into one unit. The integrated unit may be implemented in the form of hardware, or in the form of hardware plus software functional units.

An integrated unit implemented in the form of a software functional unit may be stored in a computer-readable storage medium. The software functional unit is stored in a storage medium and includes instructions that cause a computer device (which may be a personal computer, a server, a network device, or the like) or a processor to execute some of the steps of the methods of the embodiments of the present invention. The aforementioned storage medium includes various media capable of storing program code, such as a USB flash drive, a removable hard disk, read-only memory (ROM), random access memory (RAM), a magnetic disk, or an optical disc.

The foregoing is merely illustrative of the preferred embodiments of the present invention and is not intended to limit the invention. Any modification, equivalent substitution, improvement, and the like made within the spirit and principles of the present invention shall fall within the scope of protection of the present invention.

Claims (16)

1. A video review method, characterized by comprising:
obtaining the cover of a video to be processed;
cutting out of the video a video clip centered on the frame where the cover is located;
generating a video content preview from the video clip, so that video review is performed based on the video content preview.
2. The method according to claim 1, characterized in that
obtaining the cover of the video to be processed comprises:
obtaining the manually selected cover;
or obtaining a predetermined score for each frame image in the video and selecting one frame image as the cover according to the scores.
3. The method according to claim 2, characterized in that
obtaining a predetermined score for each frame image in the video and selecting one frame image as the cover according to the scores comprises:
determining a clarity score and an aesthetics score for each frame image according to a clarity evaluation model and an aesthetics evaluation model obtained by training in advance;
combining the clarity score and the aesthetics score of each frame image by weighted addition, sorting the results in descending order, and selecting the images ranked in the top N, N being a positive integer greater than one;
determining, according to a content relevance evaluation model obtained by training in advance, a score for the content relevance of each selected frame image to the video;
selecting from the selected images the image with the highest content relevance score as the cover, or combining the clarity score, aesthetics score, and content relevance score of each selected frame image by weighted addition, sorting the results in descending order, and selecting the first-ranked image as the cover;
or,
determining, according to a clarity evaluation model, an aesthetics evaluation model, and a content relevance evaluation model obtained by training in advance, a clarity score, an aesthetics score, and a score for content relevance to the video for each frame image;
combining the clarity score, aesthetics score, and content relevance score of each frame image by weighted addition, sorting the results in descending order, and selecting the first-ranked image as the cover.
4. The method according to claim 1, characterized in that
the method further comprises:
determining, according to a classification model obtained by training in advance, the contribution value of each frame image in the video toward classifying the video into the category to which the video belongs;
cutting the video into a series of video clips;
selecting, according to the contribution value and the cover score of each frame image in each video clip, one video clip from the video clips obtained by cutting, and generating the video content preview from the selected video clip;
wherein the cover score of the image selected as the cover is set to a, the cover score of every image not selected as the cover is set to b, a is greater than b, and b is greater than or equal to zero.
5. The method according to claim 4, characterized in that
cutting the video into a series of video clips comprises:
cutting the video by shot to obtain the video clips corresponding to the different shots.
6. The method according to claim 4, characterized in that
selecting, according to the contribution value and the cover score of each frame image in each video clip, one video clip from the video clips obtained by cutting comprises:
for each video clip, combining the contribution value and the cover score of each frame image therein by weighted addition, and taking the mean of the per-frame results as the comprehensive score of the video clip;
selecting the video clip with the highest comprehensive score.
7. The method according to claim 4, characterized in that
generating the video content preview from the video clip comprises:
using the video clip directly as the video content preview;
or playing the video clip back at M times speed to obtain the video content preview, M being greater than one.
8. A video review device, characterized by comprising an acquiring unit and a first generation unit;
the acquiring unit being configured to obtain the cover of a video to be processed;
the first generation unit being configured to cut out of the video a video clip centered on the frame where the cover is located, and to generate a video content preview from the video clip, so that video review is performed based on the video content preview.
9. The device according to claim 8, characterized in that
the acquiring unit obtains the manually selected cover;
or the acquiring unit obtains a predetermined score for each frame image in the video and selects one frame image as the cover according to the scores.
10. The device according to claim 9, characterized in that
the acquiring unit selects the cover in the following way:
determining a clarity score and an aesthetics score for each frame image according to a clarity evaluation model and an aesthetics evaluation model obtained by training in advance;
combining the clarity score and the aesthetics score of each frame image by weighted addition, sorting the results in descending order, and selecting the images ranked in the top N, N being a positive integer greater than one;
determining, according to a content relevance evaluation model obtained by training in advance, a score for the content relevance of each selected frame image to the video;
selecting from the selected images the image with the highest content relevance score as the cover, or combining the clarity score, aesthetics score, and content relevance score of each selected frame image by weighted addition, sorting the results in descending order, and selecting the first-ranked image as the cover;
or,
determining, according to a clarity evaluation model, an aesthetics evaluation model, and a content relevance evaluation model obtained by training in advance, a clarity score, an aesthetics score, and a score for content relevance to the video for each frame image;
combining the clarity score, aesthetics score, and content relevance score of each frame image by weighted addition, sorting the results in descending order, and selecting the first-ranked image as the cover.
11. The device according to claim 8, characterized in that
the device further comprises a second generation unit;
the second generation unit being configured to: determine, according to a classification model obtained by training in advance, the contribution value of each frame image in the video toward classifying the video into the category to which the video belongs; cut the video into a series of video clips; select, according to the contribution value and the cover score of each frame image in each video clip, one video clip from the video clips obtained by cutting; and generate the video content preview from the selected video clip; wherein the cover score of the image selected as the cover is set to a, the cover score of every image not selected as the cover is set to b, a is greater than b, and b is greater than or equal to zero.
12. The device according to claim 11, characterized in that
the second generation unit cuts the video by shot to obtain the video clips corresponding to the different shots.
13. The device according to claim 11, characterized in that
the second generation unit, for each video clip, combines the contribution value and the cover score of each frame image therein by weighted addition, takes the mean of the per-frame results as the comprehensive score of the video clip, and selects the video clip with the highest comprehensive score.
14. The device according to claim 11, characterized in that
the first generation unit and the second generation unit use the obtained video clip directly as the video content preview, or play the obtained video clip back at M times speed to obtain the video content preview, M being greater than one.
15. A computer device comprising a memory, a processor, and a computer program stored on the memory and executable on the processor, characterized in that the processor, when executing the program, implements the method according to any one of claims 1 to 7.
16. A computer-readable storage medium on which a computer program is stored, characterized in that the program, when executed by a processor, implements the method according to any one of claims 1 to 7.
CN201810044313.5A 2018-01-17 2018-01-17 Video reviewing method, device, computer equipment and storage medium CN108377417B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201810044313.5A CN108377417B (en) 2018-01-17 2018-01-17 Video reviewing method, device, computer equipment and storage medium

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201810044313.5A CN108377417B (en) 2018-01-17 2018-01-17 Video reviewing method, device, computer equipment and storage medium

Publications (2)

Publication Number Publication Date
CN108377417A true CN108377417A (en) 2018-08-07
CN108377417B CN108377417B (en) 2019-11-26

Family

ID=63015158

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201810044313.5A CN108377417B (en) 2018-01-17 2018-01-17 Video reviewing method, device, computer equipment and storage medium

Country Status (1)

Country Link
CN (1) CN108377417B (en)

Citations (8)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN104618803A (en) * 2014-02-26 2015-05-13 腾讯科技(深圳)有限公司 Information push method, information push device, terminal and server
CN104822099A (en) * 2015-04-30 2015-08-05 努比亚技术有限公司 Video packaging method and mobile terminal
CN106503693A (en) * 2016-11-28 2017-03-15 北京字节跳动科技有限公司 The offer method and device of video front cover
CN106791480A (en) * 2016-11-30 2017-05-31 努比亚技术有限公司 A kind of terminal and video skimming creation method
CN107027051A (en) * 2016-07-26 2017-08-08 中国科学院自动化研究所 A kind of video key frame extracting method based on linear dynamic system
CN107077595A (en) * 2014-09-08 2017-08-18 谷歌公司 Selection and presentation representative frame are for video preview
WO2017155685A1 (en) * 2016-03-08 2017-09-14 Flipboard, Inc. Auto video preview within a digital magazine
CN107194419A (en) * 2017-05-10 2017-09-22 百度在线网络技术(北京)有限公司 Video classification methods and device, computer equipment and computer-readable recording medium

Also Published As

Publication number Publication date
CN108377417B (en) 2019-11-26

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant