CN103299319A - Method and device for analysing video file - Google Patents

Method and device for analysing video file

Info

Publication number
CN103299319A
Authority
CN
China
Prior art keywords
fragment
video file
clip
point
silence
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Granted
Application number
CN2011800032609A
Other languages
Chinese (zh)
Other versions
CN103299319B (en)
Inventor
杨杰
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Huawei Cloud Computing Technologies Co Ltd
Original Assignee
Huawei Technologies Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Huawei Technologies Co Ltd filed Critical Huawei Technologies Co Ltd
Publication of CN103299319A publication Critical patent/CN103299319A/en
Application granted granted Critical
Publication of CN103299319B publication Critical patent/CN103299319B/en
Active legal-status Critical Current
Anticipated expiration legal-status Critical

Classifications

    • G PHYSICS
    • G11 INFORMATION STORAGE
    • G11B INFORMATION STORAGE BASED ON RELATIVE MOVEMENT BETWEEN RECORD CARRIER AND TRANSDUCER
    • G11B27/00 Editing; Indexing; Addressing; Timing or synchronising; Monitoring; Measuring tape travel
    • G11B27/10 Indexing; Addressing; Timing or synchronising; Measuring tape travel
    • G11B27/19 Indexing; Addressing; Timing or synchronising; Measuring tape travel by using information detectable on the record carrier
    • G11B27/28 Indexing; Addressing; Timing or synchronising; Measuring tape travel by using information detectable on the record carrier by using information signals recorded by the same method as the main recording
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V20/00 Scenes; Scene-specific elements
    • G06V20/40 Scenes; Scene-specific elements in video content
    • G06V20/49 Segmenting video sequences, i.e. computational techniques such as parsing or cutting the sequence, low-level clustering or determining units such as shots or scenes

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Computing Systems (AREA)
  • Physics & Mathematics (AREA)
  • General Physics & Mathematics (AREA)
  • Multimedia (AREA)
  • Television Signal Processing For Recording (AREA)
  • Image Analysis (AREA)
  • Studio Devices (AREA)
  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)

Abstract

Disclosed are a method and device for analysing a video file. The method comprises: acquiring audio data and video data of a video file; determining at least one mute point of the video file according to the audio data; acquiring a judgment fragment of the video file according to the video data, the judgment fragment comprising at least one of a host shot fragment, a title fragment and a subtitle fragment of the video file; and determining an event segmentation point of the video file among the at least one mute point according to the judgment fragment. By acquiring at least one mute point and a judgment fragment of the video file and determining an event segmentation point among the at least one mute point according to the judgment fragment, the method and device according to the embodiments of the present invention can accurately determine event segmentation points, accurately segment a video file by event, and thus improve the efficiency and accuracy of video cataloguing.

Description

Method and apparatus for analyzing a video file

Technical Field

The present invention relates to the field of information technology, and more particularly, to a method and apparatus for analyzing a video file. Background Art
With the rapid development of network and information technologies, a huge volume of information pours into our view. Multimedia forms of expression such as images, audio and video have markedly raised the click-through rate of news, and multimedia data is gradually replacing text as the main carrier of news. Faced with massive amounts of news video, people feel the pressure of "information overload" and begin to ask how to find the information that interests them in such massive video data. This demand has driven the development of technologies such as information retrieval, personalized recommendation and data mining.
Television news programs are one of the main sources of news video, and the analysis and application of news video are drawing growing attention in the industry. Current news video analysis mainly covers: news story segmentation, news search, news recommendation, discovery of potential hot events, news event tracking and public opinion monitoring. News story segmentation is the first step of news video analysis; the semantic information it generates can serve as metadata of the segmented news events, facilitating subsequent analysis and applications such as news search and news event tracking.
The news videos targeted by story segmentation aggregate news events of many kinds, for example national news broadcasts and the news programs of local stations. Such programs, combining different types of events, are ill-suited to quick information retrieval, so the demand to divide a news video into multiple news video segments by event has become urgent. Dividing a news video by event is commonly called "news splitting" or "fragmentation", i.e., splitting the news video by event using its audio and video features. Commonly used audio and video features at present include anchor shot segments (also called "anchor frames"), subtitle segments, shot-change segments and silent segments.
At present, methods such as anchor shot detection, subtitle detection, silence detection and shot cut detection can yield a large number of candidate split time points, but these time points do not correspond one-to-one with news event segmentation points. More specifically, they constitute an "over-segmentation" with respect to the event segmentation points: the set of news event segmentation points is a subset of the candidate split time points.
Typically, when reciting the news, a host pauses slightly longer between two news events than within the same news event. The industry therefore commonly detects the silent segments of a news video, takes the midpoint of each silent segment as a mute point, and directly treats that mute point as a news event segmentation point. However, because such mute points have no strict correlation with event segmentation points, this approach cannot determine event segmentation points accurately. Summary of the Invention
Embodiments of the present invention provide a method and apparatus for analyzing a video file that can accurately determine event segmentation points.
In one aspect, an embodiment of the present invention provides a method for analyzing a video file, the method including: obtaining audio data and video data of the video file; determining at least one mute point of the video file according to the audio data; obtaining a judgment segment of the video file according to the video data, the judgment segment including at least one of an anchor shot segment, a title segment and a subtitle segment of the video file; and determining, according to the judgment segment, an event segmentation point of the video file among the at least one mute point.
In another aspect, an embodiment of the present invention provides an apparatus for analyzing a video file, the apparatus including: a first acquisition module, configured to obtain audio data and video data of the video file; a first determining module, configured to determine at least one mute point of the video file according to the audio data obtained by the first acquisition module; a second acquisition module, configured to obtain a judgment segment of the video file according to the video data obtained by the first acquisition module, the judgment segment including at least one of an anchor shot segment, a title segment and a subtitle segment of the video file; and a second determining module, configured to determine an event segmentation point of the video file, among the at least one mute point determined by the first determining module, according to the judgment segment obtained by the second acquisition module.
Based on the above technical solutions, the method and apparatus for analyzing a video file according to the embodiments of the present invention obtain at least one mute point and a judgment segment of the video file and determine an event segmentation point of the video file among the at least one mute point according to the judgment segment. Event segmentation points can thus be determined accurately, the video file can be segmented by event accurately, and the efficiency and accuracy of video cataloguing can be improved. Brief Description of the Drawings
To illustrate the technical solutions of the embodiments of the present invention more clearly, the accompanying drawings used in the embodiments are briefly described below. Obviously, the drawings described below show only some embodiments of the present invention, and those of ordinary skill in the art may derive other drawings from them without creative effort.
Fig. 1 is a schematic flowchart of a method for analyzing a video file according to an embodiment of the present invention.
Fig. 2 is another schematic flowchart of the method for analyzing a video file according to an embodiment of the present invention.
Fig. 3 is a schematic flowchart of a method for determining a mute point according to an embodiment of the present invention.
Fig. 4 is another schematic flowchart of the method for determining a mute point according to an embodiment of the present invention.
Fig. 5 is another schematic flowchart of the method for determining a mute point according to an embodiment of the present invention.
Fig. 6 is another schematic flowchart of the method for analyzing a video file according to an embodiment of the present invention.
Fig. 7 is a schematic block diagram of an apparatus for analyzing a video file according to an embodiment of the present invention.
Fig. 8 is another schematic block diagram of the apparatus for analyzing a video file according to an embodiment of the present invention.
Fig. 9 is a schematic block diagram of a first determining module according to an embodiment of the present invention.
Fig. 10 is another schematic block diagram of the first determining module according to an embodiment of the present invention.
Fig. 11 is a schematic block diagram of a fifth determining unit according to an embodiment of the present invention. Detailed Description
The technical solutions in the embodiments of the present invention are described below clearly and completely with reference to the accompanying drawings. Obviously, the described embodiments are only some, not all, of the embodiments of the present invention. All other embodiments obtained by those of ordinary skill in the art based on the embodiments of the present invention without creative effort shall fall within the protection scope of the present invention. Fig. 1 shows a schematic flowchart of a method 100 for analyzing a video file according to an embodiment of the present invention. As shown in Fig. 1, the method 100 includes:
S110: obtain audio data and video data of a video file.
S120: determine at least one mute point of the video file according to the audio data.
S130: obtain a judgment segment of the video file according to the video data, the judgment segment including at least one of an anchor shot segment, a title segment and a subtitle segment of the video file.
S140: determine, according to the judgment segment, an event segmentation point of the video file among the at least one mute point.
The apparatus for analyzing a video file can obtain, according to the audio data and video data of the video file, at least one mute point and a judgment segment of the video file, where the judgment segment may include at least one of an anchor shot segment, a title segment and a subtitle segment of the video file; the apparatus can then determine an event segmentation point of the video file among the at least one mute point according to the judgment segment. Therefore, by obtaining at least one mute point and a judgment segment of the video file and determining an event segmentation point among the at least one mute point according to the judgment segment, the method for analyzing a video file according to the embodiment of the present invention can accurately determine event segmentation points, accurately segment the video file by event, and thus improve the efficiency and accuracy of video cataloguing.
In S110, for a video file to be analyzed, the audio data and video data can be obtained by performing audio-video separation on the video file, for example by demultiplexing the video file with a video decoder.
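The audio-video separation in S110 can be sketched as follows. The patent does not name a tool; this sketch assumes the common ffmpeg command line, and the file names are purely illustrative. The function only builds the argv lists, so the separation itself is not performed here:

```python
# Sketch of S110 (audio-video separation) assuming ffmpeg as the decoder.
# The patent only says "a video decoder" may be used; ffmpeg and all
# paths below are illustrative assumptions.

def demux_commands(video_path, audio_out, video_out):
    """Return argv lists that would extract the audio and video streams."""
    audio_cmd = ["ffmpeg", "-i", video_path,
                 "-vn",                          # drop the video stream
                 "-acodec", "pcm_s16le", audio_out]
    video_cmd = ["ffmpeg", "-i", video_path,
                 "-an",                          # drop the audio stream
                 "-c:v", "copy", video_out]      # copy video without re-encoding
    return audio_cmd, video_cmd

audio_cmd, video_cmd = demux_commands("news.mp4", "news.wav", "news_video.mp4")
```

The two commands could then be run with `subprocess.run` to produce the audio data and video data consumed by S120 and S130.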
It should be understood that in the embodiments of the present invention the video file may be of various types, such as news video, entertainment video or science and education video. For clarity, the following description takes a news video as an example, but the embodiments of the present invention are not limited thereto.
In S120, at least one mute point contained in the video file may be determined according to the audio data. For example, silence analysis may be performed on the audio data corresponding to an anchor shot segment to detect mute points that may include event segmentation points; silence analysis may also be performed on the audio data corresponding to a non-text segment to determine mute points that may include event segmentation points. Of course, silence analysis may also be performed on other audio data to determine mute points.
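The silence analysis in S120 can be sketched as a frame-energy test: frames whose RMS falls below a threshold for long enough form a silent segment, and its midpoint is a mute point. The patent does not specify the algorithm, so the threshold, frame size and minimum duration below are illustrative assumptions:

```python
# Minimal sketch of silence analysis (S120): RMS-per-frame thresholding.
# All numeric parameters are assumptions, not values from the patent.
import math

def mute_points(samples, rate, frame_ms=20, energy_thresh=0.01, min_sil_ms=300):
    """Return midpoints (in seconds) of silent runs longer than min_sil_ms."""
    frame = max(1, rate * frame_ms // 1000)
    silent = []
    for i in range(0, len(samples) - frame + 1, frame):
        chunk = samples[i:i + frame]
        rms = math.sqrt(sum(s * s for s in chunk) / frame)
        silent.append(rms < energy_thresh)
    points, run = [], 0
    for idx, is_sil in enumerate(silent + [False]):  # sentinel flushes last run
        if is_sil:
            run += 1
        else:
            if run * frame_ms >= min_sil_ms:
                mid_frame = idx - run / 2            # centre of the silent run
                points.append(mid_frame * frame_ms / 1000.0)
            run = 0
    return points

# 0.5 s speech, 0.4 s silence, 0.5 s speech at a (synthetic) 1 kHz rate
samples = [0.5] * 500 + [0.0] * 400 + [0.5] * 500
pts = mute_points(samples, 1000)
```

For the synthetic input above, the single detected mute point sits at the centre of the 0.4 s silent stretch.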
In S130, the judgment segment of the video file may be obtained according to the video data, where the judgment segment may include at least one of an anchor shot segment, a title segment and a subtitle segment of the video file. It should be understood that the judgment segment may also include other segments used to determine whether a mute point is an event segmentation point, such as a non-anchor shot segment or a shot-change segment.
In the embodiments of the present invention, optionally, the anchor shot segments of the video file are obtained from the video data based on face recognition. For example, the face information of the hosts involved in the video file may be registered in a database, and the anchor shot segments are then obtained from the video data using face recognition. It should be understood that in the embodiments of the present invention, a segment of the video data containing an anchor shot is referred to as an anchor shot segment, and a segment not containing an anchor shot may be referred to as a non-anchor shot segment or a field shot segment.
Therefore, the embodiment of the present invention detects anchor shot segments by host face recognition. Compared with methods that extract anchor shot segments using anchor frame templates or other structural information, this approach has higher generality, higher accuracy and a higher detection rate.
In the embodiments of the present invention, the video data may be divided into text segments and non-text segments using character recognition, where a text segment may include title segments and subtitle segments. For example, text information may be extracted from the video data, and segments with the same text content may be merged into one text segment using a character matching algorithm; segments of the video data without text are referred to as non-text segments. By analyzing the text segments, the title segments and subtitle segments they contain can be determined. For example, the text detected in the text segments may be clustered, mainly into two classes using two low-level image features, colour and size: the class with the larger font is the titles, and the class with the smaller font is the speaker subtitles, so that the title segments and subtitle segments can be determined. The title segments may be used to determine event segmentation points, and the subtitle segments may be used to remove the mute points involved in over-segmentation.
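The two-class clustering of detected text can be sketched with a 1-D k-means on glyph height alone; the patent also uses colour, which this simplified sketch omits, and the box representation is an assumption:

```python
# Sketch of the title/subtitle split: 2-means clustering on text height.
# The patent clusters on colour AND size; only size is used here, and the
# {"height": ...} box format is an illustrative assumption.

def split_title_subtitle(boxes):
    """Partition detected text boxes into (titles, subtitles) by height."""
    heights = sorted(box["height"] for box in boxes)
    lo, hi = float(heights[0]), float(heights[-1])
    for _ in range(20):                       # simple 1-D k-means, k=2
        near_lo = [h for h in heights if abs(h - lo) <= abs(h - hi)]
        near_hi = [h for h in heights if abs(h - lo) > abs(h - hi)]
        if not near_lo or not near_hi:
            break
        lo = sum(near_lo) / len(near_lo)
        hi = sum(near_hi) / len(near_hi)
    cut = (lo + hi) / 2
    titles = [box for box in boxes if box["height"] > cut]    # larger font
    subtitles = [box for box in boxes if box["height"] <= cut]  # smaller font
    return titles, subtitles

boxes = [{"height": h} for h in (40, 18, 42, 20, 38, 19)]
titles, subtitles = split_title_subtitle(boxes)
```

With the sample heights above, the three tall boxes end up in the title class and the three short ones in the subtitle class.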
Therefore, by clustering the text information into two classes, titles and speaker subtitles, the embodiment of the present invention can prevent an automated cataloguing system from over-segmenting events because of speaker subtitles, thereby improving the accuracy of event segmentation of the video file as well as the efficiency and accuracy of video cataloguing.
In S140, the event segmentation points of the video file may be determined among the at least one mute point by taking into account at least one of the anchor shot segment, title segment and subtitle segment of the video file included in the judgment segment.
In the embodiments of the present invention, the apparatus for analyzing a video file may also determine the event segmentation points of the video file according to predefined rules, and thereby determine the start and end points of each event. For example, the predefined rules are: (1) a mute point determined from an anchor shot segment is the end point of the previous event and the start point of the next event; (2) a title segment that follows an anchor shot segment belongs to the same event as that anchor shot segment; (3) the mute points on both sides of a title segment belong to the same event as that title segment; (4) if there is no mute point within an anchor shot segment, the anchor shot segment is the start of a news story, and the frame preceding it is the end point of the previous news story.
It should be understood that the apparatus for analyzing a video file may also take into account at least one of the anchor shot segment, title segment and subtitle segment included in the judgment segment to exclude, from the at least one mute point, those mute points that cannot be event segmentation points.
For example, according to the title segments included in the judgment segment, the apparatus may exclude all split points within a title segment from being event segmentation points; it may also exclude the mute points between an anchor shot segment and the field shot segment that follows it, exclude the mute points between adjacent title segments with similar title content, and exclude the mute points contained in silent segments during which no shot change occurs.
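The exclusion step above can be sketched as a filter over the candidate mute points. Only the simplest rule from the examples is shown, dropping candidates that fall inside a title or subtitle segment; the interval representation and function name are assumptions:

```python
# Sketch of mute-point exclusion (S140): drop candidates inside title or
# speaker-subtitle segments, which per the text are over-segmentation.
# Segments are assumed to be (start, end) pairs in seconds.

def filter_mute_points(mute_pts, title_segs, subtitle_segs):
    """Return candidates that lie outside every title/subtitle segment."""
    def inside(t, segs):
        return any(start <= t <= end for start, end in segs)
    return [t for t in mute_pts
            if not inside(t, title_segs) and not inside(t, subtitle_segs)]

kept = filter_mute_points([1.0, 5.0, 9.0],
                          title_segs=[(4.5, 6.0)],
                          subtitle_segs=[(8.5, 9.5)])
```

In the example, the candidates at 5.0 s (inside a title segment) and 9.0 s (inside a subtitle segment) are excluded, leaving only 1.0 s.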
It should be understood that the above predefined rules and examples are merely illustrative; the embodiments of the present invention are not limited thereto, and the apparatus for analyzing a video file may also determine the event segmentation points of the video file based on other factors.
Therefore, the method for analyzing a video file according to the embodiment of the present invention obtains at least one mute point and a judgment segment of the video file and determines an event segmentation point among the at least one mute point according to the judgment segment; it can thus accurately determine event segmentation points, accurately segment the video file by event, and improve the efficiency and accuracy of video cataloguing.
In the embodiments of the present invention, optionally, as shown in Fig. 2, the method 100 for analyzing a video file further includes:
S150: determine, according to the event segmentation points, the event segments included in the video file.
S160: obtain event information corresponding to an event segment according to the video file, the event information including at least one of host information, interviewee information, title information and subtitle information.
S170: determine the event information as metadata of the event segment.
In S150, the part of the video file between a start segmentation point and an end segmentation point, both of which the event segmentation points may include, is determined as an event segment.
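S150 amounts to pairing consecutive segmentation points into intervals. A minimal sketch, assuming timestamps in seconds and treating the file start and end as implicit boundaries:

```python
# Sketch of S150: turn sorted event segmentation points into event segments.
# Treating 0 and the total duration as implicit boundaries is an assumption.

def event_segments(split_points, duration):
    """Return (start, end) event segments delimited by the split points."""
    bounds = [0.0] + sorted(split_points) + [duration]
    return [(bounds[i], bounds[i + 1])
            for i in range(len(bounds) - 1)
            if bounds[i + 1] > bounds[i]]   # skip zero-length segments

segs = event_segments([10.0, 25.0], duration=60.0)
```

Two segmentation points in a 60 s file yield three event segments.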
In S160, anchor shot segments are detected by face recognition, so that the host face information corresponding to the event segment can be obtained; the host information includes, for example, the host name and host face information. Similarly, by analyzing non-anchor shot segments or field shot segments with face recognition, the face information or features of interviewees can be obtained. In addition, text information can be extracted from the video data by character recognition and, by further analysis of the text information, the title information and subtitle information can be obtained.
In S170, at least one of the host information, interviewee information, title information and subtitle information corresponding to the event segment may be determined as the metadata of the event segment.
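The metadata record of S160/S170 can be sketched as a small dictionary per event segment. The field names are illustrative, not from the patent:

```python
# Sketch of S170: assemble per-event metadata. Keys are assumptions;
# the patent only lists the information categories, not a schema.

def build_metadata(segment, host=None, interviewee=None, title=None,
                   subtitles=None):
    """Return a metadata dict with only the fields that were detected."""
    meta = {"start": segment[0], "end": segment[1]}
    for key, value in (("host", host), ("interviewee", interviewee),
                       ("title", title), ("subtitles", subtitles)):
        if value is not None:               # omit undetected information
            meta[key] = value
    return meta

meta = build_metadata((10.0, 25.0), host="Anchor A", title="Headline")
```

Such a record is what would be stored in the database for later retrieval, recommendation and story tracking.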
Therefore, the method for analyzing a video file according to the embodiment of the present invention obtains at least one mute point and a judgment segment of the video file and determines an event segmentation point among the at least one mute point according to the judgment segment; it can thus accurately determine event segmentation points, accurately segment the video file by event, and improve the efficiency and accuracy of video cataloguing.
On the other hand, by using semantic information such as host information, interviewee information, title information and subtitle information as the metadata of event segments, the method for analyzing a video file according to the embodiment of the present invention facilitates subsequent applications such as video retrieval, recommendation and story tracking, and avoids the problem that low-level audio-video semantic information cannot provide sufficient metadata for subsequent video analysis.
In the embodiments of the present invention, silence analysis may be performed on the audio data corresponding to an anchor shot segment to detect mute points that may include event segmentation points, and silence analysis may also be performed on the audio data corresponding to a non-text segment to determine mute points that may include event segmentation points. The two cases are described below with reference to Fig. 3 to Fig. 5.
Fig. 3 shows a schematic flowchart of a method 200 for determining a mute point according to an embodiment of the present invention. As shown in Fig. 3, the method 200 includes:
S210: obtain, according to the audio data, the non-text segment audio data corresponding to a non-text segment included in the video data.
S220: determine a first silent segment in the non-text segment audio data.
S230: determine a shot-change point in the video data corresponding to the first silent segment.
S240: determine the shot-change point as the mute point of the first silent segment.
By performing silent segment detection on the audio data corresponding to non-text segments and shot cut detection on the video data corresponding to those silent segments, the shot-change points occurring within silent segments can be determined as mute points. Considering that when events are played there is a stretch of silence between adjacent events, the mute points obtained here include event segmentation points.
Specifically, in the embodiments of the present invention, the object of the silence detection process is the audio data corresponding to non-text segments, and silence detection is combined with shot segmentation of the video. When silence is detected, shot cut detection of the corresponding segment of video data is started; when the length of the run of consecutive silent frames exceeds a preset minimum silence duration, the detected shot segmentation point is a mute point. Otherwise the silent segment is ignored.
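The combination of silence detection and shot segmentation just described can be sketched as follows, assuming silent intervals and shot-change timestamps have already been detected; the 0.4 s minimum silence duration is an illustrative stand-in for the preset value:

```python
# Sketch of S230/S240: keep only shot-change points that fall inside a
# sufficiently long silent interval. min_silence stands in for the preset
# minimum silence duration, which the patent does not quantify.

def mute_points_from_shots(silent_intervals, shot_changes, min_silence=0.4):
    """Return shot-change times inside silent intervals >= min_silence s."""
    points = []
    for start, end in silent_intervals:
        if end - start < min_silence:
            continue                         # too-short silences are ignored
        points.extend(t for t in shot_changes if start <= t <= end)
    return points

pts = mute_points_from_shots([(1.0, 1.2), (5.0, 6.0)], [1.1, 5.5, 8.0])
```

The shot change at 1.1 s is discarded because its silence is too short, and the one at 8.0 s because it is not silent at all; only 5.5 s survives as a mute point.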
Therefore, by combining silence detection with shot segmentation, the embodiment of the present invention can not only provide accurate event segmentation points but also reduce the amount of computation spent on shot segmentation detection of non-silent segments.
Fig. 4 shows a schematic flowchart of another method 300 for determining a mute point according to an embodiment of the present invention. As shown in Fig. 4, the method 300 includes:
S310: obtain, according to the audio data, the anchor shot segment audio data corresponding to an anchor shot segment.
S320: determine second silent segments in the anchor shot segment audio data.
S330: determine third silent segments among the second silent segments, where the silent segments included in the third set are longer than the second silent segments outside the third set.
S340: determine the midpoint of each third silent segment as a mute point.
Optionally, as shown in Fig. 5, determining the third silent segments among the second silent segments (S330) includes:
S331: determine the average length of all the silent segments included in the second silent segments.
S332: determine the second silent segments whose length is greater than or equal to the average as the third silent segments.
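S331 and S332 can be sketched directly: compute the mean silent-segment length, keep the segments at least that long, and report their midpoints as mute points. The interval representation is an assumption:

```python
# Sketch of S331/S332: select silent segments at least as long as the mean
# length and return their midpoints as mute points. Intervals are assumed
# to be (start, end) pairs in seconds.

def long_silence_midpoints(silent_intervals):
    """Midpoints of silent intervals whose length >= the mean length."""
    lengths = [end - start for start, end in silent_intervals]
    mean = sum(lengths) / len(lengths)
    return [(start + end) / 2
            for (start, end), length in zip(silent_intervals, lengths)
            if length >= mean]

pts = long_silence_midpoints([(0.0, 1.0), (2.0, 2.2), (5.0, 6.5)])
```

The 0.2 s pause is discarded as an intra-story pause, while the two long silences yield mute points at their midpoints.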
It should be understood that the embodiments of the present invention may also determine the third silent segments using other methods or criteria, for example by determining the longest 30% of the second silent segments as the third silent segments.
For example, when an anchor shot segment covers multiple events, the host usually first summarizes one event and then starts the next. The embodiment of the present invention performs silent segment detection on the audio corresponding to the anchor shot segment, determines stretches longer than a threshold as silent segments, records the length of each silent segment, and determines the midpoints of the silent segments that are much longer than the average as mute points. This can greatly reduce manual effort, raise the degree of automation of news splitting, and avoid the mis-segmentation that occurs when an anchor shot segment contains multiple news stories.
Therefore, the method for analyzing a video file according to the embodiment of the present invention obtains at least one mute point and a judgment segment of the video file and determines an event segmentation point among the at least one mute point according to the judgment segment; it can thus accurately determine event segmentation points, accurately segment the video file by event, and improve the efficiency and accuracy of video cataloguing.
With reference to Fig. 6, the method 400 for analyzing a video file according to an embodiment of the present invention is described in detail below, taking a news video as an example.
As shown in Fig. 6, the method 400 includes:
S410: perform audio-video separation on the news video to obtain audio data and video data.
S420: divide the video data into anchor shot segments and field shot segments, then extract the other persons involved in each event from the field shot segments using face recognition; the face information and face features of detected interviewees may also be written into a database.
S430: extract text information from the video data using character recognition, merge segments with the same text content into text segments using a character matching algorithm, and treat segments without text as non-text segments. Cluster the text detected in the text segments, mainly into two classes using two low-level image features, colour and size: the class with the larger font is the titles and the smaller class is the speaker subtitles, so that the title segments and subtitle segments are determined. For each class a Gaussian model of the font colour is computed, with one model per news video source.
S440: perform silent segment detection on the audio data corresponding to the non-text segments, and perform shot cut detection on the video data corresponding to the silent segments; a shot-change point occurring in a silent segment is a mute point.
S450: perform silent segment detection on the audio data corresponding to the anchor shot segments, cluster the silent segments by length, determine the longer silent segments, and take the midpoint of each such silent segment as a mute point.
S460: taking the mute points as a superset of the news event segmentation points, consider the anchor shot segments, title segments, subtitle segments and so on near each mute point to derive the news event segmentation points.
S470: take the detected host information, interviewee information, text information and other content as the semantic information of the news event, which may be stored in a database as metadata.
It should be understood that in the various embodiments of the present invention, the magnitude of the sequence numbers of the above processes does not imply an order of execution; the execution order of the processes should be determined by their functions and internal logic, and should not constitute any limitation on the implementation of the embodiments of the present invention.
Therefore, the method for analyzing a video file according to the embodiment of the present invention obtains at least one mute point and a judgment segment of the video file and determines an event segmentation point among the at least one mute point according to the judgment segment; it can thus accurately determine event segmentation points, accurately segment the video file by event, and improve the efficiency and accuracy of video cataloguing.
Above in conjunction with Fig. 1 to Fig. 6, the method that analysis video file according to embodiments of the present invention is described in detail, below in conjunction with Fig. 7 to Figure 11, the device of description analysis video file according to embodiments of the present invention.
Fig. 7 shows a schematic block diagram of a device 500 for analysing a video file according to an embodiment of the present invention. As shown in Fig. 7, the device 500 includes:
a first acquisition module 510, configured to obtain the audio data and video data of a video file; a first determining module 520, configured to determine at least one silent point of the video file according to the audio data obtained by the first acquisition module 510;
a second acquisition module 530, configured to obtain a judgment fragment of the video file according to the video data obtained by the first acquisition module 510, the judgment fragment including at least one of an anchor shot fragment, a title fragment, and a subtitle fragment of the video file; and
a second determining module 540, configured to determine, according to the judgment fragment obtained by the second acquisition module 530, the event segmentation point of the video file from the at least one silent point determined by the first determining module 520.
Therefore, the device for analysing a video file according to the embodiment of the present invention obtains at least one silent point and a judgment fragment of the video file, and determines the event segmentation point of the video file from the at least one silent point according to the judgment fragment. The event segmentation point can thus be determined accurately, so that event segmentation can be performed on the video file accurately, thereby improving the efficiency and accuracy of video cataloguing.
In an embodiment of the present invention, optionally, as shown in Fig. 8, the device 500 further includes: a third determining module 550, configured to determine, according to the event segmentation point determined by the second determining module 540, the event segments included in the video file;
a third acquisition module 560, configured to obtain event information corresponding to the event segments according to the video file, the event information including at least one of host information, interviewee information, title information, and caption information; and
a fourth determining module 570, configured to determine the event information obtained by the third acquisition module 560 as the metadata of the event segments determined by the third determining module 550.
Optionally, as shown in Fig. 9, the first determining module 520 includes:
a first acquisition unit 521, configured to obtain, according to the audio data obtained by the first acquisition module 510, non-text fragment audio data corresponding to the non-text fragments included in the video data;
a first determining unit 522, configured to determine a first silence clip in the non-text fragment audio data obtained by the first acquisition unit 521;
a second determining unit 523, configured to determine a shot change point in the video data corresponding to the first silence clip determined by the first determining unit 522; and
a third determining unit 524, configured to determine the shot change point determined by the second determining unit 523 as the silent point of the first silence clip.
Optionally, as shown in Fig. 10, the first determining module 520 includes:
a second acquisition unit 525, configured to obtain, according to the audio data, anchor shot fragment audio data corresponding to the anchor shot fragment;
a fourth determining unit 526, configured to determine a second silence clip in the anchor shot fragment audio data obtained by the second acquisition unit 525;
a fifth determining unit 527, configured to determine a third silence clip in the second silence clip determined by the fourth determining unit 526, the lengths of the silence clips included in the third silence clip being longer than the lengths of the silence clips in the second silence clip other than the third silence clip; and
a sixth determining unit 528, configured to determine the midpoint of each silence clip in the third silence clip determined by the fifth determining unit 527 as a silent point.
Optionally, as shown in Fig. 11, the fifth determining unit 527 includes: a first determining subunit 5271, configured to determine the average value of the lengths of all silence clips included in the second silence clip; and
a second determining subunit 5272, configured to determine the silence clips in the second silence clip whose lengths are greater than or equal to the average value as the third silence clip.
In an embodiment of the present invention, optionally, the second acquisition module 530 is further configured to obtain the anchor shot fragment of the video file from the video data based on a face recognition method.
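The face-recognition rule used by the second acquisition module 530 might be sketched as follows, under the assumption that the anchor is the most frequently recognised face in the programme; the per-frame face labels would come from an actual face recognition step, which is not shown here.

```python
from collections import Counter

def anchor_fragments(face_ids):
    """Group frames showing the most frequent face into anchor shot fragments.

    face_ids: one recognised face label (or None) per frame, produced by a
    hypothetical upstream face recognition step.
    Returns (start_frame, end_frame) intervals, end-exclusive.
    """
    counts = Counter(f for f in face_ids if f is not None)
    if not counts:
        return []
    anchor = counts.most_common(1)[0][0]  # most frequent face = the anchor
    fragments, start = [], None
    for i, f in enumerate(face_ids):
        if f == anchor:
            if start is None:
                start = i
        elif start is not None:
            fragments.append((start, i))
            start = None
    if start is not None:
        fragments.append((start, len(face_ids)))
    return fragments

# Hypothetical labels: face "A" (the anchor) appears in two runs of frames.
frags = anchor_fragments(["A", "A", None, "B", "A", "A", "A"])
```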
It should be understood that the device 500 for analysing a video file according to the embodiment of the present invention may correspond to the device for analysing a video file in the embodiments of the present invention, and that the above and other operations and/or functions of the modules in the device 500 are intended to implement the corresponding flows of the methods 100 to 400 in Fig. 1 to Fig. 6; for the sake of brevity, details are not repeated here.
Therefore, the device for analysing a video file according to the embodiment of the present invention obtains at least one silent point and a judgment fragment of the video file, and determines the event segmentation point of the video file from the at least one silent point according to the judgment fragment. The event segmentation point can thus be determined accurately, so that event segmentation can be performed on the video file accurately, thereby improving the efficiency and accuracy of video cataloguing.
Those of ordinary skill in the art may appreciate that the units and algorithm steps of the examples described with reference to the embodiments disclosed herein can be implemented by electronic hardware, computer software, or a combination of the two. To clearly demonstrate the interchangeability of hardware and software, the compositions and steps of the examples have been described above generally in terms of function. Whether these functions are performed by hardware or software depends on the specific application and the design constraints of the technical solution. A person skilled in the art may use different methods to implement the described functions for each specific application, but such implementation should not be considered beyond the scope of the present invention.
It may be clearly understood by a person skilled in the art that, for convenience and brevity of description, for the specific working processes of the systems, devices, and units described above, reference may be made to the corresponding processes in the foregoing method embodiments; details are not repeated here.
In the several embodiments provided in this application, it should be understood that the disclosed systems, devices, and methods may be implemented in other ways. For example, the device embodiments described above are merely illustrative: the division of the units is merely a logical function division, and there may be other divisions in actual implementation; for example, multiple units or components may be combined or integrated into another system, or some features may be ignored or not performed. In addition, the mutual couplings, direct couplings, or communication connections shown or discussed may be indirect couplings or communication connections through some interfaces, devices, or units, or may be electrical, mechanical, or other forms of connection. The units described as separate components may or may not be physically separate, and the components shown as units may or may not be physical units; that is, they may be located in one place or distributed over multiple network elements. Some or all of the units may be selected according to actual needs to achieve the purposes of the solutions of the embodiments of the present invention.
In addition, the functional units in the embodiments of the present invention may be integrated into one processing unit, or each unit may exist alone physically, or two or more units may be integrated into one unit. The integrated unit may be implemented in the form of hardware, or in the form of a software functional unit.
If the integrated unit is implemented in the form of a software functional unit and sold or used as an independent product, it may be stored in a computer-readable storage medium. Based on this understanding, the part of the technical solution of the present invention that essentially contributes to the prior art, or all or part of the technical solution, may be embodied in the form of a software product. The computer software product is stored in a storage medium and includes several instructions for causing a computer device (which may be a personal computer, a server, a network device, or the like) to perform all or part of the steps of the methods described in the embodiments of the present invention. The foregoing storage medium includes any medium that can store program code, such as a USB flash disk, a removable hard disk, a read-only memory (ROM, Read-Only Memory), a random access memory (RAM, Random Access Memory), a magnetic disk, or an optical disc.
The foregoing is only specific embodiments of the present invention, but the protection scope of the present invention is not limited thereto. Any person skilled in the art can readily conceive of various equivalent modifications or substitutions within the technical scope disclosed by the present invention, and these modifications or substitutions shall all fall within the protection scope of the present invention. Therefore, the protection scope of the present invention shall be subject to the scope of the claims.

Claims (1)

    1. A method for analysing a video file, characterised by comprising:
    obtaining audio data and video data of the video file;
    determining at least one silent point of the video file according to the audio data;
    obtaining a judgment fragment of the video file according to the video data, the judgment fragment comprising at least one of an anchor shot fragment, a title fragment, and a subtitle fragment of the video file; and determining, according to the judgment fragment, an event segmentation point of the video file from the at least one silent point.
    2. The method according to claim 1, characterised in that the method further comprises: determining, according to the event segmentation point, event segments comprised in the video file;
    obtaining event information corresponding to the event segments according to the video file, the event information comprising at least one of host information, interviewee information, title information, and caption information; and determining the event information as metadata of the event segments.
    3. The method according to claim 1 or 2, characterised in that the determining at least one silent point of the video file comprises:
    obtaining, according to the audio data, non-text fragment audio data corresponding to a non-text fragment comprised in the video data;
    determining a first silence clip in the non-text fragment audio data;
    determining a shot change point in video data corresponding to the first silence clip; and determining the shot change point as the silent point of the first silence clip.
    4. The method according to any one of claims 1 to 3, characterised in that the determining at least one silent point of the video file comprises:
    obtaining, according to the audio data, anchor shot fragment audio data corresponding to the anchor shot fragment;
    determining a second silence clip in the anchor shot fragment audio data;
    determining a third silence clip in the second silence clip, the lengths of the silence clips comprised in the third silence clip being longer than the lengths of the silence clips in the second silence clip other than the third silence clip; and
    determining the midpoint of each silence clip in the third silence clip as the silent point.
    5. The method according to claim 4, characterised in that the determining a third silence clip in the second silence clip comprises: determining an average value of the lengths of all silence clips comprised in the second silence clip; and determining the silence clips in the second silence clip whose lengths are greater than or equal to the average value as the third silence clip.
    6. The method according to any one of claims 1 to 5, characterised in that the obtaining a judgment fragment of the video file according to the video data comprises:
    obtaining, based on a face recognition method, the anchor shot fragment of the video file from the video data.
    7. A device for analysing a video file, characterised by comprising:
    a first acquisition module, configured to obtain audio data and video data of a video file;
    a first determining module, configured to determine at least one silent point of the video file according to the audio data obtained by the first acquisition module;
    a second acquisition module, configured to obtain a judgment fragment of the video file according to the video data obtained by the first acquisition module, the judgment fragment comprising at least one of an anchor shot fragment, a title fragment, and a subtitle fragment of the video file; and
    a second determining module, configured to determine, according to the judgment fragment obtained by the second acquisition module, an event segmentation point of the video file from the at least one silent point determined by the first determining module.
    8. The device according to claim 7, characterised in that the device further comprises: a third determining module, configured to determine, according to the event segmentation point determined by the second determining module, event segments comprised in the video file;
    a third acquisition module, configured to obtain event information corresponding to the event segments according to the video file, the event information comprising at least one of host information, interviewee information, title information, and caption information; and
    a fourth determining module, configured to determine the event information obtained by the third acquisition module as metadata of the event segments determined by the third determining module.
    9. The device according to claim 7 or 8, characterised in that the first determining module comprises:
    a first acquisition unit, configured to obtain, according to the audio data obtained by the first acquisition module, non-text fragment audio data corresponding to a non-text fragment comprised in the video data;
    a first determining unit, configured to determine a first silence clip in the non-text fragment audio data obtained by the first acquisition unit; a second determining unit, configured to determine a shot change point in video data corresponding to the first silence clip determined by the first determining unit; and
    a third determining unit, configured to determine the shot change point determined by the second determining unit as the silent point of the first silence clip.
    10. The device according to any one of claims 7 to 9, characterised in that the first determining module comprises:
    a second acquisition unit, configured to obtain, according to the audio data, anchor shot fragment audio data corresponding to the anchor shot fragment;
    a fourth determining unit, configured to determine a second silence clip in the anchor shot fragment audio data obtained by the second acquisition unit;
    a fifth determining unit, configured to determine a third silence clip in the second silence clip determined by the fourth determining unit, the lengths of the silence clips comprised in the third silence clip being longer than the lengths of the silence clips in the second silence clip other than the third silence clip; and
    a sixth determining unit, configured to determine the midpoint of each silence clip in the third silence clip determined by the fifth determining unit as the silent point.
    11. The device according to claim 10, characterised in that the fifth determining unit comprises:
    a first determining subunit, configured to determine an average value of the lengths of all silence clips comprised in the second silence clip; and
    a second determining subunit, configured to determine the silence clips in the second silence clip whose lengths are greater than or equal to the average value as the third silence clip.
    12. The device according to any one of claims 7 to 11, characterised in that the second acquisition module is further configured to obtain, based on a face recognition method, the anchor shot fragment of the video file from the video data.
CN201180003260.9A 2011-12-28 Method and apparatus for analysing a video file Active CN103299319B (en)

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
PCT/CN2011/084783 WO2013097101A1 (en) 2011-12-28 2011-12-28 Method and device for analysing video file

Publications (2)

Publication Number Publication Date
CN103299319A true CN103299319A (en) 2013-09-11
CN103299319B CN103299319B (en) 2016-11-30



Citations (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN1635789A (en) * 2003-12-30 2005-07-06 中国科学院自动化研究所 Method for automatic cut-in of virtual advertisement in sports program based on event detection
CN1938714A (en) * 2004-03-23 2007-03-28 英国电讯有限公司 Method and system for semantically segmenting scenes of a video sequence
KR20080052868A (en) * 2006-12-08 2008-06-12 엘지전자 주식회사 Broadcasting receiver with function of recording broadcasting programs and determining method for start/end time of broadcasting program thereof
CN101616264A (en) * 2008-06-27 2009-12-30 中国科学院自动化研究所 News video categorization and system
CN102547139A (en) * 2010-12-30 2012-07-04 北京新岸线网络技术有限公司 Method for splitting news video program, and method and system for cataloging news videos


Cited By (10)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN105512348A (en) * 2016-01-28 2016-04-20 北京旷视科技有限公司 Method and device for processing videos and related audios and retrieving method and device
CN105512348B (en) * 2016-01-28 2019-03-26 北京旷视科技有限公司 For handling the method and apparatus and search method and device of video and related audio
CN106851422A (en) * 2017-03-29 2017-06-13 苏州百智通信息技术有限公司 A kind of video playback automatic pause processing method and system
CN108388872A (en) * 2018-02-28 2018-08-10 北京奇艺世纪科技有限公司 A kind of headline recognition methods and device based on font color
CN108810569A (en) * 2018-05-23 2018-11-13 北京奇艺世纪科技有限公司 A kind of news-video dividing method and device
CN113539304A (en) * 2020-04-21 2021-10-22 华为技术有限公司 Video strip splitting method and device
CN113539304B (en) * 2020-04-21 2022-09-16 华为云计算技术有限公司 Video strip splitting method and device
CN113012723A (en) * 2021-03-05 2021-06-22 北京三快在线科技有限公司 Multimedia file playing method and device and electronic equipment
CN113766314A (en) * 2021-11-09 2021-12-07 北京中科闻歌科技股份有限公司 Video segmentation method, device, equipment, system and storage medium
CN113766314B (en) * 2021-11-09 2022-03-04 北京中科闻歌科技股份有限公司 Video segmentation method, device, equipment, system and storage medium

Also Published As

Publication number Publication date
WO2013097101A1 (en) 2013-07-04

Similar Documents

Publication Publication Date Title
CN106649713B (en) Movie visualization processing method and system based on content
KR100707189B1 Apparatus and method for detecting advertisement of moving-picture, and computer-readable storage storing computer program controlling the apparatus
CN101821734B (en) Detection and classification of matches between time-based media
US11436831B2 (en) Method and apparatus for video processing
CN113613065B (en) Video editing method and device, electronic equipment and storage medium
US20040170392A1 (en) Automatic detection and segmentation of music videos in an audio/video stream
JP2004229283A (en) Method for identifying transition of news presenter in news video
WO2013097101A1 (en) Method and device for analysing video file
WO2020155750A1 (en) Artificial intelligence-based corpus collecting method, apparatus, device, and storage medium
Dumont et al. Automatic story segmentation for tv news video using multiple modalities
Jou et al. Structured exploration of who, what, when, and where in heterogeneous multimedia news sources
CN101673266A (en) Method for searching audio and video contents
CN111083141A (en) Method, device, server and storage medium for identifying counterfeit account
CN111314732A (en) Method for determining video label, server and storage medium
CN104994404A (en) Method and device for obtaining keywords for video
CN110378190B (en) Video content detection system and detection method based on topic identification
CN103986981A (en) Recognition method and device of scenario segments of multimedia files
CN113992944A (en) Video cataloging method, device, equipment, system and medium
CN111709324A (en) News video strip splitting method based on space-time consistency
CN112040313A (en) Video content structuring method, device, terminal equipment and medium
CN116017088A (en) Video subtitle processing method, device, electronic equipment and storage medium
US20080196054A1 (en) Method and system for facilitating analysis of audience ratings data for content
Jindal et al. Efficient and language independent news story segmentation for telecast news videos
CN103299319B Method and apparatus for analysing a video file
Volkmer et al. Gradual transition detection using average frame similarity

Legal Events

Date Code Title Description
C06 Publication
PB01 Publication
C10 Entry into substantive examination
SE01 Entry into force of request for substantive examination
C14 Grant of patent or utility model
GR01 Patent grant
TR01 Transfer of patent right

Effective date of registration: 20220209

Address after: 550025 Huawei cloud data center, jiaoxinggong Road, Qianzhong Avenue, Gui'an New District, Guiyang City, Guizhou Province

Patentee after: Huawei Cloud Computing Technology Co.,Ltd.

Address before: 518129 Bantian HUAWEI headquarters office building, Longgang District, Guangdong, Shenzhen

Patentee before: HUAWEI TECHNOLOGIES Co.,Ltd.

TR01 Transfer of patent right