CN104768052A

CN104768052A - Method and device for extracting voice frequency and subtitles according to language

Info

Publication number: CN104768052A
Application number: CN201510155980.7A
Authority: CN
Inventors: 彭岳松
Original assignee: Wuxi Tvmining Juyuan Media Technology Co Ltd
Current assignee: Wuxi Tvmining Juyuan Media Technology Co Ltd
Priority date: 2015-04-02
Filing date: 2015-04-02
Publication date: 2015-07-08

Abstract

The invention discloses a method and a device for extracting audio and subtitles according to language. They are used to extract one audio track and one subtitle track in a designated language from a video file that has multiple audio and subtitle tracks, so that a multi-stream video is converted into a single-stream video. The method comprises the following steps: decapsulating the video file having multiple audio and subtitle tracks to obtain the video data together with the mixed-stored multi-track audio data and multi-track subtitle data; obtaining, according to the format of the video file, the language information of each audio track and each subtitle track from the information header of the video file; extracting, according to that language information, the audio data and subtitle data of the designated language from the mixed-stored multi-track audio data and subtitle data; and merging the audio data and subtitle data of the designated language with the video data.

Description

A method and device for extracting audio and subtitles according to language

Technical field

The present invention relates to the field of multimedia technology, and in particular to a method and a device for extracting audio and subtitles according to language.

Background technology

With the rapid development of Internet video, an existing video file often has to be decoded and then re-encoded and re-encapsulated to obtain a file in a format that a local player or a given video website can play. Current video file formats mainly include FLV, DV, MP4, MKV, MOV, TS and 3GP. In formats such as FLV, DV and MP4, audio and video are two streams, each of them a single stream. In MKV, MOV and TS, both audio and video may consist of multiple streams, and MKV may additionally carry multiple subtitle streams. However, existing video decoding tools only support single-stream input and single-stream output and do not support multi-stream video formats. A scheme is therefore needed that converts a multi-stream video format (MKV in particular) into a single-stream one, that is, a scheme that extracts one designated audio track and one designated subtitle track from a video file having multiple audio and subtitle tracks, so that the video can subsequently be transcoded, played and otherwise processed.

Summary of the invention

The invention provides a method and a device for extracting audio and subtitles according to language, used to extract one audio track and one subtitle track in a designated language from a video file having multiple audio and subtitle tracks, thereby converting a multi-stream video into a single-stream video.

The invention provides a method for extracting audio and subtitles according to language, comprising:

Decapsulating a video file having multiple audio and subtitle tracks to obtain video data together with mixed-stored multi-track audio data and multi-track subtitle data;

Obtaining, according to the format of the video file having multiple audio and subtitle tracks, the language information of each audio track and each subtitle track from the information header of that video file;

Extracting, according to the language information of each audio track and each subtitle track, the audio data and subtitle data of a designated language from the mixed-stored multi-track audio data and multi-track subtitle data;

Merging the audio data and subtitle data of the designated language with the video data.
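The four steps above can be pictured with a small in-memory model. The following Python sketch is illustrative only and is not taken from the patent: the `Stream` record, the function names and the use of plain lists in place of a real container are all assumptions made for the example.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Stream:
    kind: str       # "video", "audio" or "subtitle"
    language: str   # language code recorded in the container header
    payload: bytes  # stand-in for the demultiplexed data


def demux(container):
    """Step 1: split video data from the mixed audio/subtitle streams."""
    video = [s for s in container if s.kind == "video"]
    mixed = [s for s in container if s.kind != "video"]
    return video, mixed


def read_languages(mixed):
    """Step 2: language information per audio/subtitle stream, keyed by position."""
    return {i: s.language for i, s in enumerate(mixed)}


def select(mixed, languages, audio_lang, subtitle_lang):
    """Step 3: keep the first audio and first subtitle stream in the designated language."""
    wanted = {"audio": audio_lang, "subtitle": subtitle_lang}
    chosen = {}
    for i, s in enumerate(mixed):
        if s.kind in wanted and languages[i] == wanted[s.kind]:
            chosen.setdefault(s.kind, s)
    return [chosen[k] for k in ("audio", "subtitle") if k in chosen]


def to_single_stream(container, audio_lang, subtitle_lang):
    """Step 4: merge the selected audio and subtitles with the video data."""
    video, mixed = demux(container)
    languages = read_languages(mixed)
    return video + select(mixed, languages, audio_lang, subtitle_lang)
```

Picking the first matching stream is a design choice of this sketch; the text only requires that one audio track and one subtitle track of the designated language end up in the output.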

Some beneficial effects of the embodiments of the present invention may include:

According to the language information of each audio track and each subtitle track recorded in the information header of the video file having multiple audio and subtitle tracks, one track of audio data and one track of subtitle data in the designated language are extracted and merged with the video data, so that the multi-stream video is converted into a single-stream video. In addition, the audio and subtitles of the merged single-stream video can be chosen to match the language of the audience, which greatly improves the viewing experience.

In one embodiment, the video file having multiple audio and subtitle tracks is a multimedia container MKV file.

MKV is a newer encapsulation format that can encapsulate files of multiple formats and represents the direction in which the field is moving. The method provided by the invention can be applied to MKV files and therefore has wide applicability.

In one embodiment, obtaining the language information of each audio track and each subtitle track from the information header of the video file having multiple audio and subtitle tracks comprises:

Reading the Track information header of the multimedia container MKV file;

Reading, from the Track information header, the TrackEntry corresponding to each audio track and each subtitle track;

Reading the Language field in each TrackEntry;

Obtaining the language information of each audio track and each subtitle track according to the Language field.

Because an MKV file has a specific format, the language information of each audio track and each subtitle track can be identified quickly by locating the Track information header in the MKV file and reading the Language field in the TrackEntry corresponding to each audio track and subtitle track. One track of audio data and one track of subtitle data in the designated language can then be extracted easily and merged with the video data, so that the multi-stream video is converted into a single-stream video.

In one embodiment, when the video file having multiple audio and subtitle tracks is intended for mainland China, extracting the audio data and subtitle data of the designated language from the mixed-stored multi-track audio data and multi-track subtitle data comprises: extracting Mandarin (Standard Chinese) audio data and Simplified Chinese subtitle data from the mixed-stored multi-track audio data and multi-track subtitle data.

The method can determine the designated language according to the application scenario or audience that requires only one audio track and one subtitle track, so the choice of language is flexible. For example, when the audience is mainly in mainland China, Mandarin audio data and Simplified Chinese subtitle data can be extracted and merged with the video data, which greatly improves the viewing experience.

A device for extracting audio and subtitles according to language, comprising:

A video decapsulation module, configured to decapsulate a video file having multiple audio and subtitle tracks, obtain video data together with mixed-stored multi-track audio data and multi-track subtitle data, and output the data obtained;

A language information acquisition module, configured to obtain, according to the format of the video file having multiple audio and subtitle tracks, the language information of each audio track and each subtitle track from the information header of that video file, and output it;

An audio and subtitle identification module, configured to extract, according to the language information of each audio track and each subtitle track output by the language information acquisition module, the audio data and subtitle data of a designated language from the mixed-stored multi-track audio data and multi-track subtitle data output by the video decapsulation module, and output them;

A synthesis module, configured to receive the audio data and subtitle data of the designated language output by the audio and subtitle identification module and merge them with the video data output by the video decapsulation module.

In one embodiment, the video file having multiple audio and subtitle tracks is a multimedia container MKV file, and the language information acquisition module comprises:

An information header reading unit, configured to read the Track information header of the multimedia container MKV file and to read, from the Track information header, the TrackEntry corresponding to each audio track and each subtitle track;

A language information acquiring unit, configured to read the Language field in the TrackEntry of each audio track and each subtitle track obtained by the information header reading unit, obtain the language information of each audio track and each subtitle track according to the Language field, and output it.

Other features and advantages of the present invention will be set forth in the following description and will in part become apparent from the specification or be understood by practising the invention. The objects and other advantages of the invention can be realized and obtained through the structure particularly pointed out in the written specification, the claims and the accompanying drawings.

The technical solution of the present invention is described in further detail below with reference to the drawings and embodiments.

Brief description of the drawings

The accompanying drawings provide a further understanding of the present invention and form part of the specification. Together with the embodiments, they serve to explain the invention and do not limit it. In the drawings:

Fig. 1 is a flow chart of a method for extracting audio and subtitles according to language in an embodiment of the present invention;

Fig. 2 is a schematic diagram of the format of an MKV file;

Fig. 3 is a flow chart of the method for obtaining the language information of each audio track and each subtitle track;

Fig. 4 is a structural diagram of a device for extracting audio and subtitles according to language in an embodiment of the present invention;

Fig. 5 is a structural diagram of the language information acquisition module.

Detailed description of the embodiments

Preferred embodiments of the present invention are described below with reference to the accompanying drawings. It should be understood that the preferred embodiments described here only illustrate and explain the invention and are not intended to limit it.

Fig. 1 is a flow chart of a method for extracting audio and subtitles according to language in an embodiment of the present invention. As shown in Fig. 1, the method comprises the following steps:

Step S101: decapsulate the video file having multiple audio and subtitle tracks to obtain video data together with mixed-stored multi-track audio data and multi-track subtitle data;

Step S102: according to the format of the video file having multiple audio and subtitle tracks, obtain the language information of each audio track and each subtitle track from the information header of that video file;

Step S103: according to the language information of each audio track and each subtitle track, extract the audio data and subtitle data of the designated language from the mixed-stored multi-track audio data and multi-track subtitle data;

Step S104: merge the audio data and subtitle data of the designated language with the video data.

In the technical solution provided by this embodiment, one track of audio data and one track of subtitle data in the designated language are extracted according to the language information of each audio track and each subtitle track recorded in the information header of the video file, and are merged with the video data, so that the multi-stream video is converted into a single-stream video. In addition, the audio and subtitles of the merged single-stream video can be chosen to match the language of the audience, which greatly improves the viewing experience.
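For readers who want to try the idea with an off-the-shelf tool, the steps S101 to S104 can be approximated with FFmpeg's stream selection. This is a swapped-in technique and not the implementation described in the patent. The sketch below only builds the argument list; the function name and file names are hypothetical, and it assumes an `ffmpeg` binary whose `-map` option accepts metadata stream specifiers of the form `m:language:<code>`.

```python
def build_ffmpeg_args(src, dst, audio_lang, subtitle_lang):
    """Build an ffmpeg command that keeps the first video stream plus the
    audio and subtitle streams tagged with the designated language codes."""
    return [
        "ffmpeg",
        "-i", src,
        "-map", "0:v:0",                             # first video stream
        "-map", f"0:a:m:language:{audio_lang}",      # audio tagged with the language
        "-map", f"0:s:m:language:{subtitle_lang}",   # subtitles tagged with the language
        "-c", "copy",                                # remux without re-encoding
        dst,
    ]


# Hypothetical usage: pass the list to subprocess.run(args, check=True).
args = build_ffmpeg_args("input.mkv", "output.mkv", "chi", "chi")
```

Note that a metadata specifier selects every stream carrying the tag, so if a file has two audio tracks in the same language, both would be kept; a further index in the specifier or an explicit track choice would be needed to guarantee a single track.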

In one embodiment, the video file having multiple audio and subtitle tracks is a multimedia container MKV file.

MKV is a newer encapsulation format that can encapsulate files of multiple formats and represents the direction in which the field is moving. Fig. 2 is a schematic diagram of the format of an MKV file. An MKV file as a whole consists of an EBML Header and a Segment. The EBML Header contains information such as the version and document type of the file. The Segment holds the actual audio and video data of the media file and contains child elements such as Track and Clusters.
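The layout described above can be explored with a very small EBML reader. The following Python sketch is an illustration under stated assumptions, not the patent's implementation: it assumes every element has a known size, descends only into the Segment, and uses the element IDs published in the Matroska specification (EBML Header `0x1A45DFA3`, Segment `0x18538067`, Tracks `0x1654AE6B`, Cluster `0x1F43B675`). The function names are invented for the example.

```python
import io

EBML_HEADER, SEGMENT = 0x1A45DFA3, 0x18538067
NAMES = {
    EBML_HEADER: "EBML Header",
    SEGMENT: "Segment",
    0x1654AE6B: "Tracks",
    0x1F43B675: "Cluster",
}


def read_vint(stream, keep_marker):
    """Read one EBML variable-length integer; return None at end of data."""
    first = stream.read(1)
    if not first:
        return None
    byte, length, mask = first[0], 1, 0x80
    # The position of the first set bit gives the total length in bytes.
    while length <= 8 and not (byte & mask):
        mask >>= 1
        length += 1
    if length > 8:
        raise ValueError("invalid variable-length integer")
    rest = stream.read(length - 1)
    if len(rest) != length - 1:
        raise ValueError("truncated variable-length integer")
    # Element IDs keep the marker bit; sizes have it stripped.
    value = byte if keep_marker else byte & (mask - 1)
    for b in rest:
        value = (value << 8) | b
    return value


def walk(data, depth=0):
    """List (depth, name, size) for each element, descending into Segment only."""
    stream, found = io.BytesIO(data), []
    while True:
        element_id = read_vint(stream, keep_marker=True)
        if element_id is None:
            return found
        size = read_vint(stream, keep_marker=False)
        if size is None:
            raise ValueError("element without a size")
        body = stream.read(size)
        found.append((depth, NAMES.get(element_id, hex(element_id)), size))
        if element_id == SEGMENT:
            found.extend(walk(body, depth + 1))
```

Real files often contain very large Segments and may use the unknown-size encoding, so a practical reader would seek through the file rather than load element bodies into memory as this sketch does.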

In one embodiment, the method in step S102 for obtaining the language information of each audio track and each subtitle track from the information header of the video file having multiple audio and subtitle tracks comprises, as shown in Fig. 3, the following steps:

Step S301: read the Track information header of the multimedia container MKV file. The Track information header of an MKV file contains the basic information of the audio and video, such as the audio and video decoder types, the video resolution and the audio sample rate. Parsing the Track part yields this basic information.

Step S302: from the Track information header, read the TrackEntry corresponding to each audio track and each subtitle track. Each TrackEntry describes one track. The TrackNumber in a TrackEntry gives the number of the track that the TrackEntry describes; TrackType gives the type of the track, which may be audio, video, subtitle and so on.

Step S303: read the Language field in each TrackEntry. The Language field in a TrackEntry represents the language of the corresponding track. The language is given as a three-letter code taken from ISO 639-2.

Step S304: obtain the language information of each audio track and each subtitle track according to the Language field.
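Once the TrackEntry elements have been parsed, steps S302 to S304 reduce to a simple lookup. The sketch below assumes the entries are already available as dictionaries keyed by Matroska element names; that input shape and the function name are assumptions for illustration. The TrackType values (1 video, 2 audio, 17 subtitle) and the default of `eng` when the Language element is absent follow the Matroska specification rather than the patent text.

```python
# TrackType values as defined by the Matroska specification.
TRACK_TYPES = {1: "video", 2: "audio", 17: "subtitle"}

# Matroska treats a missing Language element as English.
DEFAULT_LANGUAGE = "eng"


def track_languages(track_entries):
    """Map TrackNumber -> (kind, language code) for audio and subtitle tracks."""
    result = {}
    for entry in track_entries:
        kind = TRACK_TYPES.get(entry["TrackType"], "other")
        if kind in ("audio", "subtitle"):
            language = entry.get("Language", DEFAULT_LANGUAGE)
            result[entry["TrackNumber"]] = (kind, language)
    return result


# Hypothetical parsed TrackEntry records for a file with two audio tracks.
entries = [
    {"TrackNumber": 1, "TrackType": 1},
    {"TrackNumber": 2, "TrackType": 2, "Language": "chi"},
    {"TrackNumber": 3, "TrackType": 2},
    {"TrackNumber": 4, "TrackType": 17, "Language": "chi"},
]
languages = track_languages(entries)
```

With these records, track 3 is reported as English because it carries no Language element, which is worth keeping in mind when a file appears to lack the designated language.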

In the technical solution provided by this embodiment, because an MKV file has a specific format, the language information of each audio track and each subtitle track can be identified quickly by locating the Track information header in the MKV file and reading the Language field in the TrackEntry corresponding to each audio track and subtitle track. One track of audio data and one track of subtitle data in the designated language can then be extracted easily and merged with the video data, so that the multi-stream video is converted into a single-stream video.

In one embodiment, when the video file having multiple audio and subtitle tracks is intended for mainland China, extracting the audio data and subtitle data of the designated language from the mixed-stored multi-track audio data and multi-track subtitle data in step S103 may be implemented as: extracting Mandarin (Standard Chinese) audio data and Simplified Chinese subtitle data from the mixed-stored multi-track audio data and multi-track subtitle data.

In the technical solution provided by this embodiment, the designated language can be determined according to the application scenario or audience that requires only one audio track and one subtitle track, so the choice of language is flexible. For example, when the audience is mainly in mainland China, Mandarin audio data and Simplified Chinese subtitle data can be extracted and merged with the video data, which greatly improves the viewing experience.
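Choosing the designated language from the target audience can be expressed as a lookup table. The sketch below is a minimal illustration: the audience key, the language labels and the track record shape are all hypothetical, and it assumes an earlier step has already resolved each track to a descriptive label. A three-letter ISO 639-2 code alone may not separate Simplified from Traditional Chinese, so a real system would likely need extra information such as the track name.

```python
# Hypothetical mapping from target audience to designated languages.
AUDIENCE_LANGUAGES = {
    "mainland_china": {"audio": "Mandarin", "subtitle": "Simplified Chinese"},
}


def designate(tracks, audience):
    """Return the first audio and subtitle track matching the audience's languages."""
    wanted = AUDIENCE_LANGUAGES[audience]
    chosen = {}
    for track in tracks:
        kind = track["kind"]
        if wanted.get(kind) == track["language"]:
            chosen.setdefault(kind, track)
    return chosen


# Hypothetical track list for a film released with several languages.
tracks = [
    {"number": 2, "kind": "audio", "language": "English"},
    {"number": 3, "kind": "audio", "language": "Mandarin"},
    {"number": 4, "kind": "subtitle", "language": "Traditional Chinese"},
    {"number": 5, "kind": "subtitle", "language": "Simplified Chinese"},
]
selection = designate(tracks, "mainland_china")
```

Supporting another audience would only require a new entry in the table, which is one way to read the flexibility the text describes.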

Corresponding to the method for extracting audio and subtitles according to language provided by the above embodiments, an embodiment of the present invention also provides a device for extracting audio and subtitles according to language which, as shown in Fig. 4, comprises:

A video decapsulation module 41, configured to decapsulate a video file having multiple audio and subtitle tracks, obtain video data together with mixed-stored multi-track audio data and multi-track subtitle data, and output the data obtained;

A language information acquisition module 42, configured to obtain, according to the format of the video file having multiple audio and subtitle tracks, the language information of each audio track and each subtitle track from the information header of that video file, and output it;

An audio and subtitle identification module 43, configured to extract, according to the language information of each audio track and each subtitle track output by the language information acquisition module 42, the audio data and subtitle data of the designated language from the mixed-stored multi-track audio data and multi-track subtitle data output by the video decapsulation module 41, and output them;

A synthesis module 44, configured to receive the audio data and subtitle data of the designated language output by the audio and subtitle identification module 43 and merge them with the video data output by the video decapsulation module 41.

In one embodiment, the video file having multiple audio and subtitle tracks is a multimedia container MKV file. In this case, as shown in Fig. 5, the language information acquisition module 42 comprises:

An information header reading unit 51, configured to read the Track information header of the multimedia container MKV file and to read, from the Track information header, the TrackEntry corresponding to each audio track and each subtitle track;

A language information acquiring unit 52, configured to read the Language field in the TrackEntry of each audio track and each subtitle track obtained by the information header reading unit 51, obtain the language information of each audio track and each subtitle track according to the Language field, and output it.

The device for extracting audio and subtitles according to language provided by this embodiment extracts one track of audio data and one track of subtitle data in the designated language according to the language information of each audio track and each subtitle track recorded in the information header of the video file, and merges them with the video data, so that the multi-stream video is converted into a single-stream video. In addition, the audio and subtitles of the merged single-stream video can be chosen to match the language of the audience, which greatly improves the viewing experience.

Those skilled in the art should understand that embodiments of the invention may be provided as a method, a system or a computer program product. Accordingly, the invention may take the form of an entirely hardware embodiment, an entirely software embodiment, or an embodiment combining software and hardware aspects. Furthermore, the invention may take the form of a computer program product implemented on one or more computer-usable storage media (including but not limited to magnetic disk storage and optical storage) containing computer-usable program code.

The invention is described with reference to flowcharts and/or block diagrams of the method, device (system) and computer program product according to embodiments of the invention. It should be understood that each flow and/or block in the flowcharts and/or block diagrams, and combinations of flows and/or blocks in them, can be implemented by computer program instructions. These computer program instructions may be provided to the processor of a general-purpose computer, a special-purpose computer, an embedded processor or another programmable data processing device to produce a machine, such that the instructions executed by the processor of the computer or other programmable data processing device produce means for implementing the functions specified in one or more flows of the flowcharts and/or one or more blocks of the block diagrams.

These computer program instructions may also be stored in a computer-readable memory capable of directing a computer or other programmable data processing device to work in a particular way, such that the instructions stored in the computer-readable memory produce an article of manufacture including instruction means that implement the functions specified in one or more flows of the flowcharts and/or one or more blocks of the block diagrams.

These computer program instructions may also be loaded onto a computer or other programmable data processing device, so that a series of operational steps is performed on the computer or other programmable device to produce a computer-implemented process. The instructions executed on the computer or other programmable device thus provide steps for implementing the functions specified in one or more flows of the flowcharts and/or one or more blocks of the block diagrams.

Obviously, those skilled in the art can make various changes and modifications to the invention without departing from its spirit and scope. If these changes and modifications fall within the scope of the claims of the invention and their technical equivalents, the invention is intended to include them as well.

Claims

1. A method for extracting audio and subtitles according to language, characterized by comprising:

2. The method for extracting audio and subtitles according to language as claimed in claim 1, characterized in that the video file having multiple audio and subtitle tracks is a multimedia container MKV file.

3. The method for extracting audio and subtitles according to language as claimed in claim 2, characterized in that obtaining the language information of each audio track and each subtitle track from the information header of the video file having multiple audio and subtitle tracks comprises:

Reading the Track information header of the multimedia container MKV file;

Reading the Language field in the TrackEntry;

4. The method for extracting audio and subtitles according to language as claimed in any one of claims 1 to 3, characterized in that, when the video file having multiple audio and subtitle tracks is intended for mainland China, extracting the audio data and subtitle data of the designated language from the mixed-stored multi-track audio data and multi-track subtitle data comprises: extracting Mandarin (Standard Chinese) audio data and Simplified Chinese subtitle data from the mixed-stored multi-track audio data and multi-track subtitle data.

5. A device for extracting audio and subtitles according to language, characterized by comprising:

6. The device for extracting audio and subtitles according to language as claimed in claim 5, characterized in that the video file having multiple audio and subtitle tracks is a multimedia container MKV file, and the language information acquisition module comprises: