CN102710983A

CN102710983A - Method for extracting audio and video data from multimedia

Info

Publication number: CN102710983A
Application number: CN2012101102817A
Authority: CN
Inventors: Chen Gang (陈刚)
Original assignee: HANGZHOU NO IMAGE TECHNOLOGY Co Ltd
Current assignee: Hebei Yitan Culture Communication Co ltd
Priority date: 2012-04-16
Filing date: 2012-04-16
Publication date: 2012-10-03
Anticipated expiration: 2032-04-16
Also published as: CN102710983B

Abstract

The invention discloses a method for extracting audio and video data from multimedia. The method mainly comprises: first, building a complete media playback chain with DirectShow; second, disconnecting the renderer at the end of the chain; third, inserting a grabber in front of that renderer and restoring the connection; fourth, extracting audio and video data through the inserted grabber; fifth, splitting and processing the extracted data; and sixth, delivering the processed data to the application layer. Audio and video data can thus be obtained during playback. The method supports a wide range of media formats and is applicable to the field of audio and video applications.

Description

Method for extracting audio and video from multimedia

Technical field

The present invention relates to the field of audio and video processing, and in particular to a method for extracting audio and video from multimedia.

Background art

In multimedia projects, audio and video data often have to be transmitted in a specific format, yet the data sources vary: some are media files in various formats (RMVB and so on), others are network media streams (MMS and so on). Building a separate chain for each kind of source is tedious and not generic.

There are two main kinds of media sources: files and network streams. Common file types include AVI, RMVB, MOV, FLV and MP4. Each file type corresponds to a different media container, which organizes audio and video data in various encodings and encapsulates them in a specific format. Common network stream types include Microsoft's MMS and RTSP, which also have their own data formats.

Faced with such varied media, the simplest way to extract the audio and video data they contain is to read them directly: for each container format, write a program that reads the file, parses the binary data, demultiplexes it, and finally outputs the audio and video data.

To simplify some of this work, the open-source community offers toolkits such as FFMPEG. FFMPEG is powerful and can convert files and media streams of different formats; it implements reading and writing of the raw file formats internally, freeing developers from heavy labor. However, it focuses on file conversion and is not intelligent enough: the file type must be identified manually, and a decoder and complex parameters must be specified before processing. In other words, each media type still needs its own parameters.

Microsoft proposed the DirectShow framework, in which every medium (file or network stream) is abstracted as a source filter (Filter). What the media have in common is extracted and the differences at the lower layers are hidden; filters are linked together to build a filter graph (Filter Graph), and once the graph is complete it can be played. There is no need to know the specific format or to specify parameters for opening the file, because matching happens automatically. Microsoft's media player is a successful example built on this technology.

There are two ways to build a graph. One is manual: every filter stage is created and connected by hand. This is complicated and not generic, and a chain that works on one machine may fail to connect on another. The other is intelligent connect, in which each medium is automatically matched to the best chain, but the connection process cannot be precisely controlled.

For a player application, intelligent connect is good enough, because it automatically builds all the filters for reading, demultiplexing, decoding and rendering the file. For applications that extract media data, however, it falls short. The traditional approach is manual construction, as in MPC's media chain technology: one or more optimal chains are configured for each media format. The benefit is that chains can be built quickly and playback can be controlled precisely, but the approach is too complicated: Baofeng alone publishes hundreds of chains, and maintaining and managing them is difficult for an ordinary application.

As the discussion above shows, handling every media format manually is the most primitive approach; the workload is huge and it is clearly undesirable. Converting with FFMPEG requires tedious parameter settings, with different parameters for each format, so it is very limited for extracting media data. Building DirectShow chains manually also involves a large workload, with a different configuration for each format. Intelligent connect emphasizes full automation, so manual control is very weak: it can only play files, not extract data.

Summary of the invention

The present invention mainly solves the technical problems of the prior art, in which manually building chains to extract audio and video data involves a heavy workload, is slow and not generic, and makes it difficult to obtain the required data. It provides a method for extracting audio and video from multimedia that builds chains automatically and is more generic.

The present invention solves the above technical problems mainly through the following technical solution: a method for extracting audio and video from multimedia, comprising the following steps:

Step 1: build a complete media playback chain;

Step 2: search the chain to find the most downstream renderer, find its corresponding pin, and call the Disconnect method to manually disconnect the renderer;

Step 3: insert several custom filters into the gap in the chain, then call the Connect method to reconnect the disconnected renderer;

Step 4: extract audio and video data through the inserted filters;

Step 5: split the extracted data into multiple output paths, then process the data on each path with different encoding parameters;

Step 6: deliver the processed data to the application layer.
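To make steps 2 and 3 concrete, the sketch below models a single-branch playback chain as an ordered list of filter names and splices custom filters in front of the last renderer. It is a hypothetical, self-contained illustration, not DirectShow code: the `Chain` type, the `InsertBeforeRenderer` function and the rule that a renderer's name contains "Renderer" are all assumptions made for the example. In a real filter graph the splice is done by disconnecting and reconnecting pins, which the list insertion stands in for.

```cpp
#include <algorithm>
#include <cassert>
#include <iterator>
#include <string>
#include <vector>

// Hypothetical model: one playback branch as an ordered list of filter names,
// source first, renderer last. This is not the DirectShow API.
using Chain = std::vector<std::string>;

// Steps 2 and 3: find the most downstream renderer, then splice the custom
// filters into the gap in front of it. Returns false if no renderer exists.
bool InsertBeforeRenderer(Chain& chain, const Chain& custom) {
    // Search from the tail, since the renderer sits at the end of the chain.
    auto it = std::find_if(chain.rbegin(), chain.rend(), [](const std::string& f) {
        return f.find("Renderer") != std::string::npos;  // naming rule assumed
    });
    if (it == chain.rend()) return false;
    // Turn the reverse iterator into a forward iterator pointing at the renderer.
    auto pos = std::prev(it.base());
    // "Disconnect, insert, reconnect" collapses into one insertion in this model.
    chain.insert(pos, custom.begin(), custom.end());
    return true;
}
```

For example, splicing `{"SampleGrabber"}` into `{"Source", "Splitter", "AudioDecoder", "AudioRenderer"}` yields a chain in which the grabber sits directly before `AudioRenderer`, so the renderer still receives the data.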

DirectShow is Microsoft's multimedia framework library; it provides a wide range of functions and is supported by Windows systems.

The filter graph (Filter Graph) is a component of the Microsoft DirectShow library used to manage filter components; it includes interfaces such as Render and Run for operating the filters.

A filter (Filter) is a component of the Microsoft DirectShow library; it takes effect only when it is placed under the management of the filter graph and connected correctly.

A pin (Pin) is the connection point of a filter; it is used for connecting filters manually and provides interfaces such as Connect and Disconnect.

Preferably, in step 1, the media playback chain is built as follows: using DirectShow intelligent connect, call the Render method of the Filter Graph Manager to obtain a complete chain graph.

Preferably, in step 3, the inserted filters include a grabber.

This solution uses the DirectShow media framework and is implemented with intelligent rendering and dynamic insertion, adapting automatically to various media files and MMS network streams.

The DirectShow filter graph manager (Graph) provides a Render method that configures files and network streams automatically and chains the filters (Filter) together into a complete playback chain.

When building a chain manually, one first enumerates the pins (Pin) of the filters and then uses the pins' Connect and Disconnect methods to link the filters together; this is the manual method. Note that the most downstream filter is the renderer, whose job is to call the physical-layer interface of the sound card or video card to play the final RGB or YUV video data or PCM audio data.
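The Connect and Disconnect behavior described above can be illustrated with a toy pin model. The `Pin` struct and the `Connect`/`Disconnect` free functions below are hypothetical and are not the DirectShow `IPin` interface; in this simplified model a single `Disconnect` call frees both ends of a connection. The model shows why the renderer has to be disconnected before a grabber can take its place: an output pin that is already connected refuses a second connection.

```cpp
#include <cassert>

// Hypothetical pin model; not the DirectShow IPin interface.
struct Pin {
    explicit Pin(bool output) : isOutput(output) {}
    bool isOutput;
    Pin* peer = nullptr;  // pin on the other side of the connection, if any
};

// Connect an output pin to an input pin; both must currently be free.
bool Connect(Pin& out, Pin& in) {
    if (!out.isOutput || in.isOutput) return false;                // wrong direction
    if (out.peer != nullptr || in.peer != nullptr) return false;   // already in use
    out.peer = &in;
    in.peer = &out;
    return true;
}

// Break a connection from either side; in this model both pins become free.
void Disconnect(Pin& p) {
    if (p.peer != nullptr) {
        p.peer->peer = nullptr;
        p.peer = nullptr;
    }
}
```

For example, once `Connect(decOut, rendIn)` succeeds, `Connect(decOut, grabIn)` fails until `Disconnect(rendIn)` is called; after that the upstream filter can feed the grabber and the grabber can feed the renderer.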

The present invention combines the advantages of the schemes described in the background art: it uses automatic matching and can extract audio and video data without any parameters being specified.

The advantages of the present invention are that it supports a wide range of media formats and avoids the complexity of existing schemes, handling ever-changing formats with one unchanging approach and extracting data simply and effectively. New formats are also well supported, so the method is extensible. The idea can also be applied to other scenarios, such as movie screenshots, virtual video, virtual audio, and video beautification effects.

Description of drawings

Fig. 1 is a flow chart of the present invention;

Fig. 2 is a playback chain graph built automatically by the present invention;

Fig. 3 is the chain graph of Fig. 2 after the grabbers have been inserted.

In the figures: 1, source filter; 2, front-end filter; 3, audio/video splitter; 4, audio decoder; 5, audio filter; 6, audio renderer; 7, video decoder; 8, video filter; 9, video renderer; 10, audio grabber; 11, video grabber.
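Using the reference numerals above, the search for the most downstream renderers can be sketched as a breadth-first walk over the graph of Fig. 2. This is a hypothetical model: the graph is an adjacency map from each filter number to the filters it feeds, and any filter with nothing downstream is treated as a renderer. A real implementation would enumerate filters and pins through the DirectShow interfaces instead.

```cpp
#include <cassert>
#include <map>
#include <queue>
#include <set>
#include <vector>

// Hypothetical adjacency model: filter number -> filters fed by its outputs.
using Graph = std::map<int, std::vector<int>>;

// Breadth-first walk from the source filter; a filter with no downstream
// connection is taken to be a terminal renderer.
std::vector<int> FindTerminalRenderers(const Graph& g, int source) {
    std::vector<int> renderers;
    std::set<int> seen{source};
    std::queue<int> todo;
    todo.push(source);
    while (!todo.empty()) {
        int f = todo.front();
        todo.pop();
        auto it = g.find(f);
        if (it == g.end() || it->second.empty()) {
            renderers.push_back(f);  // nothing downstream: treat as renderer
            continue;
        }
        for (int next : it->second)
            if (seen.insert(next).second) todo.push(next);  // visit each filter once
    }
    return renderers;
}
```

With the Fig. 2 topology (1→2→3, 3→4 and 3→7, 4→5→6, 7→8→9), the walk returns audio renderer 6 and video renderer 9, the two points in front of which grabbers 10 and 11 are inserted.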

Embodiment

The technical solution of the present invention is described in further detail below through an embodiment, with reference to the accompanying drawings.

Embodiment: the method of this embodiment for extracting audio and video data from multimedia, shown in Fig. 1, is as follows:

1. Build a complete media playback chain: as long as the file can be played normally, DirectShow intelligent connect can be used, and calling the Render method of the Filter Graph Manager directly yields a complete chain graph (see Fig. 2). The multimedia data passes through source filter 1 and front-end filter 2 in turn and is separated by audio/video splitter 3 into an audio signal and a video signal. The audio signal is decoded by audio decoder 4, filtered by audio filter 5, and then rendered and played by audio renderer 6; the video signal is decoded by video decoder 7, filtered by video filter 8, and then rendered and played by video renderer 9. Front-end filter 2, audio filter 5 and video filter 8 each denote a set of filters rather than a single filter, so that all kinds of media files and network media can be supported.

2. Dynamic insertion: this step is the key. Search the chain to find the most downstream renderer, find its corresponding pin (Pin), and call the Disconnect method to manually disconnect the renderer; insert several custom filters (including a grabber) in between, then call the Connect method to reconnect the renderer. The complete chain is now restored (see Fig. 3). Specifically, this has two parts. First, the connection between audio filter 5 and audio renderer 6 is broken and audio grabber 10 is inserted; the output of audio filter 5 is then connected to the input of audio grabber 10, and audio grabber 10 is connected to audio renderer 6. Second, the connection between video filter 8 and video renderer 9 is broken and video grabber 11 is inserted; the output of video filter 8 is then connected to the input of video grabber 11, and video grabber 11 is connected to video renderer 9. Video grabber 11 and audio grabber 10 are capture filters. After the chain is restored the file still plays normally, but during playback the data is continuously extracted by the grabbers inserted in the middle.

3. Re-encoding: the extracted data is split into multiple output paths, the data on each path is processed with different encoding parameters, and the result is finally delivered to the application layer.
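Step 3 of the embodiment (splitting the extracted data and processing each path with its own parameters) can be sketched as a simple fan-out. The example is hypothetical: real encoders are replaced by two trivial PCM format conversions, 16-bit little-endian and 8-bit unsigned, so that the routing logic stays visible. The `Branch`, `FanOut`, `ToPcm16le` and `ToPcm8` names are illustrative only and do not come from the original.

```cpp
#include <cassert>
#include <cstdint>
#include <functional>
#include <string>
#include <vector>

using Pcm16 = std::vector<std::int16_t>;
using Bytes = std::vector<std::uint8_t>;

// One hypothetical output path: a name plus the processing applied to it.
struct Branch {
    std::string name;
    std::function<Bytes(const Pcm16&)> encode;
};

// Fan-out: every path receives the same extracted samples and applies its own
// parameters independently of the others.
std::vector<Bytes> FanOut(const Pcm16& in, const std::vector<Branch>& branches) {
    std::vector<Bytes> outputs;
    for (const auto& b : branches) outputs.push_back(b.encode(in));
    return outputs;
}

// Parameter set 1 (stand-in encoder): 16-bit little-endian PCM.
Bytes ToPcm16le(const Pcm16& in) {
    Bytes out;
    for (std::int16_t s : in) {
        auto u = static_cast<std::uint16_t>(s);
        out.push_back(static_cast<std::uint8_t>(u & 0xFF));  // low byte first
        out.push_back(static_cast<std::uint8_t>(u >> 8));
    }
    return out;
}

// Parameter set 2 (stand-in encoder): 8-bit unsigned PCM, offset by 128.
Bytes ToPcm8(const Pcm16& in) {
    Bytes out;
    for (std::int16_t s : in) out.push_back(static_cast<std::uint8_t>((s >> 8) + 128));
    return out;
}
```

Adding another output path means appending another `Branch`; the extraction side does not change.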

Expressed as C++-style pseudocode, the procedure is as follows:

// Initialization
// Assumes the Windows SDK; qedit.h (ISampleGrabber, CLSID_SampleGrabber,
// CLSID_NullRenderer) is deprecated and may be missing from recent SDKs.
#include <windows.h>
#include <tchar.h>
#include <atlbase.h>
#include <dshow.h>
#include <qedit.h>

HRESULT hr = S_OK;
CComPtr<IGraphBuilder> m_pGB;
hr = m_pGB.CoCreateInstance(CLSID_FilterGraph);

hr = m_pGB->RenderFile(L"c:", NULL);  // the file can be in any format

// At this point a complete playback chain has been built; calling
// IMediaControl::Run() now would simply open a video playback window.
// The goal here is not a player, however, so the method continues below.

// AddFilterByCLSID, ConnectFilters and ReplaceRenderFilter are helper
// functions, not DirectShow APIs; their bodies are not shown in the original.

// Add the ACM Wrapper filter
IBaseFilter *pACMWrapper = NULL;
hr = AddFilterByCLSID(m_pGB, CLSID_ACMWrapper, _T("ACMWrapper"), &pACMWrapper);

// Add the converter filter (CLSID_AVConverter is not defined in the original)
IBaseFilter *pConverter = NULL;
hr = AddFilterByCLSID(m_pGB, CLSID_AVConverter, _T("Converter"), &pConverter);

// Add the grabber
CComPtr<ISampleGrabber> m_pAudioGrabber;
hr = m_pAudioGrabber.CoCreateInstance(CLSID_SampleGrabber);
CComQIPtr<IBaseFilter, &IID_IBaseFilter> pGrabBase(m_pAudioGrabber);
hr = m_pGB->AddFilter(pGrabBase, L"Audio Grabber");

// A playback window is inappropriate during data capture, so a null renderer
// is used instead; this way no playback window pops up.
IBaseFilter *pNullRender = NULL;
hr = AddFilterByCLSID(m_pGB, CLSID_NullRenderer, _T("NullRender"), &pNullRender);

// Dynamic insertion
// Replace the original audio renderer with pACMWrapper; the last parameter
// selects audio (TRUE) or video, and the disconnection is handled inside.
hr = ReplaceRenderFilter(m_pGB, pACMWrapper, TRUE);

// pACMWrapper -> pConverter
hr = ConnectFilters(m_pGB, pACMWrapper, pConverter);
// pConverter -> pGrabBase
hr = ConnectFilters(m_pGB, pConverter, pGrabBase);
// pGrabBase -> pNullRender
hr = ConnectFilters(m_pGB, pGrabBase, pNullRender);

// Chain construction is complete; start running it
CComQIPtr<IMediaControl, &IID_IMediaControl> pControl(m_pGB);
hr = pControl->Run();

// From here on, the grabber continuously receives the raw PCM audio data
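The grabber's role in the code above, handing each buffer to the application while passing it on unchanged so that the chain keeps running, can be modeled as follows. `Grabber`, `Renderer` and the `onSample` callback are hypothetical stand-ins; with the real Sample Grabber filter the application receives data through a callback interface or through buffered samples instead.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <functional>
#include <vector>

using Buffer = std::vector<std::uint8_t>;

// Hypothetical pass-through grabber: it hands each buffer to a callback (the
// extraction point) and forwards it unchanged, so playback continues.
struct Grabber {
    std::function<void(const Buffer&)> onSample;
    Buffer Process(const Buffer& in) const {
        if (onSample) onSample(in);  // give the application its copy
        return in;                   // downstream renderer sees the same data
    }
};

// Hypothetical renderer stand-in that only counts the bytes it "plays".
struct Renderer {
    std::size_t played = 0;
    void Render(const Buffer& b) { played += b.size(); }
};
```

Feeding three two-byte buffers through the grabber into the renderer leaves six bytes captured by the callback and six bytes played, which is the "still plays normally, but the data is extracted" behavior described in the embodiment.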

This embodiment handles most media types with one unchanging approach, which is especially effective for rapid development. Unsupported file formats can also be added: installing the corresponding source filter in the operating system and registering it in the Windows registry is enough to support the format, without modifying a single line of the original program code.
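The extensibility claim rests on the program never naming a format itself; it only asks the system which source filter handles a given file. The sketch below imitates that lookup with an in-memory table keyed by file extension. It is a hypothetical stand-in for the Windows registry, not the registry layout DirectShow actually uses; `SourceFilterRegistry` and the filter names are invented for illustration.

```cpp
#include <algorithm>
#include <cassert>
#include <cctype>
#include <map>
#include <string>

// Hypothetical stand-in for the registry: file extension -> source filter name.
class SourceFilterRegistry {
public:
    // "Installing" a source filter: add one entry, no other code changes.
    void Register(const std::string& ext, const std::string& filter) {
        table_[Lower(ext)] = filter;
    }
    // Returns the registered source filter, or an empty string if unsupported.
    std::string Lookup(const std::string& path) const {
        auto dot = path.rfind('.');
        if (dot == std::string::npos) return "";
        auto it = table_.find(Lower(path.substr(dot)));
        return it == table_.end() ? "" : it->second;
    }

private:
    // Extensions are matched case-insensitively.
    static std::string Lower(std::string s) {
        std::transform(s.begin(), s.end(), s.begin(),
                       [](unsigned char c) { return static_cast<char>(std::tolower(c)); });
        return s;
    }
    std::map<std::string, std::string> table_;
};
```

Registering `".xyz"` makes `Lookup("movie.xyz")` succeed while `Lookup` itself stays unchanged, mirroring the statement that the original program code need not be modified.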

The present invention also supports data output in different formats: for example, data from a video file can be extracted in RGB format and, at the same time, in YUV format, as multiple outputs, which is very flexible. This is achieved by performing color space conversion internally after the data has been extracted.
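The color space conversion mentioned above can be illustrated with the common integer approximation of the BT.601 RGB-to-YUV transform (studio range, Y in 16..235). The text does not say which matrix is used, so BT.601 is an assumption here. The sketch also assumes packed 8-bit pixels in R, G, B byte order; Windows RGB24 buffers are usually stored as B, G, R, so the indices would need swapping for real samples. `RgbToYuv` and `Rgb24ToYuv444` are illustrative names.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

struct Yuv { std::uint8_t y, u, v; };

// BT.601 studio-range integer approximation for one 8-bit RGB pixel.
Yuv RgbToYuv(int r, int g, int b) {
    int y = ((66 * r + 129 * g + 25 * b + 128) >> 8) + 16;
    int u = ((-38 * r - 74 * g + 112 * b + 128) >> 8) + 128;
    int v = ((112 * r - 94 * g - 18 * b + 128) >> 8) + 128;
    return {static_cast<std::uint8_t>(y), static_cast<std::uint8_t>(u),
            static_cast<std::uint8_t>(v)};
}

// Convert a packed frame (R, G, B per pixel, order assumed) into a separate
// packed YUV 4:4:4 buffer; the RGB frame is left untouched, so both formats
// can be delivered at the same time.
std::vector<std::uint8_t> Rgb24ToYuv444(const std::vector<std::uint8_t>& rgb) {
    std::vector<std::uint8_t> yuv;
    yuv.reserve(rgb.size());
    for (std::size_t i = 0; i + 2 < rgb.size(); i += 3) {
        Yuv p = RgbToYuv(rgb[i], rgb[i + 1], rgb[i + 2]);
        yuv.insert(yuv.end(), {p.y, p.u, p.v});
    }
    return yuv;
}
```

With these coefficients, white maps to (235, 128, 128), black to (16, 128, 128), and pure red to (82, 90, 240).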

The specific embodiment described herein merely illustrates the spirit of the present invention. Those skilled in the art may make various modifications or additions to the described embodiment, or substitute it in similar ways, without departing from the spirit of the present invention or exceeding the scope defined by the appended claims.

Although terms such as chain, renderer and filter are used extensively herein, the possibility of using other terms is not excluded. These terms are used only to describe and explain the essence of the present invention more conveniently; construing them as any additional limitation would be contrary to the spirit of the present invention.

Claims

1. A method for extracting audio and video from multimedia based on the DirectShow framework, characterized by comprising the following steps:

Step 1: build a complete media playback chain in the filter graph;

Step 4: extract audio and video data through the inserted filters, then run the filter graph;

Step 6: deliver the processed data to the application layer.

2. The method for extracting audio and video from multimedia according to claim 1, characterized in that, in step 1, the media playback chain is built as follows: using DirectShow intelligent connect, call the Render method of the Filter Graph Manager to obtain a complete chain graph.

3. The method for extracting audio and video from multimedia according to claim 1 or 2, characterized in that, in step 3, the inserted filters include a grabber.