Summary of the invention
Technical problem: the purpose of this invention is to provide an implementation method for a videophone system between network TV set-top boxes. The platform is a high-end embedded development platform with an ARM and DSP dual core. Considering the heavy load produced when the DSP carried on the set-top box performs both audio and video encoding and decoding alone, the present invention uses software audio codecs on the ARM and hardware video codecs on the DSP, greatly relieving the load on the DSP so that the CPU usage of the ARM and the CPU usage of the DSP reach a good balance.
Technical scheme: the implementation method for a videophone system between network TV set-top boxes of the present invention uses the DM6446 DVEVM development kit released by TI as the hardware platform. Audio and video data are captured separately. Video data are encoded and decoded with the H.264 codec carried on the DSP, while audio adopts a software codec, directly calling the G.711 algorithm and handled by the ARM. Network transmission of the audio and video data uses UDP as the transport-layer protocol, with RTP packing performed at the application layer.
The implementation method for the videophone system between network TV set-top boxes is as follows:
The implementation method uses the DM6446 DVEVM development kit released by TI as the hardware platform. Audio and video data are captured separately. Video data adopt the high-performance H.264 video codec carried on the digital signal processor (DSP), while audio adopts a software codec, directly calling the G.711 speech compression algorithm and handled by the ARM microprocessor. Network transmission of the audio and video data uses UDP as the transport-layer protocol, with Real-time Transport Protocol (RTP) packing performed at the application layer. The concrete implementation method is as follows:
Step 1). Perform requirements analysis: analyze the videophone system between network set-top boxes, and design the division of modules and the functions required of them;
Step 2). For each functional module designed in step 1, become familiar with the interaction flow among the modules, and describe the logical relations and functions of each module;
Step 3). According to the function description of step 2, first design and realize the user-centered human-machine interface, adopting MiniGUI interface programming under the Linux system. After the videophone system runs, a visual interface pops up, containing an IP-address input button, a settings button and a close button; clicking one of these buttons pops up the corresponding dialog box for simple and direct user operation;
Step 4). Using the DVEVM development kit carried by the hardware platform, audio and video data can be captured, and audio and video can also be encoded and decoded through the DSP. Audio supports the G.711 speech compression codec; video supports two codecs, the MPEG-4 multimedia coding standard and the high-performance H.264 video codec. Considering the heavy load on the DSP, in this system audio does not use the codec engine carried on the chip; instead the codec algorithm code is added directly to the program and handled by the ARM microprocessor, while video uses the on-chip codec engine. Resource occupation of the microprocessor and the DSP is thereby balanced;
Step 5). After each module is designed, the operation of the system is mainly the interaction and execution of multiple threads. After the videophone system runs, the network-listening callback function and the control thread run first. The control thread is mainly responsible for the user interface, constantly checking whether the remote control has a command input. The user inputs the other party's IP address to request a connection; after an accept message is returned, the audio thread, play thread, video thread, capture thread, display thread and network transmission thread run, carrying out the normal two-way interaction of audio and video;
Step 6). Network transmission of the streaming media data is an indispensable step in the videophone system. Considering the real-time transmission required for audio and video: although the Transmission Control Protocol offers high reliability, the delay caused by its three-way handshake and excessive interaction data makes it unsuitable for sending large amounts of real-time video data. In this case, the Real-time Transport Protocol, designed specifically for sending large amounts of multimedia data such as audio and video, is selected. RTP consists of two parts, a data protocol and a control protocol; RTP usually uses user datagrams to carry data, while the control protocol supports the functions of the protocol. An open-source RTP library may be used.
Beneficial effect: the method of this invention proposes developing a videophone system on a network TV set-top box, and in the concrete realization adopts ARM software codecs for audio and DSP codecs for video, mainly to solve the problem of high DSP load. Using this method not only extends the value-added service functions of the network TV set-top box, but also alleviates to a certain extent the inefficiency and instability of using the DSP alone for audio and video coding. The method also considers human-machine interaction, adopting the MiniGUI system for a humanized interface design. The low-coupling modular design of the videophone system gives good extensibility. Details are given below:
High stability: the videophone system of this method is complete in function and applies its key technologies reasonably. It breaks away from designs that use only the DSP for audio and video coding, and instead completes the system through the joint cooperation of the ARM and the DSP, so that CPU usage is balanced and stable between the ARM and the DSP, achieving the design purpose.
Efficient real-time performance: for real-time transmission of the audio and video data streams, according to the characteristics of multimedia streams, which demand high real-time performance and small delay and can tolerate moderate packet loss, the RTP protocol, designed for sending large amounts of real-time multimedia data such as audio and video, is adopted; its data are sent in UDP form with control functions added. Compared with the delay produced by the TCP three-way handshake, selecting the RTP protocol greatly improves real-time performance.
Good extensibility: the method adopts a low-coupling modular design. Its functional modules consist of six parts: the audio capture/playback module, audio codec module, video capture/display module, video codec module, system control module and network transmission module. The layering between the modules of the system is clear and each module provides a communication interface, so each functional module is easy to upgrade. To change the audio or video codec algorithm, only the corresponding audio codec module or video codec module need be modified. The system can also easily be upgraded from the network set-top box to a PC videophone system.
Humanized interface: for the operation of the human-machine interface, the MiniGUI system is introduced to provide interface programming for the user, such as dialog boxes and windows for IP address input, calling and image display, making the system easier for the user to operate.
Embodiment
1. Architecture:
The present invention is research on video telephony carried out on a P2P IPTV terminal. Its functional modules consist of six parts: the audio capture/playback module, audio codec module, video capture/display module, video codec module, system control module and network transmission module, as shown in Figure 1.
Each module is introduced concretely below:
Audio capture/playback module: the audio capture and playback module is an important component of the embedded video system. This module completes functions such as capture and playback of the audio signal. It is mainly composed of the low-power stereo codec chip TLV320AIC33 produced by TI. This chip has multiple programmable input ports and multiple output ports. With its register-based power control module, system power consumption is only 14 mW when playing in the 48 kHz DAC loop. Such extremely low power consumption makes it particularly suitable for embedded system applications.
The AIC33 input side offers digitally controlled stereo microphone preamplification, automatic gain control, mixing of multiple input channels, and other strong functions. The output side has 4 single-ended outputs and 3 differential outputs. Its DAC and ADC support multiple sampling frequencies between 8 kHz and 96 kHz. The AIC33 uses multiple supply voltages: analog voltage 2.7 V-3.6 V, digital core voltage 1.525 V-1.95 V, digital I/O voltage 1.1 V-3.6 V. The AIC33 connects to the audio serial port (ASP) of the DSP, operating in full-duplex serial communication.
Audio codec module: voice communication is a basic function of the videophone. Constrained by network conditions, videophones usually operate at low bit rates. To adapt to such low-bit-rate voice applications, the ITU-T has released a series of audio and speech compression standards, among which G.711, G.723.1, G.728, G.729 and G.729A have obtained wide use in video telephony.
G.711, also referred to as PCM (pulse code modulation), is a speech compression standard specified by the International Telecommunication Union, mainly used in telephony. It samples the audio with pulse code modulation at a sample rate of 8 kHz and transmits the voice signal over an uncompressed 64 kbps channel. The compression ratio is 1:2, i.e. 16-bit samples are compressed into 8-bit data. G.711 is the mainstream waveform speech codec.
Video capture/display module: for video capture, after the system powers on, the TMS320DM6446 initializes the pulse signal generator (CXD2457R) through the SPI interface. After initialization completes, the CCD controller of the TMS320DM6446 produces row and field drive signals for the pulse signal generator, and the pulse signal generator produces the CCD timing control signals and the sampling timing signal of the A/D conversion chip. The raw image data captured by the CCD are delivered to the A/D conversion chip, which outputs a 10-bit Bayer-pattern raw data signal to the CCD controller of the TMS320DM6446 for processing. The CCD controller mainly produces suitable row and field clock signals and performs processing such as digital clamping and black-level compensation on the original image, and the processed image is delivered to the DDR2 memory. The DSP fetches the raw data from the DDR2 memory and performs median filtering and noise filtering, CFA interpolation, and RGB-to-YUV conversion algorithms, outputting a digital video signal with a resolution of 1024 x 768 in YUV (4:2:2) format.
The digital video display system mainly consists of the video back-end processing subsystem of the DM6446, the CPLD device EPM570, the LCD screen LQ057Q3DC12, and the backlight circuit of the LCD. The DM6446 chip adopts an ARM and DSP dual-core architecture: the ARM subsystem carries an ARM926 core with a 297 MHz dominant frequency, the DSP part adopts a C64x+ DSP core at 594 MHz, and the video processing subsystem (VPSS) has rich video pre-processing and post-processing functions; the featured function unit VICP is a dedicated media coprocessor, and the peripheral storage supports DDR2, Flash, ATA, CF, SD and other peripheral interfaces. Because the digital video output pin voltage of the DM6446 is 1.8 V, it must be converted into 3.3 V through the EPM570 before being connected to the corresponding 3.3 V pins of the LCD screen.
Video codec module: video encoder circuit design: the VPBE module inside the DM6446 contains four 54 MHz D/A converters, so digital video signals can be converted into analog video signals inside the DM6446 with four outputs, supporting the three analog video formats CVBS, S-Video and YPrPb. The video encoder design is therefore comparatively simple: the four analog output signals only need to be amplified before being connected directly to the monitoring equipment. The voltage-feedback CMOS operational amplifier OPA357 from TI is selected to perform the amplification.
Video decoder design: the dedicated video decoder ADV7189B is selected here. It supports 12 analog video channels and contains three 12-bit, 54 MHz A/D converters with noise-control performance. It supports analog video input in the three formats CVBS, S-Video and YPrPb, can automatically detect the NTSC/PAL/SECAM standard, and outputs a digital video signal conforming to the ITU-R BT.656 standard. Three of the twelve analog channels are used, multiplexed to support the three analog video formats. The ADV7189B outputs a 10-bit digital video signal with an independent vertical sync signal VD, horizontal sync signal HD and pixel clock LLC1 at 3.3 V levels; these are converted into the 1.8 V required by the DM6446 through an FPGA, and then fed into the DSP via the dedicated digital video signal interface of the DM6446 VPFE module. Before compression encoding, the VPFE module converts the ITU-R BT.656 video data into the YUV 4:2:0 format compatible with H.264 and deposits it in the DDR2 SDRAM. The VPFE module also supports pre-processing operations on the video data such as white balance and scaling. An ADG3301 realizes the level conversion of the I2C bus.
System control module: mainly guarantees the normal establishment and release of the videophone connection and provides information control during the videophone session, such as master/slave determination between terminals, capability exchange, and the opening and closing of logical channels. For application in a practical environment, H.245 is adopted as the control protocol, as the ITU-T does in the WCDMA/TD-SCDMA circuit-domain videophone service.
Network transmission module: mainly handles network transmission after the audio and video are encoded, generally adopting the RTP/UDP/IP protocol stack. This audio/video transmission platform adopts UDP as the transport-layer protocol and performs RTP packing at the application layer. Before data are sent to the network, video is compressed through the H.264 encoding of the DSP and audio is compressed through G.711 encoding, which benefits the transmission of audio and video through the network.
2. Method flow:
The flow of the videophone system between network set-top boxes is shown in Figure 6. First the user inputs the other party's IP address in the user interface and initiates a connection request; after the other party agrees, the two sides connect. Afterwards both sides' operating steps are similar: initialize the video device, capture video data through the camera, and encode with the H.264 codec carried on the DSP; initialize the audio device, capture audio data through the audio capture device, and encode with the G.711 algorithm. The audio and video data packets are then sent to the other party over the network; after the other party receives the data, the audio and video data are decoded respectively, the audio is played through the loudspeaker, and the video is displayed through the television set.
The implementation of the videophone system of this invention can generally be summed up in four realizations: the interaction among the threads, everything related to audio, everything related to video, and the realization of network transmission. The realization of each part is introduced in detail below.
Interaction among the threads: the present invention relies on the MontaVista Linux embedded operating system, on which the written program is composed of multiple threads. First the main thread performs some initialization work; the user inputs the other party's IP address; after the connection request meets with a response, the main thread begins to create the capture thread, display thread, video thread, audio encoding thread and audio decoding thread. When thread creation finishes, it begins to call the function ctrlThrFxn() of the control thread, at which point the main thread is converted into the control thread.
To guarantee the stable execution of each thread, a priority is set for each thread. Under the predetermined SCHED_FIFO scheduling policy, the video thread and the display thread enjoy the highest priority, followed by the capture thread, then the audio encoding thread and audio decoding thread, with the control thread at the lowest priority.
After the program runs, each thread is created accordingly. The control thread is mainly responsible for the user interface: it uses the msp430lib library to monitor the msp430 processor controlling the IR interface and checks whether an IR command has been input. Once a new IR command keyed in from the remote control is received, the command is identified and the responding action is executed in keyAction. On the video side, the capture thread obtains an empty raw buffer from the video thread and fills it with the captured data after the overlapped portion has been removed; the buffer is then sent to the video thread, which uses the VIDENC_process() call to have the DSP perform H.264 encoding; the captured buffer is then written into the I/O buffer and sent to the other party. When the other party's encoded video data are received, they are written into a raw buffer, and the video thread uses VIDDEC_process() to have the DSP perform H.264 decoding. To display the decoded frame, the function FifoUtil_put() passes a pointer to the raw buffer to the display thread, which then uses the VPSS resizer module and the Rszcopy_execute() function to copy the raw buffer into the FBDev frame buffer device for display. On the audio side, the audio encoding thread obtains audio data by calling the Read() function, writes it into an allocated raw buffer, directly calls the G.711 encoding code, writes the result into another allocated buffer, and sends it to the other party. The audio decoding thread receives the audio data sent by the other party, writes it into a raw buffer, directly calls the G.711 decoding code, writes the result into another allocated buffer, and calls the Write() function to play the voice. When the user needs to end the call, only the mute key need be pressed on the remote control; the control thread then responds and stops the conversation for both sides. The thread interaction is shown in Figure 4.
Everything related to audio: a microphone for audio capture needs to be connected to the network TV set-top box. OSS is the sound card and sound device driver framework provided on many Unix (or Unix-compatible) operating systems; the AIC33 sound device driver is one such OSS device driver and is used for the capture of audio data. Considering the heavy load produced if both audio and video adopted the DSP codecs, the present invention adopts software audio codecs: the G.711 algorithm code is added to the audio thread and called directly, avoiding use of the DSP. The concrete method flow is shown in Figure 2, with the following steps:
(1) First use the InitSoundDevice() function to initialize the AIC33 device driver.
(2) Allocate a buffer for the original stereo sampled data. Because this buffer does not involve the DSP (the stereo-to-mono conversion is realized by the ARM), the allocated buffer is not required to be contiguous, so the malloc() function is used here.
(3) Call the Read() function to capture audio data; because the AIC33 device only supports stereo, the stereo sampled data read from the two channels are put into the buffer.
(4) Call the stereoToMono() function to convert the two-channel stereo data into mono.
(5) In the audio thread, call the G.711 encoding function (g711a_Encode()) to encode the audio data.
After the encoding function is called and the audio data encoded, the data are sent to the other party through network transmission. The other party likewise first initializes the AIC33 device driver, allocates buffers, and so on; it decodes the audio data, then calls the Write() function to write them to the buffer, and they are played out through the AIC33 device.
Everything related to video: first a camera for video capture needs to be connected to the network TV set-top box. The video processing front end (VPFE) is responsible for receiving and processing the original video stream signal from the peripheral (camera), and the CCD controller (CCDC) in the video processing front end is specifically responsible for the collection of video data. The driver interface managing video capture devices in the Linux kernel is V4L2 (Video for Linux Two). After the captured data are obtained, the DSP converts the video data format from RGB to YUV, outputting a digital video signal of 1024 x 768 in YUV (4:2:2) format. The mmap mechanism (mapping device memory into the application address space) maps the device memory of kernel space into the address space of user space, so that the process can access the data conveniently.
The video program first opens the communication buffer between itself and the capture program through the FifoUtil_open() function, and calls the FifoUtil_get() and FifoUtil_put() functions as the data communication channel between the video thread and the capture thread. The encoding steps of the video thread are as follows:
(1) Use the Engine_open() of Codec Engine to create the video encoding algorithm engine and return a handle; every module thread that uses the same Engine needs its own independent handle, ensuring thread safety;
(2) Use videoEncodeAlgCreate() to create the encoding algorithm, using the static parameters in VIDENC_create() to create the "H.264" video encoder;
(3) Use the Memory_contigAlloc() function to allocate a contiguous memory space for the encoding buffer and the original video data buffer;
(4) Use the VIDENC_process() function to call the H.264 algorithm to encode the data.
The encoded video data are sent to the other party through network transmission, and the other party's video thread decodes the data with the following steps:
(1) Use the Engine_open() of Codec Engine to create the video decoding algorithm engine and return a handle; every module thread that uses the same Engine needs its own independent handle, ensuring thread safety;
(2) Use videoDecodeAlgCreate() to create the decoding algorithm, which includes: a. using the static parameters in VIDDEC_create() to create the "H.264dec" video decoder; b. using VIDDEC_control() with XDM_GETSTATUS to set the dynamic video decoding parameters and query the codec buffer sizes;
(3) Use the Memory_contigAlloc() function to allocate a contiguous memory space for the decoding buffer;
(4) Use the VIDDEC_process() function to call the H.264 algorithm to decode the data.
For the decoded video frame, the FifoUtil_put() function passes it to the display thread; the display thread receives the decoded video frame through the FifoUtil_get() function, then uses the initDisplayDevice() function to initialize the FBDev (Frame Buffer Device) display device driver. The frame buffer device FBDev is used to access the video input and output hardware; it is an abstract representation of the video hardware, so that the application program is separated from the low-level interface details. The display thread selects the frame buffer device /dev/fb/3 for video display and playback. The video processing back end (VPBE) realizes functions such as display and output of the video stream signal. The flow is shown in Figure 5.
Realization of network transmission: according to the characteristics of the videophone itself, real-time streaming multimedia data must be transmitted, so the present invention selects the RTP protocol, designed specifically for sending large amounts of multimedia data such as audio and video. The captured audio and video data are written to a FIFO buffer queue after compression, packed as RTP, formed into an RTP stream and sent into the network (the SendPacket() function of the JRTPLIB library can be selected for this); after the other party receives a packet, it is sorted according to the information in the RTP header, sent into a buffer, and waits for decoding.
When handling RTP packets, a buffering technique is adopted to process the data. The data are decomposed into multiple RTP packets for network transmission; because of the dynamic changes of the network, the transmission path and arrival time at the receiving end may differ for every packet, so buffering is adopted to remedy the influence of delay and variation. After the receiving end unpacks a received RTP packet, it puts the payload into the buffer, rearranges the data according to the sequence numbers in the RTP headers, and sends them into the decoding buffer for real-time decoding.