CN103220543B - Real-time three-dimensional (3D) video communication system based on Kinect and implementation method thereof


Info

Publication number
CN103220543B
CN103220543B (application CN201310146580.0A)
Authority
CN
China
Prior art keywords
video
audio
information
data
module
Prior art date
Legal status
Expired - Fee Related
Application number
CN201310146580.0A
Other languages
Chinese (zh)
Other versions
CN103220543A (en)
Inventor
张冬冬
刘典
叶晨
王昕
薛敏峰
臧笛
Current Assignee
Tongji University
Original Assignee
Tongji University
Priority date
Filing date
Publication date
Application filed by Tongji University
Priority to CN201310146580.0A
Publication of CN103220543A
Application granted
Publication of CN103220543B
Status: Expired - Fee Related
Anticipated expiration

Landscapes

  • Two-Way Televisions, Distribution Of Moving Picture Or The Like (AREA)
  • Compression Or Coding Systems Of Tv Signals (AREA)

Abstract

The invention discloses a real-time three-dimensional (3D) video communication system based on Kinect and an implementation method thereof. A KINECT camera is used to synchronously collect color video information, depth video information and audio information, and filtering optimization is applied to the depth information. x264 is used to compress the two video streams, and an MP3 (Moving Picture Experts Group Audio Layer III) encoder is used to compress the audio. Multithreading is used to acquire the three streams synchronously, and the Real-time Transport Protocol (RTP) is used for network transmission. After receiving the three streams, the system decodes the two video streams to obtain the color video information and depth video information, reconstructs a virtual-viewpoint scene, and obtains left-eye and right-eye scene videos matching human binocular parallax. According to the binocular 3D imaging principle, the two videos are synthesized into a 3D video scene, which is synchronized with the audio information and played with DirectX, forming a 3D video conferencing system.

Description

Real-time 3D video communication system based on KINECT and implementation method thereof
Technical field
The present invention relates to the field of 3D video communication, and specifically to a real-time 3D video communication system based on a KINECT camera and a method thereof.
Background art
Current video communication systems, such as the video chat functions of QQ, MSN, Gtalk and Skype or video conferencing systems, mainly transmit two-dimensional video. Although two-dimensional video communication can provide high-definition video quality, it lacks three-dimensional depth information and therefore cannot give the user an immersive communication experience.
With the popularity of 3D films and 3D television programs, people are no longer satisfied with traditional two-dimensional flat video and have turned their attention to the more striking three-dimensional video. Three-dimensional video introduces parallax information, enabling the viewer to perceive depth and producing an immersive sensation. 3D video imaging requires two video streams, one for the left-eye viewpoint and one for the right-eye viewpoint. If the original left-eye and right-eye videos are compressed and transmitted, the volume of video data to be transmitted increases greatly; if ordinary cameras are used instead, depth information must be obtained by stereo disparity matching between the two viewpoints, and such a time-consuming matching process cannot meet the requirements of real-time 3D video communication. Therefore, the prior art demands high bandwidth for 3D video communication and can hardly achieve real-time operation.
Summary of the invention
To overcome the inability of existing video communication systems based on ordinary cameras to present a three-dimensional video scene in real time, the invention provides a real-time 3D video communication system and method based on KINECT, with good real-time performance and a good 3D effect.
The technical scheme of the system of the present invention is characterized as follows:
A real-time 3D video communication system based on KINECT, characterized in that the system adopts a common C/S architecture; the terminals are connected by a network, and each terminal is a peer node that can act both as an audio/video sender and as an audio/video receiver. Each terminal is organized into functional modules, comprising a KINECT data acquisition and preprocessing module, an audio/video encoding module, a network transmission module, an audio/video decoding module, a depth information optimization module, a three-dimensional reconstruction module and an audio/video playback module, wherein:
The KINECT data acquisition and preprocessing module comprises a color video acquisition and preprocessing module, a depth video acquisition and preprocessing module and an audio acquisition module; multithreading is used to acquire the three streams synchronously, obtaining color video information, depth video information and audio information in real time;
The color video information collected by the color video acquisition and preprocessing module is in UYVY format and is converted to YUV420 format. The depth video information collected by the depth video acquisition and preprocessing module consists of 16-bit grayscale images, from which the raw depth values are obtained by a shift operation. The raw depth information is preprocessed as follows: first, the background is processed by threshold filtering, with all objects beyond the farthest perception distance of the KINECT uniformly set to background; second, the depth values are normalized to obtain an 8-bit grayscale image. Because there is a horizontal displacement between the camera with which the KINECT obtains depth information and the camera with which it obtains color video, color video and depth video capture must satisfy the following relationship so that the two can be aligned correctly: T(C(i+1)) > T(D(i)) > T(C(i)), where C(i) denotes the i-th captured color frame, D(i) denotes the i-th captured depth frame, and T denotes the capture time. The audio acquisition module is the KINECT microphone array, and audio is captured in single-channel 16-bit PCM format.
The audio/video encoding module compresses the color video information and the depth video information separately using x264 (H.264 encoding) and encodes the audio using an MP3 audio encoder, thereby reducing the redundancy of the transmitted audio/video data and solving the problem that the audio/video data volume is too large for the available network bandwidth.
The network transmission module comprises an audio/video stream sending module and an audio/video stream receiving module and handles end-to-end audio/video data transmission. The Real-time Transport Protocol (RTP) is adopted: as soon as encoded audio/video data are obtained, the sending end transmits them synchronously in real time and the receiving end receives them. Specifically:
As the sending end, the audio/video stream sending module sends three data streams: depth map data, color map data and audio stream data. To distinguish the streams, a prefix identifier is added to each stream before transmission: '0' and '1' for color map data, '1' and '0' for depth map data, and 'a' and 'b' for audio stream data. When the size of a packet exceeds 63 KB, it is fragmented, and an end-of-frame packet is sent after the last fragment;
Audio is sampled at 16 kHz. After the threads are started, the encoding and transmission of the captured audio stream are interleaved between the capture, encoding and transmission of each video frame; each audio encoding step processes the audio data collected during the previous step, and the encoded audio stream data is split into packets and sent;
As the receiving end, the audio/video stream receiving module receives packets on the specified port, identifies the type of each packet from its prefix identifier and unpacks it;
The audio/video decoding module, after receiving the network audio/video bitstreams, applies H.264 decoding to the two received video bitstreams, while the audio stream is passed directly to the audio/video playback module for decoding and playback.
The depth information optimization module performs hole filling, smoothing and other optimization on the decoded depth video information at the receiving end: first, hole filling and smoothing are applied to the decoded low-resolution depth image; second, linear interpolation is used to generate from the low-resolution depth image a high-resolution depth image with the same resolution as the color image; finally, the high-resolution depth image is filtered to obtain the depth map used for reconstruction.
The three-dimensional reconstruction module reconstructs the virtual-viewpoint scene: from the received color video and the optimized depth video it reconstructs either a left-eye scene video matching human binocular parallax (with the received color video serving as the right-eye scene video) or a right-eye scene video (with the received color video serving as the left-eye scene video), and then synthesizes the left-eye or right-eye scene video with the received color video to obtain the 3D video scene. The specific implementation is: an offset matrix is obtained during initialization, followed in sequence by offset processing, occlusion judgment, hole filling, smoothing and 3D synthesis, finally generating the 3D image.
The audio/video playback module synchronously plays the synthesized 3D video scene and the audio information so that participants can see the 3D effect. Specifically, DirectX is used as the playback tool: the video display part uses DirectDraw to play the video, and the audio part uses DirectSound to play the audio.
The technical scheme of the method of the present invention is characterized as follows:
A real-time 3D video communication method based on KINECT, characterized by the following steps:
(1) A KINECT camera and multithreading are used to obtain color video information, depth video information and audio information synchronously in real time;
(2) The color video information and depth video information are preprocessed: the color video information in YUV420 format is obtained by format conversion; 8-bit grayscale depth information at a resolution lower than that of the color video is obtained by preprocessing steps such as background processing, normalization and threshold filtering; and the deviation caused by the horizontal displacement between the two cameras that capture depth information and color video is corrected;
(3) x264 is used to compress the two video streams, and an MP3 encoder is used to compress the audio;
(4) The Real-time Transport Protocol (RTP) is adopted, and a multithreading scheme is used to process the color video stream and the depth video stream; after the threads are started, the encoding and transmission of the captured audio stream are interleaved between the capture, encoding and transmission of each video frame; each audio encoding step processes the audio data collected during the previous step, and the encoded audio stream is packed and sent, thereby achieving synchronized transmission of the audio/video streams;
(5) At the receiving end the system unpacks the packets and, after obtaining the three streams, decodes the two video streams to obtain the color video information and depth video information;
(6) Hole filling and smoothing are applied to the decoded low-resolution depth image, and linear interpolation is then used to obtain depth information with the same resolution as the color video;
(7) The received color video and the optimized depth video are used to reconstruct a left-eye or right-eye scene video matching human binocular parallax;
(8) The left-eye or right-eye scene video is then synthesized with the received color video to obtain the 3D video scene, which is synchronized with the audio information and played with DirectX, forming a 3D video conferencing system; participants wearing 3D glasses see the 3D effect and experience an instant 'face-to-face' exchange. To overcome holes, burrs and similar artifacts in the depth images obtained by the KINECT, caused by illumination and occlusion, the method exploits the fact that hole filling and smoothing are easier on a low-resolution depth image: the depth video resolution is set lower than the color video resolution when the sending end captures video, and when the receiving end performs depth optimization, hole filling and smoothing are applied first on the low-resolution depth video, after which linear interpolation yields a depth video with the same resolution as the color video. This approach improves both the speed and the quality of depth image preprocessing and optimization, and also improves the multi-stream video coding efficiency and transmission speed of the whole system, specifically:
A) The complexity of depth image preprocessing at the sending end is reduced.
B) The sending end uses a depth image of lower resolution than the color video, which improves the efficiency and speed of multi-stream video encoding and decoding.
C) The compressed size of the depth image data is reduced, which benefits real-time transmission of multiple video streams.
D) Crucially, optimization such as hole filling after decoding at the receiving end is carried out on the low-resolution depth image, which is more conducive to filling holes; upsampling the optimized image then yields a high-resolution depth image matching the color video, so the invention obtains a good depth map optimization effect. The video system of the invention can be widely applied in fields such as enterprise meetings, business negotiation, telemedicine, and distance education and training.
Brief description of the drawings
Fig. 1 shows the physical composition of the embodiment system.
Fig. 2 is the system framework diagram of each terminal.
Fig. 3 is the flow chart of the system of the present invention.
Fig. 4 is the system hardware architecture diagram of each terminal.
Fig. 5 is the KINECT data acquisition flow chart.
Fig. 6 is the audio acquisition flow chart.
Fig. 7 is the video encoding flow chart.
Fig. 8 is the audio encoding flow chart.
Fig. 9 is the video decoding flow chart.
Fig. 10 shows the relationship between RTP and various network protocols.
Fig. 11 is the sending-end transmission flow chart.
Fig. 12 is the audio/video synchronized transmission flow chart of the audio/video stream sending module.
Fig. 13 is the audio stream packing flow chart.
Fig. 14 is the video stream packing flow chart.
Fig. 15 is the capture, encoding and transmission flow chart of the three data streams.
Fig. 16 is the receiving flow chart of the audio/video stream receiving module.
Fig. 17 is the depth information optimization flow chart.
Fig. 18 is the reconstruction flow chart.
Detailed description of the embodiments
The technical solution of the present invention is further described below with reference to the embodiments and the accompanying drawings.
The present embodiment consists of a notebook computer, a KINECT, a TP-LINK router, an EPCM-505C development board and 3D glasses; the physical composition of the system is shown in Fig. 1.
The principle of the whole system is as follows: as shown in Fig. 2 and Fig. 3, the system is built around the EPCM-505C development board and uses a Microsoft KINECT camera to implement a 3D video conferencing system. The color information, depth information and audio information collected by the KINECT are encoded and transmitted; after decoding, three-dimensional reconstruction is performed on the color and depth information, which is then played in synchronization with the audio, restoring the 3D video conference scene.
The system hardware architecture is shown in Fig. 4. The design takes full advantage of the high performance of the EPCM-505C to connect the peripherals and process the audio/video data. The KINECT collects the audio/video data, a CF card holds the operating system, an external display plays the 3D video conference scene, a loudspeaker plays the conference audio, and a router handles network transmission. The KINECT camera is a motion-sensing camera released by Microsoft. It consists of a base and a sensor, with a motorized mechanism between them that adjusts the sensor's pitch angle. The sensor contains one color camera, one infrared projector, one infrared camera and one microphone array. The color camera collects RGB data and the infrared camera collects depth data; the color camera supports imaging at up to 1280x960 resolution and the infrared camera at up to 640x480. The KINECT can thus obtain not only color video information but also, through its infrared camera, real-time depth video information, which provides excellent conditions for real-time three-dimensional reconstruction; in addition, the microphone array inside the KINECT camera collects the audio information. The KINECT therefore serves very well as the acquisition device of the video conferencing system of this embodiment.
1. KINECT data acquisition and preprocessing module
In the system, the KINECT serves as the acquisition device for color video information, depth video information and audio information. The acquisition flow is shown in Fig. 5.
(1) The video acquisition and preprocessing module, which in turn comprises a color video acquisition and format conversion module and a depth video acquisition and preprocessing module:
The color video collected by the KINECT camera is in UYVY format at a frame rate of 15 frames/second and a resolution of 640x480. Because x264 compression requires YUV420 input, the UYVY color video collected by the KINECT must be converted to YUV420 format. The depth video consists of 16-bit grayscale images at a frame rate of 30 frames/second and a resolution of 320x240. Within each 16-bit value, the low 3 bits identify the user ID and the high 13 bits carry the depth data, so the raw depth value is obtained by a shift operation.
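As an illustrative sketch (not part of the original disclosure), the format conversion can be written as follows in C++; the function name and the choice to keep only the chroma of even rows when subsampling 4:2:2 to 4:2:0 are assumptions:

```cpp
#include <cstdint>

// Convert one packed UYVY (4:2:2) frame to planar I420/YUV420.
// UYVY stores pixel pairs as U0 Y0 V0 Y1; I420 stores a full Y plane
// followed by quarter-size U and V planes.
void uyvy_to_i420(const uint8_t* uyvy, int width, int height,
                  uint8_t* y, uint8_t* u, uint8_t* v)
{
    for (int row = 0; row < height; ++row) {
        const uint8_t* src = uyvy + row * width * 2;   // 2 bytes per pixel
        for (int col = 0; col < width; col += 2) {
            uint8_t u0 = src[0], y0 = src[1], v0 = src[2], y1 = src[3];
            y[row * width + col]     = y0;
            y[row * width + col + 1] = y1;
            if ((row & 1) == 0) {                      // keep chroma of even rows
                int ci = (row / 2) * (width / 2) + col / 2;
                u[ci] = u0;
                v[ci] = v0;
            }
            src += 4;
        }
    }
}
```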
Because the KINECT infrared camera is affected by illumination and by its recognition range, the raw depth video stream contains holes and discontinuities, so the depth images must be processed to obtain continuous depth images. To meet the real-time transmission requirement, the present invention performs only simple preprocessing, such as threshold filtering, on the collected depth information at the sending end, and leaves the computationally heavier hole-filling work to the depth information optimization module at the receiving end. This arrangement takes full advantage of the fact that the decoding frame rate at the receiving end is much higher than the encoding frame rate at the sending end: placing the computationally heavy depth optimization at the decoding end improves the real-time communication capability of the system.
The depth image preprocessing consists of: threshold filtering of the depth map, where, since the KINECT recognition accuracy lies between 1.2 and 3.5 meters, all objects beyond 3.5 meters are uniformly set to background; normalization of the depth values to obtain an 8-bit grayscale image; and correction of the depth information according to the horizontal displacement between the two cameras with which the KINECT obtains depth information and color video, so that the depth image corresponds to the color video viewpoint. For this purpose, color video and depth video capture must satisfy the following relationship:
T(C(i+1)) > T(D(i)) > T(C(i)), where C(i) denotes the i-th captured color frame, D(i) denotes the i-th captured depth frame, and T denotes the capture time.
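A minimal C++ sketch of the sending-end depth preprocessing described above (not part of the original disclosure): the shift discards the 3 user-ID bits, the threshold filter sets out-of-range pixels to background, and the values are normalized to 8 bits. The millimeter scaling and the background value 0 are assumptions:

```cpp
#include <cstdint>

void preprocess_depth(const uint16_t* raw, uint8_t* out, int n_pixels)
{
    const int kMinMm = 1200;   // KINECT recognition range: 1.2 m ...
    const int kMaxMm = 3500;   // ... to 3.5 m; beyond becomes background
    for (int i = 0; i < n_pixels; ++i) {
        int depth_mm = raw[i] >> 3;   // shift off the low 3 user-ID bits
        if (depth_mm < kMinMm || depth_mm > kMaxMm) {
            out[i] = 0;               // threshold filter: set to background
        } else {
            // normalize [kMinMm, kMaxMm] to the 8-bit range [1, 255]
            out[i] = static_cast<uint8_t>(
                1 + (depth_mm - kMinMm) * 254 / (kMaxMm - kMinMm));
        }
    }
}
```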
(2) Audio information acquisition module
The KINECT camera has a microphone array with which audio is captured, in single-channel 16-bit PCM format at a sampling frequency of 16 kHz. Capture uses DMO (DirectX Media Object), a data-processing COM component provided by Microsoft; in the present invention the application uses the DMO directly. The audio acquisition flow is shown in Fig. 6.
2. Audio/video encoding module
(1) Video encoding
The system uses x264 for video compression. x264 is free video encoding software released under the GPL; it is an open-source implementation of the H.264 coding format, and its main function is H.264/MPEG-4 AVC video encoding.
The functionality of x264 is divided into two layers: the video coding layer (VCL) and the network abstraction layer (NAL). A packet-based interface is defined between the VCL and the NAL; packetization and the corresponding signaling belong to the NAL. In this way, efficient coding and network adaptability are handled by the VCL and the NAL respectively. The VCL data, output after the encoding process, represent the compressed and encoded video data sequence; before the VCL data are transmitted or stored, they are first mapped to, or encapsulated in, NAL units. The video encoding flow is shown in Fig. 7.
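By way of illustration, a minimal sketch of encoding one YUV420 frame with the x264 library follows (not part of the original disclosure); the preset, tune and profile choices are assumptions suited to the low-latency requirement, since the patent does not specify encoder parameters:

```cpp
#include <cstdint>
#include <cstring>
extern "C" {
#include <x264.h>
}

x264_t* open_encoder(int width, int height, int fps)
{
    x264_param_t param;
    x264_param_default_preset(&param, "ultrafast", "zerolatency");
    param.i_width   = width;
    param.i_height  = height;
    param.i_fps_num = fps;
    param.i_fps_den = 1;
    param.i_csp     = X264_CSP_I420;   // the converted YUV420 input
    x264_param_apply_profile(&param, "baseline");
    return x264_encoder_open(&param);
}

// Encode one picture; copies the resulting NAL units (which already carry
// start codes) into 'out' and returns the total number of bytes.
int encode_frame(x264_t* enc, x264_picture_t* pic, uint8_t* out)
{
    x264_picture_t pic_out;
    x264_nal_t* nals;
    int n_nals = 0;
    if (x264_encoder_encode(enc, &nals, &n_nals, pic, &pic_out) < 0)
        return -1;
    int off = 0;
    for (int i = 0; i < n_nals; ++i) {
        memcpy(out + off, nals[i].p_payload, nals[i].i_payload);
        off += nals[i].i_payload;
    }
    return off;
}
```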
(2) Audio encoding
After the audio data are collected, an MP3 encoder is used to encode them.
MP3 is an audio compression technology whose full name is Moving Picture Experts Group Audio Layer III. MP3 exploits the insensitivity of the human ear to high-frequency sound signals: the time-domain waveform is transformed into a frequency-domain signal and divided into multiple frequency bands, and different compression ratios are applied to different bands, with stronger compression (even discarding the signal) at high frequencies and lighter compression at low frequencies. In other words, the data in the pulse-code-modulated (PCM) audio that are unimportant to human hearing are discarded, compressing the audio at a ratio of 1:10 or even 1:12, while for most users the playback quality is not noticeably degraded compared with the original uncompressed audio. The audio encoding flow is shown in Fig. 8.
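The patent specifies MP3 encoding of the 16 kHz mono 16-bit PCM capture but does not name a particular encoder library; the sketch below uses LAME as one plausible choice, and the bitrate is an assumption:

```cpp
#include <lame/lame.h>

lame_t open_mp3_encoder()
{
    lame_t gf = lame_init();
    lame_set_in_samplerate(gf, 16000);  // KINECT capture rate: 16 kHz
    lame_set_num_channels(gf, 1);       // single-channel PCM
    lame_set_mode(gf, MONO);
    lame_set_brate(gf, 32);             // assumed bitrate in kbit/s
    lame_init_params(gf);
    return gf;
}

// Encode one block of mono samples; returns the number of MP3 bytes written.
int encode_pcm(lame_t gf, const short* pcm, int n_samples,
               unsigned char* mp3buf, int mp3buf_size)
{
    // For mono input, LAME takes the same buffer for both channels.
    return lame_encode_buffer(gf, pcm, pcm, n_samples, mp3buf, mp3buf_size);
}
```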
3. Network transmission module
The module comprises an audio/video stream sending module and an audio/video stream receiving module. The system adopts the Real-time Transport Protocol (RTP): as soon as encoded audio/video data are obtained, the sending end transmits them synchronously in real time, and the receiving end receives the data.
RTP is a network protocol for handling multimedia data streams on the Internet; it enables real-time transmission of streaming media data in one-to-one (unicast) or one-to-many (multicast) network environments. RTP usually uses UDP to transport the multimedia data, but other protocols such as TCP or ATM can be used if necessary. The relationship between RTP and various network protocols is shown in Fig. 10.
Applications usually run RTP on top of UDP to make use of its multiplexing and checksum services; the two protocols together provide the functionality of a transport-layer protocol. However, RTP can also be used with other suitable underlying network or transport protocols, and if the underlying network provides multicast, RTP can use it to transmit data to multiple destinations.
The sending-end transmission flow is shown in Fig. 11. When an RTP packet is sent, the payload type, marker and timestamp increment all take default values; 'data' is the data to send and 'len' is its length. Three data streams must be sent in total: depth map data, color map data and audio stream data. To distinguish the streams, a prefix identifier is added to each stream before transmission: '0' and '1' for color map data, '1' and '0' for depth map data, and 'a' and 'b' for audio stream data. When the size of a packet exceeds 63 KB, it is fragmented, and an end-of-frame packet is sent after the last fragment.
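The patent does not name the RTP implementation; the use of default payload type, marker and timestamp increment with a plain SendPacket(data, len) call matches the SetDefault* facilities of jrtplib, which is therefore assumed in the following sketch. Ports, payload type and timestamp unit are likewise assumptions:

```cpp
#include <algorithm>
#include <cstdint>
#include <vector>
#include <jrtplib3/rtpsession.h>
#include <jrtplib3/rtpsessionparams.h>
#include <jrtplib3/rtpudpv4transmitter.h>
#include <jrtplib3/rtpipv4address.h>

using namespace jrtplib;

bool open_rtp_session(RTPSession& sess, uint32_t dest_ip, uint16_t dest_port)
{
    RTPSessionParams sessparams;
    sessparams.SetOwnTimestampUnit(1.0 / 30.0);     // 30 fps depth stream
    RTPUDPv4TransmissionParams transparams;
    transparams.SetPortbase(8000);                  // assumed local port base
    if (sess.Create(sessparams, &transparams) < 0)
        return false;
    sess.AddDestination(RTPIPv4Address(dest_ip, dest_port));
    // Defaults used by every subsequent SendPacket(data, len) call:
    sess.SetDefaultPayloadType(96);
    sess.SetDefaultMark(false);
    sess.SetDefaultTimestampIncrement(3000);
    return true;
}

// Prepend the two-byte stream identifier ('0''1' color, '1''0' depth,
// 'a''b' audio) and send one packet.
int send_with_prefix(RTPSession& sess, char id0, char id1,
                     const uint8_t* data, size_t len)
{
    std::vector<uint8_t> buf(len + 2);
    buf[0] = static_cast<uint8_t>(id0);
    buf[1] = static_cast<uint8_t>(id1);
    std::copy(data, data + len, buf.begin() + 2);
    return sess.SendPacket(buf.data(), buf.size());
}
```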
The audio/video stream sending module achieves synchronized audio/video transmission. To achieve synchronization, the system makes full use of CPU resources and adopts a multithreading scheme for the color video stream and the depth video stream. Since the audio stream carries far less data than the video streams, and in order to reduce the CPU's thread-switching overhead, the system initializes the audio capture device DMO and opens the audio capture interface before the video stream threads are started, sampling at 16 kHz from then on. After the threads are started, the encoding and transmission of the captured audio stream are interleaved between the capture, encoding and transmission of each video frame, thereby simulating a multithreading effect; the specific implementation is shown in the flow chart of Fig. 12. Each audio encoding step processes the audio data collected during the previous step, and the encoded audio stream is packed and sent.
The audio/video stream sending module packs the audio/video streams for transmission (a packing sketch covering both stream types is given after item (2) below):
(1) Audio stream: each audio encoding step processes the audio data collected during the previous step; that is, if the audio capture rate is v bytes/s and the previous step took t seconds, each MP3 encoding step processes a data stream of v*t bytes, so the size of each encoded audio block depends on how long the preceding steps took. Through investigation, the system found that each audio frame after MP3 encoding is 144 bytes and that fidelity is best when each packet carries 5 audio frames. The encoded audio stream data must therefore be split into packets before sending; the audio stream packing flow is shown in Fig. 13.
(2) Video stream: since an RTP packet can carry at most 64 KB, the system fragments any data stream larger than 64 KB. If the encoded size of a frame is size bytes, it is divided into int(size/(64*1024))+1 packets, where the last packet carries size%(64*1024) bytes and every other packet carries 64 KB; in addition, a two-byte identifier is added to each packet. If the encoded data are smaller than 64 KB, they are sent directly after the identifier is added. The video stream packing flow is shown in Fig. 14.
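A minimal C++ sketch of the packing rules of Fig. 13 and Fig. 14 (not part of the original disclosure); the send callback stands in for the RTP send call and is a placeholder:

```cpp
#include <cstdint>
#include <cstring>
#include <vector>

using SendFn = void (*)(const std::vector<uint8_t>& packet);

// Video: split an encoded frame into int(size/(64*1024))+1 packets, each
// carrying the two-byte stream identifier, exactly as described above.
void send_video_frame(const uint8_t* data, size_t size,
                      char id0, char id1, SendFn send)
{
    const size_t kMax = 64 * 1024;
    size_t n_packets = size / kMax + 1;            // int(size/(64*1024)) + 1
    for (size_t i = 0; i < n_packets; ++i) {
        size_t off   = i * kMax;
        size_t chunk = (i + 1 == n_packets) ? size % kMax : kMax;
        std::vector<uint8_t> pkt(chunk + 2);
        pkt[0] = static_cast<uint8_t>(id0);        // two-byte identifier
        pkt[1] = static_cast<uint8_t>(id1);
        memcpy(pkt.data() + 2, data + off, chunk);
        send(pkt);
    }
}

// Audio: group the 144-byte MP3 frames into packets of 5 frames each.
void send_audio_block(const uint8_t* mp3, size_t n_frames, SendFn send)
{
    const size_t kFrame = 144;                     // bytes per MP3 frame
    for (size_t f = 0; f < n_frames; f += 5) {
        size_t frames = (n_frames - f < 5) ? n_frames - f : 5;
        std::vector<uint8_t> pkt(frames * kFrame + 2);
        pkt[0] = 'a';                              // audio stream identifier
        pkt[1] = 'b';
        memcpy(pkt.data() + 2, mp3 + f * kFrame, frames * kFrame);
        send(pkt);
    }
}
```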
The complete capture, encoding and transmission flow of the three data streams is shown in Fig. 15.
The audio/video stream receiving module
The receiving flow is shown in Fig. 16: the receiving end receives packets on the specified port, identifies the type of each packet from its prefix identifier and then performs the subsequent audio/video decoding. From the packet length obtained by the audio/video stream receiving module, the two bytes of the prefix identifier must be removed; in this embodiment the actual payload length is therefore the returned value minus 2.
4. Audio/video decoding module
(1) Video decoding
Because encoding uses x264, decoding uses an H.264 decoder. FFmpeg includes the libavformat and libavcodec libraries, of which libavcodec handles video stream decoding. In the video decoding process, the FFmpeg library is used, with the AVCodec structure controlling the whole decoding process. The video decoding flow is shown in Fig. 9.
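As an illustrative sketch (not part of the original disclosure), H.264 decoding with libavcodec can look as follows; the current send/receive packet API is used here, whereas the embodiment predates it, so the exact calls are assumptions while the flow matches Fig. 9:

```cpp
#include <cstdint>
extern "C" {
#include <libavcodec/avcodec.h>
}

AVCodecContext* open_h264_decoder()
{
    const AVCodec* codec = avcodec_find_decoder(AV_CODEC_ID_H264);
    AVCodecContext* ctx = avcodec_alloc_context3(codec);
    if (avcodec_open2(ctx, codec, nullptr) < 0)
        return nullptr;
    return ctx;
}

// Feed one received bitstream packet; returns true when a decoded YUV420
// frame (color or depth stream) has been produced in 'frame' (allocated
// by the caller with av_frame_alloc).
bool decode_packet(AVCodecContext* ctx, const uint8_t* data, int size,
                   AVFrame* frame)
{
    AVPacket* pkt = av_packet_alloc();
    pkt->data = const_cast<uint8_t*>(data);
    pkt->size = size;
    bool got = false;
    if (avcodec_send_packet(ctx, pkt) == 0)
        got = (avcodec_receive_frame(ctx, frame) == 0);
    av_packet_free(&pkt);
    return got;
}
```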
(2) Audio decoding
Audio decoding at the receiving end is performed in the audio/video playback module, which calls DirectSound directly to decode and play the audio.
5. Depth information optimization module
The depth information optimization flow is shown in Fig. 17. First, morphological filtering is applied to the decoded 320x240 depth image for hole filling and smoothing: two passes of erosion filtering with a 3x3-pixel rectangular structuring element, followed by two passes of dilation filtering with a circular structuring element of radius 4 pixels. Second, linear interpolation is used to obtain a 640x480 depth image. Finally, one pass of morphological erosion filtering and two passes of dilation filtering, with the same structuring elements as at low resolution, are applied to the high-resolution depth image to obtain the depth map required for reconstruction.
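The patent describes the morphological filtering and interpolation but names no image-processing library; the following sketch assumes OpenCV for brevity, with the radius-4 circular element approximated by a 9x9 ellipse:

```cpp
#include <opencv2/imgproc.hpp>

cv::Mat optimize_depth(const cv::Mat& depth320x240)
{
    cv::Mat rect = cv::getStructuringElement(cv::MORPH_RECT, cv::Size(3, 3));
    cv::Mat disk = cv::getStructuringElement(cv::MORPH_ELLIPSE, cv::Size(9, 9));

    // 1) Hole filling and smoothing at low resolution:
    //    two erosions (3x3 rectangle), then two dilations (radius-4 disk).
    cv::Mat d;
    cv::erode(depth320x240, d, rect, cv::Point(-1, -1), 2);
    cv::dilate(d, d, disk, cv::Point(-1, -1), 2);

    // 2) Linear interpolation up to the color resolution (640x480).
    cv::Mat hi;
    cv::resize(d, hi, cv::Size(640, 480), 0, 0, cv::INTER_LINEAR);

    // 3) One erosion and two dilations on the high-resolution image,
    //    with the same structuring elements as at low resolution.
    cv::erode(hi, hi, rect);
    cv::dilate(hi, hi, disk, cv::Point(-1, -1), 2);
    return hi;
}
```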
6. Three-dimensional reconstruction module
The system performs three-dimensional reconstruction using depth-image-based rendering (DIBR), reconstructing the scene at a virtual viewpoint from the color information of one or more scenes and the corresponding depth information. The reconstruction procedure is as follows:
(1) Initialization:
An offset matrix named shiftx is obtained during initialization. It has 256 entries, so the 256 possible depth values map to different offsets; the function find_shiftx computes the offsets.
The reconstruction process, shown in Fig. 18, is then carried out.
(2) Offset processing:
The color map and the depth map are used. Each pixel of the Y component of the color map is shifted by the offset that the matrix shiftx yields for the depth value at the corresponding position of the depth map, generating the Y component of the left image. Likewise, the pixels of the UV components of the color map are shifted by the offsets obtained from shiftx according to the corresponding depth values, generating the UV components of the left image.
(3) Occlusion judgment:
During offset processing, whenever a pixel is processed, a 1 is recorded at the corresponding position of a mask matrix. Occlusion judgment inspects the mask matrix: if a position carries a record, that point has undergone offset processing; otherwise the point is considered occluded and hole filling is applied.
(4) Hole filling:
Each pixel of the generated left-image and right-image Y components is examined; if a point was not produced while generating the image, i.e. its position carries no record in the mask matrix, hole filling is applied to it: the mean of the 4 surrounding points is computed and assigned as the current value.
(5) Smoothing:
The generated Y and UV components are smoothed; concretely, the mean of the 25 surrounding pixels is used as the value of each point.
(6) 3D synthesis: the 3D image is generated from the left and right images. First both images are converted from YUV to RGB format; then the R component of the left image is taken as the R component of the 3D image and the GB components of the right image as the GB components of the 3D image, finally generating the 3D image.
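A minimal C++ sketch of steps (2)-(4) and (6) for the Y plane (not part of the original disclosure); shiftx stands for the offset table filled by find_shiftx during initialization, and the border handling is an assumption:

```cpp
#include <cstdint>
#include <cstring>

// Offset processing + occlusion judgment + hole filling for one Y plane.
void render_left_y(const uint8_t* y, const uint8_t* depth,
                   const int shiftx[256], int w, int h,
                   uint8_t* left, uint8_t* mask)
{
    memset(mask, 0, static_cast<size_t>(w) * h);
    for (int r = 0; r < h; ++r)
        for (int c = 0; c < w; ++c) {
            int nc = c + shiftx[depth[r * w + c]];   // offset by depth value
            if (nc >= 0 && nc < w) {
                left[r * w + nc] = y[r * w + c];
                mask[r * w + nc] = 1;                // record: pixel processed
            }
        }
    // Hole filling: unrecorded (occluded) pixels take the mean of the
    // 4 surrounding points.
    for (int r = 1; r < h - 1; ++r)
        for (int c = 1; c < w - 1; ++c)
            if (!mask[r * w + c])
                left[r * w + c] =
                    (left[r * w + c - 1] + left[r * w + c + 1] +
                     left[(r - 1) * w + c] + left[(r + 1) * w + c]) / 4;
}

// 3D synthesis: R from the left image, G and B from the right image.
void synthesize_3d(const uint8_t* left_rgb, const uint8_t* right_rgb,
                   int n_pixels, uint8_t* out_rgb)
{
    for (int i = 0; i < n_pixels; ++i) {
        out_rgb[3 * i]     = left_rgb[3 * i];        // R of left image
        out_rgb[3 * i + 1] = right_rgb[3 * i + 1];   // G of right image
        out_rgb[3 * i + 2] = right_rgb[3 * i + 2];   // B of right image
    }
}
```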
7. Audio/video playback module
This embodiment adopts DirectX as the playback tool. DirectX lets applications control multimedia devices directly while preserving device independence, so the capabilities of the hardware can be fully exploited and very high performance obtained. DirectX is a COM-based system composed mainly of a hardware abstraction layer (HAL) and a hardware emulation layer (HEL), and it is divided into several component modules covering every aspect of multimedia applications. The DirectX SDK, Microsoft's software for DirectX programming, is used to implement audio and video playback; DirectX can access the computer hardware directly, provides a consistent interface between hardware and applications to reduce the complexity of installation and configuration, and makes optimal use of the hardware. The video display part comprises DirectDraw (DDraw) and Direct3D (D3D); the program uses DirectDraw to play the video, while the audio part uses DirectSound to play the audio. DirectDraw supports hardware- and software-accelerated access to off-screen display memory bitmaps, with fast direct access exploiting the hardware's bit-block transfer and buffer-flipping functions. DirectSound provides hardware and software sound mixing and playback.

Claims (3)

1. A real-time 3D video communication system based on KINECT, characterized in that the system adopts a common C/S architecture, the terminals are connected by a network, and each terminal is a peer node that can act both as an audio/video sender and as an audio/video receiver; each terminal is organized into functional modules, comprising a KINECT data acquisition and preprocessing module, an audio/video encoding module, a network transmission module, an audio/video decoding module, a depth information optimization module, a three-dimensional reconstruction module and an audio/video playback module, wherein:
the KINECT data acquisition and preprocessing module comprises a color video acquisition and preprocessing module, a depth video acquisition and preprocessing module and an audio acquisition module, and uses multithreading to acquire the three streams synchronously, obtaining color video information, depth video information and audio information in real time;
the color video information collected by the color video acquisition and preprocessing module is in UYVY format and is converted to YUV420 format; the depth video information collected by the depth video acquisition and preprocessing module consists of 16-bit grayscale images, from which the raw depth values are obtained by a shift operation; the raw depth information is preprocessed as follows: first, the background is processed by threshold filtering, with all objects beyond the farthest perception distance of the KINECT uniformly set to background; second, the depth values are normalized to obtain an 8-bit grayscale image; because there is a horizontal displacement between the camera with which the KINECT obtains depth information and the camera with which it obtains color video, color video and depth video capture must satisfy the following relationship so that the two can be aligned correctly: T(C(i+1)) > T(D(i)) > T(C(i)), where C(i) denotes the i-th captured color frame, D(i) denotes the i-th captured depth frame, and T denotes the capture time; the audio acquisition module is the KINECT microphone array, and audio is captured in single-channel 16-bit PCM format;
the audio/video encoding module compresses the color video information and the depth video information separately using x264 (H.264 encoding) and encodes the audio using an MP3 audio encoder, thereby reducing the redundancy of the transmitted audio/video data and solving the problem that the audio/video data volume is too large for the available network bandwidth;
the network transmission module comprises an audio/video stream sending module and an audio/video stream receiving module and handles end-to-end audio/video data transmission using the Real-time Transport Protocol (RTP): as soon as encoded audio/video data are obtained, the sending end transmits them synchronously in real time and the receiving end receives them, specifically:
as the sending end, the audio/video stream sending module sends three data streams, namely depth map data, color map data and audio stream data; to distinguish the streams, a prefix identifier is added to each stream before transmission: '0' and '1' for color map data, '1' and '0' for depth map data, and 'a' and 'b' for audio stream data; when the size of a packet exceeds 63 KB, it is fragmented, and an end-of-frame packet is sent after the last fragment;
audio is sampled at 16 kHz; after the threads are started, the encoding and transmission of the captured audio stream are interleaved between the capture, encoding and transmission of each video frame; each audio encoding step processes the audio data collected during the previous step, and the encoded audio stream data is split into packets and sent;
as the receiving end, the audio/video stream receiving module receives packets on the specified port, identifies the type of each packet from its prefix identifier and unpacks it;
the audio/video decoding module, after receiving the network audio/video bitstreams, applies H.264 decoding to the two received video bitstreams, while the audio stream is passed directly to the audio/video playback module for decoding and playback;
the depth information optimization module performs hole filling, smoothing and other optimization on the decoded depth video information at the receiving end: first, hole filling and smoothing are applied to the decoded low-resolution depth image; second, linear interpolation is used to generate from the low-resolution depth image a high-resolution depth image with the same resolution as the color image; finally, the high-resolution depth image is filtered to obtain the depth map used for reconstruction;
the three-dimensional reconstruction module reconstructs the virtual-viewpoint scene: from the received color video and the optimized depth video it reconstructs a left-eye or right-eye scene video matching human binocular parallax, and then synthesizes the left-eye or right-eye scene video with the received color video to obtain the 3D video scene; the specific implementation is: an offset matrix is obtained during initialization, followed in sequence by offset processing, occlusion judgment, hole filling, smoothing and 3D synthesis, finally generating the 3D image;
the audio/video playback module synchronously plays the synthesized 3D video scene and the audio information so that participants can see the 3D effect; specifically, DirectX is adopted as the playback tool, the video display part uses DirectDraw to play the video, and the audio part uses DirectSound to play the audio.
2. A real-time 3D video communication method based on KINECT, characterized by the following steps:
(1) a KINECT camera and multithreading are used to obtain color video information, depth video information and audio information synchronously in real time;
(2) the color video information and depth video information are preprocessed: the color video information in YUV420 format is obtained by format conversion; 8-bit grayscale depth information at a resolution lower than that of the color video is obtained by background processing, normalization and threshold-filtering preprocessing; and the deviation caused by the horizontal displacement between the two cameras capturing depth information and color video is corrected;
(3) x264 is used to compress the two video streams, and an MP3 encoder is used to compress the audio;
(4) the Real-time Transport Protocol (RTP) is adopted, and a multithreading scheme is used to process the color video stream and the depth video stream; after the threads are started, the encoding and transmission of the captured audio stream are interleaved between the capture, encoding and transmission of each video frame; each audio encoding step processes the audio data collected during the previous step, and the encoded audio stream is packed and sent, thereby achieving synchronized transmission of the audio/video streams;
(5) at the receiving end the system unpacks the packets and, after obtaining the three streams, decodes the two video streams to obtain the color video information and depth video information;
(6) hole filling and smoothing are applied to the decoded low-resolution depth image, and linear interpolation is then used to obtain depth information with the same resolution as the color video;
(7) the received color video and the optimized depth video are used to reconstruct a left-eye or right-eye scene video matching human binocular parallax;
(8) the left-eye or right-eye scene video is then synthesized with the received color video to obtain the 3D video scene, which is synchronized with the audio information and played with DirectX, forming a 3D video conferencing system.
3. The method of claim 2, characterized in that, because hole filling and smoothing are easier on a low-resolution depth image, the depth video resolution is set lower than the color video resolution when the sending end captures video; when the receiving end performs depth optimization, hole filling and smoothing are applied first on the low-resolution depth video, and linear interpolation is then used to obtain a depth video with the same resolution as the color video.
CN201310146580.0A 2013-04-25 2013-04-25 Real time three dimensional (3D) video communication system and implement method thereof based on Kinect Expired - Fee Related CN103220543B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201310146580.0A CN103220543B (en) 2013-04-25 2013-04-25 Real time three dimensional (3D) video communication system and implement method thereof based on Kinect


Publications (2)

Publication Number Publication Date
CN103220543A (en) 2013-07-24
CN103220543B (en) 2015-03-04

Family

ID=48817943

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201310146580.0A Expired - Fee Related CN103220543B (en) 2013-04-25 2013-04-25 Real time three dimensional (3D) video communication system and implement method thereof based on Kinect

Country Status (1)

Country Link
CN (1) CN103220543B (en)

Families Citing this family (16)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN103475888A (en) * 2013-08-29 2013-12-25 东莞市凯泰科技有限公司 Real scene production device and real scene production method
CN103561258B (en) * 2013-09-25 2015-04-15 同济大学 Kinect depth video spatio-temporal union restoration method
CN103500013B (en) * 2013-10-18 2016-05-11 武汉大学 Real-time three-dimensional plotting method based on Kinect and stream media technology
TW201528775A (en) 2014-01-02 2015-07-16 Ind Tech Res Inst Depth map aligning method and system
CN105487660A (en) * 2015-11-25 2016-04-13 北京理工大学 Immersion type stage performance interaction method and system based on virtual reality technology
CN107517369B (en) * 2016-06-17 2019-08-02 聚晶半导体股份有限公司 Stereo-picture production method and the electronic device for using the method
CN106302132A (en) * 2016-09-14 2017-01-04 华南理工大学 A kind of 3D instant communicating system based on augmented reality and method
CN106992959B (en) * 2016-11-01 2023-08-18 圆周率科技(常州)有限公司 3D panoramic audio and video live broadcast system and audio and video acquisition method
US20180192033A1 (en) * 2016-12-30 2018-07-05 Google Inc. Multi-view scene flow stitching
CN107707900A (en) * 2017-10-17 2018-02-16 西安万像电子科技有限公司 Processing method, the device and system of content of multimedia
CN108919950A (en) * 2018-06-26 2018-11-30 上海理工大学 Autism children based on Kinect interact device for image and method
CN109460077B (en) * 2018-11-19 2022-05-17 深圳博为教育科技有限公司 Automatic tracking method, automatic tracking equipment and automatic tracking system
CN110113603A (en) * 2019-04-22 2019-08-09 屠晓 HD video processing terminal
CN111242090B (en) * 2020-01-22 2023-06-23 腾讯科技(深圳)有限公司 Human face recognition method, device, equipment and medium based on artificial intelligence
CN113473106A (en) * 2021-06-18 2021-10-01 青岛小鸟看看科技有限公司 Image transmission method, image display and processing device, and image transmission system
CN118055243B (en) * 2024-04-15 2024-06-11 深圳康荣电子有限公司 Audio and video coding processing method, device and equipment for digital television

Patent Citations (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN101232623A (en) * 2007-01-22 2008-07-30 李会根 System and method for transmitting stereo audio and video numerical coding based on transmission stream
CN101662694A (en) * 2008-08-29 2010-03-03 深圳华为通信技术有限公司 Method and device for presenting, sending and receiving video and communication system
CN101668219A (en) * 2008-09-02 2010-03-10 深圳华为通信技术有限公司 Communication method, transmitting equipment and system for 3D video
EP2424256A2 (en) * 2010-08-27 2012-02-29 Broadcom Corporation Method and system for generating three-dimensional video utilizing a monoscopic camera
CN102307309A (en) * 2011-07-29 2012-01-04 杭州电子科技大学 Somatosensory interactive broadcasting guide system and method based on free viewpoints

Also Published As

Publication number Publication date
CN103220543A (en) 2013-07-24

Similar Documents

Publication Publication Date Title
CN103220543B (en) Real time three dimensional (3D) video communication system and implement method thereof based on Kinect
CN103650515B (en) wireless 3D streaming server
US11303826B2 (en) Method and device for transmitting/receiving metadata of image in wireless communication system
EP2234406A1 (en) A three dimensional video communication terminal, system and method
US11044455B2 (en) Multiple-viewpoints related metadata transmission and reception method and apparatus
Chen et al. Overview of the MVC+D 3D video coding standard
EP2469853B1 (en) Method and device for processing video image data, system and terminal for video conference
CN1132406C (en) A picture communication apparatus
CN109218734A (en) For Video coding and decoded method, apparatus and computer program product
KR102278848B1 (en) Multi-viewpoint-based 360 video processing method and device
CN106303329A (en) Record screen live broadcasting method and device, mobile device and live broadcast system
CN102611873A (en) Method and system for realizing 2D/3D (two dimension/3 dimension) video communication and transmission optimization
Carballeira et al. FVV live: A real-time free-viewpoint video system with consumer electronics hardware
CN103369289A (en) Communication method of video simulation image and device
CN101651841A (en) Method, system and equipment for realizing stereo video communication
Ahmad Multi-view video: get ready for next-generation television
CN105900445A (en) Robust live operation of DASH
KR101861929B1 (en) Providing virtual reality service considering region of interest
CN106331883A (en) Remote visualization data interaction method and system
CN102195894A (en) System and method for realizing three-dimensional video communication in instant communication
CN109451293B (en) Self-adaptive stereoscopic video transmission system and method
WO2020234509A1 (en) A method, an apparatus and a computer program product for volumetric video encoding and decoding
KR101584111B1 (en) A Method And Apparatus For Enhancing Quality Of Multimedia Service By Using Cloud Computing
CN101489090B (en) Method, apparatus and system for multipath media stream transmission and reception
WO2020068284A1 (en) Virtual reality (vr) viewpoint grouping

Legal Events

Date Code Title Description
C06 Publication
PB01 Publication
C10 Entry into substantive examination
SE01 Entry into force of request for substantive examination
C14 Grant of patent or utility model
GR01 Patent grant
CF01 Termination of patent right due to non-payment of annual fee

Granted publication date: 20150304

Termination date: 20180425