WO2018082284A1 - 3D panoramic audio and video live broadcast system and audio and video capture method - Google Patents

3D panoramic audio and video live broadcast system and audio and video capture method

Info

Publication number
WO2018082284A1
Authority
WO
WIPO (PCT)
Prior art keywords
audio
panoramic
video data
video
data
Prior art date
Application number
PCT/CN2017/084482
Other languages
English (en)
French (fr)
Inventor
王超
沈靖程
刘亚辉
Original Assignee
深圳市圆周率软件科技有限责任公司
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by 深圳市圆周率软件科技有限责任公司
Publication of WO2018082284A1 publication Critical patent/WO2018082284A1/zh

Classifications

    • H04L 65/4061: Push-to services, e.g. push-to-talk or push-to-video
    • H04L 65/60: Network streaming of media packets
    • H04N 13/106: Processing image signals
    • H04N 13/167: Synchronising or controlling image signals
    • H04N 13/204: Image signal generators using stereoscopic image cameras
    • H04N 13/261: Image signal generators with monoscopic-to-stereoscopic image conversion
    • H04N 13/296: Synchronisation or control of image signal generators
    • H04N 23/698: Control of cameras or camera modules for achieving an enlarged field of view, e.g. panoramic image capture
    • H04N 5/04: Synchronising (details of television systems)
    • H04N 2013/0074: Stereoscopic image analysis
    • H04N 2013/0096: Synchronisation or controlling aspects

Definitions

  • the present disclosure relates to the field of audio and video processing technologies, and in particular, to a 3D panoramic audio and video live broadcast system and an audio and video capture method.
  • panoramic audio and video capture devices are mainly divided into two categories.
  • the first category is a binocular camera solution, that is, two camera modules placed back to back or horizontally offset.
  • the second category is a multi-camera solution, that is, multiple camera modules are freely positioned to achieve capture with no blind spots.
  • the first category suffers from insufficient pixel density, large image distortion, and unclear image quality.
  • the second category cannot synchronize the video data frames of its multiple cameras. The video data collected by both categories of panoramic audio and video capture devices therefore degrades the stitching of the panoramic video. It can be seen that how to realize a panoramic audio and video acquisition system with small image distortion, clear picture quality, multi-camera acquisition, and good synchronization is a technical problem to be solved.
  • the embodiment of the present disclosure provides a 3D panoramic audio and video live broadcast system and an audio and video capture method, which can improve the definition of image quality, reduce image distortion, and simultaneously realize hardware synchronous acquisition of multi-channel video data and multi-channel audio data.
  • a first aspect of the embodiments of the present disclosure discloses a 3D panoramic audio and video live broadcast system, including: an audio and video collection device, a server, and a plurality of user terminals, where:
  • the audio and video collection device is configured to synchronously collect multiple original video data and multiple original audio data, and process the multiple original video data and the multiple original audio data to obtain 3D panoramic audio and video data. And pushing the 3D panoramic audio and video data to the server;
  • the server is configured to receive the 3D panoramic audio and video data pushed by the audio and video collection device, perform transcoding processing on the 3D panoramic audio and video data, and distribute the transcoded 3D panoramic audio and video data to the user terminals;
  • the user terminal is configured to acquire the transcoded 3D panoramic audio and video data from the server in real time, and live broadcast the transcoded 3D panoramic audio and video data in real time.
  • the audio and video collection device synchronously collects, in hardware, the original video data from the multi-channel camera modules and the original audio data from the multi-channel pickups, and processes the original video data and the original audio data to obtain
  • the 3D panoramic audio and video data, thereby realizing synchronous acquisition of multi-channel video data and multi-channel audio data.
  • the video data input by the multi-channel high-definition camera modules can support higher pixel counts and sharpness, so image distortion is greatly reduced,
  • pixel dilution is greatly reduced, and the quality of the stitched panoramic video is increased. The user terminal therefore broadcasts 3D panoramic audio and video data with higher pixel counts and sharpness and with less image distortion, improving image clarity, reducing image distortion, and improving the user experience.
  • a second aspect of the embodiments of the present disclosure discloses an audio and video collection method, including:
  • the hardware synchronously collects raw video data from the multi-camera module
  • the hardware synchronously collects raw audio data from a multi-way pickup
  • the original video data and the original audio data are processed to obtain 3D panoramic audio and video data.
  • a third aspect of an embodiment of the present disclosure discloses an audio and video capture device including at least one processor and a memory communicatively coupled to the at least one processor, the memory storing instructions executable by the at least one processor, the instructions being executed by the at least one processor to cause the at least one processor to perform the audio and video collection method described above.
  • the audio and video collection device hardware synchronously collects the original video data from the multi-channel camera module and the hardware synchronously collects the original audio data from the multi-channel pickup, and further, the audio and video collection device pairs the original video data and The original audio data is processed to obtain 3D panoramic audio and video data.
  • the camera module is a high-definition camera module, and the video data input by the multi-channel high-definition camera modules can support higher pixel counts and sharpness.
  • the total photosensitive area of the multi-channel high-definition camera modules is larger, so each pixel covers a smaller imaging angle of the panoramic view. As a result, image distortion is greatly reduced and pixel dilution is greatly reduced, which ultimately increases the quality of the stitched panoramic video, thereby improving image clarity and reducing image distortion.
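The "pixel dilution" point above can be illustrated with a rough calculation (the sensor width and fields of view below are assumptions for illustration, not figures from the patent): splitting the 360-degree view across several camera modules leaves each pixel responsible for a smaller imaging angle.

```python
# Illustrative only: angular pixel density when one sensor covers the whole
# panorama vs. several sensors sharing it (numbers are assumptions).
def pixels_per_degree(horizontal_pixels: int, fov_degrees: float) -> float:
    """Horizontal pixel density across a lens field of view."""
    return horizontal_pixels / fov_degrees

# One fisheye covering 360 degrees with a 3840-pixel-wide sensor:
single = pixels_per_degree(3840, 360.0)   # roughly 10.7 px/deg
# Four sensors, each covering about 100 degrees (with stitching overlap):
multi = pixels_per_degree(3840, 100.0)    # 38.4 px/deg
print(single, multi)
```

The multi-camera arrangement thus delivers several times the angular resolution before stitching, which is the sense in which pixel dilution is reduced.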
  • FIG. 1 is a schematic structural diagram of a 3D panoramic audio and video live broadcast system disclosed in an embodiment of the present disclosure
  • FIG. 2 is a schematic structural diagram of another 3D panoramic audio and video live broadcast system disclosed in an embodiment of the present disclosure
  • FIG. 3 is a schematic flowchart of an audio and video collection method disclosed in an embodiment of the present disclosure.
  • the embodiment of the disclosure discloses a 3D panoramic audio and video live broadcast system and an audio and video capture method, which improves the definition of image quality, reduces image distortion, and simultaneously realizes hardware synchronous acquisition of multi-channel video data and multi-channel audio data. The details are described below separately.
  • FIG. 1 is a schematic structural diagram of a 3D panoramic audio and video live broadcast system according to an embodiment of the present disclosure.
  • the 3D panoramic audio and video live broadcast system includes an audio and video capture device 10 , a server 20 , and a user terminal 30 .
  • the audio and video collection device 10 is configured to synchronously collect multiple original video data and multiple original audio data, and process the multiple original video data and the multiple original audio data.
  • the 3D panoramic audio and video data is obtained, and the 3D panoramic audio and video data is pushed to the server.
  • the server 20 is configured to receive the 3D panoramic audio and video data that is streamed by the audio and video collection device.
  • the user terminal 30 is configured to acquire the transcoded 3D panoramic audio and video data from the server in real time, and live broadcast the transcoded 3D panoramic audio and video data in real time.
  • the audio and video capture device 10 can be a panoramic camera.
  • the server 20 is a wide area network server or a local area network server.
  • a plurality of the user terminals 30 can simultaneously view the transcoded 3D panoramic audio and video data that is broadcast live in real time.
  • the LAN server is mainly used to set up a streaming media broadcast that supports multiple users simultaneously viewing local 3D panoramic audio and video data in a LAN environment. It can accept audio and video streams in RTMP format pushed by the audio and video collection device, supports conversion among multiple audio and video stream formats such as HTTP, HLS, RTP, RTSP, RTCP, RTMP, PNM, MMS, and Onvif, and distributes the audio and video streams over multiple channels, so that user terminals can experience real-time 3D panoramic audio and video live broadcast.
  • the WAN server is mainly used to receive the audio and video streams that the audio and video collection device pushes through the Ethernet, create a live broadcast on the cloud platform, and generate a push stream address or a play address to distribute to the user terminals. The WAN server can also perform protocol conversion,
  • converting the format of the received audio and video stream into various formats such as HTTP, HLS, RTP, RTSP, RTCP, RTMP, PNM, MMS, and Onvif, and distributing them to user terminals capable of accepting live video in the corresponding format.
  • the audio and video streams undergo a CDN acceleration process during transmission.
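As a hypothetical sketch of the protocol-conversion and distribution step (the host, path scheme, and function name below are assumptions, not from the patent): a server that ingests one RTMP stream typically derives one playback address per supported output protocol and hands those addresses to the user terminals.

```python
# Hypothetical sketch: map one ingested RTMP stream key to per-protocol
# playback addresses a server might distribute to user terminals.
def playback_addresses(host: str, stream_key: str) -> dict:
    """Derive playback URLs for several of the output formats named above."""
    return {
        "RTMP": f"rtmp://{host}/live/{stream_key}",
        "HLS":  f"http://{host}/live/{stream_key}/index.m3u8",
        "RTSP": f"rtsp://{host}/live/{stream_key}",
        "HTTP": f"http://{host}/live/{stream_key}.flv",
    }

urls = playback_addresses("192.168.1.10", "pano01")
print(urls["HLS"])
```

Each terminal then picks whichever address matches the formats its player accepts.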
  • the user terminal 30 is configured with a panoramic player corresponding to the operating system of the user terminal 30, and the operating system includes any one of the following: Windows, Mac OS, iOS, and Android.
  • the user terminal 30 is a wide area network user terminal or a local area network user terminal.
  • the WAN user terminal can include, but is not limited to, a virtual reality all-in-one headset, a mobile phone, a tablet computer, a Mac computer, a laptop computer, a desktop computer, etc. Through the players on different WAN user terminals, the user can experience CDN-accelerated 3D panoramic audio and video live broadcast at at least 4K/30fps in real time.
  • the WAN user terminal can support multiple people watching online at the same time.
  • the LAN user terminal can include, but is not limited to, a virtual reality all-in-one headset, a mobile phone, a tablet computer, a Mac computer, a laptop computer, a desktop computer, etc. Through the player on the LAN user terminal, the user can experience local 3D panoramic audio and video live broadcast at at least 4K/30fps in real time. In addition, the LAN user terminal can support multiple people watching online at the same time.
  • the audio and video collection device 10 synchronously collects multiple channels of original video data and multiple channels of original audio data, realizing synchronous acquisition of both. The resulting 3D panoramic audio and video data has clearer image quality and smaller image distortion.
  • the audio and video collection device 10 pushes the 3D panoramic audio and video data to the server 20. After the server 20 transcodes the 3D panoramic audio and video data, the user terminal 30 can obtain the transcoded 3D panoramic audio and video data from the server 20 in real time and broadcast it live in real time, so that the user can view 3D panoramic audio and video with higher definition, less image distortion, and better panoramic stitching, which enhances the user's immersive experience.
  • FIG. 2 is a schematic structural diagram of another 3D panoramic audio and video live broadcast system disclosed in an embodiment of the present disclosure.
  • the 3D panoramic audio and video live broadcast system shown in FIG. 2 is further optimized on the basis of the system shown in FIG. 1. Compared with the system shown in FIG. 1, the system shown in FIG. 2 includes all modules of the system shown in FIG. 1.
  • the audio and video collection device 10 includes an acquisition module 100 and a processing module 200.
  • the acquisition module 100 is connected to the processing module 200 through M first mobile industry processor interfaces (MIPI) (such as the first MIPI 1, the first MIPI 2, ..., the first MIPI M).
  • the acquisition module 100 includes an FPGA chip 110, N camera modules (such as the camera module 1, the camera module 2, ..., the camera module N), and P pickups (such as the pickup 1).
  • the N camera modules are connected to the FPGA chip through N second MIPIs (such as the second MIPI 1, the second MIPI 2, ..., the second MIPI N), and the P pickups are connected to the FPGA chip through P first audio data interfaces, where M, N, and P are positive integers and M is less than N, wherein:
  • the FPGA chip 110 is configured to synchronously collect, in hardware, original video data from the N camera modules through the N second MIPIs, and send the original video data in parallel through the M first MIPIs to the processing module;
  • the FPGA chip 110 is further configured to synchronously collect, in hardware, raw audio data from the P pickups through the P first audio data interfaces, and send the original audio data to the processing module;
  • the processing module 200 is configured to process the original video data and the original audio data to obtain 3D panoramic audio and video data.
  • the N camera modules are directly connected to the FPGA chip 110 through the N second MIPIs. Because MIPI has a fast transmission speed, it can be used to transmit image sensor data of higher definition and larger volume.
  • the FPGA chip has rich interfaces and can work in parallel, so the FPGA chip 110 can synchronously collect, in hardware, the original video data from the N camera modules through the N second MIPIs, and
  • synchronously collect, in hardware, the original audio data from the P pickups through the P first audio data interfaces; that is, hardware-synchronous acquisition of multi-channel video data and multi-channel audio data can be realized.
  • the N-channel camera module is a high-definition camera module
  • the video data input by the multi-channel high-definition camera modules can support higher pixel counts and sharpness, so the resolution of the 3D panoramic audio and video data obtained by the processing module 200 can be very high;
  • the total photosensitive area of the N high-definition camera modules is larger, so each pixel covers a smaller imaging angle of the panoramic view; image distortion is greatly reduced, and pixel dilution is greatly reduced,
  • which increases the quality of the stitched panoramic video, thereby improving image clarity and reducing image distortion.
  • the camera module includes an image sensor and a lens corresponding to the image sensor (not shown); optionally, the N lenses are evenly distributed outward on a circle, or the N lenses are evenly distributed on a sphere so that each lens faces outward.
  • the lens is a fisheye lens with an angle of view greater than or equal to 180 degrees, and each of the image sensors is placed upright.
  • each of the image sensors is required to be placed upright, that is, the long side of the image sensor is perpendicular to the circumference on which the plurality of image sensors are uniformly arranged horizontally, so that the pixel utilization and image quality of the image sensor can be improved.
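A rough illustration of why the upright placement helps (the sensor resolution below is an assumption, not a figure from the patent): the vertical direction of a full panorama must span 180 degrees, so putting the sensor's long side vertical yields more pixels per degree where the fisheye needs them.

```python
# Illustrative only: vertical pixel density of a hypothetical 16 MP 4:3
# sensor in upright (portrait) vs. landscape orientation.
LONG, SHORT = 4608, 3456   # assumed sensor dimensions in pixels

def vertical_density(vertical_pixels: int, vertical_fov: float = 180.0) -> float:
    """Pixels per degree along the panorama's 180-degree vertical span."""
    return vertical_pixels / vertical_fov

upright = vertical_density(LONG)     # long side vertical: 25.6 px/deg
landscape = vertical_density(SHORT)  # short side vertical: 19.2 px/deg
print(upright, landscape)
```

Under these assumed numbers, the upright sensor commits about a third more pixels to the vertical field, which is consistent with the pixel-utilization claim above.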
  • the lens is a wide-angle lens, and the angle of the wide-angle lens corresponds to the number of the image sensors.
  • the angle of the wide-angle lens corresponds to the number of the image sensors, that is, the angle of the wide-angle lens may vary depending on the number of image sensors.
  • the FPGA chip 110 can simultaneously acquire the original video data of the N image sensors at 10-bit precision, obtaining original video data in RAW DATA format from the N image sensors.
  • the FPGA chip 110 includes: N video data input buffer units and M video data output buffer units (shown in the figure), where N is an integral multiple of M, wherein:
  • the video data input buffer unit is configured to store original video data of the camera module corresponding to the video data input buffer unit;
  • the FPGA chip 110 is further configured to divide the original video data stored in the N video data input buffer units into M groups, and obtain original video data of each group;
  • the video data output buffer unit is configured to store the original video data of the group corresponding to the video data output buffer unit, and to send the stored original video data to the processing module through the first MIPI.
  • the FPGA chip 110 synchronously collects the original video data from the N-channel camera module through the N-way second MIPI hardware.
  • the FPGA chip 110 separately establishes a video data input buffer unit for the N-channel camera modules, that is, a total of N video data input buffer units are established.
  • each video data input buffer unit can store X frames of video data, where X is greater than or equal to 1. This allows the high-speed, high-frame-rate original video data of each camera module to be received and cached in time, facilitates the processing of subsequent original video data, and prevents the loss of video data that would occur if the subsequent processing efficiency could not match the output efficiency of the camera modules.
  • the original video data obtained by the FPGA chip 110 through one hardware-synchronous acquisition from the N camera modules may be referred to as a set of acquisition inputs.
  • each set of acquisition inputs is stored in parallel to the corresponding video data input buffer units. In each video data input buffer unit, the original video data collected by each hardware-synchronous acquisition is stored sequentially in order of storage address from low to high until the number of original video data frames in the video data input buffer unit reaches X. If a new original video data frame is input, it overwrites the earliest-stored
  • data frame in the video data input buffer unit, and subsequent frames continue to sequentially overwrite the previously stored original video data in order.
  • the FPGA chip 110 divides the original video data stored in the N video data input buffer units equally into M groups, obtains the original video data of each group, and stores the original video data of each group into the video data output buffer unit corresponding to that group, where the number of video data output buffer units is M, the number of original video data frames in each group is N/M, and N is an integral multiple of M.
  • the M-channel video data output buffer unit transmits the stored original video data to the processing module in parallel through the first MIPI of the M-way.
  • the M video data output buffer units also send a data request to the FPGA chip 110, which is used to request that the set of acquisition inputs stored in the N video data input buffer units be divided into M groups and transmitted to the M video data output buffer units.
  • the FPGA chip 110 continuously performs real-time hardware-synchronous acquisition of the original video data of the N camera modules, storing each set of acquisition inputs in storage-address order to the corresponding N video data input buffer units; when the M video data output buffer units send a data request, the original video data in the N video data input buffer units is transmitted sequentially, on a first-in-first-out basis, to the M video data output buffer units.
  • the M video data output buffer units can then send the stored original video data in parallel through the M first MIPIs to the processing module 200, so that the processing module 200 performs subsequent processing on the original video data.
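The buffering scheme above can be sketched in software as follows. This is a hypothetical model only (the patent implements it in FPGA hardware; the counts N, M, X and all names are assumptions): N per-camera input buffers of depth X, drained FIFO and regrouped into M output lanes of N/M frames each.

```python
# Hypothetical software model of the N-input / M-output buffer regrouping.
from collections import deque

N, M, X = 8, 4, 2          # assumed: 8 cameras, 4 output lanes, buffer depth 2
assert N % M == 0

# deque(maxlen=X) drops the oldest frame when full, matching the
# overwrite-the-earliest-frame behavior described above.
inputs = [deque(maxlen=X) for _ in range(N)]

def capture(frame_set):
    """One hardware-synchronous acquisition: one frame per camera."""
    for buf, frame in zip(inputs, frame_set):
        buf.append(frame)

def regroup():
    """Pop the oldest frame of every camera (FIFO) and split into M groups."""
    frames = [buf.popleft() for buf in inputs]
    step = N // M
    return [frames[i * step:(i + 1) * step] for i in range(M)]

capture([f"cam{i}_f0" for i in range(N)])
outputs = regroup()
print(outputs[0])   # ['cam0_f0', 'cam1_f0']
```

Each of the M groups would then be shipped over its own first MIPI lane in parallel.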
  • the P pickups are P analog microphones, and the first audio data interface is an audio input (AIN) interface.
  • the FPGA chip 110 is connected to the P analog microphones through the P AIN interfaces, and performs amplification, AGC (automatic gain control), A/D sampling, quantization, and encoding of the P analog audio signals through the FPGA chip 110, finally obtaining P channels of raw audio data.
  • the number of acquisition bits can be selected according to different requirements of precision and sound quality, such as 8-bit, 16-bit, 24-bit, etc.
  • the sampling frequency can be selected according to different sound quality requirements, such as 22.05 kHz, 44.1 kHz, or 48 kHz.
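The bit depth and sampling frequency chosen above directly set the raw audio data rate. A small worked example (the channel count and settings are illustrative, not mandated by the patent):

```python
# Illustrative arithmetic: raw PCM data rate for P microphone channels.
def pcm_rate_bytes_per_sec(channels: int, sample_rate_hz: int, bits: int) -> int:
    """Uncompressed PCM throughput = channels * rate * bytes-per-sample."""
    return channels * sample_rate_hz * bits // 8

# 4 channels at 48 kHz, 16-bit:
print(pcm_rate_bytes_per_sec(4, 48_000, 16))   # 384000 bytes/s
```

Raising the bit depth to 24-bit or dropping the rate to 22.05 kHz scales this figure linearly, which is why both knobs are offered per precision and sound-quality requirement.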
  • the P-channel pickup is a P-channel digital microphone
  • the first audio data interface is an inter-IC sound (I2S) bus interface.
  • the FPGA chip 110 receives the original audio data of the P-channel digital microphone through the P-channel I2S interface, and the sampling accuracy is mainly limited by the characteristics of the digital microphone itself.
  • the FPGA chip 110 includes: P audio data input buffer units and an audio data output buffer unit, where
  • the audio data input buffer unit is configured to store original audio data of the pickup corresponding to the audio data input buffer unit;
  • the audio data output buffer unit is configured to store original audio data from the P audio data input buffer units, and to transmit the stored original audio data to the processing module through the second audio data interface.
  • the second audio data interface may include but is not limited to interfaces such as USB2.0, USB3.0, McBSP, and HDMI.
  • the FPGA chip 110 establishes an audio data input buffer unit for each of the pickups, and also establishes one overall audio data output buffer unit for the collected P channels of original audio data. Whether the pickup is an analog microphone or a digital microphone, each audio data input buffer unit stores the raw audio data of the pickup corresponding to that audio data input buffer unit.
  • the P audio data input buffer units transmit the stored P channels of original audio data sequentially, in order from the first channel to the Pth channel, to the audio data output buffer unit; the audio data output buffer unit buffers the received original audio data in a certain format and sends the stored original audio data through the second audio data interface to the processing module 200.
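The channel-ordered drain described above can be modeled as follows. This is a hypothetical software sketch (the patent performs it in FPGA hardware; P and all names are assumptions): P per-microphone input buffers emptied in order 1..P into one output buffer.

```python
# Hypothetical model of draining P audio input buffers into one output buffer.
P = 4  # assumed number of pickups

def drain(channel_buffers: list) -> list:
    """Concatenate buffered audio chunks in order from channel 1 to channel P."""
    out = []
    for channel in channel_buffers:   # buffers already ordered 1..P
        out.extend(channel)
        channel.clear()               # input buffer is emptied once forwarded
    return out

buffers = [[f"ch{i}_blk0".encode()] for i in range(P)]
print(drain(buffers))   # [b'ch0_blk0', b'ch1_blk0', b'ch2_blk0', b'ch3_blk0']
```

The single output buffer then presents the P channels as one ordered stream to the second audio data interface.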
  • the processing module 200 includes: a main control module 210, M image signal processing (ISP) modules (such as the ISP module 1, the ISP module 2, ..., the ISP module M), and a graphics processing unit (GPU) module 220.
  • the main control module 210 is configured to receive the original video data in parallel through the M first MIPIs, and process the original video data by scheduling the M ISP modules and the GPU module 220 to obtain 3D panoramic video data.
  • the main control module 210 is further configured to receive the original audio data through the second audio data interface, and process the original audio data to obtain panoramic audio data.
  • the main control module 210 is further configured to perform direction matching processing on the 3D panoramic video data and the panoramic audio data.
  • the main control module 210 is further configured to perform encoding processing on the matched 3D panoramic video data and the panoramic audio data by scheduling the encoding module 230, and perform audio and video synchronization on the encoded 3D panoramic video data and the panoramic audio data. Processing to obtain 3D panoramic audio and video data.
  • the main control module 210 is further configured to store the 3D panoramic audio and video data by scheduling the external storage module 240.
  • the manner in which the main control module 210 processes the original video data by scheduling the M ISP module and the GPU module 220 to obtain 3D panoramic video data is specifically:
  • the main control module 210 performs ISP processing on the original video data by scheduling the M-way ISP module to obtain M-channel video data;
  • the main control module 210 performs hardware-accelerated real-time 3D panoramic algorithm splicing and rendering processing on the M-channel video data by scheduling the GPU module 220 to obtain 3D panoramic video data.
  • the manner in which the main control module 210 processes the original audio data to obtain the panoramic audio data is specifically:
  • the main control module 210 performs a surround sound algorithm processing and synthesis on the original audio data to obtain panoramic audio data.
  • the main control module 210 can be a central processing unit (CPU), working together with the image signal processing (ISP) modules and the graphics processing unit (GPU) module.
  • the main control module 210 in the processing module 200 receives the original video data in parallel through the M first MIPIs, and performs ISP processing on the original video data by scheduling the M ISP modules, that is, performing 3D noise reduction and image quality optimization and converting the original video data from RAW DATA format into original video data in YUV format, finally obtaining the M channels of video data; further, the main control module 210 performs hardware-accelerated real-time 3D panoramic algorithm stitching and rendering on the M channels of video data by scheduling the GPU module 220, obtaining 3D panoramic video data; at the same time, the main control module 210 performs surround sound algorithm processing and synthesis on the original audio data to obtain panoramic audio data.
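The RAW-to-YUV step mentioned above ultimately reduces, after demosaicing, to a per-pixel color-space conversion. As a hedged sketch (the patent does not specify a matrix; the BT.601 coefficients below are an assumption chosen for illustration):

```python
# Illustrative RGB -> YUV conversion using full-range BT.601 coefficients
# (an assumption; the patent does not name the color matrix).
def rgb_to_yuv(r: float, g: float, b: float) -> tuple:
    """Convert normalized RGB (0..1) to luma Y and chroma U, V."""
    y = 0.299 * r + 0.587 * g + 0.114 * b
    u = 0.492 * (b - y)
    v = 0.877 * (r - y)
    return (y, u, v)

print(rgb_to_yuv(1.0, 1.0, 1.0))   # white maps to approximately (1.0, 0.0, 0.0)
```

Working in YUV lets the later H.264/H.265 encoding stage subsample chroma, which is part of why the ISP emits YUV rather than RAW.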
  • the resolution, frame rate, and bit rate of the video stream of the 3D panoramic video data are mainly determined by the performance of the processing module 200 itself; at least real-time encoding of a 4K/30 fps video stream at a low bit rate can be realized.
  • the main control module 210 also performs direction matching processing on the 3D panoramic video data and the panoramic audio data, so that the panoramic audio can, according to the different viewing-angle positions of the 3D panoramic video, simulate the sound source positions perceived by the human ear in the real scene, further enhancing the impact of the experiencer's immersive experience.
  • the main control module 210 further performs hardware-accelerated H.264/H.265 encoding on the matched 3D panoramic video data by scheduling the encoding module 230, and performs hardware-accelerated AAC encoding on the matched panoramic audio data. The main control module 210 then performs audio and video synchronization processing on the encoded 3D panoramic video data and panoramic audio data to ensure synchronization of the audio and video data, thereby obtaining the 3D panoramic audio and video data.
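The patent does not spell out the synchronization mechanism; a common approach, sketched here purely as an assumption, is to stamp both encoded streams with presentation timestamps (PTS) from a shared clock and interleave packets in PTS order when muxing.

```python
# Hypothetical sketch: merge two PTS-ordered packet streams (video "V",
# audio "A") into one PTS-ordered muxed sequence.
def mux(video_pts: list, audio_pts: list) -> list:
    """Tag each timestamp with its stream and emit tags in PTS order."""
    tagged = [(t, "V") for t in video_pts] + [(t, "A") for t in audio_pts]
    return [tag for _, tag in sorted(tagged)]

# video at ~30 fps (33 ms apart), AAC frames roughly every 21 ms at 48 kHz:
print(mux([0, 33, 66], [0, 21, 42, 64]))
```

Because both streams share one clock, a player can schedule each packet by its PTS and the audio stays locked to the video.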
  • the main control module 210 is further configured to push the 3D panoramic audio and video data to a local area network server or a wide area network server by using an Ethernet over the Real Time Messaging Protocol (RTMP) format.
  • RTMP Real Time Messaging Protocol
  • the main control module 210 is further configured to push the 3D panoramic audio and video data to the local area network server or the wide area network server in a real-time message transmission protocol (RTMP format) by using an Ethernet in a wireless or wired manner.
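Pushing the synchronized stream over RTMP is commonly delegated to a tool such as FFmpeg; a sketch that assembles such a push command is below. The file name, server URL, and the choice of FFmpeg itself are illustrative assumptions, not details from the patent.

```python
def rtmp_push_cmd(input_path, rtmp_url):
    """Build an FFmpeg command that remuxes an H.264/AAC recording into an
    FLV container and pushes it to an RTMP ingest point."""
    return [
        "ffmpeg", "-re",      # read input at its native frame rate (live pacing)
        "-i", input_path,
        "-c:v", "copy",       # streams are already H.264/AAC encoded upstream
        "-c:a", "copy",
        "-f", "flv",          # RTMP carries FLV-packaged streams
        rtmp_url,
    ]

cmd = rtmp_push_cmd("panorama.mp4", "rtmp://lan-server/live/stream")
```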
  • FIG. 3 is a schematic flowchart of an audio/video capture method according to an embodiment of the present disclosure. The method is applied to an audio/video capture device. As shown in FIG. 3, the method may include steps 301 to 303.
  • In step 301, the audio/video capture device synchronously captures, in hardware, the raw video data from the multiple camera modules.
  • Specifically, this may be done by synchronously capturing the raw video data from the multiple camera modules in hardware through multiple Mobile Industry Processor Interface (MIPI) lanes.
  • In this embodiment, the FPGA chip in the audio/video capture device synchronously captures, in hardware, the raw video data from the multiple camera modules through multiple MIPI lanes; the camera modules are connected directly to the FPGA chip via MIPI. MIPI offers high transfer speeds and can carry higher-definition, higher-volume image-sensor data, while the FPGA chip has abundant interfaces and can operate in parallel, so the FPGA chip can synchronously capture the raw video data from the multiple camera modules over multiple MIPI lanes in hardware.
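Elsewhere in the disclosure, the FPGA regroups the N synchronized camera inputs onto M output MIPI lanes toward the processing module, with N an integer multiple of M. That regrouping can be modeled as splitting one synchronized capture into M equal groups (the 8-camera/2-lane figures below are only an example):

```python
def regroup(capture, m):
    """Split one hardware-synchronized capture (one frame per camera, N frames)
    into M equal groups, one per output MIPI lane toward the processing module."""
    n = len(capture)
    assert n % m == 0, "N must be an integer multiple of M"
    size = n // m
    return [capture[i * size:(i + 1) * size] for i in range(m)]

frames = [f"cam{i}" for i in range(8)]  # N = 8 cameras, one synchronized frame each
lanes = regroup(frames, 2)              # M = 2 output lanes, 4 frames per lane
```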
  • In step 302, the audio/video capture device synchronously captures, in hardware, the raw audio data from the multiple sound pickups.
  • Specifically, the device synchronously captures the raw audio data from the multiple pickups in hardware through multiple first audio data interfaces; the first audio data interface may include, but is not limited to, USB 2.0, USB 3.0, McBSP, and HDMI interfaces.
  • In step 303, the audio/video capture device processes the raw video data and the raw audio data to obtain 3D panoramic audio/video data.
  • As an optional implementation, this processing specifically includes steps 11 to 13.
  • In step 11, the audio/video capture device processes the raw video data to obtain 3D panoramic video data.
  • In step 12, the audio/video capture device processes the raw audio data to obtain panoramic audio data.
  • In step 13, the audio/video capture device processes the 3D panoramic video data and the panoramic audio data to obtain 3D panoramic audio/video data.
  • Specifically, the audio/video capture device processes the raw video data to obtain 3D panoramic video data as follows: it performs image signal processing (ISP) on the raw video data to obtain multiple channels of video data, and then performs hardware-accelerated real-time 3D panoramic stitching and rendering on the multiple channels of video data to obtain the 3D panoramic video data.
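The stitching step maps each output pixel's direction on the sphere back into one of the lens images. A back-projection sketch for the common equirectangular output layout is shown below; the equirectangular projection is an assumed convention, since the actual stitching algorithm is not disclosed in the patent.

```python
import math

def equirect_to_direction(u, v, width, height):
    """Map output pixel (u, v) of a width x height equirectangular panorama to a
    unit view vector (x, y, z); the stitcher then samples whichever lens image
    covers that direction."""
    lon = (u / width) * 2 * math.pi - math.pi   # longitude in [-pi, pi]
    lat = math.pi / 2 - (v / height) * math.pi  # latitude in [pi/2, -pi/2]
    return (math.cos(lat) * math.sin(lon),
            math.sin(lat),
            math.cos(lat) * math.cos(lon))
```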
  • The audio/video capture device processes the raw audio data to obtain panoramic audio data as follows: it applies surround-sound processing and synthesis to the raw audio data.
  • The audio/video capture device processes the 3D panoramic video data and the panoramic audio data to obtain 3D panoramic audio/video data as follows: it performs direction matching between the 3D panoramic video data and the panoramic audio data, encodes the matched 3D panoramic video data and panoramic audio data separately, and then applies audio/video synchronization to the encoded 3D panoramic video data and panoramic audio data to obtain the 3D panoramic audio/video data.
  • In this optional implementation, the audio/video capture device performs ISP processing on the raw video data, that is, 3D noise reduction, image-quality optimization, and conversion of the raw video data from RAW DATA format to YUV format, finally obtaining multiple channels of video data. It then performs hardware-accelerated real-time 3D panoramic stitching and rendering on the multiple channels of video data to obtain 3D panoramic video data; meanwhile, it applies surround-sound processing and synthesis to the raw audio data to obtain panoramic audio data.
  • The audio/video capture device further performs direction matching between the 3D panoramic video data and the panoramic audio data, so that the panoramic audio, matched to the current viewing angle within the 3D panoramic video, reproduces the sound-source positions a listener would perceive in the real scene, further strengthening the immersive impact for the viewer.
  • The audio/video capture device then performs hardware-accelerated H.264/H.265 encoding on the matched 3D panoramic video data and hardware-accelerated AAC encoding on the matched panoramic audio data, and applies audio/video synchronization to the encoded 3D panoramic video data and panoramic audio data to keep the streams aligned, thereby obtaining the 3D panoramic audio/video data.
  • Optionally, the method further includes the following step: the audio/video capture device pushes the 3D panoramic audio/video data over Ethernet to a local area network server or a wide area network server in Real-Time Messaging Protocol (RTMP) format.
  • Specifically, the audio/video capture device pushes the 3D panoramic audio/video data over Ethernet, wirelessly or by wire, to the local area network server or the wide area network server in RTMP format.
  • The LAN server is mainly used to set up, in a local-area-network environment, a streaming-media live broadcast that lets multiple users simultaneously watch the local 3D panoramic audio/video data. It can receive the RTMP audio/video stream pushed by the capture device, supports conversion to multiple stream formats, for example HTTP, HLS, RTP, RTSP, RTCP, RTMP, PNM, MMS, and ONVIF, and performs multi-channel distribution of the streams, so that user terminals can enjoy an immersive real-time 3D panoramic live broadcast.
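Of the conversions listed, RTMP-to-HLS is representative: the server cuts the incoming stream into short segments and publishes a playlist that players poll. A minimal HLS media-playlist generator is sketched below; the segment names and the 4-second duration are illustrative, not values from the patent.

```python
def hls_playlist(segment_names, segment_duration=4.0):
    """Render an HLS media playlist (.m3u8) for already-cut TS segments."""
    lines = ["#EXTM3U",
             "#EXT-X-VERSION:3",
             f"#EXT-X-TARGETDURATION:{int(segment_duration)}"]
    for name in segment_names:
        lines.append(f"#EXTINF:{segment_duration:.1f},")  # duration of next segment
        lines.append(name)
    return "\n".join(lines) + "\n"

playlist = hls_playlist(["seg0.ts", "seg1.ts"])
```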
  • The WAN server is mainly used to receive the audio/video stream that the capture device pushes over Ethernet, create a live broadcast on a cloud platform, and generate a push or playback address for distribution to user terminals. The WAN server can also perform protocol conversion, transcoding the received stream into formats such as HTTP, HLS, RTP, RTSP, RTCP, RTMP, PNM, MMS, and ONVIF, and distributing it to user terminals that accept live video in the corresponding format. The audio/video streams also pass through CDN acceleration during transmission.
  • In the flow of FIG. 3, the audio/video capture device synchronously captures, in hardware, the raw video data from the multiple camera modules and the raw audio data from the multiple pickups, and then processes the raw video data and the raw audio data to obtain 3D panoramic audio/video data. Implementing the embodiments of the present disclosure therefore achieves hardware-synchronized capture of multiple channels of raw video data and multiple channels of raw audio data.
  • Moreover, because the camera modules are high-definition modules, the video data from the multiple HD camera modules supports higher pixel counts and sharpness. At the same time, the larger total area of the HD modules' image sensors spreads the imaging-angle load carried by each pixel of the panorama, so image distortion is greatly reduced and pixel dilution drops sharply, ultimately raising the quality of the stitched panoramic video: image clarity improves and distortion shrinks.
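The "pixel dilution" argument can be quantified: with N lenses evenly covering 360 degrees, each lens spans a smaller angle, so each sensor's pixels cover fewer degrees and the stitched panorama's angular resolution rises. The sensor width and overlap figures below are assumptions chosen only to illustrate the effect.

```python
def pixels_per_degree(n_lenses, sensor_width_px, overlap_deg=10):
    """Horizontal angular pixel density of one lens in an N-lens 360-degree rig.
    Each lens must cover its 360/N slice plus some stitching overlap."""
    fov = 360.0 / n_lenses + overlap_deg
    return sensor_width_px / fov

two_lens = pixels_per_degree(2, 1920)    # dual-fisheye rig: ~190-degree lenses
eight_lens = pixels_per_degree(8, 1920)  # 8-camera rig: ~55-degree lenses
print(f"2 lenses: {two_lens:.1f} px/deg, 8 lenses: {eight_lens:.1f} px/deg")
```

With identical sensors, the 8-camera rig resolves more than three times as many pixels per degree as the dual-fisheye layout, which is the advantage the passage claims for the multi-camera design.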
  • The present disclosure also provides an audio/video capture device including at least one processor and a memory communicatively coupled to the at least one processor, the memory storing instructions executable by the at least one processor; when executed by the at least one processor, the instructions cause the at least one processor to perform the audio/video capture method described above.
  • In the several embodiments provided in this application, it should be understood that the disclosed apparatus may be implemented in other ways. The device embodiments described above are merely illustrative; for example, the division into units is only a logical functional division, and other divisions are possible in practice: multiple units or components may be combined or integrated into another system, or some features may be ignored or not executed.
  • As another point, the mutual couplings, direct couplings, or communication connections shown or discussed may be indirect couplings or communication connections through interfaces, devices, or units, and may be electrical or take other forms.
  • The units described as separate components may or may not be physically separate, and components shown as units may or may not be physical units; they may be located in one place or distributed across multiple network units. Some or all of the units may be selected according to actual needs to achieve the purpose of the solution of the embodiment.
  • In addition, the functional units in the various embodiments of the present disclosure may be integrated into one processing unit, each unit may exist physically on its own, or two or more units may be integrated into one unit. The integrated unit may be implemented in hardware or as a software functional unit.
  • If the integrated unit is implemented as a software functional unit and sold or used as a standalone product, it may be stored in a computer-readable memory. On this understanding, the technical solution of the present disclosure, in whole or in part, may be embodied in the form of a software product stored in a memory and including a number of instructions that cause a computer device (which may be a personal computer, a server, a network device, or the like) to perform all or some of the steps of the methods described in the various embodiments of the present disclosure.
  • The foregoing memory includes media that can store program code, such as a USB flash drive, a read-only memory (ROM), a random access memory (RAM), a removable hard disk, a magnetic disk, or an optical disc.

Landscapes

  • Engineering & Computer Science (AREA)
  • Multimedia (AREA)
  • Signal Processing (AREA)
  • Computer Networks & Wireless Communication (AREA)
  • Testing, Inspecting, Measuring Of Stereoscopic Televisions And Televisions (AREA)
  • Studio Devices (AREA)

Abstract

The present disclosure provides a 3D panoramic audio/video live-broadcast system and an audio/video capture method. The 3D panoramic audio/video live-broadcast system includes an audio/video capture device, a server, and multiple user terminals. The capture device synchronously captures, in hardware, multiple channels of raw video data and multiple channels of raw audio data and processes them to obtain 3D panoramic audio/video data; the server transcodes the 3D panoramic audio/video data; and the user terminals broadcast the transcoded 3D panoramic audio/video data live in real time. Embodiments of the present disclosure can improve image clarity, reduce image distortion, and achieve hardware-synchronized capture of multiple channels of video data and multiple channels of audio data.

Description

3D全景音视频直播系统及音视频采集方法 技术领域
本公开涉及音视频处理技术领域,尤其涉及一种3D全景音视频直播系统及音视频采集方法。
背景技术
随着计算机技术、微电子技术、光学技术以及多媒体技术的飞速发展,人们信息交流互通的需求不断加大,信息沟通方式革新的需求更是与日俱增,传统的单个摄像头进行音视频信息采集和网络传输到远程的方式进行可视对讲等等的方式已经不满足人们日益增长无死角的图像视频信息采集的需求,一种突破传统,带给用户720度无死角沉浸式体验的全景音视频采集设备应运而生。
目前,全景音视频采集设备主要分为两大类,第一类是采用双目摄像头方案,即两个摄像头模组背靠背放置或者水平方向错位放置,第二类则是多个摄像头方案,即多目摄像头模组自由排布使得能够无死角采集。然而实践中发现,第一类存在着像素密度不够、图像畸变较大和画质不清晰等问题,第二类无法进行多个摄像头的视频数据帧同步,这两类全景音视频采集设备采集的视频数据都给全景视频的拼接带来影响。可见,如何实现一种图像畸变比较小、画质清晰、多个摄像头采集且同步性好的全景音视频采集系统是一个亟待解决的技术难题。
发明内容
本公开实施例提供了一种3D全景音视频直播系统及音视频采集方法,可以提高画质的清晰度、减小图像畸变,同时,实现多路视频数据与多路音频数据的硬件同步采集。
本公开实施例第一方面公开了一种3D全景音视频直播系统,包括:音视频采集设备、服务器以及多个用户终端,其中:
所述音视频采集设备,设置为硬件同步采集多路原始视频数据以及多路原始音频数据,对所述多路原始视频数据以及所述多路原始音频数据进行处理,获得3D全景音视频数据,并将所述3D全景音视频数据推流到所述服务器;
所述服务器,设置为接收所述音视频采集设备推流的所述3D全景音视频数据,将所述3D全景音视频数据进行转码处理,以及将转码后的3D全景音视频数据分发给所述用户终端;
所述用户终端,设置为从所述服务器实时获取所述转码后的3D全景音视频数据,并实时直播所述转码后的3D全景音视频数据。
本公开实施例中,音视频采集设备硬件同步采集来自多路摄像头模组的原始视频数据以及硬件同步采集来自多路拾音器的原始音频数据,在对该原始视频数据以及原始音频数据进行处理后获得3D全景音视频数据,实现了多路视频数据与多路音频数据的硬件同步采集,同时,多路高清摄像头模组输入的视频数据能够支持更高像素和清晰度,使得图像畸变大大减小,同时像素稀释度大大降低,最终使得拼接的全景视频质量增高,这样,用户终端直播3D全景音视频数据时将会有更高的像素和清晰度,图像畸变也会减小,从而可以提高画质的清晰度、减小图像畸变,提高用户体验。
本公开实施例第二方面公开了一种音视频采集方法,包括:
硬件同步采集来自多路摄像头模组的原始视频数据;
硬件同步采集来自多路拾音器的原始音频数据;
对所述原始视频数据以及所述原始音频数据进行处理,获得3D全景音视频数据。
本公开实施例第三方面公开了一种音视频采集设备包括至少一个处理器和与所述至少一个处理器通信连接的存储器,所述存储器用于存储可被所述至少一个处理器执行的指令,所述指令被所述至少一个处理器执行时,使所述至少一个处理器执行上述的音视频采集方法。
本公开实施例中,音视频采集设备硬件同步采集来自多路摄像头模组的原始视频数据以及硬件同步采集来自多路拾音器的原始音频数据,进一步地,音视频采集设备对所述原始视频数据以及所述原始音频数据进行处理,获得3D全景音视频数据。可见,通过实施本公开实施例,能够实现多路原始视频数据以及多路原始音频数据的硬件同步采集。此外,摄像头模组为高清摄像头模组,多路高清摄像头模组输入的视频数据能够支持更高像素和清晰度,此外,多路高清摄像头模组的感光晶元的总尺寸会更大,进一步分担了拍摄全景中每个像素点承担成像角度的压力,使得图像畸变大大减小,同时像素稀释度大大降低,最终使得拼接的全景视频质量增高,从而能够提高画质的清晰度、减小图像畸变。
附图说明
为了更清楚地说明本公开实施例中的技术方案,下面将对实施例描述中所需要使用的附图作简单地介绍,显而易见地,下面描述中的附图仅仅是本公开的一些实施例,对于本领域普通技术人员来讲,在不付出创造性劳动的前提下, 还可以根据这些附图获得其他的附图。
图1是本公开实施例公开的一种3D全景音视频直播系统的结构示意图;
图2是本公开实施例公开的另一种3D全景音视频直播系统的结构示意图;
图3是本公开实施例公开的一种音视频采集方法的流程示意图。
具体实施方式
下面将结合本公开实施例中的附图,对本公开实施例中的技术方案进行清楚、完整地描述,显然,所描述的实施例仅仅是本公开一部分实施例,而不是全部的实施例。基于本公开中的实施例,本领域普通技术人员在没有做出创造性劳动前提下所获得的所有其他实施例,都属于本公开保护的范围。
本公开的说明书和权利要求书及上述附图中的术语“包括”和“具有”以及它们任何变形,意图在于覆盖不排他的包含。例如包含了一系列步骤或单元的过程、方法、系统、产品或设备没有限定于已列出的步骤或单元,而是可选地还包括没有列出的步骤或单元,或可选地还包括对于这些过程、方法、产品或设备固有的其它步骤或单元。
本公开实施例公开了一种3D全景音视频直播系统及音视频采集方法,提高画质的清晰度、减小图像畸变,同时,实现多路视频数据与多路音频数据的硬件同步采集。以下分别进行详细说明。
请参阅图1,图1是本公开实施例公开的一种3D全景音视频直播系统的结构示意图。如图1所示,该3D全景音视频直播系统包括音视频采集设备10、服务器20以及用户终端30。
所述音视频采集设备10设置为硬件同步采集多路原始视频数据以及多路原始音频数据,对所述多路原始视频数据以及所述多路原始音频数据进行处理, 获得3D全景音视频数据,并将所述3D全景音视频数据推流到所述服务器。
所述服务器20设置为接收所述音视频采集设备推流的所述3D全景音视频数据。
所述用户终端30设置为从所述服务器实时获取所述转码后的3D全景音视频数据,并实时直播所述转码后的3D全景音视频数据。
其中,音视频采集设备10可以为全景拍摄装置,如全景相机。服务器20为广域网服务器或局域网服务器。多个所述用户终端30可以同时观看实时直播的所述转码后的3D全景音视频数据。
具体的,局域网服务器主要用于搭建局域网环境下支持多用户同时观看本地3D全景音视频数据的流媒体直播,它能够接受音视频采集设备推流过来的RTMP格式的音视频流,同时,支持多种音视频流格式的转换,例如转换成HTTP、HLS、RTP、RTSP、RTCP、RTMP、PNM、MMS、Onvif等协议,并进行音视频流的多路分发工作,以便于用户终端进行身临其境的实时3D全景音视频直播体验。
广域网服务器主要用于接收音视频采集设备通过以太网推流过来的音视频流并在云平台创建直播、生成推流地址或播放地址分发到用户终端,广域网服务还可以进行协议转换,把接收到的音视频流的格式转换成为HTTP、HLS、RTP、RTSP、RTCP、RTMP、PNM、MMS、Onvif等等多种视频格式,并分发到能够接受相应视频格式直播的用户终端。同时,音视频流在传输过程中还经过了CDN加速过程。
其中,所述用户终端30上配置有所述用户终端30的操作系统对应的全景播放器,所述操作系统包括以下中的任一个:视窗操作系统Windows、Mac OS、IOS以及安卓Android。
其中,用户终端30为广域网用户终端或局域网用户终端。
广域网用户终端可以包括但不限于:VR一体机(Virtual Reality All-in-one Headset)、手机、平板电脑、MAC电脑,笔记本电脑和台式机电脑等等,用户可以通过不同广域网用户终端上的播放器来实时体验远程的经过CDN加速的至少4K/30fps的3D全景音视频直播。此外,广域网用户终端能够支持多人同时在线观看。
局域网用户终端可以包括但不限于:VR一体机(Virtual Reality All-in-one Headset)、手机、平板电脑、MAC电脑,笔记本电脑和台式机电脑等等,用户可以通过局域网用户终端上的播放器来实时体验本地的至少4K/30fps的3D全景音视频直播。此外,局域网用户终端能够支持多人同时在线观看。
本公开实施例中,音视频采集设备10硬件同步采集多路原始视频数据以及多路原始音频数据,实现了多路原始视频数据以及多路原始音频数据的硬件同步采集,此外,获得的3D全景音视频数据的画质更清晰、图像畸变较小。音视频采集设备10将所述3D全景音视频数据推流到所述服务器20,经所述服务器20对所述3D全景音视频数据进行转码处理之后,用户终端30就可以从所述服务器20实时获取所述转码后的3D全景音视频数据,并实时直播所述转码后的3D全景音视频数据,这样,用户就可以观看清晰度更高、图像畸变更小,全景拼接指令更高的3D全景音视频了,从而能够增强用户的沉浸式体验。
请参阅图2,图2是本公开实施例公开的另一种3D全景音视频直播系统的结构示意图。其中,图2所示的3D全景音视频直播系统是在图1所示3D全景音视频直播系统的基础上进一步优化得到的,与图1所示的3D全景音视频直播系统相比,图2所示的3D全景音视频直播系统除了包括图1所示的3D全景音视频直播系统的所有模块外,
音视频采集设备10包括采集模块100和处理模块200,所述采集模块100通过M路第一移动产业处理器接口(Mobile Industry Processor Interface,MIPI)(比如第一MIPI 1、第一MIPI 2……第一MIPI M)与所述处理模块200连接,所述采集模块100包括N路摄像头模组(比如摄像头模组1、摄像头模组2……摄像头模组N)、P路拾音器(比如拾音器1、拾音器2……拾音器N)以及现场可编程门阵列(Field-Programmable Gate Array,FPGA)芯片110,所述N路摄像头模组通过N路第二MIPI(比如第二MIPI 1、第二MIPI 2……第二MIPI N)与所述FPGA芯片连接,所述P路拾音器通过P路第一音频数据接口与所述FPGA芯片连接,其中,M、N、P均为正整数,且M小于N;其中:
所述FPGA芯片110,设置为通过所述N路第二MIPI硬件同步采集来自所述N路摄像头模组的原始视频数据,并通过所述M路第一MIPI将所述原始视频数据并行发送给所述处理模块;
所述FPGA芯片110,还设置为通过所述P路第一音频数据接口硬件同步采集来自所述P路拾音器的原始音频数据,并通过第二音频数据接口将所述原始音频数据发送给所述处理模块;
所述处理模块200,设置为对所述原始视频数据以及所述原始音频数据进行处理,获得3D全景音视频数据。
本公开实施例中,所述N路摄像头模组通过N路第二MIPI与所述FPGA芯片110直接相连,由于第二MIPI的传输速度快,可以用于传输更高清、数据量更大的图像传感器数据,所述FPGA芯片110具有接口丰富且能够并行工作的特点,故所述FPGA芯片110能够通过N路第二MIPI硬件同步采集来自所述N路摄像头模组的原始视频数据,以及通过所述P路第一音频数据接口硬件同步采集来自所述P路拾音器的原始音频数据,即能够实现多路视频数据与多路 音频数据的硬件同步采集。此外,所述N路摄像头模组为高清摄像头模组,多路高清摄像头模组输入的视频数据能够支持更高像素和清晰度,故处理模块200获得的3D全景音视频数据的清晰度会很高,同时,N路高清摄像头模组的感光晶元的总尺寸会更大,进一步分担了拍摄全景中每个像素点承担成像角度的压力,使得图像畸变大大减小,同时像素稀释度大大降低,最终使得拼接的全景视频质量增高,从而能够提高画质的清晰度、减小图像畸变。
其中,针对每路所述摄像头模组,所述摄像头模组包括图像传感器以及所述图像传感器对应的镜头(图中未示出);可选的,N个所述镜头按照一个圆形进行镜头朝外均匀分布排列,或者,N个所述镜头按照镜头朝外均匀分布在一个球体上。
作为一种可选的实施方式,若N个所述镜头按照一个圆形进行镜头朝外均匀分布排列,则所述镜头为角度大于或等于180度的鱼眼镜头,每个所述图像传感器竖立放置;
在该实施方式中,需要每个所述图像传感器竖立放置,即图像传感器的长边与多个图像传感器水平均匀排布的圆周相垂直,这样,能够提高图像传感器的像素利用率和成像画质质量。
作为另一种可选的实施方式,若N个所述镜头按照镜头朝外均匀分布在一个球体上,则所述镜头为广角镜头,所述广角镜头的角度与所述图像传感器的数量相对应。
在该实施方式中,需要采用广角镜头,广角镜头的角度与所述图像传感器的数量相对应,即广角镜头的角度会根据图像传感器数目的不同而不同。
FPGA芯片110能够在同一时刻对N个图像传感器按照10bit精度进行硬件同步的原始视频数据采集,采集得到N路图像传感器的RAW DATA格式的原始 视频数据。
作为另一种可选的实施方式,所述FPGA芯片110包括:N个视频数据输入缓存单元以及M个视频数据输出缓存单元(图中为示出),N为M的整倍数,其中,
所述视频数据输入缓存单元,用于存储与所述视频数据输入缓存单元对应的所述摄像头模组的原始视频数据;
所述FPGA芯片110,还用于将N个所述视频数据输入缓存单元存储的原始视频数据均分为M组,获得每个分组的原始视频数据;
所述视频数据输出缓存单元,用于存储与所述视频数据输出缓存单元对应的分组的原始视频数据,以及用于通过所述第一MIPI将存储的原始视频数据发送给所述处理模块。
具体的,FPGA芯片110通过N路第二MIPI硬件同步采集来自N路摄像头模组的原始视频数据。同时,FPGA芯片110为这N路摄像头模组分别建立了视频数据输入缓存单元,即总共建立了N个视频数据输入缓存单元。其中,每个视频数据输入缓存单元可以存储X帧的视频数据帧,X大于等于1,这样有利于每一路摄像头模组高速且高帧率传输过来的原始视频数据能够及时地被接收与缓存,便于后续原始视频数据的处理,同时,防止因为后续处理工作效率不能够与摄像头模组输出效率匹配而造成的视频数据的丢失。
其中,每经过一次FPGA芯片110对N路摄像头模组的硬件同步采集所得到的原始视频数据可以称之为一组采集输入,N个经过硬件同步采集得到的原始视频数据,即一组采集输入,就会并行的存储到对应的视频数据输入缓存单元中,而且在每一个视频数据输入缓存单元中,每一次硬件同步采集到的对应于该视频数据输入缓存单元的原始视频数据会按照存储空间地址从低到高的顺 序或者从高到低的顺序依次地存储,直到该视频数据输入缓存单元的原始视频数据帧的数目达到X,如果再有新的原始视频数据帧输入进来,就会覆盖掉第一个存储到视频数据输入缓存单元的数据帧,并继续按照顺序依次存储并覆盖之前存储的原始视频数据。
此外,FPGA芯片110在进行N路摄像头模组的原始视频数据的采集并传输给视频数据输入缓存单元的过程中,FPGA芯片110还将N个视频数据输入缓存单元存储的原始视频数据均分为M组,获得每个分组的原始视频数据,并把每个分组的原始视频数据存储到与该分组对应的视频数据输出缓存单元,其中,视频数据输出缓存单元的数量为M个,每个分组的原始视频数据帧的数目为N/M个,其中,N为M的整倍数。M路视频数据输出缓存单元将通过M路所述第一MIPI将存储的原始视频数据并行地发送给所述处理模块。
同时,M路视频数据输出缓存单元也会向FPGA芯片110发出数据请求,该请求用于请求把N路视频数据输入缓存单元中所存储的一组采集输入继续均分为M组,并传递给M路视频数据输出缓存单元。这样FPGA芯片110内部依次实时地进行着N路摄像头模组的原始视频数据的硬件同步采集,并作为一组采集输入按照地址存储顺序存储到对应的N路视频数据输入缓存单元中,然后等到M路视频数据输出缓存单元发出数据请求的时候,再依次地把N路视频数据输入缓存单元中的原始视频数据以一组采集输入的方式,按照先进先出的原则依次的传递给M路视频数据输出缓存单元。
这样,M路视频数据输出缓存单元就可以通过M路所述第一MIPI将存储的原始视频数据发送给所述处理模块200,以便于所述处理模块200对原始视频数据进行后续处理。
作为另一种可选的实施方式,所述P路拾音器为P路模拟麦克,所述第一 音频数据接口为音频输入AIN接口。
具体的,FPGA芯片110通过P路AIN接口与P路模拟麦克进行连接,通过FPGA芯片110进行P路模拟音频信号放大、AGC(自动增益控制)、A/D采样、量化和编码,最后得到P路原始音频数据。可以根据精度和音质的不同需求来择取采集位数,比如8位、16位、24位等,可以根据不同音质要求来择取采样频率,比如22.05KHz、44.1KHz、48KHz。
作为另一种可选的实施方式,所述P路拾音器为P路数字麦克,所述第一音频数据接口为集成电路内置音频总线I2S接口。
具体的,FPGA芯片110通过P路I2S接口来接收P路数字麦克的原始音频数据,采样精度主要由数字麦克本身特性来限制。
作为另一种可选的实施方式,所述FPGA芯片110包括:P个音频数据输入缓存单元以及一个音频数据输出缓存单元,其中,
所述音频数据输入缓存单元,用于存储与所述音频数据输入缓存单元对应的所述拾音器的原始音频数据;
所述音频数据输出缓存单元,用于存储来自P个所述音频数据输入缓存单元的原始音频数据,以及用于通过所述第二音频数据接口将存储的原始音频数据发送给所述处理模块。
其中,第二音频数据接口可以包括但不限于USB2.0、USB3.0、McBSP、HDMI等接口。
具体的,FPGA芯片110为每路拾音器建立了一个音频数据输入缓存单元。同时,FPGA芯片110也为采集到的P路原始音频数据建立了一个总的音频数据输出缓存单元。无论拾音器是模拟麦克还是数字麦克,对于每个音频数据输入缓存单元来说,音频数据输入缓存单元存储与该音频数据输入缓存单元对应的 拾音器的原始音频数据,
当处理模块200对FPGA芯片110发送数据获取请求或音频数据输入缓存单元存储的数据填满了之后,P个音频数据输入缓存单元就会把存储的P路原始音频数据按照第1路~第P路的顺序依次传输给音频数据输出缓存单元,该音频数据输出缓存单元按照一定的格式对接收到的原始音频数据进行缓存,并通过所述第二音频数据接口将存储的原始音频数据发送给所述处理模块200。
作为另一种可选的实施方式,处理模块200包括:主控模块210、M路图像信号处理(Image Signal Processor,ISP)模块(比如ISP模块1、ISP模块2……ISP模块M)、图形处理器GPU模块220、编码模块230以及外部存储模块240。
所述主控模块210设置为通过所述M路第一MIPI并行接收所述原始视频数据,并通过调度所述M路ISP模块以及所述GPU模块220对所述原始视频数据进行处理,获得3D全景视频数据。
所述主控模块210还设置为通过所述第二音频数据接口接收所述原始音频数据,并对所述原始音频数据进行处理,获得全景音频数据。
所述主控模块210还设置为对所述3D全景视频数据以及所述全景音频数据进行方向匹配处理。
所述主控模块210还设置为通过调度所述编码模块230对匹配后的3D全景视频数据以及全景音频数据分别进行编码处理,以及对编码后的3D全景视频数据以及全景音频数据进行音视频同步处理,获得3D全景音视频数据。
所述主控模块210还设置为通过调度所述外部存储模块240存储所述3D全景音视频数据。
其中,所述主控模块210通过调度所述M路ISP模块以及所述GPU模块220对所述原始视频数据进行处理,获得3D全景视频数据的方式具体为:
所述主控模块210通过调度所述M路ISP模块对所述原始视频数据进行ISP处理,获得M路视频数据;
所述主控模块210通过调度所述GPU模块220对所述M路视频数据进行硬件加速的实时3D全景算法拼接和渲染处理,获得3D全景视频数据。
其中,所述主控模块210对所述原始音频数据进行处理,获得全景音频数据的方式具体为:
所述主控模块210对所述原始音频数据进行环绕立体声算法处理与合成,获得全景音频数据。
其中,主控模块210可以为中央处理器(Central Processing Unit,CPU),图像信号处理(Image Signal Processor,ISP),图形处理器(Graphics Processing Unit,GPU)。
具体的,本实用新型实施例中,FPGA芯片110通过M路第一MIPI将N路摄像头模组的RAW DATA格式的原始视频数据帧传输给处理模块200之后,处理模块200中的主控模块210通过所述M路第一MIPI并行接收所述原始视频数据,通过调度所述M路ISP模块对所述原始视频数据进行ISP处理,即进行3D降噪处理、图像质量优化处理以及将转换RAW DATA格式的原始视频数据转换成YUV格式的原始视频数据,最后获得M路视频数据;进一步地,通过调度所述GPU模块220对所述M路视频数据进行硬件加速的实时3D全景算法拼接和渲染处理,获得3D全景视频数据;同时,所述主控模块210对所述原始音频数据进行环绕立体声算法处理与合成,获得全景音频数据。其中,3D全景视频数据的分辨率、帧率和视频流的码率主要受到处理模块200本身性能的影响,至少能够实现编码4K/30fps且低码率的实时视频流。
进一步地,所述主控模块210还对所述3D全景视频数据以及所述全景音频 数据进行方向匹配处理,这样,可以使得全景音频能够根据3D全景视频不同的视角位置匹配模拟出真实场景中人耳感受到的声源位置的发生情况,进一步加强了体验者身临其境的震撼感。
此外,所述主控模块210还通过调度所述编码模块230对匹配后的3D全景视频数据进行硬件加速的H264/H265编码,以及对匹配后的全景音频数据进行硬件加速的AAC编码,进一步地,所述主控模块210把经过编码的3D全景视频数据以及全景音频数据进行音视频同步处理,以确保音视频数据的同步,这样,就获得了3D全景音视频数据。
作为另一种可选的实施方式,所述主控模块210还用于通过以太网将所述3D全景音视频数据以实时消息传输协议RTMP格式推流到局域网服务器或广域网服务器。
具体的,所述主控模块210还用于通过以太网以无线或者有线的方式将所述3D全景音视频数据以实时消息传输协议RTMP格式推流到局域网服务器或广域网服务器。
请参阅图3,图3是本公开实施例公开的一种音视频采集方法的流程示意图,其中,该方法应用于音视频采集设备,如图3所示,该方法可以包括步骤301至步骤303。
在步骤301、音视频采集设备硬件同步采集来自多路摄像头模组的原始视频数据。
具体的,音视频采集设备硬件同步采集来自多路摄像头模组的原始视频数据的方式具体可以为:通过多路移动产业处理器接口MIPI硬件同步采集来自所述多路摄像头模组的原始视频数据。
本公开实施例中,音视频采集设备中的FPGA芯片通过多路移动产业处理 器接口MIPI硬件同步采集来自多路摄像头模组的原始视频数据,其中,摄像头模组通过MIPI与FPGA芯片直接相连,MIPI的传输速度快,可以用于传输更高清、数据量更大的图像传感器数据,FPGA芯片具有接口丰富且能够并行工作的特点,故所述FPGA芯片能够通过多路MIPI硬件同步采集来自多路摄像头模组的原始视频数据。
在步骤302、音视频采集设备硬件同步采集来自多路拾音器的原始音频数据。
具体的,音视频采集设备通过多路第一音频数据接口硬件同步采集来自多路拾音器的原始音频数据,该第一音频数据接口可以包括但不限于USB2.0、USB3.0、McBSP、HDMI等接口。
在步骤303、音视频采集设备对原始视频数据以及原始音频数据进行处理,获得3D全景音视频数据。
作为一种可选的实施方式,所述音视频采集设备对所述原始视频数据以及所述原始音频数据进行处理,获得3D全景音视频数据的方式具体包括步骤11至步骤13。
在步骤11,音视频采集设备对所述原始视频数据进行处理,获得3D全景视频数据。
在步骤12,音视频采集设备对所述原始音频数据进行处理,获得全景音频数据。
在步骤13,音视频采集设备对所述3D全景视频数据以及所述全景音频数据进行处理,获得3D全景音视频数据。
具体的,音视频采集设备对所述原始视频数据进行处理,获得3D全景视频数据的方式具体为:音视频采集设备对所述原始视频数据进行图像信号处理ISP 处理,获得多路视频数据;对所述多路视频数据进行硬件加速的实时3D全景算法拼接和渲染处理,获得3D全景视频数据。
所述音视频采集设备对所述原始音频数据进行处理,获得全景音频数据的方式具体为:音视频采集设备对所述原始音频数据进行环绕立体声算法处理与合成,获得全景音频数据。
所述音视频采集设备对所述3D全景视频数据以及所述全景音频数据进行处理,获得3D全景音视频数据的方式具体为:音视频采集设备对所述3D全景视频数据以及所述全景音频数据进行方向匹配处理;对匹配后的3D全景视频数据以及全景音频数据分别进行编码处理,以及对编码后的3D全景视频数据以及全景音频数据进行音视频同步处理,获得3D全景音视频数据。
在该可选的实施方式中,音视频采集设备对所述原始视频数据进行ISP处理,即进行3D降噪处理、图像质量优化处理以及将转换RAW DATA格式的原始视频数据转换成YUV格式的原始视频数据,最后获得多路视频数据;进一步地,音视频采集设备对所述多路视频数据进行硬件加速的实时3D全景算法拼接和渲染处理,获得3D全景视频数据;同时,音视频采集设备对所述原始音频数据进行环绕立体声算法处理与合成,获得全景音频数据。更进一步地,音视频采集设备还对所述3D全景视频数据以及所述全景音频数据进行方向匹配处理,这样,可以使得全景音频能够根据3D全景视频不同的视角位置匹配模拟出真实场景中人耳感受到的声源位置的发生情况,进一步加强了体验者身临其境的震撼感。
此外,音视频采集设备对匹配后的3D全景视频数据进行硬件加速的H264/H265编码,以及对匹配后的全景音频数据进行硬件加速的AAC编码,把经过编码的3D全景视频数据以及全景音频数据进行音视频同步处理,以确保音 视频数据的同步,这样,就获得了3D全景音视频数据。
作为一种可选的实施方式,该方法还包括以下步骤:音视频采集设备通过以太网将所述3D全景音视频数据以实时消息传输协议RTMP格式推流到局域网服务器或广域网服务器。
具体的,音视频采集设备通过以太网以无线或者有线的方式将所述3D全景音视频数据以实时消息传输协议RTMP格式推流到局域网服务器或广域网服务器。
在该可选的实施方式中,局域网服务器主要用于搭建局域网环境下支持多用户同时观看本地3D全景音视频数据的流媒体直播,它能够接受音视频采集设备推流过来的RTMP格式的音视频流,同时,支持多种音视频流格式的转换,例如转换成HTTP、HLS、RTP、RTSP、RTCP、RTMP、PNM、MMS、Onvif等协议,并进行音视频流的多路分发工作,以便于用户终端进行身临其境的实时3D全景音视频直播体验。
广域网服务器主要用于接收音视频采集设备通过以太网推流过来的音视频流并在云平台创建直播、生成推流地址或播放地址分发到用户终端,广域网服务还可以进行协议转换,把接收到的音视频流的格式转换成为HTTP、HLS、RTP、RTSP、RTCP、RTMP、PNM、MMS、Onvif等等多种视频格式,并分发到能够接受相应视频格式直播的用户终端。同时,音视频流在传输过程中还经过了CDN加速过程。
需要说明的是,图3中相关步骤的描述具体还可以参照图1或图2中的描述,在此不再赘述。
在图3所描述的方法流程中,音视频采集设备硬件同步采集来自多路摄像头模组的原始视频数据以及硬件同步采集来自多路拾音器的原始音频数据,进 一步地,音视频采集设备对所述原始视频数据以及所述原始音频数据进行处理,获得3D全景音视频数据。可见,通过实施本公开实施例,能够实现多路原始视频数据以及多路原始音频数据的硬件同步采集。此外,由于摄像头模组为高清摄像头模组,多路高清摄像头模组输入的视频数据能够支持更高像素和清晰度,同时,高清摄像头模组的感光晶元的总尺寸会更大,进一步分担了拍摄全景中每个像素点承担成像角度的压力,使得图像畸变大大减小,同时像素稀释度大大降低,最终使得拼接的全景视频质量增高,从而能够提高画质的清晰度、减小图像畸变。
本公开还提供了一种音视频采集设备,包括至少一个处理器和与所述至少一个处理器通信连接的存储器,所述存储器用于存储可被所述至少一个处理器执行的指令,所述指令被所述至少一个处理器执行时,使所述至少一个处理器执行上述的音视频采集方法。
需要说明的是,对于前述的各方法实施例,为了简单描述,故将其都表述为一系列的动作组合,但是本领域技术人员应该知悉,本公开并不受所描述的动作顺序的限制,因为依据本公开,某些步骤可以采用其他顺序或者同时进行。其次,本领域技术人员也应该知悉,说明书中所描述的实施例均属于优选实施例,所涉及的动作和模块并不一定是本公开所必须的。
在上述实施例中,对各个实施例的描述都各有侧重,某个实施例中没有详述的部分,可以参见其他实施例的相关描述。
在本申请所提供的几个实施例中,应该理解到,所揭露的装置,可通过其它的方式实现。例如,以上所描述的装置实施例仅仅是示意性的,例如所述单元的划分,仅仅为一种逻辑功能划分,实际实现时可以有另外的划分方式,例如多个单元或组件可以结合或者可以集成到另一个系统,或一些特征可以忽略, 或不执行。另一点,所显示或讨论的相互之间的耦合或直接耦合或通信连接可以是通过一些接口,装置或单元的间接耦合或通信连接,可以是电性或其它的形式。
所述作为分离部件说明的单元可以是或者也可以不是物理上分开的,作为单元显示的部件可以是或者也可以不是物理单元,即可以位于一个地方,或者也可以分布到多个网络单元上。可以根据实际的需要选择其中的部分或者全部单元来实现本实施例方案的目的。
另外,在本公开各个实施例中的各功能单元可以集成在一个处理单元中,也可以是各个单元单独物理存在,也可以两个或两个以上单元集成在一个单元中。上述集成的单元既可以采用硬件的形式实现,也可以采用软件功能单元的形式实现。
所述集成的单元如果以软件功能单元的形式实现并作为独立的产品销售或使用时,可以存储在一个计算机可读取存储器中。基于这样的理解,本公开的技术方案本质上或者说对现有技术做出贡献的部分或者该技术方案的全部或部分可以以软件产品的形式体现出来,该计算机软件产品存储在一个存储器中,包括若干指令用以使得一台计算机设备(可为个人计算机、服务器或者网络设备等)执行本公开各个实施例所述方法的全部或部分步骤。而前述的存储器包括:U盘、只读存储器(ROM,Read-Only Memory)、随机存取存储器(RAM,Random Access Memory)、移动硬盘、磁碟或者光盘等各种可以存储程序代码的介质。
本领域普通技术人员可以理解上述实施例的各种方法中的全部或部分步骤是可以通过程序来指令相关的硬件来完成,该程序可以存储于一计算机可读存储器中,存储器可以包括:闪存盘、只读存储器(英文:Read-Only Memory, 简称:ROM)、随机存取器(英文:Random Access Memory,简称:RAM)、磁盘或光盘等。
以上对本公开实施例进行了详细介绍,本文中应用了具体个例对本公开的原理及实施方式进行了阐述,以上实施例的说明只是用于帮助理解本公开的方法及其核心思想;同时,对于本领域的一般技术人员,依据本公开的思想,在具体实施方式及应用范围上均会有改变之处,综上上述,本说明书内容不应理解为对本公开的限制。

Claims (17)

  1. A 3D panoramic audio/video live-broadcast system, comprising: an audio/video capture device configured to synchronously capture, in hardware, multiple channels of raw video data and multiple channels of raw audio data, to process the multiple channels of raw video data and the multiple channels of raw audio data to obtain 3D panoramic audio/video data, and to push the 3D panoramic audio/video data to the server;
    a server configured to receive the 3D panoramic audio/video data pushed by the audio/video capture device, to transcode the 3D panoramic audio/video data, and to distribute the transcoded 3D panoramic audio/video data to the user terminal; and
    a user terminal configured to obtain the transcoded 3D panoramic audio/video data from the server in real time and to broadcast the transcoded 3D panoramic audio/video data live in real time.
  2. The 3D panoramic audio/video live-broadcast system according to claim 1, wherein the audio/video capture device comprises a capture module and a processing module,
    the capture module comprises N camera modules, P sound pickups, and a field-programmable gate array (FPGA) chip,
    the FPGA chip is connected to the processing module through M first Mobile Industry Processor Interface (MIPI) lanes;
    the N camera modules are connected to the FPGA chip through N second MIPI lanes and are configured for hardware-synchronized capture of multiple channels of raw video data;
    the P pickups are connected to the FPGA chip through P first audio data interfaces and are configured for hardware-synchronized capture of multiple channels of raw audio data;
    the FPGA chip is connected to the processing module through a second audio data interface, where M, N, and P are all positive integers and M is less than N; and
    the processing module is configured to process the raw video data and the raw audio data to obtain 3D panoramic audio/video data.
  3. The 3D panoramic audio/video live-broadcast system according to claim 2, wherein the processing module comprises a main control module, M image signal processors, a graphics processor, an encoding module, and an external storage module, wherein
    the main control module is configured to receive the raw video data in parallel through the M first MIPI lanes and to process the raw video data, by scheduling the M image signal processors and the graphics processor, to obtain 3D panoramic video data;
    the main control module is further configured to receive the raw audio data through the second audio data interface and to process the raw audio data to obtain panoramic audio data;
    the main control module is further configured to perform direction matching between the 3D panoramic video data and the panoramic audio data;
    the main control module is further configured to schedule the encoding module to encode the matched 3D panoramic video data and panoramic audio data separately, and to apply audio/video synchronization to the encoded 3D panoramic video data and panoramic audio data to obtain 3D panoramic audio/video data; and
    the main control module is further configured to store the 3D panoramic audio/video data by scheduling the external storage module.
  4. The 3D panoramic audio/video live-broadcast system according to claim 3, wherein the main control module is further configured to push the 3D panoramic audio/video data over Ethernet to the server in Real-Time Messaging Protocol (RTMP) format.
  5. The 3D panoramic audio/video live-broadcast system according to any one of claims 2 to 4, wherein each camera module comprises an image sensor and a lens; and
    the N lenses of the N camera modules are either evenly distributed facing outward around a circle, or evenly distributed facing outward on a sphere.
  6. The 3D panoramic audio/video live-broadcast system according to claim 5, wherein if the N lenses are evenly distributed facing outward around a circle, each lens is a fisheye lens with an angle greater than or equal to 180 degrees and the N image sensors are all placed upright; and
    if the N lenses are evenly distributed facing outward on a sphere, each lens is a wide-angle lens whose angle corresponds to the number of image sensors.
  7. The 3D panoramic audio/video live-broadcast system according to any one of claims 2 to 4, wherein the FPGA chip comprises N video data input buffer units and M video data output buffer units, N being an integer multiple of M, wherein
    each video data input buffer unit is configured to store the raw video data of the camera module corresponding to that video data input buffer unit;
    the FPGA chip is further configured to divide the raw video data stored in the N video data input buffer units evenly into M groups to obtain M groups of raw video data; and
    each video data output buffer unit is configured to store the raw video data of the group corresponding to that video data output buffer unit, and to send the stored raw video data to the processing module through the first MIPI lane.
  8. The 3D panoramic audio/video live-broadcast system according to any one of claims 2 to 4, wherein the FPGA chip comprises P audio data input buffer units and one audio data output buffer unit, wherein
    each audio data input buffer unit is configured to store the raw audio data corresponding to that audio data input buffer unit, and
    the audio data output buffer unit is configured to store the raw audio data from the P audio data input buffer units and to send the stored raw audio data to the processing module through the second audio data interface.
  9. The 3D panoramic audio/video live-broadcast system according to any one of claims 2 to 4, wherein the P pickups are P analog microphones and the first audio data interface is an audio input (AIN) interface; or
    the P pickups are P digital microphones and the first audio data interface is an Inter-IC Sound (I2S) bus interface.
  10. The 3D panoramic audio/video live-broadcast system according to any one of claims 2 to 4, wherein the server is a wide area network server or a local area network server, and the user terminal is a wide area network user terminal or a local area network user terminal.
  11. The 3D panoramic audio/video live-broadcast system according to any one of claims 2 to 4, wherein the user terminal is provided with a panoramic player corresponding to the operating system of the user terminal, the operating system being selected from Windows, macOS, iOS, and Android.
  12. An audio/video capture method, applied to an audio/video capture device, comprising:
    synchronously capturing, in hardware, raw video data from multiple camera modules;
    synchronously capturing, in hardware, raw audio data from multiple sound pickups; and
    processing the raw video data and the raw audio data to obtain 3D panoramic audio/video data.
  13. The audio/video capture method according to claim 12, wherein synchronously capturing, in hardware, the raw video data from the multiple camera modules comprises:
    synchronously capturing the raw video data from the multiple camera modules in hardware through multiple Mobile Industry Processor Interface (MIPI) lanes.
  14. The audio/video capture method according to claim 12, wherein processing the raw video data and the raw audio data to obtain 3D panoramic audio/video data comprises:
    processing the raw video data to obtain 3D panoramic video data;
    processing the raw audio data to obtain panoramic audio data; and
    processing the 3D panoramic video data and the panoramic audio data to obtain 3D panoramic audio/video data.
  15. The audio/video capture method according to claim 14, wherein processing the raw video data to obtain 3D panoramic video data comprises:
    performing image signal processing (ISP) on the raw video data to obtain M channels of video data, and performing hardware-accelerated real-time 3D panoramic stitching and rendering on the M channels of video data to obtain 3D panoramic video data;
    processing the raw audio data to obtain panoramic audio data comprises:
    performing surround-sound processing and synthesis on the raw audio data to obtain panoramic audio data; and
    processing the 3D panoramic video data and the panoramic audio data to obtain 3D panoramic audio/video data comprises:
    performing direction matching between the 3D panoramic video data and the panoramic audio data, encoding the matched 3D panoramic video data and panoramic audio data separately, and applying audio/video synchronization to the encoded 3D panoramic video data and panoramic audio data to obtain 3D panoramic audio/video data.
  16. The audio/video capture method according to any one of claims 12 to 15, further comprising:
    pushing the 3D panoramic audio/video data over Ethernet to a local area network server or a wide area network server in Real-Time Messaging Protocol (RTMP) format.
  17. An audio/video capture device, comprising at least one processor and a memory communicatively coupled to the at least one processor, the memory storing instructions executable by the at least one processor, wherein the instructions, when executed by the at least one processor, cause the at least one processor to perform the audio/video capture method according to any one of claims 12 to 16.
PCT/CN2017/084482 2016-11-01 2017-05-16 3d全景音视频直播系统及音视频采集方法 WO2018082284A1 (zh)

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
CN201610935572.8A CN106992959B (zh) 2016-11-01 2016-11-01 一种3d全景音视频直播系统及音视频采集方法
CN201610935572.8 2016-11-01

Publications (1)

Publication Number Publication Date
WO2018082284A1 true WO2018082284A1 (zh) 2018-05-11

Family

ID=59414484

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/CN2017/084482 WO2018082284A1 (zh) 2016-11-01 2017-05-16 3d全景音视频直播系统及音视频采集方法

Country Status (2)

Country Link
CN (1) CN106992959B (zh)
WO (1) WO2018082284A1 (zh)

Cited By (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN112866713A (zh) * 2021-01-19 2021-05-28 北京睿芯高通量科技有限公司 一种转码一体机系统以及转码方法
CN112954272A (zh) * 2021-01-29 2021-06-11 上海商汤临港智能科技有限公司 相机模组、数据传输方法及装置、存储介质和车辆
CN112954394A (zh) * 2021-01-28 2021-06-11 广州虎牙科技有限公司 一种高清视频的编码及解码播放方法、装置、设备和介质
CN113315940A (zh) * 2021-03-23 2021-08-27 海南视联通信技术有限公司 一种视频通话方法、装置及计算机可读存储介质
CN115102929A (zh) * 2021-03-03 2022-09-23 阿里巴巴(中国)有限公司 音频处理系统、中间层芯片及音频处理设备

Families Citing this family (12)

Publication number Priority date Publication date Assignee Title
CN107205122A (zh) * 2017-08-03 2017-09-26 哈尔滨市舍科技有限公司 多分辨率全景视频直播拍照系统与方法
CN107396122A (zh) * 2017-08-11 2017-11-24 西安万像电子科技有限公司 音视频数据输入/输出方法、装置及设备
CN108989739B (zh) * 2018-07-24 2020-12-18 上海国茂数字技术有限公司 一种全视角视频会议直播系统及方法
CN110908643B (zh) * 2018-09-14 2023-05-05 阿里巴巴集团控股有限公司 软件开发工具包的配置方法、装置和系统
CN109951650B (zh) * 2019-01-07 2024-02-09 北京汉博信息技术有限公司 校园电台系统
CN109743643B (zh) * 2019-01-16 2022-04-01 成都合盛智联科技有限公司 楼宇对讲系统的处理方法及装置
CN112073748B (zh) * 2019-06-10 2022-03-18 北京字节跳动网络技术有限公司 全景视频的处理方法、装置及存储介质
CN111031327A (zh) * 2019-11-06 2020-04-17 石家庄微泽科技有限公司 一种全景播放的方法
CN111416989A (zh) * 2020-04-28 2020-07-14 北京金山云网络技术有限公司 视频直播方法、系统及电子设备
CN111642890A (zh) * 2020-07-07 2020-09-11 北京兰亭数字科技有限公司 一种8k5gvr背包
CN111901351A (zh) * 2020-07-30 2020-11-06 西安万像电子科技有限公司 远程教学系统、方法、装置以及语音网关路由器
CN113132672B (zh) * 2021-03-24 2022-07-26 联想(北京)有限公司 一种数据处理方法以及视频会议设备

Citations (4)

Publication number Priority date Publication date Assignee Title
CN103220543A (zh) * 2013-04-25 2013-07-24 同济大学 基于Kinect的实时3D视频通信系统及其实现方法
CN205071232U (zh) * 2015-09-24 2016-03-02 北京工业大学 一种3d全景视频采集装置
CN205320214U (zh) * 2016-01-28 2016-06-15 北京极图科技有限公司 3dvr 全景视频成像装置
CN206117891U (zh) * 2016-11-01 2017-04-19 深圳市圆周率软件科技有限责任公司 一种音视频采集设备

Family Cites Families (5)

Publication number Priority date Publication date Assignee Title
CN102790872B (zh) * 2011-05-20 2016-11-16 南京中兴软件有限责任公司 一种视频会议的实现方法及系统
CN202395858U (zh) * 2011-12-14 2012-08-22 深圳市中控生物识别技术有限公司 一种双目摄像装置
CN103297688A (zh) * 2013-04-16 2013-09-11 宁波高新区阶梯科技有限公司 一种多媒体全景录制系统及录制方法
CN104570577B (zh) * 2015-02-12 2018-06-05 沈靖程 一种720度全景照相机
CN105120193A (zh) * 2015-08-06 2015-12-02 佛山六滴电子科技有限公司 一种录制全景视频的设备及方法

Patent Citations (4)

Publication number Priority date Publication date Assignee Title
CN103220543A (zh) * 2013-04-25 2013-07-24 同济大学 基于Kinect的实时3D视频通信系统及其实现方法
CN205071232U (zh) * 2015-09-24 2016-03-02 北京工业大学 一种3d全景视频采集装置
CN205320214U (zh) * 2016-01-28 2016-06-15 北京极图科技有限公司 3dvr 全景视频成像装置
CN206117891U (zh) * 2016-11-01 2017-04-19 深圳市圆周率软件科技有限责任公司 一种音视频采集设备

Cited By (7)

Publication number Priority date Publication date Assignee Title
CN112866713A (zh) * 2021-01-19 2021-05-28 北京睿芯高通量科技有限公司 Integrated transcoding system and transcoding method
CN112954394A (zh) * 2021-01-28 2021-06-11 广州虎牙科技有限公司 Method, apparatus, device and medium for encoding, decoding and playing high-definition video
CN112954272A (zh) * 2021-01-29 2021-06-11 上海商汤临港智能科技有限公司 Camera module, data transmission method and apparatus, storage medium and vehicle
CN112954272B (zh) * 2021-01-29 2023-10-24 上海商汤临港智能科技有限公司 Camera module, data transmission method and apparatus, storage medium and vehicle
CN115102929A (zh) * 2021-03-03 2022-09-23 阿里巴巴(中国)有限公司 Audio processing system, middle-layer chip and audio processing device
CN115102929B (zh) * 2021-03-03 2024-02-13 阿里巴巴(中国)有限公司 Audio processing system, middle-layer chip and audio processing device
CN113315940A (zh) * 2021-03-23 2021-08-27 海南视联通信技术有限公司 Video call method, apparatus and computer-readable storage medium

Also Published As

Publication number Publication date
CN106992959A (zh) 2017-07-28
CN106992959B (zh) 2023-08-18

Similar Documents

Publication Publication Date Title
WO2018082284A1 (zh) 3D panoramic audio/video live-streaming system and audio/video capture method
US10021301B2 (en) Omnidirectional camera with multiple processors and/or multiple sensors connected to each processor
US8477950B2 (en) Home theater component for a virtualized home theater system
US9843725B2 (en) Omnidirectional camera with multiple processors and/or multiple sensors connected to each processor
JP6377784B2 (ja) Method for one-to-many audio-video streaming with synchronized audio-video capture
WO2018068481A1 (zh) Binocular 720-degree panoramic capture system
WO2017092338A1 (zh) Data transmission method and apparatus
US20150139614A1 (en) Input/output system for editing and playing ultra-high definition image
WO2017166721A1 (zh) Method, apparatus and system for live video streaming
CN206117891U (zh) Audio/video capture device
KR101611531B1 (ko) Photographing apparatus and method for providing captured images
US20230283888A1 (en) Processing method and electronic device
US20180376181A1 (en) Networked video communication applicable to gigabit ethernet
CN207443024U (zh) Panoramic audio/video recording device and system
US10721500B2 (en) Systems and methods for live multimedia information collection, presentation, and standardization
WO2011099254A1 (ja) Data processing device and data encoding device
CN110602523A (zh) VR panoramic live-streaming multimedia processing and compositing system and method
CN109756683B (zh) Panoramic audio/video recording method, apparatus, storage medium and computer device
WO2012067051A1 (ja) Video processing server and video processing method
CN209402583U (zh) Data transceiver device and recording/broadcasting system
KR102637147B1 (ko) Portrait-mode streaming method and mobile portrait-mode streaming system
CN109348245B (zh) 4K panoramic hyper-converged multi-channel monitoring method and apparatus
KR101676400B1 (ko) Photographing apparatus and method for providing captured images
Alvarez-Mesa et al. Global 8K Live Streaming Showcase 2020: The technologies behind the scenes
TWI532376B (zh) 影像串流系統及其電腦裝置與影像串流方法

Legal Events

Date Code Title Description
121 Ep: the EPO has been informed by WIPO that EP was designated in this application
    Ref document number: 17866588; Country of ref document: EP; Kind code of ref document: A1
NENP Non-entry into the national phase
    Ref country code: DE
122 Ep: PCT application non-entry in European phase
    Ref document number: 17866588; Country of ref document: EP; Kind code of ref document: A1