WO2024061295A1 - Method and system for processing video data - Google Patents

Method and system for processing video data

Info

Publication number
WO2024061295A1
WO2024061295A1 (PCT/CN2023/120228)
Authority
WO
WIPO (PCT)
Prior art keywords
video
frame
frames
encoding
spliced
Prior art date
Application number
PCT/CN2023/120228
Other languages
English (en)
French (fr)
Inventor
陈科
孙洪军
朱祥
Original Assignee
上海微创医疗机器人(集团)股份有限公司
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by 上海微创医疗机器人(集团)股份有限公司
Publication of WO2024061295A1

Classifications

    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04N PICTORIAL COMMUNICATION, e.g. TELEVISION
    • H04N 5/00 Details of television systems
    • H04N 5/222 Studio circuitry; Studio devices; Studio equipment
    • H04N 5/262 Studio circuits, e.g. for mixing, switching-over, change of character of image, other special effects; Cameras specially adapted for the electronic generation of special effects
    • H04N 5/265 Mixing
    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04N PICTORIAL COMMUNICATION, e.g. TELEVISION
    • H04N 21/00 Selective content distribution, e.g. interactive television or video on demand [VOD]
    • H04N 21/20 Servers specifically adapted for the distribution of content, e.g. VOD servers; Operations thereof
    • H04N 21/21 Server components or server architectures
    • H04N 21/218 Source of audio or video content, e.g. local disk arrays
    • H04N 21/21805 Source of audio or video content, e.g. local disk arrays enabling multiple viewpoints, e.g. using a plurality of cameras
    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04N PICTORIAL COMMUNICATION, e.g. TELEVISION
    • H04N 21/00 Selective content distribution, e.g. interactive television or video on demand [VOD]
    • H04N 21/20 Servers specifically adapted for the distribution of content, e.g. VOD servers; Operations thereof
    • H04N 21/23 Processing of content or additional data; Elementary server operations; Server middleware
    • H04N 21/234 Processing of video elementary streams, e.g. splicing of video streams or manipulating encoded video stream scene graphs
    • H04N 21/23424 Processing of video elementary streams, e.g. splicing of video streams or manipulating encoded video stream scene graphs involving splicing one content stream with another content stream, e.g. for inserting or substituting an advertisement
    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04N PICTORIAL COMMUNICATION, e.g. TELEVISION
    • H04N 21/00 Selective content distribution, e.g. interactive television or video on demand [VOD]
    • H04N 21/40 Client devices specifically adapted for the reception of or interaction with content, e.g. set-top-box [STB]; Operations thereof
    • H04N 21/43 Processing of content or additional data, e.g. demultiplexing additional data from a digital video stream; Elementary client operations, e.g. monitoring of home network or synchronising decoder's clock; Client middleware
    • H04N 21/44 Processing of video elementary streams, e.g. splicing a video clip retrieved from local storage with an incoming video stream or rendering scenes according to encoded video stream scene graphs
    • H04N 21/44008 Processing of video elementary streams, e.g. splicing a video clip retrieved from local storage with an incoming video stream or rendering scenes according to encoded video stream scene graphs involving operations for analysing video streams, e.g. detecting features or characteristics in the video stream
    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04N PICTORIAL COMMUNICATION, e.g. TELEVISION
    • H04N 21/00 Selective content distribution, e.g. interactive television or video on demand [VOD]
    • H04N 21/40 Client devices specifically adapted for the reception of or interaction with content, e.g. set-top-box [STB]; Operations thereof
    • H04N 21/43 Processing of content or additional data, e.g. demultiplexing additional data from a digital video stream; Elementary client operations, e.g. monitoring of home network or synchronising decoder's clock; Client middleware
    • H04N 21/44 Processing of video elementary streams, e.g. splicing a video clip retrieved from local storage with an incoming video stream or rendering scenes according to encoded video stream scene graphs
    • H04N 21/44016 Processing of video elementary streams, e.g. splicing a video clip retrieved from local storage with an incoming video stream or rendering scenes according to encoded video stream scene graphs involving splicing one content stream with another content stream, e.g. for substituting a video clip
    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04N PICTORIAL COMMUNICATION, e.g. TELEVISION
    • H04N 7/00 Television systems
    • H04N 7/18 Closed-circuit television [CCTV] systems, i.e. systems in which the video signal is not broadcast
    • H04N 7/181 Closed-circuit television [CCTV] systems, i.e. systems in which the video signal is not broadcast for receiving images from a plurality of remote sources

Definitions

  • the present application relates to the technical field of long-distance synchronous transmission of image data, and in particular to a method, system, computer equipment and storage medium for processing video data.
  • multi-channel video sources are widely used in medical, film and television, navigation and other fields.
  • the synchronous playback method of multi-channel video sources is mainly cache synchronization. Synchronization control is performed by extracting time stamps in the data stream and adding header information such as key frame information and timestamps to the video frames.
  • this method will still cause the video frames of the multi-channel video source to be out of sync, which affects the use of the product.
  • taking a medical laparoscopic robot as an example of a multi-channel video source in the medical field, if the multiple video sources cannot be synchronized during remote transmission of three-dimensional images, smearing will occur, resulting in unclear 3D images, and the viewer may even become dizzy.
  • This application provides a video data processing method, which method includes:
  • the spliced video data includes the spliced video frames and the splicing information of the spliced video frames;
  • This application also provides a video data processing system, which includes:
  • the first acquisition module is used to acquire multi-channel video data from at least two different video sources
  • the frame splicing module is used to splice the video frames of different video sources in the multi-channel video data at the same time into one spliced video frame to obtain the spliced video data.
  • the spliced video data includes the spliced video frames and the splicing information of the spliced video frames.
  • an encoding module, used to encode the spliced video data to obtain multiple encoded frames
  • the encapsulation module is used to encapsulate multiple encoded frames to obtain a video stream to be transmitted, and transmit the video stream to the target decoder.
  • the above video data processing method and system acquire multi-channel video data from at least two different video sources, splice the video frames of the different video sources at the same moment into one spliced video frame, encode and encapsulate it, and send it to the target decoding end. Since video frames from different video sources at the same moment are spliced into one spliced video frame, they can be sent at the same time, achieving absolute consistency in the sending time of video frames from different video sources at the same moment and thereby enabling synchronous transmission of those video frames.
  • Figure 1 is an application environment diagram of a video data processing method in one embodiment
  • Figure 2 is a schematic flow chart of a video data processing method in one embodiment
  • Figure 3 is a multi-channel distribution network connection diagram based on the transfer server, encoding end and decoding end in one embodiment
  • Figure 4 is a schematic flowchart of splicing video frames from different video sources at the same time into one spliced video frame in another embodiment
  • Figure 5 is a schematic diagram of data encapsulation and transmission in one embodiment
  • Figure 6 is a structural diagram of a hardware combination system that implements frame splicing and frame splitting in one embodiment
  • Figure 7 is a flow chart of ordinary frame processing at the encoding end in one embodiment
  • Figure 8 is a flow chart of key frame processing at the encoding end in one embodiment
  • Figure 9 is an example of key frame processing at the encoding end in one embodiment
  • Figure 10 is a flow chart of video stream transmission at the encoding end in one embodiment
  • Figure 11 is a flow chart of video stream reception at the decoding end in one embodiment
  • Figure 12 is a schematic diagram of frame restoration at the decoding end in one embodiment
  • Figure 13 is a schematic diagram of the frame splitting function of the decoding end in one embodiment
  • Figure 14 is a flow chart of video stream reception at the decoding end in one embodiment
  • Figure 15 is a local multiplex distribution network connection diagram in one embodiment
  • Figure 16 is a flow chart of video stream forwarding by the relay server in one embodiment
  • Figure 17 is a transfer flow chart of the transfer server in one embodiment
  • Figure 18 is a flow chart of pairing settings between the encoding end and the decoding end in one embodiment
  • Figure 19 is a distribution principle diagram of the relay server in one embodiment
  • Figure 20 is a communication flow chart between the encoding end and the decoding end in one embodiment
  • Figure 21 is a structural block diagram of a video data processing system in one embodiment
  • Figure 22 is an internal structure diagram of a computer device in one embodiment.
  • the video data processing method provided by the embodiment of the present application can be applied in the application environment as shown in Figure 1.
  • the encoding end 102 obtains multi-channel video data from at least two different video sources; splices the video frames of different video sources in the multi-channel video data at the same time into one spliced video frame to obtain multiple spliced video data,
  • Each spliced video data includes spliced video frames and splicing information of the spliced video frames;
  • the spliced video data is encoded to obtain multiple encoded frames;
  • the multiple encoded frames are encapsulated to obtain a video stream to be transmitted, and the video stream is transmitted to the transfer server 104 or the target decoding end.
  • the decoding end 106 receives the video stream sent by the encoding end 102 or receives the video stream forwarded by the relay server 104.
  • the decoding end 106 decapsulates the video stream to obtain multiple encoded frames; decodes the encoded frames to obtain spliced video data.
  • the spliced video data includes spliced video frames and splicing information of the spliced video frames; according to the splicing information of each spliced video frame, each spliced video frame is split to obtain at least two video frames of different video sources at the same time; for different video sources The video frames at the same time are rendered and displayed.
  • when the encoding end 102 and the decoding end 106 are deployed in different local area networks, the encoding end 102 establishes a connection with the decoding end 106 through the transit server 104; when they are deployed in the same local area network, the encoding end 102 establishes a connection with the decoding end 106 directly through the local area network.
  • the data storage system can store the data that the transit server 104 needs to process.
  • the data storage system can be integrated on the transit server 104, or it can be placed on the cloud or other network servers.
  • the encoding end 102 and the decoding end 106 can be processors of computer devices; the computer devices may be, but are not limited to, various personal computers, laptops, smart phones, tablet computers, Internet of Things devices and portable wearable devices.
  • the Internet of Things devices can be laparoscopic robots, etc.
  • Portable wearable devices can be smart watches, smart bracelets, head-mounted devices, etc.
  • the transit server 104 can be implemented with an independent server or a server cluster consisting of multiple servers.
  • a video data processing method is provided. This method is explained by taking the method applied to the encoding end 102 in Figure 1 as an example, and includes the following steps:
  • S202 Obtain multi-channel video data from at least two different video sources.
  • the video source can be a left-eye video source and a right-eye video source of the laparoscopic robot, and the left-eye video source and the right-eye video source both output dual-channel video data, and each channel of video data includes a video frame and a timestamp of the video frame.
  • a multi-channel distribution network connection diagram is constructed based on the transfer server, encoding end and decoding end.
  • Devices B3, B6, B7 and B9 can be switched to the encoding end or decoding end according to the actual scenario.
  • B1 and B4 are both laparoscopic robots
  • B2, B5, B8 and B10 are all local monitors.
  • the laparoscopic robot B1 in local operating room A transmits dual-channel endoscopic images to the encoding end B3 through optical fiber b1.
  • the encoding end B3 performs frame merging, encoding and compression on the dual-channel video data and then sends it to the transfer server or the target decoding end through the high-speed network b5.
  • the encoding end B3 also loops out the dual-channel endoscope image and transmits it to the local monitor B2 through optical fiber b2.
  • S204 Splice video frames from different video sources in the multi-channel video data at the same time into one spliced video frame to obtain spliced video data.
  • the spliced video data includes spliced video frames and splicing information of the spliced video frames.
  • the encoding end splices the video frames of different video sources at the same moment into one spliced video frame, then performs encoding and compression after splicing to obtain the video stream to be transmitted, and finally transmits the video stream to the target decoding end.
  • algorithms such as a frame image splicing algorithm or a global iterative closest point method can be used to splice the video frames of the different video sources at the same moment into one spliced video frame.
  • the splicing direction can be horizontal or vertical; neither the splicing algorithm nor the splicing direction is limited here.
  • the encoding end of this embodiment splices the video frames of different video sources in the multi-channel video data at the same time into one spliced video frame, thereby obtaining multiple spliced video data.
  • the splicing information of the spliced video frame is used to identify the range of pixels belonging to each video frame in the spliced video data before splicing. For example, the video frames from different video sources at the same moment are recorded as the first video frame and the second video frame respectively.
  • the splicing information of the spliced video frame then identifies the original pixel coordinates and range of the first video frame, and the original pixel coordinates and range of the second video frame.
  • the dual-channel video data of the left eye video source of the laparoscopic robot is recorded as endoscopic image-L
  • the dual-channel video data of the right eye video source is recorded as endoscopic image-R.
  • the resolutions of the endoscopic image-L and the endoscopic image-R are both 1920*1080P.
  • the video frames of the left-eye video source and the right-eye video source of the laparoscopic robot at the same moment are spliced into one spliced video frame.
  • the resolution of the spliced video frame is 3840*1080P, and the splicing direction is horizontal.
  • the splicing information of the spliced video frame identifies that the range from the 1st pixel to the 1920th pixel in the horizontal direction and the 1080 pixels in the vertical direction is the pixel range of the dual-channel video data of the left-eye video source, and that the range from the 1921st pixel to the 3840th pixel in the horizontal direction and the 1080 pixels in the vertical direction is the pixel range of the dual-channel video data of the right-eye video source.
  • the encoding end splices video frames from different video sources at the same time in the multi-channel video data into a spliced video frame, configures splicing information for identifying the spliced video frame for each spliced video frame, and obtains multiple spliced video data.
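  • As a concrete illustration of this splicing step, here is a minimal sketch in Python/NumPy; the function name, the layout of the splicing-information dictionary, and the use of NumPy arrays are illustrative assumptions, not structures specified in the patent.

```python
# Minimal horizontal frame splicing sketch (illustrative, not the patent's exact scheme).
import numpy as np

def splice_frames(left_frame: np.ndarray, right_frame: np.ndarray):
    """Splice two same-size frames side by side and record splicing information."""
    assert left_frame.shape == right_frame.shape      # e.g. (1080, 1920, 3)
    spliced = np.hstack((left_frame, right_frame))    # -> (1080, 3840, 3)
    h, w = left_frame.shape[:2]
    # Splicing information: the pixel range of each source frame inside the spliced frame.
    splicing_info = {
        "left":  {"x": (0, w),     "y": (0, h)},      # pixels 1..1920 horizontally
        "right": {"x": (w, 2 * w), "y": (0, h)},      # pixels 1921..3840 horizontally
    }
    return spliced, splicing_info
```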
  • S206 Encode the spliced video data to obtain multiple encoded frames.
  • the basic principle of encoding is to represent and transmit video data using a certain form of code stream according to certain rules.
  • the main purpose of encoding spliced video data is data compression to solve the problem that storage space and transmission bandwidth cannot meet the storage and transmission requirements.
  • the encoding may be H.261, H.262, H.263 or H.264 encoding. This embodiment uses H.264 encoding.
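  • As one possible realization of this encoding step (the patent specifies H.264 but names no particular encoder), the sketch below uses the PyAV library as an assumed choice; it emits each encoded frame as raw bytes together with a key-frame flag, ready for the encapsulation step described next.

```python
# Hedged H.264 encoding sketch using PyAV (an assumed library choice).
import av

def encode_spliced_frames(frames, width=3840, height=1080, fps=30):
    """Encode spliced RGB frames to H.264; yield (frame_bytes, is_keyframe)."""
    container = av.open("spliced.h264", mode="w", format="h264")  # raw Annex-B output
    stream = container.add_stream("libx264", rate=fps)
    stream.width, stream.height = width, height
    stream.pix_fmt = "yuv420p"
    for img in frames:                                 # img: (height, width, 3) uint8
        frame = av.VideoFrame.from_ndarray(img, format="rgb24")
        for packet in stream.encode(frame):
            yield bytes(packet), packet.is_keyframe
    for packet in stream.encode():                     # flush the encoder
        yield bytes(packet), packet.is_keyframe
    container.close()
```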
  • S208 Encapsulate multiple encoded frames to obtain a video stream to be transmitted, and transmit the video stream to the target decoder.
  • the function of encapsulation is to protect or prevent the encoded frame from being damaged or modified.
  • the most commonly used encapsulation protocols are PPP/HDLC, LAPS, and GFP.
  • This embodiment uses the UDP protocol to transmit the video stream, as shown in Figure 5.
  • in addition to the UDP header, a 4-byte data length, a 2-byte frame number and the H.264 frame data are added to the UDP data part before encapsulation and sending.
  • that is, the encoding end adds a UDP header to the data packet of each encoded frame, places the 4-byte data length, the 2-byte frame number and the H.264 frame data in the UDP data part, and encapsulates and sends the packet to the target decoding end.
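  • A minimal sketch of this encapsulation, assuming the payload layout described above (4-byte data length, 2-byte frame number, then the H.264 frame data); the constant 0 for ordinary frames, the address, and the function name are illustrative assumptions.

```python
# UDP encapsulation sketch: [4-byte length | 2-byte frame number | H.264 data].
import socket
import struct

ORDINARY_FRAME_NO = 0   # assumed constant frame number for ordinary (P/B) frames
sock = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)

def send_encoded_frame(frame_data: bytes, frame_no: int, addr) -> None:
    """Encapsulate one encoded frame into a UDP datagram and send it."""
    payload = struct.pack("!IH", len(frame_data), frame_no) + frame_data
    sock.sendto(payload, addr)   # the UDP header itself is added by the OS

# Usage: an ordinary frame carries the constant frame number, a key frame a real one.
# send_encoded_frame(p_frame, ORDINARY_FRAME_NO, ("192.0.2.10", 5004))
# send_encoded_frame(i_frame, 1, ("192.0.2.10", 5004))
```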
  • in the above method, multi-channel video data from at least two different video sources is obtained, the video frames of the different video sources at the same moment are spliced into one spliced video frame, and the result is encoded, encapsulated and sent to the target decoding end.
  • since video frames from different video sources at the same moment are spliced into one spliced video frame, they can be sent at the same time, achieving absolute consistency in the sending time of video frames from different video sources at the same moment and thereby enabling synchronous transmission of those video frames.
  • the current method for splicing image frames generally uses a matrix splicer for frame splicing.
  • however, even a matrix splicer with excellent performance has an image delay of about 30 ms. If both the encoding end and the decoding end use a matrix splicer to perform frame splicing and frame splitting on the image, the image delay increases by about 60 ms. In other words, even a matrix splicer cannot achieve absolutely consistent sending times at the encoding end. Therefore, to solve the above problem, the encoding end of this embodiment adopts a hardware combination system to implement frame splicing and frame splitting.
  • the system structure of the hardware combination is shown in Figure 6, including an HDMI decoding end, an HDMI encoding end, a CPU chip and an FPGA processing module.
  • the dual-channel video data from the left-eye video source and the right-eye video source of the laparoscopic robot are decoded by two HDMI decoding terminals respectively; after decoding, the data is processed by the FPGA processing module for hardware acceleration, and then encoded by two HDMI encoding terminals to obtain losslessly spliced video frames.
  • this embodiment uses an FPGA hardware system to losslessly splice the multi-channel video data of the two different video sources to obtain the spliced video data; compared with the high latency of a traditional image splicer, splicing with an FPGA hardware system has the characteristics of high efficiency and low latency.
  • three-dimensional image data is prone to frame loss during remote synchronous transmission, and if frames are lost from one video source, the overall visual effect of the three-dimensional image is affected. Therefore, before encapsulating the multiple encoded frames, the encoding end determines whether the current encoded frame is a key frame; when the current encoded frame is a key frame, the key frame is copied.
  • otherwise, the current encoded frame is determined to be an ordinary frame, and ordinary frame information is identified in the data packet of the ordinary frame.
  • the ordinary frame information is used to identify the current encoded frame as an ordinary frame, and can be a constant frame number or a specific character; for example, the constant frame number can be 000000000.
  • the encoding end encapsulates ordinary frames.
  • the UDP protocol is used, and a 4-byte data length, a 2-byte frame number and the H.264 frame data are added to the UDP data part.
  • for an ordinary frame, the 2-byte frame number in the UDP data part is the constant frame number 000000000.
  • the purpose of adding the 2-byte constant frame number to the UDP data part of the ordinary frame is that, after the decoding end decapsulates the ordinary frame and removes the UDP header and the data length, it obtains the 2-byte constant frame number and can thereby determine that the encoded frame in the current data packet is an ordinary frame.
  • the key frame refers to the frame corresponding to the key action in the movement change of the character or object, which is recorded as I frame.
  • Ordinary frames include forward prediction frames and bidirectional interpolation frames. Forward prediction frames are denoted as P frames, and bidirectional interpolation frames are denoted as B frames.
  • the I frame is a complete picture, while the P frame and B frame record changes relative to the I frame. Without the I frame, the P frame and B frame cannot be decoded.
  • the video frame transmission rate can be improved, the frame loss rate of video frames during the remote transmission of three-dimensional image data can be effectively reduced, and the problem of key frame loss affecting the visual effects of the three-dimensional image can be avoided.
  • in addition, compared with a solution that copies all video frames, this embodiment copies only the key frames, which can effectively reduce the bandwidth resources required for network transmission.
  • copying the key frame includes the following steps:
  • this embodiment takes copying 3 key frames as an example and records the network packet loss rate as X; assuming independent losses, a key frame is lost only if all of its copies are lost, which happens with probability on the order of X^3.
  • after the encoding end obtains the multiple encoded frames, it determines whether the current encoded frame is a key frame; when the current encoded frame is a key frame, it makes 2 or 3 copies of the key frame.
  • S804 Identify the key frame information in the data packet of each key frame, where the key frame information of the same key frame is the same.
  • the key frame information is used to identify the current encoding frame as a key frame.
  • the key frame information may be a frame number or a specific character.
  • the key frame information may be a frame number 000000001.
  • the copied key frames are encapsulated to obtain the video stream to be transmitted, and the video stream is transmitted to the target decoder.
  • the target decoding end decapsulates the video stream to obtain the encoded frames, and then decodes them.
  • because key frames are copied, the same key frame would otherwise need to be decoded multiple times, which reduces the decoding efficiency of the decoding end and increases the frame number difference between the video played at the decoding end and the video source of the encoding end, causing the video quality played at the decoding end to be lower than the video quality of the encoding end.
  • the encoding end of this embodiment identifies the key frame information in the data packet of each key frame, wherein the key frame information of the same key frame is the same.
  • the decoder determines whether the current encoded frame is a key frame based on the key frame information.
  • the encoding end identifies the key frame information in the key frame data packet, which makes it convenient for the decoding end to identify whether the current encoded frame is a key frame and improves the key frame recognition efficiency.
  • the encoding end sets the key frame information of the same key frame to be identical, so that the decoding end can judge from the key frame information whether the current encoded frame has already been decoded.
  • if the key frame information of the current key frame repeats that of an already decoded key frame, the encoded frame is discarded.
  • specifically, the copied key frames are encapsulated together: the same key frame information is added to the data packet of each copy of the key frame, and the data is encapsulated and sent according to the UDP protocol.
  • in this embodiment, 2 or 3 copies of each key frame are made at the encoding end to balance network bandwidth against the frame loss rate of video frames; key frame information is identified in the data packet of each key frame, which makes it convenient for the decoding end to identify whether the current encoded frame is a key frame and improves key frame recognition efficiency; and the key frame information of the same key frame is set to be identical, so that the decoding end can judge from it whether the current encoded frame has already been decoded and discard it if so.
  • this frame discarding, on the one hand, reduces the frame number difference between the remotely played video and the source video; on the other hand, it filters out redundant copied key frames, achieving a playback effect as close as possible to, or equal to, the quality of the source video.
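  • Put together, the key-frame handling above can be sketched as follows; the 3-copy count and the running key-frame number are illustrative, and the payload reuses the 4-byte length / 2-byte frame number layout from the earlier encapsulation sketch.

```python
# Key-frame duplication sketch: copies of the same key frame share one frame number.
import struct

ORDINARY_FRAME_NO = 0    # assumed constant number identifying ordinary frames
KEY_FRAME_COPIES = 3     # this embodiment copies each key frame 3 times

def payloads_for_frame(frame_data: bytes, is_keyframe: bool, key_frame_no: int):
    """Build the UDP payload(s) to send for one encoded frame."""
    if is_keyframe:
        # Same key frame information on every copy, so the decoder can drop duplicates.
        header = struct.pack("!IH", len(frame_data), key_frame_no)
        return [header + frame_data] * KEY_FRAME_COPIES
    header = struct.pack("!IH", len(frame_data), ORDINARY_FRAME_NO)
    return [header + frame_data]
```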
  • a video data processing method is provided. This method is explained by taking the method applied to the encoding end 102 in Figure 1 as an example, and includes the following steps:
  • S1004 splicing video frames from different video sources in the multiple channels of video data at the same time into a spliced video frame to obtain spliced video data, where the spliced video data includes the spliced video frame and splicing information of the spliced video frame.
  • S1006 Encode the spliced video data to obtain multiple encoded frames.
  • S1008 Determine whether the current encoded frame is a key frame. When the current encoded frame is not a key frame, execute S1010; when the current encoded frame is a key frame, execute S1012.
  • S1010 determine that the current encoded frame is an ordinary frame, identify the ordinary frame information in the data packet of the ordinary frame, and execute S1016.
  • S1016 Encapsulate the encoded frame to obtain the video stream to be transmitted, and transmit the video stream to the target decoder.
  • S1018 determine whether the multiple channels of video data are all encapsulated. When the multiple channels of video data are all encapsulated, the process ends; when the multiple channels of video data are not all encapsulated, execute S1002.
  • in this embodiment, video frames from different video sources at the same moment are spliced into one spliced video frame, so that they can be sent at the same time, achieving absolutely consistent sending times and thus synchronous transmission of video frames from different video sources at the same moment. By copying key frames, on the one hand the video frame delivery rate is improved and the frame loss rate during remote transmission of three-dimensional image data is effectively reduced, avoiding the problem of key frame loss affecting the visual effect of the three-dimensional image; on the other hand, compared with copying all video frames, copying only the key frames effectively reduces the bandwidth resources required for network transmission.
  • a video data processing method is provided. This method is explained by taking the method applied to the decoder 106 in Figure 1 as an example, and includes the following steps:
  • when the encoding end and the decoding end are deployed in different local area networks, the decoding end obtains the video stream sent by the encoding end through the relay server; when they are deployed in the same local area network, the decoding end obtains the video stream sent by the encoding end directly through the local area network.
  • S1104 Decapsulate the video stream to obtain multiple encoded frames.
  • decapsulation is the reverse process of encapsulation, which mainly realizes the process of restoring data from bit stream to data.
  • the encapsulation uses the UDP protocol, and the corresponding decapsulation process is shown in Figure 12: the UDP message header is removed first, then the data length, and finally the frame number, yielding the data structure shown in Figure 12c, i.e., the encoded frame at the decoding end.
  • in other words, after the decoding end receives the video stream sent by the encoding end, it removes the UDP message header, then the data length, and finally the frame number to obtain the corresponding encoded frame, and then determines whether the video stream sent by the encoding end has been completely decapsulated; if so, the decapsulation operation ends, and if not, the decapsulation process is repeated.
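  • A matching decapsulation sketch, assuming the same payload layout as on the encoding side; after the OS strips the UDP header, the decoding end removes the 4-byte data length and the 2-byte frame number to recover the encoded frame.

```python
# Decapsulation sketch: invert [4-byte length | 2-byte frame number | H.264 data].
import struct

def decapsulate(payload: bytes):
    """Split one received UDP payload into (frame_number, encoded_frame)."""
    data_len, frame_no = struct.unpack("!IH", payload[:6])
    frame_data = payload[6:6 + data_len]   # the H.264 encoded frame
    return frame_no, frame_data
```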
  • S1106 Decode the encoded frame to obtain spliced video data; the spliced video data includes spliced video frames and splicing information of the spliced video frames.
  • decoding is to restore the encoded frame to spliced video data, which corresponds to the encoding process.
  • the decoding algorithm can use a fast Fourier transform algorithm, a discrete Fourier transform algorithm, or a frequency domain filtering algorithm.
  • the decoding algorithm is not limited here; since encoding processes the spliced video data into encoded frames, decoding correspondingly restores the encoded frames to spliced video data.
  • S1108 According to the splicing information of each spliced video frame, split each spliced video frame to obtain at least two video frames from different video sources at the same time.
  • the splicing information of the spliced video frames is used to identify the range of pixels in the spliced video data belonging to each video frame before splicing. Therefore, during the frame splitting process, the spliced video frame is split into the original two video frames at the same time according to the range of pixel points of the video frame identified by the splicing information.
  • the inter-frame difference method can be used to split each spliced video frame to obtain video frames from at least two different video sources at the same time.
  • for example, the spliced video frame at the encoding end is image data with a resolution of 3840*1080P, and the decoding end receives a video stream containing this image data.
  • the decoding end decapsulates and decodes the video stream to obtain the spliced video data.
  • the image data with a resolution of 3840*1080P is then split: the 1st pixel to the 1920th pixel in the horizontal direction are restored as the video frame of one video source, and the 1921st pixel to the 3840th pixel as the video frame of the other video source.
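  • Frame splitting can then be sketched as the inverse of the splicing example given earlier; the splicing_info dictionary layout is the same illustrative assumption.

```python
# Frame splitting sketch: recover per-source frames using the splicing information.
import numpy as np

def split_frame(spliced: np.ndarray, splicing_info: dict):
    """Cut each source's pixel range back out of the spliced frame."""
    frames = {}
    for source, rng in splicing_info.items():
        (x0, x1), (y0, y1) = rng["x"], rng["y"]
        frames[source] = spliced[y0:y1, x0:x1]
    return frames   # e.g. {"left": (1080, 1920, 3), "right": (1080, 1920, 3)}
```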
  • S1110 Render and display video frames from different video sources at the same time.
  • the video frames of different video sources in the multi-channel video data of the video source at the same time are two-dimensional images, and the spliced video frames are also two-dimensional images.
  • Video frames from different video sources at the same time are obtained.
  • the video frames from different video sources at the same time are processed into three-dimensional images at the decoding end and rendered to obtain the three-dimensional image at the same time.
  • after the decoding end decapsulates and decodes the video stream, the spliced video data is obtained.
  • frame splitting is then used to split each spliced video frame to obtain at least two video frames from different video sources at the same moment, so that the video frames of different video sources at the same moment are received by the decoding end at the same time, achieving absolutely consistent reception times and thereby synchronous transmission of video frames from different video sources at the same moment.
  • the video frames of different video sources at the same time are rendered in different ways.
  • in this embodiment, the two video sources are the left-eye video source and the right-eye video source of the laparoscopic robot.
  • both the left-eye video source and the right-eye video source output dual-channel video data.
  • the steps for rendering and displaying video frames from different video sources at the same time include the following steps:
  • S1 process the video frames of at least two video sources of the laparoscopic robot at the same time into three-dimensional images.
  • the dual-channel video data output by the left-eye video source and the right-eye video source are two-dimensional image data.
  • the dual-channel video data output by the left-eye video source and the right-eye video source are also two-dimensional images after splicing.
  • the video frames of at least two video sources of the laparoscopic robot at the same time are processed into three-dimensional images.
  • a 3D structure generator can be used to process video frames from at least two video sources at the same time into three-dimensional images.
  • the purpose of rendering is to make the three-dimensional image conform to the 3D scene.
  • in one embodiment, a video data processing method is provided and applied to the decoding end. Since 2 or 3 copies of each key frame are sent by the encoding end, the decoding end would otherwise decode the same key frame multiple times, which reduces the decoding efficiency of the decoding end and increases the frame number difference between the video played at the decoding end and the video source of the encoding end, causing the video quality played at the decoding end to be lower than the video quality of the encoding end. Therefore, in order to solve the above problems, the method specifically includes the following steps:
  • S1404 Decapsulate the video stream to obtain multiple encoded frames.
  • S1406 Determine whether the current encoded frame is a key frame according to the key frame information of the encoded frame. When the current encoded frame is not a key frame, execute S1412; when the current encoded frame is a key frame, execute S1408.
  • S1408, determine whether the key frame information of the current encoded frame repeats that of a decoded key frame; when the encoded frame is a key frame and repeats a decoded key frame, execute S1410; when the encoded frame is a key frame and does not repeat a decoded key frame, execute S1412.
  • S1412 Decode the encoded frame to obtain spliced video data.
  • the spliced video data includes spliced video frames and splicing information of the spliced video frames.
  • according to the splicing information of each spliced video frame, each spliced video frame is split to obtain at least two video frames from different video sources at the same moment;
  • S1418 Determine whether all the encoded frames of the video stream are decoded. When all the encoded frames of the video stream are decoded, the process ends; when the encoded frames of the video stream are not fully decoded, execute S1402.
  • in this embodiment, the decoding end determines whether the current encoded frame is a key frame based on the key frame information, and determines whether the key frame information of the current encoded frame repeats that of a decoded key frame; if the encoded frame is a key frame and repeats a decoded key frame, the encoded frame is discarded, which improves the decoding efficiency of the decoding end.
  • this frame discarding, on the one hand, reduces the frame number difference between the remotely played video and the source video; on the other hand, it filters out redundant copied key frames, achieving a playback effect as close as possible to, or equivalent to, the quality of the source video.
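  • The duplicate filtering at the decoding end can be sketched as follows, reusing the payload layout above; decode_frame() stands in for the actual H.264 decoder call and is an assumption.

```python
# Duplicate key-frame filtering sketch at the decoding end.
import struct

ORDINARY_FRAME_NO = 0          # assumed constant number identifying ordinary frames
decoded_key_frame_nos = set()  # key frame information already decoded

def handle_payload(payload: bytes, decode_frame):
    """Decode one payload, discarding repeated copies of the same key frame."""
    data_len, frame_no = struct.unpack("!IH", payload[:6])
    frame_data = payload[6:6 + data_len]
    if frame_no != ORDINARY_FRAME_NO:            # this is a key frame
        if frame_no in decoded_key_frame_nos:    # already decoded: drop the copy
            return None
        decoded_key_frame_nos.add(frame_no)
    return decode_frame(frame_data)              # decode_frame is a placeholder
```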
  • the transfer server can be used to forward the data, or the encoding end and decoding end can be directly connected.
  • when the encoding end and the decoding end cannot connect to the Internet, a local area network can be constructed to connect the encoding end and the decoding end directly.
  • in this case, the step of receiving the video stream sent by the encoding end includes the following steps:
  • S1 receives a broadcast message from at least one encoding end, and the broadcast message carries the IP address of the encoding end.
  • the encoding end and decoding end in the local operating room are deployed on the same local area network.
  • Equipment C3, C4 and C6 can be switched to the encoding end or decoding end according to the actual scenario;
  • C1 is a laparoscopic robot;
  • C2, C5 and C7 are all local monitors.
  • the laparoscopic robot C1 transmits dual-channel endoscopic images to the encoding end C3 through optical fiber c1.
  • the encoding end C3 performs frame merging, encoding and compression on the dual-channel video data and then sends it to the target decoding end through the high-speed network c3.
  • the encoding end C3 also loops out the dual-channel endoscope image and transmits it to the local monitor C2 through optical fiber c2.
  • after the encoding end encodes and encapsulates the spliced video data, it broadcasts a message carrying its own IP address on the local area network.
  • the decoding end receives the broadcast message from the encoding end and, based on the encoding end IP address carried in the broadcast message, determines whether to accept the video stream from the encoding end corresponding to that IP address.
  • specifically, the IP address of the paired encoding end is set on the decoding end. After the decoding end receives the broadcast message sent by an encoding end, it compares the IP address of that encoding end with the paired IP address set on the decoding end. If they are the same, the encoding end and the decoding end are considered to be in the same local area network, and the decoding end sends a response to the encoding end; after receiving the response, the encoding end sends the video stream to the matched decoding end, and the decoding end receives it. If the IP address set on the decoding end differs from the received encoding end's IP address, the encoding end and the decoding end are considered not to be in the same local area network, and the decoding end does not respond to the broadcast message.
  • the encoding end and the decoding end are set up on the same local area network.
  • the encoding end sends a broadcast message.
  • the broadcast message carries the IP address of the encoding end; if the IP address of the encoding end matches the IP address set on the decoding end, the decoding end receives the video stream sent by the encoding end.
  • the above method can be used to transmit the video stream from the encoding end to the decoding end when the encoding end or decoding end cannot connect to the Internet.
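  • A hedged sketch of this broadcast pairing from the decoding end's side; the port number, message contents and ACK reply are illustrative assumptions, since the patent only specifies that the broadcast carries the encoding end's IP address.

```python
# LAN pairing sketch: accept only the broadcast from the configured encoding end.
import socket

PAIR_PORT = 30000                    # assumed broadcast port
PAIRED_ENCODER_IP = "192.168.1.20"   # IP of the paired encoding end, set on the decoder

def wait_for_paired_encoder():
    sock = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
    sock.setsockopt(socket.SOL_SOCKET, socket.SO_REUSEADDR, 1)
    sock.bind(("", PAIR_PORT))
    while True:
        _msg, (ip, _port) = sock.recvfrom(1024)
        if ip == PAIRED_ENCODER_IP:                  # same LAN, paired encoding end
            sock.sendto(b"ACK", (ip, PAIR_PORT))     # encoder starts streaming on ACK
            return ip
        # otherwise: not the paired encoding end, ignore the broadcast
```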
  • in one embodiment, a video data processing method is provided and applied to the relay server. As shown in Figure 16, the method includes:
  • S1602 Obtain the video stream sent by the encoding end and the device encoding of the encoding end.
  • the video stream includes spliced video frames obtained by splicing the video frames of at least two different video sources at the same moment.
  • devices B3, B6, B7 and B9 can be switched to the encoding end or decoding end according to the actual scenario.
  • device B3 is connected to the transit server through the b5 network, device B6 through the b10 network, device B7 through the b6 network, and device B9 through the b7 network, for data push or data pull.
  • the video stream is a video stream obtained by the encoding end splicing the video frames of different video sources in the multi-channel video data at the same time into one spliced video frame, and encoding and encapsulating the spliced video data.
  • the transfer server monitors the ports of the encoding end and the decoding end in real time.
  • the transfer server receives the video stream sent by the encoding end and the device encoding of the encoding end.
  • S1604 Create a virtual room for the device code of the encoding end.
  • specifically, the transit server creates a virtual room based on the device code of each encoding end.
  • the virtual room may be a storage unit of the transit server.
  • when the relay server detects that both the encoding end and the decoding end are online, it receives the video stream sent by the encoding end together with the device code of the encoding end, and forwards the video stream to the decoding end according to the data acquisition request.
  • that is, the relay server receives the video stream and the device code from the encoding end, receives the data acquisition request from the decoding end, and obtains the target device code carried in the data acquisition request.
  • Figure 18 shows the pairing setup flow of the encoding end and the decoding end.
  • the relay server collects the online information of the encoding end and the decoding end respectively, and pairs the encoding end with the decoding end.
  • a one-to-many relationship between one encoding end and multiple decoding ends is set and bound accordingly; after binding once, there is no need to bind again, and pairing completes automatically once the encoding end and the decoding ends go online.
  • when one or more decoding ends want to obtain the video stream sent by the matching encoding end, each decoding end sends a data acquisition request to the relay server; the relay server obtains the target device code carried in the request and matches it against the created virtual rooms. If a virtual room corresponding to the target device code exists, the video stream of that room is sent to the decoding end, and the decoding end splits the spliced video frames into at least two video frames from different video sources at the same moment.
  • in this embodiment, the relay server obtains the video stream sent by the encoding end and the device code of the encoding end, obtains the target device code carried in the decoding end's data acquisition request, and matches the target device code with the device code of the encoding end; if a device code of an encoding end matches the target device code, the video stream is sent to the decoding end.
  • the video stream of the encoding end thus only needs to be sent once, and having multiple decoding ends pull data from the relay server can effectively reduce bandwidth consumption.
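  • The relay server's room-based distribution can be sketched as an in-memory mapping from device codes to rooms; the data structures and function names are illustrative assumptions, and real transport details are omitted.

```python
# Virtual-room distribution sketch: one room per encoding-end device code.
from collections import defaultdict

rooms = {}                        # device code -> current video stream source
subscribers = defaultdict(list)   # device code -> decoding ends pulling from the room

def on_encoder_online(device_code: str, stream):
    rooms[device_code] = stream   # create (or refresh) the virtual room

def on_decoder_request(target_code: str, decoder_addr):
    """Forward the room's stream only when a matching virtual room exists."""
    room = rooms.get(target_code)
    if room is None:
        return None               # no matching room: do not respond to the request
    subscribers[target_code].append(decoder_addr)
    return room                   # the stream is then forwarded to this decoding end
```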
  • in one embodiment, a video data processing method is provided, as shown in Figure 20, which specifically includes the following steps:
  • the transfer server monitors the ports of the encoding end and decoding end.
  • S2004 The encoding end splices video frames from different video sources in the multi-channel video data at the same time into one spliced video frame to obtain spliced video data.
  • the spliced video data includes spliced video frames and splicing information of the spliced video frames.
  • the encoding end performs encoding processing on the spliced video data to obtain multiple encoded frames.
  • S2008 determine whether the current encoded frame is a key frame. If the current encoded frame is not a key frame, execute S2010; if the current encoded frame is a key frame, execute S2012.
  • S2010 determine that the current encoded frame is an ordinary frame, identify the ordinary frame information in the data packet of the ordinary frame, and execute S2014.
  • S2014 The encoding end identifies the key frame information in the data packet of each key frame, where the key frame information of the same key frame is the same.
  • the encoding end encapsulates the encoded frame to obtain the video stream to be transmitted.
  • the transit server receives the video stream sent by the encoding end and the device code of the encoding end.
  • the transit server creates a virtual room for the device code of the encoding end.
  • the decoding end sends a data acquisition request to the transfer server, and the data acquisition request carries the target device code.
  • the transfer server receives the data acquisition request from the decoding end and obtains the target device code carried in the data acquisition request.
  • the transfer server matches the target device code against the created virtual rooms; if a virtual room corresponding to the target device code exists, the video stream of that room is sent to the decoding end; if no such virtual room exists, the transfer server does not respond to the data acquisition request.
  • the decoding end receives the video stream sent by the relay server.
  • the decoding end decapsulates the video stream and obtains multiple encoded frames.
  • S2032 The decoder determines whether the current encoded frame is a key frame based on the key frame information of the encoded frame. If the current encoded frame is not a key frame, S2038 is executed; if the current encoded frame is a key frame, S2034 is executed.
  • S2034 The decoding end determines whether the key frame information of the current encoded frame repeats that of a decoded key frame; if the encoded frame is a key frame and repeats a decoded key frame, S2036 is executed; if the encoded frame is a key frame and does not repeat a decoded key frame, S2038 is executed.
  • S2036 The decoder discards the key frame and executes S2038.
  • the decoder decodes the encoded frame to obtain spliced video data.
  • the spliced video data includes spliced video frames and splicing information of the spliced video frames.
  • the decoder splits each spliced video frame according to the splicing information of each spliced video frame to obtain at least two video frames from different video sources at the same time;
  • the decoder renders and displays video frames from different video sources at the same time.
  • in this embodiment, the video stream sent by the encoding end and the device code of the encoding end are obtained, the target device code carried in the decoding end's data acquisition request is obtained, and the target device code is matched with the device code of the encoding end; if a device code of an encoding end matches the target device code, the video stream is sent to the decoding end.
  • the video stream of the encoding end thus only needs to be sent once, and having multiple decoding ends pull data from the transfer server can effectively reduce bandwidth consumption; creating a virtual room for the device code of each encoding end ensures that a decoding end will not incorrectly receive the video stream sent by a non-corresponding encoding end. When the encoding end and the decoding end are set up on the same local area network, the encoding end sends a broadcast message carrying its IP address, and if that IP address matches the IP address set on the decoding end, the decoding end receives the video stream sent by the encoding end.
  • in this way, the video stream of the encoding end can be transmitted to the decoding end even when the encoding end or the decoding end cannot connect to the Internet.
  • embodiments of the present application also provide a video data processing system for implementing the above-mentioned video data processing method.
  • the implementation scheme for solving the problem provided by this system is similar to that recorded in the above method; therefore, for the specific limitations in the one or more video data processing system embodiments provided below, refer to the limitations of the video data processing method above, which will not be repeated here.
  • a video data processing system which is applied to an encoding end and includes:
  • the first acquisition module 111 is used to acquire multi-channel video data from at least two different video sources.
  • the frame splicing module 112 is used to splice the video frames of different video sources at the same moment in the multi-channel video data into one spliced video frame to obtain spliced video data.
  • the spliced video data includes the spliced video frames and the splicing information of the spliced video frames.
  • the encoding module 113 is used to encode the spliced video data to obtain multiple encoded frames.
  • the encapsulation module 114 is used to encapsulate multiple encoded frames to obtain a video stream to be transmitted, and to transmit the video stream to the target decoding end.
  • the encapsulation module 114 is also configured to copy the key frame when the encoded frame is a key frame before encapsulating the multiple encoded frames.
  • the encapsulation module 114 is configured to copy at least one key frame when the encoded frame is a key frame, and to identify the key frame information in the data packet of each key frame, wherein the key frame information of the same key frame is the same.
  • the target decoding end includes:
  • the receiving module 115 is used to receive the video stream sent by the encoding end;
  • the decapsulation module 116 is used to decapsulate the video stream to obtain multiple encoded frames
  • the decoding module 117 is used to decode the encoded frames to obtain spliced video data;
  • the spliced video data includes spliced video frames and splicing information of the spliced video frames;
  • the frame splitting module 118 is used to split each spliced video frame according to the splicing information of each spliced video frame to obtain at least two video frames from different video sources at the same time;
  • the rendering module 119 is used to render and display video frames from different video sources at the same time.
  • a video data processing system is provided.
  • the system also includes a transfer server, which includes:
  • the second acquisition module 120 is used to acquire the video stream sent by the encoding end and the device code of the encoding end; the video stream includes spliced video frames obtained by splicing video frames of at least two different video sources at the same moment.
  • the creation module 121 is used to create a virtual room for the device code of the encoding end.
  • the receiving module 122 is configured to receive a data acquisition request from the decoding end and obtain the target device code carried in the data acquisition request.
  • the distribution module 123 is configured to send the video stream to the decoding end when a virtual room corresponding to the target device code exists; the decoding end splits the spliced video frames to obtain video frames of at least two different video sources at the same moment.
  • Each module in the above video data processing system can be implemented in whole or in part by software, hardware and combinations thereof.
  • Each of the above modules may be embedded in or independent of the processor of the computer device in the form of hardware, or may be stored in the memory of the computer device in the form of software, so that the processor can call and execute the operations corresponding to the above modules.
  • a computer device is provided.
  • the computer device may be a laparoscopic robot, and its internal structure diagram may be as shown in Figure 22.
  • the computer device includes a processor, memory, input/output interface, communication interface, display unit and input device.
  • the processor, memory and input/output interface are connected through the system bus, and the communication interface, display unit and input device are connected to the system bus through the input/output interface.
  • the processor of the computer device is used to provide computing and control capabilities.
  • the memory of the computer device includes non-volatile storage media and internal memory.
  • the non-volatile storage medium stores an operating system and computer programs. The internal memory provides an environment for running the operating system and the computer programs in the non-volatile storage medium.
  • the input/output interface of the computer device is used to exchange information between the processor and external devices.
  • the communication interface of the computer device is used for wired or wireless communication with external terminals.
  • the wireless mode can be implemented through WIFI, mobile cellular network, NFC (Near Field Communication) or other technologies.
  • the computer program implements a video data processing method when executed by the processor.
  • the display unit of the computer device is used to form a visually visible picture and can be a display screen, a projection device or a virtual reality imaging device.
  • the display screen can be a liquid crystal display screen or an electronic ink display screen.
  • the input device of the computer device may be a touch layer covering the display screen, or may be buttons, a trackball or a touch pad provided on the housing of the computer device, or an external keyboard, touch pad or mouse, etc.
  • Figure 22 is only a block diagram of part of the structure related to the solution of the present application and does not constitute a limitation on the computer equipment to which the solution is applied; a specific computer device may include more or fewer components than shown, combine certain components, or have a different arrangement of components.
  • a computer device including a memory and a processor.
  • a computer program is stored in the memory.
  • the processor executes the computer program, it implements the steps in the above method embodiments.
  • a computer-readable storage medium on which a computer program is stored.
  • the computer program is executed by a processor, the steps in the above method embodiments are implemented.
  • a computer program product including a computer program, which implements the steps in the above method embodiments when executed by a processor.
  • the user information involved includes, but is not limited to, user equipment information, user personal information, etc.
  • the data involved includes, but is not limited to, data used for analysis, stored data, displayed data, etc.
  • the collection, use and processing of the relevant data must comply with the relevant laws, regulations and standards of the relevant countries and regions.

Landscapes

  • Engineering & Computer Science (AREA)
  • Multimedia (AREA)
  • Signal Processing (AREA)
  • Databases & Information Systems (AREA)
  • Business, Economics & Management (AREA)
  • Marketing (AREA)
  • Compression Or Coding Systems Of Tv Signals (AREA)

Abstract

The present application relates to a video data processing method, system, computer device and storage medium. The method includes: acquiring multiple channels of video data from at least two different video sources; splicing the video frames of different video sources in the multiple channels of video data at the same moment into one spliced video frame to obtain spliced video data, the spliced video data including the spliced video frame and splicing information of the spliced video frame; encoding the spliced video data to obtain a plurality of encoded frames; and encapsulating the plurality of encoded frames to obtain a video stream to be transmitted, and transmitting the video stream to a target decoding end. Because the same-moment video frames of different video sources are spliced into a single spliced video frame, those frames can be sent at the same time, making the sending times of same-moment frames from different sources exactly identical and thereby achieving synchronized transmission of same-moment video frames from different video sources.

Description

Video data processing method and system
This application claims priority to the Chinese patent application filed with the China National Intellectual Property Administration on September 23, 2022, with application number 2022111616647 and entitled "Video data processing method, system, computer device and storage medium", the entire contents of which are incorporated herein by reference.
Technical Field
The present application relates to the technical field of long-distance synchronized transmission of image data, and in particular to a video data processing method, system, computer device and storage medium.
Background
At present, multi-channel video sources are widely used in fields such as medicine, film and television, and navigation. When transmitting images from multi-channel video sources, the problem of synchronized playback of the multiple channels usually arises.
The current approach to synchronized playback of multi-channel video sources is mainly buffer-based synchronization, which performs synchronization control by extracting time markers from the data streams and adding header information such as key-frame information and timestamps to the video frames. However, this approach can still leave the video frames of the multiple channels out of sync, which affects the use of the product. Taking a medical laparoscopic robot as an example of multi-channel video sources applied in the medical field: in the remote transmission of three-dimensional images by existing medical laparoscopic robots, if the multiple video sources cannot be synchronized, ghosting occurs, the three-dimensional image becomes unclear, and the viewer may even experience dizziness.
Summary
On this basis, it is necessary, in view of the above technical problems, to provide a video data processing method, system, device and storage medium capable of achieving synchronized transmission of same-moment video frames from different video sources.
The present application provides a video data processing method, the method including:
acquiring multiple channels of video data from at least two different video sources;
splicing the video frames of different video sources in the multiple channels of video data at the same moment into one spliced video frame to obtain spliced video data, the spliced video data including the spliced video frame and splicing information of the spliced video frame;
encoding the spliced video data to obtain a plurality of encoded frames;
encapsulating the plurality of encoded frames to obtain a video stream to be transmitted, and transmitting the video stream to a target decoding end.
The present application further provides a video data processing system, the system including:
a first acquisition module configured to acquire multiple channels of video data from at least two different video sources;
a frame splicing module configured to splice the video frames of different video sources in the multiple channels of video data at the same moment into one spliced video frame to obtain spliced video data, the spliced video data including the spliced video frame and splicing information of the spliced video frame;
an encoding module configured to encode the spliced video data to obtain a plurality of encoded frames;
an encapsulation module configured to encapsulate the plurality of encoded frames to obtain a video stream to be transmitted, and to transmit the video stream to a target decoding end.
With the above video data processing method and system, multiple channels of video data from at least two different video sources are acquired, the same-moment video frames of the different video sources are spliced into one spliced video frame, and the result is encoded and encapsulated and then sent to the target decoding end. Because the same-moment video frames of different video sources are spliced into a single spliced video frame, those frames can be sent at the same time, making the sending times of same-moment frames from different sources exactly identical and thereby achieving synchronized transmission of same-moment video frames from different video sources.
Brief Description of the Drawings
Figure 1 is an application environment diagram of the video data processing method in one embodiment;
Figure 2 is a schematic flowchart of the video data processing method in one embodiment;
Figure 3 is a multi-channel distribution network connection diagram built on a relay server, encoding ends and decoding ends in one embodiment;
Figure 4 is a schematic flowchart of splicing same-moment video frames of different video sources into one spliced video frame in another embodiment;
Figure 5 is a schematic diagram of data encapsulation and sending in one embodiment;
Figure 6 is a structural diagram of the hardware combination system implementing frame splicing and frame splitting in one embodiment;
Figure 7 is a flowchart of normal-frame processing at the encoding end in one embodiment;
Figure 8 is a flowchart of key-frame processing at the encoding end in one embodiment;
Figure 9 is an embodiment of key-frame processing at the encoding end;
Figure 10 is a flowchart of video stream sending at the encoding end in one embodiment;
Figure 11 is a flowchart of video stream receiving at the decoding end in one embodiment;
Figure 12 is a schematic diagram of frame restoration at the decoding end in one embodiment;
Figure 13 is a schematic diagram of the frame splitting function at the decoding end in one embodiment;
Figure 14 is a flowchart of video stream receiving at the decoding end in one embodiment;
Figure 15 is a local multi-channel distribution network connection diagram in one embodiment;
Figure 16 is a flowchart of video stream forwarding by the relay server in one embodiment;
Figure 17 is a relay flowchart of the relay server in one embodiment;
Figure 18 is a flowchart of the pairing setup between the encoding end and decoding end in one embodiment;
Figure 19 is a schematic diagram of distribution by the relay server in one embodiment;
Figure 20 is a communication flowchart between the encoding end and decoding end in one embodiment;
Figure 21 is a structural block diagram of the video data processing system in one embodiment;
Figure 22 is an internal structure diagram of the computer device in one embodiment.
Detailed Description of the Embodiments
To make the objectives, technical solutions and advantages of the present application clearer, the present application is further described in detail below with reference to the accompanying drawings and embodiments. It should be understood that the specific embodiments described here are intended only to explain the present application and not to limit it.
The video data processing method provided in the embodiments of the present application can be applied in the application environment shown in Figure 1. The encoding end 102 acquires multiple channels of video data from at least two different video sources; splices the same-moment video frames of the different video sources in the multiple channels of video data into one spliced video frame to obtain multiple pieces of spliced video data, each piece including a spliced video frame and the splicing information of that spliced video frame; encodes the spliced video data to obtain a plurality of encoded frames; and encapsulates the plurality of encoded frames to obtain a video stream to be transmitted, transmitting the video stream to the relay server 104 or the target decoding end.
The decoding end 106 receives the video stream sent by the encoding end 102, or the video stream forwarded by the relay server 104. The decoding end 106 decapsulates the video stream to obtain a plurality of encoded frames; decodes the encoded frames to obtain spliced video data, the spliced video data including spliced video frames and their splicing information; splits each spliced video frame according to its splicing information to obtain the same-moment video frames of at least two different video sources; and renders and displays the same-moment video frames of the different video sources.
When the encoding end 102 and the decoding end 106 are deployed in different local area networks, the encoding end 102 establishes a connection with the decoding end 106 through the relay server 104; when they are deployed in the same local area network, the encoding end 102 connects to the decoding end 106 directly over the LAN. A data storage system can store the data to be processed by the relay server 104; it can be integrated on the relay server 104, or placed on the cloud or another network server. The encoding end 102 and the decoding end 106 can be processors of computer devices; the computer devices include, without limitation, personal computers, notebook computers, smartphones, tablet computers, Internet-of-Things devices and portable wearable devices, where an IoT device may be a laparoscopic robot or the like, and a portable wearable device may be a smart watch, smart band, head-mounted device, etc. The relay server 104 can be implemented as an independent server or as a server cluster composed of multiple servers.
In one embodiment, as shown in Figure 2, a video data processing method is provided. Taking the application of the method to the encoding end 102 in Figure 1 as an example, it includes the following steps:
S202: acquire multiple channels of video data from at least two different video sources.
Specifically, multiple channels of video data from at least two different video sources in a target business scenario are acquired. Taking a laparoscopic robot as the target business scenario, the video sources can be the left-eye video source and the right-eye video source of the laparoscopic robot; both output dual-channel video data, and each channel of video data includes video frames and their timestamps.
The system architecture of this business scenario is shown in Figure 3, a multi-channel distribution network built on a relay server, encoding ends and decoding ends. Devices B3, B6, B7 and B9 can switch between acting as encoding ends and decoding ends according to the actual scene; B1 and B4 are laparoscopic robots; B2, B5, B8 and B10 are local monitors. Taking local operating room A as an example, the laparoscopic robot B1 transmits dual-channel endoscope images to the encoding end B3 over optical fiber b1. The encoding end B3 performs frame merging, encoding and compression on the dual-channel video data and sends the result over the high-speed network b5 to the relay server or the target decoding end; at the same time, the encoding end B3 loops the dual-channel endoscope images out over optical fiber b2 to the local monitor B2.
S204: splice the video frames of different video sources in the multiple channels of video data at the same moment into one spliced video frame to obtain spliced video data, the spliced video data including the spliced video frame and the splicing information of the spliced video frame.
To make the sending times of the two video sources at the same moment exactly identical at the encoding end, the encoding end splices the same-moment video frames of the different video sources into one spliced video frame, then encodes and compresses the result to obtain the video stream to be transmitted, and finally transmits that video stream to the target decoding end. A frame-image splicing algorithm, a global iterative nearest-neighbor method or a similar algorithm can be used to splice the same-moment video frames of the different video sources into one spliced video frame, and the splicing direction can be horizontal or vertical; neither the splicing algorithm nor the splicing direction is limited here.
The encoding end of this embodiment splices the same-moment video frames of the different video sources in the multiple channels of video data into one spliced video frame, obtaining multiple pieces of spliced video data. The splicing information of a spliced video frame identifies the pixel range of each pre-splicing video frame within the spliced video data. For example, if the same-moment video frames of the different video sources are denoted the first video frame and the second video frame, the splicing information identifies the original pixel coordinates and range of the first video frame and the original pixel coordinates and range of the second video frame.
For example, as shown in Figure 4, the dual-channel video data of the left-eye video source of the laparoscopic robot is denoted endoscope image-L, and that of the right-eye video source endoscope image-R; both have a resolution of 1920*1080P. During frame splicing, the same-moment video frames of the left-eye and right-eye video sources are spliced into one spliced video frame with a resolution of 3840*1080P in the horizontal direction. The splicing information indicates that horizontal pixels 1 to 1920 over the 1080 vertical pixels are the pixel range of the left-eye source's video data, and that horizontal pixels 1921 to 3840 over the 1080 vertical pixels are the pixel range of the right-eye source's video data.
Specifically, the encoding end splices the same-moment video frames of the different video sources in the multiple channels of video data into one spliced video frame and configures, for each spliced video frame, splicing information identifying that spliced video frame, obtaining multiple pieces of spliced video data.
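The claimed embodiment performs this splicing in FPGA hardware (described below). Purely as an illustrative software sketch of the operation — the helper name splice_frames and the dictionary layout of the splicing information are assumptions for illustration, not part of the disclosed implementation:

    import numpy as np

    def splice_frames(frame_left, frame_right):
        # Both inputs are H x W x 3 arrays captured at the same moment.
        # Horizontal splicing: the result is H x 2W x 3 (e.g. two
        # 1920*1080 frames become one 3840*1080 spliced frame).
        spliced = np.hstack([frame_left, frame_right])
        h, w, _ = frame_left.shape
        # The splicing information records the pixel range of each source
        # frame inside the spliced frame, so the decoder can split it back.
        splice_info = {
            "left":  {"x_range": (0, w),     "y_range": (0, h)},
            "right": {"x_range": (w, 2 * w), "y_range": (0, h)},
        }
        return spliced, splice_info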
S206: encode the spliced video data to obtain a plurality of encoded frames.
The basic principle of encoding is to represent and transmit the video data as some form of code stream according to certain rules. The main purpose of encoding the spliced video data is data compression, addressing the fact that storage space and transmission bandwidth would otherwise be far from sufficient for saving and transmitting the data. H.261, H.262, H.263 or H.264 encoding can be used; this embodiment uses H.264.
S208: encapsulate the plurality of encoded frames to obtain a video stream to be transmitted, and transmit the video stream to the target decoding end.
The role of encapsulation is to protect the encoded frames from being corrupted or modified. The most commonly used encapsulation protocols are PPP/HDLC, LAPS and GFP. This embodiment transmits the video stream over the UDP protocol. As shown in Figure 5, besides the UDP header, a 4-byte data length, a 2-byte frame number and the H.264 frame data are added in the UDP data portion before encapsulation and sending.
Specifically, the encoding end adds the UDP header to each encoded frame's data packet, adds a 4-byte data length, a 2-byte frame number and the H.264 frame data in the UDP data portion, encapsulates them and sends the result to the target decoding end.
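A minimal sketch of this encapsulation, assuming network byte order for the two stated fields (the function names are illustrative; only the 4-byte length, 2-byte frame number and H.264 payload layout comes from this embodiment):

    import socket
    import struct

    def encapsulate(frame_no: int, h264_data: bytes) -> bytes:
        # 4-byte data length + 2-byte frame number + H.264 frame data.
        return struct.pack("!IH", len(h264_data), frame_no) + h264_data

    def send_frame(sock: socket.socket, addr, frame_no: int, h264_data: bytes):
        # The UDP header itself is added by the operating system.
        sock.sendto(encapsulate(frame_no, h264_data), addr)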
In the above video data processing method, multiple channels of video data from at least two different video sources are acquired, the same-moment video frames of the different video sources are spliced into one spliced video frame, and the result is encoded and encapsulated and then sent to the target decoding end. Because the same-moment video frames of the different video sources are spliced into a single spliced video frame, they can be sent at the same time, making the sending times of same-moment frames from different sources exactly identical and thereby achieving synchronized transmission of same-moment video frames from different video sources.
In one embodiment: the usual approach to image frame splicing is a matrix splicer, but even a high-performance matrix splicer introduces an image delay of roughly 30 ms. If both the encoding end and the decoding end used matrix splicers for frame splicing and frame splitting, the image delay would increase by about 60 ms; in other words, a matrix splicer cannot achieve exactly identical sending times at the encoding end either. To solve this problem, the encoding end of this embodiment implements frame splicing and frame splitting with a hardware combination system. As shown in Figure 6, the system includes HDMI decoding ends, HDMI encoding ends, a CPU chip and an FPGA processing module. The dual-channel video data of the laparoscopic robot's left-eye and right-eye video sources pass through two HDMI decoding ends for decoding, then through the FPGA processing module for hardware-accelerated processing, and finally through two HDMI encoding ends for encoding, yielding losslessly spliced video frames.
This embodiment uses an FPGA hardware system to losslessly splice the multiple channels of video data from the two different video sources into spliced video data. Compared with the high latency characteristic of traditional image splicers, completing the splicing with an FPGA hardware system offers high efficiency and low latency.
In one embodiment: frame loss easily occurs in the remote synchronized transmission of three-dimensional image data, and if one video source drops frames, the overall three-dimensional visual effect is affected. To solve this problem, before the step of encapsulating the plurality of encoded frames, the encoding end judges whether the current encoded frame is a key frame and, when it is, duplicates the key frame.
When the current encoded frame is not a key frame, it is judged to be a normal frame, and normal-frame information is marked in the normal frame's data packet. The normal-frame information identifies the current encoded frame as a normal frame and can be a constant frame number or a specific character; for example, the constant frame number can be 000000000.
As shown in Figure 7, the encoding end encapsulates the normal frame. During encapsulation, the UDP protocol is used, and a 4-byte data length, a 2-byte frame number and the H.264 frame data are added in the UDP data portion before sending, where the 2-byte frame number in the normal frame's UDP data portion is the constant frame number 000000000. The purpose of adding the 2-byte constant frame number to the normal frame's UDP data portion is that, after decapsulating the normal frame and removing the UDP header and the data length, the decoding end obtains the 2-byte constant frame number and uses it to judge whether the encoded frame in the current encapsulated packet is a normal frame.
A key frame is the frame corresponding to a key action in the motion of a character or object, denoted an I frame. Normal frames include forward-predicted frames, denoted P frames, and bidirectionally interpolated frames, denoted B frames. An I frame is a complete picture, while P frames and B frames record changes relative to the I frame; without the I frame, P and B frames cannot be decoded.
In this embodiment, duplicating key frames, on the one hand, improves the video frame delivery rate, effectively reduces the frame loss rate of three-dimensional image data during remote transmission, and avoids key-frame loss degrading the visual effect of the three-dimensional image; on the other hand, compared with duplicating all video frames, duplicating only the key frames effectively reduces the bandwidth required for network transmission.
In one embodiment, as shown in Figure 8, duplicating the key frame when the encoded frame is a key frame includes the following steps:
S802: when the encoded frame is a key frame, duplicate at least one copy of the key frame.
The more copies of a key frame, the lower the frame loss rate and, correspondingly, the greater the required network bandwidth. Therefore, to balance network bandwidth against the frame loss rate, as shown in Figure 9, this embodiment duplicates each key frame 2 or 3 times.
Taking 3 copies of a key frame as an example and denoting the network packet loss rate X, duplicating key frames can reduce the key-frame loss rate of a video source from X to X³: if X = 10%, the key-frame loss rate drops to 0.1%; if X = 5%, it drops to 0.0125%. With a good network, X is generally below 1%, so this embodiment can greatly reduce the key-frame loss rate of a video source.
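Expressed as a formula — assuming, as the figures above implicitly do, that each copy of a key frame is lost independently with probability X and that n copies are sent — the key frame is lost only if all n copies are lost:

\[
P_{\mathrm{loss}} = X^{n}, \qquad n = 3:\; X = 10\% \Rightarrow P_{\mathrm{loss}} = 0.1\%, \quad X = 5\% \Rightarrow P_{\mathrm{loss}} = 0.0125\%.
\]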
Specifically, after obtaining the plurality of encoded frames, the encoding end judges whether the current encoded frame is a key frame and, when it is, makes 2 or 3 copies of the key frame.
S804: mark key-frame information in each key frame's data packet, where copies of the same key frame carry identical key-frame information.
The key-frame information identifies the current encoded frame as a key frame and can be a frame number or a specific character; for example, the key-frame information can be the frame number 000000001.
The duplicated key frames are encapsulated to obtain the video stream to be transmitted, which is transmitted to the target decoding end. The target decoding end decapsulates the video stream to obtain the encoded frames and decodes them; but because each key frame was duplicated 2 or 3 times, the same key frame would have to be decoded multiple times during decoding, reducing the decoding efficiency of the decoding end, increasing the frame-count gap between the video played at the decoding end and the video source at the encoding end, and making the playback quality at the decoding end lower than the video quality at the encoding end. To solve this problem, as shown in Figure 9, the encoding end of this embodiment marks key-frame information in each key frame's data packet, with copies of the same key frame carrying identical key-frame information. The decoding end uses the key-frame information to judge whether the current encoded frame is a key frame; marking the key-frame information in the key frame's data packet makes it easy for the decoding end to identify key frames, improving key-frame identification efficiency. Setting identical key-frame information for copies of the same key frame lets the decoding end judge from the key-frame information whether the current encoded frame has already been decoded; when the key-frame information of the current key frame duplicates that of an already-decoded key frame, the encoded frame is discarded.
Specifically, after making 2 or 3 copies of a key frame, the encoding end encapsulates the copies together; during encapsulation, identical key-frame information is added to each copy's data packet, and the data is encapsulated and sent according to the UDP protocol.
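A sketch of this encoder-side branch, reusing the illustrative encapsulate helper from the sketch above. The embodiment does not state how successive key frames are numbered; the incrementing key_frame_no below is an assumption, made so that copies of one key frame share identical key-frame information while distinct key frames remain distinguishable:

    NORMAL_FRAME_NO = 0      # constant frame number marking a normal (P/B) frame
    KEY_FRAME_COPIES = 3     # 2 or 3 copies balance bandwidth against loss rate

    def packetize(encoded_frame: bytes, is_key_frame: bool,
                  key_frame_no: int) -> list[bytes]:
        # key_frame_no is assumed to increment once per distinct key frame.
        if is_key_frame:
            # All copies of one key frame carry identical key-frame
            # information, so the decoder can drop the redundant copies.
            return [encapsulate(key_frame_no, encoded_frame)] * KEY_FRAME_COPIES
        return [encapsulate(NORMAL_FRAME_NO, encoded_frame)]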
In this embodiment, duplicating each key frame 2 or 3 times at the encoding end balances network bandwidth against the frame loss rate; marking key-frame information in each key frame's data packet makes it easy for the decoding end to identify whether the current encoded frame is a key frame, improving key-frame identification efficiency; and setting identical key-frame information for copies of the same key frame lets the decoding end judge whether the current encoded frame has already been decoded and discard it when its key-frame information duplicates that of an already-decoded key frame. This frame-dropping approach, on the one hand, reduces the frame-count gap between the remotely played video and the source video; on the other hand, it filters out the redundant duplicated key frames so that the playback effect approaches or equals the source video quality as closely as possible.
In one embodiment, as shown in Figure 10, a video data processing method is provided. Taking the application of the method to the encoding end 102 in Figure 1 as an example, it includes the following steps:
S1002: acquire multiple channels of video data from at least two different video sources.
S1004: splice the video frames of different video sources in the multiple channels of video data at the same moment into one spliced video frame to obtain spliced video data, the spliced video data including the spliced video frame and its splicing information.
S1006: encode the spliced video data to obtain a plurality of encoded frames.
S1008: judge whether the current encoded frame is a key frame; when it is not a key frame, execute S1010; when it is a key frame, execute S1012.
S1010: judge the current encoded frame to be a normal frame, mark normal-frame information in the normal frame's data packet, and execute S1016.
S1012: duplicate at least one copy of the key frame.
S1014: mark key-frame information in each key frame's data packet, where copies of the same key frame carry identical key-frame information.
S1016: encapsulate the encoded frame to obtain the video stream to be transmitted, and transmit the video stream to the target decoding end.
S1018: judge whether all the multiple channels of video data have been encapsulated; when they have, end the process; when they have not, execute S1002.
In this embodiment, splicing the same-moment video frames of different video sources into a single spliced video frame allows them to be sent at the same time, making the sending times of same-moment frames from different sources exactly identical and thereby achieving synchronized transmission of same-moment video frames from different video sources. Duplicating key frames, on the one hand, improves the video frame delivery rate, effectively reduces the frame loss rate of three-dimensional image data during remote transmission, and avoids key-frame loss degrading the three-dimensional visual effect; on the other hand, compared with duplicating all video frames, duplicating only the key frames effectively reduces the bandwidth required for network transmission.
In one embodiment, as shown in Figure 11, a video data processing method is provided. Taking the application of the method to the decoding end 106 in Figure 1 as an example, it includes the following steps:
S1102: receive the video stream sent by the encoding end.
When the encoding end and the decoding end are deployed in different local area networks, the decoding end obtains the video stream sent by the encoding end through the relay server; when they are deployed in the same local area network, the decoding end obtains the video stream sent by the encoding end directly over the LAN.
S1104: decapsulate the video stream to obtain a plurality of encoded frames.
Decapsulation is the inverse of encapsulation and mainly restores data from the bit stream. In this embodiment the encapsulation protocol is UDP, and the corresponding decapsulation process is shown in Figure 12: first the UDP header is removed, giving the data structure in Figure 12a; then the data length is removed, giving the data structure in Figure 12b; finally the frame number is removed, giving the data structure in Figure 12c. After this decapsulation, the encoded frames for the decoding end are obtained.
Specifically, as shown in Figure 12, after receiving the video stream sent by the encoding end, the decoding end removes the UDP header, then the data length, and finally the frame number to obtain the corresponding encoded frame, and judges whether the entire video stream has been decapsulated: if it has, the decapsulation ends; if it has not, the decapsulation process is repeated.
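A matching sketch of this decapsulation, under the same illustrative field layout as the encapsulation sketch above (the operating system has already stripped the UDP header from each received datagram):

    import struct

    def decapsulate(payload: bytes):
        # Strip the 4-byte data length, then the 2-byte frame number;
        # what remains is the H.264 encoded frame.
        data_len, frame_no = struct.unpack("!IH", payload[:6])
        h264_data = payload[6:6 + data_len]
        return frame_no, h264_data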
S1106: decode the encoded frames to obtain spliced video data; the spliced video data includes spliced video frames and the splicing information of the spliced video frames.
Decoding restores the encoded frames to spliced video data and corresponds to the encoding process. The decoding algorithm can be a fast Fourier transform algorithm, a discrete Fourier transform algorithm or a frequency-domain filtering algorithm; the decoding algorithm is not limited here. Since encoding processes the spliced video data into encoded frames, decoding correspondingly restores the encoded frames to spliced video data.
S1108: split each spliced video frame according to its splicing information to obtain the same-moment video frames of at least two different video sources.
As described in the above embodiments, the splicing information of a spliced video frame identifies the pixel range, within the spliced video data, belonging to each pre-splicing video frame. Therefore, during frame splitting, the spliced video frame is split back into the original two same-moment video frames according to the pixel ranges identified by the splicing information. An inter-frame difference method can be used to split each spliced video frame into the same-moment video frames of at least two different video sources.
For example, as shown in Figure 13, taking endoscope video sources as an example, the spliced video frame at the encoding end is image data with a resolution of 3840*1080P. After obtaining the video stream containing this image data, the decoding end decapsulates and decodes the video stream to obtain the spliced video data. According to the splicing information of the spliced video frame, the 3840*1080P image data is split over horizontal pixels 1 to 1920 and the 1080 vertical pixels to obtain the 1920*1080P dual-channel video data of the left-eye video source, and over horizontal pixels 1921 to 3840 and the 1080 vertical pixels to obtain the 1920*1080P dual-channel video data of the right-eye video source, and the two 1920*1080P channels of video data are output to the local monitor.
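A sketch of this splitting step, the inverse of the splicing sketch above; the splice_info layout is the same illustrative assumption used there:

    def split_frame(spliced, splice_info):
        # Cut the decoded spliced frame back into the original
        # same-moment frames using the recorded pixel ranges.
        views = {}
        for name, rng in splice_info.items():
            x0, x1 = rng["x_range"]
            y0, y1 = rng["y_range"]
            views[name] = spliced[y0:y1, x0:x1]
        return views["left"], views["right"]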
S1110: render and display the same-moment video frames of the different video sources.
The same-moment video frames of the different video sources in the multi-channel video data are two-dimensional images, and the spliced video frame is also a two-dimensional image. After decapsulation, decoding and frame splitting at the decoding end, the same-moment video frames of the different video sources are obtained; to present a three-dimensional effect, the decoding end processes the same-moment video frames of the different video sources into a three-dimensional image and renders it, obtaining the three-dimensional image for that moment.
In this embodiment, the video stream sent by the encoding end is received, decapsulated and decoded. Because the encoding end spliced the same-moment video frames of different video sources into a single spliced video frame, the decoding end obtains spliced video data after decapsulation and decoding, and uses frame splitting to recover from each spliced video frame the same-moment video frames of at least two different video sources. The same-moment frames of different sources are therefore received by the decoding end at the same time, making the receiving times of same-moment frames from different sources exactly identical and thereby achieving synchronized transmission of same-moment video frames from different video sources.
It will be understood that the rendering of same-moment video frames from different video sources differs across target business scenarios. Taking application to a laparoscopic robot as an example, with the two video sources being the robot's left-eye and right-eye video sources, both outputting dual-channel video data, the step of rendering and displaying the same-moment video frames of the different video sources includes the following steps:
S1: process the same-moment video frames of the at least two video sources of the laparoscopic robot into a three-dimensional image.
The dual-channel video data output by the left-eye and right-eye video sources are two-dimensional image data and remain two-dimensional after splicing. To present a three-dimensional effect, the decoding end processes the same-moment video frames of the at least two video sources of the laparoscopic robot into a three-dimensional image; a 3D structure generator can be used to do so.
S2: render and display the three-dimensional image.
The purpose of rendering is to make the three-dimensional image fit the 3D scene.
In this embodiment, processing the same-moment video frames of the at least two video sources of the laparoscopic robot into a three-dimensional image and then rendering and displaying it ensures that, after the decoding end decapsulates, decodes and frame-splits the spliced video frames, the original three-dimensional image can be recovered, guaranteeing lossless synchronized transmission and display of the three-dimensional image.
In one embodiment, as shown in Figure 14, a video data control method applied to the decoding end is provided. Because the encoding end duplicated each key frame 2 or 3 times, the decoding end would have to decode the same key frame multiple times during decoding, reducing its decoding efficiency, increasing the frame-count gap between the video played at the decoding end and the video source at the encoding end, and lowering the playback quality below the encoding end's video quality. To solve this problem, the method specifically includes the following steps:
S1402: receive the video stream sent by the encoding end.
S1404: decapsulate the video stream to obtain a plurality of encoded frames.
S1406: judge from the encoded frame's key-frame information whether the current encoded frame is a key frame; when it is not a key frame, execute S1412; when it is a key frame, execute S1408.
S1408: judge whether the key-frame information of the current encoded frame duplicates that of an already-decoded key frame; when the encoded frame is a key frame and is a duplicate, execute S1410; when the encoded frame is a key frame and is not a duplicate, execute S1412.
S1410: discard the key frame and execute S1418.
S1412: decode the encoded frame to obtain spliced video data, the spliced video data including the spliced video frame and its splicing information.
S1414: split each spliced video frame according to its splicing information to obtain the same-moment video frames of at least two different video sources;
S1416: render and display the same-moment video frames of the different video sources.
S1418: judge whether all encoded frames of the video stream have been decoded; when they have, end the process; when they have not, execute S1402.
In this embodiment, the decoding end judges from the key-frame information whether the current encoded frame is a key frame and whether its key-frame information duplicates that of an already-decoded key frame; if the encoded frame is a key frame and a duplicate, it is discarded, which improves the decoding efficiency of the decoding end. This frame-dropping approach, on the one hand, reduces the frame-count gap between the remotely played video and the source video; on the other hand, it filters out the redundant duplicated key frames so that the playback effect approaches or equals the source video quality as closely as possible.
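A sketch of this decoder-side drop logic, under the same assumptions as the encoder sketch above (one incrementing key-frame number per distinct key frame; the constant 0 marking normal frames is likewise illustrative):

    NORMAL_FRAME_NO = 0

    decoded_key_frames = set()

    def should_decode(frame_no: int) -> bool:
        # Normal frames are always decoded. A key frame is decoded only
        # the first time its key-frame number is seen; the redundant
        # copies sent against packet loss are discarded here.
        if frame_no == NORMAL_FRAME_NO:
            return True
        if frame_no in decoded_key_frames:
            return False
        decoded_key_frames.add(frame_no)
        return True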
As for the transmission mode, a relay server can be used, with the relay server forwarding the stream, or the encoding end and the decoding end can be connected directly.
In one embodiment, the encoding end and the decoding end may be unable to access the Internet; in this case, a local area network can be built to connect them directly. The step of receiving the video stream sent by the encoding end then includes the following steps:
S1: receive a broadcast message from at least one encoding end, the broadcast message carrying the encoding end's IP address.
Specifically, as shown in Figure 15, the encoding ends and decoding ends in the local operating room are deployed on the same local area network. Devices C3, C4 and C6 can switch between acting as encoding ends and decoding ends according to the actual scene; C1 is a laparoscopic robot; C2, C5 and C7 are local monitors. The laparoscopic robot C1 transmits dual-channel endoscope images to the encoding end C3 over optical fiber c1; the encoding end C3 performs frame merging, encoding and compression on the dual-channel video data and sends the result over the high-speed network c3 to the target decoding end, while also looping the dual-channel endoscope images out over optical fiber c2 to the local monitor C2.
Specifically, after encoding and encapsulating the spliced video data, the encoding end broadcasts its IP address to the multiple decoding ends on the same local area network; a decoding end receives the encoding end's broadcast message and determines from the encoding end's IP address in the message whether it can accept the broadcast message from the encoding end at that IP address.
S2: when the encoding end's IP address matches the IP address set on the decoding end, receive the video stream sent by the encoding end.
Specifically, the IP address of the paired encoding end is set on the decoding end. After receiving a broadcast message from an encoding end, the decoding end compares the received encoding-end IP address with the encoding-end IP address it has configured. If the two are the same, the encoding end and decoding end are considered to be on the same local area network; the decoding end sends an acknowledgement response, and after receiving it the encoding end sends the video stream to the matched decoding end, which receives the video stream. If the two differ, the encoding end and decoding end are considered not to be on the same local area network, and the decoding end does not respond to the encoding end's broadcast message.
In this embodiment, the encoding end and decoding end are placed on the same local area network; the encoding end sends a broadcast message carrying its IP address, and if that IP address matches the one set on the decoding end, the decoding end receives the video stream sent by the encoding end. In this way, the encoding end's video stream can be delivered to the decoding end even when the encoding end or decoding end cannot access the Internet.
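A minimal sketch of this broadcast pairing on a LAN; the port number and the message format (a bare IP string) are assumptions for illustration, not specified by this embodiment:

    import socket

    DISCOVERY_PORT = 50000   # illustrative port, not specified by the embodiment

    def broadcast_address(my_ip: str):
        # Encoding end: broadcast its own IP address on the LAN.
        s = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
        s.setsockopt(socket.SOL_SOCKET, socket.SO_BROADCAST, 1)
        s.sendto(my_ip.encode(), ("255.255.255.255", DISCOVERY_PORT))

    def wait_for_encoder(expected_ip: str) -> bool:
        # Decoding end: accept the stream only from the configured encoder IP.
        s = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
        s.bind(("", DISCOVERY_PORT))
        msg, addr = s.recvfrom(64)
        return msg.decode() == expected_ip   # respond only on a match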
In one embodiment, a video data control method applied to the relay server is provided, as shown in Figure 16; the method includes:
S1602: obtain the video stream sent by the encoding end and the encoding end's device code, the video stream including spliced video frames obtained by splicing the same-moment video frames of at least two different video sources.
As shown in Figure 3, in the multi-channel distribution network built on the relay server, encoding ends and decoding ends, devices B3, B6, B7 and B9 can switch between acting as encoding ends and decoding ends according to the actual scene; device B3 connects to the relay server through network b5, device B6 through network b10, device B7 through network b6, and device B9 through network b7, to push or pull data.
The video stream is the stream obtained by the encoding end splicing the same-moment video frames of the different video sources in the multiple channels of video data into one spliced video frame and then encoding and encapsulating the spliced video data.
Specifically, as shown in Figure 17, the relay server monitors the ports of the encoding ends and decoding ends in real time; when it detects that an encoding end has come online, the relay server receives the video stream sent by the encoding end and the encoding end's device code.
S1604: create a virtual room for the encoding end's device code.
The purpose of creating virtual rooms is to ensure that decoding ends never mistakenly receive a video stream sent by a non-corresponding encoding end; the relay server creates one virtual room for each encoding end's device code. A virtual room can be a storage unit of the relay server.
S1606: when a data acquisition request from a decoding end is received, obtain the target device code carried by the data acquisition request.
As shown in Figure 17, when the relay server detects that both the encoding end and the decoding end are online, it receives the video stream and device code sent by the encoding end and forwards the video stream to the decoding end according to the data acquisition request.
Specifically, when the relay server detects that both the encoding end and the decoding end are online, it receives the video stream and device code sent by the encoding end, receives the decoding end's data acquisition request, and obtains the target device code carried by the request.
S1608: when a virtual room corresponding to the target device code exists, send the video stream to the decoding end, the decoding end splitting the spliced video frames to obtain the same-moment video frames of at least two different video sources.
A pairing relationship exists between encoding ends and decoding ends; one encoding end can correspond to multiple decoding ends. Figure 18 shows the pairing setup flow between the encoding and decoding ends: after they establish connections with the relay server, the relay server collects their online information and pairs them, setting a one-to-many relationship in which one encoding end corresponds to multiple decoding ends and binding them accordingly. After one binding, no further binding is needed; once the encoding and decoding ends come online, pairing completes automatically.
Specifically, as shown in Figure 19, when one or more decoding ends want to obtain the video stream sent by their matched encoding end, the decoding ends send data acquisition requests to the relay server; the relay server obtains the target device code carried by each request and matches it against the created virtual rooms. If a virtual room corresponding to the target device code exists, the video stream of that virtual room is sent to the decoding end, which splits the spliced video frames to obtain the same-moment video frames of at least two different video sources.
In this embodiment, the relay server obtains the video stream sent by the encoding end and the encoding end's device code, obtains the target device code carried in the decoding end's data acquisition request, and matches the target device code against the encoding ends' device codes; if a matching encoding-end device code exists, it sends the video stream to the decoding end. Compared with the traditional approach in which the encoding end sends the video stream to each decoding end separately, the encoding end's stream needs to be sent only once, and having multiple decoding ends pull data from the relay server effectively reduces bandwidth.
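A sketch of the virtual-room bookkeeping on the relay server; representing a room as a set of subscriber addresses keyed by device code is an illustrative choice, and the function names are assumptions:

    rooms = {}   # device code of an encoding end -> set of subscribed decoder addresses

    def on_encoder_online(device_code: str):
        # One virtual room per encoder device code, so decoders never
        # receive a stream from a non-paired encoder by mistake.
        rooms.setdefault(device_code, set())

    def on_decoder_request(target_device_code: str, decoder_addr):
        room = rooms.get(target_device_code)
        if room is not None:
            room.add(decoder_addr)   # matched: start forwarding to this decoder
        # no matching room: the request is simply not answered

    def forward(device_code: str, packet: bytes, send):
        # The encoder uploads each packet once; the relay fans it out.
        for decoder_addr in rooms.get(device_code, ()):
            send(packet, decoder_addr)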
In one embodiment, a video data control method is provided, as shown in Figure 20, specifically including the following steps:
S2002: the relay server monitors the ports of the encoding ends and decoding ends.
S2004: the encoding end splices the same-moment video frames of the different video sources in the multiple channels of video data into one spliced video frame, obtaining spliced video data that includes the spliced video frame and its splicing information.
S2006: the encoding end encodes the spliced video data to obtain a plurality of encoded frames.
S2008: judge whether the current encoded frame is a key frame; if it is not, execute S2010; if it is, execute S2012.
S2010: judge the current encoded frame to be a normal frame, mark normal-frame information in the normal frame's data packet, and execute S2016.
S2012: duplicate at least one copy of the key frame.
S2014: the encoding end marks key-frame information in each key frame's data packet, where copies of the same key frame carry identical key-frame information.
S2016: the encoding end encapsulates the encoded frames to obtain the video stream to be transmitted.
S2018: the relay server receives the video stream sent by the encoding end and the encoding end's device code.
S2020: the relay server creates a virtual room for the encoding end's device code.
S2022: the decoding end sends a data acquisition request to the relay server, the data acquisition request carrying a target device code.
S2024: the relay server receives the decoding end's data acquisition request and obtains the target device code carried by the request.
S2026: the relay server matches the target device code against the created virtual rooms; if a virtual room corresponding to the target device code exists, it sends that virtual room's video stream to the decoding end; if no such virtual room exists, it does not respond to the data acquisition request.
S2028: the decoding end receives the video stream sent by the relay server.
S2030: the decoding end decapsulates the video stream to obtain a plurality of encoded frames.
S2032: the decoding end judges from the encoded frame's key-frame information whether the current encoded frame is a key frame; if it is not, execute S2038; if it is, execute S2034.
S2034: the decoding end judges whether the key-frame information of the current encoded frame duplicates that of an already-decoded key frame; if the encoded frame is a key frame and is a duplicate, execute S2036; if it is a key frame and is not a duplicate, execute S2038.
S2036: the decoding end discards the key frame and executes S2038.
S2038: the decoding end decodes the encoded frames to obtain spliced video data, the spliced video data including the spliced video frames and their splicing information.
S2040: the decoding end splits each spliced video frame according to its splicing information to obtain the same-moment video frames of at least two different video sources;
S2042: the decoding end renders and displays the same-moment video frames of the different video sources.
In this embodiment, the video stream sent by the encoding end and the encoding end's device code are obtained, the target device code carried in the decoding end's data acquisition request is obtained, and the target device code is matched against the encoding ends' device codes; if a matching encoding-end device code exists, the video stream is sent to the decoding end. Compared with the traditional approach in which the encoding end sends the video stream to each decoding end separately, the encoding end's stream needs to be sent only once, and having multiple decoding ends pull data from the relay server effectively reduces bandwidth. Creating a virtual room for each encoding end's device code ensures that decoding ends never mistakenly receive a video stream sent by a non-corresponding encoding end. Placing the encoding end and decoding end on the same local area network, with the encoding end broadcasting a message carrying its IP address and the decoding end receiving the video stream when that address matches its configured one, allows the encoding end's video stream to reach the decoding end even when the encoding end or decoding end cannot access the Internet.
It should be understood that although the steps in the flowcharts of the above embodiments are shown sequentially as indicated by the arrows, they are not necessarily executed in that order. Unless explicitly stated herein, there is no strict ordering restriction on the execution of these steps, and they can be executed in other orders. Moreover, at least some of the steps in those flowcharts can include multiple sub-steps or stages, which are not necessarily completed at the same moment but can be executed at different moments, and whose execution order is not necessarily sequential but can alternate or interleave with other steps or with at least some of the sub-steps or stages of other steps.
Based on the same inventive concept, an embodiment of the present application further provides a video data processing system for implementing the video data processing method described above. The solution to the problem provided by this system is similar to that recorded in the above method, so for the specific limitations in the one or more video data processing system embodiments provided below, reference can be made to the limitations of the video data processing method above, which are not repeated here.
In one embodiment, as shown in Figure 21, a video data processing system applied to the encoding end is provided, including:
a first acquisition module 111 configured to acquire multiple channels of video data from at least two different video sources;
a frame splicing module 112 configured to splice the video frames of different video sources in the multiple channels of video data at the same moment into one spliced video frame to obtain spliced video data, the spliced video data including the spliced video frame and its splicing information;
an encoding module 113 configured to encode the spliced video data to obtain a plurality of encoded frames;
an encapsulation module 114 configured to encapsulate the plurality of encoded frames to obtain a video stream to be transmitted, and to transmit the video stream to the target decoding end.
In one embodiment, the encapsulation module 114 is further configured to duplicate an encoded frame that is a key frame before encapsulating the plurality of encoded frames.
In one embodiment, the encapsulation module 114 is configured to duplicate at least one copy of an encoded frame that is a key frame, and to mark key-frame information in each key frame's data packet, where copies of the same key frame carry identical key-frame information.
In one embodiment, as shown in Figure 21, the target decoding end includes:
a receiving module 115 configured to receive the video stream sent by the encoding end;
a decapsulation module 116 configured to decapsulate the video stream to obtain a plurality of encoded frames;
a decoding module 117 configured to decode the encoded frames to obtain spliced video data, the spliced video data including spliced video frames and their splicing information;
a frame splitting module 118 configured to split each spliced video frame according to its splicing information to obtain the same-moment video frames of at least two different video sources;
a rendering module 119 configured to render and display the same-moment video frames of the different video sources.
In one embodiment, as shown in Figure 21, a video data processing system is provided that further includes a relay server, the relay server including:
a second acquisition module 120 configured to obtain the video stream sent by the encoding end and the encoding end's device code, the video stream including spliced video frames obtained by splicing the same-moment video frames of at least two different video sources;
a creation module 121 configured to create a virtual room for the encoding end's device code;
a receiving module 122 configured to receive a data acquisition request from a decoding end and obtain the target device code carried by the request;
a distribution module 123 configured to send the video stream to the decoding end when a virtual room corresponding to the target device code exists, the decoding end splitting the spliced video frames to obtain the same-moment video frames of at least two different video sources.
Each module in the above video data processing system can be implemented in whole or in part by software, hardware or a combination thereof. Each of the above modules can be embedded in or independent of the processor of a computer device in hardware form, or stored in the memory of the computer device in software form, so that the processor can call and execute the operations corresponding to each module.
In one embodiment, a computer device is provided. The computer device may be a laparoscopic robot, and its internal structure may be as shown in Figure 22. The computer device includes a processor, a memory, an input/output interface, a communication interface, a display unit and an input device. The processor, memory and input/output interface are connected through a system bus, and the communication interface, display unit and input device are connected to the system bus through the input/output interface. The processor of the computer device provides computing and control capabilities. The memory of the computer device includes a non-volatile storage medium and internal memory; the non-volatile storage medium stores an operating system and a computer program, and the internal memory provides an environment for running the operating system and computer program stored in the non-volatile storage medium. The input/output interface of the computer device is used to exchange information between the processor and external devices. The communication interface of the computer device is used for wired or wireless communication with external terminals; the wireless mode can be implemented through WIFI, a mobile cellular network, NFC (near-field communication) or other technologies. The computer program, when executed by the processor, implements a video data processing method. The display unit of the computer device forms a visually visible picture and can be a display screen, a projection device or a virtual-reality imaging device; the display screen can be a liquid-crystal display screen or an electronic-ink display screen. The input device of the computer device can be a touch layer covering the display screen; it can also be buttons, a trackball or a touchpad provided on the housing of the computer device, or an external keyboard, touchpad, mouse, etc.
Those skilled in the art will understand that the structure shown in Figure 22 is only a block diagram of part of the structure related to the solution of the present application and does not limit the computer devices to which the solution of the present application applies; a specific computer device may include more or fewer components than shown in the figure, combine certain components, or have a different arrangement of components.
In one embodiment, a computer device is provided, including a memory and a processor, with a computer program stored in the memory; when the processor executes the computer program, the steps in the above method embodiments are implemented.
In one embodiment, a computer-readable storage medium is provided, on which a computer program is stored; when the computer program is executed by a processor, the steps in the above method embodiments are implemented.
In one embodiment, a computer program product is provided, including a computer program that, when executed by a processor, implements the steps in the above method embodiments.
It should be noted that the user information (including but not limited to user equipment information, user personal information, etc.) and data (including but not limited to data used for analysis, stored data, displayed data, etc.) involved in the present application are all information and data authorized by the user or fully authorized by all parties, and the collection, use and processing of the relevant data must comply with the relevant laws, regulations and standards of the relevant countries and regions.
The technical features of the above embodiments can be combined arbitrarily. For brevity of description, not all possible combinations of the technical features in the above embodiments are described; however, as long as a combination of these technical features contains no contradiction, it should be considered within the scope recorded in this specification.
The above embodiments express only several implementations of the present application, and their descriptions are relatively specific and detailed, but they should not therefore be construed as limiting the scope of the patent application. It should be pointed out that those of ordinary skill in the art can make several variations and improvements without departing from the concept of the present application, all of which fall within the protection scope of the present application. Therefore, the protection scope of the present application shall be subject to the appended claims.

Claims (15)

  1. A video data processing method, comprising:
    acquiring multiple channels of video data from at least two different video sources;
    splicing the video frames of different video sources in the multiple channels of video data at the same moment into one spliced video frame to obtain spliced video data, the spliced video data comprising the spliced video frame and splicing information of the spliced video frame;
    encoding the spliced video data to obtain a plurality of encoded frames;
    encapsulating the plurality of encoded frames to obtain a video stream to be transmitted, and transmitting the video stream to a target decoding end.
  2. The method according to claim 1, further comprising, before the step of encapsulating the plurality of encoded frames:
    when an encoded frame is a key frame, duplicating the key frame.
  3. The method according to claim 2, wherein duplicating the key frame when the encoded frame is a key frame comprises:
    when the encoded frame is a key frame, duplicating at least one copy of the key frame;
    marking key-frame information in the data packet of each key frame, wherein copies of the same key frame carry identical key-frame information.
  4. The method according to claim 1, wherein the method is further applied to a decoding end and comprises:
    receiving a video stream sent by an encoding end;
    decapsulating the video stream to obtain a plurality of encoded frames;
    decoding the encoded frames to obtain spliced video data, the spliced video data comprising spliced video frames and splicing information of the spliced video frames;
    splitting each spliced video frame according to its splicing information to obtain same-moment video frames of at least two different video sources;
    rendering and displaying the same-moment video frames of the different video sources.
  5. The method according to claim 4, further comprising, before the step of decoding the encoded frames to obtain spliced video data:
    when an encoded frame is a key frame and duplicates an already-decoded key frame, discarding the encoded frame.
  6. The method according to claim 4, further comprising, before the step of decoding the encoded frames to obtain spliced video data:
    when an encoded frame is a key frame and does not duplicate an already-decoded key frame, executing the decoding of the encoded frame to obtain spliced video data.
  7. The method according to claim 4, wherein the at least two video sources are at least two video sources of a surgical system;
    and rendering and displaying the same-moment video frames of the different video sources comprises:
    processing the same-moment video frames of the at least two video sources of the surgical system into a three-dimensional image;
    rendering and displaying the three-dimensional image.
  8. The method according to claim 4, wherein receiving the video stream sent by the encoding end comprises:
    receiving a broadcast message from at least one encoding end, the broadcast message carrying the IP address of the encoding end;
    when the IP address of the encoding end matches the IP address at the decoding end, receiving the video stream sent by the encoding end.
  9. The method according to claim 1, wherein the method is further applied to a relay server and comprises:
    obtaining a video stream sent by an encoding end and a device code of the encoding end, the video stream comprising spliced video frames obtained by splicing same-moment video frames of at least two different video sources;
    creating a virtual room for the device code of the encoding end;
    when a data acquisition request from a decoding end is received, obtaining a target device code carried by the data acquisition request;
    when a virtual room corresponding to the target device code exists, sending the video stream to the decoding end, the decoding end splitting the spliced video frames to obtain same-moment video frames of at least two different video sources.
  10. A video data processing system, comprising:
    a first acquisition module configured to acquire multiple channels of video data from at least two different video sources;
    a frame splicing module configured to splice the video frames of different video sources in the multiple channels of video data at the same moment into one spliced video frame to obtain spliced video data, the spliced video data comprising the spliced video frame and splicing information of the spliced video frame;
    an encoding module configured to encode the spliced video data to obtain a plurality of encoded frames;
    an encapsulation module configured to encapsulate the plurality of encoded frames to obtain a video stream to be transmitted, and to transmit the video stream to a target decoding end.
  11. The system according to claim 10, wherein the target decoding end comprises:
    a receiving module configured to receive a video stream sent by an encoding end;
    a decapsulation module configured to decapsulate the video stream to obtain a plurality of encoded frames;
    a decoding module configured to decode the encoded frames to obtain spliced video data, the spliced video data comprising spliced video frames and splicing information of the spliced video frames;
    a frame splitting module configured to split each spliced video frame according to its splicing information to obtain same-moment video frames of at least two different video sources;
    a rendering module configured to render and display the same-moment video frames of the different video sources.
  12. The system according to claim 10, further comprising a relay server, the relay server comprising:
    a second acquisition module configured to obtain a video stream sent by an encoding end and a device code of the encoding end, the video stream comprising spliced video frames obtained by splicing same-moment video frames of at least two different video sources;
    a creation module configured to create a virtual room for the device code of the encoding end;
    a receiving module configured to receive a data acquisition request from a decoding end and obtain a target device code carried by the data acquisition request;
    a distribution module configured to send the video stream to the decoding end when a virtual room corresponding to the target device code exists, the decoding end splitting the spliced video frames to obtain same-moment video frames of at least two different video sources.
  13. The system according to claim 10, wherein the encapsulation module is further configured to duplicate an encoded frame that is a key frame before encapsulating the plurality of encoded frames.
  14. The system according to claim 13, wherein the encapsulation module is configured to duplicate at least one copy of an encoded frame that is a key frame, and to mark key-frame information in the data packet of each key frame, wherein copies of the same key frame carry identical key-frame information.
  15. A computer-readable storage medium on which a computer program is stored, wherein the computer program, when executed by a processor, implements the steps of the method according to any one of claims 1 to 9.
PCT/CN2023/120228 2022-09-23 2023-09-21 Video data processing method and system WO2024061295A1 (zh)

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
CN202211161664.7 2022-09-23
CN202211161664.7A CN115567661B (zh) 2022-09-23 2022-09-23 Video data processing method, system, computer device and storage medium

Publications (1)

Publication Number Publication Date
WO2024061295A1 true WO2024061295A1 (zh) 2024-03-28

Family

ID=84741679

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/CN2023/120228 WO2024061295A1 (zh) 2022-09-23 2023-09-21 视频数据的处理方法和系统

Country Status (2)

Country Link
CN (1) CN115567661B (zh)
WO (1) WO2024061295A1 (zh)

Cited By (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN118555357A (zh) * 2024-07-24 2024-08-27 Zhejiang Dahua Technology Co., Ltd. Video splicing storage method, device and storage medium

Families Citing this family (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN115567661B (zh) * 2022-09-23 2024-07-23 Shanghai MicroPort MedBot (Group) Co., Ltd. Video data processing method, system, computer device and storage medium
CN116916172B (zh) * 2023-09-11 2024-01-09 Tencent Technology (Shenzhen) Co., Ltd. Remote control method and related apparatus
CN117119223B (zh) * 2023-10-23 2023-12-26 Tianjin Hualai Technology Co., Ltd. Video stream playback control method and system based on multi-channel transmission
CN117596373B (zh) * 2024-01-17 2024-04-12 Taobao (China) Software Co., Ltd. Method for displaying information based on a dynamic digital human image, and electronic device

Citations (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN101409831A (zh) * 2008-07-10 2009-04-15 Zhejiang Normal University Multimedia video object processing method
US20170208220A1 (en) * 2016-01-14 2017-07-20 Disney Enterprises, Inc. Automatically synchronizing multiple real-time video sources
CN109963185A (zh) * 2017-12-26 2019-07-02 Hangzhou Hikvision Digital Technology Co., Ltd. Video data sending method, video display method, apparatus, system and device
CN110401820A (zh) * 2019-08-15 2019-11-01 Beijing Megvii Technology Co., Ltd. Multi-channel video processing method, apparatus, medium and electronic device
CN115567661A (zh) * 2022-09-23 2023-01-03 Shanghai MicroPort MedBot (Group) Co., Ltd. Video data processing method, system, computer device and storage medium


Also Published As

Publication number Publication date
CN115567661A (zh) 2023-01-03
CN115567661B (zh) 2024-07-23

Similar Documents

Publication Publication Date Title
WO2024061295A1 (zh) Video data processing method and system
US9351028B2 Wireless 3D streaming server
CN110430441B (zh) Cloud phone video capture method, system, apparatus and storage medium
JP6338688B2 (ja) Video synchronized playback method, apparatus, and system
CN102098443A (zh) Camera device, communication system and corresponding image processing method
CN109040786B (zh) Camera data transmission method, apparatus, system and storage medium
KR20090126176A (ko) Information processing apparatus and method, and program
CN106227492B (zh) Method and apparatus for interconnecting a video wall and a mobile intelligent terminal
CN103369289A (zh) Communication method and apparatus for a video avatar
CN112019877A (zh) Screen projection method, apparatus, device and storage medium based on a VR device
CN102088593B (zh) MPEG4 compressed video transmission communication system and method based on the Bluetooth 3.0 specification
CN103856809A (zh) Multi-point screen-sharing method, system and terminal device
CN103957391A (zh) Method and system for simultaneously displaying video of all parties during a multi-party video intercom call
EP3342171A1 (en) Networked video communication applicable to gigabit ethernet
TWI519131B (zh) Image transmission system and its transmitting-end and receiving-end devices
CN110572673A (zh) Video encoding and decoding method and apparatus, storage medium and electronic apparatus
CN102843566B (zh) Communication method and device for 3D video data
CN108322691A (zh) Video conference implementation method, apparatus and system, and computer-readable storage medium
CN112565799B (zh) Video data processing method and apparatus
CN109640030A (zh) Audio/video peripheral expansion apparatus and method for a video conference system
WO2022116822A1 (zh) Data processing method and apparatus for immersive media, and computer-readable storage medium
CN111049624B (zh) Highly fault-tolerant feedback-free link image transmission method and system based on a sliding window
CN110740286A (zh) Video conference control method, multipoint control unit and video conference terminal
CN110636295A (zh) Video encoding and decoding method and apparatus, storage medium and electronic apparatus
US11758108B2 (en) Image transmission method, image display device, image processing device, image transmission system, and image transmission system with high-transmission efficiency

Legal Events

Date Code Title Description
121 Ep: the epo has been informed by wipo that ep was designated in this application

Ref document number: 23867581

Country of ref document: EP

Kind code of ref document: A1