CN113573088B - Method and equipment for synchronously drawing identification object for live video stream - Google Patents


Info

Publication number
CN113573088B
CN113573088B (application CN202110838186.8A)
Authority
CN
China
Prior art keywords
video stream
data
sei
live video
time information
Prior art date
Legal status
Active
Application number
CN202110838186.8A
Other languages
Chinese (zh)
Other versions
CN113573088A (en)
Inventor
薛如冰
王星
王夷
王瑞
何剑
俞君杰
Current Assignee
Shanghai Xinyi Intelligent Technology Co., Ltd.
Original Assignee
Shanghai Xinyi Intelligent Technology Co., Ltd.
Priority date
Filing date
Publication date
Application filed by Shanghai Xinyi Intelligent Technology Co., Ltd.
Priority to CN202110838186.8A
Publication of CN113573088A
Application granted
Publication of CN113573088B
Legal status: Active
Anticipated expiration

Links

Classifications

    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04N PICTORIAL COMMUNICATION, e.g. TELEVISION
    • H04N21/00 Selective content distribution, e.g. interactive television or video on demand [VOD]
    • H04N21/20 Servers specifically adapted for the distribution of content, e.g. VOD servers; Operations thereof
    • H04N21/21 Server components or server architectures
    • H04N21/218 Source of audio or video content, e.g. local disk arrays
    • H04N21/2187 Live feed
    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04N PICTORIAL COMMUNICATION, e.g. TELEVISION
    • H04N21/00 Selective content distribution, e.g. interactive television or video on demand [VOD]
    • H04N21/40 Client devices specifically adapted for the reception of or interaction with content, e.g. set-top-box [STB]; Operations thereof
    • H04N21/43 Processing of content or additional data, e.g. demultiplexing additional data from a digital video stream; Elementary client operations, e.g. monitoring of home network or synchronising decoder's clock; Client middleware
    • H04N21/4302 Content synchronisation processes, e.g. decoder synchronisation
    • H04N21/4307 Synchronising the rendering of multiple content streams or additional data on devices, e.g. synchronisation of audio on a mobile phone with the video output on the TV screen
    • H04N21/43072 Synchronising the rendering of multiple content streams or additional data on devices, e.g. synchronisation of audio on a mobile phone with the video output on the TV screen of multiple content streams on the same device
    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04N PICTORIAL COMMUNICATION, e.g. TELEVISION
    • H04N21/00 Selective content distribution, e.g. interactive television or video on demand [VOD]
    • H04N21/40 Client devices specifically adapted for the reception of or interaction with content, e.g. set-top-box [STB]; Operations thereof
    • H04N21/43 Processing of content or additional data, e.g. demultiplexing additional data from a digital video stream; Elementary client operations, e.g. monitoring of home network or synchronising decoder's clock; Client middleware
    • H04N21/431 Generation of visual interfaces for content selection or interaction; Content or additional data rendering
    • H04N21/4312 Generation of visual interfaces for content selection or interaction; Content or additional data rendering involving specific graphical features, e.g. screen layout, special fonts or colors, blinking icons, highlights or animations
    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04N PICTORIAL COMMUNICATION, e.g. TELEVISION
    • H04N21/00 Selective content distribution, e.g. interactive television or video on demand [VOD]
    • H04N21/40 Client devices specifically adapted for the reception of or interaction with content, e.g. set-top-box [STB]; Operations thereof
    • H04N21/43 Processing of content or additional data, e.g. demultiplexing additional data from a digital video stream; Elementary client operations, e.g. monitoring of home network or synchronising decoder's clock; Client middleware
    • H04N21/44 Processing of video elementary streams, e.g. splicing a video clip retrieved from local storage with an incoming video stream, rendering scenes according to MPEG-4 scene graphs
    • H04N21/44008 Processing of video elementary streams, e.g. splicing a video clip retrieved from local storage with an incoming video stream, rendering scenes according to MPEG-4 scene graphs involving operations for analysing video streams, e.g. detecting features or characteristics in the video stream
    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04N PICTORIAL COMMUNICATION, e.g. TELEVISION
    • H04N21/00 Selective content distribution, e.g. interactive television or video on demand [VOD]
    • H04N21/40 Client devices specifically adapted for the reception of or interaction with content, e.g. set-top-box [STB]; Operations thereof
    • H04N21/43 Processing of content or additional data, e.g. demultiplexing additional data from a digital video stream; Elementary client operations, e.g. monitoring of home network or synchronising decoder's clock; Client middleware
    • H04N21/44 Processing of video elementary streams, e.g. splicing a video clip retrieved from local storage with an incoming video stream, rendering scenes according to MPEG-4 scene graphs
    • H04N21/44016 Processing of video elementary streams, e.g. splicing a video clip retrieved from local storage with an incoming video stream, rendering scenes according to MPEG-4 scene graphs involving splicing one content stream with another content stream, e.g. for substituting a video clip

Abstract

The application aims to provide a scheme for synchronously drawing identification objects for live video streams. Specifically, a live video stream containing video data and SEI data is first acquired; when the acquired live video stream reaches a preset capacity threshold, the live video stream is parsed, and the SEI data in the live video stream is stored into an SEI queue according to time information; when the live video stream is played, the corresponding SEI data is obtained from the SEI queue according to the current playing time information; and the corresponding identification object is then drawn on the playing picture of the live video stream according to the SEI data corresponding to the current playing time information. Compared with the prior art, the method and device keep the drawing time of the identification object accurately synchronized with the playing time of the video picture, so that the user perceives no delay and the user experience is effectively improved.

Description

Method and equipment for synchronously drawing identification object for live video stream
Technical Field
The application relates to the technical field of information, in particular to a technology for synchronously drawing identification objects for live video streams.
Background
When playing live video, it is sometimes necessary to dynamically superimpose identification objects on the live video being played. In particular, live video streams typically contain additional extension data (such as SEI data) from which identification objects need to be drawn when the live video stream is played. If the drawing time of the identification object deviates from the playing time of the video picture by more than 50 milliseconds, the human senses clearly perceive the delay, resulting in a poor user experience.
Disclosure of Invention
The application aims to provide a method and equipment for synchronously drawing identification objects for live video streams.
According to one aspect of the present application, there is provided a method of synchronously rendering an identification object for a live video stream, wherein the method comprises:
acquiring a live video stream, wherein the live video stream comprises video data and SEI data;
when the obtained live video stream reaches a preset capacity threshold, analyzing the live video stream, and storing the SEI data in the live video stream into an SEI queue according to time information;
when the live video stream is played, corresponding SEI data are obtained from the SEI queue according to the current playing time information;
and drawing a corresponding identification object on a playing picture of the live video stream according to the SEI data corresponding to the current playing time information.
According to another aspect of the present application, there is also provided an apparatus for synchronously rendering an identification object for a live video stream, wherein the apparatus includes: the system comprises a video stream acquisition module, a video stream analysis module, a video stream synthesis module, an SEI synchronizer module, an SEI analysis module, a play module and a drawing module;
the video stream acquisition module is used for acquiring a live video stream;
when the live video stream acquired by the video stream acquisition module reaches a preset capacity threshold, the video stream analysis module is used for analyzing the live video stream, and the video stream synthesis module is used for storing the SEI data in the live video stream into an SEI queue according to time information;
the playing module is used for playing the live video stream, and the SEI synchronizer module is used for acquiring corresponding SEI data from the SEI queue according to the current playing time information of the live video stream;
the SEI synchronizer module is used for transmitting SEI data corresponding to the current playing time information to the SEI analysis module, the SEI analysis module is used for analyzing the SEI data corresponding to the current playing time information, and the drawing module is used for drawing a corresponding identification object on a playing picture of the live video stream.
According to yet another aspect of the present application, there is also provided a computing device, wherein the device comprises a memory for storing computer program instructions and a processor for executing the computer program instructions, wherein the computer program instructions, when executed by the processor, trigger the device to perform the method of synchronously rendering identification objects for live video streams.
According to yet another aspect of the present application, there is also provided a computer readable medium having stored thereon computer program instructions executable by a processor to implement the method of synchronously rendering identification objects for live video streams.
In the scheme provided by the application, a live video stream containing video data and SEI data is first acquired; when the acquired live video stream reaches a preset capacity threshold, the live video stream is parsed, and the SEI data in the live video stream is stored into an SEI queue according to time information; when the live video stream is played, the corresponding SEI data is obtained from the SEI queue according to the current playing time information; and the corresponding identification object is then drawn on the playing picture of the live video stream according to the SEI data corresponding to the current playing time information. When acquiring the live video stream, the method thus acquires the video data and the SEI data synchronously; when the live video is played, the identification object is rapidly drawn according to the SEI data, keeping the drawing time of the identification object accurately synchronized with the playing time of the video picture, so that the user perceives no delay and the user experience is effectively improved.
Drawings
Other features, objects and advantages of the present application will become more apparent upon reading of the detailed description of non-limiting embodiments, made with reference to the accompanying drawings in which:
FIG. 1 is a flow chart of a method for synchronously rendering identification objects for live video streams in accordance with an embodiment of the present application;
FIG. 2 is a device workflow diagram for synchronously rendering identification objects for live video streams in accordance with an embodiment of the present application;
fig. 3 is an effect diagram of synchronously drawing identification objects for live video streams according to an embodiment of the present application;
FIG. 4 is a workflow diagram of a video stream acquisition module according to an embodiment of the application;
FIG. 5 is a schematic diagram of a data structure of a video streaming unit according to an embodiment of the present application;
fig. 6 is a schematic diagram of a naluItem data structure according to an embodiment of the present application.
The same or similar reference numbers in the drawings refer to the same or similar parts.
Detailed Description
The application is described in further detail below with reference to the accompanying drawings.
In one exemplary configuration of the application, the terminal, the device of the service network, and the trusted party each include one or more processors (CPUs), input/output interfaces, network interfaces, and memory.
The memory may include volatile memory in a computer-readable medium, random access memory (RAM) and/or nonvolatile memory, such as read-only memory (ROM) or flash memory (flash RAM). Memory is an example of computer-readable media.
Computer-readable media include both permanent and non-permanent, removable and non-removable media, and may implement information storage by any method or technology. The information may be computer-readable instructions, data structures, program modules, or other data. Examples of computer storage media include, but are not limited to, phase-change memory (PRAM), static random access memory (SRAM), dynamic random access memory (DRAM), other types of random access memory (RAM), read-only memory (ROM), electrically erasable programmable read-only memory (EEPROM), flash memory or other memory technology, compact disc read-only memory (CD-ROM), digital versatile discs (DVD) or other optical storage, magnetic cassettes, magnetic tape storage or other magnetic storage devices, or any other non-transmission medium which can be used to store information that can be accessed by a computing device.
The embodiment of the application provides a method for synchronously drawing an identification object for a live video stream, which acquires video data and SEI data synchronously when acquiring the live video stream; when the live video is played, the identification object is rapidly drawn according to the SEI data, keeping the drawing time of the identification object accurately synchronized with the playing time of the video picture.
In a practical scenario, the device implementing the method may be a user device, a network device, or a device formed by integrating the user device and the network device through a network. The user equipment includes, but is not limited to, terminal equipment such as a smart phone, a tablet computer and a personal computer (PC), and the network equipment includes, but is not limited to, a network host, a single network server, a set of multiple network servers, a cloud-computing-based set of computers, and the like. Here, the cloud is composed of a large number of hosts or network servers based on cloud computing, a form of distributed computing in which a group of loosely coupled computers acts as one virtual computer.
Fig. 1 is a flowchart of a method for synchronously drawing identification objects for live video streams according to an embodiment of the present application, the method including step S101, step S102, step S103, and step S104.
Step S101, acquiring a live video stream, wherein the live video stream comprises video data and SEI data;
step S102, when the obtained live video stream reaches a preset capacity threshold, analyzing the live video stream, and storing the SEI data in the live video stream into an SEI queue according to time information;
step S103, when the live video stream is played, corresponding SEI data is obtained from the SEI queue according to the current playing time information;
step S104, drawing a corresponding identification object on a playing picture of the live video stream according to the SEI data corresponding to the current playing time information.
Here, the live video stream includes video data and SEI (Supplemental Enhancement Information) data; the SEI data is additional extension data and includes information related to the identification object.
For example, the live video stream may be in a format such as H.264 and delivered over the HTTP+FLV protocol. The live video stream can be played in an H5 page, and the video element in the H5 page supports formats such as mp4, webm and ogg; the live video stream can therefore be played after being converted into a format supported by the H5 page, such as mp4, webm or ogg.
After the live video stream is acquired, it needs to be parsed. The video stream does not arrive as a stable structure: affected by various factors, each read may return more or less data. In an embodiment of the present application, a space (i.e., a buffer) for storing live video stream data is provided. After the live video stream is acquired, it is not parsed immediately; only when the acquired live video stream reaches a preset capacity threshold is it parsed, yielding the video data and the SEI data in the live video stream.
While the live video stream is parsed, it also needs to be converted into a format supported by the video playing page, such as mp4, webm or ogg. For example, the live video stream may be parsed according to MSE (Media Source Extensions) rules and time-stamped, so that during parsing the SEI data also carries time information. In the step S102, the SEI data in the live video stream may be stored into an SEI queue (e.g., seiQueue) according to the time information. The SEI queue is a queue for storing the SEI data and is located in the memory of the computer.
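By way of illustration, the conversion step above can be sketched in TypeScript as follows: remuxed fMP4 fragments are appended to the H5 video element through an MSE SourceBuffer. The codec string and the way fragments are queued are assumptions for illustration only, not details prescribed by the application.

    // Minimal sketch: feed remuxed fMP4 fragments to the H5 video element via MSE.
    // The codec string and the `pending` fragment queue are illustrative assumptions.
    const video = document.querySelector('video') as HTMLVideoElement;
    const mediaSource = new MediaSource();
    video.src = URL.createObjectURL(mediaSource);

    mediaSource.addEventListener('sourceopen', () => {
      const sb = mediaSource.addSourceBuffer('video/mp4; codecs="avc1.640028"');
      const pending: ArrayBuffer[] = []; // fragments emitted by the mp4-remux step
      const appendNext = () => {
        if (!sb.updating && pending.length > 0) {
          sb.appendBuffer(pending.shift()!);
        }
      };
      sb.addEventListener('updateend', appendNext);
      // As the remuxer produces a fragment: pending.push(fragment); appendNext();
    });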
In the step S103, when the live video stream is played, the current playing time information (such as currentTime) of the live video stream may be obtained. The corresponding SEI data may then be obtained from the SEI queue (e.g., seiQueue) according to the current playing time information: for example, the SEI data corresponding to currentTime is the SEI data whose time information is less than or equal to currentTime and has the smallest gap from currentTime. In the step S104, the SEI data corresponding to the current playing time information may be handed to a canvas element in the H5 page for processing, and the corresponding identification object is drawn synchronously on the playing picture of the live video stream.
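By way of illustration, a minimal TypeScript sketch of this lookup follows. The SeiItem shape and the millisecond unit are assumptions; the application only specifies that the matched SEI data has time information at or before currentTime and closest to it.

    interface SeiItem {
      dts: number;      // time information attached during parsing (assumed to be in ms)
      data: Uint8Array; // raw SEI payload
    }

    // Return the queued SEI item whose time information is <= currentTime
    // and has the smallest gap from currentTime.
    function matchSei(seiQueue: SeiItem[], currentTimeMs: number): SeiItem | undefined {
      let best: SeiItem | undefined;
      for (const item of seiQueue) {
        if (item.dts <= currentTimeMs && (best === undefined || item.dts > best.dts)) {
          best = item;
        }
      }
      return best;
    }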
For example, as shown in fig. 2, an apparatus for synchronously rendering an identification object for a live video stream includes a video stream acquisition module (e.g., IO-Controller), a video stream parsing module (e.g., flv-demux), a video stream synthesis module (e.g., mp4-remux), an SEI synchronizer module (e.g., SeiSynchronizer), an SEI parsing module (e.g., SeiParse), a playing module (e.g., the video element in an H5 page), and a drawing module (e.g., the canvas element in an H5 page).
The video stream acquisition module (such as IO-Controller) is used for acquiring a live video stream. When the live video stream acquired by the video stream acquisition module reaches a preset capacity threshold, the video stream parsing module (such as flv-demux) parses the live video stream, and the video stream synthesis module (such as mp4-remux) processes the parsed data, synthesizes a format supported by the video element in the H5 page (such as mp4, webm or ogg), stores the SEI data in the live video stream into an SEI queue according to time information, and hands the SEI data to the SEI synchronizer module (such as SeiSynchronizer). The SEI synchronizer module periodically acquires the current playing time information (such as currentTime) of the playing module (such as the video element in the H5 page) and matches the corresponding SEI data (i.e., the SEI data whose time information is before and closest to the current playing time information). The SEI parsing module (such as SeiParse) then parses the SEI data corresponding to the current playing time information, and the drawing module (such as the canvas element in the H5 page) processes the resulting parsed data (including position information, such as coordinate information). In the H5 page playing the live video, a canvas element is laid over the video element; as shown in FIG. 3, the canvas element and the video element occupy identical positions in the page with identical width and height, so a viewer perceives the identification object as appearing directly on the live video picture.
In some embodiments, the step S101 includes: acquiring a live video stream, detecting the data size of the acquired live video stream, and storing the acquired live video stream into a buffer area if the data size of the acquired live video stream does not exceed the capacity size of the buffer area, wherein the capacity size of the buffer area is the preset capacity threshold.
For example, a video stream does not arrive at a steady rate: the amount of data obtained in each read of the live video stream fluctuates with the network, sometimes more and sometimes less. The buffer is a space for storing live video stream data and is located in the memory of the computer. Here, threshold detection may be performed on each batch of live video stream data against the data size of the buffer (such as stashBuffer): if the data size of the acquired live video stream does not exceed the preset capacity threshold, the acquired live video stream continues to be stored into the buffer; if it exceeds the preset capacity threshold, the video stream parsing module (flv-demux in fig. 2) is triggered to parse the data. This avoids triggering the video stream parsing module too frequently and wasting system resources.
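A minimal sketch of this threshold check, assuming a fixed stashSize, is given below; the class and method names are illustrative and not the application's actual implementation.

    // Sketch of the capacity-threshold check, mirroring stashBuffer/stashUsed/stashSize.
    // dispatchChunks() stands in for triggering the video stream parsing module.
    class StashedLoader {
      private stashBuffer = new Uint8Array(1024 * 1024); // stashSize: assumed 1 MiB
      private stashUsed = 0;

      onChunk(chunk: Uint8Array): void {
        if (this.stashUsed + chunk.byteLength <= this.stashBuffer.byteLength) {
          // Below the threshold: accumulate without triggering the parser.
          this.stashBuffer.set(chunk, this.stashUsed);
          this.stashUsed += chunk.byteLength;
        } else {
          // Threshold exceeded: hand the buffered data and the new chunk to flv-demux.
          this.dispatchChunks(chunk);
        }
      }

      private dispatchChunks(chunk: Uint8Array): void {
        // See the dispatch flow described below with reference to fig. 4.
      }
    }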
As shown in fig. 4, after the video stream acquisition module (e.g., IO-Controller) obtains a chunk of the live video stream, it decides how to handle the chunk relative to the buffer (e.g., stashBuffer) by judging whether stashUsed + chunk.byteLength <= stashSize holds.
For example, as shown in fig. 4, if stashUsed + chunk.byteLength <= stashSize holds, the chunk is simply stashed: the chunk is appended to stashBuffer, and stashUsed is increased by chunk.byteLength.
For example, as shown in fig. 4, if stashUsed + chunk.byteLength <= stashSize is not satisfied, it is judged whether stashUsed == 0 holds. If stashUsed == 0 holds, no data remains in stashBuffer, and the newly arrived data is used directly to trigger dispatchChunks. Specifically, (1) FlvDemuxer sniffs and parses the chunk data (chunk, byteStart); (2) FlvDemuxer parses according to its rules and returns the number of bytes consumed, consumed; (3) if consumed < chunk.byteLength, remaining data exists and the subsequent steps are performed; (4) the remaining data length is remain = chunk.byteLength - consumed, and when the value of remain is larger than the space size bufferSize of stashBuffer, stashBuffer is expanded; (5) the remaining data remainArray is stored in stashBuffer; (6) the value of stashUsed becomes the value of stashUsed plus remainArray.byteLength; (7) stashByteStart = byteStart + consumed.
For example, as shown in fig. 4, if stashUsed + chunk.byteLength <= stashSize is not true and stashUsed == 0 is also not true, there is remaining data in stashBuffer, and the data remaining in stashBuffer is used directly for dispatchChunks. Specifically, (1) FlvDemuxer sniffs and parses the buffer data (buffer, stashByteStart); (2) FlvDemuxer parses according to its rules and returns the number of buffer bytes consumed, consumed; (3) if consumed is smaller than buffer.byteLength and consumed > 0, the subsequent steps ((4)-(6) and (10)-(12)) are performed; (4) the buffer data has remaining data remainArray, which is stored back into stashBuffer; (5) the stashUsed value is set: stashUsed = remainArray.byteLength; (6) stashByteStart is increased by the consumed amount, i.e., stashByteStart = stashByteStart + consumed; (7) if consumed equals buffer.byteLength, the subsequent steps ((8)-(12)) are performed; (8) the buffer data is fully consumed and stashUsed = 0; (9) stashByteStart is increased by the consumed amount, i.e., stashByteStart = stashByteStart + consumed; (10) when the value of stashUsed plus chunk.byteLength is greater than the space size bufferSize of stashBuffer, stashBuffer needs to be expanded; (11) the chunk data is appended to stashBuffer starting from the stashUsed position, preserving data order; (12) stashUsed is increased by chunk.byteLength.
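The flow above can be condensed into the following sketch of a single dispatch pass; parseChunks stands in for FlvDemuxer's parsing entry point, and its byte-consumed return value is an assumption about that interface.

    // Sketch of one dispatch pass: parse, account for consumed bytes, and re-stash
    // the unconsumed tail so that newly arriving chunks append after it.
    interface Stash {
      buffer: Uint8Array;
      used: number;      // stashUsed
      byteStart: number; // stashByteStart
    }

    function dispatchStash(
      stash: Stash,
      parseChunks: (data: Uint8Array, byteStart: number) => number, // returns consumed
    ): void {
      const data = stash.buffer.subarray(0, stash.used);
      const consumed = parseChunks(data, stash.byteStart);
      stash.byteStart += consumed;
      const remain = stash.used - consumed;
      if (remain > 0) {
        // Keep the unconsumed remainder at the front of stashBuffer.
        stash.buffer.copyWithin(0, consumed, stash.used);
      }
      stash.used = remain;
    }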
In some embodiments, the step S102 includes: and when the obtained live video stream reaches the preset capacity threshold, analyzing the live video stream to obtain the size of consumed video stream data, and re-storing the unconsumed video stream data into the buffer area.
For example, after the video stream parsing module (such as flv-demux) parses the live video stream, the byte count of the consumed video stream data is obtained and the unconsumed video stream data is re-stored into the buffer (such as stashBuffer). The consumed video stream data is the data from which video could be synthesized in this pass; the unconsumed video stream data is the data that could not yet be synthesized in this pass.
In some embodiments, the step S102 includes: analyzing the data blocks of the live video stream to obtain a plurality of video stream units; the structure of the video stream unit comprises tagType, dataSize, timestamp, steamId, Data and prevTagSize, wherein tagType is used for recording type information of the video stream unit, dataSize is used for recording the data size of the real data in the video stream unit, timestamp is used for recording timestamp information, steamId is used for recording identification information of the video stream unit, Data is used for recording the real data in the video stream unit, and prevTagSize is used for recording the data size of a previous video stream unit of the video stream unit.
For example, after the video stream parsing module (such as flv-demux) obtains the live video stream, it needs to parse the data blocks (chunks) of the live video stream into a number of video stream units chunkItem (such as the units of the FLV protocol) and parse each available chunkItem. In addition, the number of data bytes consumed by parsing can be returned to the video stream acquisition module (such as IO-Controller) for use together with the buffer (such as stashBuffer).
For example, as shown in fig. 5, the structure of each video stream unit chunkItem includes tagType, dataSize, timestamp, steamId, Data and prevTagSize. tagType represents the type of the chunkItem and occupies one byte; a tagType value of 8 indicates audio, 9 indicates video, and 18 indicates a scriptDataObject. dataSize represents the number of bytes of the real data portion and occupies 3 bytes. timestamp represents the timestamp and occupies 4 bytes. steamId represents the identification information of the chunkItem and occupies 3 bytes. Data represents the real data in the chunkItem and occupies as many bytes as the value of dataSize. prevTagSize represents the length of the previous chunkItem and occupies 4 bytes. The byte length of a chunkItem equals the sum of the tagType, dataSize, timestamp, steamId, Data and prevTagSize lengths, i.e., chunkItemLength = 1 + 3 + 4 + 3 + (value of dataSize) + 4. The starting position of a chunkItem is its offset, and the offset of the first chunkItem is 0; the next chunkItem starts at the previous chunkItem's offset plus the value of chunkItemLength. Thus, chunks can be read using the value of chunkItemLength to obtain each chunkItem's data and parse the chunkItem.
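A sketch of reading one chunkItem according to this layout follows. The single 4-byte timestamp read is a simplification kept to match the description above (the FLV format actually splits the timestamp into 3 bytes plus an extended byte), and bounds checking is omitted.

    interface ChunkItem {
      tagType: number;
      dataSize: number;
      timestamp: number;
      steamId: number; // spelled as in the application
      data: Uint8Array;
      prevTagSize: number;
    }

    // Read the chunkItem starting at `offset` and return it with the next offset.
    function readChunkItem(chunks: Uint8Array, offset: number): { item: ChunkItem; next: number } {
      const v = new DataView(chunks.buffer, chunks.byteOffset + offset);
      const tagType = v.getUint8(0);                                                 // 1 byte
      const dataSize = (v.getUint8(1) << 16) | (v.getUint8(2) << 8) | v.getUint8(3); // 3 bytes
      const timestamp = v.getUint32(4);                                              // 4 bytes (simplified)
      const steamId = (v.getUint8(8) << 16) | (v.getUint8(9) << 8) | v.getUint8(10); // 3 bytes
      const dataOffset = offset + 11;
      const data = chunks.subarray(dataOffset, dataOffset + dataSize);
      const tail = new DataView(chunks.buffer, chunks.byteOffset + dataOffset + dataSize);
      const prevTagSize = tail.getUint32(0);                                         // 4 bytes
      const chunkItemLength = 1 + 3 + 4 + 3 + dataSize + 4;
      return {
        item: { tagType, dataSize, timestamp, steamId, data, prevTagSize },
        next: offset + chunkItemLength,
      };
    }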
In some embodiments, the step S102 includes: establishing a video track object, wherein samples of the video track object are used for storing analysis data of the live video stream; analyzing the data blocks of the live video stream to obtain a plurality of video stream units, and sequentially reading and analyzing each video stream unit.
For example, before reading and parsing a data block (chunks) of the live video stream, a videoTrack object (video data storage space object) for storing parsed data needs to be created, wherein samples of the videoTrack object store the parsed data of the live video stream.
In some embodiments, the step S102 includes: judging whether the video stream unit is video data according to the tagType; if so, analyzing the first byte of the real data and marking its analysis value as frameType; analyzing the second byte of the real data and marking its analysis value as packetType, wherein a packetType value of 0 indicates that the real data is video configuration information and a packetType value of 1 indicates that the real data is a video sample; if the value of packetType is 0, analyzing the video configuration information to obtain the length naluLengthSize; if the value of packetType is 1, analyzing naluItem and constructing an avcSample object of the video sample.
For example, when parsing each video stream unit chunkItem: (1) the first byte of the chunkItem is the tagType, the second to fourth bytes are the byte length dataSize of the real data, and the fifth to eighth bytes represent the timestamp; a tagType of 9 indicates video data. (2) The first byte of the real data realData (the dataSize bytes starting at dataOffset) is parsed, and its parsed value is denoted frameType (the key-frame flag bit). (3) realData is parsed further: the value of its second byte is the packetType, where a value of 0 indicates configuration information and 1 indicates a video sample; the third to fifth bytes of realData hold the value of cts. (4) When the value of packetType equals 0, the video configuration information is parsed to obtain the length naluLengthSize. (5) When the value of packetType equals 1, naluItem is parsed and a video sample avcSample object is constructed. (6) The next chunkItem is read and step (1) is executed again, repeating until all chunks have been read; when reading finishes, the video stream synthesis module (such as mp4-remux) is executed, and the number of consumed bytes, consumed, is returned to the video stream acquisition module (such as IO-Controller).
For example, as shown in fig. 6, parsing starts from the sixth byte of the real data realData, yielding video data and additional extension data (i.e., nalus), from which a video sample avcSample object is generated. The starting position of a naluItem is naluOffset, and the naluOffset of the first naluItem is 0; the next naluItem starts at the previous naluItem's naluOffset plus naluLengthSize plus the value encoded in the naluLengthSize bytes at the head of the naluItem (i.e., the value of its first three or four bytes). The nalus are read repeatedly to obtain and parse each naluItem. The naluType is held in the fourth or fifth byte of the naluItem (3 or 4 depending mainly on the bytes occupied by naluLengthSize), and naluType can take many values (5, 6 or others); when naluType is 5, a key frame is confirmed. If the parsed value frameType of the first byte of realData is 1, or the naluType of any naluItem in the currently parsed nalus is 5, then isKeyframe is true, i.e., the key frame is confirmed. Parsing a naluItem generates the unit data for each stored nalu object, e.g., { type: 5, data: data } or { type: 6, data: data, naluLengthSize: naluLengthSize }. The type attribute value is naluType; the data attribute saves the naluItem's data within chunks, from the starting position of the currently parsed naluItem through its naluLengthSize-byte length prefix and payload. For example, when realData starts at byte 16 of chunks, naluLengthSize is 3, and the first three bytes of the naluItem encode 20, the data covers bytes 16 to 39 of chunks. When naluType is 6, an additional attribute naluLengthSize is attached, with a value of 3 or 4. After the nalus are parsed, the units array of stored units is obtained, and the avcSample object can then be constructed: avcSample contains a units attribute whose value is units, an isKeyframe attribute marking whether it is a key frame, and a dts attribute whose value is the timestamp (time information). The avcSample is then stored in the samples attribute of videoTrack.
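The naluItem walk described above can be sketched as follows. The & 0x1f mask reflects the H.264 NAL header convention for the type bits, which is an assumption beyond the text; the unit layout mirrors the { type, data, naluLengthSize } objects described above.

    interface NaluUnit {
      type: number;            // naluType
      data: Uint8Array;        // length prefix + payload, as described above
      naluLengthSize?: number; // only attached when type is 6 (SEI)
    }

    function parseNalus(
      realData: Uint8Array,
      naluLengthSize: number,
      frameType: number,
    ): { units: NaluUnit[]; isKeyframe: boolean } {
      const units: NaluUnit[] = [];
      let isKeyframe = frameType === 1; // key-frame flag from the first byte of realData
      let naluOffset = 5;               // the nalus start at the sixth byte of realData
      while (naluOffset + naluLengthSize <= realData.length) {
        let size = 0;
        for (let i = 0; i < naluLengthSize; i++) {
          size = (size << 8) | realData[naluOffset + i];
        }
        // Low 5 bits of the byte after the length prefix (H.264 NAL header).
        const naluType = realData[naluOffset + naluLengthSize] & 0x1f;
        if (naluType === 5) isKeyframe = true; // an IDR slice confirms a key frame
        const unit: NaluUnit = {
          type: naluType,
          data: realData.subarray(naluOffset, naluOffset + naluLengthSize + size),
        };
        if (naluType === 6) unit.naluLengthSize = naluLengthSize; // SEI unit
        units.push(unit);
        naluOffset += naluLengthSize + size;
      }
      return { units, isKeyframe };
    }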
In some embodiments, the step S102 includes: and acquiring analysis data of the live video stream, and storing the SEI data in the live video stream into an SEI queue according to time information.
For example, the video stream synthesis module (such as mp4-remux) processes each item in the samples of the videoTrack object after receiving the videoTrack object delivered by the video stream parsing module (such as flv-demux), and in each sample item (i.e., avcSample object) finds the units whose type is 6, i.e., the SEI data, for example { type: 6, naluLengthSize: naluLengthSize, data: data }, where naluLengthSize is 3 or 4. The obtained SEI data is then analyzed, and the following additional fields are attached: payloadType (the value of the second byte after the naluLengthSize prefix of the SEI data; theoretically 5), payloadSize (the length of the available data in the SEI data, including the length of the UUID), UUID (the first 16 bytes of the available data in the SEI data, describing the rules of the SEI data, such as vendor-unique identification information, which determines the parsing rules of the SEI parsing module) and payloadByteOffset (the offset at which the custom available data truly starts in the SEI data; the identification-object data does not include the length of the UUID). In the step S102, the SEI data is handed to the SEI synchronizer module (e.g., SeiSynchronizer) and stored in the SEI queue (e.g., seiQueue).
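A sketch of attaching these fields to an extracted SEI unit is given below; the 0xFF-run decoding of payloadSize follows the H.264 Annex D convention, which is an assumption beyond the text's description.

    interface SeiRecord {
      payloadType: number;       // theoretically 5 (user data unregistered)
      payloadSize: number;       // available data length, including the UUID
      uuid: string;              // hex form of the first 16 bytes of the available data
      payloadByteOffset: number; // where the identification-object data starts
      data: Uint8Array;
      dts: number;               // time information
    }

    function annotateSei(unit: { data: Uint8Array; naluLengthSize: number }, dts: number): SeiRecord {
      const d = unit.data;
      const base = unit.naluLengthSize;  // skip the length prefix
      const payloadType = d[base + 1];   // second byte after the prefix
      let payloadSize = 0;
      let p = base + 2;
      while (d[p] === 0xff) {            // Annex D size coding: 0xFF run + terminator
        payloadSize += 255;
        p++;
      }
      payloadSize += d[p];
      p += 1;
      const uuid = Array.from(d.subarray(p, p + 16), b => b.toString(16).padStart(2, '0')).join('');
      return { payloadType, payloadSize, uuid, payloadByteOffset: p + 16, data: d, dts };
    }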
In some embodiments, the step S103 includes: when the live video stream is played, acquiring current playing time information of the live video stream at intervals of preset time; and acquiring corresponding SEI data from the SEI queue according to the current playing time information.
For example, the SEI synchronizer module (e.g., SeiSynchronizer) receives the dts time information and SEI data delivered by the video stream synthesis module (e.g., mp4-remux) and stores them in the SEI queue (e.g., seiQueue). The SEI synchronizer module periodically queries the current playing time information (e.g., currentTime) of the playing module (e.g., the video element in the H5 page) and obtains the SEI data corresponding to the current playing time information from the SEI queue. Here, the corresponding SEI data may be determined by comparing the dts time information with currentTime: it is the SEI data whose time information is at or before the current playing time information and closest to it. The SEI data corresponding to the current playing time information is then handed to the SEI parsing module (such as SeiParse). In this process, the SEI queue clears the data prior to the current playing time information (e.g., currentTime), ensuring that the SEI queue does not grow too large.
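The periodic query-and-prune loop can be sketched as follows; it reuses the SeiItem shape and matchSei lookup sketched earlier, assumes the queue is kept in time order, and the 100 ms polling interval is an assumption.

    // Every intervalMs: read currentTime, match the closest earlier SEI item,
    // hand it over for parsing, and prune entries at or before the matched time.
    function startSeiSynchronizer(
      video: HTMLVideoElement,
      seiQueue: SeiItem[],
      onMatch: (item: SeiItem) => void,
      intervalMs = 100, // assumed polling period
    ): number {
      return window.setInterval(() => {
        const nowMs = video.currentTime * 1000;
        const item = matchSei(seiQueue, nowMs);
        if (item !== undefined) {
          onMatch(item); // hand over to the SEI parsing module
          while (seiQueue.length > 0 && seiQueue[0].dts <= item.dts) {
            seiQueue.shift(); // keep the SEI queue from growing too large
          }
        }
      }, intervalMs);
    }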
In some embodiments, the step S104 includes: analyzing SEI data corresponding to the current playing time information to obtain position information and/or content information of the identification object; and drawing the identification object on a playing picture of the live video stream.
For example, the SEI parsing module (e.g., SeiParse) receives the SEI data corresponding to the current playing time information from the SEI synchronizer module (e.g., SeiSynchronizer), performs verification and parsing according to a predetermined parsing rule, and returns the corresponding position information or content information; here, the SEI parsing module may select different parsing rules according to the UUID. The SEI parsing module then passes the parsed information of the SEI data to the drawing module (e.g., the canvas element in the H5 page). In the H5 page playing the live video, as shown in FIG. 3, the canvas element and the video element occupy the same position in the page and keep the same width and height; the drawing module draws at the positions given by the parsed SEI information, with the canvas element covering the video element and its background set to transparent. In addition, the drawing module clears the previously drawn data before each new drawing.
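As an illustration, the drawing step might look like the sketch below; the { x, y, w, h, label } shape of the parsed result is an assumption, since the application only specifies that position information and/or content information is returned.

    // Draw one identification object on the transparent canvas overlay,
    // clearing the previous frame's drawing first.
    function drawIdentification(
      canvas: HTMLCanvasElement,
      box: { x: number; y: number; w: number; h: number; label: string },
    ): void {
      const ctx = canvas.getContext('2d');
      if (ctx === null) return;
      ctx.clearRect(0, 0, canvas.width, canvas.height); // clean the previous drawing
      ctx.strokeStyle = 'red';
      ctx.lineWidth = 2;
      ctx.strokeRect(box.x, box.y, box.w, box.h);       // position information
      ctx.font = '14px sans-serif';
      ctx.fillStyle = 'red';
      ctx.fillText(box.label, box.x, box.y - 4);        // content information
    }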
In some embodiments, as shown in fig. 2, the step S101 includes: acquiring the live video stream through a video stream acquisition module (such as IO-Controller); the step S102 includes: when the live video stream acquired by the video stream acquisition module reaches a preset capacity threshold, parsing the live video stream with a video stream parsing module (such as flv-demux), and storing the SEI data in the live video stream into an SEI queue according to time information with a video stream synthesis module (such as mp4-remux); the step S103 includes: when the live video stream is played through a playing module (such as the video element in an H5 page), obtaining the corresponding SEI data from the SEI queue according to the current playing time information with an SEI synchronizer module (such as SeiSynchronizer); the step S104 includes: transmitting the SEI data corresponding to the current playing time information to an SEI parsing module (such as SeiParse) through the SEI synchronizer module, parsing the SEI data corresponding to the current playing time information with the SEI parsing module, and drawing the corresponding identification object on the playing picture of the live video stream with a drawing module (such as the canvas element in an H5 page).
According to an embodiment of the present application, there is also provided an apparatus for synchronously drawing an identification object for a live video stream, where the apparatus includes: the system comprises a video stream acquisition module, a video stream analysis module, a video stream synthesis module, an SEI synchronizer module, an SEI analysis module, a play module and a drawing module.
As shown in fig. 2, the video stream acquisition module (such as IO-Controller) is used for acquiring a live video stream; when the live video stream acquired by the video stream acquisition module reaches a preset capacity threshold, the video stream parsing module (such as flv-demux) is used for parsing the live video stream, and the video stream synthesis module (such as mp4-remux) is used for storing the SEI data in the live video stream into an SEI queue according to time information; the playing module (such as the video element in an H5 page) is used for playing the live video stream, and the SEI synchronizer module (such as SeiSynchronizer) is used for obtaining the corresponding SEI data from the SEI queue according to the current playing time information of the live video stream; the SEI synchronizer module is used for transmitting the SEI data corresponding to the current playing time information to the SEI parsing module (such as SeiParse), the SEI parsing module is used for parsing the SEI data corresponding to the current playing time information, and the drawing module (such as the canvas element in an H5 page) is used for drawing the corresponding identification object on the playing picture of the live video stream.
In summary, when acquiring the live video stream, the embodiment of the application acquires the video data and the SEI data synchronously; when the live video is played, the identification object is rapidly drawn according to the SEI data, keeping the drawing time of the identification object accurately synchronized with the playing time of the video picture, so that the user perceives no delay and the user experience is effectively improved.
Furthermore, portions of the present application may be implemented as a computer program product, such as computer program instructions, which when executed by a computer, may invoke or provide methods and/or techniques in accordance with the present application by way of operation of the computer. Program instructions for carrying out the methods of the present application may be stored on fixed or removable recording media and/or transmitted over a data stream on a broadcast or other signal bearing medium and/or stored in a working memory of a computer device that operates in accordance with the program instructions. Some embodiments of the present application herein provide a computing device comprising a memory for storing computer program instructions and a processor for executing the computer program instructions, wherein the computer program instructions, when executed by the processor, trigger the device to perform the methods and/or aspects of the various embodiments of the present application described above.
Furthermore, some embodiments of the present application provide a computer readable medium having stored thereon computer program instructions executable by a processor to implement the methods and/or aspects of the various embodiments of the present application described above.
It should be noted that the present application may be implemented in software and/or a combination of software and hardware, e.g., using Application Specific Integrated Circuits (ASIC), a general purpose computer or any other similar hardware device. In some embodiments, the software program of the present application may be executed by a processor to perform the steps or functions described above. Likewise, the software programs of the present application (including associated data structures) may be stored on a computer readable recording medium, such as RAM memory, magnetic or optical drive or diskette and the like. In addition, some steps or functions of the present application may be implemented in hardware, for example, as circuitry that cooperates with the processor to perform various steps or functions.
It will be evident to those skilled in the art that the application is not limited to the details of the foregoing illustrative embodiments, and that the present application may be embodied in other specific forms without departing from the spirit or essential characteristics thereof. The present embodiments are, therefore, to be considered in all respects as illustrative and not restrictive, the scope of the application being indicated by the appended claims rather than by the foregoing description, and all changes which come within the meaning and range of equivalency of the claims are therefore intended to be embraced therein. Any reference sign in a claim should not be construed as limiting the claim concerned. Furthermore, it is evident that the word "comprising" does not exclude other elements or steps, and that the singular does not exclude a plurality. A plurality of units or means recited in the apparatus claims can also be implemented by means of one unit or means in software or hardware. The terms first, second, etc. are used to denote a name, but not any particular order.

Claims (11)

1. A method of synchronously rendering an identified object for a live video stream, wherein the method comprises:
acquiring a live video stream, detecting the data size of the acquired live video stream, and storing the acquired live video stream into a buffer area if the data size of the acquired live video stream does not exceed the capacity size of the buffer area, wherein the live video stream comprises video data and SEI data, and the capacity size of the buffer area is a preset capacity threshold;
when the obtained live video stream reaches the preset capacity threshold, analyzing the live video stream, and storing the SEI data in the live video stream into an SEI queue according to time information;
analyzing the live video stream, including: analyzing the data blocks of the live video stream to obtain a plurality of video stream units; the structure of the video stream unit comprises tagType, dataSize, timestamp, steamId, Data and prevTagSize, wherein the tagType is used for recording type information of the video stream unit, the dataSize is used for recording the data size of the real data in the video stream unit, the timestamp is used for recording timestamp information, the steamId is used for recording identification information of the video stream unit, the Data is used for recording the real data in the video stream unit, and the prevTagSize is used for recording the data size of a previous video stream unit of the video stream unit;
when the live video stream is played, corresponding SEI data are obtained from the SEI queue according to the current playing time information;
and drawing a corresponding identification object on a playing picture of the live video stream according to the SEI data corresponding to the current playing time information.
2. The method of claim 1, wherein parsing the live video stream when the acquired live video stream reaches a predetermined capacity threshold comprises:
and when the obtained live video stream reaches the preset capacity threshold, analyzing the live video stream to obtain the size of consumed video stream data, and re-storing the unconsumed video stream data into the buffer area.
3. The method of claim 1, wherein parsing the live video stream comprises:
establishing a video track object, wherein samples of the video track object are used for storing analysis data of the live video stream;
analyzing the data blocks of the live video stream to obtain a plurality of video stream units, and sequentially reading and analyzing each video stream unit.
4. A method according to claim 3, wherein the step of parsing each of the video stream units comprises:
judging whether the video stream unit is video data according to the tagType, if so, analyzing a first byte of the real data, and marking an analysis value of the first byte of the real data as a frameType;
analyzing a second byte of the real data, and marking an analysis value of the second byte of the real data as a packetType, wherein a value of 0 of the packetType indicates that the real data is video configuration information, and a value of 1 of the packetType indicates that the real data is a video sample;
if the value of the packetType is 0, analyzing the video configuration information to obtain the length of naluLengthSize;
if the value of the packetType is 1, naluItem is analyzed, and an avcSample object of the video sample is constructed.
5. A method according to claim 3, wherein storing the SEI data in the live video stream in an SEI queue according to time information, comprises:
and acquiring analysis data of the live video stream, and storing the SEI data in the live video stream into an SEI queue according to time information.
6. The method of claim 1, wherein retrieving corresponding SEI data from the SEI queue according to current play time information when playing the live video stream, comprises:
when the live video stream is played, acquiring current playing time information of the live video stream at intervals of preset time;
and acquiring corresponding SEI data from the SEI queue according to the current playing time information.
7. The method of claim 1, wherein drawing the corresponding recognition object on the play picture of the live video stream according to the SEI data corresponding to the current play time information, comprises:
analyzing SEI data corresponding to the current playing time information to obtain position information and/or content information of the identification object;
and drawing the identification object on a playing picture of the live video stream.
8. The method of any of claims 1 to 7, wherein obtaining a live video stream comprises:
acquiring the live video stream through a video stream acquisition module;
when the obtained live video stream reaches a preset capacity threshold, analyzing the live video stream, and storing the SEI data in the live video stream into an SEI queue according to time information, wherein the method comprises the following steps:
when the live video stream acquired by the video stream acquisition module reaches a preset capacity threshold, analyzing the live video stream by a video stream analysis module, and storing the SEI data in the live video stream into an SEI queue by a video stream synthesis module according to time information;
when the live video stream is played, corresponding SEI data is obtained from the SEI queue according to current playing time information, and the method comprises the following steps:
when the live video stream is played through a playing module, a SEI synchronizer module is utilized to acquire corresponding SEI data from the SEI queue according to the current playing time information;
wherein, according to the SEI data corresponding to the current playing time information, drawing a corresponding identification object on a playing picture of the live video stream comprises:
and transmitting the SEI data corresponding to the current playing time information to an SEI analysis module through the SEI synchronizer module, analyzing the SEI data corresponding to the current playing time information by utilizing the SEI analysis module, and drawing a corresponding identification object on a playing picture of the live video stream by utilizing a drawing module.
9. An apparatus for synchronously rendering an identification object for a live video stream, wherein the apparatus comprises: the system comprises a video stream acquisition module, a video stream analysis module, a video stream synthesis module, an SEI synchronizer module, an SEI analysis module, a play module and a drawing module;
the video stream acquisition module is used for acquiring a live video stream, detecting the data size of the acquired live video stream, and if the data size of the acquired live video stream does not exceed the capacity size of a buffer area, storing the acquired live video stream into the buffer area, wherein the live video stream comprises video data and SEI data, and the capacity size of the buffer area is a preset capacity threshold;
when the live video stream acquired by the video stream acquisition module reaches the preset capacity threshold, the video stream analysis module is used for analyzing the live video stream, and the video stream synthesis module is used for storing the SEI data in the live video stream into an SEI queue according to time information;
analyzing the live video stream, including: analyzing the data blocks of the live video stream to obtain a plurality of video stream units; the structure of the video stream unit comprises tagType, dataSize, timestamp, steamId, Data and prevTagSize, wherein the tagType is used for recording type information of the video stream unit, the dataSize is used for recording the data size of the real data in the video stream unit, the timestamp is used for recording timestamp information, the steamId is used for recording identification information of the video stream unit, the Data is used for recording the real data in the video stream unit, and the prevTagSize is used for recording the data size of a previous video stream unit of the video stream unit;
the playing module is used for playing the live video stream, and the SEI synchronizer module is used for acquiring corresponding SEI data from the SEI queue according to the current playing time information of the live video stream;
the SEI synchronizer module is used for transmitting SEI data corresponding to the current playing time information to the SEI analysis module, the SEI analysis module is used for analyzing the SEI data corresponding to the current playing time information, and the drawing module is used for drawing a corresponding identification object on a playing picture of the live video stream.
10. A computing device, wherein the device comprises a memory for storing computer program instructions and a processor for executing the computer program instructions, wherein the computer program instructions, when executed by the processor, trigger the device to perform the method of any one of claims 1 to 8.
11. A computer readable medium having stored thereon computer program instructions executable by a processor to implement the method of any of claims 1 to 8.
CN202110838186.8A 2021-07-23 2021-07-23 Method and equipment for synchronously drawing identification object for live video stream Active CN113573088B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202110838186.8A CN113573088B (en) 2021-07-23 2021-07-23 Method and equipment for synchronously drawing identification object for live video stream

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202110838186.8A CN113573088B (en) 2021-07-23 2021-07-23 Method and equipment for synchronously drawing identification object for live video stream

Publications (2)

Publication Number Publication Date
CN113573088A CN113573088A (en) 2021-10-29
CN113573088B (en) 2023-11-10

Family

ID=78166867

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202110838186.8A Active CN113573088B (en) 2021-07-23 2021-07-23 Method and equipment for synchronously drawing identification object for live video stream

Country Status (1)

Country Link
CN (1) CN113573088B (en)

Citations (7)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN107343220A (en) * 2016-08-19 2017-11-10 北京市商汤科技开发有限公司 Data processing method, device and terminal device
US9872062B1 (en) * 2017-02-22 2018-01-16 Wyse Technology L.L.C. Enforcing synchronization by embedding audio within video frame data
CN109525879A (en) * 2018-10-30 2019-03-26 北京凯视达科技有限公司 Video playing control method and device
CN109547831A (en) * 2018-11-19 2019-03-29 网宿科技股份有限公司 Method, apparatus, computing device and storage medium for synchronizing a whiteboard with audio and video
CN111147880A (en) * 2019-12-30 2020-05-12 广州华多网络科技有限公司 Interaction method, device and system for live video, electronic equipment and storage medium
CN111163360A (en) * 2020-01-02 2020-05-15 腾讯科技(深圳)有限公司 Video processing method, video processing device, computer-readable storage medium and computer equipment
WO2020097857A1 (en) * 2018-11-15 2020-05-22 北京比特大陆科技有限公司 Media stream processing method and apparatus, storage medium, and program product

Family Cites Families (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN110012300B (en) * 2018-01-04 2021-07-09 华为技术有限公司 Video live broadcasting method and device
US11381867B2 (en) * 2019-01-08 2022-07-05 Qualcomm Incorporated Multiple decoder interface for streamed media data

Patent Citations (8)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN107343220A (en) * 2016-08-19 2017-11-10 北京市商汤科技开发有限公司 Data processing method, device and terminal device
US9872062B1 (en) * 2017-02-22 2018-01-16 Wyse Technology L.L.C. Enforcing synchronization by embedding audio within video frame data
CN109525879A (en) * 2018-10-30 2019-03-26 北京凯视达科技有限公司 Video playing control method and device
WO2020097857A1 (en) * 2018-11-15 2020-05-22 北京比特大陆科技有限公司 Media stream processing method and apparatus, storage medium, and program product
CN112930687A (en) * 2018-11-15 2021-06-08 北京比特大陆科技有限公司 Media stream processing method and device, storage medium and program product
CN109547831A (en) * 2018-11-19 2019-03-29 网宿科技股份有限公司 Method, apparatus, computing device and storage medium for synchronizing a whiteboard with audio and video
CN111147880A (en) * 2019-12-30 2020-05-12 广州华多网络科技有限公司 Interaction method, device and system for live video, electronic equipment and storage medium
CN111163360A (en) * 2020-01-02 2020-05-15 腾讯科技(深圳)有限公司 Video processing method, video processing device, computer-readable storage medium and computer equipment

Also Published As

Publication number Publication date
CN113573088A (en) 2021-10-29

Similar Documents

Publication Publication Date Title
CN110784750B (en) Video playing method and device and computer equipment
US10462415B2 (en) Systems and methods for generating a video clip and associated closed-captioning data
CN112601127B (en) Video display method and device, electronic equipment and computer readable storage medium
US20150363505A1 (en) Reception device, information processing method in reception device, transmission device, information processing device, and information processing method
KR20210091082A (en) Image processing apparatus, control method thereof and computer readable medium having computer program recorded therefor
CN117178557A (en) Method and apparatus for timing and event triggered updating in a scene
CN110896497A (en) Image processing method, video playing method and device
CN107634928A (en) A kind of processing method and processing device of bit stream data
WO2009137469A1 (en) Playlist processing
CN113573088B (en) Method and equipment for synchronously drawing identification object for live video stream
US9911460B2 (en) Fast and smart video trimming at frame accuracy on generic platform
CN110602524B (en) Method, device and system for synchronizing multi-channel digital streams and storage medium
CN111225293B (en) Video data processing method and device and computer storage medium
CN111918074A (en) Live video fault early warning method and related equipment
CN105847990A (en) Media file playing method and apparatus
US10049158B1 (en) Analyzing user behavior relative to media content
CN113315982B (en) Live broadcast method, computer storage medium and equipment
CN114143486A (en) Video stream synchronization method and device, computer equipment and storage medium
US7853118B2 (en) Image replay apparatus and method for moving-picture streams
US9219945B1 (en) Embedding content of personal media in a portion of a frame of streaming media indicated by a frame identifier
JP2007267144A (en) Method for distributing information, server device and program
CN111354384A (en) GIF picture generation method, playing method and device and electronic equipment
US20230144200A1 (en) Methods, Devices, and Systems for Publishing Key Pictures
Potetsianakis et al. SWAPUGC: software for adaptive playback of geotagged UGC
TWI447718B (en) Method and apparatus for generating thumbnails

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant
CP02 Change in the address of a patent holder

Address after: Room E176, 1st Floor, No. 406, Gulang Road, Putuo District, Shanghai 200080

Patentee after: Shanghai Xinyi Intelligent Technology Co.,Ltd.

Address before: 200080 7th floor, No.137 Haining Road, Hongkou District, Shanghai

Patentee before: Shanghai Xinyi Intelligent Technology Co.,Ltd.