CN113573088A - Method and equipment for synchronously drawing identification object for live video stream - Google Patents

Method and equipment for synchronously drawing identification object for live video stream

Info

Publication number
CN113573088A
CN113573088A (application CN202110838186.8A)
Authority
CN
China
Prior art keywords
video stream
sei
data
live video
time information
Prior art date
Legal status
Granted
Application number
CN202110838186.8A
Other languages
Chinese (zh)
Other versions
CN113573088B (en)
Inventor
薛如冰
王星
王夷
王瑞
何剑
俞君杰
Current Assignee
Shanghai Xinyi Intelligent Technology Co ltd
Original Assignee
Shanghai Xinyi Intelligent Technology Co ltd
Priority date
Filing date
Publication date
Application filed by Shanghai Xinyi Intelligent Technology Co., Ltd.
Priority to CN202110838186.8A
Publication of CN113573088A
Application granted
Publication of CN113573088B
Legal status: Active

Classifications

    • H04N21/00 Selective content distribution, e.g. interactive television or video on demand [VOD]
    • H04N21/2187 Servers for content distribution; source of audio or video content; live feed
    • H04N21/43072 Client devices; synchronising the rendering of multiple content streams or additional data on the same device
    • H04N21/4312 Client devices; generation of visual interfaces for content or additional data rendering involving specific graphical features, e.g. screen layout, special fonts or colors, blinking icons, highlights or animations
    • H04N21/44008 Client devices; processing of video elementary streams involving operations for analysing video streams, e.g. detecting features or characteristics in the video stream
    • H04N21/44016 Client devices; processing of video elementary streams involving splicing one content stream with another content stream, e.g. for substituting a video clip

Landscapes

  • Engineering & Computer Science (AREA)
  • Multimedia (AREA)
  • Signal Processing (AREA)
  • Databases & Information Systems (AREA)
  • Two-Way Televisions, Distribution Of Moving Picture Or The Like (AREA)

Abstract

The present application provides a scheme for synchronously drawing an identification object for a live video stream. Specifically, a live video stream containing video data and SEI data is acquired first; when the acquired live video stream reaches a preset capacity threshold, the live video stream is parsed, and the SEI data in the live video stream is stored into an SEI queue according to time information; when the live video stream is played, the corresponding SEI data is acquired from the SEI queue according to the current playing time information; the corresponding identification object is then drawn on the playing picture of the live video stream according to that SEI data. Compared with the prior art, the present application keeps the drawing time of the identification object accurately synchronized with the playing time of the video picture, so that the user cannot perceive any delay, which effectively improves the user experience.

Description

Method and equipment for synchronously drawing identification object for live video stream
Technical Field
The present application relates to the field of information technology, and in particular to a technique for synchronously drawing an identification object for a live video stream.
Background
When playing a live video, it is sometimes necessary to dynamically superimpose identification-object marks on the live video being played. Specifically, a live video stream usually contains additional extension data (such as SEI data), and when the live video stream is played, the identification objects need to be drawn according to that additional extension data. If the drawing time of an identification object deviates from the playing time of the video picture by more than 50 milliseconds, the delay is clearly perceptible to human senses, resulting in a poor user experience.
Disclosure of Invention
An object of the present application is to provide a method and apparatus for synchronously rendering an identification object for a live video stream.
According to one aspect of the application, a method for synchronously drawing an identification object for a live video stream is provided, wherein the method comprises the following steps:
acquiring a live video stream, wherein the live video stream comprises video data and SEI data;
when the acquired live video stream reaches a preset capacity threshold, analyzing the live video stream, and storing the SEI data in the live video stream into an SEI queue according to time information;
when the live video stream is played, acquiring corresponding SEI data from the SEI queue according to current playing time information;
and drawing a corresponding identification object on a playing picture of the live video stream according to the SEI data corresponding to the current playing time information.
According to another aspect of the present application, there is also provided an apparatus for synchronously rendering an identification object for a live video stream, wherein the apparatus includes: the system comprises a video stream acquisition module, a video stream analysis module, a video stream synthesis module, an SEI synchronizer module, an SEI analysis module, a playing module and a drawing module;
the video stream acquisition module is used for acquiring a live video stream;
when the live video stream acquired by the video stream acquisition module reaches a preset capacity threshold, the video stream analysis module is used for analyzing the live video stream, and the video stream synthesis module is used for storing the SEI data in the live video stream into an SEI queue according to time information;
the play module is used for playing the live video stream, and the SEI synchronizer module is used for acquiring corresponding SEI data from the SEI queue according to the current play time information of the live video stream;
the SEI synchronizer module is used for transmitting SEI data corresponding to the current playing time information to the SEI analysis module, the SEI analysis module is used for analyzing the SEI data corresponding to the current playing time information, and the drawing module is used for drawing corresponding identification objects on a playing picture of the live video stream.
According to yet another aspect of the application, there is also provided a computing device, wherein the device comprises a memory for storing computer program instructions and a processor for executing the computer program instructions, wherein the computer program instructions, when executed by the processor, trigger the device to perform the method for synchronously rendering an identification object for a live video stream.
According to yet another aspect of the present application, there is also provided a computer readable medium having stored thereon computer program instructions executable by a processor to implement the method for synchronously rendering an identification object for a live video stream.
In the scheme provided by the present application, a live video stream containing video data and SEI data is acquired first; when the acquired live video stream reaches a preset capacity threshold, the live video stream is parsed, and the SEI data in the live video stream is stored into an SEI queue according to time information; when the live video stream is played, the corresponding SEI data is acquired from the SEI queue according to the current playing time information; the corresponding identification object is then drawn on the playing picture of the live video stream according to that SEI data. In this way, when the live video stream is acquired, the video data and the SEI data are acquired synchronously; when the live video is played, the identification object is drawn quickly according to the SEI data, keeping the drawing time of the identification object accurately synchronized with the playing time of the video picture, so that the user cannot perceive any delay, which effectively improves the user experience.
Drawings
Other features, objects and advantages of the present application will become more apparent upon reading of the following detailed description of non-limiting embodiments thereof, made with reference to the accompanying drawings in which:
fig. 1 is a flow chart of a method for synchronously rendering an identification object for a live video stream according to an embodiment of the present application;
FIG. 2 is a flowchart of the operation of an apparatus for synchronously rendering an identified object for a live video stream according to an embodiment of the present application;
fig. 3 is a schematic diagram illustrating an effect of synchronously rendering an identification object for a live video stream according to an embodiment of the present application;
FIG. 4 is a flowchart illustrating the operation of a video stream capture module according to an embodiment of the present application;
fig. 5 is a schematic diagram of a data structure of a video stream unit according to an embodiment of the present application;
fig. 6 is a schematic structural diagram of naluItem data according to an embodiment of the present application.
The same or similar reference numbers in the drawings identify the same or similar elements.
Detailed Description
The present application is described in further detail below with reference to the attached figures.
In a typical configuration of the present application, the terminal, the device serving the network, and the trusted party each include one or more processors (CPUs), input/output interfaces, network interfaces, and memory.
The memory may include volatile memory in a computer-readable medium, such as Random Access Memory (RAM), and/or non-volatile memory, such as Read Only Memory (ROM) or flash memory (flash RAM). Memory is an example of a computer-readable medium.
Computer-readable media, including both permanent and non-permanent, removable and non-removable media, may implement information storage by any method or technology. The information may be computer-readable instructions, data structures, program modules, or other data. Examples of computer storage media include, but are not limited to, phase-change memory (PRAM), static random access memory (SRAM), dynamic random access memory (DRAM), other types of random access memory (RAM), read-only memory (ROM), electrically erasable programmable read-only memory (EEPROM), flash memory or other memory technology, compact disc read-only memory (CD-ROM), digital versatile discs (DVD) or other optical storage, magnetic cassettes, magnetic tape, magnetic disk storage or other magnetic storage devices, or any other non-transmission medium that can be used to store information accessible by a computing device.
The embodiments of the present application provide a method for synchronously drawing an identification object for a live video stream: when the live video stream is acquired, the video data and the SEI data are acquired synchronously; when the live video is played, the identification object is drawn quickly according to the SEI data, keeping the drawing time of the identification object accurately synchronized with the playing time of the video picture.
In a practical scenario, the device implementing the method may be a user equipment, a network device, or a device formed by integrating a user equipment and a network device through a network. The user equipment includes, but is not limited to, terminal devices such as smartphones, tablet computers and personal computers (PCs); the network device includes, but is not limited to, a network host, a single network server, a set of multiple network servers, or a cloud-computing-based set of computers. Here, the cloud consists of a large number of hosts or network servers based on Cloud Computing, a type of distributed computing in which one virtual computer is made up of a collection of loosely coupled computers.
Fig. 1 is a flowchart of a method for synchronously rendering an identification object for a live video stream according to an embodiment of the present application, where the method includes step S101, step S102, step S103, and step S104.
Step S101, acquiring a live video stream, wherein the live video stream comprises video data and SEI data;
step S102, when the acquired live video stream reaches a preset capacity threshold, analyzing the live video stream, and storing the SEI data in the live video stream into an SEI queue according to time information;
step S103, when the live video stream is played, corresponding SEI data is obtained from the SEI queue according to the current playing time information;
and step S104, drawing a corresponding identification object on a playing picture of the live video stream according to the SEI data corresponding to the current playing time information.
Here, the live video stream includes video data and SEI (Supplemental Enhancement Information) data; the SEI data is additional extension data containing information about the identification object.
For example, the live video stream may adopt a format such as H.264 and be carried over a protocol such as HTTP-FLV. The live video stream can be played in an H5 (HTML5) page, where the video element supports formats such as mp4, webm and ogg; the live video stream can therefore be played after being converted into a format supported by the H5 page, such as mp4, webm or ogg.
After the live video stream is acquired, it needs to be parsed. A video stream arrives in unstable amounts, affected by various factors: sometimes more data arrives at a time, sometimes less. In an embodiment of the present application, a space for storing live video stream data (i.e., a buffer) is provided. After the live video stream is acquired, it is not parsed immediately; only when the acquired live video stream reaches a predetermined capacity threshold is it parsed, extracting the video data and the SEI data in the live video stream.
When the live video stream is parsed, it needs to be converted into a format supported by the video playing page, such as mp4, webm or ogg. For example, the live video stream may be parsed according to the MSE (Media Source Extensions) rules; the live video stream carries timestamps, so the SEI data can also carry time information after parsing. In step S102, the SEI data in the live video stream may be stored into an SEI queue (e.g., seiQueue) according to this time information. The SEI queue is a queue used for storing the SEI data and resides in the computer's memory.
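As an illustrative sketch of feeding the converted stream to the playing page (not the application's own implementation; the codec string and the onRemuxedSegment hook are assumptions for illustration), the remuxed segments can be appended to an H5 video element through the standard Media Source Extensions API:

// Illustrative MSE feeding sketch in TypeScript; codec string and segment source are assumed.
declare function onRemuxedSegment(cb: (segment: ArrayBuffer) => void): void; // hypothetical hook

const video = document.querySelector('video') as HTMLVideoElement;
const mediaSource = new MediaSource();
video.src = URL.createObjectURL(mediaSource);

mediaSource.addEventListener('sourceopen', () => {
  // fMP4 output of the remuxer; the H.264 profile in the codec string is an assumption.
  const sourceBuffer = mediaSource.addSourceBuffer('video/mp4; codecs="avc1.64001f"');
  onRemuxedSegment((segment) => {
    if (!sourceBuffer.updating) sourceBuffer.appendBuffer(segment);
  });
});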
In step S103, when the live video stream is played, the current playing time information (e.g., currentTime) of the live video stream may be acquired. The corresponding SEI data may then be obtained from the SEI queue (e.g., seiQueue) according to the current playing time information: for example, the SEI data corresponding to currentTime may be the SEI data whose time information is less than or equal to currentTime and has the smallest gap from currentTime. In step S104, the SEI data corresponding to the current playing time information may be handed to a canvas element in the H5 page for processing, and the corresponding identification object is drawn synchronously on the playing picture of the live video stream.
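A minimal sketch of this matching rule (the SeiItem shape and the millisecond unit of dts are assumptions for illustration):

interface SeiItem { dts: number; data: Uint8Array; } // dts: time information, assumed in milliseconds

// Return the queued SEI item whose time information is at or before currentTime
// (in seconds, as reported by the video element) and closest to it.
function pickSei(seiQueue: SeiItem[], currentTime: number): SeiItem | undefined {
  const nowMs = currentTime * 1000;
  let best: SeiItem | undefined;
  for (const item of seiQueue) {
    if (item.dts <= nowMs && (best === undefined || item.dts > best.dts)) best = item;
  }
  return best;
}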
For example, as shown in fig. 2, an apparatus for synchronously rendering an identification object for a live video stream includes a video stream acquisition module (e.g., IO-Controller), a video stream parsing module (e.g., flv-demux), a video stream synthesis module (e.g., mp4-remuxer), an SEI synchronizer module (e.g., SeiSynchronizer), an SEI parsing module (e.g., seiParse), a playing module (e.g., the video element in the H5 page), and a drawing module (e.g., the canvas element in the H5 page).
The video stream acquisition module (e.g., IO-Controller) is configured to acquire the live video stream. When the live video stream acquired by the video stream acquisition module reaches the predetermined capacity threshold, the live video stream is parsed by the video stream parsing module (e.g., flv-demux); the video stream synthesis module (e.g., mp4-remuxer) processes the parsed data, synthesizes a format supported by the video element in the H5 page (such as mp4, webm or ogg), stores the SEI data of the live video stream into the SEI queue according to its time information, and hands it to the SEI synchronizer module (e.g., SeiSynchronizer) for processing. The SEI synchronizer module periodically obtains the current playing time information (e.g., currentTime) of the playing module (e.g., the video element in the H5 page) and matches the corresponding SEI data, i.e., the SEI data whose time information is before and closest to the current playing time information. The SEI parsing module (e.g., seiParse) then parses the SEI data corresponding to the current playing time information, and the resulting parsed data (including position information, such as coordinates) is processed by the drawing module (e.g., the canvas element in the H5 page). In the H5 page playing the live video, the canvas element is overlaid on the video element; as shown in fig. 3, the canvas element and the video element occupy the same position in the page and have the same size, so the viewer visually perceives the identification object as appearing on the live video picture.
In some embodiments, the step S101 includes: acquiring a live video stream, detecting the data size of the acquired live video stream, and storing the acquired live video stream into a cache area if its data size does not exceed the capacity of the cache area, wherein the capacity of the cache area is the preset capacity threshold.
For example, a video stream arrives in unstable amounts: the data size of each acquisition fluctuates with the network, so sometimes more is acquired and sometimes less. The cache area is a space for storing live video stream data and resides in the computer's memory. Here, threshold detection may be performed on the combined size of the live video stream data acquired each time and the data already in the cache area (e.g., the stashBuffer): if the resulting size does not exceed the preset capacity threshold, the acquired live video stream continues to be stored into the cache area; if it exceeds the preset capacity threshold, the video stream parsing module (e.g., flv-demux shown in fig. 2) is triggered to parse the data. In this way, the video stream parsing module is not triggered too frequently, avoiding waste of system resources.
As shown in fig. 4, after the video stream acquisition module (e.g., IO-Controller) acquires a chunk of the live video stream, it performs the following processing and stores the acquired data into the cache area (e.g., the stashBuffer). The branch taken depends on whether stashUsed + chunk.byteLength < stashSize holds, where stashUsed is the number of bytes already stashed and stashSize is the capacity of the stashBuffer.
For example, as shown in fig. 4, if stashUsed + chunk.byteLength < stashSize holds, the stashBuffer is used to save this part of the data, which avoids dispatching chunks (triggering parsing) every time a chunk of the live video stream is acquired. Specifically, the chunk is appended to the stashBuffer, and stashUsed is increased by chunk.byteLength.
For example, as shown in fig. 4, if stashUsed + chunk.byteLength < stashSize does not hold, it is determined whether stashUsed equals 0. If stashUsed equals 0, the stashBuffer holds no residual data, and the data passed in this time is used directly to trigger dispatch chunks. Specifically: (1) the FlvDemuxer sniffs and parses the chunk data (chunk, byteStart); (2) the FlvDemuxer parses according to its rules and returns the consumed byte count consumed; (3) if consumed < chunk.byteLength, the subsequent steps (4)-(7) are performed; (4) the remaining data length is chunk.byteLength - consumed, and when this value is greater than the capacity stashSize of the stashBuffer, the stashBuffer is expanded; (5) the remaining data remainArray is stored in the stashBuffer; (6) stashUsed is set to its original value plus remainArray.byteLength; (7) stashByteStart is set to byteStart + consumed.
For example, as shown in fig. 4, if stashUsed + chunk.byteLength < stashSize does not hold and stashUsed is not 0, the stashBuffer holds residual data, and that residual data buffer is used as-is to perform dispatch chunks. Specifically: (1) the FlvDemuxer sniffs and parses the buffered data (buffer, stashByteStart); (2) the FlvDemuxer parses according to its rules and returns the consumed byte count consumed; (3) if consumed < buffer.byteLength and consumed > 0, the subsequent steps (4)-(6) and (10)-(12) are performed; (4) the buffer data still contains remaining data remainArray, which is stored back into the stashBuffer; (5) stashUsed is set to remainArray.byteLength; (6) the consumed amount is added to stashByteStart, i.e., stashByteStart = stashByteStart + consumed; (7) if consumed equals buffer.byteLength, steps (8)-(12) are performed; (8) the buffered data has been fully consumed, and stashUsed is set to 0; (9) the consumed amount is added to stashByteStart, i.e., stashByteStart = stashByteStart + consumed; (10) when the remaining data length stashUsed plus chunk.byteLength is larger than the capacity stashSize of the stashBuffer, the stashBuffer needs to be expanded; (11) the chunk data is appended to the stashBuffer starting from the stashUsed position, preserving the order of the data; (12) stashUsed is increased by chunk.byteLength.
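The branching above can be sketched as follows (an illustrative reading of fig. 4, not a definitive implementation; parseChunks stands in for the FlvDemuxer entry point, and the initial capacity is an assumption):

let stashBuffer = new Uint8Array(384 * 1024); // initial stashSize (assumed value)
let stashUsed = 0;
let stashByteStart = 0;

declare function parseChunks(data: Uint8Array, byteStart: number): number; // returns consumed byte count

function onChunkArrival(chunk: Uint8Array, byteStart: number): void {
  if (stashUsed + chunk.byteLength < stashBuffer.byteLength) {
    // Branch 1: below the threshold, just accumulate to avoid a parse per network chunk.
    stashBuffer.set(chunk, stashUsed);
    stashUsed += chunk.byteLength;
    return;
  }
  if (stashUsed === 0) {
    // Branch 2: no residual data, dispatch the incoming chunk directly.
    const consumed = parseChunks(chunk, byteStart);
    const remain = chunk.byteLength - consumed;
    if (remain > stashBuffer.byteLength) stashBuffer = new Uint8Array(remain * 2); // expand
    stashBuffer.set(chunk.subarray(consumed), 0); // stash the unconsumed tail
    stashUsed = remain;
    stashByteStart = byteStart + consumed;
  } else {
    // Branch 3: residual data exists, dispatch the stashed bytes first.
    const buffer = stashBuffer.subarray(0, stashUsed);
    const consumed = parseChunks(buffer, stashByteStart);
    if (consumed > 0) {
      stashBuffer.copyWithin(0, consumed, stashUsed); // keep unconsumed bytes at the front
      stashUsed -= consumed;
      stashByteStart += consumed;
    }
    // Then append the new chunk, expanding the stash if necessary.
    if (stashUsed + chunk.byteLength > stashBuffer.byteLength) {
      const bigger = new Uint8Array((stashUsed + chunk.byteLength) * 2);
      bigger.set(stashBuffer.subarray(0, stashUsed));
      stashBuffer = bigger;
    }
    stashBuffer.set(chunk, stashUsed);
    stashUsed += chunk.byteLength;
  }
}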
In some embodiments, the step S102 includes: when the acquired live video stream reaches the preset capacity threshold value, analyzing the live video stream to obtain the size of consumed video stream data, and storing unconsumed video stream data into the cache region again.
For example, after the live video stream is parsed by the video stream parsing module (e.g., flv-demux), the byte count consumed of the consumed video stream data is obtained, and the unconsumed video stream data is stored back into the cache area (e.g., the stashBuffer). The consumed video stream data is the data used to synthesize video this time; the unconsumed video stream data is the data that could not yet be synthesized into video.
In some embodiments, the step S102 includes: parsing the data blocks of the live video stream to obtain a plurality of video stream units; the structure of the video stream unit comprises tagType, dataSize, timestamp, streamId, Data and prevTagSize, wherein the tagType records the type information of the video stream unit, the dataSize records the data size of the real data in the video stream unit, the timestamp records the timestamp information, the streamId records the identification information of the video stream unit, the Data records the real data in the video stream unit, and the prevTagSize records the data size of the video stream unit preceding this one.
For example, after the video stream parsing module (e.g., flv-demux) receives the live video stream, it needs to parse the data blocks (chunks) of the live video stream into several video stream units, chunkItems (e.g., units of the FLV protocol), and parse the available chunkItems. In addition, the byte length of the data consumed by parsing can be returned to the video stream acquisition module (e.g., IO-Controller) for use with the storage of the cache area (e.g., the stashBuffer).
For example, as shown in fig. 5, the structure of each video stream unit chunkItem includes tagType, dataSize, timestamp, streamId, Data and prevTagSize. tagType represents the type of the chunkItem and occupies one byte; for example, the tagType value for audio is 8, for video is 9, and for a script data object is 18. dataSize indicates the number of bytes of the real data portion and occupies 3 bytes. timestamp represents the timestamp and occupies 4 bytes. streamId represents the identification information of the chunkItem and occupies 3 bytes. Data represents the real data in the chunkItem and occupies as many bytes as the value of dataSize. prevTagSize represents the length of the previous chunkItem and occupies 4 bytes. The byte length of a chunkItem therefore equals the sum of the tagType, dataSize, timestamp, streamId, Data and prevTagSize lengths, i.e., chunkItemLength = 1 + 3 + 4 + 3 + (dataSize value) + 4. The starting position of a chunkItem is its offset, and the offset of the first chunkItem is 0; the position where the next chunkItem starts equals the offset of the previous chunkItem plus the value of chunkItemLength. The chunks are read chunkItemLength bytes at a time to obtain each chunkItem's data, and each chunkItem is then parsed.
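For illustration only (assuming the standard FLV byte layout matches the structure above), one chunkItem can be read as:

// Read one chunkItem (FLV tag) starting at `offset`; field widths as described above.
function readChunkItem(bytes: Uint8Array, offset: number) {
  const tagType = bytes[offset]; // 1 byte: 8 = audio, 9 = video, 18 = script data object
  const dataSize = (bytes[offset + 1] << 16) | (bytes[offset + 2] << 8) | bytes[offset + 3]; // 3 bytes
  // 4 bytes in total: a 3-byte timestamp plus 1 extended high-order byte.
  const timestamp = bytes[offset + 7] * 0x1000000
                  + ((bytes[offset + 4] << 16) | (bytes[offset + 5] << 8) | bytes[offset + 6]);
  const streamId = (bytes[offset + 8] << 16) | (bytes[offset + 9] << 8) | bytes[offset + 10]; // 3 bytes
  const data = bytes.subarray(offset + 11, offset + 11 + dataSize); // real data
  const p = offset + 11 + dataSize;
  const prevTagSize = bytes[p] * 0x1000000 + ((bytes[p + 1] << 16) | (bytes[p + 2] << 8) | bytes[p + 3]); // 4 bytes
  const chunkItemLength = 11 + dataSize + 4; // next chunkItem starts at offset + chunkItemLength
  return { tagType, dataSize, timestamp, streamId, data, prevTagSize, chunkItemLength };
}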
In some embodiments, the step S102 includes: establishing a videoTrack object, wherein samples of the videoTrack object are used for storing analysis data of the live video stream; and analyzing the data blocks of the live video stream to obtain a plurality of video stream units, and sequentially reading and analyzing each video stream unit.
For example, before reading and parsing a data block (chunks) of the live video stream, a videoTrack object (video data storage space object) for storing parsing data needs to be established, where samples of the videoTrack object store the parsing data of the live video stream.
In some embodiments, the step S102 includes: judging whether the video stream unit is video data according to the tagType, and if so, parsing the first byte of the real data and recording its parsed value as frameType; parsing the second byte of the real data and recording its parsed value as packetType, wherein a packetType value of 0 indicates that the real data is video configuration information and a packetType value of 1 indicates that the real data is a video sample; if the value of packetType is 0, parsing the video configuration information to obtain the length naluLengthSize; and if the value of packetType is 1, parsing the naluItems and constructing the avcSample object of the video sample.
For example, each video stream unit chunkItem is parsed as follows: (1) The first byte of the chunkItem is the tagType, the second to fourth bytes are the byte length dataSize of the real data realData, and the fifth to eighth bytes are the timestamp; a tagType of 9 indicates video data. (2) The first byte of the real data realData (which runs from the dataOffset position for dataSize bytes) is parsed, and the parsed value is recorded as frameType (the key frame flag bit). (3) The second byte of realData is parsed, and its value is the packetType, where a packetType of 0 indicates configuration information and a packetType of 1 indicates a video sample; the values of the third to fifth bytes of realData are the cts. (4) When the packetType equals 0, the video configuration information is parsed to obtain the length naluLengthSize. (5) When the packetType equals 1, the naluItems are parsed and the video sample avcSample object is constructed. (6) The next chunkItem is read and step (1) is executed again, repeating until all chunks are read; when reading is finished, the video stream synthesis module (e.g., mp4-remuxer) is executed, and the byte count consumed of the consumed video stream data is returned to the video stream acquisition module (e.g., IO-Controller).
For example, as shown in fig. 6, parsing starts from the sixth byte of the realData bytes and yields the video data and the additional extension data (i.e., the nalus), from which the avcSample object of the video sample is generated. The starting position of a naluItem is naluOffset, and the naluOffset of the first naluItem is 0; the position where the next naluItem starts equals the naluOffset of the previous naluItem plus naluLengthSize plus the value of that naluItem's first naluLengthSize bytes (i.e., the value of the first three or four bytes of the naluItem). This is repeated to read the nalus, obtain each naluItem's data, and parse each naluItem. The fourth or fifth byte of the naluItem is the naluType (depending on whether naluLengthSize occupies 3 or 4 bytes); naluType can take many values (5, 6 or others), and a naluType of 5 once again confirms a key frame. If the parsed value frameType of the first byte of realData is 1, or the naluType of any naluItem in the currently parsed nalus is 5, then keyframe is true, i.e., the sample is confirmed to be a key frame. Parsing a naluItem generates the unit data of each stored nalu object, such as { type: 5, data: data } or { type: 6, data: data, naluLengthSize: naluLengthSize }. The type attribute value is the naluType; the data attribute holds the naluItem's data within the chunks, spanning from the starting position of the currently parsed realData through naluLengthSize plus the value of the currently parsed naluItem's first naluLengthSize bytes. For example, if realData starts at byte 16 of the chunks, naluLengthSize is 3 bytes, and the value of the first three bytes of the naluItem is 20, then data is the data from byte 16 through byte 39 of the chunks. When naluType is 6, the additional attribute naluLengthSize, with a value of 3 or 4, is constructed. After the nalus are parsed, the units array of stored units is obtained, and the avcSample object can then be constructed: avcSample contains a units attribute whose value is units, an isKeyframe attribute whose value is keyframe, and a dts attribute whose value is the timestamp (time information). The avcSample is then stored into the samples attribute of the videoTrack.
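An illustrative sketch of this loop (assuming H.264 semantics for the NALU type bits; the nalus buffer is realData from its sixth byte onward):

function parseNalus(nalus: Uint8Array, naluLengthSize: number) {
  const units: { type: number; data: Uint8Array; naluLengthSize?: number }[] = [];
  let keyframe = false;
  let naluOffset = 0; // naluOffset of the first naluItem is 0
  while (naluOffset + naluLengthSize <= nalus.byteLength) {
    let naluSize = 0; // value of the first naluLengthSize bytes of the naluItem
    for (let i = 0; i < naluLengthSize; i++) naluSize = (naluSize << 8) | nalus[naluOffset + i];
    // As described above, the stored data spans the length prefix plus the NALU itself.
    const data = nalus.subarray(naluOffset, naluOffset + naluLengthSize + naluSize);
    const naluType = data[naluLengthSize] & 0x1f; // low 5 bits of the byte after the prefix (H.264)
    if (naluType === 5) keyframe = true; // an IDR slice confirms the key frame
    const unit: { type: number; data: Uint8Array; naluLengthSize?: number } = { type: naluType, data };
    if (naluType === 6) unit.naluLengthSize = naluLengthSize; // SEI units carry the prefix size
    units.push(unit);
    naluOffset += naluLengthSize + naluSize;
  }
  return { units, keyframe };
}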
In some embodiments, the step S102 includes: acquiring the parsed data of the live video stream, and storing the SEI data in the live video stream into an SEI queue according to time information.
For example, after receiving the videoTrack object delivered by the video stream parsing module (e.g., flv-demux), the video stream synthesis module (e.g., mp4-remuxer) processes each item of the videoTrack object's samples and finds, in each sample item (i.e., avcSample object), the unit of type 6, i.e., the SEI data, such as { type: 6, naluLengthSize, data: data }, where naluLengthSize is 3 or 4. The obtained SEI data may then be analyzed, and additional fields attached to it: payloadType (the value of the second byte after the naluLengthSize prefix of the SEI data, theoretically 5), payloadSize (the length of the usable data in the SEI data, a length that includes the UUID), UUID (the first 16 bytes of the usable data in the SEI data, describing the rule of the SEI data, such as the vendor's unique identification information, which determines the parsing rule used by the SEI parsing module), and payloadByteOffset (the true custom offset position of the usable data, i.e., the identification-object data, in the SEI data, not including the UUID). In step S102, the SEI data is handed to the SEI synchronizer module (e.g., SeiSynchronizer) and stored into the SEI queue (e.g., seiQueue).
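An illustrative sketch of extracting these fields from a type-6 unit (assuming a standard H.264 SEI layout with ff-escaped type and size bytes; per the description above, the unit's data includes the naluLengthSize prefix):

function annotateSei(unit: { data: Uint8Array; naluLengthSize: number }) {
  const d = unit.data;
  let p = unit.naluLengthSize + 1; // skip the length prefix and the NALU header byte
  let payloadType = 0;
  while (d[p] === 0xff) { payloadType += 255; p++; } // ff-escaped; theoretically ends at 5
  payloadType += d[p++];
  let payloadSize = 0;
  while (d[p] === 0xff) { payloadSize += 255; p++; } // length of the usable data, UUID included
  payloadSize += d[p++];
  const uuid = d.subarray(p, p + 16); // first 16 bytes of the usable data: vendor rule identifier
  const payloadByteOffset = p + 16;   // identification-object data starts here (UUID excluded)
  return { payloadType, payloadSize, uuid, payloadByteOffset };
}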
In some embodiments, the step S103 includes: when the live video stream is played, acquiring current playing time information of the live video stream at preset time intervals; and acquiring corresponding SEI data from the SEI queue according to the current playing time information.
For example, the SEI synchronizer module (e.g., SeiSynchronizer) receives the dts time information and SEI data delivered by the video stream synthesis module (e.g., mp4-remuxer) and stores them into the SEI queue (e.g., seiQueue). The SEI synchronizer module periodically queries the current playing time information (e.g., currentTime) of the playing module (e.g., the video element in the H5 page) and obtains the SEI data corresponding to the current playing time information from the SEI queue. Here, the corresponding SEI data may be determined by comparing the dts time information with currentTime: its time information is at or before the current playing time information and closest to it. The SEI data corresponding to the current playing time information is then transmitted to the SEI parsing module (e.g., seiParse). In this process, the SEI queue clears the data before the current playing time information, ensuring that the SEI queue does not grow too large.
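Combining this with the pickSei helper and SeiItem shape sketched earlier, the synchronizer's periodic query can be sketched as follows (the 100 ms interval and the dispatchToSeiParse hook are assumptions):

declare function dispatchToSeiParse(item: SeiItem): void; // hypothetical hand-off to the SEI parsing module

function startSeiSynchronizer(video: HTMLVideoElement, seiQueue: SeiItem[], intervalMs = 100): number {
  // Assumes seiQueue is kept in ascending dts order as items arrive from the remuxer.
  return window.setInterval(() => {
    const item = pickSei(seiQueue, video.currentTime); // time information at or before currentTime, closest to it
    if (item !== undefined) dispatchToSeiParse(item);
    // Clear entries at or before the current playing time so the queue does not grow too large.
    const nowMs = video.currentTime * 1000;
    while (seiQueue.length > 0 && seiQueue[0].dts <= nowMs) seiQueue.shift();
  }, intervalMs);
}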
In some embodiments, the step S104 includes: analyzing the SEI data corresponding to the current playing time information to obtain the position information and/or the content information of the identification object; and drawing the identification object on a playing picture of the live video stream.
For example, the SEI parsing module (e.g., seiParse) receives the SEI data corresponding to the current playing time information transmitted by the SEI synchronizer module (e.g., SeiSynchronizer), validates and parses it according to the agreed parsing rule, and returns the corresponding position information and/or content information. Here, the SEI parsing module may select different parsing rules according to the UUID. The SEI parsing module then passes the parsed information of the SEI data to the drawing module (e.g., the canvas element in the H5 page). In the H5 page playing the live video, as shown in fig. 3, the canvas element and the video element occupy the same position in the page, and their width and height are also the same; the drawing module draws at the positions given by the parsed SEI information, with the canvas element overlaid on the video element and its background set to transparent. In addition, before each drawing, the drawing module clears the data drawn the previous time.
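An illustrative sketch of the drawing step (the box fields x, y, w, h and label are assumptions about the parsed SEI content, not defined by the application):

function drawIdentificationObjects(
  canvas: HTMLCanvasElement,
  boxes: { x: number; y: number; w: number; h: number; label?: string }[],
): void {
  const ctx = canvas.getContext('2d');
  if (ctx === null) return;
  ctx.clearRect(0, 0, canvas.width, canvas.height); // clear the previously drawn data first
  ctx.strokeStyle = 'red';
  ctx.lineWidth = 2;
  for (const b of boxes) {
    ctx.strokeRect(b.x, b.y, b.w, b.h); // position information from the SEI data
    if (b.label !== undefined) ctx.fillText(b.label, b.x, b.y - 4); // content information, if any
  }
}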
In some embodiments, as shown in fig. 2, the step S101 includes: acquiring the live video stream through a video stream acquisition module (e.g., IO-Controller); the step S102 includes: when the live video stream acquired by the video stream acquisition module reaches a preset capacity threshold, parsing the live video stream through a video stream parsing module (e.g., flv-demux), and storing the SEI data in the live video stream into an SEI queue, according to time information, using a video stream synthesis module (e.g., mp4-remuxer); the step S103 includes: when the live video stream is played through a playing module (e.g., the video element in the H5 page), acquiring the corresponding SEI data from the SEI queue according to the current playing time information using an SEI synchronizer module (e.g., SeiSynchronizer); the step S104 includes: transmitting the SEI data corresponding to the current playing time information to an SEI parsing module (e.g., seiParse) through the SEI synchronizer module, parsing that SEI data with the SEI parsing module, and drawing the corresponding identification object on the playing picture of the live video stream with a drawing module (e.g., the canvas element in the H5 page).
According to an embodiment of the present application, there is also provided an apparatus for synchronously rendering an identification object for a live video stream, where the apparatus includes: the system comprises a video stream acquisition module, a video stream analysis module, a video stream synthesis module, an SEI synchronizer module, an SEI analysis module, a playing module and a drawing module.
As shown in fig. 2, the video stream acquisition module (e.g., IO-Controller) is configured to acquire a live video stream; when the live video stream acquired by the video stream acquisition module reaches a predetermined capacity threshold, the video stream parsing module (e.g., flv-demux) is configured to parse the live video stream, and the video stream synthesis module (e.g., mp4-remuxer) is configured to store the SEI data in the live video stream into an SEI queue according to time information; the playing module (e.g., the video element in the H5 page) is configured to play the live video stream, and the SEI synchronizer module (e.g., SeiSynchronizer) is configured to acquire the corresponding SEI data from the SEI queue according to the current playing time information of the live video stream; the SEI synchronizer module is configured to transmit the SEI data corresponding to the current playing time information to the SEI parsing module (e.g., seiParse), the SEI parsing module is configured to parse that SEI data, and the drawing module (e.g., the canvas element in the H5 page) is configured to draw the corresponding identification object on the playing picture of the live video stream.
To sum up, in the embodiments of the present application, when a live video stream is acquired, the video data and the SEI data are acquired synchronously; when the live video is played, the identification object is drawn quickly according to the SEI data, keeping the drawing time of the identification object accurately synchronized with the playing time of the video picture, so that the user cannot perceive any delay, which effectively improves the user experience.
In addition, portions of the present application may be implemented as a computer program product, such as computer program instructions, which, when executed by a computer, may invoke or provide the methods and/or technical solutions of the present application through the operation of the computer. Program instructions that invoke the methods of the present application may be stored on a fixed or removable recording medium, and/or transmitted via a data stream on a broadcast or other signal-bearing medium, and/or stored in a working memory of a computer device operating according to the program instructions. Accordingly, some embodiments of the present application provide a computing device comprising a memory for storing computer program instructions and a processor for executing the computer program instructions, wherein the computer program instructions, when executed by the processor, trigger the device to perform the methods and/or technical solutions of the foregoing embodiments of the present application.
Furthermore, some embodiments of the present application also provide a computer readable medium, on which computer program instructions are stored, the computer readable instructions being executable by a processor to implement the methods and/or aspects of the foregoing embodiments of the present application.
It should be noted that the present application may be implemented in software and/or a combination of software and hardware, for example, implemented using Application Specific Integrated Circuits (ASICs), general purpose computers or any other similar hardware devices. In some embodiments, the software programs of the present application may be executed by a processor to implement the steps or functions described above. Likewise, the software programs (including associated data structures) of the present application may be stored in a computer readable recording medium, such as RAM memory, magnetic or optical drive or diskette and the like. Additionally, some of the steps or functions of the present application may be implemented in hardware, for example, as circuitry that cooperates with the processor to perform various steps or functions.
It will be evident to those skilled in the art that the present application is not limited to the details of the foregoing illustrative embodiments, and that the present application may be embodied in other specific forms without departing from the spirit or essential attributes thereof. The present embodiments are therefore to be considered in all respects as illustrative and not restrictive, the scope of the application being indicated by the appended claims rather than by the foregoing description, and all changes which come within the meaning and range of equivalency of the claims are therefore intended to be embraced therein. Any reference sign in a claim should not be construed as limiting the claim concerned. Furthermore, it is obvious that the word "comprising" does not exclude other elements or steps, and the singular does not exclude the plural. A plurality of units or means recited in the apparatus claims may also be implemented by one unit or means in software or hardware. The terms first, second, etc. are used to denote names, but not any particular order.

Claims (13)

1. A method of synchronously rendering an identified object for a live video stream, wherein the method comprises:
acquiring a live video stream, wherein the live video stream comprises video data and SEI data;
when the acquired live video stream reaches a preset capacity threshold, analyzing the live video stream, and storing the SEI data in the live video stream into an SEI queue according to time information;
when the live video stream is played, acquiring corresponding SEI data from the SEI queue according to current playing time information;
and drawing a corresponding identification object on a playing picture of the live video stream according to the SEI data corresponding to the current playing time information.
2. The method of claim 1, wherein obtaining a live video stream comprises:
obtaining a live video stream, detecting the data size of the obtained live video stream, and storing the obtained live video stream into a cache area if the data size of the obtained live video stream does not exceed the capacity of the cache area, wherein the capacity of the cache area is the preset capacity threshold.
3. The method of claim 2, wherein parsing the live video stream when the acquired live video stream reaches a predetermined capacity threshold comprises:
when the acquired live video stream reaches the preset capacity threshold value, analyzing the live video stream to obtain the size of consumed video stream data, and storing unconsumed video stream data into the cache region again.
4. The method of claim 1, wherein parsing the live video stream comprises:
analyzing the data blocks of the live video stream to obtain a plurality of video stream units;
the structure of the video stream unit comprises tagType, dataSize, timestamp, streamId, Data and prevTagSize, wherein the tagType is used for recording the type information of the video stream unit, the dataSize is used for recording the data size of real data in the video stream unit, the timestamp is used for recording timestamp information, the streamId is used for recording the identification information of the video stream unit, the Data is used for recording the real data in the video stream unit, and the prevTagSize is used for recording the data size of the video stream unit preceding the video stream unit.
5. The method of claim 4, wherein parsing the live video stream comprises:
establishing a videoTrack object, wherein samples of the videoTrack object are used for storing analysis data of the live video stream;
and analyzing the data blocks of the live video stream to obtain a plurality of video stream units, and sequentially reading and analyzing each video stream unit.
6. The method of claim 5, wherein parsing each of the video stream units comprises:
judging whether the video stream unit is video data according to the tagType, and if the video stream unit is video data, analyzing a first byte of the real data and recording the analysis value of the first byte of the real data as frameType;
analyzing a second byte of the real data, and recording an analysis value of the second byte of the real data as a packetType, wherein a value of the packetType is 0 to indicate that the real data is video configuration information, and a value of the packetType is 1 to indicate that the real data is a video sample;
if the value of the packetType is 0, analyzing the video configuration information to obtain the length naluLengthSize;
and if the value of the packetType is 1, analyzing naluItem and constructing the avcSample object of the video sample.
7. The method of claim 5, wherein storing the SEI data in the live video stream in an SEI queue according to temporal information comprises:
and acquiring analysis data of the live video stream, and storing the SEI data in the live video stream into an SEI queue according to time information.
8. The method of claim 1, wherein retrieving corresponding SEI data from the SEI queue according to current playback time information while playing the live video stream comprises:
when the live video stream is played, acquiring current playing time information of the live video stream at preset time intervals;
and acquiring corresponding SEI data from the SEI queue according to the current playing time information.
9. The method of claim 1, wherein rendering the corresponding recognition object on the playing screen of the live video stream according to the SEI data corresponding to the current playing time information comprises:
analyzing the SEI data corresponding to the current playing time information to obtain the position information and/or the content information of the identification object;
and drawing the identification object on a playing picture of the live video stream.
10. The method of any of claims 1-9, wherein obtaining a live video stream comprises:
acquiring the live video stream through a video stream acquisition module;
when the acquired live video stream reaches a preset capacity threshold, analyzing the live video stream, and storing the SEI data in the live video stream into an SEI queue according to time information, wherein the method comprises the following steps:
when the live video stream acquired by the video stream acquisition module reaches a preset capacity threshold value, analyzing the live video stream by a video stream analysis module, and storing the SEI data in the live video stream into an SEI queue by a video stream synthesis module according to time information;
when the live video stream is played, corresponding SEI data is obtained from the SEI queue according to current playing time information, including:
when the live video stream is played through a playing module, an SEI synchronizer module is used for acquiring corresponding SEI data from the SEI queue according to the current playing time information;
wherein, according to the SEI data corresponding to the current playing time information, drawing a corresponding recognition object on the playing picture of the live video stream, includes:
and transmitting the SEI data corresponding to the current playing time information to an SEI analysis module through the SEI synchronizer module, analyzing the SEI data corresponding to the current playing time information by using the SEI analysis module, and drawing a corresponding identification object on a playing picture of the live video stream by using a drawing module.
11. An apparatus for synchronously rendering an identified object for a live video stream, wherein the apparatus comprises: the system comprises a video stream acquisition module, a video stream analysis module, a video stream synthesis module, an SEI synchronizer module, an SEI analysis module, a playing module and a drawing module;
the video stream acquisition module is used for acquiring a live video stream;
when the live video stream acquired by the video stream acquisition module reaches a preset capacity threshold, the video stream analysis module is used for analyzing the live video stream, and the video stream synthesis module is used for storing the SEI data in the live video stream into an SEI queue according to time information;
the play module is used for playing the live video stream, and the SEI synchronizer module is used for acquiring corresponding SEI data from the SEI queue according to the current play time information of the live video stream;
the SEI synchronizer module is used for transmitting SEI data corresponding to the current playing time information to the SEI analysis module, the SEI analysis module is used for analyzing the SEI data corresponding to the current playing time information, and the drawing module is used for drawing corresponding identification objects on a playing picture of the live video stream.
12. A computing device, wherein the device comprises a memory for storing computer program instructions and a processor for executing the computer program instructions, wherein the computer program instructions, when executed by the processor, trigger the device to perform the method of any of claims 1 to 10.
13. A computer readable medium having stored thereon computer program instructions executable by a processor to implement the method of any one of claims 1 to 10.
CN202110838186.8A 2021-07-23 2021-07-23 Method and equipment for synchronously drawing identification object for live video stream Active CN113573088B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202110838186.8A CN113573088B (en) 2021-07-23 2021-07-23 Method and equipment for synchronously drawing identification object for live video stream


Publications (2)

Publication Number Publication Date
CN113573088A true CN113573088A (en) 2021-10-29
CN113573088B CN113573088B (en) 2023-11-10

Family

ID=78166867

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202110838186.8A Active CN113573088B (en) 2021-07-23 2021-07-23 Method and equipment for synchronously drawing identification object for live video stream

Country Status (1)

Country Link
CN (1) CN113573088B (en)

Patent Citations (10)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN107343220A (en) * 2016-08-19 2017-11-10 北京市商汤科技开发有限公司 Data processing method, device and terminal device
US9872062B1 (en) * 2017-02-22 2018-01-16 Wyse Technology L.L.C. Enforcing synchronization by embedding audio within video frame data
US20200336769A1 (en) * 2018-01-04 2020-10-22 Huawei Technologies Co., Ltd. Video Live Broadcast Method and Apparatus
CN109525879A (en) * 2018-10-30 2019-03-26 北京凯视达科技有限公司 Video playing control method and device
WO2020097857A1 (en) * 2018-11-15 2020-05-22 北京比特大陆科技有限公司 Media stream processing method and apparatus, storage medium, and program product
CN112930687A (en) * 2018-11-15 2021-06-08 北京比特大陆科技有限公司 Media stream processing method and device, storage medium and program product
CN109547831A (en) * 2018-11-19 2019-03-29 网宿科技股份有限公司 A kind of method, apparatus, calculating equipment and the storage medium of blank and audio video synchronization
US20200221159A1 (en) * 2019-01-08 2020-07-09 Qualcomm Incorporated Multiple decoder interface for streamed media data
CN111147880A (en) * 2019-12-30 2020-05-12 广州华多网络科技有限公司 Interaction method, device and system for live video, electronic equipment and storage medium
CN111163360A (en) * 2020-01-02 2020-05-15 腾讯科技(深圳)有限公司 Video processing method, video processing device, computer-readable storage medium and computer equipment

Also Published As

Publication number Publication date
CN113573088B (en) 2023-11-10


Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant
CP02 Change in the address of a patent holder

Address after: Room E176, 1st Floor, No. 406, Gulang Road, Putuo District, Shanghai 200080

Patentee after: Shanghai Xinyi Intelligent Technology Co.,Ltd.

Address before: 200080 7th floor, No.137 Haining Road, Hongkou District, Shanghai

Patentee before: Shanghai Xinyi Intelligent Technology Co.,Ltd.