CN113645486A - Video data processing method and device, computer equipment and storage medium - Google Patents


Info

Publication number
CN113645486A
CN113645486A (application CN202110808126.1A)
Authority
CN
China
Prior art keywords
data
target object
target
video
video stream
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202110808126.1A
Other languages
Chinese (zh)
Inventor
李丽颖
刘江波
龚龙
Current Assignee
Beijing Aibee Technology Co Ltd
Original Assignee
Beijing Aibee Technology Co Ltd
Priority date
Filing date
Publication date
Application filed by Beijing Aibee Technology Co Ltd filed Critical Beijing Aibee Technology Co Ltd
Priority to CN202110808126.1A
Publication of CN113645486A
Legal status: Pending


Classifications

    • H: Electricity
    • H04: Electric communication technique
    • H04N: Pictorial communication, e.g. television
    • H04N21/00: Selective content distribution, e.g. interactive television or video on demand [VOD]
    • H04N21/2387: Stream processing in response to a playback request from an end-user, e.g. for trick-play
    • H04N21/23892: Multiplex stream processing involving embedding information at multiplex stream level, e.g. embedding a watermark at packet level
    • H04N21/44008: Processing of video elementary streams involving operations for analysing video streams, e.g. detecting features or characteristics in the video stream
    • H04N21/47217: End-user interface for controlling playback functions for recorded or on-demand content, e.g. using progress bars, mode or play-point indicators or bookmarks
    • H04N21/8547: Content authoring involving timestamps for synchronizing content

Landscapes

  • Engineering & Computer Science (AREA)
  • Multimedia (AREA)
  • Signal Processing (AREA)
  • Computer Security & Cryptography (AREA)
  • Databases & Information Systems (AREA)
  • Human Computer Interaction (AREA)
  • Two-Way Televisions, Distribution Of Moving Picture Or The Like (AREA)

Abstract

The application relates to a video data processing method and apparatus, a computer device, and a storage medium. The method includes: obtaining a target object detection result for first video stream data, where the target object detection result includes target object data corresponding to at least one frame of target video frame data, and the target video frame data is the video frame data in the first video stream data that contains a target object; inserting each target object data into the target video frame data corresponding to that target object data to obtain second video stream data; and sending the second video stream data to a receiving end, so that the receiving end displays the second video stream data containing the target object data. The method improves the timeliness of video data display.

Description

Video data processing method and device, computer equipment and storage medium
Technical Field
The present application relates to the field of video processing technologies, and in particular, to a method and an apparatus for processing video data, a computer device, and a storage medium.
Background
For real-time video stream data, by the time a target object detection result is obtained by performing target object detection on the video stream data, the corresponding video stream data has already been transmitted, so the target object detection result lags behind the stream. The video stream data within a preset time period therefore needs to be cached by a video cache service.
In the related art, when a target object detection result for the video stream data is obtained (taking as an example a detection result that includes bounding-box data for a target object), the video stream data may be fetched from the video cache service and decoded, the bounding boxes are drawn onto the decoded frames according to the box data, and the annotated video stream data is re-encoded to obtain video stream data containing the boxes. The video stream data containing the boxes is then sent to the terminal device for playback, so that the video content played on the terminal device can include the boxes corresponding to the target object.
However, the decoding and encoding processes in the related art consume both time and CPU (central processing unit) resources, which may make the video display insufficiently timely.
Disclosure of Invention
In view of the foregoing, it is desirable to provide a video data processing method, apparatus, computer device, and storage medium that avoid this re-encoding overhead and improve the timeliness of video display.
A method of processing video data, the method comprising:
acquiring a target object detection result for first video stream data, wherein the target object detection result comprises target object data corresponding to at least one frame of target video frame data, and the target video frame data is the video frame data in the first video stream data that contains a target object;
inserting each target object data into the target video frame data corresponding to each target object data to obtain second video stream data;
and sending the second video stream data to a receiving end so that the receiving end displays the second video stream data containing the target object data.
In one embodiment, the inserting each of the target object data into the target video frame data corresponding to each of the target object data to obtain second video stream data includes:
determining the target video frame data corresponding to each target object data from the first video stream data;
and taking each target object data as additional enhancement information of the target video frame data corresponding to that target object data, and inserting the additional enhancement information into the corresponding target video frame data, to obtain the second video stream data.
In one embodiment, the method further comprises:
and updating the display time stamp carried in each video frame data according to the acquisition time stamp of each video frame data in the first video stream data.
In one embodiment, the inserting each of the target object data, as additional enhancement information of the corresponding target video frame data, into that target video frame data includes:
for any one target object data, acquiring a display time stamp carried in the target video frame data corresponding to the target object data;
according to the display timestamp, updating timestamp information carried in the target object data to obtain updated target object data;
and inserting the updated target object data into the corresponding target video frame data as the additional enhancement information of that target video frame data.
In one embodiment, the target object includes any one of a target person, a target vehicle, and a target animal, and the target object data includes bounding-box data corresponding to the target object and/or attribute data of the target object.
A method of processing video data, the method comprising:
receiving video stream data sent by a sending end;
analyzing the video stream data to obtain a target object detection result for the video stream data, wherein the target object detection result comprises target object data corresponding to at least one frame of target video frame data, and the target video frame data is the video frame data in the video stream data that contains a target object;
and performing superposition display on the video stream data and the target object data.
In one embodiment, the target video frame data includes additional enhancement information, the additional enhancement information including target object data corresponding to the target video frame data,
the analyzing the video stream data to obtain a target object detection result for the video stream data includes:
analyzing the video stream data to obtain additional enhancement information of each target video frame data;
and constructing and obtaining the target object detection result according to the target object data included in the additional enhancement information of each target video frame data.
In one embodiment, the displaying the video stream data and the target object data in an overlapping manner includes:
determining a display time stamp carried by the currently displayed video frame data;
determining whether the target object data associated with the display time stamp exists in the target object detection result according to the display time stamp;
and when the target object data associated with the display time stamp exists in the target object detection result, performing superposition display on the currently displayed video frame data and the target object data associated with the display time stamp.
In one embodiment, the displaying the video stream data and the target object data in an overlapping manner includes:
determining, in response to a display setting operation for the target object data, a display mode for the target object data;
and performing superposition display on the video stream data and the target object data according to the display mode of the target object data.
An apparatus for processing video data, the apparatus comprising:
an obtaining module, configured to obtain a target object detection result for first video stream data, where the target object detection result includes target object data corresponding to at least one frame of target video frame data, and the target video frame data is the video frame data in the first video stream data that contains a target object;
the inserting module is used for inserting each target object data into the target video frame data corresponding to each target object data to obtain second video stream data;
and the sending module is used for sending the second video stream data to a receiving end so that the receiving end displays the second video stream data containing the target object data.
An apparatus for processing video data, the apparatus comprising:
the receiving module is used for receiving video stream data sent by the sending end;
the analysis module is used for analyzing the video stream data to obtain a target object detection result for the video stream data, where the target object detection result includes target object data corresponding to at least one frame of target video frame data, and the target video frame data is the video frame data in the video stream data that contains a target object;
and the display module is used for displaying the video stream data and the target object data in an overlapping manner.
A computer device comprising a memory storing a computer program and a processor that implements the above video data processing method when executing the computer program.
A computer-readable storage medium, on which a computer program is stored, which, when executed by a processor, implements the above-described method of processing video data.
In the above method, apparatus, computer device, and storage medium for processing video data, the sending end may obtain a target object detection result for the first video stream data, where the target object detection result includes target object data corresponding to at least one frame of target video frame data, and the target video frame data is the video frame data in the first video stream data that contains a target object. The sending end inserts each target object data into the target video frame data corresponding to that target object data to obtain second video stream data, and sends the second video stream data to the receiving end, so that the receiving end displays the second video stream data including the target object data. According to the video data processing method provided by the embodiments of the present disclosure, the target object data is inserted directly into the corresponding target video frame data to produce the second video stream data, which is then sent directly to the receiving end for playback. The decoding and re-encoding steps introduced by operations such as drawing bounding boxes are thus avoided, which reduces time consumption and improves the efficiency and timeliness of video display; and because decoding and encoding are avoided, the resource occupation at the sending end is reduced, which lowers both the hardware requirements on the sending end and the hardware cost.
Drawings
FIG. 1 is a diagram of an application environment of a method for processing video data in one embodiment;
FIG. 2 is a flow diagram illustrating a method for processing video data according to one embodiment;
FIG. 3 is a flowchart illustrating steps of a method for processing video data according to one embodiment;
FIG. 4 is a flow diagram illustrating a method for processing video data according to one embodiment;
FIG. 5 is a flowchart illustrating steps of a method for processing video data according to one embodiment;
FIG. 6 is a flowchart illustrating steps of a method for processing video data according to another embodiment;
FIG. 7 is a flow diagram illustrating a method for processing video data according to one embodiment;
FIG. 8 is a flowchart illustrating steps of a method for processing video data according to one embodiment;
FIG. 9 is a flowchart illustrating steps of a method for processing video data according to one embodiment;
FIG. 10 is a flowchart illustrating steps of a method for processing video data according to one embodiment;
FIG. 11 is a schematic diagram of a display interface of a method for processing video data according to an embodiment;
FIG. 12 is a flow diagram that illustrates a method for processing video data in accordance with one embodiment;
FIG. 13 is a block diagram showing a configuration of a video data processing apparatus according to an embodiment;
fig. 14 is a block diagram showing a configuration of a video data processing apparatus according to another embodiment;
FIG. 15 is a diagram showing an internal structure of a computer device in one embodiment;
FIG. 16 is a diagram illustrating an internal structure of a computer device according to an embodiment.
Detailed Description
In order to make the objects, technical solutions and advantages of the present application more apparent, the present application is described in further detail below with reference to the accompanying drawings and embodiments. It should be understood that the specific embodiments described herein are merely illustrative of the present application and are not intended to limit the present application.
The video data processing method provided by the application can be applied to the application environment shown in fig. 1, in which the sending end 102 communicates with the receiving end 104 through a network. After the sending end 102 acquires the first video stream data, it may acquire a target object detection result for the first video stream data, where the target object detection result includes target object data corresponding to at least one frame of target video frame data. The sending end 102 inserts each target object data into the target video frame data corresponding to that target object data in the first video stream data to obtain second video stream data, and sends the second video stream data to the receiving end 104. After receiving the second video stream data, the receiving end 104 may display the second video stream data containing the target object data accordingly. The sending end 102 and the receiving end 104 may each be, but are not limited to, a personal computer, a notebook computer, a smartphone, a tablet computer, a portable wearable device, an independent server, a server cluster composed of multiple servers, a functional module, and the like.
In an embodiment, as shown in fig. 2, a method for processing video data is provided, which is described by taking the method as an example applied to the transmitting end in fig. 1, and includes the following steps:
step 202, a target object detection result for the first video stream data is obtained, where the target object detection result includes target object data corresponding to at least one frame of target video frame data, and the target video frame data is the video frame data in the first video stream data that contains a target object.
For example, the sending end may buffer the received first video stream data and obtain a target object detection result for the first video stream data, for instance by receiving a target object detection result produced by a third-party device for the first video stream data, or by performing the target object detection operation on the first video stream data itself. The target object is the object to be detected, including but not limited to a target person, a target animal, a target vehicle, and the like; the embodiments of the present disclosure do not specifically limit the target object.
The target object detection result may include target object data corresponding to at least one frame of target video frame data, where the target video frame data is the video frame data in the first video stream data that contains the target object. For example, when the target object is detected in the 1st to 5th frames of the first video stream data, the video frame data corresponding to the 1st to 5th frames is the target video frame data. The target object data may include, but is not limited to, the bounding-box data of the target object in the target video frame, attribute information of the target object (including but not limited to age, gender, type, and the like), and so on.
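As an illustration only (the application does not prescribe a concrete data format), the target object data for one target video frame might look like the following Python structure; every field name here is a hypothetical choice, not something specified by the application.

```python
# Hypothetical target object data for a single target video frame.
# All keys ("capture_ts", "objects", "box", "attributes") are illustrative.
detection_result = {
    "capture_ts": 1626400000123,          # capture timestamp of the frame (ms)
    "objects": [
        {
            "box": [120, 80, 260, 340],   # bounding box: x1, y1, x2, y2 (pixels)
            "attributes": {"type": "person", "gender": "female", "age": 30},
        },
        {
            "box": [400, 60, 640, 300],
            "attributes": {"type": "vehicle"},
        },
    ],
}
```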
And step 204, inserting each target object data into the target video frame data corresponding to each target object data to obtain second video stream data.
For example, for any target object data, the corresponding target video frame data may be determined as follows. Each video frame data may carry the capture timestamp of its corresponding video frame, and the target object data also carries the capture timestamp of its corresponding target video frame; the target video frame data belonging to the same video frame as the target object data, that is, the target video frame data corresponding to the target object data, can therefore be determined by matching the capture timestamp in the target object data against the capture timestamps of the video frame data in the first video stream data.
After the target video frame data corresponding to each target object data is determined, each target object data may be inserted into its corresponding target video frame data to obtain the second video stream data. For example, if the target object detection result includes target object data 1 corresponding to the 1st video frame data, target object data 2 corresponding to the 2nd video frame data, and so on, then target object data 1 may be inserted into the 1st video frame data, target object data 2 into the 2nd video frame data, and so on, to obtain the second video stream data.
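The matching-and-insertion step described above can be sketched as follows. The dictionary-based frame representation and the field names are illustrative assumptions, not the application's actual data structures.

```python
def insert_detections(frames, detections):
    """Match each target object data to its frame by capture timestamp.

    `frames` is a list of dicts each carrying a "capture_ts" key;
    `detections` maps a capture timestamp to target object data.
    Frames that receive a detection become target video frame data.
    """
    by_ts = dict(detections)
    second_stream = []
    for frame in frames:
        frame = dict(frame)  # copy so the input stream is not mutated
        obj = by_ts.get(frame["capture_ts"])
        if obj is not None:
            frame["target_object_data"] = obj
        second_stream.append(frame)
    return second_stream
```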
Step 206, sending the second video stream data to the receiving end, so that the receiving end displays the second video stream data containing the target object data.
The sending end sends the second video stream data to the receiving end. After receiving the second video stream data, the receiving end may display the second video stream data containing the target object data. For example, the receiving end may overlay the target object data on the corresponding target video frame data, so that when the second video stream data displays the target video frame data, the content corresponding to the target object data is superimposed on the content of the target video frame. For instance, if the target object data includes bounding-box data, the corresponding box can be drawn at the position of the target object in the displayed target video frame.
In the method for processing video data, the sending end may obtain a target object detection result for the first video stream data, where the target object detection result includes target object data corresponding to at least one frame of target video frame data, and the target video frame data is the video frame data in the first video stream data that contains a target object. The sending end inserts each target object data into the target video frame data corresponding to that target object data to obtain second video stream data, and sends the second video stream data to the receiving end, so that the receiving end displays the second video stream data including the target object data. According to the video data processing method provided by the embodiments of the present disclosure, the target object data is inserted directly into the corresponding target video frame data to produce the second video stream data, which is then sent directly to the receiving end for playback. The decoding and re-encoding steps introduced by operations such as drawing bounding boxes are thus avoided, which reduces time consumption and improves the efficiency and timeliness of video display; and because decoding and encoding are avoided, the resource occupation at the sending end is reduced, which lowers both the hardware requirements on the sending end and the hardware cost.
In one embodiment, as shown in FIG. 3, step 204 may comprise:
step 302, determining target video frame data corresponding to each target object data from the first video stream data;
and step 304, inserting each target object data, as the additional enhancement information of its corresponding target video frame data, into that target video frame data, to obtain the second video stream data.
For example, for any target object data, the corresponding target video frame data may be determined as follows. Each video frame data may carry the capture timestamp of its corresponding video frame, and the target object data also carries the capture timestamp of its corresponding target video frame; the target video frame data belonging to the same video frame as the target object data, that is, the target video frame data corresponding to the target object data, can therefore be determined by matching the capture timestamp in the target object data against the capture timestamps of the video frame data in the first video stream data.
Taking first video stream data in the H.264 coding format as an example: each video frame data in the first video stream data may carry an SEI (supplemental enhancement information) field, so the target object data may be inserted, as the SEI information of its corresponding target video frame data, into the SEI field of that target video frame data, as shown in fig. 4. After all the target object data have been inserted, the second video stream data is obtained.
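To make the SEI insertion concrete, the following is a minimal Python sketch of building such an SEI NAL unit. It is not taken from the application: the JSON payload encoding, the 16-byte UUID value, and the omission of emulation-prevention (0x03) byte insertion are all simplifying assumptions.

```python
import json

APP_UUID = b"detection-sei-01"  # illustrative 16-byte uuid_iso_iec_11578 value

def _ff_coded(value: int) -> bytes:
    # H.264 SEI payloadType/payloadSize coding: one 0xFF per full 255,
    # then the remainder as a single byte
    return b"\xff" * (value // 255) + bytes([value % 255])

def build_sei_nal(target_object_data: dict) -> bytes:
    """Wrap target object data in an SEI NAL unit (user_data_unregistered).

    A sketch only: emulation-prevention bytes are omitted for brevity.
    """
    payload = APP_UUID + json.dumps(target_object_data).encode("utf-8")
    rbsp = (
        _ff_coded(5)               # payloadType 5: user_data_unregistered
        + _ff_coded(len(payload))  # payloadSize
        + payload
        + b"\x80"                  # rbsp_trailing_bits
    )
    # 4-byte start code + NAL header 0x06 (nal_ref_idc 0, nal_unit_type 6 = SEI)
    return b"\x00\x00\x00\x01\x06" + rbsp
```

The resulting NAL unit would be spliced in front of the corresponding frame's slice NAL units without touching the coded picture data itself.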
According to the video data processing method provided by the embodiments of the present disclosure, each target object data is inserted, as the SEI information of its corresponding target video frame data, into that target video frame data to obtain the second video stream data. The decoding and re-encoding steps introduced by operations such as drawing bounding boxes are avoided, which reduces time consumption and improves the efficiency and timeliness of video display; and because decoding and encoding are avoided, the occupation of CPU resources at the sending end is reduced, lowering the hardware requirements on the sending end and thus the hardware cost.
In one embodiment, the method may further include:
and updating the display time stamp carried in each video frame data according to the acquisition time stamp of each video frame data in the first video stream data.
In the embodiment of the present disclosure, any video frame data in the first video stream data may carry the acquisition timestamp of its corresponding video frame and a display timestamp, where the display timestamp indicates when the video frame obtained after decoding the video frame data is to be displayed.
For any video frame data, the target display timestamp corresponding to the video frame data can be determined from its acquisition timestamp according to a preset association relation, and the display timestamp carried in the video frame data is replaced with the target display timestamp, thereby updating the display timestamp carried in each video frame data. The preset association relation makes the target display timestamp proportional to the acquisition timestamp, that is, pts = k × ts, where pts denotes the target display timestamp, ts denotes the acquisition timestamp, and k is a preset coefficient.
In the same way, the display timestamp in every video frame data can be replaced with a target display timestamp derived from the acquisition timestamp carried in that video frame data. In the process of displaying the second video stream data, the receiving end can then use the display timestamp of the currently displayed video frame data and the preset association relation to recover the corresponding acquisition timestamp, determine the target object data corresponding to that video frame data from the acquisition timestamp, and overlay the target object data on the displayed video frame data. This ensures that the target object data and the corresponding target video frame data in the second video stream data are displayed in synchronization.
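The proportional update pts = k × ts can be sketched as below. The value of k (here 90, as if mapping millisecond capture timestamps onto a 90 kHz presentation clock) is purely illustrative, as is the dict-based frame representation.

```python
def update_display_timestamps(frames, k=90):
    """Replace each frame's display timestamp (PTS) with k times its
    acquisition timestamp, per the preset association pts = k * ts."""
    for frame in frames:
        frame["pts"] = k * frame["capture_ts"]
    return frames
```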
In one embodiment, as shown in FIG. 5, step 304 includes:
step 502, aiming at any target object data, acquiring a display time stamp carried in target video frame data corresponding to the target object data;
step 504, updating the timestamp information carried in the target object data according to the display timestamp to obtain updated target object data;
step 506, the updated target object data is inserted, as the additional enhancement information of the target video frame data corresponding to the target object data, into that target video frame data.
In the embodiment of the present disclosure, for any target object data, the display timestamp carried in the corresponding target video frame data may be acquired. After the timestamp information carried in the target object data (that is, the acquisition timestamp of the video frame corresponding to the target object data) is replaced with this display timestamp, updated target object data is obtained, and the updated target object data is inserted into the target video frame data as the SEI information of the corresponding target video frame data.
By analogy, the timestamp information in each target object data can be replaced with the display timestamp carried in the corresponding target video frame data. In the process of displaying the second video stream data, the receiving end can then determine the target object data corresponding to a displayed target video frame through that frame's display timestamp, and overlay the target object data on the displayed frame, ensuring that the target object data and the corresponding target video frame data in the second video stream data are displayed in synchronization.
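This timestamp replacement can be sketched as a small helper. The field names ("ts" for the carried acquisition timestamp, "pts" for the frame's display timestamp) are illustrative assumptions, not names used by the application.

```python
def retime_target_object_data(target_object_data: dict, frame: dict) -> dict:
    """Replace the acquisition timestamp carried in the target object data
    with the display timestamp of its target video frame, so the receiver
    can match the two at display time."""
    updated = dict(target_object_data)   # leave the original untouched
    updated["ts"] = frame["pts"]         # acquisition ts -> display ts
    return updated
```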
In an embodiment, as shown in fig. 6, a method for processing video data is provided, which is described by taking the method as an example applied to the receiving end in fig. 1, and includes the following steps:
step 602, receiving video stream data sent by a sending end.
In this embodiment, the video stream data may be obtained by the sending end inserting each target object data in the target object detection result into the target video frame data corresponding to that target object data in the first video stream data, after the sending end obtains the target object detection result corresponding to the first video stream data. For the specific operations of the sending end, reference may be made to the foregoing embodiments, which are not described in detail herein.
Step 604, analyzing the video stream data to obtain a target object detection result for the video stream data, where the target object detection result includes target object data corresponding to at least one frame of target video frame data, and the target video frame data is video frame data including a target object in the video stream data.
For example, the receiving end may parse the received video stream data to parse target object data carried by target video frame data in the video stream data, and obtain a target object detection result based on each target object data obtained through parsing.
As an example, assume the video stream data is real-time video stream data encoded in the H.264 format and encapsulated in the flv (Flash Video, a streaming media) format. While converting and re-encapsulating the flv-format video stream data into fragmented MP4 segments (fMP4, based on the ISO Base Media File Format, BMFF; MP4 refers to Moving Picture Experts Group 4), the receiving end may parse the target object data carried in each target video frame data of the video stream data, and further obtain the target object detection result according to the target object data carried in each target video frame data.
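As a rough illustration of how such carried data can be located during remuxing, the sketch below scans an H.264 Annex B byte sequence for SEI NAL units (nal_unit_type 6). It is a simplification: real parsing must also handle 4-byte start codes, emulation-prevention bytes, and the SEI payload_type/payload_size headers, all omitted here.

```python
# Minimal sketch: split an Annex B stream on 3-byte start codes and keep
# the payloads of NAL units whose type is 6 (SEI).

def extract_sei_payloads(annexb: bytes):
    sei_payloads = []
    units = annexb.split(b"\x00\x00\x01")      # split on NAL start codes
    for unit in units:
        if unit and (unit[0] & 0x1F) == 6:     # nal_unit_type == 6 -> SEI
            sei_payloads.append(unit[1:])      # bytes after the NAL header
    return sei_payloads

stream = (b"\x00\x00\x01" + bytes([0x06]) + b'{"box":[1,2,3,4]}'
          + b"\x00\x00\x01" + bytes([0x65]) + b"slice-data")
payloads = extract_sei_payloads(stream)
```

The collected payloads would then be decoded into target object data records and assembled into the target object detection result.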
And 606, performing superposition display on the video stream data and the target object data.
In the embodiment of the present disclosure, referring to fig. 7, after obtaining a target object detection result corresponding to video stream data, a receiving end may perform overlay display on the video stream data and target object data in the target object detection result.
For example, the video content corresponding to the video stream data may be displayed through a video playing layer, and the content corresponding to the corresponding target object data may be displayed through a transparent layer which is superimposed on the video playing layer and has a size consistent with that of the video playing layer.
Illustratively, the receiving end may display video stream data through the video playing layer, and when a video frame corresponding to each video frame data is displayed on the video playing layer, if there is target object data in the displayed video frame, the target object data is displayed in a transparent layer that is superimposed on the video playing layer and has a size consistent with that of the video playing layer, so as to realize the superimposed display of the content corresponding to the video frame data and the content corresponding to the target object data.
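The per-frame overlay decision described above can be sketched as follows; `drawn` is a stand-in list recording what the real transparent layer would render, and all names are illustrative:

```python
# Illustrative sketch (names assumed): the video layer shows every frame,
# while the transparent layer on top draws boxes only for frames that have
# associated target object data.

def render_frame(frame, detections_by_pts, drawn):
    boxes = detections_by_pts.get(frame["pts"])  # object data keyed by display ts
    if boxes:
        drawn.append((frame["pts"], boxes))      # what the transparent layer draws
    return frame                                 # the frame itself is always shown

drawn = []
detections = {80: [(12, 11, 50, 50)]}
for frame in [{"pts": 40}, {"pts": 80}]:
    render_frame(frame, detections, drawn)
```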
In the above method for processing video data, the receiving end receives the video stream data sent by the sending end and parses it to obtain a target object detection result for the video stream data, where the target object detection result includes target object data corresponding to at least one frame of target video frame data, and the target video frame data is the video frame data in the video stream data that contains a target object; the video stream data and the target object data are then displayed in a superimposed manner. With the video data processing method provided by the embodiment of the present disclosure, the target object data does not need to be encoded into the video frame data for display: the receiving end can parse the inserted target object data out of the video stream data sent by the sending end and superimpose it on the video stream data during display. This avoids the encoding and decoding processes that operations such as drawing picture frames would introduce at the sending end, which reduces time consumption and improves the display efficiency and timeliness of the video; and because the encoding and decoding processes are avoided, the occupation of the sending end's CPU resources is reduced, lowering the hardware requirements on the sending end and the hardware cost.
In one embodiment, the target video frame data may include additional enhancement information, and the additional enhancement information may include the target object data corresponding to the target video frame data. Referring to fig. 8, step 604 may include:
step 802, analyzing video stream data to obtain additional enhancement information of each target video frame data;
and step 804, constructing and obtaining a target object detection result according to the target object data included in the additional enhancement information of each target video frame data.
For example, the receiving end may parse the video stream data, and further parse the video stream data to obtain SEI information of each target video frame data in the video stream data. And further, after each target object data is obtained from each SEI information, a target object detection result is constructed according to the target object data, and the target object detection result can be a set of the target object data.
As an example, assume the video stream data is real-time video stream data encoded in the H.264 format and encapsulated in the flv (Flash Video, a streaming media) format. While converting and re-encapsulating the flv-format video stream data into fragmented MP4 segments (fMP4, based on the ISO Base Media File Format, BMFF), the receiving end may parse the SEI information of each target video frame data in the real-time video stream data and obtain the target object data in each piece of SEI information, thereby obtaining the target object detection result according to each target object data.
In the video data processing method provided by the embodiment of the present disclosure, each target object data is used as the SEI information of its corresponding target video frame data, so that the target object data is inserted into the target video frame data corresponding to it to obtain the second video stream data. Because the encoding and decoding processes introduced by operations such as drawing picture frames are avoided, time consumption is reduced, the display efficiency and timeliness of the video are improved, the occupation of the sending end's CPU resources is reduced, and the hardware requirements on the sending end and the hardware cost are lowered.
In one embodiment, referring to fig. 9, step 606 may comprise:
step 902, determining a display timestamp carried by currently displayed video frame data;
step 904, determining whether target object data associated with the display time stamp exists in the target object detection result according to the display time stamp;
and step 906, in the case that the target object data associated with the display time stamp exists in the target object detection result, performing overlay display on the currently displayed video frame data and the target object data associated with the display time stamp.
For example, a display time stamp corresponding to currently displayed video frame data may be obtained, and according to the display time stamp of the video frame data, it may be determined whether target object data associated with the display time stamp exists from a target object detection result, and if so, content corresponding to the display target object data may be superimposed on video content corresponding to the currently displayed video frame data.
For example: the display time stamp corresponding to the nth video frame data may be determined when the nth video frame corresponding to the nth video frame data is presented, and the target object data associated with the display time stamp is included in the target object data, so that when the nth video frame is presented, the content corresponding to the target object data is displayed on the nth video frame in an overlapping manner.
In one example, the sending end may have replaced the acquisition timestamp carried in each target object data in the video stream data with the display timestamp carried in the corresponding target video frame data; that is, each target object data may include the display timestamp of its corresponding target video frame data. In this case, the receiving end may use the display timestamp of the currently displayed video frame data to determine whether the target object detection result contains target object data carrying that display timestamp, and if so, take that target object data as the target object data corresponding to the target video frame data.
In another example, the target object data may include the acquisition timestamp of the corresponding target video frame data. When the video stream data is encapsulated in the flv format, the sending end may update the display timestamp of each video frame data in the video stream data to a timestamp associated with the acquisition timestamp of that video frame data according to a preset association relationship, where the preset association relationship may be a preset proportional relationship. When displaying video frame data, the receiving end can obtain the display timestamp of that video frame data and determine the corresponding acquisition timestamp from it according to the preset association relationship. It then determines whether the target object detection result contains target object data corresponding to this acquisition timestamp, and if so, takes that target object data as the target object data corresponding to the target video frame data.
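Assuming the preset proportional relationship is a fixed scale factor (for instance, a 90 kHz pts clock over millisecond acquisition timestamps — an assumption, not a value from the patent), the two-way mapping and the lookup at the receiving end might look like:

```python
# Sketch of a preset proportional association between display timestamps
# (pts) and millisecond acquisition timestamps (ts). SCALE is an assumed
# value, for illustration only.

SCALE = 90  # assumed pts ticks per millisecond

def capture_ts_to_pts(capture_ts_ms: int) -> int:
    """Sending end: derive the frame's new display timestamp from its ts."""
    return capture_ts_ms * SCALE

def pts_to_capture_ts(pts: int) -> int:
    """Receiving end: recover the acquisition timestamp from a frame's pts."""
    return pts // SCALE

def find_object_data(pts, detections_by_capture_ts):
    """Look up the target object data for the displayed frame, if any."""
    return detections_by_capture_ts.get(pts_to_capture_ts(pts))

detections = {1000: [(10, 10, 50, 50)]}  # keyed by acquisition timestamp (ms)
```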
According to the video data processing method provided by the embodiment of the disclosure, the target object data associated with the display timestamp is determined through the display timestamp corresponding to the currently displayed video frame data, so that the target object data and the displayed video frame data are displayed in an overlapping manner, and the display synchronization between the target object data in the video stream data and the corresponding target video frame data can be ensured.
In one embodiment, referring to fig. 10, step 606 may comprise:
step 1002, responding to a display setting operation aiming at target object data, and determining a display mode aiming at the target object data;
and 1004, performing overlay display on the video stream data and the target object data according to the display mode of the target object data.
In the embodiment of the present disclosure, a display mode for the target object data may be determined in response to a display setting operation for the target object data, and the display mode may be used to indicate how the target object data is displayed. For example, assuming the target object data includes frame data corresponding to the target object, the display mode may be used to specify the color and line type of the picture frames corresponding to the frame data, which target objects' picture frames are displayed, and the like.
For example, referring to fig. 11, the display interface of the receiving end may include a setting area, and the user may set a display mode for the target object data in the setting area. For example, the target object may include at least one target object, that is, the target object data in the target object detection result may correspond to the at least one target object, and the target object data may include identification information of the target object. For example: the target object data 1 includes frame data of the target object 1 and frame data of the target object 2, and the target object data 2 includes frame data of the target object 1 and frame data of the target object 3, that is, the same target object data may include frame data of at least one target object, and different target object data may include frame data of the same or different target objects.
Illustratively, the display mode options may include a confirmation option for the frames to be displayed. The user may determine the target objects to be displayed from the plurality of target objects by triggering the confirmation option; the picture frames of those target objects are then the frames to be displayed, and the corresponding display mode is generated accordingly. When performing the superimposed display of target object data, the receiving end can, in accordance with the display mode, obtain the identification information of the target object corresponding to each piece of frame data from the target object data: if the identification information identifies a target object to be displayed, the frame data corresponds to a frame to be displayed and the picture frame corresponding to that frame data is displayed in a superimposed manner; otherwise, the picture frame corresponding to that frame data is not displayed. If the user has not set any frames to be displayed, the frame data corresponding to all target objects is displayed by default.
Therefore, when a plurality of target objects exist in the video picture, the user can select to display the target object data (for example, a picture frame corresponding to the frame data) of a specific target object by setting the display mode, and hide other target object data which do not concern the target object, so that better interaction and experience can be provided for the user.
Illustratively, the presentation mode options may further include options such as a color of the frame to be presented, a line type, and the like. After the frames to be displayed (or all default frames to be displayed) are set, the frames to be set (all default frames to be displayed are the frames to be set) can be further determined from the frames to be displayed, after the frames to be set are selected, the color, the line type and the like corresponding to the frames to be set are selected from the options of the color, the line type and the like, and then the corresponding display mode is generated according to the selected color, the line type and the frames to be set.
When the receiving end performs the superposition display of the target object data, the receiving end can respond to the display mode, obtain the identification information of the target object corresponding to the frame data from the target object data, and perform frame drawing on the frame data in the transparent layer according to the color and the line type in the display mode when the target object corresponding to the frame to be set is identified by the identification information, so as to perform the superposition display on the frame data, otherwise, perform frame drawing and superposition display on the frame data by adopting the default color and the line type.
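A minimal sketch of applying such a display mode at the receiving end, with all field names (`show_ids`, `styles`, `id`, `rect`) assumed for illustration: boxes of hidden targets are skipped, and each shown box gets its configured or default style.

```python
# Hedged sketch of display-mode filtering and styling; names are illustrative.

DEFAULT_STYLE = {"color": "yellow", "line": "solid"}

def styled_boxes(object_data, display_mode):
    shown = []
    selected = display_mode.get("show_ids")  # None means show everything
    styles = display_mode.get("styles", {})
    for box in object_data["boxes"]:
        if selected is not None and box["id"] not in selected:
            continue                         # hidden target: skip drawing
        style = styles.get(box["id"], DEFAULT_STYLE)
        shown.append((box["id"], box["rect"], style))
    return shown

data = {"boxes": [{"id": 1, "rect": (0, 0, 10, 10)},
                  {"id": 2, "rect": (5, 5, 20, 20)}]}
mode = {"show_ids": {2}, "styles": {2: {"color": "red", "line": "dashed"}}}
```

With `mode` as above, only target 2's box survives the filter, drawn in red dashed lines; an empty display mode shows all boxes in the default style.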
In a possible implementation manner, the target object data may include frame data for the target object and attribute information of the target object (e.g., gender, age, etc.), and the receiving end may determine, in response to a setting operation of the user, the display manner of the picture frames corresponding to the frame data of target objects with different attribute information. For example, the color of the picture frame can be set according to the attributes: the frames corresponding to males can be set to blue and those corresponding to females to pink; or the frames corresponding to children can be set to green, those corresponding to youths to red, and those corresponding to the elderly to purple; or gender and age can be combined to set the frame display manner.
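The attribute-based coloring in this example could be sketched as below; the bucketing (age takes precedence over gender, with the thresholds shown, and a yellow fallback) is one possible choice, not mandated by the text:

```python
# Attribute-to-color mapping sketched from the example above; the rule
# and thresholds are illustrative assumptions.

def box_color(attrs: dict) -> str:
    age = attrs.get("age")
    if age is not None:
        if age < 18:
            return "green"   # child
        if age < 60:
            return "red"     # youth / adult
        return "purple"      # elderly
    return {"male": "blue", "female": "pink"}.get(attrs.get("gender"), "yellow")
```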
According to the video data processing method provided by the embodiment of the disclosure, a user can dynamically set the display mode of the target object data by interacting with the receiving end, so that the target object data can be displayed as required, the display mode of the target object data can be enriched, and the user experience is improved.
In order that those skilled in the art will better understand the embodiments of the present disclosure, the embodiments of the present disclosure are described below by way of specific examples.
Referring to fig. 12, in this example the sending end is a server and the receiving end is a terminal device. The server receives the first video stream data and caches it in a video cache service. Assuming the first video stream data corresponds to 5 minutes of video, the server needs to obtain the target object detection result for the first video stream data within those 5 minutes. In this example, the target object detection result includes the frame data corresponding to the target object (which may include an acquisition timestamp ts and the rectangle parameters of the picture frame). After obtaining the target object detection result, the server aligns each piece of frame data with its target video frame in the video cache service by matching the millisecond-level acquisition timestamp ts carried in each video frame data of the first video stream data against the acquisition timestamp ts of the target video frame carried in the frame data, inserts the frame data into the corresponding target video frame data as the SEI information of that target video frame data, and obtains the second video stream data, which may be encoded in the H.264 format and encapsulated in the flv format.
In addition, in order to align the video content with the picture frames when the terminal device displays the video, the server needs to update the pts (Presentation Time Stamp) value of each video frame data in the second video stream data to a value positively correlated with the acquisition timestamp ts of that video frame data in the video cache service (alternatively, in another example, the pts may be left unchanged and the ts carried with the frame data may be replaced with the pts of the corresponding target video frame data).
The server may push the second video stream data containing the frame data to the terminal device.
The terminal device receives the second video stream data pushed by the server via the WebSocket protocol. In the flv.js module, the frame data in the SEI information of each video frame data in the second video stream data may be parsed while the second video stream data is remuxed from the flv format into fragmented MP4 (BMFF) fragments.
The terminal device may place a transparent canvas layer on top of a video layer, display the video frames of the second video stream data in the video layer, and display the picture frames obtained from the frame data in the canvas layer; the canvas layer and the video layer have the same size and can be strictly aligned. When the video layer displays a video frame, the terminal device can determine the pts in the video frame data corresponding to that frame, determine whether there is frame data whose ts is positively correlated with that pts, and, if such frame data exists, display the picture frame corresponding to that frame data in the canvas layer.
The video data processing method provided by the embodiment of the present disclosure can be applied to scenarios such as target object detection and target object monitoring. After detecting the target objects in the target video frames of the video stream and obtaining the target object data corresponding to those frames, the sending end can use each target object data as the SEI information of its corresponding target video frame data, insert each target object data into the corresponding target video frame data, and then send the video stream data carrying the target object data to the receiving end, so that the receiving end can display the video stream data and the target object data in a superimposed manner. This reduces time consumption, improves the display efficiency and timeliness of the video, reduces the occupation of the sending end's CPU resources, and lowers the hardware requirements on the sending end and the hardware cost.
It should be understood that, although the various steps in the flowcharts of figs. 1-12 are shown sequentially in the order indicated by the arrows, these steps are not necessarily performed in that order. Unless explicitly stated otherwise, the execution order of these steps is not strictly limited, and they may be performed in other orders. Moreover, at least some of the steps in figs. 1-12 may include multiple sub-steps or stages, which are not necessarily completed at the same moment but may be performed at different moments; their execution order is not necessarily sequential, and they may be performed in turn or alternately with other steps or with at least part of the sub-steps or stages of other steps.
In one embodiment, as shown in fig. 13, there is provided a video data processing apparatus including: an obtaining module 1302, an inserting module 1304, and a sending module 1306, wherein:
an obtaining module 1302, configured to obtain a target object detection result for first video stream data, where the target object detection result includes target object data corresponding to at least one frame of target video frame data, and the target video frame data is video frame data of a target object included in the first video stream data;
an inserting module 1304, configured to insert each target object data into the target video frame data corresponding to each target object data to obtain second video stream data;
a sending module 1306, configured to send the second video stream data to a receiving end, so that the receiving end displays the second video stream data including the target object data.
The video data processing apparatus provided by the embodiment of the present disclosure directly inserts each target object data into the target video frame data corresponding to it to obtain the second video stream data, and sends the second video stream data directly to the receiving end for playback. This avoids the encoding and decoding processes introduced by operations such as drawing picture frames, reduces time consumption, and improves the display efficiency and timeliness of the video; and because the encoding and decoding processes are avoided, the occupation of the sending end's resources is reduced, lowering the hardware requirements on the sending end and the hardware cost.
In one embodiment, the insertion module 1304 is further configured to:
determining the target video frame data corresponding to each target object data from the first video stream data;
and respectively taking each target object data as additional enhancement information of the target video frame data corresponding to each target object data, and inserting the additional enhancement information into each target video frame data to obtain the second video stream data.
In one embodiment, the above apparatus further comprises:
and the determining module is used for updating the display time stamp carried in each video frame data according to the acquisition time stamp of each video frame data in the first video stream data.
In one embodiment, the insertion module 1304 is further configured to:
aiming at any one target object data, acquiring a display time stamp carried in the target video frame data corresponding to the target object data;
according to the display timestamp, updating timestamp information carried in the target object data to obtain updated target object data;
and inserting the updated target object data into the corresponding target video frame data as the additional enhancement information of the target video frame data corresponding to the target object data.
In one embodiment, as shown in fig. 14, there is provided a video data processing apparatus including: a receiving module 1402, a parsing module 1404, and a presentation module 1406, wherein:
a receiving module 1402, configured to receive video stream data sent by a sending end;
an analyzing module 1404, configured to analyze the video stream data to obtain a target object detection result for the video stream data, where the target object detection result includes target object data corresponding to at least one frame of target video frame data, and the target video frame data is video frame data of a target object included in the video stream data;
a display module 1406, configured to perform an overlay display on the video stream data and the target object data.
With the video data processing apparatus provided by the embodiment of the present disclosure, target object data does not need to be encoded into the video frame data for display: the receiving end can parse the inserted target object data out of the video stream data sent by the sending end and display the video stream data and the parsed target object data in a superimposed manner. This avoids the encoding and decoding processes that operations such as drawing picture frames would introduce at the sending end, which reduces time consumption and improves the display efficiency and timeliness of the video; and because the encoding and decoding processes are avoided, the occupation of the sending end's resources is reduced, lowering the hardware requirements on the sending end and the hardware cost.
In one embodiment, the target video frame data includes additional enhancement information, the additional enhancement information includes target object data corresponding to the target video frame data, and the parsing module 1404 is further configured to:
analyzing the video stream data to obtain additional enhancement information of each target video frame data;
and constructing and obtaining the target object detection result according to the target object data included in the additional enhancement information of each target video frame data.
In one embodiment, the display module 1406 is further configured to:
determining a display time stamp carried by the currently displayed video frame data;
determining whether the target object data associated with the display time stamp exists in the target object detection result according to the display time stamp;
and when the target object data associated with the display time stamp exists in the target object detection result, performing superposition display on the currently displayed video frame data and the target object data associated with the display time stamp.
In one embodiment, the display module 1406 is further configured to:
responding to the display setting operation aiming at the target object data, and determining a display mode aiming at the target object data;
and performing superposition display on the video stream data and the target object data according to the display mode of the target object data.
For specific limitations of the video data processing apparatus, reference may be made to the above limitations of the video data processing method, which are not described herein again. The respective modules in the video data processing apparatus may be wholly or partially implemented by software, hardware, and a combination thereof. The modules can be embedded in a hardware form or independent from a processor in the computer device, and can also be stored in a memory in the computer device in a software form, so that the processor can call and execute operations corresponding to the modules.
In one embodiment, a computer device is provided, which may be a server, and its internal structure diagram may be as shown in fig. 15. The computer device includes a processor, a memory, and a network interface connected by a system bus. Wherein the processor of the computer device is configured to provide computing and control capabilities. The memory of the computer device comprises a nonvolatile storage medium and an internal memory. The non-volatile storage medium stores an operating system, a computer program, and a database. The internal memory provides an environment for the operation of an operating system and computer programs in the non-volatile storage medium. The database of the computer device is used for storing video stream data. The network interface of the computer device is used for communicating with an external terminal through a network connection. The computer program is executed by a processor to implement a method of processing video data.
In one embodiment, a computer device is provided, which may be a terminal, and its internal structure diagram may be as shown in fig. 16. The computer device includes a processor, a memory, a communication interface, a display screen, and an input device connected by a system bus. Wherein the processor of the computer device is configured to provide computing and control capabilities. The memory of the computer device comprises a nonvolatile storage medium and an internal memory. The non-volatile storage medium stores an operating system and a computer program. The internal memory provides an environment for the operation of an operating system and computer programs in the non-volatile storage medium. The communication interface of the computer device is used for carrying out wired or wireless communication with an external terminal, and the wireless communication can be realized through WIFI, an operator network, NFC (near field communication) or other technologies. The computer program is executed by a processor to implement a method of processing video data. The display screen of the computer equipment can be a liquid crystal display screen or an electronic ink display screen, and the input device of the computer equipment can be a touch layer covered on the display screen, a key, a track ball or a touch pad arranged on the shell of the computer equipment, an external keyboard, a touch pad or a mouse and the like.
It will be appreciated by those skilled in the art that the configurations shown in fig. 15 and 16 are block diagrams of only some of the configurations relevant to the present application, and do not constitute a limitation on the computing devices to which the present application may be applied, and a particular computing device may include more or less components than those shown, or some of the components may be combined, or have a different arrangement of components.
In one embodiment, a computer device is further provided, which includes a memory and a processor, the memory stores a computer program, and the processor implements the steps of the above method embodiments when executing the computer program.
In an embodiment, a computer-readable storage medium is provided, on which a computer program is stored which, when being executed by a processor, carries out the steps of the above-mentioned method embodiments.
It will be understood by those skilled in the art that all or part of the processes of the methods of the embodiments described above can be implemented by a computer program instructing the relevant hardware; the computer program can be stored in a non-volatile computer-readable storage medium and, when executed, can include the processes of the embodiments of the methods described above. Any reference to memory, storage, database or other medium used in the embodiments provided herein can include at least one of non-volatile and volatile memory. Non-volatile memory may include Read-Only Memory (ROM), magnetic tape, floppy disk, flash memory, optical storage, or the like. Volatile memory can include Random Access Memory (RAM) or external cache memory. By way of illustration and not limitation, RAM can take many forms, such as Static Random Access Memory (SRAM) or Dynamic Random Access Memory (DRAM), among others.
The technical features of the above embodiments may be combined arbitrarily. For brevity, not all possible combinations of these technical features are described; however, as long as a combination of technical features contains no contradiction, it should be considered to fall within the scope of this specification.
The above embodiments express only several implementations of the present application, and their description is specific and detailed, but should not therefore be construed as limiting the scope of the invention. It should be noted that those skilled in the art may make several variations and improvements without departing from the concept of the present application, all of which fall within the protection scope of the present application. Therefore, the protection scope of this patent shall be subject to the appended claims.

Claims (13)

1. A method for processing video data, the method comprising:
acquiring a target object detection result aiming at first video stream data, wherein the target object detection result comprises target object data corresponding to at least one frame of target video frame data, and the target video frame data is video frame data of a target object in the first video stream data;
inserting each target object data into the target video frame data corresponding to each target object data to obtain second video stream data;
and sending the second video stream data to a receiving end so that the receiving end displays the second video stream data containing the target object data.
2. The method of claim 1, wherein the inserting each of the target object data into the target video frame data corresponding to each of the target object data to obtain second video stream data comprises:
determining the target video frame data corresponding to each target object data from the first video stream data;
and respectively taking each target object data as additional enhancement information of the target video frame data corresponding to each target object data, and inserting the additional enhancement information into each target video frame data to obtain the second video stream data.
3. The method according to claim 1 or 2, characterized in that the method further comprises:
and updating the display time stamp carried in each video frame data according to the acquisition time stamp of each video frame data in the first video stream data.
4. The method according to claim 2, wherein said respectively taking each of the target object data as additional enhancement information of the target video frame data corresponding to each of the target object data, and inserting the additional enhancement information into each of the target video frame data, comprises:
aiming at any one target object data, acquiring a display time stamp carried in the target video frame data corresponding to the target object data;
according to the display timestamp, updating timestamp information carried in the target object data to obtain updated target object data;
and inserting the updated target object data, as the additional enhancement information, into the target video frame data corresponding to the target object data.
5. The method according to any one of claims 1, 2 or 4, wherein the target object comprises any one of a target person, a target vehicle, and a target animal, and the target object data comprises bounding box data corresponding to the target object and/or attribute data of the target object.
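The sender-side steps of claims 1 to 5 can be sketched in a few lines of Python. This is an illustrative sketch, not part of the patent disclosure: frames and detections are modeled as plain dicts, whereas a real implementation would carry the object data as SEI NAL units in an H.264/H.265 bitstream; the field names (`frame_id`, `pts`, `sei`) are assumptions for illustration.

```python
# Sketch of claims 1-5: attach each target object's data to its video frame
# as "additional enhancement information" (SEI-like metadata), aligning the
# detection's timestamp with the frame's display timestamp (PTS).

def insert_object_data(frames, detections):
    by_frame = {d["frame_id"]: d for d in detections}
    out = []
    for f in frames:
        f = dict(f)  # copy, so the first video stream data is left untouched
        d = by_frame.get(f["frame_id"])
        if d is not None:
            # claim 4: update the object data's timestamp to the frame's PTS
            d = dict(d, timestamp=f["pts"])
            # claim 2: insert as additional enhancement information
            f["sei"] = d
        out.append(f)
    return out  # the "second video stream data"

frames = [{"frame_id": 1, "pts": 40}, {"frame_id": 2, "pts": 80}]
dets = [{"frame_id": 2, "box": (10, 20, 50, 60), "timestamp": 81}]
second = insert_object_data(frames, dets)
```

Only frames in which a target object was detected receive the extra metadata; the other frames pass through unchanged.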
6. A method for processing video data, the method comprising:
receiving video stream data sent by a sending end;
analyzing the video stream data to obtain a target object detection result aiming at the video stream data, wherein the target object detection result comprises target object data corresponding to at least one frame of target video frame data, and the target video frame data is the video frame data of a target object in the video stream data;
and performing superposition display on the video stream data and the target object data.
7. The method of claim 6, wherein the target video frame data includes additional enhancement information, the additional enhancement information including target object data corresponding to the target video frame data,
the analyzing the video stream data to obtain a target object detection result for the video stream data includes:
analyzing the video stream data to obtain additional enhancement information of each target video frame data;
and constructing and obtaining the target object detection result according to the target object data included in the additional enhancement information of each target video frame data.
8. The method according to claim 6 or 7, wherein said displaying the video stream data and the target object data in an overlay manner comprises:
determining a display time stamp carried by the currently displayed video frame data;
determining whether the target object data associated with the display time stamp exists in the target object detection result according to the display time stamp;
and when the target object data associated with the display time stamp exists in the target object detection result, performing superposition display on the currently displayed video frame data and the target object data associated with the display time stamp.
9. The method according to claim 6 or 7, wherein said displaying the video stream data and the target object data in an overlay manner comprises:
responding to the display setting operation aiming at the target object data, and determining a display mode aiming at the target object data;
and performing superposition display on the video stream data and the target object data according to the display mode of the target object data.
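The receiver-side matching step of claim 8 can likewise be sketched in Python. This is an illustrative sketch, not part of the patent disclosure: the dict-based detection format and the `tolerance` parameter are assumptions, and a real player would perform this lookup once per rendered frame before drawing the overlay.

```python
# Sketch of claim 8: given the display timestamp (PTS) of the frame currently
# being rendered, look up the target object data associated with it, if any.

def overlay_lookup(detections, pts, tolerance=0):
    for d in detections:
        if abs(d["timestamp"] - pts) <= tolerance:
            return d  # overlay this object data on the current frame
    return None  # no associated object data; render the frame as-is

dets = [{"timestamp": 80, "box": (10, 20, 50, 60), "label": "person"}]
hit = overlay_lookup(dets, 80)
miss = overlay_lookup(dets, 40)
```

Because the sender rewrote each detection's timestamp to the frame's PTS (claim 4), an exact match suffices here; a nonzero `tolerance` would allow matching against nearby frames when timestamps drift.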
10. An apparatus for processing video data, the apparatus comprising:
an obtaining module, configured to obtain a target object detection result for first video stream data, where the target object detection result includes target object data corresponding to at least one frame of target video frame data, and the target video frame data is video frame data of a target object included in the first video stream data;
the inserting module is used for inserting each target object data into the target video frame data corresponding to each target object data to obtain second video stream data;
and the sending module is used for sending the second video stream data to a receiving end so that the receiving end displays the second video stream data containing the target object data.
11. An apparatus for processing video data, the apparatus comprising:
the receiving module is used for receiving video stream data sent by a sending end;
the analysis module is used for analyzing the video stream data to obtain a target object detection result aiming at the video stream data, the target object detection result comprises target object data corresponding to at least one frame of target video frame data, and the target video frame data is the video frame data of a target object in the video stream data;
and the display module is used for displaying the video stream data and the target object data in an overlapping manner.
12. A computer device comprising a memory and a processor, the memory storing a computer program, characterized in that the processor, when executing the computer program, implements the steps of the method of any of claims 1 to 5 or 6 to 9.
13. A computer-readable storage medium, on which a computer program is stored, which, when being executed by a processor, carries out the steps of the method of any one of claims 1 to 5 or 6 to 9.
CN202110808126.1A 2021-07-16 2021-07-16 Video data processing method and device, computer equipment and storage medium Pending CN113645486A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202110808126.1A CN113645486A (en) 2021-07-16 2021-07-16 Video data processing method and device, computer equipment and storage medium

Publications (1)

Publication Number Publication Date
CN113645486A 2021-11-12

Family

ID=78417586

Country Status (1)

Country Link
CN (1) CN113645486A (en)

Cited By (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN115914748A (en) * 2022-10-18 2023-04-04 阿里云计算有限公司 Visual display method and device for visual recognition result and electronic equipment

Citations (10)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20180295400A1 (en) * 2015-10-08 2018-10-11 Koninklijke Kpn N.V. Enhancing A Region Of Interest In Video Frames Of A Video Stream
CN108924491A (en) * 2018-07-04 2018-11-30 深圳市商汤科技有限公司 video stream processing method and device, electronic equipment and storage medium
CN110213599A (en) * 2019-04-16 2019-09-06 腾讯科技(深圳)有限公司 A kind of method, equipment and the storage medium of additional information processing
CN111163360A (en) * 2020-01-02 2020-05-15 腾讯科技(深圳)有限公司 Video processing method, video processing device, computer-readable storage medium and computer equipment
CN111447461A (en) * 2020-05-20 2020-07-24 上海科技大学 Synchronous switching method, device, equipment and medium for multi-view live video
CN111683267A (en) * 2019-03-11 2020-09-18 阿里巴巴集团控股有限公司 Method, system, device and storage medium for processing media information
CN111954032A (en) * 2019-05-17 2020-11-17 阿里巴巴集团控股有限公司 Video processing method and device, electronic equipment and storage medium
CN112533014A (en) * 2020-11-26 2021-03-19 Oppo广东移动通信有限公司 Target article information processing and displaying method, device and equipment in live video
US11030240B1 (en) * 2020-02-17 2021-06-08 Honeywell International Inc. Systems and methods for efficiently sending video metadata
CN113014944A (en) * 2021-03-03 2021-06-22 上海七牛信息技术有限公司 Video processing method and system and video live broadcast system



Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination