CN111093077A - Video coding method and device, electronic equipment and storage medium - Google Patents

Video coding method and device, electronic equipment and storage medium

Info

Publication number
CN111093077A
Authority
CN
China
Prior art keywords
target
image data
tracked
current frame
tracking
Prior art date
Legal status
Pending
Application number
CN201911423699.1A
Other languages
Chinese (zh)
Inventor
徐志国
Current Assignee
Shenzhen Intellifusion Technologies Co Ltd
Original Assignee
Shenzhen Intellifusion Technologies Co Ltd
Priority date
Filing date
Publication date
Application filed by Shenzhen Intellifusion Technologies Co Ltd filed Critical Shenzhen Intellifusion Technologies Co Ltd
Priority to CN201911423699.1A
Publication of CN111093077A
Legal status: Pending

Classifications

    • HELECTRICITY
    • H04ELECTRIC COMMUNICATION TECHNIQUE
    • H04NPICTORIAL COMMUNICATION, e.g. TELEVISION
    • H04N19/00Methods or arrangements for coding, decoding, compressing or decompressing digital video signals
    • H04N19/10Methods or arrangements for coding, decoding, compressing or decompressing digital video signals using adaptive coding
    • H04N19/169Methods or arrangements for coding, decoding, compressing or decompressing digital video signals using adaptive coding characterised by the coding unit, i.e. the structural portion or semantic portion of the video signal being the object or the subject of the adaptive coding
    • H04N19/17Methods or arrangements for coding, decoding, compressing or decompressing digital video signals using adaptive coding characterised by the coding unit, i.e. the structural portion or semantic portion of the video signal being the object or the subject of the adaptive coding the unit being an image region, e.g. an object
    • H04N19/172Methods or arrangements for coding, decoding, compressing or decompressing digital video signals using adaptive coding characterised by the coding unit, i.e. the structural portion or semantic portion of the video signal being the object or the subject of the adaptive coding the unit being an image region, e.g. an object the region being a picture, frame or field
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06TIMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T7/00Image analysis
    • G06T7/20Analysis of motion
    • G06T7/246Analysis of motion using feature-based methods, e.g. the tracking of corners or segments
    • HELECTRICITY
    • H04ELECTRIC COMMUNICATION TECHNIQUE
    • H04NPICTORIAL COMMUNICATION, e.g. TELEVISION
    • H04N19/00Methods or arrangements for coding, decoding, compressing or decompressing digital video signals
    • H04N19/44Decoders specially adapted therefor, e.g. video decoders which are asymmetric with respect to the encoder

Abstract

The invention relates to the technical field of video coding, and in particular to a video coding method, a video coding device, an electronic device and a storage medium. The video coding method comprises the following steps: acquiring initial image data in a reference frame; tracking the initial image data through a target tracking algorithm to obtain target image data of a current frame; encoding the target image data of the current frame to obtain a target data packet corresponding to the target image data of the current frame; and transmitting the target data packet and the current frame at intervals through a preset protocol. The invention can reduce the amount of computation and the transmission bandwidth.

Description

Video coding method and device, electronic equipment and storage medium
Technical Field
The present invention relates to the field of video coding technologies, and in particular, to a video coding method and apparatus, an electronic device, and a storage medium.
Background
Artificial Intelligence (AI) cameras have been widely used in various fields, especially the security field, where they provide functions such as face, human-body and vehicle snapshot recognition. Currently, AI cameras implement these functions in one of two ways. 1. The original image is first processed with an AI algorithm to perform classification detection, snapshot capture and object tracking and to extract object characteristic data; the original image is simultaneously compression-encoded; finally, the object characteristic data and the encoded video stream are sent to the cloud separately using different protocols. 2. The original image is first compression-encoded; a third party then performs classification detection, snapshot capture and object tracking on the video stream in the camera and extracts object characteristic data; finally, the object characteristic data and the encoded video stream are again sent to the cloud separately using different protocols. In both modes the image characteristic data and the encoded video stream are uploaded to the cloud through different protocols, so the computational burden during transmission is large and the occupied bandwidth is high. Existing video coding methods therefore suffer from a large amount of computation and a high transmission bandwidth.
Disclosure of Invention
The embodiments of the present invention provide a video coding method, aiming to solve the problems of the large amount of computation and high transmission bandwidth of existing video coding methods.
In a first aspect, an embodiment of the present invention provides a video encoding method, where the method includes the following steps:
acquiring initial image data in a reference frame;
tracking the initial image data through a target tracking algorithm to obtain target image data of a current frame;
encoding the target image data of the current frame to obtain a target data packet corresponding to the target image data of the current frame;
and transmitting the target data packet and the current frame at intervals through a preset protocol.
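The four claimed steps can be sketched as a single pipeline. This is an illustrative outline only; `acquire_initial`, `SimpleTracker` and `SimpleEncoder` are hypothetical stand-ins, not components specified by the invention:

```python
def acquire_initial(reference_frame):
    # Step 1: acquire the initial image data from the reference frame
    # (trivially the whole frame here; the embodiments below use an
    # AI classifier to select a region).
    return reference_frame

class SimpleTracker:
    """Hypothetical stand-in for the target tracking algorithm (step 2)."""
    def track(self, initial_data, current_frame):
        # Return the target image data found in the current frame.
        return current_frame

class SimpleEncoder:
    """Hypothetical stand-in for the entropy encoder (step 3)."""
    def encode(self, target_data):
        return ("packet", target_data)

def encode_video(reference_frame, current_frame, tracker, encoder):
    """Run the four claimed steps and return the transmission sequence:
    the current frame followed by its target data packet (step 4)."""
    initial = acquire_initial(reference_frame)
    target = tracker.track(initial, current_frame)
    packet = encoder.encode(target)
    return [current_frame, packet]  # interval (interleaved) transmission

sequence = encode_video("ref_frame", "cur_frame", SimpleTracker(), SimpleEncoder())
```

The key point of the claim is the final ordering: one protocol carries both the frame and its packet, back to back.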
In a second aspect, an embodiment of the present invention further provides a video encoding apparatus, including:
the acquisition module is used for acquiring initial image data in a reference frame;
the tracking module is used for tracking the initial image data through a target tracking algorithm to obtain target image data of a current frame;
the encoding module is used for encoding the target image data of the current frame to obtain a target data packet corresponding to the target image data of the current frame;
and the transmission module is used for carrying out interval transmission on the target data packet and the current frame through a preset protocol.
In a third aspect, an embodiment of the present invention further provides an electronic device, comprising: a memory, a processor, and a computer program stored in the memory and executable on the processor, wherein the processor, when executing the computer program, implements the steps of the video coding method provided by the embodiments of the present invention.
In a fourth aspect, an embodiment of the present invention further provides a computer-readable storage medium storing a computer program which, when executed by a processor, implements the steps of the video coding method provided by the embodiments of the present invention.
In the embodiments of the invention, after the initial image data is obtained, it is tracked through a target tracking algorithm to obtain the target image data of the current frame; the target image data of the current frame is then encoded to obtain the corresponding target data packet, and the target data packet is transmitted with the current frame at intervals through a preset protocol. Because the target data packet and the current frame are transmitted through a single protocol, the need in the prior art to transmit them separately through different protocols is eliminated, the transmission process is simplified, the amount of computation is reduced, and the transmission bandwidth is reduced.
Drawings
In order to more clearly illustrate the embodiments of the present invention or the technical solutions in the prior art, the drawings needed in the description of the embodiments or the prior art are briefly introduced below. It is obvious that the drawings in the following description show only some embodiments of the present invention; a person skilled in the art can obtain other drawings from these drawings without creative effort.
Fig. 1 is a flowchart of a video encoding method according to an embodiment of the present invention;
fig. 2 is a flowchart of another video encoding method according to an embodiment of the present invention;
fig. 3 is a flowchart of another video encoding method according to an embodiment of the present invention;
fig. 4 is a flowchart of another video encoding method according to an embodiment of the present invention;
fig. 5 is a flowchart of another video encoding method according to an embodiment of the present invention;
fig. 6 is a schematic structural diagram of a video encoding apparatus according to an embodiment of the present invention;
fig. 7 is a schematic structural diagram of another video encoding apparatus according to an embodiment of the present invention;
fig. 8 is a schematic structural diagram of another video encoding apparatus according to an embodiment of the present invention;
fig. 9 is a schematic structural diagram of another video encoding apparatus according to an embodiment of the present invention;
fig. 10 is a schematic structural diagram of another video encoding apparatus according to an embodiment of the present invention;
fig. 11 is a schematic structural diagram of another video encoding apparatus according to an embodiment of the present invention;
fig. 12 is a schematic structural diagram of an electronic device according to an embodiment of the present invention.
Detailed Description
The technical solutions in the embodiments of the present invention will be described clearly and completely below with reference to the drawings in the embodiments. Obviously, the described embodiments are only a part of the embodiments of the present invention, not all of them. All other embodiments obtained by a person skilled in the art from the embodiments given herein without creative effort shall fall within the protection scope of the present invention.
As shown in fig. 1, fig. 1 is a flowchart of a video encoding method according to an embodiment of the present invention, and as shown in fig. 1, the method includes the following steps:
s101, acquiring initial image data in a reference frame.
The reference frame may be a reference image frame, that is, an image frame that serves as a basis for comparison. The initial image data may be all or part of the image data in the reference frame, and may be obtained by an AI camera capturing an object (a human face, a human body, a vehicle, a tree, or the like). If the initial image data is only part of the image data in the reference frame, the reference frame also contains interfering image data besides the initial image data. For example: the initial image data is a human body captured by the camera, while the camera has also captured interfering image data such as trees, vehicles and buildings. The image data in the reference frame can be identified through a pre-trained matrix algorithm, which can recognize the initial image data automatically. The matrix algorithm may be trained on a large amount of image data in advance, so that it can automatically recognize the initial image data within a large amount of image data. After the initial image data is identified, its characteristics can be collected; these may include the pixel coordinates of the initial image data, key points of a human face (eyes, nose and the like), key points of a human body (hands, feet, head and the like), and so on.
The initial image data described above may refer to one or more objects in the reference frame, such as: a certain pedestrian present in the reference frame, or a plurality of pedestrians present in the reference frame. The initial image data may also include different types of objects, for example: and simultaneously acquiring a face image of a driver running the red light and vehicle information (license plate number, vehicle mark, vehicle color, vehicle model and the like) corresponding to a vehicle driven by the driver.
A user can send a request for acquiring the initial image data in the reference frame to the electronic device that executes the video coding method through a mobile terminal, and the electronic device decodes the request and executes the video coding method. The electronic device may be any of various electronic devices having a display screen and supporting web browsing, including but not limited to a smart phone, a tablet computer, an e-book reader, an MP3 (MPEG Audio Layer III) player, an MP4 (MPEG Audio Layer IV) player, a laptop computer, and the like.
And S102, tracking the initial image data through a target tracking algorithm to obtain target image data of the current frame.
The target tracking algorithm is used to track the path of the object corresponding to the initial image data, which makes it convenient to find information such as the area or specific position to which that object has moved. The reference frame and the current frame can be adjacent frames; whether they contain a moving object or a static object, images in adjacent frames are usually very similar, so redundancy exists between the current frame and the reference frame. Both the current frame and the reference frame are image frames; in the embodiments of the present invention, the terms reference frame and current frame are used only to distinguish different image frames. The target image data of the current frame may be the data obtained after the initial image data has been tracked and has moved into the current frame.
S103, encoding the target image data of the current frame to obtain a target data packet corresponding to the target image data of the current frame.
Encoding the target image data may mean compression-encoding it by means of entropy coding, which is in effect a compression process: entropy coding converts a series of element symbols representing the video sequence into a compressed code stream for transmission or storage. No image information is lost during entropy coding.
The initial image data obtained by the AI algorithm and represented by element symbols may be converted into a compressed code stream that can be transmitted or stored, i.e., the target data packet mentioned above. The input element symbols may include quantized transform coefficients, motion vectors, additional information (i.e., flag-bit information important for correct decoding), and the like.
And S104, transmitting the target data packet with the current frame at intervals through a preset protocol.
The target data packet may be transmitted with the current frame at intervals through a preset protocol. The preset protocol may be the Real-time Transport Protocol (RTP): after transmitting one image frame, RTP transmits the target data packet corresponding to that frame, and then repeats the same pattern, so that each current frame and its corresponding target data packet are transmitted alternately through a single protocol. For example, the transmission order is: the first image frame, the target data packet of the first frame, the second image frame, the target data packet of the second frame, the third image frame, and the target data packet of the third frame, all sent to the cloud. The cloud can then obtain the required image information from the received image frames and the corresponding target data packets; for example, it can extract the position information of a required object from the corresponding target data packet and, after decoding, capture the image of that object from the decoded code stream.
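The interval (interleaved) transmission order described above can be sketched as follows; `interleave` is a hypothetical helper illustrating only the ordering, not actual RTP packetization:

```python
def interleave(frames, packets):
    """Yield each image frame followed by its target data packet,
    producing the single-protocol interval-transmission order:
    frame 1, packet 1, frame 2, packet 2, ..."""
    for frame, packet in zip(frames, packets):
        yield frame   # image frame i
        yield packet  # target data packet for frame i

# Transmission order for three frames and their packets:
order = list(interleave(["F1", "F2", "F3"], ["P1", "P2", "P3"]))
```

A receiver can pair each frame with the packet that immediately follows it, which is what lets the cloud match position information to the right decoded frame.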
After the initial image data is obtained, it is tracked through a target tracking algorithm to obtain the target image data of the current frame; the target image data is then encoded to obtain the corresponding target data packet, and the target data packet is transmitted with the current frame at intervals through a preset protocol. Because the target data packet and the current frame are transmitted through a single protocol, the need in the prior art to transmit them separately through different protocols is eliminated, the transmission process is simplified, the amount of computation is reduced, and the transmission bandwidth is reduced.
As shown in fig. 2, fig. 2 is a flowchart of another video encoding method according to an embodiment of the present invention, and as shown in fig. 2, the method includes the following steps:
s201, detecting the area where the object to be tracked in the reference frame is located through an AI classifier.
The object to be tracked can be an object in the initial image data identified by the matrix algorithm. Its position can be its position coordinates in the reference frame, which can be expressed as pixel coordinates in the image frame. When the object to be tracked is obtained, the region where it is located can be selected; this region may be a rectangle containing only the object to be tracked. The region may be described by its length and width, expressed in pixels, for example a length of 10 pixels and a width of 8 pixels.
S202, expanding the area where the object to be tracked is located to obtain a target detection area.
The region where the object to be tracked is located may be expanded transversely, longitudinally, or both. A Prediction Unit (PU) may be used as the minimum basic unit for expanding the region, and the size of the PU is determined by the coding standard in use, for example: in the H.264 coding standard the PU is fixed at 16x16, while in the H.265 coding standard the PU ranges from a minimum of 8x8 to a maximum of 64x64.
The PU is obtained by partitioning a Minimum Code Unit (MCU); a PU may contain a complete or incomplete MCU, and the boundary of the PU may or may not coincide with the boundary of the MCU. The size of the target detection area is an integer multiple of the PU, i.e. the boundary of the target detection area can coincide with PUs in the image frame. For example: if the PU size in the coding standard used is 4x4 and the region obtained by expanding the region where the object to be tracked is located is 32x32, the expanded region is 64 times the size of the PU.
As a possible embodiment, when the area where the object to be tracked is located is expanded and the resulting size is not an integer multiple of the prediction unit, the length and/or the width of the expanded area need to be rounded. For example: in the coding standard used, if the size of the PU is 4x4 and the expanded region is 33x33, the region needs to be adjusted to 32x32, which is 64 times the size of the PU; the adjusted region is the target detection region. As another example: if the size of the PU is 8x8 and the expanded region is 39x39, the region needs to be adjusted to 40x40, which is 25 times the size of the PU, and the adjusted region is the target detection region.
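The rounding in these two examples can be sketched as follows, assuming round-to-nearest on each dimension (`round_to_pu_multiple` is a hypothetical name; the patent does not prescribe a specific rounding rule):

```python
def round_to_pu_multiple(width, height, pu):
    """Round a region's dimensions to the nearest integer multiple of
    the PU size, using integer arithmetic (round-half-up), matching the
    examples above: 33x33 -> 32x32 for a 4x4 PU, 39x39 -> 40x40 for an
    8x8 PU. A region is never shrunk below one PU."""
    def round_dim(d):
        return max(pu, (d + pu // 2) // pu * pu)
    return round_dim(width), round_dim(height)
```

Integer arithmetic is used deliberately here so the result does not depend on Python's round-half-to-even behaviour for floats.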
And S203, identifying the position information of the target detection area to obtain initial image data in the reference frame.
The position information of the target detection area may be its coordinates in the image frame, which may be two-dimensional or three-dimensional. The size of the target detection area is extracted so that it can serve as a reference area for the current frame, and its position information is obtained so that the initial image data in the area selected by the user in the reference frame can be tracked.
And S204, tracking the initial image data through a target tracking algorithm to obtain target image data of the current frame.
After the area where the object to be tracked is located is detected, it can be dynamically tracked through the target tracking algorithm until the view moves from the reference frame to the current frame; the image data obtained at the current frame is the target image data.
S205, encoding the target image data of the current frame to obtain a target data packet corresponding to the target image data of the current frame.
And S206, carrying out interval transmission on the target data packet and the current frame through a preset protocol.
In this embodiment, the area where the object to be tracked is located in the reference frame is detected and expanded to form a target detection area; the position information of the target detection area is then extracted and used by the target tracking algorithm for tracking until the object to be tracked is confirmed to have moved into the current frame, at which point the corresponding target image data is obtained. The cloud can thus extract from the target data packet the position information of the object after it has moved into the current frame and capture the image of the moved object from the decoded code stream. After the target image data is obtained, it is encoded, and the encoded target data packet is transmitted with the current frame at intervals through the RTP protocol. Because the target data packet and the current frame are transmitted through a single protocol, the need in the prior art to transmit them separately through different protocols is eliminated, the transmission process is simplified, the amount of computation is reduced, and the transmission bandwidth is reduced.
As shown in fig. 3, fig. 3 is a flowchart of another video encoding method according to an embodiment of the present invention, and as shown in fig. 3, the method includes the following steps:
s301, acquiring initial image data in a reference frame.
S302, tracking the object to be tracked through a target tracking algorithm to obtain motion data generated in the moving process of the object to be tracked.
The motion data may include data on the change in position of the object to be tracked as it moves from the reference frame to the current frame, and may also include data on the change of the area where the object is located. The motion data may be used for motion estimation and motion compensation as the object moves from the reference frame to the current frame. Motion estimation divides the current frame of the image sequence into a number of non-overlapping macroblocks and assumes that the displacement of all pixels within a macroblock is the same. Motion compensation predicts and compensates the local image of the current frame from the local image of the reference frame, reducing the redundancy between the current frame and the reference frame; it may include global motion compensation and block motion compensation. The current frame and the reference frame may be adjacent frames; adjacent frames are adjacent in playing order, but not necessarily adjacent in coding order.
In this embodiment, the motion estimation described above may represent the relative offset of the object to be tracked toward the spatial position of the target object. The motion data may also include an object residual, obtained by subtracting the reference frame from the current frame; the residual contains less information and can therefore be encoded at a lower bit rate, and the current frame can be recovered by simple addition during decoding. The smaller the prediction residual, the smaller the compression ratio of the object. Of course, the motion data may also include attribute information of the object, and the like. After the motion data is calculated, it may be encoded using the coding standard currently in use, which may differ from the coding standard used for encoding the background region and the object residual; for example, the PU in that coding standard may be 8x8.
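The object residual and its recovery by simple addition can be sketched as follows, using plain nested lists as pixel blocks (a minimal illustration, not the codec's actual residual pipeline):

```python
def object_residual(current_block, reference_block):
    """Subtract the co-located reference block from the current block,
    pixel by pixel; a small residual carries little information and
    compresses well."""
    return [[c - r for c, r in zip(crow, rrow)]
            for crow, rrow in zip(current_block, reference_block)]

def reconstruct(reference_block, residual):
    """Decoding side: recover the current block by simple addition of
    the reference block and the residual."""
    return [[r + d for r, d in zip(rrow, drow)]
            for rrow, drow in zip(reference_block, residual)]

cur = [[10, 11], [12, 13]]
ref = [[10, 10], [10, 10]]
res = object_residual(cur, ref)
```

Here `res` is mostly zeros and small values, which is exactly the property that lets the residual be coded at a lower bit rate than the frame itself.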
S303, acquiring the position information and the motion data of the object to be tracked, and calculating the position information of the target object to obtain target image data, wherein the object to be tracked is associated with the target object.
The position information of the object to be tracked can be used as a reference point and combined with the motion data describing the object's movement from the reference frame to the current frame; the calculation may be a summation, yielding the position information of the target object in the current frame. The association between the object to be tracked and the target object indicates that the target object is the same object as the object to be tracked after its position information has changed.
S304, encoding the target image data of the current frame to obtain a target data packet corresponding to the target image data of the current frame.
S305, carrying out interval transmission on the target data packet and the current frame through a preset protocol.
Optionally, step S102 may further include:
and tracking the object to be tracked through a target tracking algorithm to obtain motion data generated by the object to be tracked in the moving process, wherein the motion data comprises an offset distance and an offset direction.
The offset distance is the spatial distance produced as the object to be tracked becomes the target object, and can be calculated over multiple dimensions. The offset direction is the straight-line direction from the object to be tracked toward the target object; it may point at any angle, depending on the direction in which the object moves. If the object to be tracked moves in the image frame without any of its parts changing, that is, it moves as a whole, then the offset directions of its individual parts toward the corresponding parts of the target object are parallel. If, however, some parts of the object change during the movement, the offset directions of the changed parts are not parallel to the offset directions of the unchanged parts.
And calculating the position information of the target object according to the position information of the object to be tracked, the offset distance and the offset direction so as to obtain target image data.
By combining the position information, the offset distance and the offset direction of the object to be tracked, the specific position information of the target object in the current frame can be calculated. Combining multiple parameters of the object to be tracked in this way makes accurate tracking convenient, improves the accuracy of the acquired position information of the target object, and makes it easier to lock onto the target object.
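The position calculation from the reference position, offset distance and offset direction can be sketched as follows, assuming the offset direction is given as an angle in radians (a convention not fixed by the patent):

```python
import math

def target_position(ref_x, ref_y, offset_distance, offset_angle):
    """Add the offset (distance along the offset direction, expressed
    as an angle in radians) to the reference-frame position to obtain
    the target object's position in the current frame."""
    return (ref_x + offset_distance * math.cos(offset_angle),
            ref_y + offset_distance * math.sin(offset_angle))

# An object at (100, 50) that moved 10 pixels along the x axis:
new_x, new_y = target_position(100.0, 50.0, 10.0, 0.0)
```

This is the summation step mentioned in S303: reference position plus motion data gives the current-frame position.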
Optionally, after the position information of the target object is calculated according to the position information of the object to be tracked, the offset distance, and the offset direction, the method further includes:
and acquiring the area where the target object is located based on the position information of the target object.
After the position information of the target object in the current frame is determined, the region where the target object is located needs to be acquired. This region may be a rectangle whose length and width are determined by the shape and size of the target object.
And expanding the area where the target object is located to be consistent with the size of the target detection area.
Specifically, after the position information of the target object in the current frame is obtained, the region where the target object is located can be expanded until it is consistent with the size of the target detection region; keeping the region consistent and undeformed facilitates the calculation of the motion data and reduces calculation error. As long as the object to be tracked does not leave the tracked image after moving, its position in the target detection region of the current frame can be tracked based on the target tracking algorithm.
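The expansion of the target region to the target-detection-area size can be sketched as follows; clamping to the frame bounds and centring on the target are added assumptions, not details specified by the patent:

```python
def expand_to_size(cx, cy, target_w, target_h, frame_w, frame_h):
    """Place a target_w x target_h window centred on the target's
    position (cx, cy), clamped so it stays inside the frame; returns
    (x0, y0, w, h). The window size matches the target detection area
    so the region keeps a consistent, undeformed size across frames."""
    x0 = max(0, min(cx - target_w // 2, frame_w - target_w))
    y0 = max(0, min(cy - target_h // 2, frame_h - target_h))
    return x0, y0, target_w, target_h
```

Because the returned width and height never change, motion data computed between the reference-frame region and this region compares blocks of identical size.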
The expanded region may contain a background region in addition to the target object, for example people, cars, or plants in the background. The background region is the part of the target detection region other than the target object; during target tracking, only the position information, attribute information, and so on of the object the user is interested in are tracked, and interference from the background is ignored.
In this embodiment, the image data of the object to be tracked in the reference frame is obtained and tracked by a target tracking algorithm; the offset distance and offset direction of the object to be tracked are obtained during tracking; and the target image data (including position information) corresponding to the target object in the current frame is determined from the position information, offset distance, and offset direction of the object to be tracked. The encoded target data packet is then transmitted interleaved with the current frame over a single preset protocol, such as RTP (Real-time Transport Protocol). Because the target data packet and the current frame travel over one protocol, the prior-art need to transmit them separately over different protocols is avoided, which simplifies the transmission process, reduces the amount of computation, and reduces the transmission bandwidth.
As shown in fig. 4, fig. 4 is a flowchart of another video encoding method according to an embodiment of the present invention, and as shown in fig. 4, the method includes the following steps:
S401, acquiring initial image data in a reference frame, wherein the initial image data comprises an object to be tracked.
S402, judging whether the time for tracking the object to be tracked reaches a preset time threshold value.
And S403, if the tracking time of the object to be tracked reaches a preset time threshold, detecting the object to be tracked in the reference frame again.
As a possible embodiment, a moving object to be tracked may leave the image, causing the object to be lost during tracking. A time threshold can therefore be preset in the target tracking algorithm so that the object to be tracked in the reference frame is re-detected multiple times, preventing tracking loss. The time threshold may be set according to specific needs or adjusted dynamically: it may be increased when the object has been tracked reliably for a long time, or decreased when the object moves at high speed.
When the time for tracking the reference frame reaches the preset time threshold, the object to be tracked in the reference frame may be detected again, for example: and the time threshold is 10ms, and when the time for tracking the object to be tracked in the reference frame reaches 10ms, detecting the area where the object to be tracked in the reference frame is located again. The area where the object to be tracked is located can be detected through the AI classifier.
S404, if the tracking time of the object to be tracked does not reach a preset time threshold value, continuing to track the initial image data to obtain the target image data of the current frame.
When the time for tracking the object to be tracked in the reference frame has not reached the preset time threshold, the offset distance and offset direction of the object to be tracked may be obtained according to the target tracking algorithm. For example, with a time threshold of 10 ms, as long as the tracking time has not reached 10 ms, tracking continues without re-detection. The target image data corresponding to the target object in the current frame is then obtained from the position information, offset distance, and offset direction of the object to be tracked.
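The alternation between re-detection (S403) and continued tracking (S404) might look like the loop below. The `detect` and `track` callables stand in for the AI classifier and the target tracking algorithm, which the patent does not specify; this is a sketch, not the patented implementation:

```python
def track_with_redetection(frames, detect, track, threshold_ms, frame_interval_ms):
    # Re-run detection whenever the accumulated tracking time reaches the
    # preset threshold, so a fast-moving object cannot be silently lost.
    box = detect(frames[0])
    boxes = [box]
    elapsed = 0
    for frame in frames[1:]:
        elapsed += frame_interval_ms
        if elapsed >= threshold_ms:
            box = detect(frame)      # S403: re-detect in this frame
            elapsed = 0
        else:
            box = track(frame, box)  # S404: keep tracking from last box
        boxes.append(box)
    return boxes

# Toy run: identity "detector"/"tracker" that record which path was taken.
used = []
detect = lambda f: (used.append("detect"), f)[1]
track = lambda f, b: (used.append("track"), f)[1]
boxes = track_with_redetection([0, 1, 2, 3], detect, track,
                               threshold_ms=10, frame_interval_ms=5)
```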
S405, encoding the target image data of the current frame to obtain a target data packet corresponding to the target image data of the current frame.
And S406, transmitting the target data packet with the current frame at intervals through a preset protocol.
In this embodiment, a time threshold is set, the time spent tracking the object to be tracked in the reference frame is compared with the time threshold, and the corresponding action is executed according to the result. When the object to be tracked is a moving object, this prevents the object from leaving the image and being lost during tracking.
As shown in fig. 5, fig. 5 is a flowchart of another video encoding method according to an embodiment of the present invention, and as shown in fig. 5, the method includes the following steps:
S501, acquiring initial image data in a reference frame.
S502, tracking the initial image data through a target tracking algorithm to obtain target image data of the current frame.
S503, encoding the target image data of the current frame to obtain a target data packet corresponding to the target image data of the current frame.
S504, extracting the current frame and a target data packet, wherein the target data packet corresponds to the current frame.
That the target data packet corresponds to the current frame means that the target data packet includes the position information and attribute information of the target object in the current frame. After the target data packet is obtained, the target data packet and its corresponding current frame are extracted for subsequent uploading.
And S505, transmitting the target data packet and the current frame at intervals through a preset protocol to obtain a data transmission block of the current frame.
After the current frame and the target data packet are extracted, the target data packet and the current frame are transmitted alternately through the RTP (Real-time Transport Protocol), so that the current frame and its corresponding target data packet occupy adjacent positions; parsing the target data packet then yields the corresponding image data in the current frame. A current frame together with its corresponding target data packet forms a data transmission block, which is transmitted over RTP. If multiple image frames and their corresponding target data packets need to be uploaded, they form a target data chain in which frames and packets alternate, for example: the first image frame, the target data packet of the first frame, the second image frame, the target data packet of the second frame, the third image frame, the target data packet of the third frame, and so on.
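The interleaving just described, frames and their metadata packets alternating on one chain, can be sketched as follows; the (kind, data) tuple representation is an assumption of this illustration:

```python
def build_data_chain(frames, packets):
    # Place each encoded frame immediately before its target data packet
    # so both travel over the same transport (e.g. RTP) in order.
    if len(frames) != len(packets):
        raise ValueError("each frame needs exactly one target data packet")
    chain = []
    for frame, packet in zip(frames, packets):
        chain.append(("frame", frame))
        chain.append(("packet", packet))
    return chain

chain = build_data_chain(["f1", "f2"], ["p1", "p2"])
```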
Taking H264/H265 as an example, the transmission timing of the multiple target data packets and the corresponding image frames in this embodiment is as follows:

NALU (image frame 1) | SEI (target packet 1) | NALU (image frame 2) | SEI (target packet 2) | ...
Here the SEI (Supplemental Enhancement Information) unit may be used to carry the target data packet; the NALU (Network Abstraction Layer Unit) before an SEI is the image frame corresponding to that SEI, and the NALU after the SEI is the next image frame. Interleaving the image frames with the target data packets forms a single image data chain that can be transmitted over one RTP protocol, instead of forming separate data chains for frames and packets and transmitting them over different protocols. In this way the transmission bandwidth can be reduced.
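For H.264, user metadata such as the target data packet is commonly carried in a user_data_unregistered SEI message (NAL unit type 6, SEI payload type 5). A minimal sketch of building such a unit follows; the all-zero 16-byte UUID is an assumption, and emulation-prevention byte insertion is omitted for brevity, so this is not a complete bitstream-conformant implementation:

```python
def make_sei_user_data(payload: bytes, uuid: bytes = b"\x00" * 16) -> bytes:
    # Annex B start code + NAL header (type 6 = SEI) + payload type 5
    # (user_data_unregistered) + payload size in 0xFF-extension coding
    # + 16-byte UUID + data + rbsp_trailing_bits (0x80).
    assert len(uuid) == 16
    body = uuid + payload
    size = len(body)
    size_bytes = b"\xff" * (size // 255) + bytes([size % 255])
    return b"\x00\x00\x00\x01" + b"\x06" + b"\x05" + size_bytes + body + b"\x80"

sei = make_sei_user_data(b'{"x":100,"y":50}')
```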
And S506, uploading the data transmission block of the current frame through a network transmission unit.
Specifically, the coding standards of the different coding schemes all specify how data is to be transmitted over the network, and they usually define several data types, one of which can be customized by the user. The network transmission unit therefore refers to a transmission unit made available for user-defined data, into which data can be placed for network transmission according to the corresponding coding standard.
The network transmission may consist of uploading the image data chain to the cloud via a provided network link. The cloud can obtain the required information from the received image frames and target data packets, for example: extracting the position information of the object from the target data packet, feeding the stream into a decoder, and capturing the object image from the decoded code stream.
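On the receiving side, a sketch of pairing each frame with the packet that follows it and pulling out the position information; the (kind, data) chain layout and the "x"/"y" field names are hypothetical choices of this illustration, not fixed by the patent:

```python
import json

def extract_positions(chain):
    # Walk the interleaved (kind, data) chain; every "packet" item holds
    # the metadata for the "frame" item that immediately precedes it.
    positions = []
    pending_frame = None
    for kind, data in chain:
        if kind == "frame":
            pending_frame = data
        elif kind == "packet":
            meta = json.loads(data)
            positions.append((pending_frame, meta.get("x"), meta.get("y")))
    return positions

positions = extract_positions([("frame", "f1"), ("packet", '{"x": 100, "y": 50}')])
```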
The network links may comprise various kinds of connections, such as wired, wireless, or fiber-optic cable connections. It should be noted that the wireless connection may include, but is not limited to, a 3G/4G connection, a WiFi (Wireless Fidelity) connection, a Bluetooth connection, a WiMAX (Worldwide Interoperability for Microwave Access) connection, a ZigBee connection, a UWB (Ultra-Wideband) connection, and other wireless connection means now known or developed in the future.
In this embodiment, the initial image data of the reference frame is obtained; the object to be tracked in the reference frame is tracked to obtain the target image data; the target image data is compressed and encoded to form the target data packet; the target data packet and the current frame are transmitted alternately through the RTP (Real-time Transport Protocol) to obtain the data transmission block of the current frame; and the data transmission block of the current frame is uploaded to the cloud through the network transmission unit. Because the target data packet and the current frame are transmitted over one protocol, the prior-art need to transmit them separately over different protocols is avoided, which simplifies the transmission process, reduces the amount of computation, and reduces the transmission bandwidth.
As shown in fig. 6, fig. 6 is a block diagram of a video encoding apparatus according to an embodiment of the present invention, and as shown in fig. 6, the apparatus includes:
an obtaining module 601, configured to obtain initial image data in a reference frame;
a tracking module 602, configured to track the initial image data through a target tracking algorithm to obtain target image data of a current frame;
the encoding module 603 is configured to encode the target image data of the current frame to obtain a target data packet corresponding to the target image data of the current frame;
the transmission module 604 is configured to perform interval transmission on the target data packet and the current frame through a preset protocol.
Optionally, the initial image data includes position information of the object to be tracked and the target detection area, as shown in fig. 7, the obtaining module 601 includes:
a first detecting unit 6011, configured to detect, through an AI classifier, an area where an object to be tracked in a reference frame is located;
a first expansion unit 6012, configured to expand an area where an object to be tracked is located, to obtain a target detection area;
an extracting unit 6013, configured to identify position information of the target detection region to obtain initial image data in the reference frame.
Optionally, the initial image data includes the object to be tracked and the position information of the object to be tracked, and the target image data of the current frame includes the target object and the position information of the target object, as shown in fig. 8, the tracking module 602 includes:
the first tracking unit 6021 is configured to track the object to be tracked through a target tracking algorithm to obtain motion data generated by the object to be tracked in the moving process;
the calculating unit 6022 is configured to obtain the position information and the motion data of the object to be tracked, and calculate the position information of the target object to obtain target image data, where the object to be tracked is associated with the target object.
The first tracking unit 6021 is further configured to track the object to be tracked by using a target tracking algorithm, so as to obtain motion data generated by the object to be tracked in the moving process, where the motion data includes an offset distance and an offset direction.
The calculating unit 6022 is further configured to calculate the position information of the target object according to the position information of the object to be tracked, the offset distance, and the offset direction, so as to obtain target image data.
Optionally, the initial image data includes an object to be tracked, the target tracking algorithm includes a preset time threshold, as shown in fig. 9, the tracking module 602 further includes:
the judging unit 6023 is configured to judge whether the time for tracking the object to be tracked reaches a preset time threshold;
the second detection unit 6024 is configured to detect the object to be tracked in the reference frame again if the time for tracking the object to be tracked reaches the preset time threshold;
the first obtaining unit 6025 is configured to continue to track the initial image data to obtain target image data of the current frame if the time for tracking the object to be tracked does not reach the preset time threshold.
Optionally, as shown in fig. 10, the tracking module 602 further includes:
a second obtaining unit 6026 configured to obtain an area where the target object is located based on the position information of the target object;
a second expanding unit 6027, configured to expand the region where the target object is located to be consistent with the size of the target detection region.
Optionally, as shown in fig. 11, the transmission module 604 includes:
the extracting unit 6041 is configured to extract a current frame and a target packet, where the target packet corresponds to the current frame;
a binding unit 6042, configured to perform interval transmission on the target data packet and the current frame through a preset protocol to obtain a data transmission block of the current frame;
and an uploading unit 6043, configured to upload the data transmission block of the current frame through the network transmission unit.
The video encoding device provided by the embodiment of the invention can realize each implementation mode of the video encoding method and corresponding beneficial effects, and is not repeated here for avoiding repetition.
As shown in fig. 12, fig. 12 is a structural diagram of an electronic device according to an embodiment of the present invention. As shown in fig. 12, includes: a memory 1202, a processor 1201, and a computer program stored on the memory 1202 and executable on the processor 1201, wherein:
the processor 1201 is configured to call the computer program stored in the memory 1202, and perform the following steps:
acquiring initial image data in a reference frame;
tracking the initial image data through a target tracking algorithm to obtain target image data of a current frame;
encoding the target image data of the current frame to obtain a target data packet corresponding to the target image data of the current frame;
and carrying out interval transmission on the target data packet and the current frame through a preset protocol.
Optionally, the initial image data includes position information of the object to be tracked and the target detection region, and the step of acquiring the initial image data in the reference frame, which is performed by the processor 1201, includes:
detecting the area of an object to be tracked in the reference frame through an AI classifier;
expanding the area where the object to be tracked is located to obtain a target detection area;
position information of the target detection area is identified to obtain initial image data in the reference frame.
Optionally, the initial image data includes an object to be tracked and position information of the object to be tracked, the target image data of the current frame includes a target object and position information of the target object, and the step, performed by the processor 1201, of tracking the initial image data through a target tracking algorithm to obtain the target image data of the current frame includes:
tracking an object to be tracked through a target tracking algorithm to obtain motion data generated by the object to be tracked in the moving process;
the method comprises the steps of obtaining position information and motion data of an object to be tracked, calculating the position information of a target object to obtain target image data, and enabling the object to be tracked to be associated with the target object.
Optionally, the initial image data includes an object to be tracked and position information of the object to be tracked, the target image data of the current frame includes a target object and position information of the target object, and the processor 1201 tracks the initial image data through a target tracking algorithm to obtain the target image data of the current frame, where the step of obtaining the target image data of the current frame includes:
tracking an object to be tracked through a target tracking algorithm to obtain motion data generated by the object to be tracked in the moving process, wherein the motion data comprises an offset distance and an offset direction;
and calculating the position information of the target object according to the position information of the object to be tracked, the offset distance and the offset direction so as to obtain target image data.
Optionally, the initial image data includes an object to be tracked, the target tracking algorithm includes a preset time threshold, and the step of tracking the initial image data by the target tracking algorithm executed by the processor 1201 to obtain the target image data of the current frame includes:
judging whether the time for tracking the object to be tracked reaches a preset time threshold value or not;
if the time for tracking the object to be tracked reaches a preset time threshold value, detecting the object to be tracked in the reference frame again;
and if the tracking time of the object to be tracked does not reach a preset time threshold, continuing to track the initial image data to obtain the target image data of the current frame.
Optionally, after calculating the position information of the target object according to the position information of the object to be tracked, the offset distance, and the offset direction, the processor 1201 is further configured to:
acquiring the area where the target object is located based on the position information of the target object;
and expanding the area where the target object is located to be consistent with the size of the target detection area.
Optionally, the step of performing, by the processor 1201, interval transmission on the target data packet and the current frame through a preset protocol includes:
extracting a current frame and a target data packet, wherein the target data packet corresponds to the current frame;
carrying out interval transmission on a target data packet and a current frame through a preset protocol to obtain a data transmission block of the current frame;
and uploading the data transmission block of the current frame through a network transmission unit.
An embodiment of the present invention further provides a computer-readable storage medium, where a computer program is stored on the computer-readable storage medium, and when the computer program is executed by a processor, the computer program implements each process of the video encoding method provided in the embodiment of the present invention, and can achieve the same technical effect, and in order to avoid repetition, the computer program is not described herein again.
It will be understood by those skilled in the art that all or part of the processes of the methods of the embodiments described above can be implemented by hardware related to instructions of a computer program, and the program can be stored in a computer readable storage medium, and when executed, can include the processes of the embodiments of the methods described above. The storage medium may be a magnetic disk, an optical disk, a Read-Only Memory (ROM), a Random Access Memory (RAM), or the like.
The terms "first," "second," and the like in the description and claims of this application or in the above-described drawings are used for distinguishing between different objects and not for describing a particular order. Reference herein to "an embodiment" means that a particular feature, structure, or characteristic described in connection with the embodiment can be included in at least one embodiment of the application. The appearances of the phrase in various places in the specification are not necessarily all referring to the same embodiment, nor are separate or alternative embodiments mutually exclusive of other embodiments. It is explicitly and implicitly understood by one skilled in the art that the embodiments described herein can be combined with other embodiments.
The above disclosure describes only preferred embodiments of the present invention, which of course cannot limit the scope of rights of the invention; equivalent variations made according to the appended claims still fall within the scope of the invention.

Claims (10)

1. A video encoding method, characterized in that said method comprises the steps of:
acquiring initial image data in a reference frame;
tracking the initial image data through a target tracking algorithm to obtain target image data of a current frame;
encoding the target image data of the current frame to obtain a target data packet corresponding to the target image data of the current frame;
and transmitting the target data packet and the current frame at intervals through a preset protocol.
2. The method of claim 1, wherein the initial image data includes position information of an object to be tracked and a target detection area, and the step of acquiring the initial image data in the reference frame includes:
detecting the area of the object to be tracked in the reference frame through an AI classifier;
expanding the area where the object to be tracked is located to obtain a target detection area;
and identifying the position information of the target detection area to obtain initial image data in the reference frame.
3. The method of claim 1, wherein the initial image data comprises an object to be tracked and position information of the object to be tracked, the target image data of the current frame comprises a target object and position information of the target object, and the step of tracking the initial image data by a target tracking algorithm to obtain the target image data of the current frame comprises:
tracking the object to be tracked through a target tracking algorithm to obtain motion data generated by the object to be tracked in the moving process;
and acquiring the position information and the motion data of the object to be tracked, and calculating the position information of the target object to obtain the target image data, wherein the object to be tracked is associated with the target object.
4. The method of claim 1, wherein the initial image data comprises an object to be tracked and position information of the object to be tracked, the target image data of the current frame comprises a target object and position information of the target object, and the step of tracking the initial image data by a target tracking algorithm to obtain the target image data of the current frame comprises:
tracking the object to be tracked through the target tracking algorithm to obtain motion data generated by the object to be tracked in the moving process, wherein the motion data comprises an offset distance and an offset direction;
and calculating the position information of the target object according to the position information of the object to be tracked, the offset distance and the offset direction so as to obtain the target image data.
5. The method of claim 1, wherein the initial image data comprises an object to be tracked, the target tracking algorithm comprises a preset time threshold, and the step of tracking the initial image data through the target tracking algorithm to obtain the target image data of the current frame comprises:
judging whether the time for tracking the object to be tracked reaches a preset time threshold value or not;
if the time for tracking the object to be tracked reaches the preset time threshold, detecting the object to be tracked in the reference frame again;
if the time for tracking the object to be tracked does not reach the preset time threshold, continuing to track the initial image data to obtain the target image data of the current frame.
6. The method according to claim 4, wherein after the calculating the position information of the target object based on the position information of the object to be tracked and the offset distance and the offset direction, the method further comprises:
acquiring the area where the target object is located based on the position information of the target object;
and expanding the area where the target object is located to be consistent with the size of the target detection area.
7. The method of claim 1, wherein the step of transmitting the target data packet and the current frame at intervals through a preset protocol comprises:
extracting the current frame and the target data packet, wherein the target data packet corresponds to the current frame;
transmitting the target data packet and the current frame at intervals through the preset protocol to obtain a data transmission block of the current frame;
and uploading the data transmission block of the current frame through a network transmission unit.
8. A video encoding apparatus, comprising:
the acquisition module is used for acquiring initial image data in a reference frame;
the tracking module is used for tracking the initial image data through a target tracking algorithm to obtain target image data of a current frame;
the encoding module is used for encoding the target image data of the current frame to obtain a target data packet corresponding to the target image data of the current frame;
and the transmission module is used for carrying out interval transmission on the target data packet and the current frame through a preset protocol.
9. An electronic device, comprising: memory, processor and computer program stored on the memory and executable on the processor, the processor implementing the steps in a video encoding method according to any one of claims 1 to 7 when executing the computer program.
10. A computer-readable storage medium, having stored thereon a computer program which, when being executed by a processor, carries out the steps of a video encoding method as claimed in any one of claims 1 to 7.
CN201911423699.1A 2019-12-31 2019-12-31 Video coding method and device, electronic equipment and storage medium Pending CN111093077A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201911423699.1A CN111093077A (en) 2019-12-31 2019-12-31 Video coding method and device, electronic equipment and storage medium

Publications (1)

Publication Number Publication Date
CN111093077A true CN111093077A (en) 2020-05-01

Family

ID=70398701


Cited By (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN111696136A (en) * 2020-06-09 2020-09-22 电子科技大学 Target tracking method based on coding and decoding structure
CN112995761A (en) * 2021-03-08 2021-06-18 广州敏视数码科技有限公司 Target detection result and image original data hybrid transmission method
CN113689707A (en) * 2021-07-20 2021-11-23 浙江大华技术股份有限公司 Video data processing method, device and computer readable storage medium

Citations (9)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN108304758A (en) * 2017-06-21 2018-07-20 腾讯科技(深圳)有限公司 Facial features tracking method and device
CN108509940A (en) * 2018-04-20 2018-09-07 北京达佳互联信息技术有限公司 Facial image tracking, device, computer equipment and storage medium
CN108960090A (en) * 2018-06-20 2018-12-07 腾讯科技(深圳)有限公司 Method of video image processing and device, computer-readable medium and electronic equipment
CN109005409A (en) * 2018-07-27 2018-12-14 浙江工业大学 A kind of intelligent video coding method based on object detecting and tracking
CN109598744A (en) * 2018-11-29 2019-04-09 广州市百果园信息技术有限公司 A kind of method, apparatus of video tracking, equipment and storage medium
CN109698932A (en) * 2017-10-20 2019-04-30 杭州海康威视数字技术股份有限公司 Data transmission method and video camera, electronic equipment
CN110298867A (en) * 2019-06-21 2019-10-01 江西洪都航空工业集团有限责任公司 A kind of video target tracking method
CN110334635A (en) * 2019-06-28 2019-10-15 Oppo广东移动通信有限公司 Main body method for tracing, device, electronic equipment and computer readable storage medium
CN110580710A (en) * 2019-08-21 2019-12-17 深圳码隆科技有限公司 object tracking method, device, computer readable storage medium and computer equipment


Cited By (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN111696136A (en) * 2020-06-09 2020-09-22 University of Electronic Science and Technology of China Target tracking method based on an encoder-decoder structure
CN112995761A (en) * 2021-03-08 2021-06-18 Guangzhou Minshi Digital Technology Co., Ltd. Hybrid transmission method for target detection results and raw image data
CN113689707A (en) * 2021-07-20 2021-11-23 Zhejiang Dahua Technology Co., Ltd. Video data processing method and device, and computer-readable storage medium
CN113689707B (en) * 2021-07-20 2022-09-06 Zhejiang Dahua Technology Co., Ltd. Video data processing method and device, and computer-readable storage medium

Similar Documents

Publication Publication Date Title
US10728570B2 (en) Apparatus and methods for real time estimation of differential motion in live video
CN104954791B (en) Real-time key frame selection method in mine wireless distributed video coding
CN111093077A (en) Video coding method and device, electronic equipment and storage medium
CN108156519B (en) Image classification method, television device and computer-readable storage medium
US7167519B2 (en) Real-time video object generation for smart cameras
EP3583777A1 (en) A method and technical equipment for video processing
CN103460250A (en) Object of interest based image processing
US20230065862A1 (en) Scalable coding of video and associated features
CN104268900A (en) Moving object detection method and device
US11893761B2 (en) Image processing apparatus and method
CN116233445B (en) Video encoding and decoding processing method and device, computer equipment and storage medium
CN114363623A (en) Image processing method, image processing apparatus, image processing medium, and electronic device
CN114339238A (en) Video coding method, video decoding method and device thereof
Chen et al. Learning to compress videos without computing motion
Shao et al. Task-oriented communication for edge video analytics
US20200327332A1 (en) Moving image analysis apparatus, system, and method
Laumer et al. Moving object detection in the H.264/AVC compressed domain
US10536726B2 (en) Pixel patch collection for prediction in video coding system
CN105451023B (en) Motion-aware video storage system and method
CN116847087A (en) Video processing method and device, storage medium and electronic equipment
US11538169B2 (en) Method, computer program and system for detecting changes and moving objects in a video view
CN104486633A (en) Video error concealment method and apparatus
Wang et al. Content-based image retrieval using H.264 intra coding features
Khatoonabadi et al. Comparison of visual saliency models for compressed video
CN112861698A (en) Compressed domain behavior identification method based on multi-scale time sequence receptive field

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
RJ01 Rejection of invention patent application after publication

Application publication date: 20200501