CN113255564B - Real-time video identification accelerator based on key object splicing

Info

Publication number
CN113255564B
Authority
CN
China
Prior art keywords
frame
image data
frame image
information
small block
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN202110652261.1A
Other languages
Chinese (zh)
Other versions
CN113255564A (en)
Inventor
宋卓然
鲁恒
景乃锋
梁晓峣
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Shanghai Jiaotong University
Original Assignee
Shanghai Jiaotong University
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Shanghai Jiaotong University filed Critical Shanghai Jiaotong University
Priority to CN202110652261.1A priority Critical patent/CN113255564B/en
Publication of CN113255564A publication Critical patent/CN113255564A/en
Application granted granted Critical
Publication of CN113255564B publication Critical patent/CN113255564B/en
Active legal-status Critical Current
Anticipated expiration legal-status Critical

Classifications

    • G - PHYSICS
    • G06 - COMPUTING; CALCULATING OR COUNTING
    • G06V - IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V 20/00 - Scenes; Scene-specific elements
    • G06V 20/40 - Scenes; Scene-specific elements in video content
    • G06V 20/41 - Higher-level, semantic clustering, classification or understanding of video scenes, e.g. detection, labelling or Markovian modelling of sport events or news items
    • G - PHYSICS
    • G06 - COMPUTING; CALCULATING OR COUNTING
    • G06F - ELECTRIC DIGITAL DATA PROCESSING
    • G06F 18/00 - Pattern recognition
    • G06F 18/20 - Analysing
    • G06F 18/24 - Classification techniques
    • G - PHYSICS
    • G06 - COMPUTING; CALCULATING OR COUNTING
    • G06N - COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N 3/00 - Computing arrangements based on biological models
    • G06N 3/02 - Neural networks
    • G06N 3/04 - Architecture, e.g. interconnection topology
    • G06N 3/045 - Combinations of networks
    • G - PHYSICS
    • G06 - COMPUTING; CALCULATING OR COUNTING
    • G06N - COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N 3/00 - Computing arrangements based on biological models
    • G06N 3/02 - Neural networks
    • G06N 3/06 - Physical realisation, i.e. hardware implementation of neural networks, neurons or parts of neurons
    • G06N 3/063 - Physical realisation, i.e. hardware implementation of neural networks, neurons or parts of neurons using electronic means
    • G - PHYSICS
    • G06 - COMPUTING; CALCULATING OR COUNTING
    • G06V - IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V 20/00 - Scenes; Scene-specific elements
    • G06V 20/40 - Scenes; Scene-specific elements in video content
    • G06V 20/46 - Extracting features or characteristics from the video content, e.g. video fingerprints, representative shots or key frames
    • G - PHYSICS
    • G06 - COMPUTING; CALCULATING OR COUNTING
    • G06V - IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V 20/00 - Scenes; Scene-specific elements
    • G06V 20/40 - Scenes; Scene-specific elements in video content
    • G06V 20/49 - Segmenting video sequences, i.e. computational techniques such as parsing or cutting the sequence, low-level clustering or determining units such as shots or scenes

Abstract

The invention discloses a real-time video recognition accelerator based on key object splicing, which comprises an object tracking module, an object aggregation module, an object splitting module, a preset neural network accelerator, an update object queue module and a main memory module. The object tracking module acquires the original position information of the key object rectangular frame in P-frame or B-frame image data; the object aggregation module merges the key object rectangular frames of the P-frame and/or B-frame image data to obtain a composite frame; the preset neural network accelerator processes the composite frame to obtain a composite frame recognition result; the object splitting module splits the composite frame recognition result and maps the split results back to the original image data. The invention greatly reduces the computation workload of the target video recognition task and improves both the processing speed and the recognition accuracy of the task.

Description

Real-time video identification accelerator based on key object splicing
Technical Field
The invention relates to the technical field of neural networks, and in particular to a real-time video recognition accelerator based on key object splicing.
Background
Deep convolutional neural networks are widely applied in image recognition, for example in the classification, detection and segmentation of images. With their development, their range of application has gradually been extended to the video field.
In general, a video recognition task based on a deep neural network can treat each video frame as an independent picture and feed it to the network, that is, video recognition is reduced to per-frame image recognition. However, directly applying a network model designed for image recognition to all video frames incurs enormous computation and energy overhead; moreover, such networks are good at processing static objects and cannot capture the motion of objects between video frames, which results in low video recognition accuracy.
Researchers have therefore proposed deep neural network models dedicated to the video recognition task, which exploit the temporal locality between video frames to further improve accuracy. Caelles et al. propose a two-stream FCN model that segments the foreground and contour of each frame; but the two-stream FCN must still be applied to every frame, so the method costs a great deal of time and energy, and since it does not exploit the temporal locality between frames its accuracy is hard to guarantee. To achieve higher accuracy, Cheng et al. propose SegFlow, which uses a neural network to extract the inter-frame temporal locality in the form of optical flow, and then uses the optical flow to assist the per-frame recognition network in producing the final result. However, extracting the optical flow itself costs too much effort, so the speedup obtained on a Titan X GPU is also limited.
Disclosure of Invention
The invention aims to solve the technical problems that existing neural networks handling video recognition tasks usually have to process every frame of the video, which is time-consuming and energy-intensive, while the recognition accuracy is hard to guarantee and the recognition speed is hard to improve.
In order to solve the above technical problems, the present invention provides a real-time video recognition accelerator based on key object splicing, comprising an object tracking module, an object aggregation module, an object splitting module, a preset neural network accelerator, an update object queue module and a main memory module, wherein the update object queue module and the main memory module are each connected to the object tracking module, the object aggregation module and the object splitting module;
the main memory module is used for storing the composite frames, the recognition results of the I-frame, P-frame and B-frame images, the P-frame, B-frame and I-frame image data obtained by decoding the target video, the motion vector table and the intra-frame prediction mode table;
the object tracking module is used for acquiring the original position information of the key object rectangular frame in P-frame or B-frame image data, based on the motion vector table, the intra-frame prediction mode table and the image recognition results acquired so far;
the object aggregation module is used for merging the key object rectangular frames of P-frame and/or B-frame image data, based on their original position information and the corresponding frame image data, to obtain a composite frame together with the placement position information of each key object rectangular frame within the composite frame;
the preset neural network accelerator is used for processing the I-frame image data and the composite frames to obtain the I-frame image recognition results and the composite frame recognition results;
the update object queue module is used for storing the original position information of the key object rectangular frames in P-frame and B-frame image data and their placement position information within the composite frame;
the object splitting module is used for splitting the recognition result of a composite frame according to the placement position information of the key object rectangular frames of the P-frame and/or B-frame image data within that composite frame, and for mapping the split results back to the corresponding P-frame or B-frame image data according to the original position information of those rectangular frames, thereby obtaining the recognition results of the P-frame or B-frame image data;
wherein the image recognition results acquired so far comprise the recognition results of the I-frame image data and of the P-frame and/or B-frame image data already obtained.
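The cooperation of these modules can be pictured as the following minimal Python sketch of the dataflow only; all class, function and parameter names are illustrative assumptions, since the patent specifies hardware modules rather than a software API.

```python
from typing import Callable, Dict, List, Tuple

Rect = Tuple[int, int, int, int]  # (x0, y0, x1, y1): top-left and bottom-right corners

def recognize_video(frames: List[Tuple[str, int, object]],
                    accel_run: Callable,     # preset neural network accelerator
                    track_rect: Callable,    # object tracking module (assumed helper)
                    pack_rects: Callable,    # object aggregation module (assumed helper)
                    unpack: Callable) -> Dict[int, object]:
    """frames: (kind, frame_number, pixels) with kind in {'I', 'P', 'B'},
    in decode order.  The four callables stand in for the hardware modules."""
    results: Dict[int, object] = {}
    queue: Dict[int, Rect] = {}                    # the update object queue
    for kind, no, pixels in frames:
        if kind == 'I':
            results[no] = accel_run(pixels)        # I-frames go straight to the network
        else:
            queue[no] = track_rect(no, results)    # key object rectangle per P/B frame
    composite, placements = pack_rects(queue, frames)  # one composite frame
    comp_result = accel_run(composite)                 # recognized once
    for no, (orig_rect, placed_rect) in placements.items():
        # Object splitting: cut the composite result at the placed position
        # and write it back at the original position of frame `no`.
        results[no] = unpack(comp_result, placed_rect, orig_rect)
    return results
```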
Preferably, the object tracking module comprises a recovery unit and a classification unit, which are connected;
the recovery unit is used for storing the P-frame and B-frame image data information and determining from it the partition block to be processed and its block information; it acquires the motion vector or intra-frame prediction mode of that block from the main memory module, computes from it the address of the recognition result of the block's reference partition block, and sends that address to the main memory module and the block's original position information to the classification unit; once the last partition block of a frame has been handled, it sends frame-end information to the update object queue module so as to end the updating of the position of the key object rectangular frame in that P-frame or B-frame image data;
the classification unit is configured to receive the reference partition block recognition result that the main memory module returns for that address, and to judge from it whether the partition block to be processed is a key partition block; if so, it sends updated position information for the key object rectangular frame of the corresponding frame image data to the update object queue module, based on the block's original position information, and otherwise it sends nothing to the update object queue module.
Preferably, the recovery unit comprises a first storage subunit and an address calculation subunit, which are connected;
the first storage subunit is used for storing the P-frame and B-frame image data information, determining from it the partition block to be processed and its block information, acquiring the motion vector or intra-frame prediction mode of that block from the main memory module, and sending the motion vector or intra-frame prediction mode together with the block's original position information to the address calculation subunit; once a frame has been fully processed it sends frame-end information to the update object queue module so as to end the updating of the position of the key object rectangular frame in that P-frame or B-frame image data;
the address calculation subunit computes, from the motion vector or intra-frame prediction mode of the partition block to be processed, the address of the recognition result of the block's reference partition block, and sends that address to the main memory module and the block's original position information to the classification unit.
Preferably, determining the partition block to be processed from the P-frame and B-frame image data information comprises the following steps (a software sketch follows this list):
determining a preset tracking order from the P-frame and B-frame image data information;
taking the P-frame or B-frame image data in the preset tracking order, one frame at a time, as the frame image data to be processed;
taking the partition blocks of the frame image data to be processed, one at a time, as the temporary partition block;
reading from the preset neural network accelerator the frame number of the I-frame image data most recently processed, as the reference frame number, and judging whether the frame number of the frame image data to be processed is smaller than the reference frame number; if so, the temporary partition block becomes the partition block to be processed; otherwise the reference frame number is read again from the accelerator and the comparison is repeated;
the preset tracking order is the order of the target video with the I-frame image data removed.
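A minimal sketch of this selection loop, under the assumption that `last_i_frame_number()` polls the frame number of the most recently processed I-frame from the preset neural network accelerator; the `frame.number` and `frame.blocks` attributes are likewise illustrative.

```python
import time

def partition_blocks_to_process(pb_frames, last_i_frame_number):
    """pb_frames: P-frame and B-frame image data in the preset tracking
    order (decode order with I-frames removed).  Yields each partition
    block once it is safe to process."""
    for frame in pb_frames:                      # frame image data to be processed
        # A P/B frame may only be processed once its frame number is smaller
        # than that of the last I-frame processed by the accelerator;
        # otherwise poll again (this is what avoids deadlock).
        while not (frame.number < last_i_frame_number()):
            time.sleep(0)                        # re-read the reference frame number
        for block in frame.blocks:               # temporary partition blocks, in order
            yield block                          # now the partition block to be processed
```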
Preferably, the classification unit comprises a key partition block judgment subunit together with a first coordinate comparison subunit and a second coordinate comparison subunit, each connected to it:
the key partition block judgment subunit is connected to the main memory module and is used for receiving the reference partition block recognition result and judging whether it contains the preset pixel; if so, the partition block to be processed is judged to be a key partition block and a comparison signal is sent to the first and second coordinate comparison subunits, and otherwise the block is judged to be a non-key partition block;
the first coordinate comparison subunit is configured, upon receiving the comparison signal, to judge whether the top-left corner coordinate in the original position information of the partition block to be processed is smaller than the top-left corner coordinate of the corresponding key object rectangular frame; if so, it sends the block's top-left corner coordinate to the update object queue module as the top-left updated coordinate, and otherwise it sends nothing to the update object queue module;
the second coordinate comparison subunit is configured, upon receiving the comparison signal, to judge whether the bottom-right corner coordinate in the original position information of the partition block to be processed is greater than or equal to the bottom-right corner coordinate of the corresponding key object rectangular frame; if so, it sends the block's bottom-right corner coordinate to the update object queue module as the bottom-right updated coordinate, and otherwise it sends nothing to the update object queue module.
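In software terms, this classification step amounts to a mask test followed by a bounding-box union. A minimal sketch, assuming the recognition result of a block is a 2-D mask of pixel values and the "preset pixel" is the value 255:

```python
PRESET_PIXEL = 255  # assumed marker of the key object in the recognition mask

def classify_and_update(ref_result, block_rect, key_rect):
    """ref_result: recognition mask of the reference partition block (rows of
    pixel values).  block_rect / key_rect: (x0, y0, x1, y1) with (x0, y0) the
    top-left and (x1, y1) the bottom-right corner.  Returns the possibly
    enlarged key object rectangle; a non-key block leaves it unchanged."""
    if not any(p == PRESET_PIXEL for row in ref_result for p in row):
        return key_rect                          # non-key block: nothing is sent
    x0, y0, x1, y1 = key_rect
    bx0, by0, bx1, by1 = block_rect
    # First coordinate comparison subunit: move the top-left corner outward.
    x0, y0 = min(x0, bx0), min(y0, by0)
    # Second coordinate comparison subunit: move the bottom-right corner outward.
    x1, y1 = max(x1, bx1), max(y1, by1)
    return (x0, y0, x1, y1)
```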
Preferably, the key partition block judgment subunit comprises a first comparator and a second comparator, whose inputs are each connected to the main memory module;
the first coordinate comparison subunit comprises a first multiplexer and a third comparator; the inputs of the first multiplexer are connected to the recovery unit and to the update object queue module, as are the inputs of the third comparator; the output of the third comparator is connected to the output control terminal of the first multiplexer, and the output of the first comparator is connected to the switch control terminal of the first multiplexer;
the second coordinate comparison subunit comprises a second multiplexer and a fourth comparator; the inputs of the second multiplexer are connected to the recovery unit and to the update object queue module, as are the inputs of the fourth comparator; the output of the fourth comparator is connected to the output control terminal of the second multiplexer, and the output of the second comparator is connected to the switch control terminal of the second multiplexer.
Preferably, the object aggregation module comprises a segmentation unit, a free area selection unit and a composite frame generation unit, connected in sequence;
in what follows, the key object rectangular frame (of P-frame or B-frame image data) that is about to be placed is called the rectangular frame to be placed, and the free area into which it will be placed is called the free area to be placed;
the free area selection unit is used for selecting, from the free area list and based on the original position information of the rectangular frame to be placed, the free area to be placed; it sends the identifier of that free area to the segmentation unit, obtains the placement position information of the rectangular frame within the free area and sends it to the update object queue module, sends a synthesis completion instruction to the composite frame generation unit, and updates the free area list with the received identifier of the free area to be placed and the two new free areas;
the segmentation unit is used for reading the coordinate information of the free area to be placed from the free area list by its identifier, dividing that free area according to its coordinate information and the original position information of the rectangular frame to be placed so as to obtain two new free areas, and sending the identifier of the free area to be placed together with the two new free areas to the free area list;
the composite frame generation unit is used for generating an idle frame, fetching the corresponding key partition blocks from the main memory module according to the original position information of the rectangular frame to be placed, and placing them into the idle frame at the placement position of the rectangular frame, thereby forming the composite frame; upon receiving the synthesis completion instruction it finishes the current composite frame and sends it to the main memory module (see the sketch below).
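A sketch of the composite frame generation step, assuming `fetch_pixels(orig_rect)` stands in for the read of the key partition blocks from the main memory module and must return an array of the rectangle's shape:

```python
import numpy as np

def build_composite_frame(width, height, placements, fetch_pixels):
    """placements: list of (orig_rect, placed_rect) pairs, each rect being
    (x0, y0, x1, y1); fetch_pixels(orig_rect) returns the pixels of that
    rectangle from the original frame (assumed helper)."""
    composite = np.zeros((height, width, 3), dtype=np.uint8)   # the idle frame
    for orig, placed in placements:
        x0, y0, x1, y1 = placed
        composite[y0:y1, x0:x1] = fetch_pixels(orig)   # place at the chosen position
    return composite                                   # sent to the main memory module
```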
Preferably, the free area selection unit comprises a comparison operation subunit together with a parameter operation subunit and a free area list, both connected to the comparison operation subunit;
the free area list is connected to the composite frame generation unit and is used for storing the information of all free areas in the idle frame and for updating itself with the identifier of the free area to be placed and the two new free areas;
the parameter operation subunit is configured to calculate the comparison parameters of the rectangular frame to be placed from its original position information and to send them to the comparison operation subunit;
the comparison operation subunit is used for obtaining the first parameters of all free areas from the free area list and comparing them with the first parameters of the rectangular frame to be placed; all free areas whose first parameters exceed those of the rectangular frame are screened out to form the available free area set, and from this set the free area whose second parameter differs least from the second parameter of the rectangular frame is selected as the free area to be placed; the subunit sends the identifier of the free area to be placed to the segmentation unit, obtains the placement position information of the rectangular frame within the free area to be placed and sends it to the update object queue module, and sends a synthesis completion instruction to the composite frame generation unit;
here the height and width are the first parameters, the area is the second parameter, and together they form the comparison parameters.
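Functionally, the selection is a best-fit search: keep the free areas whose width and height both exceed the rectangle's, then take the one with the smallest area surplus. A minimal sketch with illustrative names:

```python
def select_free_area(free_areas, rect_w, rect_h):
    """free_areas: list of (identifier, width, height).  Returns the entry
    chosen as the free area to be placed, or None if nothing fits (in which
    case the current composite frame would be finished and a new one started)."""
    rect_area = rect_w * rect_h                  # second parameter of the rectangle
    usable = [(fid, w, h) for fid, w, h in free_areas
              if w > rect_w and h > rect_h]      # screening on the first parameters
    if not usable:
        return None
    # Smallest difference between second parameters, i.e. the tightest fit.
    return min(usable, key=lambda a: a[1] * a[2] - rect_area)
```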
Preferably, the comparison operation subunit comprises a reduction tree filter and several comparison circuits, each connected to the reduction tree filter;
each comparison circuit comprises a height comparator, a width comparator, an AND gate, a subtractor and a control switch; the outputs of the height comparator and the width comparator feed the two inputs of the AND gate, the output of the AND gate is connected to the control terminal of the control switch, the output of the subtractor is connected to the input of the control switch, and the output of the control switch is connected to the reduction tree filter.
The reduction tree filter is used for selecting, from the available free area set, the free area whose second parameter differs least from the second parameter of the rectangular frame to be placed, as the free area to be placed; it sends the identifier of that free area to the segmentation unit, obtains the placement position information of the rectangular frame within it and sends it to the update object queue module, and sends a synthesis completion instruction to the composite frame generation unit.
Preferably, dividing the free area to be placed according to the original position information of the rectangular frame to be placed comprises the following steps (a software sketch follows this list):
obtaining the height difference and the width difference between the free area to be placed and the rectangular frame to be placed, from the coordinate information of the free area and the original position information of the rectangular frame;
when the height difference is larger than the width difference, dividing the free area (with the rectangular frame placed in it) along the straight line of the bottom outer edge of the rectangular frame;
when the height difference is smaller than the width difference, dividing the free area (with the rectangular frame placed in it) along the straight line of the right outer edge of the rectangular frame.
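This is a guillotine split of the free area. A minimal sketch, under the assumption (not stated explicitly in the text) that the rectangle is placed at the top-left corner of its free area, and that a tie between the two differences is treated like the second case:

```python
def split_free_area(free, rect_w, rect_h):
    """free: (x, y, w, h); the key object rectangle of size rect_w x rect_h is
    assumed placed at the free area's top-left corner.  Returns the two new
    free areas that replace `free` in the free area list."""
    x, y, w, h = free
    if (h - rect_h) > (w - rect_w):
        # Height difference dominates: cut along the bottom outer edge
        # of the placed rectangle.
        return ((x + rect_w, y, w - rect_w, rect_h),   # right of the rectangle
                (x, y + rect_h, w, h - rect_h))        # full-width strip below the cut
    else:
        # Width difference dominates (or ties): cut along the right outer edge
        # of the placed rectangle.
        return ((x + rect_w, y, w - rect_w, h),        # full-height strip right of the cut
                (x, y + rect_h, rect_w, h - rect_h))   # below the rectangle
```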
Compared with the prior art, one or more embodiments of the above scheme can have the following advantages or beneficial effects:
The real-time video recognition accelerator based on key object splicing provided by the embodiments of the invention achieves task parallelism between object tracking, object aggregation and the preset neural network accelerator, so that the latency of composite frame generation is hidden. It further exploits the parallelism between the recovery and classification operations within object tracking, that is, the classification of a partition block can start as soon as its recovery operation finishes, again hiding composite frame generation latency. It also realizes object-level parallelism in the aggregation algorithm, that is, aggregation starts as soon as a key object rectangular frame has been determined. Finally, since the object splitting algorithm is independent of object tracking, object aggregation and the preset neural network accelerator, splitting can run in parallel with these preceding stages.
In essence, the accelerator aggregates the key objects of several consecutive video frames and feeds the composite frames to the preset neural network accelerator, thereby reducing the volume of data entering the accelerator; by squeezing the non-key information out of the input it eliminates the redundant computation associated with those video frames, greatly reducing the computation workload of the target video recognition task and improving both the processing speed and the recognition accuracy of the task.
Additional features and advantages of the invention will be set forth in the description which follows, and in part will be obvious from the description, or may be learned by the practice of the invention. The objectives and other advantages of the invention will be realized and attained by the structure particularly pointed out in the written description and claims hereof as well as the appended drawings.
Drawings
The accompanying drawings, which are included to provide a further understanding of the invention and are incorporated in and constitute a part of this specification, illustrate embodiments of the invention and together with the description serve to explain the principles of the invention and not to limit the invention. In the drawings:
FIG. 1 is a schematic structural diagram of a real-time video recognition accelerator based on key object splicing according to an embodiment of the present invention;
FIG. 2 is a schematic structural diagram of an object tracking module in a real-time video recognition accelerator based on key object splicing according to an embodiment of the present invention;
FIG. 3 is a schematic structural diagram of an object aggregation module in a real-time video recognition accelerator based on key object splicing according to an embodiment of the present invention.
Detailed Description
The embodiments of the present invention are described in detail below with reference to the drawings and examples, so that how the invention applies technical means to solve the technical problems and achieve the technical effects can be fully understood and reproduced. Provided there is no conflict, the embodiments and their features may be combined with one another, and the resulting technical solutions all fall within the scope of the present invention.
Deep convolutional neural networks are widely applied in image recognition, for example in the classification, detection and segmentation of images, and their range of application has gradually been extended to the video field. Researchers have proposed deep neural network models dedicated to the video recognition task which exploit the temporal locality between video frames to further improve accuracy. Caelles et al. propose a two-stream FCN model that segments the foreground and contour of each frame; but the two-stream FCN must still be applied to every frame, so the method costs much time and energy, and since it does not exploit the temporal locality between frames its accuracy is hard to guarantee. For higher accuracy, Cheng et al. propose SegFlow, which extracts the inter-frame temporal locality in the form of optical flow with one neural network and then uses the optical flow to assist the per-frame recognition network in producing the final result. However, extracting the optical flow costs too much effort, so the speedup obtained on a Titan X GPU is also limited.
These problems could be attacked with a pure software algorithm, but pure software cannot fully exploit the parallelism present in the algorithm, so the overall performance gain is limited. In addition, programming a conventional neural network accelerator is itself a difficulty: its software ecosystem is far less mature than the GPU's, and since code is hard to optimize specifically for such an accelerator, performance easily degrades.
Embodiment 1
To solve the above technical problems in the prior art, an embodiment of the invention provides a real-time video recognition accelerator based on key object splicing.
FIG. 1 is a schematic structural diagram of a real-time video recognition accelerator based on key object splicing according to an embodiment of the present invention. Referring to FIG. 1, the accelerator comprises an object tracking module, an object aggregation module, an object splitting module, a preset neural network accelerator, an update object queue module and a main memory module; the object tracking module, object aggregation module, object splitting module and preset neural network accelerator are each connected to the update object queue module and the main memory module, and the main memory module is further connected to the preset neural network accelerator.
In the recognition task of a target video there may be several objects to be recognized in each frame of image data, but for simplicity of description a single-object video segmentation task is taken as the example below; the accelerator applies equally to multi-object video segmentation and to video detection tasks. If several objects need to be recognized or detected, either all of them are set as key objects and the target video is recognized by the accelerator once, or the objects are taken as the key object one at a time and multi-object recognition is achieved by running the target video through the accelerator repeatedly. Of these two approaches, the latter is more efficient in the multi-object case.
The video coding standard of the target video handled by the real-time video recognition accelerator based on key object splicing must be one that distinguishes I-frame, B-frame and P-frame image data, divides each frame of image data into blocks of a preset size, and provides a motion vector table and an intra-frame prediction mode table. For example, the target video may be encoded under H.265, in which case the partition blocks are coding tree blocks, or under H.264, in which case the partition blocks are macroblocks. A motion vector is the quantity by which the video decoder expresses the motion trajectory of a partition block, recorded in the code stream as a dependency relationship. The decoder calls the frames and partition blocks that are depended upon reference frames and reference partition blocks, and the motion vector table records, for each B-frame and each P-frame of the target video, the reference frames it depends on and the reference partition block of each of its partition blocks.
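The two tables can be pictured as simple per-frame, per-block mappings; the layout below is a hypothetical illustration, not the patent's storage format:

```python
# frame number -> {block index (bx, by) -> (reference frame number, motion vector)}
motion_vector_table = {
    4: {(0, 0): (0, (3, -1)),    # block (0,0) of frame 4 references frame 0,
        (0, 1): (0, (0, 0))},    # displaced by (dx, dy) = (3, -1) resp. (0, 0)
}
# frame number -> {block index (bx, by) -> intra prediction mode (0..34 in H.265)}
intra_prediction_mode_table = {
    0: {(0, 0): 26, (0, 1): 10},
}
```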
It should be noted that each frame of image data may be divided into partition blocks as the basic unit; a typical partition block size is 8 x 8 pixels. During decoding, the video decoder decompresses the bitstream back into consecutive video frames in a specified decoding order. For I-frame image data, the partition blocks are intra-decoded; for P-frame and B-frame image data, the partition blocks are intra- or inter-decoded using reference partition blocks, motion vectors and residuals. Specifically: for an I-frame, each partition block selects, according to its intra prediction mode, a block in some direction (above, below, left, right, etc.) and adds the residual between them to obtain its final value. For a P-frame, each partition block may be intra-coded or inter-predicted, so when decoding a partition block of P-frame image data the video decoder first determines from the block information whether the mode is intra or inter; if intra, the block is intra-decoded; if inter, the decoder locates the reference partition block in a preceding reference frame according to the motion vector and adds the residual to obtain the block's final value. For B-frame image data, the mode must likewise be determined; if inter, the preset decoder locates the reference partition block, according to the motion vector, in a preceding or subsequent reference frame in playback order, and adds the residual to obtain the final value of each partition block (see the sketch below).
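For the inter case, the per-block arithmetic is just a displaced copy plus a residual. A minimal sketch (intra prediction, with its 35 H.265 modes, is omitted; bounds handling is ignored):

```python
import numpy as np

def decode_inter_block(ref_frame: np.ndarray, x: int, y: int,
                       mv: tuple, residual: np.ndarray, n: int = 8) -> np.ndarray:
    """Decode one n x n partition block of a P/B frame at position (x, y):
    locate the reference partition block through the motion vector (dx, dy)
    in the already-decoded reference frame, then add the residual."""
    dx, dy = mv
    ref_block = ref_frame[y + dy : y + dy + n, x + dx : x + dx + n]
    return ref_block + residual   # final value of the partition block
```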
Therefore, decoding the target video with a preset video decoder yields not only the I-frame, P-frame and B-frame image data of the target video but also the corresponding motion vector table and intra prediction mode table. The H.265 coding standard defines 35 intra prediction modes. Because the decoder follows the inter-frame dependencies when recording the decoding order, the decoding order usually differs from the playback order. For example, if (I0, B1, B2, B3, P4, I5, B6, P7) is the playback order of the video, then (I0, P4, B3, B2, B1, I5, P7, B6) is the actual decoding order, because B3 depends on I0 and P4. The decoder then converts the code stream back into a conventional frame sequence according to this decoding order. It should be noted that all decoded I-, P- and B-frames are written back to global storage or a buffer for display.
In the embodiment of the invention, the I-frame, P-frame and B-frame image data obtained by decoding, the motion vector table and the intra-frame prediction mode table are all stored in the main memory module; the main memory module also stores the composite frames produced by the object aggregation module and the I-frame image recognition results and composite frame recognition results produced by the preset neural network accelerator from the I-frame image data and the composite frames. The object tracking module, object aggregation module, object splitting module and preset neural network accelerator are all connected to the main memory module, so that each module can read data from it or store data into it.
To facilitate parallel operation of the modules, this embodiment provides a first storage subunit and a second storage subunit: the first stores the P-frame and B-frame image data information, the second the I-frame image data information. Preferably, the P-frame image data information at least comprises the location where the P-frame image data is stored in the main memory module and the locations where its motion vectors and intra prediction modes are stored; the B-frame and I-frame image data information have the same content as the P-frame image data information. Furthermore, the first storage subunit is placed inside the object tracking module and the second inside the preset neural network accelerator; by letting the first storage subunit communicate directly with the object tracking module and the second directly with the preset neural network accelerator, the object tracking operation and the neural network execution on I-frame image data are parallelized.
Specifically, the object tracking module is mainly configured to obtain the original position information of the key object rectangular frame in P-frame or B-frame image data, based on the motion vector table, the intra-frame prediction mode table and the image recognition results acquired so far, namely the recognition results of the I-frame, P-frame and B-frame image data already obtained.
FIG. 2 is a schematic structural diagram of an object tracking module in a real-time video recognition accelerator based on key object splicing according to an embodiment of the present invention. Referring to FIG. 2, the object tracking module comprises a recovery unit and a classification unit, which are connected. Since video frames are organized in units of partition blocks at the encoding and decoding stage, the recovery and classification operations can likewise proceed per partition block rather than per frame. The recovery unit mainly stores the P-frame and B-frame image data information and determines from it the partition block to be processed and the corresponding block information. The block information includes the location of the block's motion vector or intra prediction mode, so the motion vector or intra prediction mode of the block can be read from the main memory module accordingly; the address of the recognition result of the block's reference partition block is then computed from the motion vector or intra prediction mode, the address is sent to the main memory module, and the block's original position information is sent to the classification unit. In addition, the object tracking module sends frame-end information to the update object queue module so that the latter ends the updating of the key object rectangular frame position of the current P-frame or B-frame image data.
Further, the recovery unit essentially comprises the first storage subunit and an address calculation subunit, which are connected. Since the object tracking module in fact operates on one partition block at a time, it must determine the current block to be processed at every step. The first storage subunit therefore also determines the partition block to be processed from the P-frame and B-frame image data information, fetches the corresponding block information after the determination, reads the block's motion vector or intra prediction mode from the main memory module using the motion vector or intra prediction mode coordinate information in the block information, and sends the acquired motion vector or intra prediction mode together with the block's original position information to the address calculation subunit. The first storage subunit also sends frame-end information to the update object queue module once a frame of image data has completed the object tracking operation, that is, once the last partition block of the P-frame or B-frame image data has been processed as the partition block to be processed, so as to end the updating of that frame's key object rectangular frame position.
The address calculation subunit mainly computes, from the motion vector or intra prediction mode sent by the first storage subunit, the address of the recognition result of the reference partition block of the block to be processed, and sends the block's original position information to the classification unit and the address to the main memory module (a sketch follows).
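The address calculation reduces to indexing the reference block's recognition result in main memory. The linear layout below (per-frame stride, per-row stride, fixed entry size) is an assumption made purely for illustration:

```python
def reference_result_address(base, frame_stride, row_stride, entry_size,
                             ref_frame_no, x, y, mv, block=8):
    """Address of the recognition result of the reference partition block for
    the block at (x, y) with motion vector mv = (dx, dy).  All layout
    parameters (base, strides, entry size) are hypothetical."""
    dx, dy = mv
    bx, by = (x + dx) // block, (y + dy) // block   # reference block indices
    return base + ref_frame_no * frame_stride + by * row_stride + bx * entry_size
```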
Preferably, the first storage subunit determines the partition block to be processed as follows: a preset tracking order is determined from the P-frame and B-frame image data information; this order is essentially the decoding order of the target video with the I-frame image data removed. The frame image data to be processed is drawn from the P-frame and B-frame image data in this order, one frame at a time, and the temporary partition block is drawn from the frame image data to be processed, one block at a time, in their arrangement order. The frame number of the I-frame image data most recently processed is read from the preset neural network accelerator as the reference frame number; if the frame number of the frame image data to be processed is smaller than the reference frame number, the temporary partition block becomes the partition block to be processed, and otherwise the reference frame number is read again and the comparison repeated. By repeating this procedure, every partition block of the P-frame and B-frame image data is taken in turn as the partition block to be processed.
This determination method effectively guarantees the correctness of the target video recognition task and avoids deadlock. For example, suppose the first storage subunit holds the image data information of P4, B3, B2, B1, P8, B7, B6 and the second storage subunit holds that of I5, I9, with the head pointer of the first subunit at frame P4 and that of the second at frame I5. The frame number at the head of the first subunit is smaller than the frame number at the head of the second, so frame P4 can be executed before frame I5 and no deadlock occurs.
After receiving the address of the recognition result of a reference partition block, the main memory module returns the corresponding reference partition block recognition result to the classification unit. The classification unit then judges from this result whether the corresponding block to be processed is a key partition block; if so, it sends updated position information for the key object rectangular frame of the corresponding frame image data to the update object queue module, based on the block's original position information, and otherwise it sends nothing. Upon receiving updated position information, the update object queue module updates the position of the key object rectangular frame of the corresponding frame image data. Note that the classification operation can decide whether a block is a key partition block as soon as a single reference partition block recognition result has been received.
Further, the classification unit comprises a key segmentation small block judgment subunit, and a first coordinate comparison subunit and a second coordinate comparison subunit which are respectively connected with the key segmentation small block judgment subunit. The key segmentation small block judgment subunit is connected with the main memory module and is mainly used for receiving the identification result of the reference segmentation small block and judging whether the identification result of the reference segmentation small block contains preset pixels or not, if yes, the to-be-processed segmentation small block is judged to be the key segmentation small block, a comparison signal is sent to the first coordinate comparison subunit and the second coordinate comparison subunit, and if not, the to-be-processed segmentation small block is judged to be the non-key segmentation small block. Preferably, the preset pixel may be a white pixel or a pixel having a pixel value of 255. The non-key segmentation small blocks are segmentation small blocks which do not need to be processed by a preset neural network accelerator, and therefore the non-key segmentation small blocks do not need to be added into the key object rectangular frame. The first coordinate comparison subunit and the second coordinate comparison subunit determine the subunit of the updated coordinate of the corresponding key object rectangular frame by respectively comparing the coordinates of the upper left corner and the lower right corner of the key segmentation small block with the coordinates of the upper left corner and the lower right corner of the corresponding key object rectangular frame. Further, after receiving the comparison signal, the first coordinate comparison subunit determines whether the upper left-hand coordinate in the original position information of the segmented small block to be processed is smaller than the upper left-hand coordinate of the corresponding key object rectangular frame, if so, sends the upper left-hand coordinate in the original position information of the segmented small block to be processed as the upper left-hand update coordinate information to the update object queue module, otherwise, does not send information to the update object queue module (or returns the upper left-hand coordinate of the corresponding key object rectangular frame to the update object queue module). Meanwhile, after receiving the comparison signal, the second coordinate comparison subunit judges whether the upper right-hand coordinate in the original position information of the to-be-processed segmented small block is smaller than the upper right-hand coordinate of the corresponding key object rectangular frame, if so, the upper right-hand coordinate in the original position information of the to-be-processed segmented small block is taken as the upper right-hand update coordinate information and sent to the update object queue module, and if not, the second coordinate comparison subunit does not send information to the update object queue module (or can return the upper left-hand coordinate of the corresponding key object rectangular frame to the update object queue module). And the key object rectangular frame corresponding to the segmentation small block to be processed is the key object rectangular frame in the current state in the frame image data to which the segmentation small block to be processed belongs.
Further, referring to fig. 2, the classification unit may have the following specific structure: the key segmentation small block judgment subunit comprises a first comparator and a second comparator, wherein the input ends of the first comparator and the second comparator are respectively connected with the main memory module; the first comparator and the second comparator are respectively used for receiving the identification result of the reference segmentation small block and judging whether the identification result of the reference segmentation small block contains preset pixels or not, if yes, the segmentation small block to be processed is judged to be a key segmentation small block, meanwhile, the first price comparator sends a comparison signal to the first coordinate comparison subunit, the second price comparator sends the comparison signal to the second coordinate comparison subunit, and if not, the segmentation small block to be processed is judged to be a non-key segmentation small block.
The first coordinate comparison subunit comprises a first multiplexer and a third comparator, wherein the input end of the first multiplexer is respectively connected with the address calculation subunit of the recovery unit and the update object queue module, so that the first multiplexer acquires the upper left-hand coordinates in the original position information of the small blocks to be processed and the upper left-hand coordinates of the current key object rectangular frame to be updated; meanwhile, the input end of a third comparator is also respectively connected with the address calculation subunit of the recovery unit and the update object queue module, so that the third comparator can judge whether the upper left-hand coordinate in the original position information of the to-be-processed segmented small block is smaller than the upper left-hand coordinate of the current to-be-updated key object rectangular frame; the output end of the third comparator is connected with the output control end of the first multi-path comparator, so that the first multi-path comparator can control the output content according to the judgment result of the third comparator; the output end of the first comparator is connected with the switch control end of the first multiplexer, so that the working state of the first multiplexer is controlled based on the comparison result of the first comparator. Similarly, the second coordinate comparison subunit comprises a second multiplexer and a fourth comparator, the input end of the second multiplexer is respectively connected with the address calculation subunit and the update object queue module of the recovery unit, the input end of the fourth comparator is respectively connected with the address calculation subunit and the update object queue module of the recovery unit, the output end of the fourth comparator is connected with the output control end of the second multiplexer, and the output end of the second comparator is connected with the switch control end of the second multiplexer.
The recovery unit and the classification unit accelerate object tracking through fine-grained block-level parallelism, and this block-level parallelism also greatly reduces the on-chip cache overhead.
The object aggregation module is mainly used for merging the key object rectangular frame in the P frame image data and/or the key object rectangular frame in the B frame image data, based on the original position information of the key object rectangular frame in the P frame image data and the corresponding P frame image data, and/or the original position information of the key object rectangular frame in the B frame image data and the corresponding B frame image data, to obtain the composite frame, the placement position information of the key object rectangular frame in the P frame image data in the composite frame, and/or the placement position information of the key object rectangular frame in the B frame image data in the composite frame. In short, the object aggregation module finds a suitable placement position for each key object rectangular frame, so that the key object rectangular frames are placed into the composite frame as compactly as possible; the corresponding original image pixels are then taken out of the main memory module according to the coordinates of the key object rectangular frames in the composite frame and filled into the composite frame, constructing a brand-new composite frame.
Fig. 3 shows a schematic structural diagram of the object aggregation module in a real-time video recognition accelerator based on key object splicing according to an embodiment of the present invention. Referring to fig. 3, the object aggregation module comprises a segmentation unit, an idle area selection unit and a composite frame generation unit which are connected in sequence.
First, a key object rectangular frame in the P frame image data or a key object rectangular frame in the B frame image data that is to be placed is referred to as the key object rectangular frame to be placed, and the idle area in which the key object rectangular frame to be placed is to be placed is referred to as the idle area to be placed. The idle area selection unit is mainly used for selecting the idle area to be placed from the idle area list based on the original position information of the key object rectangular frame to be placed and sending an identifier of the idle area to be placed to the segmentation unit; it is also used for acquiring the placement position information of the key object rectangular frame to be placed in the idle area to be placed and sending it to the update object queue module, for sending a synthesis completion instruction to the composite frame generation unit, and for updating the idle area list based on the received identifier of the idle area to be placed and the two new idle area information.
The height and the width are set as the first parameters and the area as the second parameter; the first parameters and the second parameter together constitute the comparison parameters. Further, the idle area selection unit specifically comprises a comparison operation subunit, and a parameter operation subunit and an idle area list which are connected with the comparison operation subunit. The idle area list is connected with the composite frame generation unit and is used for storing the idle area information of all idle areas in the idle frame generated by the composite frame generation unit; meanwhile, the idle area list updates itself based on the received identifier of the idle area to be placed and the two new idle area information. The parameter operation subunit reads the original position information of the current key object rectangular frame to be placed from the update object queue module, calculates the comparison parameters of the key object rectangular frame to be placed based on its original position information, and sends these comparison parameters to the comparison operation subunit.
The comparison operation subunit is used for acquiring the first parameters of all idle areas from the idle area list, comparing the first parameters of each idle area with the first parameters of the key object rectangular frame to be placed, and screening out all idle areas whose first parameters are larger than the first parameters of the key object rectangular frame to be placed, forming an available idle area set. It then screens out, from the available idle area set, the idle area whose second parameter has the smallest difference from the second parameter of the key object rectangular frame to be placed, takes that idle area as the idle area to be placed (this step can be realized by a reduction tree algorithm), and sends the identifier of the idle area to be placed to the segmentation unit so that the segmentation unit can acquire the idle area to be placed. The comparison operation subunit is also used for acquiring the placement position information of the key object rectangular frame to be placed in the idle area to be placed, sending it to the update object queue module, and sending a synthesis completion instruction to the composite frame generation unit.
Referring to fig. 3, the comparison operation subunit may specifically comprise a reduction tree filter and a plurality of comparison circuits respectively connected to the reduction tree filter; the comparison circuits have identical circuit configurations. Each comparison circuit may further comprise a height comparator, a width comparator, an AND gate device, a subtracter and a control switch. The output ends of the height comparator and the width comparator are respectively connected with the two input ends of the AND gate device, the output end of the AND gate device is connected with the output control end of the control switch, the output end of the subtracter is connected with the input end of the control switch, and the output end of the control switch is connected with the reduction tree filter. The height comparator compares the height of the key object rectangular frame to be placed with the height information of a given idle area in the idle area list, and the width comparator compares the width of the key object rectangular frame to be placed with the width information of that idle area. The AND gate device thus screens out the idle areas whose height and width are both larger than the height and width of the key object rectangular frame to be placed; the set of these idle areas is the available idle area set. The subtracter computes the difference between the area of the idle area and the area of the key object rectangular frame to be placed, and the control switch outputs this area difference only for idle areas whose height and width both exceed those of the key object rectangular frame to be placed.
The reduction tree filter screens out, from the available idle area set, the idle area having the smallest area difference from the key object rectangular frame to be placed as the idle area to be placed, and sends the identifier of the idle area to be placed to the segmentation unit. Meanwhile, the reduction tree filter also calculates the placement position information of the key object rectangular frame to be placed in the idle area to be placed and sends it to the update object queue module. In addition, if the reduction tree filter receives no area difference value within a preset time, it sends a synthesis completion instruction to the composite frame generation unit, indicating that the current composite frame has completed the synthesis operation.
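For clarity, the selection logic realised by the comparison circuits and the reduction tree filter corresponds to the following software sketch. It is a sequential stand-in (the hardware performs the minimum search with a reduction tree in parallel), each idle area is assumed to be recorded as (identifier, x, y, width, height), and all names are hypothetical.

def select_idle_area(free_list, box_w, box_h):
    """Return the identifier of the idle area whose height and width both
    exceed those of the key object rectangular frame (first parameters) and
    whose area difference (second parameter) is smallest, or None."""
    box_area = box_w * box_h
    best_id, best_diff = None, None
    for ident, _x, _y, w, h in free_list:
        if w >= box_w and h >= box_h:             # AND-gated height/width test
            diff = w * h - box_area               # subtracter output
            if best_diff is None or diff < best_diff:  # reduction-tree minimum
                best_id, best_diff = ident, diff
    return best_id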
The segmentation unit is used for acquiring the coordinate information of the idle area to be placed from the idle area list based on the identifier of the idle area to be placed, segmenting the idle area to be placed based on its coordinate information and the original position information of the key object rectangular frame to be placed to obtain two new idle areas, and sending the identifier of the idle area to be placed and the two new idle area information to the idle area list. The specific way of segmenting the idle area to be placed is as follows: the height difference and the width difference between the idle area to be placed and the key object rectangular frame to be placed are obtained based on the coordinate information of the idle area to be placed and the original position information of the key object rectangular frame to be placed; when the height difference is larger than the width difference, the idle area to be placed in which the key object rectangular frame has been placed is divided along the straight line on which the outer edge of the bottom side of the key object rectangular frame lies; when the height difference is smaller than the width difference, the idle area to be placed is divided along the straight line on which the outer edge of the right side of the key object rectangular frame lies.
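Under the assumption that each key object rectangular frame is placed at the top-left corner of its idle area and that idle areas are recorded as (x, y, width, height) tuples, the splitting rule just described corresponds to the following sketch (names hypothetical):

def split_idle_area(area, box_w, box_h):
    """Cut the remaining space of an idle area into two new idle areas after
    a box of size (box_w, box_h) is placed at the area's top-left corner."""
    x, y, w, h = area
    height_diff, width_diff = h - box_h, w - box_w
    if height_diff > width_diff:
        # Cut along the bottom outer edge of the placed box:
        return [(x + box_w, y, w - box_w, box_h),   # strip right of the box
                (x, y + box_h, w, h - box_h)]       # full-width strip below
    else:
        # Cut along the right outer edge of the placed box:
        return [(x, y + box_h, box_w, h - box_h),   # strip below the box
                (x + box_w, y, w - box_w, h)]       # full-height strip right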
The composite frame generation unit is used for generating an idle frame as the synthesis basis of the composite frame, before a key object rectangular frame is placed for the first time or after the previous composite frame has been synthesized and sent to the main memory module. Meanwhile, the composite frame generation unit acquires the corresponding key segmentation small blocks from the main memory module based on the original position information of the key object rectangular frame to be placed, places them in the idle frame based on the placement position information of the key object rectangular frame to be placed in the idle area to be placed to form the composite frame, and sends the current composite frame to the main memory module after receiving the synthesis completion instruction.
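A software analogue of the filling step, using NumPy arrays as a stand-in for the main memory module (the names and array layout are assumptions, not the patented datapath):

import numpy as np

def place_box(composite, source_frame, orig, placed, box_w, box_h):
    """Copy the pixels of one key object rectangular frame from its source
    frame into the composite frame at its placement position."""
    ox, oy = orig        # original top-left corner in the source frame
    px, py = placed      # placement top-left corner in the composite frame
    composite[py:py + box_h, px:px + box_w] = \
        source_frame[oy:oy + box_h, ox:ox + box_w]

# Usage sketch: a 1080p idle frame as the synthesis basis.
composite = np.zeros((1080, 1920, 3), dtype=np.uint8)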
The update object queue module is used for storing the original position information of the key object rectangular frame in the P frame image data and the original position information of the key object rectangular frame in the B frame image data generated by the object tracking module, and for storing the placement position information of the key object rectangular frame in the P frame image data in the composite frame and the placement position information of the key object rectangular frame in the B frame image data in the composite frame generated by the object aggregation module; it also allows each module to read the data it requires.
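As a data-structure illustration only, each entry of the update object queue can be thought of as pairing a key object rectangular frame's original position with its placement position in the composite frame; the record below and its field names are hypothetical:

from dataclasses import dataclass
from typing import Optional, Tuple

@dataclass
class KeyObjectEntry:
    frame_id: int                    # P or B frame the box belongs to
    orig_tl: Tuple[int, int]         # original upper left corner
    orig_br: Tuple[int, int]         # original lower right corner
    placed_tl: Optional[Tuple[int, int]] = None  # upper left corner in the
                                                 # composite frame, filled in by
                                                 # the object aggregation module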
The object splitting module is mainly used for splitting the recognition result of the synthetic frame based on the placement position information of the key object rectangular frame in the P frame image data and/or the placement position information of the key object rectangular frame in the B frame image data in the synthetic frame, and returning the split result to the corresponding P frame image data or B frame image data based on the original position information of the key object rectangular frame in the P frame image data and/or the original position information of the key object rectangular frame in the B frame image data to obtain the recognition result of the P frame image data or the recognition result of the B frame image data.
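The splitting step thus amounts to the reverse of the filling step: crop each key object's region out of the composite frame's recognition result and paste it back at the object's original position. A sketch under the same assumed array conventions as above (names hypothetical; composite_result and the entries of frame_results are NumPy-style 2-D arrays):

def split_results(composite_result, update_queue, frame_results):
    """Crop each placed box out of the composite frame's recognition result
    and write it back at its original position in the per-frame results."""
    for frame_id, (ox, oy), (px, py), w, h in update_queue:
        crop = composite_result[py:py + h, px:px + w]
        frame_results[frame_id][oy:oy + h, ox:ox + w] = crop
    return frame_results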
The real-time video recognition accelerator based on key object splicing provided by the embodiment of the invention can realize task parallelism among object tracking, object aggregation and the preset neural network accelerator, thereby hiding the delay generated during composite frame generation. It can further realize parallelism between the recovery operation and the classification operation in the object tracking task, i.e., as soon as the recovery operation for one segmentation small block is finished, the classification operation can be carried out immediately, which likewise hides the delay generated during composite frame generation. Further, object-level parallelism in the object aggregation algorithm can be realized, i.e., the aggregation operation starts immediately once a key object rectangular frame is determined. Furthermore, the object splitting algorithm is independent of object tracking, object aggregation and the preset neural network accelerator, so that the object splitting algorithm can run in parallel with the preceding processes.
To describe the cooperation among the modules in the real-time video recognition accelerator based on key object splicing in the embodiment of the present invention, a section of the working process of the accelerator is described below as an example. All objects involved are example objects: the frame image data involved is referred to as the example frame image data, and the segmentation small blocks involved as the example segmentation small blocks. Since the process by which the preset neural network accelerator processes the I-type frame image data to obtain the I-type frame image recognition result is not the focus of this description, that process is not described in detail.
First, the first storage subunit in the recovery unit of the object tracking module determines the example segmentation small block as the segmentation small block to be processed according to the determination method described above, and acquires the example segmentation small block information. From the example segmentation small block information it reads the original position information of the example segmentation small block and the motion vector coordinate information or intra-frame prediction mode coordinate information of the example segmentation small block; based on that coordinate information, it reads the motion vector or intra-frame prediction mode of the example segmentation small block from the corresponding position of the main memory module, and sends the motion vector or intra-frame prediction mode together with the original position information of the example segmentation small block to the address calculation subunit of the recovery unit.
After receiving the motion vector or intra-frame prediction mode of the example segmentation small block, the address calculation subunit in the recovery unit calculates the address information of the reference segmentation small block identification result of the example segmentation small block based on the motion vector or intra-frame prediction mode, sends that address information to the main memory module, and sends the original position information of the example segmentation small block to the classification unit of the object tracking module. After receiving the address information of the reference segmentation small block identification result of the example segmentation small block, the main memory module returns the corresponding reference segmentation small block identification result to the classification unit of the object tracking module.
The first comparator and the second comparator in the classification unit receive the identification result of the example reference segmentation small block and each judge whether it contains white pixels; if so, the first comparator sends a start signal to the first multiplexer and the second comparator sends a start signal to the second multiplexer. The third comparator and the fourth comparator respectively receive the original position of the example segmentation small block and read, from the update object queue module, the original position information of the current example key object rectangular frame in the example frame image data to which the example segmentation small block belongs. The third comparator then judges whether the upper left corner coordinate of the original position of the example segmentation small block is smaller than the upper left corner coordinate in the original position information of the example key object rectangular frame; if so, it signals the first multiplexer to output the upper left corner coordinate of the original position of the example segmentation small block as the new upper left corner coordinate in the original position information of the example key object rectangular frame; if not, it signals the first multiplexer to output the original upper left corner coordinate in the original position information of the example key object rectangular frame unchanged. Similarly, the fourth comparator controls the output of the second multiplexer to determine the lower right corner coordinate in the original position information of the example key object rectangular frame.
Repeating the above steps yields the final original position information of the key object rectangular frame of the example frame image data, which is stored in the update object queue module; this key object rectangular frame is referred to as the example key object rectangular frame.
After the example key object rectangular frame is obtained, the parameter operation subunit in the idle area selection unit of the object aggregation module acquires the original position information of the example key object rectangular frame from the update object queue module, calculates the height, width and area of the example key object rectangular frame based on that information, and sends them to the comparison operation subunit. Each comparison circuit in the comparison operation subunit compares the height of the example key object rectangular frame with the height of one idle area in the idle area list through its height comparator, compares the width of the example key object rectangular frame with the width of the same idle area through its width comparator, and judges through its AND gate device whether the height and width of the example key object rectangular frame are both smaller than those of the idle area; in this way, the comparison operation subunit uses the plurality of comparison circuits to screen out the idle areas that meet the requirements. The comparison operation subunit then computes, through the subtracters, the area differences between the example key object rectangular frame and the screened idle areas, screens out the idle area to be placed for the example key object rectangular frame through the reduction tree filter, and sends the identifier of the idle area to be placed to the segmentation unit of the object aggregation module. Meanwhile, the placement position information of the example key object rectangular frame in the idle area to be placed is obtained through the reduction tree filter and sent to the update object queue module.
The segmentation unit then acquires the coordinate information of the idle area to be placed from the idle area list based on its identifier, segments the idle area to be placed in the manner described above based on that coordinate information and the original position information of the example key object rectangular frame, obtains two new idle areas, and sends the identifier of the idle area to be placed together with the two new idle area information to the idle area list; the idle area list updates itself accordingly.
The object aggregation module generates an idle frame, reads original position information of the example key object rectangular frame and placement position information of the example key object rectangular frame from the update object queue module, reads corresponding key segmentation small blocks from the main memory module based on the original position information of the example key object rectangular frame, and places the corresponding key segmentation small blocks in the idle frame based on the placement position information of the example key object rectangular frame to form a composite frame.
The synthesis of the composite frame is completed by repeating the above steps; the resulting composite frame is referred to as the example composite frame and is sent to the main memory module.
The preset neural network accelerator then reads the example composite frame from the main memory module, processes it to obtain the identification result of the example composite frame, and stores that identification result in the main memory module.
The object splitting module reads the identification result of the example composite frame from the main memory module, splits it based on the placement position information of all the key object rectangular frames it contains, and returns the split results to the corresponding frame image data based on the original position information of those key object rectangular frames.
In the real-time video recognition accelerator based on key object splicing of the present invention, the key objects in a plurality of consecutive video frames are aggregated and the composite frame is used as the input of the preset neural network accelerator, which reduces the amount of data input to the preset neural network accelerator; that is, by squeezing the non-key information out of the input to the preset neural network accelerator, the redundant computation corresponding to the video frames is reduced, greatly saving the computational workload of the target video recognition task and improving the processing speed and recognition accuracy of the recognition task.
Although the embodiments of the present invention have been described above, the above description is only for the convenience of understanding the present invention, and is not intended to limit the present invention. It will be understood by those skilled in the art that various changes in form and details may be made therein without departing from the spirit and scope of the invention as defined by the appended claims.

Claims (10)

1. A real-time video recognition accelerator based on key object stitching, comprising: the system comprises an object tracking module, an object aggregation module, an object splitting module, a preset neural network accelerator, an update object queue module and a main memory module, wherein the update object queue module and the main memory module are respectively connected with the object tracking module, the object aggregation module and the object splitting module;
the main memory module is used for storing the synthesized frame, the identification result of the I-type frame image, the identification result of the P-type frame image, the identification result of the B-type frame image, the P-type frame image data obtained after the target video is decoded, the B-type frame image data, the I-type frame image data, the motion vector table and the intra-frame prediction mode table;
the object tracking module is used for acquiring original position information of a key object rectangular frame in P frame image data or original position information of the key object rectangular frame in B frame image data based on the motion vector table, the intra-frame prediction mode table and the acquired image identification result;
the object aggregation module is used for merging the key object rectangular frame in the P frame image data and/or the key object rectangular frame in the B frame image data based on the original position information of the key object rectangular frame in the P frame image data, the corresponding P frame image data and/or the original position information of the key object rectangular frame in the B frame image data and the corresponding B frame image data to obtain a composite frame, the placement position information of the key object rectangular frame in the composite frame in the P frame image data and/or the placement position information of the key object rectangular frame in the composite frame in the B frame image data;
the preset neural network accelerator is used for processing the I-type frame image data and the synthetic frame to obtain an I-type frame image recognition result and a synthetic frame recognition result;
the updating object queue module is used for storing original position information of a key object rectangular frame in P frame image data, original position information of a key object rectangular frame in B frame image data, placement position information of the key object rectangular frame in the P frame image data in a composite frame and placement position information of the key object rectangular frame in the B frame image data in the composite frame;
the object splitting module is used for splitting the identification result of the corresponding composite frame based on the placement position information of the key object rectangular frame in the P frame image data and/or the placement position information of the key object rectangular frame in the B frame image data in the composite frame, and returning the split result to the corresponding P frame image data or B frame image data based on the original position information of the key object rectangular frame in the P frame image data and/or the original position information of the key object rectangular frame in the B frame image data to obtain the identification result of the P frame image data or the identification result of the B frame image data;
wherein the acquired image recognition result comprises a recognition result of the acquired I-frame image data, a recognition result of the P-frame image data and/or a recognition result of the B-frame image data.
2. The accelerator according to claim 1, wherein the object tracking module comprises a recovery unit and a classification unit connected;
the recovery unit is used for storing P-type frame image data information and B-type frame image data information, determining a segmentation small block to be processed and segmentation small block information to be processed based on the P-type frame image data information and the B-type frame image data information, acquiring a motion vector or an intra-frame prediction mode of the segmentation small block to be processed from the main memory module based on the segmentation small block information to be processed, acquiring address information of a reference segmentation small block identification result of the segmentation small block to be processed based on the acquired motion vector or intra-frame prediction mode, then respectively sending the address information of the reference segmentation small block identification result of the segmentation small block to be processed and original position information of the segmentation small block to be processed in the segmentation small block information to the main memory module and the classification unit, and simultaneously sending frame end information to the update object queue module, to end the updating of the position information of the key object rectangular frame in the P frame image data or the B frame image data;
the classification unit is configured to receive a reference segmentation small block identification result returned by the main memory module based on address information of a reference segmentation small block identification result of the to-be-processed segmentation small block, and determine whether the to-be-processed segmentation small block is a key segmentation small block based on the received reference segmentation small block identification result, if so, send update position information of a key object rectangular frame in corresponding frame image data to the update object queue module based on original position information of the to-be-processed segmentation small block, and otherwise, not send information to the update object queue module.
3. The accelerator of claim 2, wherein the recovery unit comprises a first storage subunit and an address calculation subunit connected;
the first storage subunit is used for storing P-type frame image data information and B-type frame image data information, determining a to-be-processed segmented small block and to-be-processed segmented small block information based on the P-type frame image data information and the B-type frame image data information, acquiring a motion vector or an intra-frame prediction mode of the to-be-processed segmented small block from the main memory module based on the to-be-processed segmented small block information, then sending the motion vector or the intra-frame prediction mode of the to-be-processed segmented small block and original position information of the to-be-processed segmented small block in the to-be-processed segmented small block information to the address calculation subunit, and simultaneously sending frame end information to the update object queue module to end updating of key object rectangular frame position information in the P-type frame image data or the B-type frame image data;
the address calculation subunit obtains address information of a reference segmentation small block identification result of the segmentation small block to be processed based on the motion vector or intra-frame prediction mode of the segmentation small block to be processed, and respectively sends the address information of the reference segmentation small block identification result of the segmentation small block to be processed and original position information of the segmentation small block to be processed to the main memory module and the classification unit.
4. The accelerator of claim 3, wherein determining a split bin to be processed based on the P-type frame image data information and B-type frame image data information comprises:
determining a preset tracking sequence based on the P-type frame image data information and the B-type frame image data information;
determining frame image data to be processed by sequentially using P frame image data or B frame image data as the frame image data to be processed according to a preset tracking sequence;
determining temporary segmentation small blocks by sequentially taking the segmentation small blocks in the frame image data to be processed as temporary segmentation small blocks;
reading the frame number of the last I frame image data which is processed currently from the preset neural network accelerator as a reference frame number, judging whether the frame number of the frame image data to be processed is smaller than the reference frame number, if so, determining the temporary segmentation small block as a segmentation small block to be processed, otherwise, reading the frame number of the last I frame image data which is processed currently from the preset neural network accelerator as the reference frame number again, and judging whether the frame number of the frame image data to be processed is smaller than the reference frame number again;
the preset tracking sequence is the sequence of the target video after I-type frame image data is eliminated.
5. The accelerator according to claim 2, wherein the classification unit comprises a key segmentation patch judgment subunit and a first coordinate comparison subunit and a second coordinate comparison subunit connected to the key segmentation patch judgment subunit, respectively:
the key segmentation small block judgment subunit is connected with the main memory module and used for receiving a reference segmentation small block identification result and judging whether the reference segmentation small block identification result contains preset pixels or not, if yes, the to-be-processed segmentation small block is judged to be a key segmentation small block, a comparison signal is sent to the first coordinate comparison subunit and the second coordinate comparison subunit, and if not, the to-be-processed segmentation small block is judged to be a non-key segmentation small block;
the first coordinate comparison subunit is configured to, after receiving the comparison signal, determine whether an upper left corner coordinate in the original position information of the to-be-processed segmented small block is smaller than an upper left corner coordinate of a corresponding key object rectangular frame, if so, send the upper left corner coordinate in the original position coordinate of the to-be-processed segmented small block to the updated object queue module as upper left updated coordinate information, and otherwise, send no information to the updated object queue module;
and the second coordinate comparison subunit is configured to, after receiving the comparison signal, determine whether a lower-right corner coordinate in the original position information of the to-be-processed segmented small block is greater than or equal to a lower-right corner coordinate of a corresponding key object rectangular frame, if so, send the lower-right corner coordinate in the original position coordinate of the to-be-processed segmented small block as lower-right update coordinate information to the update object queue module, and otherwise, not send information to the update object queue module.
6. The accelerator according to claim 5,
the key segmentation small block judgment subunit comprises a first comparator and a second comparator, and the input end of the first comparator and the input end of the second comparator are respectively connected with the main memory module;
the first coordinate comparison subunit comprises a first multiplexer and a third comparator, wherein the input end of the first multiplexer is respectively connected with the recovery unit and the updated object queue module, the input end of the third comparator is respectively connected with the recovery unit and the updated object queue module, the output end of the third comparator is connected with the output control end of the first multiplexer, and the output end of the first comparator is connected with the switch control end of the first multiplexer;
the second coordinate comparison subunit comprises a second multiplexer and a fourth comparator, the input end of the second multiplexer is respectively connected with the recovery unit and the updated object queue module, the input end of the fourth comparator is respectively connected with the recovery unit and the updated object queue module, the output end of the fourth comparator is connected with the output control end of the second multiplexer, and the output end of the second comparator is connected with the switch control end of the second multiplexer.
7. The accelerator according to claim 1, wherein the object aggregation module comprises a segmentation unit, an idle region selection unit and a composite frame generation unit connected in sequence;
the method comprises the steps that a key object rectangular frame in P frame image data or a key object rectangular frame in B frame image data that is to be placed is assumed to be a key object rectangular frame to be placed, and an idle area in which the key object rectangular frame to be placed is to be placed is assumed to be an idle area to be placed;
the idle area selection unit is used for selecting an idle area to be placed in an idle area list based on the original position information of the rectangular frame of the key object to be placed, sending an identifier of the idle area to be placed to the segmentation unit, acquiring the placement position information of the rectangular frame of the key object to be placed in the idle area to be placed, sending the placement position information to the update object queue module, sending a synthesis completion instruction to the composite frame generation unit, and updating the idle area list based on the received identifier of the idle area to be placed and two new idle area information;
the dividing unit is used for acquiring the coordinate information of the idle area to be placed from the idle area list based on the identifier of the idle area to be placed, dividing the idle area to be placed based on the coordinate information of the idle area to be placed and the original position information of the rectangular frame of the key object to be placed so as to acquire two new idle areas, and sending the identifier of the idle area to be placed and the two new idle area information to the idle area list;
the composite frame generating unit is used for generating an idle frame, acquiring corresponding key segmentation small blocks from the main memory module based on the original position information of the rectangular frame of the key object to be placed, placing the key segmentation small blocks in the idle frame based on the placement position information of the rectangular frame of the key object to be placed to form a composite frame, and finishing the synthesis of the current composite frame after receiving the synthesis finishing instruction and sending the composite frame to the main memory module.
8. The accelerator according to claim 7, wherein the free area selection unit comprises a comparison operation subunit, and a parameter operation subunit and a free area list connected to the comparison operation subunit;
the idle area list is connected with the synthetic frame generating unit and used for storing all idle area information in the idle frame and updating the idle area list based on the identifier of the idle area to be placed and two new idle area information;
the parameter operation subunit is configured to calculate a comparison parameter of the rectangular frame of the key object to be placed based on the original position information of the rectangular frame of the key object to be placed, and send the comparison parameter of the rectangular frame of the key object to be placed to the comparison operation subunit;
the comparison operation subunit is used for acquiring first parameters of all the idle areas from the idle area list, comparing the first parameters of all the idle areas with the first parameters of the rectangular frame of the key object to be placed respectively, screening out all the idle areas of which the first parameters are greater than the first parameters of the rectangular frame of the key object to be placed to form an available idle area set, then screening out, from the available idle area set, an idle area with the minimum difference value between its second parameter and the second parameter of the rectangular frame of the key object to be placed as the idle area to be placed, and sending the frame number of the idle area to be placed to the segmentation unit; the comparison operation subunit is also used for acquiring the placement position information of the rectangular frame of the key object to be placed in the idle area to be placed, sending the placement position information to the update object queue module, and sending a synthesis completion instruction to the composite frame generation unit;
the height and the width are the first parameters, the area is the second parameter, and the first parameters and the second parameter are combined into the comparison parameters.
9. The accelerator according to claim 8, wherein the comparison operation subunit comprises a reduction tree filter and a plurality of comparison circuits respectively connected to the reduction tree filter;
the comparison circuit comprises a height comparator, a width comparator, an AND gate device, a subtracter and a control switch; the output ends of the height comparator and the width comparator are connected with two input ends of the AND gate device, the output end of the AND gate device is connected with the output control end of the control switch, the output end of the subtracter is connected with the input end of the control switch, and the output end of the control switch is connected with the reduction tree filter;
the reduction tree filter is used for screening out an idle area with the smallest difference value between its second parameter and the second parameter of the rectangular frame of the key object to be placed from the available idle area set as the idle area to be placed, sending the frame number of the idle area to be placed to the segmentation unit, acquiring the placement position information of the rectangular frame of the key object to be placed in the idle area to be placed, sending the placement position information to the update object queue module, and sending a synthesis completion instruction to the composite frame generation unit.
10. The accelerator according to claim 7, wherein segmenting the to-be-placed free region based on the original position information of the to-be-placed key object rectangular box comprises:
acquiring the height difference and the width difference between the idle area to be placed and the rectangular frame of the key object to be placed based on the coordinate information of the idle area to be placed and the original position information of the rectangular frame of the key object to be placed;
when the height difference is larger than the width difference, dividing the to-be-placed idle area in which the to-be-placed key object rectangular frame is placed along a straight line where the outer edge of the bottom edge of the to-be-placed key object rectangular frame is located;
and when the height difference is smaller than the width difference, dividing the to-be-placed idle area in which the to-be-placed key object rectangular frame is placed along the straight line where the right outer edge of the to-be-placed key object rectangular frame is located.
CN202110652261.1A 2021-06-11 2021-06-11 Real-time video identification accelerator based on key object splicing Active CN113255564B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202110652261.1A CN113255564B (en) 2021-06-11 2021-06-11 Real-time video identification accelerator based on key object splicing

Publications (2)

Publication Number Publication Date
CN113255564A CN113255564A (en) 2021-08-13
CN113255564B (en) 2022-05-06

Family

ID=77187664

Citations (7)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN105227867A (en) * 2015-09-14 2016-01-06 联想(北京)有限公司 A kind of image processing method and electronic equipment
US9726486B1 (en) * 2011-07-29 2017-08-08 Rockwell Collins, Inc. System and method for merging enhanced vision data with a synthetic vision data
CN111316637A (en) * 2019-12-19 2020-06-19 威创集团股份有限公司 Spliced wall image content identification windowing display method and related device
CN111985456A (en) * 2020-09-10 2020-11-24 上海交通大学 Video real-time identification, segmentation and detection architecture
CN112084949A (en) * 2020-09-10 2020-12-15 上海交通大学 Video real-time identification segmentation and detection method and device
CN112446363A (en) * 2021-01-29 2021-03-05 广州市玄武无线科技股份有限公司 Image splicing and de-duplication method and device based on video frame extraction
CN112801074A (en) * 2021-04-15 2021-05-14 速度时空信息科技股份有限公司 Depth map estimation method based on traffic camera

Family Cites Families (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
JP2004512781A (en) * 2000-10-24 2004-04-22 コーニンクレッカ フィリップス エレクトロニクス エヌ ヴィ Video scene composition method and apparatus
RU2471231C1 (en) * 2011-09-30 2012-12-27 Общество с ограниченной ответственностью "Ай Ти Ви групп" Method to search for objects in sequence of images produced from stationary video camera
CN110427800A (en) * 2019-06-17 2019-11-08 平安科技(深圳)有限公司 Video object acceleration detection method, apparatus, server and storage medium

Non-Patent Citations (5)

* Cited by examiner, † Cited by third party
Title
Xuanxuan Cui, Weibin Zhang, Dan Liu. Improved Frame Difference Algorithm Based on CNN for Moving Target Detection. Proceedings of the 39th Chinese Control Conference, 2020-07-29. *
Zhuoran Song, Bangqi Fu, Feiyang Wu, Zhaoming Jiang, Li Jiang, et al. DRQ: Dynamic Region-based Quantization for Deep Neural Network Acceleration. 2020 ACM/IEEE 47th Annual International Symposium on Computer Architecture (ISCA), 2020. *
Zhuoran Song, Feiyang Wu, Xueyuan Liu, Jing Ke, Naifeng Jing, et al. VR-DANN: Real-Time Video Recognition via Decoder-Assisted Neural Network Acceleration. 2020 53rd Annual IEEE/ACM International Symposium on Microarchitecture (MICRO), 2020. *
Tang Mingqian. Research on Object-based Surveillance Video Summarization Algorithms. China Masters' Theses Full-text Database, Information Science and Technology, 2014-11-15. *
Wang Yapei, Li Renwang, Liu Xiang. A Surveillance Video Summary Extraction Method Combining Objects and Key Frames. Industrial Control Computer, 2015-05-25. *


Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant