CN112929665A - Target tracking method, device, equipment and medium combining super-resolution and video coding - Google Patents

Info

Publication number
CN112929665A
CN112929665A
Authority
CN
China
Prior art keywords
video
resolution
target tracking
super-resolution video
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202110121731.1A
Other languages
Chinese (zh)
Inventor
向国庆
文映博
严韫瑶
张鹏
贾惠柱
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Beijing Boya Huishi Intelligent Technology Research Institute Co ltd
Original Assignee
Beijing Boya Huishi Intelligent Technology Research Institute Co ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Beijing Boya Huishi Intelligent Technology Research Institute Co ltd
Priority to CN202110121731.1A
Publication of CN112929665A
Legal status: Pending

Classifications

    • HELECTRICITY
    • H04ELECTRIC COMMUNICATION TECHNIQUE
    • H04NPICTORIAL COMMUNICATION, e.g. TELEVISION
    • H04N19/00Methods or arrangements for coding, decoding, compressing or decompressing digital video signals
    • H04N19/10Methods or arrangements for coding, decoding, compressing or decompressing digital video signals using adaptive coding
    • H04N19/169Methods or arrangements for coding, decoding, compressing or decompressing digital video signals using adaptive coding characterised by the coding unit, i.e. the structural portion or semantic portion of the video signal being the object or the subject of the adaptive coding
    • H04N19/17Methods or arrangements for coding, decoding, compressing or decompressing digital video signals using adaptive coding characterised by the coding unit, i.e. the structural portion or semantic portion of the video signal being the object or the subject of the adaptive coding the unit being an image region, e.g. an object
    • H04N19/172Methods or arrangements for coding, decoding, compressing or decompressing digital video signals using adaptive coding characterised by the coding unit, i.e. the structural portion or semantic portion of the video signal being the object or the subject of the adaptive coding the unit being an image region, e.g. an object the region being a picture, frame or field
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06NCOMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00Computing arrangements based on biological models
    • G06N3/02Neural networks
    • G06N3/04Architecture, e.g. interconnection topology
    • G06N3/045Combinations of networks
    • HELECTRICITY
    • H04ELECTRIC COMMUNICATION TECHNIQUE
    • H04NPICTORIAL COMMUNICATION, e.g. TELEVISION
    • H04N19/00Methods or arrangements for coding, decoding, compressing or decompressing digital video signals
    • H04N19/42Methods or arrangements for coding, decoding, compressing or decompressing digital video signals characterised by implementation details or hardware specially adapted for video compression or decompression, e.g. dedicated software implementation
    • HELECTRICITY
    • H04ELECTRIC COMMUNICATION TECHNIQUE
    • H04NPICTORIAL COMMUNICATION, e.g. TELEVISION
    • H04N19/00Methods or arrangements for coding, decoding, compressing or decompressing digital video signals
    • H04N19/85Methods or arrangements for coding, decoding, compressing or decompressing digital video signals using pre-processing or post-processing specially adapted for video compression

Abstract

The disclosure relates to the technical field of hardware video encoder design, and in particular provides a target tracking method, apparatus, device, and medium combining super-resolution and video encoding. The method comprises the following steps: acquiring a low-resolution video to be input, and compressing, encoding, and reconstructing the low-resolution video to obtain a reconstructed intermediate video; inputting the reconstructed intermediate video into a super-resolution network for deep-learning training to obtain a trained high-resolution video; and performing a target tracking operation on the high-resolution video to obtain an enhanced target tracking video. The method uses a carefully designed joint super-resolution video coding module to raise the resolution of the low-resolution video, performs target tracking on each enhanced frame, and finally obtains the tracking video. Compared with performing target tracking directly on the low-resolution video, the method improves tracking accuracy by 80 percent on average.

Description

Target tracking method, device, equipment and medium combining super-resolution and video coding
Technical Field
The present disclosure relates to the field of hardware video encoder design technologies, and more particularly, to a target tracking method, apparatus, device, and medium combining super-resolution and video encoding.
Background
In video compression coding scenarios, bandwidth limitations often make it difficult to transmit high-quality images or videos, and images transmitted in a low-bandwidth environment mostly suffer from blocking artifacts, image blurring, and severe transmission noise. This not only degrades the subjective viewing experience but also severely hinders information extraction. In a typical case such as live streaming, if the network fluctuates strongly, the recorded video is forced into a low-quality compression mode during transmission, and what finally reaches the viewer suffers from blurred details, serious blocking artifacts, and heavy noise, making a good viewing experience difficult to obtain. This problem not only affects subjective experience but also poses a considerable obstacle to high-level image processing tasks such as target detection and tracking. Therefore, super-resolution processing of video is a challenging but significant topic. The current mainstream approach is to improve coding efficiency or increase bandwidth; although such schemes can yield high-quality video images, they do not address the problem fundamentally, and at extremely low network speeds the resulting video still cannot satisfy the human eye, let alone support target detection and tracking tasks.
Disclosure of Invention
The present disclosure solves the technical problem in the prior art that high-definition video remains difficult to transmit even after compression.
To achieve the above technical object, the present disclosure provides a target tracking method combining super-resolution and video coding, including:
acquiring a low-resolution video to be input, and compressing, coding and reconstructing the low-resolution video to obtain a reconstructed intermediate video;
inputting the reconstructed intermediate video into a super-resolution network for deep learning training to obtain a trained high-resolution video;
and carrying out target tracking operation on the high-resolution video to obtain an enhanced target tracking video.
Further, the step of compressing, encoding, and reconstructing the low-resolution video to obtain a reconstructed intermediate video specifically includes:
performing AVS3 compression encoding and reconstruction on the low-resolution video to obtain a reconstructed intermediate video, wherein the size of the intermediate video is 360×180.
Further, the step of inputting the reconstructed intermediate video into a super-resolution network for deep learning training to obtain a trained high-resolution video specifically includes:
extracting image features from the intermediate video frame by frame and performing a convolution operation to obtain a feature map;
performing a collapse operation on the feature map to obtain a collapsed convolutional layer;
mapping the collapsed convolutional layer to obtain mapped image data;
and performing a deconvolution operation on the mapped image data to obtain a trained high-resolution video image.
Further, after the deconvolution operation is performed on the mapped image data to obtain a trained high-resolution video image, the method further includes:
judging whether the loss of the trained high-resolution video image exceeds a preset loss threshold; if so, calculating the image loss, performing back-propagation, and performing image feature extraction again; if not, ending the super-resolution network deep-learning training process.
Further, performing the target tracking operation on the high-resolution video to obtain an enhanced target tracking video specifically includes:
performing Fast R-CNN target detection on image data of the high-resolution video frame by frame;
tracking according to a result calculated by a Fast R-CNN target detection algorithm, and acquiring a tracking result by using a multi-target tracking algorithm;
and performing smooth interpolation on the target tracking result processed by the multi-target tracking algorithm, and generating a target track video.
Further, the performing Fast R-CNN target detection on the image data of the high-resolution video frame by frame specifically includes:
extracting candidate regions: extracting candidate regions from the input image by using a selective search algorithm, and mapping the candidate regions onto the final convolutional feature layer according to their spatial position relationship;
performing region normalization: applying an ROI pooling operation to each candidate region on the convolutional feature layer to obtain features of fixed dimensions;
and inputting the extracted features into a fully connected layer, classifying with Softmax, and regressing the candidate-region positions to obtain the target detection result.
Further, the multi-target tracking algorithm is the Deep SORT multi-target tracking algorithm.
To achieve the above technical object, the present disclosure can also provide a target tracking apparatus combining super-resolution and video coding, including:
the video acquisition module is used for acquiring a low-resolution video to be input and obtaining a reconstructed intermediate video after the low-resolution video is compressed, coded and reconstructed;
the super-resolution learning module is used for inputting the reconstructed intermediate video into a super-resolution network for deep learning training to obtain a trained high-resolution video;
and the target tracking module is used for carrying out target tracking operation on the high-resolution video to obtain an enhanced target tracking video.
To achieve the above technical objects, the present disclosure can also provide a computer storage medium having stored thereon a computer program which, when executed by a processor, implements the steps of the above target tracking method combining super-resolution and video coding.
To achieve the above technical object, the present disclosure also provides an electronic device including a memory, a processor, and a computer program stored on the memory and executable on the processor, wherein the processor implements the steps of the target tracking method combining super-resolution and video coding as described above when executing the computer program.
The beneficial effects of the present disclosure are as follows:
The disclosed target tracking apparatus based on joint super-resolution and video coding uses a carefully designed joint super-resolution video coding module to enhance the resolution of the low-resolution video, performs target tracking on each enhanced frame, and finally obtains the tracking video; this design effectively improves tracking accuracy. Compared with performing target tracking directly on the low-resolution video, the method improves tracking accuracy by 80 percent on average.
Drawings
Fig. 1 shows a schematic flow diagram of embodiment 1 of the present disclosure;
fig. 2 shows a super-resolution network deep learning flow diagram of embodiment 1 of the present disclosure;
fig. 3 shows a flow diagram of a target tracking process of embodiment 1 of the present disclosure;
fig. 4 shows a schematic structural diagram of embodiment 2 of the present disclosure;
fig. 5 shows a schematic structural diagram of embodiment 4 of the present disclosure.
Detailed Description
Hereinafter, embodiments of the present disclosure will be described with reference to the accompanying drawings. It should be understood that the description is illustrative only and is not intended to limit the scope of the present disclosure. Moreover, in the following description, descriptions of well-known structures and techniques are omitted so as to not unnecessarily obscure the concepts of the present disclosure.
Various structural schematics according to embodiments of the present disclosure are shown in the figures. The figures are not drawn to scale, wherein certain details are exaggerated and possibly omitted for clarity of presentation. The shapes of various regions, layers, and relative sizes and positional relationships therebetween shown in the drawings are merely exemplary, and deviations may occur in practice due to manufacturing tolerances or technical limitations, and a person skilled in the art may additionally design regions/layers having different shapes, sizes, relative positions, as actually required.
The first embodiment is as follows:
as shown in fig. 1:
the present disclosure provides a target tracking method combining super-resolution and video coding, comprising:
s101: acquiring a low-resolution video to be input, and compressing, coding and reconstructing the low-resolution video to obtain a reconstructed intermediate video;
Specifically, the step of compressing, encoding, and reconstructing the low-resolution video to obtain a reconstructed intermediate video includes:
performing AVS3 compression encoding and reconstruction on the low-resolution video to obtain a reconstructed intermediate video, wherein the size of the intermediate video is 360×180.
S102: inputting the reconstructed intermediate video into a super-resolution network for deep learning training to obtain a trained high-resolution video;
further, the step of inputting the reconstructed intermediate video into a super-resolution network for deep learning training to obtain a trained high-resolution video specifically includes:
extracting image features from the intermediate video frame by frame and performing a convolution operation to obtain a feature map;
performing a collapse operation on the feature map to obtain a collapsed convolutional layer;
mapping the collapsed convolutional layer to obtain mapped image data;
and performing a deconvolution operation on the mapped image data to obtain a trained high-resolution video image.
Further, after the deconvolution operation is performed on the mapped image data to obtain a trained high-resolution video image, the method further includes:
judging whether the loss of the trained high-resolution video image exceeds a preset loss threshold; if so, calculating the image loss, performing back-propagation, and performing image feature extraction again; if not, ending the super-resolution network deep-learning training process.
As shown in fig. 2, the deep-learning training process of the super-resolution network in the first embodiment of the present disclosure proceeds as follows:
the video is input frame by frame, each frame is compressed to the size of 360x180, and by way of example, the video can be compressed and transmitted by using an AVS3 encoding standard, and the AVS3 is a new generation video compression standard, and the compression efficiency is extremely high. It is noted that the standard is not limited to the AVS 3.
Video reconstruction is then performed on the compressed code stream, and the reconstructed video is input into the super-resolution network frame by frame.
After entering the network, each image first undergoes feature extraction: a convolution with a 5×5 kernel and stride 1 is applied, with 56 output channels and ReLU as the activation function.
A collapsing (shrinking) operation is then performed on the resulting feature map, here using a 3×3 receptive field, 12 output channels, and ReLU as the activation function.
A mapping operation is then applied to the collapsed convolutional layer, with a 3×3 kernel, stride 1, and 12 output channels; the mapping is iterated in a loop four times, the final output has 56 channels, and the activation function is ReLU.
Finally, a deconvolution operation is applied to the mapped image to obtain the super-resolution video image, using a 9×9 kernel, one output channel, and ReLU as the activation function.
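The layer hyper-parameters above (5×5/56 feature extraction, 3×3/12 collapse, four 3×3 mapping steps, 9×9 deconvolution) can be sanity-checked with the standard convolution and transposed-convolution size formulas. The sketch below assumes a single-channel (luma) input, "same" padding, and a stride-2 deconvolution with output padding 1 for exact 2× upscaling; the text states none of these, so they are illustrative assumptions:

```python
def conv_out(n, k, s=1, p=0):
    """Spatial output size of a convolution: floor((n + 2p - k) / s) + 1."""
    return (n + 2 * p - k) // s + 1

def deconv_out(n, k, s=1, p=0, op=0):
    """Spatial output size of a transposed convolution (deconvolution)."""
    return (n - 1) * s - 2 * p + k + op

def conv_params(c_in, c_out, k):
    """Weight + bias count of a k×k convolution layer."""
    return k * k * c_in * c_out + c_out

# Layer spec from the description: (c_in, c_out, kernel size).
layers = [
    (1, 56, 5),                             # feature extraction: 5×5, 56 channels
    (56, 12, 3),                            # collapse/shrink: 3×3, 12 channels
    (12, 12, 3), (12, 12, 3), (12, 12, 3),  # mapping iterations
    (12, 56, 3),                            # last mapping step: back to 56 channels
]

h, w = 180, 360                             # reconstructed intermediate video: 360×180
total = 0
for c_in, c_out, k in layers:
    p = k // 2                              # "same" padding (assumption)
    h, w = conv_out(h, k, 1, p), conv_out(w, k, 1, p)
    total += conv_params(c_in, c_out, k)

# 9×9 deconvolution to one output channel; stride 2 / output padding 1
# (exact 2× upscaling) is an assumption, since the text omits the stride.
total += conv_params(56, 1, 9)
h2, w2 = deconv_out(h, 9, 2, 4, 1), deconv_out(w, 9, 2, 4, 1)
print(total, (w2, h2))
```

Under these assumptions the network stays tiny (a few tens of thousands of parameters), which is consistent with the design goal of per-frame super-resolution at video rates.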
Whether the loss of the trained high-resolution video image exceeds a preset loss threshold is then judged. Two loss functions are adopted, given by formula (1) and formula (2).
[Formulas (1) and (2) appear only as images in the original document and are not reproduced here.]
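The threshold-based stopping rule above can be sketched as a training loop; `step_fn`, `loss_fn`, and the threshold below are illustrative stand-ins, not values from the disclosure:

```python
def train_until_threshold(step_fn, loss_fn, threshold, max_iters=1000):
    """Repeat the feature-extraction → collapse → mapping → deconvolution
    pipeline (one `step_fn` call) until the loss no longer exceeds
    `threshold`, mirroring the decision box in the Fig. 2 flow."""
    for i in range(max_iters):
        loss = loss_fn()
        if loss <= threshold:      # loss within threshold: training ends
            return i, loss
        step_fn()                  # back-propagate and run the pipeline again
    return max_iters, loss_fn()

# Toy stand-in: the loss halves each step, mimicking convergence.
state = {"loss": 8.0}
iters, final = train_until_threshold(
    step_fn=lambda: state.update(loss=state["loss"] / 2),
    loss_fn=lambda: state["loss"],
    threshold=0.5,
)
print(iters, final)
```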
S103: and carrying out target tracking operation on the high-resolution video to obtain an enhanced target tracking video.
Further, performing the target tracking operation on the high-resolution video to obtain an enhanced target tracking video specifically includes:
performing Fast R-CNN target detection on image data of the high-resolution video frame by frame;
tracking according to a result calculated by a Fast R-CNN target detection algorithm, and acquiring a tracking result by using a multi-target tracking algorithm;
and performing smooth interpolation on the target tracking result processed by the multi-target tracking algorithm, and generating a target track video.
Further, the performing Fast R-CNN target detection on the image data of the high-resolution video frame by frame specifically includes:
extracting candidate regions: extracting candidate regions from the input image by using a selective search algorithm, and mapping the candidate regions onto the final convolutional feature layer according to their spatial position relationship;
performing region normalization: applying an ROI pooling operation to each candidate region on the convolutional feature layer to obtain features of fixed dimensions;
and inputting the extracted features into a fully connected layer, classifying with Softmax, and regressing the candidate-region positions to obtain the target detection result.
Further, the multi-target tracking algorithm is the Deep SORT multi-target tracking algorithm.
Fig. 3 shows a schematic diagram of the target tracking process in the first embodiment of the present disclosure:
Fast R-CNN target detection is performed on each input frame. First, candidate regions are extracted: a Selective Search algorithm extracts candidate regions from the input image, which are then mapped onto the final convolutional feature layer according to their spatial positions. Next, region normalization is performed: an ROI pooling operation is applied to each candidate region on the convolutional feature layer to obtain features of fixed dimensions. Finally, the extracted features are fed into a fully connected layer, classified with Softmax, and the candidate-region positions are regressed to obtain the target detection result.
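The ROI pooling step above — converting a variable-sized candidate region on the feature layer into a fixed-dimension feature — can be sketched in pure Python for a single-channel feature map; the exact bin partitioning is an assumption, since implementations differ:

```python
def roi_max_pool(feature, roi, out_h, out_w):
    """Max-pool the region `roi = (x0, y0, x1, y1)` of a 2-D feature map
    into a fixed out_h × out_w grid, as in Fast R-CNN's ROI pooling."""
    x0, y0, x1, y1 = roi
    h, w = y1 - y0, x1 - x0
    pooled = []
    for i in range(out_h):
        row = []
        for j in range(out_w):
            # Bin boundaries: split the ROI into an out_h × out_w grid.
            ys = (y0 + i * h // out_h, y0 + (i + 1) * h // out_h)
            xs = (x0 + j * w // out_w, x0 + (j + 1) * w // out_w)
            row.append(max(
                feature[y][x]
                for y in range(ys[0], max(ys[1], ys[0] + 1))
                for x in range(xs[0], max(xs[1], xs[0] + 1))
            ))
        pooled.append(row)
    return pooled

fmap = [[r * 4 + c for c in range(4)] for r in range(4)]  # 4×4 toy feature map
print(roi_max_pool(fmap, (0, 0, 4, 4), 2, 2))
```

Whatever the ROI's size, the output is always `out_h × out_w`, which is what lets the following fully connected layer accept regions of any shape.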
Tracking is then performed according to the Fast R-CNN detection results, and the tracking result is obtained using Deep SORT. Deep SORT is a multi-target tracking algorithm whose basic idea is tracking-by-detection: it performs data association using a motion model and appearance information, and its running speed is mainly determined by the detection algorithm. The algorithm performs target detection on each frame and then matches the previous motion trajectories against the current detections via a weighted Hungarian matching algorithm to form each object's motion trajectory. The matching weight is obtained as a weighted sum of the Mahalanobis distance between a detection and a motion trajectory and the appearance similarity of the image patches.
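The weighted matching described above can be sketched as follows: the association cost is a weighted sum of a Mahalanobis (motion) distance and an appearance dissimilarity, and a brute-force minimum-cost assignment stands in for the Hungarian algorithm (adequate only for tiny cost matrices). The weight λ = 0.5 and all distance values below are illustrative:

```python
from itertools import permutations

def combined_cost(maha, appearance, lam=0.5):
    """Deep SORT-style association cost: weighted sum of the Mahalanobis
    (motion) distance and an appearance dissimilarity. λ = 0.5 is illustrative."""
    return lam * maha + (1.0 - lam) * appearance

def assign(cost):
    """Minimum-cost one-to-one assignment of tracks to detections.
    Brute force over permutations stands in for the Hungarian algorithm."""
    n = len(cost)
    best = min(permutations(range(n)),
               key=lambda p: sum(cost[i][p[i]] for i in range(n)))
    return list(best)

maha = [[0.1, 2.0], [2.5, 0.2]]     # track-to-detection motion distances
appear = [[0.2, 1.5], [1.8, 0.1]]   # appearance dissimilarities
cost = [[combined_cost(maha[i][j], appear[i][j]) for j in range(2)]
        for i in range(2)]
print(assign(cost))                 # track 0 → detection 0, track 1 → detection 1
```

A production tracker would use an O(n³) Hungarian solver and additional gating (discarding pairs whose Mahalanobis distance exceeds a chi-square threshold), which this sketch omits.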
Finally, smooth interpolation is performed on the target tracking results processed by Deep SORT, and a target trajectory video is generated.
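Smooth interpolation of the track — filling frames where the tracker lost the target — can be sketched as linear interpolation of target positions between the nearest detected frames; this is an illustrative simplification of whatever smoothing the disclosure actually applies:

```python
def interpolate_track(track):
    """Fill `None` gaps in a per-frame list of (x, y) target positions by
    linear interpolation between the nearest surrounding detections."""
    known = [i for i, p in enumerate(track) if p is not None]
    out = list(track)
    for a, b in zip(known, known[1:]):
        for f in range(a + 1, b):
            t = (f - a) / (b - a)   # fractional position between frames a and b
            out[f] = (track[a][0] + t * (track[b][0] - track[a][0]),
                      track[a][1] + t * (track[b][1] - track[a][1]))
    return out

# Target detected at frames 0 and 3; frames 1-2 are filled by interpolation.
print(interpolate_track([(0.0, 0.0), None, None, (3.0, 6.0)]))
```

The same scheme extends to full bounding boxes by interpolating each coordinate independently.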
Example two:
as shown in fig. 4:
the present disclosure can also provide a target tracking apparatus combining super-resolution and video coding, including:
the video acquiring module 201 is configured to acquire a low-resolution video to be input, and compress, encode and reconstruct the low-resolution video to obtain a reconstructed intermediate video;
a super-resolution learning module 202, configured to input the reconstructed intermediate video into a super-resolution network for deep learning training to obtain a trained high-resolution video;
and the target tracking module 203 is configured to perform a target tracking operation on the high-resolution video to obtain an enhanced target tracking video.
The video acquisition module 201 of the present disclosure is sequentially connected to the super-resolution learning module 202 and the target tracking module 203.
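The sequential connection of the three modules can be sketched as a minimal pipeline object; the lambdas below are stand-ins for the actual acquisition, super-resolution, and tracking implementations:

```python
class JointSRTrackingDevice:
    """Chains the video-acquisition, super-resolution-learning, and
    target-tracking modules in the order described above."""
    def __init__(self, acquire, super_resolve, track):
        self.acquire = acquire              # module 201: compress/encode/reconstruct
        self.super_resolve = super_resolve  # module 202: SR network inference
        self.track = track                  # module 203: detection + tracking

    def run(self, low_res_video):
        intermediate = self.acquire(low_res_video)
        high_res = self.super_resolve(intermediate)
        return self.track(high_res)

# Stand-in modules that just tag the data as it flows through the chain.
device = JointSRTrackingDevice(
    acquire=lambda v: v + ["reconstructed"],
    super_resolve=lambda v: v + ["super-resolved"],
    track=lambda v: v + ["tracked"],
)
print(device.run(["low-res"]))
```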
Example three:
the present disclosure can also provide a computer storage medium having stored thereon a computer program for implementing the steps of the above-described target tracking method of joint super-resolution and video coding when executed by a processor.
The computer storage medium of the present disclosure may be implemented with a semiconductor memory, a magnetic core memory, a magnetic drum memory, or a magnetic disk memory.
Semiconductor memories are the main memory elements of computers; there are two types, MOS and bipolar memory elements. MOS devices have high integration and a simple process but slow speed. Bipolar elements have a complex process, high power consumption, and low integration, but high speed. The introduction of NMOS and CMOS made MOS memory dominant among semiconductor memories. NMOS is fast: for example, a 1K-bit SRAM from Intel has an access time of 45 ns. CMOS has low power consumption: a 4K-bit CMOS static memory has an access time of 300 ns. The semiconductor memories described above are all random access memories (RAM), i.e., new contents can be read and written randomly during operation. A semiconductor read-only memory (ROM), in contrast, can be read randomly but not written during operation and is used to store fixed programs and data. ROM is classified into non-rewritable fuse-type ROM and PROM, and rewritable EPROM.
The magnetic core memory has low cost and high reliability, with more than 20 years of practical experience. Magnetic core memories were widely used as main memories before the mid-1970s. The storage capacity can reach more than 10 bits, with access times as fast as 300 ns. A typical international magnetic core memory has a capacity of 4–8 MB and an access cycle of 1.0–1.5 μs. After semiconductor memory developed rapidly and replaced magnetic core memory as the main memory, magnetic core memory could still be applied as a large-capacity expansion memory.
Drum memory is an external memory based on magnetic recording. Although its information access is fast and its operation stable and reliable, it is being replaced by disk memory; it is still used as external memory for real-time process-control computers and medium and large computers. To meet the needs of small and micro computers, subminiature magnetic drums have emerged, which are small, lightweight, highly reliable, and convenient to use.
Magnetic disk memory is an external memory based on magnetic recording. It combines the advantages of drum and tape storage: its storage capacity is larger than that of a drum, its access speed is faster than that of tape storage, and it can be stored offline, so magnetic disks are widely used as large-capacity external storage in various computer systems. Magnetic disks are generally classified into two main categories: hard disk and floppy disk memories.
Hard disk memories come in a wide variety. By structure they are divided into replaceable and fixed types: the replaceable disk can be exchanged, while the fixed disk cannot. Both replaceable and fixed magnetic disks come in multi-platter combinations and single-platter structures, and both are further divided into fixed-head and movable-head types. A fixed-head magnetic disk has a small capacity, a low recording density, a high access speed, and a high cost. A movable-head magnetic disk has a high recording density (up to 1000 to 6250 bits/inch) and thus a large capacity, but a lower access speed than a fixed-head disk. The storage capacity of a magnetic disk product can reach several hundred megabytes, with a bit density of 6250 bits per inch and a track density of 475 tracks per inch. The disk packs of a multi-pack replaceable disk memory can be exchanged, giving large offline capacity as well as large capacity and high speed; such memories can store large volumes of information and are widely applied in online information retrieval systems and database management systems.
Example four:
the present disclosure also provides an electronic device comprising a memory, a processor and a computer program stored on the memory and executable on the processor, the processor implementing the steps of the above-mentioned target tracking method combining super-resolution and video coding when executing the computer program.
Fig. 5 is a schematic diagram of the internal structure of the electronic device in one embodiment. As shown in fig. 5, the electronic device includes a processor, a storage medium, a memory, and a network interface connected through a system bus. The storage medium of the computer device stores an operating system, a database, and computer-readable instructions; the database can store control information sequences, and the computer-readable instructions, when executed by the processor, can cause the processor to implement a target tracking method combining super-resolution and video coding. The processor of the electronic device provides computing and control capabilities to support the operation of the entire computer device. The memory of the computer device may store computer-readable instructions that, when executed by the processor, cause the processor to perform the target tracking method combining super-resolution and video encoding. The network interface of the computer device is used for connecting and communicating with a terminal. Those skilled in the art will appreciate that the architecture shown in fig. 5 is merely a block diagram of some of the structures associated with the disclosed aspects and does not limit the computing devices to which the disclosed aspects apply; a particular computing device may include more or fewer components than shown, combine certain components, or have a different arrangement of components.
The electronic device includes, but is not limited to, a smartphone, a computer, a tablet, a wearable smart device, an artificial intelligence device, a mobile power source, and the like.
In some embodiments the processor may be composed of an integrated circuit, for example a single packaged integrated circuit, or of a plurality of integrated circuits packaged with the same or different functions, including one or more central processing units (CPUs), microprocessors, digital processing chips, graphics processors, and combinations of various control chips. The processor is the control unit of the electronic device: it connects the various components of the electronic device using various interfaces and lines, and executes the various functions of the electronic device and processes its data by running or executing programs or modules stored in the memory (for example, executing remote data read/write programs) and calling data stored in the memory.
The bus may be a Peripheral Component Interconnect (PCI) bus, an Extended Industry Standard Architecture (EISA) bus, or the like. The bus may be divided into an address bus, a data bus, a control bus, etc. The bus is arranged to enable connected communication between the memory and at least one processor or the like.
Fig. 5 shows only an electronic device having components, and those skilled in the art will appreciate that the structure shown in fig. 5 does not constitute a limitation of the electronic device, and may include fewer or more components than those shown, or some components may be combined, or a different arrangement of components.
For example, although not shown, the electronic device may further include a power supply (such as a battery) for supplying power to each component, and preferably, the power supply may be logically connected to the at least one processor through a power management device, so that functions such as charge management, discharge management, and power consumption management are implemented through the power management device. The power supply may also include any component of one or more dc or ac power sources, recharging devices, power failure detection circuitry, power converters or inverters, power status indicators, and the like. The electronic device may further include various sensors, a bluetooth module, a Wi-Fi module, and the like, which are not described herein again.
Further, the electronic device may further include a network interface, and optionally, the network interface may include a wired interface and/or a wireless interface (such as a WI-FI interface, a bluetooth interface, etc.), which are generally used to establish a communication connection between the electronic device and other electronic devices.
Optionally, the electronic device may further comprise a user interface, which may be a display (Display) or an input unit (such as a keyboard), and optionally a standard wired interface or wireless interface. In some embodiments, the display may be an LED display, a liquid crystal display, a touch-sensitive liquid crystal display, an OLED (Organic Light-Emitting Diode) touch device, or the like. The display, which may also be referred to as a display screen or display unit, serves to display information processed in the electronic device and to present a visualized user interface.
Further, the computer usable storage medium may mainly include a storage program area and a storage data area, wherein the storage program area may store an operating system, an application program required for at least one function, and the like; the storage data area may store data created according to the use of the blockchain node, and the like.
In the several embodiments provided in the present disclosure, it should be understood that the disclosed apparatus, device, and method may be implemented in other manners. For example, the apparatus embodiments described above are merely illustrative; the division into modules is only one kind of logical functional division, and other division schemes may be adopted in practice.
The modules described as separate parts may or may not be physically separate, and parts displayed as modules may or may not be physical units; they may be located in one place or distributed across a plurality of network units. Some or all of the modules may be selected according to actual needs to achieve the purpose of the solution of the present embodiment.
In addition, the functional modules in the embodiments of the present disclosure may be integrated into one processing unit, each unit may exist alone physically, or two or more units may be integrated into one unit. The integrated unit may be implemented in the form of hardware, or in the form of hardware plus software functional modules.
The embodiments of the present disclosure have been described above. However, these examples are for illustrative purposes only and are not intended to limit the scope of the present disclosure. The scope of the disclosure is defined by the appended claims and equivalents thereof. Various alternatives and modifications can be devised by those skilled in the art without departing from the scope of the present disclosure, and such alternatives and modifications are intended to be within the scope of the present disclosure.

Claims (10)

1. A target tracking method combining super-resolution and video coding is characterized by comprising the following steps:
acquiring a low-resolution video to be input, and compressing, coding and reconstructing the low-resolution video to obtain a reconstructed intermediate video;
inputting the reconstructed intermediate video into a super-resolution network for deep learning training to obtain a trained high-resolution video;
and carrying out target tracking operation on the high-resolution video to obtain an enhanced target tracking video.
2. The method according to claim 1, wherein obtaining the reconstructed intermediate video after the low-resolution video is compressed, encoded, and reconstructed specifically comprises:
performing AVS3 compression encoding and reconstruction on the low-resolution video to obtain a reconstructed intermediate video, wherein the size of the intermediate video is 360×180.
3. The method according to claim 1, wherein inputting the reconstructed intermediate video into a super-resolution network for deep learning training to obtain a trained high-resolution video specifically comprises:
performing image feature extraction on the intermediate video frame by frame, and performing a convolution operation to obtain a feature map;
performing a collapse (shrinking) operation on the feature map to obtain a collapsed convolutional layer;
mapping the collapsed convolutional layer to obtain mapped image data;
and performing a deconvolution operation on the mapped image data to obtain a trained high-resolution video image.
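Although the claim describes the super-resolution network only in prose, its four steps (feature extraction by convolution, collapse/shrinking, mapping, deconvolution) can be sketched in plain NumPy. This is a toy illustration under assumed kernel sizes and channel counts, not the patent's actual network; every function name and parameter here is hypothetical.

```python
import numpy as np

def extract_features(frame, kernels):
    """Step 1: feature extraction -- valid 2-D convolution with a kernel bank."""
    k = kernels.shape[-1]
    h, w = frame.shape
    out = np.zeros((len(kernels), h - k + 1, w - k + 1))
    for c, ker in enumerate(kernels):
        for i in range(out.shape[1]):
            for j in range(out.shape[2]):
                out[c, i, j] = np.sum(frame[i:i + k, j:j + k] * ker)
    return out

def shrink(features, proj):
    """Step 2: the collapse (shrinking) step -- a 1x1 convolution that reduces
    the channel count, implemented as a channel-mixing tensor contraction."""
    return np.tensordot(proj, features, axes=([1], [0]))

def map_features(features):
    """Step 3: non-linear mapping of the collapsed features (a ReLU here)."""
    return np.maximum(features, 0.0)

def deconv_upsample(features, scale=2):
    """Step 4: deconvolution, modelled as zero-insertion upsampling followed by
    a fixed 3x3 smoothing that stands in for learned deconvolution weights."""
    c, h, w = features.shape
    up = np.zeros((c, h * scale, w * scale))
    up[:, ::scale, ::scale] = features
    out = np.zeros_like(up)
    for ch in range(c):
        for i in range(up.shape[1]):
            for j in range(up.shape[2]):
                out[ch, i, j] = up[ch, max(0, i - 1):i + 2, max(0, j - 1):j + 2].mean()
    return out

rng = np.random.default_rng(0)
frame = rng.random((6, 6))        # one low-resolution frame
kernels = rng.random((4, 3, 3))   # 4 assumed 3x3 feature kernels
proj = rng.random((2, 4))         # collapse 4 channels down to 2
hi = deconv_upsample(map_features(shrink(extract_features(frame, kernels), proj)))
print(hi.shape)                   # spatially upscaled output, here (2, 8, 8)
```

A real network would learn the kernel, projection, and deconvolution weights end to end; the fixed weights above only make the data flow of the four claimed steps concrete.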
4. The method of claim 3, further comprising, after deconvolving the mapped image data to obtain the trained high-resolution video image:
judging whether the loss of the trained high-resolution video image exceeds a preset loss threshold; if so, calculating the image loss, performing backpropagation, and performing the image feature extraction again; if not, ending the super-resolution network deep learning training process.
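The stopping rule in claim 4 (iterate backpropagation until the loss falls within a preset threshold) can be illustrated with a deliberately tiny gradient-descent loop. The linear model, learning rate, and threshold below are all illustrative assumptions standing in for the patent's network and image loss.

```python
import numpy as np

rng = np.random.default_rng(0)
x = rng.random((100, 4))                  # stand-in for low-resolution features
true_w = np.array([0.5, -1.0, 2.0, 0.25])
y = x @ true_w                            # stand-in for high-resolution targets

w = np.zeros(4)                           # model weights to be trained
lr, loss_threshold = 0.1, 1e-6
for step in range(10_000):
    pred = x @ w
    loss = np.mean((pred - y) ** 2)       # image loss (mean squared error)
    if loss <= loss_threshold:            # claim 4: stop once loss is within threshold
        break
    grad = 2 * x.T @ (pred - y) / len(y)  # gradient for the backpropagation step
    w -= lr * grad
print(step, loss)
```

The loop computes the loss first and only backpropagates when it still exceeds the threshold, mirroring the judge-then-propagate order of the claim.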
5. The method according to claim 1, wherein performing the target tracking operation on the high-resolution video to obtain the enhanced target tracking video specifically comprises:
performing Fast R-CNN target detection on the image data of the high-resolution video frame by frame;
tracking according to the results computed by the Fast R-CNN target detection algorithm, and obtaining tracking results using a multi-target tracking algorithm;
and performing smooth interpolation on the target tracking results processed by the multi-target tracking algorithm to generate a target trajectory video.
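The final smooth-interpolation step of claim 5 — filling gaps in a track so the trajectory video has a position for every frame — can be sketched with linear interpolation over missed detections. The function name and data are illustrative assumptions, not part of the patent.

```python
import numpy as np

def interpolate_track(frames, centers):
    """Linearly interpolate object centers across frames with missed detections."""
    frames = np.asarray(frames)
    centers = np.asarray(centers, dtype=float)
    full = np.arange(frames.min(), frames.max() + 1)   # every frame index
    xs = np.interp(full, frames, centers[:, 0])
    ys = np.interp(full, frames, centers[:, 1])
    return full, np.stack([xs, ys], axis=1)

# detections only on frames 0, 2 and 4; frames 1 and 3 were missed
full, track = interpolate_track([0, 2, 4], [(0, 0), (4, 2), (8, 4)])
print(track[1], track[3])   # → [2. 1.] [6. 3.]
```

A deployed system might instead smooth with a Kalman filter or spline, but any of these yields the per-frame trajectory needed to render the target trajectory video.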
6. The method according to claim 5, wherein performing Fast R-CNN target detection on the image data of the high-resolution video frame by frame specifically comprises:
extracting candidate regions from the input image using a selective search algorithm, and mapping the candidate regions onto the final convolutional feature layer according to their spatial positions;
performing region normalization by applying an ROI pooling operation to each candidate region on the convolutional feature layer to obtain fixed-dimension features;
and inputting the extracted features into a fully connected layer, classifying with Softmax, and regressing the positions of the candidate regions to obtain the target detection results.
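The region-normalization step — ROI pooling each candidate region to a fixed size regardless of its shape — can be sketched as follows. The feature map, ROI coordinates, and 2x2 output size are illustrative assumptions, not values from the patent.

```python
import numpy as np

def roi_pool(feature_map, roi, output_size=(2, 2)):
    """Max-pool one candidate region of a feature map to a fixed spatial size."""
    x0, y0, x1, y1 = roi                   # region in feature-map coordinates
    region = feature_map[y0:y1, x0:x1]
    oh, ow = output_size
    # split the region into an oh x ow grid of (nearly) equal bins
    h_edges = np.linspace(0, region.shape[0], oh + 1).astype(int)
    w_edges = np.linspace(0, region.shape[1], ow + 1).astype(int)
    out = np.zeros(output_size)
    for i in range(oh):
        for j in range(ow):
            out[i, j] = region[h_edges[i]:h_edges[i + 1],
                               w_edges[j]:w_edges[j + 1]].max()
    return out

fm = np.arange(64, dtype=float).reshape(8, 8)   # toy convolutional feature layer
pooled = roi_pool(fm, (1, 1, 7, 7))             # one 6x6 candidate region
print(pooled.shape)                             # fixed-dimension (2, 2) feature
```

Because every ROI is pooled to the same output size, regions of any shape yield fixed-dimension features that can feed the fully connected classification and regression heads.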
7. The method according to claim 5 or 6, wherein the multi-target tracking algorithm is the Deep SORT multi-target tracking algorithm.
8. A target tracking apparatus combining super-resolution and video coding, comprising:
the video acquisition module is used for acquiring a low-resolution video to be input and obtaining a reconstructed intermediate video after the low-resolution video is compressed, coded and reconstructed;
the super-resolution learning module is used for inputting the reconstructed intermediate video into a super-resolution network for deep learning training to obtain a trained high-resolution video;
and the target tracking module is used for carrying out target tracking operation on the high-resolution video to obtain an enhanced target tracking video.
9. An electronic device comprising a memory, a processor and a computer program stored on the memory and executable on the processor, the processor implementing the steps corresponding to the method for target tracking with joint super resolution and video coding as claimed in any one of claims 1 to 7 when executing the computer program.
10. A computer storage medium having stored thereon computer program instructions, wherein the program instructions, when executed by a processor, are adapted to carry out the steps corresponding to the method for target tracking for joint super resolution and video coding as claimed in any of claims 1 to 7.
CN202110121731.1A 2021-01-28 2021-01-28 Target tracking method, device, equipment and medium combining super-resolution and video coding Pending CN112929665A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202110121731.1A CN112929665A (en) 2021-01-28 2021-01-28 Target tracking method, device, equipment and medium combining super-resolution and video coding


Publications (1)

Publication Number Publication Date
CN112929665A true CN112929665A (en) 2021-06-08

Family

ID=76168231

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202110121731.1A Pending CN112929665A (en) 2021-01-28 2021-01-28 Target tracking method, device, equipment and medium combining super-resolution and video coding

Country Status (1)

Country Link
CN (1) CN112929665A (en)

Citations (7)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN107301383A (en) * 2017-06-07 2017-10-27 华南理工大学 A kind of pavement marking recognition methods based on Fast R CNN
CN107481188A (en) * 2017-06-23 2017-12-15 珠海经济特区远宏科技有限公司 A kind of image super-resolution reconstructing method
CN110188807A (en) * 2019-05-21 2019-08-30 重庆大学 Tunnel pedestrian target detection method based on cascade super-resolution network and improvement Faster R-CNN
CN110443172A (en) * 2019-07-25 2019-11-12 北京科技大学 A kind of object detection method and system based on super-resolution and model compression
US20190391235A1 (en) * 2018-06-20 2019-12-26 Metawave Corporation Super-resolution radar for autonomous vehicles
CN111784624A (en) * 2019-04-02 2020-10-16 北京沃东天骏信息技术有限公司 Target detection method, device, equipment and computer readable storage medium
CN112037252A (en) * 2020-08-04 2020-12-04 深圳技术大学 Eagle eye vision-based target tracking method and system



Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
RJ01 Rejection of invention patent application after publication

Application publication date: 20210608
