WO2022171036A1 - Video target tracking method, video target tracking apparatus, storage medium, and electronic device - Google Patents

Video target tracking method, video target tracking apparatus, storage medium, and electronic device

Info

Publication number
WO2022171036A1
WO2022171036A1, PCT/CN2022/075086, CN2022075086W
Authority
WO
WIPO (PCT)
Prior art keywords
target
image
tracked
feature
video
Prior art date
Application number
PCT/CN2022/075086
Other languages
French (fr)
Chinese (zh)
Inventor
江毅
孙培泽
袁泽寰
王长虎
Original Assignee
北京有竹居网络技术有限公司
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by 北京有竹居网络技术有限公司
Publication of WO2022171036A1

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06T IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T7/00 Image analysis
    • G06T7/20 Analysis of motion
    • G06T7/246 Analysis of motion using feature-based methods, e.g. the tracking of corners or segments
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00 Pattern recognition
    • G06F18/20 Analysing
    • G06F18/21 Design or setup of recognition systems or techniques; Extraction of features in feature space; Blind source separation
    • G06F18/211 Selection of the most significant subset of features
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00 Pattern recognition
    • G06F18/20 Analysing
    • G06F18/25 Fusion techniques
    • G06F18/253 Fusion techniques of extracted features

Definitions

  • the present application is based on the Chinese application with the application number of 202110179157.5 and the filing date of February 9, 2021, and claims its priority.
  • the disclosure of the Chinese application is hereby incorporated into the present application as a whole.
  • the present disclosure relates to the technical field of image processing, and in particular, to a video target tracking method, a video target tracking device, a storage medium, and an electronic device.
  • Video target tracking is the basis of many video application fields such as human behavior analysis and sports video commentary, and requires high real-time performance.
  • However, the video target tracking in the related art is usually based on a process of first performing target detection and then performing target tracking. Specifically, target detection is performed on two adjacent frames of the video, and the detected targets are then matched into pairs, so as to achieve target tracking.
  • The inventor believes that, in the related art, target detection needs to be performed first and target tracking performed afterwards, so the delay is relatively high; especially in scenarios where there are many targets to be tracked, the delay problem is particularly obvious.
  • the present disclosure provides a video target tracking method, tracking device, storage medium and electronic equipment, so as to realize end-to-end video target tracking and reduce the time delay of video target tracking.
  • In a first aspect, the present disclosure provides a video target tracking method, the method comprising: acquiring the video to be tracked; and inputting the video to be tracked into a target tracking model to obtain a target tracking result corresponding to the video to be tracked, the target tracking model being used to perform the following processing: for each frame of image of the video to be tracked, determining the feature vector corresponding to the target to be tracked in the target detection image corresponding to the image, the target detection image including the target to be tracked; performing a first similarity calculation between each feature vector in the feature map corresponding to the image and the feature vector corresponding to the target to be tracked in the target detection image, and determining the target feature vector from all the feature vectors of the feature map according to the first similarity calculation result; and determining the target to be tracked in the image according to the target feature vector.
  • the present disclosure provides a video target tracking device, the device comprising:
  • the acquisition module is used to acquire the video to be tracked
  • a tracking module configured to input the video to be tracked into a target tracking model to obtain a target tracking result corresponding to the video to be tracked, the tracking module includes:
  • a first determination submodule configured to determine, for each frame of the video to be tracked, a feature vector corresponding to the target to be tracked in the target detection image corresponding to the image, and the target detection image includes the target to be tracked;
  • the second determination submodule is configured to perform a first similarity calculation between each feature vector in the feature map corresponding to the image and the feature vector corresponding to the target to be tracked in the target detection image, and determine, according to the first similarity calculation result, the target feature vector among all the feature vectors of the feature map;
  • the third determination sub-module is configured to determine the target to be tracked in the image according to the target feature vector.
  • the present disclosure provides a computer-readable medium on which a computer program is stored, and when the program is executed by a processing apparatus, implements any of the video target tracking methods provided by the embodiments of the present disclosure.
  • In a fourth aspect, the present disclosure provides an electronic device, comprising: a storage device on which a computer program is stored; and a processing device configured to execute the computer program in the storage device to implement any video target tracking method provided by the embodiments of the present disclosure.
  • the present disclosure provides a computer program, comprising: instructions, when executed by a processor, the instructions cause the processor to execute any of the video object tracking methods provided by the embodiments of the present disclosure.
  • the present disclosure provides a computer program product comprising instructions, which when executed by a processor, cause the processor to execute any of the video object tracking methods provided by the embodiments of the present disclosure.
  • Through the above technical solution, the target tracking model can perform the first similarity calculation between the feature vectors corresponding to each frame of image of the video to be tracked and the feature vector corresponding to the target to be tracked in the target detection image, so as to determine the target to be tracked in each frame of image according to the first similarity calculation result. Therefore, the target to be tracked in each frame of image output by the target tracking model can correspond one-to-one to the target to be tracked in the target detection image; that is, target detection and target association can be completed at the same time, thereby reducing the delay in the target tracking process.
  • FIG. 1 is a flowchart of a method for tracking a video target according to an exemplary embodiment of the present disclosure
  • FIG. 2 is a schematic diagram of a target tracking process in a video target tracking method according to an exemplary embodiment of the present disclosure
  • FIG. 3 is a schematic diagram of a target tracking process in another video target tracking method according to an exemplary embodiment of the present disclosure
  • FIG. 4 is a block diagram of a video target tracking apparatus according to an exemplary embodiment of the present disclosure.
  • FIG. 5 is a block diagram of an electronic device according to an exemplary embodiment of the present disclosure.
  • The term "including" and variations thereof are open-ended inclusions, i.e., "including but not limited to".
  • The term "based on" means "based at least in part on".
  • The term "one embodiment" means "at least one embodiment"; the term "another embodiment" means "at least one additional embodiment"; the term "some embodiments" means "at least some embodiments". Relevant definitions of other terms will be given in the description below.
  • Video target tracking in the related art is usually based on a process of first performing target detection and then performing target tracking.
  • Specifically, a detection module performs target detection on two adjacent frames of the video, and an association module then matches the detected targets into pairs, so as to achieve target tracking.
  • the model components of this process are relatively complex, and the delay is relatively high, especially in scenarios where there are many targets to be tracked, the delay problem is particularly obvious.
  • the present disclosure proposes a video target tracking method, a video target tracking device, a storage medium and an electronic device, so as to realize end-to-end video target tracking and reduce the time delay of video target tracking.
  • FIG. 1 is a flowchart of a video target tracking method according to an exemplary embodiment of the present disclosure. Referring to FIG. 1, the video target tracking method includes:
  • Step 101: acquire the video to be tracked.
  • Acquiring the video to be tracked may be, for example, acquiring a video input by the user in response to the user's video input operation, or automatically acquiring the video captured by an image capture device after a target tracking instruction is received, etc., which is not limited in this embodiment of the present disclosure.
  • Step 102: input the video to be tracked into the target tracking model to obtain the target tracking result corresponding to the video to be tracked.
  • the target tracking model is used to perform the following processing: for each frame of the video to be tracked, determine the feature vector corresponding to the target to be tracked in the target detection image corresponding to the image, and the target detection image includes the target to be tracked; The first similarity calculation is performed between each feature vector in the feature map and the feature vector corresponding to the target to be tracked in the target detection image, and the target feature vector is determined in all the feature vectors in the feature map according to the first similarity calculation result; According to the target feature vector, the target to be tracked in the image is determined.
  • In this way, the target tracking model can perform the first similarity calculation between the feature vectors corresponding to each frame of image of the video to be tracked and the feature vector corresponding to the target to be tracked in the target detection image, so as to determine the target to be tracked in each frame of image according to the first similarity calculation result.
  • the video to be tracked may be input into the target tracking model.
  • each frame of images in the video to be tracked has a time sequence, so a video image sequence composed of multiple frames of images arranged in time sequence can be obtained according to the video to be tracked. Therefore, inputting the video to be tracked into the target tracking model may also be inputting the video image sequence corresponding to the video to be tracked into the target tracking model.
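  • As an illustration of building such a frame sequence, below is a minimal Python sketch using OpenCV; the file name, the helper function name, and the commented-out model call are illustrative assumptions rather than part of the disclosed method.

```python
import cv2  # OpenCV, used here only to illustrate building the frame sequence

# Hedged sketch: turn the video to be tracked into a time-ordered sequence of
# frames before feeding it to the target tracking model.
def read_video_image_sequence(path: str):
    capture = cv2.VideoCapture(path)
    frames = []
    while True:
        ok, frame = capture.read()
        if not ok:            # end of video
            break
        frames.append(frame)  # frames are appended in time order
    capture.release()
    return frames

video_image_sequence = read_video_image_sequence("video_to_track.mp4")
# tracking_results = target_tracking_model(video_image_sequence)  # assumed model call
```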
  • the training of the target tracking model may be performed according to the sample images and sample target information corresponding to the sample images.
  • The sample image can be input into the target tracking model to obtain the predicted target information output by the target tracking model for the sample image; then the loss function is calculated according to the predicted target information and the sample target information; and finally, the parameters of the target tracking model are adjusted according to the calculation result of the loss function, so that the target tracking model outputs more accurate target information.
  • the target detection function and the target tracking function of the target tracking model can be trained synchronously.
  • In contrast, the model training method in the related art is to train the detection module and the association module step by step. In scenarios with many targets to be tracked, this training method takes a lot of time, and it is difficult to achieve the optimal training effect.
  • The method of synchronously training the target detection function and the target tracking function of the target tracking model in the embodiment of the present disclosure not only simplifies the components of the target tracking model, but also simplifies the training process of the target tracking model, and can better satisfy the requirements of scenarios with multiple targets to be tracked.
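  • For intuition only, the following is a minimal PyTorch-style sketch of such joint training under strong assumptions: the toy model, the MSE loss, and the dummy tensors stand in for the real target tracking model, loss function, and labeled sample data, none of which are specified here.

```python
import torch
import torch.nn as nn

# Toy stand-in for the target tracking model: maps an image tensor to predicted
# target information (here, a fixed number of box parameters). The real model
# is more complex; this sketch only shows how prediction, loss, and parameter
# adjustment fit together in a single, jointly trained model.
class ToyTrackingModel(nn.Module):
    def __init__(self, num_targets=3):
        super().__init__()
        self.backbone = nn.Conv2d(3, 8, kernel_size=3, padding=1)
        self.head = nn.Linear(8, num_targets * 4)  # e.g. (cx, cy, w, h) per target

    def forward(self, image):
        features = self.backbone(image).mean(dim=(2, 3))  # global pooling
        return self.head(features)

model = ToyTrackingModel()
optimizer = torch.optim.Adam(model.parameters(), lr=1e-4)
loss_fn = nn.MSELoss()  # placeholder for the real loss on target information

sample_image = torch.randn(1, 3, 64, 64)    # dummy sample image
sample_target_info = torch.randn(1, 3 * 4)  # dummy pre-marked sample target info

predicted_target_info = model(sample_image)
loss = loss_fn(predicted_target_info, sample_target_info)
optimizer.zero_grad()
loss.backward()     # propagate the loss result back through the model
optimizer.step()    # adjust the parameters of the target tracking model
```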
  • Specifically, for each frame of image in the video to be tracked, the target tracking model can determine the feature vector corresponding to the target to be tracked in the target detection image corresponding to the image, then perform the first similarity calculation between each feature vector in the feature map corresponding to the image and the feature vector corresponding to the target to be tracked in the target detection image, and determine the target feature vector from all the feature vectors in the feature map according to the first similarity calculation result. Finally, the target to be tracked in the image is determined according to the target feature vector.
  • the target detection image may be a previous frame image of the image including the target to be tracked, or the target detection image may be a preset input image including the target to be tracked.
  • the video target tracking method provided by the embodiments of the present disclosure can be applied to two application scenarios where the tracking target is given and the tracking target is unknown.
  • In a scenario where the tracking target is given, the target detection image may be a preset input image including the target to be tracked. For example, if the target to be tracked is person A, the preset input image may be a full-body photo of person A captured by an image acquisition device.
  • the target detection image may be the previous frame of the image including the target to be tracked.
  • A feature vector corresponding to the target to be tracked in the target detection image may be determined. The feature vector may be the result of vectorizing the image features of the center pixel of the target to be tracked, or the result of vectorizing the image features of a pixel that distinguishes the target to be tracked from other targets, etc., which is not limited in this embodiment of the present disclosure.
  • the manner of determining the feature vector is similar to that in the related art, and details are not repeated here.
  • the feature map corresponding to the image may be an image obtained by quantization according to the image feature vector of each pixel in the image.
  • The feature vector corresponding to the target to be tracked in the target detection image is a pixel-level feature vector, so the first similarity calculation can be performed between each feature vector in the feature map corresponding to the image and the feature vector corresponding to the target to be tracked in the target detection image, so as to achieve target tracking according to the first similarity calculation result.
  • the first similarity calculation may be to perform vector dot product calculation, Euclidean distance calculation, etc. between the feature vector corresponding to each pixel in the image and the feature vector corresponding to the target to be tracked in the target detection image.
  • the method of calculating the first similarity is not limited.
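  • A minimal sketch of this first similarity calculation is shown below, assuming a PyTorch tensor layout in which the feature map has one C-dimensional vector per pixel; both the dot-product and Euclidean-distance variants mentioned above are illustrated, and all shapes are illustrative assumptions.

```python
import torch

# Compare every feature vector in the frame's feature map with the feature
# vector of one target to be tracked in the target detection image.
H, W, C = 16, 16, 64
feature_map = torch.randn(H, W, C)   # feature map of the current frame
target_vector = torch.randn(C)       # feature vector of one target to be tracked

flat = feature_map.reshape(-1, C)    # (H*W, C): one vector per pixel

# Option 1: vector dot product (larger value = more similar)
dot_similarity = flat @ target_vector            # (H*W,)

# Option 2: Euclidean distance (smaller value = more similar)
euclidean_distance = torch.cdist(flat, target_vector.unsqueeze(0)).squeeze(1)

best_idx = dot_similarity.argmax()
best_y, best_x = divmod(best_idx.item(), W)      # pixel location of the best match
print(f"most similar pixel location: ({best_y}, {best_x})")
```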
  • The target tracking model may include an attention mechanism module, and the attention mechanism module may perform the above first similarity calculation process to determine the feature vectors corresponding to the targets existing in both the image and the target detection image, that is, to obtain the target feature vectors.
  • In a scenario where the tracking target is given, the target detection image is a preset input image including the target to be tracked. The attention mechanism module can perform the first similarity calculation between each feature vector in the feature map and the feature vector corresponding to the target to be tracked in the preset input image, and output the target feature vector according to the first similarity calculation result, so that the target tracking model can determine the target to be tracked in the frame image according to the target feature vector.
  • Determining the target feature vector among all the feature vectors of the feature map may be: when the target detection image includes N targets to be tracked, selecting, from all the feature vectors of the feature map, the N feature vectors with the largest first similarity calculation result as the target feature vectors, where N is a positive integer.
  • The target detection image is a preset input image including the targets to be tracked, and it can be determined by a target detection method in the related art that the target detection image includes N targets to be tracked, that is, the number of feature vectors corresponding to the targets to be tracked is N. In this scenario, for each frame of image in the video to be tracked, the N feature vectors with the largest first similarity calculation result may be selected from all the feature vectors included in the feature map corresponding to the image as the target feature vectors, to determine the target to be tracked in this image.
  • For example, all the first similarity calculation results can be sorted from large to small, and then, from all the feature vectors included in the feature map corresponding to the image, the N feature vectors corresponding to the top-ranked first similarity calculation results are selected as the target feature vectors.
  • The manner of determining the target feature vector is not limited in this embodiment of the present disclosure.
  • The target to be tracked determined according to the selected N feature vectors is the target existing in both each frame of the video to be tracked and the target detection image; that is, the target to be tracked in each frame of image output by the target tracking model can be in one-to-one correspondence with the target to be tracked in the target detection image, so target detection and target association can be completed at the same time, thereby reducing the delay in the target tracking process.
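  • The selection described above can be sketched as follows; the reduction over the N target vectors before taking the top N results is one possible reading of the text and is an assumption, as are the tensor shapes.

```python
import torch

# Given N targets in the target detection image, keep the N feature vectors of
# the frame's feature map with the largest first-similarity results.
H, W, C, N = 16, 16, 64, 3
feature_map = torch.randn(H * W, C)    # flattened feature map of the frame
target_vectors = torch.randn(N, C)     # N targets from the target detection image

similarity = feature_map @ target_vectors.T       # (H*W, N) first-similarity results
per_pixel_best = similarity.max(dim=1).values     # best score per feature-map vector
topn = per_pixel_best.topk(N)                     # N largest first-similarity results
target_feature_vectors = feature_map[topn.indices]   # (N, C) selected target feature vectors
```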
  • the target tracking model can also be used to determine the feature vectors corresponding to all targets in the image according to the pre-trained position vector parameters.
  • In this case, determining the target feature vector among all the feature vectors of the feature map may be: when the target detection image includes N targets to be tracked, selecting, from all the feature vectors of the feature map, the N feature vectors with the largest first similarity calculation result as similar feature vectors, where N is a positive integer; then, performing deduplication processing on the feature vectors corresponding to all targets in the image and the N similar feature vectors to obtain the target feature vectors.
  • In a scenario where the tracking target is unknown, the target detection image is the previous frame image, relative to the current image, that includes the target to be tracked. The attention mechanism module can perform the first similarity calculation between each feature vector in the feature map and the feature vector corresponding to the target to be tracked in the previous frame image, and determine the similar feature vectors according to the first similarity calculation result.
  • the attention mechanism module can also determine the feature vectors of all objects in this frame of images according to the pre-trained position vector parameters.
  • the target tracking model can perform feature vector fusion based on the similar feature vectors and the feature vectors of all targets in the frame image to obtain the target feature vector. Finally, the target tracking model can determine the target to be tracked in the current frame image according to the target feature vector.
  • the position vector parameter may include a plurality of unit position vectors.
  • For example, assuming the image size is H × W, the position vector parameter may include H × W or more unit position vectors so as to cover every pixel location in the image. It should be understood that the number of unit position vectors included in the position vector parameter can be set larger to adapt to image sizes in different scenarios.
  • the position vector parameter may be obtained by training in the following way: determining the predicted feature vector corresponding to the target in the sample image according to the initial position vector parameter, so as to obtain the predicted target information corresponding to the sample image, wherein the sample image is pre-marked with Corresponding sample target information, and then calculate the loss function according to the predicted target information and the sample target information, and adjust the initial position vector parameter according to the calculation result of the loss function.
  • the initial position vector parameter may be a random value, that is, after setting the number of unit position vectors in the position vector, the value of each unit position vector is a random value. Then, according to the result of the loss function of the target tracking model in the training process, the position vector parameter can be adjusted through the back-propagation algorithm, so that the position vector parameter can more accurately predict the position of the target in the image.
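  • A minimal sketch of such a learnable position vector parameter is given below, assuming a PyTorch-style implementation; the feature dimension, optimizer, and stand-in loss are illustrative assumptions.

```python
import torch
import torch.nn as nn

# Learnable position vector parameter: at least H*W unit position vectors,
# initialized with random values and adjusted by back-propagation.
H, W, C = 16, 16, 64
position_vectors = nn.Parameter(torch.randn(H * W, C))  # random initial values

# During training, the loss computed on the predicted target information is
# back-propagated into `position_vectors`; a stand-in loss is used here.
optimizer = torch.optim.SGD([position_vectors], lr=1e-3)
dummy_loss = position_vectors.sum()   # placeholder for the real tracking loss
dummy_loss.backward()                 # gradients flow into the position vectors
optimizer.step()                      # adjust the initial position vector parameter
```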
  • For the first frame image of the video to be tracked, the feature vectors determined according to the pre-trained position vector parameters may be used as the target feature vectors. It should be understood that, in a scenario where the tracking target is unknown, the target detection image is the previous frame image, so the first similarity calculation cannot be performed for the first frame image.
  • the feature vectors corresponding to all targets in the image determined according to the pre-trained position vector parameters may be used as target feature vectors to determine the target to be tracked in the image.
  • For each frame of image other than the first frame, there is a corresponding previous frame image that includes the target to be tracked, so the target feature vector can be determined according to the corresponding feature vectors and the first similarity calculation result.
  • N feature vectors with the largest first similarity calculation result may be selected as similar feature vectors, where N is a positive integer. Then, the feature vectors corresponding to all targets in the image and N similar feature vectors can be deduplicated to obtain target feature vectors.
  • The N similar feature vectors represent the feature vectors corresponding to the targets existing in both each frame of the video to be tracked and the target detection image, while the feature vectors determined according to the position vector parameters correspond to all targets in each frame of the video to be tracked. For example, the N similar feature vectors may be the feature vectors corresponding to targets B1 and B2, and the feature vectors determined according to the position vector parameters may be those corresponding to targets B1, B2, and B3; the two sets share feature vectors corresponding to the same targets (B1 and B2).
  • In this case, feature vector fusion can be performed. For example, the feature vectors corresponding to all the targets in the image and the N similar feature vectors can be deduplicated to obtain the target feature vectors used to determine the target to be tracked.
  • Deduplicating the feature vectors corresponding to all the targets in the image and the N similar feature vectors to obtain the target feature vectors may be: for the feature vector corresponding to each target in the image, performing the second similarity calculation between that feature vector and the N similar feature vectors; when a second similarity calculation result is greater than or equal to the preset similarity, deleting the feature vector corresponding to that second similarity calculation result from the feature vectors corresponding to all targets in the image or from the N similar feature vectors. Then, the feature vectors corresponding to all targets in the image after deletion and the remaining feature vectors among the N similar feature vectors are taken as the target feature vectors.
  • The second similarity calculation may be, for the feature vector corresponding to each target in the image, performing a vector dot product calculation, a Euclidean distance calculation, etc. between that feature vector and the N similar feature vectors.
  • the method of similarity calculation is not limited.
  • the preset similarity can be customized according to the actual situation, which is not limited in the present disclosure.
  • When the second similarity calculation result is greater than or equal to the preset similarity, the feature vector corresponding to a certain target in the image and the corresponding feature vector among the N similar feature vectors can be regarded as the same feature vector, so only one of them needs to be kept. Which of the two is deleted is not limited; the deletion operation can be performed on the feature vectors corresponding to all the targets in the image or on the N similar feature vectors.
  • For example, if the N similar feature vectors are the feature vectors corresponding to targets B1 and B2, and the feature vectors determined according to the position vector parameters are those corresponding to targets B1, B2, and B3, the deletion operation may be performed on the N similar feature vectors. After deletion, the N similar feature vectors are empty, and the feature vectors corresponding to all targets in the image are still those corresponding to targets B1, B2, and B3. The feature vectors corresponding to all targets in the image after deletion, together with the remaining feature vectors among the N similar feature vectors, are therefore the feature vectors corresponding to targets B1, B2, and B3; that is, the target feature vectors are the feature vectors corresponding to targets B1, B2, and B3.
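  • The deduplication described above might be sketched as follows; cosine similarity as the second similarity, the threshold value, and the choice to delete from the similar-vector set are assumptions made for illustration.

```python
import torch

# Compare each feature vector of the targets detected in the current frame
# (via position vectors) with the N similar feature vectors; when a second
# similarity reaches the preset similarity, drop the duplicate similar vector.
def deduplicate(frame_vectors: torch.Tensor,
                similar_vectors: torch.Tensor,
                preset_similarity: float = 0.9) -> torch.Tensor:
    kept_similar = []
    for sim_vec in similar_vectors:
        # second similarity between this similar vector and every frame vector
        scores = torch.nn.functional.cosine_similarity(
            frame_vectors, sim_vec.unsqueeze(0), dim=1)
        if scores.max() < preset_similarity:
            kept_similar.append(sim_vec)   # no duplicate found, keep it
        # otherwise the duplicate is deleted from the similar-vector set
    if kept_similar:
        return torch.cat([frame_vectors, torch.stack(kept_similar)], dim=0)
    return frame_vectors

# Toy example mirroring the B1/B2/B3 description above: the similar vectors
# (B1, B2) are already covered by the frame vectors (B1, B2, B3), so the
# resulting target feature vectors are just those of B1, B2 and B3.
frame_vectors = torch.randn(3, 64)            # B1, B2, B3
similar_vectors = frame_vectors[:2].clone()   # B1, B2
target_feature_vectors = deduplicate(frame_vectors, similar_vectors)
print(target_feature_vectors.shape)           # torch.Size([3, 64])
```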
  • The above deduplication processing is only one possible way of fusing the feature vectors provided by this embodiment of the present disclosure. In specific implementations, other methods may also be used to fuse the feature vectors corresponding to all targets in the image with the N similar feature vectors, which is not limited in this embodiment of the present disclosure.
  • In this way, the feature vectors corresponding to all the targets in the image can be determined through the position vector parameters, the similar feature vectors corresponding to the targets existing in both the image and the target detection image can be determined according to the similarity calculation results, and the similar feature vectors can then be fused with the feature vectors corresponding to all the targets in the image to remove redundant feature vectors. This improves computational efficiency and yields a more accurate target to be tracked.
  • The target feature vector can be subjected to a linear feature transformation to obtain tracking frame information corresponding to the target to be tracked in the image, where the tracking frame information includes position information and size information of the tracking frame corresponding to the target to be tracked, so that the target to be tracked can be indicated in the image according to the tracking frame information.
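  • A minimal sketch of such a linear feature transformation is shown below; the (center, size) output convention and the feature dimension are assumptions, since the text only states that position and size information are produced.

```python
import torch
import torch.nn as nn

# Turn one target feature vector into tracking frame information (position and
# size of the tracking frame) via a linear feature transformation.
C = 64
box_head = nn.Linear(C, 4)                   # linear feature transformation

target_feature_vector = torch.randn(1, C)    # one target feature vector
cx, cy, w, h = box_head(target_feature_vector).squeeze(0).tolist()
print(f"tracking frame: center=({cx:.2f}, {cy:.2f}), size=({w:.2f}, {h:.2f})")
```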
  • An embodiment of the present disclosure also provides a video target tracking apparatus, which can become part or all of an electronic device through software, hardware, or a combination of the two. Referring to FIG. 4, the video target tracking apparatus 400 includes:
  • an acquisition module 401 configured to acquire the video to be tracked
  • a tracking module 402 configured to input the video to be tracked into a target tracking model to obtain a target tracking result corresponding to the video to be tracked, the tracking module includes:
  • the first determination sub-module 4021 is configured to, for each frame of the video to be tracked, determine the feature vector corresponding to the target to be tracked in the target detection image corresponding to the image, and the target detection image includes the target to be tracked ;
  • the second determination sub-module 4022 is configured to perform a first similarity calculation between each feature vector in the feature map corresponding to the image and the feature vector corresponding to the target to be tracked in the target detection image, and determine, according to the first similarity calculation result, the target feature vector among all the feature vectors of the feature map;
  • the third determination sub-module 4023 is configured to determine the target to be tracked in the image according to the target feature vector.
  • the target detection image is an image of the previous frame of the image that includes the target to be tracked; or, the target detection image is a preset input image that includes the target to be tracked.
  • the second determination submodule 4022 is used for:
  • when the target detection image includes N targets to be tracked, select, from all the feature vectors of the feature map, the N feature vectors with the largest first similarity calculation result as the target feature vectors, where N is a positive integer.
  • the target tracking model is also used to determine the feature vectors corresponding to all targets in the image according to the pre-trained position vector parameters, and the second determination submodule is used for:
  • when the target detection image includes N targets to be tracked, selecting, from all the feature vectors of the feature map, the N feature vectors with the largest first similarity calculation result as similar feature vectors, where N is a positive integer;
  • Deduplication processing is performed on the feature vectors corresponding to all targets in the image and the N similar feature vectors to obtain target feature vectors.
  • the image is a first frame image of the video to be tracked, and the device further includes:
  • the fourth determination sub-module is configured to use the feature vector determined according to the pre-trained position vector parameter as the target feature vector.
  • the second determination submodule 4022 is used for:
  • the feature vectors corresponding to all targets in the image after deletion processing and the remaining feature vectors in the N similar feature vectors are used as target feature vectors.
  • the apparatus 400 further includes the following modules for obtaining the position vector parameters through training:
  • the first training module is used to determine the predicted feature vector corresponding to the target in the sample image according to the initial position vector parameter, so as to obtain the predicted target information corresponding to the sample image, wherein the sample image is pre-marked with the corresponding sample target information;
  • the second training module is configured to calculate a loss function according to the predicted target information and the sample target information, and adjust the initial position vector parameter according to the calculation result of the loss function.
  • modules may be implemented as software components executing on one or more general-purpose processors, or as hardware, such as programmable logic devices and/or application specific integrated circuits, that perform certain functions or combinations thereof.
  • The modules may be embodied in the form of a software product, which may be stored in a non-volatile storage medium and includes instructions for causing a computer device (e.g., a personal computer, a server, a network device, a mobile terminal, etc.) to implement the method described in the embodiments of the present disclosure.
  • the above-mentioned modules may also be implemented on a single device, or may be distributed on multiple devices. The functions of these modules can be combined with each other or further split into multiple sub-modules.
  • an embodiment of the present disclosure further provides a computer-readable medium on which a computer program is stored, and when the program is executed by a processing device, implements the steps of any of the above video target tracking methods.
  • An embodiment of the present disclosure further provides an electronic device, including:
  • a storage device on which a computer program is stored; and
  • a processing device configured to execute the computer program in the storage device, so as to realize the steps of any of the above video target tracking methods.
  • Terminal devices in the embodiments of the present disclosure may include, but are not limited to, such as mobile phones, notebook computers, digital broadcast receivers, PDAs (personal digital assistants), PADs (tablets), PMPs (portable multimedia players), vehicle-mounted terminals (eg, mobile terminals such as in-vehicle navigation terminals), etc., and stationary terminals such as digital TVs, desktop computers, and the like.
  • the electronic device shown in FIG. 5 is only an example, and should not impose any limitation on the function and scope of use of the embodiments of the present disclosure.
  • As shown in FIG. 5, an electronic device 500 may include a processing device (e.g., a central processing unit, a graphics processor, etc.) 501, which may execute various appropriate actions and processes according to a program stored in a read-only memory (ROM) 502 or a program loaded from a storage device 508 into a random access memory (RAM) 503. In the RAM 503, various programs and data required for the operation of the electronic device 500 are also stored.
  • the processing device 501, the ROM 502, and the RAM 503 are connected to each other through a bus 504.
  • An input/output (I/O) interface 505 is also connected to bus 504 .
  • The following devices may be connected to the I/O interface 505: input devices 506 including, for example, a touch screen, a touchpad, a keyboard, a mouse, a camera, a microphone, an accelerometer, a gyroscope, etc.; output devices 507 including, for example, a liquid crystal display (LCD), a speaker, a vibrator, etc.; and storage devices 508 including, for example, a magnetic tape, a hard disk, etc.
  • Communication means 509 may allow electronic device 500 to communicate wirelessly or by wire with other devices to exchange data. While FIG. 5 shows electronic device 500 having various means, it should be understood that not all of the illustrated means are required to be implemented or provided. More or fewer devices may alternatively be implemented or provided.
  • embodiments of the present disclosure include a computer program product comprising a computer program carried on a non-transitory computer readable medium, the computer program containing program code for performing the method illustrated in the flowchart.
  • The computer program may be downloaded and installed from a network via the communication device 509, or installed from the storage device 508, or installed from the ROM 502. When the computer program is executed by the processing device 501, the above-mentioned functions defined in the methods of the embodiments of the present disclosure are performed.
  • an embodiment of the present disclosure provides a computer program, including: instructions, when executed by a processor, the instructions cause the processor to execute any of the above video object tracking methods.
  • an embodiment of the present disclosure provides a computer program product, including instructions, when executed by a processor, the instructions cause the processor to execute any of the above video object tracking methods.
  • the computer-readable medium mentioned above in the present disclosure may be a computer-readable signal medium or a computer-readable storage medium, or any combination of the above two.
  • the computer-readable storage medium can be, for example, but not limited to, an electrical, magnetic, optical, electromagnetic, infrared, or semiconductor system, apparatus or device, or a combination of any of the above. More specific examples of computer readable storage media may include, but are not limited to, electrical connections with one or more wires, portable computer disks, hard disks, random access memory (RAM), read only memory (ROM), erasable Programmable read only memory (EPROM or flash memory), fiber optics, portable compact disk read only memory (CD-ROM), optical storage devices, magnetic storage devices, or any suitable combination of the foregoing.
  • a computer-readable storage medium may be any tangible medium that contains or stores a program that can be used by or in conjunction with an instruction execution system, apparatus, or device.
  • a computer-readable signal medium may include a data signal in baseband or propagated as part of a carrier wave with computer-readable program code embodied thereon. Such propagated data signals may take a variety of forms, including but not limited to electromagnetic signals, optical signals, or any suitable combination of the foregoing.
  • a computer-readable signal medium can also be any computer-readable medium other than a computer-readable storage medium that can transmit, propagate, or transport the program for use by or in connection with the instruction execution system, apparatus, or device .
  • Program code embodied on a computer readable medium may be transmitted using any suitable medium including, but not limited to, electrical wire, optical fiber cable, RF (radio frequency), etc., or any suitable combination of the foregoing.
  • Communication may be performed using any currently known or future developed network protocol, such as HTTP (HyperText Transfer Protocol). Examples of communication networks include local area networks ("LAN"), wide area networks ("WAN"), internetworks (e.g., the Internet), and peer-to-peer networks (e.g., ad hoc peer-to-peer networks), as well as any currently known or future developed networks.
  • the above-mentioned computer-readable medium may be included in the above-mentioned electronic device; or may exist alone without being assembled into the electronic device.
  • the above-mentioned computer-readable medium carries one or more programs, and when the above-mentioned one or more programs are executed by the electronic device, the electronic device: acquires the video to be tracked; inputs the video to be tracked into the target tracking model to obtain The target tracking result corresponding to the video to be tracked, and the target tracking model is used to perform the following processing: for each frame of the video to be tracked, determine the feature corresponding to the target to be tracked in the target detection image corresponding to the image vector, the target detection image includes the target to be tracked; first similarity is performed between each feature vector in the feature map corresponding to the image and the feature vector corresponding to the target to be tracked in the target detection image degree calculation, and according to the first similarity calculation result, determine the target feature vector from all the feature vectors in the feature map; according to the target feature vector, determine the target to be tracked in the image.
  • Computer program code for performing the operations of the present disclosure may be written in one or more programming languages, including but not limited to object-oriented programming languages such as Java, Smalltalk, and C++, as well as conventional procedural programming languages such as the "C" language or similar programming languages.
  • the program code may execute entirely on the user's computer, partly on the user's computer, as a stand-alone software package, partly on the user's computer and partly on a remote computer, or entirely on the remote computer or server.
  • The remote computer may be connected to the user's computer through any kind of network, including a local area network (LAN) or a wide area network (WAN), or may be connected to an external computer (e.g., through the Internet using an Internet service provider).
  • Each block in the flowchart or block diagrams may represent a module, program segment, or portion of code that contains one or more executable instructions for implementing the specified logical functions.
  • the functions noted in the blocks may occur out of the order noted in the figures. For example, two blocks shown in succession may, in fact, be executed substantially concurrently, or the blocks may sometimes be executed in the reverse order, depending upon the functionality involved.
  • each block of the block diagrams and/or flowchart illustrations, and combinations of blocks in the block diagrams and/or flowchart illustrations can be implemented in dedicated hardware-based systems that perform the specified functions or operations , or can be implemented in a combination of dedicated hardware and computer instructions.
  • The modules involved in the embodiments of the present disclosure may be implemented in software or hardware. In some cases, the name of a module does not constitute a limitation on the module itself.
  • exemplary types of hardware logic components include: Field Programmable Gate Arrays (FPGAs), Application Specific Integrated Circuits (ASICs), Application Specific Standard Products (ASSPs), Systems on Chips (SOCs), Complex Programmable Logical Devices (CPLDs) and more.
  • a machine-readable medium may be a tangible medium that may contain or store a program for use by or in connection with the instruction execution system, apparatus or device.
  • the machine-readable medium may be a machine-readable signal medium or a machine-readable storage medium.
  • Machine-readable media may include, but are not limited to, electronic, magnetic, optical, electromagnetic, infrared, or semiconductor systems, devices, or devices, or any suitable combination of the foregoing.
  • machine-readable storage media would include one or more wire-based electrical connections, portable computer disks, hard disks, random access memory (RAM), read only memory (ROM), erasable programmable read only memory (EPROM or flash memory), fiber optics, compact disk read only memory (CD-ROM), optical storage, magnetic storage, or any suitable combination of the foregoing.
  • Example 1 provides a video target tracking method, the method comprising: acquiring a video to be tracked; and inputting the video to be tracked into a target tracking model to obtain a target tracking result corresponding to the video to be tracked, the target tracking model being used to perform the following processing: for each frame of image of the video to be tracked, determining a feature vector corresponding to the target to be tracked in the target detection image corresponding to the image, the target detection image including the target to be tracked; performing a first similarity calculation between each feature vector in the feature map corresponding to the image and the feature vector corresponding to the target to be tracked in the target detection image, and determining a target feature vector from all the feature vectors of the feature map according to the first similarity calculation result; and determining the target to be tracked in the image according to the target feature vector.
  • Example 2 provides the method of Example 1, wherein the target detection image is a previous frame image of the image including the target to be tracked; or
  • the target detection image is a preset input image including the target to be tracked.
  • Example 3 provides the method of Example 1, wherein determining the target feature vector among all the feature vectors of the feature map according to the first similarity calculation result includes:
  • when the target detection image includes N targets to be tracked, selecting, from all the feature vectors of the feature map, the N feature vectors with the largest first similarity calculation result as the target feature vectors, where N is a positive integer.
  • Example 4 provides the method of Example 1, wherein the target tracking model is further configured to determine feature vectors corresponding to all targets in the image according to pre-trained position vector parameters, and determining the target feature vector among all the feature vectors of the feature map according to the first similarity calculation result includes:
  • when the target detection image includes N targets to be tracked, selecting, from all the feature vectors of the feature map, the N feature vectors with the largest first similarity calculation result as similar feature vectors, where N is a positive integer;
  • Deduplication processing is performed on the feature vectors corresponding to all targets in the image and the N similar feature vectors to obtain target feature vectors.
  • Example 5 provides the method of Example 4, where the image is a first frame image of the video to be tracked, and the method further includes:
  • the feature vector determined according to the pre-trained position vector parameter is used as the target feature vector.
  • Example 6 provides the method of Example 4, wherein performing deduplication processing on the feature vectors corresponding to all targets in the image and the N similar feature vectors to obtain the target feature vectors includes:
  • the feature vectors corresponding to all targets in the image after deletion processing and the remaining feature vectors in the N similar feature vectors are used as target feature vectors.
  • Example 7 provides the method of any one of Examples 4-6, and the position vector parameter is obtained by training in the following manner:
  • a loss function is calculated according to the predicted target information and the sample target information, and the initial position vector parameter is adjusted according to the calculation result of the loss function.
  • Example 8 provides a video target tracking apparatus, the apparatus comprising:
  • the acquisition module is used to acquire the video to be tracked
  • a tracking module configured to input the video to be tracked into a target tracking model to obtain a target tracking result corresponding to the video to be tracked, the tracking module includes:
  • a first determination submodule configured to determine, for each frame of the video to be tracked, a feature vector corresponding to the target to be tracked in the target detection image corresponding to the image, and the target detection image includes the target to be tracked;
  • the second determination submodule is configured to perform a first similarity calculation between each feature vector in the feature map corresponding to the image and the feature vector corresponding to the target to be tracked in the target detection image, and determine, according to the first similarity calculation result, the target feature vector among all the feature vectors of the feature map;
  • the third determination sub-module is configured to determine the target to be tracked in the image according to the target feature vector.
  • Example 9 provides the apparatus of Example 8, and the target detection image is a previous frame image of the image including the target to be tracked; or, the target detection image is a preset input image including the target to be tracked.
  • Example 10 provides the apparatus of Example 8, wherein the second determination submodule is configured to:
  • when the target detection image includes N targets to be tracked, select, from all the feature vectors of the feature map, the N feature vectors with the largest first similarity calculation result as the target feature vectors, where N is a positive integer.
  • Example 11 provides the apparatus of Example 8, wherein the target tracking model is further configured to determine feature vectors corresponding to all targets in the image according to pre-trained position vector parameters, and the second determination submodule is configured to:
  • when the target detection image includes N targets to be tracked, select, from all the feature vectors of the feature map, the N feature vectors with the largest first similarity calculation result as similar feature vectors, where N is a positive integer;
  • Deduplication processing is performed on the feature vectors corresponding to all targets in the image and the N similar feature vectors to obtain target feature vectors.
  • Example 12 provides the apparatus of Example 11, where the image is a first frame image of the video to be tracked, and the apparatus further includes:
  • the fourth determination sub-module is configured to use the feature vector determined according to the pre-trained position vector parameter as the target feature vector.
  • Example 13 provides the apparatus of Example 11, wherein the second determination submodule is configured to:
  • the feature vectors corresponding to all targets in the image after deletion processing and the remaining feature vectors in the N similar feature vectors are used as target feature vectors.
  • Example 14 provides the apparatus of any one of Examples 11-13, the apparatus further comprising the following module for training to obtain the position vector parameter:
  • the first training module is used to determine the predicted feature vector corresponding to the target in the sample image according to the initial position vector parameter, so as to obtain the predicted target information corresponding to the sample image, wherein the sample image is pre-marked with the corresponding sample target information;
  • the second training module is configured to calculate a loss function according to the predicted target information and the sample target information, and adjust the initial position vector parameter according to the calculation result of the loss function.
  • Example 15 provides a computer-readable medium having stored thereon a computer program that, when executed by a processing apparatus, implements the steps of the method of any one of Examples 1-7.
  • Example 16 provides an electronic device comprising:
  • a processing device configured to execute the computer program in the storage device, to implement the steps of the method in any one of Examples 1-7.

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Data Mining & Analysis (AREA)
  • Physics & Mathematics (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • General Physics & Mathematics (AREA)
  • Bioinformatics & Cheminformatics (AREA)
  • Bioinformatics & Computational Biology (AREA)
  • Evolutionary Biology (AREA)
  • Evolutionary Computation (AREA)
  • Artificial Intelligence (AREA)
  • General Engineering & Computer Science (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Multimedia (AREA)
  • Image Analysis (AREA)

Abstract

The present disclosure relates to a video target tracking method, a video target tracking apparatus, a storage medium, and an electronic device, for realizing end-to-end video target tracking and reducing the time delay of video target tracking. The video target tracking method comprises: acquiring a video to be tracked; and inputting said video into a target tracking model to obtain a target tracking result corresponding to said video, the target tracking model being used for performing the following processing: for each image frame of said video, determining a feature vector corresponding to a target to be tracked in a target detection image corresponding to the image, the target detection image comprising said target; performing a first similarity calculation on each feature vector in a feature map corresponding to the image and the feature vector corresponding to said target in the target detection image, and determining a target feature vector from among all the feature vectors in the feature map according to the first similarity calculation result; and determining said target in the image according to the target feature vector.

Description

视频目标追踪方法、视频目标追踪装置、存储介质及电子设备Video target tracking method, video target tracking device, storage medium and electronic device
相关申请的交叉引用CROSS-REFERENCE TO RELATED APPLICATIONS
本申请是以申请号为202110179157.5,申请日为2021年2月9日的中国申请为基础,并主张其优先权,该中国申请的公开内容在此作为整体引入本申请中。The present application is based on the Chinese application with the application number of 202110179157.5 and the filing date of February 9, 2021, and claims its priority. The disclosure of the Chinese application is hereby incorporated into the present application as a whole.
技术领域technical field
本公开涉及图像处理技术领域,具体地,涉及一种视频目标追踪方法、视频目标追踪装置、存储介质及电子设备。The present disclosure relates to the technical field of image processing, and in particular, to a video target tracking method, a video target tracking device, a storage medium, and an electronic device.
背景技术Background technique
视频目标追踪是人类行为分析、体育视频解说等众多视频应用领域的基础,对实时性要求较高。但是,相关技术中的视频目标追踪通常是基于先目标检测再目标追踪的流程。具体地,先对视频中的前后两帧做目标检测,然后将检测到的目标匹配成对,从而实现目标的追踪。Video target tracking is the basis of many video application fields such as human behavior analysis and sports video commentary, and requires high real-time performance. However, the video target tracking in the related art is usually based on the process of first target detection and then target tracking. Specifically, the target detection is performed on the two frames before and after the video, and then the detected targets are matched into pairs, so as to achieve target tracking.
发明内容SUMMARY OF THE INVENTION
提供该发明内容部分以便以简要的形式介绍构思,这些构思将在后面的具体实施方式部分被详细描述。该发明内容部分并不旨在标识要求保护的技术方案的关键特征或必要特征,也不旨在用于限制所要求的保护的技术方案的范围。This Summary is provided to introduce concepts in a simplified form that are described in detail in the Detailed Description section that follows. This summary section is not intended to identify key features or essential features of the claimed technical solution, nor is it intended to be used to limit the scope of the claimed technical solution.
发明人认为,相关技术中,需要先执行目标检测,再执行目标追踪,因此时延较高,特别是在待追踪目标较多的场景下,时延问题尤其明显。The inventor believes that, in the related art, target detection needs to be performed first, and then target tracking is performed, so the delay is relatively high, especially in a scenario where there are many targets to be tracked, the delay problem is particularly obvious.
为了解决上述技术问题,本公开提供了一种视频目标的追踪方法、追踪装置、存储介质及电子设备,以实现端到端的视频目标追踪,减小视频目标追踪 的时延。In order to solve the above technical problems, the present disclosure provides a video target tracking method, tracking device, storage medium and electronic equipment, so as to realize end-to-end video target tracking and reduce the time delay of video target tracking.
第一方面,本公开提供一种视频目标追踪方法,所述方法包括:In a first aspect, the present disclosure provides a video target tracking method, the method comprising:
获取待追踪视频;Get the video to be tracked;
将所述待追踪视频输入目标追踪模型,以得到所述待追踪视频对应的目标追踪结果,所述目标追踪模型用于执行如下处理:Input the video to be tracked into a target tracking model to obtain a target tracking result corresponding to the video to be tracked, and the target tracking model is used to perform the following processing:
针对所述待追踪视频的每一帧图像,确定所述图像对应的目标检测图像中待追踪目标对应的特征向量,所述目标检测图像包括所述待追踪目标;For each frame of the video to be tracked, determine a feature vector corresponding to the target to be tracked in the target detection image corresponding to the image, and the target detection image includes the target to be tracked;
将所述图像对应的特征图中的每一特征向量与所述目标检测图像中所述待追踪目标对应的所述特征向量进行第一相似度计算,并根据第一相似度计算结果,在所述特征图的所有特征向量中确定目标特征向量;Perform a first similarity calculation on each feature vector in the feature map corresponding to the image and the feature vector corresponding to the target to be tracked in the target detection image, and according to the first similarity calculation result, in the Determine the target feature vector from all the feature vectors of the feature map;
根据所述目标特征向量,确定所述图像中的待追踪目标。According to the target feature vector, the target to be tracked in the image is determined.
In a second aspect, the present disclosure provides a video target tracking apparatus, which includes:
an acquisition module, configured to acquire a video to be tracked;
a tracking module, configured to input the video to be tracked into a target tracking model to obtain a target tracking result corresponding to the video to be tracked, where the tracking module includes:
a first determination submodule, configured to determine, for each frame image of the video to be tracked, a feature vector corresponding to a target to be tracked in a target detection image corresponding to the image, where the target detection image includes the target to be tracked;
a second determination submodule, configured to perform a first similarity calculation between each feature vector in a feature map corresponding to the image and the feature vector corresponding to the target to be tracked in the target detection image, and to determine a target feature vector from all the feature vectors of the feature map according to a result of the first similarity calculation;
a third determination submodule, configured to determine the target to be tracked in the image according to the target feature vector.
In a third aspect, the present disclosure provides a computer-readable medium on which a computer program is stored, where the program, when executed by a processing device, implements any video target tracking method provided by the embodiments of the present disclosure.
In a fourth aspect, the present disclosure provides an electronic device, including:
a storage device on which a computer program is stored;
a processing device, configured to execute the computer program in the storage device to implement any video target tracking method provided by the embodiments of the present disclosure.
In a fifth aspect, the present disclosure provides a computer program, including instructions which, when executed by a processor, cause the processor to execute any video target tracking method provided by the embodiments of the present disclosure.
In a sixth aspect, the present disclosure provides a computer program product, including instructions which, when executed by a processor, cause the processor to execute any video target tracking method provided by the embodiments of the present disclosure.
Through the above technical solution, the target tracking model can perform the first similarity calculation between the feature vectors corresponding to each frame image of the video to be tracked and the feature vector corresponding to the target to be tracked in the target detection image, and determine the target to be tracked in each frame image according to the result of the first similarity calculation. As a result, the target to be tracked in each frame image output by the target tracking model corresponds one-to-one to the target to be tracked in the target detection image; that is, target detection and target association are completed at the same time, which reduces the latency of the target tracking process.
Other features and advantages of the present disclosure will be described in detail in the Detailed Description that follows.
DESCRIPTION OF THE DRAWINGS
The above and other features, advantages, and aspects of the embodiments of the present disclosure will become more apparent with reference to the following detailed description taken in conjunction with the accompanying drawings. Throughout the drawings, the same or similar reference numerals denote the same or similar elements. It should be understood that the drawings are schematic and that components and elements are not necessarily drawn to scale. In the drawings:
FIG. 1 is a flowchart of a video target tracking method according to an exemplary embodiment of the present disclosure;
FIG. 2 is a schematic diagram of a target tracking process in a video target tracking method according to an exemplary embodiment of the present disclosure;
FIG. 3 is a schematic diagram of a target tracking process in another video target tracking method according to an exemplary embodiment of the present disclosure;
FIG. 4 is a block diagram of a video target tracking apparatus according to an exemplary embodiment of the present disclosure;
FIG. 5 is a block diagram of an electronic device according to an exemplary embodiment of the present disclosure.
DETAILED DESCRIPTION
Embodiments of the present disclosure will be described in more detail below with reference to the accompanying drawings. Although certain embodiments of the present disclosure are shown in the drawings, it should be understood that the present disclosure may be implemented in various forms and should not be construed as limited to the embodiments set forth herein; rather, these embodiments are provided for a more thorough and complete understanding of the present disclosure. It should be understood that the drawings and embodiments of the present disclosure are for exemplary purposes only and are not intended to limit the protection scope of the present disclosure.
It should be understood that the steps described in the method embodiments of the present disclosure may be performed in different orders and/or in parallel. Furthermore, the method embodiments may include additional steps and/or omit performing the illustrated steps. The scope of the present disclosure is not limited in this regard.
As used herein, the term "including" and its variations are open-ended inclusions, that is, "including but not limited to". The term "based on" means "based at least in part on". The term "one embodiment" means "at least one embodiment"; the term "another embodiment" means "at least one additional embodiment"; the term "some embodiments" means "at least some embodiments". Relevant definitions of other terms will be given in the description below.
It should be noted that concepts such as "first" and "second" mentioned in the present disclosure are only used to distinguish different devices, modules, or units, and are not used to limit the order of the functions performed by these devices, modules, or units, or their interdependence. It should also be noted that the modifiers "a", "an", and "a plurality of" mentioned in the present disclosure are illustrative rather than restrictive, and those skilled in the art should understand that, unless the context clearly indicates otherwise, they should be understood as "one or more".
The names of the messages or information exchanged between multiple devices in the embodiments of the present disclosure are used for illustrative purposes only and are not intended to limit the scope of these messages or information.
Video target tracking in the related art is usually based on a process of performing target detection first and target tracking afterwards. Specifically, a detection module first performs target detection on two adjacent frames of the video, and an association module then matches the detected targets into pairs, thereby achieving target tracking. The model components of such a process are relatively complex and the latency is relatively high; in scenarios with many targets to be tracked, the latency problem is especially pronounced.
In view of this, the present disclosure proposes a video target tracking method, a video target tracking apparatus, a storage medium, and an electronic device, so as to achieve end-to-end video target tracking and reduce the latency of video target tracking.
FIG. 1 is a flowchart of a video target tracking method according to an exemplary embodiment of the present disclosure. Referring to FIG. 1, the video target tracking method includes:
Step 101: acquiring a video to be tracked.
For example, acquiring the video to be tracked may be acquiring, in response to a video input operation of a user, the video input by the user, or may be automatically acquiring, after a target tracking instruction is received, a video captured by an image acquisition device from that device, and so on, which is not limited in the embodiments of the present disclosure.
Step 102: inputting the video to be tracked into a target tracking model to obtain a target tracking result corresponding to the video to be tracked. The target tracking model is configured to perform the following processing: for each frame image of the video to be tracked, determining a feature vector corresponding to a target to be tracked in a target detection image corresponding to the image, where the target detection image includes the target to be tracked; performing a first similarity calculation between each feature vector in a feature map corresponding to the image and the feature vector corresponding to the target to be tracked in the target detection image, and determining a target feature vector from all the feature vectors of the feature map according to a result of the first similarity calculation; and determining the target to be tracked in the image according to the target feature vector.
In the above manner, the target tracking model can perform the first similarity calculation between the feature vectors corresponding to each frame image of the video to be tracked and the feature vector corresponding to the target to be tracked in the target detection image, and thus determine the target to be tracked in each frame image according to the result of the first similarity calculation. As a result, the target to be tracked in each frame image output by the target tracking model corresponds one-to-one to the target to be tracked in the target detection image; that is, target detection and target association are completed at the same time, which reduces the latency of the target tracking process.
In order to make those skilled in the art better understand the video target tracking method provided by the present disclosure, the above steps are described in detail below by way of example.
For example, after the video to be tracked is acquired, the video to be tracked may be input into the target tracking model. It should be understood that the frame images in the video to be tracked have a temporal order, so a video image sequence composed of multiple frame images arranged in chronological order can be obtained from the video to be tracked. Therefore, inputting the video to be tracked into the target tracking model may also be inputting the video image sequence corresponding to the video to be tracked into the target tracking model.
For example, the target tracking model may be trained according to sample images and sample target information corresponding to the sample images. Specifically, a sample image may be input into the target tracking model to obtain predicted target information output by the target tracking model for the sample image; a loss function is then calculated according to the predicted target information and the sample target information; and finally, the parameters of the target tracking model are adjusted according to the calculation result of the loss function, so that the target tracking model outputs more accurate target information. In this way, the target detection function and the target tracking function of the target tracking model can be trained synchronously.
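The joint training described above can be pictured with a minimal sketch. The following Python/PyTorch fragment is an illustrative assumption rather than the disclosed implementation; `tracking_model`, `tracking_loss`, `sample_image`, and `sample_targets` are hypothetical placeholders for any module, loss, and annotated data that fit the description above.

```python
import torch

def train_step(tracking_model: torch.nn.Module,
               tracking_loss,            # any loss comparing predictions with annotations
               optimizer: torch.optim.Optimizer,
               sample_image: torch.Tensor,
               sample_targets):
    """One synchronous training step: detection and association share the same
    model and the same loss, so both functions are updated together."""
    optimizer.zero_grad()
    predicted_targets = tracking_model(sample_image)         # predicted target information
    loss = tracking_loss(predicted_targets, sample_targets)  # compare with sample target information
    loss.backward()                                          # back-propagate the loss
    optimizer.step()                                         # adjust the model parameters
    return loss.item()
```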
It should be understood that, because the related art requires target detection first and target association afterwards, the model training approach in the related art is to train the detection module and the association module step by step. In scenarios with many targets to be tracked, such training consumes a large amount of time and makes it difficult to reach an optimal training effect. In contrast, the approach in the embodiments of the present disclosure of training the target detection function and the target tracking function of the target tracking model synchronously not only simplifies the components of the target tracking model but also simplifies its training process, and can better meet the requirements of multi-target tracking scenarios.
In the application stage, for each frame image of the video to be tracked, the target tracking model may determine the feature vector corresponding to the target to be tracked in the target detection image corresponding to the image, then perform the first similarity calculation between each feature vector in the feature map corresponding to the image and the feature vector corresponding to the target to be tracked in the target detection image, and determine the target feature vector from all the feature vectors of the feature map according to the result of the first similarity calculation. Finally, the target to be tracked in the image is determined according to the target feature vector.
For example, the target detection image may be the previous frame image of the image that includes the target to be tracked, or the target detection image may be a preset input image that includes the target to be tracked. It should be understood that the video target tracking method provided by the embodiments of the present disclosure can be applied to two application scenarios: one where the tracking target is given and one where the tracking target is unknown. In the scenario where the tracking target is given, the target detection image may be a preset input image that includes the target to be tracked; for example, if the target to be tracked is person A, the preset input image may be a full-body photograph of person A captured by an image acquisition device. In the scenario where the tracking target is unknown, all targets in each frame image need to be tracked, and the target detection image may then be the previous frame image of the image that includes the target to be tracked.
After the target detection image is determined, the feature vector corresponding to the target to be tracked in the target detection image may be determined. The feature vector may be the result obtained by vectorizing the image feature of the center pixel of the target to be tracked, or may be the result obtained by vectorizing the image feature of a pixel that distinguishes the target to be tracked from other targets, and so on, which is not limited in the embodiments of the present disclosure. The manner of determining the feature vector is similar to that in the related art and is not repeated here.
For example, the feature map corresponding to the image may be an image obtained by vectorizing the image feature of each pixel in the image. Moreover, the feature vector corresponding to the target to be tracked in the target detection image is a pixel-level feature vector, so the first similarity calculation can be performed between each feature vector in the feature map corresponding to the image and the feature vector corresponding to the target to be tracked in the target detection image, thereby achieving target tracking according to the result of the first similarity calculation. For example, the first similarity calculation may be a vector dot product calculation, a Euclidean distance calculation, or the like between the feature vector corresponding to each pixel in the image and the feature vector corresponding to the target to be tracked in the target detection image; the manner of the first similarity calculation is not limited in the embodiments of the present disclosure.
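As an illustration only (not the disclosed implementation), the sketch below computes the two similarity measures named above, a vector dot product and a Euclidean distance, between every pixel-level feature vector of an H×W×C feature map and a single target feature vector; the names `first_similarity`, `feature_map`, and `target_vec` are assumptions introduced for the example.

```python
import numpy as np

def first_similarity(feature_map: np.ndarray, target_vec: np.ndarray,
                     metric: str = "dot") -> np.ndarray:
    """Compare every pixel-level feature vector of an (H, W, C) feature map
    with one (C,) target feature vector; return an (H, W) score map."""
    if metric == "dot":
        # Vector dot product: a larger score means a higher similarity.
        return np.einsum("hwc,c->hw", feature_map, target_vec)
    if metric == "euclidean":
        # Negative Euclidean distance, so that larger still means more similar.
        return -np.linalg.norm(feature_map - target_vec[None, None, :], axis=-1)
    raise ValueError(f"unsupported metric: {metric}")

# Hypothetical usage: a 32x32 feature map with 64-dimensional features.
scores = first_similarity(np.random.rand(32, 32, 64), np.random.rand(64))
```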
It should be understood that the target tracking model may include an attention mechanism module, and the attention mechanism module may perform the first similarity calculation process to determine the feature vector corresponding to a target that exists in both the image and the target detection image, that is, to obtain the target feature vector.
For example, referring to FIG. 2, in the scenario where the tracking target is given, the target detection image is a preset input image that includes the target to be tracked. Each frame image of the video to be tracked is taken in turn as the current frame image, and the feature map of the current frame image is determined. The attention mechanism module may then perform the first similarity calculation between each feature vector in the feature map and the feature vector corresponding to the target to be tracked in the preset input image, and output the target feature vector according to the result of the first similarity calculation, so that the target tracking model can determine the target to be tracked in the current frame image according to the target feature vector.
In a possible implementation, determining the target feature vector from all the feature vectors of the feature map according to the result of the first similarity calculation may be: when the target detection image includes N targets to be tracked, selecting, from all the feature vectors of the feature map, the N feature vectors with the largest first similarity calculation results as the target feature vectors, where N is a positive integer.
For example, in the scenario where the tracking target is given, the target detection image is a preset input image that includes the target to be tracked. It can be determined by a target detection method in the related art that the target detection image includes N targets to be tracked, that is, the number of feature vectors corresponding to the targets to be tracked is N. In this scenario, for each frame image of the video to be tracked, the N feature vectors with the largest first similarity calculation results may be selected, from all the feature vectors included in the feature map corresponding to the image, as the target feature vectors used to determine the targets to be tracked in the image. For instance, all the first similarity calculation results may be sorted from large to small, and the N feature vectors corresponding to the top-ranked first similarity calculation results are selected from all the feature vectors included in the feature map corresponding to the image as the target feature vectors. Alternatively, all the first similarity calculation results may be sorted from small to large, and the N feature vectors corresponding to the last-ranked first similarity calculation results are selected as the target feature vectors, which is not limited in the embodiments of the present disclosure.
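A minimal sketch of the top-N selection described above, assuming the (H, W) score map produced by a first similarity calculation such as the one sketched earlier; the function name `select_top_n` is hypothetical.

```python
import numpy as np

def select_top_n(feature_map: np.ndarray, scores: np.ndarray, n: int):
    """Return the N pixel-level feature vectors with the largest first
    similarity scores, together with their (row, col) positions."""
    h, w, _ = feature_map.shape
    # Sort the flattened scores from large to small and keep the first N indices.
    top_idx = np.argsort(scores.reshape(-1))[::-1][:n]
    rows, cols = np.unravel_index(top_idx, (h, w))
    return feature_map[rows, cols], list(zip(rows.tolist(), cols.tolist()))
```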
In the above manner, the targets to be tracked determined according to the selected N feature vectors are targets that exist in both each frame image of the video to be tracked and the target detection image; that is, the target to be tracked in each frame image output by the target tracking model corresponds one-to-one to the target to be tracked in the target detection image, so target detection and target association can be completed at the same time, thereby reducing the latency of the target tracking process.
In a possible implementation, in the scenario where the tracking target is unknown, since all targets in the image need to be tracked, the target tracking model may further be configured to determine the feature vectors corresponding to all targets in the image according to a pre-trained position vector parameter. Correspondingly, determining the target feature vector from all the feature vectors of the feature map according to the result of the first similarity calculation may be: when the target detection image includes N targets to be tracked, selecting, from all the feature vectors of the feature map, the N feature vectors with the largest first similarity calculation results as similar feature vectors, where N is a positive integer; and then performing de-duplication processing on the feature vectors corresponding to all targets in the image and the N similar feature vectors to obtain the target feature vectors.
For example, referring to FIG. 3, in the scenario where the tracking target is unknown, the target detection image is the previous frame image of the image that includes the target to be tracked. Each frame image of the video to be tracked is taken in turn as the current frame image. The feature map of the current frame image is first determined; the attention mechanism module may then perform the first similarity calculation between each feature vector in the feature map and the feature vector corresponding to the target to be tracked in the previous frame image, and determine the similar feature vectors according to the result of the first similarity calculation. At the same time, the attention mechanism module may also determine the feature vectors of all targets in the current frame image according to the pre-trained position vector parameter. The target tracking model may then perform feature vector fusion on the similar feature vectors and the feature vectors of all targets in the current frame image to obtain the target feature vectors. Finally, the target tracking model can determine the targets to be tracked in the current frame image according to the target feature vectors.
For example, the position vector parameter may include a plurality of unit position vectors. For an H×W image, the position vector parameter may include H×W or more unit position vectors, so as to cover every pixel position in the image. It should be understood that the number of unit position vectors included in the position vector parameter may be set relatively large to accommodate image sizes in different scenarios.
In a possible implementation, the position vector parameter may be obtained through training as follows: determining predicted feature vectors corresponding to targets in a sample image according to an initial position vector parameter, so as to obtain predicted target information corresponding to the sample image, where the sample image is pre-annotated with corresponding sample target information; then calculating a loss function according to the predicted target information and the sample target information, and adjusting the initial position vector parameter according to the calculation result of the loss function.
For example, the initial position vector parameter may take random values; that is, after the number of unit position vectors in the position vector parameter is set, each unit position vector is initialized with a random value. Then, the position vector parameter may be adjusted through a back-propagation algorithm according to the loss function results of the target tracking model during training, so that the position vector parameter can predict the positions of targets in an image more accurately.
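Purely as an assumption about how such a parameter could be held and updated (the disclosure does not fix a concrete implementation), the sketch below keeps the position vector parameter as a learnable table of unit position vectors initialized with random values and adjusted by back-propagation; the sizes and the dummy loss are placeholders.

```python
import torch

num_positions, dim = 1024, 256   # hypothetical sizes, set large enough to cover an image
position_vectors = torch.nn.Parameter(torch.randn(num_positions, dim))  # random initial values
optimizer = torch.optim.SGD([position_vectors], lr=1e-3)

# Dummy forward pass standing in for the target tracking model: in practice the
# position vectors would be used to predict target information, and the loss would
# compare that prediction with the annotated sample target information.
loss = (position_vectors.mean() - 0.0) ** 2

optimizer.zero_grad()
loss.backward()    # back-propagation adjusts the position vector parameter
optimizer.step()
```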
In a possible implementation, if the image is the first frame image of the video to be tracked, the feature vectors determined according to the pre-trained position vector parameter may be used as the target feature vectors. It should be understood that, in the scenario where the tracking target is unknown, the target detection image may be the previous frame image of the image, so the similarity calculation cannot be performed for the first frame image. In the embodiments of the present disclosure, the feature vectors corresponding to all targets in the first frame image, determined according to the pre-trained position vector parameter, may be used as the target feature vectors to determine the targets to be tracked in that image.
After the targets to be tracked in the first frame image are determined according to the position vector parameter, each subsequent frame image of the video to be tracked has a corresponding previous frame image that includes the targets to be tracked, so the target feature vectors can be determined by combining the feature vectors corresponding to all targets in each frame image with the results of the first similarity calculation.
For example, when the target detection image includes N targets to be tracked, the N feature vectors with the largest first similarity calculation results may first be selected from all the feature vectors of the feature map as similar feature vectors, where N is a positive integer. De-duplication processing may then be performed on the feature vectors corresponding to all targets in the image and the N similar feature vectors to obtain the target feature vectors. For the process of determining the N similar feature vectors, reference may be made to the above description of selecting the N feature vectors with the largest first similarity calculation results as the target feature vectors, which is not repeated here.
For example, the N similar feature vectors represent the feature vectors corresponding to targets that exist in both a frame image of the video to be tracked and the target detection image, while the feature vectors determined according to the position vector parameter are the feature vectors corresponding to all targets in that frame image; the two sets therefore contain feature vectors corresponding to the same targets. For instance, if a certain frame image of the video to be tracked includes targets B1, B2, and B3, and the target detection image includes targets B1 and B2, then the N similar feature vectors are the feature vectors corresponding to targets B1 and B2, and the feature vectors determined according to the position vector parameter are the feature vectors corresponding to targets B1, B2, and B3; both sets contain the feature vectors corresponding to the same targets (B1 and B2). In this case, in order to avoid vector redundancy, feature vector fusion may be performed; for example, de-duplication processing may be performed on the feature vectors corresponding to all targets in the image and the N similar feature vectors to obtain the target feature vectors used to determine the targets to be tracked.
In a possible implementation, performing de-duplication processing on the feature vectors corresponding to all targets in the image and the N similar feature vectors to obtain the target feature vectors may be: for the feature vector corresponding to each target in the image, performing a second similarity calculation between that feature vector and the N similar feature vectors; when a second similarity calculation result is greater than or equal to a preset similarity, deleting the feature vector corresponding to that second similarity calculation result either from the feature vectors corresponding to all targets in the image or from the N similar feature vectors; and then using the feature vectors corresponding to all targets in the image and the remaining feature vectors among the N similar feature vectors after the deletion processing as the target feature vectors.
For example, the second similarity calculation may be a vector dot product calculation, a Euclidean distance calculation, or the like between the feature vector corresponding to each target in the image and the N similar feature vectors; the manner of the second similarity calculation is not limited in the embodiments of the present disclosure. The preset similarity may be customized according to the actual situation, which is also not limited in the present disclosure. When a second similarity calculation result is greater than or equal to the preset similarity, the feature vector corresponding to a certain target in the image and a certain feature vector among the N similar feature vectors may be regarded as the same feature vector, so a deletion operation can be performed on that feature vector. Considering that the feature vector may exist both in the feature vectors corresponding to all targets in the image and in the N similar feature vectors, the deletion operation may be performed either in the feature vectors corresponding to all targets in the image or in the N similar feature vectors.
For example, in the above example, the N similar feature vectors are the feature vectors corresponding to targets B1 and B2, and the feature vectors determined according to the position vector parameter are the feature vectors corresponding to targets B1, B2, and B3. After the second similarity calculation, the feature vectors corresponding to the second similarity calculation results that are greater than or equal to the preset similarity can be determined to be the feature vectors corresponding to targets B1 and B2. One option is to delete the feature vectors corresponding to B1 and B2 from the N similar feature vectors, in which case the N similar feature vectors are empty after the deletion. Another option is to delete the feature vectors corresponding to B1 and B2 from the feature vectors determined according to the position vector parameter (that is, from the feature vectors corresponding to all targets in the image), in which case the remaining feature vector after the deletion is the feature vector corresponding to target B3. It should be understood that, after the deletion processing, the feature vectors corresponding to targets that exist in both the image and the target detection image can be preferentially retained, thereby ensuring target association and thus achieving target tracking.
After the deletion processing, the feature vectors corresponding to all targets in the image and the remaining feature vectors among the N similar feature vectors may be used as the target feature vectors. For example, in the above example, after the feature vectors corresponding to B1 and B2 are deleted from the N similar feature vectors, the N similar feature vectors are empty, and the feature vectors corresponding to all targets in the image are the feature vectors corresponding to targets B1, B2, and B3; the feature vectors corresponding to all targets in the image together with the remaining feature vectors among the N similar feature vectors are therefore the feature vectors corresponding to targets B1, B2, and B3, that is, the target feature vectors are the feature vectors corresponding to targets B1, B2, and B3.
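The de-duplication described above can be sketched as follows. This is an illustrative assumption only, with cosine similarity chosen as one possible second similarity measure and 0.9 as an arbitrary preset similarity; none of the names or values are taken from the disclosure.

```python
import numpy as np

def deduplicate(all_target_vectors: np.ndarray, similar_vectors: np.ndarray,
                preset_similarity: float = 0.9) -> np.ndarray:
    """Drop, from the vectors of all targets in the image, every vector whose
    second similarity with any of the N similar feature vectors reaches the
    preset similarity, and keep the union of what remains as the target
    feature vectors."""
    def cosine(a, b):
        return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b) + 1e-8))

    kept = [v for v in all_target_vectors
            if all(cosine(v, s) < preset_similarity for s in similar_vectors)]
    if not kept:
        return similar_vectors.copy()
    return np.concatenate([similar_vectors, np.stack(kept)], axis=0)
```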
It should be understood that the above de-duplication processing is only one possible way, provided by the embodiments of the present disclosure, of performing feature vector fusion on the feature vectors; in specific implementations of the present disclosure, feature vector fusion may also be performed on the feature vectors corresponding to all targets in the image and the N similar feature vectors in other ways, which is not limited in the embodiments of the present disclosure.
In the above manner, the feature vectors corresponding to all targets in the image can be determined through the position vector parameter, the similar feature vectors corresponding to targets that exist in both the image and the target detection image can be determined according to the second similarity calculation results, and feature vector fusion can then be performed on the similar feature vectors and the feature vectors corresponding to all targets in the image to remove redundant feature vectors, which improves computational efficiency while yielding more accurate targets to be tracked.
After the target feature vector is obtained in any of the above ways, a linear feature transformation may be performed on the target feature vector to obtain tracking box information corresponding to the target to be tracked in the image, where the tracking box information includes position information and size information of the tracking box corresponding to the target to be tracked, so that the target to be tracked can be indicated in the image according to the tracking box information. This process is similar to the related art and is not repeated here.
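As one way to picture the linear feature transformation mentioned above (a sketch under the assumption that a single fully connected layer maps a target feature vector to a box; the dimension and layer are hypothetical), the target feature vector can be projected to the position and size of the tracking box.

```python
import torch

dim = 256                               # hypothetical feature dimension
box_head = torch.nn.Linear(dim, 4)      # linear feature transformation

target_feature = torch.randn(1, dim)    # one target feature vector
cx, cy, w, h = box_head(target_feature)[0]  # tracking box position (cx, cy) and size (w, h)
```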
Based on the same inventive concept, an embodiment of the present disclosure further provides a video target tracking apparatus, which may become part or all of an electronic device through software, hardware, or a combination of the two. Referring to FIG. 4, the video target tracking apparatus includes:
an acquisition module 401, configured to acquire a video to be tracked;
a tracking module 402, configured to input the video to be tracked into a target tracking model to obtain a target tracking result corresponding to the video to be tracked, where the tracking module includes:
a first determination submodule 4021, configured to determine, for each frame image of the video to be tracked, a feature vector corresponding to a target to be tracked in a target detection image corresponding to the image, where the target detection image includes the target to be tracked;
a second determination submodule 4022, configured to perform a first similarity calculation between each feature vector in a feature map corresponding to the image and the feature vector corresponding to the target to be tracked in the target detection image, and to determine a target feature vector from all the feature vectors of the feature map according to a result of the first similarity calculation;
a third determination submodule 4023, configured to determine the target to be tracked in the image according to the target feature vector.
Optionally, the target detection image is the previous frame image of the image that includes the target to be tracked; or, the target detection image is a preset input image that includes the target to be tracked.
Optionally, the second determination submodule 4022 is configured to:
when the target detection image includes N targets to be tracked, select, from all the feature vectors of the feature map, the N feature vectors with the largest first similarity calculation results as the target feature vectors, where N is a positive integer.
Optionally, the target tracking model is further configured to determine the feature vectors corresponding to all targets in the image according to a pre-trained position vector parameter, and the second determination submodule is configured to:
when the target detection image includes N targets to be tracked, select, from all the feature vectors of the feature map, the N feature vectors with the largest first similarity calculation results as similar feature vectors, where N is a positive integer;
perform de-duplication processing on the feature vectors corresponding to all targets in the image and the N similar feature vectors to obtain the target feature vectors.
Optionally, the image is the first frame image of the video to be tracked, and the apparatus further includes:
a fourth determination submodule, configured to use the feature vectors determined according to the pre-trained position vector parameter as the target feature vectors.
Optionally, the second determination submodule 4022 is configured to:
for the feature vector corresponding to each target in the image, perform a second similarity calculation between the feature vector and the N similar feature vectors;
when a second similarity calculation result is greater than or equal to a preset similarity, delete the feature vector corresponding to the second similarity calculation result from the feature vectors corresponding to all targets in the image or from the N similar feature vectors;
use the feature vectors corresponding to all targets in the image and the remaining feature vectors among the N similar feature vectors after the deletion processing as the target feature vectors.
Optionally, the apparatus 400 further includes the following modules for obtaining the position vector parameter through training:
a first training module, configured to determine predicted feature vectors corresponding to targets in a sample image according to an initial position vector parameter, so as to obtain predicted target information corresponding to the sample image, where the sample image is pre-annotated with corresponding sample target information;
a second training module, configured to calculate a loss function according to the predicted target information and the sample target information, and to adjust the initial position vector parameter according to the calculation result of the loss function.
The above modules may be implemented as software components executed on one or more general-purpose processors, or as hardware that performs certain functions or combinations thereof, such as programmable logic devices and/or application-specific integrated circuits. In some embodiments, these modules may be embodied in the form of a software product, which may be stored in a non-volatile storage medium that enables a computer device (for example, a personal computer, a server, a network device, or a mobile terminal) to implement the methods described in the embodiments of the present invention. In other embodiments, the above modules may be implemented on a single device or distributed across multiple devices. The functions of these modules may be merged with one another or further split into multiple submodules.
With regard to the apparatus in the above embodiments, the specific manner in which each module performs operations has been described in detail in the embodiments of the method, and is not elaborated here.
Based on the same inventive concept, an embodiment of the present disclosure further provides a computer-readable medium on which a computer program is stored, where the program, when executed by a processing device, implements the steps of any of the above video target tracking methods.
Based on the same inventive concept, an embodiment of the present disclosure further provides an electronic device, including:
a storage device on which a computer program is stored;
a processing device, configured to execute the computer program in the storage device to implement the steps of any of the above video target tracking methods.
Referring now to FIG. 5, it shows a schematic structural diagram of an electronic device 500 suitable for implementing an embodiment of the present disclosure. Terminal devices in the embodiments of the present disclosure may include, but are not limited to, mobile terminals such as mobile phones, notebook computers, digital broadcast receivers, PDAs (personal digital assistants), PADs (tablet computers), PMPs (portable multimedia players), and vehicle-mounted terminals (for example, vehicle-mounted navigation terminals), as well as fixed terminals such as digital TVs and desktop computers. The electronic device shown in FIG. 5 is only an example and should not impose any limitation on the functions and scope of use of the embodiments of the present disclosure.
As shown in FIG. 5, the electronic device 500 may include a processing device (for example, a central processing unit, a graphics processor, etc.) 501, which may perform various appropriate actions and processes according to a program stored in a read-only memory (ROM) 502 or a program loaded from a storage device 508 into a random access memory (RAM) 503. Various programs and data required for the operation of the electronic device 500 are also stored in the RAM 503. The processing device 501, the ROM 502, and the RAM 503 are connected to one another through a bus 504. An input/output (I/O) interface 505 is also connected to the bus 504.
Generally, the following devices may be connected to the I/O interface 505: an input device 506 including, for example, a touch screen, a touch pad, a keyboard, a mouse, a camera, a microphone, an accelerometer, and a gyroscope; an output device 507 including, for example, a liquid crystal display (LCD), a speaker, and a vibrator; a storage device 508 including, for example, a magnetic tape and a hard disk; and a communication device 509. The communication device 509 may allow the electronic device 500 to communicate wirelessly or by wire with other devices to exchange data. Although FIG. 5 shows the electronic device 500 having various devices, it should be understood that it is not required to implement or provide all of the illustrated devices; more or fewer devices may alternatively be implemented or provided.
In particular, according to the embodiments of the present disclosure, the processes described above with reference to the flowchart may be implemented as a computer software program. For example, an embodiment of the present disclosure includes a computer program product, which includes a computer program carried on a non-transitory computer-readable medium, where the computer program contains program code for executing the method shown in the flowchart. In such an embodiment, the computer program may be downloaded and installed from a network through the communication device 509, or installed from the storage device 508, or installed from the ROM 502. When the computer program is executed by the processing device 501, the above functions defined in the methods of the embodiments of the present disclosure are executed. Based on the same inventive concept, an embodiment of the present disclosure provides a computer program, including instructions which, when executed by a processor, cause the processor to execute any of the above video target tracking methods.
Based on the same inventive concept, an embodiment of the present disclosure provides a computer program product, including instructions which, when executed by a processor, cause the processor to execute any of the above video target tracking methods.
It should be noted that the computer-readable medium described above in the present disclosure may be a computer-readable signal medium or a computer-readable storage medium, or any combination of the two. The computer-readable storage medium may be, for example, but is not limited to, an electrical, magnetic, optical, electromagnetic, infrared, or semiconductor system, apparatus, or device, or any combination of the above. More specific examples of the computer-readable storage medium may include, but are not limited to: an electrical connection with one or more wires, a portable computer disk, a hard disk, a random access memory (RAM), a read-only memory (ROM), an erasable programmable read-only memory (EPROM or flash memory), an optical fiber, a portable compact disk read-only memory (CD-ROM), an optical storage device, a magnetic storage device, or any suitable combination of the above. In the present disclosure, the computer-readable storage medium may be any tangible medium that contains or stores a program, where the program may be used by or in combination with an instruction execution system, apparatus, or device. In the present disclosure, the computer-readable signal medium may include a data signal propagated in a baseband or as part of a carrier wave, which carries computer-readable program code. Such a propagated data signal may take a variety of forms, including but not limited to an electromagnetic signal, an optical signal, or any suitable combination of the above. The computer-readable signal medium may also be any computer-readable medium other than the computer-readable storage medium, and the computer-readable signal medium may send, propagate, or transmit a program used by or in combination with an instruction execution system, apparatus, or device. The program code contained on the computer-readable medium may be transmitted using any suitable medium, including but not limited to: an electrical wire, an optical cable, RF (radio frequency), or any suitable combination of the above.
In some embodiments, communication may be performed using any currently known or future-developed network protocol such as HTTP (HyperText Transfer Protocol), and interconnection may be achieved with digital data communication in any form or medium (for example, a communication network). Examples of communication networks include a local area network ("LAN"), a wide area network ("WAN"), an internetwork (for example, the Internet), and a peer-to-peer network (for example, an ad hoc peer-to-peer network), as well as any currently known or future-developed network.
The above computer-readable medium may be included in the above electronic device, or may exist independently without being assembled into the electronic device.
The above computer-readable medium carries one or more programs which, when executed by the electronic device, cause the electronic device to: acquire a video to be tracked; and input the video to be tracked into a target tracking model to obtain a target tracking result corresponding to the video to be tracked, where the target tracking model is configured to perform the following processing: for each frame image of the video to be tracked, determining a feature vector corresponding to a target to be tracked in a target detection image corresponding to the image, where the target detection image includes the target to be tracked; performing a first similarity calculation between each feature vector in a feature map corresponding to the image and the feature vector corresponding to the target to be tracked in the target detection image, and determining a target feature vector from all the feature vectors of the feature map according to a result of the first similarity calculation; and determining the target to be tracked in the image according to the target feature vector.
The computer program code for executing the operations of the present disclosure may be written in one or more programming languages or a combination thereof, including but not limited to object-oriented programming languages such as Java, Smalltalk, and C++, as well as conventional procedural programming languages such as the "C" language or similar programming languages. The program code may be executed entirely on the user's computer, partly on the user's computer, as a stand-alone software package, partly on the user's computer and partly on a remote computer, or entirely on a remote computer or server. In the case involving a remote computer, the remote computer may be connected to the user's computer through any kind of network, including a local area network (LAN) or a wide area network (WAN), or may be connected to an external computer (for example, through the Internet by using an Internet service provider).
The flowcharts and block diagrams in the accompanying drawings illustrate the possible architectures, functions, and operations of the systems, methods, and computer program products according to various embodiments of the present disclosure. In this regard, each block in the flowcharts or block diagrams may represent a module, a program segment, or a part of code, and the module, the program segment, or the part of code contains one or more executable instructions for implementing the specified logical functions. It should also be noted that, in some alternative implementations, the functions marked in the blocks may occur in an order different from that marked in the drawings. For example, two blocks shown in succession may in fact be executed substantially in parallel, and they may sometimes be executed in the reverse order, depending on the functions involved. It should also be noted that each block in the block diagrams and/or flowcharts, and combinations of blocks in the block diagrams and/or flowcharts, may be implemented by a dedicated hardware-based system that executes the specified functions or operations, or by a combination of dedicated hardware and computer instructions.
描述于本公开实施例中所涉及到的模块可以通过软件的方式实现,也可以通过硬件的方式来实现。其中,模块的名称在某种情况下并不构成对该模块本身的限定。The modules involved in the embodiments of the present disclosure may be implemented in software or hardware. Among them, the name of the module does not constitute a limitation of the module itself under certain circumstances.
本文中以上描述的功能可以至少部分地由一个或多个硬件逻辑部件来执行。例如,非限制性地,可以使用的示范类型的硬件逻辑部件包括:现场可编程门阵列(FPGA)、专用集成电路(ASIC)、专用标准产品(ASSP)、片上系统(SOC)、复杂可编程逻辑设备(CPLD)等等。The functions described herein above may be performed, at least in part, by one or more hardware logic components. For example, without limitation, exemplary types of hardware logic components that may be used include: Field Programmable Gate Arrays (FPGAs), Application Specific Integrated Circuits (ASICs), Application Specific Standard Products (ASSPs), Systems on Chips (SOCs), Complex Programmable Logical Devices (CPLDs) and more.
在本公开的上下文中,机器可读介质可以是有形的介质,其可以包含或存储以供指令执行系统、装置或设备使用或与指令执行系统、装置或设备结合地使用的程序。机器可读介质可以是机器可读信号介质或机器可读储存介质。机器可读介质可以包括但不限于电子的、磁性的、光学的、电磁的、红外的、或 半导体系统、装置或设备,或者上述内容的任何合适组合。机器可读存储介质的更具体示例会包括基于一个或多个线的电气连接、便携式计算机盘、硬盘、随机存取存储器(RAM)、只读存储器(ROM)、可擦除可编程只读存储器(EPROM或快闪存储器)、光纤、便捷式紧凑盘只读存储器(CD-ROM)、光学储存设备、磁储存设备、或上述内容的任何合适组合。In the context of the present disclosure, a machine-readable medium may be a tangible medium that may contain or store a program for use by or in connection with the instruction execution system, apparatus or device. The machine-readable medium may be a machine-readable signal medium or a machine-readable storage medium. Machine-readable media may include, but are not limited to, electronic, magnetic, optical, electromagnetic, infrared, or semiconductor systems, devices, or devices, or any suitable combination of the foregoing. More specific examples of machine-readable storage media would include one or more wire-based electrical connections, portable computer disks, hard disks, random access memory (RAM), read only memory (ROM), erasable programmable read only memory (EPROM or flash memory), fiber optics, compact disk read only memory (CD-ROM), optical storage, magnetic storage, or any suitable combination of the foregoing.
According to one or more embodiments of the present disclosure, Example 1 provides a video target tracking method, the method comprising:
acquiring a video to be tracked;
inputting the video to be tracked into a target tracking model to obtain a target tracking result corresponding to the video to be tracked, where the target tracking model is configured to perform the following processing:
for each frame image of the video to be tracked, determining a feature vector corresponding to a target to be tracked in a target detection image corresponding to the image, the target detection image including the target to be tracked;
performing a first similarity calculation between each feature vector in a feature map corresponding to the image and the feature vector corresponding to the target to be tracked in the target detection image, and determining a target feature vector among all the feature vectors of the feature map according to the first similarity calculation result;
determining the target to be tracked in the image according to the target feature vector.
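To make the per-frame processing of Example 1 concrete, the following is a minimal, non-authoritative sketch in Python/NumPy. The function name, the H x W x C feature-map layout, one C-dimensional vector per tracked target, and the use of a dot product as the first similarity are assumptions made for illustration, not details fixed by the disclosure.

```python
import numpy as np

def track_frame(feature_map: np.ndarray, target_vectors: np.ndarray):
    """Sketch of Example 1 for one frame (assumed shapes).

    feature_map:    H x W x C feature map of the current frame image.
    target_vectors: N x C feature vectors of the N targets to be tracked,
                    taken from the target detection image.
    Returns the (row, col) location chosen for each tracked target.
    """
    h, w, c = feature_map.shape
    flat = feature_map.reshape(-1, c)        # every pixel becomes one feature vector

    # First similarity calculation: dot product between each feature vector
    # of the frame and each tracked-target vector (an assumed choice).
    similarity = flat @ target_vectors.T     # (H*W) x N

    # For each tracked target, keep the most similar feature vector of the
    # frame as its target feature vector and read off its position.
    best = similarity.argmax(axis=0)         # N indices into H*W
    return [(idx // w, idx % w) for idx in best]
```

A frame-level loop would call `track_frame` once per frame, feeding it the target vectors determined from the previous frame or from a preset input image, as Example 2 below allows.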
According to one or more embodiments of the present disclosure, Example 2 provides the method of Example 1, wherein the target detection image is a previous frame image, of the image, that includes the target to be tracked; or
the target detection image is a preset input image that includes the target to be tracked.
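As a small, hedged illustration of the two options in Example 2, the helper below merely selects which image supplies the tracked-target feature vectors; the function name and the idea of passing a template image are illustrative assumptions.

```python
def choose_detection_image(prev_frame, preset_image=None):
    """Pick the target detection image (Example 2, illustrative only):
    a preset input image containing the target if one is supplied,
    otherwise the previous frame image that includes the target."""
    return preset_image if preset_image is not None else prev_frame
```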
According to one or more embodiments of the present disclosure, Example 3 provides the method of Example 1, wherein determining the target feature vector among all the feature vectors of the feature map according to the first similarity calculation result includes:
when the target detection image includes N targets to be tracked, selecting, from all the feature vectors of the feature map, the N feature vectors with the largest first similarity calculation results as the target feature vectors, where N is a positive integer.
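A minimal sketch of the top-N selection in Example 3 follows; treating the per-vector similarity scores as a flat 1-D array is an assumption made only for illustration.

```python
import numpy as np

def select_top_n(similarity_scores: np.ndarray, n: int) -> np.ndarray:
    """Return the indices of the n feature vectors with the largest first
    similarity calculation results (Example 3). `similarity_scores` holds
    one score per feature vector of the feature map."""
    # argpartition finds the n largest scores without a full sort.
    top = np.argpartition(similarity_scores, -n)[-n:]
    # Order the winners by descending score for readability.
    return top[np.argsort(similarity_scores[top])[::-1]]
```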
According to one or more embodiments of the present disclosure, Example 4 provides the method of Example 1, wherein the target tracking model is further configured to determine feature vectors corresponding to all targets in the image according to pre-trained position vector parameters, and determining the target feature vector among all the feature vectors of the feature map according to the first similarity calculation result includes:
when the target detection image includes N targets to be tracked, selecting, from all the feature vectors of the feature map, the N feature vectors with the largest first similarity calculation results as similar feature vectors, where N is a positive integer;
performing deduplication processing on the feature vectors corresponding to all targets in the image and the N similar feature vectors to obtain the target feature vectors.
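Example 4 relies on pre-trained position vector parameters to propose feature vectors for all targets in the image, including newly appearing ones. The disclosure does not spell out the mechanism, so the attention-style pooling below is only one plausible, clearly hypothetical reading: each learned position vector is used as a query that pools the frame's feature map into one candidate target vector.

```python
import numpy as np

def softmax(x, axis=-1):
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)

def detect_with_position_vectors(feature_map, position_vectors):
    """Hypothetical detection branch: `position_vectors` (M x C) stand in
    for the pre-trained position vector parameters; `feature_map` is
    H x W x C. Each position vector attends over the flattened feature map
    and pools it into one candidate feature vector per potential target."""
    h, w, c = feature_map.shape
    flat = feature_map.reshape(-1, c)                       # (H*W) x C
    attn = softmax(position_vectors @ flat.T / np.sqrt(c))  # M x (H*W)
    return attn @ flat                                      # M x C candidate vectors
```

These candidate vectors and the N similar feature vectors would then pass through the deduplication step elaborated in Example 6 below.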
According to one or more embodiments of the present disclosure, Example 5 provides the method of Example 4, wherein the image is the first frame image of the video to be tracked, and the method further includes:
using the feature vectors determined according to the pre-trained position vector parameters as the target feature vectors.
According to one or more embodiments of the present disclosure, Example 6 provides the method of Example 4, wherein performing deduplication processing on the feature vectors corresponding to all targets in the image and the N similar feature vectors to obtain the target feature vectors includes:
for the feature vector corresponding to each target in the image, performing a second similarity calculation between the feature vector and the N similar feature vectors;
when a second similarity calculation result is greater than or equal to a preset similarity, deleting the feature vector corresponding to the second similarity calculation result from the feature vectors corresponding to all targets in the image or from the N similar feature vectors;
using the feature vectors corresponding to all targets in the image after the deletion processing and the remaining feature vectors among the N similar feature vectors as the target feature vectors.
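A minimal sketch of the deduplication in Example 6 follows. Using cosine similarity as the second similarity calculation and always deleting the duplicate from the detection-branch side are assumptions made for illustration; the disclosure allows either side to be pruned.

```python
import numpy as np

def deduplicate(detected_vectors, similar_vectors, preset_similarity=0.9):
    """Example 6, sketched: drop a detection-branch vector whenever its
    second similarity with any of the N similar feature vectors reaches
    the preset similarity, then keep everything that survives."""
    kept = []
    for vec in detected_vectors:
        duplicate = False
        for sim_vec in similar_vectors:
            cos = float(vec @ sim_vec) / (
                np.linalg.norm(vec) * np.linalg.norm(sim_vec) + 1e-8)
            if cos >= preset_similarity:   # second similarity >= preset similarity
                duplicate = True
                break
        if not duplicate:
            kept.append(vec)
    # Target feature vectors: surviving detection vectors plus all similar vectors.
    return kept + list(similar_vectors)
```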
According to one or more embodiments of the present disclosure, Example 7 provides the method of any one of Examples 4 to 6, wherein the position vector parameters are obtained through training in the following manner:
determining predicted feature vectors corresponding to targets in a sample image according to initial position vector parameters, so as to obtain predicted target information corresponding to the sample image, where the sample image is pre-annotated with corresponding sample target information;
calculating a loss function according to the predicted target information and the sample target information, and adjusting the initial position vector parameters according to the calculation result of the loss function.
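The training procedure of Example 7 maps naturally onto a standard gradient-based loop. The PyTorch-style sketch below is a hedged illustration only: the `model(sample_image, position_vectors)` interface, the use of boxes as the target information, the L1 loss, and the optimizer settings are assumptions rather than details taken from the disclosure. In practice a matching step between predictions and annotations (and a classification term in the loss) would typically also be needed; the sketch omits these for brevity.

```python
import torch
import torch.nn.functional as F

def train_position_vectors(model, data_loader, num_targets=100, dim=256, epochs=10):
    """Sketch of Example 7: learn the position vector parameters from sample
    images annotated with sample target information (here, boxes)."""
    position_vectors = torch.nn.Parameter(torch.randn(num_targets, dim) * 0.02)
    optimizer = torch.optim.Adam([position_vectors] + list(model.parameters()), lr=1e-4)

    for _ in range(epochs):
        for sample_image, sample_boxes in data_loader:
            # Predicted feature vectors -> predicted target information (boxes).
            predicted_boxes = model(sample_image, position_vectors)
            loss = F.l1_loss(predicted_boxes, sample_boxes)  # compare with annotations

            optimizer.zero_grad()
            loss.backward()        # adjust the initial position vector parameters
            optimizer.step()
    return position_vectors
```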
According to one or more embodiments of the present disclosure, Example 8 provides a video target tracking apparatus, the apparatus comprising:
an acquisition module configured to acquire a video to be tracked;
a tracking module configured to input the video to be tracked into a target tracking model to obtain a target tracking result corresponding to the video to be tracked, the tracking module comprising:
a first determination submodule configured to, for each frame image of the video to be tracked, determine a feature vector corresponding to a target to be tracked in a target detection image corresponding to the image, the target detection image including the target to be tracked;
a second determination submodule configured to perform a first similarity calculation between each feature vector in a feature map corresponding to the image and the feature vector corresponding to the target to be tracked in the target detection image, and determine a target feature vector among all the feature vectors of the feature map according to the first similarity calculation result;
a third determination submodule configured to determine the target to be tracked in the image according to the target feature vector.
According to one or more embodiments of the present disclosure, Example 9 provides the apparatus of Example 8, wherein the target detection image is a previous frame image, of the image, that includes the target to be tracked; or the target detection image is a preset input image that includes the target to be tracked.
According to one or more embodiments of the present disclosure, Example 10 provides the apparatus of Example 8, wherein the second determination submodule is configured to:
when the target detection image includes N targets to be tracked, select, from all the feature vectors of the feature map, the N feature vectors with the largest first similarity calculation results as the target feature vectors, where N is a positive integer.
According to one or more embodiments of the present disclosure, Example 11 provides the apparatus of Example 8, wherein the target tracking model is further configured to determine feature vectors corresponding to all targets in the image according to pre-trained position vector parameters, and the second determination submodule is configured to:
when the target detection image includes N targets to be tracked, select, from all the feature vectors of the feature map, the N feature vectors with the largest first similarity calculation results as similar feature vectors, where N is a positive integer;
perform deduplication processing on the feature vectors corresponding to all targets in the image and the N similar feature vectors to obtain the target feature vectors.
According to one or more embodiments of the present disclosure, Example 12 provides the apparatus of Example 11, wherein the image is the first frame image of the video to be tracked, and the apparatus further comprises:
a fourth determination submodule configured to use the feature vectors determined according to the pre-trained position vector parameters as the target feature vectors.
According to one or more embodiments of the present disclosure, Example 13 provides the apparatus of Example 11, wherein the second determination submodule is configured to:
for the feature vector corresponding to each target in the image, perform a second similarity calculation between the feature vector and the N similar feature vectors;
when a second similarity calculation result is greater than or equal to a preset similarity, delete the feature vector corresponding to the second similarity calculation result from the feature vectors corresponding to all targets in the image or from the N similar feature vectors;
use the feature vectors corresponding to all targets in the image after the deletion processing and the remaining feature vectors among the N similar feature vectors as the target feature vectors.
According to one or more embodiments of the present disclosure, Example 14 provides the apparatus of any one of Examples 11 to 13, the apparatus further comprising the following modules for training to obtain the position vector parameters:
a first training module configured to determine predicted feature vectors corresponding to targets in a sample image according to initial position vector parameters, so as to obtain predicted target information corresponding to the sample image, where the sample image is pre-annotated with corresponding sample target information;
a second training module configured to calculate a loss function according to the predicted target information and the sample target information, and adjust the initial position vector parameters according to the calculation result of the loss function.
According to one or more embodiments of the present disclosure, Example 15 provides a computer-readable medium having a computer program stored thereon, where the program, when executed by a processing apparatus, implements the steps of the method of any one of Examples 1 to 7.
According to one or more embodiments of the present disclosure, Example 16 provides an electronic device, comprising:
a storage apparatus having a computer program stored thereon;
a processing apparatus configured to execute the computer program in the storage apparatus to implement the steps of the method of any one of Examples 1 to 7.
The above description is merely a preferred embodiment of the present disclosure and an illustration of the technical principles employed. Those skilled in the art should understand that the scope of disclosure involved herein is not limited to technical solutions formed by specific combinations of the above technical features, and should also cover other technical solutions formed by any combination of the above technical features or their equivalent features without departing from the disclosed concept, for example, technical solutions formed by replacing the above features with technical features having similar functions disclosed in (but not limited to) the present disclosure.
Furthermore, although the operations are depicted in a particular order, this should not be understood as requiring that the operations be performed in the particular order shown or in sequential order. Under certain circumstances, multitasking and parallel processing may be advantageous. Likewise, although several specific implementation details are included in the above discussion, these should not be construed as limiting the scope of the present disclosure. Certain features described in the context of separate embodiments may also be implemented in combination in a single embodiment. Conversely, various features described in the context of a single embodiment may also be implemented in multiple embodiments separately or in any suitable sub-combination.
Although the subject matter has been described in language specific to structural features and/or methodological logical acts, it should be understood that the subject matter defined in the appended claims is not necessarily limited to the specific features or acts described above. Rather, the specific features and acts described above are merely example forms of implementing the claims. With regard to the apparatus in the above embodiments, the specific manner in which each module performs operations has been described in detail in the embodiments of the method and will not be elaborated here.

Claims (17)

  1. A video target tracking method, comprising:
    acquiring a video to be tracked;
    inputting the video to be tracked into a target tracking model to obtain a target tracking result corresponding to the video to be tracked, wherein the target tracking model is configured to perform the following processing:
    for each frame image of the video to be tracked, determining a feature vector corresponding to a target to be tracked in a target detection image corresponding to the image, the target detection image including the target to be tracked;
    performing a first similarity calculation between each feature vector in a feature map corresponding to the image and the feature vector corresponding to the target to be tracked in the target detection image, and determining a target feature vector among all feature vectors of the feature map according to a first similarity calculation result; and
    determining the target to be tracked in the image according to the target feature vector.
  2. The video target tracking method according to claim 1, wherein the target detection image is a previous frame image, of the image, that includes the target to be tracked; or
    the target detection image is a preset input image that includes the target to be tracked.
  3. The video target tracking method according to claim 1 or 2, wherein determining the target feature vector among all the feature vectors of the feature map according to the first similarity calculation result comprises:
    when the target detection image includes N targets to be tracked, selecting, from all the feature vectors of the feature map, the N feature vectors with the largest first similarity calculation results as the target feature vectors, wherein N is a positive integer.
  4. The video target tracking method according to any one of claims 1 to 3, wherein the target tracking model is further configured to determine feature vectors corresponding to all targets in the image according to pre-trained position vector parameters, and determining the target feature vector among all the feature vectors of the feature map according to the first similarity calculation result comprises:
    when the target detection image includes N targets to be tracked, selecting, from all the feature vectors of the feature map, the N feature vectors with the largest first similarity calculation results as similar feature vectors, wherein N is a positive integer; and
    performing deduplication processing on the feature vectors corresponding to all targets in the image and the N similar feature vectors to obtain the target feature vectors.
  5. The video target tracking method according to any one of claims 1 to 3, wherein the target tracking model is further configured to determine feature vectors corresponding to all targets in the image according to pre-trained position vector parameters, and determining the target feature vector among all the feature vectors of the feature map according to the first similarity calculation result comprises:
    when the target detection image includes N targets to be tracked, selecting, from all the feature vectors of the feature map, the N feature vectors with the largest first similarity calculation results as similar feature vectors, wherein N is a positive integer; and
    performing fusion processing on the feature vectors corresponding to all targets in the image and the N similar feature vectors to obtain the target feature vectors.
  6. The video target tracking method according to claim 4 or 5, wherein the image is a first frame image of the video to be tracked, and the video target tracking method further comprises:
    using the feature vectors determined according to the pre-trained position vector parameters as the target feature vectors.
  7. The video target tracking method according to claim 4, wherein performing deduplication processing on the feature vectors corresponding to all targets in the image and the N similar feature vectors to obtain the target feature vectors comprises:
    for the feature vector corresponding to each target in the image, performing a second similarity calculation between the feature vector and the N similar feature vectors;
    when a second similarity calculation result is greater than or equal to a preset similarity, deleting the feature vector corresponding to the second similarity calculation result from the feature vectors corresponding to all targets in the image or from the N similar feature vectors; and
    using the feature vectors corresponding to all targets in the image after the deletion processing and the remaining feature vectors among the N similar feature vectors as the target feature vectors.
  8. The video target tracking method according to any one of claims 4 to 7, wherein the position vector parameters are obtained through training in the following manner:
    determining predicted feature vectors corresponding to targets in a sample image according to initial position vector parameters, so as to obtain predicted target information corresponding to the sample image, wherein the sample image is pre-annotated with corresponding sample target information; and
    calculating a loss function according to the predicted target information and the sample target information, and adjusting the initial position vector parameters according to a calculation result of the loss function.
  9. The video target tracking method according to any one of claims 1 to 8, wherein, for each frame image of the video to be tracked, determining the feature vector corresponding to the target to be tracked in the target detection image corresponding to the image comprises:
    obtaining the feature vector corresponding to the target to be tracked in the target detection image corresponding to the image by vectorizing an image feature of a pixel at the center of the target to be tracked, or by vectorizing an image feature of a pixel at which the target to be tracked can be distinguished from other targets.
  10. The video target tracking method according to any one of claims 1 to 9, wherein the feature map corresponding to the image is obtained by vectorizing an image feature of each pixel in the image.
  11. The video target tracking method according to any one of claims 1 to 10, wherein performing the first similarity calculation between each feature vector in the feature map corresponding to the image and the feature vector corresponding to the target to be tracked in the target detection image comprises:
    performing a vector dot product calculation or a Euclidean distance calculation between each feature vector in the image and the feature vector corresponding to the target to be tracked in the target detection image.
  12. The video target tracking method according to any one of claims 1 to 11, wherein determining the target to be tracked in the image according to the target feature vector comprises:
    determining, in the image according to the target feature vector, the target to be tracked that exists in both the target detection image and the image.
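To illustrate claims 9 to 11 together, the following hedged Python/NumPy sketch builds a per-pixel feature map, reads off the vector at a target's center pixel, and scores the rest of the map with either of the two first-similarity options named in claim 11 (vector dot product or Euclidean distance). The 1x1 linear projection used to vectorize pixel features is an assumption; any backbone producing a C-dimensional vector per pixel would serve the same role.

```python
import numpy as np

def pixelwise_feature_map(image: np.ndarray, projection: np.ndarray) -> np.ndarray:
    """Claim 10, sketched: vectorize the image feature of every pixel.
    `image` is H x W x C_in; `projection` (C_in x C) stands in for whatever
    backbone actually produces one C-dimensional vector per pixel."""
    return image @ projection                      # H x W x C

def center_pixel_vector(feature_map: np.ndarray, center_rc) -> np.ndarray:
    """Claim 9, sketched: the tracked target's feature vector is the vector
    at the pixel at its center (or at any pixel that distinguishes it)."""
    r, c = center_rc
    return feature_map[r, c]

def first_similarity(feature_map: np.ndarray, target_vec: np.ndarray, mode="dot") -> np.ndarray:
    """Claim 11: score every feature vector of the map against the tracked
    target's vector by dot product, or by negated Euclidean distance so that
    larger always means more similar."""
    if mode == "dot":
        return feature_map @ target_vec            # H x W scores
    diff = feature_map - target_vec
    return -np.linalg.norm(diff, axis=-1)          # H x W scores
```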
  13. A video target tracking apparatus, comprising:
    an acquisition module configured to acquire a video to be tracked;
    a tracking module configured to input the video to be tracked into a target tracking model to obtain a target tracking result corresponding to the video to be tracked, the tracking module comprising:
    a first determination submodule configured to, for each frame image of the video to be tracked, determine a feature vector corresponding to a target to be tracked in a target detection image corresponding to the image, the target detection image including the target to be tracked;
    a second determination submodule configured to perform a first similarity calculation between each feature vector in a feature map corresponding to the image and the feature vector corresponding to the target to be tracked in the target detection image, and determine a target feature vector among all feature vectors of the feature map according to a first similarity calculation result; and
    a third determination submodule configured to determine the target to be tracked in the image according to the target feature vector.
  14. A computer-readable medium having a computer program stored thereon, wherein the program, when executed by a processing apparatus, implements the steps of the video target tracking method according to any one of claims 1 to 12.
  15. An electronic device, comprising:
    a storage apparatus having a computer program stored thereon; and
    a processing apparatus configured to execute the computer program in the storage apparatus to implement the steps of the video target tracking method according to any one of claims 1 to 12.
  16. A computer program, comprising:
    instructions which, when executed by a processor, cause the processor to perform the video target tracking method according to any one of claims 1 to 12.
  17. A computer program product comprising instructions which, when executed by a processor, cause the processor to perform the video target tracking method according to any one of claims 1 to 12.
PCT/CN2022/075086 2021-02-09 2022-01-29 Video target tracking method, video target tracking apparatus, storage medium, and electronic device WO2022171036A1 (en)

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
CN202110179157.5A CN112907628A (en) 2021-02-09 2021-02-09 Video target tracking method and device, storage medium and electronic equipment
CN202110179157.5 2021-02-09

Publications (1)

Publication Number Publication Date
WO2022171036A1 true WO2022171036A1 (en) 2022-08-18

Family

ID=76123159

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/CN2022/075086 WO2022171036A1 (en) 2021-02-09 2022-01-29 Video target tracking method, video target tracking apparatus, storage medium, and electronic device

Country Status (2)

Country Link
CN (1) CN112907628A (en)
WO (1) WO2022171036A1 (en)

Families Citing this family (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN112907628A (en) * 2021-02-09 2021-06-04 北京有竹居网络技术有限公司 Video target tracking method and device, storage medium and electronic equipment

Family Cites Families (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN109829397B (en) * 2019-01-16 2021-04-02 创新奇智(北京)科技有限公司 Video annotation method and system based on image clustering and electronic equipment
CN110717414B (en) * 2019-09-24 2023-01-03 青岛海信网络科技股份有限公司 Target detection tracking method, device and equipment
CN111898416A (en) * 2020-06-17 2020-11-06 绍兴埃瓦科技有限公司 Video stream processing method and device, computer equipment and storage medium

Patent Citations (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN109214238A (en) * 2017-06-30 2019-01-15 百度在线网络技术(北京)有限公司 Multi-object tracking method, device, equipment and storage medium
US20200065617A1 (en) * 2018-08-24 2020-02-27 Nec Laboratories America, Inc. Unsupervised domain adaptation for video classification
CN111242973A (en) * 2020-01-06 2020-06-05 上海商汤临港智能科技有限公司 Target tracking method and device, electronic equipment and storage medium
CN111311635A (en) * 2020-02-08 2020-06-19 腾讯科技(深圳)有限公司 Target positioning method, device and system
CN112907628A (en) * 2021-02-09 2021-06-04 北京有竹居网络技术有限公司 Video target tracking method and device, storage medium and electronic equipment

Cited By (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN116385497A (en) * 2023-05-29 2023-07-04 成都与睿创新科技有限公司 Custom target tracking method and system for body cavity
CN116385497B (en) * 2023-05-29 2023-08-22 成都与睿创新科技有限公司 Custom target tracking method and system for body cavity
CN117975198A (en) * 2024-02-02 2024-05-03 北京视觉世界科技有限公司 Automatic construction method of target detection class data set and related equipment thereof

Also Published As

Publication number Publication date
CN112907628A (en) 2021-06-04

Similar Documents

Publication Publication Date Title
WO2022171036A1 (en) Video target tracking method, video target tracking apparatus, storage medium, and electronic device
CN110298413B (en) Image feature extraction method and device, storage medium and electronic equipment
JP2023547917A (en) Image segmentation method, device, equipment and storage medium
WO2022252881A1 (en) Image processing method and apparatus, and readable medium and electronic device
WO2022105779A1 (en) Image processing method, model training method, and apparatus, medium, and device
WO2023030370A1 (en) Endoscope image detection method and apparatus, storage medium, and electronic device
CN111784712B (en) Image processing method, device, equipment and computer readable medium
WO2022028254A1 (en) Positioning model optimization method, positioning method and positioning device
CN113033580B (en) Image processing method, device, storage medium and electronic equipment
CN110347875B (en) Video scene classification method and device, mobile terminal and storage medium
WO2023179310A1 (en) Image restoration method and apparatus, device, medium, and product
WO2022233223A1 (en) Image splicing method and apparatus, and device and medium
WO2023030427A1 (en) Training method for generative model, polyp identification method and apparatus, medium, and device
CN113449070A (en) Multimodal data retrieval method, device, medium and electronic equipment
CN112330788A (en) Image processing method, image processing device, readable medium and electronic equipment
CN113610034B (en) Method and device for identifying character entities in video, storage medium and electronic equipment
CN108257081B (en) Method and device for generating pictures
CN111862351B (en) Positioning model optimization method, positioning method and positioning equipment
CN111311609B (en) Image segmentation method and device, electronic equipment and storage medium
CN110765304A (en) Image processing method, image processing device, electronic equipment and computer readable medium
WO2023016290A1 (en) Video classification method and apparatus, readable medium and electronic device
WO2022194145A1 (en) Photographing position determination method and apparatus, device, and medium
WO2022052889A1 (en) Image recognition method and apparatus, electronic device, and computer-readable medium
CN113435528B (en) Method, device, readable medium and electronic equipment for classifying objects
CN114004229A (en) Text recognition method and device, readable medium and electronic equipment

Legal Events

Date Code Title Description
121 Ep: the epo has been informed by wipo that ep was designated in this application

Ref document number: 22752197

Country of ref document: EP

Kind code of ref document: A1

NENP Non-entry into the national phase

Ref country code: DE

122 Ep: pct application non-entry in european phase

Ref document number: 22752197

Country of ref document: EP

Kind code of ref document: A1