CN117876428B - Target tracking method, device, computer equipment and medium based on image processing - Google Patents


Info

Publication number
CN117876428B
CN117876428B (application CN202410275450.5A)
Authority
CN
China
Prior art keywords
image
feature
fusion
semantic
features
Prior art date
Legal status
Active
Application number
CN202410275450.5A
Other languages
Chinese (zh)
Other versions
CN117876428A (en)
Inventor
董方
沈傲然
Current Assignee
Jinrui Tongchuang Beijing Technology Co ltd
Original Assignee
Jinrui Tongchuang Beijing Technology Co ltd
Priority date
Filing date
Publication date
Application filed by Jinrui Tongchuang Beijing Technology Co ltd filed Critical Jinrui Tongchuang Beijing Technology Co ltd
Priority to CN202410275450.5A
Publication of CN117876428A
Application granted
Publication of CN117876428B
Legal status: Active
Anticipated expiration

Landscapes

  • Image Analysis (AREA)

Abstract

The invention provides a target tracking method, apparatus, computer device, and medium based on image processing, relating to the technical field of video image processing. The method comprises the following steps: inputting video image data into an edge detection model to perform an edge detection task and generate an edge feature image, into a region growing model to perform a region growing task and generate a pixel feature image, and into a semantic segmentation model to perform a semantic segmentation task and generate a semantic feature image; performing image fusion on the edge feature image and the semantic feature image, performing image fusion on the pixel feature image and the semantic feature image, and generating a final feature image by feature-map addition; determining a contour image of the target object to be detected according to the image features in the final feature image; and performing a target tracking task on the target object to be detected in the video image data based on the contour image. By fusing the images before tracking, the scheme improves the accuracy of target tracking.

Description

Target tracking method, device, computer equipment and medium based on image processing
Technical Field
The present invention relates to the field of video image processing technologies, and in particular, to an image processing-based target tracking method, apparatus, computer device, and medium.
Background
Extracting objects from images based on provided coordinate points is widely used across many application fields. However, because images are complex and varied and the objects in them are irregular, clearly delineating an object and its edges is challenging, and the accuracy of the provided coordinates also affects the processing result. When the extracted, marked object is then tracked in real time, the accuracy of tracking is limited by the quality of the extraction, the effectiveness of the tracking algorithm, and the available processing performance. A more accurate image-based processing method is therefore required.
Meanwhile, the transmitted raw image data and the processed data must be matched and stored for subsequent tracking, and the data collection scheme and storage mechanism strongly influence system performance and user experience. In addition, the target tracking method requires both early-stage image processing and the subsequent tracking task; processing whole video streams is complex, and the higher the resolution, the greater the demand the image data places on machine performance, which poses serious challenges to system stability and real-time operation.
Disclosure of Invention
In view of the above, the embodiments of the invention provide a target tracking method based on image processing, to solve the technical problem of low accuracy when tracking targets in images in the prior art. The method comprises the following steps:
inputting video image data into an edge detection model to perform an edge detection task and generate an edge feature image, inputting the video image data into a region growing model to perform a region growing task and generate a pixel feature image, and inputting the video image data into a semantic segmentation model to perform a semantic segmentation task and generate a semantic feature image;
performing image fusion on the edge feature image and the semantic feature image to generate a first fusion image, performing image fusion on the pixel feature image and the semantic feature image to generate a second fusion image, and generating a final feature image by adding the first fusion image and the second fusion image in a feature-map addition manner;
determining a contour image of the target object to be detected according to the image features in the final feature image;
and performing a target tracking task on the target object to be detected in the video image data based on the contour image.
The embodiments of the invention also provide a target tracking device based on image processing, to solve the technical problem of low target tracking accuracy in images in the prior art. The device comprises:
the feature image generation module is used for inputting video image data into the edge detection model to perform an edge detection task, generating an edge feature image, inputting video image data into the region growth model to perform a region growth task, generating a pixel feature image, inputting video image data into the semantic segmentation model to perform a semantic segmentation task, and generating a semantic feature image;
the image feature fusion module is used for carrying out image fusion on the edge feature image and the semantic feature image to generate a first fusion image, carrying out image fusion on the pixel feature image and the semantic feature image to generate a second fusion image, and generating a final feature image by adding the first fusion image and the second fusion image in a feature-map addition manner;
the contour image extraction module is used for determining a contour image of the target object to be detected according to the image features in the final feature image;
And the target tracking module is used for carrying out a target tracking task on the target object to be detected in the video image data based on the contour image.
The embodiment of the invention also provides computer equipment, which comprises a memory, a processor and a computer program stored on the memory and capable of running on the processor, wherein the processor realizes any target tracking method based on image processing when executing the computer program so as to solve the technical problem of lower target tracking accuracy in images in the prior art.
The embodiment of the invention also provides a computer readable storage medium which stores a computer program for executing any target tracking method based on image processing, so as to solve the technical problem of lower target tracking accuracy in images in the prior art.
Compared with the prior art, the beneficial effects achievable by at least one of the technical schemes adopted in the embodiments of this specification include at least the following:
A semantic feature image is generated by the semantic segmentation model and combined with edge detection and region growing. The three kinds of features (edge features, pixel features, and semantic features) are mutually fused and superimposed, so that the feature information of multiple data paths is used comprehensively to obtain a final feature image. This effectively alleviates edge blurring and inaccuracy, and applying the resulting final feature image to target tracking improves the accuracy of subsequent tracking.
Drawings
To illustrate the technical solutions of the embodiments of the present application more clearly, the drawings needed in the embodiments are briefly described below. The drawings in the following description show only some embodiments of the present application; other drawings can be derived from them by a person skilled in the art without inventive effort.
FIG. 1 is a flowchart of an object tracking method based on image processing according to an embodiment of the present invention;
FIG. 2 is a block diagram of a computer device according to an embodiment of the present invention;
Fig. 3 is a block diagram of an object tracking device based on image processing according to an embodiment of the present invention.
Detailed Description
Embodiments of the present application will be described in detail below with reference to the accompanying drawings.
Other advantages and effects of the present application will become apparent to those skilled in the art from the following disclosure, which describes the embodiments of the present application with reference to specific examples. It will be apparent that the described embodiments are only some, not all, of the embodiments of the application. The application may also be practiced or carried out in other embodiments, and the details of this description may be modified or varied without departing from the spirit and scope of the present application. It should be noted that the following embodiments, and the features within them, may be combined with each other provided there is no conflict. All other embodiments obtained by those skilled in the art from the embodiments of the application without inventive effort fall within the scope of the application.
In an embodiment of the present invention, there is provided an image processing-based object tracking method, as shown in fig. 1, including:
Step S101: inputting video image data into an edge detection model to perform an edge detection task and generate an edge feature image, inputting the video image data into a region growing model to perform a region growing task and generate a pixel feature image, and inputting the video image data into a semantic segmentation model to perform a semantic segmentation task and generate a semantic feature image;
Step S102: performing image fusion on the edge feature image and the semantic feature image to generate a first fusion image, performing image fusion on the pixel feature image and the semantic feature image to generate a second fusion image, and generating a final feature image by adding the first fusion image and the second fusion image in a feature-map addition manner;
Step S103: determining a contour image of the target object to be detected according to the image features in the final feature image;
Step S104: performing a target tracking task on the target object to be detected in the video image data based on the contour image.
Specifically, the region growing task merges pixels with similar properties into a region. First, a seed point is found for each region to be segmented and used as the starting point of growth; then points in the neighborhood of the seed that have the same or similar properties are merged into the region containing the seed pixel. Each newly merged point in turn acts as a seed and continues to grow outward, until no more qualifying pixels can be included and one region growing task is complete. Through the region growing task, the generated pixel feature image has clearer, more recognizable boundaries.
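The seeded growth loop described above can be sketched in a few lines of pure Python. The intensity-difference threshold used as the similarity criterion here is an illustrative assumption, since the patent does not fix a specific growing condition:

```python
from collections import deque

def region_grow(image, seed, threshold):
    """Grow a region from `seed`, absorbing 4-connected neighbours whose
    intensity differs from the seed pixel by at most `threshold`."""
    h, w = len(image), len(image[0])
    sr, sc = seed
    seed_val = image[sr][sc]
    region = {(sr, sc)}
    frontier = deque([(sr, sc)])
    while frontier:
        r, c = frontier.popleft()
        for dr, dc in ((1, 0), (-1, 0), (0, 1), (0, -1)):
            nr, nc = r + dr, c + dc
            if (0 <= nr < h and 0 <= nc < w and (nr, nc) not in region
                    and abs(image[nr][nc] - seed_val) <= threshold):
                region.add((nr, nc))   # the new point becomes part of the region
                frontier.append((nr, nc))  # and acts as a seed in its turn
    return region
```

Each newly absorbed pixel re-enters the frontier queue and continues the growth, matching the description above; growth stops when no neighbouring pixel satisfies the condition.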
In a specific implementation, to improve the accuracy of the generated semantic feature image, inputting the video image data into the semantic segmentation model to perform the semantic segmentation task and generate the semantic feature image is realized through the following steps:
Extracting an original image from the video image data and generating a preprocessed image after data preprocessing; inputting the preprocessed image into a trained backbone feature extraction network to obtain a feature map of the preprocessed image; setting a region of interest for each coordinate point in the feature map to obtain a plurality of candidate regions of interest, performing binary classification and bounding-box regression on the candidate regions of interest with a region proposal network, filtering out invalid candidate regions of interest, and generating the filtered regions of interest; dividing each filtered region of interest into a plurality of grid cells, performing bilinear interpolation within each cell, and generating the semantic feature image through processing by a fully convolutional network.
Specifically, data preprocessing first unifies the size and dimensions of the original image and then applies normalization. The trained network may use ResNet or another backbone feature extraction network. The region proposal network performs foreground/background binary classification and bounding-box regression on the candidate regions of interest, filters out part of the invalid candidates, and generates the filtered regions of interest. Pixels of the original image are put in correspondence with pixels of the semantic feature image, the semantic feature image is mapped to features of fixed size, and the filtered regions of interest are processed by the fully convolutional network to generate the semantic feature image. Through this processing, input regions of different sizes yield output features of the same size, which avoids the misalignment caused by the two quantization steps in region-of-interest pooling and improves the accuracy of semantic segmentation.
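As an illustrative sketch (not the patent's implementation), sampling each grid-cell centre with bilinear interpolation is what removes the two quantization steps mentioned above; the `roi_align` name, its region coordinates, and the fixed `out_h` by `out_w` output size are assumptions for the example:

```python
def bilinear_sample(feat, y, x):
    """Bilinearly interpolate the 2-D feature map `feat` at fractional (y, x)."""
    y0, x0 = int(y), int(x)
    y1, x1 = min(y0 + 1, len(feat) - 1), min(x0 + 1, len(feat[0]) - 1)
    wy, wx = y - y0, x - x0
    top = feat[y0][x0] * (1 - wx) + feat[y0][x1] * wx
    bottom = feat[y1][x0] * (1 - wx) + feat[y1][x1] * wx
    return top * (1 - wy) + bottom * wy

def roi_align(feat, y0, x0, y1, x1, out_h, out_w):
    """Pool the region (y0, x0)-(y1, x1) to a fixed out_h x out_w grid by
    sampling each output cell centre bilinearly, with no coordinate rounding."""
    cell_h, cell_w = (y1 - y0) / out_h, (x1 - x0) / out_w
    return [[bilinear_sample(feat, y0 + (i + 0.5) * cell_h,
                             x0 + (j + 0.5) * cell_w)
             for j in range(out_w)] for i in range(out_h)]
```

Because the output grid size is fixed, regions of interest of any size produce features of the same size, as the paragraph above requires.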
In a specific implementation, to improve the distinguishability of the features and determine image edges more accurately, fusing the edge feature image with the semantic feature image to generate a first fusion image, fusing the pixel feature image with the semantic feature image to generate a second fusion image, and generating the final feature image by adding the first fusion image and the second fusion image is realized through the following steps:
Carrying out feature stitching on image edge features in the edge feature images and image semantic features in the semantic feature images through feature image channels to generate a first fusion image, wherein the number of feature channels of the first fusion image is the sum of the number of feature channels of the image edge features and the image semantic features; performing feature stitching on image pixel segmentation features in the pixel feature images and image semantic features in the semantic feature images through feature image channels to generate second fusion images, wherein the number of feature channels of the second fusion images is the sum of the number of feature channels of the image pixel segmentation features and the number of feature channels of the image semantic features; and superposing pixels at corresponding positions in the feature images of the first fusion image and the second fusion image to generate a final feature image, wherein the number of feature channels of the final feature image is the same as that of the feature channels of the first fusion image.
Specifically, to address edge blurring and ambiguous segmentation, semantic segmentation is combined with edge detection, and feature stitching and feature fusion are used to improve edge sharpness. The image edge features in the edge feature image and the image semantic features in the semantic feature image are stitched along the feature-map channels, which strengthens the edge features and increases the information content of the features. The image pixel segmentation features in the pixel feature image and the image semantic features in the semantic feature image are stitched along the feature-map channels in the same way. Through these two stitching operations, the pixel segmentation features and the edge features are each added to the semantic features, increasing the information content and improving the detail. The first fusion image and the second fusion image are then superimposed pixel by pixel: the values of elements at corresponding positions of the two feature maps are added, directly cascading and fusing the two feature paths.
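Representing a feature map as a plain list of channel planes (an illustrative assumption, not the patent's data layout), the two operations above, channel-wise stitching and element-wise addition, can be sketched as:

```python
def concat_channels(a, b):
    """Stitch two feature maps along the channel axis: the result has
    len(a) + len(b) channels, as in the first and second fusion images."""
    return a + b

def add_feature_maps(a, b):
    """Element-wise addition of two feature maps with equal channel counts:
    the channel count is unchanged, as in the final feature image."""
    return [[[va + vb for va, vb in zip(row_a, row_b)]
             for row_a, row_b in zip(ch_a, ch_b)]
            for ch_a, ch_b in zip(a, b)]
```

Note the asymmetry the patent relies on: stitching sums the channel counts, while element-wise addition keeps the channel count of the first fusion image.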
In specific implementation, in order to extract the contour image of the target object, determining the contour image of the target object to be detected according to the image features in the final feature image is realized through the following steps:
Classifying all the features of the final feature image through a multi-class classification function to generate a plurality of feature classes; and generating a contour image of the target object to be detected by the features in each feature class.
In a specific implementation, to obtain the contour image after the features are classified, classifying all the features of the final feature image through a multi-class classification function to generate a plurality of feature classes is realized through the following steps:
Inputting all the features of the final feature image into a multi-layer convolution network, and fusing the features of the sub-networks by utilizing the multi-layer convolution network to generate a feature fusion image, wherein the features of the sub-networks comprise image semantic features, image edge features and image pixel segmentation features; and calculating the loss of each feature in the feature fusion image, classifying all the features according to the loss, and generating a plurality of feature classes.
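The patent does not name the multi-class classification function; softmax over per-class scores is a common choice and is used here purely as an assumed stand-in:

```python
import math

def softmax(logits):
    """Map raw per-class scores to a probability distribution."""
    m = max(logits)  # subtract the max for numerical stability
    exps = [math.exp(v - m) for v in logits]
    total = sum(exps)
    return [e / total for e in exps]

def classify(feature_scores):
    """Assign a feature to the class with the highest softmax probability."""
    probs = softmax(feature_scores)
    return max(range(len(probs)), key=probs.__getitem__)
```

Training such a classifier against per-class losses, as the step above describes, would typically pair this softmax with a cross-entropy loss.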
In a specific implementation, to improve both the accuracy and the efficiency of target tracking, different procedures are adopted for single-target and multi-target tracking, and tracking the target object to be detected in the video image data based on the contour image is realized through the following steps:
When there is one target object to be detected, determining the contour image of the target object in the initial frame of the video image data, determining a plurality of candidate contour images of the target object in each subsequent frame, extracting features of the candidate contour images, calculating a confidence score for each candidate, and taking the candidate with the highest confidence score as the predicted contour image of the target object in that frame; when there are multiple target objects to be detected, determining the contour image of each target object in every frame of the video image data, extracting the appearance features and motion features of each contour image in each frame, calculating the matching degree between contour images in consecutive frames using the Hungarian algorithm and a cascade matching algorithm, and assigning a unique target identifier to each contour image whose matching degree reaches a preset threshold.
Specifically, in practical applications, a user can select one or more objects as tracking targets as required, and different methods are used to track them. A single-target tracking algorithm typically employs a single tracker and achieves continuous tracking through target feature extraction and matching; it extracts the position and motion state of the object of interest. A multi-target tracking algorithm must track multiple targets simultaneously; its strategy includes multi-target association and data association techniques, associating targets through their spatial and temporal relationships to track them accurately. The appearance features extracted during multi-target tracking are used for feature comparison to recognize the same target across frames. Each tracked target is assigned a unique target identifier, by which the targets are distinguished.
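For matching contour images between consecutive frames, the Hungarian algorithm finds the minimum-cost one-to-one assignment of tracks to detections. The sketch below obtains the same optimum by exhaustive search, a deliberate stand-in that is only viable for the tiny matrices of an illustration; a real implementation would use a proper polynomial-time Hungarian routine:

```python
from itertools import permutations

def optimal_assignment(cost):
    """Minimum-cost one-to-one matching for a square cost matrix.
    Brute force over all permutations: same result as the Hungarian
    algorithm, but O(n!) instead of O(n^3), so illustration only."""
    n = len(cost)
    best_cost, best = float("inf"), None
    for perm in permutations(range(n)):
        c = sum(cost[i][perm[i]] for i in range(n))
        if c < best_cost:
            best_cost, best = c, list(perm)
    return best, best_cost
```

In the scheme above, the cost would combine appearance and motion dissimilarity, and pairs whose matching degree falls below the preset threshold would be rejected before a unique target identifier is carried over.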
In a specific implementation, to avoid excessive consumption by video data collection, storage, and heavy computation, and to improve the performance and stability of the whole system, the collection channels and the distributed system are constructed through the following steps:
Constructing a plurality of collection channels of video image data, wherein the collection channels include a sensor data channel and a video stream data channel; and constructing a distributed system, and processing an edge detection task, a region growing task, a semantic segmentation task and a target tracking task in parallel in the distributed system.
Specifically, an efficient data collection pipeline is established, various data sources such as sensor data and video streams are connected, and data are acquired and processed in real time or in batches. For the storage of video image data, a suitable database technology (such as a NoSQL database or a distributed file storage system) can be selected, with data compression and index optimization to improve storage efficiency and retrieval speed. The overall system adopts a distributed architecture and introduces load balancing and fault tolerance mechanisms to keep the system stable under high load and abnormal conditions. System performance optimization, including algorithm optimization, parallel computation, and hardware acceleration, is also performed to improve the response speed and throughput of the system.
In this embodiment, a computer device is provided, as shown in fig. 2, including a memory 201, a processor 202, and a computer program stored on the memory and executable on the processor, where the processor implements any of the above-mentioned image processing-based object tracking methods when executing the computer program.
In particular, the computer device may be a computer terminal, a server or similar computing means.
In the present embodiment, there is provided a computer-readable storage medium storing a computer program that executes any of the above-described image processing-based object tracking methods.
In particular, computer-readable storage media include permanent and non-permanent, removable and non-removable media, and may implement information storage by any method or technology. The information may be computer-readable instructions, data structures, program modules, or other data. Examples of computer-readable storage media include, but are not limited to, phase-change memory (PRAM), static random-access memory (SRAM), dynamic random-access memory (DRAM), other types of random-access memory (RAM), read-only memory (ROM), electrically erasable programmable read-only memory (EEPROM), flash memory or other memory technology, compact disc read-only memory (CD-ROM), digital versatile discs (DVD) or other optical storage, magnetic cassettes, magnetic tape or disk storage or other magnetic storage devices, or any other non-transmission medium that can store information accessible by a computing device. As defined herein, computer-readable storage media do not include transitory computer-readable media (transmission media) such as modulated data signals and carrier waves.
Based on the same inventive concept, the embodiment of the invention also provides an object tracking device based on image processing, as described in the following embodiment. Since the principle of solving the problem of the image processing-based target tracking apparatus is similar to that of the image processing-based target tracking method, the implementation of the image processing-based target tracking apparatus can be referred to the implementation of the image processing-based target tracking method, and the repetition is omitted. As used below, the term "unit" or "module" may be a combination of software and/or hardware that implements the intended function. While the means described in the following embodiments are preferably implemented in software, implementation in hardware, or a combination of software and hardware, is also possible and contemplated.
Fig. 3 is a block diagram of a target tracking apparatus based on image processing according to an embodiment of the present invention. As shown in fig. 3, the apparatus includes the feature image generation module 301, the image feature fusion module 302, the contour image extraction module 303, and the target tracking module 304, which are described below.
The feature image generation module 301 is configured to input video image data into an edge detection model to perform an edge detection task and generate an edge feature image, input the video image data into a region growing model to perform a region growing task and generate a pixel feature image, and input the video image data into a semantic segmentation model to perform a semantic segmentation task and generate a semantic feature image;
The image feature fusion module 302 is configured to perform image fusion on the edge feature image and the semantic feature image to generate a first fusion image, perform image fusion on the pixel feature image and the semantic feature image to generate a second fusion image, and generate a final feature image by adding the first fusion image and the second fusion image in a feature-map addition manner;
A contour image extraction module 303, configured to determine a contour image of the target object to be detected according to the image features in the final feature image;
the target tracking module 304 is configured to perform a target tracking task on a target object to be detected in the video image data based on the contour image.
In one embodiment, the feature image generation module includes:
The preprocessing unit is used for extracting an original image in the video image data, and generating a preprocessed image after data preprocessing is carried out on the original image;
The backbone feature extraction unit is used for inputting the preprocessed image into a trained backbone feature extraction network to obtain a feature map of the preprocessed image;
The interest region filtering unit is used for setting an interest region for each coordinate point in the feature map, obtaining a plurality of candidate interest regions, performing binary classification and bounding box regression on the candidate interest regions by using a region generation network, filtering invalid candidate interest regions, and generating a filtered interest region;
And the characteristic image generating unit is used for dividing the filtered region of interest into a plurality of grids, performing bilinear interpolation in each grid, and generating a semantic characteristic image after processing through a full convolution neural network.
In one embodiment, the image feature fusion module includes:
The first fusion unit is used for carrying out feature stitching on the image edge features in the edge feature images and the image semantic features in the semantic feature images through feature image channels to generate a first fusion image, wherein the number of feature channels of the first fusion image is the sum of the number of feature channels of the image edge features and the image semantic features;
The second fusion unit is used for carrying out feature stitching on the image pixel segmentation features in the pixel feature images and the image semantic features in the semantic feature images through feature image channels to generate second fusion images, wherein the number of the feature channels of the second fusion images is the sum of the number of the feature channels of the image pixel segmentation features and the number of the feature channels of the image semantic features;
And the feature superposition unit is used for superposing pixels at corresponding positions in the feature map of the first fusion image and the feature map of the second fusion image together to generate a final feature image, wherein the number of feature channels of the final feature image is the same as that of the feature channels of the first fusion image.
In one embodiment, the contour image extraction module includes:
The feature classification unit is used for classifying all the features of the final feature image through a multi-category classification function to generate a plurality of feature categories;
and the contour image acquisition unit is used for generating a contour image of the target object to be detected from the features in each feature class.
In one embodiment, the feature classification unit includes:
The characteristic fusion unit is used for inputting all the characteristics of the final characteristic image into a multi-layer convolution network, and fusing the characteristics of the sub-network by utilizing the multi-layer convolution network to generate a characteristic fusion image, wherein the characteristics of the sub-network comprise image semantic characteristics, image edge characteristics and image pixel segmentation characteristics;
the feature classification unit is used for calculating the loss of each feature in the feature fusion image, classifying all the features according to the loss, and generating a plurality of feature classes.
In one embodiment, the target tracking module comprises:
A single target tracking unit, configured to determine, when the target object to be detected is one, a contour image of the target object to be detected in an initial frame image of the video image data, determine a plurality of candidate contour images of the target object to be detected in each next frame image of the video image data, extract features of the plurality of candidate contour images, calculate a confidence score of each candidate contour image, and use the candidate contour image with the highest confidence score as a predicted contour image of the target object to be detected in each next frame image;
and the multi-target tracking unit is used for, when there are multiple target objects to be detected, determining the contour image of each target object in every frame of the video image data, extracting the appearance features and motion features of each contour image in each frame, calculating the matching degree between contour images in consecutive frames using the Hungarian algorithm and a cascade matching algorithm, and assigning a unique target identifier to each contour image whose matching degree reaches a preset threshold.
In one embodiment, the apparatus further comprises:
A system and channel construction module, used for constructing collection channels and a distributed system to improve the stability and performance of the system.
In one embodiment, a system and channel construction module includes:
A collection channel construction unit, used for constructing a plurality of collection channels for the video image data, wherein the collection channels include a sensor data channel and a video stream data channel;
and a distributed system construction unit, used for constructing a distributed system and processing the edge detection task, the region growing task, the semantic segmentation task and the target tracking task in parallel in the distributed system.
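The three per-frame feature tasks are independent of one another, so they can run concurrently before the fusion step. A minimal local sketch, with a thread pool standing in for the distributed system and trivial placeholder task functions:

```python
from concurrent.futures import ThreadPoolExecutor

# Placeholder task functions; in the described system these would be the
# edge detection, region growing and semantic segmentation models.
def edge_detection(frame):         return f"edges({frame})"
def region_growing(frame):         return f"regions({frame})"
def semantic_segmentation(frame):  return f"semantics({frame})"

def process_frame(frame):
    # Submit the three independent tasks and collect results in submit
    # order, so the fusion step receives them in a fixed layout.
    with ThreadPoolExecutor(max_workers=3) as pool:
        futures = [pool.submit(fn, frame)
                   for fn in (edge_detection, region_growing, semantic_segmentation)]
        return [f.result() for f in futures]

results = process_frame("frame0")
```

In a real deployment each task would instead be dispatched to a worker node, with load balancing and fault tolerance handled by the distributed framework.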
The embodiment of the invention realizes the following technical effects:
A semantic feature image is generated by the semantic segmentation model and combined with edge detection and region growing; the edge, pixel and semantic features are mutually fused and superposed, comprehensively using the feature information of multiple data paths to obtain the final feature image, which effectively mitigates edge blurring and inaccuracy; applying the final feature image to target tracking improves the accuracy of subsequent tracking. Different tracking methods are adopted for different numbers of target objects, improving tracking accuracy while reducing computation. Data are collected and processed in real time or in batches through collection channels for multiple data sources, and a distributed storage system compresses and indexes the image and video data, improving storage efficiency and retrieval speed. A distributed system architecture with load balancing and fault tolerance keeps the system stable under high load and abnormal conditions, and algorithm optimization, parallel computing and hardware acceleration improve the response speed and throughput of the whole target tracking method.
It will be apparent to those skilled in the art that the modules or steps of the embodiments of the invention described above may be implemented in a general-purpose computing device; they may be centralized on a single computing device or distributed across a network of computing devices; they may alternatively be implemented in program code executable by computing devices, so that they may be stored in a storage device and executed by computing devices; in some cases, the steps shown or described may be performed in an order different from that shown; and they may be fabricated separately into individual integrated circuit modules, or a plurality of the modules or steps may be fabricated into a single integrated circuit module. Thus, embodiments of the invention are not limited to any specific combination of hardware and software.
The above description covers only preferred embodiments of the present invention and is not intended to limit it; those skilled in the art may make various modifications and variations to the embodiments. Any modification, equivalent replacement or improvement made within the spirit and principle of the present invention shall be included in its protection scope.

Claims (10)

1. An image processing-based target tracking method, comprising:
inputting video image data into an edge detection model to perform an edge detection task to generate an edge feature image, inputting the video image data into a region growing model to perform a region growing task to generate a pixel feature image, and inputting the video image data into a semantic segmentation model to perform a semantic segmentation task to generate a semantic feature image;
Performing image fusion on the edge feature image and the semantic feature image to generate a first fusion image, performing image fusion on the pixel feature image and the semantic feature image to generate a second fusion image, and generating a final feature image by adding feature images of the first fusion image and the second fusion image;
Determining a contour image of the target object to be detected according to the image features in the final feature image;
and performing a target tracking task on the target object to be detected in the video image data based on the contour image.
2. The image processing-based object tracking method according to claim 1, wherein inputting the video image data into a semantic segmentation model for a semantic segmentation task, generating a semantic feature image, comprises:
Extracting an original image in the video image data, and generating a preprocessed image after carrying out data preprocessing on the original image;
inputting the preprocessed image into a trained neural network backbone for feature extraction to obtain a feature map of the preprocessed image;
setting a region of interest for each coordinate point in the feature map to obtain a plurality of candidate regions of interest, performing binary classification and bounding-box regression on the candidate regions of interest with a region proposal network, and filtering out invalid candidate regions of interest to generate filtered regions of interest;
dividing the filtered region of interest into a plurality of grids, carrying out bilinear interpolation in each grid, and generating a semantic feature image after processing through a full convolution neural network.
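The bilinear interpolation used inside each grid cell of the filtered region of interest can be sketched as below. This is a generic single-point bilinear sample on a 2D feature map, not the patent's full grid-pooling implementation; the feature map and sample position are illustrative.

```python
import numpy as np

def bilinear_sample(feat, y, x):
    """Sample a 2D feature map at a fractional (y, x) position.

    Interpolates between the four integer-grid neighbours, clamping
    indices at the feature-map border.
    """
    y0, x0 = int(np.floor(y)), int(np.floor(x))
    y1 = min(y0 + 1, feat.shape[0] - 1)
    x1 = min(x0 + 1, feat.shape[1] - 1)
    dy, dx = y - y0, x - x0
    return (feat[y0, x0] * (1 - dy) * (1 - dx)
            + feat[y0, x1] * (1 - dy) * dx
            + feat[y1, x0] * dy * (1 - dx)
            + feat[y1, x1] * dy * dx)

feat = np.arange(16, dtype=float).reshape(4, 4)
v = bilinear_sample(feat, 1.5, 1.5)  # midpoint of cells 5, 6, 9, 10 -> 7.5
```

Sampling a fixed number of such points per grid cell gives each region of interest a fixed-size representation regardless of its original extent.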
3. The image processing-based object tracking method according to claim 1, wherein performing image fusion on the edge feature image and the semantic feature image to generate a first fused image, performing image fusion on the pixel feature image and the semantic feature image to generate a second fused image, and generating a final feature image by adding feature images from the first fused image and the second fused image, comprises:
Performing feature stitching on image edge features in the edge feature images and image semantic features in the semantic feature images through feature image channels to generate the first fusion image, wherein the number of feature channels of the first fusion image is the sum of the number of feature channels of the image edge features and the image semantic features;
Performing feature stitching on image pixel segmentation features in the pixel feature images and image semantic features in the semantic feature images through feature image channels to generate the second fusion image, wherein the number of feature channels of the second fusion image is the sum of the number of feature channels of the image pixel segmentation features and the image semantic features;
And superposing pixels at corresponding positions in the feature images of the first fusion image and the second fusion image to generate the final feature image, wherein the number of feature channels of the final feature image is the same as that of the first fusion image.
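The two-stage fusion of claim 3 (channel-wise stitching, then element-wise addition) can be sketched with plain arrays. One assumption is made explicit here: for the final addition to be well defined, the edge and pixel feature images must carry the same number of channels, so that both fused images have matching shapes.

```python
import numpy as np

def fuse(edge_feat, pixel_feat, semantic_feat):
    """Fuse the three feature images as described.

    All inputs are (C, H, W) feature maps with the same spatial size.
    The first fused image stitches edge + semantic channels, the second
    stitches pixel + semantic channels, and the final feature image is
    their element-wise sum, keeping the first fused image's channel count.
    """
    first = np.concatenate([edge_feat, semantic_feat], axis=0)    # channel stitch
    second = np.concatenate([pixel_feat, semantic_feat], axis=0)  # channel stitch
    return first + second  # superpose pixels at corresponding positions

edge = np.ones((2, 4, 4))
pixel = 2 * np.ones((2, 4, 4))
sem = 3 * np.ones((2, 4, 4))
final = fuse(edge, pixel, sem)  # 4 channels: edge+pixel sums, doubled semantics
```

Concatenation preserves both feature sets side by side, while the addition reinforces responses that the two fused images agree on.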
4. The image processing-based object tracking method according to claim 3, wherein determining a contour image of the object to be detected based on the image features in the final feature image, comprises:
classifying all the features of the final feature image through a multi-class classification function to generate a plurality of feature classes;
and generating a contour image of the target object to be detected from the features in each feature class.
5. The image processing-based object tracking method of claim 4, wherein classifying all features of the final feature image by a multi-class classification function generates a plurality of feature classes, comprising:
inputting all the features of the final feature image into a multi-layer convolutional network, and fusing the sub-network features through the multi-layer convolutional network to generate a feature fusion image, wherein the sub-network features comprise the image semantic features, the image edge features and the image pixel segmentation features;
and calculating the loss of each feature in the feature fusion image, classifying all the features according to the loss, and generating a plurality of feature classes.
6. The image processing-based object tracking method according to any one of claims 1 to 5, characterized in that object tracking of the object to be detected in the video image data based on the contour image includes:
when there is one target object to be detected, determining a contour image of the target object to be detected in an initial frame image of the video image data, determining a plurality of candidate contour images of the target object to be detected in each subsequent frame image of the video image data, extracting features of the plurality of candidate contour images, calculating a confidence score of each candidate contour image, and taking the candidate contour image with the highest confidence score as the predicted contour image of the target object to be detected in that frame image;
when there are a plurality of target objects to be detected, determining a contour image of each target object to be detected in each frame image of the video image data, extracting appearance features and motion features of each contour image in each frame image, calculating the matching degree between contour images in adjacent frame images by using the Hungarian algorithm and a cascade matching algorithm, and assigning a unique target identifier to each contour image whose matching degree reaches a preset threshold.
7. The image processing-based object tracking method according to any one of claims 1 to 5, characterized by further comprising:
Constructing a plurality of collection channels of the video image data, wherein the collection channels comprise a sensor data channel and a video stream data channel;
and constructing a distributed system, and processing the edge detection task, the region growing task, the semantic segmentation task and the target tracking task in the distributed system in parallel.
8. An image processing-based object tracking apparatus, comprising:
The feature image generation module is used for inputting video image data into an edge detection model to perform an edge detection task to generate an edge feature image, inputting the video image data into a region growing model to perform a region growing task to generate a pixel feature image, and inputting the video image data into a semantic segmentation model to perform a semantic segmentation task to generate a semantic feature image;
the image feature fusion module is used for carrying out image fusion on the edge feature image and the semantic feature image to generate a first fusion image, carrying out image fusion on the pixel feature image and the semantic feature image to generate a second fusion image, and generating a final feature image by adding the feature images of the first fusion image and the second fusion image;
the contour image extraction module is used for determining a contour image of the target object to be detected according to the image features in the final feature image;
And the target tracking module is used for performing a target tracking task on the target object to be detected in the video image data based on the contour image.
9. A computer device comprising a memory, a processor and a computer program stored on the memory and executable on the processor, characterized in that the processor implements the image processing based object tracking method of any of claims 1 to 7 when executing the computer program.
10. A computer-readable storage medium, characterized in that the computer-readable storage medium stores a computer program that executes the image processing-based object tracking method according to any one of claims 1 to 7.
CN202410275450.5A 2024-03-12 2024-03-12 Target tracking method, device, computer equipment and medium based on image processing Active CN117876428B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202410275450.5A CN117876428B (en) 2024-03-12 2024-03-12 Target tracking method, device, computer equipment and medium based on image processing

Publications (2)

Publication Number Publication Date
CN117876428A CN117876428A (en) 2024-04-12
CN117876428B true CN117876428B (en) 2024-05-17

Family

ID=90595103


Citations (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN110298248A (en) * 2019-05-27 2019-10-01 重庆高开清芯科技产业发展有限公司 A kind of multi-object tracking method and system based on semantic segmentation
CN111415373A (en) * 2020-03-20 2020-07-14 北京以萨技术股份有限公司 Target tracking and segmenting method, system and medium based on twin convolutional network
CN113793359A (en) * 2021-08-25 2021-12-14 西安工业大学 Target tracking method fusing twin network and related filtering
CN115063724A (en) * 2022-06-22 2022-09-16 南京理工大学 Fruit tree ridge identification method and electronic equipment
CN117237386A (en) * 2022-06-07 2023-12-15 腾讯科技(深圳)有限公司 Method, device and computer equipment for carrying out structuring processing on target object

Family Cites Families (1)

Publication number Priority date Publication date Assignee Title
US6400831B2 (en) * 1998-04-02 2002-06-04 Microsoft Corporation Semantic video object segmentation and tracking


Non-Patent Citations (1)

Title
Research on image retrieval technology integrating semantic and contour features; Peng Taile; Jiang Jianguo; Shen Ke; Computer Simulation; 2008-05-15 (Issue 05); full text *



Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant