
Data processing method, device, electronic device and storage medium

Info

Publication number
CN116665177B
CN116665177B
Authority
CN
China
Prior art keywords
target object
data
processed
target
frame
Prior art date
Legal status
Active
Application number
CN202310950400.8A
Other languages
Chinese (zh)
Other versions
CN116665177A (en)
Inventor
郑杨韬
李帅君
朱子凌
Current Assignee
Foss Hangzhou Intelligent Technology Co Ltd
Original Assignee
Foss Hangzhou Intelligent Technology Co Ltd
Priority date
Filing date
Publication date
Application filed by Foss Hangzhou Intelligent Technology Co Ltd
Priority to CN202310950400.8A
Publication of CN116665177A
Application granted
Publication of CN116665177B

Classifications

    • G06V20/56 Context or environment of the image exterior to a vehicle by using sensors mounted on the vehicle
    • G06V10/40 Extraction of image or video features
    • G06V10/62 Extraction of image or video features relating to a temporal dimension, e.g. time-based feature extraction; pattern tracking
    • G06V10/762 Recognition or understanding using pattern recognition or machine learning using clustering, e.g. of similar faces in social networks
    • G06V10/763 Non-hierarchical techniques, e.g. based on statistics of modelling distributions
    • G06V10/806 Fusion, i.e. combining data from various sources at the sensor level, preprocessing level, feature extraction level or classification level, of extracted features
    • G06V10/82 Recognition or understanding using pattern recognition or machine learning using neural networks
    • G06V2201/07 Target detection
    • Y02D10/00 Energy efficient computing, e.g. low power processors, power management or thermal management

Abstract

The application relates to a data processing method, a device, an electronic device and a storage medium. The data processing method comprises the following steps: acquiring multiple consecutive frames of data to be processed; determining the pixel distribution features, depth features and attribute features of each target object in each frame of data to be processed; performing target association matching on the consecutive frames of data to be processed based on the pixel distribution features, depth features and attribute features of each target object in every two adjacent frames, and determining the continuous feature information of each target object; and labeling each frame of data to be processed based on the continuous feature information to obtain a labeling result for each target object. The method and the device address the low accuracy of data preprocessing for target detection and target tracking in autonomous driving scenes, realize pre-labeling of target detection and target tracking data, and improve the accuracy of data preprocessing.

Description

Data processing method, device, electronic device and storage medium
Technical Field
The present application relates to the field of data processing technologies, and in particular, to a data processing method, a data processing device, an electronic device, and a storage medium.
Background
Target detection and multi-target tracking have long been important research directions in both industry and academia in the autonomous driving field. In an automatic driving data closed-loop system, the data acquired during target detection and multi-target tracking are applied to the training of the perception model, cyclically improving the model's performance and perception boundary. However, the acquired data cannot be used directly for training; traditionally it is labeled manually, and the labeled data is then used to train the perception model.
Existing pre-labeling algorithms are often based on single-frame labeling, and their precision is limited by the performance of the perception model. In the labeling of continuous frames, current pre-labeling algorithms have the following defects:
1) Target detection is not accurate enough. The accuracy of existing target detection technology cannot reach 100%, which means that whatever model is used there will be some false detections or missed detections, and these directly affect the accuracy of target tracking built on the detection results.
2) Frequent target occlusion causes the tracked target ID to jump. When a tracked object is partially or completely occluded by other objects for a few intermediate frames, its ID before and after the occlusion can jump. This is because the prior art cannot reasonably predict the target while it is occluded; Kalman filtering is most commonly used for this prediction. However, this approach has a significant drawback: it relies on linear prediction using only the target box and its changes before the occlusion, which results in inaccurate predictions.
3) Mid-range matching is accurate, while short-range and long-range matching are not. In 2D labeling, all objects appear large when near and small when far, so in an autonomous driving scenario a nearby target, being large and moving quickly, has a detection box whose size changes rapidly and a background that changes substantially. The prior art uses the same scale to measure targets when matching them between frames, which works well for mid-range targets (10 m-50 m) but poorly for short-range targets (0.1 m-10 m). In 3D labeling, the point cloud distribution of a close-range target changes quickly because of its fast motion relative to the ego vehicle, producing large differences after feature extraction; for a long-range target the point cloud is too sparse, so the few points cannot effectively identify the target and the labeling precision is low.
These defects directly lower the accuracy of continuous-frame multi-target data labeling and target tracking, and consequently the accuracy of pre-labeling.
Disclosure of Invention
In this embodiment, a data processing method, device, electronic device and storage medium are provided to solve the problem of low accuracy of continuous frame multi-target data labeling and target tracking.
In a first aspect, in this embodiment, there is provided a data processing method, including:
acquiring data to be processed of continuous multiframes;
determining pixel distribution characteristics, depth characteristics and attribute characteristics of each target object in the data to be processed of each frame, wherein the attribute characteristics comprise at least one of the position and the size of the target object in the corresponding frame;
performing target association matching on the continuous multi-frame data to be processed based on pixel distribution characteristics, depth characteristics and attribute characteristics of each target object in the adjacent two frames of data to be processed, and determining continuous characteristic information of each target object, wherein the continuous characteristic information is determined based on whether the target object appears in the continuous multi-frame;
and labeling each piece of data to be processed based on the continuous characteristic information to obtain a labeling result of each target object.
In some embodiments, performing object-related matching on consecutive multi-frame data to be processed based on pixel distribution characteristics, depth characteristics and attribute characteristics of each target object in two adjacent frames of data to be processed, and determining continuous characteristic information of each target object includes:
Determining pixel distribution characteristic values of every two adjacent frames of target objects based on pixel distribution characteristics of all target objects in a previous frame and pixel distribution characteristics of all target objects in a next frame in the adjacent two frames of data to be processed;
determining depth feature values of every two adjacent target objects based on the depth features of all target objects in the previous frame and the depth features of all target objects in the next frame;
determining attribute feature values of every two adjacent target objects based on the attribute features of all target objects in the previous frame and the attribute features of all target objects in the next frame;
and determining continuous characteristic information of each target object based on the pixel distribution characteristic value, the depth characteristic value and the attribute characteristic value of every two adjacent target objects.
In some embodiments, determining continuous feature information for each target object based on pixel distribution feature values, depth feature values, and attribute feature values for adjacent two-frame-by-two target objects includes:
determining a distance measurement characteristic value of each two target objects based on pixel distribution characteristic values, depth characteristic values and attribute characteristic values of every two adjacent target objects;
Determining continuous characteristic information of each target object based on the distance measurement characteristic values of every two target objects; the continuous characteristic information comprises whether the target object is continuous in the two adjacent frames of data to be processed.
In some of these embodiments, determining continuous feature information for each target object based on the distance metric feature values for the target objects comprises:
acquiring all distance measurement characteristic values associated with a current target object in the distance measurement characteristic values of every two target objects, wherein the current target object is any target object in the data to be processed of the next frame;
determining a minimum distance measurement characteristic value according to all the distance measurement characteristic values;
if the minimum distance measurement characteristic value is smaller than or equal to a first preset threshold, determining that every two target objects corresponding to the minimum distance measurement characteristic value are current target objects, and enabling the current target objects to be continuous in the corresponding adjacent two frames of data to be processed;
and if the minimum distance measurement characteristic value is larger than a first preset threshold, determining that the current target object is discontinuous in the data to be processed of the corresponding adjacent two frames.
In some embodiments, labeling each data to be processed based on the continuous feature information to obtain a labeling result of each target object includes:
Determining whether the first target object and the second target object belong to the same target object according to the similarity of the first target object and the second target object, wherein the first target object and the second target object are any two target objects which do not appear in the same frame, the first target object is any target object which is continuous in the data to be processed in the first subsequence, the second target object is any target object which is continuous in the data to be processed in the second subsequence, and the first subsequence and the second subsequence are two subsequences which are discontinuous in continuous multiframes;
if the first target object and the second target object belong to the same target object, performing target interpolation on all data to be processed between the first subsequence and the second subsequence to obtain interpolated data to be processed, and marking the same target object with the same identification information until the identification information of all the same target objects in the continuous multi-frame data to be processed after interpolation is the same;
labeling each target object in the interpolated continuous multi-frame data to be processed to obtain a labeling result of each target object, wherein the labeling result of each target object comprises at least one of the category, the position and the gesture of each target object in the corresponding frame.
In some embodiments, before determining whether the first target object and the second target object belong to the same target object according to the similarity of the first target object and the second target object, the method further includes:
determining a first overall characteristic of the first target object based on pixel distribution characteristics, depth characteristics and attribute characteristics of the first target object in each piece of data to be processed in the first subsequence;
determining a second overall characteristic of the second target object based on the pixel distribution characteristic, the depth characteristic and the attribute characteristic of the second target object in each data to be processed in the second subsequence;
and determining the similarity of the first target object and the second target object based on the first integral feature and the second integral feature.
In some embodiments, the first sub-sequence to-be-processed data includes m-th to n-th frame to-be-processed data, and the second sub-sequence to-be-processed data includes p-th to q-th frame to-be-processed data; correspondingly, the performing target interpolation on all the data to be processed between the first sub-sequence and the second sub-sequence to obtain interpolated data to be processed includes:
according to a first target object in the nth frame of data to be processed and a second target object in the p frame of data to be processed, carrying out target interpolation on the (n+1) -th frame of data to be processed to obtain interpolation target objects in the (n+1) -th frame of data to be processed to the (p-1) -th frame of data to be processed;
Determining the maximum value of the contact ratio of an interpolation target object in the ith frame of data to be processed and each target object in the ith frame of data to be processed, wherein the ith frame of data to be processed is any frame of data to be processed from the (n+1) th frame of data to the (p-1) th frame of data to be processed;
if the maximum value of the overlap ratio is larger than a second preset threshold value, determining a final interpolation object in the data to be processed of the ith frame according to a target object corresponding to the maximum value of the overlap ratio and the interpolation target object of the ith frame;
and determining the data to be processed after the interpolation of the ith frame based on the final interpolation object.
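For example, the following Python fragment is a rough, non-limiting sketch of this per-frame check; the use of intersection-over-union as the measure of the degree of coincidence, the averaging of the two boxes to obtain the final interpolation object, and the function names are illustrative assumptions rather than requirements of this embodiment.

    def iou(box_a, box_b):
        """Intersection-over-union of two (x1, y1, x2, y2) boxes, used here as an
        assumed measure of the degree of coincidence."""
        x1, y1 = max(box_a[0], box_b[0]), max(box_a[1], box_b[1])
        x2, y2 = min(box_a[2], box_b[2]), min(box_a[3], box_b[3])
        inter = max(0.0, x2 - x1) * max(0.0, y2 - y1)
        area_a = (box_a[2] - box_a[0]) * (box_a[3] - box_a[1])
        area_b = (box_b[2] - box_b[0]) * (box_b[3] - box_b[1])
        return inter / (area_a + area_b - inter + 1e-9)

    def finalize_interpolated_box(interp_box, detected_boxes, second_threshold):
        """Illustrative per-frame check for the ith frame: when the interpolated box
        overlaps an existing detection strongly enough, the two are blended
        (averaging is an assumed choice) to obtain the final interpolation object;
        otherwise the interpolated box is kept as is."""
        if not detected_boxes:
            return interp_box
        best = max(detected_boxes, key=lambda b: iou(interp_box, b))
        if iou(interp_box, best) > second_threshold:
            return tuple((i + d) / 2.0 for i, d in zip(interp_box, best))
        return interp_box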
In some embodiments, after labeling each piece of data to be processed based on the continuous feature information, labeling results of each target object are obtained, the method further includes:
inputting the data comprising the labeling result of each target object into a pre-labeling perception model to obtain a model labeling result of each target object;
and adjusting model parameters of the pre-labeling sensing model according to the model labeling result and the corresponding labeling result of each target object to obtain an adjusted pre-labeling sensing model, wherein the adjusted pre-labeling sensing model is used for labeling the next data to be processed.
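For example, the following Python fragment outlines one possible fine-tuning loop in a PyTorch-style framework; the optimizer, learning rate, placeholder loss function and function name are illustrative assumptions and do not limit this embodiment.

    import torch

    def refine_prelabel_model(model, dataloader, epochs=1, lr=1e-4):
        """Illustrative fine-tuning loop: the pre-labeling perception model is
        adjusted so that its outputs move toward the labeling results obtained
        above. MSELoss is only a placeholder; a real detection loss would be
        used in practice."""
        optimizer = torch.optim.Adam(model.parameters(), lr=lr)
        loss_fn = torch.nn.MSELoss()
        model.train()
        for _ in range(epochs):
            for data, label in dataloader:
                optimizer.zero_grad()
                pred = model(data)
                loss = loss_fn(pred, label)
                loss.backward()
                optimizer.step()
        return model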
In a second aspect, in this embodiment, there is provided a data processing apparatus including:
the acquisition module is used for acquiring the data to be processed of the continuous multiframes;
the determining module is used for determining pixel distribution characteristics, depth characteristics and attribute characteristics of each target object in the data to be processed of each frame, wherein the attribute characteristics comprise at least one of the position and the size of the target object in the corresponding frame;
the target association module is used for carrying out target association matching on the continuous multi-frame data to be processed based on pixel distribution characteristics, depth characteristics and attribute characteristics of each target object in the two adjacent frames of data to be processed, and determining continuous characteristic information of each target object, wherein the continuous characteristic information is determined based on whether the target object appears in the continuous multi-frame;
and the labeling module is used for labeling each piece of data to be processed based on the continuous characteristic information to obtain a labeling result of each target object.
In a third aspect, in this embodiment, there is provided an electronic device including a memory, a processor, and a computer program stored on the memory and executable on the processor, the processor implementing the data processing method of the first aspect when executing the computer program.
In a fourth aspect, in the present embodiment, there is provided a storage medium having stored thereon a computer program which, when executed by a processor, implements the data processing method of the first aspect described above.
Compared with the related art, the data processing method provided in this embodiment acquires multiple consecutive frames of data to be processed and determines the pixel distribution feature, depth feature and attribute feature of each target object in each frame. Target association matching is then performed on the target objects across the consecutive frames according to the pixel distribution features, depth features and attribute features of the target objects in every two adjacent frames, so as to determine continuous feature information indicating whether each target object is continuous across the consecutive frames. Because the size of a target object changes only slightly between two adjacent frames, judging continuity from feature information of several dimensions improves the accuracy with which target objects are determined in continuous frames and avoids the low accuracy that results from detecting targets at a single, fixed scale. Finally, each target object is labeled according to its continuous feature information, so that the labeling results of the target objects in the continuous frames are determined in advance, which improves the accuracy of data preprocessing for target detection or target tracking.
In addition, the method and the device perform high-quality target recognition and feature extraction through a pre-labeling large model that is continuously and iteratively updated with data screened at the vehicle end. Through hierarchical matching and matching between target sequences, local information (adjacent frames) and global information (all frames of the sequence) are combined organically so that all targets are matched efficiently. The results of target detection and target interpolation are combined to determine the similarity between target boxes, unreasonable target boxes are eliminated, and an accurate and reasonable target tracking result is ensured. Furthermore, the target detection and tracking results obtained by combining image pixel statistical features and depth features are used to quickly pre-label incremental data, which greatly improves the labeling capacity of the data closed-loop system and provides strong support for subsequent iterative optimization of the algorithm.
The details of one or more embodiments of the application are set forth in the accompanying drawings and the description below to provide a more thorough understanding of the other features, objects, and advantages of the application.
Drawings
The accompanying drawings, which are included to provide a further understanding of the application and are incorporated in and constitute a part of this specification, illustrate embodiments of the application and together with the description serve to explain the application and do not constitute a limitation on the application. In the drawings:
Fig. 1 is a schematic diagram of an application scenario of a data processing method according to an embodiment of the present application;
FIG. 2 is a flow chart of a data processing method according to an embodiment of the present application;
FIG. 3 is a flowchart of an embodiment of a data processing method according to an embodiment of the present application;
FIG. 4 is a schematic diagram of a continuous result of continuous frame images according to an embodiment of the present application;
FIG. 5 is a block diagram of a data processing apparatus according to an embodiment of the present application;
fig. 6 is an internal structure diagram of a computer device according to an embodiment of the present application.
Detailed Description
The present application will be described and illustrated with reference to the accompanying drawings and examples for a clearer understanding of the objects, technical solutions and advantages of the present application.
Unless defined otherwise, technical or scientific terms used herein have the same meaning as commonly understood by one of ordinary skill in the art to which this application belongs. The terms "a," "an," "the," "these" and similar terms in this application are not intended to be limiting in number, but may be singular or plural. The terms "comprising," "including," "having," and any variations thereof, as used herein, are intended to encompass non-exclusive inclusion; for example, a process, method, and system, article, or apparatus that comprises a list of steps or modules (units) is not limited to the list of steps or modules (units), but may include other steps or modules (units) not listed or inherent to such process, method, article, or apparatus. The terms "connected," "coupled," and the like in this disclosure are not limited to physical or mechanical connections, but may include electrical connections, whether direct or indirect. The term "plurality" as used herein means two or more. "and/or" describes an association relationship of an association object, meaning that there may be three relationships, e.g., "a and/or B" may mean: a exists alone, A and B exist together, and B exists alone. Typically, the character "/" indicates that the associated object is an "or" relationship. The terms "first," "second," "third," and the like, as referred to in this disclosure, merely distinguish similar objects and do not represent a particular ordering for objects.
The data processing method provided by the embodiment of the present application can be applied to the application scenario shown in fig. 1, which is a schematic diagram of that scenario.
The terminal 102 communicates with the server 104 via a network. The data storage system may store data that the server 104 needs to process; it may be integrated on the server 104 or located on a cloud or other network server. The terminal 102 may be, but is not limited to, various personal computers, notebook computers, smart phones, tablet computers, Internet of Things devices, and portable wearable devices. The Internet of Things devices may be smart speakers, smart televisions, smart air conditioners, smart vehicle-mounted devices, and the like; the portable wearable devices may be smart watches, smart bracelets, headsets, and the like. Specifically, the terminal 102 is an intelligent vehicle-mounted device, such as a vehicle-mounted laser radar sensor, a vehicle-mounted camera, or an intelligent driving domain controller. The server 104 may be implemented as a stand-alone server or as a server cluster composed of multiple servers.
In the embodiment of the present application, a data processing method is provided, and fig. 2 is a flowchart of the data processing method provided in the embodiment of the present application, where an execution body of the method may be an electronic device, and optionally, the electronic device may be a server or a terminal device, but the present application is not limited thereto. Specifically, as shown in fig. 2, the process includes the following steps:
Step S201, obtain the data to be processed of consecutive multiframes.
In an autonomous driving scenario, traffic scene data is acquired in real time by a laser radar sensor or a camera on the autonomous vehicle, and the acquired data is transmitted to the electronic device, so that the electronic device obtains multiple consecutive frames of data to be processed. Specifically, the data to be processed may include image data to be processed acquired by the camera and point cloud data to be processed acquired by the laser radar sensor. Within the consecutive frames, each frame of data to be processed may include data on several traffic participants, such as vehicles on the road, lane lines, signs, pedestrians, trees, houses, and the like.
It should be noted that, in the embodiment of the present application, only the image data to be processed of consecutive frames in the autopilot scene is taken as an example for illustration, and in practical application, the data processing method of the present application may also be applied to video monitoring scenes, and may also be applied to other technical fields related to data processing, where no limitation is made.
Step S202, determining pixel distribution characteristics, depth characteristics and attribute characteristics of each target object in each frame of data to be processed.
For example, each frame of data to be processed may include a plurality of traffic participants, each of which may serve as a target object, and further, determine a pixel distribution feature, a depth feature, and an attribute feature of each target object in each frame of data to be processed, where the attribute feature may include a position and/or a size of the target object in a corresponding frame.
Specifically, the pixel distribution feature of a target object may be represented by histogram feature information of the target object's pixels in the three RGB channels; the depth feature may be represented by deep feature information extracted by a deep learning algorithm from an image containing the target object; the position of the target object may be represented by the center of the minimum detection box containing it; and its size may be represented by the area of that minimum detection box.
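For example, the following Python fragment is a rough, non-limiting sketch of how these per-target features might be computed from an image frame and a minimum detection box; the 32-bin histogram and the function and field names are illustrative assumptions, and the depth feature is left to whatever deep learning backbone is used.

    import numpy as np

    def extract_target_features(image, box):
        """Illustrative sketch of per-target feature extraction.

        image: H x W x 3 uint8 RGB frame
        box:   (x1, y1, x2, y2) integer corners of the minimum detection box
        The depth feature would come from a deep learning backbone and is
        therefore not computed here."""
        x1, y1, x2, y2 = box
        crop = image[y1:y2, x1:x2]

        # Pixel distribution feature: normalized histogram of each RGB channel
        # (32 bins per channel is an assumed, not prescribed, choice).
        hist = np.concatenate([
            np.histogram(crop[..., c], bins=32, range=(0, 256))[0]
            for c in range(3)
        ]).astype(np.float64)
        hist /= hist.sum() + 1e-9

        # Attribute features: center position and area of the minimum detection box.
        center = np.array([(x1 + x2) / 2.0, (y1 + y2) / 2.0])
        area = float((x2 - x1) * (y2 - y1))

        return {"pixel_hist": hist, "center": center, "area": area}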
Step S203, performing object association matching on the continuous multi-frame data to be processed based on the pixel distribution feature, the depth feature and the attribute feature of each object in the adjacent two frames of data to be processed, and determining the continuous feature information of each object.
Wherein the continuous characteristic information is determined based on whether the target object appears in consecutive multiframes.
Further, in the continuous multi-frame data to be processed, whether any target object is continuous in the continuous two-frame data to be processed is determined according to the pixel distribution characteristics, the depth characteristics and the attribute characteristics of each target object in the adjacent two-frame data to be processed, so that the relevance of a plurality of target objects in the adjacent two frames can be accurately determined.
In particular, the continuous characteristic information may include that the target object appears continuously in a part of the frames in the continuous multiframe, that the target object appears in any one of the continuous multiframe or in several discontinuous frames, or that the target object appears continuously in all the frames in the continuous multiframe.
And step S204, labeling processing is carried out on each piece of data to be processed based on the continuous characteristic information, and a labeling result of each target object is obtained.
Further, all target objects in each piece of data to be processed are marked according to the continuous characteristic information, so that a marking result of each target object is obtained, the target objects in the continuous multi-frame data to be processed are pre-marked, and the accuracy of determining the target objects is improved.
Specifically, the labeling result of the target object may include at least one of a category, a position, and a posture of the target object in the corresponding frame. The same target object can be marked with the same identification information in the corresponding continuous frames, and the identification information of different target objects is different.
In the above implementation, the pixel distribution feature, depth feature and attribute feature of each target object in the consecutive frames of data to be processed are obtained, and target association matching is performed on the target objects across the consecutive frames according to these features in every two adjacent frames, so as to determine the continuous feature information indicating whether each target object is continuous across the consecutive frames. Because the size of a target object changes only slightly between two adjacent frames, judging its continuity from feature information of several dimensions improves the accuracy of determining target objects in continuous frames and avoids the low accuracy that results from detecting targets at a single, fixed scale. Each target object is then labeled according to its continuous feature information, so that the labeling results of the target objects in the continuous frames are determined in advance, which improves the accuracy of data preprocessing for target detection or target tracking.
In some embodiments, performing object association matching on continuous multi-frame to-be-processed data based on pixel distribution characteristics, depth characteristics and attribute characteristics of each target object in two adjacent frames of to-be-processed data, and determining continuous characteristic information of each target object may include the following steps:
step 1: and determining pixel distribution characteristic values of every two adjacent frames of target objects based on the pixel distribution characteristics of all target objects in the previous frame and the pixel distribution characteristics of all target objects in the next frame in the data to be processed of every two adjacent frames.
Step 2: and determining depth characteristic values of every two adjacent target objects based on the depth characteristics of all target objects in the previous frame and the depth characteristics of all target objects in the next frame.
Step 3: and determining attribute characteristic values of every two adjacent target objects based on the attribute characteristics of all target objects in the previous frame and the attribute characteristics of all target objects in the next frame.
Step 4: and determining continuous characteristic information of each target object based on the pixel distribution characteristic value, the depth characteristic value and the attribute characteristic value of every two adjacent target objects.
Illustratively, taking two target objects in any two adjacent frames of image data to be processed in continuous multi-frame image data to be processed as an example. If the target object A is any target object in the image data to be processed of the previous frame, the target object B is any target object in the image data to be processed of the next frame.
According to the pixel distribution feature of target object A and the pixel distribution feature of target object B, the pixel distribution feature value of target object A and target object B is determined; it may be computed from the similarity of their multi-channel histogram features. Specifically, the similarity may be determined from the Euclidean distance between target object A and target object B, from their cosine similarity, from their Manhattan distance, or in other ways, which is not limited here.
Further, the depth feature value of target object A and target object B is determined from the similarity between the depth features of target object A and the depth features of target object B.
Further, the attribute features include the position and the size of the target object. A position feature value is determined from the position of target object A and the position of target object B, and a size feature value is determined from the difference between their sizes; the average of the position feature value and the size feature value is then taken as the attribute feature value of target object A and target object B.
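For example, the following Python fragment is a non-limiting sketch of how the three per-pair feature values might be computed from the per-target features of target object A (previous frame) and target object B (next frame); the choice of Euclidean distance for the histograms, cosine distance for the depth features, and the assumed depth_feat field are illustrative only.

    import numpy as np

    def pairwise_feature_values(feat_a, feat_b):
        """Illustrative sketch of the per-pair feature values for target object A
        (previous frame) and target object B (next frame). Distances are used,
        so smaller values mean greater similarity; the metrics are assumptions."""
        # Pixel distribution feature value: Euclidean distance between the histograms.
        pixel_val = float(np.linalg.norm(feat_a["pixel_hist"] - feat_b["pixel_hist"]))

        # Depth feature value: cosine distance between deep feature vectors
        # (the "depth_feat" field is assumed to be filled by the backbone).
        da, db = feat_a["depth_feat"], feat_b["depth_feat"]
        depth_val = 1.0 - float(np.dot(da, db) /
                                (np.linalg.norm(da) * np.linalg.norm(db) + 1e-9))

        # Attribute feature value: average of the center-position distance and
        # the detection-box size difference.
        pos_val = float(np.linalg.norm(feat_a["center"] - feat_b["center"]))
        size_val = abs(feat_a["area"] - feat_b["area"])
        attr_val = (pos_val + size_val) / 2.0

        return pixel_val, depth_val, attr_val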
Further, according to the pixel distribution feature value, the depth feature value and the attribute feature value of target object A and target object B, it is determined whether they are the same target object appearing continuously in the two adjacent frames. If they are, that is, if the target appears continuously in both frames, target object A and target object B are labeled with the same identification information.
In the same manner it is determined, for every pair of target objects in the two adjacent frames, whether the pair is the same target object appearing continuously in both frames; if one of a pair does not appear continuously in both frames, that is, the two are different target objects, the different target objects are labeled with different identification information.
Further, after determining whether all target objects in the image data to be processed of the next frame appear continuously in the two adjacent frames, the continuous information of all target objects across the consecutive frames is determined in the above manner. Specifically, the image data to be processed of the next frame is used as the image data to be processed of the previous frame of the next pair of adjacent frames to determine the continuous information of the target objects.
In the implementation process, the pixel distribution characteristic values of the two target objects are determined according to the pixel distribution characteristics of the two adjacent target objects, the depth characteristic values of the two target objects are determined according to the depth characteristics of the two adjacent target objects, the attribute characteristic values of the two target objects are determined according to the attribute characteristics of the two adjacent target objects, and the continuity of the two target objects is determined according to the pixel distribution characteristic values, the depth characteristic values and the attribute characteristics of the two target objects, so that the continuous characteristic information of each target object is determined, and the continuity of the target object is determined according to the characteristic information of multiple dimensions of the two target objects, so that the accuracy of determining the continuity of the target object is improved.
In some embodiments, determining continuous feature information of each target object based on the pixel distribution feature value, the depth feature value, and the attribute feature value of every two adjacent target objects may include the steps of:
step 1: and determining the distance measurement characteristic value of each two adjacent target objects based on the pixel distribution characteristic value, the depth characteristic value and the attribute characteristic value of each two adjacent target objects.
Step 2: determining continuous characteristic information of each target object based on the distance measurement characteristic values of every two target objects; the continuous characteristic information comprises whether the target object is continuous in the two adjacent frames of data to be processed.
Illustratively, determining the distance metric feature value of the pairwise target object based on the pixel distribution feature value, the depth feature value and the attribute feature value of the pairwise target object of adjacent two frames includes: and determining the distance measurement characteristic value of each two adjacent target objects based on the pixel distribution characteristic value, the depth characteristic value and the weighted average value of the attribute characteristic values of each two adjacent target objects.
Based on the pixel distribution feature value, the depth feature value and the attribute feature value of every two adjacent frame target objects, determining the distance measurement feature value of every two target objects may also include: and determining the distance measurement characteristic value of each two adjacent target objects based on the average value of the pixel distribution characteristic value, the depth characteristic value and the attribute characteristic value of each two adjacent target objects.
Taking the above target object a and the target object B as an example, the distance measurement feature value of the target object a and the target object B is determined according to the pixel distribution feature value, the depth feature value and the attribute feature value of the target object a and the target object B, and specifically, the distance measurement feature value may be a weighted average value of the pixel distribution feature value, the depth feature value and the attribute feature value of the target object a and the target object B, may be an average value of the pixel distribution feature value, the depth feature value and the attribute feature value of the target object a and the target object B, or may be determined by means of other feature fusion, which is not limited herein.
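For example, a minimal sketch of such a weighted-average fusion is given below; the weights are illustrative assumptions, not values prescribed by this embodiment.

    def distance_metric(pixel_val, depth_val, attr_val, weights=(0.4, 0.4, 0.2)):
        """Illustrative sketch: fuse the three per-pair feature values into a single
        distance metric feature value by a weighted average; the weights shown
        here are assumptions, not values prescribed by this embodiment."""
        w_pixel, w_depth, w_attr = weights
        total = w_pixel + w_depth + w_attr
        return (w_pixel * pixel_val + w_depth * depth_val + w_attr * attr_val) / total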
Further, determining continuous feature information of each target object based on the distance metric feature values of the target objects comprises: and determining continuous characteristic information of each target object based on the distance measurement characteristic values of every two target objects and a first preset threshold value.
Specifically, according to the distance measurement characteristic value of the target object a and the target object B and the first preset threshold, it is determined whether the target object a and the target object B are the same target object continuously appearing in the two adjacent frames, and if the distance measurement characteristic value of the target object a and the target object B is smaller than or equal to the first preset threshold, the target object a and the target object B are the same target object continuously appearing in the two adjacent frames, that is, the target continuously appears in the two adjacent frames.
In the implementation process, according to the pixel distribution characteristic values, the depth characteristic values and the attribute characteristic values of every two adjacent frames of the object, the distance measurement characteristic values of every two object are determined from the characteristic information of multiple dimensions of the object, the accuracy of determining the object is improved, whether the object is continuous in every two adjacent frames is determined according to the distance measurement characteristic values, and the continuity judgment of the object is realized.
In some embodiments, determining continuous feature information of each target object based on the distance metric feature values of the target objects may include the steps of:
step 1: and acquiring all distance measurement characteristic values associated with the current target object in the distance measurement characteristic values of the target objects, wherein the current target object is any target object in the data to be processed of the next frame.
Step 2: and determining the minimum distance measurement characteristic value according to all the distance measurement characteristic values.
Step 3: if the minimum distance measurement characteristic value is smaller than or equal to a first preset threshold, determining that every two target objects corresponding to the minimum distance measurement characteristic value are current target objects, and enabling the current target objects to be continuous in the corresponding adjacent two frames of data to be processed. And if the minimum distance measurement characteristic value is larger than a first preset threshold, determining that the current target object is discontinuous in the data to be processed of the corresponding adjacent two frames.
In the two adjacent frames of image data to be processed, any target object in the next frame is taken as the current target object, for example target object B. The distance metric feature values between target object B and all target objects in the previous frame are determined, and the minimum of these values is taken as the minimum distance metric feature value.
Further, the minimum distance metric feature value is compared with the first preset threshold. If it is less than or equal to the first preset threshold, the target object in the previous frame corresponding to the minimum distance metric feature value and target object B are the same target object, that is, target object B appears continuously in the two adjacent frames; in addition, that target object in the previous frame and target object B in the next frame can be labeled with the same identification information.
If the minimum distance metric feature value is greater than the first preset threshold, target object B is discontinuous in the two adjacent frames of image data to be processed; that is, it does not appear in the previous frame but only in the next frame. In that case target object B may be labeled with new identification information, different from the identification information of every target object in the previous frame.
Further, it can be determined whether all the target objects in the image data to be processed of the subsequent frame appear in the previous frame, so as to obtain continuous characteristic information of each target object in the image data to be processed of the subsequent frame.
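For example, the following Python fragment sketches the matching of one current target object against all target objects of the previous frame using the minimum distance metric feature value and the first preset threshold; the function name and the dictionary-based representation are illustrative assumptions.

    def match_current_target(distances_to_prev, first_threshold):
        """Illustrative sketch of matching one target object of the next frame
        against all target objects of the previous frame.

        distances_to_prev: dict mapping previous-frame target IDs to the distance
                           metric feature value against the current target object
        first_threshold:   the first preset threshold
        Returns the matched previous-frame ID, or None when the current target
        object is treated as discontinuous (newly appearing)."""
        if not distances_to_prev:
            return None
        prev_id = min(distances_to_prev, key=distances_to_prev.get)
        if distances_to_prev[prev_id] <= first_threshold:
            return prev_id   # continuous: reuse the previous identification information
        return None          # discontinuous: assign new identification information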
In the implementation process, the continuity of the current target object in the corresponding two continuous adjacent frames is determined according to the minimum distance measurement characteristic value and the first preset threshold value, so that the accuracy of determining the continuity of the current target object is improved.
In some embodiments, labeling each piece of data to be processed based on the continuous feature information to obtain a labeling result of each target object may include the following steps:
step 1: according to the similarity of the first target object and the second target object, determining whether the first target object and the second target object belong to the same target object, wherein the first target object and the second target object are any two target objects which do not appear in the same frame, the first target object is any target object continuous in the data to be processed in the first subsequence, the second target object is any target object continuous in the data to be processed in the second subsequence, and the first subsequence and the second subsequence are two subsequences discontinuous in continuous multiframes.
Step 2: if the first target object and the second target object belong to the same target object, performing target interpolation on all the data to be processed between the first subsequence and the second subsequence to obtain interpolated data to be processed, and marking the same target object with the same identification information until the identification information of all the same target objects in the continuous multi-frame data to be processed after interpolation is the same.
Step 3: labeling each target object in the interpolated continuous multi-frame data to be processed to obtain a labeling result of each target object, wherein the labeling result of each target object comprises at least one of the category, the position and the gesture of each target object in the corresponding frame.
For example, during the continuous tracking of targets there may be some frames in which a particular target object, or several target objects, do not appear, so a target object may jump within the consecutive frames. To prevent such jumps from affecting the accuracy of target detection or target tracking, the jumping target object can be interpolated so that it becomes continuous.
Since the continuity of the target object is determined by the feature information of the target objects of two adjacent frames, if the target object generates a jump in the continuous frames, the same target object is identified as a different target object before and after the jump, and therefore, whether the target object before and after the jump is the same target object needs to be determined.
The consecutive frames of images to be processed can be divided into several sub-sequences of images according to the continuity characteristics of target objects that do not appear in the same frame, and these sub-sequences are not continuous with one another.
Specifically, suppose the consecutive frames comprise frames 1 to 20 and there are three target objects X, Y and Z that appear continuously in different runs of frames, with missing frames between those runs: target object X appears continuously in frames 1 to 5, target object Y in frames 8 to 10, and target object Z in frames 15 to 20. Frames 1 to 20 can then be divided into three sub-sequences: sub-sequence 1 contains frames 1 to 5, sub-sequence 2 contains frames 8 to 10, and sub-sequence 3 contains frames 15 to 20. Because there are missing frame numbers between them, sub-sequence 1, sub-sequence 2 and sub-sequence 3 are discontinuous with one another; sub-sequence 1 is adjacent to sub-sequence 2, and sub-sequence 2 is adjacent to sub-sequence 3.
If the first target object is any target object continuous in the image data to be processed in the first sub-sequence, the second target object is any target object continuous in the image data to be processed in the second sub-sequence, and the first target object and the second target object do not appear in the same frame, the first sub-sequence and the second sub-sequence are two sub-sequences which are discontinuous and adjacent, whether the first target object and the second target object are the same target object before and after the jump is generated can be judged. That is, it is determined whether the target object X in the sub-sequence 1 and the target object Y in the sub-sequence 2 are the same target object before and after the occurrence of the jump.
Whether the first target object and the second target object belong to the same target object is determined according to their similarity. If they belong to the same target object before and after the jump, target interpolation is performed on all image data to be processed between the first sub-sequence and the second sub-sequence to obtain interpolated image data to be processed. In this way the target object is supplemented, by means of target interpolation, in every frame between the two sub-sequences so that it appears continuously, and the jump of the target object no longer affects the accuracy of target detection and target tracking.
If target object X and target object Y belong to the same target object, target interpolation is performed on the image data to be processed of frames 6 to 7, so that target object X is supplemented in those frames and appears continuously in the image data to be processed of frames 1 to 10.
Further, the same target object is labeled with the same identification information across the consecutive frames, and the next jump state is then identified, until the identification information of all identical target objects in the interpolated consecutive frames of image data to be processed is the same. That is, target object X is labeled with the same identification information in the image data to be processed of frames 1 to 10.
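For example, the following Python fragment sketches how the missing frames between two sub-sequences might be filled once their targets are judged to be the same object; linear interpolation of the detection-box corners and the function name are illustrative assumptions.

    def interpolate_track(box_end_first, box_start_second, n_missing):
        """Illustrative sketch: linearly interpolate a target object's detection box
        over the missing frames between two sub-sequences that were judged to be the
        same target (e.g. frames 6 to 7 between a track ending at frame 5 and one
        resuming at frame 8)."""
        boxes = []
        for k in range(1, n_missing + 1):
            t = k / (n_missing + 1)
            boxes.append(tuple((1 - t) * a + t * b
                               for a, b in zip(box_end_first, box_start_second)))
        return boxes

    # Example: two missing frames between frame 5 and frame 8
    # interpolate_track((100, 50, 180, 120), (130, 60, 210, 130), 2)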
The above description takes a single jump state as an example. In practice the same target object may jump several times, and if multiple jumps exist it is necessary to determine, for each jump, whether the target objects before and after it are the same target object.
In frames 1 to 20 above there may be two jumps, namely frames 6 to 7 and frames 11 to 14. If target object X and target object Y do not belong to the same target object, whether target object Y and target object Z belong to the same target object is determined in the manner described above. If target object X and target object Y do belong to the same target object, whether the target object in the interpolated image data of frames 1 to 10 and target object Z in the image data of frames 15 to 20 belong to the same target object is determined in the same manner, and so on until the identification information of all identical target objects in the interpolated consecutive frames of image data to be processed is the same, that is, until every possible jump has been identified.
In the implementation process, whether the target objects in the data to be processed in different subsequences are the same target object is determined according to the similarity of the target objects in the data to be processed in two discontinuous subsequences, and target interpolation is carried out on the data to be processed, which generates jump between the same target objects, so that the accuracy of target detection and target tracking is prevented from being influenced due to the jump of the target objects.
In some embodiments, before determining whether the first target object and the second target object belong to the same target object according to the similarity of the first target object and the second target object, the method may further include the following steps:
Step 1: a first overall feature of the first target object is determined based on the pixel distribution feature, the depth feature, and the attribute feature of the first target object in each piece of data to be processed in the first sub-sequence.
Step 2: a second overall feature of the second target object is determined based on the pixel distribution feature, the depth feature, and the attribute feature of the second target object in each piece of data to be processed in the second sub-sequence.
Step 3: the similarity of the first target object and the second target object is determined based on the first overall feature and the second overall feature.
Illustratively, the attribute features include the size of the target object, and the overall feature of a target object may characterize the cluster center of that target object in the corresponding continuous multiple frames.
Specifically, a pixel distribution distance metric value of the first target object is determined according to the pixel distribution features of the first target object in each piece of image data to be processed in the first sub-sequence; a depth feature distance metric value of the first target object is determined according to the depth features of the first target object in each piece of image data to be processed in the first sub-sequence; an attribute feature distance metric value of the first target object is determined according to the attribute features of the first target object in each piece of image data to be processed in the first sub-sequence; further, the weighted average of the pixel distribution distance metric value, the depth feature distance metric value, and the attribute feature distance metric value of the first target object is determined as the first overall feature of the first target object.
Likewise, a pixel distribution distance metric value of the second target object is determined according to the pixel distribution features of the second target object in each piece of image data to be processed in the second sub-sequence; a depth feature distance metric value of the second target object is determined according to its depth features; an attribute feature distance metric value of the second target object is determined according to its attribute features; further, the weighted average of the pixel distribution distance metric value, the depth feature distance metric value, and the attribute feature distance metric value of the second target object is determined as the second overall feature of the second target object.
Further, the similarity of the first target object and the second target object is determined according to the overall feature difference between the first overall feature and the second overall feature. If the overall feature difference is smaller than or equal to a first preset difference, the first target object and the second target object belong to the same target object; if the overall feature difference is larger than the first preset difference, they do not belong to the same target object.
In the above implementation process, the overall feature of a target object is determined according to its pixel distribution features, depth features, and attribute features, so that features of multiple dimensions are fused into an overall description of the target object; the similarity of two target objects is then determined according to the overall features of the first target object and the second target object, which improves the accuracy of determining the similarity of the target objects.
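As a non-limiting sketch, the overall-feature comparison described above can be expressed as follows; the three per-dimension distance metric values and the weight coefficients are assumed to be computed elsewhere from the sub-sequence, and the threshold value is illustrative only.

```python
def overall_feature(pixel_dist, depth_dist, attr_dist, weights=(0.4, 0.3, 0.3)):
    """Weighted average of the pixel-distribution, depth and attribute distance metrics."""
    w1, w2, w3 = weights
    return w1 * pixel_dist + w2 * depth_dist + w3 * attr_dist

def same_target(overall_a, overall_b, first_preset_difference=0.15):
    """Same target object if the overall-feature difference is within the preset difference."""
    return abs(overall_a - overall_b) <= first_preset_difference

feature_x = overall_feature(0.21, 0.34, 0.12)  # first target object (sub-sequence 1)
feature_y = overall_feature(0.25, 0.31, 0.14)  # second target object (sub-sequence 2)
print(same_target(feature_x, feature_y))       # True -> interpolate the gap frames
```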
In some embodiments, the first sub-sequence of data to be processed includes m-th to n-th frames of data to be processed, and the second sub-sequence of data to be processed includes p-th to q-th frames of data to be processed; correspondingly, performing target interpolation on all the data to be processed between the first subsequence and the second subsequence to obtain interpolated data to be processed, which may include the following steps:
step 1: and carrying out target interpolation on the data to be processed of the n+1st frame to the p-1st frame according to the first target object in the data to be processed of the n frame and the second target object in the data to be processed of the p frame, so as to obtain interpolation target objects in the data to be processed of the n+1st frame to the p-1st frame.
Step 2: determining the maximum value of the contact ratio of an interpolation target object in the ith frame of data to be processed and each target object in the ith frame of data to be processed, wherein the ith frame of data to be processed is any frame of data to be processed from the (n+1) th frame of data to the (p-1) th frame of data to be processed.
Step 3: if the maximum value of the overlap ratio is larger than a second preset threshold value, determining a final interpolation object in the data to be processed of the ith frame according to the target object corresponding to the maximum value of the overlap ratio and the interpolation target object of the ith frame.
Step 4: and determining the data to be processed after the interpolation of the ith frame based on the final interpolation object.
For example, if the m-th to n-th frames of image data to be processed are the 1st to 5th frames, the first target object is the target object X, the p-th to q-th frames of image data to be processed are the 8th to 10th frames, the second target object is the target object Y, and the target object X and the target object Y belong to the same target object, then the target object X in the 5th frame of image data to be processed is taken as the interpolation starting point, the target object Y in the 8th frame of image data to be processed is taken as the interpolation end point, and target interpolation is performed on the 6th to 7th frames of image data to be processed by linear interpolation, so as to obtain an interpolation target object Q1 in the 6th frame of image data to be processed and an interpolation target object Q2 in the 7th frame of image data to be processed.
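A minimal sketch of this linear interpolation step is given below, assuming the target objects are represented by simple [x, y, w, h] detection boxes; the numeric values are illustrative only.

```python
import numpy as np

def interpolate_boxes(box_start, box_end, frame_start, frame_end):
    """Linearly interpolate boxes for the frames strictly between frame_start and frame_end."""
    boxes = {}
    for f in range(frame_start + 1, frame_end):
        t = (f - frame_start) / (frame_end - frame_start)  # interpolation factor in (0, 1)
        boxes[f] = (1 - t) * np.asarray(box_start, dtype=float) + t * np.asarray(box_end, dtype=float)
    return boxes

# Target object X in frame 5 as the starting point, target object Y in frame 8 as the end point.
q = interpolate_boxes([100, 50, 40, 80], [130, 56, 40, 82], frame_start=5, frame_end=8)
print(q[6], q[7])  # interpolation target objects Q1 and Q2 for frames 6 and 7
```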
If the i-th frame of image data to be processed is the 6th frame, the degree of overlap between the target object Q1 and each original target object in the 6th frame of image data to be processed is determined, where an original target object is a target object that was present in the 6th frame of image data to be processed before target interpolation. Specifically, the degree of overlap may be represented by the Intersection-over-Union (IoU) of the target object Q1 and each original target object in the 6th frame of image data to be processed.
Further, the maximum value of the degree of overlap is determined and compared with a second preset threshold. If the maximum value of the degree of overlap is larger than the second preset threshold, a final interpolation object in the 6th frame of image data to be processed is determined from the original target object corresponding to the maximum value of the degree of overlap and the target object Q1; specifically, whichever of the two has the higher confidence is determined as the final interpolation object in the 6th frame of image data to be processed.
The confidence of an interpolation target object may be determined as a weighted average of the confidence of the first target object in the n-th frame of image data to be processed and the confidence of the second target object in the p-th frame of image data to be processed, where the weight coefficients used for the (n+1)-th frame of image data to be processed and the (p-1)-th frame of image data to be processed are different.
Further, the final interpolation object is retained in the i-th frame of image data to be processed, and the one with the smaller confidence among the original target object corresponding to the maximum value of the degree of overlap and the target object Q1 is deleted, so as to obtain the interpolated i-th frame of image data to be processed.
In the above implementation process, target interpolation is performed on the data to be processed in which a jump occurs, based on the two frames before and after the gap, so that the continuity of the target object is ensured; the final interpolation object is determined according to the maximum value of the degree of overlap between the interpolation target object and the original target objects, so that duplicate interpolation caused by errors in the target object identification process is avoided, which further improves the accuracy of target detection and of data processing during target tracking.
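The screening of an interpolation target object against the original target objects of the same frame can be sketched as follows; the box format, the confidence handling, and the value of the second preset threshold are assumptions made for illustration.

```python
def iou(a, b):
    """Intersection-over-Union of two [x1, y1, x2, y2] boxes."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    return inter / (area_a + area_b - inter + 1e-9)

def screen_interpolated(interp_box, interp_conf, originals, second_preset_threshold=0.5):
    """originals: list of (box, confidence) pairs already present in frame i."""
    if not originals:
        return interp_box, interp_conf
    best_box, best_conf = max(originals, key=lambda ob: iou(interp_box, ob[0]))
    if iou(interp_box, best_box) > second_preset_threshold:
        # Keep only the higher-confidence detection to avoid duplicate targets.
        return (best_box, best_conf) if best_conf >= interp_conf else (interp_box, interp_conf)
    return interp_box, interp_conf

# Overlapping original detection with higher confidence wins over the interpolated box.
print(screen_interpolated([10, 10, 50, 90], 0.6, [([12, 11, 52, 88], 0.8)]))
```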
In some embodiments, after each piece of data to be processed is labeled based on the continuous feature information and the labeling result of each target object is obtained, the method may further include the following steps:
step 1: inputting the data comprising the labeling result of each target object into a pre-labeling perception model to obtain the model labeling result of each target object.
Step 2: and according to the model labeling result of each target object and the corresponding labeling result, adjusting model parameters of the pre-labeling sensing model to obtain an adjusted pre-labeling sensing model, wherein the adjusted pre-labeling sensing model is used for labeling the next data to be processed.
Illustratively, the labeling result of each target object may be used to train the pre-labeled perceptual model, thereby improving the accuracy of the pre-labeled perceptual model.
Specifically, the image data including the labeling result of each target object is input into the pre-labeling perception model to obtain a model labeling result of each target object; a model labeling error is determined from the model labeling result of each target object and the corresponding labeling result; further, the model parameters of the pre-labeling perception model are adjusted according to the model labeling error to obtain an adjusted pre-labeling perception model. In this way, the pre-labeled image data is used for training the pre-labeling perception model, which improves its labeling accuracy. Applying the labeling results of the data to the training of the pre-labeling perception model enables training and iteration of the model, ensures its continuous evolution, and meets the labeling requirements.
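As an illustration only, one possible way to perform such a parameter adjustment is a standard supervised update; the framework (PyTorch), the model architecture, and the loss are assumptions, since this embodiment does not prescribe a particular implementation.

```python
import torch
from torch import nn

# Stand-in pre-labeling perception model that regresses a box from target features.
model = nn.Sequential(nn.Linear(256, 128), nn.ReLU(), nn.Linear(128, 4))
optimizer = torch.optim.Adam(model.parameters(), lr=1e-4)
criterion = nn.SmoothL1Loss()

features = torch.randn(32, 256)   # features of labelled target objects (placeholder)
labels = torch.randn(32, 4)       # labeling results used as ground truth (placeholder)

model_labels = model(features)            # model labeling result of each target object
loss = criterion(model_labels, labels)    # model labeling error
optimizer.zero_grad()
loss.backward()
optimizer.step()                          # parameters adjusted -> adjusted pre-labeling model
```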
Fig. 3 is a flowchart of an embodiment of a data processing method according to an embodiment of the present application, where the data processing method may be applied to a tightly coupled data closed loop system, and the tightly coupled data closed loop system includes: the device comprises a target detection module, a characteristic extraction module, a matching association module, an interpolation module and a processing module. The target detection module is used for detecting targets in the data, the feature extraction module is used for extracting the histogram feature, the depth feature, the size and the position of each target, the matching association module is used for determining whether the targets appear in continuous multiframes, the interpolation module is used for carrying out target interpolation on the data, and the processing module is used for carrying out labeling processing on the targets in the data.
As shown in fig. 3, the process includes the steps of:
step S301, acquiring continuous frame images.
Specifically, in an automatic driving scene, continuous frame images are acquired through an image sensor, where the continuous frame images include S frames in total, S may be 300, the acquisition frequency may be f Hz, f may be 25, and the total acquisition duration is S/f seconds. Each frame image is numbered according to the acquisition time sequence, so that the frame number of each frame image is obtained, that is, the frame numbers are 1-300.
Step S302, performing target detection on continuous frame images to obtain a detection result of each image.
Specifically, the targets in each image are determined through the target detection module: the continuous frame images are input into the target detection module to obtain the target detection result of each image, and the target detection results with a confidence greater than a threshold C are retained, where a target detection result includes the category, pose, position, and size of a target; the targets in each frame image are then marked according to the frame number of the image.
The detection results of all targets in a frame are denoted d, the total set of identification information of all targets is denoted I, any frame among the continuous frames is denoted i, and the identification information of any target in frame i is denoted id-n.
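A minimal sketch of this detection-filtering and numbering step, under an assumed detection dictionary format, might look as follows; the identifier string mirrors the id-n notation above.

```python
def filter_and_tag(frame_index, detections, conf_threshold):
    """Keep detections with confidence above the threshold C and tag them as id-n."""
    kept = [d for d in detections if d["confidence"] > conf_threshold]
    for n, det in enumerate(kept, start=1):
        det["id"] = f"{frame_index}d-{n}"  # e.g. '1d-1' for the first target of frame 1
    return kept

frame_dets = [
    {"category": "car", "pose": 0.0, "position": (12, 3), "size": (4.2, 1.8), "confidence": 0.91},
    {"category": "car", "pose": 0.1, "position": (40, 7), "size": (4.0, 1.7), "confidence": 0.42},
]
print(filter_and_tag(1, frame_dets, conf_threshold=0.5))  # only the first detection is retained
```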
Step S303, determining the continuity of the target according to the detection result of each image.
The local picture block of each target object in each frame image is input into the feature extraction module to obtain the three-channel histogram feature, depth feature, position center coordinate, and size of each target; further, the histogram feature value, depth feature value, and position-center coordinate distance and size feature value between each target in the (i-1)-th frame and the target id-n are determined.
Further, these feature values between all targets in the (i-1)-th frame and the target id-n are input into the matching association module for target matching.
Specifically, the histogram feature value, depth feature value, and position-center coordinate distance and size feature value between each target in the (i-1)-th frame and the target id-n are weighted and averaged to obtain the distance metric value between each target in the (i-1)-th frame and the target id-n, and the minimum distance metric value among all the distance metric values is determined. If the minimum distance metric value is smaller than or equal to a first preset threshold, the target (i-1)d-m corresponding to the minimum distance metric value in the (i-1)-th frame is successfully associated with the target id-n; that is, the target (i-1)d-m and the target id-n are the same target, and this target is continuous in the (i-1)-th and i-th frame images. The association of all targets in the i-th frame is completed in this way. If a target is not successfully associated, its identification information is retained; the identification information of all targets in the frame images is updated accordingly, for example, the identification information of the target id-n may be modified to that of its associated target (i-1)d-m. All targets in all frame images are associated in the above manner, and the continuity result of each target in the continuous frames is obtained.
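The association step can be sketched as a nearest-neighbour match on the weighted distance metric; the feature representation, the weight values, and the helper names below are assumptions for illustration.

```python
def weighted_distance(a, b, w=(0.5, 0.3, 0.2)):
    """Weighted average of histogram, depth and position/size feature differences."""
    return (w[0] * abs(a["hist"] - b["hist"])
            + w[1] * abs(a["depth"] - b["depth"])
            + w[2] * abs(a["pos_size"] - b["pos_size"]))

def associate(targets_prev, targets_cur, first_preset_threshold):
    """targets_*: dict id -> feature dict; returns cur_id -> matched prev_id or None."""
    matches = {}
    for cur_id, cur_feat in targets_cur.items():
        scored = [(weighted_distance(prev_feat, cur_feat), prev_id)
                  for prev_id, prev_feat in targets_prev.items()]
        if not scored:
            matches[cur_id] = None
            continue
        best_dist, best_prev = min(scored)
        # The target is continuous in frames i-1 and i only if the minimum distance is small enough.
        matches[cur_id] = best_prev if best_dist <= first_preset_threshold else None
    return matches

prev = {"1d-1": {"hist": 0.20, "depth": 5.0, "pos_size": 1.00}}
cur = {"2d-1": {"hist": 0.22, "depth": 5.1, "pos_size": 1.05}}
print(associate(prev, cur, first_preset_threshold=0.2))  # {'2d-1': '1d-1'}
```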
Step S304, performing target interpolation on the images in which a jump occurs according to the continuity of the targets, to obtain interpolated targets.
For example, since some of the target objects may jump during continuous image acquisition, and the continuity of a target is determined only from the feature information of the targets in two adjacent frames, it is necessary to determine whether the targets before and after a jump are the same target.
Specifically, for two targets in the total set I, it is determined whether they ever appear in the same frame; if the two targets never appear in the same frame and occur in discontinuous sub-sequences of frames respectively, the similarity of the two targets is calculated.
As an example, the cluster centers of the two targets in their corresponding continuous frames are calculated respectively; further, the distance metric value between the cluster centers of the two targets is determined. If the distance metric value is smaller than a second target threshold, the two targets are the same target, the identification information of the two targets is merged, and the smaller of the two identifiers may be used as the identifier of the merged target.
Further, target interpolation is performed on the discontinuous frames through the interpolation module. Specifically, the target detection frames of the frames immediately before and after the discontinuity are used as the head and tail values of linear interpolation, and linear interpolation is performed to obtain an interpolation target detection frame for each frame within the discontinuity.
Step S305, screening the interpolated targets to obtain interpolated continuous frame images.
Further, the intersection ratio of the interpolation target detection frame and all target detection frames in the corresponding frame is determined, and the maximum value of the intersection ratio is determined. If the maximum value of the intersection ratio is larger than the second preset threshold, the detection frame with the higher confidence, among the detection frame corresponding to the maximum intersection ratio and the interpolation target detection frame, is retained, so as to obtain the final interpolation target. All jumped targets are interpolated in this way, and the interpolated continuous frame images are obtained.
Fig. 4 is a schematic diagram of the continuity result of continuous frame images provided in an embodiment of the present application. As shown in Fig. 4, image 1 includes object 1 and object 2, and object 1 is continuous in images 1 to 2; images 1 to 300 include object 2, that is, object 2 is continuous in images 1 to 300; image 1 does not include object 3 but image 2 includes object 3, that is, object 3 appears discontinuously from image 2; images 299 to 300 include object 4, that is, object 4 is continuous in images 299 to 300.
And step S306, labeling the interpolated continuous frame images to obtain labeling results of the continuous frame images.
Further, the targets in the interpolated continuous frame images are labeled through the processing module to obtain a continuous-frame multi-target tracking result, which may include information such as the category, position, size, and pose of each target, thereby realizing pre-labeling of the targets.
Further, after fine labeling and manual review, the pre-labeled data is used as ground truth for further training and iteration of the current pre-labeling perception model to obtain a trained pre-labeling perception model. The trained pre-labeling perception model can be used for labeling the next batch of data, which ensures the continuous evolution of the pre-labeling perception model, meets the labeling requirements, and further improves its accuracy, thereby realizing a data closed loop.
Compared with the related art, in the data processing method provided by this embodiment, data to be processed of continuous multiple frames is acquired, and the pixel distribution feature, depth feature, and attribute feature of each target object in each frame of data to be processed are determined; further, target association matching is performed on the target objects of every two adjacent frames in the continuous multiple frames according to these features, so as to determine, through the continuous feature information of each target object, whether the target object is continuous in the continuous multiple frames. Because the degree of change of a target object between two adjacent frames of data to be processed is small, determining the continuity of each target object from feature information of multiple dimensions improves the accuracy of target determination in the continuous frames and avoids the low accuracy that results from judging continuity from a single feature dimension. Target interpolation is performed on the data to be processed in which the same target object jumps, which ensures the continuity of the target object in the continuous frames; each piece of data to be processed is then labeled according to the continuous feature information, which improves the overall accuracy of target detection and target tracking. In addition, pre-labeling the data in this manner reduces the time of manual labeling and improves the efficiency of data labeling.
Although the steps in the flowcharts of the above embodiments are shown in the order indicated by the arrows, these steps are not necessarily executed in that order. Unless explicitly stated herein, the steps are not strictly limited to this order of execution and may be executed in other orders. Moreover, at least some of the steps in the flowcharts of the above embodiments may include multiple sub-steps or stages, which are not necessarily performed at the same moment but may be performed at different moments, and their order of execution is not necessarily sequential; they may be performed in turn or alternately with at least some of the other steps, sub-steps, or stages.
In this embodiment, a data processing device is further provided, and the device is used to implement the foregoing embodiments and preferred embodiments, and will not be described in detail. The terms "module," "unit," "sub-unit," and the like as used below may refer to a combination of software and/or hardware that performs a predetermined function. While the means described in the following embodiments are preferably implemented in software, implementations in hardware, or a combination of software and hardware, are also possible and contemplated.
Fig. 5 is a block diagram of a data processing apparatus according to an embodiment of the present application, and as shown in fig. 5, the apparatus includes:
an obtaining module 501, configured to obtain data to be processed of consecutive multiframes;
a determining module 502, configured to determine a pixel distribution feature, a depth feature, and an attribute feature of each target object in data to be processed of each frame, where the attribute feature includes at least one of a position and a size of the target object in the corresponding frame;
a target association module 503, configured to perform target association matching on the continuous multi-frame data to be processed based on the pixel distribution feature, the depth feature, and the attribute feature of each target object in the adjacent two frames of data to be processed, and determine continuous feature information of each target object, where the continuous feature information is determined based on whether the target object appears in the continuous multi-frame;
the labeling module 504 is configured to label each piece of data to be processed based on the continuous feature information, so as to obtain a labeling result of each target object.
In some of these embodiments, the target association module 503 is specifically configured to:
determining pixel distribution characteristic values of every two adjacent frames of target objects based on pixel distribution characteristics of all target objects in a previous frame and pixel distribution characteristics of all target objects in a next frame in the adjacent two frames of data to be processed;
Determining depth feature values of every two adjacent target objects based on the depth features of all target objects in the previous frame and the depth features of all target objects in the next frame;
determining attribute feature values of every two adjacent target objects based on the attribute features of all target objects in the previous frame and the attribute features of all target objects in the next frame;
and determining continuous characteristic information of each target object based on the pixel distribution characteristic value, the depth characteristic value and the attribute characteristic value of every two adjacent target objects.
In some of these embodiments, the target association module 503 is specifically configured to:
determining a distance measurement characteristic value of each two target objects based on pixel distribution characteristic values, depth characteristic values and attribute characteristic values of every two adjacent target objects;
determining continuous characteristic information of each target object based on the distance measurement characteristic values of every two target objects; the continuous characteristic information comprises whether the target object is continuous in the two adjacent frames of data to be processed.
In some of these embodiments, the target association module 503 is specifically configured to:
acquiring all distance measurement characteristic values associated with a current target object in the distance measurement characteristic values of every two target objects, wherein the current target object is any target object in the data to be processed of the next frame;
Determining a minimum distance measurement characteristic value according to all the distance measurement characteristic values;
if the minimum distance measurement characteristic value is smaller than or equal to a first preset threshold, determining that every two target objects corresponding to the minimum distance measurement characteristic value are current target objects, and enabling the current target objects to be continuous in the corresponding adjacent two frames of data to be processed;
and if the minimum distance measurement characteristic value is larger than a first preset threshold, determining that the current target object is discontinuous in the data to be processed of the corresponding adjacent two frames.
In some of these embodiments, the labeling module 504 is specifically configured to:
determining whether the first target object and the second target object belong to the same target object according to the similarity of the first target object and the second target object, wherein the first target object and the second target object are any two target objects which do not appear in the same frame, the first target object is any target object which is continuous in the data to be processed in the first subsequence, the second target object is any target object which is continuous in the data to be processed in the second subsequence, and the first subsequence and the second subsequence are two subsequences which are discontinuous in continuous multiframes;
if the first target object and the second target object belong to the same target object, performing target interpolation on all data to be processed between the first subsequence and the second subsequence to obtain interpolated data to be processed, and marking the same target object with the same identification information until the identification information of all the same target objects in the continuous multi-frame data to be processed after interpolation is the same;
Labeling each target object in the interpolated continuous multi-frame data to be processed to obtain a labeling result of each target object, wherein the labeling result of each target object comprises at least one of the category, the position and the gesture of each target object in the corresponding frame.
In some of these embodiments, the labeling module 504 is specifically configured to:
determining a first overall characteristic of the first target object based on pixel distribution characteristics, depth characteristics and attribute characteristics of the first target object in each piece of data to be processed in the first subsequence;
determining a second overall characteristic of the second target object based on the pixel distribution characteristic, the depth characteristic and the attribute characteristic of the second target object in each data to be processed in the second subsequence;
and determining the similarity of the first target object and the second target object based on the first integral feature and the second integral feature.
In some embodiments, if the first sub-sequence to-be-processed data is the mth frame to the nth frame to-be-processed data, the second sub-sequence to-be-processed data is the p frame to the q frame to-be-processed data, the labeling module 504 is specifically configured to:
according to a first target object in the nth frame of data to be processed and a second target object in the p frame of data to be processed, carrying out target interpolation on the (n+1) -th frame of data to be processed to obtain interpolation target objects in the (n+1) -th frame of data to be processed to the (p-1) -th frame of data to be processed;
Determining the maximum value of the contact ratio of an interpolation target object in the ith frame of data to be processed and each target object in the ith frame of data to be processed, wherein the ith frame of data to be processed is any frame of data to be processed from the (n+1) th frame of data to the (p-1) th frame of data to be processed;
if the maximum value of the overlap ratio is larger than a second preset threshold value, determining a final interpolation object in the data to be processed of the ith frame according to a target object corresponding to the maximum value of the overlap ratio and the interpolation target object of the ith frame;
and determining the data to be processed after the interpolation of the ith frame based on the final interpolation object.
In some of these embodiments, the labeling module 504 is further configured to:
inputting data comprising the labeling result of each target object into a pre-labeling perception model to obtain a model labeling result of each target object;
and according to the model labeling result of each target object and the corresponding labeling result, adjusting model parameters of the pre-labeling sensing model to obtain an adjusted pre-labeling sensing model, wherein the adjusted pre-labeling sensing model is used for labeling the next data to be processed.
The above-described respective modules may be functional modules or program modules, and may be implemented by software or hardware. For modules implemented in hardware, the various modules described above may be located in the same processor; or the above modules may be located in different processors in any combination.
In one embodiment, a computer device is provided, where the computer device may be a server, and an internal structure diagram of the computer device may be as shown in fig. 6, and fig. 6 is an internal structure diagram of the computer device provided by an embodiment of the present application. The computer device includes a processor, a memory, and a network interface connected by a system bus. Wherein the processor of the computer device is configured to provide computing and control capabilities. The memory of the computer device includes a non-volatile storage medium and an internal memory. The non-volatile storage medium stores an operating system, computer programs, and a database. The internal memory provides an environment for the operation of the operating system and computer programs in the non-volatile storage media. The database of the computer device is used for storing data collected by the sensor. The network interface of the computer device is used for communicating with an external terminal through a network connection. The computer program is executed by a processor to implement a data processing method.
It will be appreciated by those skilled in the art that the structure shown in FIG. 6 is merely a block diagram of some of the structures associated with the present inventive arrangements and is not limiting of the computer device to which the present inventive arrangements may be applied, and that a particular computer device may include more or fewer components than shown, or may combine some of the components, or have a different arrangement of components.
In one embodiment, there is also provided an electronic device including a memory and a processor, the memory storing a computer program, the processor implementing the steps of the method embodiments described above when executing the computer program.
In one embodiment, a computer-readable storage medium is provided, on which a computer program is stored which, when executed by a processor, carries out the steps of the method embodiments described above.
The user information (including but not limited to user equipment information, user personal information, etc.) and the data (including but not limited to data for analysis, stored data, presented data, etc.) related to the present application are information and data authorized by the user or sufficiently authorized by each party.
Those skilled in the art will appreciate that implementing all or part of the above-described methods may be accomplished by way of a computer program stored on a non-transitory computer-readable storage medium, which, when executed, may comprise the steps of the embodiments of the methods described above. Any reference to memory, database, or other medium used in embodiments provided herein may include at least one of non-volatile and volatile memory. The non-volatile memory may include read-only memory (ROM), magnetic tape, floppy disk, flash memory, optical memory, high-density embedded non-volatile memory, resistive random access memory (ReRAM), magnetoresistive random access memory (MRAM), ferroelectric random access memory (FRAM), phase change memory (PCM), graphene memory, and the like. Volatile memory can include random access memory (RAM) or external cache memory, and the like. By way of illustration, and not limitation, RAM can take a variety of forms, such as static random access memory (SRAM) or dynamic random access memory (DRAM), and the like. The databases referred to in the embodiments provided herein may include at least one of a relational database and a non-relational database. The non-relational database may include, but is not limited to, a blockchain-based distributed database, and the like. The processor referred to in the embodiments provided in the present application may be a general-purpose processor, a central processing unit, a graphics processor, a digital signal processor, a programmable logic unit, a data processing logic unit based on quantum computing, or the like, but is not limited thereto.
The technical features of the above embodiments may be arbitrarily combined, and all possible combinations of the technical features in the above embodiments are not described for brevity of description, however, as long as there is no contradiction between the combinations of the technical features, they should be considered as the scope of the description.
The above examples merely represent a few embodiments of the present application, which are described in more detail and are not to be construed as limiting the scope of the patent claims. It should be noted that it will be apparent to those skilled in the art that several variations and modifications can be made without departing from the spirit of the application, which are all within the scope of the application. Accordingly, the scope of the application should be assessed as that of the appended claims.

Claims (8)

1. A method of data processing, comprising:
acquiring data to be processed of continuous multiframes;
determining pixel distribution characteristics, depth characteristics and attribute characteristics of each target object in the data to be processed of each frame, wherein the attribute characteristics comprise at least one of the position and the size of the target object in the corresponding frame;
performing target association matching on the continuous multi-frame data to be processed based on pixel distribution characteristics, depth characteristics and attribute characteristics of each target object in the adjacent two frames of data to be processed, and determining continuous characteristic information of each target object, wherein the continuous characteristic information is determined based on whether the target object appears in the continuous multi-frame;
Labeling each piece of data to be processed based on the continuous characteristic information to obtain a labeling result of each target object;
labeling each piece of data to be processed based on the continuous characteristic information to obtain a labeling result of each target object, wherein the labeling process comprises the following steps:
determining whether a first target object and a second target object belong to the same target object according to the similarity of the first target object and the second target object, wherein the first target object and the second target object are any two target objects which do not appear in the same frame, the first target object is any target object which is continuous in data to be processed in a first subsequence, the second target object is any target object which is continuous in data to be processed in a second subsequence, and the first subsequence and the second subsequence are two subsequences which are discontinuous in the continuous multiframe;
if the first target object and the second target object belong to the same target object, performing target interpolation on all data to be processed between the first subsequence and the second subsequence to obtain interpolated data to be processed, and marking the same target object with the same identification information until the identification information of all the same target objects in the interpolated continuous multi-frame data to be processed is the same;
Labeling each target object in the interpolated continuous multi-frame data to be processed to obtain a labeling result of each target object, wherein the labeling result of each target object comprises at least one of the category, the position and the gesture of each target object in a corresponding frame;
before determining whether the first target object and the second target object belong to the same target object according to the similarity of the first target object and the second target object, the method further comprises:
determining a first overall feature of the first target object based on pixel distribution features, depth features and attribute features of the first target object in each piece of data to be processed in the first subsequence;
determining a first global feature of the first target object based on pixel distribution features, depth features, and attribute features of the first target object in each piece of data to be processed in the first sub-sequence includes:
determining a pixel distribution distance metric value of the first target object according to the pixel distribution characteristics of the first target object in each piece of data to be processed in the first subsequence, determining a depth characteristic distance metric value of the first target object according to the depth characteristics of the first target object in each piece of data to be processed in the first subsequence, determining an attribute characteristic distance metric value of the first target object according to the attribute characteristics of the first target object in each piece of data to be processed in the first subsequence, and determining weighted average values of the pixel distribution distance metric value, the depth characteristic distance metric value and the attribute characteristic distance metric value of the first target object as a first integral characteristic of the first target object;
Determining a second overall characteristic of the second target object based on pixel distribution characteristics, depth characteristics and attribute characteristics of the second target object in each piece of data to be processed in the second subsequence;
determining a second overall feature of the second target object based on pixel distribution features, depth features, and attribute features of the second target object in each piece of data to be processed in the second sub-sequence, including:
determining a pixel distribution distance metric of the second target object according to the pixel distribution characteristics of the second target object in each piece of data to be processed in the second subsequence, determining a depth characteristic distance metric of the second target object according to the depth characteristics of the second target object in each piece of data to be processed in the second subsequence, determining an attribute characteristic distance metric of the second target object according to the attribute characteristics of the second target object in each piece of data to be processed in the second subsequence, and determining weighted average values of the pixel distribution distance metric, the depth characteristic distance metric and the attribute characteristic distance metric of the second target object as a second integral characteristic of the second target object;
Determining a similarity of the first target object and a second target object based on the first global feature and the second global feature;
determining a similarity of the first target object and a second target object based on the first global feature and the second global feature, comprising:
determining the similarity of the first target object and the second target object according to the integral feature difference value of the first integral feature and the second integral feature; if the integral characteristic difference value is smaller than or equal to a first preset difference value, the first target object and the second target object belong to the same target object, and if the integral characteristic difference value is larger than the first preset difference value, the first target object and the second target object do not belong to the same target object;
the first sub-sequence to-be-processed data comprise m-th to n-th frames of to-be-processed data, and the second sub-sequence to-be-processed data comprise p-th to q-th frames of to-be-processed data; correspondingly, the performing target interpolation on all the data to be processed between the first sub-sequence and the second sub-sequence to obtain interpolated data to be processed includes:
Performing target interpolation on the (n+1) -th frame to (p-1) -th frame to-be-processed data according to the first target object in the (n) -th frame to-be-processed data and the second target object in the (p) -th frame to-be-processed data to obtain interpolation target objects in the (n+1) -th to (p-1) -th frame to-be-processed data;
determining the maximum value of the contact ratio of an interpolation target object in the ith frame of data to be processed and each target object in the ith frame of data to be processed, wherein the ith frame of data to be processed is any frame of data to be processed from the (n+1) th frame of data to the (p-1) th frame of data to be processed;
if the maximum value of the overlap ratio is larger than a second preset threshold value, determining a final interpolation object in the data to be processed of the ith frame according to a target object corresponding to the maximum value of the overlap ratio and the interpolation target object of the ith frame;
and determining the data to be processed after the interpolation of the ith frame based on the final interpolation object.
2. The method according to claim 1, wherein the determining continuous feature information of each target object based on pixel distribution features, depth features, and attribute features of each target object in the two adjacent frames of the data to be processed, performing target association matching on the continuous multiple frames of the data to be processed, includes:
Determining pixel distribution characteristic values of every two adjacent frames of target objects based on pixel distribution characteristics of all target objects in a previous frame and pixel distribution characteristics of all target objects in a next frame in the adjacent two frames of data to be processed;
determining depth feature values of every two adjacent target objects based on the depth features of all target objects in the previous frame and the depth features of all target objects in the next frame;
determining attribute feature values of every two adjacent target objects based on the attribute features of all target objects in the previous frame and the attribute features of all target objects in the next frame;
and determining continuous characteristic information of each target object based on the pixel distribution characteristic value, the depth characteristic value and the attribute characteristic value of every two adjacent target objects.
3. The method according to claim 2, wherein determining continuous feature information of each target object based on pixel distribution feature values, depth feature values, and attribute feature values of each target object of two adjacent frames comprises:
determining a distance measurement characteristic value of each two target objects based on pixel distribution characteristic values, depth characteristic values and attribute characteristic values of every two adjacent target objects;
Determining continuous characteristic information of each target object based on the distance measurement characteristic values of every two target objects; the continuous characteristic information comprises whether the target object is continuous in two adjacent frames of data to be processed.
4. A data processing method according to claim 3, wherein determining continuous feature information of each target object based on the distance metric feature values of the target objects comprises:
acquiring all distance measurement characteristic values associated with a current target object in the distance measurement characteristic values of every two target objects, wherein the current target object is any target object in the data to be processed of the subsequent frame;
determining a minimum distance measurement characteristic value according to all the distance measurement characteristic values;
if the minimum distance measurement characteristic value is smaller than or equal to a first preset threshold, determining that every two target objects corresponding to the minimum distance measurement characteristic value are the current target objects, and the current target objects are continuous in the data to be processed corresponding to two adjacent frames;
and if the minimum distance measurement characteristic value is larger than a first preset threshold, determining that the current target object is discontinuous in the data to be processed of the corresponding adjacent two frames.
5. The method according to claim 1, further comprising, after labeling each piece of data to be processed based on the continuous feature information, a labeling result of each target object:
inputting the data comprising the labeling result of each target object into a pre-labeling perception model to obtain a model labeling result of each target object;
and adjusting model parameters of the pre-labeling sensing model according to the model labeling result and the corresponding labeling result of each target object to obtain an adjusted pre-labeling sensing model, wherein the adjusted pre-labeling sensing model is used for labeling the next data to be processed.
6. A data processing apparatus, comprising:
the acquisition module is used for acquiring the data to be processed of the continuous multiframes;
the determining module is used for determining pixel distribution characteristics, depth characteristics and attribute characteristics of each target object in the data to be processed of each frame, wherein the attribute characteristics comprise at least one of the position and the size of the target object in the corresponding frame;
the target association module is used for carrying out target association matching on the continuous multi-frame data to be processed based on pixel distribution characteristics, depth characteristics and attribute characteristics of each target object in the two adjacent frames of data to be processed, and determining continuous characteristic information of each target object, wherein the continuous characteristic information is determined based on whether the target object appears in the continuous multi-frame;
The labeling module is used for labeling each piece of data to be processed based on the continuous characteristic information to obtain a labeling result of each target object;
the labeling module is specifically used for: determining whether a first target object and a second target object belong to the same target object according to the similarity of the first target object and the second target object, wherein the first target object and the second target object are any two target objects which do not appear in the same frame, the first target object is any target object which is continuous in data to be processed in a first subsequence, the second target object is any target object which is continuous in data to be processed in a second subsequence, and the first subsequence and the second subsequence are two subsequences which are discontinuous in the continuous multiframe;
if the first target object and the second target object belong to the same target object, performing target interpolation on all data to be processed between the first subsequence and the second subsequence to obtain interpolated data to be processed, and marking the same target object with the same identification information until the identification information of all the same target objects in the interpolated continuous multi-frame data to be processed is the same;
Labeling each target object in the interpolated continuous multi-frame data to be processed to obtain a labeling result of each target object, wherein the labeling result of each target object comprises at least one of the category, the position and the gesture of each target object in a corresponding frame;
the labeling module is further configured to: determining a first overall feature of the first target object based on pixel distribution features, depth features and attribute features of the first target object in each piece of data to be processed in the first subsequence; determining a second overall characteristic of the second target object based on pixel distribution characteristics, depth characteristics and attribute characteristics of the second target object in each piece of data to be processed in the second subsequence; determining a similarity of the first target object and a second target object based on the first global feature and the second global feature;
the labeling module is specifically used for: determining a pixel distribution distance metric value of the first target object according to the pixel distribution characteristics of the first target object in each piece of data to be processed in the first subsequence, determining a depth characteristic distance metric value of the first target object according to the depth characteristics of the first target object in each piece of data to be processed in the first subsequence, determining an attribute characteristic distance metric value of the first target object according to the attribute characteristics of the first target object in each piece of data to be processed in the first subsequence, and determining weighted average values of the pixel distribution distance metric value, the depth characteristic distance metric value and the attribute characteristic distance metric value of the first target object as a first integral characteristic of the first target object;
Determining a pixel distribution distance metric of the second target object according to the pixel distribution characteristics of the second target object in each piece of data to be processed in the second subsequence, determining a depth characteristic distance metric of the second target object according to the depth characteristics of the second target object in each piece of data to be processed in the second subsequence, determining an attribute characteristic distance metric of the second target object according to the attribute characteristics of the second target object in each piece of data to be processed in the second subsequence, and determining weighted average values of the pixel distribution distance metric, the depth characteristic distance metric and the attribute characteristic distance metric of the second target object as a second integral characteristic of the second target object;
determining the similarity of the first target object and the second target object according to the integral feature difference value of the first integral feature and the second integral feature; if the integral characteristic difference value is smaller than or equal to a first preset difference value, the first target object and the second target object belong to the same target object, and if the integral characteristic difference value is larger than the first preset difference value, the first target object and the second target object do not belong to the same target object;
The first sub-sequence to-be-processed data comprise m-th to n-th frames of to-be-processed data, and the second sub-sequence to-be-processed data comprise p-th to q-th frames of to-be-processed data; correspondingly, the labeling module is specifically configured to:
performing target interpolation on the (n+1) -th frame to (p-1) -th frame to-be-processed data according to the first target object in the (n) -th frame to-be-processed data and the second target object in the (p) -th frame to-be-processed data to obtain interpolation target objects in the (n+1) -th to (p-1) -th frame to-be-processed data;
determining the maximum value of the contact ratio of an interpolation target object in the ith frame of data to be processed and each target object in the ith frame of data to be processed, wherein the ith frame of data to be processed is any frame of data to be processed from the (n+1) th frame of data to the (p-1) th frame of data to be processed;
if the maximum value of the overlap ratio is larger than a second preset threshold value, determining a final interpolation object in the data to be processed of the ith frame according to a target object corresponding to the maximum value of the overlap ratio and the interpolation target object of the ith frame;
and determining the data to be processed after the interpolation of the ith frame based on the final interpolation object.
7. An electronic device comprising a memory and a processor, wherein the memory has stored therein a computer program, the processor being arranged to run the computer program to perform the data processing method of any one of claims 1 to 5.
8. A computer-readable storage medium, on which a computer program is stored, characterized in that the computer program, when being executed by a processor, implements the steps of the data processing method of any one of claims 1 to 5.
CN202310950400.8A 2023-07-31 2023-07-31 Data processing method, device, electronic device and storage medium Active CN116665177B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202310950400.8A CN116665177B (en) 2023-07-31 2023-07-31 Data processing method, device, electronic device and storage medium

Publications (2)

Publication Number Publication Date
CN116665177A CN116665177A (en) 2023-08-29
CN116665177B true CN116665177B (en) 2023-10-13

Family

ID=87720999

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202310950400.8A Active CN116665177B (en) 2023-07-31 2023-07-31 Data processing method, device, electronic device and storage medium

Country Status (1)

Country Link
CN (1) CN116665177B (en)

Patent Citations (16)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN108242062A (en) * 2017-12-27 2018-07-03 北京纵目安驰智能科技有限公司 Method for tracking target, system, terminal and medium based on depth characteristic stream
CN108986138A (en) * 2018-05-24 2018-12-11 北京飞搜科技有限公司 Method for tracking target and equipment
CN110163887A (en) * 2019-05-07 2019-08-23 国网江西省电力有限公司检修分公司 The video target tracking method combined with foreground segmentation is estimated based on sport interpolation
WO2021143230A1 (en) * 2020-01-15 2021-07-22 初速度(苏州)科技有限公司 Labeling system, method and apparatus for continuous frame data
WO2022135027A1 (en) * 2020-12-22 2022-06-30 深圳云天励飞技术股份有限公司 Multi-object tracking method and apparatus, computer device, and storage medium
CN113297963A (en) * 2021-05-24 2021-08-24 网易(杭州)网络有限公司 Multi-person posture estimation method and device, electronic equipment and readable storage medium
WO2023005056A1 (en) * 2021-07-28 2023-02-02 奥比中光科技集团股份有限公司 Target tracking method, apparatus, and computer-readable storage medium
CN113763421A (en) * 2021-07-28 2021-12-07 奥比中光科技集团股份有限公司 Target tracking method and device and computer readable storage medium
CN115705685A (en) * 2021-08-05 2023-02-17 中移(上海)信息通信科技有限公司 Image data set labeling method and device and electronic equipment
CN113744316A (en) * 2021-09-08 2021-12-03 电子科技大学 Multi-target tracking method based on deep neural network
CN114022509A (en) * 2021-09-24 2022-02-08 北京邮电大学 Target tracking method based on monitoring videos of multiple animals and related equipment
CN113822910A (en) * 2021-09-30 2021-12-21 上海商汤临港智能科技有限公司 Multi-target tracking method and device, electronic equipment and storage medium
WO2023050678A1 (en) * 2021-09-30 2023-04-06 上海商汤智能科技有限公司 Multi-target tracking method and apparatus, and electronic device, storage medium and program
WO2023103329A1 (en) * 2021-12-08 2023-06-15 北京百度网讯科技有限公司 Data labeling method, apparatus, and system, device, and storage medium
CN116069801A (en) * 2023-03-06 2023-05-05 山东华夏高科信息股份有限公司 Traffic video structured data generation method, device and medium
CN116402850A (en) * 2023-03-20 2023-07-07 华南理工大学 Multi-target tracking method for intelligent driving

Non-Patent Citations (3)

* Cited by examiner, † Cited by third party
Title
Cross-domain Object Detection through Coarse-to-Fine Feature Adaptation; Yangtao Zheng et al.; 《arXiv》; 1-10 *
Locality Aware Appearance Metric for Multi-Target Multi-Camera Tracking; Yunzhong Hou et al.; 《arXiv》; 1-11 *
Research on Multi-Target Tracking Methods Based on Spatio-Temporal Information Embedding (基于时空信息嵌入的多目标跟踪方法研究); 游思思; 《中国博士学位论文全文数据库》 (China Doctoral Dissertations Full-text Database); 1-127 *

Also Published As

Publication number Publication date
CN116665177A (en) 2023-08-29

Similar Documents

Publication Publication Date Title
US11200424B2 (en) Space-time memory network for locating target object in video content
JP7236545B2 (en) Video target tracking method and apparatus, computer apparatus, program
CN107633526B (en) Image tracking point acquisition method and device and storage medium
EP1975879B1 (en) Computer implemented method for tracking object in sequence of frames of video
CN109960742B (en) Local information searching method and device
CN111667001B (en) Target re-identification method, device, computer equipment and storage medium
CN109977952B (en) Candidate target detection method based on local maximum
CN110796686A (en) Target tracking method and device and storage device
CN111612822B (en) Object tracking method, device, computer equipment and storage medium
CN110176024B (en) Method, device, equipment and storage medium for detecting target in video
CN106504274A Visual tracking method and system based on an infrared camera
CN106407978B (en) Method for detecting salient object in unconstrained video by combining similarity degree
GB2579262A (en) Space-time memory network for locating target object in video content
CN114708437B (en) Training method of target detection model, target detection method, device and medium
CN113256683B (en) Target tracking method and related equipment
CN114820765A (en) Image recognition method and device, electronic equipment and computer readable storage medium
CN114298982A (en) Image annotation method and device, computer equipment and storage medium
CN115984634B (en) Image detection method, apparatus, device, storage medium, and program product
CN117292324A (en) Crowd density estimation method and system
CN116665177B (en) Data processing method, device, electronic device and storage medium
Hafeezallah et al. Multi-Scale Network with Integrated Attention Unit for Crowd Counting.
EP4332910A1 (en) Behavior detection method, electronic device, and computer readable storage medium
CN112784691B (en) Target detection model training method, target detection method and device
CN115393755A (en) Visual target tracking method, device, equipment and storage medium
CN114219938A (en) Region-of-interest acquisition method

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant