CN114943748A - Data processing method and device, electronic equipment and storage medium

Info

Publication number
CN114943748A
CN114943748A
Authority
CN
China
Prior art keywords
target object
determining
frame image
target
similarity
Prior art date
Legal status
Pending
Application number
CN202110169707.5A
Other languages
Chinese (zh)
Inventor
魏延恒
张严浩
郑赟
潘攀
徐盈辉
Current Assignee
Alibaba Group Holding Ltd
Original Assignee
Alibaba Group Holding Ltd
Priority date
Filing date
Publication date
Application filed by Alibaba Group Holding Ltd
Priority to CN202110169707.5A
Publication of CN114943748A

Classifications

    • G - PHYSICS
    • G06 - COMPUTING; CALCULATING OR COUNTING
    • G06T - IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T7/00 - Image analysis
    • G06T7/20 - Analysis of motion
    • G06T7/246 - Analysis of motion using feature-based methods, e.g. the tracking of corners or segments
    • G06T7/251 - Analysis of motion using feature-based methods, e.g. the tracking of corners or segments involving models
    • G - PHYSICS
    • G06 - COMPUTING; CALCULATING OR COUNTING
    • G06F - ELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00 - Pattern recognition
    • G06F18/20 - Analysing
    • G06F18/22 - Matching criteria, e.g. proximity measures
    • G - PHYSICS
    • G06 - COMPUTING; CALCULATING OR COUNTING
    • G06N - COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 - Computing arrangements based on biological models
    • G06N3/02 - Neural networks
    • G06N3/04 - Architecture, e.g. interconnection topology
    • G06N3/045 - Combinations of networks

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Data Mining & Analysis (AREA)
  • General Physics & Mathematics (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • General Engineering & Computer Science (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Evolutionary Computation (AREA)
  • Artificial Intelligence (AREA)
  • Biomedical Technology (AREA)
  • General Health & Medical Sciences (AREA)
  • Evolutionary Biology (AREA)
  • Health & Medical Sciences (AREA)
  • Bioinformatics & Computational Biology (AREA)
  • Biophysics (AREA)
  • Computational Linguistics (AREA)
  • Bioinformatics & Cheminformatics (AREA)
  • Molecular Biology (AREA)
  • Computing Systems (AREA)
  • Mathematical Physics (AREA)
  • Software Systems (AREA)
  • Multimedia (AREA)
  • Image Analysis (AREA)

Abstract

The embodiments of the application provide a data processing method, a data processing apparatus, an electronic device and a storage medium, where the method includes: acquiring a frame image of video data, and determining a feature vector and a frame image combination of a target object in a detection frame of the frame image; determining a feature map according to the similarity between target objects in the frame image combination, where the similarity includes feature similarity and/or spatial similarity; and determining a classification result for each target object according to the feature map of each target object so as to determine the movement track of the target object in the video data. The accuracy of movement track identification can thereby be improved.

Description

Data processing method and device, electronic equipment and storage medium
Technical Field
The present application relates to the field of computer technologies, and in particular, to a data processing method, a data processing apparatus, an electronic device, and a storage medium.
Background
Video multi-target track identification is a computer vision task that is mainly used to determine the tracks of different targets in a video; it has important applications in fields such as security and automatic driving.
However, current methods for identifying target movement tracks have low identification accuracy.
Disclosure of Invention
The embodiment of the application provides a data processing method which can improve the accuracy of movement track analysis.
Correspondingly, the embodiment of the application also provides a data processing apparatus, an electronic device and a storage medium, so as to ensure the implementation and application of the above method.
In order to solve the above problem, an embodiment of the present application discloses a data processing method, where the method includes: acquiring a frame image of video data, and determining a feature vector and a frame image combination of a target object in a detection frame of the frame image; determining a feature map according to the similarity between target objects in the frame image combination, where the similarity includes feature similarity and/or spatial similarity; and determining the classification result of each target object according to the feature map of each target object so as to determine the movement track of the target object in the video data.
In order to solve the above problem, an embodiment of the present application discloses a data processing method, where the method includes: acquiring a frame image of road video data, and determining a feature vector and a frame image combination of a target object in a detection frame of the frame image; determining a feature map according to the similarity between target objects in the frame image combination, where the similarity includes feature similarity and/or spatial similarity; determining the classification result of each target object according to the feature map of each target object so as to determine the movement track of the target object in the road video data; and determining the behavior type of the target object according to the movement track so as to determine a corresponding behavior analysis result.
In order to solve the above problem, an embodiment of the present application discloses a data processing method, where the method includes: acquiring a frame image of live video data, and determining a feature vector and a frame image combination of a target object in a detection frame of the frame image; determining a feature map according to the similarity between target objects in the frame image combination, where the similarity includes feature similarity and/or spatial similarity; determining the classification result of each target object according to the feature map of each target object so as to determine the movement track of the target object in the live video data; and adding virtual information to the target object in the live video data according to the movement track.
In order to solve the above problem, an embodiment of the present application discloses a data processing method, where the method includes: acquiring a frame image of driving video data, and determining a feature vector and a frame image combination of a target object in a detection frame of the frame image; determining a feature map according to the similarity between target objects in the frame image combination, where the similarity includes feature similarity and/or spatial similarity; determining the classification result of each target object according to the feature map of each target object so as to determine the movement track of the target object in the driving video data; and determining a driving instruction according to the movement track so as to control the driving of the vehicle.
In order to solve the above problem, an embodiment of the present application discloses a data processing method, where the method includes: providing an interactive page to acquire the video data to be processed based on the interactive page; acquiring a frame image of the video data, and determining a feature vector and a frame image combination of a target object in a detection frame of the frame image; determining a feature map according to the similarity between target objects in the frame image combination, where the similarity includes feature similarity and/or spatial similarity; and determining the classification result of each target object according to the feature map of each target object to determine the movement track of the target object in the video data as an analysis result, and issuing the analysis result.
In order to solve the above problem, an embodiment of the present application discloses a data processing apparatus, where the apparatus includes: a video data acquisition module, configured to acquire a frame image of the video data and determine a feature vector and a frame image combination of a target object in a detection frame of the frame image; a feature map acquisition module, configured to determine a feature map according to the similarity between target objects in the frame image combination, where the similarity includes feature similarity and/or spatial similarity; and a movement track acquisition module, configured to determine the classification result of each target object according to the feature map of each target object so as to determine the movement track of the target object in the video data.
In order to solve the above problem, an embodiment of the present application discloses an electronic device, including: a processor; and a memory having executable code stored thereon, which when executed, causes the processor to perform the method as described in one or more of the above embodiments.
To address the above issues, embodiments of the present application disclose one or more machine-readable media having executable code stored thereon that, when executed, cause a processor to perform a method as described in one or more of the above embodiments.
Compared with the prior art, the embodiment of the application has the following advantages:
in the embodiment of the application, a frame image of video data can be obtained, and a feature vector and a frame image combination of a target object in a detection frame of the frame image are determined; then, according to the feature vectors of the target objects in the frame image combination, the feature similarity between the target objects in the frame image combination is determined, and the spatial similarity between the target objects is determined according to the detection frames; then, according to the feature similarity and the spatial similarity between the target objects, associations can be established for the same target object in different frame images to form corresponding feature maps. A classification result corresponding to the target object is determined according to the feature map, and the movement track of the target object in the video data is obtained. In the data association process, data association can be performed according to both the feature similarity and the spatial similarity between the target objects, so that both the association of features and the association of spatial positions between the target objects are considered; a more accurate classification can thus be obtained, and the accuracy of the analysis of the movement track of the target objects in the video data is improved.
Drawings
FIG. 1 is a schematic flow chart diagram of a data processing method according to an embodiment of the present application;
FIG. 2 is a schematic flow chart diagram of a data processing method according to another embodiment of the present application;
FIG. 3 is a schematic flow chart diagram of a data processing method according to yet another embodiment of the present application;
FIG. 4 is a schematic flow chart diagram of a data processing method according to yet another embodiment of the present application;
FIG. 5 is a schematic flow chart diagram of a data processing method according to yet another embodiment of the present application;
FIG. 6A is a schematic flow chart diagram of a data processing method according to yet another embodiment of the present application;
FIG. 6B is a schematic flow chart diagram illustrating a data processing method according to yet another embodiment of the present application;
FIG. 7A is a schematic flow chart diagram of a data processing method according to yet another embodiment of the present application;
FIG. 7B is a schematic flow chart diagram illustrating a data processing method according to yet another embodiment of the present application;
FIG. 8 is a block diagram of a data processing apparatus according to an embodiment of the present application;
FIG. 9 is a schematic block diagram of a data processing apparatus according to another embodiment of the present application;
FIG. 10 is a schematic block diagram of a data processing apparatus according to yet another embodiment of the present application;
FIG. 11 is a schematic block diagram of a data processing apparatus according to yet another embodiment of the present application;
FIG. 12 is a schematic diagram of a data processing apparatus according to yet another embodiment of the present application;
fig. 13 is a schematic structural diagram of an exemplary apparatus provided in one embodiment of the present application.
Detailed Description
In order to make the aforementioned objects, features and advantages of the present application more comprehensible, the present application is described in further detail with reference to the accompanying drawings and the detailed description.
The embodiments of the application can be applied to the field of target track identification for video data, where target track identification refers to identifying the track of a target in the video data. The embodiments can analyze the features and positions of a target (also called a target object) in different frame images of the video data, perform data association between the target objects in the different frame images, and thereby obtain the corresponding classification of the target objects in the frame images and the movement track of the target objects in the video data.
Specifically, as shown in fig. 1, in the embodiment of the present application, a plurality of frame images of video data may be obtained. On one hand, the detection frames used for locating the target objects in the frame images may be determined, and feature recognition may be performed on the target objects in the detection frames to obtain feature vectors; on the other hand, a target number of consecutive frame images may be acquired to form a frame image combination, which may also be referred to as a graph group. After the frame image combination and the feature vectors of the target objects in the frame images are determined, the related objects corresponding to a target object in a target frame image can be screened out according to the feature vectors of the target objects in the frame image combination, and the corresponding feature vectors can be extracted to form nodes. Here, the target frame image is a frame image in the frame image combination; the related objects of the target object in the target frame image include first objects and second objects, where a first object is an object whose feature vector is similar to the feature vector of the target object, and a second object is an object whose feature vector is similar to the feature vector of a first object. The nodes include node features that characterize the objects in the video data. After the nodes are determined, adjacent nodes are determined according to the feature vectors of the target objects, and first connecting edges are established between the nodes; nodes meeting a coincidence condition are determined according to the degree of coincidence between the detection frames of the target objects, and second connecting edges are added, forming the feature map. Then, the classification result of the target object can be determined according to the node features of the nodes in the feature map and the edge features of the connecting edges. After the classification result is determined, the target objects in each frame image can be connected according to their classification results to obtain the movement track of the target object in the video data. It should be noted that the trajectory diagram in the example shown in fig. 1 is only for ease of understanding and is not to be taken as a limitation on the movement track.
The inventors of the embodiments of the application tried various ways of identifying the movement track, but found in practice that the identification accuracy of some of them is not high. For example, one way to identify a target track is to extract feature vectors of the target in different frame images of the video data, compare the feature similarity between target objects according to the feature vectors, perform data association on the targets in the different frame images according to the feature similarity, determine the corresponding classification of the target, and then determine the track of the target in the video. However, the inventors found that, with such a pure feature matching method, the position of the target in the video is likely to jump during data association, and the identification of the target track is poor. The inventors therefore provide a new target track identification method: in the process of data association (establishing associations between target objects), association analysis is performed according to both the feature similarity and the spatial similarity of the target objects in different frame images, so that both the association of features and the association of spatial positions between the target objects can be considered, a more accurate classification can be obtained, and the accuracy of analyzing the movement track of target objects in video data can be improved.
The embodiments of the application can be applied to scenarios that identify the target track of a target object in video data, for example, in the security field, automatic driving, live video, and the like. For example, in the security field, a security video can be analyzed to determine the feature vectors and detection frames of target objects, the feature association and spatial association between target objects in different frame images can be analyzed to obtain the classification results of the target objects, the track of a target object in the video can be determined, and it can then be determined whether the behavior of the target object meets the relevant specification (for example, whether it has strayed into a forbidden area). For another example, in an automatic driving scenario, driving video data can be analyzed to determine the movement tracks of target objects (such as vehicles, pedestrians and obstacles) in the driving video data, so as to determine corresponding driving instructions to control the driving of the vehicle. For another example, in an analysis scenario for live video, the movement track of the anchor's face can be identified, so that virtual information (such as a glasses special effect) can be added to the anchor's face to improve the live broadcast effect.
The embodiment of the application provides a data processing method that can be applied to a processing end, where the processing end can be understood as a device that acquires video data and analyzes the video data. Specifically, as shown in fig. 2, the method includes:
Step 202, obtaining a frame image of the video data, and determining a feature vector of a target object in a detection frame of the frame image and a frame image combination. The video data can be road video shot by a road camera, video shot by a community camera, live video, vehicle driving video and the like. After the frame images of the video data are obtained, on one hand, the features of the target objects can be extracted; on the other hand, some consecutive frame images can be screened out to form a frame image combination. Specifically, on one hand, in this embodiment, the target objects may be identified, the number of target objects in the frame image may be determined, the corresponding detection frames may be determined, and the feature vectors of the target objects in the detection frames may then be extracted. As an optional embodiment, determining a feature vector of a target object in a detection frame of a frame image includes: determining a detection frame in the frame image, and extracting the detection image in the detection frame; and extracting the features of the target object in the detection image to obtain a feature vector. The embodiment of the application can identify the target object in the frame image and determine the detection frame corresponding to the target object by means of bounding box regression, where the detection frame is used for locating the target object. The detection image in the detection frame is then extracted, and the features corresponding to the target object are extracted to obtain a feature vector. In an optional example, the embodiment of the application may adopt a pre-trained convolutional neural network (CNN) model with a residual network (ResNet) architecture (e.g., ResNet50) to extract the features of the target object. A residual network is easy to optimize and can improve accuracy by adding considerable depth; its internal residual blocks use skip connections, which alleviates the vanishing-gradient problem caused by increasing depth in deep neural networks. A CNN is a class of feedforward neural networks that contain convolution computations and have a deep structure, and is one of the representative algorithms of deep learning.
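For ease of understanding, the following is a minimal sketch of this feature extraction step: a ResNet50 backbone (with its classification head removed) produces a feature vector for each detection image cropped from a frame. This sketch is illustrative rather than the patented implementation; the torchvision usage, the 224x224 crop size and the (x1, y1, x2, y2) box format are assumptions.

```python
# Illustrative sketch only: extract an appearance feature vector for each
# detection frame with a ResNet50 backbone (assumptions noted in the text).
import torch
import torchvision
from torchvision import transforms

backbone = torchvision.models.resnet50(weights="IMAGENET1K_V1")
backbone.fc = torch.nn.Identity()  # keep the 2048-d pooled feature, drop the classifier
backbone.eval()

preprocess = transforms.Compose([
    transforms.ToPILImage(),
    transforms.Resize((224, 224)),
    transforms.ToTensor(),
    transforms.Normalize(mean=[0.485, 0.456, 0.406],
                         std=[0.229, 0.224, 0.225]),
])

def extract_features(frame, boxes):
    """frame: HxWx3 uint8 array; boxes: list of (x1, y1, x2, y2) detection frames."""
    crops = [preprocess(frame[y1:y2, x1:x2]) for (x1, y1, x2, y2) in boxes]
    with torch.no_grad():
        feats = backbone(torch.stack(crops))  # (N, 2048), one row per detection
    # Unit-normalize so that dot products give cosine feature similarity.
    return torch.nn.functional.normalize(feats, dim=1)
```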
On the other hand, in the embodiment of the present application, the frame image combination may be updated dynamically, with the frame images in the combination continuously refreshed so that different frame images can be analyzed in turn. Specifically, as an optional embodiment, the step of determining the frame image combination includes: acquiring a target number of consecutive frame images as the frame image combination, and updating the frame images in the frame image combination after the feature map corresponding to the frame image combination is determined. In this embodiment, the target number of frame images contained in a frame image combination may be preset, and an empty graph group queue may be established so that the target number of consecutive frame images can be added to it to form a frame image combination for analysis. Specifically, as an optional embodiment, acquiring a target number of consecutive frame images as the frame image combination includes: acquiring a target number of consecutive frame images; and determining a graph group queue, and adding the target number of consecutive frame images to the graph group queue to form the frame image combination. The graph group queue may be understood as a queue of analysis tasks: the embodiment adds the target number of frame images to the queue so that they can be analyzed and the corresponding feature map determined. In addition, the embodiment of the application can continuously update the frame images in the frame image combination in a dynamically evolving manner, thereby forming multiple frame image combinations and analyzing the frame images of the video data. Specifically, as an optional embodiment, updating the frame images in the frame image combination includes: deleting at least one frame image from the graph group queue and adding a corresponding number of frame images to the graph group queue to form an updated frame image combination. In this embodiment, frame images are continuously deleted from the graph group queue and a corresponding number of frame images are added, so that the combination always contains the target number of consecutive frame images; the nodes corresponding to deleted frame images can be deleted, and the nodes corresponding to retained frame images can be reused.
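The dynamically updated graph group queue can be pictured as a fixed-length sliding window over the video. The following minimal sketch assumes a window (target number) of 10 frames; the class and method names are illustrative, not taken from the application.

```python
# Illustrative sketch of the graph group queue: a fixed-length window of
# consecutive frame images that slides over the video.
from collections import deque

class GraphGroupQueue:
    def __init__(self, target_number=10):
        # When a new frame is appended to a full queue, the oldest frame
        # drops out automatically, matching the delete-then-add update.
        self.frames = deque(maxlen=target_number)

    def push(self, frame):
        self.frames.append(frame)

    def ready(self):
        return len(self.frames) == self.frames.maxlen

    def combination(self):
        """The current frame image combination to analyze as one graph group."""
        return list(self.frames)
```

Each time the window is full, a feature map can be built for the current combination; pushing the next frame evicts the oldest one, and the nodes of the retained frames can be reused as described above.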
After the frame image combination and the feature vectors of the frame images are determined, in step 204, a feature map for associating the same target object in different frame images may be determined according to the similarities between target objects in the frame image combination, where the similarities include feature similarity and/or spatial similarity. The feature map comprises nodes and connecting edges; the nodes include node features, which characterize the feature vectors of the target objects, and the connecting edges characterize the existence of associations between the nodes. The embodiment of the application can establish associations between target objects according to at least one of the feature similarity and the spatial similarity between them: the feature similarity between target objects in the frame image combination is determined according to their feature vectors, and the spatial similarity between target objects is determined according to the degree of coincidence between their detection frames; then, according to the feature similarity and/or the spatial similarity between the target objects, associations can be established for the same target object to form a corresponding feature map. Specifically, as an optional embodiment, determining the feature map according to the similarities between target objects in the frame image combination includes: screening out related objects corresponding to the target object in the target frame image according to the feature vectors of the target objects in the frame image combination, and extracting the corresponding feature vectors to form nodes; and adding connecting edges between the nodes according to the similarity between the nodes to form the feature map corresponding to the target object in the target frame image.
According to the embodiment of the application, the related objects corresponding to the target object in the target frame image can be screened out according to the feature vectors of the target objects in the frame image combination, and the corresponding feature vectors can be extracted to form nodes. The target frame image is a frame image in the frame image combination; the related objects of the target object in the target frame image include first objects and second objects, where a first object is an object whose feature vector is similar to the feature vector of the target object, and a second object is an object whose feature vector is similar to the feature vector of a first object. Specifically, as an optional embodiment, screening out the related objects corresponding to the target object in the target frame image according to the feature vectors of the target objects in the frame image combination, and extracting the corresponding feature vectors to form nodes, includes: screening out first objects related to the target object of the target frame image according to the feature vectors of the target objects in the frame image combination, and screening out second objects related to the first objects; and determining the feature vectors of the first objects and the second objects to form nodes.
The embodiment of the application can preset the numbers of first objects and second objects. Based on the feature-vector similarity between target objects, the objects similar to the target object in the target frame image are sorted by similarity so as to screen out a first preset number of first objects; then, taking each first object as a two-hop node (or two-hop neighbor), the target objects similar to that first object are sorted so as to screen out a second preset number of second objects. The feature vectors of the target object in the target frame image, of its first objects, and of the second objects of those first objects are thereby obtained, forming a plurality of nodes corresponding to the target object in the target frame image. In an optional example, the embodiment of the application may use the K-nearest-neighbor (KNN) algorithm to determine the first objects and second objects and obtain the corresponding nodes, where KNN is a method that classifies each record in a data set according to its nearest neighbors.
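This two-hop screening can be sketched as follows, assuming unit-normalized feature vectors so that dot products give cosine similarity; the preset numbers K1 and K2 and the function names are illustrative assumptions.

```python
# Illustrative sketch: keep the K1 most similar objects (first objects) of a
# target object and, for each of those, their K2 most similar objects
# (second objects), yielding the node set for the target object.
import numpy as np

def knn(query, feats, k):
    sims = feats @ query             # cosine similarity on unit vectors
    return np.argsort(-sims)[:k]     # indices of the k most similar objects

def related_nodes(target_idx, feats, k1=5, k2=3):
    first = [i for i in knn(feats[target_idx], feats, k1 + 1) if i != target_idx][:k1]
    nodes = {target_idx, *first}
    for i in first:                  # treat each first object as a two-hop node
        nodes.update(j for j in knn(feats[i], feats, k2 + 1) if j != i)
    return sorted(nodes)             # object indices that become nodes of the feature map
```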
After the plurality of nodes corresponding to the target object in the target frame image are determined, these nodes may include nodes that correspond to the target object as well as nodes that do not; therefore, in the embodiment of the present application, associations can be established by adding connecting edges between the nodes. Specifically, as an optional embodiment, adding connecting edges between the nodes according to the similarity between the nodes includes: determining the feature similarity between the nodes according to the feature vectors of the nodes, and determining the nodes whose feature vectors are adjacent according to the feature similarity, so as to add a first connecting edge; and determining the spatial similarity between the nodes according to the degree of coincidence between the detection frames of the nodes, and determining the nodes meeting the coincidence condition, so as to add a second connecting edge.
Nodes that are adjacent in feature vector can be understood as the nodes with the highest feature similarity: the embodiment of the application can obtain the feature vector of each node, determine the feature similarity between the feature vectors of every two nodes, and add a first connecting edge between the nodes with the highest feature similarity. The degree of coincidence may also be called the Intersection over Union (IoU), which is the ratio between the intersection and the union of the detection frames. The motion of a target object across consecutive frame images is a continuous process; therefore, in addition to establishing associations (adding connecting edges) between target objects according to their feature similarity, the embodiment of the application may also establish associations between target objects according to the degree of coincidence between their detection frames in different frame images. A coincidence condition may be preset so that nodes are screened according to it and connecting edges are added between them. For example, the coincidence condition may be preset as IoU exceeding 0.5, so that pairwise matching analysis can be performed between the nodes, and a second connecting edge can be added between node combinations whose IoU exceeds 0.5.
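The two kinds of connecting edges can be sketched as follows. The sketch adds a first connecting edge from each node to its most similar node by feature vector, and a second connecting edge between node pairs whose IoU exceeds 0.5; in practice the coincidence check would be applied to detection frames from different (e.g., adjacent) frame images, and the (x1, y1, x2, y2) box format is an assumption.

```python
# Illustrative sketch: first connecting edges by feature similarity,
# second connecting edges by degree of coincidence (IoU) of detection frames.
import numpy as np

def iou(a, b):
    """Intersection over Union of two boxes in (x1, y1, x2, y2) format."""
    x1, y1 = max(a[0], b[0]), max(a[1], b[1])
    x2, y2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0, x2 - x1) * max(0, y2 - y1)
    area = lambda r: (r[2] - r[0]) * (r[3] - r[1])
    union = area(a) + area(b) - inter
    return inter / union if union > 0 else 0.0

def build_edges(feats, boxes, iou_thresh=0.5):
    n, edges = len(feats), set()
    sims = feats @ feats.T                       # pairwise cosine similarity
    np.fill_diagonal(sims, -np.inf)              # ignore self-similarity
    for i in range(n):
        edges.add((i, int(np.argmax(sims[i]))))  # first connecting edge
    for i in range(n):
        for j in range(i + 1, n):
            if iou(boxes[i], boxes[j]) > iou_thresh:
                edges.add((i, j))                # second connecting edge
    return edges
```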
After the feature maps of the target objects are determined, in step 206, the classification result of each target object may be determined according to the feature map of each target object, so as to determine the movement track of the target object in the video data. After the feature map of a target object is determined, feature extraction can be performed on the nodes and connecting edges in the feature map to obtain node features and edge features, and the classification result of the target object can then be determined. Classifying the target objects can be understood as assigning the detection frames belonging to the same target to the same category; connecting the detection frames in each frame image then yields the movement track of the target object in the video data.
In the embodiment of the present application, a neural network model may be used to process the feature map to obtain the classification result of the target object; a neural network model is a complex network system formed by a large number of widely interconnected simple processing units (called neurons), and is a highly complex nonlinear dynamical learning system. Specifically, as an optional embodiment, determining the classification result of each target object according to the feature map of each target object includes: inputting the feature map into a feature processing model, and determining the classification result of the target object, where the feature processing model is used to determine the node features of the nodes and the edge features of the connecting edges and to determine the classification result of the target object. The feature processing model may be understood as one kind of neural network model; for example, the feature processing model of the embodiment of the present application may adopt a graph convolutional network (GCN), a graph neural network (GNN), and the like. In an optional embodiment, the feature processing model may adopt a graph convolution network (or graph convolution model), where the graph convolution model includes convolution layers and fully connected layers (FC). The convolution layers are used to extract the node features of the feature map and the edge features of the connecting edges and to optimize them; for example, the extracted node features and edge features may be reduced in dimension so that the target objects can subsequently be classified. Each node of a fully connected layer is connected with all nodes of the previous layer and is used to integrate the features extracted by the convolution layers so as to determine the classification result. In one example, the feature processing model may adopt an evolving graph convolutional network (E-GCN) among the GCN models; in the embodiment of the present application, four convolution layers and two fully connected layers may be set to process the data and obtain the classification result of the target object. A graph convolution model can maintain fairly high analysis accuracy even before it is trained; therefore, in the embodiment of the application, the graph convolution model may be a trained or an untrained graph convolution model, configured according to requirements.
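The following is a minimal sketch in the spirit of the four-convolution-layer, two-fully-connected-layer architecture mentioned above. It is a simplified node-feature GCN: the edge features described in the application are omitted for brevity, and the layer widths, the adjacency normalization and the number of categories are assumptions.

```python
# Illustrative sketch of a graph convolution classifier over the feature map.
import torch
import torch.nn as nn

class GraphConv(nn.Module):
    def __init__(self, d_in, d_out):
        super().__init__()
        self.lin = nn.Linear(d_in, d_out)

    def forward(self, x, adj):
        # adj: row-normalized adjacency matrix with self-loops, shape (N, N);
        # each node aggregates its neighbors' features, then linear map + ReLU.
        return torch.relu(self.lin(adj @ x))

class FeatureGraphClassifier(nn.Module):
    def __init__(self, d_in=2048, hidden=256, num_classes=128):
        super().__init__()
        self.convs = nn.ModuleList(
            [GraphConv(d_in, hidden)] + [GraphConv(hidden, hidden) for _ in range(3)]
        )
        self.fc = nn.Sequential(
            nn.Linear(hidden, hidden), nn.ReLU(), nn.Linear(hidden, num_classes)
        )

    def forward(self, x, adj):
        for conv in self.convs:   # four convolution layers
            x = conv(x, adj)
        return self.fc(x)         # two fully connected layers -> per-node class logits
```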
In addition, during the training of the feature processing model, the features extracted by the convolution layers can be integrated through the fully connected layers, after which a classification result and a corresponding loss function are determined, so that the feature processing model can be adjusted in reverse according to the loss function to obtain a trained feature processing model. In some existing training procedures, the loss function determined by the fully connected layer only considers whether the classification result is correct, without considering the probability of the target object belonging to each category; adjusting the edge features of the feature map in this way easily causes classification errors. Specifically, as an optional embodiment, the method further includes: determining, through the fully connected layers of the feature processing model, the probability that the target object in each frame image corresponds to each category, and determining the corresponding cross entropy as the loss function. Cross entropy is mainly used to measure the dissimilarity between two probability distributions. During the training of the feature processing model, the corresponding cross entropy can be determined as the loss function according to the probability that the target object corresponds to each category; the probabilities for the different categories are thereby taken into account, so that the edge features can be adjusted according to the loss function and a more accurate classification result can be obtained.
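As a small illustration of this loss, the cross entropy can be computed from the per-category scores produced by the fully connected layers; the tensor shapes below are placeholders, not values from the application.

```python
# Illustrative sketch: cross entropy over the probabilities of each category.
import torch
import torch.nn.functional as F

logits = torch.randn(6, 128, requires_grad=True)  # per-node class scores (placeholder)
labels = torch.tensor([0, 0, 1, 1, 2, 2])         # true category of each node (placeholder)
loss = F.cross_entropy(logits, labels)  # dissimilarity of predicted and true distributions
loss.backward()                         # gradients drive the reverse adjustment
```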
After the target objects are classified, target objects belonging to the same category in different frame images can be connected to obtain the corresponding movement tracks. Specifically, as an optional embodiment, determining the movement track of a target object in the video data includes: connecting the target objects in the frame images according to the classification results of the target objects in the frame images to obtain the movement track of the target object in the video data. After the movement track is determined, other data processing may be performed according to it. For example, in a motion prediction scenario, the next movement track of the target object may be predicted according to the movement track; in an automatic driving scenario, the driving instruction the vehicle needs to execute may be determined according to the movement track; and in a road security scenario, whether a vehicle or a pedestrian violates a traffic rule may be determined according to the movement track, and a behavior analysis result may then be output.
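Connecting classified detections into tracks can be sketched as a simple grouping step; the record layout (frame index, detection frame, category) is an assumption for illustration.

```python
# Illustrative sketch: connect target objects that share a category across
# frame images, in frame order, to obtain their movement tracks.
from collections import defaultdict

def link_tracks(detections):
    """detections: list of dicts with keys 'frame', 'box' and 'category'."""
    tracks = defaultdict(list)
    for det in sorted(detections, key=lambda d: d["frame"]):
        tracks[det["category"]].append((det["frame"], det["box"]))
    return dict(tracks)  # category -> ordered list of (frame, detection frame)
```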
In addition, during the training of the feature processing model, a corrected movement track can be determined by manual correction, so that the loss function of the feature processing model can be determined from the corrected movement track. Specifically, as an optional embodiment, the method further includes: providing a correction page to show the movement track of the target object; and acquiring correction information for the movement track of the target object and determining a correction result, so as to determine the loss function of the feature processing model from the correction result. During training, a corresponding movement track can be determined according to the classification result output by the feature processing model and displayed in the correction page. The correction page can also include a calibration control for the movement track; by triggering the calibration control, an analyst can correct the movement track of the target object in the video, so that a corresponding correction result is determined. The loss function can then be determined according to the difference between the correction result (the corrected movement track) and the movement track of the target object, so that the feature processing model is adjusted in reverse through the loss function and its accuracy is improved.
In the embodiment of the application, a frame image of video data can be obtained, and a feature vector and a frame image combination of a target object in a detection frame of the frame image are determined; then, according to the feature vectors of the target objects in the frame image combination, the feature similarity between the target objects in the frame image combination is determined, and the spatial similarity between the target objects is determined according to the detection frames; then, according to the feature similarity and the spatial similarity between the target objects, associations can be established for the same target object in different frame images to form corresponding feature maps. A classification result corresponding to the target object is determined according to the feature map, and the movement track of the target object in the video data is obtained. In the data association process, data association can be performed according to both the feature similarity and the spatial similarity between the target objects, so that both the association of features and the association of spatial positions between the target objects are considered; a more accurate classification can thus be obtained, and the accuracy of the analysis of the movement track of the target objects in the video data is improved.
On the basis of the foregoing embodiments, an embodiment of the present application further provides a data processing method, which can be applied to a processing end, and specifically, as shown in fig. 3, the method includes:
step 302, acquiring a frame image of the video data.
Step 304, determining a graph group queue, and adding a target number of consecutive frame images to the graph group queue to form a frame image combination. As an optional embodiment, this embodiment may adopt a dynamic evolution manner to continuously update the frame image combination; specifically, the step of updating the frame images in the frame image combination includes: deleting at least one frame image from the graph group queue and adding a corresponding number of frame images to the graph group queue to form a frame image combination.
Step 306, determining a detection frame in the frame image, and extracting a detection image in the detection frame.
Step 308, extracting the features of the target object in the detection image to obtain a feature vector.
Step 310, screening out a first object related to the target object of the target frame image according to the feature vectors of the target objects in the frame image combination, and screening out a second object related to the first object.
Step 312, determining the feature vectors of the first object and the second object to form a node.
Step 314, determining feature similarity between the nodes according to the feature vectors of the nodes, and determining the nodes adjacent to the feature vectors according to the feature similarity to add the first connecting edge.
Step 316, determining the spatial similarity among the nodes according to the degree of coincidence among the detection frames of the nodes, determining the nodes meeting the coincidence condition, and adding a second connecting edge to form the feature map of the target object in the corresponding target frame image.
Step 318, inputting the feature map into the feature processing model, and determining the classification result of the target object.
Step 320, connecting the target objects in the frame images according to the classification results of the target objects in the frame images to obtain the movement track of the target objects in the video data.
In the embodiment of the application, frame images of video data can be acquired. Then, on one hand, a graph group queue can be determined, and a target number of consecutive frame images can be added to the graph group queue to form a frame image combination; on the other hand, the detection frames used for locating target objects in the frame images can be determined, and feature extraction can be performed on the target objects in the detection frames to obtain their feature vectors. Then, the first objects and second objects corresponding to the target object in the target frame image are screened out according to the feature vectors of the target objects in the frame image combination, and the corresponding feature vectors are extracted to form nodes; then, according to the feature similarity between the nodes and the degree of coincidence of the detection frames, connecting edges are added between the nodes to establish data associations between them and form a corresponding feature map. A classification result corresponding to the target object is determined according to the feature map, and the movement track of the target object in the video data is obtained.
On the basis of the foregoing embodiment, an embodiment of the present application further provides a data processing method that can be applied to a processing end, where the processing end can be understood as a device for receiving and analyzing video data. The method can analyze the target objects in different frame images of road video data, determine the detection frames and feature vectors of the target objects, perform data association on the target objects in the different frame images, and then determine the classification results of the target objects so as to determine the movement track of a target object in the road video data; whether the target object has violated regulations can then be determined according to the movement track to obtain a behavior analysis result. Specifically, as shown in fig. 4, the method includes:
step 402, obtaining a frame image of the road video data, and determining a feature vector and a frame image combination of a target object in a detection frame of the frame image. As an optional embodiment, the road video data may be acquired by a road video capture component, and specifically, the method further includes: and acquiring road video data according to the road video acquisition assembly. The road video acquisition assembly may be a camera disposed on the road. As an alternative embodiment, determining a feature vector of a target object in a detection frame of a frame image includes: determining a detection frame in the frame image, and extracting a detection image in the detection frame; and extracting the characteristics of the target object in the detection image to obtain a characteristic vector. The detection frame is used for positioning a target object, and the target object in the road video data can be a vehicle, a pedestrian, an obstacle, an animal and the like. As an alternative embodiment, the step of determining a combination of frame images includes: and acquiring continuous frame images of a target number as frame image combinations, and updating the frame images in the frame image combinations after determining the feature images corresponding to the frame image combinations. As an optional example, the acquiring of the target number of consecutive frame images, as a combination of frame images, includes: acquiring continuous frame images of a target number; and determining a graph group queue, and adding a target number of continuous frame images to the graph group queue to form a frame image combination. The updating of the frame images in the frame image combination comprises: at least one frame image is deleted from the image group queue and a corresponding number of frame images are added to the image group queue to form an updated frame image combination.
Step 404, determining a feature map according to the similarities between target objects in the frame image combination, where the similarities include feature similarity and/or spatial similarity, and the feature map is used to associate the same target object in different frame images. As an optional embodiment, step 404 specifically includes: screening out related objects corresponding to the target object in the target frame image according to the feature vectors of the target objects in the frame image combination, and extracting the corresponding feature vectors to form nodes; and adding connecting edges between the nodes according to the similarity between the nodes to form the feature map corresponding to the target object in the target frame image. As an optional embodiment, screening out the related objects corresponding to the target object in the target frame image according to the feature vectors of the target objects in the frame image combination, and extracting the corresponding feature vectors to form nodes, includes: screening out first objects related to the target object of the target frame image according to the feature vectors of the target objects in the frame image combination, and screening out second objects related to the first objects; and determining the feature vectors of the first objects and the second objects to form nodes. As an optional embodiment, adding connecting edges between the nodes according to the similarity between the nodes includes: determining the feature similarity between the nodes according to the feature vectors of the nodes, and determining the nodes adjacent in feature vector according to the feature similarity so as to add a first connecting edge; and determining the spatial similarity between the nodes according to the degree of coincidence between the detection frames of the nodes, and determining the nodes meeting the coincidence condition so as to add a second connecting edge.
Step 406, determining the classification result of each target object according to the feature map of each target object, so as to determine the movement track of the target object in the road video data. As an optional embodiment, determining the classification result of each target object according to the feature map of each target object includes: inputting the feature map into a feature processing model, and determining the classification result of the target object, where the feature processing model is used to determine the node features of the nodes and the edge features of the connecting edges, and to determine the classification result of the target object. As an optional embodiment, determining the movement track of the target object in the road video data includes: connecting the target objects in the frame images according to the classification results of the target objects in the frame images to obtain the movement track of the target object in the road video data.
Step 408, determining the behavior type of the target object according to the movement track so as to determine a corresponding behavior analysis result. Specifically, in one example, the movement track may be the movement track of a vehicle: the embodiment of the application may collect the movement track of the vehicle within a period of time (e.g., three seconds) so as to determine the running speed of the vehicle, and then determine whether the vehicle violates regulations (e.g., speeding) according to its running speed, so as to output a behavior analysis result. In another example, the movement track may be the movement track of a pedestrian: the embodiment of the application may determine the walking direction of the pedestrian according to the pedestrian's movement track, so as to determine whether the pedestrian is moving in a wrong direction, and output a corresponding behavior analysis result.
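The speed check in the first example can be sketched as follows; the frame rate, the pixel-to-meter scale and the speed limit are illustrative assumptions, since the application does not fix these values.

```python
# Illustrative sketch: estimate a vehicle's speed from its movement track
# over a short window and flag speeding for the behavior analysis result.
def analyze_speed(track, fps=25.0, meters_per_pixel=0.05, limit_kmh=60.0):
    """track: ordered list of (frame_idx, (x1, y1, x2, y2)) for one vehicle."""
    (f0, b0), (f1, b1) = track[0], track[-1]
    center = lambda b: ((b[0] + b[2]) / 2.0, (b[1] + b[3]) / 2.0)
    (x0, y0), (x1c, y1c) = center(b0), center(b1)
    dist_m = ((x1c - x0) ** 2 + (y1c - y0) ** 2) ** 0.5 * meters_per_pixel
    dt_s = (f1 - f0) / fps
    speed_kmh = (dist_m / dt_s * 3.6) if dt_s > 0 else 0.0
    return {"speed_kmh": speed_kmh, "speeding": speed_kmh > limit_kmh}
```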
The implementation of this embodiment is similar to the implementation of the foregoing embodiment, and for the specific implementation, reference may be made to the specific implementation of the foregoing embodiment, which is not described herein again.
In the embodiment of the application, road video data can be acquired through a road camera; a frame image of the road video data is then acquired, and the feature vector of a target object in a detection frame of the frame image and a frame image combination are determined. Then, the related objects corresponding to the target object in the target frame image are screened out according to the feature vectors of the target objects in the frame image combination, and the corresponding feature vectors are extracted to form nodes; then, according to the feature similarity and the spatial similarity between the nodes, connecting edges are added between the nodes to establish data associations between them and form a corresponding feature map. A classification result corresponding to the target object is determined according to the feature map, and the movement track of the target object in the video data is obtained. Then, according to the movement track, it is determined whether the target object has violated regulations (such as driving in the wrong direction or failing to obey traffic signals) to obtain a behavior analysis result.
On the basis of the foregoing embodiments, the present application provides a data processing method that can be applied to a processing end, where the processing end can be understood as a device for acquiring video data and analyzing the video data. The method can analyze the target objects in different frame images of live video data, determine the detection frames and feature vectors of the target objects, perform data association on the target objects in the different frame images, and then determine the classification results of the target objects so as to determine the movement track of a target object in the video data; virtual information can then be added to the target object in the video data according to the movement track, thereby achieving an augmented reality effect. Specifically, as shown in fig. 5, the method includes:
step 502, acquiring a frame image of live video data, and determining a feature vector and a frame image combination of a target object in a detection frame of the frame image.
Step 504, determining a feature map according to the similarity between the target objects in the frame image combination, where the similarity includes feature similarity and/or spatial similarity, and the feature map is used for associating the same target object in different frame images. Specifically, as an optional embodiment, step 504 specifically includes: screening out related objects corresponding to the target object in the target frame image according to the feature vectors of the target objects in the frame image combination, and extracting the corresponding feature vectors to form nodes; and adding connecting edges between the nodes according to the similarity between the nodes to form the feature map corresponding to the target object in the target frame image.
Step 506, determining the classification result of each target object according to the feature map of each target object so as to determine the movement track of the target object in the live video data.
Step 508, adding virtual information to the target object in the live video data according to the movement track.
The implementation manner of this embodiment is similar to that of the above embodiment, and the detailed implementation manner of the above embodiment may be referred to, and is not described herein again.
In this embodiment, virtual information may be added to a target object in a live video according to augmented reality (AR) technology, a technology that fuses virtual information with the real world. In the embodiment of the application, a frame image of the live video data can be obtained, and the feature vector of a target object in a detection frame of the frame image and a frame image combination are determined; then, the related objects corresponding to the target object in the target frame image are screened out according to the feature vectors of the target objects in the frame image combination, and the corresponding feature vectors are extracted to form nodes; then, according to the feature similarity and the spatial similarity between the nodes, connecting edges are added between the nodes to establish data associations between them and form a corresponding feature map. A classification result corresponding to the target object is determined according to the feature map, and the movement track of the target object in the video data is obtained. Virtual information can then be added to the target object according to the movement track, improving the live broadcast effect. For example, virtual information of face decoration (such as sunglasses) can be added to the anchor's face according to the movement track of the anchor's face in the live video, achieving the effect of combining the virtual with the real.
On the basis of the above embodiments, the present application provides a data processing method that can be applied to a processing end, where the processing end can be understood as a device that acquires video data and analyzes it. The method can be applied to an automatic driving scene: during driving, target objects (such as vehicles, pedestrians, and obstacles) in different frame images of the driving video data are analyzed, a detection frame and a feature vector are determined for each target object, data association is performed on the target objects across the different frame images, and a classification result is determined for each target object so as to determine its moving track in the video data. A driving instruction is then determined according to the moving track to control the driving of the vehicle. Specifically, as shown in fig. 6A, the method includes:
Step 602, acquiring a frame image of the driving video data, and determining a feature vector and a frame image combination of a target object in a detection frame of the frame image.
Step 604, determining a feature map according to the similarity between the target objects in the frame image combination, wherein the similarity includes feature similarity and/or spatial similarity, and the feature map is used for associating the same target object in different frame images. Specifically, as an optional embodiment, the step 604 specifically includes: screening out related objects corresponding to the target object in the target frame image according to the feature vectors of the target objects in the frame image combination, and extracting the corresponding feature vectors to form nodes; and adding connecting edges between the nodes according to the similarity between the nodes to form a feature map corresponding to the target object in the target frame image.
Step 606, determining the classification result of each target object according to the feature map of each target object, so as to determine the moving track of each target object in the driving video data.
Step 608, determining a driving instruction according to the moving track, so as to control the vehicle to run.
According to the embodiment of the application, the spatial moving track of the target object in three-dimensional space can be determined from its moving track in the driving video data combined with its depth information, and the corresponding driving instruction can then be determined. Specifically, as an optional embodiment, the determining a driving instruction according to the moving track includes: acquiring depth information of the target object in each frame image; determining a spatial moving track of the target object according to the depth information and the moving track of the target object; and determining a driving instruction according to the spatial moving track so as to control the vehicle to run. A depth recognition model can be preset, the depth information of the target object in each frame image is recognized with the depth recognition model, and the spatial moving track in three-dimensional space is then determined according to the depth information and the moving track of the target object in the video data. A driving instruction may then be determined according to the spatial moving track of the target object and the spatial moving track of the vehicle in three-dimensional space. Specifically, as an optional embodiment, the determining a driving instruction according to the spatial moving track includes: determining a moving path of the vehicle according to the spatial moving track of the target object; and determining a corresponding driving instruction according to the moving path so as to control the vehicle to run. In this embodiment, behaviors such as overtaking and following can be planned according to the spatial moving track of each target object so as to determine a corresponding moving path and a corresponding driving instruction, where the driving instruction may include instructions such as turning left, turning right, accelerating, decelerating, and braking, so as to control the vehicle.
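The application does not fix how the depth information and the 2D moving track are fused; a minimal sketch, assuming a calibrated pinhole camera whose intrinsics fx, fy, cx, cy are known, back-projects each tracked position into camera coordinates:

```python
import numpy as np

def spatial_moving_track(track_2d, depths, fx, fy, cx, cy):
    """Back-project a 2D moving track into 3D camera coordinates using the
    per-frame depth of the target object and a pinhole camera model."""
    points = []
    for (u, v), z in zip(track_2d, depths):
        x = (u - cx) * z / fx  # lateral offset
        y = (v - cy) * z / fy  # vertical offset
        points.append((x, y, z))
    return points
```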
The implementation of this embodiment is similar to that of the foregoing embodiments; reference may be made to the detailed implementation described above, which is not repeated here.
In the embodiment of the application, a frame image of the driving video data can be acquired, and a feature vector and a frame image combination of the target object in a detection frame of the frame image are determined. Then, related objects corresponding to the target object in the target frame image are screened out according to the feature vectors of the target objects in the frame image combination, and the corresponding feature vectors are extracted to form nodes. Connecting edges are then added between the nodes according to the feature similarity and the spatial similarity between them, establishing data association and forming a corresponding feature map. A classification result corresponding to the target object is determined according to the feature map, and the moving track of the target object in the video data is obtained. Corresponding driving instructions, such as deceleration, acceleration, turning, and braking, can then be determined according to the moving track so as to control the running of the vehicle.
Fig. 6B provides a schematic comparison between determining a target moving track by feature association alone and by the method of an embodiment of the present application. As shown on the left side of fig. 6B, the vehicle 2 is unoccluded in the first and third frame images but partially occluded in the second. When data association relies on feature association alone, the similarity between the features of the vehicle 2 in the second frame image and its features in the other two frame images is therefore low, and the association may fail: only the vehicle 2 in the first and third frame images is associated, yielding a moving track such as trajectory diagram 1.
In the process of determining the moving track by the method of this embodiment (feature similarity plus spatial similarity), shown on the right side of fig. 6B, the vehicle 2 is unoccluded in the first and third frames and can be associated by feature similarity. In the partially occluded second frame, the coincidence degree between the detection frames of the target object in the first and second frames establishes the spatial association, so that the vehicle 2 is associated across all three frames and its moving track is determined (as in trajectory diagram 2). Therefore, in an automatic driving scene, performing multi-target trajectory identification with both feature and spatial association reduces the probability of position jumps in a target's trajectory, determines the relative position between an obstacle and the vehicle more accurately, and improves the safety of automatic driving.
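The coincidence degree between detection frames is naturally read as intersection-over-union (IoU); a minimal sketch under that reading (the application does not name a specific overlap measure):

```python
def iou(box_a, box_b):
    """Coincidence degree of two detection frames given as (x1, y1, x2, y2)."""
    ix1, iy1 = max(box_a[0], box_b[0]), max(box_a[1], box_b[1])
    ix2, iy2 = min(box_a[2], box_b[2]), min(box_a[3], box_b[3])
    inter = max(0, ix2 - ix1) * max(0, iy2 - iy1)
    area_a = (box_a[2] - box_a[0]) * (box_a[3] - box_a[1])
    area_b = (box_b[2] - box_b[0]) * (box_b[3] - box_b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0
```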
On the basis of the above embodiments, the present application provides a data processing method that can be applied to a processing end, where the processing end can be understood as a device that interacts with a terminal to receive video data and analyze it. The method can provide an interactive page for the terminal so that a user can conveniently upload the video data to be analyzed; after the uploaded video data is received, the moving track of each target object can be analyzed to form an analysis result, and the analysis result is issued through the interactive page. Specifically, as shown in fig. 7A, the method includes:
Step 702, providing an interactive page to acquire video data to be processed based on the interactive page.
Step 704, obtaining a frame image of the video data, and determining a feature vector of a target object in a detection frame of the frame image and a frame image combination.
Step 706, determining a feature map according to the similarity between the target objects in the frame image combination, wherein the similarity includes feature similarity and/or spatial similarity, and the feature map is used for associating the same target object in different frame images.
Step 708, determining a classification result of each target object according to the feature map of each target object to determine a movement track of the target object in the video data as an analysis result, and issuing the analysis result.
As shown in fig. 7B, in the embodiment of the present application, the processing end may provide an interactive page to the terminal, where the interactive page includes a data upload control, and a user of the terminal may upload the video data to be processed to the processing end by triggering the data upload control. The processing end can acquire a frame image of the video data and determine a feature vector and a frame image combination of a target object in a detection frame of the frame image; then, according to the feature vectors of the target objects in the frame image combination, determine the feature similarity between the target objects in the frame image combination, and determine the spatial similarity between the target objects according to the detection frames; and then, according to the feature similarity and the spatial similarity between the target objects, establish association for the same target object in different frame images to form corresponding feature maps. A classification result corresponding to the target object is then determined according to the feature map, the moving track of the target object in the video data is obtained as an analysis result, and the analysis result is issued so as to be displayed in the interactive page. In this way, the processing end provides the terminal with a target trajectory recognition service, making it convenient for the user to recognize the trajectory of a target in video data.
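As a hedged illustration only (the application specifies no transport or framework), the upload-and-issue interaction might be exposed as an HTTP endpoint; Flask is used here purely for illustration, and run_trajectory_analysis is a hypothetical entry point into the pipeline described above:

```python
from flask import Flask, jsonify, request  # one possible web stack, not mandated by the application

app = Flask(__name__)

@app.route("/analyze", methods=["POST"])
def analyze():
    # The interactive page's data upload control posts the video here.
    video_bytes = request.files["video"].read()
    tracks = run_trajectory_analysis(video_bytes)  # hypothetical pipeline entry point
    return jsonify({"analysis_result": tracks})    # issued back for display in the page
```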
It should be noted that, for simplicity of description, the method embodiments are described as a series of action combinations, but those skilled in the art will recognize that the embodiments of the application are not limited by the order of actions described, as some steps may be performed in other orders or concurrently. Further, those skilled in the art will also appreciate that the embodiments described in the specification are preferred embodiments, and the actions involved are not necessarily required by the embodiments of the application.
On the basis of the foregoing embodiment, this embodiment further provides a data processing apparatus, and with reference to fig. 8, the data processing apparatus may specifically include the following modules:
the video data obtaining module 802 is configured to obtain a frame image of the video data, and determine a feature vector of a target object in a detection frame of the frame image and a combination of the frame image.
The feature map obtaining module 804 is configured to determine a feature map according to similarities between target objects in a frame image combination, where the similarities include feature similarities and/or spatial similarities, and the feature map is used to associate the same target object in different frame images. As an alternative embodiment, the feature map obtaining module 804 includes: the related node acquisition module, which is used for screening out related objects corresponding to the target object in the target frame image according to the feature vectors of the target objects in the frame image combination and extracting the corresponding feature vectors to form nodes; and the association relation acquisition module, which is used for adding connecting edges between the nodes according to the similarity between the nodes to form a feature map corresponding to the target object in the target frame image.
A moving track obtaining module 806, configured to determine a classification result of each target object according to the feature map of each target object, so as to determine a moving track of the target object in the video data.
In summary, in the embodiment of the present application, a frame image of video data may be acquired, and a feature vector and a frame image combination of a target object in a detection frame of the frame image are determined. Then, the feature similarity between the target objects in the frame image combination is determined according to their feature vectors, and the spatial similarity between the target objects is determined according to the detection frames. According to the feature similarity and the spatial similarity between the target objects, association can then be established for the same target object in different frame images to form corresponding feature maps. A classification result corresponding to the target object is determined according to the feature map, and the moving track of the target object in the video data is obtained. Because data association is performed according to both the feature similarity and the spatial similarity between the target objects, both the association of features and the association of spatial positions are taken into account, yielding a more accurate classification and improving the accuracy of the moving-track analysis of the target object in the video data.
On the basis of the foregoing embodiment, this embodiment further provides a data processing apparatus, which may specifically include the following modules:
and the frame image acquisition processing module is used for acquiring a frame image of the video data.
And the frame image combination acquisition processing module is used for determining the image group queue and adding a target number of continuous frame images into the image group queue to form a frame image combination. As an optional embodiment, this embodiment may adopt a dynamic evolution manner to continuously update the frame image combination, and specifically, the frame image combination obtaining processing module is further configured to delete at least one frame image from the image group queue and add a corresponding number of frame images to the image group queue to form the frame image combination.
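A minimal sketch of such a dynamically evolving image group queue, assuming one frame is deleted and one added per update (the application allows other counts):

```python
from collections import deque

class ImageGroupQueue:
    """Holds a target number of consecutive frame images as the current
    frame image combination and evolves dynamically as frames arrive."""

    def __init__(self, target_number):
        self.queue = deque(maxlen=target_number)

    def push(self, frame_image):
        # When the queue is full, appending automatically deletes the oldest
        # frame image, matching the delete-then-add update described above.
        self.queue.append(frame_image)

    def combination(self):
        return list(self.queue)
```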
And the detection image acquisition processing module is used for determining a detection frame in the frame image and extracting a detection image in the detection frame.
And the feature vector acquisition processing module is used for extracting the features of the target object in the detection image to obtain a feature vector.
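As a non-limiting sketch of detection-image cropping and feature extraction, assuming a generic CNN embedder (the application does not name a backbone; ResNet-18 and the 224x224 input size are purely illustrative):

```python
import numpy as np
import torch
import torchvision.transforms as T
from torchvision.models import resnet18

backbone = resnet18(weights=None)      # illustrative backbone, untrained here
backbone.fc = torch.nn.Identity()      # drop the classifier, keep the 512-d embedding
backbone.eval()
preprocess = T.Compose([T.ToPILImage(), T.Resize((224, 224)), T.ToTensor()])

@torch.no_grad()
def feature_vector(frame: np.ndarray, detection_frame) -> np.ndarray:
    """Crop the detection image located by the detection frame and embed it."""
    x1, y1, x2, y2 = detection_frame
    detection_image = frame[y1:y2, x1:x2]
    tensor = preprocess(detection_image).unsqueeze(0)
    return backbone(tensor).squeeze(0).numpy()
```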
And the related object screening processing module is used for screening out a first object related to the target object of the target frame image and screening out a second object related to the first object according to the feature vector of the target object in the frame image combination.
And the related node acquisition processing module is used for determining the feature vectors of the first object and the second object to form a node.
And the first connection processing module is used for determining the feature similarity between the nodes according to the feature vector of each node and determining the adjacent nodes of the feature vector according to the feature similarity so as to add a first connection edge.
And the second connection processing module is used for determining the spatial similarity between the nodes according to the coincidence degree between the detection frames of the nodes, and determining the nodes meeting the coincidence condition so as to add a second connecting edge, forming a feature map corresponding to the target object in the target frame image.
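Putting the two edge types together, a hedged sketch of feature-map construction, reusing the iou helper sketched earlier and assuming L2-normalised feature vectors and illustrative thresholds:

```python
import numpy as np

def build_feature_map(features, boxes, k=3, iou_threshold=0.5):
    """features: (N, D) array of L2-normalised node feature vectors;
    boxes: list of N detection frames. Returns a set of typed edges."""
    edges = set()
    similarity = features @ features.T  # cosine similarity for unit vectors
    n = len(boxes)
    for i in range(n):
        # First connecting edges: feature-similar neighbour nodes of node i.
        for j in np.argsort(similarity[i])[::-1][1:k + 1]:
            edges.add((min(i, int(j)), max(i, int(j)), "feature"))
        # Second connecting edges: detection frames meeting the coincidence condition.
        for j in range(i + 1, n):
            if iou(boxes[i], boxes[j]) >= iou_threshold:
                edges.add((i, j, "spatial"))
    return edges
```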
And the feature extraction processing module is used for inputting the feature map into the feature processing model and determining the classification result of the target object.
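The application leaves the feature processing model's architecture open; the following is a minimal message-passing sketch in PyTorch, where neighbour features refine each node and each connecting edge is classified as linking the same target object or not (all dimensions and names are illustrative):

```python
import torch
import torch.nn as nn

class FeatureProcessingModel(nn.Module):
    def __init__(self, dim=512, hidden=128):
        super().__init__()
        self.node_mlp = nn.Sequential(nn.Linear(dim * 2, hidden), nn.ReLU())
        self.edge_mlp = nn.Sequential(
            nn.Linear(hidden * 2, hidden), nn.ReLU(), nn.Linear(hidden, 2))

    def forward(self, x, edge_index):
        # x: (N, dim) node features; edge_index: (2, E) node index pairs
        src, dst = edge_index
        agg = torch.zeros_like(x)
        agg.index_add_(0, dst, x[src])  # sum messages from neighbouring nodes
        h = self.node_mlp(torch.cat([x, agg], dim=-1))  # refined node features
        return self.edge_mlp(torch.cat([h[src], h[dst]], dim=-1))  # per-edge logits
```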
And the moving track acquisition processing module is used for connecting the target objects in the frame images according to the classification result of the target objects in the frame images to obtain the moving track of the target objects in the video data.
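One way to realize the connecting step is to treat "same target object" classifications as equivalences and group detections with a union-find structure; a sketch, with all names illustrative:

```python
class UnionFind:
    def __init__(self, n):
        self.parent = list(range(n))

    def find(self, i):
        while self.parent[i] != i:
            self.parent[i] = self.parent[self.parent[i]]  # path halving
            i = self.parent[i]
        return i

    def union(self, i, j):
        self.parent[self.find(i)] = self.find(j)

def movement_tracks(detections, same_object_pairs):
    """detections: list of (frame_index, detection_frame); same_object_pairs:
    edges the classification result labels as the same target object."""
    uf = UnionFind(len(detections))
    for i, j in same_object_pairs:
        uf.union(i, j)
    tracks = {}
    for idx, det in enumerate(detections):
        tracks.setdefault(uf.find(idx), []).append(det)
    return [sorted(t, key=lambda d: d[0]) for t in tracks.values()]
```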
In the embodiment of the application, frame images of the video data can be acquired; then, on one hand, an image group queue can be determined, and a target number of continuous frame images are added to the image group queue to form a frame image combination; on the other hand, a detection frame used for positioning the target object in the frame image can be determined, and feature extraction is performed on the target object in the detection frame to obtain a feature vector of the target object. Then, a first object and a second object corresponding to the target object in the target frame image are screened out according to the feature vectors of the target objects in the frame image combination, and the corresponding feature vectors are extracted to form nodes; connecting edges are then added between the nodes according to the feature similarity between the nodes and the coincidence degree of the detection frames, establishing data association between the nodes and forming a corresponding feature map. A classification result corresponding to the target object is determined according to the feature map, and the moving track of the target object in the video data is obtained.
On the basis of the foregoing embodiment, this embodiment further provides a data processing apparatus, and with reference to fig. 9, the data processing apparatus may specifically include the following modules:
the video data determining module 902 is configured to obtain a frame image of the road video data, and determine a feature vector and a frame image combination of a target object in a detection frame of the frame image. As an optional embodiment, the road video data may be acquired by a road video capture component; specifically, the apparatus further includes: the video data acquisition processing module, which is used for acquiring road video data according to the road video acquisition assembly. The road video acquisition assembly may be a camera disposed on the road. As an optional embodiment, the video data determining module 902 specifically includes: determining a detection frame in the frame image, and extracting a detection image in the detection frame; and extracting the features of the target object in the detection image to obtain a feature vector. The detection frame is used for positioning a target object, and the target object in the road video data can be a vehicle, a pedestrian, an obstacle, an animal, and the like. As an optional embodiment, the video data determining module 902 specifically includes: acquiring a target number of continuous frame images as a frame image combination, and updating the frame images in the frame image combination after determining the feature maps corresponding to the frame image combination. As an optional example, the video data determining module 902 specifically includes: acquiring a target number of continuous frame images; and determining an image group queue, and adding the target number of continuous frame images into the image group queue to form a frame image combination. The updating of the frame images in the frame image combination includes: deleting at least one frame image from the image group queue and adding a corresponding number of frame images to the image group queue to form an updated frame image combination.
The feature map determining module 904 is configured to determine a feature map according to the similarity between target objects in a frame image combination, where the similarity includes feature similarity and/or spatial similarity, and the feature map is used to associate the same target object in different frame images. As an optional embodiment, the feature map determining module 904 specifically includes: the relevant node determining module, which is used for screening out relevant objects corresponding to the target object in the target frame image according to the feature vectors of the target objects in the frame image combination and extracting the corresponding feature vectors to form nodes; and the association relation determining module, which is used for adding connecting edges among the nodes according to the similarity among the nodes to form a feature map corresponding to the target object in the target frame image. As an optional embodiment, the relevant node determining module specifically includes: screening out a first object related to the target object of the target frame image according to the feature vectors of the target objects in the frame image combination, and screening out a second object related to the first object; and determining the feature vectors of the first object and the second object to form nodes. As an optional embodiment, the association relation determining module specifically includes: determining the feature similarity among the nodes according to the feature vectors of the nodes, and determining adjacent nodes of the feature vectors according to the feature similarity so as to add a first connecting edge; and determining the spatial similarity among the nodes according to the coincidence degree among the detection frames of the nodes, and determining the nodes meeting the coincidence condition so as to add a second connecting edge.
A moving track determining module 906, configured to determine a classification result of each target object according to the feature map of each target object, so as to determine the moving track of the target object in the road video data. As an optional embodiment, the moving track determining module 906 specifically includes: inputting the feature map into a feature processing model, and determining a classification result of the target object, where the feature processing model is used for determining node features of the nodes and edge features of the connecting edges and determining the classification result of the target object. As an optional embodiment, the moving track determining module 906 specifically includes: connecting the target objects in the frame images according to the classification results of the target objects in the frame images to obtain the moving track of the target object in the road video data.
A behavior result determining module 908, configured to determine a behavior type of the target object according to the movement trajectory to determine a corresponding behavior analysis result.
In the embodiment of the application, a frame image of the road video data can be acquired, and a feature vector and a frame image combination of the target object in a detection frame of the frame image are determined. Then, related objects corresponding to the target object in the target frame image are screened out according to the feature vectors of the target objects in the frame image combination, and the corresponding feature vectors are extracted to form nodes; connecting edges are then added between the nodes according to the feature similarity and the spatial similarity between the nodes, establishing data association and forming a corresponding feature map. A classification result corresponding to the target object is determined according to the feature map, and the moving track of the target object in the video data is obtained. It can then be determined according to the moving track whether the target object exhibits a violation behavior (such as driving in the wrong direction or failing to obey traffic signals) to obtain a behavior analysis result.
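The application names wrong-way driving only as an example of a violation; as an illustrative rule (the direction-cosine threshold is an assumption, not taken from the application), a track can be flagged when its net displacement opposes the permitted lane direction:

```python
import numpy as np

def is_wrong_way(track, lane_direction, cos_threshold=-0.5):
    """Flag a moving track whose net displacement opposes the permitted
    lane direction (a vector along the lane)."""
    (x0, y0), (x1, y1) = track[0], track[-1]
    motion = np.array([x1 - x0, y1 - y0], dtype=float)
    norm = np.linalg.norm(motion)
    if norm < 1e-6:
        return False  # effectively stationary; no verdict
    lane = np.asarray(lane_direction, dtype=float)
    cosine = motion @ lane / (norm * np.linalg.norm(lane))
    return cosine < cos_threshold
```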
On the basis of the foregoing embodiment, this embodiment further provides a data processing apparatus, and with reference to fig. 10, the data processing apparatus may specifically include the following modules:
the video data obtaining module 1002 is configured to obtain a frame image of live video data, and determine a feature vector and a frame image combination of a target object in a detection frame of the frame image.
The feature map obtaining module 1004 is configured to determine a feature map according to similarities between target objects in the frame image combination, where the similarities include feature similarities and/or spatial similarities, and the feature map is used to associate the same target object in different frame images. As an optional embodiment, the feature map obtaining module 1004 specifically includes: the relevant node acquisition module, which is used for screening out relevant objects corresponding to the target object in the target frame image according to the feature vectors of the target objects in the frame image combination and extracting the corresponding feature vectors to form nodes; and the association relation obtaining module, which is used for adding connecting edges among the nodes according to the similarity among the nodes to form a feature map corresponding to the target object in the target frame image.
A moving track obtaining module 1006, configured to determine a classification result of each target object according to the feature map of each target object, so as to determine a moving track of the target object in the live video data.
And a virtual information adding module 1008, configured to add virtual information to the target object in the live video data according to the moving track.
This embodiment can add virtual information to a target object in a live video. In the embodiment of the application, a frame image of the live video data can be acquired, and a feature vector and a frame image combination of the target object in a detection frame of the frame image are determined. Then, related objects corresponding to the target object in the target frame image are screened out according to the feature vectors of the target objects in the frame image combination, and the corresponding feature vectors are extracted to form nodes; connecting edges are then added between the nodes according to the feature similarity and the spatial similarity between the nodes, establishing data association and forming a corresponding feature map. A classification result corresponding to the target object is determined according to the feature map, and the moving track of the target object in the video data is obtained. Virtual information can then be added to the target object according to the moving track, improving the live broadcast effect. For example, virtual face decoration (such as sunglasses) can be added to the face of the anchor according to the moving track of the anchor's face in the live video, achieving an effect that combines the virtual and the real.
On the basis of the foregoing embodiment, this embodiment further provides a data processing apparatus, and with reference to fig. 11, the data processing apparatus may specifically include the following modules:
the video data acquisition module 1102 is configured to acquire a frame image of the driving video data, and determine a feature vector and a frame image combination of a target object in a detection frame of the frame image.
The feature map acquisition module 1104 is configured to determine a feature map according to similarities between target objects in the frame image combination, where the similarities include feature similarities and/or spatial similarities, and the feature map is used to associate the same target object in different frame images. Specifically, as an optional embodiment, the feature map acquisition module 1104 specifically includes: the relevant node acquisition module, which is used for screening out relevant objects corresponding to the target object in the target frame image according to the feature vectors of the target objects in the frame image combination and extracting the corresponding feature vectors to form nodes; and the associated information acquisition module, which is used for adding connecting edges among the nodes according to the similarity among the nodes to form a feature map corresponding to the target object in the target frame image.
A moving track acquiring module 1106, configured to determine a classification result of each target object according to the feature map of each target object, so as to determine a moving track of the target object in the driving video data.
And the driving control module 1108 is configured to determine a driving instruction according to the moving track, so as to control the vehicle to run.
According to the embodiment of the application, the spatial moving track of the target object in three-dimensional space can be determined from its moving track in the driving video data combined with its depth information, and the corresponding driving instruction can then be determined. Specifically, as an optional embodiment, the driving control module 1108 specifically includes: the depth information acquisition processing module, which is used for acquiring the depth information of the target object in each frame image; the spatial moving track acquisition processing module, which is used for determining the spatial moving track of the target object according to the depth information and the moving track of the target object; and the driving instruction acquisition processing module, which is used for determining a driving instruction according to the spatial moving track so as to control the vehicle to run. A depth recognition model can be preset, the depth information of the target object in each frame image is recognized with the depth recognition model, and the spatial moving track in three-dimensional space is then determined according to the depth information and the moving track of the target object in the video data. A driving instruction may then be determined according to the spatial moving track of the target object and the spatial moving track of the vehicle in three-dimensional space; specifically, as an optional embodiment, the driving instruction acquisition processing module specifically includes: determining a moving path of the vehicle according to the spatial moving track of the target object; and determining a corresponding driving instruction according to the moving path so as to control the vehicle to run. In this embodiment, behaviors such as overtaking and following can be planned according to the spatial moving track of each target object so as to determine a corresponding moving path and a corresponding driving instruction, where the driving instruction may include instructions such as turning left, turning right, accelerating, decelerating, and braking, so as to control the vehicle.
In the embodiment of the application, a frame image of the driving video data can be acquired, and a feature vector and a frame image combination of the target object in a detection frame of the frame image are determined. Then, related objects corresponding to the target object in the target frame image are screened out according to the feature vectors of the target objects in the frame image combination, and the corresponding feature vectors are extracted to form nodes; connecting edges are then added between the nodes according to the feature similarity and the spatial similarity between the nodes, establishing data association and forming a corresponding feature map. A classification result corresponding to the target object is determined according to the feature map, and the moving track of the target object in the video data is obtained. Corresponding driving instructions, such as deceleration, acceleration, turning, and braking, can then be determined according to the moving track so as to control the running of the vehicle.
On the basis of the foregoing embodiment, the present embodiment further provides a data processing apparatus, and with reference to fig. 12, the data processing apparatus may specifically include the following modules:
an interactive page providing module 1202, configured to provide an interactive page, so as to obtain video data to be processed based on the interactive page.
The video data processing module 1204 is configured to obtain a frame image of the video data, and determine a feature vector of a target object in a detection frame of the frame image and a combination of the frame image.
The feature vector processing module 1206 is configured to determine a feature map according to similarities between target objects in the frame image combination, where the similarities include feature similarities and/or spatial similarities, and the feature map is used to associate the same target object in different frame images.
And the analysis result issuing module 1208 is configured to determine a classification result of each target object according to the feature map of each target object, determine a moving track of the target object in the video data as an analysis result, and issue the analysis result.
In the embodiment of the application, the processing end can provide an interactive page for the terminal, where the interactive page includes a data upload control, and a user of the terminal can upload the video data to be processed to the processing end by triggering the data upload control. The processing end can acquire a frame image of the video data and determine a feature vector and a frame image combination of a target object in a detection frame of the frame image; then determine the feature similarity between the target objects in the frame image combination according to their feature vectors, and determine the spatial similarity between the target objects according to the detection frames; and then, according to the feature similarity and the spatial similarity between the target objects, establish association for the same target object in different frame images to form corresponding feature maps. A classification result corresponding to the target object is then determined according to the feature map, the moving track of the target object in the video data is obtained as an analysis result, and the analysis result is issued so as to be displayed in the interactive page. In this way, the processing end provides the terminal with a target trajectory recognition service, making it convenient for the user to recognize the trajectory of a target in video data.
The present application further provides a non-transitory computer-readable storage medium storing one or more modules (programs); when the one or more modules are applied to a device, the device is caused to execute the instructions of the method steps in this application.
Embodiments of the present application provide one or more machine-readable media having instructions stored thereon, which when executed by one or more processors, cause an electronic device to perform the methods as described in one or more of the above embodiments. In the embodiment of the application, the electronic device includes a server, a terminal device and other devices.
Embodiments of the present disclosure may be implemented as an apparatus, which may comprise an electronic device such as a server (or server cluster) or a terminal, using any suitable hardware, firmware, software, or any combination thereof, in a desired configuration. Fig. 13 schematically illustrates an example apparatus 1300 that can be used to implement various embodiments described herein.
For one embodiment, fig. 13 illustrates an example apparatus 1300 having one or more processors 1302, a control module (chipset) 1304 coupled to at least one of the processor(s) 1302, memory 1306 coupled to the control module 1304, non-volatile memory (NVM)/storage 1308 coupled to the control module 1304, one or more input/output devices 1310 coupled to the control module 1304, and a network interface 1312 coupled to the control module 1304.
Processor 1302 may include one or more single-core or multi-core processors, and processor 1302 may include any combination of general-purpose or special-purpose processors (e.g., graphics processors, application processors, baseband processors, etc.). In some embodiments, the apparatus 1300 can serve as a server, a terminal, or the like in this embodiment.
In some embodiments, apparatus 1300 may include one or more computer-readable media (e.g., memory 1306 or NVM/storage 1308) having instructions 1314 and one or more processors 1302, which in combination with the one or more computer-readable media, are configured to execute instructions 1314 to implement modules to perform actions described in this disclosure.
For one embodiment, control module 1304 may include any suitable interface controllers to provide any suitable interface to at least one of the processor(s) 1302 and/or any suitable device or component in communication with control module 1304.
The control module 1304 may include a memory controller module to provide an interface to the memory 1306. The memory controller module may be a hardware module, a software module, and/or a firmware module.
Memory 1306 may be used, for example, to load and store data and/or instructions 1314 for device 1300. For one embodiment, memory 1306 may comprise any suitable volatile memory, such as suitable DRAM. In some embodiments, the memory 1306 may comprise a double data rate type four synchronous dynamic random access memory (DDR4 SDRAM).
For one embodiment, control module 1304 may include one or more input/output controllers to provide an interface to NVM/storage 1308 and input/output device(s) 1310.
For example, NVM/storage 1308 may be used to store data and/or instructions 1314. NVM/storage 1308 may include any suitable non-volatile memory (e.g., flash memory) and/or may include any suitable non-volatile storage device(s) (e.g., one or more Hard Disk Drives (HDDs), one or more Compact Disc (CD) drives, and/or one or more Digital Versatile Disc (DVD) drives).
The NVM/storage 1308 may include storage resources that are physically part of the device on which the apparatus 1300 is installed, or it may be accessible by that device without necessarily being part of it. For example, the NVM/storage 1308 may be accessed over a network via the input/output device(s) 1310.
Input/output device(s) 1310 may provide an interface for apparatus 1300 to communicate with any other suitable device; input/output device(s) 1310 may include communication components, audio components, sensor components, and so forth. The network interface 1312 may provide an interface for the device 1300 to communicate over one or more networks; the device 1300 may communicate wirelessly with one or more components of a wireless network according to any of one or more wireless network standards and/or protocols, for example by accessing a communication-standard-based wireless network such as WiFi, 2G, 3G, 4G, 5G, or a combination thereof.
For one embodiment, at least one of the processor(s) 1302 may be packaged together with logic for one or more controllers (e.g., memory controller modules) of the control module 1304. For one embodiment, at least one of the processor(s) 1302 may be packaged together with logic for one or more controllers of the control module 1304 to form a System In Package (SiP). For one embodiment, at least one of the processor(s) 1302 may be integrated on the same die with logic for one or more controller(s) of the control module 1304. For one embodiment, at least one of the processor(s) 1302 may be integrated on the same die with logic of one or more controllers of the control module 1304 to form a system on chip (SoC).
In various embodiments, apparatus 1300 may be, but is not limited to being: a server, a desktop computing device, or a mobile computing device (e.g., a laptop computing device, a handheld computing device, a tablet, a netbook, etc.), among other terminal devices. In various embodiments, apparatus 1300 may have more or fewer components and/or different architectures. For example, in some embodiments, device 1300 includes one or more cameras, a keyboard, a Liquid Crystal Display (LCD) screen (including a touch screen display), a non-volatile memory port, multiple antennas, a graphics chip, an Application Specific Integrated Circuit (ASIC), and speakers.
A detection device may adopt a main control chip as the processor or control module; sensor data, position information, and the like may be stored in the memory or NVM/storage device; the sensor group may serve as an input/output device; and the communication interface may include a network interface.
An embodiment of the present application further provides an electronic device, including: a processor; and a memory having executable code stored thereon that, when executed, causes the processor to perform a method as described in one or more of the embodiments of the application.
Embodiments of the present application also provide one or more machine-readable media having executable code stored thereon that, when executed, causes a processor to perform a method as described in one or more of the embodiments of the present application.
For the device embodiment, since it is basically similar to the method embodiment, the description is simple, and for the relevant points, refer to the partial description of the method embodiment.
The embodiments in the present specification are described in a progressive manner, each embodiment focuses on differences from other embodiments, and the same and similar parts among the embodiments are referred to each other.
Embodiments of the present application are described with reference to flowchart illustrations and/or block diagrams of methods, terminal devices (systems), and computer program products according to embodiments of the application. It will be understood that each flow and/or block of the flow diagrams and/or block diagrams, and combinations of flows and/or blocks in the flow diagrams and/or block diagrams, can be implemented by computer program instructions. These computer program instructions may be provided to a processor of a general purpose computer, special purpose computer, embedded processor, or other programmable data processing terminal to produce a machine, such that the instructions, which execute via the processor of the computer or other programmable data processing terminal, create means for implementing the functions specified in the flowchart flow or flows and/or block diagram block or blocks.
These computer program instructions may also be stored in a computer-readable memory that can direct a computer or other programmable data processing terminal to function in a particular manner, such that the instructions stored in the computer-readable memory produce an article of manufacture including instruction means which implement the function specified in the flowchart flow or flows and/or block diagram block or blocks.
These computer program instructions may also be loaded onto a computer or other programmable data processing terminal to cause a series of operational steps to be performed on the computer or other programmable terminal to produce a computer implemented process such that the instructions which execute on the computer or other programmable terminal provide steps for implementing the functions specified in the flowchart flow or flows and/or block diagram block or blocks.
While preferred embodiments of the present application have been described, additional variations and modifications of these embodiments may occur to those skilled in the art once they learn of the basic inventive concepts. Therefore, it is intended that the appended claims be interpreted as including the preferred embodiment and all changes and modifications that fall within the true scope of the embodiments of the present application.
Finally, it should also be noted that, herein, relational terms such as first and second may be used solely to distinguish one entity or action from another entity or action without necessarily requiring or implying any actual such relationship or order between such entities or actions. Also, the terms "comprises," "comprising," or any other variation thereof are intended to cover a non-exclusive inclusion, such that a process, method, article, or terminal that comprises a list of elements does not include only those elements but may include other elements not expressly listed or inherent to such process, method, article, or terminal. Without further limitation, an element introduced by the phrase "comprising a/an ..." does not exclude the presence of other identical elements in the process, method, article, or terminal device comprising the element.
The foregoing detailed description has been made of a data processing method, a data processing apparatus, an electronic device, and a storage medium, and specific examples are applied herein to explain the principles and embodiments of the present application, where the descriptions of the foregoing examples are only used to help understand the method and its core ideas of the present application; meanwhile, for a person skilled in the art, according to the idea of the present application, there may be variations in the specific embodiments and the application scope, and in summary, the content of the present specification should not be construed as a limitation to the present application.

Claims (19)

1. A method of data processing, the method comprising:
acquiring a frame image of video data, and determining a feature vector and a frame image combination of a target object in a detection frame of the frame image;
determining a feature map according to the similarity between target objects in the frame image combination, wherein the similarity comprises feature similarity and/or spatial similarity;
and determining the classification result of each target object according to the feature map of each target object so as to determine the movement track of the target object in the video data.
2. The method of claim 1, wherein determining the feature map according to the similarity between the target objects in the frame image combination comprises:
screening out related objects corresponding to the target object in the target frame image according to the feature vectors of the target object in the frame image combination, and extracting corresponding feature vectors to form nodes;
and adding connecting edges between the nodes according to the similarity between the nodes to form a feature map corresponding to the target object in the target frame image.
3. The method of claim 1, wherein the step of determining a combination of frame images comprises:
and acquiring continuous frame images of a target number as frame image combinations, and updating the frame images in the frame image combinations after determining the feature images corresponding to the frame image combinations.
4. The method of claim 3, wherein the obtaining a target number of consecutive frame images as a combination of frame images comprises:
acquiring continuous frame images of a target number;
and determining an image group queue, and adding the target number of continuous frame images to the image group queue to form a frame image combination.
5. The method of claim 4, wherein updating the frame images in the frame image combination comprises:
at least one frame image is deleted from the image group queue and a corresponding number of frame images are added to the image group queue to form an updated frame image combination.
6. The method of claim 1, wherein determining a feature vector of a target object within a detection frame of a frame image comprises:
determining a detection frame in the frame image, and extracting a detection image in the detection frame;
and extracting the features of the target object in the detection image to obtain a feature vector.
7. The method according to claim 2, wherein the screening out relevant objects corresponding to the target object in the target frame image according to the feature vector of the target object in the frame image combination, and extracting corresponding feature vectors to form nodes comprises:
screening out a first object related to a target object of the target frame image and screening out a second object related to the first object according to the feature vector of the target object in the frame image combination;
determining the feature vectors of the first object and the second object to form a node.
8. The method according to claim 2, wherein the adding a connecting edge between nodes according to the similarity between nodes comprises:
determining feature similarity among the nodes according to the feature vectors of the nodes, and determining adjacent nodes of the feature vectors according to the feature similarity so as to add a first connecting edge;
and determining the spatial similarity among the nodes according to the coincidence degree among the detection frames of the nodes, and determining the nodes meeting the coincidence condition so as to add a second connecting edge.
9. The method of claim 1, wherein determining the classification result of each target object according to the feature map of each target object comprises:
and inputting the feature map into a feature processing model, and determining a classification result of the target object, wherein the feature processing model is used for determining node features of the nodes and edge features of the connecting edges, and determining the classification result of the target object.
10. The method of claim 1, wherein determining the movement trajectory of the target object in the video data comprises:
and connecting the target objects in the frame images according to the classification result of the target objects in the frame images to obtain the moving track of the target objects in the video data.
11. A method of data processing, the method comprising:
acquiring a frame image of road video data, and determining a feature vector and a frame image combination of a target object in a detection frame of the frame image;
determining a feature map according to the similarity between target objects in the frame image combination, wherein the similarity comprises feature similarity and/or spatial similarity;
determining a classification result of each target object according to the feature map of each target object so as to determine a moving track of the target object in the road video data;
and determining the behavior type of the target object according to the movement track so as to determine a corresponding behavior analysis result.
12. A method of data processing, the method comprising:
acquiring a frame image of live video data, and determining a feature vector and a frame image combination of a target object in a detection frame of the frame image;
determining a feature map according to the similarity between target objects in the frame image combination, wherein the similarity comprises feature similarity and/or spatial similarity;
determining a classification result of each target object according to the feature map of each target object so as to determine a moving track of the target object in the live video data;
and adding virtual information to the target object in the live video data according to the moving track.
13. A method of data processing, the method comprising:
acquiring a frame image of driving video data, and determining a feature vector and a frame image combination of a target object in a detection frame of the frame image;
determining a feature map according to the similarity between target objects in the frame image combination, wherein the similarity comprises feature similarity and/or spatial similarity;
determining the classification result of each target object according to the feature map of each target object so as to determine the moving track of the target object in the driving video data;
and determining a driving instruction according to the movement track so as to control the vehicle to run.
14. The method of claim 13, wherein determining driving instructions from the movement trajectory comprises:
acquiring depth information of a target object in each frame of image;
determining a spatial movement track of the target object according to the depth information and the movement track of the target object;
and determining a driving instruction according to the space movement track so as to control the vehicle to run.
15. The method of claim 14, wherein determining driving instructions from the spatial movement trajectory comprises:
determining a moving path of the vehicle according to the space moving track of the target object;
and determining a corresponding driving instruction according to the moving path so as to control the vehicle to run.
16. A method of data processing, the method comprising:
providing an interactive page to acquire video data to be processed based on the interactive page;
acquiring a frame image of video data, and determining a feature vector and a frame image combination of a target object in a detection frame of the frame image;
determining a feature map according to the similarity between target objects in the frame image combination, wherein the similarity comprises feature similarity and/or spatial similarity;
and determining the classification result of each target object according to the feature map of each target object to determine the moving track of the target object in the video data as an analysis result, and issuing the analysis result.
17. A data processing apparatus, characterized in that said apparatus comprises:
the video data acquisition module is used for acquiring a frame image of the video data and determining a feature vector and a frame image combination of a target object in a detection frame of the frame image;
the feature map acquisition module is used for determining a feature map according to the similarity between target objects in the frame image combination, wherein the similarity comprises feature similarity and/or spatial similarity;
and the moving track acquisition module is used for determining the classification result of each target object according to the feature map of each target object so as to determine the moving track of the target object in the video data.
18. An electronic device, comprising: a processor; and
memory having stored thereon executable code which, when executed, causes the processor to perform the method of one or more of claims 1-16.
19. One or more machine-readable media having executable code stored thereon that, when executed, causes a processor to perform the method of one or more of claims 1-16.
CN202110169707.5A 2021-02-07 2021-02-07 Data processing method and device, electronic equipment and storage medium Pending CN114943748A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202110169707.5A CN114943748A (en) 2021-02-07 2021-02-07 Data processing method and device, electronic equipment and storage medium

Publications (1)

Publication Number Publication Date
CN114943748A true CN114943748A (en) 2022-08-26

Family

ID=82906165

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202110169707.5A Pending CN114943748A (en) 2021-02-07 2021-02-07 Data processing method and device, electronic equipment and storage medium

Country Status (1)

Country Link
CN (1) CN114943748A (en)

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination