CN113887284A - Target object speed detection method, device, equipment and readable storage medium


Info

Publication number
CN113887284A
CN113887284A (application CN202111012847.8A)
Authority
CN
China
Prior art keywords
frame image
image sequence
speed
feature map
model
Prior art date
Legal status
Pending
Application number
CN202111012847.8A
Other languages
Chinese (zh)
Inventor
安建平
王向韬
牟晓凡
郝雨萌
程新景
Current Assignee
International Network Technology Shanghai Co Ltd
Original Assignee
International Network Technology Shanghai Co Ltd
Priority date
Filing date
Publication date
Application filed by International Network Technology Shanghai Co Ltd filed Critical International Network Technology Shanghai Co Ltd
Priority to CN202111012847.8A
Publication of CN113887284A
Legal status: Pending

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00 Pattern recognition
    • G06F18/20 Analysing
    • G06F18/21 Design or setup of recognition systems or techniques; Extraction of features in feature space; Blind source separation
    • G06F18/214 Generating training patterns; Bootstrap methods, e.g. bagging or boosting
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 Computing arrangements based on biological models
    • G06N3/02 Neural networks
    • G06N3/04 Architecture, e.g. interconnection topology
    • G06N3/045 Combinations of networks

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Data Mining & Analysis (AREA)
  • Evolutionary Computation (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Artificial Intelligence (AREA)
  • General Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • General Health & Medical Sciences (AREA)
  • Software Systems (AREA)
  • Molecular Biology (AREA)
  • Computing Systems (AREA)
  • Biophysics (AREA)
  • Biomedical Technology (AREA)
  • Mathematical Physics (AREA)
  • Computational Linguistics (AREA)
  • Health & Medical Sciences (AREA)
  • Bioinformatics & Cheminformatics (AREA)
  • Bioinformatics & Computational Biology (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Evolutionary Biology (AREA)
  • Image Analysis (AREA)

Abstract

The invention provides a method, a device, equipment and a readable storage medium for detecting the speed of a target object, relating to the field of computer vision. The method comprises the following steps: collecting video data to be detected, and extracting a frame image sequence containing the target object to be detected from the video data; and inputting the aligned feature maps corresponding to a first frame image sequence, together with the first frame image sequence and a second frame image sequence, into a speed recognition detection model to obtain the recognition speed of the target object to be detected, corresponding to the second frame image sequence, output by the speed recognition detection model.

Description

Target object speed detection method, device, equipment and readable storage medium
Technical Field
The invention relates to the field of computer vision, and in particular to a method, a device, equipment and a readable storage medium for detecting the speed of a target object.
Background
With the rapid growth in the number of automobiles, traffic accidents occur frequently while vehicles are in motion, and speed has become one of the main contributing factors in many of them. As intelligent transportation and autonomous driving technologies develop, the speed of nearby vehicles, other than the one the driver is operating, needs to be detected by more accurate means, so that the speed and driving conditions of the ego vehicle can be analyzed and its route planned reasonably and accurately in later stages. Detecting the operating speed of other vehicles accurately and in time is therefore essential to the safety of a vehicle in operation.
With its development, computer vision technology has attracted great attention and is widely applied in fields such as autonomous driving, area monitoring, image navigation and guidance, city safety, and terrain matching. Stereoscopic vision is an important branch of computer vision, offering high measurement precision, low cost, and non-contact measurement. It enables the detection and tracking of a target object as well as its positioning and speed estimation. For example, in video-structuring applications, a tracking algorithm assigns the target object a unique identifier and a corresponding motion trajectory, and much subsequent work can build on these two pieces of data: speed detection (traffic scenarios), counting (traffic and security scenarios), and behavior detection (traffic and security scenarios). At present, when a video monitoring system detects an object such as a vehicle, it obtains only the object's two-dimensional information on the camera's projection plane and the object's size on that plane, not the object itself in three-dimensional space. Because the actual size of the vehicle cannot be determined from two-dimensional information in the camera image alone, real-time speed detection cannot be realized. The common existing approach is to acquire the target object's information in two-dimensional space, convert it into three-dimensional information, and combine it with a filtering algorithm to mitigate the inaccurate tracking and frequent target loss of the original tracking algorithm.
In order to detect the operating speed of a target object in time, a new target object speed detection method needs to be designed.
Disclosure of Invention
The invention provides a method, a device, equipment and a readable storage medium for detecting the speed of a target object. They address the long pipeline and low precision caused by the prior art's inability to obtain a target object's three-dimensional information directly, optimize the operation of the whole pipeline, improve the precision and continuity of speed detection, and simplify its post-processing.
The invention provides a target object speed detection method, which comprises the following steps:
collecting video data to be detected, and extracting a frame image sequence containing the target object to be detected from the video data;
inputting the aligned feature maps corresponding to a first frame image sequence, together with the first frame image sequence and a second frame image sequence, into a speed recognition detection model to obtain the recognition speed of the target object to be detected, corresponding to the second frame image sequence, output by the speed recognition detection model;
the speed recognition detection model is obtained by training based on a sample frame image sequence extracted from sample video data, the first frame image sequence and the second frame image sequence both belong to the frame image sequence, the first frame image sequence is adjacent to the second frame image sequence and is positioned in front of the second frame image sequence, the feature map is obtained by inputting the first frame image sequence into the feature extraction model and outputting, the feature map is aligned based on a first offset, and the first offset is obtained by inputting the feature map into the behavior recognition model and outputting.
According to the target object speed detection method provided by the invention, inputting the aligned feature maps corresponding to the first frame image sequence, together with the first frame image sequence and the second frame image sequence, into the speed recognition detection model to obtain the recognition speed of the target object to be detected, corresponding to the second frame image sequence, output by the speed recognition detection model specifically comprises the following steps:
extracting the first frame image sequence and the second frame image sequence from the frame image sequence;
inputting the first frame image sequence into the feature extraction model to obtain the feature map corresponding to each image in the first frame image sequence output by the feature extraction model, and taking one of the feature maps as a reference feature map; the feature extraction model is trained on a sample frame image sequence extracted from sample video data;
inputting the feature map corresponding to the first frame image sequence into the behavior recognition model based on the reference feature map to obtain the first offset output by the behavior recognition model; the behavior recognition model is obtained by training based on a sample feature map and a reference sample feature map;
aligning the feature maps corresponding to the first frame image sequence to the reference feature map based on the first offset;
and fusing the aligned feature maps, the first frame image sequence and the second frame image sequence, and inputting the result into the speed recognition detection model to obtain the recognition speed output by the speed recognition detection model.
According to the target object speed detection method provided by the invention, the feature extraction model is trained through the following steps:
acquiring the sample video data, and extracting a third frame image sequence from the sample video data;
and taking the third frame image sequence as the input data for training, and training in a deep learning manner to obtain the feature extraction model used to generate feature maps for a frame image sequence to be predicted.
According to the target object speed detection method provided by the invention, the speed recognition detection model is trained through the following steps:
extracting the sample frame image sequence from the sample video data, and extracting the third frame image sequence and a fourth frame image sequence from the sample frame image sequence; wherein the third frame image sequence and the fourth frame image sequence both belong to the sample frame image sequence, and the third frame image sequence is adjacent to and precedes the fourth frame image sequence;
inputting the third frame image sequence into the feature extraction model to obtain the sample feature map corresponding to each image in the third frame image sequence output by the feature extraction model, and taking one of the sample feature maps as the reference sample feature map;
inputting the sample feature map corresponding to the third frame image sequence into the behavior recognition model based on the reference sample feature map to obtain a second offset output by the behavior recognition model;
aligning the sample feature maps corresponding to the third frame image sequence to the reference sample feature map based on the second offset;
and fusing the aligned sample feature maps, the third frame image sequence and the fourth frame image sequence as the input data for training, and training in a deep learning manner to obtain the speed recognition detection model used to generate the recognition speed for a frame image sequence to be predicted.
According to the target object speed detection method provided by the invention, the feature extraction model and the speed recognition detection model are the same physical model.
According to the target object speed detection method provided by the invention, the behavior recognition model is trained through the following step:
and taking the reference sample feature map as the reference for alignment and the sample feature maps corresponding to the third frame image sequence as the input data for training, and training in a deep learning manner to obtain the behavior recognition model used to generate offsets for feature maps to be aligned.
The invention also provides a target object speed detection device, which comprises:
the first acquisition module, used for collecting video data to be detected and extracting a frame image sequence containing the target object to be detected from the video data;
the speed detection module, used for inputting the aligned feature maps corresponding to the first frame image sequence, together with the first frame image sequence and the second frame image sequence, into a speed recognition detection model to obtain the recognition speed of the target object to be detected, corresponding to the second frame image sequence, output by the speed recognition detection model;
the speed recognition detection model is obtained by training based on a sample frame image sequence extracted from sample video data, the first frame image sequence and the second frame image sequence both belong to the frame image sequence, the first frame image sequence is adjacent to the second frame image sequence and is positioned in front of the second frame image sequence, the feature map is obtained by inputting the first frame image sequence into the feature extraction model and outputting, the feature map is aligned based on a first offset, and the first offset is obtained by inputting the feature map into the behavior recognition model and outputting.
The invention also provides an electronic device, comprising a memory, a processor and a computer program stored on the memory and executable on the processor, wherein the processor, when executing the program, implements the steps of the target object speed detection method described above.
The present invention also provides a non-transitory computer-readable storage medium having stored thereon a computer program which, when executed by a processor, performs the steps of the target object speed detection method as described in any one of the above.
The present invention also provides a computer program product comprising a computer program which, when executed by a processor, performs the steps of the target object speed detection method as described in any one of the above.
With the target object speed detection method, device, equipment and readable storage medium provided by the invention, consecutive multi-frame image data contained in the collected original video data serve as a second frame image sequence to be predicted, and the consecutive multi-frame image data that immediately precede it in the original video data serve as a first frame image sequence. The first frame image sequence is input into the feature extraction model trained as described in the method, the corresponding feature maps are extracted and aligned, and the aligned feature maps are fused with the first frame image sequence and the second frame image sequence to be predicted and input into the speed recognition detection model, which directly regresses the recognition speed of the target object to be detected. Based on this end-to-end deep learning approach, the operation of the whole pipeline is optimized, the precision and continuity of speed detection are improved, its post-processing is simplified, and recognition speed detection of the target object achieves better performance.
Drawings
In order to more clearly illustrate the technical solutions of the present invention or the prior art, the drawings needed for the description of the embodiments or the prior art will be briefly described below, and it is obvious that the drawings in the following description are some embodiments of the present invention, and those skilled in the art can also obtain other drawings according to the drawings without creative efforts.
FIG. 1 is a schematic flow chart of a method for detecting a velocity of a target object according to the present invention;
fig. 2 is a schematic flow chart of step S200 in the target object speed detection method provided by the present invention;
FIG. 3 is a schematic structural diagram of a target speed detection device provided by the present invention;
FIG. 4 is a schematic structural diagram of a speed detection module in the speed detection apparatus for an object according to the present invention;
fig. 5 is a schematic structural diagram of an electronic device provided in the present invention.
Detailed Description
In order to make the objects, technical solutions and advantages of the present invention clearer, the technical solutions of the present invention will be clearly and completely described below with reference to the accompanying drawings, and it is obvious that the described embodiments are some, but not all embodiments of the present invention. All other embodiments, which can be derived by a person skilled in the art from the embodiments given herein without making any creative effort, shall fall within the protection scope of the present invention.
The target object speed detection method of the present invention is described below with reference to fig. 1, taking the speed detection of a vehicle as an example. The method includes the following steps:
s100, collecting video data to be detected, and extracting a frame image sequence containing a target object to be detected from the video data to be identified, wherein in the embodiment, the target object is a vehicle, and more specifically, other vehicles except an automatic driving vehicle using the method are used. It can be understood that the video data to be detected collected by the vehicle-mounted camera or the like includes multiple frames of image data, each frame of image data may include a vehicle or no vehicle, and when the image data includes a vehicle, all the vehicles included therein are used as the target object to be detected, and the speed of the vehicle as the target object is detected respectively.
S200, inputting the aligned feature maps corresponding to the first frame image sequence, together with the first frame image sequence and the second frame image sequence, into a speed recognition detection model to obtain the recognition speed of the target object to be detected output by the speed recognition detection model, i.e., the recognition speed of the vehicle under detection in the time frame corresponding to the second frame image sequence. This recognition speed is the real-time speed of the other vehicles, specifically their relative motion speed in three-dimensional space with respect to the autonomous vehicle using the method. It can later serve as a parameter for adjusting that vehicle's autonomous driving strategy and for other aspects of control and prediction in autonomous driving. Specifically, through some post-processing, information such as the vehicles' positions, distances and speeds is combined to determine which vehicles are key targets; the parameters of those key targets are then output, and the autonomous driving process is controlled and predicted according to them.
The speed recognition detection model is trained on a sample frame image sequence extracted from sample video data. The first frame image sequence and the second frame image sequence both belong to the frame image sequence, with the first adjacent to and preceding the second; that is, each is image data of consecutive frames (at least two) forming part of the frame image sequence. The feature maps are obtained by inputting the first frame image sequence into the feature extraction model; they are aligned based on a first offset, which is obtained by inputting the feature maps into the behavior recognition model.
In this method, the speed recognition detection model and the feature extraction model are both convolutional neural network (CNN) models. A CNN is essentially a multilayer perceptron (MLP) that uses local connections and weight sharing: the reduced number of weights makes the network easier to optimize on the one hand and lowers the risk of overfitting on the other. As a kind of neural network whose weight-sharing structure more closely resembles a biological neural network, the CNN reduces both the complexity of the network model and the number of weights. This advantage is all the more apparent when the network input is a multi-dimensional image, since the image can be used directly as the input data of the CNN model, avoiding the complex feature extraction and data reconstruction of traditional recognition algorithms. The CNN model currently has many advantages in two-dimensional image processing; for example, the network can automatically extract image features, including color, texture, shape and topology, and it shows good robustness and computational efficiency on two-dimensional image problems, particularly in recognizing displacement, scaling and other forms of distortion invariance.
A CNN itself can take the form of different combinations of neurons and learning rules, and it has advantages that traditional techniques lack: good fault tolerance, parallel processing and self-learning capability; the ability to handle complex environmental information, unclear background knowledge and uncertain inference rules; tolerance of considerable defects and distortion in the samples; and fast operation, good adaptability and high resolution. By restructuring and reducing weights, the CNN model merges the feature extraction function into the MLP and omits the complicated image feature extraction that would otherwise precede recognition. A CNN model consists of an input layer, an output layer and several hidden layers, which can be classified into convolutional layers, pooling layers, ReLU layers and fully connected layers. The convolutional layer is the core of the CNN model; its parameters consist of a set of learnable filters (or kernels), each with a small field of view that extends through the entire depth of the input volume. During the feed-forward pass, each filter convolves over the input data (typically two-dimensional): the convolutional layer convolves the input to extract higher-level features, computing the dot product between the filter and the input data and generating a two-dimensional activation map for the filter.
In this method, a feature map is the feature output (i.e., extracted) after filtering by one of the convolutional layers of the feature extraction model; in this embodiment it is specifically the shared feature map output by the last convolutional layer of the feature extraction model.
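As a rough illustration of such a backbone, the following PyTorch sketch takes the output of the last convolutional layer as the shared feature map. It is a minimal sketch under assumed layer sizes and channel counts, not the patent's actual network:

```python
import torch
import torch.nn as nn

class FeatureExtractor(nn.Module):
    """Hypothetical backbone; the last conv layer's output is the shared feature map."""
    def __init__(self, in_channels: int = 3, feat_channels: int = 64):
        super().__init__()
        self.backbone = nn.Sequential(
            nn.Conv2d(in_channels, 32, kernel_size=3, stride=2, padding=1),
            nn.ReLU(inplace=True),
            nn.Conv2d(32, feat_channels, kernel_size=3, stride=2, padding=1),
            nn.ReLU(inplace=True),
        )

    def forward(self, frames: torch.Tensor) -> torch.Tensor:
        # frames: (T, C, H, W); output: (T, feat_channels, H/4, W/4),
        # one feature map per image in the frame image sequence.
        return self.backbone(frames)

feats = FeatureExtractor()(torch.randn(2, 3, 64, 64))  # -> (2, 64, 16, 16)
```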
In this method, the speed recognition detection model and the feature extraction model may be the same physical model.
The CNN model is trained in a deep learning manner: during training, a prediction is obtained from the input end to the output end with only a single neural network in between, and comparing the prediction with the ground truth yields an error, which is propagated through every layer of the CNN model; each layer adjusts itself according to this error until the model converges or achieves the expected effect. This end-to-end deep learning approach removes the need to label data before each independent learning task, improves the precision of the output and saves learning cost.
In this target object speed detection method, consecutive multi-frame image data contained in the collected original video data serve as the second frame image sequence to be predicted, and the consecutive multi-frame image data that immediately precede it in the original video data serve as the first frame image sequence. The first frame image sequence is input into the feature extraction model trained as described in the method, the corresponding feature maps are extracted and aligned, and the aligned feature maps are fused with the first frame image sequence and the second frame image sequence to be predicted and input into the speed recognition detection model, which directly regresses the recognition speed of the target object to be detected. Based on this end-to-end deep learning approach, the operation of the whole pipeline is optimized, the precision and continuity of speed detection are improved, its post-processing is simplified, and recognition speed detection of the target object achieves better performance.
Note that traditional speed detection estimates the target object's motion speed by combining target detection and target tracking, whereas the recognition speed obtained here is the target object's relative motion speed in three-dimensional space with respect to the autonomous vehicle using the method. Combined with the motion speed of that autonomous vehicle, such as its vehicle speed obtained from an on-board sensor, it yields the target object's true speed in three-dimensional space. This is clearly different from what the traditional method obtains, namely the pixel speed of the target object's two-dimensional center point in a two-dimensional image.
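As a quick illustration (not taken from the patent; the numbers and variable names are assumptions), recovering the target's true velocity from the regressed relative velocity is a vector addition with the ego vehicle's own velocity:

```python
import numpy as np

v_relative = np.array([4.2, -0.3, 0.0])  # m/s, target relative to the ego vehicle (model output)
v_ego      = np.array([18.0, 0.1, 0.0])  # m/s, ego vehicle, e.g. from an on-board sensor
v_target   = v_relative + v_ego          # target's true velocity in 3-D space
print(v_target)                          # [22.2 -0.2  0. ]
```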
The target object speed detection method of the present invention is further described below with reference to fig. 2; step S200 specifically includes the following steps:
s210, extracting a first frame image sequence and a second frame image sequence from the frame image sequence.
S220, inputting the first frame image sequence into the feature extraction model to obtain the feature map corresponding to each image in the first frame image sequence output by the feature extraction model, and taking one of the feature maps as a reference feature map.
In the method, a feature extraction model is obtained by training based on a sample frame image sequence extracted from sample video data. In this embodiment, the feature extraction model adopts a supervised learning mode in the deep learning process, that is, the sample feature map is used as a label for supervised learning.
S230, inputting the feature maps corresponding to the first frame image sequence into the behavior recognition model based on the reference feature map to obtain a first offset output by the behavior recognition model.
S240, aligning all the feature maps corresponding to the first frame image sequence to the reference feature map based on the first offset.
In this method, the behavior recognition model is trained on the sample feature maps and the reference sample feature map, and it is used to learn an offset map for aligning the feature maps of several consecutive frames. Specifically, the input data of the behavior recognition model are the feature maps of several consecutive frames, i.e., the feature map corresponding to each image in the first frame image sequence, and the output data are the offsets the other feature maps need in order to move to the reference feature map, i.e., the first offset; through the first offset, the feature maps corresponding to the first frame image sequence can be aligned to the reference feature map.
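One hedged sketch of how such an offset map could be applied: each per-pixel offset (the `offsets` tensor, an assumed name and format) displaces a sampling grid so that a frame's feature map is warped onto the reference feature map. Bilinear warping via `grid_sample` is an illustrative choice; the patent does not specify the warping operator:

```python
import torch
import torch.nn.functional as F

def align_to_reference(feat: torch.Tensor, offsets: torch.Tensor) -> torch.Tensor:
    """feat: (N, C, H, W) feature maps; offsets: (N, 2, H, W) pixel offsets (dx, dy)."""
    n, _, h, w = feat.shape
    # Base sampling grid in grid_sample's normalized [-1, 1] coordinates.
    ys, xs = torch.meshgrid(
        torch.linspace(-1, 1, h), torch.linspace(-1, 1, w), indexing="ij")
    base = torch.stack((xs, ys), dim=-1).expand(n, h, w, 2)
    # Convert pixel offsets to normalized coordinates and displace the grid.
    norm = torch.stack((offsets[:, 0] * 2 / (w - 1),
                        offsets[:, 1] * 2 / (h - 1)), dim=-1)
    return F.grid_sample(feat, base + norm, align_corners=True)
```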
S250, fusing the aligned feature maps, the first frame image sequence and the second frame image sequence, and inputting the result into the speed recognition detection model to obtain the recognition speed directly regressed by the speed recognition detection model. In this embodiment, the speed recognition detection model adopts supervised learning during deep learning; that is, the recognition speed corresponding to the fused data is used as the label for supervision.
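The fusion and regression of step S250 might look like the following sketch, assuming fusion by channel-wise concatenation and inputs resized to a common spatial size; the head layout and the three-component velocity output are illustrative assumptions, not the patent's specification:

```python
import torch
import torch.nn as nn

class SpeedHead(nn.Module):
    """Hypothetical fusion-and-regression head for the speed recognition detection model."""
    def __init__(self, fused_channels: int):
        super().__init__()
        self.conv = nn.Conv2d(fused_channels, 64, kernel_size=3, padding=1)
        self.pool = nn.AdaptiveAvgPool2d(1)
        self.fc = nn.Linear(64, 3)  # directly regressed 3-D velocity (vx, vy, vz)

    def forward(self, aligned_feats, seq1, seq2):
        # Assumes all three inputs share the same spatial size (resize beforehand).
        fused = torch.cat((aligned_feats, seq1, seq2), dim=1)
        x = self.pool(torch.relu(self.conv(fused))).flatten(1)
        return self.fc(x)
```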
The feature extraction model in the method is trained through the following steps:
A100, collecting sample video data, and extracting a third frame image sequence from the sample video data;
A200, taking the third frame image sequence as the input data for training, and training in a deep learning manner to obtain a feature extraction model used to generate feature maps for a frame image sequence to be predicted.
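Steps A100-A200 could be sketched as the supervised step below, reusing the hypothetical FeatureExtractor above; the MSE objective, the optimizer and the synthetic tensors are assumptions standing in for real sample video data and sample feature-map labels:

```python
import torch
import torch.nn.functional as F

model = FeatureExtractor()                 # hypothetical backbone sketched earlier
opt = torch.optim.Adam(model.parameters(), lr=1e-4)

frames = torch.randn(8, 3, 64, 64)         # synthetic third frame image sequence
target_feats = torch.randn(8, 64, 16, 16)  # synthetic sample feature maps as labels

pred = model(frames)                        # (8, 64, 16, 16)
loss = F.mse_loss(pred, target_feats)       # supervised loss against the labels
opt.zero_grad(); loss.backward(); opt.step()
```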
In the method, the speed recognition detection model is trained through the following steps:
and A300, extracting a sample frame image sequence from the sample video data, and extracting a third frame image sequence and a fourth frame image sequence from the sample frame image sequence. The third frame image sequence and the fourth frame image sequence both belong to the sample frame image sequence, and the third frame image sequence is adjacent to and before the fourth frame image sequence.
A400, inputting the third frame image sequence into the feature extraction model to obtain the sample feature map corresponding to each image in the third frame image sequence output by the feature extraction model, and taking one of the sample feature maps as a reference sample feature map.
A500, inputting a sample feature map corresponding to the third frame image sequence into the behavior recognition model based on the reference sample feature map to obtain a second offset output by the behavior recognition model;
A600, aligning the sample feature maps corresponding to the third frame image sequence to the reference sample feature map based on the second offset;
A700, fusing the aligned sample feature maps, the third frame image sequence and the fourth frame image sequence as the input data for training, and training in a deep learning manner to obtain a speed recognition detection model used to generate the recognition speed for a frame image sequence to be predicted. In step A700, the recognition speed corresponding to the fused data may be used as the label for supervised learning during deep learning.
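A single training step for A300-A700 might look like the following sketch, reusing the hypothetical FeatureExtractor, align_to_reference and SpeedHead above; the zero offsets stand in for the behavior recognition model's output, and the smooth L1 loss against a label speed is an illustrative choice:

```python
import torch
import torch.nn.functional as F

extractor = FeatureExtractor()
head = SpeedHead(fused_channels=64 + 3 + 3)   # aligned feats + two RGB sequences
opt = torch.optim.Adam(
    list(extractor.parameters()) + list(head.parameters()), lr=1e-4)

seq3 = torch.randn(1, 3, 64, 64)              # third frame image sequence (synthetic)
seq4 = torch.randn(1, 3, 64, 64)              # fourth frame image sequence (synthetic)
gt_speed = torch.tensor([[22.2, -0.2, 0.0]])  # synthetic recognition-speed label

feats = extractor(seq3)                            # (1, 64, 16, 16)
offsets = torch.zeros(1, 2, 16, 16)                # stand-in for the behavior model
aligned = align_to_reference(feats, offsets)
s3 = F.interpolate(seq3, size=feats.shape[-2:])    # match spatial sizes for fusion
s4 = F.interpolate(seq4, size=feats.shape[-2:])
loss = F.smooth_l1_loss(head(aligned, s3, s4), gt_speed)
opt.zero_grad(); loss.backward(); opt.step()
```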
The behavior recognition model in the method is trained through the following step:
A800, taking the reference sample feature map as the reference for alignment and the sample feature maps corresponding to the third frame image sequence as the input data for training, and training in a deep learning manner to obtain a behavior recognition model used to generate offsets for feature maps to be aligned.
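Step A800 could be sketched as below: a hypothetical single-convolution network predicts the offsets, and training minimizes the difference between the warped feature maps and the reference sample feature map, so the reference acts as the alignment target. The network layout and the MSE objective are assumptions; align_to_reference is the warp sketched earlier:

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class BehaviorRecognizer(nn.Module):
    """Hypothetical layout: predicts a per-pixel (dx, dy) offset map."""
    def __init__(self, feat_channels: int = 64):
        super().__init__()
        self.net = nn.Conv2d(feat_channels * 2, 2, kernel_size=3, padding=1)

    def forward(self, feat: torch.Tensor, ref: torch.Tensor) -> torch.Tensor:
        return self.net(torch.cat((feat, ref), dim=1))  # (N, 2, H, W)

model = BehaviorRecognizer()
opt = torch.optim.Adam(model.parameters(), lr=1e-4)

feat = torch.randn(4, 64, 16, 16)     # sample feature maps of consecutive frames
ref = feat[:1].expand(4, -1, -1, -1)  # reference sample feature map

offsets = model(feat, ref)
# Warping by the predicted offsets should reproduce the reference map.
loss = F.mse_loss(align_to_reference(feat, offsets), ref)
opt.zero_grad(); loss.backward(); opt.step()
```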
The following describes the object speed detection device provided by the present invention, and the object speed detection device described below and the object speed detection method described above may be referred to in correspondence with each other.
The target object speed detection device of the present invention is described below with reference to fig. 3, taking the speed detection of a vehicle as an example. The device includes:
the first acquisition module 100 is configured to acquire video data to be detected, and extract a frame image sequence including a target object to be detected from the video data to be identified, where in this embodiment, the target object is a vehicle, and more specifically, another vehicle except an autonomous driving vehicle using the method. It can be understood that the video data to be detected collected by the vehicle-mounted camera or the like includes multiple frames of image data, each frame of image data may include a vehicle or no vehicle, and when the image data includes a vehicle, all the vehicles included therein are used as the target object to be detected, and the speed of the vehicle as the target object is detected respectively.
The speed detection module 200 is configured to input the aligned feature maps corresponding to the first frame image sequence, together with the first frame image sequence and the second frame image sequence, into the speed recognition detection model to obtain the recognition speed of the target object to be detected output by the speed recognition detection model, i.e., the recognition speed of the vehicle under detection in the time frame corresponding to the second frame image sequence. This recognition speed is the real-time speed of the vehicle and can later serve as a parameter for adjusting the vehicle's autonomous driving strategy and for other aspects of control and prediction in autonomous driving. Specifically, after some post-processing, information such as vehicle position, distance and speed is combined to determine which vehicles are key targets and which are not; the parameters of the key targets are then output, and the autonomous driving process is controlled and predicted according to them.
The speed recognition detection model is trained on a sample frame image sequence extracted from sample video data. The first frame image sequence and the second frame image sequence both belong to the frame image sequence, with the first adjacent to and preceding the second; that is, each is image data of consecutive frames (at least two) forming part of the frame image sequence. The feature maps are obtained by inputting the first frame image sequence into the feature extraction model; they are aligned based on a first offset, which is obtained by inputting the feature maps into the behavior recognition model.
In the device, the speed recognition detection model and the feature extraction model are both CNN models. A CNN is essentially an MLP that uses local connections and weight sharing: the reduced number of weights makes the network easier to optimize on the one hand and lowers the risk of overfitting on the other. As a kind of neural network whose weight-sharing structure more closely resembles a biological neural network, the CNN reduces both the complexity of the network model and the number of weights. This advantage is all the more apparent when the network input is a multi-dimensional image, since the image can be used directly as the input data of the CNN model, avoiding the complex feature extraction and data reconstruction of traditional recognition algorithms. The CNN model currently has many advantages in two-dimensional image processing; for example, the network can automatically extract image features, including color, texture, shape and topology, and it shows good robustness and computational efficiency on two-dimensional image problems, particularly in recognizing displacement, scaling and other forms of distortion invariance.
A CNN itself can take the form of different combinations of neurons and learning rules, and it has advantages that traditional techniques lack: good fault tolerance, parallel processing and self-learning capability; the ability to handle complex environmental information, unclear background knowledge and uncertain inference rules; tolerance of considerable defects and distortion in the samples; and fast operation, good adaptability and high resolution. By restructuring and reducing weights, the CNN model merges the feature extraction function into the MLP and omits the complicated image feature extraction that would otherwise precede recognition. A CNN model consists of an input layer, an output layer and several hidden layers, which can be classified into convolutional layers, pooling layers, ReLU layers and fully connected layers. The convolutional layer is the core of the CNN model; its parameters consist of a set of learnable filters (or kernels), each with a small field of view that extends through the entire depth of the input volume. During the feed-forward pass, each filter convolves over the input data (typically two-dimensional): the convolutional layer convolves the input to extract higher-level features, computing the dot product between the filter and the input data and generating a two-dimensional activation map for the filter.
In this device, the feature map is the feature output (i.e., extracted) after filtering by one of the convolutional layers of the feature extraction model; in this embodiment it is specifically the shared feature map output by the last convolutional layer of the feature extraction model.
In the apparatus, the speed recognition detection model and the feature extraction model may be the same physical model.
The CNN model is trained in a deep learning manner: during training, a prediction is obtained from the input end to the output end with only a single neural network in between, and comparing the prediction with the ground truth yields an error, which is propagated through every layer of the CNN model; each layer adjusts itself according to this error until the model converges or achieves the expected effect. This end-to-end learning approach removes the need to label data before each independent learning task, improves the precision of the output and saves learning cost.
In this target object speed detection device, consecutive multi-frame image data contained in the collected original video data serve as the second frame image sequence to be predicted, and the consecutive multi-frame image data that immediately precede it in the original video data serve as the first frame image sequence. The first frame image sequence is input into the feature extraction model trained for the device, the corresponding feature maps are extracted and aligned, and the aligned feature maps are fused with the first frame image sequence and the second frame image sequence to be predicted and input into the speed recognition detection model, which directly regresses the recognition speed of the target object to be detected. Based on this end-to-end deep learning approach, the operation of the whole pipeline is optimized, the precision and continuity of speed detection are improved, its post-processing is simplified, and recognition speed detection of the target object achieves better performance.
The target object speed detection device of the present invention is further described below with reference to fig. 4; the speed detection module 200 specifically includes:
an extracting unit 210 is configured to extract a first frame image sequence and a second frame image sequence from the frame image sequence.
The first input unit 220 is configured to input the first frame image sequence into the feature extraction model, obtain the feature map corresponding to each image in the first frame image sequence output by the feature extraction model, and take one of the feature maps as a reference feature map.
In the device, a feature extraction model is obtained by training based on a sample frame image sequence extracted from sample video data. In this embodiment, the feature extraction model adopts a supervised learning mode in the deep learning process, that is, the sample feature map is used as a label for supervised learning.
The second input unit 230 is configured to input the feature map corresponding to the first frame image sequence into the behavior recognition model based on the reference feature map, so as to obtain a first offset output by the behavior recognition model.
And an aligning unit 240, configured to align all feature maps corresponding to the first frame image sequence to the reference feature map based on the first offset.
In the device, the behavior recognition model is trained on the sample feature maps and the reference sample feature map, and it is used to learn an offset map for aligning the feature maps of several consecutive frames. Specifically, the input data of the behavior recognition model are the feature maps of several consecutive frames, i.e., the feature map corresponding to each image in the first frame image sequence, and the output data are the offsets the other feature maps need in order to move to the reference feature map, i.e., the first offset; through the first offset, the feature maps corresponding to the first frame image sequence can be aligned to the reference feature map.
The third input unit 250 is configured to fuse the aligned feature maps, the first frame image sequence and the second frame image sequence, and to input the result into the speed recognition detection model to obtain the recognition speed directly regressed by the speed recognition detection model. In this embodiment, the speed recognition detection model adopts supervised learning during deep learning; that is, the recognition speed corresponding to the fused data is used as the label for supervision.
The feature extraction model in the device is trained through the following modules:
the second acquisition module a100 is configured to acquire sample video data and extract a third frame image sequence from the sample video data;
and the first input module A200 is configured to take the third frame image sequence as the input data for training and to train in a deep learning manner to obtain a feature extraction model used to generate feature maps for a frame image sequence to be predicted.
The speed recognition detection model in the device is trained through the following modules:
an extracting module a300, configured to extract a sample frame image sequence from the sample video data, and extract a third frame image sequence and a fourth frame image sequence from the sample frame image sequence. The third frame image sequence and the fourth frame image sequence both belong to the sample frame image sequence, and the third frame image sequence is adjacent to and before the fourth frame image sequence.
The second input module A400 is configured to input the third frame image sequence into the feature extraction model to obtain the sample feature map corresponding to each image in the third frame image sequence output by the feature extraction model, and to take one of the sample feature maps as a reference sample feature map.
A third input module a500, configured to input, based on the reference sample feature map, the sample feature map corresponding to the third frame image sequence into the behavior recognition model, so as to obtain a second offset output by the behavior recognition model;
an alignment module a600, configured to align all the sample feature maps corresponding to the third frame image sequence to the reference sample feature map based on the second offset;
and the fourth input module A700 is configured to fuse the aligned sample feature maps, the third frame image sequence and the fourth frame image sequence as the input data for training, and to train in a deep learning manner to obtain a speed recognition detection model used to generate the recognition speed for a frame image sequence to be predicted. In the fourth input module A700, the recognition speed corresponding to the fused data may be used as the label for supervised learning during deep learning.
The behavior recognition model in the device is trained through the following module:
The fifth input module A800 is configured to take the reference sample feature map as the reference for alignment and the sample feature maps corresponding to the third frame image sequence as the input data for training, and to train in a deep learning manner to obtain a behavior recognition model used to generate offsets for feature maps to be aligned.
Fig. 5 illustrates the physical structure of an electronic device, which, as shown in fig. 5, may include: a processor 810, a communication interface 820, a memory 830 and a communication bus 840, wherein the processor 810, the communication interface 820 and the memory 830 communicate with one another via the communication bus 840. The processor 810 may invoke logic instructions in the memory 830 to perform the target object speed detection method, which comprises the following steps:
S100, collecting video data to be detected, and extracting a frame image sequence containing the target object to be detected from the video data;
S200, inputting the aligned feature maps corresponding to the first frame image sequence, together with the first frame image sequence and the second frame image sequence, into a speed recognition detection model to obtain the recognition speed of the target object to be detected, corresponding to the second frame image sequence, output by the speed recognition detection model;
The speed recognition detection model is trained on a sample frame image sequence extracted from sample video data. The first frame image sequence and the second frame image sequence both belong to the frame image sequence, with the first frame image sequence adjacent to and preceding the second frame image sequence. The feature maps are obtained by inputting the first frame image sequence into the feature extraction model and are aligned based on a first offset, which is obtained by inputting the feature maps into the behavior recognition model.
In addition, the logic instructions in the memory 830 may be implemented as software functional units and, when sold or used as an independent product, stored in a computer-readable storage medium. Based on this understanding, the technical solution of the present invention may be embodied in the form of a software product, which is stored in a storage medium and includes instructions for causing a computer device (which may be a personal computer, a server, or a network device) to execute all or part of the steps of the method according to the embodiments of the present invention. The aforementioned storage medium includes various media capable of storing program code, such as a USB flash drive, a removable hard disk, a read-only memory (ROM), a random access memory (RAM), a magnetic disk or an optical disk.
In another aspect, the present invention also provides a computer program product, the computer program product comprising a computer program that can be stored on a non-transitory computer-readable storage medium; when the computer program is executed by a processor, the computer can execute the target object speed detection method provided by the methods above, the method comprising the following steps:
S100, collecting video data to be detected, and extracting a frame image sequence containing the target object to be detected from the video data;
S200, inputting the aligned feature maps corresponding to the first frame image sequence, together with the first frame image sequence and the second frame image sequence, into a speed recognition detection model to obtain the recognition speed of the target object to be detected, corresponding to the second frame image sequence, output by the speed recognition detection model;
The speed recognition detection model is trained on a sample frame image sequence extracted from sample video data. The first frame image sequence and the second frame image sequence both belong to the frame image sequence, with the first frame image sequence adjacent to and preceding the second frame image sequence. The feature maps are obtained by inputting the first frame image sequence into the feature extraction model and are aligned based on a first offset, which is obtained by inputting the feature maps into the behavior recognition model.
In yet another aspect, the present invention also provides a non-transitory computer-readable storage medium having stored thereon a computer program which, when executed by a processor, implements the target object speed detection method provided by the methods above, the method comprising the following steps:
S100, collecting video data to be detected, and extracting a frame image sequence containing the target object to be detected from the video data;
S200, inputting the aligned feature maps corresponding to the first frame image sequence, together with the first frame image sequence and the second frame image sequence, into a speed recognition detection model to obtain the recognition speed of the target object to be detected, corresponding to the second frame image sequence, output by the speed recognition detection model;
The speed recognition detection model is trained on a sample frame image sequence extracted from sample video data. The first frame image sequence and the second frame image sequence both belong to the frame image sequence, with the first frame image sequence adjacent to and preceding the second frame image sequence. The feature maps are obtained by inputting the first frame image sequence into the feature extraction model and are aligned based on a first offset, which is obtained by inputting the feature maps into the behavior recognition model.
The above-described embodiments of the apparatus are merely illustrative, and the units described as separate parts may or may not be physically separate, and parts displayed as units may or may not be physical units, may be located in one place, or may be distributed on a plurality of network units. Some or all of the modules may be selected according to actual needs to achieve the purpose of the solution of the present embodiment. One of ordinary skill in the art can understand and implement it without inventive effort.
Through the above description of the embodiments, those skilled in the art will clearly understand that each embodiment can be implemented by software plus a necessary general hardware platform, and certainly can also be implemented by hardware. With this understanding in mind, the above-described technical solutions may be embodied in the form of a software product, which can be stored in a computer-readable storage medium such as ROM/RAM, magnetic disk, optical disk, etc., and includes instructions for causing a computer device (which may be a personal computer, a server, or a network device, etc.) to execute the methods described in the embodiments or some parts of the embodiments.
Finally, it should be noted that: the above examples are only intended to illustrate the technical solution of the present invention, but not to limit it; although the present invention has been described in detail with reference to the foregoing embodiments, it will be understood by those of ordinary skill in the art that: the technical solutions described in the foregoing embodiments may still be modified, or some technical features may be equivalently replaced; and such modifications or substitutions do not depart from the spirit and scope of the corresponding technical solutions of the embodiments of the present invention.

Claims (10)

1. A target object speed detection method is characterized by comprising the following steps:
collecting video data to be detected, and extracting a frame image sequence containing a target object to be detected from the video data;
inputting the aligned feature maps corresponding to a first frame image sequence, together with the first frame image sequence and a second frame image sequence, into a speed recognition detection model to obtain the recognition speed of the target object to be detected, corresponding to the second frame image sequence, output by the speed recognition detection model;
the speed recognition detection model is trained on a sample frame image sequence extracted from sample video data; the first frame image sequence and the second frame image sequence both belong to the frame image sequence, with the first frame image sequence adjacent to and preceding the second frame image sequence; and the feature maps are obtained by inputting the first frame image sequence into the feature extraction model and are aligned based on a first offset, which is obtained by inputting the feature maps into the behavior recognition model.
2. The method according to claim 1, wherein inputting the aligned feature maps corresponding to the first frame image sequence, together with the first frame image sequence and the second frame image sequence, into the speed recognition detection model to obtain the recognition speed of the target object to be detected, corresponding to the second frame image sequence, output by the speed recognition detection model specifically comprises the following steps:
extracting the first frame image sequence and the second frame image sequence from the frame image sequence;
inputting the first frame image sequence into the feature extraction model to obtain the feature map corresponding to each image in the first frame image sequence output by the feature extraction model, and taking one of the feature maps as a reference feature map; the feature extraction model is trained on a sample frame image sequence extracted from sample video data;
inputting the feature map corresponding to the first frame image sequence into the behavior recognition model based on the reference feature map to obtain the first offset output by the behavior recognition model; the behavior recognition model is obtained by training based on a sample feature map and a reference sample feature map;
on the basis of the first offset, the feature maps corresponding to the first frame image sequence are aligned to the reference feature map;
and fusing the aligned feature map, the first frame image sequence and the second frame image sequence, and inputting the fused feature map, the first frame image sequence and the second frame image sequence into the speed recognition detection model to obtain the recognition speed output by the speed recognition detection model.
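
To make the data flow of claims 1 and 2 easier to follow, the steps can be pictured as a short script. This is a minimal, hypothetical sketch assuming PyTorch: the module definitions (FeatureExtractor, BehaviorRecognizer, SpeedDetector), the choice of the last feature map as the reference, the 2-D per-pixel offset shape, and the warp-based alignment via grid_sample are all illustrative assumptions, not the patented models.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class FeatureExtractor(nn.Module):
    """Toy stand-in for the feature extraction model (hypothetical)."""
    def __init__(self, in_ch=3, out_ch=16):
        super().__init__()
        self.conv = nn.Conv2d(in_ch, out_ch, 3, padding=1)

    def forward(self, frames):              # frames: (T, C, H, W)
        return self.conv(frames)            # one feature map per image

class BehaviorRecognizer(nn.Module):
    """Toy stand-in: predicts a per-pixel 2-D offset for each feature map,
    conditioned on the reference feature map (hypothetical design)."""
    def __init__(self, feat_ch=16):
        super().__init__()
        self.conv = nn.Conv2d(2 * feat_ch, 2, 3, padding=1)

    def forward(self, feats, ref):          # feats: (T, C', H, W); ref: (1, C', H, W)
        paired = torch.cat((feats, ref.expand_as(feats)), dim=1)
        return self.conv(paired)            # offsets: (T, 2, H, W)

def align(feats, offsets):
    """Warp each feature map toward the reference using the predicted offset
    (CPU sketch; one plausible reading of 'aligning based on the offset')."""
    T, C, H, W = feats.shape
    ys, xs = torch.meshgrid(torch.arange(H), torch.arange(W), indexing="ij")
    grid = torch.stack((xs, ys), -1).float() + offsets.permute(0, 2, 3, 1)
    gx = 2 * grid[..., 0] / (W - 1) - 1     # normalise to [-1, 1] for grid_sample
    gy = 2 * grid[..., 1] / (H - 1) - 1
    return F.grid_sample(feats, torch.stack((gx, gy), -1), align_corners=True)

class SpeedDetector(nn.Module):
    """Toy stand-in for the speed recognition detection model (hypothetical)."""
    def __init__(self, feat_ch=16, img_ch=3):
        super().__init__()
        self.head = nn.Linear(feat_ch + 2 * img_ch, 1)

    def forward(self, aligned, first_seq, second_seq):
        # 'Fusion' here is simply concatenating pooled summaries, then regressing.
        fused = torch.cat([aligned.mean(dim=(0, 2, 3)),
                           first_seq.mean(dim=(0, 2, 3)),
                           second_seq.mean(dim=(0, 2, 3))])
        return self.head(fused)             # scalar recognition speed

# Two adjacent 4-frame sequences of 64x64 RGB images (random stand-in data).
first_seq, second_seq = torch.rand(4, 3, 64, 64), torch.rand(4, 3, 64, 64)
extractor, recognizer, detector = FeatureExtractor(), BehaviorRecognizer(), SpeedDetector()
feats = extractor(first_seq)                # feature map per image (claim 2, step 2)
ref = feats[-1:]                            # take one feature map as the reference
offsets = recognizer(feats, ref)            # first offset (claim 2, step 3)
aligned = align(feats, offsets)             # align to the reference (step 4)
speed = detector(aligned, first_seq, second_seq)   # fuse and detect (step 5)
```

In a real system the choice of reference, the fusion operator, and the network architectures would all come from the trained models the claims refer to; the sketch only fixes one concrete shape for each interface.
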
3. The target object speed detection method according to claim 2, wherein the feature extraction model is obtained by training through the following steps:
acquiring the sample video data, and extracting a third frame image sequence from the sample video data;
taking the third frame image sequence as the input data for training, and training in a deep learning manner to obtain the feature extraction model for generating the feature map of a frame image sequence to be predicted.
4. The target object speed detection method according to claim 3, wherein the speed recognition detection model is obtained by training through the following steps:
extracting the sample frame image sequence from the sample video data, and extracting the third frame image sequence and a fourth frame image sequence from the sample frame image sequence, wherein the third frame image sequence and the fourth frame image sequence both belong to the sample frame image sequence, and the third frame image sequence is adjacent to and precedes the fourth frame image sequence;
inputting the third frame image sequence into the feature extraction model to obtain the sample feature map corresponding to each image in the third frame image sequence output by the feature extraction model, and taking one of the sample feature maps as the reference sample feature map;
inputting, on the basis of the reference sample feature map, the sample feature maps corresponding to the third frame image sequence into the behavior recognition model to obtain a second offset output by the behavior recognition model;
aligning, on the basis of the second offset, the sample feature maps corresponding to the third frame image sequence to the reference sample feature map; and
fusing the aligned sample feature maps, the third frame image sequence, and the fourth frame image sequence to obtain the input data for training, and training in a deep learning manner to obtain the speed recognition detection model for generating the recognition speed of a frame image sequence to be predicted.
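
As a companion to the sketch after claim 2, the training procedure of claims 3 and 4 might look roughly like the loop below. The ground-truth speed labels, the MSE objective, and the Adam optimiser are assumptions for illustration; the claims only say the models are trained "in a deep learning manner". The toy modules and align() come from the earlier sketch.

```python
import torch
import torch.nn as nn

def train_speed_model(samples, epochs=10, lr=1e-3):
    """samples: iterable of (third_seq, fourth_seq, true_speed) tuples, where
    third_seq is adjacent to and precedes fourth_seq (claim 4) and true_speed
    is a hypothetical (1,)-shaped ground-truth label."""
    extractor, recognizer, detector = FeatureExtractor(), BehaviorRecognizer(), SpeedDetector()
    # The behavior recognition model is trained separately (claim 6),
    # so only the extractor and detector parameters are optimised here.
    opt = torch.optim.Adam(list(extractor.parameters()) + list(detector.parameters()), lr=lr)
    loss_fn = nn.MSELoss()
    for _ in range(epochs):
        for third_seq, fourth_seq, true_speed in samples:
            feats = extractor(third_seq)        # sample feature maps (claim 4, step 2)
            ref = feats[-1:]                    # reference sample feature map
            offsets = recognizer(feats, ref)    # second offset (step 3)
            aligned = align(feats, offsets)     # align to the reference (step 4)
            pred = detector(aligned, third_seq, fourth_seq)  # fuse and predict
            loss = loss_fn(pred, true_speed)    # regress against the label
            opt.zero_grad()
            loss.backward()
            opt.step()
    return extractor, detector

# Example call with one random training tuple (stand-in data only).
samples = [(torch.rand(4, 3, 64, 64), torch.rand(4, 3, 64, 64), torch.tensor([5.0]))]
extractor, detector = train_speed_model(samples, epochs=1)
```
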
5. The target object speed detection method according to claim 4, wherein the feature extraction model and the speed recognition detection model are implemented as one and the same model.
6. The target object speed detection method according to claim 4, wherein the behavior recognition model is obtained by training through the following steps:
taking the reference sample feature map as the alignment reference and the sample feature maps corresponding to the third frame image sequence as the input data for training, and training in a deep learning manner to obtain the behavior recognition model for generating the offsets of the feature maps to be aligned.
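
Claim 6 does not state a loss function, so the sketch below assumes a simple self-supervised alignment objective: the predicted offsets are trained so that each warped sample feature map matches the reference sample feature map. This reuses the toy modules and align() defined after claim 2 and is illustrative only.

```python
import torch

def train_behavior_model(feature_map_batches, epochs=10, lr=1e-3):
    """feature_map_batches: iterable of (T, C', H, W) sample feature map stacks,
    e.g. produced by the trained feature extraction model on third frame
    image sequences."""
    recognizer = BehaviorRecognizer()
    opt = torch.optim.Adam(recognizer.parameters(), lr=lr)
    for _ in range(epochs):
        for feats in feature_map_batches:
            ref = feats[-1:]                   # reference sample feature map
            offsets = recognizer(feats, ref)   # offsets for the maps to be aligned
            aligned = align(feats, offsets)
            # Assumed objective: aligned maps should agree with the reference.
            loss = (aligned - ref).abs().mean()
            opt.zero_grad()
            loss.backward()
            opt.step()
    return recognizer

# Example call with one random feature-map stack (stand-in data only).
recognizer = train_behavior_model([torch.rand(4, 16, 64, 64)], epochs=1)
```
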
7. A target object speed detection device, characterized by comprising:
a first acquisition module, configured to collect video data to be detected and to extract a frame image sequence containing a target object to be detected from the video data to be detected;
a speed detection module, configured to input the aligned feature maps corresponding to a first frame image sequence, the first frame image sequence, and a second frame image sequence into a speed recognition detection model to obtain the recognition speed of the target object to be detected, corresponding to the second frame image sequence, output by the speed recognition detection model;
wherein the speed recognition detection model is obtained by training on a sample frame image sequence extracted from sample video data; the first frame image sequence and the second frame image sequence both belong to the frame image sequence, and the first frame image sequence is adjacent to and precedes the second frame image sequence; the feature maps are obtained by inputting the first frame image sequence into a feature extraction model; the feature maps are aligned on the basis of a first offset; and the first offset is obtained by inputting the feature maps into a behavior recognition model.
8. An electronic device, comprising a memory, a processor, and a computer program stored on the memory and executable on the processor, wherein the processor, when executing the program, implements the steps of the target object speed detection method according to any one of claims 1 to 6.
9. A non-transitory computer-readable storage medium having a computer program stored thereon, wherein the computer program, when executed by a processor, implements the steps of the target object speed detection method according to any one of claims 1 to 6.
10. A computer program product comprising a computer program, wherein the computer program, when executed by a processor, implements the steps of the target object speed detection method according to any one of claims 1 to 6.
CN202111012847.8A (priority date 2021-08-31, filing date 2021-08-31): Target object speed detection method, device, equipment and readable storage medium. Published as CN113887284A; status: Pending.

Priority Applications (1)

Application number: CN202111012847.8A; priority date: 2021-08-31; filing date: 2021-08-31; title: Target object speed detection method, device, equipment and readable storage medium

Applications Claiming Priority (1)

Application number: CN202111012847.8A; priority date: 2021-08-31; filing date: 2021-08-31; title: Target object speed detection method, device, equipment and readable storage medium

Publications (1)

Publication number: CN113887284A; publication date: 2022-01-04

Family

ID: 79011973

Family Applications (1)

Application number: CN202111012847.8A; title: Target object speed detection method, device, equipment and readable storage medium; status: Pending; published as CN113887284A

Country Status (1)

CN: CN113887284A

Cited By (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN115953727A (en) * 2023-03-15 2023-04-11 浙江天行健水务有限公司 Floc settling rate detection method and system, electronic equipment and medium
CN115953727B (en) * 2023-03-15 2023-06-09 浙江天行健水务有限公司 Method, system, electronic equipment and medium for detecting floc sedimentation rate

Similar Documents

Publication Title
CN107576960B (en) Target detection method and system for visual radar space-time information fusion
EP4152204A1 (en) Lane line detection method, and related apparatus
CN108345875B (en) Driving region detection model training method, detection method and device
JP2020123343A (en) Learning method and learning device for detecting parking spaces by using relationship between decision points and regression results for decision points inorder to provide automatic parking system, and testing method and testing device using the same
WO2020048285A1 (en) Estimating two-dimensional object bounding box information based on bird's-eye view point cloud
CN111169468A (en) Automatic parking system and method
CN112733885A (en) Point cloud identification model determining method and point cloud identification method and device
CN109902610A (en) Traffic sign recognition method and device
CN112434566B (en) Passenger flow statistics method and device, electronic equipment and storage medium
CN115690153A (en) Intelligent agent track prediction method and system
CN115984586A (en) Multi-target tracking method and device under aerial view angle
CN111098850A (en) Automatic parking auxiliary system and automatic parking method
CN114841910A (en) Vehicle-mounted lens shielding identification method and device
CN114648633A (en) Method for determining semantic segmentation of a vehicle environment
CN113887284A (en) Target object speed detection method, device, equipment and readable storage medium
Xu et al. Cp-loss: Connectivity-preserving loss for road curb detection in autonomous driving with aerial images
Imad et al. Navigation system for autonomous vehicle: A survey
CN116310748A (en) Automatic driving scene recovery and automatic driving prototype testing method and system
CN114140497A (en) Target vehicle 3D real-time tracking method and system
CN114677618A (en) Accident detection method and device, electronic equipment and storage medium
CN114120653A (en) Centralized vehicle group decision control method and device and electronic equipment
CN106097751A (en) Vehicle travel control method and device
CN115063594B (en) Feature extraction method and device based on automatic driving
Polyantseva et al. Neural network approaches in the problems of detecting and classifying roadway defects
CN113963027B (en) Uncertainty detection model training method and device, and uncertainty detection method and device

Legal Events

Code Description
PB01 Publication
SE01 Entry into force of request for substantive examination