CN113205546A - Method, system, medium, and apparatus for obtaining a motion trajectory of a target vehicle


Info

Publication number
CN113205546A
CN113205546A (application CN202110485411.4A)
Authority
CN
China
Prior art keywords
picture
target vehicle
characteristic
feature
vehicle
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202110485411.4A
Other languages
Chinese (zh)
Inventor
李亚东
匡卫军
赵科
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Sichuan Yuncong Tianfu Artificial Intelligence Technology Co Ltd
Original Assignee
Sichuan Yuncong Tianfu Artificial Intelligence Technology Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Sichuan Yuncong Tianfu Artificial Intelligence Technology Co Ltd filed Critical Sichuan Yuncong Tianfu Artificial Intelligence Technology Co Ltd
Priority to CN202110485411.4A
Publication of CN113205546A
Legal status: Pending

Classifications

    • G06T 7/246 — Physics; computing; image data processing or generation; image analysis; analysis of motion using feature-based methods, e.g. the tracking of corners or segments
    • G06F 16/583 — Electric digital data processing; information retrieval of still image data; retrieval characterised by metadata automatically derived from the content
    • G06F 18/213 — Pattern recognition; feature extraction, e.g. by transforming the feature space; summarisation; mappings, e.g. subspace methods
    • G06F 18/241 — Pattern recognition; classification techniques relating to the classification model, e.g. parametric or non-parametric approaches
    • G06N 3/045 — Neural networks; combinations of networks
    • G06T 2207/20081 — Indexing scheme for image analysis; training; learning
    • G06T 2207/20084 — Indexing scheme for image analysis; artificial neural networks [ANN]
    • G06T 2207/30241 — Indexing scheme for image analysis; subject of image: trajectory
    • G06V 2201/08 — Indexing scheme for image or video recognition; detecting or categorising vehicles

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Data Mining & Analysis (AREA)
  • General Physics & Mathematics (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • General Engineering & Computer Science (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Artificial Intelligence (AREA)
  • Evolutionary Computation (AREA)
  • Evolutionary Biology (AREA)
  • Bioinformatics & Computational Biology (AREA)
  • Bioinformatics & Cheminformatics (AREA)
  • Library & Information Science (AREA)
  • Biophysics (AREA)
  • Mathematical Physics (AREA)
  • Software Systems (AREA)
  • Computing Systems (AREA)
  • Molecular Biology (AREA)
  • General Health & Medical Sciences (AREA)
  • Computational Linguistics (AREA)
  • Biomedical Technology (AREA)
  • Multimedia (AREA)
  • Health & Medical Sciences (AREA)
  • Databases & Information Systems (AREA)
  • Image Analysis (AREA)

Abstract

The invention belongs to the technical field of recognition, and in particular relates to a method, system, medium, and apparatus for obtaining the motion trajectory of a target vehicle. It aims to solve the problems that existing recognition methods require large amounts of storage and computation and produce insufficiently accurate results. To this end, the method for obtaining the motion trajectory of a target vehicle comprises: acquiring a picture of the target vehicle; adding a local feature output branch; inputting the target vehicle picture into a first picture recognition model to obtain its feature vector; comparing the feature vectors in a query library with the feature vector of the target vehicle to obtain all pictures that meet the requirements; and combining all qualifying pictures with their position and time information to obtain the motion trajectory of the target vehicle. By adding a local feature output branch on top of the overall features, the invention can recognize regions of interest, with better recognition results and faster computation.

Description

Method, system, medium, and apparatus for obtaining a motion trajectory of a target vehicle
Technical Field
The invention belongs to the technical field of recognition, and in particular relates to a method, system, medium, and apparatus for obtaining the motion trajectory of a target vehicle.
Background
Cross-camera tracking (re-identification, ReID): the queried picture is vectorized and compared with the pictures in a database to find the pictures belonging to the same ID, from which the trajectory of that ID is obtained. The technique can be applied to recognizing and tracking people, and also to tracking other moving objects, such as vehicles.
In application scenarios such as security and traffic inspection, it is necessary to find and associate, from a large amount of data, images of vehicles captured by different cameras at different times and places, or to obtain the driving trajectory of the same vehicle.
In practice, however, the data volume is often huge, reaching millions or tens of millions of records or more. At different times and locations, the appearance of the same vehicle may change due to environmental and other factors. Meanwhile, factors such as the capture device, environment, parameters, and technique can visibly change a vehicle's local characteristics. In addition, in practical application scenarios, storing large amounts of feature data occupies substantial storage space.
Furthermore, different vehicles exhibit different characteristics in different scenes; if the same features are used to identify every vehicle, the computation is heavy and the recognition results are not accurate enough.
Accordingly, a new method for obtaining the motion trajectory of a target vehicle is needed in the art to solve the problems of heavy storage and computation and inaccurate recognition results in existing recognition methods.
Disclosure of Invention
In order to solve the above problems in the prior art, namely the heavy storage and computation and inaccurate recognition results of existing recognition methods, the present invention provides a method for obtaining the motion trajectory of a target vehicle, comprising:
acquiring at least one target vehicle picture and/or vehicle head picture;
adding a local feature output branch on the overall feature output branch in a first picture recognition model;
inputting the at least one target vehicle picture and/or vehicle head picture into the first picture recognition model to obtain a feature vector of the target vehicle picture and/or vehicle head picture;
comparing the feature vectors in a query library with the feature vector of the target vehicle to obtain all pictures that meet the requirements;
and combining all qualifying pictures with their position and time information to obtain the motion trajectory of the target vehicle.
In a preferred embodiment of the foregoing method, the step of "adding a local feature output branch on the overall feature output branch in the first picture recognition model" further includes:
adding the local feature output branch at the last convolutional layer, or at the layer preceding the linear classification layer, on the overall feature output branch of the first picture recognition model.
In a preferred technical solution of the above method, the step of inputting at least one target vehicle picture and/or vehicle head picture into the first picture recognition model to obtain the feature vector of the target vehicle picture and/or vehicle head picture further includes:
inputting the at least one target vehicle picture and/or vehicle head picture into the first picture recognition model;
applying pooling and triplet loss operations in the first picture recognition model: after the target vehicle picture and/or vehicle head picture enters the model, the overall feature output branch is processed to obtain the overall feature data of the target vehicle, and the local feature output branch is then processed to obtain local feature data that further refines the overall features;
applying a concat function to the obtained overall feature data and local feature data to obtain a feature vector containing both the overall and the local feature data;
performing a dimensionality-reduction operation on the feature vector to obtain a reduced-dimension feature vector;
and quantizing the reduced-dimension feature vector to generate an int8 feature vector.
In a preferred technical solution of the above method, the step of performing a dimensionality-reduction operation on the feature vector to obtain a reduced-dimension feature vector includes: adding a one-dimensional additional convolutional layer after the last convolutional layer of the first picture recognition model and reducing the number of channels of the original feature map by convolution, thereby obtaining the reduced-dimension feature vector; and/or,
adding a fully connected layer after the last convolutional layer of the first picture recognition model, tiling the feature vector into a one-dimensional vector, and applying linear weighting to obtain the reduced-dimension feature vector; and/or,
reducing the dimension of the feature vector by PCA, thereby obtaining the reduced-dimension feature vector.
In a preferred technical solution of the above method, the step of acquiring at least one target vehicle picture and/or vehicle head picture specifically includes:
inputting at least one picture containing the target vehicle into a second picture recognition model to obtain the coordinates and confidence of the target vehicle;
and obtaining the target vehicle picture and/or vehicle head picture according to the coordinates and confidence of the target vehicle.
In a preferred technical solution of the above method, the step of comparing the feature vectors in the query library with the feature vector of the target vehicle to obtain all pictures that meet the requirements specifically includes:
using FAISS retrieval to calculate the distances between the feature vectors of all vehicles in the query library and the feature vector of the target vehicle;
sorting the distances between the feature vectors of all vehicles and the feature vector of the target vehicle;
and obtaining the k pictures with the smallest distances in the query library as the retrieval result, where k is an integer greater than or equal to 2; or,
using FAISS retrieval to calculate the distances between the feature vectors of all vehicles in the query library and the feature vector of the target vehicle;
sorting the distances between the feature vectors of all vehicles and the feature vector of the target vehicle;
and setting a distance threshold and obtaining all pictures in the query library whose feature vector distance is smaller than the threshold as the retrieval result.
In a preferred technical solution of the above method, the distance between feature vectors is a cosine distance or a Euclidean distance; and/or,
the first picture recognition model or the second picture recognition model is a RetinaNet model, a YOLO model, or a Faster-RCNN model; and/or,
the model used in the step of comparing the feature vectors in the query library with the feature vector of the target vehicle to obtain all pictures that meet the requirements is an IVF model.
The invention also provides a system for obtaining the motion trajectory of a target vehicle, comprising:
a vehicle detection module for acquiring at least one target vehicle picture and/or vehicle head picture;
a branch adding module for adding a local feature output branch on the overall feature output branch in a first picture recognition model;
a feature extraction module for inputting the at least one target vehicle picture and/or vehicle head picture into the first picture recognition model and obtaining a feature vector of the target vehicle picture and/or vehicle head picture;
a retrieval module for comparing the feature vectors in a query library with the feature vector of the target vehicle to obtain all pictures that meet the requirements;
and a trajectory output module for combining all qualifying pictures with their position and time information to obtain the motion trajectory of the target vehicle.
In a preferred embodiment of the above system, the branch adding module is further configured to add the local feature output branch at the last convolutional layer, or at the layer preceding the linear classification layer, on the overall feature output branch of the first picture recognition model.
In a preferred technical solution of the above system, the feature extraction module specifically includes:
a picture input module for inputting the at least one target vehicle picture and/or vehicle head picture into the first picture recognition model;
a feature obtaining module for applying pooling and triplet loss operations in the first picture recognition model: after the target vehicle picture and/or vehicle head picture enters the model, the overall feature output branch is processed to obtain the overall feature data of the target vehicle, and the local feature output branch is then processed to obtain local feature data that further refines the overall features;
a vector acquisition module for applying a concat function to the obtained overall feature data and local feature data to obtain a feature vector containing both;
a vector dimension reduction module for performing a dimensionality-reduction operation on the feature vector to obtain a reduced-dimension feature vector;
and a feature quantization module for quantizing the reduced-dimension feature vector to generate an int8 feature vector.
In a preferred technical solution of the above system, the vehicle detection module specifically includes:
a coordinate and confidence acquisition module for inputting at least one picture containing the target vehicle into a second picture recognition model to obtain the coordinates and confidence of the target vehicle;
and a picture acquisition module for obtaining, from the picture containing the target vehicle, the target vehicle picture and/or vehicle head picture according to the coordinates and confidence of the target vehicle.
The invention also provides a computer readable storage medium, wherein a plurality of program codes are stored in the storage medium, and the program codes are suitable for being loaded and run by a processor to execute the method for obtaining the motion trail of the target vehicle in any one of the technical schemes.
The invention also provides a computer device comprising a processor and a memory, the memory being adapted to store a plurality of program codes, wherein the program codes are adapted to be loaded and run by the processor to perform the method of obtaining the motion trajectory of a target vehicle according to any of the above technical solutions.
As those skilled in the art can understand, in the technical solution of the present invention, the method for obtaining the motion trajectory of the target vehicle includes: acquiring at least one target vehicle picture and/or vehicle head picture; adding a local feature output branch on the overall feature output branch in the first picture recognition model; inputting the at least one target vehicle picture and/or vehicle head picture into the first picture recognition model to obtain a feature vector of the target vehicle picture and/or vehicle head picture; comparing the feature vectors in the query library with the feature vector of the target vehicle to obtain all pictures that meet the requirements; and combining all qualifying pictures with their position and time information to obtain the motion trajectory of the target vehicle.
With this arrangement, the method can obtain both the overall and the local features of a vehicle, and feature layers of interest can be added for different scenes to realize a diversified recognition process. During recognition, the overall and local features of the vehicle are converted into feature vectors by a multi-scale detection network, and the feature vectors are reduced in dimension and quantized before retrieval, which further saves storage space, speeds up feature retrieval, improves robustness, and yields higher accuracy.
Drawings
Embodiments of the present invention are described below with reference to the accompanying drawings. In the drawings:
FIG. 1 is a main flow chart of one embodiment of the method of obtaining a target vehicle motion profile of the present invention;
FIG. 2 is a detailed flowchart of one embodiment of step S100;
FIG. 3 is a detailed flowchart of one embodiment of step S300;
FIG. 4 is a detailed flowchart of one embodiment of step S400;
FIG. 5 is a detailed flowchart of another embodiment of step S400;
fig. 6 is a schematic diagram of main modules of the system for obtaining the motion trail of the target vehicle.
Detailed Description
For the purpose of facilitating understanding of the present invention, the present invention will be described more fully and in detail below with reference to the accompanying drawings and examples, but it will be understood by those skilled in the art that these embodiments are merely illustrative of the technical principles of the present invention and are not intended to limit the scope of the present invention.
In the description of the present invention, a "module" or "processor" may include hardware, software, or a combination of both. A module may comprise hardware circuitry, various suitable sensors, communication ports, and memory, may comprise software components such as program code, or may be a combination of software and hardware. The processor may be a central processing unit, microprocessor, image processor, digital signal processor, or any other suitable processor. The processor has data and/or signal processing functionality. The processor may be implemented in software, hardware, or a combination thereof. Non-transitory computer-readable storage media include any suitable medium that can store program code, such as magnetic disks, hard disks, optical disks, flash memory, read-only memory, random-access memory, and the like. The term "A and/or B" denotes all possible combinations of A and B, such as A alone, B alone, or A and B. The term "at least one A or B" or "at least one of A and B" has a meaning similar to "A and/or B" and may include A alone, B alone, or both A and B. The singular forms "a", "an", and "the" may include the plural forms as well.
The method of the present invention is described below with reference to fig. 1 to 5, and the present invention provides a method of obtaining a movement trajectory of a target vehicle, including:
step S100, at least one picture containing a target vehicle and/or a picture containing a head of the vehicle are obtained.
As shown in fig. 2, step S100 further includes:
step S110, at least one picture containing the target vehicle is input into the second picture recognition model, and the coordinates and the confidence degree of the target vehicle are obtained.
Specifically, at least one picture of the vehicle whose motion trajectory is to be obtained is input into the second picture recognition model, which recognizes the coordinates and confidence information of the target vehicle in the picture. The second picture recognition model is a RetinaNet model, a YOLO model, or a Faster-RCNN model.
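As an illustration only, the following Python sketch uses torchvision's off-the-shelf Faster-RCNN detector as a stand-in for the second picture recognition model; the weight name, input size, and confidence threshold are assumptions, not values from the patent.

    import torch
    import torchvision

    # Stand-in for the second picture recognition model (the patent allows
    # RetinaNet, YOLO, or Faster-RCNN); weights and threshold are assumptions.
    model = torchvision.models.detection.fasterrcnn_resnet50_fpn(weights="DEFAULT")
    model.eval()

    image = torch.rand(3, 480, 640)  # placeholder for a picture containing the target vehicle
    with torch.no_grad():
        det = model([image])[0]      # dict with "boxes", "labels", "scores"

    boxes, scores = det["boxes"], det["scores"]  # coordinates and confidences (step S110)
    vehicle_boxes = boxes[scores > 0.5]          # keep confident detections for cropping (step S120)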
Existing deep-learning target detection algorithms mainly use two kinds of structures: Two-Stage and One-Stage. The Two-Stage structure includes the Faster-RCNN model and the like: the first stage focuses on extracting candidate targets, and the second stage classifies the extracted targets and performs accurate coordinate regression. The Two-Stage structure is more accurate but slower, because the second stage must classify and regress each target individually. The One-Stage structure includes the YOLO model and the like: it abandons the separate target-extraction step and completes recognition/regression with a single-stage structure, which is faster at some cost in accuracy. The One-Stage family also includes the RetinaNet model, which addresses this accuracy loss. RetinaNet applies Focal Loss to solve the accuracy problem, and its detection precision exceeds that of the Two-Stage Faster-RCNN model. Focal Loss is a loss function modified from the standard cross-entropy loss; by down-weighting easy samples, it makes the model focus more on difficult samples during training. Those skilled in the art may select the second picture recognition model according to actual needs.
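For reference, a minimal Python sketch of the Focal Loss idea described above, using the commonly cited defaults alpha = 0.25 and gamma = 2; this is an illustration, not code from the patent.

    import torch
    import torch.nn.functional as F

    def focal_loss(logits, targets, alpha=0.25, gamma=2.0):
        # Per-element cross-entropy, kept unreduced so it can be re-weighted
        ce = F.binary_cross_entropy_with_logits(logits, targets, reduction="none")
        p = torch.sigmoid(logits)
        p_t = p * targets + (1 - p) * (1 - targets)             # prob. of the true class
        alpha_t = alpha * targets + (1 - alpha) * (1 - targets)
        # (1 - p_t)^gamma shrinks the loss of easy, high-confidence samples
        return (alpha_t * (1.0 - p_t) ** gamma * ce).mean()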
Step S120, obtaining the target vehicle picture and/or vehicle head picture according to the coordinates and confidence of the target vehicle.
Specifically, according to the coordinates and confidence of the target vehicle, the target vehicle picture and/or the picture of the head of the target vehicle is cropped from the picture containing the target vehicle. For an ordinary small vehicle, the whole-vehicle picture may be cropped; for a medium or large vehicle such as a truck or bus, either the whole-vehicle picture or the vehicle head picture may be cropped.
In a possible embodiment, an RPN (Region Proposal Network) is applied: an image of any scale is input into the RPN, which outputs a series of rectangular candidate regions. These rectangular candidate regions are the boxes screened out as possibly containing targets, i.e., the cropped target vehicle picture and/or vehicle head picture.
Step S200, adding a local feature output branch to the overall feature output branch in the first picture recognition model.
Specifically, the local feature output branch may be added on the basis of the overall feature output branch according to the specific needs of the recognition process. That is, local feature output branches are added to the classification branches and regression branches of different scales of the detection network, with the local feature output branch added at the last convolutional layer, or at the layer preceding the linear classification layer, of each detection network output branch. The classification branch comes from the classification network, i.e., it is a branch of the classification network; the regression branch comes from the regression network, i.e., it is a branch of the regression network. The detection network and the regression network are obtained from the RPN. Local features include, but are not limited to, annual inspection marks, ornaments, pendants, tissue boxes, sun visors, windshields, vehicle lights, vehicle logos, text, and patterns. The first picture recognition model is a RetinaNet model, a YOLO model, or a Faster-RCNN model. In subsequent processing, after a target picture is input into the first picture recognition model, the vector of the overall structure (overall vehicle type, color, and the like) is extracted through the overall feature output branch; then, because a second-layer local feature output branch has been added, the vectors of the local features of interest (tissue box and the like) are further extracted through the local feature output branch, so that the final output feature vector contains both the overall and the local feature vectors. Because the types and styles of local features (whether there is a tissue box, what color it is, whether there is also a pendant, and so on) vary far more than the overall features (color, vehicle type, and the like), a vehicle described by the combination of overall and local features is limited to an essentially unique individual; accuracy is therefore improved while the overall computation is greatly reduced.
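A hypothetical Python sketch of this two-branch idea follows; the channel sizes, pooling choice, and layer shapes are assumptions for illustration, since the patent does not fix them.

    import torch
    import torch.nn as nn

    class TwoBranchHead(nn.Module):
        # Overall-feature branch plus an added local-feature branch, both
        # fed by the backbone's last convolutional feature map (sizes assumed).
        def __init__(self, in_channels=2048, global_dim=512, local_dim=256):
            super().__init__()
            self.pool = nn.AdaptiveAvgPool2d(1)
            self.global_branch = nn.Linear(in_channels, global_dim)  # vehicle type, color, ...
            self.local_branch = nn.Linear(in_channels, local_dim)    # tissue box, pendant, ...

        def forward(self, feat_map):                 # feat_map: (N, C, H, W)
            pooled = self.pool(feat_map).flatten(1)  # (N, C)
            g = self.global_branch(pooled)           # overall feature vector
            l = self.local_branch(pooled)            # refined local feature vector
            return torch.cat([g, l], dim=1)          # combined descriptor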
In a possible embodiment, after target regions are obtained in the RPN network, a binarization threshold is set; when a target region's score exceeds the threshold, it is determined to be an ROI (Region of Interest), and these different ROIs constitute the detection network. At the same time, the RPN also frames the approximate locations of these ROIs on the target picture, and these framed approximate locations constitute the regression network.
Step S300, at least one target vehicle picture and/or vehicle head picture is input into the first picture recognition model, and the feature vector of the target vehicle picture and/or vehicle head picture is obtained.
As shown in fig. 3, step S300 further includes:
step S310, at least one picture containing the target vehicle and/or the picture containing the head of the vehicle are input into the first picture recognition model.
Specifically, the picture of the target vehicle and/or the picture of the head of the vehicle in the pictures containing the target vehicle acquired in step S120 are input into the first picture recognition model.
Step S320, applying pooling and triplet loss operations in the first picture recognition model: after the target vehicle picture and/or vehicle head picture enters the first picture recognition model, the overall feature output branch is processed to obtain the overall feature data of the target vehicle, and the local feature output branch is then processed to obtain local feature data that further refines the overall features.
Specifically, after the target vehicle picture and/or vehicle head picture is acquired from the picture containing the target vehicle, the first picture recognition model feeds it into a convolutional network to obtain a Feature Map, and the feature data obtained by the convolutional layers is further integrated and refined using pooling and Triplet Loss techniques to obtain more accurate overall and local feature data of the vehicle. The overall feature output branch detects the overall characteristics of the vehicle picture to obtain the overall feature data of the target vehicle; the local feature output branch detects the local characteristics of the vehicle picture to obtain the local feature data of the target vehicle. Pooling is a dimensionality-reduction technique in artificial neural networks that, following the human visual system, represents image features at higher levels of abstraction. Triplet Loss is a loss function used in deep learning to train on samples with small differences: its input comprises an anchor example, a positive example, and a negative example, and sample similarity is learned by optimizing the anchor-positive distance to be smaller than the anchor-negative distance.
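A minimal Python sketch of the triplet loss operation (the margin and embedding size are assumptions):

    import torch
    import torch.nn as nn

    # Anchor and positive come from the same vehicle ID, the negative from a
    # different one; training pushes d(anchor, positive) below d(anchor, negative).
    triplet = nn.TripletMarginLoss(margin=0.3)  # margin value assumed
    anchor = torch.randn(8, 768)                # model embeddings (batch and size assumed)
    positive = torch.randn(8, 768)
    negative = torch.randn(8, 768)
    loss = triplet(anchor, positive, negative)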
In one possible embodiment, the obtained overall and local characteristics of the target vehicle are processed by Batch Normalization to generate regularized feature data. Batch Normalization is a method for standardizing dispersed data and optimizing neural networks; data processed by Batch Normalization follows a unified specification, which makes it easier for the machine to learn the regularities in the data.
Step S330, applying the concat function to the obtained overall feature data and local feature data to obtain a feature vector containing both the overall and the local feature data.
Specifically, the overall feature data and local feature data obtained in the previous step are input into a concat function, and the feature vector containing both is obtained through calculation. The concat function is a function that connects two or more arrays and returns a new array when the calculation is complete.
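In vector terms, step S330 amounts to a simple concatenation; a Python sketch with assumed dimensions:

    import numpy as np

    overall_feat = np.random.rand(512).astype(np.float32)  # overall feature data (size assumed)
    local_feat = np.random.rand(256).astype(np.float32)    # local feature data (size assumed)
    fused = np.concatenate([overall_feat, local_feat])     # combined feature vector, shape (768,)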
In another possible implementation, the first picture recognition model is a RetinaNet model, whose backbone network consists of ResNet + FPN. Feature extraction is performed on the input target vehicle picture and/or vehicle head picture through the backbone network to obtain the FPN network; each FPN layer, containing both overall and local features, is then convolved by a classification subnetwork and a regression subnetwork to obtain classification-subnetwork and regression-subnetwork features, after which pooling and Batch Normalization operations are applied, the results are processed with a concat function, and the output is a feature vector. ResNet is a residual network, a network type commonly used by those skilled in the art, which is not expanded upon in this application. The backbone network is mainly used for feature extraction, extracting information from pictures and generating a Feature Map for subsequent networks. FPN (Feature Pyramid Network) is a method that uses a conventional convolutional network to efficiently extract features of each dimension of a picture and copes well with multi-scale variation in object detection.
Step S340, performing a dimensionality-reduction operation on the feature vector to obtain the reduced-dimension feature vector.
Specifically, the feature vector obtained in step S330 has a high dimension, which would make feature matching overly complex and consume system resources, so a feature dimensionality-reduction operation is required. Through dimensionality reduction, a high-dimensional feature is represented by a low-dimensional one. The invention mainly adopts two types of methods: feature selection and feature extraction. Feature selection chooses a subset of the high-dimensional features as the reduced features; feature extraction maps the high-dimensional features to low-dimensional features through some function.
In a possible implementation, a one-dimensional additional convolutional layer is added after the last convolutional layer of the first picture recognition model, and the number of channels of the original feature map is reduced by convolution, thereby obtaining the reduced-dimension feature vector.
In another possible implementation, a fully connected layer is added after the last convolutional layer of the first picture recognition model, the feature map is tiled into a one-dimensional vector, and linear weighting is applied to obtain the reduced-dimension feature vector.
In another possible embodiment, the feature vector is reduced in dimension using PCA (Principal Component Analysis), i.e., the original features are linearly transformed and mapped into a low-dimensional space while representing the original features as faithfully as possible.
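A Python sketch of the PCA option (the input and output dimensions are assumptions):

    import numpy as np
    from sklearn.decomposition import PCA

    feats = np.random.rand(1000, 768).astype(np.float32)  # gallery feature vectors (sizes assumed)
    pca = PCA(n_components=128)                           # target dimension assumed
    reduced = pca.fit_transform(feats)                    # (1000, 128) reduced vectors
    query = np.random.rand(1, 768).astype(np.float32)
    query_reduced = pca.transform(query)                  # the same mapping is applied to queries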
Step S350, quantizing the reduced-dimension feature vector to generate the int8 feature vector.
Specifically, the reduced-dimension feature vector still occupies considerable storage, while its values fluctuate little, so quantizing it further reduces storage and computation with little impact on the convolutional network. The actual quantization process converts float32 convolution operations into int8 convolution operations.
In one possible embodiment, quantization is performed directly in an unsaturated manner, i.e., scale = |max| / 127, where scale is the quantization scale and max is the value range of the feature vector before quantization; quantization is then implemented directly by this mapping.
In another possible implementation, a saturated quantization mode is used: a threshold parameter is set to clip the float32 value range, and the values are then mapped to produce an int8 vector as output.
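A Python sketch of both quantization modes as reconstructed above (the clipping threshold is an assumption):

    import numpy as np

    def quantize_int8(x, threshold=None):
        # Unsaturated mode: scale = |max| / 127 over the full value range.
        # Saturated mode: clip to a chosen threshold first, then map to int8.
        bound = float(np.abs(x).max()) if threshold is None else float(threshold)
        clipped = np.clip(x, -bound, bound)
        scale = bound / 127.0
        return np.round(clipped / scale).astype(np.int8), scale

    vec = np.random.randn(768).astype(np.float32)
    q_unsat, s1 = quantize_int8(vec)               # unsaturated mapping
    q_sat, s2 = quantize_int8(vec, threshold=3.0)  # saturated, threshold assumed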
Step S400, comparing the feature vectors in the query library with the feature vector of the target vehicle to obtain all pictures that meet the requirements.
As shown in fig. 4, in a possible implementation, step S400 further includes:
and step S410, applying FAISS retrieval to calculate the distances between the feature vectors of all vehicles in the query library and the feature vector of the target vehicle.
Specifically, the FAISS refers to a Facebook AI Similarity Search, which is an open source library and provides an efficient and reliable retrieval method mainly for mass data in a high-dimensional space. And (4) putting the int8 feature vector obtained in the step (S350) into an FAISS model, and calculating the distance between the feature vector in the query library and the target vehicle feature vector. The distance of the feature vector includes, but is not limited to, cosine distance, Euclidean distance, etc. and their variants.
In one possible embodiment, the model used to calculate the distance between the feature vectors of all vehicles in the query library and the feature vector of the target vehicle is the IVF model. The IVF (inverted File system) is an inverted index, namely, K-means clustering is directly carried out on all vectors in the library, the distance between a query vector and K clustering centers is calculated firstly during each calculation, then Top N clusters with the closest distance are selected, and only the distance between the query vector and the vectors under the clusters is calculated.
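A Python sketch of this IVF retrieval using the FAISS API (nlist, nprobe, k, and the vector dimension are assumptions):

    import numpy as np
    import faiss

    d, nlist, k = 128, 100, 100
    gallery = np.random.rand(100000, d).astype(np.float32)  # query-library feature vectors
    query = np.random.rand(1, d).astype(np.float32)         # target vehicle feature vector

    quantizer = faiss.IndexFlatL2(d)                 # coarse quantizer over cluster centers
    index = faiss.IndexIVFFlat(quantizer, d, nlist)  # inverted-file index with nlist cells
    index.train(gallery)                             # k-means clustering of the library
    index.add(gallery)
    index.nprobe = 10                                # number of nearest clusters to scan
    distances, ids = index.search(query, k)          # top-k closest pictures (cf. step S430)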
Step S420, sorting the distances between the feature vectors of all vehicles and the feature vector of the target vehicle.
Step S430, obtaining the k pictures with the smallest distances in the query library as the retrieval result.
Specifically, the distances obtained in the above steps are sorted, and the k pictures in the query library closest to the target vehicle's feature vector are taken as the retrieval result, where k is an integer greater than or equal to 2. In a preferred embodiment, k is 100. Those skilled in the art may adjust the value of k according to actual needs.
In another possible implementation, step S400 further includes:
and step S410, applying FAISS retrieval to calculate the distances between the feature vectors of all vehicles in the query library and the feature vector of the target vehicle.
Step S420, ranks the distances between the feature vectors of all vehicles and the feature vector of the target vehicle.
Step S430, a distance threshold is set, and all the pictures with the distance less than the distance threshold in the query library are obtained as a retrieval result.
Specifically, a distance threshold is set, and after the distances obtained in step S410 are sorted, the pictures with the distances smaller than the set threshold are used as the search result.
Step S500, combining all qualifying pictures with their position and time information to obtain the motion trajectory of the target vehicle.
Specifically, each picture in the retrieval result obtained in step S430 carries the position and time information of the vehicle, and the motion trajectory of the target vehicle is obtained by combining the qualifying pictures with their position and time information.
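A Python sketch of assembling the trajectory from the matched pictures (the field names are hypothetical):

    from dataclasses import dataclass
    from typing import List, Tuple

    @dataclass
    class MatchedPicture:
        picture_id: str
        location: Tuple[float, float]  # camera position, e.g. (lat, lon); format assumed
        timestamp: float               # capture time

    def build_trajectory(matches: List[MatchedPicture]) -> List[Tuple[float, Tuple[float, float]]]:
        # Ordering the qualifying pictures by capture time turns the
        # (time, location) pairs into the target vehicle's motion trajectory.
        ordered = sorted(matches, key=lambda m: m.timestamp)
        return [(m.timestamp, m.location) for m in ordered]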
The advantage of this arrangement is that, in the process of obtaining the target vehicle's motion trajectory, the vehicle is likely to change somewhat under the influence of the environment and other factors, and the method can effectively avoid the impact of such changes.
According to the method for obtaining the target vehicle's trajectory described above, local feature output branches are added to the overall feature output branch on the basis of the original convolutional network, output branches of interest can be added as the usage scene and environment change, and both overall and local features are extracted, making this feature extraction approach more robust. The extracted overall and local features of the target vehicle are further reduced in dimension and quantized, which effectively saves storage space and speeds up matching. The vehicle's trajectory is finally output with higher accuracy. Moreover, the feature extraction method used in the invention is a strongly supervised deep learning method, which avoids the cost of collecting and annotating large-scale data, saves model training time overall, and is more interpretable.
As shown in fig. 6, the present invention also provides a system for obtaining a motion trajectory of a target vehicle, including:
the vehicle detection module is used for acquiring at least one target vehicle picture and/or a vehicle head picture;
a branch adding module for adding a local feature output branch on the global feature output branch in the first picture recognition model;
the characteristic extraction module is used for inputting at least one target vehicle picture and/or vehicle head picture into the first picture recognition model to obtain a characteristic vector of the target vehicle picture and/or vehicle head picture;
the retrieval module is used for comparing the characteristic vectors in the query library with the characteristic vectors of the target vehicle to obtain all pictures meeting the requirements;
and the track output module is used for combining all the pictures which meet the requirements, the position information and the time information thereof to obtain the motion track of the target vehicle.
The branch adding module is further configured to add the local feature output branch at the last convolutional layer, or at the layer preceding the linear classification layer, on the overall feature output branch of the first picture recognition model.
The feature extraction module specifically includes:
a picture input module for inputting the at least one target vehicle picture and/or vehicle head picture into the first picture recognition model;
a feature obtaining module for applying pooling and triplet loss operations in the first picture recognition model: after the target vehicle picture and/or vehicle head picture enters the model, the overall feature output branch is processed to obtain the overall feature data of the target vehicle, and the local feature output branch is then processed to obtain local feature data that further refines the overall features;
a vector acquisition module for applying a concat function to the obtained overall feature data and local feature data to obtain a feature vector containing both;
a vector dimension reduction module for performing a dimensionality-reduction operation on the feature vector to obtain a reduced-dimension feature vector;
and a feature quantization module for quantizing the reduced-dimension feature vector to generate an int8 feature vector.
The vehicle detection module specifically includes:
a coordinate and confidence acquisition module for inputting at least one picture containing the target vehicle into the second picture recognition model to obtain the coordinates and confidence of the target vehicle;
and a picture acquisition module for obtaining the target vehicle picture and/or vehicle head picture according to the coordinates and confidence of the target vehicle.
In particular, when extracting the features of the target vehicle, a multi-granularity algorithm such as MGN (Multiple Granularity Network) may be used to obtain the global features and the features of different granularities of the target vehicle. Feature extraction can also be achieved with a separate local feature extraction method, such as PCB (Part-based Convolutional Baseline).
Further, in an embodiment of a computer device of the present invention, the computer device comprises a processor and a memory, the memory is adapted to store a plurality of program codes, the program codes are adapted to be loaded and run by the processor to execute the aforementioned method of obtaining the movement trace of the target vehicle.
Further, it should be understood that, since the modules are only configured to illustrate the functional units of the system of the present invention, the corresponding physical devices of the modules may be the processor itself, or a part of software, a part of hardware, or a part of a combination of software and hardware in the processor. Thus, the number of individual modules in the figures is merely illustrative.
Those skilled in the art will appreciate that the various modules in the system may be adaptively split or combined. Such splitting or combining of specific modules does not cause the technical solutions to deviate from the principle of the present invention, and therefore, the technical solutions after splitting or combining will fall within the protection scope of the present invention.
So far, the technical solutions of the present invention have been described in connection with the preferred embodiments shown in the drawings, but it is easily understood by those skilled in the art that the scope of the present invention is obviously not limited to these specific embodiments. Equivalent changes or substitutions of related technical features can be made by those skilled in the art without departing from the principle of the invention, and the technical scheme after the changes or substitutions can fall into the protection scope of the invention.

Claims (13)

1. A method of obtaining a motion trajectory of a target vehicle, comprising:
acquiring at least one target vehicle picture and/or vehicle head picture;
adding a local feature output branch on an overall feature output branch in a first picture recognition model;
inputting the at least one target vehicle picture and/or vehicle head picture into the first picture recognition model to obtain a feature vector of the target vehicle picture and/or vehicle head picture;
comparing the feature vectors in a query library with the feature vector of the target vehicle to obtain all pictures that meet the requirements;
and combining all qualifying pictures with their position and time information to obtain the motion trajectory of the target vehicle.
2. The method according to claim 1, wherein the step of adding a local feature output branch on the overall feature output branch in the first picture recognition model further comprises:
adding the local feature output branch at the last convolutional layer, or at the layer preceding the linear classification layer, on the overall feature output branch of the first picture recognition model.
3. The method according to claim 1, wherein the step of inputting at least one target vehicle picture and/or vehicle head picture into the first picture recognition model to obtain the feature vector of the target vehicle picture and/or vehicle head picture further comprises:
inputting the at least one target vehicle picture and/or vehicle head picture into the first picture recognition model;
applying pooling and triplet loss operations in the first picture recognition model: after the target vehicle picture and/or vehicle head picture enters the model, processing the overall feature output branch to obtain the overall feature data of the target vehicle, and then processing the local feature output branch to obtain local feature data that further refines the overall features;
applying a concat function to the obtained overall feature data and local feature data to obtain a feature vector containing both;
performing a dimensionality-reduction operation on the feature vector to obtain a reduced-dimension feature vector;
and quantizing the reduced-dimension feature vector to generate an int8 feature vector.
4. The method according to claim 3, wherein the step of performing a dimensionality-reduction operation on the feature vector to obtain a reduced-dimension feature vector comprises:
adding a one-dimensional additional convolutional layer after the last convolutional layer of the first picture recognition model and reducing the number of channels of the original feature map by convolution, thereby obtaining the reduced-dimension feature vector; and/or,
adding a fully connected layer after the last convolutional layer of the first picture recognition model, tiling the feature vector into a one-dimensional vector, and applying linear weighting to obtain the reduced-dimension feature vector; and/or,
reducing the dimension of the feature vector by PCA, thereby obtaining the reduced-dimension feature vector.
5. The method according to claim 1, wherein the step of acquiring at least one target vehicle picture and/or vehicle head picture specifically comprises:
inputting at least one picture containing the target vehicle into a second picture recognition model to obtain the coordinates and confidence of the target vehicle;
and obtaining the target vehicle picture and/or vehicle head picture according to the coordinates and confidence of the target vehicle.
6. The method according to claim 5, wherein the step of comparing the feature vectors in the query library with the feature vector of the target vehicle to obtain all pictures that meet the requirements specifically comprises:
using FAISS retrieval to calculate the distances between the feature vectors of all vehicles in the query library and the feature vector of the target vehicle;
sorting the distances between the feature vectors of all vehicles and the feature vector of the target vehicle;
and obtaining the k pictures with the smallest distances in the query library as the retrieval result, wherein k is an integer greater than or equal to 2; or,
using FAISS retrieval to calculate the distances between the feature vectors of all vehicles in the query library and the feature vector of the target vehicle;
sorting the distances between the feature vectors of all vehicles and the feature vector of the target vehicle;
and setting a distance threshold and obtaining all pictures in the query library whose feature vector distance is smaller than the threshold as the retrieval result.
7. The method according to claim 6, wherein the distance between feature vectors comprises a cosine distance or a Euclidean distance; and/or,
the first picture recognition model or the second picture recognition model is a RetinaNet model, a YOLO model, or a Faster-RCNN model; and/or,
the model used in the step of comparing the feature vectors in the query library with the feature vector of the target vehicle to obtain all pictures that meet the requirements is an IVF model.
8. A system for obtaining a motion trajectory of a target vehicle, comprising:
a vehicle detection module for acquiring at least one target vehicle picture and/or vehicle head picture;
a branch adding module for adding a local feature output branch on an overall feature output branch in a first picture recognition model;
a feature extraction module for inputting the at least one target vehicle picture and/or vehicle head picture into the first picture recognition model to obtain a feature vector of the target vehicle picture and/or vehicle head picture;
a retrieval module for comparing the feature vectors in a query library with the feature vector of the target vehicle to obtain all pictures that meet the requirements;
and a trajectory output module for combining all qualifying pictures with their position and time information to obtain the motion trajectory of the target vehicle.
9. The system of claim 8, wherein the branch adding module is further configured to add the local feature output branch at the last convolutional layer, or at the layer preceding the linear classification layer, on the overall feature output branch of the first picture recognition model.
10. The system of claim 8, wherein the feature extraction module specifically comprises:
a picture input module for inputting the at least one target vehicle picture and/or vehicle head picture into the first picture recognition model;
a feature obtaining module for applying pooling and triplet loss operations in the first picture recognition model: after the target vehicle picture and/or vehicle head picture enters the model, processing the overall feature output branch to obtain the overall feature data of the target vehicle, and then processing the local feature output branch to obtain local feature data that further refines the overall features;
a vector acquisition module for applying a concat function to the obtained overall feature data and local feature data to obtain a feature vector containing both;
a vector dimension reduction module for performing a dimensionality-reduction operation on the feature vector to obtain a reduced-dimension feature vector;
and a feature quantization module for quantizing the reduced-dimension feature vector to generate an int8 feature vector.
11. The system according to claim 8, wherein the vehicle detection module specifically comprises:
a coordinate and confidence acquisition module for inputting at least one picture containing the target vehicle into a second picture recognition model to obtain the coordinates and confidence of the target vehicle;
and a picture acquisition module for obtaining the target vehicle picture and/or vehicle head picture according to the coordinates and confidence of the target vehicle.
12. A computer-readable storage medium having stored thereon a plurality of program codes, wherein the program codes are adapted to be loaded and executed by a processor to perform the method of obtaining a motion trajectory of a target vehicle according to any one of claims 1-7.
13. A computer device comprising a processor and a memory adapted to store a plurality of program codes, wherein the program codes are adapted to be loaded and run by the processor to perform the method of obtaining a motion trajectory of a target vehicle according to any one of claims 1-7.
CN202110485411.4A 2021-04-30 2021-04-30 Method, system, medium, and apparatus for obtaining a motion trajectory of a target vehicle Pending CN113205546A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202110485411.4A CN113205546A (en) 2021-04-30 2021-04-30 Method, system, medium, and apparatus for obtaining a motion trajectory of a target vehicle

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202110485411.4A CN113205546A (en) 2021-04-30 2021-04-30 Method, system, medium, and apparatus for obtaining a motion trajectory of a target vehicle

Publications (1)

Publication Number Publication Date
CN113205546A 2021-08-03

Family

ID=77028431

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202110485411.4A Pending CN113205546A (en) 2021-04-30 2021-04-30 Method, system, medium, and apparatus for obtaining a motion trajectory of a target vehicle

Country Status (1)

Country Link
CN (1) CN113205546A (en)

Patent Citations (7)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20200074205A1 (en) * 2017-06-28 2020-03-05 Beijing Sensetime Technology Development Co., Ltd. Methods and apparatuses for vehicle appearance feature recognition, methods and apparatuses for vehicle retrieval, storage medium, and electronic devices
CN109063768A (en) * 2018-08-01 2018-12-21 北京旷视科技有限公司 Vehicle recognition methods, apparatus and system again
WO2020133549A1 (en) * 2018-12-29 2020-07-02 Beijing Didi Infinity Technology And Development Co., Ltd. Artificial intelligent systems and methods for semantic-based search
WO2020216008A1 (en) * 2019-04-25 2020-10-29 腾讯科技(深圳)有限公司 Image processing method, apparatus and device, and storage medium
CN110766720A (en) * 2019-09-23 2020-02-07 盐城吉大智能终端产业研究院有限公司 Multi-camera vehicle tracking system based on deep learning
CN111783654A (en) * 2020-06-30 2020-10-16 苏州科达科技股份有限公司 Vehicle weight identification method and device and electronic equipment
CN112528938A (en) * 2020-12-22 2021-03-19 四川云从天府人工智能科技有限公司 Vehicle detection model training and detection method, device and computer storage medium thereof

Non-Patent Citations (4)

* Cited by examiner, † Cited by third party
Title
FUKAI ZHANG ET AL.: "Vehicle Detection in Urban Traffic Surveillance Images Based on Convolutional Neural Networks with Feature Concatenation", Sensors, vol. 19, no. 3
WEI SUN ET AL.: "Vehicle Type Recognition Combining Global and Local Features via Two-Stage Classification", Mathematical Problems in Engineering
DOU XINZE ET AL.: "Vehicle re-identification optimization algorithm based on high-confidence local features" (in Chinese), Journal of Beijing University of Aeronautics and Astronautics, no. 09
ZHAO SHENG ET AL.: "Vehicle trajectory recognition system based on Sobel operator and CNN" (in Chinese), Computer Technology and Development, no. 07

Similar Documents

Publication Publication Date Title
CN111709311B (en) Pedestrian re-identification method based on multi-scale convolution feature fusion
CN107229757B (en) Video retrieval method based on deep learning and Hash coding
CN111191583B (en) Space target recognition system and method based on convolutional neural network
CN111738143B (en) Pedestrian re-identification method based on expectation maximization
Xia et al. Loop closure detection for visual SLAM using PCANet features
CN110751232A (en) Chinese complex scene text detection and identification method
CN111046732A (en) Pedestrian re-identification method based on multi-granularity semantic analysis and storage medium
US12020483B2 (en) Dynamic detection and recognition of media subjects
CN111325276A (en) Image classification method and device, electronic equipment and computer-readable storage medium
CN110826415A (en) Method and device for re-identifying vehicles in scene image
Tang et al. Weakly-supervised part-attention and mentored networks for vehicle re-identification
CN112597919A (en) Real-time medicine box detection method based on YOLOv3 pruning network and embedded development board
Sun et al. Multi-stage refinement feature matching using adaptive ORB features for robotic vision navigation
CN113963333B (en) Traffic sign board detection method based on improved YOLOF model
CN115203408A (en) Intelligent labeling method for multi-modal test data
CN114358205A (en) Model training method, model training device, terminal device, and storage medium
CN112084353A (en) Bag-of-words model method for rapid landmark-convolution feature matching
Wang et al. Summary of object detection based on convolutional neural network
CN116152573A (en) Image recognition method, device, electronic equipment and computer readable storage medium
CN111753583A (en) Identification method and device
CN113205546A (en) Method, system, medium, and apparatus for obtaining a motion trajectory of a target vehicle
CN109766467B (en) Remote sensing image retrieval method and system based on image segmentation and improved VLAD
CN115100694A (en) Fingerprint quick retrieval method based on self-supervision neural network
CN110674342B (en) Method and device for inquiring target image
CN108256572B (en) Indoor visual feature classification method based on improved naive Bayes

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination