CN110348463B - Method and device for identifying vehicle - Google Patents

Method and device for identifying vehicle

Info

Publication number
CN110348463B
CN110348463B CN201910640084.8A
Authority
CN
China
Prior art keywords
image
vehicle
features
key point
coplanar
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN201910640084.8A
Other languages
Chinese (zh)
Other versions
CN110348463A (en)
Inventor
蒋旻悦
谭啸
孙昊
文石磊
丁二锐
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Beijing Baidu Netcom Science and Technology Co Ltd
Original Assignee
Beijing Baidu Netcom Science and Technology Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Beijing Baidu Netcom Science and Technology Co Ltd filed Critical Beijing Baidu Netcom Science and Technology Co Ltd
Priority to CN201910640084.8A priority Critical patent/CN110348463B/en
Publication of CN110348463A publication Critical patent/CN110348463A/en
Application granted granted Critical
Publication of CN110348463B publication Critical patent/CN110348463B/en
Active legal-status Critical Current
Anticipated expiration legal-status Critical

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/50 Information retrieval; Database structures therefor; File system structures therefor of still image data
    • G06F16/53 Querying
    • G06F16/535 Filtering based on additional data, e.g. user or group profiles
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 Computing arrangements based on biological models
    • G06N3/02 Neural networks
    • G06N3/04 Architecture, e.g. interconnection topology
    • G06N3/045 Combinations of networks
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 Computing arrangements based on biological models
    • G06N3/02 Neural networks
    • G06N3/08 Learning methods
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V10/00 Arrangements for image or video recognition or understanding
    • G06V10/40 Extraction of image or video features
    • G06V10/46 Descriptors for shape, contour or point-related descriptors, e.g. scale invariant feature transform [SIFT] or bags of words [BoW]; Salient regional features
    • G06V10/462 Salient features, e.g. scale invariant feature transforms [SIFT]
    • G06V10/464 Salient features, e.g. scale invariant feature transforms [SIFT] using a plurality of salient features, e.g. bag-of-words [BoW] representations

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • General Physics & Mathematics (AREA)
  • Data Mining & Analysis (AREA)
  • General Engineering & Computer Science (AREA)
  • Biomedical Technology (AREA)
  • Computing Systems (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Artificial Intelligence (AREA)
  • Software Systems (AREA)
  • Biophysics (AREA)
  • Computational Linguistics (AREA)
  • Evolutionary Computation (AREA)
  • General Health & Medical Sciences (AREA)
  • Molecular Biology (AREA)
  • Health & Medical Sciences (AREA)
  • Mathematical Physics (AREA)
  • Databases & Information Systems (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Multimedia (AREA)
  • Image Analysis (AREA)

Abstract

Embodiments of the present disclosure disclose methods and apparatus for identifying a vehicle. One embodiment of the method comprises: acquiring an image of a vehicle to be identified, and identifying at least one key point of the image; determining at least one coplanar region of the vehicle based on the at least one key point; extracting local features of each coplanar region based on a pre-trained neural network; learning weights of different features of the image based on the local features of the coplanar regions; calculating the similarity of the image to each vehicle image in a vehicle image library based on the learned weight of each feature; and determining the vehicle image in the vehicle image library whose similarity is highest and exceeds a predetermined threshold as the target image, and outputting the related information of the target image. This implementation can effectively improve the effectiveness of vehicle features in occlusion and cross-camera scenes, fuse the detailed features and the overall features of the vehicle, and improve vehicle re-identification results.

Description

Method and device for identifying vehicle
Technical Field
Embodiments of the present disclosure relate to the field of computer technology, and in particular, to a method and apparatus for identifying a vehicle.
Background
With the development of modern industrialization, the number of motor vehicles owned by residents has grown continuously, which brings great convenience to people's lives but also poses a serious challenge for traffic management. Therefore, improving the degree of intelligence of the Intelligent Transportation System (ITS) has become a research focus. Within ITS, research on methods for re-identifying a designated vehicle is one of the key problems.
Existing vehicle re-identification systems extract features from the whole picture of the vehicle, which is not robust under occlusion. Across cameras, the visible parts of the same vehicle differ, so direct full-image matching can cause large feature differences for the same vehicle under different lenses. Meanwhile, the differences between distinct vehicles are generally concentrated in details, but existing vehicle re-identification methods lack local comparison of vehicles.
Disclosure of Invention
Embodiments of the present disclosure propose methods and apparatus for identifying a vehicle.
In a first aspect, embodiments of the present disclosure provide a method for identifying a vehicle, comprising: acquiring an image of a vehicle to be identified, and identifying at least one key point of the image; determining at least one coplanar region of the vehicle based on the at least one keypoint; extracting local features of all coplanar regions based on a pre-trained neural network; learning weights of different features of the image based on the local features of the coplanar regions; calculating similarity of the image and each vehicle image in the vehicle image library based on the learned weight of each feature; and determining the vehicle image with the similarity higher than a preset threshold and the maximum similarity in the vehicle image library as a target image and outputting relevant information of the target image.
In some embodiments, learning the weights of the different features of the image based on the local features of the coplanar regions comprises: extracting global features of the image based on a neural network; weights of different features of the image are learned based on local features and global features of the coplanar regions.
In some embodiments, learning the weights of the different features of the image based on the local features of the coplanar regions comprises: acquiring a thermodynamic diagram of at least one key point; weights of different features of the image are learned based on local features of the coplanar regions, thermodynamic diagrams of the at least one keypoint, and global features.
In some embodiments, identifying at least one key point of the image comprises: inputting the image into a deep neural network to obtain at least one original image feature map, wherein each original image feature map corresponds to a key point; inputting a mirror image of the image into the deep neural network to obtain at least one feature map, and mirroring the at least one feature map to obtain at least one mirror feature map, wherein each mirror feature map corresponds to a key point; and for each key point, finding the two positions of the maximum value of the current feature in the original image feature map and the mirror feature map corresponding to the key point, respectively, and averaging the two positions to obtain the position of the key point.
In some embodiments, extracting local features of each coplanar region based on a pre-trained neural network comprises: and converting the coplanar regions into rectangles through affine transformation, and inputting the rectangles into a pre-trained classifier to obtain local features of the coplanar regions.
In some embodiments, calculating a similarity of the image to each vehicle image in the vehicle image library based on the learned weight of each feature comprises: for each feature, calculating a distance between the feature of the image and a feature of each vehicle image in the vehicle image library; the distances between the features are weighted and summed based on the learned weights of the features, and the weighted sum of the distances is converted into a similarity.
In a second aspect, an embodiment of the present disclosure provides an apparatus for identifying a vehicle, including: a key point identification unit configured to acquire an image of a vehicle to be identified and identify at least one key point of the image; a coplanarity determination unit configured to determine at least one coplanarity area of the vehicle based on the at least one keypoint; a local feature extraction unit configured to extract local features of the coplanar regions based on a pre-trained neural network; a weight determination unit configured to learn weights of different features of the image based on the local features of the respective coplanar regions; a calculation unit configured to calculate a similarity of the image with each vehicle image in the vehicle image library based on the learned weight of each feature; and the output unit is configured to determine the vehicle image with the similarity higher than a preset threshold value and the maximum similarity in the vehicle image library as the target image and output the related information of the target image.
In some embodiments, the weight determination unit is further configured to: extracting global features of the image based on a neural network; weights of different features of the image are learned based on local features and global features of the coplanar regions.
In some embodiments, the weight determination unit is further configured to: acquiring a thermodynamic diagram of at least one key point; weights of different features of the image are learned based on local features of the coplanar regions, thermodynamic diagrams of the at least one keypoint, and global features.
In some embodiments, the keypoint identification unit is further configured to: inputting the image into a deep neural network to obtain at least one original image feature map, wherein each original image feature map corresponds to a key point; inputting a mirror image of the image into the deep neural network to obtain at least one feature map, and mirroring the at least one feature map to obtain at least one mirror feature map, wherein each mirror feature map corresponds to a key point; and for each key point, finding the two positions of the maximum value of the current feature in the original image feature map and the mirror feature map corresponding to the key point, respectively, and averaging the two positions to obtain the position of the key point.
In some embodiments, the local feature extraction unit is further configured to: and converting the coplanar regions into rectangles through affine transformation, and inputting the rectangles into a pre-trained classifier to obtain local features of the coplanar regions.
In some embodiments, the computing unit is further configured to: for each feature, calculating a distance between the feature of the image and a feature of each vehicle image in the vehicle image library; the distances between the features are weighted and summed based on the learned weights of the features, and the weighted sum of the distances is converted into a similarity.
In a third aspect, embodiments of the present disclosure provide an electronic device for identifying a vehicle, including: one or more processors; a storage device having one or more programs stored thereon which, when executed by one or more processors, cause the one or more processors to implement a method as in any one of the first aspects.
In a fourth aspect, embodiments of the disclosure provide a computer readable medium having a computer program stored thereon, wherein the program when executed by a processor implements a method as in any one of the first aspect.
According to the method and device for identifying a vehicle provided by embodiments of the present disclosure, the two-dimensional coplanar regions of the vehicle are obtained by detecting the vehicle's key points and using their coplanar relationships, and features are extracted from the different coplanar regions of the vehicle for feature fusion, which can effectively enhance the robustness of vehicle features in cross-camera scenes. The fused features can be used as the input of a vehicle re-identification system, which outputs the identification result.
Drawings
Other features, objects and advantages of the disclosure will become more apparent upon reading of the following detailed description of non-limiting embodiments thereof, made with reference to the accompanying drawings in which:
FIG. 1 is an exemplary system architecture diagram in which one embodiment of the present disclosure may be applied;
FIG. 2 is a flow chart of one embodiment of a method for identifying a vehicle according to the present disclosure;
FIG. 3 is a schematic illustration of one application scenario of a method for identifying a vehicle according to the present disclosure;
FIG. 4 is a flow chart of yet another embodiment of a method for identifying a vehicle according to the present disclosure;
FIG. 5 is a schematic structural diagram of one embodiment of an apparatus for identifying a vehicle according to the present disclosure;
FIG. 6 is a schematic block diagram of a computer system suitable for use with an electronic device implementing embodiments of the present disclosure.
Detailed Description
The present disclosure is described in further detail below with reference to the accompanying drawings and examples. It is to be understood that the specific embodiments described herein are merely illustrative of the relevant invention and not restrictive of the invention. It should be noted that, for convenience of description, only the portions related to the related invention are shown in the drawings.
It should be noted that, in the present disclosure, the embodiments and features of the embodiments may be combined with each other without conflict. The present disclosure will be described in detail below with reference to the accompanying drawings in conjunction with embodiments.
Fig. 1 shows an exemplary system architecture 100 to which embodiments of the method for identifying a vehicle or the apparatus for identifying a vehicle of the present application may be applied.
As shown in fig. 1, the system architecture 100 may include cameras 101, 102, 103, a network 104, and a server 105. The network 104 serves as a medium to provide communication links between the cameras 101, 102, 103 and the server 105. Network 104 may include various connection types, such as wired, wireless communication links, or fiber optic cables, to name a few.
The user may use the cameras 101, 102, 103 to interact with the server 105 over the network 104 to receive or send messages or the like.
The cameras 101, 102, 103 generally refer to cameras used for vehicle monitoring. They may be electronic-police cameras used at intersections to capture traffic violations (such as crossing solid lane lines, driving in the wrong direction, occupying non-motorized lanes, ignoring lane guidance markings, running red lights, and the like). They may also be checkpoint cameras installed on key sections of expressways, provincial roads, and national roads to capture speeding violations. The cameras 101, 102, 103 may also be illegal-parking capture cameras, traffic monitoring cameras, Skynet surveillance cameras, mobile capture cameras, and the like.
The server 105 may be a server that provides various services, such as a background analysis server that provides analysis of vehicle data collected on the cameras 101, 102, 103. The background analysis server may perform processing such as analysis on the received vehicle image, and output a processing result (e.g., vehicle information).
The server may be hardware or software. When the server is hardware, it may be implemented as a distributed server cluster formed by multiple servers, or may be implemented as a single server. When the server is software, it may be implemented as multiple pieces of software or software modules (e.g., to provide distributed services), or as a single piece of software or software module. And is not particularly limited herein.
It should be noted that the method for identifying a vehicle provided in the embodiment of the present application is generally performed by the server 105, and accordingly, the device for identifying a vehicle is generally disposed in the server 105.
It should be understood that the number of cameras, networks, and servers in fig. 1 is merely illustrative. There may be any number of cameras, networks, and servers, as desired for implementation.
With continued reference to FIG. 2, a flow 200 of one embodiment of a method for identifying a vehicle according to the present disclosure is shown. The method for identifying a vehicle comprises the following steps:
step 201, acquiring an image of a vehicle to be identified, and identifying at least one key point of the image.
In the present embodiment, an execution subject of the method for recognizing a vehicle (e.g., the server shown in fig. 1) may acquire an image of the vehicle to be identified from a camera for taking pictures of vehicles through a wired or wireless connection. The key points of the vehicle may be wheel axles, license plates, door hinges, door handles, and the like, as shown in fig. 3. The key points of the vehicle can be detected by a trained key point detection model. The method of training the key point detection model may comprise the following steps:
in step 2011, a sample set is obtained.
In this embodiment, the execution subject of the method of training the key point detection model (e.g., server 105 shown in fig. 1) may obtain the sample set in a variety of ways. For example, the execution subject may obtain an existing sample set stored in a database server through a wired or wireless connection. As another example, samples may be collected through a camera. In this way, the execution subject may receive the vehicle images taken by the camera as samples and store them locally, thereby generating a sample set.
Here, the sample set may include at least one sample. The sample can include the sample vehicle image and the annotation information associated with the key points in the sample vehicle image.
Optionally, data enhancement may be performed on the training samples, including rotation, scaling, cropping, flipping, changing the light intensity, etc., to obtain augmented training data and make the model generalize better. When a picture is augmented, the corresponding rotation, scaling, and flipping operations are also applied to the annotated key point coordinates.
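As an illustration of the flip case, the following minimal Python sketch (the function name and array layout are assumptions, not taken from the patent) flips an image horizontally and remaps the annotated key point coordinates accordingly. For left/right-symmetric key points (e.g., left versus right door hinge), the annotation labels would additionally need to be swapped.

    import numpy as np

    def flip_augment(image, keypoints):
        # image: H x W x 3 array; keypoints: N x 2 array of (x, y) pixel coordinates.
        h, w = image.shape[:2]
        flipped = image[:, ::-1, :].copy()   # mirror along the width axis
        kps = keypoints.astype(np.float64)   # work on a float copy
        kps[:, 0] = (w - 1) - kps[:, 0]      # x' = (W - 1) - x; y is unchanged
        return flipped, kps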
In the present embodiment, a sample vehicle image generally refers to an image containing a vehicle. It may be a planar vehicle image or a stereoscopic vehicle image (i.e., a vehicle image containing depth information). The sample vehicle image may be a color image (e.g., an RGB (Red-Green-Blue) photograph) and/or a grayscale image, etc. The format of the image is not limited in the present application; it may be JPG (Joint Photographic Experts Group), BMP (Bitmap), RAW, or another format, as long as the execution subject can read and recognize it.
Step 2012, a sample is selected from the sample set.
In this embodiment, the execution subject may select samples from the sample set obtained in step 2011 and perform the training steps. The manner of selection and the number of samples are not limited in the present application. For example, at least one sample may be selected randomly, or samples whose vehicle images have better sharpness (i.e., higher resolution) may be selected.
And 2013, inputting the sample vehicle image of the selected sample into the initial first model to obtain a characteristic diagram.
In this embodiment, the execution subject may input the sample vehicle image of the sample selected in step 2012 into the initial first model. By detecting and analyzing the key point regions in the sample vehicle image, a feature map containing the key points can be obtained.
In this embodiment, the initial first model may be an existing variety of neural network models created based on machine learning techniques. The neural network model may have various existing neural network structures (e.g., DenseBox, VGGNet, ResNet, SegNet, etc.). The storage location of the initial model is likewise not limited in this application.
Step 2014, determining a first-layer loss value based on the feature map and the labeling information of the key points in the sample vehicle image.
In this embodiment, the execution subject may analyze the feature map obtained in step 2013 against the annotation information of the key points of the sample vehicle image, so as to determine the first-layer loss value. For example, the feature map and the annotation information of the key points may be input as parameters to a specified loss function, so that the loss value between them can be calculated.
In this embodiment, the loss function is usually used to measure the degree of inconsistency between the predicted value (e.g. feature map) and the actual value (e.g. annotation information) of the model. It is a non-negative real-valued function. In general, the smaller the loss function, the better the robustness of the model. The loss function may be set according to actual requirements.
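The patent does not fix a particular loss function here. As one common choice for heatmap-based key point detection, assumed purely for illustration, the first-layer loss could be a mean-squared error between the predicted feature maps and target maps rendered from the key point annotations (PyTorch; the tensor layout is an assumption):

    import torch.nn.functional as F

    def heatmap_loss(pred_maps, target_maps):
        # pred_maps, target_maps: (B, C_p, H, W) tensors, one channel per key point;
        # target_maps would be rendered from the annotated coordinates (e.g., Gaussians).
        # Smaller values mean the predicted maps better match the annotations.
        return F.mse_loss(pred_maps, target_maps)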
Step 2015, inputting the feature map into the initial second model to obtain the position coordinates of the detected key points.
In this embodiment, the execution subject may input the feature map generated in step 2013 into the initial second model to obtain the position coordinates of the detected key points. The initial second model may be a neural network based on an attention model. Its primary purpose is to extract the features of interest from feature maps of different scales while retaining both detailed features and semantic features, thereby concentrating important features and weakening unimportant ones.
Step 2016, determining a second layer loss value based on the detected position coordinates of the keypoints and the labeling information of the keypoints in the sample vehicle image.
In this embodiment, the execution subject may analyze the position coordinates of the key points obtained in step 2015 against the annotation information of the key points of the sample vehicle image, so that the second-layer loss value can be determined. For example, the position coordinates of the detected key points and the annotation information of the key points may be input as parameters to a predetermined loss function, and the loss value between the two can be calculated.
In this embodiment, the loss function is generally used to measure the degree of inconsistency between the predicted value (e.g. the position coordinates of the detected key points) and the actual value (e.g. the annotation information of the key points) of the model. It is a non-negative real-valued function. In general, the smaller the loss function, the better the robustness of the model. The loss function may be set according to actual requirements.
Step 2017, determining whether the training of the initial first model and the initial second model is finished based on the first layer loss value and the second layer loss value.
In this embodiment, the first-layer loss value and the second-layer loss value are added to obtain the total loss value of the network. In each iteration, pictures and the corresponding key point annotations are input, the first-layer and second-layer loss values are computed by forward propagation, their gradients are then computed, and back-propagation and parameter updates are completed. Experiments show that after a certain number of iterations, the computation of the loss values can be changed so that only hard-to-detect key points are attended to, i.e., only the key point channels with the larger second-layer loss values are computed and back-propagated, which yields a better detection effect on hard-to-detect key points.
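A minimal sketch of this hard-key-point mining, assuming the per-channel second-layer losses are available as a tensor; the top_k value and function name are illustrative, the patent saying only that the channels with larger second-layer loss values are computed and back-propagated:

    import torch

    def total_loss_with_hard_mining(loss1, loss2_per_kp, top_k=8):
        # loss1:        scalar first-layer loss.
        # loss2_per_kp: (C_p,) tensor of per-key-point second-layer losses.
        # Only the top_k channels with the largest second-layer loss contribute
        # gradients, focusing training on hard-to-detect key points.
        k = min(top_k, loss2_per_kp.numel())
        hard_vals, _ = torch.topk(loss2_per_kp, k)
        return loss1 + hard_vals.sum()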
From the change in the loss values, the execution subject may determine whether the initial models are trained. As an example, if multiple samples were selected in step 2012, the execution subject may determine that the training of the initial first model and the initial second model is complete when the total loss value of each sample reaches the target value. As another example, the execution subject may count the proportion of selected samples whose total loss value reaches the target value, and when this proportion reaches a preset sample ratio (e.g., 95%), it can be determined that the initial model training is complete.
In this embodiment, if the execution subject determines that the training of the initial first model and the initial second model is complete, it may continue to execute step 2018. If it determines that they are not yet trained, the relevant parameters in the initial first model and the initial second model may be adjusted: for example, the weights in each convolutional layer of the initial first model and in each attention model of the initial second model are modified using back propagation. The flow may then return to step 2012 to re-select samples from the sample set, so that the training steps described above can be continued.
It should be noted that the selection manner is not limited in the present application. For example, in the case where there are a large number of samples in the sample set, the execution subject may select a non-selected sample from the sample set.
Step 2018, in response to determining that the training of the initial first model and the initial second model is completed, determining the initial first model and the initial second model as the vehicle key point detection models.
In this embodiment, if the execution subject determines that the training of the initial first model and the initial second model is completed, the initial first model and the initial second model may be determined as the vehicle key point detection model.
Optionally, the executing entity may store the generated key point detection model locally, or may send it to a terminal or a database server.
The execution subject may input the acquired vehicle image into the key point detection model, thereby generating a vehicle key point detection result of the detection object. The vehicle keypoint detection result may be position information describing keypoints of the vehicle in the image.
At least one coplanar region of the vehicle is determined based on the at least one keypoint, step 202.
In this embodiment, according to the key point detection results and the key point prior coplanar regions, the coplanar vehicle regions r1, r2, ..., rn are located, where n is the number of coplanar regions. A key point prior coplanar region refers to a region whose key points are known in advance to be coplanar. For example, the key points on the left door of the vehicle constitute a coplanar region.
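A minimal sketch of how such a prior might be represented; the region names and key point indices below are hypothetical, since the patent does not enumerate them:

    # Hypothetical prior table: key point indices known in advance to be coplanar.
    COPLANAR_PRIORS = {
        "left_door":  [4, 5, 6, 7],   # e.g., door hinges, handle, window corners
        "front_face": [0, 1, 2, 3],   # e.g., license plate corners, logo, air inlet
    }

    def locate_coplanar_regions(keypoints):
        # keypoints: sequence of detected (x, y) key point positions.
        # Returns each prior region r_i as the polygon formed by its member key points.
        return {name: [keypoints[i] for i in idxs]
                for name, idxs in COPLANAR_PRIORS.items()}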
And step 203, extracting local features of all coplanar regions based on a pre-trained neural network.
In this embodiment, since a coplanar region is not necessarily rectangular, it is converted into a rectangle by an affine transformation before being input into the neural network for local feature extraction. The neural network may be any of various classifiers for classifying vehicle images, determining the type of vehicle by extracting vehicle features (for example, BMW model 1, Mercedes-Benz model 2, and so on). The training samples of this neural network are images of vehicle coplanar regions annotated with the vehicle type. Inputting a coplanar region to be identified into the trained neural network yields the features of the vehicle and the vehicle type.
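A minimal sketch of this rectification using OpenCV; the choice of three region corners and the output size are illustrative assumptions. The rectified patch would then be fed into the pre-trained classifier to obtain the region's local features.

    import cv2
    import numpy as np

    def rectify_region(image, tri_src, out_size=(128, 128)):
        # tri_src: three corner points of the coplanar region, shape (3, 2).
        # An affine transform is fully determined by three point correspondences,
        # so the three corners are mapped onto corners of an out_size rectangle.
        w, h = out_size
        tri_dst = np.float32([[0, 0], [w - 1, 0], [0, h - 1]])
        M = cv2.getAffineTransform(np.float32(tri_src), tri_dst)
        return cv2.warpAffine(image, M, (w, h))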
At step 204, weights of different features of the image are learned based on the local features of the coplanar regions.
In this embodiment, the local features of the coplanar regions may be concatenated and input into a neural network, which autonomously learns the weights among the different features.
In some optional implementations of the present embodiment, the global features of the image are extracted based on the neural network, and the weights of the different features of the image are learned based on the local features of the coplanar regions and the global features. The entire image may be input into the above neural network to extract the global features of the image. The local features of the coplanar regions and the global features are then concatenated and input into the neural network, which autonomously learns the weights among the different features.
In some optional implementations of this embodiment, a thermodynamic diagram of at least one key point is acquired, and the weights of the different features of the image are learned based on the local features of the coplanar regions, the thermodynamic diagrams of the at least one key point, and the global features. The features r1, r2, ..., rn of the coplanar regions, the thermodynamic diagrams kp1, kp2, ..., kpm of the key points, and the feature G of the whole image region are concatenated and input into the neural network, which autonomously learns the weights among the different features:

W = F[CNN(r1, r2, ..., rn, kp1, kp2, ..., kpm, G)]

where CNN(r1, r2, ..., rn, kp1, kp2, ..., kpm, G) denotes the features extracted by the neural network. Multiplying the thermodynamic diagram of a key point by the global features yields the local features around that key point. It can be seen that this implementation fuses coplanar features, local features around key points, and global features.
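A minimal PyTorch sketch of this weight-learning step, assuming the branch features (coplanar-region features r1..rn, key-point-local features obtained as thermodynamic diagram times global features, and the global feature G) have already been extracted and stacked; the layer sizes and the softmax normalization are assumptions:

    import torch.nn as nn

    class FeatureWeighting(nn.Module):
        # Sketch of W = F[CNN(r1, ..., rn, kp1, ..., kpm, G)]: the concatenated
        # branch features pass through a small head that outputs one weight per branch.
        def __init__(self, num_branches, feat_dim):
            super().__init__()
            self.head = nn.Sequential(
                nn.Linear(num_branches * feat_dim, 256),
                nn.ReLU(),
                nn.Linear(256, num_branches),
                nn.Softmax(dim=-1),
            )

        def forward(self, feats):
            # feats: (B, num_branches, feat_dim)
            return self.head(feats.flatten(1))   # (B, num_branches) weights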
In step 205, the similarity between the image and each vehicle image in the vehicle image library is calculated based on the learned weight of each feature.
In this embodiment, after the above features are obtained, corresponding positions all represent specific features. The image is compared with the vehicle images in the vehicle image library: for each feature, the distance between the corresponding feature portions of the two pictures is computed, and the distances of the different features are weighted by the learned weights to obtain the final similarity between the pictures. Similarity measures such as the Euclidean distance or the Hamming distance may be used to calculate the similarity between the image and each vehicle image in the vehicle image library.
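A minimal sketch of this weighted comparison using the Euclidean distance; the 1 / (1 + d) distance-to-similarity conversion is an assumption, as the patent leaves the exact conversion unspecified:

    import numpy as np

    def weighted_similarity(query_feats, gallery_feats, weights):
        # query_feats, gallery_feats: (K, D) arrays, one row per feature branch.
        # weights: (K,) learned feature weights.
        dists = np.linalg.norm(query_feats - gallery_feats, axis=1)   # per-feature distances
        return 1.0 / (1.0 + float(np.dot(weights, dists)))            # weighted sum -> similarity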
And step 206, determining the vehicle image with the similarity higher than a preset threshold and the maximum similarity in the vehicle image library as a target image and outputting the relevant information of the target image.
In this embodiment, the similarity calculation is performed by traversing each vehicle image in the vehicle image library to find the most similar image; if its similarity is higher than a predetermined threshold, the vehicle image with the maximum similarity is determined as the target image. When the vehicle image library is built, related information of each vehicle image, such as the license plate, the owner's name, and the owner's contact information, is collected.
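A minimal retrieval sketch over the vehicle image library, reusing the weighted_similarity sketch above; the library entry layout and the threshold value are assumptions:

    def find_target(query_feats, library, weights, threshold=0.8):
        # library: iterable of {"feats": (K, D) array, "info": registered vehicle data}.
        best_sim, best_info = -1.0, None
        for entry in library:
            sim = weighted_similarity(query_feats, entry["feats"], weights)
            if sim > best_sim:
                best_sim, best_info = sim, entry["info"]
        # Output the registered information only if the best match clears the threshold.
        return best_info if best_sim > threshold else None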
With continued reference to fig. 3, fig. 3 is a schematic view of an application scenario of the method for identifying a vehicle according to the present embodiment. In the application scenario of fig. 3, an image of the vehicle to be identified is captured by a camera, and key points in the image, such as the left front door hinge, the left front door handle, the front license plate, and the air inlet, are then identified. The coplanar regions are determined from the known coplanarity of the key points; for example, the front license plate, the air inlet, and the logo form a coplanar region. The features of each coplanar region are extracted by a neural network. Inputting the full image into the neural network yields the global features of the full image region, and multiplying the thermodynamic diagram of a key point by the global features yields the features of the local region around that key point. Finally, the features of the coplanar regions, the global features of the full image region, and the thermodynamic diagrams of the key points are concatenated and input into the neural network, which autonomously learns the weights among the different features. The similarity between the image of the vehicle to be identified and each vehicle image in the vehicle image library is then calculated according to these weights, so that the matching vehicle image in the library can be determined; the information registered when that vehicle image was entered into the library is the information of the vehicle to be identified.
The method provided by the embodiment of the disclosure detects the vehicle key points by using the key point method, extracts the vehicle local information based on the positions of the key points, and performs the vehicle re-identification task by combining the vehicle overall information.
With further reference to FIG. 4, a flow 400 of yet another embodiment of a method for identifying a vehicle is shown. The flow 400 of the method for identifying a vehicle includes the steps of:
step 401, obtaining an image of a vehicle to be identified, and inputting the image into a deep neural network to obtain at least one original image feature map.
In the present embodiment, an execution subject of the method for recognizing a vehicle (e.g., the server shown in fig. 1) may acquire the image of the vehicle to be recognized from a camera for taking pictures of vehicles through a wired or wireless connection. The original image I_ori is input into the vehicle key point model KP(·) (a deep neural network) trained as described in step 201, obtaining the key point detection result KP_ori, i.e., at least one original image feature map, where each key point corresponds to one original image feature map:

KP_ori = KP(I_ori)
Step 402, inputting the mirror image of the image into a deep neural network to obtain at least one feature map, and performing mirror image processing on the at least one feature map to obtain at least one mirror image feature map.
In the present embodiment, the mirror image I_mirror of the image is input into the vehicle key point model KP(·) (the deep neural network) trained as described in step 201, obtaining the key point detection result KP_mirror, i.e., at least one feature map, where each key point corresponds to one feature map:

KP_mirror = KP(I_mirror)

The dimensions of KP_ori and KP_mirror are H × W × C_p, where H and W are respectively the height and width of the feature tensor, and C_p is the number of vehicle key points. The feature maps then need to be mirrored back into the original image coordinate system, obtaining at least one mirror feature map.
And step 403, for each key point, finding two positions of the current maximum feature value from the original image feature map and the mirror image feature map corresponding to the key point, and averaging the two positions to obtain the position of the key point.
In the present embodiment, KP_ori and KP_mirror are fused to obtain the detected key points. For each key point, its coordinates in the original image are obtained by the following operations:

1) In the H × W output feature map KP_ori and in the mirror feature map restored to the original image coordinate system, find the position of the maximum value of the current feature, respectively.

2) Average the two positions to obtain the position of the current key point.
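A minimal sketch of this fusion, assuming KP_ori and KP_mirror are available as numpy arrays and KP_mirror has already been flipped back into the original image coordinate system:

    import numpy as np

    def fuse_keypoints(kp_ori, kp_mirror):
        # kp_ori, kp_mirror: (H, W, C_p) heat maps, one channel per key point.
        h, w, c = kp_ori.shape
        coords = np.zeros((c, 2))
        for k in range(c):
            y1, x1 = np.unravel_index(np.argmax(kp_ori[:, :, k]), (h, w))
            y2, x2 = np.unravel_index(np.argmax(kp_mirror[:, :, k]), (h, w))
            coords[k] = [(x1 + x2) / 2.0, (y1 + y2) / 2.0]   # average the two maxima
        return coords   # (C_p, 2) array of (x, y) key point positions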
At least one coplanar region of the vehicle is determined based on the at least one keypoint, step 404.
In step 405, local features of the coplanar regions are extracted based on a pre-trained neural network.
At step 406, weights of different features of the image are learned based on local features of the coplanar regions.
Step 407, calculating the similarity of the image and each vehicle image in the vehicle image library based on the learned weight of each feature.
And step 408, determining the vehicle image with the similarity higher than a preset threshold and the maximum similarity in the vehicle image library as a target image and outputting the relevant information of the target image.
The steps 404-408 are substantially the same as the steps 202-206, and thus are not described in detail.
As can be seen from fig. 4, compared with the embodiment corresponding to fig. 2, the flow 400 of the method for identifying a vehicle in the present embodiment details the step of detecting the key points. Therefore, the scheme described in this embodiment can further improve the accuracy of vehicle identification.
With further reference to fig. 5, as an implementation of the methods illustrated in the above figures, the present disclosure provides one embodiment of an apparatus for identifying a vehicle, which corresponds to the method embodiment illustrated in fig. 2, and which may be particularly applicable in various electronic devices.
As shown in fig. 5, the apparatus 500 for identifying a vehicle of the present embodiment includes: a key point identification unit 501, a coplanarity determination unit 502, a local feature extraction unit 503, a weight determination unit 504, a calculation unit 505, and an output unit 506. The key point identification unit 501 is configured to acquire an image of a vehicle to be identified and identify at least one key point of the image; a coplanarity determination unit 502 configured to determine at least one coplanarity area of the vehicle based on the at least one keypoint; a local feature extraction unit 503 configured to extract local features of the coplanar regions based on a pre-trained neural network; a weight determination unit 504 configured to learn weights of different features of the image based on the local features of the respective coplanar regions; a calculation unit 505 configured to calculate a similarity of the image with each vehicle image in the vehicle image library based on the learned weight of each feature; and an output unit 506 configured to determine a vehicle image in the vehicle image library, which has a similarity higher than a predetermined threshold and is the largest, as a target image and output information related to the target image.
In the present embodiment, the specific processes of the key point identification unit 501, the coplanarity determination unit 502, the local feature extraction unit 503, the weight determination unit 504, the calculation unit 505, and the output unit 506 of the apparatus 500 for identifying a vehicle may refer to steps 201 to 206 in the corresponding embodiment of fig. 2.
In some optional implementations of this embodiment, the weight determining unit 504 is further configured to: extracting global features of the image based on a neural network; weights of different features of the image are learned based on local features and global features of the coplanar regions.
In some optional implementations of this embodiment, the weight determining unit 504 is further configured to: acquiring a thermodynamic diagram of at least one key point; weights of different features of the image are learned based on local features of the coplanar regions, thermodynamic diagrams of the at least one keypoint, and global features.
In some optional implementations of this embodiment, the keypoint identification unit 501 is further configured to: inputting the image into a deep neural network to obtain at least one original image feature map, wherein each original image feature map corresponds to a key point; inputting a mirror image of the image into the deep neural network to obtain at least one feature map, and mirroring the at least one feature map to obtain at least one mirror feature map, wherein each mirror feature map corresponds to a key point; and for each key point, finding the two positions of the maximum value of the current feature in the original image feature map and the mirror feature map corresponding to the key point, respectively, and averaging the two positions to obtain the position of the key point.
In some optional implementations of this embodiment, the local feature extraction unit 503 is further configured to: and converting the coplanar regions into rectangles through affine transformation, and inputting the rectangles into a pre-trained classifier to obtain local features of the coplanar regions.
In some optional implementations of this embodiment, the computing unit 505 is further configured to: for each feature, calculating a distance between the feature of the image and a feature of each vehicle image in the vehicle image library; the distances between the features are weighted and summed based on the learned weights of the features, and the weighted sum of the distances is converted into a similarity.
Referring now to FIG. 6, a schematic diagram of an electronic device (e.g., the server of FIG. 1) 600 suitable for use in implementing embodiments of the present disclosure is shown. The server shown in fig. 6 is only an example, and should not bring any limitation to the functions and the scope of use of the embodiments of the present disclosure.
As shown in fig. 6, electronic device 600 may include a processing means (e.g., central processing unit, graphics processor, etc.) 601 that may perform various appropriate actions and processes in accordance with a program stored in a Read Only Memory (ROM)602 or a program loaded from a storage means 608 into a Random Access Memory (RAM) 603. In the RAM 603, various programs and data necessary for the operation of the electronic apparatus 600 are also stored. The processing device 601, the ROM 602, and the RAM 603 are connected to each other via a bus 604. An input/output (I/O) interface 605 is also connected to bus 604.
Generally, the following devices may be connected to the I/O interface 605: input devices 606 including, for example, a touch screen, touch pad, keyboard, mouse, camera, microphone, accelerometer, gyroscope, etc.; output devices 607 including, for example, a Liquid Crystal Display (LCD), a speaker, a vibrator, and the like; storage 608 including, for example, tape, hard disk, etc.; and a communication device 609. The communication means 609 may allow the electronic device 600 to communicate with other devices wirelessly or by wire to exchange data. While fig. 6 illustrates an electronic device 600 having various means, it is to be understood that not all illustrated means are required to be implemented or provided. More or fewer devices may alternatively be implemented or provided. Each block shown in fig. 6 may represent one device or may represent multiple devices as desired.
In particular, according to an embodiment of the present disclosure, the processes described above with reference to the flowcharts may be implemented as computer software programs. For example, embodiments of the present disclosure include a computer program product comprising a computer program embodied on a computer readable medium, the computer program comprising program code for performing the method illustrated in the flow chart. In such an embodiment, the computer program may be downloaded and installed from a network via the communication means 609, or may be installed from the storage means 608, or may be installed from the ROM 602. The computer program, when executed by the processing device 601, performs the above-described functions defined in the methods of embodiments of the present disclosure. It should be noted that the computer readable medium described in the embodiments of the present disclosure may be a computer readable signal medium or a computer readable storage medium or any combination of the two. A computer readable storage medium may be, for example, but not limited to, an electronic, magnetic, optical, electromagnetic, infrared, or semiconductor system, apparatus, or device, or any combination of the foregoing. More specific examples of the computer readable storage medium may include, but are not limited to: an electrical connection having one or more wires, a portable computer diskette, a hard disk, a Random Access Memory (RAM), a read-only memory (ROM), an erasable programmable read-only memory (EPROM or flash memory), an optical fiber, a portable compact disc read-only memory (CD-ROM), an optical storage device, a magnetic storage device, or any suitable combination of the foregoing. In embodiments of the disclosure, a computer readable storage medium may be any tangible medium that can contain, or store a program for use by or in connection with an instruction execution system, apparatus, or device. In embodiments of the present disclosure, however, a computer readable signal medium may comprise a propagated data signal with computer readable program code embodied therein, for example, in baseband or as part of a carrier wave. Such a propagated data signal may take many forms, including, but not limited to, electro-magnetic, optical, or any suitable combination thereof. A computer readable signal medium may also be any computer readable medium that is not a computer readable storage medium and that can communicate, propagate, or transport a program for use by or in connection with an instruction execution system, apparatus, or device. Program code embodied on a computer readable medium may be transmitted using any appropriate medium, including but not limited to: electrical wires, optical cables, RF (radio frequency), etc., or any suitable combination of the foregoing.
The computer readable medium may be embodied in the electronic device; or may exist separately without being assembled into the electronic device. The computer readable medium carries one or more programs which, when executed by the electronic device, cause the electronic device to: acquiring an image of a vehicle to be identified, and identifying at least one key point of the image; determining at least one coplanar region of the vehicle based on the at least one keypoint; extracting local features of all coplanar regions based on a pre-trained neural network; learning weights of different features of the image based on the local features of the coplanar regions; calculating similarity of the image and each vehicle image in the vehicle image library based on the learned weight of each feature; and determining the vehicle image with the similarity higher than a preset threshold and the maximum similarity in the vehicle image library as a target image and outputting relevant information of the target image.
Computer program code for carrying out operations for embodiments of the present disclosure may be written in any combination of one or more programming languages, including an object oriented programming language such as Java, Smalltalk, C++, and conventional procedural programming languages, such as the "C" programming language or similar programming languages. The program code may execute entirely on the user's computer, partly on the user's computer, as a stand-alone software package, partly on the user's computer and partly on a remote computer, or entirely on the remote computer or server. In the case of a remote computer, the remote computer may be connected to the user's computer through any type of network, including a Local Area Network (LAN) or a Wide Area Network (WAN), or the connection may be made to an external computer (for example, through the Internet using an Internet service provider).
The flowchart and block diagrams in the figures illustrate the architecture, functionality, and operation of possible implementations of systems, methods and computer program products according to various embodiments of the present disclosure. In this regard, each block in the flowchart or block diagrams may represent a module, segment, or portion of code, which comprises one or more executable instructions for implementing the specified logical function(s). It should also be noted that, in some alternative implementations, the functions noted in the block may occur out of the order noted in the figures. For example, two blocks shown in succession may, in fact, be executed substantially concurrently, or the blocks may sometimes be executed in the reverse order, depending upon the functionality involved. It will also be noted that each block of the block diagrams and/or flowchart illustration, and combinations of blocks in the block diagrams and/or flowchart illustration, can be implemented by special purpose hardware-based systems which perform the specified functions or acts, or combinations of special purpose hardware and computer instructions.
The units described in the embodiments of the present disclosure may be implemented by software or hardware. The described units may also be provided in a processor, and may be described as: a processor includes a keypoint identifying unit, a coplanarity determining unit, a local feature extracting unit, a weight determining unit, a calculating unit, and an output unit. Where the names of the units do not in some cases constitute a limitation of the units themselves, for example, the keypoint identification unit may also be described as a "unit that takes an image of the vehicle to be identified and identifies at least one keypoint of said image".
The foregoing description is only exemplary of the preferred embodiments of the disclosure and is illustrative of the principles of the technology employed. It will be appreciated by those skilled in the art that the scope of the invention in the present disclosure is not limited to the specific combination of the above-mentioned features, but also encompasses other embodiments in which any combination of the above-mentioned features or their equivalents is possible without departing from the inventive concept. For example, the above features and (but not limited to) the features disclosed in this disclosure having similar functions are replaced with each other to form the technical solution.

Claims (14)

1. A method for identifying a vehicle, comprising:
acquiring an image of a vehicle to be identified, and identifying at least one key point of the image;
determining at least one coplanar region of the vehicle according to the at least one key point and a key point prior coplanar region, wherein the key point prior coplanar region refers to a region composed of previously known coplanar key points;
extracting local features of all coplanar regions based on a pre-trained neural network;
learning weights of different features of the image based on local features of the coplanar regions;
calculating a similarity of the image to each vehicle image in a vehicle image library based on the learned weight of each feature;
and determining the vehicle image with the similarity higher than a preset threshold and the maximum similarity in the vehicle image library as a target image and outputting related information of the target image.
2. The method of claim 1, wherein the learning weights for different features of the image based on local features of coplanar regions comprises:
extracting global features of the image based on the neural network;
and learning the weights of different features of the image based on the local features and the global features of all coplanar regions.
3. The method of claim 2, wherein the learning weights for different features of the image based on local features of coplanar regions comprises:
acquiring a thermodynamic diagram of the at least one key point;
learning weights for different features of the image based on local features of the coplanar regions, thermodynamic diagrams of the at least one keypoint, the global features.
4. The method of claim 1, wherein the identifying at least one keypoint of the image comprises:
inputting the image into a deep neural network to obtain at least one original image feature map, wherein each original image feature map corresponds to a key point;
inputting a mirror image of the image into the deep neural network to obtain at least one feature map, and performing mirror image processing on the at least one feature map to obtain at least one mirror image feature map, wherein each mirror image feature map corresponds to a key point;
and for each key point, respectively finding two positions of the maximum value of the current feature from the original image feature map and the mirror image feature map corresponding to the key point, and averaging the two positions to obtain the position of the key point.
5. The method of claim 1, wherein the extracting local features for coplanar regions based on a pre-trained neural network comprises:
and converting the coplanar regions into rectangles through affine transformation, and inputting the rectangles into a pre-trained classifier to obtain local features of the coplanar regions.
6. The method of claim 1, wherein said calculating a similarity of the image to each vehicle image in a library of vehicle images based on the learned weight of each feature comprises:
for each feature, calculating a distance between the feature of the image and a feature of each vehicle image in a vehicle image library;
the distances between the features are weighted and summed based on the learned weights of the features, and the weighted sum of the distances is converted into a similarity.
7. An apparatus for identifying a vehicle, comprising:
a key point identification unit configured to acquire an image of a vehicle to be identified and identify at least one key point of the image;
a coplanarity determination unit configured to determine at least one coplanarity region of the vehicle from the at least one keypoint and a keypoint a priori coplanarity region, wherein keypoint a priori coplanarity region refers to a region made up of previously known coplanar keypoints;
a local feature extraction unit configured to extract local features of the coplanar regions based on a pre-trained neural network;
a weight determination unit configured to learn weights of different features of the image based on local features of the respective coplanar regions;
a calculation unit configured to calculate a similarity of the image with each vehicle image in a vehicle image library based on the learned weight of each feature;
and an output unit configured to determine, as a target image, the vehicle image in the vehicle image library whose similarity is higher than a preset threshold and is the highest, and to output information related to the target image.
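As flagged in the coplanarity determination unit above, a minimal sketch of assembling coplanar regions from the detected keypoints; the index groups are hypothetical placeholders for the keypoint sets known in advance to be coplanar, which the patent does not enumerate:

```python
# Hypothetical prior: index groups of keypoints known in advance to be coplanar.
PRIOR_COPLANAR_GROUPS = [
    (0, 1, 2, 3),  # e.g. corners of the windshield plane
    (4, 5, 6, 7),  # e.g. points on one body-side plane
]

def coplanar_regions(keypoints):
    """Collect each prior group of detected keypoints into a region polygon."""
    return [[keypoints[i] for i in group] for group in PRIOR_COPLANAR_GROUPS]
```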
8. The apparatus of claim 7, wherein the weight determination unit is further configured to:
extracting global features of the image based on the neural network;
and learning the weights of different features of the image based on the local features and the global features of all coplanar regions.
9. The apparatus of claim 8, wherein the weight determination unit is further configured to:
acquiring a heat map of the at least one keypoint;
and learning the weights of different features of the image based on the local features of the coplanar regions, the heat map of the at least one keypoint, and the global features.
10. The apparatus of claim 7, wherein the keypoint identification unit is further configured to:
inputting the image into a deep neural network to obtain at least one original-image feature map, wherein each original-image feature map corresponds to one keypoint;
inputting a mirror image of the image into the deep neural network to obtain at least one feature map, and mirroring the at least one feature map to obtain at least one mirrored feature map, wherein each mirrored feature map corresponds to one keypoint;
and for each keypoint, finding the position of the feature maximum in the original-image feature map and in the mirrored feature map corresponding to that keypoint, and averaging the two positions to obtain the position of the keypoint.
11. The apparatus of claim 7, wherein the local feature extraction unit is further configured to:
converting the coplanar regions into rectangles through an affine transformation, and inputting the rectangles into a pre-trained classifier to obtain the local features of the coplanar regions.
12. The apparatus of claim 7, wherein the computing unit is further configured to:
for each feature, calculating a distance between the feature of the image and a feature of each vehicle image in a vehicle image library;
and weighting and summing the distances based on the learned weight of each feature, and converting the weighted sum of the distances into a similarity.
13. An electronic device for identifying a vehicle, comprising:
one or more processors;
a storage device having one or more programs stored thereon,
wherein the one or more programs, when executed by the one or more processors, cause the one or more processors to implement the method of any one of claims 1-6.
14. A computer-readable medium, on which a computer program is stored, wherein the program, when executed by a processor, implements the method of any one of claims 1-6.
CN201910640084.8A 2019-07-16 2019-07-16 Method and device for identifying vehicle Active CN110348463B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201910640084.8A CN110348463B (en) 2019-07-16 2019-07-16 Method and device for identifying vehicle

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201910640084.8A CN110348463B (en) 2019-07-16 2019-07-16 Method and device for identifying vehicle

Publications (2)

Publication Number Publication Date
CN110348463A CN110348463A (en) 2019-10-18
CN110348463B true CN110348463B (en) 2021-08-24

Family

ID=68175454

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201910640084.8A Active CN110348463B (en) 2019-07-16 2019-07-16 Method and device for identifying vehicle

Country Status (1)

Country Link
CN (1) CN110348463B (en)

Families Citing this family (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN110969160B (en) * 2019-11-21 2023-04-14 合肥工业大学 License plate image correction and recognition method and system based on deep learning
CN111291692B (en) * 2020-02-17 2023-10-20 咪咕文化科技有限公司 Video scene recognition method and device, electronic equipment and storage medium
CN111666826A (en) * 2020-05-15 2020-09-15 北京百度网讯科技有限公司 Method, apparatus, electronic device and computer-readable storage medium for processing image
CN111881994B (en) * 2020-08-03 2024-04-05 杭州睿琪软件有限公司 Identification processing method and apparatus, and non-transitory computer readable storage medium
CN112101183B (en) * 2020-09-10 2021-08-24 深圳市商汤科技有限公司 Vehicle identification method and device, electronic equipment and storage medium
CN113037750B (en) * 2021-03-09 2022-08-02 成都信息工程大学 Vehicle detection data enhancement training method and system, vehicle and storage medium

Citations (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN105095827A (en) * 2014-04-18 2015-11-25 汉王科技股份有限公司 Facial expression recognition device and facial expression recognition method
CN107463903A * 2017-08-08 2017-12-12 北京小米移动软件有限公司 Face keypoint localization method and device
CN108229468A (en) * 2017-06-28 2018-06-29 北京市商汤科技开发有限公司 Vehicle appearance feature recognition and vehicle retrieval method, apparatus, storage medium, electronic equipment
CN109241910A * 2018-09-07 2019-01-18 高新兴科技集团股份有限公司 Face keypoint localization method based on deep multi-feature fusion cascaded regression

Family Cites Families (11)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US8824779B1 (en) * 2011-12-20 2014-09-02 Christopher Charles Smyth Apparatus and method for determining eye gaze from stereo-optic views
CN102855758A * 2012-08-27 2013-01-02 无锡北邮感知技术产业研究院有限公司 Detection method for vehicles violating traffic rules
CN103673977B * 2013-11-07 2016-07-27 沈阳师范大学 Method and apparatus for detecting a vehicle's rear blind zone
US9436890B2 (en) * 2014-01-23 2016-09-06 Samsung Electronics Co., Ltd. Method of generating feature vector, generating histogram, and learning classifier for recognition of behavior
CN104615998B * 2015-02-15 2017-10-24 武汉大学 Multi-view-based vehicle retrieval method
CN104951775B * 2015-07-15 2018-02-02 攀钢集团攀枝花钢钒有限公司 Video-based intelligent security identification method for railway level-crossing signal regions
US10949712B2 (en) * 2016-03-30 2021-03-16 Sony Corporation Information processing method and information processing device
CN108319907A * 2018-01-26 2018-07-24 腾讯科技(深圳)有限公司 Vehicle identification method, device and storage medium
CN108681707A * 2018-05-15 2018-10-19 桂林电子科技大学 Wide-angle vehicle model recognition method and system based on global and local feature fusion
CN109063768B (en) * 2018-08-01 2021-10-01 北京旷视科技有限公司 Vehicle weight identification method, device and system
CN109271980A * 2018-08-28 2019-01-25 上海萃舟智能科技有限公司 Method, system, terminal and medium for recognizing full vehicle nameplate information

Also Published As

Publication number Publication date
CN110348463A (en) 2019-10-18

Similar Documents

Publication Publication Date Title
CN110348463B (en) Method and device for identifying vehicle
US12020473B2 (en) Pedestrian re-identification method, device, electronic device and computer-readable storage medium
CN110781711A (en) Target object identification method and device, electronic equipment and storage medium
Hoang et al. Enhanced detection and recognition of road markings based on adaptive region of interest and deep learning
Huang et al. Spatial-temporal based lane detection using deep learning
CN111931683B (en) Image recognition method, device and computer readable storage medium
Razalli et al. Emergency vehicle recognition and classification method using HSV color segmentation
Balali et al. Multi-class traffic sign detection and classification using google street view images
Bu et al. A UAV photography–based detection method for defective road marking
CN114332776A (en) Non-motor vehicle occupant pedestrian lane detection method, system, device and storage medium
Dousai et al. Detecting humans in search and rescue operations based on ensemble learning
CN114596548A (en) Target detection method, target detection device, computer equipment and computer-readable storage medium
Huu et al. Proposing Lane and Obstacle Detection Algorithm Using YOLO to Control Self‐Driving Cars on Advanced Networks
CN113012215A (en) Method, system and equipment for space positioning
CN111767839B (en) Vehicle driving track determining method, device, equipment and medium
CN111753766A (en) Image processing method, device, equipment and medium
Ciuntu et al. Real-time traffic sign detection and classification using machine learning and optical character recognition
Yin et al. ‘Big Data’: pedestrian volume using google street view images
Hara et al. An initial study of automatic curb ramp detection with crowdsourced verification using google street view images
CN114913470A (en) Event detection method and device
Giannikis et al. Crowdsourcing Recognized Image Objects In Mobile Devices Through Machine Learning
CN112529116A (en) Scene element fusion processing method, device and equipment and computer storage medium
CN113516069A (en) Road mark real-time detection method and device based on size robustness
Fernandes et al. A robust automatic license plate recognition system for embedded devices
CN114627400A (en) Lane congestion detection method and device, electronic equipment and storage medium

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant