WO2016145547A1 - Apparatus and system for vehicle classification and verification - Google Patents

Apparatus and system for vehicle classification and verification

Info

Publication number
WO2016145547A1
WO2016145547A1 (PCT/CN2015/000172)
Authority
WO
WIPO (PCT)
Prior art keywords
neural network
vehicle
convolutional neural
output
parameters
Prior art date
Application number
PCT/CN2015/000172
Other languages
English (en)
Inventor
Xiaoou Tang
Linjie YANG
Ping Luo
Chen Change Loy
Original Assignee
Xiaoou Tang
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Xiaoou Tang
Priority to PCT/CN2015/000172 priority Critical patent/WO2016145547A1/fr
Priority to CN201580077195.2A priority patent/CN107430693A/zh
Publication of WO2016145547A1 publication Critical patent/WO2016145547A1/fr

Classifications

    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06NCOMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00Computing arrangements based on biological models
    • G06N3/02Neural networks
    • G06N3/08Learning methods
    • G06N3/082Learning methods modifying the architecture, e.g. adding, deleting or silencing nodes or connections
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06NCOMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00Computing arrangements based on biological models
    • G06N3/02Neural networks
    • G06N3/04Architecture, e.g. interconnection topology
    • G06N3/045Combinations of networks
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06VIMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V10/00Arrangements for image or video recognition or understanding
    • G06V10/40Extraction of image or video features
    • G06V10/44Local feature extraction by analysis of parts of the pattern, e.g. by detecting edges, contours, loops, corners, strokes or intersections; Connectivity analysis, e.g. of connected components
    • G06V10/443Local feature extraction by analysis of parts of the pattern, e.g. by detecting edges, contours, loops, corners, strokes or intersections; Connectivity analysis, e.g. of connected components by matching or filtering
    • G06V10/449Biologically inspired filters, e.g. difference of Gaussians [DoG] or Gabor filters
    • G06V10/451Biologically inspired filters, e.g. difference of Gaussians [DoG] or Gabor filters with interaction between the filter responses, e.g. cortical complex cells
    • G06V10/454Integrating the filters into a hierarchical structure, e.g. convolutional neural networks [CNN]
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06VIMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V10/00Arrangements for image or video recognition or understanding
    • G06V10/70Arrangements for image or video recognition or understanding using pattern recognition or machine learning
    • G06V10/764Arrangements for image or video recognition or understanding using pattern recognition or machine learning using classification, e.g. of video objects
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06VIMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V10/00Arrangements for image or video recognition or understanding
    • G06V10/70Arrangements for image or video recognition or understanding using pattern recognition or machine learning
    • G06V10/82Arrangements for image or video recognition or understanding using pattern recognition or machine learning using neural networks
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06VIMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V20/00Scenes; Scene-specific elements
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06VIMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V2201/00Indexing scheme relating to image or video recognition or understanding
    • G06V2201/08Detecting or categorising vehicles

Definitions

  • The present application generally relates to an apparatus and method for vehicle classification.
  • The present application also generally relates to an apparatus and method for vehicle verification.
  • Previous approaches are often restricted to a small number of vehicle models: they typically classify fewer than 30 models and rely on hand-crafted features.
  • One recent work proposes a 3D representation to classify 196 vehicle models, the largest-scale experiment so far. It first estimates the 3D geometry of an object, then extracts SIFT (Scale-Invariant Feature Transform) features from rectified patches relative to this geometry. Hand-crafted features such as SIFT are not discriminative enough for vehicle model recognition.
  • The latest object detection algorithms include the DPM (Deformable Part Model) and the RCNN (Region-based Convolutional Neural Network).
  • The DPM learns object parts in a data-driven way.
  • The object parts can be deformed with respect to an elastic cost.
  • The RCNN first generates object proposals with selective search, then learns a classification model on features from convolutional networks.
  • The disclosures address the problem of vehicle classification and verification.
  • The claimed solutions achieve at least the following technical effects.
  • The claimed solutions are useful in many applications, e.g., in video surveillance and image search engines.
  • The claimed solutions can be used for retrieving images/video clips of designated vehicle models in surveillance videos to locate suspicious vehicles, and for automatically recognizing the model of a vehicle with a mobile device when a person wants to identify it on the street.
  • The claimed solutions can predict the attributes (maximum speed, seat number, etc.) of a vehicle in an image even when they cannot identify the model.
  • The claimed solutions can retrieve similar vehicles (potentially the same model, same released year, etc.) from surveillance videos to locate/track a target vehicle. This function is critical in challenging multi-camera surveillance environments, where car plate recognition may fail and car tracking can only be done from visual appearance.
  • The claimed solutions can be applied not only to images but more generally to video.
  • The technology is not limited to RGB images and can easily be extended to depth images or images from multiple sensory devices.
  • The present application discloses an apparatus for vehicle classification.
  • The disclosed apparatus may comprise a vehicle detector and a predictor.
  • The vehicle detector is used to detect a location of a vehicle in a received image.
  • The predictor is in electronic communication with the detector and is used for predicting one or more attributes of the vehicle from image patches at the detected location.
  • The predictor has a convolutional neural network configured with one or more output layers, each of the layers being differently trained such that each output layer is used to predict one attribute of the vehicle and has a size equal to the number of output categories for the vehicle.
  • Also disclosed is an apparatus for vehicle verification, which may comprise a vehicle detector, a feature extractor and a verification unit.
  • The vehicle detector is configured to receive two images containing vehicles and detect the two vehicles from the received images.
  • The feature extractor is in electronic communication with the detector and configured to extract verification features from each of the detected vehicles.
  • The verification unit is coupled to the extractor and configured to judge whether the vehicles in the two images are of the same kind.
  • Also disclosed is a method for vehicle classification, which may comprise: detecting a location of a vehicle in a received image; and predicting one or more attributes of the vehicle from image patches at the detected location with a convolutional neural network.
  • The convolutional neural network comprises at least one fully connected layer configured to extract classification features from the entire region of input features for the received image.
  • The output nodes are coupled to the fully connected layer and predict one or more attributes of the vehicle from the classification features received from the fully connected layer.
  • The attributes comprise at least one selected from a group consisting of a brand, a model, a year, a continuous value of the maximum speed, a displacement, a seat number and a door number, or the like.
  • Also disclosed is a training device for training the convolutional neural network based on: 1) a pre-training set including images containing different objects and the corresponding ground truth object labels; and 2) a fine-tuning set including images containing only vehicles and the corresponding ground truth labels.
  • The labels for the pre-training set are the object categories of the corresponding images.
  • The labels for the fine-tuning set depend on the attributes to be predicted.
  • The training device pre-trains a first convolutional neural network using the images in the pre-training set by: randomly initializing the parameters of the first convolutional neural network; calculating a loss of the parameters in the first convolutional neural network; calculating a gradient with respect to all said parameters based on the calculated loss; updating the parameters using the product of a prefixed learning rate and the corresponding gradients; and determining whether a stopping criterion is satisfied and, if not, returning to the loss-calculation step.
  • The training device is further configured to: create a second convolutional neural network with the same structure as the first network; initialize the second network using the pre-trained parameters of the first network; replace the output nodes of the second convolutional neural network with a new output layer of n nodes, where n is the size of the designated output; and fine-tune the second convolutional neural network using the images in the fine-tuning set.
  • Fig. 1 is a schematic diagram illustrating an apparatus for vehicle classification according to one embodiment of the present application.
  • Fig. 2a is a schematic diagram illustrating a typical structure of a convolutional neural network.
  • Fig. 2b is a schematic diagram illustrating an example of a network with multiple output layers/nodes according to one embodiment of the present application.
  • Fig. 3 is a schematic diagram illustrating a flow chart for the training according to one embodiment of the present application.
  • Fig. 4 is a schematic diagram illustrating a flow chart of the back-propagation algorithm according to one embodiment of the present application.
  • Fig. 5 is a schematic diagram illustrating an apparatus for vehicle classification and verification according to another embodiment of the present application.
  • Fig. 6 is a schematic diagram illustrating an apparatus for vehicle verification according to one embodiment of the present application.
  • Fig. 7 is a schematic diagram illustrating an apparatus for vehicle verification according to another embodiment of the present application.
  • The apparatus 1000 for vehicle classification may be configured to, based on a received image, produce a final output of classification information, such as at least one of the brand, model, and released year of the vehicle.
  • The apparatus 1000 may also be configured to, based on a received image, produce a final output of an estimation of a designated attribute of the vehicle, such as the brand/model/released year, or the maximum speed, displacement, seat number, etc.
  • The apparatus 1000 comprises a vehicle detector 10, a predictor 20 and a training device 30.
  • The vehicle detector 10 is used to detect a location of the vehicle in the received image.
  • The vehicle detector 10 targets only the vehicle category, not other categories.
  • The detector 10 can detect vehicles under various conditions such as fog, rain, and low light. If there is no vehicle in the received image, it produces a message indicating that no vehicle was found.
  • The received image contains a vehicle and is fed into the vehicle detector 10.
  • The vehicle may be a car, truck, van, bus, motorcycle, etc.
  • The vehicle can be in an arbitrary viewpoint, such as a front view, rear view, or side view.
  • The detector 10 detects the vehicle in the image and produces a bounding box for the detected vehicle.
  • Bounding boxes produced by conventional technical means may not be very accurate; even in this situation the predictor 20 can produce accurate predictions, as will be discussed later.
  • The image patch within the bounding box is cropped and fed into the predictor 20.
  • The predictor 20 is configured to predict various attributes of the vehicle, such as the brand/model/released year, or the maximum speed, displacement, seat number, etc.
  • When specified as a vehicle model classifier, the predictor 20 can produce predictions of a plurality of the most probable vehicle models with their corresponding probabilities.
  • The predictor 20 is used to classify which vehicle brand/model/released year the detected vehicle belongs to.
  • Vehicle models can be naturally organized in a three-level hierarchy; as in conventional technical means, the three levels comprise a brand level, a model level, and a released year level. In the present application, the predictor 20 can produce predictions at any of the three levels.
  • If the predictor 20 is set to predict at the brand level, it produces a prediction of which brand the vehicle in the input image belongs to; if set to predict at the model level, it produces a prediction of which brand and which model the vehicle belongs to; if set to predict at the year level, it produces a prediction of brand, model, and released year (Objective I).
  • The predictor 20 may also be used to predict a specified attribute of the detected vehicle, such as maximum speed, displacement, seat number, door number, vehicle type, etc.
  • The attributes can be naturally divided into two classes: continuous and discrete.
  • The maximum speed of a vehicle is continuous: it can be any positive real number. The door/seat number of a vehicle is discrete and can only be chosen from a discrete list such as {1, 2, 3, 4, 5}.
  • The attributes and their examples are shown in Table 1 but are not restricted to the list in Table 1 (Objective II).
  • Table 1 Some attributes and their examples.
  • The predictor 20 can predict an arbitrary combination of the above classes and attributes.
  • For example, one predictor can be used to predict the brand, model, year, maximum speed, seat number, and other possible descriptions of a vehicle simultaneously (Objective III).
  • The predictor 20 utilizes a convolutional neural network as the prediction model, which is a major advantage of the proposed system, since the convolutional neural network can greatly increase the accuracy of tasks such as vehicle model classification and attribute prediction.
  • The convolutional neural networks for different outputs of the predictor 20 differ only in the size and type of the output layers, which form the last layer of the network. That is, each output layer may be used to predict one attribute of the vehicle and has a size equal to the number of output categories for the vehicle. The size and type of the output layers are predetermined differently so as to achieve the different Objectives I, II and III.
  • Objectives I and II use one output layer, while Objective III uses multiple output layers, each layer predicting one attribute.
  • The size of the output layer, i.e., the number of output nodes in the output layer, is equal to the number of categories (brand/model/released year) for the output. For example, if the network needs to predict the brand and there are 100 different vehicle brands, the size of the output layer is 100, and each output node corresponds to a specific brand.
  • For multiple output layers, the size of each output layer is allocated independently according to Objectives I and II. For example, if a network is to predict the brand and the maximum speed jointly, it has two output layers: one to predict the brand with size 100 and the other to predict the maximum speed with size 1, as sketched below.
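By way of illustration, the following minimal Python/numpy sketch attaches a 100-node classification head and a 1-node regression head to one shared feature vector. The feature dimension, the random weights, and the head names are assumptions for illustration only, not values from the disclosure:

```python
import numpy as np

rng = np.random.default_rng(0)
feat_dim = 256                                          # assumed feature size
shared_feature = rng.standard_normal(feat_dim)          # last FC layer output

heads = {
    "brand":     rng.standard_normal((feat_dim, 100)),  # size = number of brands
    "max_speed": rng.standard_normal((feat_dim, 1)),    # size = 1 (continuous)
}

def softmax(z):
    e = np.exp(z - z.max())
    return e / e.sum()

# Each output layer is a linear map from the same shared feature; the heads
# differ only in size and type (softmax classification vs. regression).
brand_probs = softmax(shared_feature @ heads["brand"])
speed_pred = (shared_feature @ heads["max_speed"])[0]
print(brand_probs.argmax(), speed_pred)
```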
  • The convolutional neural network may comprise a data layer, one or more convolution layers, one or more max pooling layers, a fully connected layer, and one (Fig. 2a) or more (Fig. 2b) output layers.
  • The data layer 101 receives images and their labels, where x_ij is the j-th element of the d-dimensional feature vector of the i-th input image region, and y_ij is the j-th element of the n-dimensional label vector of the i-th input image region.
  • The convolution layer receives the output from the data layer 101 and performs convolution, padding, sampling, and non-linear transformation operations.
  • The convolution operation in each convolutional layer may be expressed as

    y_j^r = max(0, b_j^r + Σ_i k_ij^r ∗ x_i^r),    (1)

    where ∗ denotes convolution; x_i and y_j are the i-th input feature map and the j-th output feature map, respectively; k_ij is the convolution kernel between the i-th input feature map and the j-th output feature map; b_j is the bias of the j-th output feature map; and r indicates a local region where weights are shared. In one extreme, where the local region r corresponds to the entire input feature maps, the convolution becomes global convolution; in the other extreme, where the local region r corresponds to a single pixel of the input feature maps, the convolutional layer degrades to a local-connection layer.
  • The convolution operation can extract typical features from the input image, such as edges, curves, and dots. These features are not predefined manually but are learned from the training data.
  • When the convolution kernel k_ij operates on the marginal pixels of x_i, it exceeds the border of x_i. In this case, the values that exceed the border of x_i are set to 0 so as to make the operation valid. This operation is also called "padding" in the art. A sketch of formulation (1) with padding follows.
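The following Python/numpy sketch illustrates formulation (1) with zero padding, assuming global weight sharing (the local-region variants only change how kernels are indexed). It is written with plain loops for readability, not speed, and implements cross-correlation, the usual CNN convention:

```python
import numpy as np

def conv_layer(x_maps, kernels, biases, pad=1):
    """Sketch of formulation (1): y_j = max(0, b_j + sum_i k_ij * x_i),
    with zero padding at the borders. x_maps: (I, H, W) input feature maps;
    kernels: (I, J, kh, kw); biases: (J,)."""
    I, H, W = x_maps.shape
    _, J, kh, kw = kernels.shape
    x_pad = np.pad(x_maps, ((0, 0), (pad, pad), (pad, pad)))  # "padding" with 0
    out_h, out_w = H + 2 * pad - kh + 1, W + 2 * pad - kw + 1
    y = np.zeros((J, out_h, out_w))
    for j in range(J):
        for i in range(I):
            for u in range(out_h):
                for v in range(out_w):
                    y[j, u, v] += np.sum(
                        kernels[i, j] * x_pad[i, u:u + kh, v:v + kw])
        y[j] += biases[j]
    return np.maximum(y, 0)  # ReLU non-linear transformation
```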
  • The max pooling layer keeps the maximum value in a local window and discards the other values; the output is thus smaller than the input. This may be formulated as

    y_{j,k}^i = max_{0≤m<M, 0≤n<N} x^i_{j·s+m, k·s+n},    (2)

    where each neuron in the i-th output feature map y^i pools over an M×N local region in the i-th input feature map x^i, with s as the step size.
  • Max pooling provides spatial invariance: if the input shifts by several pixels, the output of the layer will not change much.
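A corresponding sketch of formulation (2), under the same illustrative assumptions as the convolution sketch above:

```python
import numpy as np

def max_pool(x, M=2, N=2, s=2):
    """Sketch of formulation (2): each output neuron keeps the maximum of an
    M x N local window of the corresponding input map, moving with step size
    s, so the output is smaller than the input."""
    I, H, W = x.shape
    out_h, out_w = (H - M) // s + 1, (W - N) // s + 1
    y = np.empty((I, out_h, out_w))
    for j in range(out_h):
        for k in range(out_w):
            y[:, j, k] = x[:, j * s:j * s + M, k * s:k * s + N].max(axis=(1, 2))
    return y
```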
  • The fully connected layer takes the feature vector from the previous layer as input, computes the inner product between the features x and the weights w, and then applies one non-linear transformation to the product, which may be formulated as

    y_j = max(0, Σ_i x_i · w_{i,j}),    (3)

    where x denotes the neural outputs (features) from the cascaded pooling module, y denotes the neural outputs (features) of the current fully-connection module, and w denotes the neural weights of the current feature extraction module (the current fully-connection module). Neurons in fully-connection modules linearly combine the features of the previous feature extraction module, followed by a ReLU non-linearity.
  • The fully connected layer is configured to extract global features (features extracted from the entire region of the input feature maps) from the previous layer.
  • Like the pooling layer, the fully-connection layer can also reduce the feature dimension, by restricting the number of neurons in it.
  • In one embodiment, at least two fully-connection layers are provided so as to increase the nonlinearity of the neural network, which in turn makes fitting the data easier.
  • The convolutional layer and the max pooling layer provide only local transformations, i.e. they operate on a local window of the input (a local region of the input image).
  • The fully-connected layer provides a global transformation: it takes features from the whole space of the input image and conducts a transformation as in formulation (3).
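Formulation (3) reduces to a few lines; this sketch assumes ReLU as the non-linearity, consistent with the description above:

```python
import numpy as np

def fully_connected(x, w):
    """Sketch of formulation (3): inner product of the feature vector x with
    the weight matrix w, followed by the ReLU non-linearity. It acts on the
    whole feature vector, so the transformation is global; choosing fewer
    columns for w reduces the feature dimension, as noted above."""
    return np.maximum(x @ w, 0)
```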
  • Output layers/nodes 105 (105-1, 105-2, 105-3)
  • As noted above, the convolutional neural networks for different outputs of the predictor 20 differ only in the size and type of the output nodes, which form the last layer of the network.
  • The different output layers for different outputs are discussed below. For purposes of description, three output layers 105-1, 105-2, 105-3 are illustrated, but the invention is not limited thereto; any number of output layers is applicable if required.
  • For the brand level, each output node denotes the probability of the vehicle being of one certain brand.
  • The number of output nodes is the number of brands. The output nodes of the neural network are trained differently so as to output different results.
  • The output of the output nodes may be a code, for example a 1-of-k code: among the output bits, exactly one bit is 1 and the others are all 0. Each bit in the output code is predetermined to represent a different brand or model.
  • The ground truth label is set to train the 1-of-k code.
  • The output code is interpreted as the corresponding attribute of the vehicle, such as brand or model; therefore, for each task, one kind of coding shall be defined. For continuous attributes, such as maximum speed, 0-100 km/h acceleration time, displacement, etc., the ground truth label may simply be set to a real-number value, as sketched below.
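For instance, the 1-of-k coding and the real-valued labeling of a continuous attribute could be set up as follows; the brand list is a made-up example, and any fixed ordering of the k categories defines the coding:

```python
import numpy as np

brands = ["Audi", "BMW", "Toyota"]                 # k = 3 categories (assumed)

def one_of_k(label, categories):
    """1-of-k code: exactly one bit is 1, the others are all 0."""
    code = np.zeros(len(categories))
    code[categories.index(label)] = 1.0
    return code

print(one_of_k("BMW", brands))                     # [0. 1. 0.]
max_speed_label = 200.0  # a continuous attribute is labeled with a real number
```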
  • For the model level, each output node denotes a certain model under a certain brand.
  • The number of output nodes is the number of different models.
  • For the released year level, each output node denotes a certain model in a released year under a certain brand.
  • The number of output nodes is the number of unique brand/model/released year combinations.
  • For a discrete attribute, each output node denotes an element of the discrete list.
  • The number of output nodes is the size of the list. For example, to predict the door number of a vehicle, 4 output nodes denote 2, 3, 4, and 5 doors, respectively.
  • For a network predicting a continuous attribute, there is only one output node, and it produces a continuous value of the attribute. For example, to predict the maximum speed of a vehicle, the output node produces a continuous value such as 200 km/h.
  • For multiple outputs, e.g. where the targets are brand/model/year and maximum speed, the multiple output layers all connect to the last fully connected layer of the convolutional network, as shown in Fig. 2b.
  • The corresponding 1-of-k codes shall be set for the respective output nodes during training.
  • The different output layers in the same convolutional neural network are trained such that the convolutional neural network may use the output nodes to output different predictions in response to different input images.
  • The training device 30 is used to train the predictor 20.
  • When the predictor 20 is specified for different outputs, the only difference in the convolutional neural network is the output nodes (layers).
  • The training device 30 is used to train the convolutional neural network, which takes an image cropped by a vehicle's bounding box as input and produces a general prediction.
  • The training device 30 takes the following as its input to train the neural network:
  • A pre-training set that consists of images containing different objects and the corresponding ground truth object labels. The set encompasses m object classes.
  • A fine-tuning set that consists of images containing only vehicles and the corresponding ground truth labels. If the network is used to classify vehicles, the ground truth labels are the brand/model/year of the input images; if the network is used to predict an attribute, the ground truth labels are the ground truth values of the attribute; if the network is used to predict multiple classes and attributes, the ground truth labels are the collection of the designated classes and attributes.
  • As the output of the training, a fine-tuned convolutional neural network, which takes an image cropped by a vehicle's bounding box as input and produces predictions of the designated output, will be available.
  • The training process performed by the training device 30 according to one embodiment of the present application is illustrated in Fig. 3.
  • At step s301, the training device 30 pre-trains the first convolutional neural network using the images in the pre-training set. Learning is performed using the back-propagation algorithm, and the output is a pre-trained convolutional neural network.
  • Fig. 4 illustrates the specific steps of the back-propagation algorithm.
  • At step s3011, the parameters, including the convolution filters, deformational layer weights, fully connected weights, and biases, are initialized randomly.
  • The training tries to minimize the loss function and can be divided into many updating steps. At step s3012, the loss is calculated; then, at step s3013, the algorithm calculates the gradient with respect to all the neural network parameters, including the convolution filters, deformational layer weights, fully connected weights, and biases, based on the calculated loss.
  • The gradient of any network parameter can be calculated with the chain rule.
  • The output of a layer L_k in the network can be expressed by a general function

    y_k = f_k(y_{k-1}, w_k),

    where y_k is the output of the layer L_k, y_{k-1} is the output of the previous layer L_{k-1}, w_k is the weights of L_k, and f_k is the function computed by L_k. The derivatives of y_k with respect to y_{k-1} and w_k are both known.
  • The loss function C of the network is defined on the output of the last layer L_n and the ground truth label t, i.e. C = C(y_n, t), and the derivative of C with respect to y_n is also known.
  • Applying the chain rule,

    ∂C/∂w_k = (∂C/∂y_n) (∂y_n/∂y_{n-1}) … (∂y_{k+1}/∂y_k) (∂y_k/∂w_k),

    the gradient of the cost C with respect to any weights in the network can be calculated.
  • At step s3014, the algorithm updates the convolution filters, deformational layer weights, fully connected weights, and biases by the rule

    w_k ← w_k − η ∂C/∂w_k,

    where the learning rate η is a predefined value; that is, updates of the parameters are performed using the product of a prefixed learning rate and the corresponding gradients.
  • At step s3015, it is determined whether the stopping criterion is satisfied. For example, if the variation of the loss is less than a predetermined value, the process terminates; otherwise, the process returns to step s3012. A sketch of the whole loop follows.
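The loop of steps s3012-s3015 can be sketched as follows. Here `loss_fn` and `grad_fn` stand in for the forward pass and the chain-rule back-propagation described above; they, the parameter dictionary, and the default values are illustrative assumptions, not part of the disclosure:

```python
import numpy as np

def pretrain(params, loss_fn, grad_fn, data, lr=0.01, tol=1e-4, max_iter=100000):
    """Sketch of the Fig. 4 loop. loss_fn(params, data) returns the scalar
    loss; grad_fn(params, data) returns a dict of gradients (same keys as
    params) obtained by back-propagating with the chain rule."""
    prev_loss = np.inf
    for _ in range(max_iter):
        loss = loss_fn(params, data)              # s3012: calculate the loss
        grads = grad_fn(params, data)             # s3013: gradients via chain rule
        for k in params:                          # s3014: w <- w - lr * gradient
            params[k] -= lr * grads[k]
        if abs(prev_loss - loss) < tol:           # s3015: stopping criterion
            break
        prev_loss = loss
    return params
```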
  • After the first convolutional neural network is trained in step s301, the process moves to step s302 to create the second convolutional neural network with the same structure as the pre-trained network.
  • At step s303, the second convolutional neural network is initialized using the parameters of the pre-trained convolutional neural network.
  • At step s304, the output layer of the second convolutional neural network, which has m nodes, is replaced with a new output layer with n nodes, where n is the size of the designated output; different trainings are required for different outputs.
  • The output of the output nodes may be a code, for example a 1-of-k code, as discussed above.
  • At step s305, the second convolutional neural network is fine-tuned using the images in the fine-tuning set. Learning is performed using the back-propagation algorithm.
  • The output is a fine-tuned convolutional neural network.
  • The fine-tuning set consists of vehicle images with their ground truth labels.
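Steps s302-s305 then amount to copying the pre-trained parameters, swapping the output layer, and re-running the same loop on the fine-tuning set. This sketch reuses the `pretrain` loop above; the parameter names ("out_w", "out_b") and the initialization scale are again assumptions for illustration:

```python
import copy
import numpy as np

def fine_tune(pretrained, n, feat_dim, fine_set, loss_fn, grad_fn, lr=0.001):
    """Sketch of steps s302-s305: copy the pre-trained network (s302, s303),
    replace its m-node output layer with a fresh n-node layer for the
    designated output (s304), then re-run back-propagation on the
    fine-tuning set (s305)."""
    params = copy.deepcopy(pretrained)                           # s302 + s303
    rng = np.random.default_rng(0)
    params["out_w"] = 0.01 * rng.standard_normal((feat_dim, n))  # s304
    params["out_b"] = np.zeros(n)
    return pretrain(params, loss_fn, grad_fn, fine_set, lr=lr)   # s305
```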
  • The system 4000 comprises a memory 401 that stores executable components and a processor 402 coupled to the memory 401 and configured to execute the executable components to perform operations of the system 4000.
  • The executable components may comprise: a vehicle detection component 403 used to detect a location of a vehicle in a received image, and a prediction component 404 for predicting one or more attributes of the vehicle from image patches at the detected location.
  • The prediction component 404 has a convolutional neural network configured with one or more output layers, each of the layers being differently trained, with a size and a type determined such that different attributes of the vehicle are output.
  • The discussions of the predictor 20 are also applicable to the prediction component 404; a detailed discussion thereof is thus omitted herein.
  • The convolutional neural network comprises at least one fully connected layer configured to extract classification features from the entire region of input features for the received image.
  • The output nodes are coupled to the fully connected layer and predict one or more attributes of the vehicle from the classification features received from the fully connected layer.
  • The attributes comprise at least one selected from a group consisting of a brand, a model, a year, a continuous value of the maximum speed, a displacement, a seat number and a door number, or the like.
  • The system 4000 may further comprise a training component 405 for training the convolutional neural network based on: 1) a pre-training set including images containing different objects and corresponding ground truth object labels; and 2) a fine-tuning set including images containing only vehicles and corresponding ground truth labels.
  • The ground truth labels for the pre-training and fine-tuning sets vary depending on the attributes to be predicted.
  • The training component 405 pre-trains a first convolutional neural network using the images in the pre-training set by: randomly initializing the parameters of the first convolutional neural network; calculating a loss of the parameters in the first convolutional neural network; calculating a gradient with respect to all said parameters based on the calculated loss; updating the parameters using the product of a prefixed learning rate and the corresponding gradients; and determining whether a stopping criterion is satisfied and, if not, returning to the loss-calculation step.
  • The training component 405 is further configured to: create a second convolutional neural network with the same structure as the first network; initialize the second network using the pre-trained parameters of the first network; replace the output nodes of the second convolutional neural network with a new output layer of n nodes, where n is the size of the designated output; and fine-tune the second convolutional neural network using the images in the fine-tuning set. Since the discussions of the training device 30 are also applicable to the training component 405, the detailed algorithm is omitted herein.
  • The present application also provides a system for verifying whether two vehicles from two images have the same attribute(s), for example, belong to the same brand/model/released year.
  • Fig. 6 is a schematic diagram illustrating such a system 6000 according to one embodiment of the present application.
  • First, two images containing vehicles are fed into a vehicle detector 60. Two vehicles are then detected, and the images are cropped by their detected bounding boxes, respectively. Each cropped vehicle image is then fed into a feature extractor 62.
  • The feature extractor 62 is configured with a convolutional neural network that is trained with the same protocol as the predictor 20. Finally, the features from the feature extractor 62 are combined and fed into the verification unit 64.
  • The verification unit 64 judges whether the two inputs are of the same kind (potentially brand/model/year) and produces a binary output (yes or no).
  • The vehicle detector 60 is the same as the vehicle detector 10 discussed above, and a detailed description thereof is thus omitted herein.
  • The feature extractor 62 receives an image that is occupied mostly by a vehicle and extracts features from the input vehicle image.
  • The present application adopts features based on convolutional neural networks, which are highly semantic and expressive.
  • The features are used as the input for the verification unit 64, which judges whether the two inputs belong to the same class.
  • The convolutional neural network for the feature extractor 62 is the same as the convolutional neural network shown in Fig. 2a.
  • The training procedure of the model is also the same as for the predictor 20. Since the verification can be done at three levels, namely brand, model and year, the feature extractor is designed to be trained at the same level as the target of the verification task. If the target is to verify whether two inputs are from the same brand, the convolutional neural network is trained with the brand as the target; if the target is to verify whether two inputs are from the same model, the convolutional neural network is trained with the model as the target; and similarly for year and other potential targets. When an input is fed into the network, the values of the last fully connected layer, i.e. the values of the last layer except the output layer, are used as the extracted feature, as sketched below.
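In code, extracting the verification feature might look like the following sketch, where `forward_fn` is an assumed helper (not from the disclosure) that returns the activations of every layer in order, output layer last:

```python
def extract_verification_feature(image_patch, forward_fn):
    """Sketch: run the trained network forward and keep the activations of
    the last fully connected layer, i.e. the last layer before the output
    layer, as the verification feature."""
    activations = forward_fn(image_patch)
    return activations[-2]          # last FC layer, not the output layer
```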
  • The verification unit 64 takes the features of two images as input and outputs a prediction of whether the two inputs belong to the same class.
  • The class is predefined; it can be brand, model, released year or another possible category for vehicles.
  • The model structure for the verification unit is not restricted; any model that can achieve the objective can be used.
  • A typical model is Joint Bayesian, which will be described in detail.
  • Other popular models include the Support Vector Machine, the Siamese Neural Network, etc.
  • Joint Bayesian formulates the feature x as the sum of two independent Gaussian variables, x = μ + ε, where μ ~ N(0, S_μ) represents the class identity and ε ~ N(0, S_ε) represents the intra-class variation.
  • S_μ and S_ε can be learned from the training data with the EM algorithm. At test time, the likelihood ratio r = log [ P(x_1, x_2 | H_I) / P(x_1, x_2 | H_E) ] is calculated, where H_I is the hypothesis that the two features belong to the same class and H_E is the hypothesis that they do not; the ratio has a closed-form solution.
  • The likelihood ratio r can be transformed into a binary label by comparing it against a threshold.
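The closed form follows from block-inverting the covariance of a same-class pair. The sketch below is not taken from the patent text; it follows the standard Joint Bayesian formulation, with the log-determinant constants and an overall positive scale dropped, so only comparisons against a threshold are meaningful:

```python
import numpy as np

def joint_bayesian_ratio(x1, x2, S_mu, S_eps):
    """Log likelihood ratio r for features x = mu + eps, mu ~ N(0, S_mu),
    eps ~ N(0, S_eps). The same-class pair covariance is [[P, S_mu],
    [S_mu, P]] with P = S_mu + S_eps; block-inverting it yields the
    quadratic form below. S_mu and S_eps are assumed already learned
    (e.g., by EM, as described above)."""
    P = S_mu + S_eps
    P_inv = np.linalg.inv(P)
    M = np.linalg.inv(P - S_mu @ P_inv @ S_mu)   # diagonal block of the inverse
    G = -M @ S_mu @ P_inv                        # off-diagonal block
    A = P_inv - M
    return float(x1 @ A @ x1 + x2 @ A @ x2 - 2 * x1 @ G @ x2)

def verify(x1, x2, S_mu, S_eps, threshold=0.0):
    """Binary output (yes or no) by thresholding the ratio r."""
    return joint_bayesian_ratio(x1, x2, S_mu, S_eps) > threshold
```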
  • The system 8000 comprises a memory 401 that stores executable components and a processor 402 coupled to the memory 401 and configured to execute the executable components to perform operations of the system 8000.
  • The executable components may comprise: a vehicle detecting component 403 configured to receive two images containing vehicles and detect two vehicles from the received images; a feature extracting component 404 configured to extract verification features from each of the detected vehicles; and a verification component 405 configured to judge whether the vehicles in the two images are of the same kind.
  • The feature extracting component 404 may be created based on convolutional neural networks, which are highly semantic and expressive.
  • The features are used as the input for the verification component 405, which judges whether the two inputs belong to the same class.
  • The convolutional neural network in the feature extracting component 404 according to one embodiment of the present application is the same as the convolutional neural network shown in Fig. 2a.
  • The training procedure of the model is also the same as for the predictor 20. Since the verification can be done at three levels (the brand level, model level and year level), the convolutional neural network is designed to be trained at the same level as the target of the verification task.
  • If the target is to verify whether two inputs are from the same brand, the convolutional neural network is trained with the brand as the target; if the target is to verify whether two inputs are from the same model, the convolutional neural network is trained with the model as the target; and similarly for year and other potential targets.
  • The values of the last fully connected layer, i.e. the values of the last layer except the output layer, are used as the extracted feature.
  • The verification component 405 takes the features of two images as input and outputs a prediction of whether the two inputs belong to the same class.
  • The class is predefined; it can be brand, model, released year or another possible category for vehicles.
  • The model structure for the verification component 405 is not restricted; any model that can achieve the objective can be used.
  • A typical model is Joint Bayesian, as described above.
  • Other popular models include the Support Vector Machine, the Siamese Neural Network, etc.
  • The systems 6000 and 8000 can retrieve similar vehicles (potentially the same model, same released year, etc.) from surveillance videos to locate/track a target vehicle. This function is critical in challenging multi-camera surveillance environments, where car plate recognition may fail and car tracking can only be done from visual appearance.
  • Embodiments within the scope of the present invention can be implemented in digital electronic circuitry, or in computer hardware, firmware, software, or in combinations thereof.
  • Apparatus within the scope of the present invention can be implemented in a computer program product tangibly embodied in a machine-readable storage device for execution by a programmable processor; and method actions within the scope of the present invention can be performed by a programmable processor executing a program of instructions to perform functions of the invention by operating on input data and generating output.
  • Embodiments within the scope of the present invention can be implemented advantageously in one or more computer programs that are executable on a programmable system including at least one programmable processor coupled to receive data and instructions from, and to transmit data and instructions to, a data storage system, at least one input device, and at least one output device.
  • Each computer program can be implemented in a high-level procedural or object oriented programming language, or in assembly or machine language if desired; and in any case, the language can be a compiled or interpreted language.
  • Suitable processors include, by way of example, both general and special purpose microprocessors.
  • A processor will receive instructions and data from a read-only memory and/or a random access memory.
  • A computer will include one or more mass storage devices for storing data files.
  • Embodiments within the scope of the present invention include computer-readable media for carrying or having computer-executable instructions, computer-readable instructions, or data structures stored thereon.
  • Such computer-readable media may be any available media, which is accessible by a general-purpose or special-purpose computer system.
  • Examples of computer-readable media may include physical storage media such as RAM, ROM, EPROM, CD-ROM or other optical disk storage, magnetic disk storage or other magnetic storage devices, or any other media which can be used to carry or store desired program code means in the form of computer-executable instructions, computer-readable instructions, or data structures and which may be accessed by a general-purpose or special-purpose computer system. Any of the foregoing can be supplemented by, or incorporated in, ASICs (application-specific integrated circuits). While particular embodiments of the present invention have been shown and described, changes and modifications may be made to such embodiments without departing from the true scope of the invention.

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Evolutionary Computation (AREA)
  • General Physics & Mathematics (AREA)
  • Health & Medical Sciences (AREA)
  • Artificial Intelligence (AREA)
  • General Health & Medical Sciences (AREA)
  • Multimedia (AREA)
  • Computing Systems (AREA)
  • Software Systems (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Molecular Biology (AREA)
  • Biomedical Technology (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Databases & Information Systems (AREA)
  • Medical Informatics (AREA)
  • Biophysics (AREA)
  • Mathematical Physics (AREA)
  • General Engineering & Computer Science (AREA)
  • Data Mining & Analysis (AREA)
  • Computational Linguistics (AREA)
  • Biodiversity & Conservation Biology (AREA)
  • Image Analysis (AREA)

Abstract

An apparatus (1000) for vehicle classification and an apparatus for a vehicle verification system (6000) are disclosed. The apparatus (1000) for vehicle classification may comprise a vehicle detector (10) used to detect the location of a vehicle in a received image, and a predictor (20) in electronic communication with the detector (10) that predicts one or more attributes of the vehicle from image patches at the detected location. The predictor (20) has a convolutional neural network configured with one or more output layers, each of the output layers being differently trained such that each output layer is used to predict one attribute of the vehicle and has a size equal to the number of output categories for the vehicle. The apparatus for the vehicle verification system (6000) may comprise a vehicle detector (60), a feature extractor (62) and a verification unit (64), and is used to verify whether two vehicles from two images have the same attribute.
PCT/CN2015/000172 2015-03-13 2015-03-13 Apparatus and system for vehicle classification and verification WO2016145547A1 (fr)

Priority Applications (2)

Application Number Priority Date Filing Date Title
PCT/CN2015/000172 WO2016145547A1 (fr) 2015-03-13 2015-03-13 Apparatus and system for vehicle classification and verification
CN201580077195.2A CN107430693A (zh) 2015-03-13 2015-03-13 Apparatus and system for vehicle classification and verification

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
PCT/CN2015/000172 WO2016145547A1 (fr) 2015-03-13 2015-03-13 Apparatus and system for vehicle classification and verification

Publications (1)

Publication Number Publication Date
WO2016145547A1 (fr) 2016-09-22

Family

ID=56918192

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/CN2015/000172 WO2016145547A1 (fr) 2015-03-13 2015-03-13 Apparatus and system for vehicle classification and verification

Country Status (2)

Country Link
CN (1) CN107430693A (fr)
WO (1) WO2016145547A1 (fr)

Cited By (11)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN108182413A (zh) * 2017-12-29 2018-06-19 中国矿业大学(北京) Method for detecting, tracking and identifying moving targets in a mine
US10296828B2 (en) 2017-04-05 2019-05-21 Here Global B.V. Learning a similarity measure for vision-based localization on a high definition (HD) map
CN109885718A (zh) * 2019-02-28 2019-06-14 江南大学 Suspect vehicle retrieval method based on deep vehicle-sticker detection
EP3502977A1 (fr) * 2017-12-19 2019-06-26 Veoneer Sweden AB State estimator
EP3502976A1 (fr) * 2017-12-19 2019-06-26 Veoneer Sweden AB State estimator
US10345449B2 (en) * 2016-12-02 2019-07-09 Verizon Connect Ireland Limited Vehicle classification using a recurrent neural network (RNN)
CN110059748A (zh) * 2019-04-18 2019-07-26 北京字节跳动网络技术有限公司 Method and apparatus for outputting information
CN111144476A (zh) * 2019-12-22 2020-05-12 上海眼控科技股份有限公司 Method and apparatus for detecting carriage seats, electronic device and readable storage medium
CN111966897A (zh) * 2020-08-07 2020-11-20 上海新共赢信息科技有限公司 Travel-intention sensing method, apparatus, terminal and storage medium
US11210939B2 (en) 2016-12-02 2021-12-28 Verizon Connect Development Limited System and method for determining a vehicle classification from GPS tracks
CN115546472A (zh) * 2022-11-29 2022-12-30 城云科技(中国)有限公司 Road vehicle re-identification method, apparatus and application

Families Citing this family (10)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US10068140B2 (en) * 2016-12-02 2018-09-04 Bayerische Motoren Werke Aktiengesellschaft System and method for estimating vehicular motion based on monocular video data
EP3493116B1 (fr) * 2017-12-04 2023-05-10 Aptiv Technologies Limited System and method for generating a confidence value for at least one state within a vehicle
CN108229413A (zh) * 2018-01-16 2018-06-29 宁夏智启连山科技有限公司 Pest and disease type identification method and apparatus
WO2019141559A1 (fr) * 2018-01-17 2019-07-25 Signify Holding B.V. System and method for object recognition using neural networks
CN110119750A (zh) * 2018-02-05 2019-08-13 浙江宇视科技有限公司 Data processing method, apparatus and electronic device
IL259285B2 (en) * 2018-05-10 2023-07-01 Inspekto A M V Ltd A system and method for detecting defects on objects in an image
EP3853616A4 (fr) * 2018-09-20 2021-11-17 Siemens Healthcare Diagnostics, Inc. Hypothesizing and verification networks and methods for specimen classification
CN111060507B (zh) * 2019-12-24 2021-05-04 北京嘀嘀无限科技发展有限公司 Vehicle verification method and apparatus
CN111784031A (zh) * 2020-06-15 2020-10-16 上海东普信息科技有限公司 Logistics vehicle classification prediction method, apparatus, device and storage medium
CN113269150A (zh) * 2021-06-24 2021-08-17 浪潮云信息技术股份公司 System and method for multi-attribute vehicle recognition based on deep learning

Citations (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN101408933A (zh) * 2008-05-21 2009-04-15 浙江师范大学 License plate character recognition method based on coarse-grid feature extraction and a BP neural network
CN102194130A (zh) * 2011-05-19 2011-09-21 苏州两江科技有限公司 Vehicle classification method based on image recognition
WO2013021823A1 (fr) * 2011-08-05 2013-02-14 株式会社メガチップス Image recognition apparatus
CN103324920A (zh) * 2013-06-27 2013-09-25 华南理工大学 Automatic vehicle model recognition method based on vehicle front-view images and template matching
CN104036323A (zh) * 2014-06-26 2014-09-10 叶茂 Vehicle detection method based on a convolutional neural network

Family Cites Families (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
GB2344205B (en) * 1998-11-26 2003-04-30 Roke Manor Research Method of and apparatus for vehicle identification
JP2002008186A (ja) * 2000-06-23 2002-01-11 Mitsubishi Heavy Ind Ltd Vehicle type identification device
CN104021375B (zh) * 2014-05-29 2017-11-07 银江股份有限公司 Vehicle model recognition method based on machine learning
CN104299008B (zh) * 2014-09-23 2017-10-31 同济大学 Vehicle model classification method based on multi-feature fusion

Patent Citations (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN101408933A (zh) * 2008-05-21 2009-04-15 浙江师范大学 License plate character recognition method based on coarse-grid feature extraction and a BP neural network
CN102194130A (zh) * 2011-05-19 2011-09-21 苏州两江科技有限公司 Vehicle classification method based on image recognition
WO2013021823A1 (fr) * 2011-08-05 2013-02-14 株式会社メガチップス Image recognition apparatus
CN103324920A (zh) * 2013-06-27 2013-09-25 华南理工大学 Automatic vehicle model recognition method based on vehicle front-view images and template matching
CN104036323A (zh) * 2014-06-26 2014-09-10 叶茂 Vehicle detection method based on a convolutional neural network

Cited By (18)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US10345449B2 (en) * 2016-12-02 2019-07-09 Verizon Connect Ireland Limited Vehicle classification using a recurrent neural network (RNN)
US11210939B2 (en) 2016-12-02 2021-12-28 Verizon Connect Development Limited System and method for determining a vehicle classification from GPS tracks
US10296828B2 (en) 2017-04-05 2019-05-21 Here Global B.V. Learning a similarity measure for vision-based localization on a high definition (HD) map
EP3502976A1 (fr) * 2017-12-19 2019-06-26 Veoneer Sweden AB State estimator
US11807270B2 (en) 2017-12-19 2023-11-07 Arriver Software Ab State estimator
WO2019120865A1 (fr) * 2017-12-19 2019-06-27 Veoneer Sweden Ab State estimator
WO2019120861A1 (fr) * 2017-12-19 2019-06-27 Veoneer Sweden Ab State estimator
EP3502977A1 (fr) * 2017-12-19 2019-06-26 Veoneer Sweden AB State estimator
CN108182413B (zh) * 2017-12-29 2022-01-25 中国矿业大学(北京) Method for detecting, tracking and identifying moving targets in a mine
CN108182413A (zh) * 2017-12-29 2018-06-19 中国矿业大学(北京) Method for detecting, tracking and identifying moving targets in a mine
CN109885718B (zh) * 2019-02-28 2021-05-28 江南大学 Suspect vehicle retrieval method based on deep vehicle-sticker detection
CN109885718A (zh) * 2019-02-28 2019-06-14 江南大学 Suspect vehicle retrieval method based on deep vehicle-sticker detection
CN110059748A (zh) * 2019-04-18 2019-07-26 北京字节跳动网络技术有限公司 Method and apparatus for outputting information
CN111144476A (zh) * 2019-12-22 2020-05-12 上海眼控科技股份有限公司 Method and apparatus for detecting carriage seats, electronic device and readable storage medium
CN111966897A (zh) * 2020-08-07 2020-11-20 上海新共赢信息科技有限公司 Travel-intention sensing method, apparatus, terminal and storage medium
CN111966897B (zh) * 2020-08-07 2023-07-21 凹凸乐享(苏州)信息科技有限公司 Travel-intention sensing method, apparatus, terminal and storage medium
CN115546472A (zh) * 2022-11-29 2022-12-30 城云科技(中国)有限公司 Road vehicle re-identification method, apparatus and application
CN115546472B (zh) * 2022-11-29 2023-02-17 城云科技(中国)有限公司 Road vehicle re-identification method, apparatus and application

Also Published As

Publication number Publication date
CN107430693A (zh) 2017-12-01

Similar Documents

Publication Publication Date Title
WO2016145547A1 (fr) Apparatus and system for vehicle classification and verification
US9443320B1 (en) Multi-object tracking with generic object proposals
Wang et al. A self-training approach for point-supervised object detection and counting in crowds
US10902243B2 (en) Vision based target tracking that distinguishes facial feature targets
US9158971B2 (en) Self-learning object detectors for unlabeled videos using multi-task learning
Tian et al. Robust detection of abandoned and removed objects in complex surveillance videos
US9767570B2 (en) Systems and methods for computer vision background estimation using foreground-aware statistical models
El Baf et al. Type-2 fuzzy mixture of Gaussians model: Application to background modeling
Chan et al. Vehicle detection and tracking under various lighting conditions using a particle filter
EP3229206A1 (fr) Deep data association for online multi-category multi-object tracking
US9400936B2 (en) Methods and systems for vehicle tag number recognition
Andrews Sobral et al. Highway traffic congestion classification using holistic properties
Zhuang et al. Real‐time vehicle detection with foreground‐based cascade classifier
US20180286081A1 (en) Object re-identification with temporal context
Zaidenberg et al. A generic framework for video understanding applied to group behavior recognition
Mahapatra et al. Human recognition system for outdoor videos using Hidden Markov model
Guindel et al. Joint object detection and viewpoint estimation using CNN features
JP2005311691A (ja) Object detection device and method
Yabo et al. Vehicle classification and speed estimation using computer vision techniques
Santos et al. Dyfusion: dynamic IR/RGB fusion for maritime vessel recognition
Dorudian et al. Moving object detection using adaptive blind update and RGB-D camera
Bewley et al. Background Appearance Modeling with Applications to Visual Object Detection in an Open‐Pit Mine
CN113095199A (zh) High-speed pedestrian recognition method and apparatus
CN115482513A (zh) Device and method for adapting a pretrained machine learning system to target data
CN112733578B (zh) Vehicle re-identification method and system

Legal Events

Date Code Title Description
121 Ep: the epo has been informed by wipo that ep was designated in this application

Ref document number: 15884932

Country of ref document: EP

Kind code of ref document: A1

NENP Non-entry into the national phase

Ref country code: DE

122 Ep: pct application non-entry in european phase

Ref document number: 15884932

Country of ref document: EP

Kind code of ref document: A1