CN114519843A - Vehicle prediction method and device - Google Patents

Vehicle prediction method and device

Info

Publication number
CN114519843A
CN114519843A
Authority
CN
China
Prior art keywords
point location
camera
camera point
model
node
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202210127296.8A
Other languages
Chinese (zh)
Inventor
闫军
阳平
王艳清
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Super Vision Technology Co Ltd
Original Assignee
Super Vision Technology Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Super Vision Technology Co Ltd
Priority to CN202210127296.8A
Publication of CN114519843A
Legal status: Pending

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00 Pattern recognition
    • G06F18/20 Analysing
    • G06F18/21 Design or setup of recognition systems or techniques; Extraction of features in feature space; Blind source separation
    • G06F18/214 Generating training patterns; Bootstrap methods, e.g. bagging or boosting
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 Computing arrangements based on biological models
    • G06N3/02 Neural networks
    • G06N3/04 Architecture, e.g. interconnection topology
    • G06N3/045 Combinations of networks
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 Computing arrangements based on biological models
    • G06N3/02 Neural networks
    • G06N3/08 Learning methods
    • Y GENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
    • Y02 TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
    • Y02T CLIMATE CHANGE MITIGATION TECHNOLOGIES RELATED TO TRANSPORTATION
    • Y02T10/00 Road transport of goods or passengers
    • Y02T10/10 Internal combustion engine [ICE] based vehicles
    • Y02T10/40 Engine management systems

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Data Mining & Analysis (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Artificial Intelligence (AREA)
  • General Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • Evolutionary Computation (AREA)
  • Molecular Biology (AREA)
  • Computational Linguistics (AREA)
  • Software Systems (AREA)
  • Mathematical Physics (AREA)
  • Health & Medical Sciences (AREA)
  • Biomedical Technology (AREA)
  • Biophysics (AREA)
  • Computing Systems (AREA)
  • General Health & Medical Sciences (AREA)
  • Bioinformatics & Cheminformatics (AREA)
  • Evolutionary Biology (AREA)
  • Bioinformatics & Computational Biology (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Traffic Control Systems (AREA)
  • Image Analysis (AREA)

Abstract

The application discloses a vehicle prediction method and device. The vehicle prediction method includes: constructing a camera point bitmap according to the relationships between a plurality of camera point locations treated as nodes; acquiring the vehicle snapshot history in a monitoring system based on the camera point bitmap to form a plurality of camera point location sequences; constructing a vectorization model training set according to the camera point location sequences and training a vectorization model on that training set; performing feature vectorization on each camera point location in the camera point bitmap according to the vectorization model to obtain a plurality of point location feature vectors; constructing a point location prediction model training set according to the point location feature vectors and training a point location prediction model on that training set; and predicting the camera point location where the vehicle will appear at the next moment according to the point location prediction model. The vehicle prediction method avoids manual analysis, does not miss camera point locations where the vehicle may appear, and can be used in practical application environments such as vehicle query and abnormal parking detection.

Description

Vehicle prediction method and device
Technical Field
The present application relates to the field of vehicle monitoring technologies, and in particular, to a vehicle prediction method and apparatus.
Background
With the increasing number of urban vehicles, traffic monitoring systems play a major role in vehicle management, and vehicle search technology is an important part of that management. As the number of managed camera point locations grows and the management range expands, the cost of wide-range vehicle retrieval keeps rising. To reduce this cost, the camera point locations where a vehicle may appear are first locked, and detailed detection is then carried out within the small range obtained after locking.
In the traditional retrieval method, a small number of camera point locations in a local area are selected as the search range through manual analysis of the vehicle's historical behavior, thereby reducing the cost of vehicle retrieval. However, because the traditional retrieval method relies on manual analysis, camera point locations where the vehicle may appear are missed, so the detection efficiency is low, the accuracy is low, and the target vehicle cannot be searched accurately.
Summary of the application
The application aims to solve the technical problem that the traditional retrieval method, because it relies on manual analysis, has low detection efficiency and low accuracy and cannot accurately search for the target vehicle. To this end, the application provides a vehicle prediction method and a vehicle prediction device.
The application provides a vehicle prediction method, comprising:
acquiring a plurality of camera point positions of a monitoring system and the node relations between the plurality of camera point positions, and constructing a camera point bitmap according to the plurality of camera point positions and the node relations;
acquiring a vehicle snapshot history in the monitoring system based on the camera point bitmap to form a plurality of camera point position sequences;
constructing a vectorization model training set according to the camera point location sequences, training to form a vectorization model according to the vectorization model training set, and performing feature vectorization on each camera point location in the camera point bitmap according to the vectorization model to obtain a plurality of point location feature vectors;
constructing a point location prediction model training set according to the point location feature vectors, and training to form a point location prediction model according to the point location prediction model training set;
and predicting the camera point position of the vehicle at the next moment according to the point position prediction model.
The application provides a vehicle prediction device, including:
the camera point bitmap generation module is used for acquiring a plurality of camera point positions of the monitoring system and the node relations between the plurality of camera point positions, and constructing a camera point bitmap according to the plurality of camera point positions and the node relations;
the camera point location sequence acquisition module is used for acquiring a vehicle snapshot history record in the monitoring system based on the camera point bitmap to form a plurality of camera point location sequences;
the vectorization model generation module is used for constructing a vectorization model training set according to the camera point location sequences, training to form a vectorization model according to the vectorization model training set, and performing feature vectorization on each camera point location in the camera point bitmap according to the vectorization model to obtain a plurality of point location feature vectors;
the point location prediction model generation module is used for constructing a point location prediction model training set according to the plurality of point location feature vectors and training to form a point location prediction model according to the point location prediction model training set;
and the prediction module is used for predicting the camera point position of the vehicle at the next moment according to the point position prediction model.
In the vehicle prediction method and the vehicle prediction device, a plurality of camera point location sequences are formed into a training set, and a vectorization model is trained on it. According to the vectorization model, each camera point location is given a feature representation, forming a plurality of point location feature vectors. The feature vectors of the historical point location sequences of length t of a plurality of vehicles are used as the input of the point location prediction model, and the (t+1)-th camera point location is output as the prediction result, forming the training set on which the point location prediction model is trained. On the basis of the trained point location prediction model, the historical point location snapshot sequence of a vehicle to be detected appearing in the monitoring network is obtained to form a camera point location sequence; after vectorized representation, the point location feature vector corresponding to each historical camera point location is obtained, the resulting point location feature vectors are input into the point location prediction model, and the camera point location where the vehicle will appear at the next moment is output.
Furthermore, the vehicle prediction method provided by the application establishes the relationships between all camera point locations from the perspective of the whole monitoring system and covers the camera point locations of the entire monitoring network. The information of the camera point locations of the monitoring network and the relationships between them are represented by concrete data, and the relationships among all camera point locations are considered comprehensively. Based on analysis of the historical records of the vehicle appearing at each camera point location, the method uses deep-learning training and analysis on large-scale data to predict the camera point locations where the vehicle may appear at the next moment. Therefore, the vehicle prediction method provided by the application can improve the accuracy of vehicle prediction, can be applied to large-scale vehicle monitoring systems, and can realize accurate searching of the target vehicle. It avoids manual analysis, avoids missing camera point locations where the vehicle may appear, and can be used in practical application environments such as vehicle query and abnormal parking detection.
Drawings
FIG. 1 is a flow chart illustrating steps of a vehicle prediction method provided by the present application.
Fig. 2 is a schematic structural diagram of a camera point bitmap provided in the present application.
FIG. 3 is a table diagram of a training set of a word skipping model according to an embodiment of the present disclosure.
Fig. 4 is a schematic diagram of a structure of a word skipping model in an embodiment provided in the present application.
FIG. 5 is a schematic illustration of a vehicle prediction process according to an embodiment provided herein.
FIG. 6 is a table diagram of data of a point prediction model training set according to an embodiment of the present disclosure.
Fig. 7 is a schematic structural diagram of a vehicle prediction device provided in the present application.
Detailed Description
The technical solution of the present application is further described in detail by the accompanying drawings and examples.
Referring to fig. 1, the present application provides a vehicle prediction method, including:
s10, acquiring node relations between a plurality of camera point locations of the monitoring system and the plurality of camera point locations, and constructing a camera point bitmap according to the node relations between the plurality of camera point locations;
s20, acquiring a vehicle snapshot history record in the monitoring system based on the camera point bitmap to form a plurality of camera point sequences;
s30, constructing a vectorization model training set according to the camera point location sequences, training to form a vectorization model according to the vectorization model training set, and performing feature vectorization on each camera point location in the camera point bitmap according to the vectorization model to obtain a plurality of corresponding point location feature vectors;
s40, constructing a point location prediction model training set according to the point location feature vectors, and training to form a point location prediction model according to the point location prediction model training set;
and S50, predicting the camera point position of the vehicle at the next moment according to the point position prediction model.
In S10, the monitoring system includes n camera point locations v0, v1, ..., v(n-1), which form the n nodes of the camera point bitmap. The order in which a vehicle is captured between two camera point locations serves as the edge relationship between the two nodes, which can also be understood as the node relationship. For example: the vehicle is first captured at camera point location v0 and then at camera point location v4, so (v0, v4) forms the edge relationship between nodes v0 and v4. By analogy, the plurality of camera point locations form the nodes of the camera point bitmap, the capture order between camera point locations forms the node relationships, and the camera point bitmap shown in fig. 2 is obtained.
In S20, in the urban dynamic and static traffic system, each vehicle is constantly moving between camera point locations, so the edge relationships in the camera point bitmap (also understood as node relationships) change dynamically. From the history of vehicle snapshots at the camera point locations generated during system operation, the camera point bitmap can be formed, and from it the record of the camera point locations a vehicle has passed within a certain period can be obtained. A camera point location sequence corresponding to the vehicle is then formed according to the order in which the vehicle appears at the camera point locations. For example: the camera point location sequence corresponding to vehicle Jing X4 is {v21, v45, v111, v73}, the sequence corresponding to Jing B T is {v41, v75, v431, v73, v56}, and the sequence corresponding to Jing C is {v2, v7, v534, v33, v96}.
In S30, a training set for the vectorization model is constructed from the camera point location sequences of the plurality of vehicles. The vectorization model is trained on this training set, realizing vectorization of each camera point location and thereby obtaining the feature vector corresponding to each point location in the graph.
And performing feature vectorization on each camera point in the camera point bitmap according to the trained vectorization model, so as to obtain a point feature vector corresponding to each camera point in the camera point bitmap. One camera point location corresponds to one point location feature vector.
In S40, after the point location feature vectors are obtained, a training set for the point location prediction model is constructed from the point location sequences of the vehicles. The point location prediction model is formed by training on this training set and is used to predict the camera point location of the vehicle at the next moment.
In S50, the vectorization model and the point location prediction model obtained through steps S10 to S40 are used. Each camera point location in the camera point bitmap is feature-vectorized by the vectorization model to obtain the feature vector corresponding to each camera point location. The camera point location of the vehicle at the next moment is then predicted by obtaining the camera point location sequence of the vehicle, inputting the feature vectors corresponding to that sequence into the point location prediction model, and outputting the prediction result.
A plurality of camera point location sequences are formed through S10 to S40 to form a training set, and a vectorization model is formed through training. After the point location feature vectors are obtained, a feature representation is produced for each camera point location in the previous t-1 time period, forming a plurality of point location feature vector sequences. It can also be understood that one camera point location sequence is obtained in the first t-1 time period; one camera point location sequence corresponds to a plurality of camera point locations, one camera point location corresponds to one point location feature vector, and therefore one camera point location sequence corresponds to a plurality of point location feature vectors. Because the camera point location sequence represents the order of the camera point locations at which the vehicle appeared in a certain time period, the camera point locations in the sequence are connected with one another. Through S40, the sequence formed by the plurality of point location feature vectors is used as the input of the point location prediction model, the camera point location at time t is output as the prediction result, and the point location prediction model is formed by a supervised training method. Through S50, on the basis of the trained point location prediction model, the historical point location snapshot sequence of a vehicle appearing in the monitoring network is obtained to form a camera point location sequence; after vectorization, it is input into the point location prediction model, and the camera point location where the vehicle will appear at the next moment is output.
Furthermore, the vehicle prediction method provided by the application starts from the whole monitoring system, establishes the relationships between all camera point locations, and covers the camera point locations of the entire monitoring network. The information of the camera point locations of the monitoring network and the relationships between them are represented by concrete data, and the relationships among all camera point locations are considered comprehensively. Based on analysis of the historical records of the vehicle appearing at each camera point location, the method uses deep-learning training and analysis on large-scale data to predict the camera point locations where the vehicle may appear at the next moment. Therefore, the vehicle prediction method provided by the application can improve the accuracy of vehicle prediction, can be applied to large-scale vehicle monitoring systems, and can realize accurate searching of the target vehicle. It avoids manual analysis, avoids missing camera point locations where the vehicle may appear, and can be used in practical application environments such as vehicle query and abnormal parking detection.
In one embodiment, S10, acquiring a plurality of camera point locations of the monitoring system and the node relationships between the plurality of camera point locations, and constructing a camera point bitmap according to the plurality of camera point locations and the node relationships, includes:
S110, acquiring a plurality of camera point locations to form a camera point location set V, and taking each camera point location as a node;
S120, acquiring the capture order of a plurality of vehicles among the plurality of nodes, forming the node relationships between adjacent nodes, and forming a node edge set E according to the node relationships;
S130, constructing a camera point bitmap G(V, E) according to the camera point location set V and the node edge set E.
In S110, one camera point location serves as one node of the camera point bitmap, and the plurality of camera point locations form the plurality of nodes of the camera point bitmap. In S120, the order in which the vehicle is captured between two camera point locations is used as the edge relationship between the two adjacent nodes, which may also be called the node relationship, to form the node edge set (also called the edge set). The node edge set E represents the set of edges of all adjacent nodes.
In one embodiment, a camera point bitmap G(V, E) is formed as shown in fig. 2. Taking it as an example, the camera point location set is V = [v0, v1, ..., v(n-1)] and the node edge set is E = [(v0, v1), (v0, v4), (v1, v2), ..., (v(n-2), v(n-1))]. In a dynamic or static urban traffic system, as each vehicle continuously travels between the camera point locations, the connections of the edges in the camera point bitmap G(V, E) change dynamically; it can also be understood that the node relationships between the nodes in the camera point bitmap G(V, E) change dynamically.
Through S110 to S130, and based on the historical data of the vehicles traveling between the camera point locations, the camera point bitmap G(V, E) is formed. In a network with monitoring cameras as nodes, the camera point locations where a vehicle may appear at the next moment can be predicted from the history of the vehicle being captured at each camera point location, which can be used in practical applications such as large-scale vehicle retrieval, vehicle behavior analysis, and abnormal parking detection.
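As an illustration of S110 to S130, the following Python sketch (not part of the application; the record format, field names, and helper names are assumptions) builds the camera point location set V and the node edge set E from hypothetical snapshot records of the form (plate, camera point, timestamp) and returns the graph G(V, E) as a pair of sorted lists.

```python
from collections import defaultdict

def build_camera_point_graph(snapshot_records):
    """snapshot_records: iterable of (plate, camera_point_id, timestamp) tuples."""
    V = set()                       # camera point locations, one node each
    E = set()                       # edges given by consecutive capture order per vehicle
    by_plate = defaultdict(list)
    for plate, cam, ts in snapshot_records:
        V.add(cam)
        by_plate[plate].append((ts, cam))
    for events in by_plate.values():
        events.sort()               # order each vehicle's captures by time
        for (_, a), (_, b) in zip(events, events[1:]):
            if a != b:
                E.add((a, b))       # vehicle captured at a, then at b
    return sorted(V), sorted(E)

# A vehicle captured first at v0 and then at v4 contributes the edge (v0, v4).
records = [("JingX4", "v0", 1), ("JingX4", "v4", 2), ("JingBT", "v0", 3), ("JingBT", "v1", 5)]
V, E = build_camera_point_graph(records)
print(V, E)  # ['v0', 'v1', 'v4'] [('v0', 'v1'), ('v0', 'v4')]
```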
In one embodiment, S20, acquiring a plurality of camera point location sequences according to the camera point bitmap, including:
s210, acquiring a camera point position sequence corresponding to the vehicle according to the camera point bitmap G (V, E);
wherein the camera point location sequence comprises the record of the same vehicle being captured at a plurality of camera point locations over a period of time.
In S210, the camera point location sequence may include the history of the same vehicle being captured at multiple camera point locations over a period of 3 to 5 days. For example: the camera point location sequence corresponding to Jing X4 is {v21, v45, v111, v73}, the sequence corresponding to Jing B T is {v41, v75, v431, v73, v56}, and the sequence corresponding to Jing C is {v2, v7, v534, v33, v96}. The camera point location sequences of a plurality of vehicles form the camera point bitmap G(V, E); it can be understood that the camera point location sequence of a given vehicle belongs to a subset of the camera point bitmap G(V, E). From the camera point bitmap G(V, E), a plurality of camera point location sequences can be obtained for use in the subsequent model training.
The multiple camera point location sequences form the vectorization model training set. The vectorization model training set comprises the history of the vehicle being captured at each camera point location during the operation of the monitoring system. It can be understood that the camera point location records a vehicle passes within a certain time period form, according to the order in which the vehicle appears at each camera point location, the camera point location sequence corresponding to that vehicle.
In one embodiment, the capture history of camera point locations for the same vehicle over a 3-day period forms one camera point location sequence. Selecting the time period in this way prevents vehicle snapshots separated by a long time from being placed in the same sequence, and reduces the complexity of vehicle prediction. The specific time period can be chosen according to the actual situation.
In one embodiment, in the monitoring system, the license plate is used as the unique identifier of a vehicle, and the sequence formed by the license plate together with the camera point location marks it passes in order is obtained. The camera point location sequences of the plurality of vehicles form the training data set of the vectorization model and the point location prediction model. For example: the camera point location sequence corresponding to Jing X4 is {v21, v45, v111, v73}, the sequence corresponding to Jing B T is {v41, v75, v431, v73, v56}, the sequence corresponding to Jing C is {v2, v7, v534, v33, v96}, and so on.
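The following is a minimal Python sketch of how such camera point location sequences could be assembled, assuming snapshot records of the form (plate, camera point, timestamp) and a 3-day window; the record format and function names are illustrative assumptions, not the application's implementation.

```python
from collections import defaultdict

THREE_DAYS = 3 * 24 * 3600   # assumed window length, in seconds

def build_point_sequences(snapshot_records, window=THREE_DAYS):
    """snapshot_records: iterable of (plate, camera_point_id, unix_timestamp)."""
    by_plate = defaultdict(list)
    for plate, cam, ts in snapshot_records:
        by_plate[plate].append((ts, cam))
    sequences = []
    for plate, events in by_plate.items():
        events.sort()
        current, last_ts = [], None
        for ts, cam in events:
            if last_ts is not None and ts - last_ts > window:
                sequences.append((plate, current))   # gap too large: close the sequence
                current = []
            current.append(cam)
            last_ts = ts
        if current:
            sequences.append((plate, current))
    return sequences   # e.g. [("JingX4", ["v21", "v45", "v111", "v73"]), ...]
```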
In one embodiment, S30, constructing a vectorization model training set according to the multiple camera point location sequences, and training to form a vectorization model according to the vectorization model training set, includes:
s310, setting the window size for training the word skipping model;
s320, extracting samples of each camera point location sequence according to the window size to form a plurality of groups of model training samples; each group of model training samples comprises a single camera point location and a corresponding prediction context point location sequence;
and S330, training to form a word skipping model according to the multiple groups of model training samples.
In this embodiment, the vectorization model is a word skipping model (skip-gram). The camera point location sequence corresponding to a vehicle is treated by analogy as a sentence, and each camera point location as a word. The vector expression of each camera point location, i.e., the point location feature vector, is obtained by modeling and training the word skipping model.
In S310, the window size of the word skipping model may be 4, 5, 6, etc., which may be defined according to actual situations. In this embodiment, the window size of the word skipping model is selected to be 4.
In S320, one camera point location corresponds to one point location feature vector. The word skipping model skip-gram predicts the context of a text from the middle word, and the vector expression of each word is obtained through training. In this embodiment, the camera point location sequence of the vehicle is treated as a sentence and each camera point location as a word, and samples are extracted from each camera point location sequence according to the set window size. In S330, the pairs of input samples and output samples so formed constitute the training set, and the word skipping model is trained on it to realize vectorization of the camera point locations.
In one embodiment, taking the window size equal to 4 as an example, sample extraction is performed on each camera point location sequence to form input samples and output samples. Each set of model training samples includes an input sample and an output sample, in the format shown in the table of fig. 3.
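A small Python sketch of the sample extraction in S310 to S320, under the sentence/word analogy and a window size of 4 (two context point locations on each side); the helper name is an assumption.

```python
def make_skipgram_samples(sequence, window=4):
    """Pair each camera point location with up to window//2 neighbours on each side."""
    half = window // 2
    samples = []
    for i, center in enumerate(sequence):
        context = sequence[max(0, i - half):i] + sequence[i + 1:i + 1 + half]
        if context:
            samples.append((center, context))   # (input sample, output sample)
    return samples

print(make_skipgram_samples(["v41", "v75", "v431", "v73", "v56"]))
# [('v41', ['v75', 'v431']), ('v75', ['v41', 'v431', 'v73']),
#  ('v431', ['v41', 'v75', 'v73', 'v56']), ('v73', ['v75', 'v431', 'v56']),
#  ('v56', ['v431', 'v73'])]
```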
In one embodiment, S330, training to form the word skipping model according to the multiple sets of model training samples, includes:
S331, inputting the single camera point location into the word skipping model, and outputting the prediction context point location sequence; wherein the word skipping model is a three-layer neural network;
S332, training the word skipping model according to the single camera point location and the prediction context point location sequence, constructing the target loss function of the word skipping model, and iteratively adjusting the parameters of each layer of the model through a gradient descent algorithm during training to obtain the trained word skipping model;
wherein, the target loss function of the word skipping model is as follows:
E = -log p(v_out,1, v_out,2, ..., v_out,t | v_in) = -Σ_{c=1}^{t} log ( exp( W_out[:, j_c*]^T h ) / Σ_{j'=1}^{n} exp( W_out[:, j']^T h ) )

wherein v_in denotes the single camera point location input to the word skipping model, (v_out,1, v_out,2, ..., v_out,t) denotes the prediction context point location sequence corresponding to the single camera point location v_in output by the word skipping model, j_c* denotes the index of the c-th actual context point location, t denotes the window size, h denotes the d-dimensional vector of the hidden layer, and W_out denotes the d×n connection weights of the hidden layer. The prediction context point location sequence can also be understood as the camera point locations before and after the single camera point location, through which the vehicle passes, output by the word skipping model.
In S331, the word skipping model is trained according to the plurality of sets of model training samples. The word skipping model is a three-layer neural network: the input layer receives a single camera point location, and the output layer is the predicted sequence of context point locations. In S332, the target loss function of the word skipping model is constructed on the output layer.
Referring to fig. 4, in S331 and S332, the input layer of the word skipping model is a certain camera point location v_in, and the output layer of the word skipping model is the camera point locations before and after v_in in the predicted camera point location sequence, i.e., {v_out,1, v_out,2, v_out,3, v_out,4}. For example: when the camera point location sequence is {v41, v75, v431, v73, v56}, the corresponding input layer is {v431} and the output layer is {v41, v75, v73, v56}; that is, v_in is {v431} and {v_out,1, v_out,2, v_out,3, v_out,4} is {v41, v75, v73, v56}. By analogy, each camera point location corresponds to one point location feature vector.
In this embodiment, the input camera point location v_in is characterized by a one-hot code, so the corresponding input vector has length n. Correspondingly, the output vectors are one-hot codes of length n representing the output surrounding camera point locations v_out,1, v_out,2, v_out,3 and v_out,4 respectively. As shown in fig. 4, in one embodiment, for one training sample, the one-hot input vector corresponding to the input layer v_in is 01000…0, the one-hot code corresponding to the output layer v_out,1 is 1000…0, that corresponding to v_out,2 is 0001…0, that corresponding to v_out,3 is 0010…0, and that corresponding to v_out,4 is 0000…1.
The k-th row of the weight matrix W_in between the input layer and the hidden layer represents the weight vector of the k-th camera point location. The hidden layer is a d-dimensional vector h, obtained as the weighted sum of the input layer; no activation function is needed. The vector corresponding to each output camera point location shares the d×n connection weights of the hidden layer, namely W_out. The input of each output node of the word skipping model is computed as the weighted sum of the corresponding input nodes, i.e.

u_{c,j} = W_out[:, j]^T h

wherein h = W_in^T v_in (with v_in in one-hot form, h is the row of W_in corresponding to the input camera point location), and u_{c,j} denotes the score that the c-th of the t output camera point locations is the j-th of all n camera point locations. Finally, the probability that the c-th output camera point location is the j-th of all n camera point locations is generated by the softmax function:

p(v_out,c = v_j | v_in) = exp(u_{c,j}) / Σ_{j'=1}^{n} exp(u_{c,j'})

The multiple sets of model training samples are used as the training set of the word skipping model to adjust the target loss function. The target loss function of the word skipping model is the logarithmic form of the conditional probability of the surrounding point locations given the input point location, as shown in the formula for E given above.
In S332, during the training of the word skipping model, the target loss function of the word skipping model continuously decreases and finally approaches an equilibrium state. When the target loss function reaches the equilibrium state, the word skipping model is optimal, and the trained word skipping model is obtained.
In one embodiment, during the training of the word skipping model, the target loss function is adjusted using a stochastic gradient descent algorithm, and negative sampling is adopted to reduce the computational complexity. Finally, each row of the obtained weight matrix W_in is taken as the point location feature vector of the corresponding camera point location.
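A minimal PyTorch sketch of the three-layer word skipping network described above; it uses the full-softmax loss rather than negative sampling, and the embedding dimension, learning rate, and sample values are illustrative assumptions rather than the application's settings.

```python
import torch
import torch.nn as nn

class SkipGram(nn.Module):
    def __init__(self, n_points, dim=128):
        super().__init__()
        self.w_in = nn.Embedding(n_points, dim)             # rows: point location feature vectors
        self.w_out = nn.Linear(dim, n_points, bias=False)   # d x n connection weights W_out

    def forward(self, center_idx):
        h = self.w_in(center_idx)                           # hidden layer h, no activation
        return self.w_out(h)                                # scores u_j for every point location

model = SkipGram(n_points=1000)
loss_fn = nn.CrossEntropyLoss()                             # softmax + negative log-likelihood
opt = torch.optim.SGD(model.parameters(), lr=0.05)

center = torch.tensor([431])                                # v431 as the input point location
context = torch.tensor([41, 75, 73, 56])                    # its surrounding point locations
scores = model(center).expand(len(context), -1)             # same scores for every context target
loss = loss_fn(scores, context)                             # mean of -log p(v_out,c | v_in)
loss.backward()
opt.step()

point_vectors = model.w_in.weight.detach()                  # one feature vector per camera point
```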
In one embodiment, the vectorization model may also be a CBOW model, a GloVe model, or the like, as long as vectorization of the camera point locations is achieved.
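If an off-the-shelf library is acceptable, an equivalent vectorization can be obtained, for example, with gensim's Word2Vec by treating each camera point location sequence as a sentence (sg=1 selects skip-gram, sg=0 the CBOW variant); the parameter values below are illustrative assumptions.

```python
from gensim.models import Word2Vec

sequences = [["v21", "v45", "v111", "v73"],
             ["v41", "v75", "v431", "v73", "v56"]]
w2v = Word2Vec(sentences=sequences, vector_size=128, window=4,
               sg=1, negative=5, min_count=1, epochs=50)
x_v431 = w2v.wv["v431"]   # point location feature vector for camera point v431
```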
In one embodiment, feature vectorization is performed on each camera point location in the camera point bitmap according to the vectorization model to obtain the corresponding point location feature vectors. It can also be understood that, after a single camera point location passes through the vectorization model, the surrounding camera point locations through which the vehicle passes are output, and the corresponding point location feature vector is formed.
Referring to fig. 5, in an embodiment, the step S40 of constructing a point location prediction model training set according to the plurality of point location feature vectors, and training to form a point location prediction model according to the point location prediction model training set, includes:
S410, inputting the plurality of point location feature vectors into an attention-mechanism graph convolutional network, outputting a plurality of node hidden layer features, inputting the node hidden layer features into a classification layer, and outputting a camera point location prediction result;
and S420, training the attention-mechanism graph convolutional network according to the point location feature vectors and the camera point location prediction result, obtaining the parameters of the attention-mechanism graph convolutional network, and obtaining the trained point location prediction model.
In one embodiment, S410, inputting the plurality of point location feature vectors into the attention-mechanism graph convolutional network, outputting a plurality of node hidden layer features, inputting the plurality of node hidden layer features into a classification layer, and outputting a camera point location prediction result, includes:
S411, constructing the attention coefficient according to the graph convolution network, a feedforward neural network, and the leaky rectified linear unit function; the attention coefficient is:

α_ij = softmax_j( LeakyReLU( a^T [ W h_i^(l) || W h_j^(l) ] ) ) = exp( LeakyReLU( a^T [ W h_i^(l) || W h_j^(l) ] ) ) / Σ_{m∈N_i} exp( LeakyReLU( a^T [ W h_i^(l) || W h_m^(l) ] ) )

wherein a denotes the feedforward neural network (a learnable weight vector), W denotes the shared weight matrix, h_i^(l) denotes the node hidden layer feature corresponding to node i output by the l-th graph convolution layer, h_j^(l) denotes the node hidden layer feature corresponding to node j output by the l-th graph convolution layer, the || symbol denotes vector concatenation, m ranges over the nodes in the neighborhood N_i of node i, and node i and node j are adjacent nodes;
S412, obtaining, according to the attention coefficient, the node hidden layer feature corresponding to node i output by the (l+1)-th graph convolution layer of the multi-head attention mechanism; this node hidden layer feature is:

h_i^(l+1) = (1/K) Σ_{k=1}^{K} h_i^{(l+1), k},   where   h_i^{(l+1), k} = σ( Σ_{j∈N_i} α_ij^k W^k h_j^(l) )

wherein σ(·) denotes the activation function, N_i denotes the set of nodes in the neighborhood of node i, K denotes the number of attention heads of the multi-head mechanism, and h_i^{(l+1), k} denotes the node hidden layer feature corresponding to node i output by the (l+1)-th graph convolution layer under a single attention head;
S413, constructing multiple graph convolution layers according to the node hidden layer features corresponding to node i output by the (l+1)-th graph convolution layer of the multi-head attention mechanism;
S414, performing maximum pooling and average pooling respectively on the node hidden layer features output by the multiple graph convolution layers to obtain two pooled output features;
S415, inputting the two pooled output features into the recognition function, and outputting the probability of the vehicle appearing at each camera point location at the next moment;
and S416, acquiring the camera point position of the vehicle at the next moment according to the probability of the vehicle appearing at each camera point position at the next moment.
In S411 to S416, the point location prediction model training set is constructed from the plurality of point location feature vectors; the training set of the point location prediction model is shown in the table of fig. 6. The feature representation of the point location sequence of the first t-1 time period of each camera point location sequence is used as the input of the point location prediction model, and the camera point location at time t is output as the prediction result. The point location prediction model in this embodiment is formed by an attention-mechanism graph convolutional network, a feedforward neural network, the leaky rectified linear unit function, a two-way pooling process of maximum pooling and average pooling, and the softmax function.
The order in which the vehicle appears at the camera point locations is used as the relationship of the edges in the camera point bitmap. The edge relationships (which may also be understood as node relationships) change dynamically as the vehicles travel, and adopting an attention-mechanism graph convolutional network makes the model suitable for this dynamic change. The features of the point location sequence of the first t-1 time period of each camera point location sequence are represented as a plurality of point location feature vectors and used as the input of the point location prediction model; the camera point location corresponding to time t is output as the prediction result, and the point location prediction model is trained with a supervised training method.
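A minimal Python sketch of how the training samples of fig. 6 could be assembled, assuming a dictionary of point location feature vectors and a mapping from camera point locations to class indices; all names here are illustrative assumptions rather than the application's code.

```python
import torch

def make_prediction_samples(sequences, point_vecs, point_index):
    """sequences: lists of camera point ids; point_vecs: {id: 1-D tensor};
    point_index: {id: class index}. Returns (feature matrix, label) pairs."""
    samples = []
    for seq in sequences:
        if len(seq) < 2:
            continue
        x = torch.stack([point_vecs[v] for v in seq[:-1]])   # first t-1 feature vectors
        y = point_index[seq[-1]]                              # camera point location at time t
        samples.append((x, y))
    return samples
```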
The attention-mechanism graph convolutional network GAT is an improvement of the graph convolutional network GCN obtained by adding an attention mechanism: the association between the nodes of the graph structure is characterized by a learnable attention method, and a dynamic connection relationship is established from the features of the current node and the nodes in its neighborhood (which can also be understood as the adjacent nodes). In this embodiment, the order in which vehicles appear in the camera point bitmap is used as the relationship of the edges in the graph, and the edge relationships change dynamically as the vehicles travel. In the vehicle prediction method provided by the application, the attention mechanism is used to dynamically compute the relationships between adjacent points in the camera point bitmap.
After vectorization, the camera point bitmap formed by the historical camera point location sequence of the vehicle is converted into the form x = [h_0, h_1, ..., h_i, ..., h_z], wherein h_i denotes the feature vector of the i-th camera point location in the vehicle's camera point location sequence of length z. The l-th graph convolution layer outputs the updated hidden features of each camera point location node, H^(l) = [h_0^(l), h_1^(l), ..., h_i^(l), ..., h_t^(l)]. All nodes train a shared weight matrix to obtain the weight of each neighbor node; the shared weight matrix is the mapping between the input feature dimension F and the output dimension F'.
In S411, when calculating the attention coefficient, the feature vectors of node i and node j are each mapped by W, and the resulting vectors are concatenated. The feedforward neural network a maps the concatenated vector to a real number, which is activated by the leaky rectified linear unit function LeakyReLU and normalized to obtain the final attention coefficient.
In S412, N_i, the set of nodes in the neighborhood of node i, may be understood as the set of camera point locations adjacent to a certain camera point location i, or as the set of nodes adjacent to node i. According to the attention coefficients, the nodes in the neighborhood of node i are weighted and summed to obtain the output feature of node i, namely the node hidden layer feature corresponding to node i output by the (l+1)-th graph convolution layer under a single-head attention mechanism:

h_i^(l+1) = σ( Σ_{j∈N_i} α_ij W h_j^(l) )

wherein α_ij is obtained by the softmax function.
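A minimal PyTorch sketch of one single-head graph attention layer as described in S411 and S412: the shared weight W maps the node features, the feedforward vector a scores each pair of adjacent nodes, LeakyReLU and a softmax over the neighborhood N_i give the attention coefficients, and the neighborhood is then aggregated. The adjacency-mask interface, the choice of ELU as the activation σ, and the sizes are assumptions.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class GATHead(nn.Module):
    def __init__(self, in_dim, out_dim):
        super().__init__()
        self.W = nn.Linear(in_dim, out_dim, bias=False)    # shared weight matrix W
        self.a = nn.Linear(2 * out_dim, 1, bias=False)     # feedforward network a

    def forward(self, h, adj):
        # h: (t, in_dim) node features; adj: (t, t) 0/1 neighbourhood mask
        # (adj should contain self-loops so that every row has at least one neighbour)
        wh = self.W(h)                                     # W h_i for every node
        t = wh.size(0)
        pairs = torch.cat([wh.unsqueeze(1).expand(t, t, -1),
                           wh.unsqueeze(0).expand(t, t, -1)], dim=-1)
        e = F.leaky_relu(self.a(pairs).squeeze(-1))        # LeakyReLU(a^T [W h_i || W h_j])
        e = e.masked_fill(adj == 0, float("-inf"))         # restrict to the neighbourhood N_i
        alpha = torch.softmax(e, dim=-1)                   # attention coefficients alpha_ij
        return F.elu(alpha @ wh)                           # sigma(sum_j alpha_ij W h_j), sigma = ELU
```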
In this embodiment, the multi-head attention mechanism is introduced to improve the representational capability of the model, so that the nodes can be represented stably through self-attention.
For the intermediate-layer output features, each graph convolution layer computes K self-attention heads, and the results obtained by the attention heads are combined to obtain the output vector. The output result of each graph convolution layer is obtained by averaging the output vectors of the attention heads:

h_i^(l+1) = (1/K) Σ_{k=1}^{K} h_i^{(l+1), k}
In one embodiment, K denotes the number of attention heads in each graph convolution layer, which can also be understood as K-head attention, and can be set according to the specific situation.
In S413, the multiple attention graph convolution layers may be 3 graph convolution layers; the number of graph convolution layers is not limited and can be set according to the actual situation. The input point location feature vectors pass through the 3 graph convolution layers to output the node hidden layer features, and maximum pooling and average pooling are then performed respectively on the node hidden layer features output by the multiple graph convolution layers to obtain the two pooled output features. The maximum pooling and the average pooling are performed in parallel on the node hidden layer.
The multiple attention graph convolution layers comprise 3 graph convolution layers, and each graph convolution layer adopts the multi-head attention mechanism. The input of the layer-1 graph convolution layer is H^(0) = [h_0^(0), h_1^(0), ..., h_i^(0), ..., h_t^(0)] and its output is H^(1) = [h_0^(1), h_1^(1), ..., h_i^(1), ..., h_t^(1)], with h_i^(1) = (1/K) Σ_{k=1}^{K} σ( Σ_{j∈N_i} α_ij^k W^k h_j^(0) ). The input of the layer-2 graph convolution layer is H^(1) and its output is H^(2) = [h_0^(2), h_1^(2), ..., h_i^(2), ..., h_t^(2)], with h_i^(2) = (1/K) Σ_{k=1}^{K} σ( Σ_{j∈N_i} α_ij^k W^k h_j^(1) ). The input of the layer-3 graph convolution layer is H^(2) and its output is H^(3) = [h_0^(3), h_1^(3), ..., h_i^(3), ..., h_t^(3)], with h_i^(3) = (1/K) Σ_{k=1}^{K} σ( Σ_{j∈N_i} α_ij^k W^k h_j^(2) ).
H^(3) = [h_0^(3), h_1^(3), ..., h_i^(3), ..., h_t^(3)] is the node hidden layer feature output by the 3-layer graph convolution. Maximum pooling and average pooling are then performed on H^(3) respectively to obtain the two pooled output features.
The two pooled output features obtained from the two pooling processes are concatenated and input into the softmax recognition function, which outputs the probability that the vehicle appears at each camera point location at the next moment and thus the final vehicle prediction result. For example: the output probabilities of the vehicle appearing at each camera point location at the next moment are: camera point location v56, probability 96%; camera point location v73, probability 3%; camera point location v431, probability 1%. The final vehicle prediction result is then camera point location v56.
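Building on the GATHead sketch above, the following PyTorch sketch stacks three multi-head graph attention layers (heads averaged per layer), applies maximum pooling and average pooling in parallel over the node hidden layer features, concatenates the two pooled outputs, and produces softmax probabilities over the n camera point locations; layer widths and head counts are illustrative assumptions.

```python
import torch
import torch.nn as nn

class MultiHeadGATLayer(nn.Module):
    def __init__(self, in_dim, out_dim, heads=4):
        super().__init__()
        self.heads = nn.ModuleList([GATHead(in_dim, out_dim) for _ in range(heads)])

    def forward(self, h, adj):
        # average of the K single-head outputs
        return torch.stack([head(h, adj) for head in self.heads]).mean(dim=0)

class PointPredictor(nn.Module):
    def __init__(self, dim, n_points, heads=4):
        super().__init__()
        self.gat = nn.ModuleList([MultiHeadGATLayer(dim, dim, heads) for _ in range(3)])
        self.classifier = nn.Linear(2 * dim, n_points)     # over [max-pool || avg-pool]

    def forward(self, h, adj):
        for layer in self.gat:                              # H(1), H(2), H(3)
            h = layer(h, adj)
        pooled = torch.cat([h.max(dim=0).values, h.mean(dim=0)], dim=-1)
        return torch.softmax(self.classifier(pooled), dim=-1)   # probability per camera point
```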
Starting from the whole monitoring system, the camera point locations of the entire monitoring network are covered on the basis of the relationships among all camera point locations. The information of the camera point locations of the monitoring network and the relationships between them are represented by concrete data, and the relationships among all camera point locations are considered comprehensively for model prediction. Based on analysis of the historical records of vehicles appearing at each camera point location, training and analysis with deep learning on large-scale data give the point location prediction model strong learning ability, wide coverage and good adaptability, so that the camera point locations where a vehicle may appear at the next moment can be predicted more accurately. Therefore, the vehicle prediction method provided by the application can improve the accuracy of vehicle prediction, can be applied to large-scale vehicle monitoring systems, and can realize accurate searching of the target vehicle. It avoids manual analysis, avoids missing camera point locations where the vehicle may appear, and can be used in practical application environments such as vehicle query and abnormal parking detection.
In one embodiment, the point location feature vectors corresponding to the n camera point locations in the camera point bitmap obtained from the word skipping model may be stored as a dictionary structure, such as {v_j: x_j}, where v_j denotes a camera point location and x_j denotes the point location feature vector corresponding to v_j. The point location feature vector corresponding to each camera point location in the camera point bitmap is obtained from the word skipping model. When point location prediction is performed with the point location prediction model, the corresponding x_j is looked up by v_j and used in the subsequent point location prediction processing.
In one embodiment, in step S420, training the attention-mechanism graph convolutional network according to the plurality of point location feature vectors and the camera point location prediction result, obtaining the parameters of the attention-mechanism graph convolutional network, and obtaining the trained point location prediction model includes:
S421, constructing the target loss function of the point location prediction model according to the probability that the vehicle appears at each camera point location at the next moment, and adjusting the parameters of each layer of the multiple graph convolution layers according to the value of the target loss function to obtain the trained point location prediction model; the target loss function of the point location prediction model is:

L = -(1/M) Σ_{i=1}^{M} Σ_{j=1}^{N} y_ij · log(p_ij)

wherein M denotes the number of point location feature vectors in the training set formed from the plurality of point location feature vectors, N denotes the number of camera point locations in the camera point bitmap, y_ij denotes the label for point location feature vector i in the training set, taking the value 1 if the real prediction result of point location feature vector i is camera point location j and 0 otherwise, and p_ij denotes the probability that the vehicle corresponding to point location feature vector i appears at camera point location j at the next moment.
The target loss function of the point location prediction model can adopt the cross-entropy loss function. During the training of the point location prediction model, the target loss function continuously decreases and finally tends to an equilibrium state. When the target loss function reaches the equilibrium state, the point location prediction model is optimal, and the trained point location prediction model is obtained, making the prediction model more stable and reliable.
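A minimal sketch of the supervised training described in S420 and S421, using the PointPredictor sketch above: each sample is the feature-vector matrix of the first t-1 point locations plus the index of the camera point location at time t, and the cross-entropy target loss is minimized by gradient descent. The adjacency helper adj_fn and the hyperparameters are assumptions.

```python
import torch
import torch.nn as nn

def train_point_predictor(model, samples, adj_fn, epochs=10, lr=1e-3):
    """samples: list of (feature matrix of the first t-1 point locations, next point index)."""
    opt = torch.optim.Adam(model.parameters(), lr=lr)
    nll = nn.NLLLoss()                                   # cross entropy on log-probabilities
    for _ in range(epochs):
        for x, target in samples:
            adj = adj_fn(x)                              # adjacency mask for this point sequence
            probs = model(x, adj)                        # p_ij for every camera point location
            loss = nll(torch.log(probs + 1e-9).unsqueeze(0), torch.tensor([target]))
            opt.zero_grad()
            loss.backward()
            opt.step()
    return model
```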
In one embodiment, S50, a camera location where the vehicle appears at the next time is predicted according to the location prediction model.
After the training of the vectorization model and the point location prediction model is completed, the historical point location sequence (which can also be understood as the original sequence or camera point location sequence) of a certain vehicle appearing in the monitoring network is obtained. According to the word skipping model, the point location feature vector corresponding to each camera point location in that camera point location sequence can be obtained: the camera point location sequence is formed from the historical camera point location sequence, and the point location feature vector corresponding to each camera point location is retrieved. These point location feature vectors form the input sequence and are input into the point location prediction model. According to the point location prediction model, the probability of the vehicle appearing at each point location at the next moment is obtained, and from it the camera point location where the vehicle will appear at the next moment.
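A minimal sketch of this prediction step, using the illustrative helpers introduced above (the {v_j: x_j} dictionary point_vecs, adj_fn, and an index-to-point-location mapping, all of which are assumptions): the vehicle's historical camera point location sequence is vectorized, passed through the trained point location prediction model, and the most probable camera point location for the next moment is returned.

```python
import torch

def predict_next_point(model, history, point_vecs, index_to_point, adj_fn):
    """history: camera point ids at which the vehicle was captured, in order of appearance."""
    x = torch.stack([point_vecs[v] for v in history])    # look up x_j by v_j
    with torch.no_grad():
        probs = model(x, adj_fn(x))                      # probability per camera point location
    best = int(torch.argmax(probs))
    return index_to_point[best], float(probs[best])      # e.g. ("v56", 0.96)
```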
Referring to fig. 7, in one embodiment, the present application provides a vehicle predictive device 100. The vehicle prediction device 100 includes a camera point bitmap generation module 10, a camera point location sequence acquisition module 20, a vectorization model generation module 30, a point location prediction model generation module 40, and a prediction module 50.
The camera point bitmap generation module 10 is configured to obtain node relationships between a plurality of camera point locations of the monitoring system and the plurality of camera point locations, and construct a camera point bitmap according to the node relationships between the plurality of camera point locations. The camera point location sequence obtaining module 20 is configured to obtain a history of vehicle capturing in the monitoring system based on the camera point location map, and form a plurality of camera point location sequences. The vectorization model generation module 30 is configured to construct a vectorization model training set according to the multiple camera point location sequences, train to form a vectorization model according to the vectorization model training set, and perform feature vectorization on each camera point location in the camera point bitmap according to the vectorization model to obtain multiple point location feature vectors. The point location prediction model generation module 40 is configured to construct a point location prediction model training set according to the plurality of point location feature vectors, and train to form a point location prediction model according to the point location prediction model training set. The prediction module 50 is configured to predict a camera point location of the vehicle at the next time according to the vectorization model and the point location prediction model.
In this embodiment, the relevant description of the camera point bitmap generation module 10 may refer to the description of S10 in the above embodiments. The relevant description of the camera point location sequence acquisition module 20 may refer to the description of S20 in the above embodiments. The relevant description of the vectorization model generation module 30 may refer to the description of S30 in the above embodiments. The relevant description of the point location prediction model generation module 40 may refer to the description of S40 in the above embodiments. The relevant description of the prediction module 50 may refer to the description of S50 in the above embodiments.
In one embodiment, the vectorization model generation module 30 includes a parameter setting module (not shown), a sample generation module (not shown), and a model training module (not shown). And the parameter setting module is used for setting the window size for training the word skipping model. And the sample generation module is used for extracting samples of each camera point location sequence according to the window size to form a plurality of groups of model training samples. And the model training module is used for training and forming the word skipping model according to the plurality of groups of model training samples.
In this embodiment, the relevant description of the parameter setting module may refer to the relevant description of S310 in the above embodiment. The relevant description of the sample generation module may refer to the relevant description of S320 in the above embodiment. The relevant description of the model training module may refer to the relevant description of S330 in the above embodiment.
In one embodiment, the model training module (not labeled) includes a model construction module (not labeled) and a first loss function adjusting module (not labeled). The model construction module is used for inputting the single camera point location into the word skipping model and outputting the prediction context point location sequence; the word skipping model is a three-layer neural network. The first loss function adjusting module is used for training the word skipping model according to the single camera point location and the prediction context point location sequence, constructing the target loss function of the word skipping model, and iteratively adjusting the neural network parameters of each layer of the word skipping model through a gradient descent algorithm during training. The target loss function of the word skipping model is:

E = -log p(v_out,1, v_out,2, ..., v_out,t | v_in) = -Σ_{c=1}^{t} log ( exp( W_out[:, j_c*]^T h ) / Σ_{j'=1}^{n} exp( W_out[:, j']^T h ) )

wherein v_in denotes the single camera point location input to the word skipping model, (v_out,1, v_out,2, ..., v_out,t) denotes the prediction context point location sequence corresponding to the single camera point location v_in output by the word skipping model, j_c* denotes the index of the c-th actual context point location, t denotes the window size, h denotes the d-dimensional vector of the hidden layer, and W_out denotes the d×n connection weights of the hidden layer.
In this embodiment, the relevant description of the model construction module may refer to the description of S331 in the above embodiment. The relevant description of the first loss function adjusting module may refer to the description of S332 in the above embodiment.
In one embodiment, the point location prediction model generation module 40 includes a point location prediction model construction module and a point location prediction model training module. The point location prediction model construction module (not labeled in the figure) is used for inputting the plurality of point location feature vectors into the attention-mechanism graph convolutional network, outputting a plurality of node hidden layer features, inputting the plurality of node hidden layer features into the classification layer, and outputting a camera point location prediction result. The point location prediction model training module (not labeled in the figure) is used for training the attention-mechanism graph convolutional network according to the plurality of point location feature vectors and the camera point location prediction result, acquiring the parameters of the attention-mechanism graph convolutional network, and obtaining the trained point location prediction model.
In one embodiment, the point location prediction model construction module comprises an attention coefficient construction module, a multi-head attention mechanism graph convolutional layer feature acquisition module, a multilayer graph convolutional layer construction module, a two-pooling output feature acquisition module, a probability acquisition module, and a camera point location output module. The attention coefficient construction module is used for constructing an attention coefficient according to the graph convolution network, the feedforward neural network, and the leaky rectified linear unit (LeakyReLU) function; the attention coefficient is:
\alpha_{ij} = \frac{\exp\left(\mathrm{LeakyReLU}\left(a^{\top}\left[W h_i^{(l)} \,\|\, W h_j^{(l)}\right]\right)\right)}{\sum_{k \in N_i} \exp\left(\mathrm{LeakyReLU}\left(a^{\top}\left[W h_i^{(l)} \,\|\, W h_k^{(l)}\right]\right)\right)}

where a represents the feed-forward neural network used to compute attention, W represents the shared weight matrix, h_i^(l) represents the node hidden layer features corresponding to node i output by the l-th graph convolutional layer, h_j^(l) represents the node hidden layer features corresponding to node j output by the l-th graph convolutional layer, the || symbol represents vector concatenation, and node i and node j are adjacent nodes;
the multi-head attention machine drawing convolutional layer characteristic acquisition module is used for acquiring node hidden layer characteristics corresponding to a node i output by the (l + 1) th layer drawing convolutional layer of the multi-head attention machine according to the attention coefficient; the node hidden layer characteristics corresponding to the node i output by the l +1 th layer graph convolution layer of the multi-head attention mechanism are as follows:
h_i^{(l+1)} = \sigma\left(\frac{1}{K} \sum_{k=1}^{K} \sum_{j \in N_i} \alpha_{ij}^{k} W^{k} h_j^{(l)}\right)

where σ(·) denotes the activation function, N_i denotes the set of nodes in the neighborhood of node i, and K denotes the number of attention heads.
The multilayer graph convolutional layer construction module is used for constructing a multilayer graph convolutional layer according to the node hidden layer features corresponding to node i output by the (l+1)-th graph convolutional layer of the multi-head attention mechanism;
the two-pooling output characteristic acquisition module is used for performing maximum pooling and average pooling on the node hidden layer characteristics output by the multilayer graph convolutional layer respectively to obtain two-pooling output characteristics. The probability obtaining module is used for inputting the two pooling output characteristics into the recognition function and outputting the probability that the vehicle appears at each camera point position at the next moment. The camera point location output module is used for obtaining the camera point location of the vehicle at the next moment according to the probability of the vehicle appearing at each camera point location at the next moment.
In this embodiment, the relevant description of the attention coefficient construction module may refer to the relevant description of S411 in the above embodiment. The relevant description of the multi-head attention mechanism graph convolutional layer feature acquisition module may refer to the relevant description of S412 in the above embodiment. The relevant description of the multilayer graph convolutional layer construction module may refer to the relevant description of S413 in the above embodiment. The relevant description of the two-pooling output feature acquisition module may refer to the relevant description of S414 in the above embodiment. The relevant description of the probability acquisition module may refer to the relevant description of S415 in the above embodiment. The relevant description of the camera point location output module may refer to the relevant description of S416 in the above embodiment.
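As an illustration only, the attention coefficient, the multi-head aggregation, and the two-pooling output described above could be realized along the lines of the following sketch, which assumes PyTorch, a dense 0/1 adjacency matrix containing self-loops, averaging over the attention heads, and a softmax as the recognition function; all names and dimensions are assumptions for this sketch, not the patented code.

# Minimal sketch (assumptions: PyTorch, dense 0/1 adjacency matrix that includes
# self-loops, averaged multi-head aggregation, softmax as the recognition
# function). Illustration of the equations above, not the patented code.
import torch
import torch.nn as nn
import torch.nn.functional as F

class MultiHeadGATLayer(nn.Module):
    def __init__(self, in_dim, out_dim, num_heads):
        super().__init__()
        self.num_heads = num_heads
        self.W = nn.ModuleList([nn.Linear(in_dim, out_dim, bias=False) for _ in range(num_heads)])
        self.a = nn.ModuleList([nn.Linear(2 * out_dim, 1, bias=False) for _ in range(num_heads)])

    def forward(self, h, adj):
        # h: (N, in_dim) node hidden-layer features; adj: (N, N) adjacency with self-loops.
        outputs = []
        n = h.size(0)
        for k in range(self.num_heads):
            Wh = self.W[k](h)                                           # (N, out_dim)
            # Pairwise concatenation [W h_i || W h_j] for every node pair (i, j).
            pairs = torch.cat([Wh.unsqueeze(1).expand(n, n, -1),
                               Wh.unsqueeze(0).expand(n, n, -1)], dim=-1)
            e = F.leaky_relu(self.a[k](pairs).squeeze(-1))              # raw attention scores
            e = e.masked_fill(adj == 0, float("-inf"))                  # keep only neighbours N_i
            alpha = torch.softmax(e, dim=-1)                            # attention coefficients
            outputs.append(alpha @ Wh)                                  # sum_j alpha_ij * W h_j
        # Average over the K heads, then apply the activation sigma (ReLU here).
        return torch.relu(torch.stack(outputs).mean(dim=0))

def predict_next_point(node_features, classifier):
    # Two-pooling read-out: max pooling and average pooling over all nodes,
    # concatenated, then a softmax giving per-camera-point-location probabilities.
    pooled = torch.cat([node_features.max(dim=0).values, node_features.mean(dim=0)])
    return torch.softmax(classifier(pooled), dim=-1)

# Example wiring for a graph of 100 camera point locations with 64-dimensional input features.
layer = MultiHeadGATLayer(in_dim=64, out_dim=32, num_heads=4)
classifier = nn.Linear(2 * 32, 100)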
In one embodiment, the point location prediction model training module includes a second loss function module (not labeled). The second loss function module is used for constructing a target loss function of the point location prediction model according to the probability of the vehicle appearing at each camera point location at the next moment, and adjusting each layer parameter of the multilayer graph convolution layer according to the value of the target loss function of the point location prediction model to obtain a trained point location prediction model; the target loss function of the point location prediction model is as follows:
\mathrm{Loss} = -\frac{1}{M} \sum_{i=1}^{M} \sum_{j=1}^{N} y_{ij} \log\left(p_{ij}\right)

where M represents the number of point location feature vectors in the training set formed by the plurality of point location feature vectors, N represents the number of camera point locations in the camera point bitmap, y_ij represents the label of point location feature vector i in the training set, taking the value 1 if the true prediction result of point location feature vector i is camera point location j and 0 otherwise, and p_ij represents the probability that the vehicle corresponding to point location feature vector i appears at camera point location j at the next moment.
In this embodiment, the related description of the second loss function module may refer to the related description of S421 in the above embodiment.
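As an illustration only, the target loss function above could be computed as in the following sketch, assuming one-hot labels y of shape (M, N) and predicted probabilities p of shape (M, N); the function name is an assumption for this sketch.

# Sketch of the cross-entropy objective above, assuming one-hot labels y of shape
# (M, N) and predicted probabilities p of shape (M, N); the name is illustrative.
import torch

def point_prediction_loss(p, y, eps=1e-12):
    # Loss = -(1/M) * sum_i sum_j y_ij * log(p_ij); eps guards against log(0).
    return -(y * torch.log(p + eps)).sum(dim=1).mean()

The value of this loss would then drive a gradient-based update of the multilayer graph convolutional layer parameters, as described above.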
In one embodiment, the camera point bitmap generation module 10 includes a camera point location set generation module (not labeled), a node edge set generation module (not labeled), and a construction module (not labeled).
The camera point location set generating module is used for acquiring the plurality of camera point locations, forming a camera point location set V, and taking each camera point location as a node. The node edge set generation module is used for acquiring the sequence in which a plurality of vehicles are captured among a plurality of nodes, forming the node relation between each pair of adjacent nodes, and forming a node edge set E according to the node relations. The construction module is used for constructing the camera point bitmap G(V, E) according to the camera point location set V and the node edge set E.
In this embodiment, the relevant description of the camera point location set generating module may refer to the relevant description of S110 in the above embodiment. The relevant description of the node edge set generation module may refer to the relevant description of S120 in the above embodiment. The relevant description of the building block can refer to the relevant description of S130 in the above embodiment.
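As an illustration only, the camera point bitmap G(V, E) could be assembled from snapshot records along the lines of the following sketch, assuming each record is a (vehicle_id, camera_point_id, timestamp) triple; the record layout and helper names are assumptions for this sketch rather than details from the embodiment.

# Illustrative sketch: assemble the camera point bitmap G(V, E) from snapshot
# records, assuming each record is (vehicle_id, camera_point_id, timestamp).
from collections import defaultdict

def build_camera_point_graph(records):
    V = sorted({cam for _, cam, _ in records})         # camera point location set V (nodes)
    by_vehicle = defaultdict(list)
    for vehicle, cam, ts in records:
        by_vehicle[vehicle].append((ts, cam))
    E = set()
    for visits in by_vehicle.values():
        visits.sort()                                   # order one vehicle's captures by time
        for (_, a), (_, b) in zip(visits, visits[1:]):  # consecutive captures -> node relation
            if a != b:
                E.add((a, b))
    return V, E                                         # the graph G = (V, E)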
In one embodiment, the camera point location sequence acquiring module 20 includes a record acquisition module (not labeled). The record acquisition module is used for acquiring a camera point location sequence corresponding to the vehicle according to the camera point bitmap G(V, E). The camera point location sequence includes the records of the same vehicle being captured among multiple camera point locations over a period of time.
In this embodiment, reference may be made to the description of S210 in the foregoing embodiment for the description of the record obtaining module.
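As an illustration only, and under the same assumed record layout as the previous sketch, a camera point location sequence for one vehicle over a period of time could be extracted as follows; the names remain assumptions for this sketch.

# Illustrative sketch, same assumed (vehicle_id, camera_point_id, timestamp)
# record layout as above: the camera point location sequence of one vehicle
# within a time period, ordered by capture time.
def camera_point_sequence(records, vehicle_id, t_start, t_end):
    visits = sorted((ts, cam) for v, cam, ts in records
                    if v == vehicle_id and t_start <= ts <= t_end)
    return [cam for _, cam in visits]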
In the various embodiments described above, the particular order or hierarchy of steps in the processes disclosed is an example of exemplary approaches. Based upon design preferences, it is understood that the specific order or hierarchy of steps in the processes may be rearranged without departing from the scope of the present disclosure. The accompanying method claims present elements of the various steps in a sample order, and are not intended to be limited to the specific order or hierarchy presented.
Those of skill in the art will further appreciate that the various illustrative logical blocks, units, and steps described in connection with the embodiments disclosed herein may be implemented as electronic hardware, computer software, or combinations of both. To clearly illustrate the interchangeability of hardware and software, various illustrative components, elements, and steps have been described above generally in terms of their functionality. Whether such functionality is implemented as hardware or software depends upon the particular application and design requirements of the overall system. Skilled artisans may implement the described functionality in varying ways for each particular application, but such implementation decisions should not be interpreted as causing a departure from the scope of the embodiments of the present application.
The various illustrative logical blocks, or elements described in this application may be implemented or operated by a general purpose processor, a digital signal processor, an Application Specific Integrated Circuit (ASIC), a field programmable gate array or other programmable logic device, discrete gate or transistor logic, discrete hardware components, or any combination thereof designed to perform the functions described. A general-purpose processor may be a microprocessor, but in the alternative, the processor may be any conventional processor, controller, microcontroller, or state machine. A processor may also be implemented as a combination of computing devices, e.g., a digital signal processor and a microprocessor, a plurality of microprocessors, one or more microprocessors in conjunction with a digital signal processor core, or any other similar configuration.
The steps of a method or algorithm described in the embodiments herein may be embodied directly in hardware, in a software module executed by a processor, or in a combination of the two. A software module may be stored in RAM memory, flash memory, ROM memory, EPROM memory, EEPROM memory, registers, a hard disk, a removable disk, a CD-ROM, or any other form of storage medium known in the art. For example, a storage medium may be coupled to the processor such that the processor can read information from, and write information to, the storage medium. In the alternative, the storage medium may be integral to the processor. The processor and the storage medium may reside in an ASIC, which may be located in a user terminal. In the alternative, the processor and the storage medium may reside in different components in a user terminal.
The above-mentioned embodiments, objects, technical solutions and advantages of the present application are described in further detail, it should be understood that the above-mentioned embodiments are merely exemplary embodiments of the present application, and are not intended to limit the scope of the present application, and any modifications, equivalent substitutions, improvements and the like made within the spirit and principle of the present application should be included in the scope of the present application.

Claims (16)

1. A vehicle prediction method, characterized by comprising:
acquiring a plurality of camera point positions of a monitoring system and node relations between the plurality of camera point positions, and constructing a camera point bitmap according to the plurality of camera point positions and the node relations between the plurality of camera point positions;
acquiring a vehicle snapshot history in the monitoring system based on the camera point bitmap to form a plurality of camera point position sequences;
constructing a vectorization model training set according to the camera point location sequences, training to form a vectorization model according to the vectorization model training set, and performing feature vectorization on each camera point location in the camera point bitmap according to the vectorization model to obtain a plurality of point location feature vectors;
constructing a point location prediction model training set according to the point location feature vectors, and training to form a point location prediction model according to the point location prediction model training set;
and predicting the camera point position of the vehicle at the next moment according to the point position prediction model.
2. The vehicle prediction method of claim 1, wherein the vectorization model is a word skipping model, and the constructing a vectorization model training set according to the plurality of camera point location sequences and training to form a vectorization model according to the vectorization model training set comprises:
setting a window size for training the word skipping model;
according to the window size, performing sample extraction on each camera point location sequence to form a plurality of groups of model training samples; each group of model training samples comprises a single camera point location and a corresponding prediction context point location sequence;
and training to form the word skipping model according to the multiple groups of model training samples.
3. The vehicle prediction method of claim 2, wherein the training to form the word skipping model according to the plurality of groups of model training samples comprises:
inputting the single camera point location into the word skipping model, and outputting the prediction context point location sequence; wherein the word skipping model is a three-layer neural network;
training the word skipping model according to the single camera point location and the prediction context point location sequence, constructing a target loss function of the word skipping model, and iteratively adjusting each layer of neural network parameters of the word skipping model through a gradient descent algorithm during training; wherein, the target loss function of the word skipping model is as follows:
E = -\log p(v_{out,1}, v_{out,2}, \ldots, v_{out,t} \mid v_{in}) = -\sum_{j=1}^{t} \log \frac{\exp(u_{out,j})}{\sum_{k=1}^{n} \exp(u_k)}

u = W_{out}^{\top} h

where v_in denotes the single camera point location input to the word skipping model, (v_out,1, v_out,2, ..., v_out,t) denotes the prediction context point location sequence output by the word skipping model for v_in, t denotes the window size, h denotes the d-dimensional hidden-layer vector, W_out denotes the d × n connection weights of the hidden layer, u denotes the resulting n-dimensional output score vector, and u_out,j denotes its entry for the j-th context point location.
4. The vehicle prediction method of claim 1, wherein the constructing a point location prediction model training set according to the point location feature vectors and training to form a point location prediction model according to the point location prediction model training set comprises:
inputting the point location feature vectors into an attention mechanism graph convolution network, outputting a plurality of node hidden layer features, inputting the node hidden layer features into a classification layer, and outputting a camera point location prediction result;
and training the attention mechanism graph convolution network according to the point location feature vectors and the camera point location prediction result to obtain parameters of the attention mechanism graph convolution network, so as to obtain a trained point location prediction model.
5. The vehicle prediction method according to claim 4, wherein the inputting the point location feature vectors into an attention mechanism graph convolution network, outputting a plurality of node hidden layer features, inputting the node hidden layer features into a classification layer, and outputting a camera point location prediction result comprises:
constructing an attention coefficient according to a graph convolution network, a feedforward neural network, and a leaky rectified linear unit (LeakyReLU) function; the attention coefficient is:
\alpha_{ij} = \frac{\exp\left(\mathrm{LeakyReLU}\left(a^{\top}\left[W h_i^{(l)} \,\|\, W h_j^{(l)}\right]\right)\right)}{\sum_{k \in N_i} \exp\left(\mathrm{LeakyReLU}\left(a^{\top}\left[W h_i^{(l)} \,\|\, W h_k^{(l)}\right]\right)\right)}

where a represents the feed-forward neural network used to compute attention, W represents the shared weight matrix, h_i^(l) represents the node hidden layer features corresponding to node i output by the l-th graph convolutional layer, h_j^(l) represents the node hidden layer features corresponding to node j output by the l-th graph convolutional layer, the || symbol represents vector concatenation, and node i and node j are adjacent nodes;
obtaining, according to the attention coefficient, node hidden layer characteristics corresponding to node i output by the (l+1)-th graph convolutional layer of the multi-head attention mechanism; the node hidden layer characteristics corresponding to node i output by the (l+1)-th graph convolutional layer of the multi-head attention mechanism are as follows:
h_i^{(l+1)} = \sigma\left(\frac{1}{K} \sum_{k=1}^{K} \sum_{j \in N_i} \alpha_{ij}^{k} W^{k} h_j^{(l)}\right)

where σ(·) denotes the activation function, N_i denotes the set of nodes in the neighborhood of node i, and K denotes the number of attention heads;
constructing a multilayer graph convolutional layer according to the node hidden layer characteristics corresponding to node i output by the (l+1)-th graph convolutional layer of the multi-head attention mechanism;
performing maximum pooling and average pooling on the node hidden layer characteristics output by the multilayer graph convolutional layer respectively to obtain two pooled output characteristics;
inputting the two-pooling output characteristics into a recognition function, and outputting the probability of the vehicle appearing at each camera point location at the next moment;
and obtaining the camera point position of the vehicle at the next moment according to the probability of the vehicle appearing at each camera point position at the next moment.
6. The vehicle prediction method according to claim 5, wherein the training the attention mechanism graph convolution network according to the point location feature vectors and the camera point location prediction result to obtain parameters of the attention mechanism graph convolution network, so as to obtain a trained point location prediction model, further comprises:
according to the probability that the vehicle appears at each camera point location at the next moment, constructing a target loss function of the point location prediction model, and adjusting each layer parameter of the multilayer graph convolution layer according to the value of the target loss function of the point location prediction model to obtain a trained point location prediction model; the target loss function of the point location prediction model is as follows:
\mathrm{Loss} = -\frac{1}{M} \sum_{i=1}^{M} \sum_{j=1}^{N} y_{ij} \log\left(p_{ij}\right)

where M represents the number of point location feature vectors in a training set formed by the plurality of point location feature vectors, N represents the number of camera point locations in the camera point bitmap, y_ij represents the label of point location feature vector i in the training set, taking the value 1 if the true prediction result of point location feature vector i is camera point location j and 0 otherwise, and p_ij represents the probability that the vehicle corresponding to point location feature vector i appears at camera point location j at the next moment.
7. The vehicle prediction method according to claim 1, wherein the acquiring a plurality of camera point locations of a monitoring system and node relations between the plurality of camera point locations, and constructing a camera point bitmap according to the plurality of camera point locations and the node relations between the plurality of camera point locations comprises:
acquiring the plurality of camera point positions to form a camera point position set V, and taking each camera point position as a node;
acquiring the sequence of a plurality of vehicles captured among a plurality of nodes, forming the node relation among all adjacent nodes, and forming a node edge set E according to the node relation;
and constructing the camera point bitmap G (V, E) according to the camera point position set V and the node edge set E.
8. The vehicle prediction method of claim 7, wherein the obtaining a plurality of camera point location sequences from the camera point bitmap comprises:
acquiring a camera point location sequence corresponding to the vehicle according to the camera point bitmap G(V, E);
wherein the camera point location sequence comprises records of the same vehicle being captured among a plurality of camera point locations over a period of time.
9. A vehicle prediction apparatus characterized by comprising:
the camera point bitmap generation module is used for acquiring a plurality of camera point positions of the monitoring system and node relations between the plurality of camera point positions, and constructing a camera point bitmap according to the plurality of camera point positions and the node relations between the plurality of camera point positions;
the camera point location sequence acquisition module is used for acquiring a vehicle snapshot history record in the monitoring system based on the camera point bitmap to form a plurality of camera point location sequences;
the vectorization model generation module is used for constructing a vectorization model training set according to the camera point location sequences, training to form a vectorization model according to the vectorization model training set, and performing feature vectorization on each camera point location in the camera point bitmap according to the vectorization model to obtain a plurality of point location feature vectors;
the point location prediction model generation module is used for constructing a point location prediction model training set according to the plurality of point location feature vectors and training to form a point location prediction model according to the point location prediction model training set;
and the prediction module is used for predicting the camera point position of the vehicle at the next moment according to the point position prediction model.
10. The vehicle prediction device of claim 9, wherein the vectorization model generation module comprises:
the parameter setting module is used for setting the window size for training the word skipping model;
the sample generation module is used for extracting samples from each camera point location sequence according to the window size to form a plurality of groups of model training samples; each group of model training samples comprises a single camera point location and a corresponding prediction context point location sequence;
and the model training module is used for training to form the word skipping model according to the plurality of groups of model training samples.
11. The vehicle prediction apparatus of claim 10, wherein the model training module comprises:
the model construction module is used for inputting the single camera point location into the word skipping model and outputting the prediction context point location sequence; wherein, the word skipping model is a three-layer neural network;
the first loss function module is used for training the word skipping model according to the single camera point location and the prediction context point location sequence, constructing a target loss function of the word skipping model, and iteratively adjusting each layer of neural network parameters of the word skipping model through a gradient descent algorithm during training;
wherein, the target loss function of the word skipping model is as follows:
E = -\log p(v_{out,1}, v_{out,2}, \ldots, v_{out,t} \mid v_{in}) = -\sum_{j=1}^{t} \log \frac{\exp(u_{out,j})}{\sum_{k=1}^{n} \exp(u_k)}

u = W_{out}^{\top} h

where v_in denotes the single camera point location input to the word skipping model, (v_out,1, v_out,2, ..., v_out,t) denotes the prediction context point location sequence output by the word skipping model for v_in, t denotes the window size, h denotes the d-dimensional hidden-layer vector, W_out denotes the d × n connection weights of the hidden layer, u denotes the resulting n-dimensional output score vector, and u_out,j denotes its entry for the j-th context point location.
12. The vehicle prediction apparatus according to claim 9, wherein the point location prediction model generation module includes:
the point location prediction model construction module is used for inputting the point location feature vectors into an attention mechanism graph convolution network, outputting a plurality of node hidden layer features, inputting the node hidden layer features into a classification layer, and outputting a camera point location prediction result;
and the point location prediction model training module is used for training the attention mechanism graph convolution network according to the point location feature vectors and the camera point location prediction result, acquiring parameters of the attention mechanism graph convolution network, and obtaining a trained point location prediction model.
13. The vehicle prediction apparatus according to claim 12, wherein the point location prediction model construction module includes:
the attention coefficient building module is used for building an attention coefficient according to the graph convolution network, the feedforward neural network, and the leaky rectified linear unit (LeakyReLU) function; the attention coefficient is:
\alpha_{ij} = \frac{\exp\left(\mathrm{LeakyReLU}\left(a^{\top}\left[W h_i^{(l)} \,\|\, W h_j^{(l)}\right]\right)\right)}{\sum_{k \in N_i} \exp\left(\mathrm{LeakyReLU}\left(a^{\top}\left[W h_i^{(l)} \,\|\, W h_k^{(l)}\right]\right)\right)}

where a represents the feed-forward neural network used to compute attention, W represents the shared weight matrix, h_i^(l) represents the node hidden layer features corresponding to node i output by the l-th graph convolutional layer, h_j^(l) represents the node hidden layer features corresponding to node j output by the l-th graph convolutional layer, the || symbol represents vector concatenation, and node i and node j are adjacent nodes;
the multi-head attention mechanism graph convolutional layer characteristic acquisition module is used for acquiring, according to the attention coefficient, node hidden layer characteristics corresponding to node i output by the (l+1)-th graph convolutional layer of the multi-head attention mechanism; the node hidden layer characteristics corresponding to node i output by the (l+1)-th graph convolutional layer of the multi-head attention mechanism are as follows:
h_i^{(l+1)} = \sigma\left(\frac{1}{K} \sum_{k=1}^{K} \sum_{j \in N_i} \alpha_{ij}^{k} W^{k} h_j^{(l)}\right)

where σ(·) denotes the activation function, N_i denotes the set of nodes in the neighborhood of node i, and K denotes the number of attention heads;
the multilayer graph convolutional layer building module is used for building a multilayer graph convolutional layer according to the node hidden layer characteristics corresponding to node i output by the (l+1)-th graph convolutional layer of the multi-head attention mechanism;
the two-pooling output characteristic acquisition module is used for respectively performing maximum pooling and average pooling on the node hidden layer characteristics output by the multilayer graph convolutional layer to obtain two-pooling output characteristics;
the probability obtaining module is used for inputting the two-pooling output characteristics into a recognition function and outputting the probability of the vehicle appearing at each camera point at the next moment;
and the camera point location output module is used for obtaining the camera point location of the vehicle at the next moment according to the probability of the vehicle appearing at each camera point location at the next moment.
14. The vehicle prediction device of claim 13, wherein the point location prediction model training module comprises:
the second loss function module is used for constructing a target loss function of the point location prediction model according to the probability of the vehicle appearing at each camera point location at the next moment, and adjusting each layer parameter of the multilayer map convolutional layer according to the value of the target loss function of the point location prediction model to obtain a trained point location prediction model; the target loss function of the point location prediction model is as follows:
\mathrm{Loss} = -\frac{1}{M} \sum_{i=1}^{M} \sum_{j=1}^{N} y_{ij} \log\left(p_{ij}\right)

where M represents the number of point location feature vectors in a training set formed by the plurality of point location feature vectors, N represents the number of camera point locations in the camera point bitmap, y_ij represents the label of point location feature vector i in the training set, taking the value 1 if the true prediction result of point location feature vector i is camera point location j and 0 otherwise, and p_ij represents the probability that the vehicle corresponding to point location feature vector i appears at camera point location j at the next moment.
15. The vehicle prediction device of claim 9, wherein the camera point bitmap generation module comprises:
the camera point location set generation module is used for acquiring the plurality of camera point locations, forming a camera point location set V and taking each camera point location as a node;
the node edge set generation module is used for acquiring the sequence in which a plurality of vehicles are captured among a plurality of nodes, forming the node relation between adjacent nodes, and forming a node edge set E according to the node relation;
and the construction module is used for constructing the camera point bitmap G (V, E) according to the camera point position set V and the node edge set E.
16. The vehicle prediction apparatus according to claim 15, wherein the camera spot location sequence acquisition module includes:
and the record acquisition module is used for acquiring a camera point location sequence corresponding to the vehicle according to the camera point bitmap G(V, E), wherein the camera point location sequence comprises records of the same vehicle captured among a plurality of camera point locations within a time period.
CN202210127296.8A 2022-02-11 2022-02-11 Vehicle prediction method and device Pending CN114519843A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202210127296.8A CN114519843A (en) 2022-02-11 2022-02-11 Vehicle prediction method and device

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202210127296.8A CN114519843A (en) 2022-02-11 2022-02-11 Vehicle prediction method and device

Publications (1)

Publication Number Publication Date
CN114519843A true CN114519843A (en) 2022-05-20

Family

ID=81595972

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202210127296.8A Pending CN114519843A (en) 2022-02-11 2022-02-11 Vehicle prediction method and device

Country Status (1)

Country Link
CN (1) CN114519843A (en)

Cited By (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN115331460A (en) * 2022-07-25 2022-11-11 武汉理工大学 Large-scale traffic signal control method and device based on deep reinforcement learning
CN115331460B (en) * 2022-07-25 2024-05-14 武汉理工大学 Large-scale traffic signal control method and device based on deep reinforcement learning

Similar Documents

Publication Publication Date Title
CN107506740B (en) Human body behavior identification method based on three-dimensional convolutional neural network and transfer learning model
CN111400620B (en) User trajectory position prediction method based on space-time embedded Self-orientation
Zhao et al. Where are you heading? dynamic trajectory prediction with expert goal examples
CN110933518B (en) Method for generating query-oriented video abstract by using convolutional multi-layer attention network mechanism
Xu et al. Predicting destinations by a deep learning based approach
CN111723305B (en) Method for predicting next track point of user
CN112381227B (en) Neural network generation method and device, electronic equipment and storage medium
CN115148302A (en) Compound property prediction method based on graph neural network and multi-task learning
CN117148197A (en) Lithium ion battery life prediction method based on integrated transducer model
CN114863407A (en) Multi-task cold start target detection method based on visual language depth fusion
CN117194763A (en) Method for recommending next POI based on user preference and space-time context information
CN115048870A (en) Target track identification method based on residual error network and attention mechanism
CN115984634B (en) Image detection method, apparatus, device, storage medium, and program product
CN116383422B (en) Non-supervision cross-modal hash retrieval method based on anchor points
CN117036855A (en) Object detection model training method, device, computer equipment and storage medium
CN114519843A (en) Vehicle prediction method and device
CN116822702A (en) Carbon emission prediction method, apparatus, computer device, and storage medium
CN113537710B (en) Artificial intelligence-based activity time sequence online prediction method under data driving
CN115333957A (en) Service flow prediction method and system based on user behaviors and enterprise service characteristics
CN110795591B (en) Image retrieval method based on discrete gradient back propagation
CN110650130B (en) Industrial control intrusion detection method based on multi-classification GoogLeNet-LSTM model
Jiang et al. On the evaluation metric for hashing
CN113032612A (en) Construction method of multi-target image retrieval model, retrieval method and device
CN113051474A (en) Passenger flow prediction method and system fusing multi-platform multi-terminal search indexes
CN116415137B (en) Emotion quantification method, device, equipment and storage medium based on multi-modal characteristics

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination