CN114519842A - Vehicle matching relation judgment method and device based on high-order video monitoring


Info

Publication number
CN114519842A
CN114519842A (application CN202210127148.6A)
Authority
CN
China
Prior art keywords
vehicle
detection frame
matrix
matching
neural network
Prior art date
Legal status
Pending
Application number
CN202210127148.6A
Other languages
Chinese (zh)
Inventor
闫军 (Yan Jun)
丁丽珠 (Ding Lizhu)
王艳清 (Wang Yanqing)
Current Assignee
Super Vision Technology Co Ltd
Original Assignee
Super Vision Technology Co Ltd
Priority date
Filing date
Publication date
Application filed by Super Vision Technology Co Ltd
Priority to CN202210127148.6A
Publication of CN114519842A
Legal status: Pending

Classifications

    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06F: ELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00: Pattern recognition
    • G06F18/20: Analysing
    • G06F18/21: Design or setup of recognition systems or techniques; Extraction of features in feature space; Blind source separation
    • G06F18/214: Generating training patterns; Bootstrap methods, e.g. bagging or boosting
    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06N: COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00: Computing arrangements based on biological models
    • G06N3/02: Neural networks
    • G06N3/04: Architecture, e.g. interconnection topology
    • G06N3/045: Combinations of networks
    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06N: COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00: Computing arrangements based on biological models
    • G06N3/02: Neural networks
    • G06N3/08: Learning methods

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Data Mining & Analysis (AREA)
  • Evolutionary Computation (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Artificial Intelligence (AREA)
  • General Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • Computing Systems (AREA)
  • Software Systems (AREA)
  • Molecular Biology (AREA)
  • Computational Linguistics (AREA)
  • Biophysics (AREA)
  • Biomedical Technology (AREA)
  • Mathematical Physics (AREA)
  • General Health & Medical Sciences (AREA)
  • Health & Medical Sciences (AREA)
  • Bioinformatics & Cheminformatics (AREA)
  • Bioinformatics & Computational Biology (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Evolutionary Biology (AREA)
  • Image Analysis (AREA)
  • Traffic Control Systems (AREA)

Abstract

The application discloses a vehicle matching relationship judgment method and device based on high-order video monitoring. The method comprises the following steps: constructing a real relationship matrix of the vehicle and the matching object and an adjacency matrix of each detection frame according to the vehicle category, the vehicle detection frame coordinate position and identification number, and the matching object category, detection frame coordinate position and identification number; cropping and resizing each image according to the coordinate positions of the vehicle detection frame and the matching object detection frame to obtain detection frame images; splicing the features of each detection frame image with distance features to obtain spliced features, and performing feature transformation on the spliced features to obtain transformation features; inputting each transformation feature and the adjacency matrix of each detection frame into a graph convolution neural network, which outputs a prediction relationship matrix of the vehicle and the matching object; and constructing a loss function and adjusting the parameters of the graph convolution neural network according to the loss function to obtain a trained graph convolution neural network.

Description

Vehicle matching relation judgment method and device based on high-order video monitoring
Technical Field
The application relates to the technical field of target detection, in particular to a vehicle matching relationship judgment method and device based on high-order video monitoring.
Background
With the development of high-order video technology, actions such as roadside parking and vehicle violations can be detected and captured, and the complete process can be recorded in the form of pictures and videos. The parking and violation evidence chain formed by high-order video technology greatly alleviates roadside parking problems such as arbitrary charging. In the management process, vehicle target detection and matching object recognition are performed on the collected video frame images, each matching object is matched with the corresponding vehicle, and the matching relationship between the matching object and the vehicle is judged to form a correct match, after which a parking or violation order is sent to the user. The matching object of a vehicle may be, for example, a license plate or a parking space.
In conventional vehicle matching relationship judgment methods, logical judgment conditions are set according to the positional relationship and the intersection-over-union (IoU) relationship between the vehicle and the matching object, and the relationship between them is judged accordingly. However, when the coordinate positions contain errors, the IoU calculation is also in error, so the relationship between the vehicle and the matching object is judged incorrectly; as a result, the matching accuracy of traditional methods is low.
Summary of the application
The application aims to solve the technical problem that traditional vehicle matching relationship judgment methods have low accuracy when judging the relationship between a vehicle and its matching object. To this end, the application provides a vehicle matching relationship judgment method and device based on high-order video monitoring.
The application provides a vehicle matching relation judgment method based on high-order video monitoring, which comprises the following steps:
acquiring a plurality of video frame images, and acquiring a vehicle type, a vehicle detection frame coordinate position, a vehicle detection frame identification number, a matching object type, a matching object detection frame coordinate position and a matching object detection frame identification number corresponding to each video frame image according to the plurality of video frame images;
constructing a real relation matrix of the vehicle and the matched object and an adjacent matrix of each detection frame according to the vehicle type, the coordinate position of the vehicle detection frame, the identification number of the vehicle detection frame, the type of the matched object, the coordinate position of the matched object detection frame and the identification number of the matched object detection frame; each detection frame is a vehicle detection frame or a matching object detection frame;
dividing and size-transforming each video frame image according to the coordinate position of the vehicle detection frame and the coordinate position of the matching object detection frame to obtain a plurality of detection frame images;
splicing the features of each detection frame image with the distance features to obtain a plurality of spliced features, and performing feature transformation on the spliced features to obtain a plurality of transformation features;
inputting each transformation feature and the adjacency matrix of each detection frame into a graph convolution neural network, and outputting a prediction relation matrix of the vehicle and the matching object;
constructing a loss function according to the real relation matrix of the vehicle and the matching object and the prediction relation matrix of the vehicle and the matching object, and adjusting the parameters of the graph convolution neural network according to the loss function to obtain a trained graph convolution neural network;
and predicting the video frame image to be tested according to the trained graph convolution neural network to obtain the matching relation between the vehicle and the matching object.
In one embodiment, the stitching the feature of each of the detection frame images with the distance feature to obtain a plurality of stitched features, and performing feature transformation on the plurality of stitched features to obtain a plurality of transformed features includes:
acquiring 3 × H × W dimensional features of each detection frame image;
setting 4H W dimensional distance features, and performing feature splicing on the 3H W dimensional features and the 4H W dimensional distance features of each detection frame image on channel dimensions to obtain a plurality of 7H W dimensional splicing features;
H × W represents the width and height of the detection frame, and the 7 channels comprise an R channel, a G channel, a B channel, an X-coordinate channel, a Y-coordinate channel, a W (width) channel, and an H (height) channel.
In one embodiment, the stitching the feature of each of the detection frame images and the distance feature to obtain a plurality of stitched features, and performing feature transformation on the plurality of stitched features to obtain a plurality of transformed features further includes:
inputting the plurality of stitching features into a plurality of convolutional layers, outputting a plurality of conversion features;
and inputting the plurality of conversion characteristics into a full connection layer network, and outputting the plurality of conversion characteristics.
In one embodiment, the inputting the adjacency matrix of each transformation feature and each detection box into a graph convolution neural network and outputting a prediction relation matrix of the vehicle and the matching object comprises:
constructing the graph convolution neural network, wherein the graph convolution neural network is:

Z = F2( Â · F1( Â · X · W^(0) ) · W^(1) )

Â = D^(-1/2) · (A + I) · D^(-1/2)

wherein X represents each of the transformation features, A represents the relationship matrix of the detection boxes, Â represents the adjacency matrix of the detection boxes, I represents the identity matrix of the detection boxes, D represents the degree matrix of the detection boxes, W^(0) and W^(1) represent parameters of the graph convolution neural network, F1 represents a non-linear activation function, and F2 represents a normalization function;
and outputting a prediction relation matrix of the vehicle and the matching object according to the graph convolution neural network.
In one embodiment, in building the graph convolution neural network, one is added to each diagonal element of the degree matrix of the detection boxes to obtain a new degree matrix of the detection boxes, and the graph convolution neural network is:

Z = F2( Â · F1( Â · X · W^(0) ) · W^(1) )

Â = D̃^(-1/2) · Ã · D̃^(-1/2),  with Ã = A + I

wherein Ã represents the new adjacency matrix of the detection boxes and D̃ represents the new degree matrix of the detection boxes.
In one embodiment, the constructing a loss function according to the real relationship matrix of the vehicle and the matching object and the predicted relationship matrix of the vehicle and the matching object, and adjusting the parameters of the graph convolution neural network according to the loss function to obtain a trained graph convolution neural network includes:
constructing a loss function, wherein the loss function is:

loss = Σ_{i=1}^{m} ‖ ŷ^(i) − y^(i) ‖²

wherein ŷ^(i) represents the prediction relation matrix of the vehicle and the matching object corresponding to the i-th training sample, y^(i) represents the real relation matrix of the vehicle and the matching object corresponding to the i-th training sample, and m represents the number of training samples;
and adjusting parameters of the graph convolution neural network to minimize the loss function, so as to obtain the trained graph convolution neural network.
In one embodiment, the matching object includes a license plate or a parking space, the real relation matrix of the vehicle and the matching object includes a real dependency relation matrix of the vehicle and the license plate or a real occupancy relation matrix of the vehicle and the parking space, the predicted relation matrix of the vehicle and the matching object includes a predicted dependency relation matrix of the vehicle and the license plate or a predicted occupancy relation matrix of the vehicle and the parking space, and the matching relation of the vehicle and the matching object includes a dependency relation of the vehicle and the license plate or an occupancy relation of the vehicle and the parking space.
In one embodiment, the present application provides a vehicle matching relationship determination apparatus based on high-level video surveillance, including:
the data acquisition module is used for acquiring a plurality of video frame images and acquiring a vehicle type, a vehicle detection frame coordinate position, a vehicle detection frame identification number, a matching object type, a matching object detection frame coordinate position and a matching object detection frame identification number corresponding to each video frame image according to the plurality of video frame images;
the relation matrix acquisition module is used for constructing a real relation matrix of the vehicle and the matched object and an adjacent matrix of each detection frame according to the vehicle type, the coordinate position of the vehicle detection frame, the identification number of the vehicle detection frame, the type of the matched object, the coordinate position of the matched object detection frame and the identification number of the matched object detection frame; each detection frame is a vehicle detection frame or a matching object detection frame;
the detection frame image acquisition module is used for dividing and carrying out size conversion on each video frame image according to the coordinate position of the vehicle detection frame and the coordinate position of the matching object detection frame to obtain a plurality of detection frame images;
the transformation characteristic acquisition module is used for splicing the characteristics and the distance characteristics of each detection frame image to obtain a plurality of splicing characteristics, and performing characteristic transformation on the plurality of splicing characteristics to obtain a plurality of transformation characteristics;
the image convolution neural network module is used for inputting the adjacent matrix of each transformation characteristic and each detection frame into the image convolution neural network and outputting a prediction relation matrix of the vehicle and the matching object;
the training module is used for constructing a loss function according to the real relation matrix of the vehicle and the matching object and the prediction relation matrix of the vehicle and the matching object, and adjusting the parameters of the graph convolution neural network according to the loss function to obtain a trained graph convolution neural network;
and the vehicle and matching object relation determining module is used for predicting the video frame image to be tested according to the trained graph convolution neural network to obtain the matching relation between the vehicle and the matching object.
In one embodiment, the transformation feature obtaining module comprises:
a first dimension feature obtaining module, configured to obtain 3 × H × W dimension features of each of the detection frame images;
the splicing feature acquisition module is used for setting 4H W-dimensional distance features and performing feature splicing on the 3H W-dimensional features of each detection frame image and the 4H W-dimensional distance features on channel dimensions to obtain a plurality of 7H W-dimensional splicing features;
H × W represents the width and height of the detection frame, and the 7 channels comprise an R channel, a G channel, a B channel, an X-coordinate channel, a Y-coordinate channel, a W (width) channel, and an H (height) channel.
In one embodiment, the transformation feature obtaining module further comprises:
the convolutional layer module is used for inputting the splicing characteristics into a plurality of convolutional layers and outputting a plurality of conversion characteristics;
and the full-connection layer network module is used for inputting the conversion characteristics into a full-connection layer network and outputting the conversion characteristics.
In one embodiment, the atlas neural network module includes:
a building module, configured to build the graph convolution neural network, where the graph convolution neural network is:
Z = F2( Â · F1( Â · X · W^(0) ) · W^(1) )

Â = D^(-1/2) · (A + I) · D^(-1/2)

wherein X represents each of the transformation features, A represents the relationship matrix of the detection boxes, Â represents the adjacency matrix of the detection boxes, I represents the identity matrix of the detection boxes, D represents the degree matrix of the detection boxes, W^(0) and W^(1) represent parameters of the graph convolution neural network, F1 represents a non-linear activation function, and F2 represents a normalization function;
and the prediction module is used for outputting a prediction relation matrix of the vehicle and the matching object according to the graph convolution neural network.
In one embodiment, in the building module, one is added to each diagonal element of the degree matrix of the detection boxes to obtain a new degree matrix of the detection boxes, and the graph convolution neural network is:

Z = F2( Â · F1( Â · X · W^(0) ) · W^(1) )

Â = D̃^(-1/2) · Ã · D̃^(-1/2),  with Ã = A + I

wherein Ã represents the new adjacency matrix of the detection boxes and D̃ represents the new degree matrix of the detection boxes.
In one embodiment, the training module comprises:
a loss function constructing module, configured to construct a loss function, where the loss function is:

loss = Σ_{i=1}^{m} ‖ ŷ^(i) − y^(i) ‖²

wherein ŷ^(i) represents the prediction relation matrix of the vehicle and the matching object corresponding to the i-th training sample, y^(i) represents the real relation matrix of the vehicle and the matching object corresponding to the i-th training sample, and m represents the number of training samples;
and the parameter optimization module is used for adjusting the parameters of the graph convolution neural network so as to minimize the loss function and obtain the trained graph convolution neural network.
In the vehicle matching relationship judgment method based on high-order video monitoring, the vehicle category, vehicle detection frame coordinate position and identification number, and the matching object category, detection frame coordinate position and identification number are used to form a relationship matrix of the vehicle and the matching object and an adjacency matrix of the detection frames, so that the relationship between the vehicle and the matching object is constructed as graph structure data. The graph structure data characterizes the relationships among the individual detection frames. By exploiting the strong capability of the graph convolution neural network to learn from graph structure data, the relationship between the vehicle and the matching object is learned end to end. Moreover, distance features are added to the features of each detection frame image to form the spliced features, and the graph convolution neural network is trained on these features, which further strengthens the learning of the matching relationship between the vehicle and the matching object and helps obtain a more stable trained model. Therefore, the vehicle matching relationship judgment method based on high-order video monitoring can judge the matching relationship between the vehicle and the matching object more accurately, solves the low matching accuracy of the traditional method, and improves the judgment accuracy of the matching relationship between the vehicle and the matching object.
Drawings
Fig. 1 is a schematic flow chart illustrating steps of a vehicle matching relationship determination method based on high-level video monitoring provided by the present application.
Fig. 2 is a schematic overall structure diagram of the vehicle matching relationship determination device based on high-level video monitoring provided by the present application.
Detailed Description
The technical solution of the present application is further described in detail by the accompanying drawings and examples.
Referring to fig. 1, the present application provides a vehicle matching relationship determination method based on high-level video monitoring, including:
s10, acquiring a plurality of video frame images, and acquiring a vehicle type, a vehicle detection frame coordinate position, a vehicle detection frame identification number, a matching object type, a matching object detection frame coordinate position and a matching object detection frame identification number corresponding to each video frame image according to the plurality of video frame images;
s20, constructing a real relation matrix of the vehicle and the matching object and an adjacent matrix of each detection frame according to the vehicle type, the coordinate position of the vehicle detection frame, the identification number of the vehicle detection frame, the type of the matching object, the coordinate position of the matching object detection frame and the identification number of the matching object detection frame; each detection frame is a vehicle detection frame or a matching object detection frame;
s30, dividing and size-transforming each video frame image according to the coordinate position of the vehicle detection frame and the coordinate position of the matching object detection frame to obtain a plurality of detection frame images;
s40, splicing the features of each detection frame image and the distance features to obtain a plurality of splicing features, and performing feature transformation on the plurality of splicing features to obtain a plurality of transformation features;
s50, inputting the adjacent matrix of each transformation characteristic and each detection frame into a graph convolution neural network, and outputting a prediction relation matrix of the vehicle and the matching object;
s60, constructing a loss function according to the real relation matrix of the vehicle and the matching object and the prediction relation matrix of the vehicle and the matching object, and adjusting parameters of the graph convolution neural network according to the loss function to obtain a trained graph convolution neural network;
and S70, predicting the video frame image to be tested according to the trained graph convolution neural network to obtain the matching relation between the vehicle and the matching object. The matching object comprises a license plate or a parking space; the real relation matrix of the vehicle and the matching object comprises a real subordination relation matrix of the vehicle and the license plate or a real occupation relation matrix of the vehicle and the parking space; the prediction relation matrix of the vehicle and the matching object comprises a predicted subordination relation matrix of the vehicle and the license plate or a predicted occupation relation matrix of the vehicle and the parking space; and the matching relation of the vehicle and the matching object comprises a subordination relation of the vehicle and the license plate or an occupation relation of the vehicle and the parking space.
In S10, the plurality of video frame images are video frame images at different positions, different angles, and different time periods. The plurality of video frame images can be obtained by extracting frames from video data shot by the high-order video camera and stored as video frame images. Each video frame image includes target information such as a vehicle and a matching object. Further, a plurality of vehicle categories, a plurality of vehicle detection frame coordinate positions, a plurality of vehicle detection frame identification numbers, a plurality of matching object categories, a plurality of matching object detection frame coordinate positions, and a plurality of matching object detection frame identification numbers can be obtained from the plurality of video frame images. The categories in each video frame image can be divided into three categories of vehicles, license plates and berths. The matching object type comprises a license plate type or a parking position type, and whether the matching object type is the parking position type or the license plate type can be detected during detection. The coordinate position of the vehicle detection frame comprises the coordinate information of the center point of the vehicle detection frame and the width and height information of the vehicle detection frame. The coordinate position of the matching object detection frame comprises the coordinate information of the center point of the matching object detection frame and the width and height information of the matching object detection frame. The coordinate information of the center point of the detection frame and the width and height information of the detection frame may be represented as X, Y, W, H, respectively. The vehicle detection frame identification number can be an ID number of the vehicle detection frame, and can be understood as an identification serial number of the vehicle detection frame. The identification number of the matching object detection box may be an ID number of the matching object detection box, and may be understood as an identification serial number of the matching object detection box. The identification number of each vehicle detection frame and the identification number of the matching object detection frame are unique and different from each other. The vehicle type, the vehicle detection frame coordinate position, the vehicle detection frame identification number, the matching object type, the matching object detection frame coordinate position and the matching object detection frame identification number are stored in files in formats of txt, json, xml and the like.
In S20, the vehicle detection frame identification number and the matching object detection frame identification number are obtained according to the step of S10. And each vehicle detection frame identification number and each matching object detection frame identification number are uniquely represented, so that each vehicle and each matching object have a unique detection frame ID number for representation. It can be understood that one vehicle detection box identification number corresponds to one vehicle detection box, one matching object detection box identification number corresponds to one matching object detection box, and each detection box has a unique identification number. And taking the identification numbers of the vehicles and the matching objects which accord with the matching relationship as a matching pair according to the identification numbers of the vehicle detection frames and the identification numbers of the matching object detection frames, and indicating that the vehicles and the matching objects accord with the matching relationship. And constructing label data of the matching relation according to the matching relation formed by the vehicle and the matching object to form an N-by-N dimensional matrix. And the matrix of the dimension N is a real relation matrix of the vehicle and the matched object. Where N represents the number of detection boxes. Both the vehicle detection frame and the matching object detection frame may be collectively referred to as a detection frame. One detection box corresponds to one detection box identification number. And the characteristics of the dimension N x N represent the matching relation of each detection frame and the other N-1 detection frames. When one detection frame has a matching relationship with some other detection frame, the coordinate value representing the matching relationship of the two detection frames in the real relationship matrix is 1, otherwise, the coordinate value is 0. The relationship between each detection box and itself in the real relationship matrix is uniformly represented as 0. Therefore, the real relation matrix of the N-by-N dimensional vehicle and the matching object can be constructed according to the vehicle type, the coordinate position of the vehicle detection frame, the identification number of the vehicle detection frame, the type of the matching object, the coordinate position of the matching object detection frame and the identification number of the matching object detection frame. And obtaining a relation matrix of each detection frame according to the coordinate position of the vehicle detection frame, the identification number of the vehicle detection frame, the coordinate position of the matching object detection frame and the identification number of the matching object detection frame. The relation matrix of each detection frame represents the relation matrix of each detection frame and other detection frames. Each detection frame has a matching relationship with other detection frames, the value in the matrix is 1, and otherwise, the value is 0. Further, an adjacency matrix for each detection frame can be obtained from the relationship matrix for each detection frame. The adjacency matrix represents a matrix of adjacent relations among all detection frames and is an n-order square matrix. Each detection frame may be a vehicle detection frame or a matching object detection frame.
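By way of illustration only (this sketch is not part of the application), the N × N real relationship matrix and the adjacency matrix described above could be assembled from the detection frame identification numbers and the matched ID pairs roughly as follows; the NumPy helper and its variable names are assumptions made for the example.

```python
import numpy as np

def build_relation_matrices(det_ids, matched_pairs):
    """Build the N x N ground-truth relation matrix and the adjacency matrix.

    det_ids       : list of unique detection-frame identification numbers
                    (vehicle frames and matching-object frames together).
    matched_pairs : list of (vehicle_id, match_id) tuples that form a match.
    """
    n = len(det_ids)
    index = {d: i for i, d in enumerate(det_ids)}    # ID number -> matrix row/column

    relation = np.zeros((n, n), dtype=np.float32)    # 1 where two frames match, 0 otherwise
    for vehicle_id, match_id in matched_pairs:
        i, j = index[vehicle_id], index[match_id]
        relation[i, j] = 1.0
        relation[j, i] = 1.0                          # matching is symmetric
    np.fill_diagonal(relation, 0.0)                   # each frame's relation with itself is 0

    adjacency = relation + np.eye(n, dtype=np.float32)  # relation matrix plus identity (A + I)
    return relation, adjacency

# example: three detection frames, vehicle 101 matched with license plate 202
relation, adjacency = build_relation_matrices([101, 202, 303], [(101, 202)])
```

The matrix is made symmetric here because a vehicle-object match is mutual; the application only states that matched pairs receive the value 1 and unmatched pairs 0.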
In S30, the coordinate position of the vehicle detection frame indicates the exact position of the vehicle detection frame; the video frame image can be cropped according to this position so that the corresponding vehicle detection frame image is cut out of the original image and then resized. Likewise, the coordinate position of the matching object detection frame indicates the exact position of the matching object detection frame; the corresponding matching object detection frame image is cropped from the original image and resized. The plurality of detection frame images therefore include vehicle detection frame images and matching object detection frame images. Through cropping and resizing, the images are converted into detection frame images of the same size, which facilitates subsequent image processing.
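As a minimal sketch of this step (assuming OpenCV, center-point/width-height box coordinates, and an illustrative output size), the cropping and resizing could look like this; none of the names below come from the application.

```python
import cv2

def crop_detection_frames(frame, boxes, out_size=(128, 128)):
    """Cut each detection frame out of the video frame image and resize to a common size.

    frame : H x W x 3 video frame image (NumPy array).
    boxes : list of (cx, cy, w, h) detection-frame coordinate positions.
    """
    crops = []
    for cx, cy, w, h in boxes:
        x1, y1 = int(cx - w / 2), int(cy - h / 2)
        x2, y2 = int(cx + w / 2), int(cy + h / 2)
        x1, y1 = max(x1, 0), max(y1, 0)               # clamp to the image borders
        crop = frame[y1:y2, x1:x2]
        crops.append(cv2.resize(crop, out_size))      # same size for every detection frame
    return crops
```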
In S40, the distance features characterize the distance features of adjacent detection boxes. For the video frame image, the probability that the vehicle detection frame and the matching object detection frame which are close to each other belong to the matching relationship is high. And then, adding distance features on the basis of the features of each detection frame image for splicing to obtain spliced features. By adding the distance characteristic, the distance of the detection frame is added into the splicing characteristic, so that the learning of the matching relation between the vehicle and the matched object can be enhanced, model training is facilitated, and a more stable training model is obtained. The transformation characteristics are obtained by performing characteristic transformation on the splicing characteristics, and the method is convenient to be suitable for calculation of the graph convolution neural network.
In S50, the graph convolution neural network includes a plurality of convolution layers, nonlinear activation layers, and the like. Graph convolutional neural networks can be used to process unstructured data. Each transformed feature is used as an adjacency matrix of an input feature vector and each detection frame, and is used as the input of a graph convolution neural network to carry out feature learning. And the prediction relation matrix of the vehicle and the matching object is graph structure data and serves as the output of the graph convolution neural network.
In S60, a loss function is constructed through the real relation between the vehicle and the matching object and the prediction relation between the vehicle and the matching object, parameters of the graph convolution neural network are optimized, and finally the trained graph convolution neural network is obtained and used for prediction of S70.
In S70, the trained convolutional neural network is used to determine the matching relationship between the vehicle in the video frame image to be tested and the matching object, so that the matching relationship between the vehicle and the matching object can be accurately obtained.
Through S10 to S70, the vehicle category, vehicle detection frame coordinate position and identification number, and the matching object category, detection frame coordinate position and identification number are used to form the real relationship matrix of the vehicle and the matching object and the adjacency matrix of the detection frames, so that the matching relationship between the vehicle and the matching object is constructed as graph structure data. The graph structure data characterizes the relationships among the individual detection frames. By exploiting the strong capability of the graph convolution neural network to learn from graph structure data, the relationship between the vehicle and the matching object is learned end to end. Moreover, distance features are added to the features of each detection frame image to form the spliced features, and the graph convolution neural network is trained on these features, which further strengthens the learning of the relationship between the vehicle and the matching object and helps obtain a more stable trained model. Therefore, the vehicle matching relationship judgment method based on high-order video monitoring can judge the matching relationship between the vehicle and the matching object more accurately, solves the low matching accuracy of the traditional method, and improves the judgment accuracy of the matching relationship between the vehicle and the matching object.
In one embodiment, the matching object is a license plate. And the real relation matrix of the vehicle and the matching object is a real dependency relation matrix of the vehicle and the license plate. And the prediction relation matrix of the vehicle and the matching object is a prediction dependency relation matrix of the vehicle and the license plate. The matching relation between the vehicle and the matching object is the subordinate relation between the vehicle and the license plate. The vehicle matching relation judging method based on high-order video monitoring comprises the following steps:
acquiring a plurality of video frame images, and acquiring a vehicle type, a vehicle detection frame coordinate position, a vehicle detection frame identification number, a license plate type, a license plate detection frame coordinate position and a license plate detection frame identification number corresponding to each video frame image according to the plurality of video frame images;
constructing a real membership matrix of the vehicle and the license plate and an adjacent matrix of each detection frame according to the vehicle type, the coordinate position of the vehicle detection frame, the identification number of the vehicle detection frame, the license plate type, the coordinate position of the license plate detection frame and the identification number of the license plate detection frame; each detection frame is a vehicle detection frame or a license plate detection frame;
dividing and size-transforming each video frame image according to the coordinate position of the vehicle detection frame and the coordinate position of the license plate detection frame to obtain a plurality of detection frame images;
splicing the features and the distance features of each detection frame image to obtain a plurality of splicing features, and performing feature transformation on the plurality of splicing features to obtain a plurality of transformation features;
inputting the adjacent matrix of each transformation characteristic and each detection frame into a graph convolution neural network, and outputting a prediction dependency relationship matrix of the vehicle and the license plate;
constructing a loss function according to the real dependency matrix of the vehicle and the license plate and the prediction dependency matrix of the vehicle and the license plate, and adjusting parameters of the graph convolution neural network according to the loss function to obtain a trained graph convolution neural network;
and predicting the video frame image to be tested according to the trained image convolution neural network to obtain the dependency relationship between the vehicle and the license plate.
In the embodiment, the vehicle matching relationship judgment method based on high-level video monitoring can judge the subordinate relationship between the vehicle and the license plate, form a correct matching relationship, and provide a basis for a user to send a parking or violation order, so that the problems of vehicle management such as roadside parking, vehicle violation and the like are solved.
In one embodiment, the matching object is a parking space, the real relation matrix of the vehicle and the matching object is a real occupancy relation matrix of the vehicle and the parking space, the prediction relation matrix of the vehicle and the matching object is a prediction occupancy relation matrix of the vehicle and the parking space, and the matching relation of the vehicle and the matching object is an occupancy relation of the vehicle and the parking space. The vehicle matching relation judging method based on high-order video monitoring comprises the following steps:
acquiring a plurality of video frame images, and acquiring a vehicle type, a vehicle detection frame coordinate position, a vehicle detection frame identification number, a parking type, a parking detection frame coordinate position and a parking detection frame identification number corresponding to each video frame image according to the plurality of video frame images;
constructing a real occupation relation matrix of the vehicle and the berth and an adjacent matrix of each detection frame according to the vehicle type, the coordinate position of the vehicle detection frame, the identification number of the vehicle detection frame, the berth type, the coordinate position of the berth detection frame and the identification number of the berth detection frame; each detection frame is a vehicle detection frame or a parking position detection frame;
dividing and size-transforming each video frame image according to the coordinate position of the vehicle detection frame and the coordinate position of the berth detection frame to obtain a plurality of detection frame images;
splicing the characteristics and the distance characteristics of each detection frame image to obtain a plurality of splicing characteristics, and performing characteristic transformation on the plurality of splicing characteristics to obtain a plurality of transformation characteristics;
inputting the adjacent matrix of each transformation characteristic and each detection frame into a graph convolution neural network, and outputting a predicted occupation relation matrix of the vehicle and the berth;
constructing a loss function according to the real occupation relation matrix of the vehicle and the parking space and the predicted occupation relation matrix of the vehicle and the parking space, and adjusting parameters of the graph convolution neural network according to the loss function to obtain a trained graph convolution neural network;
and predicting the video frame image to be tested according to the trained image convolution neural network to obtain the occupation relation between the vehicle and the berth.
In the embodiment, the vehicle matching relationship judgment method based on high-level video monitoring can judge the occupation relationship between the vehicle and the parking space, form a correct matching relationship, and further accurately judge whether the parking space is occupied or the parking space is released, so that accurate parking duration is sent to a user, parking charge management is performed, and the vehicle management problems of roadside parking, vehicle violation and the like are solved.
In one embodiment, in S10, from the plurality of video frame images, the vehicle category, the vehicle detection frame coordinate position, the vehicle detection frame identification number, the matching object category, the matching object detection frame coordinate position, and the matching object detection frame identification number corresponding to each video frame image are obtained, and from the target detection model, the vehicle category, the vehicle detection frame coordinate position, the vehicle detection frame identification number, the matching object category, the matching object detection frame coordinate position, and the matching object detection frame identification number corresponding to each video frame image are obtained.
In this embodiment, the target detection model includes, but is not limited to, a target detection model obtained by using an algorithm such as a deep learning algorithm and a machine learning algorithm, and performs target detection on each video frame image.
In one embodiment, when the matching object is a berth, and the berth is marked, the coordinate position of a berth corner point in a video frame image is calibrated, and the coordinate of the minimum circumscribed rectangular frame of the berth is obtained through calculation, so that the coordinate position information of the berth detection frame is obtained.
In this embodiment, the coordinates of the four corner points of each berth in the video frame image are obtained by calibrating their positions in the video frame image. When the high-order video camera is not disturbed by external factors, the position of each berth is unchanged in the video frames captured by that camera. The coordinates of the minimum circumscribed rectangle of the quadrilateral are then obtained from the four corner-point coordinates by the minimum circumscribed rectangle algorithm, which computes the enclosing rectangle of minimum area.
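A small illustrative sketch of this calibration step is given below, assuming OpenCV; the function name and the conversion to an axis-aligned center/width/height frame are assumptions, not taken from the application.

```python
import numpy as np
import cv2

def berth_detection_box(corner_points):
    """Derive the berth detection-frame position from four calibrated berth corners.

    corner_points : four (x, y) corner coordinates of one berth in the video frame.
    Returns (cx, cy, w, h): center-point coordinates plus width and height.
    """
    pts = np.asarray(corner_points, dtype=np.float32)
    rect = cv2.minAreaRect(pts)                  # minimum-area circumscribed rectangle
    box = cv2.boxPoints(rect)                    # its four corner points, shape (4, 2)
    x, y, w, h = cv2.boundingRect(box.astype(np.int32))   # axis-aligned frame around it
    return (x + w / 2.0, y + h / 2.0, float(w), float(h))
```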
In one embodiment, when the minimum circumscribed rectangular frame of the parking space is obtained through calculation, the background feature of the parking space can be increased, and further assistance in judging the occupation relationship between the vehicle and the parking space is facilitated.
In one embodiment, the matching object class may be set to 1, the vehicle class may be set to 2, and the specific setting may be set according to the actual application.
In one embodiment, S40, the stitching the feature of each detection frame image with the distance feature to obtain a plurality of stitching features, and performing feature transformation on the plurality of stitching features to obtain a plurality of transformation features, includes:
s410, acquiring 3H W dimensional features of each detection frame image;
s420, setting 4H W dimensional distance features, and performing feature splicing on the 3H W dimensional features and the 4H W dimensional distance features of each detection frame image on channel dimensions to obtain a plurality of 7H W dimensional splicing features;
H × W represents the width and height of the detection frame, and the 7 channels comprise an R channel, a G channel, a B channel, an X-coordinate channel, a Y-coordinate channel, a W (width) channel, and an H (height) channel.
In this embodiment, 3 of the 3 × H × W dimensional features of each detection frame image represents the number of channels, and H × W represents width and height information. In the 4 × H × W dimensional distance feature, 4 indicates the number of channels, and indicates X, Y, W, H information of the detection frame coordinate position, and H × W indicates width and height information. The width and height information in the feature of each detection frame image in the present embodiment is the same as the width and height information in the distance feature. In the 7 × H × W dimension splicing features, the number of channels represents RGB channels, central point coordinate information X, Y of the detection frames, and width and height information W, H of the detection frames, respectively, and features representing distances between the detection frames are added to features of each detection frame image, so that distance features are fully integrated. By setting the distance feature, the distance between the detection frames is taken into account in the feature of each detection frame image. By adding the distance features, the matching relationship between the vehicle and the matching object with the close distance is easy to detect, the learning of the matching relationship between the vehicle and the matching object can be further enhanced, and the accuracy of judging the matching relationship between the vehicle and the matching object is improved.
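As an illustrative sketch (PyTorch), the 7 × H × W spliced feature for one detection frame could be built as follows. The normalization of the coordinate channels by the image size is an assumption made for the example; the application only specifies that the four distance channels carry the X, Y, W and H of the detection frame.

```python
import torch

def build_spliced_feature(crop_rgb, box, img_w, img_h):
    """Splice the 3 x H x W image feature with a 4 x H x W distance feature.

    crop_rgb : tensor of shape (3, H, W) holding the R, G, B channels of the crop.
    box      : (cx, cy, w, h) detection-frame coordinate position in the full image.
    """
    _, H, W = crop_rgb.shape
    cx, cy, w, h = box
    # each distance channel is a constant plane carrying X, Y, W or H of the frame,
    # normalized by the image size (the normalization is an illustrative choice)
    planes = [cx / img_w, cy / img_h, w / img_w, h / img_h]
    dist = torch.stack([torch.full((H, W), v) for v in planes])   # 4 x H x W
    return torch.cat([crop_rgb, dist], dim=0)                     # 7 x H x W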
In one embodiment, S40, stitching the feature of each detection frame image with the distance feature to obtain a plurality of stitched features, and performing feature transformation on the plurality of stitched features to obtain a plurality of transformed features, further includes:
s440, inputting the splicing characteristics into the convolution layers and outputting a plurality of conversion characteristics;
and S450, inputting the plurality of conversion characteristics into a full connection layer network, and outputting the plurality of conversion characteristics.
In this embodiment, in step S440, the 7 × H × W-dimensional stitching features are input to the plurality of convolution layers, and the image features after feature transformation are obtained. The dimensions of the transition features are denoted 128 × H × W. And S450, the converted feature data is sent to a full-connection layer network for calculation, and the conversion feature with the feature dimension of 128 x 1 is obtained and used as an input feature vector of the graph convolution neural network. And mapping the learned splicing characteristics to a sample mark space through distributed characteristic representation by a full-connection layer network so as to further calculate the graph convolution neural network.
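A hedged PyTorch sketch of the feature transformation in S440 and S450 follows. Only the 7-channel input, the 128 × H × W intermediate feature, and the 128 × 1 output vector are taken from the text; the number of convolution layers, kernel sizes and weight shapes are assumptions.

```python
import torch
import torch.nn as nn

class FeatureTransform(nn.Module):
    """7 x H x W spliced feature -> 128 x H x W conversion feature -> 128-d vector."""

    def __init__(self, h, w):
        super().__init__()
        self.convs = nn.Sequential(                  # "a plurality of convolution layers"
            nn.Conv2d(7, 64, kernel_size=3, padding=1), nn.ReLU(),
            nn.Conv2d(64, 128, kernel_size=3, padding=1), nn.ReLU(),
        )
        self.fc = nn.Linear(128 * h * w, 128)        # fully connected layer network

    def forward(self, x):                            # x: (N, 7, H, W), one row per frame
        feat = self.convs(x)                         # (N, 128, H, W) conversion feature
        return self.fc(feat.flatten(1))              # (N, 128) input vectors for the GCN
```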
In one embodiment, S50, inputting the adjacency matrix of each transformed feature and each detection box into the graph convolution neural network, and outputting the prediction relationship matrix of the vehicle and the matching object, includes:
s510, constructing a graph convolution neural network, wherein the graph convolution neural network is as follows:
Z = F2( Â · F1( Â · X · W^(0) ) · W^(1) )

Â = D^(-1/2) · (A + I) · D^(-1/2)

wherein X represents each transformation feature, A represents the relationship matrix of the detection boxes, Â represents the adjacency matrix of the detection boxes, I represents the identity matrix of the detection boxes, D represents the degree matrix of the detection boxes, W^(0) and W^(1) represent parameters of the graph convolution neural network, F1 represents a non-linear activation function, and F2 represents a normalization function;
and S520, outputting a prediction relation matrix of the vehicle and the matching object according to the graph convolution neural network.
In S510, the graph convolution neural network Z comprises two network layers, and its output feature dimension is N × N. X denotes the transformation features; each transformation feature, which fuses the image feature and the distance feature of a detection frame, serves as an input feature vector of the first graph convolution layer. The transformation features and the adjacency matrix of the detection frames are input together as the graph structure data of the first graph convolution layer. A denotes the relationship matrix of the detection frames, which can be understood as the matrix formed by the relationships between each detection frame and the other detection frames: if a detection frame has a relationship with another detection frame, the corresponding entry of the matrix is 1, otherwise it is 0. I denotes the identity matrix of the detection frames; adding the identity matrix augments the representation of each detection frame with its own features. The adjacency matrix Â of the detection frames is obtained from the relationship matrix, the identity matrix and the degree matrix of the detection frames; it represents the adjacency relations among all detection frames and is an N-order square matrix. D denotes the degree matrix of the detection frames; the degree of each detection frame, viewed as a node, can be understood as the number of detection frames connected to it. The first graph convolution layer, F1( Â · X · W^(0) ), outputs a feature of dimension N × N, and the adjacency matrix Â of the detection frames is N × N. W^(0) denotes the learnable parameters of the first graph convolution layer, with feature dimension 128 × N. The output of the first graph convolution layer serves as the input of the second graph convolution layer. W^(1) denotes the learnable parameters of the second graph convolution layer, with feature dimension 128 × N.
In S520, based on the graph convolution neural network, the transformation features and the adjacency matrix of the detection frames are input as the graph structure data of the network, and the prediction relation matrix of the vehicle and the matching object is output correspondingly. The relationship between the vehicle and the matching object is thus constructed as graph structure data and fed into the graph convolution neural network for learning. By learning the relationship between the vehicle and the matching object, the end-to-end matching relationship between them can be judged. Moreover, each transformation feature fuses the original image feature of the video frame with the distance feature; training on these spliced features further strengthens the learning of the matching relationship between the vehicle and the matching object and yields a more stable trained model.
Through the graph convolution neural network constructed in S510 and S520, the relation between the vehicle and the matching object can be judged more accurately, the problem that the matching accuracy is low in the traditional method is solved, and the judgment accuracy of the matching relation between the vehicle and the matching object is improved.
In one embodiment, F1 is the rectified linear unit (ReLU) non-linear activation function and F2 is the softmax normalization function. The graph convolution neural network is then represented as:

Z = softmax( Â · ReLU( Â · X · W^(0) ) · W^(1) )
In this embodiment, the ReLU non-linear activation function and the softmax normalization function are adopted in the graph convolution neural network. The ReLU function has strong expressive power and does not suffer from the vanishing-gradient problem, so the convergence rate of the model remains stable. The sparse model realized through the ReLU function can better mine the relationship features between the vehicle and the matching object and fit the training data. The softmax normalization function reduces the difficulty of model training, making the multi-class problem easier to converge, and is therefore better suited to judging the matching relationship between the vehicle and the matching object.
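The following PyTorch sketch illustrates the two-layer formula Z = softmax( Â · ReLU( Â · X · W^(0) ) · W^(1) ). It assumes a fixed number of detection frames N per sample so that the N × N output has a fixed shape, and applies softmax over each row; both choices are assumptions the application does not spell out.

```python
import torch
import torch.nn as nn

class TwoLayerGCN(nn.Module):
    """Sketch of Z = softmax( A_hat @ relu( A_hat @ X @ W0 ) @ W1 )."""

    def __init__(self, n_frames, in_dim=128, hidden_dim=128):
        super().__init__()
        self.w0 = nn.Parameter(torch.randn(in_dim, hidden_dim) * 0.01)    # W(0)
        self.w1 = nn.Parameter(torch.randn(hidden_dim, n_frames) * 0.01)  # W(1)

    def forward(self, x, a_hat):
        # x: (N, in_dim) transformation features, a_hat: (N, N) normalized adjacency matrix
        h = torch.relu(a_hat @ x @ self.w0)          # F1 = ReLU, first graph convolution layer
        z = a_hat @ h @ self.w1                      # second graph convolution layer, (N, N)
        return torch.softmax(z, dim=1)               # F2 = softmax over candidate matches
```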
In one embodiment, in step S510, when constructing the graph convolution neural network, one is added to each diagonal element of the degree matrix of the detection frames to obtain a new degree matrix of the detection frames, and the graph convolution neural network is:

Z = F2( Â · F1( Â · X · W^(0) ) · W^(1) )

Â = D̃^(-1/2) · Ã · D̃^(-1/2),  with Ã = A + I

wherein Ã represents the new adjacency matrix of the detection frames and D̃ represents the new degree matrix of the detection frames.
In this embodiment, one is added to each diagonal element of the degree matrix. This solves the problem that the degree matrix cannot be inverted when the matching object of a certain vehicle is occluded and therefore cannot be matched to any object, yields the new degree matrix, and makes the formulation applicable to matching-relationship judgment and prediction in the graph convolution neural network. Adding one to the diagonal elements of the degree matrix also avoids the mismatching errors that occlusion of a matching object causes in the traditional method.
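A brief sketch of computing the normalized adjacency matrix with the degree diagonal incremented by one is shown below; it follows the reconstructed formula Â = D̃^(-1/2) · (A + I) · D̃^(-1/2), and the small clamping constant is an implementation assumption.

```python
import torch

def normalized_adjacency(relation):
    """A_hat = D_tilde^(-1/2) (A + I) D_tilde^(-1/2), with self-connections added.

    relation : (N, N) 0/1 relationship matrix A of the detection frames.
    """
    n = relation.size(0)
    a_tilde = relation + torch.eye(n)                  # new adjacency matrix A + I
    deg = a_tilde.sum(dim=1)                           # diagonal of the new degree matrix
    d_inv_sqrt = torch.diag(deg.clamp(min=1e-12).pow(-0.5))
    return d_inv_sqrt @ a_tilde @ d_inv_sqrt           # always invertible thanks to the +1
```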
In one embodiment, S60, constructing a loss function according to the real relationship matrix of the vehicle and the matching object and the predicted relationship matrix of the vehicle and the matching object, and adjusting parameters of the convolutional neural network according to the loss function to obtain a trained convolutional neural network, includes:
s610, constructing a loss function, wherein the loss function is as follows:
$$L=\frac{1}{m}\sum_{i=1}^{m}\left\|\hat{y}^{(i)}-y^{(i)}\right\|_{2}^{2}$$
wherein $\hat{y}^{(i)}$ represents the prediction relationship matrix of the vehicle and the matching object corresponding to the i-th training sample, and $y^{(i)}$ represents the real relationship matrix of the vehicle and the matching object corresponding to the i-th training sample;
S620, adjusting the parameters of the graph convolutional neural network to minimize the loss function, and obtaining the trained graph convolutional neural network.
In S610, the i-th training sample can also be understood as the training sample data corresponding to the i-th detection frame. Model training is performed with the L2 norm loss function, also referred to as the least-squares error loss function. In the formula, m represents the number of samples in the training sample set.
In S620, the objective is to minimize this sum of squared errors. The parameters of the graph convolutional neural network are adjusted to minimize the loss function, and the optimal network parameters are finally obtained. The trained graph convolutional neural network is then constructed from these optimal parameters. With this network, the matching relationship between the vehicle and the matching object in a video frame image to be tested can be determined and obtained accurately.
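A minimal PyTorch-style sketch of one such parameter-adjustment step is shown below; the optimizer choice, learning rate, tensor shapes and the single toy sample are assumptions made purely for illustration.

```python
import torch

def l2_relation_loss(pred_list, true_list):
    """Least-squares (L2) loss: mean over the m training samples of the squared
    difference between predicted and real relation matrices."""
    m = len(pred_list)
    return sum(((p - t) ** 2).sum() for p, t in zip(pred_list, true_list)) / m

# toy parameters of the graph convolution network (dimensions are illustrative)
W0 = torch.nn.Parameter(torch.randn(16, 8))
W1 = torch.nn.Parameter(torch.randn(8, 4))
optimizer = torch.optim.Adam([W0, W1], lr=1e-3)      # optimizer choice is an assumption

A_hat = torch.eye(4)                                 # normalized adjacency of 4 boxes
X = torch.randn(4, 16)                               # transformation features
y_true = torch.zeros(4, 4)                           # real relation matrix

pred = torch.softmax(A_hat @ torch.relu(A_hat @ X @ W0) @ W1, dim=-1)
loss = l2_relation_loss([pred], [y_true])
optimizer.zero_grad()
loss.backward()                                      # gradients of the loss
optimizer.step()                                     # adjust parameters to reduce the loss
```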
In one embodiment, S70, predicting the video frame image to be tested according to the trained graph convolutional neural network to obtain the matching relationship between the vehicle and the matching object, includes the following steps; a minimal sketch of this inference flow is given after the list:
acquiring a video frame image to be detected, and acquiring a corresponding vehicle type, a vehicle detection frame coordinate position, a vehicle detection frame identification number, a matching object type, a matching object detection frame coordinate position and a matching object detection frame identification number according to the video frame image to be detected;
constructing an adjacency matrix of each detection frame according to the vehicle type, the vehicle detection frame coordinate position, the vehicle detection frame identification number, the matching object type, the matching object detection frame coordinate position and the matching object detection frame identification number;
dividing and size-transforming each video frame image according to the coordinate position of the vehicle detection frame and the coordinate position of the matching object detection frame to obtain a plurality of detection frame images;
splicing the characteristics and the distance characteristics of each detection frame image to obtain a plurality of splicing characteristics, and performing characteristic transformation on the plurality of splicing characteristics to obtain a plurality of transformation characteristics;
and inputting each transformation feature and the adjacency matrix of each detection frame into the trained graph convolutional neural network, and outputting the matching relationship between the vehicle and the matching object.
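For readability, a minimal sketch of how these inference steps could be chained together is given below; the helper names, the score threshold and the dummy stand-in for the trained network are assumptions and do not reflect the original implementation.

```python
import numpy as np

def predict_matching(transform_feats, adjacency, trained_gcn, threshold=0.5):
    """Run the trained graph convolution network on the detection frames of a
    video frame to be tested and keep vehicle / matching-object pairs whose
    predicted relation score exceeds a threshold (threshold is illustrative)."""
    scores = trained_gcn(adjacency, transform_feats)   # (N, N) relation scores
    n = scores.shape[0]
    return [(i, j) for i in range(n) for j in range(n)
            if i != j and scores[i, j] > threshold]

# dummy stand-ins so the sketch runs: 3 detection frames, identity "network"
dummy_gcn = lambda adj, feats: adj                     # placeholder for the trained model
adjacency = np.array([[0, 1, 0], [1, 0, 0], [0, 0, 0]], dtype=float)
print(predict_matching(np.zeros((3, 8)), adjacency, dummy_gcn))   # [(0, 1), (1, 0)]
```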
Referring to fig. 2, in an embodiment, the present application provides a vehicle matching relationship determination apparatus 100 based on high-level video surveillance, which includes a data obtaining module 10, a relationship matrix obtaining module 20, a detection frame image obtaining module 30, a transformation feature obtaining module 40, a graph convolutional neural network module 50, a training module 60, and a vehicle and matching object relationship determining module 70.
The data obtaining module 10 is configured to obtain a plurality of video frame images, and obtain a vehicle type, a vehicle detection frame coordinate position, a vehicle detection frame identification number, a matching object type, a matching object detection frame coordinate position, and a matching object detection frame identification number corresponding to each video frame image according to the plurality of video frame images. The relation matrix obtaining module 20 is configured to construct a real relation matrix of the vehicle and the matching object and an adjacent matrix of each detection frame according to the vehicle type, the coordinate position of the vehicle detection frame, the vehicle detection frame identification number, the matching object type, the coordinate position of the matching object detection frame, and the matching object detection frame identification number; wherein each detection frame is a vehicle detection frame or a matching object detection frame.
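By way of illustration only, a small numpy sketch of how a real relationship matrix and an adjacency matrix could be laid out for two vehicles and two license plates is given below; the frame indices, frame types and the fully bipartite connection scheme are assumptions rather than details from the original disclosure.

```python
import numpy as np

# detection frames in one video frame: ids 0-1 are vehicles, ids 2-3 are license plates
box_types = ["vehicle", "vehicle", "plate", "plate"]
n = len(box_types)

# real relationship matrix: entry (i, j) = 1 if frames i and j belong to the same vehicle
true_relation = np.zeros((n, n))
true_relation[0, 2] = true_relation[2, 0] = 1.0    # vehicle 0 <-> plate 2
true_relation[1, 3] = true_relation[3, 1] = 1.0    # vehicle 1 <-> plate 3

# adjacency matrix used as graph structure: connect every vehicle to every candidate plate
adjacency = np.zeros((n, n))
for i, ti in enumerate(box_types):
    for j, tj in enumerate(box_types):
        if ti != tj:
            adjacency[i, j] = 1.0

print(true_relation)
print(adjacency)
```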
The detection frame image obtaining module 30 is configured to divide and size-convert each video frame image according to the coordinate position of the vehicle detection frame and the coordinate position of the matching object detection frame, so as to obtain a plurality of detection frame images. The transformation feature obtaining module 40 is configured to splice the features of each detection frame image with the distance features to obtain a plurality of splicing features, and perform feature transformation on the plurality of splicing features to obtain a plurality of transformation features. The graph convolution neural network module 50 is used for inputting the adjacency matrix of each transformation characteristic and each detection frame into the graph convolution neural network and outputting a prediction relation matrix of the vehicle and the matching object.
The training module 60 is configured to construct a loss function according to the real relationship matrix of the vehicle and the matching object and the predicted relationship matrix of the vehicle and the matching object, and to adjust the parameters of the graph convolutional neural network according to the loss function to obtain a trained graph convolutional neural network. The vehicle and matching object relationship determining module 70 is configured to predict the video frame image to be detected according to the trained graph convolutional neural network, so as to obtain the matching relationship between the vehicle and the matching object.
In this embodiment, the relevant description of the data obtaining module 10 may refer to the relevant description of S10 in the above embodiment. The relevant description of the relation matrix obtaining module 20 may refer to the relevant description of S20 in the above embodiment. The related description of the detection frame image obtaining module 30 may refer to the related description of S30 in the above embodiment. The related description of the transformation feature obtaining module 40 may refer to the related description of S40 in the above embodiment. The related description of the graph convolutional neural network module 50 can refer to the related description of S50 in the above embodiment. The related description of the training module 60 can refer to the related description of S60 in the above embodiment. The description about the vehicle and matching object relationship determination module 70 may refer to the description about S70 in the above-described embodiment.
In one embodiment, the transform feature acquisition module 40 includes a first dimension feature acquisition module (not shown) and a stitching feature acquisition module (not shown). The first dimension characteristic acquisition module is used for acquiring 3H W dimension characteristics of each detection frame image. And the splicing feature acquisition module is used for setting 4H W-dimensional distance features, and performing feature splicing on the 3H W-dimensional features and the 4H W-dimensional distance features of each detection frame image on channel dimensions to obtain a plurality of 7H W-dimensional splicing features.
H and W represent the height and width of the detection frame image, and the 7 channels comprise an R channel, a G channel, a B channel, an X-coordinate channel, a Y-coordinate channel, a W (width) channel and an H (height) channel.
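A minimal numpy sketch of this channel-wise stitching is shown below; the resized detection-frame size and the normalized coordinate values are placeholders, not values from the original disclosure.

```python
import numpy as np

H, W = 64, 64                                      # resized detection-frame size (illustrative)
rgb = np.zeros((3, H, W))                          # R, G, B channels of the cropped frame image

# 4 x H x W distance feature: normalized x, y, w, h of the box, broadcast over the plane
x, y, w, h = 0.25, 0.40, 0.10, 0.08                # normalized box coordinates (illustrative)
dist = np.stack([np.full((H, W), v) for v in (x, y, w, h)])

stitched = np.concatenate([rgb, dist], axis=0)     # 7 x H x W stitched feature
print(stitched.shape)                              # (7, 64, 64)
```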
In this embodiment, reference may be made to the description of S410 in the above embodiment for the description of the first-dimension feature obtaining module. The relevant description of the splicing feature obtaining module can refer to the relevant description of S420 in the above embodiment.
In one embodiment, the transformation feature obtaining module 40 further comprises a convolutional layer module (not shown) and a fully connected layer network module (not shown). The convolutional layer module is used for inputting the plurality of stitched features into a plurality of convolutional layers and outputting a plurality of conversion features. The fully connected layer network module is used for inputting the plurality of conversion features into a fully connected layer network and outputting the plurality of transformation features.
In this embodiment, the relevant description of the convolutional layer module may refer to the relevant description of S430 in the above embodiment, and the relevant description of the fully connected layer network module may refer to the relevant description of S440 in the above embodiment.
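The following PyTorch sketch illustrates one possible arrangement of such convolutional layers followed by a fully connected layer; the layer count, channel widths and output dimension are assumptions chosen only for illustration.

```python
import torch
from torch import nn

class FeatureTransform(nn.Module):
    """Illustrative sketch: a few convolutional layers over the 7 x H x W stitched
    feature, then a fully connected layer producing the transformation feature
    that is fed to the graph convolution network."""
    def __init__(self, out_dim=128):
        super().__init__()
        self.convs = nn.Sequential(
            nn.Conv2d(7, 16, 3, stride=2, padding=1), nn.ReLU(),
            nn.Conv2d(16, 32, 3, stride=2, padding=1), nn.ReLU(),
            nn.AdaptiveAvgPool2d(1),
        )
        self.fc = nn.Linear(32, out_dim)

    def forward(self, stitched):                   # stitched: (N, 7, H, W)
        h = self.convs(stitched).flatten(1)        # (N, 32) pooled convolutional features
        return self.fc(h)                          # (N, out_dim) transformation features

feats = FeatureTransform()(torch.randn(4, 7, 64, 64))
print(feats.shape)                                 # torch.Size([4, 128])
```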
In one embodiment, the graph convolutional neural network module 50 includes a construction module (not shown) and a prediction module (not shown). The construction module is used for constructing the graph convolutional neural network, which is:
$$\tilde{A}=A+I$$
$$Z=F_2\left(\tilde{D}^{-\frac{1}{2}}\tilde{A}\tilde{D}^{-\frac{1}{2}}\,F_1\left(\tilde{D}^{-\frac{1}{2}}\tilde{A}\tilde{D}^{-\frac{1}{2}}XW^{(0)}\right)W^{(1)}\right)$$
wherein X represents each transformation feature, A represents the relationship matrix of each detection frame, $\tilde{A}$ represents the adjacency matrix of each detection frame, I represents the identity matrix of each detection frame, $\tilde{D}$ represents the degree matrix of each detection frame, $W^{(0)}$ and $W^{(1)}$ represent the parameters of the graph convolutional neural network, $F_1$ represents a nonlinear activation function, and $F_2$ represents a normalization function;
the prediction module is used for outputting a prediction relation matrix of the vehicle and the matching object according to the graph convolution neural network.
In this embodiment, the relevant description of the construction module may refer to the relevant description of S510 in the above embodiment, and the relevant description of the prediction module may refer to the relevant description of S520 in the above embodiment.
In one embodiment, the graph convolution neural network in the construction module is:
$$Z=\mathrm{softmax}\left(\tilde{D}^{-\frac{1}{2}}\tilde{A}\tilde{D}^{-\frac{1}{2}}\,\mathrm{ReLU}\left(\tilde{D}^{-\frac{1}{2}}\tilde{A}\tilde{D}^{-\frac{1}{2}}XW^{(0)}\right)W^{(1)}\right)$$
in this embodiment, the related description may refer to the related description in the above embodiments.
In one embodiment, in the construction module, each diagonal element of the degree matrix of each detection frame is increased by one to obtain a new degree matrix of each detection frame, and the graph convolutional neural network is:
$$\hat{D}=\tilde{D}+I$$
$$Z=F_2\left(\hat{D}^{-\frac{1}{2}}\tilde{A}\hat{D}^{-\frac{1}{2}}\,F_1\left(\hat{D}^{-\frac{1}{2}}\tilde{A}\hat{D}^{-\frac{1}{2}}XW^{(0)}\right)W^{(1)}\right)$$
wherein $\tilde{A}$ represents the new adjacency matrix of each detection frame, and $\hat{D}$ represents the new degree matrix of each detection frame.
In this embodiment, the relevant description may refer to the description of the new degree matrix in the above embodiments.
In one embodiment, the training module 60 includes a loss function construction module (not shown) and a parameter optimization module (not shown). The loss function constructing module is used for constructing a loss function, and the loss function is as follows:
$$L=\frac{1}{m}\sum_{i=1}^{m}\left\|\hat{y}^{(i)}-y^{(i)}\right\|_{2}^{2}$$
wherein $\hat{y}^{(i)}$ represents the prediction relationship matrix of the vehicle and the matching object corresponding to the i-th training sample, and $y^{(i)}$ represents the real relationship matrix of the vehicle and the matching object corresponding to the i-th training sample;
and the parameter optimization module is used for adjusting parameters of the graph convolution neural network so as to minimize the loss function and obtain the trained graph convolution neural network.
In this embodiment, the related description of the loss function building module may refer to the related description of S610 in the above embodiment. The relevant description of the parameter optimization module can refer to the relevant description of S620 in the above embodiment.
In one embodiment, the matching object in the data obtaining module 10 includes a license plate or a parking space, the real relationship matrix of the vehicle and the matching object in the relationship matrix obtaining module 20 includes a real dependency relationship matrix of the vehicle and the license plate or a real occupancy relationship matrix of the vehicle and the parking space, the predicted relationship matrix of the vehicle and the matching object in the graph convolution neural network module 50 includes a predicted dependency relationship matrix of the vehicle and the license plate or a predicted occupancy relationship matrix of the vehicle and the parking space, and the matching relationship of the vehicle and the matching object in the vehicle and matching object relationship determining module 70 includes a dependency relationship of the vehicle and the license plate or an occupancy relationship of the vehicle and the parking space.
The relevant description in the present embodiment may refer to the relevant description in the above embodiments.
In the various embodiments described above, the particular order or hierarchy of steps in the processes disclosed is an example of exemplary approaches. Based upon design preferences, it is understood that the specific order or hierarchy of steps in the processes may be rearranged without departing from the scope of the present disclosure. The accompanying method claims present elements of the various steps in a sample order, and are not intended to be limited to the specific order or hierarchy presented.
Those of skill in the art will further appreciate that the various illustrative logical blocks, units, and steps described in connection with the embodiments disclosed herein may be implemented as electronic hardware, computer software, or combinations of both. To clearly illustrate the interchangeability of hardware and software, various illustrative components, elements, and steps have been described above generally in terms of their functionality. Whether such functionality is implemented as hardware or software depends upon the particular application and design requirements of the overall system. Those skilled in the art may implement the described functionality in varying ways for each particular application, but such implementation decisions should not be interpreted as causing a departure from the scope of the embodiments of the present application.
The various illustrative logical blocks or units described in this application may be implemented or performed with a general purpose processor, a digital signal processor, an Application Specific Integrated Circuit (ASIC), a field programmable gate array or other programmable logic device, discrete gate or transistor logic, discrete hardware components, or any combination thereof designed to perform the functions described herein. A general-purpose processor may be a microprocessor, but in the alternative, the processor may be any conventional processor, controller, microcontroller, or state machine. A processor may also be implemented as a combination of computing devices, e.g., a digital signal processor and a microprocessor, a plurality of microprocessors, one or more microprocessors in conjunction with a digital signal processor core, or any other similar configuration.
The steps of a method or algorithm described in the embodiments herein may be embodied directly in hardware, in a software module executed by a processor, or in a combination of the two. A software module may be stored in RAM memory, flash memory, ROM memory, EPROM memory, EEPROM memory, registers, hard disk, a removable disk, a CD-ROM, or any other form of storage medium known in the art. For example, a storage medium may be coupled to the processor such that the processor can read information from, and write information to, the storage medium. In the alternative, the storage medium may be integral to the processor. The processor and the storage medium may reside in an ASIC, which may be located in a user terminal. In the alternative, the processor and the storage medium may reside in different components in a user terminal.
The above-mentioned embodiments, objects, technical solutions and advantages of the present application are described in further detail, it should be understood that the above-mentioned embodiments are merely exemplary embodiments of the present application, and are not intended to limit the scope of the present application, and any modifications, equivalent substitutions, improvements and the like made within the spirit and principle of the present application should be included in the scope of the present application.

Claims (14)

1. A vehicle matching relation judgment method based on high-order video monitoring is characterized by comprising the following steps:
acquiring a plurality of video frame images, and acquiring a vehicle type, a vehicle detection frame coordinate position, a vehicle detection frame identification number, a matching object type, a matching object detection frame coordinate position and a matching object detection frame identification number corresponding to each video frame image according to the plurality of video frame images;
constructing a real relation matrix of the vehicle and the matched object and an adjacent matrix of each detection frame according to the vehicle type, the coordinate position of the vehicle detection frame, the identification number of the vehicle detection frame, the type of the matched object, the coordinate position of the matched object detection frame and the identification number of the matched object detection frame; each detection frame is a vehicle detection frame or a matching object detection frame;
dividing and size-transforming each video frame image according to the coordinate position of the vehicle detection frame and the coordinate position of the matching object detection frame to obtain a plurality of detection frame images;
splicing the features of each detection frame image with the distance features to obtain a plurality of spliced features, and performing feature transformation on the spliced features to obtain a plurality of transformation features;
inputting each transformation feature and the adjacency matrix of each detection frame into a graph convolution neural network, and outputting a prediction relation matrix of the vehicle and the matching object;
constructing a loss function according to the real relation matrix of the vehicle and the matching object and the prediction relation matrix of the vehicle and the matching object, and adjusting the parameters of the graph convolution neural network according to the loss function to obtain a trained graph convolution neural network;
and predicting the video frame image to be tested according to the trained graph convolution neural network to obtain the matching relation between the vehicle and the matching object.
2. The method for determining vehicle matching relationship based on high-level video surveillance according to claim 1, wherein the step of splicing the feature of each detection frame image with the distance feature to obtain a plurality of spliced features, and performing feature transformation on the plurality of spliced features to obtain a plurality of transformed features comprises:
acquiring 3 x H x W dimensional features of each detection frame image;
setting 4H W dimensional distance features, and performing feature splicing on the 3H W dimensional features and the 4H W dimensional distance features of each detection frame image on channel dimensions to obtain a plurality of 7H W dimensional splicing features;
h × W represents the width and height of the detection frame, and the number of channels 7 includes R channels, G channels, B channels, X coordinate channels, Y coordinate channels, W width channels, and H height channels.
3. The method for determining vehicle matching relationship based on high-order video surveillance according to claim 2, wherein the stitching the feature of each detection frame image with the distance feature to obtain a plurality of stitching features, and performing feature transformation on the plurality of stitching features to obtain a plurality of transformation features, further comprises:
inputting the plurality of stitching features into a plurality of convolutional layers, outputting a plurality of conversion features;
and inputting the plurality of conversion features into a full connection layer network, and outputting the plurality of transformation features.
4. The vehicle matching relationship determination method based on high-level video surveillance as claimed in claim 3, wherein the step of inputting each transformation feature and the adjacency matrix of each detection frame into a graph convolution neural network and outputting a prediction relationship matrix of the vehicle and the matching object comprises:
constructing the graph convolution neural network, wherein the graph convolution neural network is as follows:
$$\tilde{A}=A+I$$
$$Z=F_2\left(\tilde{D}^{-\frac{1}{2}}\tilde{A}\tilde{D}^{-\frac{1}{2}}\,F_1\left(\tilde{D}^{-\frac{1}{2}}\tilde{A}\tilde{D}^{-\frac{1}{2}}XW^{(0)}\right)W^{(1)}\right)$$
wherein X represents each of the transformation features, A represents the relationship matrix of each of the detection frames, $\tilde{A}$ represents the adjacency matrix of each of the detection frames, I represents the identity matrix of each of the detection frames, $\tilde{D}$ represents the degree matrix of each of the detection frames, $W^{(0)}$ and $W^{(1)}$ represent the parameters of the graph convolution neural network, $F_1$ represents a nonlinear activation function, and $F_2$ represents a normalization function;
and outputting a prediction relation matrix of the vehicle and the matching object according to the graph convolution neural network.
5. The method as claimed in claim 4, wherein, in the step of constructing the graph convolution neural network, each diagonal element of the degree matrix of each detection frame is increased by one to obtain a new degree matrix of each detection frame, and the graph convolution neural network is:
$$\hat{D}=\tilde{D}+I$$
$$Z=F_2\left(\hat{D}^{-\frac{1}{2}}\tilde{A}\hat{D}^{-\frac{1}{2}}\,F_1\left(\hat{D}^{-\frac{1}{2}}\tilde{A}\hat{D}^{-\frac{1}{2}}XW^{(0)}\right)W^{(1)}\right)$$
wherein $\tilde{A}$ represents the new adjacency matrix of each of the detection frames, and $\hat{D}$ represents the new degree matrix of each of the detection frames.
6. The method according to claim 1, wherein the step of constructing a loss function according to the real relationship matrix of the vehicle and the matching object and the predicted relationship matrix of the vehicle and the matching object, and adjusting the parameters of the graph convolution neural network according to the loss function to obtain a trained graph convolution neural network comprises:
constructing a loss function, wherein the loss function is as follows:
$$L=\frac{1}{m}\sum_{i=1}^{m}\left\|\hat{y}^{(i)}-y^{(i)}\right\|_{2}^{2}$$
wherein $\hat{y}^{(i)}$ represents the prediction relation matrix of the vehicle and the matching object corresponding to the i-th training sample, $y^{(i)}$ represents the real relation matrix of the vehicle and the matching object corresponding to the i-th training sample, and m represents the number of training samples;
and adjusting parameters of the graph convolution neural network to minimize the loss function, so as to obtain the trained graph convolution neural network.
7. The method according to claim 1, wherein the matching object includes a license plate or a parking space, the true relationship matrix of the vehicle and the matching object includes a true dependency relationship matrix of the vehicle and the license plate or a true occupancy relationship matrix of the vehicle and the parking space, the predicted relationship matrix of the vehicle and the matching object includes a predicted dependency relationship matrix of the vehicle and the license plate or a predicted occupancy relationship matrix of the vehicle and the parking space, and the matching relationship of the vehicle and the matching object includes a dependency relationship of the vehicle and the license plate or an occupancy relationship of the vehicle and the parking space.
8. The utility model provides a vehicle matching relation judgement device based on high-order video monitoring which characterized in that includes:
the data acquisition module is used for acquiring a plurality of video frame images and acquiring a vehicle type, a vehicle detection frame coordinate position, a vehicle detection frame identification number, a matching object type, a matching object detection frame coordinate position and a matching object detection frame identification number corresponding to each video frame image according to the plurality of video frame images;
the relation matrix acquisition module is used for constructing a real relation matrix of the vehicle and the matched object and an adjacent matrix of each detection frame according to the vehicle type, the coordinate position of the vehicle detection frame, the identification number of the vehicle detection frame, the type of the matched object, the coordinate position of the matched object detection frame and the identification number of the matched object detection frame; each detection frame is a vehicle detection frame or a matching object detection frame;
the detection frame image acquisition module is used for dividing and size-transforming each video frame image according to the coordinate position of the vehicle detection frame and the coordinate position of the matching object detection frame to obtain a plurality of detection frame images;
the transformation characteristic acquisition module is used for splicing the characteristics and the distance characteristics of each detection frame image to obtain a plurality of splicing characteristics, and performing characteristic transformation on the plurality of splicing characteristics to obtain a plurality of transformation characteristics;
the graph convolution neural network module is used for inputting each transformation feature and the adjacency matrix of each detection frame into the graph convolution neural network and outputting a prediction relation matrix of the vehicle and the matching object;
the training module is used for constructing a loss function according to the real relation matrix of the vehicle and the matching object and the prediction relation matrix of the vehicle and the matching object, and adjusting the parameters of the graph convolution neural network according to the loss function to obtain a trained graph convolution neural network;
and the vehicle and matching object relation determining module is used for predicting the video frame image to be tested according to the trained graph convolution neural network to obtain the matching relation between the vehicle and the matching object.
9. The vehicle matching relationship determination device based on high-level video surveillance as claimed in claim 8, wherein the transformation feature obtaining module comprises:
a first dimension feature obtaining module, configured to obtain 3 × H × W dimension features of each of the detection frame images;
the splicing feature acquisition module is used for setting 4H W-dimensional distance features and performing feature splicing on the 3H W-dimensional features of each detection frame image and the 4H W-dimensional distance features on channel dimensions to obtain a plurality of 7H W-dimensional splicing features;
h × W represents the width and height of the detection frame, and the number of channels 7 includes R channels, G channels, B channels, X coordinate channels, Y coordinate channels, W width channels, and H height channels.
10. The apparatus for determining vehicle matching relationship based on high-level video surveillance according to claim 9, wherein the transformed feature obtaining module further comprises:
the convolutional layer module is used for inputting the splicing characteristics into a plurality of convolutional layers and outputting a plurality of conversion characteristics;
and the full-connection layer network module is used for inputting the conversion features into a full connection layer network and outputting the transformation features.
11. The apparatus for determining vehicle matching relationship based on high-level video surveillance according to claim 10, wherein the graph convolution neural network module comprises:
a construction module, configured to construct the graph convolution neural network, where the graph convolution neural network is:
$$\tilde{A}=A+I$$
$$Z=F_2\left(\tilde{D}^{-\frac{1}{2}}\tilde{A}\tilde{D}^{-\frac{1}{2}}\,F_1\left(\tilde{D}^{-\frac{1}{2}}\tilde{A}\tilde{D}^{-\frac{1}{2}}XW^{(0)}\right)W^{(1)}\right)$$
wherein X represents each of the transformation features, A represents the relationship matrix of each of the detection frames, $\tilde{A}$ represents the adjacency matrix of each of the detection frames, I represents the identity matrix of each of the detection frames, $\tilde{D}$ represents the degree matrix of each of the detection frames, $W^{(0)}$ and $W^{(1)}$ represent the parameters of the graph convolution neural network, $F_1$ represents a nonlinear activation function, and $F_2$ represents a normalization function;
and the prediction module is used for outputting a prediction relation matrix of the vehicle and the matching object according to the graph convolution neural network.
12. The vehicle matching relationship determination device based on high-level video monitoring of claim 11, wherein, in the construction module, each diagonal element of the degree matrix of each detection frame is increased by one to obtain a new degree matrix of each detection frame, and the graph convolution neural network is:
$$\hat{D}=\tilde{D}+I$$
$$Z=F_2\left(\hat{D}^{-\frac{1}{2}}\tilde{A}\hat{D}^{-\frac{1}{2}}\,F_1\left(\hat{D}^{-\frac{1}{2}}\tilde{A}\hat{D}^{-\frac{1}{2}}XW^{(0)}\right)W^{(1)}\right)$$
wherein $\tilde{A}$ represents the new adjacency matrix of each of the detection frames, and $\hat{D}$ represents the new degree matrix of each of the detection frames.
13. The vehicle matching relationship determination device based on high-level video surveillance as claimed in claim 8, wherein the training module comprises:
a loss function constructing module, configured to construct a loss function, where the loss function is:
$$L=\frac{1}{m}\sum_{i=1}^{m}\left\|\hat{y}^{(i)}-y^{(i)}\right\|_{2}^{2}$$
wherein $\hat{y}^{(i)}$ represents the prediction relation matrix of the vehicle and the matching object corresponding to the i-th training sample, $y^{(i)}$ represents the real relation matrix of the vehicle and the matching object corresponding to the i-th training sample, and m represents the number of training samples;
and the parameter optimization module is used for adjusting the parameters of the graph convolution neural network so as to minimize the loss function and obtain the trained graph convolution neural network.
14. The apparatus according to claim 8, wherein the matching object in the data obtaining module includes a license plate or a parking space, the real relation matrix of the vehicle and the matching object in the relation matrix obtaining module includes a real dependency relation matrix of the vehicle and the license plate or a real occupancy relation matrix of the vehicle and the parking space, the predicted relation matrix of the vehicle and the matching object in the graph convolutional neural network module includes a predicted dependency relation matrix of the vehicle and the license plate or a predicted occupancy relation matrix of the vehicle and the parking space, and the matching relation of the vehicle and the matching object in the vehicle and matching object relation determining module includes a dependent relation of the vehicle and the license plate or an occupancy relation of the vehicle and the parking space.
CN202210127148.6A 2022-02-11 2022-02-11 Vehicle matching relation judgment method and device based on high-order video monitoring Pending CN114519842A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202210127148.6A CN114519842A (en) 2022-02-11 2022-02-11 Vehicle matching relation judgment method and device based on high-order video monitoring

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202210127148.6A CN114519842A (en) 2022-02-11 2022-02-11 Vehicle matching relation judgment method and device based on high-order video monitoring

Publications (1)

Publication Number Publication Date
CN114519842A true CN114519842A (en) 2022-05-20

Family

ID=81597096

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202210127148.6A Pending CN114519842A (en) 2022-02-11 2022-02-11 Vehicle matching relation judgment method and device based on high-order video monitoring

Country Status (1)

Country Link
CN (1) CN114519842A (en)

Cited By (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN117372924A (en) * 2023-10-18 2024-01-09 中国铁塔股份有限公司 Video detection method and device
CN117372924B (en) * 2023-10-18 2024-05-07 中国铁塔股份有限公司 Video detection method and device

Similar Documents

Publication Publication Date Title
CN111062413B (en) Road target detection method and device, electronic equipment and storage medium
CN109087510B (en) Traffic monitoring method and device
CN108133172A (en) Method, the analysis method of vehicle flowrate and the device that Moving Objects are classified in video
CN110689043A (en) Vehicle fine granularity identification method and device based on multiple attention mechanism
CN111428558A (en) Vehicle detection method based on improved YO L Ov3 method
CN111178235A (en) Target quantity determination method, device, equipment and storage medium
CN110704652A (en) Vehicle image fine-grained retrieval method and device based on multiple attention mechanism
CN114519842A (en) Vehicle matching relation judgment method and device based on high-order video monitoring
CN117372969B (en) Monitoring scene-oriented abnormal event detection method
CN114842285A (en) Roadside berth number identification method and device
CN117456482B (en) Abnormal event identification method and system for traffic monitoring scene
CN112164223B (en) Intelligent traffic information processing method and device based on cloud platform
CN112784494A (en) Training method of false positive recognition model, target recognition method and device
CN112329886A (en) Double-license plate recognition method, model training method, device, equipment and storage medium
CN116091964A (en) High-order video scene analysis method and system
CN116682101A (en) License plate number recognition method and system
CN114119953A (en) Method for quickly positioning and correcting license plate, storage medium and equipment
CN114782938A (en) License plate character recognition method and device
CN113888494A (en) Artificial intelligence interface pin quality detection method of automobile domain controller
CN114596337B (en) Self-recognition target tracking method and system based on linkage of multiple camera positions
CN117456738B (en) Expressway traffic volume prediction method based on ETC portal data
CN115690799A (en) Multi-node classification roadside berth character recognition method and system
CN116206235A (en) Shadow detection method and system based on high-order video monitoring
CN115909227A (en) Roadside parking management method and system based on human body posture recognition
CN116824520A (en) Vehicle track prediction method and system based on ReID and graph convolution network

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination