CN106934378B - Automobile high beam identification system and method based on video deep learning - Google Patents

Automobile high beam identification system and method based on video deep learning

Info

Publication number
CN106934378B
Authority
CN
China
Prior art keywords
frame
key frame
deep learning
module
video data
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN201710156201.4A
Other languages
Chinese (zh)
Other versions
CN106934378A (en)
Inventor
李成栋
丁子祥
许福运
张桂青
郝丽丽
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Individual
Original Assignee
Individual
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Individual filed Critical Individual
Priority to CN201710156201.4A priority Critical patent/CN106934378B/en
Publication of CN106934378A publication Critical patent/CN106934378A/en
Application granted granted Critical
Publication of CN106934378B publication Critical patent/CN106934378B/en
Active legal-status Critical Current
Anticipated expiration legal-status Critical

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V 20/00 Scenes; Scene-specific elements
    • G06V 20/40 Scenes; Scene-specific elements in video content
    • G06V 20/46 Extracting features or characteristics from the video content, e.g. video fingerprints, representative shots or key frames
    • G06V 20/47 Detecting features for summarising video content
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N 3/00 Computing arrangements based on biological models
    • G06N 3/02 Neural networks
    • G06N 3/08 Learning methods
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V 20/00 Scenes; Scene-specific elements
    • G06V 20/40 Scenes; Scene-specific elements in video content
    • G06V 20/44 Event detection

Landscapes

  • Engineering & Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Theoretical Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • General Health & Medical Sciences (AREA)
  • Molecular Biology (AREA)
  • Biophysics (AREA)
  • Computational Linguistics (AREA)
  • Data Mining & Analysis (AREA)
  • Evolutionary Computation (AREA)
  • Artificial Intelligence (AREA)
  • Biomedical Technology (AREA)
  • Computing Systems (AREA)
  • General Engineering & Computer Science (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Mathematical Physics (AREA)
  • Software Systems (AREA)
  • Health & Medical Sciences (AREA)
  • Multimedia (AREA)
  • Image Analysis (AREA)

Abstract

The invention discloses an automobile high beam identification system and method based on video deep learning. The system comprises two parts. The foreground part realizes the identification and processing of high beam violations and comprises a road monitoring equipment module, a video processing and identification module, an identification result processing module and a database of violation results to be detected, connected in sequence. The background part processes the video and realizes deep learning on it; it comprises a key frame extraction algorithm, a labeled database and a deep learning module, where the labeled database is constructed by calling the key frame extraction algorithm to extract key frames from the original video data, the data in the labeled database are used to train the deep learning module, and the trained deep learning module and the key frame extraction algorithm are called by the video processing and identification module. The invention automatically analyzes and identifies surveillance video, ensures the completeness of law enforcement evidence, and approaches the intelligence of manual judgment.

Description

Automobile high beam identification system and method based on video deep learning
Technical Field
The invention relates to an automobile high beam identification system, in particular to an automobile high beam identification system and method based on video deep learning, and belongs to the technical field of intelligent transportation.
Background
Since the reform and opening up, China's economy has developed continuously, steadily and rapidly, people's living standards have risen to unprecedented levels, and more and more Chinese households own private cars. The rapid growth in the number of private cars has made travel more convenient, but it has also been accompanied by an ever higher frequency of traffic accidents.
Traffic accidents have many causes, and a large share of them stem from improper use of high beam lights. At present, high beam violations are supervised mainly by traffic police, and because police manpower and time are limited, not all violations can be effectively supervised. In addition, the high beam snapshot systems developed in recent years all work on captured still pictures, which has certain limitations: 1) the number of captured high beam pictures is small and inconsistent; such pictures may arise during a driver's normal, lawful use of the high beam and are then easily misjudged as improper use, so the pictures are insufficient as law enforcement evidence; 2) to obtain the pictures, several capture devices often have to be additionally installed at the same location, so construction cost is high; 3) the video monitoring equipment already deployed cannot be fully utilized, which wastes resources.
Disclosure of Invention
The invention aims to overcome the defects of the prior art and provides an automobile high beam identification system based on video deep learning.
The invention also provides an automobile high beam identification method based on video deep learning corresponding to the system.
In order to achieve the purpose, the invention adopts the following technical scheme:
an automobile high beam identification system based on video deep learning comprises the following two parts:
the foreground part, which realizes the identification and processing of high beam violations and comprises a road monitoring equipment module, a video processing and identification module, an identification result processing module and a database of violation results to be detected, connected in sequence;
the background part, which processes the video and realizes deep learning on it, and comprises a key frame extraction algorithm, a labeled database and a deep learning module, where the labeled database is constructed by calling the key frame extraction algorithm to extract key frames from the original video data, the data in the labeled database are used to train the deep learning module, and the trained deep learning module and the key frame extraction algorithm are called by the video processing and identification module.
As one of the preferred technical solutions, the key frame extraction algorithm is a clustering-based key frame extraction algorithm.
As one of the preferred technical solutions, the deep learning module is based on CNN + LSE (convolutional neural network + least squares estimation).
The system corresponds to an automobile high beam identification method based on video deep learning, and the method specifically comprises the following steps:
(1) the road monitoring equipment module acquires the driving video data of the automobile and transmits them to the video processing and identification module;
(2) the video processing and identification module calls the key frame extraction algorithm to extract key frames from the video data and then grays them; with the grayed key frames as input, it calls the CNN+LSE-based deep learning module trained on the labeled database to obtain the output label of each key frame, namely low beam, fog lamp or high beam, and assigns the label to the corresponding key frame image;
(3) the video data and the labeled key frames obtained in step (2) are used as the input of the identification result processing module, which judges whether the vehicle has violated the regulations; a license plate recognition system is embedded in the identification result processing module, so that when a target vehicle exhibits high beam violation behavior its license plate is extracted and the vehicle information obtained, and the suspected-violation video data are imported into the database of violation results to be detected.
In step (2), the key frame extraction algorithm is as follows (a code sketch follows the steps):
(2-1) take the i-th segment V_i in the original video database, extract n frames at equal time intervals, and denote by F_{i,j} the frame at the j-th moment of the i-th video segment, so that the frame sequence of the segment is {F_{i,1}, F_{i,2}, ..., F_{i,n}}, where F_{i,1} is the first frame and F_{i,n} the last frame; define the similarity between two adjacent frames as the similarity of their histograms (namely the histogram feature difference), with a predefined threshold δ controlling the clustering density; here i, j and n are integers;
(2-2) select the first frame F_{i,1} as the initial cluster center and compute the similarity between frame F_{i,j} and the cluster centers; if the similarity to a center is less than δ, the frame is judged too far from that cluster center, so F_{i,j} cannot be added to that cluster; if the similarity of F_{i,j} to all cluster centers is less than δ, F_{i,j} forms a new cluster and becomes the new cluster center; otherwise F_{i,j} is added to the cluster with which its similarity is greatest, so that its distance from that cluster's center is minimal;
(2-3) repeat (2-2) until the n frames extracted from the original video data V_i have all been assigned to different clusters, after which the key frames can be selected: from each cluster, extract the frame nearest to the cluster center as the representative frame of that cluster; the representative frames of all clusters form the key frames of the original video data V_i.
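As a concrete illustration, the following Python sketch implements this clustering procedure with OpenCV, using histogram correlation as the similarity measure; the function names, the 64-bin histogram, and the default threshold value are illustrative assumptions, not taken from the patent.

    import cv2
    import numpy as np

    def frame_histogram(frame):
        # Normalized grayscale histogram used as the frame feature
        gray = cv2.cvtColor(frame, cv2.COLOR_BGR2GRAY)
        hist = cv2.calcHist([gray], [0], None, [64], [0, 256])
        return cv2.normalize(hist, hist).flatten()

    def extract_key_frames(frames, delta=0.9):
        # frames: n frames sampled from one video segment at equal time intervals
        clusters = []  # each cluster: {"center": histogram, "members": [(index, histogram), ...]}
        for idx, frame in enumerate(frames):
            hist = frame_histogram(frame)
            sims = [cv2.compareHist(c["center"], hist, cv2.HISTCMP_CORREL) for c in clusters]
            if not sims or max(sims) < delta:
                clusters.append({"center": hist, "members": [(idx, hist)]})  # F_ij starts a new cluster
            else:
                clusters[int(np.argmax(sims))]["members"].append((idx, hist))  # join most similar cluster
        # Representative frame of each cluster: the member most similar to the cluster center
        key_ids = sorted(
            max(c["members"], key=lambda m: cv2.compareHist(c["center"], m[1], cv2.HISTCMP_CORREL))[0]
            for c in clusters
        )
        return [frames[i] for i in key_ids]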
In step (2), the labeled database is constructed as follows:
the method comprises the steps of taking a large amount of vehicle running video data under a big data background as original video data, calling a key frame extraction algorithm based on clustering to the original video data to extract key frames, manually judging the light types of vehicles in the key frames, and adding labels to each key frame to enable the original key frames to become labeled data, wherein the label types comprise: three types of dipped headlight, fog light and high beam are respectively represented by-1, 0 and 1; storing the key frame data with the label into a labeled database, wherein the data in the labeled database are the original video data and the labeled key frame thereof, and the labeled key frame is represented as (F)i,jK), where k takes the value-1, 0 or 1.
In step (2), the CNN+LSE-based deep learning module is constructed with a LeNet-5 convolutional neural network structure. The module has eight layers: the first six form the feature extraction part and the last two the classifier part, where the feature extraction layers use a classical convolutional neural network structure and the classifier layers are fully connected. The module takes the data in the labeled database as training data and is trained with the combined CNN+LSE algorithm: the feature extraction part is trained with the CNN method and the classifier layers with the LSE method, to achieve fast learning of the module parameters and enhance the module's generalization capability.
The specific method comprises the following steps:
A video key frame from the labeled database is input into the first layer of the CNN+LSE-based deep learning module; the second layer performs convolution operations on the previous layer's output with different convolution kernels; the third layer pools (down-samples) the previous layer's output; the fourth and fifth layers repeat the operations of the second and third layers; the sixth layer unfolds the previous layer's output features sequentially into one row; the seventh layer is fully connected to the previous layer's output features; and the last layer is likewise fully connected to the previous layer. The output of the CNN+LSE-based deep learning module takes three values: low beam, fog lamp and high beam, denoted -1, 0 and 1 respectively.
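A PyTorch sketch of such an eight-layer LeNet-5-style module is given below; the kernel sizes, channel counts, and the assumed 32×32 grayscale input are illustrative, as the patent does not specify them.

    import torch
    import torch.nn as nn

    class HighBeamNet(nn.Module):
        # Layers 1-6: feature extraction (input, conv, pool, conv, pool, flatten);
        # layers 7-8: fully connected classifier over {low beam, fog lamp, high beam}.
        def __init__(self, num_classes=3):
            super().__init__()
            self.features = nn.Sequential(
                nn.Conv2d(1, 6, kernel_size=5), nn.Tanh(),   # layer 2: convolution kernels
                nn.AvgPool2d(2),                             # layer 3: pooling (down-sampling)
                nn.Conv2d(6, 16, kernel_size=5), nn.Tanh(),  # layer 4: convolution
                nn.AvgPool2d(2),                             # layer 5: pooling
                nn.Flatten(),                                # layer 6: unfold features into one row
            )
            self.classifier = nn.Sequential(
                nn.Linear(16 * 5 * 5, 120), nn.Tanh(),       # layer 7: fully connected
                nn.Linear(120, num_classes),                 # layer 8: output layer
            )

        def forward(self, x):  # x: batch of grayed key frames, shape (B, 1, 32, 32)
            return self.classifier(self.features(x))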
The CNN+LSE-based deep learning module is trained as follows:
Take any sample (F_{i,j}, k) from the labeled database. First gray F_{i,j} so that the key frame becomes a grayscale image, and then input the grayed key frame F'_{i,j} into the module, i.e., the input data are (F'_{i,j}, k). The two parts of the deep learning module are trained with the CNN (convolutional neural network) and LSE (least squares estimation) methods respectively. The parameter training method of the feature extraction part is as follows (a code sketch follows the steps):
(2-A1) initialize all connection weight parameters of the feature extraction part in the deep learning module;
(2-A2) compute the actual output label O_k corresponding to the input key frame;
(2-A3) compute the difference between the actual output label O_k and the corresponding ideal output label k;
(2-A4) weight learning: back-propagate and adjust the connection weight parameter matrix of the feature extraction part in the deep learning module so as to minimize the error;
(2-A5) repeat until all key frames of the video data have been traversed; parameter training is then finished.
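Steps (2-A1) to (2-A5) amount to ordinary gradient-based backpropagation over the feature-extraction weights. A hedged PyTorch sketch follows; the loss function, optimizer, and learning rate are assumptions, and HighBeamNet is the illustrative module defined above.

    import torch
    import torch.nn as nn

    def train_feature_extractor(model, loader, epochs=10, lr=1e-3):
        # loader yields (grayed key frame batch, label batch with values in {-1, 0, 1})
        criterion = nn.CrossEntropyLoss()
        optimizer = torch.optim.SGD(model.parameters(), lr=lr)
        for _ in range(epochs):                           # traverse all key frames of the video data
            for frames, labels in loader:
                targets = labels + 1                      # map labels {-1, 0, 1} to class ids {0, 1, 2}
                loss = criterion(model(frames), targets)  # difference between actual output and ideal label k
                optimizer.zero_grad()
                loss.backward()                           # back-propagate the error
                optimizer.step()                          # adjust the connection weight parameters
        return model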
the parameter training method of the classifier part is as follows:
(2-B1) randomly generate the connection weights and biases between the rasterization layer and the fully connected layer, and write the fully connected layer output as the matrix
H = [G(a_i · x_j + b_i)], i = 1, 2, ..., L, j = 1, 2, ..., N (an N × L matrix),
where G(·) is an activation function, a_i are the connection weights, b_i the biases, L is the number of nodes of the fully connected layer, N the number of all key frames, and x_j a key frame;
(2-B2) write the network output results for the key frames as the output vector Y = [y_1 y_2 ... y_N]^T, where y_j is the output label corresponding to the j-th key frame x_j;
(2-B3) compute the output weights between the fully connected layer and the output layer as β = PH^T Y, where P = (H^T H)^(-1).
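The classifier step has a closed-form least-squares solution, sketched below in NumPy; tanh as the activation G(·) and the absence of a regularization term are assumptions (in practice a small ridge term is often added to keep H^T H invertible).

    import numpy as np

    def lse_output_weights(X, Y, L=120, seed=0):
        # X: (N, d) matrix of rasterized key-frame features x_j; Y: (N,) labels in {-1, 0, 1}
        rng = np.random.default_rng(seed)
        N, d = X.shape
        A = rng.standard_normal((d, L))   # randomly generated connection weights a_i
        b = rng.standard_normal(L)        # randomly generated biases b_i
        H = np.tanh(X @ A + b)            # H[j, i] = G(a_i . x_j + b_i), shape (N, L)
        P = np.linalg.inv(H.T @ H)        # P = (H^T H)^{-1}
        beta = P @ H.T @ Y                # output weights: beta = P H^T Y
        return A, b, beta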
In step (3), the data in the database of violation results to be detected are the video data judged by the identification result processing module to be violations. The results to be detected should be manually reviewed; the information confirmed to be correct is then imported into the violation database, and the misjudged information is deleted.
In step (3), the method for judging whether a high beam violation exists is as follows: for a key frame F_{i,j1} labeled as high beam and its next key frame F_{i,j2}, compute the time interval ΔT = j2 - j1; if ΔT ≥ θ, the vehicle exhibits a high beam violation, where θ is the violation time threshold.
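In code this test is a comparison over consecutive labeled key frames; a minimal sketch, assuming each key frame carries its frame-time index j and the -1/0/1 labels used above.

    def has_high_beam_violation(labeled_key_frames, theta):
        # labeled_key_frames: [(j, label), ...] sorted by frame time j; label 1 means high beam
        for (j1, k1), (j2, _) in zip(labeled_key_frames, labeled_key_frames[1:]):
            if k1 == 1 and (j2 - j1) >= theta:  # delta T = j2 - j1 >= theta -> violation
                return True
        return False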
The invention has the beneficial effects that:
the invention automatically analyzes and identifies the monitoring video, ensures the completeness of law enforcement evidence, is similar to manual judgment, has intelligence, is simple in equipment arrangement, and can fully utilize the original monitoring equipment. The method comprises the following specific steps:
(1) by mining the video data, the sufficiency of law enforcement evidence is greatly improved while accuracy is maintained, preventing gaps in the evidence chain for high beam violations;
(2) the demand on the number of devices at a single site is low, and the monitoring equipment already deployed can largely be reused directly, reducing cost and raising equipment utilization;
(3) high beam violations are judged intelligently through video deep learning, replacing manual enforcement, achieving real automation and improving efficiency; moreover, after deep learning the recognition of high beam violations is expected to reach or exceed the level of manual identification, making the identification system truly intelligent;
(4) the deep learning module learns its parameters with the CNN+LSE method, so parameter learning is faster, the module generalizes better, and the robustness of the system is improved.
Drawings
FIG. 1 is a schematic diagram of the system architecture of the present invention;
fig. 2 is a diagram of a CNN + LSE-based deep learning module architecture.
Detailed Description
The present invention will be further described with reference to the accompanying drawings and examples, which are provided for the purpose of illustration only and are not intended to limit the scope of the invention.
As shown in fig. 1, an automobile high beam identification system based on video deep learning includes the following two parts:
the foreground part, which realizes the identification and processing of high beam violations and comprises a road monitoring equipment module, a video processing and identification module, an identification result processing module and a database of violation results to be detected, connected in sequence;
the background part, which processes the video and realizes deep learning on it, and comprises a key frame extraction algorithm, a labeled database and a deep learning module, where the labeled database is constructed by calling the key frame extraction algorithm to extract key frames from the original video data, the data in the labeled database are used to train the deep learning module, and the trained deep learning module and the key frame extraction algorithm are called by the video processing and identification module.
The key frame extraction algorithm is a key frame extraction algorithm based on clustering; the deep learning module is a CNN + LSE-based deep learning module.
The system corresponds to an automobile high beam identification method based on video deep learning, and the method specifically comprises the following steps:
(1) The road monitoring equipment module obtains the driving video data of the automobile and transmits them to the video processing and identification module.
(2) The video processing and identification module calls the key frame extraction algorithm to extract key frames from the original video data and then grays them; with the grayed key frames as input, it calls the CNN+LSE-based deep learning module trained on the labeled database to obtain the output label of each key frame, namely low beam, fog lamp or high beam, and assigns the label to the corresponding key frame image.
The key frame extraction algorithm is as follows:
(2-1) take the i-th segment V_i in the original video database, extract n frames at equal time intervals, and denote by F_{i,j} the frame at the j-th moment of the i-th video segment, so that the frame sequence of the segment is {F_{i,1}, F_{i,2}, ..., F_{i,n}}, where F_{i,1} is the first frame and F_{i,n} the last frame; define the similarity between two adjacent frames as the similarity of their histograms (namely the histogram feature difference), with a predefined threshold δ controlling the clustering density; here i, j and n are integers;
(2-2) select the first frame F_{i,1} as the initial cluster center and compute the similarity between frame F_{i,j} and the cluster centers; if the similarity to a center is less than δ, the frame is judged too far from that cluster center, so F_{i,j} cannot be added to that cluster; if the similarity of F_{i,j} to all cluster centers is less than δ, F_{i,j} forms a new cluster and becomes the new cluster center; otherwise F_{i,j} is added to the cluster with which its similarity is greatest, so that its distance from that cluster's center is minimal;
(2-3) repeat (2-2) until the n frames extracted from the original video data V_i have all been assigned to different clusters, after which the key frames can be selected: from each cluster, extract the frame nearest to the cluster center as the representative frame of that cluster; the representative frames of all clusters form the key frames of the original video data V_i.
The construction method of the database with the labels comprises the following steps:
the method comprises the steps of taking a large amount of vehicle running video data under a big data background as original video data, calling a key frame extraction algorithm based on clustering to the original video data to extract key frames, manually judging the light types of vehicles in the key frames, and adding labels to each key frame to enable the original key frames to become labeled data, wherein the label types comprise: three types of dipped headlight, fog light and high beam are respectively represented by-1, 0 and 1; storing the key frame data with the label into a labeled database, wherein the data in the labeled database are the original video data and the labeled key frame thereof, and the labeled key frame is represented as (F)i,jK), where k takes the value-1, 0 or 1.
As shown in fig. 2, the CNN+LSE-based deep learning module is constructed with a LeNet-5 convolutional neural network structure. The module has eight layers: the first six form the feature extraction part and the last two the classifier part, where the feature extraction layers use a classical convolutional neural network structure and the classifier layers are fully connected. The module takes the data in the labeled database as training data and is trained with the combined CNN+LSE algorithm: the feature extraction part is trained with the CNN method and the classifier layers with the LSE method, to achieve fast learning of the module parameters and enhance the module's generalization capability. The specific method is as follows: a video key frame from the labeled database is input into the first layer of the CNN+LSE-based deep learning module; the second layer performs convolution operations on the previous layer's output with different convolution kernels; the third layer pools (down-samples) the previous layer's output; the fourth and fifth layers repeat the operations of the second and third layers; the sixth layer unfolds the previous layer's output features sequentially into one row; the seventh layer is fully connected to the previous layer's output features; and the last layer is likewise fully connected to the previous layer. The output of the CNN+LSE-based deep learning module takes three values: low beam, fog lamp and high beam, denoted -1, 0 and 1 respectively.
The deep learning module based on CNN + LSE is trained as follows:
Take any sample (F_{i,j}, k) from the labeled database. First gray F_{i,j} so that the key frame becomes a grayscale image, and then input the grayed key frame F'_{i,j} into the module, i.e., the input data are (F'_{i,j}, k). The two parts of the deep learning module are trained with the CNN and LSE methods respectively. The parameter training method of the feature extraction part is as follows:
(2-A1) initialize all connection weight parameters of the feature extraction part in the deep learning module;
(2-A2) compute the actual output label O_k corresponding to the input key frame;
(2-A3) compute the difference between the actual output label O_k and the corresponding ideal output label k;
(2-A4) weight learning: back-propagate and adjust the connection weight parameter matrix of the feature extraction part in the deep learning module so as to minimize the error;
(2-A5) repeat until all key frames of the video data have been traversed; parameter training is then finished.
the parameter training method of the classifier part is as follows:
(2-B1) randomly generate the connection weights and biases between the rasterization layer and the fully connected layer, and write the fully connected layer output as the matrix
H = [G(a_i · x_j + b_i)], i = 1, 2, ..., L, j = 1, 2, ..., N (an N × L matrix),
where G(·) is an activation function, a_i are the connection weights, b_i the biases, L is the number of nodes of the fully connected layer, N the number of all key frames, and x_j a key frame;
(2-B2) write the network output results for the key frames as the output vector Y = [y_1 y_2 ... y_N]^T, where y_j is the output label corresponding to the j-th key frame x_j;
(2-B3) compute the output weights between the fully connected layer and the output layer as β = PH^T Y, where P = (H^T H)^(-1).
(3) The original video data and the labeled key frames obtained in step (2) are used as the input of the identification result processing module, which judges whether the vehicle has violated the regulations; a license plate recognition system is embedded in the identification result processing module, so that when a target vehicle exhibits high beam violation behavior its license plate is extracted and the vehicle information obtained, and the suspected-violation video data are imported into the database of violation results to be detected.
The method for judging whether a high beam violation exists is as follows: for a key frame F_{i,j1} labeled as high beam and its next key frame F_{i,j2}, compute the time interval ΔT = j2 - j1; if ΔT ≥ θ, the vehicle exhibits a high beam violation, where θ is the violation time threshold.
(4) The data in the database of violation results to be detected are the video data judged by the identification result processing module to be violations. The results to be detected should be manually reviewed; the information confirmed to be correct is then imported into the violation database, and the misjudged information is deleted.
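Tying the embodiment together, the sketch below chains the illustrative helpers defined earlier (extract_key_frames, HighBeamNet, has_high_beam_violation); the 32×32 resize, the threshold value, and the handling of the review queue are all assumptions for illustration.

    import cv2
    import torch

    def process_video(frames, model, theta=3):
        # frames: frames of one monitored video segment, sampled at equal time intervals;
        # model: a trained HighBeamNet instance
        labeled = []
        for j, key_frame in enumerate(extract_key_frames(frames)):
            # j: key-frame order, used as a proxy for the frame time index in this sketch
            gray = cv2.cvtColor(key_frame, cv2.COLOR_BGR2GRAY)      # graying operation
            gray = cv2.resize(gray, (32, 32))                       # match the model's assumed input size
            x = torch.from_numpy(gray).float()[None, None] / 255.0  # shape (1, 1, 32, 32)
            k = int(model(x).argmax(dim=1).item()) - 1              # class id {0,1,2} back to {-1,0,1}
            labeled.append((j, k))
        if has_high_beam_violation(labeled, theta):
            # a real system would extract the license plate here and queue the
            # suspected-violation clip in the to-be-detected database for manual review
            return "suspected violation"
        return "no violation"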
Although the embodiments of the present invention have been described with reference to the accompanying drawings, the scope of the present invention is not limited thereto, and various modifications and variations made by those skilled in the art without inventive effort remain within the scope of the present invention.

Claims (2)

1. A method for recognizing a high beam of an automobile based on video deep learning is characterized by comprising the following specific steps:
(1) the road monitoring equipment module acquires the driving video data of the automobile and transmits them to the video processing and identification module;
(2) the video processing and identification module calls a key frame extraction algorithm to extract key frames from the original video data and then grays them; with the grayed key frames as input, it calls a CNN+LSE-based deep learning module trained on a labeled database to obtain the output label of each key frame, namely low beam, fog lamp or high beam, and assigns the label to the corresponding key frame image;
(3) the original video data and the labeled key frames obtained in step (2) are used as the input of an identification result processing module for judging whether the vehicle has violated the regulations; a license plate recognition system is embedded in the identification result processing module, so that when a target vehicle exhibits high beam violation behavior its license plate is extracted and the vehicle information obtained, and the suspected-violation video data are imported into a database of violation results to be detected;
in step (3), the method for judging whether a high beam violation exists is as follows: for a key frame F_{i,j1} labeled as high beam and its next key frame F_{i,j2}, compute the time interval ΔT = j2 - j1; if ΔT ≥ θ, the vehicle exhibits a high beam violation, where θ is the violation time threshold;
in the step (2), the key frame extraction algorithm is as follows:
(2-1) take the i-th segment V_i in the original video database, extract n frames at equal time intervals, and denote by F_{i,j} the frame at the j-th moment of the i-th video segment, so that the frame sequence of the segment is {F_{i,1}, F_{i,2}, ..., F_{i,n}}, where F_{i,1} is the first frame and F_{i,n} the last frame; define the similarity between two adjacent frames as the similarity of their histograms, namely the histogram feature difference, with a predefined threshold δ controlling the clustering density, where i, j and n are integers;
(2-2) select the first frame F_{i,1} as the initial cluster center and compute the similarity between frame F_{i,j} and the cluster centers; if the similarity to a center is less than δ, the frame F_{i,j} is judged too far from that cluster center, so F_{i,j} cannot be added to that cluster; if the similarity of F_{i,j} to all cluster centers is less than δ, F_{i,j} forms a new cluster and becomes the new cluster center; otherwise the frame F_{i,j} is added to the cluster with which its similarity is greatest, so that its distance from that cluster's center is minimal;
(2-3) repeat (2-2) until the n frames extracted from the original video data V_i have all been assigned to different clusters, after which the key frames can be selected: from each cluster, extract the frame nearest to the cluster center as the representative frame of that cluster, the representative frames of all clusters forming the key frames of the original video data V_i;
in the step (2), the construction method of the database with the tags comprises the following steps:
the method comprises the steps of taking a large amount of vehicle running video data under a big data background as original video data, calling a key frame extraction algorithm based on clustering to the original video data to extract key frames, manually judging the light types of vehicles in the key frames, and adding labels to each key frame to enable the original key frames to become labeled data, wherein the label types comprise: three types of dipped headlight, fog light and high beam are respectively represented by-1, 0 and 1; storing the key frame data with the label into a labeled database, wherein the data in the labeled database are the original video data and the labeled key frame thereof, and the labeled key frame is represented as (F)i,jK), wherein k is-1, 0 or 1;
in step (2), the CNN+LSE-based deep learning module is constructed with a LeNet-5 convolutional neural network structure; the module has eight layers, the first six forming the feature extraction part and the last two the classifier part, where the feature extraction layers use a classical convolutional neural network structure and the classifier layers are fully connected; the data in the labeled database are taken as training data, the deep learning module is trained with the combined CNN+LSE algorithm, the feature extraction part is trained with the CNN method, and the classifier layers are trained with the LSE method;
the deep learning module based on CNN + LSE is trained as follows:
take any sample (F_{i,j}, k) from the labeled database; first gray F_{i,j} so that the key frame becomes a grayscale image, then input the grayed key frame F'_{i,j} into the module, i.e., the input data are (F'_{i,j}, k); train the two parts of the deep learning module with the CNN and LSE methods respectively; the parameter training method of the feature extraction part is as follows:
(2-A1) initialize all connection weight parameters of the feature extraction part in the deep learning module;
(2-A2) compute the actual output label O_k corresponding to the input key frame;
(2-A3) compute the difference between the actual output label O_k and the corresponding ideal output label k;
(2-A4) weight learning: back-propagate and adjust the connection weight parameter matrix of the feature extraction part in the deep learning module so as to minimize the error;
(2-A5) repeat until all key frames of the video data have been traversed; parameter training is then finished;
the parameter training method of the classifier part is as follows:
(2-B1) randomly generate the connection weights and biases between the rasterization layer and the fully connected layer, and write the fully connected layer output as the matrix
H = [G(a_i · x_j + b_i)], i = 1, 2, ..., L, j = 1, 2, ..., N (an N × L matrix),
where G(·) is an activation function, a_i are the connection weights, b_i the biases, L is the number of nodes of the fully connected layer, N the number of all key frames, and x_j a key frame;
(2-B2) write the network output results for the key frames as the output vector Y = [y_1 y_2 ... y_N]^T, where y_j is the output label corresponding to the j-th key frame x_j;
(2-B3) compute the output weights between the fully connected layer and the output layer as β = PH^T Y, where P = (H^T H)^(-1).
2. The method as claimed in claim 1, wherein in step (3) the data in the database of violation results to be detected are the video data judged by the identification result processing module to be violations, the violation results to be detected should be manually reviewed, the information confirmed to be correct is then imported into the violation database, and the misjudged information is deleted.
CN201710156201.4A 2017-03-16 2017-03-16 Automobile high beam identification system and method based on video deep learning Active CN106934378B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201710156201.4A CN106934378B (en) 2017-03-16 2017-03-16 Automobile high beam identification system and method based on video deep learning

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201710156201.4A CN106934378B (en) 2017-03-16 2017-03-16 Automobile high beam identification system and method based on video deep learning

Publications (2)

Publication Number Publication Date
CN106934378A CN106934378A (en) 2017-07-07
CN106934378B (en) 2020-04-24

Family

ID=59432614

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201710156201.4A Active CN106934378B (en) 2017-03-16 2017-03-16 Automobile high beam identification system and method based on video deep learning

Country Status (1)

Country Link
CN (1) CN106934378B (en)

Families Citing this family (9)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
JP6729516B2 (en) * 2017-07-27 2020-07-22 トヨタ自動車株式会社 Identification device
CN108229447B (en) * 2018-02-11 2021-06-11 陕西联森电子科技有限公司 High beam light detection method based on video stream
CN108921060A (en) * 2018-06-20 2018-11-30 安徽金赛弗信息技术有限公司 Motor vehicle based on deep learning does not use according to regulations clearance lamps intelligent identification Method
CN108932853B (en) * 2018-06-22 2021-03-30 安徽科力信息产业有限责任公司 Method and device for recording illegal parking behaviors of multiple motor vehicles
CN109191419B (en) * 2018-06-25 2021-06-29 国网智能科技股份有限公司 Real-time pressing plate detection and state recognition system and method based on machine learning
CN108986476B (en) * 2018-08-07 2019-12-06 安徽金赛弗信息技术有限公司 method, system and storage medium for recognizing non-use of high beam by motor vehicle according to regulations
CN109934106A (en) * 2019-01-30 2019-06-25 长视科技股份有限公司 A kind of user behavior analysis method based on video image deep learning
CN110046547A (en) * 2019-03-06 2019-07-23 深圳市麦谷科技有限公司 Report method, system, computer equipment and storage medium violating the regulations
CN111680638B (en) * 2020-06-11 2020-12-29 深圳北斗应用技术研究院有限公司 Passenger path identification method and passenger flow clearing method based on same

Citations (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN103942751A (en) * 2014-04-28 2014-07-23 中央民族大学 Method for extracting video key frame
CN105590102A (en) * 2015-12-30 2016-05-18 中通服公众信息产业股份有限公司 Front car face identification method based on deep learning
CN106407931A (en) * 2016-09-19 2017-02-15 杭州电子科技大学 Novel deep convolution neural network moving vehicle detection method

Family Cites Families (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US9978013B2 (en) * 2014-07-16 2018-05-22 Deep Learning Analytics, LLC Systems and methods for recognizing objects in radar imagery

Patent Citations (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN103942751A (en) * 2014-04-28 2014-07-23 中央民族大学 Method for extracting video key frame
CN105590102A (en) * 2015-12-30 2016-05-18 中通服公众信息产业股份有限公司 Front car face identification method based on deep learning
CN106407931A (en) * 2016-09-19 2017-02-15 杭州电子科技大学 Novel deep convolution neural network moving vehicle detection method

Also Published As

Publication number Publication date
CN106934378A (en) 2017-07-07

Similar Documents

Publication Publication Date Title
CN106934378B (en) Automobile high beam identification system and method based on video deep learning
EP3289528B1 (en) Filter specificity as training criterion for neural networks
CN107563372B (en) License plate positioning method based on deep learning SSD frame
US20210197851A1 (en) Method for building virtual scenario library for autonomous vehicle
Sivaraman et al. A general active-learning framework for on-road vehicle recognition and tracking
CN108921083B (en) Illegal mobile vendor identification method based on deep learning target detection
CN104809443A (en) Convolutional neural network-based license plate detection method and system
KR101395094B1 (en) Method and system for detecting object in input image
CN106845487A (en) A kind of licence plate recognition method end to end
CN112395951B (en) Complex scene-oriented domain-adaptive traffic target detection and identification method
CN113824684B (en) Vehicle-mounted network intrusion detection method and system based on transfer learning
CN104766042A (en) Method and apparatus for and recognizing traffic sign board
CN112990065B (en) Vehicle classification detection method based on optimized YOLOv5 model
CN106257490A (en) The method and system of detection driving vehicle information
CN110826415A (en) Method and device for re-identifying vehicles in scene image
CN115280373A (en) Managing occlusions in twin network tracking using structured dropping
Chen et al. Vehicle detection based on multifeature extraction and recognition adopting RBF neural network on ADAS system
JP2023541967A (en) Computer-implemented method for continuous adaptive detection of environmental features during automatic and assisted driving of own vehicle
Agarwal et al. Vehicle Characteristic Recognition by Appearance: Computer Vision Methods for Vehicle Make, Color, and License Plate Classification
CN111160282B (en) Traffic light detection method based on binary Yolov3 network
CN114638320A (en) Method and system for marking driving condition based on multi-source data and vehicle
CN113850112A (en) Road condition identification method and system based on twin neural network
CN112330683B (en) Lineation parking space segmentation method based on multi-scale convolution feature fusion
CN110263788B (en) Method and system for quickly identifying vehicle passing
CN117333813A (en) Head direction recognition method and device, electronic equipment and readable storage medium

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant