CN111695483A - Vehicle violation detection method, device and equipment and computer storage medium

Info

Publication number
CN111695483A
Authority
CN
China
Prior art keywords
vehicle
neural network
convolutional neural
target area
area image
Prior art date
Legal status
Granted
Application number
CN202010504223.7A
Other languages
Chinese (zh)
Other versions
CN111695483B (en)
Inventor
陈克凡
Current Assignee
Tencent Technology Shenzhen Co Ltd
Original Assignee
Tencent Technology Shenzhen Co Ltd
Priority date
Filing date
Publication date
Application filed by Tencent Technology Shenzhen Co Ltd
Priority to CN202010504223.7A
Publication of CN111695483A
Application granted
Publication of CN111695483B
Legal status: Active

Classifications

    • G06V 20/56 — Scenes; scene-specific elements; context or environment of the image exterior to a vehicle by using sensors mounted on the vehicle
    • G06N 3/045 — Computing arrangements based on biological models; neural networks; combinations of networks
    • G06N 3/08 — Computing arrangements based on biological models; neural networks; learning methods
    • G06V 20/41 — Scenes; scene-specific elements in video content; higher-level, semantic clustering, classification or understanding of video scenes, e.g. detection, labelling or Markovian modelling of sport events or news items

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • General Physics & Mathematics (AREA)
  • Computational Linguistics (AREA)
  • Software Systems (AREA)
  • General Health & Medical Sciences (AREA)
  • General Engineering & Computer Science (AREA)
  • Data Mining & Analysis (AREA)
  • Evolutionary Computation (AREA)
  • Biomedical Technology (AREA)
  • Molecular Biology (AREA)
  • Computing Systems (AREA)
  • Biophysics (AREA)
  • Artificial Intelligence (AREA)
  • Mathematical Physics (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Health & Medical Sciences (AREA)
  • Multimedia (AREA)
  • Traffic Control Systems (AREA)
  • Image Analysis (AREA)

Abstract

The application provides a vehicle violation detection method, apparatus, device, and computer storage medium in the field of computer technology, aimed at streamlining the machine-vision-based detection of vehicles that violate traffic regulations. The method comprises the following steps: acquiring a target area image captured by a vehicle-mounted image acquisition device; performing image recognition on the target area image with a trained lightweight convolutional neural network to determine the road elements and vehicle information in the target area image; and, when it is determined from the road elements and vehicle information that a vehicle in the target area image satisfies a violation condition, sending violation report information that includes at least the target area image to an application server. The lightweight convolutional neural network is trained on image sample data that has undergone data annotation and data enhancement, the annotations covering road elements and vehicle information. Because the method detects violations directly on the vehicle-mounted device, it saves the network traffic otherwise consumed during vehicle violation detection.

Description

Vehicle violation detection method, device and equipment and computer storage medium
Technical Field
The present application relates to the field of computer technologies, and in particular, to a method, an apparatus, a device, and a computer storage medium for vehicle violation detection.
Background
Vehicles make daily life more convenient, but as the number of vehicles on the road keeps growing, the traffic accidents that come with them grow too, creating serious safety hazards; the supervision of vehicle violations has therefore become increasingly important.
At present, cameras are typically installed at fixed road positions, the driving behavior of large numbers of vehicles is recognized from the images they capture, and violations are then judged from those recognitions. Implementing this approach, however, requires every frame captured by each camera to be uploaded to a server, which consumes substantial network traffic, and the server must expend a great deal of computing power to centrally recognize the driving behavior captured by all the cameras, which lowers the efficiency of vehicle violation detection. How to reduce the traffic consumed during vehicle violation detection while improving detection efficiency is therefore a problem that needs to be addressed.
Disclosure of Invention
The embodiment of the application provides a method, a device and equipment for detecting vehicle violation and a computer storage medium, which are used for improving the efficiency of detecting the vehicle violation.
In a first aspect of the present application, a vehicle violation detection method is provided, including:
acquiring a target area image of the periphery of the vehicle through a vehicle-mounted image acquisition device installed on the vehicle;
carrying out image recognition on the target area image by utilizing a lightweight convolutional neural network, and determining road elements and vehicle information in the target area image;
and when it is determined, based on the road elements and the vehicle information, that a vehicle in the target area image satisfies a violation condition, sending violation report information to an application server over a mobile communication network, the violation report information including at least the target area image.
In one possible implementation, the vehicle information includes a vehicle type, and the violation condition corresponds to the vehicle type.
In a second aspect of the present application, there is provided a vehicle violation detection device, comprising:
an image acquisition unit, configured to acquire a target area image around the vehicle, the target area image being captured by a vehicle-mounted image acquisition device installed on the vehicle;
an image recognition unit, configured to perform image recognition on the target area image using a trained lightweight convolutional neural network and determine the road elements and vehicle information in the target area image, where the lightweight convolutional neural network is trained on image sample data that has undergone data annotation and data enhancement, the annotations covering road elements and vehicle information;
and the violation determining unit is used for sending violation report information to an application server through a mobile communication network when determining that the target area image has a vehicle meeting violation conditions based on the road elements and the vehicle information in the target area image, wherein the violation report information at least comprises the target area image.
In one possible implementation, the trained lightweight convolutional neural network includes a set number of depth separable convolutional layers; each depth separable convolutional layer comprises a depthwise convolution and a pointwise convolution, and the set number of depth separable convolutional layers perform multiple convolution operations on the target area image to obtain the road elements and vehicle information in the target area image.
In a possible implementation manner, the image recognition unit is specifically configured to obtain the trained lightweight convolutional neural network by:
training the convolutional neural network established based on deep learning by using the image sample data to obtain the trained convolutional neural network;
determining the loss influence degree of each convolution kernel parameter in the trained convolutional neural network, and cutting off the convolution kernel of which the loss influence degree is smaller than an influence degree threshold value in the trained convolutional neural network to obtain the trained lightweight convolutional neural network, wherein the loss influence degree represents the influence degree of the convolution kernel parameter on the loss function of the trained convolutional neural network.
In a possible implementation manner, the image recognition unit is specifically configured to obtain the trained lightweight convolutional neural network by:
training the convolutional neural network established based on deep learning by using the image sample data to obtain the trained convolutional neural network;
and carrying out parameter value quantization processing on the floating point type parameter value of each convolution kernel parameter in the trained convolutional neural network to obtain the trained lightweight convolutional neural network.
In one possible implementation, the road elements include one or more of a lane line, a pedestrian, a traffic light, and a traffic sign, and the trained lightweight convolutional neural network includes a first lightweight convolutional neural network, or includes both a first lightweight convolutional neural network and a second lightweight convolutional neural network, wherein:
the first lightweight convolutional neural network is used for carrying out target detection on the target area image and determining vehicle information and road elements except the lane line in the target area image;
and the second lightweight convolutional neural network is used for carrying out lane line detection on the target area image and determining a lane line in the target area image.
In one possible implementation, the vehicle violation detection device further includes:
and an image recognition accelerating unit, configured to perform Neon assembly acceleration on the code that runs the trained lightweight convolutional neural network, so that when the trained lightweight convolutional neural network performs image recognition on the target area image, convolution operations within the same convolutional layer can be processed in parallel via single instruction, multiple data (SIMD).
In one possible implementation, the violation condition includes one or more of lane change across a solid line, failure to yield to a pedestrian, running a red light, and speeding.
In one possible implementation, the vehicle information includes a vehicle type, and the violation condition corresponds to the vehicle type.
In a third aspect of the present application, there is provided a computer device comprising a memory, a processor and a computer program stored on the memory and executable on the processor, the processor implementing the method of the first aspect and any one of the possible embodiments when executing the program.
In a fourth aspect of the present application, a computer-readable storage medium is provided, which stores computer instructions that, when executed on a computer, cause the computer to perform the method according to the first aspect and any one of the possible embodiments.
Due to the adoption of the technical scheme, the embodiment of the application has at least the following technical effects:
On the one hand, image recognition is performed on each vehicle's on-board device by the trained lightweight convolutional neural network, so vehicles satisfying the violation condition can be recognized directly on the on-board device without concentrating target area images at an application server for processing, which improves the efficiency of vehicle violation detection; on the other hand, because image recognition takes place directly on the on-board device, not every frame of target area image captured by each vehicle's image acquisition device has to be sent to the application server, which reduces the traffic consumed during vehicle violation detection.
Drawings
Fig. 1 is a schematic diagram of an application scenario provided in an embodiment of the present application;
FIG. 2 is a schematic flow chart of a vehicle violation detection method according to an embodiment of the present application;
fig. 3 is a scene schematic diagram for acquiring an image of a target area according to an embodiment of the present disclosure;
FIG. 4 is an exemplary illustration of an image of a target area for a vehicle having a violation condition present as provided by an embodiment of the present application;
FIG. 5 is a graph illustrating a comparison of the computation of a standard convolutional layer and the computation of a depth separable convolutional layer, provided in an embodiment of the present application;
fig. 6 is a schematic diagram of a process for training a lightweight convolutional neural network according to an embodiment of the present disclosure;
fig. 7 is a schematic diagram illustrating a principle that a lane line detection network identifies a lane line in a target area image according to an embodiment of the present application;
fig. 8 is a schematic diagram illustrating a principle of target detection performed by a target detection network according to an embodiment of the present application;
FIG. 9 is an exemplary illustration of an image of a target area for a vehicle having a violation condition present as provided by an embodiment of the present application;
FIG. 10 is an exemplary diagram of a target area image provided by an embodiment of the present application;
FIG. 11 is an exemplary illustration of an image of a target area for a vehicle having a violation condition present as provided by an embodiment of the present application;
FIG. 12 is a flow chart of one particular example of vehicle violation detection provided by an embodiment of the present application;
fig. 13 is a schematic structural diagram of a vehicle violation detection device according to an embodiment of the present application;
fig. 14 is a schematic structural diagram of a mobile terminal device according to an embodiment of the present application.
Detailed Description
In order to better understand the technical solutions provided by the embodiments of the present application, the following detailed description is made with reference to the drawings and specific embodiments.
In order to facilitate those skilled in the art to better understand the technical solutions of the present application, the following description refers to the technical terms of the present application.
Artificial Intelligence (AI) is a theory, method, technique, and application system that uses a digital computer, or a machine controlled by a digital computer, to simulate, extend, and expand human intelligence, perceive the environment, acquire knowledge, and use that knowledge to obtain the best results. Artificial intelligence is a comprehensive discipline covering a wide range of fields at both the hardware and the software level. Basic artificial intelligence technologies generally include sensors, dedicated AI chips, cloud computing, distributed storage, big data processing, operation/interaction systems, and mechatronics; AI software technologies mainly include computer vision, speech processing, natural language processing, and machine learning/deep learning.
Computer Vision technology (CV) is the science of how to make machines "see": using cameras and computers in place of human eyes to recognize, track, and measure targets, and further processing the resulting images so that they become more suitable for human observation or for transmission to instruments for detection. As a scientific discipline, computer vision studies theories and techniques for building artificial intelligence systems that can capture information from images or multidimensional data. Computer vision technologies generally include image processing, image recognition, image semantic understanding, image retrieval, OCR, video processing, video semantic understanding, video content/behavior recognition, three-dimensional object reconstruction, 3D technology, virtual reality, augmented reality, and simultaneous localization and mapping, as well as common biometric technologies such as face recognition and fingerprint recognition.
Convolutional Neural Network (CNN): a feedforward neural network that involves convolution computation and has a deep structure; it can be trained in a supervised way on labeled data to complete tasks such as visual image recognition and target detection.
Deep learning: deep learning is to learn the internal rules and the expression levels of sample data (such as images, voice and texts), so that a machine can have the ability of analyzing and learning like a human, can recognize data such as characters, images and sound, and is widely applied to the field of artificial intelligence, wherein a convolutional neural network is a common structure in deep learning.
The solutions provided by the embodiments of the present application mainly involve the computer vision technology of artificial intelligence and the deep learning technology in machine learning, as explained in detail by the following embodiments.
The following explains the concept of the present application.
At present, one violation detection approach places a camera at a fixed road position to capture images, which the camera sends frame by frame to a server; the server recognizes the driving behavior of large numbers of vehicles from those images and then judges whether each vehicle violates the regulations. Implementing this approach requires the server to centrally recognize all the driving behavior captured by the cameras, placing a heavy load on the server, and every frame the cameras capture must be uploaded, consuming substantial traffic. Moreover, a camera at a fixed position can cover only a small number of violation scenes, and covering more scenes would require adding a large number of cameras at high cost.
Another violation detection approach photographs the surrounding road with a vehicle-mounted panoramic driving recorder, judges from the vehicle speed and the map GPS position whether the current scene is one prone to congestion cutting-in violations, and estimates the likelihood of such a violation from the vehicle speed and the recognition of license plates over a time sequence; if the likelihood exceeds a threshold, the driver judges whether a violation really occurred and reports it. However, this approach can detect violations only from vehicle speed and GPS information, so the kinds of violation behavior it recognizes are few.
In view of the above, the inventor designed a vehicle violation detection method, apparatus, device, and computer storage medium. Considering that concentrating images at a server for recognition consumes heavy traffic and increases the server's load, the method instead uses artificial intelligence technology on the on-board device of the current vehicle to perceive the environmental information around it and to determine, based on that information, whether any vehicle around the current vehicle satisfies a violation condition. Specifically, in the embodiments of the present application, the on-board device applies computer vision technology to perform image recognition on the target area image around the current vehicle captured by the vehicle-mounted image acquisition device, determines the surrounding environmental information from the recognition result, and then judges from that information whether a vehicle satisfying the violation condition exists.
Convolutional neural networks for image recognition are usually created based on deep learning: by learning the characteristics of image sample data, such a network can recognize the characteristics of a target area image. To guarantee recognition accuracy, however, the depth and complexity of these networks keep growing, and running them demands high computing power. Because the on-board device has limited computing capacity and cannot run a deep, complex convolutional neural network, the embodiments of the present application reduce the scale and size of the network through several processing techniques to obtain a trained lightweight convolutional neural network. This lightweight network can run on the on-board device, performing image recognition on the target area image there and judging whether a vehicle satisfying the violation condition exists in it.
Further, to reduce the traffic consumed in transmitting target area images to the application server, only those target area images in which a vehicle satisfying the violation condition has been detected are transmitted to the server, rather than every captured frame.
In order to more clearly understand the design concept of the present application, an application scenario is given below.
Referring to fig. 1, the application scenario includes at least one vehicle 100 and an application server 200, each vehicle is installed with a vehicle-mounted device 110 and a vehicle-mounted image capturing device 120, and the vehicle-mounted device 110 and the vehicle-mounted image capturing device 120 are in communication with each other, where:
the vehicle-mounted image acquisition device 120 is used for acquiring a target area image of the periphery of the vehicle 100 and transmitting the acquired target area image to the vehicle-mounted device 110 for processing.
The vehicle-mounted device 110 is configured to perform image recognition on the target area image by using the trained lightweight convolutional neural network, and send violation report information to the application server 200 through the mobile communication network when it is determined that the vehicle meeting the violation condition exists in the target area image according to the image recognition result, where the violation report information at least includes the target area image.
The vehicle-mounted device 110 may be a vehicle-mounted processor installed on the vehicle 100, the vehicle-mounted image capturing device 120 may be a vehicle recorder or an intelligent vehicle-mounted rearview mirror installed on the vehicle 100, and the vehicle-mounted device 110 and the vehicle-mounted image capturing device 120 may be the same device installed on the vehicle 100, such as the intelligent vehicle-mounted rearview mirror, or may be different devices installed on the vehicle 100.
With continued reference to fig. 1, the vehicle-mounted image capturing device 120 may be installed at a middle position or a middle upper position of a windshield at the front end of the vehicle 100, or may be installed at other positions of the vehicle 100; the vehicle-mounted device 110 may be installed at a middle position of a windshield at the front end of the vehicle 100, or at a position to the left of the middle or to the right of the middle, or may be installed at other positions of the vehicle 100, and those skilled in the art may install the device according to actual settings.
The intelligent vehicle-mounted rearview mirror is a smart rearview mirror installed on a vehicle. It has an independent operating system and independent running space, lets users install programs from third-party service providers such as software, games, and navigation, and can access wireless networks via WiFi or a mobile communication network. It can also provide functions such as driving recording, GPS positioning, electronic speed-measurement reminders, a reversing camera view, and real-time online audio-video entertainment.
Based on the application scenario of fig. 1, the following describes an example of a vehicle violation detection method in the embodiment of the present application, please refer to fig. 2, which is applied to a vehicle-mounted device 110 installed on a vehicle 100, and specifically includes the following steps:
step S201 is to acquire a target area image of the periphery of the vehicle, the target area image being captured by an onboard image mounted on the vehicle.
The target area image is obtained by the vehicle-mounted image acquisition device 120 photographing a target area around the vehicle. The target area may be, for example, the region covered by a vertical shooting range of 30 degrees up and 30 degrees down and a horizontal shooting range of 50 degrees left and 50 degrees right in front of the device 120, or it may be determined by a shooting range preconfigured on the device 120; those skilled in the art may set the target area according to actual requirements.
Referring to fig. 3, an example scene for acquiring a target area image is given. The road in the scene includes a motorway and a non-motorway and contains the vehicle 100, a traffic light 302, a zebra crossing 303, a bus 304, a sports car 305, a pedestrian 306, and so on. The horizontal shooting range of the vehicle-mounted image capturing device 120 of the vehicle 100 is shown as the range 301 formed by the two solid lines extending from the device; the vertical shooting range may be determined by the actual configuration of the device 120 and is not illustrated here.
Step S202: perform image recognition on the target area image using the trained lightweight convolutional neural network, and determine the road elements and vehicle information in the target area image, where the lightweight convolutional neural network is trained on image sample data that has undergone data annotation and data enhancement, the annotations covering road elements and vehicle information.
As an embodiment, the trained lightweight convolutional neural network may be obtained by at least one of: structurally optimizing the convolutional layers when the convolutional neural network is created, pruning the trained convolutional neural network, or quantizing the parameter values of the trained convolutional neural network, so as to simplify the network. The convolutional neural network is created based on deep learning and is used to perform image recognition on the target area image and determine the road elements and vehicle information in it.
Vehicle information: information relating to the vehicle itself, such as one or more of license plate information, vehicle type, vehicle location, etc.
As an example, the road elements in the embodiments of the present application may include, but are not limited to, one or more of a lane line, a traffic sign or an object appearing on a road on which a vehicle may travel, the traffic sign may include a traffic sign (e.g., a sign for prohibiting a motor vehicle from entering, prohibiting a whistle, paying attention to a pedestrian, etc.) or a traffic symbol on the road (e.g., a straight arrow, a left-turn arrow, a right-turn arrow, a zebra crossing, etc.), etc., and the object may include, but is not limited to, a traffic signal, a pedestrian, a bicycle, an electric vehicle, a bus stop board, etc.; see, e.g., zebra crossing 303, traffic light 302, pedestrian 306, etc. illustrated in fig. 3.
To further improve the efficiency of vehicle violation detection, when deploying the code that runs the trained lightweight convolutional neural network on the vehicle-mounted device 110, the deployment itself can raise the processing efficiency of the network's convolution operations during image recognition.
Specifically, Neon assembly acceleration may be applied to the code that runs the trained lightweight convolutional neural network, so that when the network performs image recognition on a target area image, convolution operations within the same convolutional layer can be processed in parallel via single instruction, multiple data (SIMD). Neon acceleration does not change the dependency between the input and output data of each convolutional layer; it only parallelizes the large number of convolution operations within the same layer. This reduces the time each convolutional layer spends on convolution during image recognition, shortens the overall time the trained lightweight convolutional neural network needs to recognize a target area image, and thus further improves the efficiency of image recognition and of vehicle violation detection.
Neon is the Single Instruction, Multiple Data (SIMD) technology of the ARM instruction set; compared with Single Instruction, Single Data (SISD) technology, SIMD greatly increases the parallelism of code, thereby improving the processing efficiency of convolution operations within the same convolutional layer of the trained lightweight convolutional neural network.
Step S203: based on the road elements and the vehicle information in the target area image, when it is determined that a vehicle satisfying the violation condition exists in the target area image, send violation report information to an application server over a mobile communication network, the violation report information including at least the target area image.
As an embodiment, the violation condition in the embodiments of the present application includes one or more of lane change across a solid line, failure to yield to pedestrians, running a red light, and speeding; the violation condition may also include other conditions, which those skilled in the art may set according to actual needs.
As an embodiment, the vehicle information in the embodiments of the present application includes a vehicle type, and the violation condition corresponds to the vehicle type. If the vehicle type includes a heavy goods vehicle, the corresponding violation condition may be that the heavy goods vehicle drives into urban roads such as those in the main city area; if the vehicle type includes an electric bicycle or a bicycle, the corresponding condition may be that the electric bicycle or bicycle enters a motor vehicle lane; if the vehicle type includes a car, the corresponding condition may be, but is not limited to, that the car drives onto a road posted with a traffic sign prohibiting motor vehicles. Referring to fig. 4 for a specific example: a traffic sign 402 prohibiting the passage of motor vehicles is posted for the road between lane line B and lane line C, so when the car 401 enters the road between lane line B and lane line C, the car 401 satisfies the violation condition.
Further, after a vehicle satisfying the violation condition is determined to exist in the target area image and before the violation report information is sent to the application server over the mobile communication network, the report may be shown on a vehicle-mounted display device for verification by the driver of the vehicle 100. For example, the target area image containing the vehicle satisfying the violation condition is displayed on the vehicle-mounted display device of the vehicle 100, the driver further judges whether such a vehicle is indeed present in the displayed image, and once the driver confirms it, sending the violation report information to the application server, for example the application server of the traffic administration, can be triggered through the display device.
As an embodiment, the violation report information may further include one or more pieces of vehicle information such as license plate information, vehicle type, vehicle location, and vehicle color of the vehicle satisfying the violation condition, so that the traffic authority may track the vehicle satisfying the violation condition, and the like.
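For illustration only, the violation report information can be pictured as a small structured payload, as in the minimal Python sketch below; the field names are illustrative assumptions, since the embodiments only require that the target area image be included, with the other vehicle information being optional.

```python
# Hypothetical violation report payload; only the target area image is
# required by the embodiments, the remaining fields are optional extras.
violation_report = {
    "target_area_image": "frame_000123.jpg",  # required: the evidential image
    "license_plate": "ABC-1234",              # optional vehicle information
    "vehicle_type": "car",
    "vehicle_color": "white",
    "vehicle_location": {"lat": 22.54, "lng": 114.06},
    "violation": "lane change across a solid line",
}
```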
As an embodiment, when determining in step S203 whether a vehicle satisfying the violation condition exists in the target area image based on the road elements and the vehicle information, the distance between the vehicle 100 and a pedestrian or traffic light among the road elements, or a vehicle to be detected among the vehicle information, may also be determined based on lidar point cloud data obtained from the server and images captured by cameras installed on the road; whether a vehicle satisfying the violation condition exists in the target area image is then further judged from the road elements, the vehicle information, and the determined distances, where the vehicle to be detected is any vehicle in the target area image.
The following describes the training process of the trained lightweight convolutional neural network in step S202; the trained lightweight convolutional neural network may include one or both of the first lightweight convolutional neural network and the second lightweight convolutional neural network.
The first training mode: structural optimization of the convolutional layers of the convolutional neural network.
At present, the computation of each convolutional layer in a convolutional neural network is high. The structure of each convolutional layer can be simplified, improving the network's efficiency by reducing per-layer computation; the embodiments of the present application therefore use depth separable convolutional layers (Depthwise Separable Convolution) in place of the standard convolutional layers in the convolutional neural network.
Specifically, a convolutional neural network is created based on deep learning to perform image recognition on an image and determine the road elements and vehicle information in it; the network comprises a first set number of depth separable convolutional layers, each comprising a depthwise convolution and a pointwise convolution. The first set number is not limited and may be set by those skilled in the art according to actual requirements.
And then training the convolutional neural network by using the image sample data subjected to data labeling and data enhancement to obtain the trained lightweight convolutional neural network.
The data labeling includes labeling road elements and vehicle information, and the specific data labeling mode is not limited, and can be set by a person skilled in the art according to actual requirements.
In a standard convolutional layer, each convolution kernel is applied across all input channels together; in the depthwise convolution, one kernel corresponds to one input channel, so different input channels use different kernels — a depth-level operation. The pointwise convolution, like a standard convolutional layer, applies each kernel across all input channels, but its kernels have size 1 × 1.
After the standard convolutional layer in the convolutional neural network is replaced by the depth separable convolutional layer, the calculated amount of the convolutional neural network can be obviously reduced, and the processing efficiency of the convolutional neural network is further improved; specifically, referring to fig. 5, the difference in the calculated amounts of the standard convolutional layer and the depth separable convolutional layer is as follows:
the input size of the image is DF multiplied by M, the size of the convolution kernel of the standard convolution layer is DK multiplied by M multiplied by N, wherein DK is the size of the convolution kernel, M is the number of characteristic image channels input by the convolution layer, and N is the number of the convolution kernels.
Convolving the image with the standard convolutional layer (stride 1, with padding) yields an output of size DF × DF × N at a computational cost of DK × DK × M × N × DF × DF. With the depth separable convolutional layer, M depthwise kernels first convolve the M input channels separately, producing a DF × DF × M result at a cost of DK × DK × M × DF × DF; then N pointwise kernels of size 1 × 1 × M produce the DF × DF × N output at a cost of M × N × DF × DF. The total cost of the depth separable convolutional layer is therefore DK × DK × M × DF × DF + M × N × DF × DF.
The ratio of the computation of the depth separable convolutional layer to that of the standard convolutional layer is:
(DK × DK × M × DF × DF + M × N × DF × DF) / (DK × DK × M × N × DF × DF) = 1/N + 1/(DK × DK)
Since N is generally large, using a 3 × 3 kernel (DK = 3) reduces the computation of the depth separable convolutional layer to roughly one ninth that of the standard convolutional layer; replacing standard convolutional layers with depth separable ones therefore markedly improves the processing efficiency of the convolutional neural network.
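For illustration only, the following is a minimal sketch, assuming PyTorch, of a depth separable convolutional layer built from a depthwise convolution followed by a pointwise convolution; the class name and layer sizes are illustrative and not taken from the embodiments.

```python
import torch
import torch.nn as nn

class DepthwiseSeparableConv(nn.Module):
    """One depth separable convolutional layer: depthwise then pointwise."""
    def __init__(self, in_channels: int, out_channels: int, kernel_size: int = 3):
        super().__init__()
        # Depthwise convolution: one DK x DK kernel per input channel
        # (groups=in_channels makes each kernel see only its own channel).
        self.depthwise = nn.Conv2d(in_channels, in_channels, kernel_size,
                                   padding=kernel_size // 2, groups=in_channels)
        # Pointwise convolution: N kernels of size 1 x 1 x M mix the channels.
        self.pointwise = nn.Conv2d(in_channels, out_channels, kernel_size=1)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        return self.pointwise(self.depthwise(x))

# Cost ratio versus a standard DK x DK convolution is 1/N + 1/DK**2;
# with DK = 3 and N = 256 this is about 0.115, i.e. a ~9x reduction.
```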
The second training mode: pruning the convolutional neural network.
Specifically, a convolutional neural network established based on deep learning is trained by using image sample data subjected to data labeling and data enhancement to obtain the trained convolutional neural network, wherein the convolutional neural network is used for carrying out image recognition on an image and determining road elements and vehicle information in the image;
determining the loss influence degree of each convolution kernel parameter in the trained convolutional neural network, and cutting off the convolution kernel of which the loss influence degree is smaller than the influence degree threshold value in the trained convolutional neural network to obtain the trained lightweight convolutional neural network, wherein the loss influence degree represents the influence degree of the convolution kernel parameter on the loss function of the trained convolutional neural network.
The data labeling includes labeling road elements and vehicle information, and the specific data labeling mode is not limited, and can be set by a person skilled in the art according to actual requirements.
Assume D is the set of image sample data used for training, W is the set of model parameters (here, the set of convolution kernel parameters of the convolutional neural network), and C(·) is the loss function of the convolutional neural network. The embodiments of the present application seek to minimize the value of C(D|W) (where "|" denotes conditioning), i.e., to keep the loss on the training sample set minimal during the pruning process.
Specifically, in the embodiment of the present application, the training convolutional neural network may be pruned in the following manner:
for any convolution kernel parameter in the convolution neural network, removing the convolution kernel parameter from the convolution neural network to obtain C (D | W'), wherein W 'is the set of convolution kernel parameters of the convolution neural network after the convolution kernel parameters are removed, the difference value or the ratio of C (D | W) and C (D | W') is determined as the loss influence degree of the convolution kernel parameters, if the loss influence value of the convolution kernel parameters is smaller than the influence degree threshold value, the convolution kernel parameters exist or do not exist in the convolution neural network, the functional value of the loss function for the convolutional neural network has a low impact, so the convolution kernel parameters can be removed from the convolutional neural network, the scale of the convolutional neural network is reduced by reducing the convolutional kernel parameters, so that the calculation amount of the convolutional neural network is reduced, and the processing efficiency of the convolutional neural network is improved.
The third training mode: quantizing the convolutional neural network.
Training a convolutional neural network established based on deep learning by using image sample data subjected to data labeling and data enhancement to obtain the trained convolutional neural network, wherein the convolutional neural network is used for carrying out image recognition on an image to determine road elements and vehicle information in the image;
and carrying out parameter value quantization processing on the floating point type parameter value of each convolution kernel parameter in the trained convolution neural network to obtain the trained lightweight convolution neural network.
The data labeling includes labeling road elements and vehicle information, and the specific data labeling mode is not limited, and can be set by a person skilled in the art according to actual needs.
In general, the parameter value of each convolution kernel parameter in the convolutional neural network is a float32 floating-point value, whose range is approximately −3.4 × 10^38 to +3.4 × 10^38; processing such parameters is therefore very computation-intensive. The floating-point parameter values can be quantized by mapping each float32 value to an int8 integer value in the range [−128, +127], which further reduces the size of the convolutional neural network and improves its processing efficiency.
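A minimal sketch of this parameter value quantization step follows, assuming simple symmetric per-tensor quantization with NumPy; the function names and mapping scheme are illustrative assumptions, as the embodiments do not fix a particular mapping.

```python
import numpy as np

def quantize_int8(weights: np.ndarray):
    """Map float32 weights to int8 values plus a scale for dequantization."""
    scale = max(np.abs(weights).max(), 1e-8) / 127.0  # largest magnitude -> 127
    q = np.clip(np.round(weights / scale), -128, 127).astype(np.int8)
    return q, scale

def dequantize(q: np.ndarray, scale: float) -> np.ndarray:
    """Recover approximate float32 weights from the int8 values."""
    return q.astype(np.float32) * scale
```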
A process of training a lightweight convolutional neural network by combining the first training mode to the third training mode is given below, and as shown in fig. 6, the process specifically includes the following steps:
step S601, creating a convolutional neural network comprising a first set number of depth separable convolutional layers based on depth learning, wherein the depth separable convolutional layers comprise depth convolutional constants and point-by-point convolutional positions, and the convolutional neural network is used for carrying out image recognition on the image to determine road elements and vehicle information in the image.
Step S602, training the created convolutional neural network by using the image sample data subjected to data labeling and data enhancement to obtain the trained convolutional neural network.
Step S603, determining the loss influence degree of each convolution kernel parameter in the trained convolutional neural network, and cutting off the convolution kernel of which the loss influence degree is smaller than the influence degree threshold value in the trained convolutional neural network to obtain the convolutional neural network subjected to pruning processing.
And step S604, performing parameter value quantization processing on the floating point type parameter value of each convolution kernel parameter in the convolution neural network after pruning processing to obtain the trained lightweight convolution neural network.
As an embodiment, the trained lightweight convolutional neural network in step S202 includes a first lightweight convolutional neural network, or the trained lightweight convolutional neural network includes a first lightweight convolutional neural network and a second lightweight convolutional neural network; the first lightweight convolutional neural network is used for carrying out target detection on the target area image and determining vehicle information and road elements (such as pedestrians, traffic lights, bus stop boards and the like) except lane lines in the target area image; and the second lightweight convolutional neural network is used for detecting the lane lines of the target area image and determining the lane lines in the target area image.
Specifically, the first lightweight convolutional neural network may be a target detection network, and if the violation condition in step S203 does not relate to a lane line, the target area image may be image-recognized only by using the target detection network in the vehicle violation detection process.
The second lightweight convolutional neural network may be a lane line detection network, and if the violation condition in step S203 relates to a lane line, in the vehicle violation detection process, the lane line in the target area image needs to be identified by using the lane line detection network, and the vehicle information in the target area image needs to be identified by using the target detection network, or the vehicle information and road elements except the lane line need to be identified.
The violation condition is information used when judging whether a vehicle satisfying the violation condition exists in the target area image.
The lane line detection network mentioned above is explained below.
The lane line detection network in the embodiments of the present application may include two network branches: an embedding network branch and a segmentation network branch. The segmentation branch determines whether each pixel in the target area image is a lane line pixel, and the embedding branch performs feature mapping on the pixels that belong to lane lines, so that individual lane line pixels can be distinguished from each other at the same time as the positions of the lane lines are detected.
Referring to fig. 7, after a target area image is input into the lane line detection network, it first passes through a shared encoding layer and is then processed by the embedding network branch and the segmentation network branch. The embedding branch outputs a pixel-embedding image with the same resolution as the input (the pixel embeddings in fig. 7), while the segmentation branch processes the input and outputs a binary lane line segmentation image marking which pixels are lane line pixels. The lane line detection network then combines the pixel-embedding image and the binary segmentation image into a pixel clustering image, from which the lane line segmentation result of the input target area image is obtained.
In the above process, in the binary lane line segmentation map, the pixel value of the lane line pixel may be set to 1, and the pixel values of other pixels except the lane line pixel may be set to 0.
As an example, the lane lines include solid line lane lines and dotted line lane lines, i.e., the pixels include solid line lane line pixels and dotted line lane line pixels; further, in outputting the binary lane segmentation map, the pixel value of the solid line lane line pixel may be set to 1, the pixel value of the dotted line lane line pixel may be set to 2, and the pixel value of the non-lane pixel that does not belong to the solid line lane line pixel and does not belong to the dotted line lane line pixel may be set to 0.
After the segmentation network branch has determined whether each pixel is a non-lane pixel, a solid lane line pixel, or a dotted lane line pixel, the embedding network branch performs feature mapping on the pixels belonging to lane lines (i.e., the solid and dotted lane line pixels). The embedding branch is trained so that pixels belonging to the same lane line map to very similar features while pixels of different lane lines map to clearly different features; exploiting these feature differences, the pixels belonging to the same lane line can be grouped by clustering and fitted, yielding one lane line of the lane line segmentation result.
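A minimal sketch of this clustering step, assuming NumPy and scikit-learn, is given below: lane line pixels found by the segmentation branch are grouped by the similarity of their embedding features, and each cluster is fitted as one lane line. DBSCAN and the quadratic fit are illustrative choices, not prescribed by the embodiments.

```python
import numpy as np
from sklearn.cluster import DBSCAN

def fit_lane_lines(binary_mask: np.ndarray, embeddings: np.ndarray):
    """binary_mask: (H, W) lane/non-lane; embeddings: (H, W, E) per-pixel features."""
    ys, xs = np.nonzero(binary_mask)
    feats = embeddings[ys, xs]                    # embeddings of lane pixels only
    labels = DBSCAN(eps=0.5, min_samples=30).fit_predict(feats)
    lanes = []
    for k in set(labels) - {-1}:                  # -1 marks DBSCAN noise points
        pts_y, pts_x = ys[labels == k], xs[labels == k]
        lanes.append(np.polyfit(pts_y, pts_x, deg=2))  # fit x = f(y) per lane
    return lanes
```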
The object detection network mentioned above is explained below.
The object detection network may detect multiple objects in the image, classify the detected objects, and determine the locations of the detected objects in the image.
Referring to fig. 8, after the target area image is input into the object detection network, the network outputs a three-dimensional feature of shape (M, N, C), where M is the height of the output feature map, N is its width, and C is the feature dimension at each grid cell; C characterizes whether the object at that cell belongs to the foreground or the background and, if foreground, the object class it belongs to and its position in the target area image.
With continued reference to fig. 8, the object detection network in the embodiments of the present application may include, but is not limited to, a feature extraction network (Backbone) and a detection head network. After the target area image is input, a bounding box is predicted for each target through the feature extraction network and the detection head network; each target's position is determined from its predicted bounding box, along with the confidence of the target class it belongs to — for example, target 801 belongs to the class vehicle with 95.5% confidence, target 802 to vehicle with 70%, target 803 to vehicle with 90.6%, and target 804 to pedestrian with 85.8%. The confidence represents the probability that the detected target belongs to the target class associated with its bounding box.
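For illustration only, a minimal sketch of reading such an (M, N, C) output is shown below; the channel layout assumed here (one objectness score, then class scores, then four box values) is a hypothetical choice, as the embodiments do not specify it.

```python
import numpy as np

def decode_detections(feat: np.ndarray, num_classes: int, conf_thresh: float = 0.5):
    """feat: (M, N, C) with C = 1 objectness + num_classes scores + 4 box values."""
    detections = []
    M, N, _ = feat.shape
    for i in range(M):
        for j in range(N):
            obj = feat[i, j, 0]
            scores = feat[i, j, 1:1 + num_classes]
            cls = int(scores.argmax())
            conf = obj * scores[cls]
            if conf >= conf_thresh:               # keep confident foreground targets
                box = feat[i, j, 1 + num_classes:1 + num_classes + 4]
                detections.append((cls, float(conf), box))
    return detections
```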
In this embodiment, the object categories may include, but are not limited to, vehicles and one or more road elements such as pedestrians, traffic lights, and traffic signs. When the object categories include a traffic light, the color of the light (red, green, or yellow) may be detected as follows:
after the position of the traffic signal lamp is detected through the bounding box predicted by the target detection network, a region of interest (ROI) of the traffic signal lamp in the image is obtained, the number of pixels close to red, green and yellow in the ROI is counted respectively, the color with the largest number of pixels is determined as the color of the traffic signal lamp, and if the number of pixels close to red in the ROI is the largest, the color currently displayed by the traffic signal lamp is red (that is, the current traffic signal lamp is red).
The following describes how it is determined in step S203 that a vehicle satisfies the violation condition.
It should be noted that the following vehicle to be detected is any one or more vehicles in the acquired target area image.
The first violation decision: lane change across the solid line.
Lane change across the solid line means the vehicle crosses a solid lane line while changing lanes. In this embodiment, whether a vehicle satisfies this violation condition can be determined from the center point of the bottom edge of its bounding box and the position of the detected solid lane line.
In the embodiment of the application, whether a vehicle satisfying the lane change across the solid line exists in the target area image may be determined from a single frame of the target area image. Specifically, this is determined from the distance between the center point of the bottom edge of the bounding box of the vehicle to be detected and the detected solid lane line: a vehicle to be detected whose shortest distance from that center point to any detected solid lane line is smaller than a preset distance is determined to be a vehicle satisfying the lane change across the solid line.
As shown in fig. 9, the distance from the center point O of the bottom edge of the bounding box of the vehicle 901 to the solid lane line A is the shortest distance from O to any detected solid lane line. As can be seen from the figure, this distance is noticeably small; if it is smaller than the preset distance, the vehicle 901 to be detected satisfies the violation condition of lane change across the solid line.
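The single-frame check can be sketched as follows. The bounding-box format (x1, y1, x2, y2), the representation of a solid lane line as a pair of endpoints, and the pixel threshold are assumptions made for illustration.

```python
import numpy as np

def bottom_center(box):
    """box = (x1, y1, x2, y2); return the center of its bottom edge."""
    x1, y1, x2, y2 = box
    return np.array([(x1 + x2) / 2.0, y2])

def point_to_segment_distance(p, a, b):
    """Shortest distance from point p to the segment with endpoints a, b."""
    p, a, b = map(np.asarray, (p, a, b))
    ab = b - a
    denom = float(np.dot(ab, ab))
    if denom == 0.0:            # degenerate segment
        return float(np.linalg.norm(p - a))
    t = np.clip(np.dot(p - a, ab) / denom, 0.0, 1.0)
    return float(np.linalg.norm(p - (a + t * ab)))

def violates_single_frame(box, solid_lines, dist_thresh=15.0):
    """Flag a vehicle whose bottom-edge center lies within dist_thresh
    pixels of any detected solid lane line (threshold illustrative)."""
    p = bottom_center(box)
    return any(point_to_segment_distance(p, a, b) < dist_thresh
               for a, b in solid_lines)
```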
In this embodiment of the application, whether a vehicle satisfying the lane change across the solid line exists may also be determined from at least two successively acquired frames of target area images. Specifically, a vehicle to be detected whose bottom-edge center point crosses from one side of a solid lane line to the other across the at least two frames may be determined as a vehicle satisfying the lane change across the solid line.
Referring to fig. 9 and fig. 10, assume that fig. 10 is the target area image acquired before fig. 9. The center point O of the bottom edge of the bounding box of the vehicle 901 to be detected is on the right side of the solid lane line A in fig. 10 and on its left side in fig. 9; since fig. 10 was acquired before fig. 9, the center point O crossed from one side of the solid lane line to the other, and the vehicle 901 to be detected is determined to satisfy the violation condition of lane change across the solid line.
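The two-frame check reduces to testing on which side of the solid lane line the bottom-edge center point lies in each frame; representing points as (x, y) pairs and the lane line by two points is an assumption made for illustration.

```python
def side_of_line(p, a, b):
    """Which side of the directed line a->b the point p lies on:
    +1 = left, -1 = right, 0 = exactly on the line."""
    cross = (b[0] - a[0]) * (p[1] - a[1]) - (b[1] - a[1]) * (p[0] - a[0])
    return (cross > 0) - (cross < 0)

def crossed_solid_line(p_prev, p_curr, a, b):
    """True if the bottom-edge center moved from one side of the solid
    lane line a-b to the other between two successive frames."""
    s_prev = side_of_line(p_prev, a, b)
    s_curr = side_of_line(p_curr, a, b)
    return s_prev != 0 and s_curr != 0 and s_prev != s_curr
```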
The specific number of frames in the at least two frames of target area images is not limited, and those skilled in the art may set it according to actual requirements; for example, the target area images acquired within a first set time period may be taken as the at least two frames, or the currently processed target area image together with a second set number of target area images acquired before it may be taken as the at least two frames.
The second violation condition: running a red light.
If the color displayed by the traffic signal lamp in at least two successively acquired target area images is red, a vehicle to be detected whose driving state is forward is determined to satisfy the violation condition of running a red light.
Referring to fig. 11, the upper target area image 1110 and the lower target area image 1120 are two successively acquired frames. From the vehicle positions given by the bounding boxes of the vehicles 1101 to 1103 to be detected in the two frames, it can be seen that, between the acquisition of image 1110 and image 1120, the driving state of the vehicle 1101 to be detected is static, while the driving states of the vehicles 1102 and 1103 to be detected are forward. Since the color currently displayed by the traffic signal lamp is red, i.e., vehicle passage is currently forbidden, the vehicles 1102 and 1103 to be detected are determined to satisfy the violation condition of running a red light.
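A simplified version of this decision is sketched below; treating "forward" motion as the bottom-edge center moving up the image, and the pixel threshold, are illustrative assumptions rather than details fixed by the embodiment.

```python
def runs_red_light(positions, light_color, min_forward_px=5.0):
    """positions: bottom-edge center points (x, y) of the same vehicle
    in successively acquired frames, earliest first. A vehicle that
    keeps advancing while the light is red is flagged."""
    if light_color != "red" or len(positions) < 2:
        return False
    dy = positions[0][1] - positions[-1][1]  # forward assumed to shrink y
    return dy > min_forward_px
```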
The third violation condition: overspeed driving.
In the embodiment of the application, distance can be estimated with the monocular camera: the positions of a vehicle to be detected in at least two successively acquired target area images are determined through the target detection network, the speed of that vehicle across those frames is calculated in combination with the speed of the vehicle 100, and a vehicle to be detected whose speed exceeds the speed limit of the current road is determined as a vehicle satisfying the violation condition of overspeed driving.
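In outline, the speed estimate combines the change in monocular range with the ego vehicle's own speed; the sketch below assumes both vehicles travel in the same direction, which is a simplification.

```python
def estimate_target_speed(dist_prev_m, dist_curr_m, dt_s, ego_speed_mps):
    """Monocular ranging gives the distance to the target vehicle in two
    frames dt_s apart; the target's absolute speed is the ego speed plus
    the relative speed derived from the range change."""
    relative_speed = (dist_curr_m - dist_prev_m) / dt_s
    return ego_speed_mps + relative_speed

# Example: the target pulled 2 m further ahead over 0.5 s while the ego
# vehicle drove at 20 m/s -> estimate_target_speed(30.0, 32.0, 0.5, 20.0)
# returns 24.0 m/s, which is then compared against the current road limit.
```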
The fourth violation condition: not avoiding pedestrians.
Based on the color currently displayed by the traffic signal lamp, when a pedestrian is detected within a set distance ahead and, according to at least two successively acquired target area images, the vehicle to be detected is determined to still advance at a speed higher than a second speed threshold, the vehicle to be detected is determined to satisfy the violation condition of not avoiding pedestrians; or,
if a pedestrian is on the same zebra crossing in the at least two frames of target area images, the driving state of the vehicle to be detected is determined based on its positions in those frames, where the driving state may include, but is not limited to, accelerating, driving at constant speed, decelerating, and parking; a vehicle to be detected whose driving state is constant-speed driving, accelerating, or decelerating may then be determined as a vehicle satisfying the violation condition of not avoiding pedestrians.
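The driving-state classification and the yield check might be sketched as follows; the speed tolerance eps and the rule that only "parked" counts as yielding are illustrative assumptions.

```python
def driving_state(speeds_mps, eps=0.3):
    """Coarse driving state from per-frame speed estimates (m/s)."""
    if all(v < eps for v in speeds_mps):
        return "parked"
    if speeds_mps[-1] > speeds_mps[0] + eps:
        return "accelerating"
    if speeds_mps[-1] < speeds_mps[0] - eps:
        return "decelerating"
    return "constant"

def fails_to_yield(speeds_mps, pedestrian_on_crossing):
    """Any state other than 'parked' while a pedestrian is on the zebra
    crossing is treated as not avoiding the pedestrian."""
    return pedestrian_on_crossing and driving_state(speeds_mps) != "parked"
```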
A specific example of vehicle violation detection according to the embodiments of the present application is provided below.
In this example, the vehicle-mounted device 110 uses an MTK 8665 CPU (4-core Cortex-A53 at 1.5 GHz), the vehicle-mounted image acquisition device 120 is a camera device of one million pixels or more, the trained lightweight convolutional neural network includes a target detection network and a lane line detection network, and the road elements include lane lines.
Referring to fig. 12, this example mainly comprises two processes: training the lightweight convolutional neural network, and detecting vehicle violations with the trained lightweight convolutional neural network. The training process mainly includes the following steps:
Step S1201, create a convolutional neural network including depthwise separable convolutional layers, based on deep learning.
Step S1202, perform data annotation and data enhancement processing on the image sample data.
Step S1203, training the created convolutional neural network by using the processed image sample data.
Step S1204, prune the trained convolutional neural network.
Step S1205, perform parameter value quantization on the pruned convolutional neural network to obtain the trained lightweight convolutional neural network.
Step S1206, deploying the code running the trained lightweight convolutional neural network on the vehicle-mounted device 110, and performing Neon assembly acceleration on the code running the trained lightweight convolutional neural network.
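A compact PyTorch-style sketch of the model-side steps above: a depthwise separable convolution block (step S1201), magnitude-based filter ranking as one concrete proxy for the loss influence used in pruning (step S1204), and symmetric int8 quantization of parameter values (step S1205). The layer sizes, the L1-norm proxy, and the int8 scheme are illustrative assumptions rather than choices fixed by this example.

```python
import torch
import torch.nn as nn

class DepthwiseSeparableConv(nn.Module):
    """Step S1201: a depthwise (per-channel) 3x3 convolution followed by
    a 1x1 pointwise convolution, the lightweight building block."""
    def __init__(self, in_ch, out_ch, stride=1):
        super().__init__()
        self.depthwise = nn.Conv2d(in_ch, in_ch, 3, stride=stride,
                                   padding=1, groups=in_ch, bias=False)
        self.pointwise = nn.Conv2d(in_ch, out_ch, 1, bias=False)
        self.bn = nn.BatchNorm2d(out_ch)
        self.act = nn.ReLU(inplace=True)

    def forward(self, x):
        return self.act(self.bn(self.pointwise(self.depthwise(x))))

def filters_to_keep(conv: nn.Conv2d, keep_ratio: float = 0.7):
    """Step S1204 (illustrative): rank filters by L1 norm, a common
    proxy for their influence on the loss, and return indices to keep."""
    scores = conv.weight.detach().abs().sum(dim=(1, 2, 3))
    n_keep = max(1, int(keep_ratio * scores.numel()))
    return torch.argsort(scores, descending=True)[:n_keep]

def quantize_int8(w: torch.Tensor):
    """Step S1205 (illustrative): symmetric linear quantization of float
    parameter values to int8; dequantize with q.float() * scale."""
    scale = w.abs().max() / 127.0
    q = torch.clamp(torch.round(w / scale), -127, 127).to(torch.int8)
    return q, scale
```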
The process for detecting the violation of the vehicle by utilizing the trained lightweight convolutional neural network mainly comprises the following steps:
step S1207, inputting one frame of target area image currently acquired by the vehicle-mounted image acquisition device 120, or at least two frames of target area images continuously acquired within a set duration, into the trained lightweight convolutional neural network, detecting vehicle information and road elements such as pedestrians and traffic lights in the target area image by using the target detection network, and detecting a lane line in the target area image by using the lane line detection network.
And step S1208, judging whether the input target area image has a vehicle meeting the violation conditions or not based on the detected vehicle information and road elements, if so, entering step S1209, otherwise, entering step S1207.
Step S1209, intercepting the input target area image, displaying the intercepted target area image in a vehicle-mounted display device, and determining, by a driver of the vehicle 100, whether a vehicle meeting a violation condition exists in the intercepted target area image, if so, the driver may trigger the vehicle-mounted device to send violation report information to an application server of a traffic administration, where the violation report information at least includes the intercepted target area image.
In the embodiment of the application, the trained lightweight convolutional neural network is obtained by compressing a convolutional neural network, so the vehicle-mounted device is able to run it. The vehicle-mounted device of each vehicle can therefore perform image recognition directly through the trained lightweight convolutional neural network and identify vehicles satisfying the violation conditions on the device itself, without concentrating target area images on an application server for processing; this improves the efficiency of vehicle violation detection and saves the traffic consumed by sending target area images. Moreover, because the trained lightweight convolutional neural network is smaller in scale than a general convolutional neural network, its processing efficiency is higher, which further improves the efficiency of vehicle violation detection.
Referring to fig. 13, based on the same inventive concept, an embodiment of the present application provides a vehicle violation detection apparatus 1300, including:
an image capturing unit 1301 configured to acquire a target area image of the periphery of the vehicle, where the target area image is captured by a vehicle-mounted image mounted on the vehicle;
an image recognition unit 1302, configured to perform image recognition on the target area image by using a trained lightweight convolutional neural network and determine the road elements and vehicle information in the target area image, where the lightweight convolutional neural network is obtained by training with image sample data after data labeling and data enhancement, and the data labeling includes labeled road elements and vehicle information;
and a violation determining unit 1303 configured to send violation report information to an application server through a mobile communication network when determining that the target area image has a vehicle satisfying a violation condition based on the road element and the vehicle information in the target area image, where the violation report information at least includes the target area image.
As an embodiment, the trained lightweight convolutional neural network includes a set number of depthwise separable convolutional layers; each depthwise separable convolutional layer includes a depthwise convolution and a pointwise convolution, and the set number of depthwise separable convolutional layers are used for performing convolution operations on the target area image multiple times to obtain the road elements and vehicle information in the target area image.
As an embodiment, the image recognition unit is specifically configured to obtain the trained lightweight convolutional neural network by:
training a convolutional neural network established based on deep learning by using the image sample data to obtain a trained convolutional neural network;
determining the loss influence degree of each convolution kernel parameter in the trained convolutional neural network, and cutting off the convolution kernel of which the loss influence degree is smaller than the influence degree threshold value in the trained convolutional neural network to obtain the trained lightweight convolutional neural network, wherein the loss influence degree represents the influence degree of the convolution kernel parameter on the loss function of the trained convolutional neural network.
As an embodiment, the image recognition unit is specifically configured to obtain the trained lightweight convolutional neural network by:
training the convolutional neural network established based on deep learning by using the image sample data to obtain the trained convolutional neural network;
and carrying out parameter value quantization processing on the floating point type parameter value of each convolution kernel parameter in the trained convolution neural network to obtain the trained lightweight convolution neural network.
As an embodiment, the road element includes one or more of a lane line, a pedestrian, a traffic light, and a traffic sign, the trained lightweight convolutional neural network includes a first lightweight convolutional neural network, or the trained lightweight convolutional neural network includes a first lightweight convolutional neural network and a second lightweight convolutional neural network, wherein:
the first lightweight convolutional neural network is used for carrying out target detection on the target area image and determining vehicle information and road elements except the lane line in the target area image;
the second lightweight convolutional neural network is used for detecting the lane lines of the target area image and determining the lane lines in the target area image.
As an embodiment, the vehicle violation detecting device further includes:
and the image recognition accelerating unit is used for performing Neon assembly acceleration on the codes running the trained lightweight convolutional neural network so as to support parallel processing on the convolution operation of the same convolutional layer in the trained lightweight convolutional neural network through single instruction multiple data stream SIMD when the trained lightweight convolutional neural network is used for performing image recognition on the target area image.
As an example, the violation condition may include one or more of crossing a solid line to change lanes, not avoiding a pedestrian, running a red light, and speeding.
As an embodiment, the vehicle information includes a vehicle type, and the violation condition corresponds to the vehicle type.
As an example, the apparatus of fig. 13 may be used to implement any of the vehicle violation detection methods discussed above.
Based on the same inventive concept, the embodiment of the present application provides a mobile terminal device 1400, which is described below.
Referring to fig. 14, the mobile terminal apparatus 1400 includes a display unit 1410, a processor 1420, an image capture device 1430, and a memory 1440, wherein the display unit 1410 includes a display panel 1441 for displaying information input by or provided to the user, the target area image acquired by the in-vehicle image capture device 120, a navigation interface, and the like.
The mobile terminal device 1400 may be the in-vehicle device 110 of the vehicle 100, and the image capturing device may be the in-vehicle image capturing device 120 of the vehicle 100.
Optionally, the display panel 1441 may be configured in the form of a liquid crystal display (LCD), an organic light-emitting diode (OLED) display, or the like.
The processor 1420 is configured to read a computer program and then execute a method defined by the computer program; for example, the processor 1420 reads the target area image acquired by the image capture device 120, or reads the navigation application so that the navigation application runs on the mobile terminal device and its interface is displayed on the display unit 1410. The processor 1420 may include one or more general-purpose processors, and may further include one or more DSPs (Digital Signal Processors) for performing relevant operations to implement the technical solutions provided in the embodiments of the present application.
Memory 1440 generally includes internal memory and external memory; the internal memory may be random access memory (RAM), read-only memory (ROM), and cache (CACHE), and the external memory may be a hard disk, an optical disk, a USB disk, a floppy disk, or a tape drive. The memory 1440 is used for storing computer programs, including client applications, and other data, which may include data generated after the operating system or the applications are run, including system data (e.g., configuration parameters of the operating system) and user data. In the embodiments of the present application, the program instructions stored in the memory 1440 are executed by the processor 1420 to implement any of the vehicle violation detection methods discussed above.
The display unit 1410 is used for receiving input digital information, character information, or touch/non-touch gestures, and for generating signal input related to user settings and function control of the mobile terminal device. Specifically, in the embodiment of the present application, the display unit 1410 may include the display panel 1441. The display panel 1441, such as a touch screen, may collect touch operations of the user on or near it (for example, operations performed by the user on or near the display panel 1441 with a finger, a stylus, or any other suitable object or accessory) and drive the corresponding connection device according to a preset program. Optionally, the display panel 1441 may include two parts: a touch detection device and a touch controller. The touch detection device detects the user's touch position, detects the signal brought by the touch operation, and transmits the signal to the touch controller; the touch controller receives the touch information from the touch detection device, converts it into touch point coordinates, sends the coordinates to the processor 1420, and can receive and execute commands sent from the processor 1420.
The display panel 1441 may be implemented in various types, such as resistive, capacitive, infrared, and surface acoustic wave. In addition to the display unit 1410, the mobile terminal device may further include an input unit 1450, and the input unit 1450 may include, but is not limited to, one or more of a physical keyboard, function keys (such as volume control keys and on/off keys), a trackball, a mouse, a joystick, and the like.
In addition to the above, the mobile terminal apparatus 1400 may further include a power supply 1460 for supplying power to the other modules, an audio circuit 1470, a near field communication module 1480, and an RF circuit 1490. The mobile terminal device may also include one or more sensors 1411, such as an acceleration sensor, a light sensor, and a pressure sensor. The audio circuit 1470 specifically includes a speaker 1471, a microphone 1472, and the like; for example, the mobile terminal device may collect the user's voice through the microphone 1472 and perform corresponding operations.
For one embodiment, the number of the processors 1420 may be one or more, and the processors 1420 and the memories 1440 may be coupled or relatively independent.
As an example, the processor 1420 in fig. 14 may be used to implement the functions of the image acquisition unit 1301, the image recognition unit 1302, and the violation determination unit 1303 in fig. 13.
Based on the same technical concept, the embodiment of the present application also provides a computer-readable storage medium, which stores computer instructions, and when the computer instructions are executed on a computer, the computer is caused to execute the vehicle violation detection method as discussed in the foregoing.
As will be appreciated by one skilled in the art, embodiments of the present application may be provided as a method, system, or computer program product. Accordingly, the present application may take the form of an entirely hardware embodiment, an entirely software embodiment or an embodiment combining software and hardware aspects. Furthermore, the present application may take the form of a computer program product embodied on one or more computer-usable storage media (including, but not limited to, disk storage, CD-ROM, optical storage, and the like) having computer-usable program code embodied therein.
It will be apparent to those skilled in the art that various changes and modifications may be made in the present application without departing from the spirit and scope of the application. Thus, if such modifications and variations of the present application fall within the scope of the claims of the present application and their equivalents, the present application is intended to include such modifications and variations as well.

Claims (10)

1. A vehicle violation detection method is applied to vehicle-mounted equipment installed on a vehicle, and comprises the following steps:
acquiring a target area image of the periphery of the vehicle, wherein the target area image is acquired by a vehicle-mounted image acquisition device mounted on the vehicle;
carrying out image recognition on the target area image by using a trained lightweight convolutional neural network, and determining road elements and vehicle information in the target area image, wherein the lightweight convolutional neural network is obtained by using image sample data after data labeling and data enhancement, and the data labeling comprises labeling road elements and vehicle information;
and based on the road elements and the vehicle information in the target area image, when determining that the vehicle meeting the violation conditions exists in the target area image, sending violation report information to an application server through a mobile communication network, wherein the violation report information at least comprises the target area image.
2. The method of claim 1, wherein the trained lightweight convolutional neural network comprises a set number of depthwise separable convolutional layers; each depthwise separable convolutional layer comprises a depthwise convolution and a pointwise convolution, and the set number of depthwise separable convolutional layers are used for performing convolution operations on the target area image multiple times to obtain road elements and vehicle information in the target area image.
3. The method of claim 1, wherein the trained lightweight convolutional neural network is obtained by:
training the convolutional neural network established based on deep learning by using the image sample data to obtain the trained convolutional neural network;
determining the loss influence degree of each convolution kernel parameter in the trained convolutional neural network, and cutting off the convolution kernel of which the loss influence degree is smaller than an influence degree threshold value in the trained convolutional neural network to obtain the trained lightweight convolutional neural network, wherein the loss influence degree represents the influence degree of the convolution kernel parameter on the loss function of the trained convolutional neural network.
4. The method of claim 1, wherein the trained lightweight convolutional neural network is obtained by:
training the convolutional neural network established based on deep learning by using the image sample data to obtain the trained convolutional neural network;
and carrying out parameter value quantization processing on the floating point type parameter value of each convolution kernel parameter in the trained convolutional neural network to obtain the trained lightweight convolutional neural network.
5. The method of any one of claims 1-4, wherein the road element comprises one or more of a lane line, a pedestrian, a traffic light, a traffic sign, the trained lightweight convolutional neural network comprises a first lightweight convolutional neural network, or the trained lightweight convolutional neural network comprises a first lightweight convolutional neural network and a second lightweight convolutional neural network, wherein:
the first lightweight convolutional neural network is used for carrying out target detection on the target area image and determining vehicle information and road elements except the lane line in the target area image;
and the second lightweight convolutional neural network is used for carrying out lane line detection on the target area image and determining a lane line in the target area image.
6. The method of any one of claims 1-4, further comprising:
neon assembly acceleration is carried out on the codes which run the trained lightweight convolutional neural network, so that when the trained lightweight convolutional neural network is used for carrying out image recognition on the target area image, parallel processing on convolution operation of the same convolutional layer in the trained lightweight convolutional neural network is supported through single instruction multiple data stream SIMD.
7. The method of any one of claims 1-4 wherein the violation conditions include one or more of crossing a solid line to change lanes, not avoiding a pedestrian, running a red light, speeding.
8. A vehicle violation detection device, comprising:
the image acquisition unit is used for acquiring a target area image around the vehicle, the target area image being acquired by a vehicle-mounted image acquisition device mounted on the vehicle;
the image identification unit is used for carrying out image identification on the target area image by utilizing a trained lightweight convolutional neural network and determining road elements and vehicle information in the target area image, wherein the lightweight convolutional neural network is obtained by utilizing image sample data after data marking and data enhancement, and the data marking comprises marked road elements and vehicle information;
and the violation determining unit is used for sending violation report information to an application server through a mobile communication network when determining that the target area image has a vehicle meeting violation conditions based on the road elements and the vehicle information in the target area image, wherein the violation report information at least comprises the target area image.
9. A computer device comprising a memory, a processor and a computer program stored on the memory and executable on the processor, characterized in that the steps of the method of any of claims 1-7 are implemented when the program is executed by the processor.
10. A computer-readable storage medium having stored thereon computer instructions which, when executed on a computer, cause the computer to perform the method of any one of claims 1-7.
CN202010504223.7A 2020-06-05 2020-06-05 Vehicle violation detection method, device and equipment and computer storage medium Active CN111695483B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202010504223.7A CN111695483B (en) 2020-06-05 2020-06-05 Vehicle violation detection method, device and equipment and computer storage medium

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202010504223.7A CN111695483B (en) 2020-06-05 2020-06-05 Vehicle violation detection method, device and equipment and computer storage medium

Publications (2)

Publication Number Publication Date
CN111695483A true CN111695483A (en) 2020-09-22
CN111695483B CN111695483B (en) 2022-04-08

Family

ID=72479489

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202010504223.7A Active CN111695483B (en) 2020-06-05 2020-06-05 Vehicle violation detection method, device and equipment and computer storage medium

Country Status (1)

Country Link
CN (1) CN111695483B (en)



Patent Citations (7)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
WO2018223822A1 (en) * 2017-06-07 2018-12-13 北京深鉴智能科技有限公司 Pruning- and distillation-based convolutional neural network compression method
CN109523017A (en) * 2018-11-27 2019-03-26 广州市百果园信息技术有限公司 Compression method, device, equipment and the storage medium of deep neural network
CN109671020A (en) * 2018-12-17 2019-04-23 北京旷视科技有限公司 Image processing method, device, electronic equipment and computer storage medium
CN109615869A (en) * 2018-12-29 2019-04-12 重庆集诚汽车电子有限责任公司 Distributed locomotive real-time intelligent is violating the regulations to capture reporting system
CN110009096A (en) * 2019-03-06 2019-07-12 开易(北京)科技有限公司 Target detection network model optimization method based on embedded device
CN111161543A (en) * 2019-11-14 2020-05-15 南京行者易智能交通科技有限公司 Automatic snapshot method and system for bus front violation behavior based on image recognition
CN111160205A (en) * 2019-12-24 2020-05-15 江苏大学 Embedded multi-class target end-to-end unified detection method for traffic scene

Non-Patent Citations (2)

* Cited by examiner, † Cited by third party
Title
SHIZHAO SUN ET AL.: "Ensemble-Compression: A New Method for Parallel Training of Deep Neural Networks", Machine Learning and Knowledge Discovery in Databases *
靳丽蕾 (Jin Lilei) et al.: "一种用于卷积神经网络压缩的混合剪枝方法" [A hybrid pruning method for convolutional neural network compression], 《小型微型计算机系统》 [Journal of Chinese Computer Systems] *

Cited By (19)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN112417953A (en) * 2020-10-12 2021-02-26 腾讯科技(深圳)有限公司 Road condition detection and map data updating method, device, system and equipment
CN112270244A (en) * 2020-10-23 2021-01-26 平安科技(深圳)有限公司 Target violation monitoring method and device, electronic equipment and storage medium
CN112307989A (en) * 2020-11-03 2021-02-02 广州海格通信集团股份有限公司 Method and device for identifying road surface object, computer equipment and storage medium
CN112307989B (en) * 2020-11-03 2024-05-03 广州海格通信集团股份有限公司 Road surface object identification method, device, computer equipment and storage medium
EP4241204A4 (en) * 2020-11-09 2024-10-02 Hayden Ai Tech Inc Lane violation detection using convolutional neural networks
CN112686136B (en) * 2020-12-29 2023-04-18 上海高德威智能交通系统有限公司 Object detection method, device and system
CN112597945A (en) * 2020-12-29 2021-04-02 上海眼控科技股份有限公司 Vehicle detection method and device
CN112686136A (en) * 2020-12-29 2021-04-20 上海高德威智能交通系统有限公司 Object detection method, device and system
CN113052037A (en) * 2021-03-16 2021-06-29 蔡勇 Method for judging moving vehicle and human shape by adopting AI technology
CN113052048A (en) * 2021-03-18 2021-06-29 北京百度网讯科技有限公司 Traffic incident detection method and device, road side equipment and cloud control platform
CN113052048B (en) * 2021-03-18 2024-05-10 阿波罗智联(北京)科技有限公司 Traffic event detection method and device, road side equipment and cloud control platform
CN113033496A (en) * 2021-04-30 2021-06-25 北京小马慧行科技有限公司 Vehicle type determination method and device
CN113158989A (en) * 2021-05-19 2021-07-23 北京骑胜科技有限公司 Two-wheeled vehicle parking violation detection method and device, electronic equipment and readable storage medium
CN113657316B (en) * 2021-08-23 2024-07-23 平安科技(深圳)有限公司 Mobile traffic violation monitoring method, system, electronic equipment and storage medium
CN113657316A (en) * 2021-08-23 2021-11-16 平安科技(深圳)有限公司 Mobile traffic violation monitoring method, system, electronic equipment and storage medium
CN113781778A (en) * 2021-09-03 2021-12-10 新奇点智能科技集团有限公司 Data processing method and device, electronic equipment and readable storage medium
CN114218435A (en) * 2021-12-13 2022-03-22 以萨技术股份有限公司 Traffic labeling method, system and storage medium based on real-time video stream
WO2023116642A1 (en) * 2021-12-24 2023-06-29 北京车和家汽车科技有限公司 Preceding-vehicle behavior identification and processing method, preceding-vehicle behavior identification and processing apparatus, and device and storage medium
CN115240435A (en) * 2022-09-21 2022-10-25 广州市德赛西威智慧交通技术有限公司 AI technology-based vehicle illegal driving detection method and device

Also Published As

Publication number Publication date
CN111695483B (en) 2022-04-08


Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant
GR01 Patent grant