CN113469121A - Vehicle state identification method and device - Google Patents

Vehicle state identification method and device

Info

Publication number
CN113469121A
CN113469121A (application CN202110825545.6A)
Authority
CN
China
Prior art keywords
vehicle
network model
point
image
determining
Prior art date
Legal status
Pending
Application number
CN202110825545.6A
Other languages
Chinese (zh)
Inventor
余言勋
杜治江
Current Assignee
Zhejiang Dahua Technology Co Ltd
Original Assignee
Zhejiang Dahua Technology Co Ltd
Priority date
Filing date
Publication date
Application filed by Zhejiang Dahua Technology Co Ltd
Priority: CN202110825545.6A
Publication: CN113469121A
Legal status: Pending

Classifications

    • G06F18/214: Pattern recognition; generating training patterns; bootstrap methods, e.g. bagging or boosting
    • G06F18/22: Pattern recognition; matching criteria, e.g. proximity measures
    • G06N3/045: Neural networks; combinations of networks
    • G06N3/08: Neural networks; learning methods
    • G06T7/0002: Image analysis; inspection of images, e.g. flaw detection
    • G06T2207/20068: Image analysis; projection on vertical or horizontal image axis
    • G06T2207/30252: Image analysis; vehicle exterior; vicinity of vehicle


Abstract

The application discloses a vehicle state identification method and device. An image to be recognized, obtained by photographing a target vehicle, is acquired; the image to be recognized is input into a first network model, and each vehicle projection point of the target vehicle is determined through the first network model; and the vehicle state of the target vehicle is determined based on the obtained vehicle projection points. By inputting the image to be recognized into the first network model, each vehicle projection point of the target vehicle in the image can be output quickly and accurately. Because a vehicle projection point is the contact point between a tire of the vehicle and the ground, it exists objectively and does not change with external factors. Therefore, once the first network model has accurately identified each vehicle projection point of the target vehicle, any subsequent judgment of the driving state made from those projection points is highly accurate, and the probability of misjudging the vehicle state is greatly reduced.

Description

Vehicle state identification method and device
Technical Field
The embodiments of the application relate to the field of intelligent transportation, and in particular to a vehicle state identification method and device.
Background
As living standards rise, purchasing power grows, and the transportation industry develops vigorously, there are more and more vehicles on the road. To keep traffic running smoothly and improve road safety, accurately identifying vehicles that violate traffic regulations is clearly of great significance.
At present, cameras are installed to capture images of vehicles travelling on the road, and the vehicle state is then recognized from the captured pictures to judge whether a vehicle is suspected of a violation. However, perspective foreshortening is objectively present in camera imaging: near objects appear large and far objects appear small. This causes the recognized vehicle state to deviate considerably from the actual situation, leading to false captures, missed captures, and similar consequences.
In summary, a method that can accurately identify the state of a vehicle is needed.
Disclosure of Invention
The application provides a vehicle state identification method and device for accurately identifying the state exhibited by a vehicle.
In a first aspect, an embodiment of the present application provides a method for identifying a vehicle state, where the method includes: acquiring an image to be identified; the image to be recognized is obtained by shooting a target vehicle; inputting the image to be recognized into a first network model, and determining each vehicle projection point of the target vehicle through the first network model; the vehicle projection point is a contact point between a tire of the vehicle and the ground; the first network model is used for identifying each vehicle projection point of the vehicle, the first network model is obtained by joint training with a second network model, and the second network model is obtained by taking the vehicle key point information of any sample image in a training set as model input and taking the vehicle projection point of the sample image as model output for training; and determining the vehicle state of the target vehicle based on the obtained vehicle projection points.
Based on the above scheme, by inputting the image to be recognized into the first network model, each vehicle projection point of the target vehicle in the image can be output quickly and accurately. Because a vehicle projection point is the contact point between a tire of the vehicle and the ground, it exists objectively and does not change with external factors. Therefore, once the first network model has accurately identified each vehicle projection point of the target vehicle, any subsequent judgment of the driving state made from those projection points is highly accurate, and the probability of misjudging the vehicle state is greatly reduced.
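For orientation, the sketch below shows the claimed flow end to end, under the assumption that the first network model maps an image tensor to a small set of 2-D projection-point coordinates. The stub architecture and all names here are illustrative placeholders, not the application's actual model.

```python
import torch
import torch.nn as nn

class FirstNetworkStub(nn.Module):
    """Stand-in for the trained first network model: image -> projection points."""
    def __init__(self, num_points: int = 4):
        super().__init__()
        self.num_points = num_points
        self.backbone = nn.Sequential(
            nn.Conv2d(3, 8, 3, stride=2, padding=1), nn.ReLU(),
            nn.AdaptiveAvgPool2d(1), nn.Flatten(),
            nn.Linear(8, num_points * 2),
        )

    def forward(self, img: torch.Tensor) -> torch.Tensor:
        return self.backbone(img).view(-1, self.num_points, 2)

def identify_vehicle_state(img: torch.Tensor, net: nn.Module) -> torch.Tensor:
    # Step 1: the image to be recognized is already given as a tensor.
    # Step 2: the first network model outputs the vehicle projection points.
    with torch.no_grad():
        proj_points = net(img)
    # Step 3: downstream rules (direction, steering, line press) consume
    # these points; concrete rules are sketched in later examples.
    return proj_points

img = torch.rand(1, 3, 224, 224)            # photographed target vehicle
print(identify_vehicle_state(img, FirstNetworkStub()).shape)  # (1, 4, 2)
```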
In one possible implementation, the first network model is obtained by joint training with a second network model, and the method includes: determining the characteristic information loss of any sample image in a training set through an initial first network model; inputting information of each real vehicle key point in the sample image into an initial second network model, and determining the structural information loss of the sample image through the initial second network model; and adjusting the initial first network model and the initial second network model according to the characteristic information loss and the structural information loss until training conditions are met to obtain the first network model and the second network model.
Based on the above scheme, the first network model that identifies each vehicle projection point in the image to be recognized is obtained by joint training with the second network model. During training of the two models, on one hand, after the initial first network model learns any sample image in the training set, the characteristic information loss of that sample image under the initial first network model can be calculated from the learning result; on the other hand, after the initial second network model learns the information of each real vehicle key point in the sample image, the structural information loss of that sample image under the initial second network model can be calculated from its learning result. With both losses obtained, the initial first network model and the initial second network model can each be adjusted by combining the two losses; another sample image in the training set is then learned with the currently adjusted models, and the adjusted models are trained again, until the training conditions are met. This yields a first network model and a second network model that can accurately identify each vehicle projection point of the vehicle in the image to be recognized. In this method, the two network models are jointly trained so that the vehicle projection points are predicted from different angles, and both models are adjusted according to the loss between the predicted values and the true values. The first network model finally obtained therefore has extremely high accuracy when recognizing each vehicle projection point in the image to be recognized, which facilitates accurate recognition of the vehicle state.
In one possible implementation, the determining, by an initial first network model, a loss of feature information of the sample image for any sample image in a training set includes: inputting the sample image into an initial first network model aiming at any sample image in a training set to obtain each predicted vehicle key point and each predicted first vehicle projection point of the vehicle; determining the characteristic information loss of the sample image according to the first loss of each real vehicle key point and each predicted vehicle key point in the sample image under the same vehicle key point and according to the second loss of each real vehicle projection point and each predicted first vehicle projection point in the sample image under the same vehicle projection point; inputting information of each real vehicle key point in the sample image into an initial second network model, and determining structural information loss of the sample image through the initial second network model, wherein the method comprises the following steps: inputting information of each real vehicle key point in the sample image into an initial second network model to obtain each predicted second vehicle projection point of the vehicle; determining the predicted structural information of the vehicle according to the predicted second vehicle projection points and the real vehicle key points, wherein the predicted structural information comprises the offset between the predicted second vehicle projection points and the real vehicle key points; determining the loss of the structured information of the sample image according to the third losses of the real structured information and the predicted structured information under the same offset; the real structured information comprises offsets between the projection points of each real vehicle and the key points of each real vehicle; the adjusting the initial first network model and the initial second network model according to the characteristic information loss and the structural information loss until a training condition is satisfied to obtain the first network model and the second network model includes: determining fusion loss according to the characteristic information loss and the structural information loss; and respectively adjusting the initial first network model and the initial second network model according to the fusion loss until a training condition is met to obtain the first network model and the second network model.
Based on the above scheme, for the image to be recognized, each vehicle projection point of the vehicle in the image can be output accurately through the first network model, which is obtained by joint training with the second network model. Specifically: the input of the initial first network model is any sample image, and its outputs are the predicted vehicle key points and the predicted first vehicle projection points of the vehicle in that sample image; meanwhile, the input of the initial second network model is the information of each real vehicle key point in the sample image, and its outputs are the predicted second vehicle projection points of the vehicle in the sample image. Then, on one hand, the loss between each predicted vehicle key point and the corresponding real vehicle key point (under the same vehicle key point) is calculated to obtain the first loss, and the loss between each predicted first vehicle projection point and the corresponding real vehicle projection point (under the same vehicle projection point) is calculated to obtain the second loss; combining the first loss and the second loss gives the characteristic information loss of the sample image. On the other hand, the loss between the real structured information and the predicted structured information (under the same offset) is calculated to obtain the structural information loss of the sample image. A fusion loss is then generated by fusing the characteristic information loss and the structural information loss. Finally, the initial first network model and the initial second network model are each adjusted based on the fusion loss, and training is repeated until the training conditions are met, thereby obtaining the first network model and the second network model. This method introduces the calculation of offsets between vehicle projection points and vehicle key points, and adjusts the models during training based on the loss over these offsets. Since an offset expresses the mutual constraint between a vehicle projection point and a vehicle key point, adding the offset loss to the model adjustment gives both trained models the advantage of accurately predicting each vehicle projection point of the target vehicle in the image to be recognized.
In one possible implementation, the characteristic information loss is determined by a smooth loss function, and the structured information loss is determined by a Euclidean distance loss function.
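A minimal sketch of the two losses named above, assuming predictions and ground truth are already paired point-for-point as (N, 2) tensors; the function names and shapes are assumptions, and Es/Ek follow the notation used later in the detailed description.

```python
import torch
import torch.nn.functional as F

def characteristic_info_loss(pred_kpts, true_kpts, pred_proj, true_proj):
    """Es: smooth (Huber) loss over matched key points plus matched
    first-vehicle projection points (the first and second losses)."""
    return F.smooth_l1_loss(pred_kpts, true_kpts) + \
           F.smooth_l1_loss(pred_proj, true_proj)

def structured_info_loss(pred_offsets, true_offsets):
    """Ek: Euclidean distance between paired offsets (the third loss),
    averaged over all offsets."""
    return torch.norm(pred_offsets - true_offsets, dim=-1).mean()
```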
In one possible implementation, the inputting the image to be recognized into a first network model, and determining vehicle projection points of the target vehicle through the first network model includes: inputting the image to be recognized into the first network model, and determining each first target vehicle projection point of the target vehicle through the first network model; inputting the information of each vehicle key point of the image to be recognized into the second network model, and determining each second target vehicle projection point of the target vehicle through the second network model; determining the similarity of each first target vehicle projection point and each second target vehicle projection point under the target vehicle projection points based on the same target vehicle projection point; and determining the vehicle projection point of the target vehicle according to the similarity.
Based on the above scheme, during model training both the first network model and the second network model learn to output vehicle projection points. During model use, therefore, for each first target vehicle projection point of the target vehicle output by the first network model and each second target vehicle projection point output by the second network model, the similarity between the two under the same target vehicle projection point can be determined in order to improve the accuracy of projection-point recognition, and the vehicle projection points of the target vehicle can then be determined from these similarities. Because this scheme jointly considers the projection points output by both network models, the finally recognized vehicle projection points of the target vehicle have high accuracy.
In one possible implementation, the information of each vehicle key point of the image to be recognized is obtained either by running a vehicle key point detection algorithm on the image to be recognized or from the first network model's processing of the image to be recognized.
Based on the above scheme, the vehicle key points of the image to be recognized that are to be input into the second network model can be obtained either by running a vehicle key point detection algorithm on the image to be recognized, or by directly reusing the vehicle key points output by the first network model after it processes the image. This allows the vehicle key points of the image to be recognized to be determined flexibly and from multiple angles.
In one possible implementation, the target vehicle includes M-1 vehicle projection points; the M-1 vehicle projection points comprise coaxial vehicle projection points and a first single projection point, where the coaxial vehicle projection points are the contact points of the two rear tires, or of the two front tires, with the ground, and the first single projection point is the contact point of one front tire or one rear tire with the ground. The method further comprises: determining a first intersection point formed by the intersection of a first lane line and a first straight line, and a second intersection point formed by the intersection of the first lane line and a second straight line, where the first straight line is the straight line on which the connecting line between the coaxial vehicle projection points lies, and the second straight line is the straight line that passes through the first single projection point and is parallel to the first straight line; intercepting a first line segment from a third straight line that passes through the first intersection point and intersects a second lane line at a third intersection point, and intercepting a second line segment from a fourth straight line that passes through the second intersection point and intersects the second lane line at a fourth intersection point, where the first line segment has the first and third intersection points as its end points, the second line segment has the second and fourth intersection points as its end points, and the first and second line segments are parallel to each other; determining a rate of change between the first line segment and the second line segment; and determining a second single projection point of the target vehicle on the second straight line according to the distance between the coaxial vehicle projection points and the rate of change.
Based on the above scheme, after imaging by the camera, the rate at which the width between two adjacent lane lines changes along the driving direction in the image to be recognized is consistent with the rate of change between the track width of the two front tires and that of the two rear tires. Therefore, by obtaining the width change rate of two adjacent lane lines along the driving direction, the unknown track width (of the rear tires or of the front tires) can be determined from the known track width and that change rate. Further, since the position of one rear tire or one front tire is also known, the position of the vehicle's other rear tire or other front tire can be determined from the computed track width. Once all four valid vehicle projection points of the vehicle are obtained, whether the vehicle commits a violation during driving can be judged accurately from them.
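A geometric sketch of this recovery step, under the stated assumption that lane width and axle track shrink at the same perspective rate. Lane lines are modelled as infinite lines through two image points; the coordinates, the helper names, and the sign convention for the final offset are all illustrative.

```python
import numpy as np

def line_intersection(p1, p2, p3, p4):
    """Intersection of line p1-p2 with line p3-p4, via homogeneous coords."""
    l1 = np.cross([*p1, 1.0], [*p2, 1.0])
    l2 = np.cross([*p3, 1.0], [*p4, 1.0])
    x = np.cross(l1, l2)
    return x[:2] / x[2]

def second_single_projection_point(coaxial_a, coaxial_b, single, lane1, lane2):
    axis = coaxial_b - coaxial_a                    # first straight line
    # First/second intersection points on lane line 1.
    i1 = line_intersection(coaxial_a, coaxial_b, *lane1)
    i2 = line_intersection(single, single + axis, *lane1)
    # End points of the first/second line segments on lane line 2.
    i3 = line_intersection(i1, i1 + axis, *lane2)
    i4 = line_intersection(i2, i2 + axis, *lane2)
    rate = np.linalg.norm(i4 - i2) / np.linalg.norm(i3 - i1)
    track = np.linalg.norm(axis) * rate             # scaled axle width
    # Offset along the second straight line; a real implementation would
    # pick the sign that points toward the vehicle body.
    return single + axis / np.linalg.norm(axis) * track

rear_l, rear_r = np.array([100.0, 400.0]), np.array([220.0, 400.0])
front_l = np.array([120.0, 300.0])                  # first single point
lane1 = (np.array([80.0, 500.0]), np.array([140.0, 100.0]))
lane2 = (np.array([300.0, 500.0]), np.array([260.0, 100.0]))
print(second_single_projection_point(rear_l, rear_r, front_l, lane1, lane2))
```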
In one possible implementation, the determining the vehicle state of the target vehicle based on the obtained vehicle projection points includes: if the coaxial projection points of the target vehicle are the contact points of the two rear tires and the ground respectively, determining that the target vehicle is driven away from a lens; and if the coaxial projection points of the target vehicle are the contact points of the two front tires and the ground respectively, determining that the target vehicle drives to a lens.
Based on the above scheme, accurately judging the driving direction of the vehicle provides a valuable reference for staff evaluating whether the vehicle has committed a violation. For example, when the vehicle is determined to be driving away from the lens, the staff need not check whether the driver and passengers have fastened their seat belts; conversely, when the vehicle is determined to be driving toward the lens, whether the driver and passengers are wearing seat belts can be used as one of the items for evaluating whether the vehicle violated regulations while driving.
In one possible implementation, the determining the vehicle state of the target vehicle based on the obtained vehicle projection points includes: determining steering of the target vehicle according to the first vehicle projection point of the target vehicle; the first vehicle projection point includes a front left tire contact point of the target vehicle and a rear left tire contact point of the target vehicle, or includes a front right tire contact point of the target vehicle and a rear right tire contact point of the target vehicle.
Based on the above scheme, accurately judging the steering of the vehicle provides a valuable reference for staff evaluating whether the vehicle has committed a violation. For example, when the vehicle is determined to be turning left, the staff need to check whether it is correctly positioned in a left-turn lane and whether it turned left under the correct left-turn signal; when the vehicle is determined to be turning right, the staff need to check whether it is correctly positioned in a right-turn lane and whether it turned right under the correct right-turn signal.
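The rules sketched below illustrate both judgements. The 5-degree threshold, the image coordinate convention (y grows downward), and the function names are assumptions, not values fixed by the application.

```python
import numpy as np

def driving_direction(coaxial_is_rear: bool) -> str:
    # Coaxial pair = two rear tires -> the vehicle drives away from the lens.
    return "away from lens" if coaxial_is_rear else "toward lens"

def steering(front_left, rear_left, threshold_deg: float = 5.0) -> str:
    # Angle of the left-side line (front-left vs rear-left contact point)
    # against the image vertical; y grows downward in image coordinates.
    dx = front_left[0] - rear_left[0]
    dy = front_left[1] - rear_left[1]
    angle = np.degrees(np.arctan2(dx, -dy))   # 0 deg = straight ahead
    if angle < -threshold_deg:
        return "turning left"
    if angle > threshold_deg:
        return "turning right"
    return "going straight"

print(driving_direction(coaxial_is_rear=True))
print(steering(np.array([110.0, 300.0]), np.array([100.0, 400.0])))
```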
In one possible implementation, the determining the vehicle state of the target vehicle based on the obtained vehicle projection points includes: determining whether the target vehicle presses a line according to whether the quadrilateral frame corresponding to the vehicle projection points of the target vehicle intersects a preset lane line or a preset area.
Based on the above scheme, for the image to be recognized, a quadrilateral frame is obtained by connecting the four recognized valid vehicle projection points in order, and whether the target vehicle in the image is pressing a line is then determined by evaluating whether the quadrilateral frame intersects the preset lane line or preset area.
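A sketch of the line-press check, using Shapely as a stand-in geometry library: the four projection points are connected into a quadrilateral and tested against a lane polyline (a preset area could be tested the same way with a second Polygon). Coordinates are illustrative.

```python
from shapely.geometry import LineString, Polygon

def presses_line(projection_points, lane_line_points) -> bool:
    """projection_points: four (x, y) contact points in drawing order."""
    footprint = Polygon(projection_points)   # quadrilateral frame
    lane = LineString(lane_line_points)      # preset lane line (polyline)
    return footprint.intersects(lane)

quad = [(100, 400), (220, 400), (210, 300), (110, 300)]
lane = [(105, 500), (140, 100)]
print(presses_line(quad, lane))              # True -> suspected line press
```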
In a second aspect, an embodiment of the present application provides an apparatus for identifying a vehicle state, including: the image acquisition unit is used for acquiring an image to be identified; the image to be recognized is obtained by shooting a target vehicle; the vehicle projection point determining unit is used for inputting the image to be recognized into a first network model and determining each vehicle projection point of the target vehicle through the first network model; the vehicle projection point is a contact point between a tire of the vehicle and the ground; the first network model is used for identifying each vehicle projection point of the vehicle, the first network model is obtained by joint training with a second network model, and the second network model is obtained by taking the vehicle key point information of any sample image in a training set as model input and taking the vehicle projection point of the sample image as model output for training; and the vehicle state identification unit is used for determining the vehicle state of the target vehicle based on the obtained vehicle projection points.
In one possible implementation, the apparatus further comprises a model training unit; the model training unit is used for: determining the characteristic information loss of any sample image in a training set through an initial first network model; inputting information of each real vehicle key point in the sample image into an initial second network model, and determining the structural information loss of the sample image through the initial second network model; and adjusting the initial first network model and the initial second network model according to the characteristic information loss and the structural information loss until training conditions are met to obtain the first network model and the second network model.
In one possible implementation, the model training unit is specifically configured to: for any sample image in a training set, input the sample image into an initial first network model to obtain each predicted vehicle key point and each predicted first vehicle projection point of the vehicle; determine the characteristic information loss of the sample image according to the first loss of each real vehicle key point and each predicted vehicle key point in the sample image under the same vehicle key point, and according to the second loss of each real vehicle projection point and each predicted first vehicle projection point in the sample image under the same vehicle projection point; input information of each real vehicle key point in the sample image into an initial second network model to obtain each predicted second vehicle projection point of the vehicle; determine the predicted structured information of the vehicle according to the predicted second vehicle projection points and the real vehicle key points, where the predicted structured information comprises the offsets between the predicted second vehicle projection points and the real vehicle key points; determine the structured information loss of the sample image according to the third losses of the real structured information and the predicted structured information under the same offset, where the real structured information comprises the offsets between each real vehicle projection point and each real vehicle key point; determine a fusion loss according to the characteristic information loss and the structured information loss; and adjust the initial first network model and the initial second network model respectively according to the fusion loss until a training condition is met, to obtain the first network model and the second network model.
In one possible implementation, the characteristic information loss is determined by a smooth loss function, and the structured information loss is determined by a Euclidean distance loss function.
In one possible implementation, the vehicle projection point determining unit is specifically configured to: input the image to be recognized into the first network model and determine each first target vehicle projection point of the target vehicle through the first network model; input the information of each vehicle key point of the image to be recognized into the second network model and determine each second target vehicle projection point of the target vehicle through the second network model; determine, based on the same target vehicle projection point, the similarity between each first target vehicle projection point and each second target vehicle projection point under that target vehicle projection point; and determine the vehicle projection points of the target vehicle according to the similarities.
In one possible implementation, the information of each vehicle key point of the image to be recognized is obtained either by running a vehicle key point detection algorithm on the image to be recognized or from the first network model's processing of the image to be recognized.
In one possible implementation, the target vehicle includes M-1 vehicle projection points; the M-1 vehicle projection points comprise coaxial vehicle projection points and a first single projection point, where the coaxial vehicle projection points are the contact points of the two rear tires, or of the two front tires, with the ground, and the first single projection point is the contact point of one front tire or one rear tire with the ground. The apparatus also includes a single projection point determining unit, which is configured to: determine a first intersection point formed by the intersection of a first lane line and a first straight line, and a second intersection point formed by the intersection of the first lane line and a second straight line, where the first straight line is the straight line on which the connecting line between the coaxial vehicle projection points lies, and the second straight line is the straight line that passes through the first single projection point and is parallel to the first straight line; intercept a first line segment from a third straight line that passes through the first intersection point and intersects a second lane line at a third intersection point, and intercept a second line segment from a fourth straight line that passes through the second intersection point and intersects the second lane line at a fourth intersection point, where the first line segment has the first and third intersection points as its end points, the second line segment has the second and fourth intersection points as its end points, and the first and second line segments are parallel to each other; determine a rate of change between the first line segment and the second line segment; and determine a second single projection point of the target vehicle on the second straight line according to the distance between the coaxial vehicle projection points and the rate of change.
In one possible implementation, the vehicle state identification unit is specifically configured to: if the coaxial projection points of the target vehicle are the contact points of the two rear tires and the ground respectively, determining that the target vehicle is driven away from a lens; and if the coaxial projection points of the target vehicle are the contact points of the two front tires and the ground respectively, determining that the target vehicle drives to a lens.
In one possible implementation, the vehicle state identification unit is specifically configured to: determining steering of the target vehicle according to the first vehicle projection point of the target vehicle; the first vehicle projection point includes a front left tire contact point of the target vehicle and a rear left tire contact point of the target vehicle, or includes a front right tire contact point of the target vehicle and a rear right tire contact point of the target vehicle.
In one possible implementation, the vehicle state identification unit is specifically configured to: determine whether the target vehicle presses a line according to whether the quadrilateral frame corresponding to the vehicle projection points of the target vehicle intersects a preset lane line or a preset area.
In a third aspect, an embodiment of the present application provides a computing device, including:
a memory for storing a computer program;
a processor for calling the computer program stored in said memory and executing the method according to the first aspect in accordance with the obtained program.
In a fourth aspect, the present application provides a computer-readable storage medium storing a computer program for causing a computer to execute the method according to the first aspect.
Drawings
In order to more clearly illustrate the technical solutions in the embodiments of the present application, the drawings needed for describing the embodiments are briefly introduced below. Obviously, the drawings in the following description show only some embodiments of the present application, and those skilled in the art can obtain other drawings from them without creative effort.
fig. 1 is a schematic diagram of a prior-art method for determining whether a vehicle commits a line-pressing violation;
fig. 2 is a schematic diagram of a system architecture according to an embodiment of the present application;
fig. 3 is a schematic diagram of a method for identifying a vehicle state according to an embodiment of the present application;
fig. 4 is a schematic diagram of vehicle projection point detection according to an embodiment of the present application;
fig. 5 is a schematic view of a vehicle state identification device according to an embodiment of the present application;
fig. 6 is a schematic diagram of a computer device according to an embodiment of the present application.
Detailed Description
In order to make the objects, technical solutions and advantages of the present application clearer, the present application will be described in further detail with reference to the accompanying drawings, and it is obvious that the described embodiments are only a part of the embodiments of the present application, and not all embodiments. All other embodiments, which can be derived by a person skilled in the art from the embodiments given herein without making any creative effort, shall fall within the protection scope of the present application.
At present, when detecting the vehicle state, the following method can be used:
A vehicle running on the road is photographed by a monocular camera, and the vehicle is detected and recognized in the captured picture; the recognized vehicle is framed by a vehicle detection box, and the center point of that box is calculated. Next, the license plate of the recognized vehicle is detected and recognized, and the recognized plate is framed by a license plate box. Then, taking the center point of the vehicle detection box as the center, a straight line is fitted by extending to the left and right simultaneously by a certain proportion of the license plate box length (the proportion is generally chosen as 1.5). Finally, whether the recognized vehicle commits a line-pressing violation is judged from the intersection of the fitted straight line with the lane line. Fig. 1 is a schematic diagram of this prior-art line-pressing judgment. Referring to fig. 1, because the straight line fitted through the center point of the vehicle detection box intersects the right lane line, the vehicle theoretically commits a line-pressing violation; in the actual scene, however, the vehicle is not pressing the line. This detection deviation arises because, when a monocular camera is used to detect violations of a running vehicle, the objectively present perspective foreshortening (near objects large, far objects small) always causes the computed relative relationship between the vehicle and the lane line (or lane area) to deviate considerably from the actual situation.
Clearly, in this scheme, which collects vehicle images with a monocular camera and judges line-pressing violations by fitting a straight line through the center point of the vehicle detection box using the length of the license plate box, the unavoidable perspective factors make the judgment easily deviate from the real driving state; that is, misjudgment occurs easily, which is not conducive to accurate road traffic management.
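For concreteness, a sketch of the prior-art fitting just described, assuming boxes are given as (x1, y1, x2, y2) pixel tuples; the box values are invented for illustration.

```python
def prior_art_fitted_segment(det_box, plate_box, ratio: float = 1.5):
    """Horizontal segment through the detection-box center, extended left
    and right by ratio x the license-plate-box length."""
    cx = (det_box[0] + det_box[2]) / 2
    cy = (det_box[1] + det_box[3]) / 2
    half = ratio * (plate_box[2] - plate_box[0])
    return (cx - half, cy), (cx + half, cy)

# Invented boxes: perspective makes this segment over-wide for a vehicle
# photographed at an angle, which is exactly the deviation criticized above.
print(prior_art_fitted_segment((80, 250, 260, 420), (140, 380, 200, 400)))
```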
In order to solve the above technical problem, the embodiment of the present application provides a possible system architecture, as shown in fig. 2, which is a schematic diagram of the system architecture provided by the embodiment of the present application, and includes a vehicle 210 to be detected, an image capturing device 220, and a server 230.
The vehicle 210 to be detected may be a vehicle in driving. It is understood that the driving process includes both the process in which the vehicle drives on the road surface at a certain speed and the process in which the vehicle is temporarily stopped while passing through the intersection because of waiting for the traffic light.
The image capturing device 220 may be installed at various intersections or at a straight traffic intersection, and is configured to capture an image of a vehicle during driving and transmit the captured driving image of the vehicle to the server 230. The image capturing device 220 may be a monocular camera or other devices, and the image capturing device 220 and the server 230 may communicate with each other in a wired manner or in a wireless manner, which is not limited in the present application.
The server 230 may be configured to receive the vehicle driving image sent by the image capturing device 220, and process the vehicle driving image, so as to output each vehicle projection point of the vehicle; further, after obtaining the vehicle projected points of the vehicle, the vehicle state of the vehicle may be determined based on the vehicle projected points. The vehicle projection point refers to a contact point of a tire of the vehicle and the ground. The server 230 may be an independent server or a server cluster formed by a plurality of servers, which is not limited in this application.
Based on the system architecture shown in fig. 2, firstly, the image capturing device 220 may take a picture of the vehicle 210 to be detected; then, the image capturing device 220 may send the captured vehicle driving image of the vehicle 210 to be detected to the server 230, and correspondingly, the server 230 may receive the vehicle driving image; then, the server 230 processes the vehicle driving image to output information of each vehicle projection point displayed under the lens of the image acquisition device 220 by the vehicle 210 to be detected; finally, since the vehicle projected point is a contact point of a tire of the running vehicle with the ground, it is possible to determine whether or not the running vehicle has a problem of pressing a line (or entering an inappropriate area) based on the relationship between each vehicle projected point of the vehicle and a lane line (or a lane area). According to the method, the projection points of each vehicle of the running vehicle are accurately identified, and then the judgment efficiency of the driving state rationality of the vehicle in the running process can be improved based on the identified projection points of each vehicle.
In view of the above technical problems and based on the system architecture shown in fig. 2, an embodiment of the present application provides a method for identifying a vehicle state, so as to accurately identify a state exhibited by a vehicle during a driving process. As shown in fig. 3, a schematic diagram of a method for identifying a vehicle state according to an embodiment of the present application is provided, and the method may be executed by the server 230 shown in fig. 2 or by another data processing device. The method comprises the following steps:
step 301, acquiring an image to be identified; the image to be recognized is obtained by shooting aiming at the target vehicle.
In this step, the driving vehicle may be photographed by an image capturing device, such as a monocular camera. The shot vehicle in the driving state is the target vehicle. After the image acquisition equipment shoots the target vehicle, the shot image can be sent to a server; correspondingly, the server can receive the shot image, wherein the shot image is the image to be identified.
The target vehicle is a vehicle in driving, and includes a vehicle traveling on a road surface at a certain speed, and a vehicle traveling at an intersection and requiring a short stop for waiting for a traffic light.
Step 302, inputting the image to be recognized into a first network model, and determining each vehicle projection point of the target vehicle through the first network model.
In this step, after the server receives the image to be recognized, the server may process the image to be recognized, including: and inputting the image to be recognized into a first network model, and determining each vehicle projection point of the target vehicle in the image to be recognized through the first network model. The vehicle projection point is a contact point of a tire of the vehicle and the ground; the first network model is used for identifying each vehicle projection point of the vehicle, the first network model is obtained by joint training with a second network model, and the second network model is obtained by taking the vehicle key point information of any vehicle driving image in a training set as a model input and taking the vehicle projection point of the vehicle driving image as a model output for training.
And step 303, determining the vehicle state of the target vehicle based on the obtained vehicle projection points.
Based on the above scheme, by inputting the image to be recognized into the first network model, each vehicle projection point of the target vehicle in the image can be output quickly and accurately. Because a vehicle projection point is the contact point between a tire of the vehicle and the ground, it exists objectively and does not change with external factors. Therefore, once the first network model has accurately identified each vehicle projection point of the target vehicle, any subsequent judgment of the driving state made from those projection points is highly accurate, and the probability of misjudging the vehicle state is greatly reduced.
Some of the above steps will be described in detail with reference to examples.
In one implementation of step 302, the first network model is obtained by training in conjunction with a second network model, and includes: determining the characteristic information loss of any sample image in a training set through an initial first network model;
inputting information of each real vehicle key point in the sample image into an initial second network model, and determining the structural information loss of the sample image through the initial second network model;
and adjusting the initial first network model and the initial second network model according to the characteristic information loss and the structural information loss until training conditions are met to obtain the first network model and the second network model.
In some implementations of the present application, the determining, for any sample image in the training set, a loss of feature information of the sample image through an initial first network model includes: inputting the sample image into an initial first network model aiming at any sample image in a training set to obtain each predicted vehicle key point and each predicted first vehicle projection point of the vehicle; determining the characteristic information loss of the sample image according to the first loss of each real vehicle key point and each predicted vehicle key point in the sample image under the same vehicle key point and according to the second loss of each real vehicle projection point and each predicted first vehicle projection point in the sample image under the same vehicle projection point; inputting information of each real vehicle key point in the sample image into an initial second network model, and determining structural information loss of the sample image through the initial second network model, wherein the method comprises the following steps: inputting information of each real vehicle key point in the sample image into an initial second network model to obtain each predicted second vehicle projection point of the vehicle; determining the predicted structural information of the vehicle according to the predicted second vehicle projection points and the real vehicle key points, wherein the predicted structural information comprises the offset between the predicted second vehicle projection points and the real vehicle key points; determining the loss of the structured information of the sample image according to the third losses of the real structured information and the predicted structured information under the same offset; the real structured information comprises offsets between the projection points of each real vehicle and the key points of each real vehicle; the adjusting the initial first network model and the initial second network model according to the characteristic information loss and the structural information loss until a training condition is satisfied to obtain the first network model and the second network model includes: determining fusion loss according to the characteristic information loss and the structural information loss; and respectively adjusting the initial first network model and the initial second network model according to the fusion loss until a training condition is met to obtain the first network model and the second network model.
For example, in the initial stage of model training, the preparation work that needs to be done may include:
for a training set consisting of a plurality of vehicle driving images, any one image can be selected from the training set, and a marking person marks each vehicle key point and each vehicle projection point on the image. For example, the vehicle key points that may be labeled include 22, and specifically: 1. a left front wheel point; 2 right front wheel point; 3 left rear wheel point; 4 rear right vehicle points; 5, a left headlamp; 6 right front headlight; 7, the central point of the front license plate; 8 left front vehicle vertex; 9 right front vehicle vertex; 10 left rear roof points; 11 right rear vehicle vertex; 12 left rear headlight; 13 right rear headlight; 14 left rear fog light; 15 right rear fog light; 16 left back panel; 17 right back panel; 18 rear car logo; 19 rear license plate center point; 20 front car logo; 21 left rear-view mirror; 22 right rear-view mirror. For example, the vehicle projection points that can be labeled include 4, and specifically: 1. the contact point of the left front tire with the ground; 2. the contact point of the right front tire with the ground; 3. the contact point of the left rear tire and the ground; 4. contact point of the right rear tire with the ground.
Note: the embodiments of the application consider vehicles with an even number of tires, including four-wheeled vehicles (such as cars) and six-wheeled vehicles (such as buses and trucks). The vehicle projection points are the contact points with the ground of the two foremost and two rearmost tires of the vehicle. For example, for a four-wheeled vehicle, the vehicle projection points are the contact points of its 4 tires with the ground; for a six-wheeled vehicle, the vehicle projection points are the contact points with the ground of the two tires near the head of the vehicle and the two tires near the tail, and do not include the contact points of the two tires at the middle of the vehicle.
When annotating vehicle key points on any image in the training set, the annotator only needs to mark the visible vehicle key points; invisible vehicle key points need not be marked. When annotating vehicle projection points, however, the annotator not only marks the visible projection points directly, but is also required to roughly estimate the positions of the invisible projection points and mark them.
After the annotator finishes annotating each vehicle key point and each vehicle projection point of any vehicle driving image in the training set, the first round of model training can be started, and the method can include the following steps:
setting an image A in the marked training set, on one hand, inputting the image A into the neural network 1; through the processing of the neural network 1, each predicted vehicle key point and each predicted first vehicle projection point with respect to the image a can be output. Then, on one hand, based on each real vehicle key point marked by the marking personnel of the image A, matching one predicted vehicle key point corresponding to each real vehicle key point from each predicted vehicle key point one by one through a vehicle key point matching technology, then calculating the loss between the matched real vehicle key point and the predicted vehicle key point under the same vehicle key point, and taking the loss corresponding to all the real vehicle key points as a first loss; on the other hand, by the vehicle projection point pairing technology, based on each real vehicle projection point marked by the marking person of the image a, one predicted first vehicle projection point corresponding to each real vehicle projection point is matched one by one from each predicted first vehicle projection point, then, under the same vehicle projection point, the loss between the paired real vehicle projection point and the predicted first vehicle projection point is calculated, and the loss corresponding to all the real vehicle projection points is taken as the second loss. In the embodiment of the application, the vehicle key point and the vehicle projection point can be used as the characteristic information of the vehicle driving image, so that the sum value obtained by summing the first loss and the second loss is the characteristic information loss of the image A.
Still for image A, on the other hand, the information of the real vehicle key points annotated on image A can be input into neural network 2; through the processing of neural network 2, the predicted second vehicle projection points for image A are output. Then, for each predicted second vehicle projection point, the offsets between it and each real vehicle key point are constructed, giving the predicted structured information of the vehicle; similarly, for each real vehicle projection point annotated on image A, the offsets between it and each real vehicle key point are constructed, giving the real structured information of the vehicle. Next, an offset pairing technique matches each real offset, one by one, with a predicted offset formed from a predicted second vehicle projection point and a real vehicle key point; the loss between the two paired offsets, i.e. the third loss under the same offset, is calculated, and the third losses over all offsets are summed to give the structured information loss of image A.
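A sketch of constructing the structured information, assuming P projection points and K key points given as coordinate tensors; the function name and shapes are assumptions.

```python
import torch

def build_offsets(proj_points: torch.Tensor, keypoints: torch.Tensor) -> torch.Tensor:
    """proj_points: (P, 2); keypoints: (K, 2) -> offsets: (P, K, 2)."""
    return proj_points[:, None, :] - keypoints[None, :, :]

true_kpts = torch.rand(22, 2)        # annotated real vehicle key points
pred_proj = torch.rand(4, 2)         # predicted by neural network 2
real_proj = torch.rand(4, 2)         # annotated real projection points
predicted_structured = build_offsets(pred_proj, true_kpts)
real_structured = build_offsets(real_proj, true_kpts)
print(predicted_structured.shape)    # torch.Size([4, 22, 2])
```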
In some implementations of the present application, the characteristic information loss is determined by a smooth loss function, and the structured information loss is determined by a Euclidean distance loss function.
After the first round of training, the characteristic information loss is obtained from neural network 1; for example, it can be calculated with a smooth loss function, and the resulting loss can be denoted Es. The structured information loss is obtained from neural network 2; for example, it can be calculated with a Euclidean distance loss function, and the resulting loss can be denoted Ek. The two losses can then be fused into a fusion loss; one possible fusion is:
Ea=(1-a)*Ek+a*Es (a∈[0,1))
where Ea represents the fusion loss and a is a scale factor that defaults to 0.5.
After the fusion loss is obtained, the parameters of neural network 1 and neural network 2 can each be adjusted according to it. The adjusted networks are then trained with any other vehicle driving image in the training set; that training process follows the process of training neural network 1 and neural network 2 with image A and is not repeated here. Training proceeds for multiple rounds until the training conditions are met, thereby obtaining the first network model and the second network model.
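The fusion formula and one joint parameter update, sketched in PyTorch. The stand-in linear networks and placeholder losses only show the wiring; Es and Ek would come from the characteristic and structured losses above.

```python
import torch

def fusion_loss(Ek: torch.Tensor, Es: torch.Tensor, a: float = 0.5) -> torch.Tensor:
    """Ea = (1 - a) * Ek + a * Es, with scale factor a in [0, 1)."""
    assert 0.0 <= a < 1.0
    return (1 - a) * Ek + a * Es

net1 = torch.nn.Linear(8, 8)   # stand-in for the initial first network model
net2 = torch.nn.Linear(8, 8)   # stand-in for the initial second network model
opt = torch.optim.Adam(list(net1.parameters()) + list(net2.parameters()))

Es = net1(torch.rand(1, 8)).pow(2).mean()   # placeholder characteristic loss
Ek = net2(torch.rand(1, 8)).pow(2).mean()   # placeholder structured loss
Ea = fusion_loss(Ek, Es)
opt.zero_grad(); Ea.backward(); opt.step()  # both models adjusted from Ea
```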
In this scheme, the calculation of offsets between vehicle projection points and vehicle key points is introduced, and the models are adjusted during training based on the loss over these offsets. Since an offset expresses the mutual constraint between a vehicle projection point and a vehicle key point, adding the offset loss to the model adjustment gives both trained models the advantage of accurately predicting each vehicle projection point of the target vehicle in the image to be recognized.
In one implementation of the foregoing step 302, the inputting the image to be recognized into a first network model, and determining vehicle projection points of the target vehicle through the first network model includes: inputting the image to be recognized into the first network model, and determining each first target vehicle projection point of the target vehicle through the first network model; inputting the information of each vehicle key point of the image to be recognized into the second network model, and determining each second target vehicle projection point of the target vehicle through the second network model; determining the similarity of each first target vehicle projection point and each second target vehicle projection point under the target vehicle projection points based on the same target vehicle projection point; and determining the vehicle projection point of the target vehicle according to the similarity.
In some implementations of the present application, the information of each vehicle key point of the image to be recognized is obtained by performing a vehicle key point detection algorithm on the image to be recognized or is obtained by processing the image to be recognized by the first network model.
Following the foregoing example, once the neural network 1 and the neural network 2 have been trained with the vehicle driving images in the training set until the training condition is satisfied, the resulting first network model and second network model can each be used to recognize the vehicle projection points of the vehicle in an image to be recognized. The vehicle projection points can be recognized in the following three ways:
Mode 1: the image to be recognized is directly input into the first network model; after the first network model processes the image, each vehicle projection point of the vehicle in the image to be recognized is output.
Mode 2: the information of each vehicle key point in the image to be recognized is input into the second network model; after the second network model processes this key point information, each vehicle projection point of the vehicle in the image to be recognized is output.
It is noted that, depending on how the information of each vehicle key point in the image to be recognized is determined, mode 2 may be implemented in either of the following two ways:
Mode 21: the image to be recognized is input into the first network model, which outputs each vehicle key point of the vehicle in the image through its processing; the key point information identified by the first network model is then input into the second network model, which outputs each vehicle projection point of the vehicle in the image to be recognized.
Mode 22: a vehicle key point detection algorithm is run on the image to be recognized to obtain each vehicle key point of the vehicle; the detected key point information is then input into the second network model, which outputs each vehicle projection point of the vehicle in the image to be recognized.
Mode 3: on the one hand, the image to be recognized is directly input into the first network model, and the vehicle projection points obtained from its processing are recorded as the first target vehicle projection points; on the other hand, the information of each vehicle key point in the image to be recognized is input into the second network model, and the vehicle projection points obtained from its processing are recorded as the second target vehicle projection points. A vehicle projection point pairing technique then pairs each first target vehicle projection point with the second target vehicle projection point corresponding to the same physical projection point; for each paired group of target vehicle projection points, the similarity between its first and second target vehicle projection points is calculated, and the vehicle projection point of the target vehicle is determined from that similarity.
It is noted that the vehicle key point information input into the second network model in mode 3 can be obtained in either of the following two ways:
Mode 31: the image to be recognized is input into the first network model, which outputs each vehicle key point of the vehicle in the image through its processing; the key point information output by the first network model is then used as the input of the second network model;
Mode 32: a vehicle key point detection algorithm is run on the image to be recognized to obtain each vehicle key point of the vehicle; the detected key point information is then used as the input of the second network model.
Continuing, determining the vehicle projection point of the target vehicle according to the similarity may involve the following three cases (a short code sketch follows the list):
Case 1: for a group of target vehicle projection points, if the calculated similarity satisfies the credible vehicle projection point condition, the average of the group's first and second target vehicle projection points is calculated and taken as the vehicle projection point of the target vehicle;
Case 2: for a group of target vehicle projection points, if the calculated similarity satisfies the incredible vehicle projection point condition, it is determined that the target vehicle has no vehicle projection point under this group, and no corresponding vehicle projection point is output;
Case 3: for a group of target vehicle projection points, if the calculated similarity falls between the credible and incredible vehicle projection point conditions, the distances from the group's first and second target vehicle projection points to another, credible vehicle projection point are determined; the farther point is discarded and the closer point is output as the vehicle projection point of the target vehicle under this group.
In certain implementations of the present application, the target vehicle includes M-1 vehicle proxels; the M-1 vehicle projection points comprise coaxial vehicle projection points and a first single projection point, the coaxial vehicle projection points being the contact points of the two rear tires or the two front tires with the ground, and the first single projection point being the contact point of one front tire or one rear tire with the ground. The method further comprises: determining a first intersection point formed by the intersection of the first lane line and the first straight line and a second intersection point formed by the intersection of the first lane line and the second straight line, the first straight line being the straight line on which the connecting line between the coaxial vehicle projection points lies, and the second straight line being the straight line that passes through the first single projection point and is parallel to the first straight line; intercepting a first line segment from a third straight line which passes through the first intersection point and intersects a second lane line at a third intersection point, and intercepting a second line segment from a fourth straight line which passes through the second intersection point and intersects the second lane line at a fourth intersection point, the first line segment having the first and third intersection points as endpoints, the second line segment having the second and fourth intersection points as endpoints, and the first and second line segments being parallel to each other; determining a rate of change between the first line segment and the second line segment; and determining a second single projection point of the target vehicle on the second straight line according to the distance between the coaxial vehicle projection points and the change rate.
For example, a vehicle is easily occluded by its own body while driving, so that not all of its vehicle projection points are visible to the camera. Fig. 4 is a schematic diagram illustrating detection of vehicle projection points according to an embodiment of the present application. Referring to fig. 4, after the vehicle projection points are identified from the vehicle driving image by the first network model, three can be identified: the contact point of the right front tire with the ground and the contact points of the two rear tires with the ground. The two rear contact points form a group of coaxial vehicle projection points, and the right front contact point is a single projection point, i.e., the first single projection point. To obtain the 4th vehicle projection point of the vehicle shown in fig. 4, i.e., the contact point of the left front tire with the ground (the second single projection point), the embodiment of the present application may provide the following two ways:
Mode 1: for the contact points of the two rear tires with the ground shown in fig. 4, the distance between these two vehicle projection points is obtained, denoted distance A, and taken as the distance between the contact points of the two front tires with the ground. With the contact point of the right front tire and the ground known, a line is drawn through it parallel to the line segment formed by the two rear contact points, as straight line A. Taking the right front contact point as one segment endpoint, distance A is cut off along straight line A; the other segment endpoint obtained is the contact point of the left front tire of the vehicle with the ground, i.e., the second single projection point.
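A coordinate-level sketch of mode 1, assuming the two rear contact points and the right-front contact point are known; the function name and the equal-track-width assumption are ours, not the application's:

```python
import numpy as np

def second_single_proxel_mode1(rear_left, rear_right, front_right):
    """Mode 1: place the missing left-front contact point by copying the
    rear track width (distance A) along straight line A, which runs through
    the right-front contact point parallel to the rear axle. Assumes the
    front and rear track widths are (approximately) equal."""
    rear_left = np.asarray(rear_left, float)
    rear_right = np.asarray(rear_right, float)
    front_right = np.asarray(front_right, float)
    axle = rear_left - rear_right              # rear axle direction
    distance_a = np.linalg.norm(axle)          # rear track width (distance A)
    unit = axle / distance_a                   # unit vector along straight line A
    return front_right + unit * distance_a     # the second single projection point
```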
Mode 2: in fig. 4, a straight line is first drawn through the contact points of the two rear tires of the vehicle with the ground, as straight line B; then a straight line parallel to straight line B is drawn through the contact point of the right front tire with the ground, as straight line C. Lane line 1 intersects straight line B and straight line C at point D and point E respectively, where the coordinates of point D can be expressed as (X0, Y0) and those of point E as (X1, Y1). Through point D and point E, a set of mutually parallel lines is drawn that intersect lane line 2 at point F and point G respectively, lane line 1 and lane line 2 being two adjacent lane lines. The rate of change z between segment DF (whose length is denoted Z1) and segment EG (whose length is denoted Z2) can then be calculated:
z = (Z1 - Z2) / Z1
In the image to be recognized, the rate at which the width between two adjacent lane lines changes along the driving direction is consistent with the rate at which the front track width differs from the rear track width of the vehicle as captured by the camera. Therefore, after the change rate z is obtained, the distance between the contact points of the two front tires with the ground is derived from the distance between the two rear contact points and z, and is denoted distance B. Finally, taking the contact point of the right front tire and the ground as a segment endpoint, distance B is cut off along straight line C (in the direction toward the left front tire's contact point); the other segment endpoint obtained is the contact point of the left front tire of the vehicle with the ground.
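A sketch of mode 2 under the same assumptions as the mode 1 sketch, now scaling the rear track width by the change rate z before cutting distance B off straight line C; the (1 - z) = Z2/Z1 scaling is our reading of the consistency stated above:

```python
import numpy as np

def second_single_proxel_mode2(rear_left, rear_right, front_right, z1, z2):
    """Mode 2: scale the rear track width by the lane-width change rate.

    z1: length of segment DF (lane width measured on straight line B).
    z2: length of segment EG (lane width measured on straight line C).
    """
    rear_left = np.asarray(rear_left, float)
    rear_right = np.asarray(rear_right, float)
    front_right = np.asarray(front_right, float)
    z = (z1 - z2) / z1                         # rate of change between DF and EG
    distance_a = np.linalg.norm(rear_left - rear_right)
    distance_b = distance_a * (1.0 - z)        # front width scaled by Z2 / Z1
    unit = (rear_left - rear_right) / distance_a
    return front_right + unit * distance_b     # contact point of the left front tire
```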
Compared with mode 1, calculating the Mth vehicle projection point of the vehicle with mode 2 yields a relatively more accurate result, since it accounts for the perspective change along the driving direction.
In some implementations of the present application, the determining the vehicle state of the target vehicle based on the obtained vehicle projection points includes: if the coaxial projection points of the target vehicle are the contact points of the two rear tires with the ground, determining that the target vehicle is driving away from the lens; and if the coaxial projection points of the target vehicle are the contact points of the two front tires with the ground, determining that the target vehicle is driving toward the lens.
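This rule reduces to a simple mapping; a sketch with illustrative labels:

```python
def driving_direction(coaxial_pair):
    """Map the detected coaxial projection point pair to a driving direction.

    coaxial_pair: 'rear' if the coaxial points are the two rear tire contact
    points, 'front' if they are the two front ones; labels are illustrative.
    """
    if coaxial_pair == "rear":
        return "driving away from the lens"
    if coaxial_pair == "front":
        return "driving toward the lens"
    raise ValueError("coaxial_pair must be 'rear' or 'front'")
```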
In this way, the driving direction of the vehicle is judged accurately, which provides a valuable reference for staff evaluating whether the vehicle has committed a violation. For example, when the vehicle is determined to be driving away from the lens, the staff need not check whether the driver and passengers have fastened their seat belts; conversely, when the vehicle is determined to be driving toward the lens, whether the driver and passengers are wearing seat belts can serve as one item in evaluating whether the vehicle violated regulations while driving.
In some implementations of the present application, the determining the vehicle state of the target vehicle based on the obtained vehicle projection points includes: determining steering of the target vehicle according to the first vehicle projection point of the target vehicle; the first vehicle projection point includes a front left tire contact point of the target vehicle and a rear left tire contact point of the target vehicle, or includes a front right tire contact point of the target vehicle and a rear right tire contact point of the target vehicle.
In this way, by accurately determining the steering of the vehicle, a valuable reference can be provided to staff evaluating whether the vehicle is in violation. For example, when the vehicle is determined to be turning left, the staff need to determine whether it is correctly positioned in a left-turn lane and whether it turned left under the correct left-turn signal; when it is determined to be turning right, the staff need to determine whether it is correctly positioned in a right-turn lane and whether it turned right under the correct right-turn signal.
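One way to make the steering decision computable is to compare the body axis, taken through the first vehicle projection point pair, against the lane direction. A sketch; the tolerance threshold and the assumption of a known lane direction are ours:

```python
import numpy as np

def steering(front_pt, rear_pt, lane_dir, straight_tol_deg=5.0):
    """Estimate steering from one side's front/rear tire contact points.

    front_pt, rear_pt: contact points on the same side (the first vehicle
    projection point pair); lane_dir: unit vector of the lane direction.
    The left/right sign depends on the image coordinate convention.
    """
    body = np.asarray(front_pt, float) - np.asarray(rear_pt, float)
    body /= np.linalg.norm(body)
    lane = np.asarray(lane_dir, float)
    cross = body[0] * lane[1] - body[1] * lane[0]   # signed sine of the angle
    angle = np.degrees(np.arcsin(np.clip(cross, -1.0, 1.0)))
    if abs(angle) < straight_tol_deg:
        return "straight"
    return "left" if angle > 0 else "right"
```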
In some implementations of the present application, the determining the vehicle state of the target vehicle based on the obtained vehicle projection points includes: determining whether the target vehicle presses the line according to whether intersection points exist between the quadrilateral frame corresponding to the vehicle projection points of the target vehicle and a preset lane line or a preset area.
In this way, for the image to be recognized, the four recognized valid vehicle projection points are connected in order to obtain a quadrilateral frame, and whether an intersection point exists between the quadrilateral frame and the preset lane line or preset area is evaluated, thereby determining whether the target vehicle in the image to be recognized presses the line.
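A sketch of the intersection test, assuming the four projection points are already ordered around the quadrilateral and the lane line is given as a polyline; only proper segment crossings are tested:

```python
def segments_cross(p, q, a, b):
    """True if segment pq properly crosses segment ab (endpoint touches and
    collinear overlaps are ignored in this sketch)."""
    def cross(o, u, v):
        return (u[0] - o[0]) * (v[1] - o[1]) - (u[1] - o[1]) * (v[0] - o[0])
    return (cross(a, b, p) * cross(a, b, q) < 0 and
            cross(p, q, a) * cross(p, q, b) < 0)

def presses_line(proxels, lane_polyline):
    """proxels: the 4 vehicle projection points connected in order as a
    quadrilateral frame; lane_polyline: points along the preset lane line."""
    quad_edges = [(proxels[i], proxels[(i + 1) % 4]) for i in range(4)]
    lane_edges = list(zip(lane_polyline, lane_polyline[1:]))
    return any(segments_cross(p, q, a, b)
               for p, q in quad_edges for a, b in lane_edges)
```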
Based on the same concept, the embodiment of the present application provides an apparatus for identifying a vehicle state, as shown in fig. 5, the apparatus including:
an image obtaining unit 501, configured to obtain an image to be identified; the image to be recognized is obtained by shooting the target vehicle.
A vehicle projection point determining unit 502, configured to input the image to be recognized into a first network model, and determine vehicle projection points of the target vehicle through the first network model; the vehicle projection point is a contact point between a tire of the vehicle and the ground; the first network model is used for identifying each vehicle projection point of the vehicle, the first network model is obtained by joint training with a second network model, and the second network model is obtained by taking the vehicle key point information of any sample image in a training set as a model input and taking the vehicle projection point of the sample image as a model output for training.
A vehicle state identification unit 503, configured to determine a vehicle state of the target vehicle based on the obtained vehicle projection points.
Further, for the apparatus, a model training unit 504 is further included; a model training unit 504 for: determining the characteristic information loss of any sample image in a training set through an initial first network model; inputting information of each real vehicle key point in the sample image into an initial second network model, and determining the structural information loss of the sample image through the initial second network model; and adjusting the initial first network model and the initial second network model according to the characteristic information loss and the structural information loss until training conditions are met to obtain the first network model and the second network model.
Further, for the apparatus, the model training unit 504 is specifically configured to: inputting the sample image into an initial first network model aiming at any sample image in a training set to obtain each predicted vehicle key point and each predicted first vehicle projection point of the vehicle; determining the characteristic information loss of the sample image according to the first loss of each real vehicle key point and each predicted vehicle key point in the sample image under the same vehicle key point and according to the second loss of each real vehicle projection point and each predicted first vehicle projection point in the sample image under the same vehicle projection point; inputting information of each real vehicle key point in the sample image into an initial second network model to obtain each predicted second vehicle projection point of the vehicle; determining the predicted structural information of the vehicle according to the predicted second vehicle projection points and the real vehicle key points, wherein the predicted structural information comprises the offset between the predicted second vehicle projection points and the real vehicle key points; determining the loss of the structured information of the sample image according to the third losses of the real structured information and the predicted structured information under the same offset; the real structured information comprises offsets between the projection points of each real vehicle and the key points of each real vehicle; determining fusion loss according to the characteristic information loss and the structural information loss; and respectively adjusting the initial first network model and the initial second network model according to the fusion loss until a training condition is met to obtain the first network model and the second network model.
Further, for the apparatus, the characteristic information loss is determined by a smooth loss function, and the structured information loss is determined by a Euclidean distance loss function.
Further to this apparatus, the vehicle projection point determining unit 502 is specifically configured to: inputting the image to be recognized into the first network model, and determining each first target vehicle projection point of the target vehicle through the first network model; inputting the information of each vehicle key point of the image to be recognized into the second network model, and determining each second target vehicle projection point of the target vehicle through the second network model; determining the similarity of each first target vehicle projection point and each second target vehicle projection point under the target vehicle projection points based on the same target vehicle projection point; and determining the vehicle projection point of the target vehicle according to the similarity.
Further, for the device, the information of each vehicle key point of the image to be recognized is obtained by performing a vehicle key point detection algorithm on the image to be recognized or is obtained by processing the image to be recognized by the first network model.
Further to the apparatus, the target vehicle includes M-1 vehicle proxels; the M-1 vehicle projection points comprise coaxial vehicle projection points and first single projection points, the coaxial vehicle projection points are contact points of two rear tires or two front tires and the ground respectively, and the first single projection points are contact points of one front tire or one rear tire and the ground; the apparatus further comprises a single proxel determination unit 505; a single-proxel determination unit 505 for: determining a first intersection point formed by the intersection of the first lane line and the first straight line and a second intersection point formed by the intersection of the first lane line and the second straight line; the first straight line is a straight line where a connecting line between the coaxial vehicle projection points is located, and the second straight line is a straight line which passes through the single projection point and is parallel to the first straight line; intercepting a first line segment from a third straight line which passes through the first intersection point and intersects a second lane line at a third intersection point, and intercepting a second line segment from a fourth straight line which passes through the second intersection point and intersects the second lane line at a fourth intersection point, wherein the first line segment takes the first intersection point and the third intersection point as end points respectively, the second line segment takes the second intersection point and the fourth intersection point as end points respectively, and the first line segment and the second line segment are parallel to each other; determining a rate of change between the first line segment and the second line segment; and determining a second single projection point of the target vehicle from the second straight line according to the distance between the coaxial vehicle projection points and the change rate.
Further, for the apparatus, the vehicle state identification unit 503 is specifically configured to: if the coaxial projection points of the target vehicle are the contact points of the two rear tires with the ground, determine that the target vehicle is driving away from the lens; and if the coaxial projection points of the target vehicle are the contact points of the two front tires with the ground, determine that the target vehicle is driving toward the lens.
Further, for the apparatus, the vehicle state identification unit 503 is specifically configured to: determining steering of the target vehicle according to the first vehicle projection point of the target vehicle; the first vehicle projection point includes a front left tire contact point of the target vehicle and a rear left tire contact point of the target vehicle, or includes a front right tire contact point of the target vehicle and a rear right tire contact point of the target vehicle.
Further, for the apparatus, the vehicle state identification unit 503 is specifically configured to: and determining whether the target vehicle presses the line according to the condition that intersection points exist between the quadrilateral frame corresponding to each vehicle projection point of the target vehicle and a preset lane line or a preset area.
The embodiment of the present application provides a computing device, which may specifically be a desktop computer, a portable computer, a smart phone, a tablet computer, a Personal Digital Assistant (PDA), or the like. The computing device may include a Central Processing Unit (CPU), memory, and input/output devices; the input devices may include a keyboard, mouse, and touch screen, and the output devices may include a display device, such as a Liquid Crystal Display (LCD), a Cathode Ray Tube (CRT), or the like.
Memory, which may include Read Only Memory (ROM) and Random Access Memory (RAM), provides the processor with program instructions and data stored in the memory. In an embodiment of the present application, the memory may be used to store program instructions for a method of identifying a vehicle state;
and the processor is used for calling the program instructions stored in the memory and executing the identification method of the vehicle state according to the obtained program.
Fig. 6 is a schematic diagram of a computing device provided in an embodiment of the present application; the computing device includes:
a processor 601, a memory 602, a transceiver 603, a bus interface 604; the processor 601, the memory 602 and the transceiver 603 are connected by a bus 605;
the processor 601 is configured to read the program in the memory 602 and execute the method for identifying the vehicle state;
the processor 601 may be a Central Processing Unit (CPU), a Network Processor (NP), or a combination of a CPU and an NP. But also a hardware chip. The hardware chip may be an application-specific integrated circuit (ASIC), a Programmable Logic Device (PLD), or a combination thereof. The PLD may be a Complex Programmable Logic Device (CPLD), a field-programmable gate array (FPGA), a General Array Logic (GAL), or any combination thereof.
The memory 602 is used for storing one or more executable programs, and may store data used by the processor 601 in performing operations.
In particular, the program may include program code including computer operating instructions. The memory 602 may include a volatile memory (volatile memory), such as a random-access memory (RAM); the memory 602 may also include a non-volatile memory (non-volatile memory), such as a flash memory (flash memory), a Hard Disk Drive (HDD) or a solid-state drive (SSD); the memory 602 may also comprise a combination of memories of the kind described above.
The memory 602 stores the following elements, executable modules or data structures, or a subset thereof, or an expanded set thereof:
Operating instructions: including various operational instructions for performing various operations.
Operating system: including various system programs for implementing various basic services and for handling hardware-based tasks.
The bus 605 may be a Peripheral Component Interconnect (PCI) bus, an Extended Industry Standard Architecture (EISA) bus, or the like. The bus may be divided into an address bus, a data bus, a control bus, etc. For ease of illustration, only one thick line is shown in FIG. 6, but this is not intended to represent only one bus or type of bus.
The bus interface 604 may be a wired communication access port, a wireless bus interface, or a combination thereof, wherein the wired bus interface may be, for example, an ethernet interface. The ethernet interface may be an optical interface, an electrical interface, or a combination thereof. The wireless bus interface may be a WLAN interface.
Embodiments of the present application provide a computer-readable storage medium storing computer-executable instructions for causing a computer to execute a method for identifying a vehicle state.
It will be apparent to those skilled in the art that embodiments of the present application may be provided as a method or a computer program product. Accordingly, the present application may take the form of an entirely hardware embodiment, an entirely software embodiment, or an embodiment combining software and hardware aspects. Furthermore, the present application may take the form of a computer program product embodied on one or more computer-usable storage media (including, but not limited to, disk storage, CD-ROM, optical storage, and the like) having computer-usable program code embodied therein.
The present application is described with reference to flowchart illustrations and/or block diagrams of methods, apparatus (systems), and computer program products according to embodiments of the application. It will be understood that each flow and/or block of the flow diagrams and/or block diagrams, and combinations of flows and/or blocks in the flow diagrams and/or block diagrams, can be implemented by computer program instructions. These computer program instructions may be provided to a processor of a general purpose computer, special purpose computer, embedded processor, or other programmable data processing apparatus to produce a machine, such that the instructions, which execute via the processor of the computer or other programmable data processing apparatus, create means for implementing the functions specified in the flowchart flow or flows and/or block diagram block or blocks.
These computer program instructions may also be stored in a computer-readable memory that can direct a computer or other programmable data processing apparatus to function in a particular manner, such that the instructions stored in the computer-readable memory produce an article of manufacture including instruction means which implement the function specified in the flowchart flow or flows and/or block diagram block or blocks.
These computer program instructions may also be loaded onto a computer or other programmable data processing apparatus to cause a series of operational steps to be performed on the computer or other programmable apparatus to produce a computer implemented process such that the instructions which execute on the computer or other programmable apparatus provide steps for implementing the functions specified in the flowchart flow or flows and/or block diagram block or blocks.
While the preferred embodiments of the present application have been described, additional variations and modifications in those embodiments may occur to those skilled in the art once they learn of the basic inventive concepts. Therefore, it is intended that the appended claims be interpreted as including preferred embodiments and all alterations and modifications as fall within the scope of the application.
It will be apparent to those skilled in the art that various changes and modifications may be made in the present application without departing from the spirit and scope of the application. Thus, if such modifications and variations of the present application fall within the scope of the claims of the present application and their equivalents, the present application is intended to include such modifications and variations as well.

Claims (10)

1. A method for recognizing a vehicle state, characterized by comprising:
acquiring an image to be identified; the image to be recognized is obtained by shooting a target vehicle;
inputting the image to be recognized into a first network model, and determining each vehicle projection point of the target vehicle through the first network model; the vehicle projection point is a contact point between a tire of the vehicle and the ground; the first network model is used for identifying each vehicle projection point of the vehicle, the first network model is obtained by joint training with a second network model, and the second network model is obtained by taking the vehicle key point information of any sample image in a training set as model input and taking the vehicle projection point of the sample image as model output for training;
and determining the vehicle state of the target vehicle based on the obtained vehicle projection points.
2. The method of claim 1, wherein the first network model is derived by joint training with a second network model, comprising:
determining the characteristic information loss of any sample image in a training set through an initial first network model;
inputting information of each real vehicle key point in the sample image into an initial second network model, and determining the structural information loss of the sample image through the initial second network model;
and adjusting the initial first network model and the initial second network model according to the characteristic information loss and the structural information loss until training conditions are met to obtain the first network model and the second network model.
3. The method of claim 2, wherein the determining, for any sample image in the training set, a loss of feature information for the sample image through an initial first network model comprises:
inputting the sample image into an initial first network model aiming at any sample image in a training set to obtain each predicted vehicle key point and each predicted first vehicle projection point of the vehicle;
determining the characteristic information loss of the sample image according to the first loss of each real vehicle key point and each predicted vehicle key point in the sample image under the same vehicle key point and according to the second loss of each real vehicle projection point and each predicted first vehicle projection point in the sample image under the same vehicle projection point;
inputting information of each real vehicle key point in the sample image into an initial second network model, and determining structural information loss of the sample image through the initial second network model, wherein the method comprises the following steps:
inputting information of each real vehicle key point in the sample image into an initial second network model to obtain each predicted second vehicle projection point of the vehicle; determining the predicted structural information of the vehicle according to the predicted second vehicle projection points and the real vehicle key points, wherein the predicted structural information comprises the offset between the predicted second vehicle projection points and the real vehicle key points;
determining the loss of the structured information of the sample image according to the third losses of the real structured information and the predicted structured information under the same offset; the real structured information comprises offsets between the projection points of each real vehicle and the key points of each real vehicle;
the adjusting the initial first network model and the initial second network model according to the characteristic information loss and the structural information loss until a training condition is satisfied to obtain the first network model and the second network model includes:
determining fusion loss according to the characteristic information loss and the structural information loss;
and respectively adjusting the initial first network model and the initial second network model according to the fusion loss until a training condition is met to obtain the first network model and the second network model.
4. The method of claim 3, wherein the characteristic information loss is determined by a smooth loss function and the structured information loss is determined by a Euclidean distance loss function.
5. The method of claim 1, wherein the inputting the image to be recognized into a first network model from which vehicle proxels of the target vehicle are determined comprises:
inputting the image to be recognized into the first network model, and determining each first target vehicle projection point of the target vehicle through the first network model;
inputting the information of each vehicle key point of the image to be recognized into the second network model, and determining each second target vehicle projection point of the target vehicle through the second network model;
determining the similarity of each first target vehicle projection point and each second target vehicle projection point under the target vehicle projection points based on the same target vehicle projection point;
and determining the vehicle projection point of the target vehicle according to the similarity.
6. The method according to claim 5, wherein each piece of vehicle key point information of the image to be recognized is obtained by performing a vehicle key point detection algorithm on the image to be recognized or is obtained by processing the image to be recognized by the first network model.
7. The method of claim 1, wherein the target vehicle includes M-1 vehicle proxels; the M-1 vehicle projection points comprise coaxial vehicle projection points and first single projection points, the coaxial vehicle projection points are contact points of two rear tires or two front tires and the ground respectively, and the first single projection points are contact points of one front tire or one rear tire and the ground; the method further comprises the following steps:
determining a first intersection point formed by the intersection of the first lane line and the first straight line and a second intersection point formed by the intersection of the first lane line and the second straight line; the first straight line is a straight line where a connecting line between the coaxial vehicle projection points is located, and the second straight line is a straight line which passes through the single projection point and is parallel to the first straight line;
intercepting a first line segment from a third straight line which passes through the first intersection point and intersects a second lane line at a third intersection point, and intercepting a second line segment from a fourth straight line which passes through the second intersection point and intersects the second lane line at a fourth intersection point, wherein the first line segment takes the first intersection point and the third intersection point as end points respectively, the second line segment takes the second intersection point and the fourth intersection point as end points respectively, and the first line segment and the second line segment are parallel to each other;
determining a rate of change between the first line segment and the second line segment;
and determining a second single projection point of the target vehicle from the second straight line according to the distance between the coaxial vehicle projection points and the change rate.
8. An apparatus for recognizing a vehicle state, characterized by comprising:
the image acquisition unit is used for acquiring an image to be identified; the image to be recognized is obtained by shooting a target vehicle;
the vehicle projection point determining unit is used for inputting the image to be recognized into a first network model and determining each vehicle projection point of the target vehicle through the first network model; the vehicle projection point is a contact point between a tire of the vehicle and the ground; the first network model is used for identifying each vehicle projection point of the vehicle, the first network model is obtained by joint training with a second network model, and the second network model is obtained by taking the vehicle key point information of any sample image in a training set as model input and taking the vehicle projection point of the sample image as model output for training;
and the vehicle state identification unit is used for determining the vehicle state of the target vehicle based on the obtained vehicle projection points.
9. A computer device, comprising:
a memory for storing a computer program;
a processor for calling a computer program stored in said memory, for executing the method according to any one of claims 1-7 in accordance with the obtained program.
10. A computer-readable storage medium having stored thereon computer-executable instructions for causing a computer to perform the method of any one of claims 1-7.
CN202110825545.6A 2021-07-21 2021-07-21 Vehicle state identification method and device Pending CN113469121A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202110825545.6A CN113469121A (en) 2021-07-21 2021-07-21 Vehicle state identification method and device

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202110825545.6A CN113469121A (en) 2021-07-21 2021-07-21 Vehicle state identification method and device

Publications (1)

Publication Number Publication Date
CN113469121A true CN113469121A (en) 2021-10-01

Family

ID=77881512

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202110825545.6A Pending CN113469121A (en) 2021-07-21 2021-07-21 Vehicle state identification method and device

Country Status (1)

Country Link
CN (1) CN113469121A (en)


Patent Citations (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN108229267A (en) * 2016-12-29 2018-06-29 北京市商汤科技开发有限公司 Object properties detection, neural metwork training, method for detecting area and device
WO2018121690A1 (en) * 2016-12-29 2018-07-05 北京市商汤科技开发有限公司 Object attribute detection method and device, neural network training method and device, and regional detection method and device
CN110210363A (en) * 2019-05-27 2019-09-06 中国科学技术大学 A kind of target vehicle crimping detection method based on vehicle-mounted image
CN110909626A (en) * 2019-11-04 2020-03-24 上海眼控科技股份有限公司 Vehicle line pressing detection method and device, mobile terminal and storage medium
CN111402329A (en) * 2020-03-24 2020-07-10 上海眼控科技股份有限公司 Vehicle line pressing detection method and device, computer equipment and storage medium
CN111523464A (en) * 2020-04-23 2020-08-11 上海眼控科技股份有限公司 Method and device for detecting illegal lane change of vehicle

Non-Patent Citations (2)

* Cited by examiner, † Cited by third party
Title
YUENAN HOU ET AL.: "Learning Lightweight Lane Detection CNNs by Self Attention Distillation", 2019 IEEE/CVF International Conference on Computer Vision (ICCV), 27 February 2020, pages 1-9 *
马尧阶: "单目视觉导航智能车系统研究与实现" [Research and Implementation of a Monocular-Vision-Navigation Intelligent Vehicle System], China Masters' Theses Full-text Database (中国优秀硕士学位论文全文数据库), 15 March 2011, pages 1-99 *


Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination