CN113435318A - Neural network training, image detection and driving control method and device - Google Patents

Publication number: CN113435318A
Authority: CN (China)
Prior art keywords: vehicle, information, image, dimensional, visible
Legal status: Pending (the legal status is an assumption and is not a legal conclusion; Google has not performed a legal analysis)
Application number: CN202110713234.0A
Other languages: Chinese (zh)
Inventors: 李昂, 蒋沁宏, 石建萍
Current and original assignee: Shanghai Sensetime Lingang Intelligent Technology Co Ltd (the listed assignees may be inaccurate; Google has not performed a legal analysis)
Application filed by Shanghai Sensetime Lingang Intelligent Technology Co Ltd

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00 Pattern recognition
    • G06F18/20 Analysing
    • G06F18/24 Classification techniques
    • G06F18/241 Classification techniques relating to the classification model, e.g. parametric or non-parametric approaches
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 Computing arrangements based on biological models
    • G06N3/02 Neural networks
    • G06N3/08 Learning methods


Abstract

The present disclosure provides a neural network training method, an image detection method, a driving control method, an image detection device, an electronic device, and a storage medium, the neural network training method including: acquiring a sample image and two-dimensional labeling data of the sample image; the two-dimensional labeling data comprise detection frame information of a target vehicle in the sample image and at least one attribute information capable of representing a three-dimensional pose of the target vehicle; training a target neural network comprising a plurality of branch networks based on the sample image and the two-dimensional labeling data; wherein, after the sample image is input to the target neural network, each branch network outputs one of the following information: the two-dimensional detection frame information of the target vehicle and at least one attribute information capable of representing the three-dimensional pose of the target vehicle.

Description

Neural network training, image detection and driving control method and device
Technical Field
The present disclosure relates to the field of deep learning technologies, and in particular, to a method and an apparatus for neural network training, image detection, and driving control, an electronic device, and a storage medium.
Background
With the development of technology, automatic driving functions, driving assistance functions, and the like are widely used in vehicles. For both the automatic driving function and the driving assistance function, detecting the vehicles traveling on the road is important.
Generally, a camera may be disposed on a vehicle; an image is captured by the camera, a running vehicle is detected from the captured image to obtain a two-dimensional detection result of the running vehicle, and the obtained two-dimensional detection result is input to downstream modules such as a tracking module and a ranging module, so as to obtain three-dimensional information of the running vehicle.
Disclosure of Invention
In view of the above, the present disclosure provides at least a neural network training method, an image detection method, a driving control method, an apparatus, an electronic device, and a storage medium.
In a first aspect, the present disclosure provides a neural network training method, including:
acquiring a sample image and two-dimensional labeling data of the sample image; the two-dimensional labeling data comprise detection frame information of a target vehicle in the sample image and at least one attribute information capable of representing a three-dimensional pose of the target vehicle;
training a target neural network comprising a plurality of branch networks based on the sample image and the two-dimensional labeling data;
wherein, after the sample image is input to the target neural network, each branch network outputs one of the following information: the two-dimensional detection frame information of the target vehicle and at least one attribute information capable of representing the three-dimensional pose of the target vehicle.
In the method, a sample image and two-dimensional labeling data of the sample image are acquired, where the two-dimensional labeling data include detection frame information of a target vehicle and at least one attribute information capable of representing the three-dimensional pose of the target vehicle. Because the two-dimensional labeling data include at least one attribute information capable of representing the three-dimensional pose, the target neural network trained on the sample image and the two-dimensional labeling data can predict and output the at least one attribute information, which enriches the data types available for determining the three-dimensional pose information of the target vehicle. Subsequently, using the at least one attribute information and the detection frame information predicted and output by the target neural network, the three-dimensional pose information of the target vehicle can be accurately determined.
Meanwhile, since the target neural network includes a plurality of branch networks, after the sample image is input to the target neural network, each branch network outputs one of the following: the two-dimensional detection frame information of the target vehicle, or one of the at least one attribute information capable of representing the three-dimensional pose of the target vehicle. The information is thus output in parallel, which improves the information detection efficiency.
In one possible embodiment, the attribute information includes at least one of:
first position information of any demarcation point on a boundary between adjacent visible surfaces of the target vehicle, second position information of a contact point between at least one visible wheel of the target vehicle and a ground surface, orientation information of the target vehicle;
wherein the orientation information comprises first orientation information and/or second orientation information, and a second orientation indicated by the second orientation information is included in the first orientation indicated by the first orientation information.
Setting the attribute information in this way enriches the content of the attribute information, so that the three-dimensional pose information of the target vehicle can be accurately determined from the at least one attribute information.
In a possible implementation, in a case where the orientation information includes first orientation information, the first orientation information includes: a first category in which neither the front nor the rear of the vehicle is visible, a second category in which the rear of the vehicle is visible and the front of the vehicle is not visible, and a third category in which the front of the vehicle is visible and the rear of the vehicle is not visible.
In one possible embodiment, in a case where the orientation information includes second orientation information, the second orientation information includes: a first intermediate category in which the front and rear of the vehicle are not visible and the left side of the vehicle is visible; a second intermediate category in which the front and rear of the vehicle are not visible and the right side of the vehicle is visible; a third intermediate category in which the rear of the vehicle is visible, the front of the vehicle is not visible, and the sides of the vehicle are not visible; a fourth intermediate category in which the rear of the vehicle is visible, the front of the vehicle is not visible, and the right side of the vehicle is visible; a fifth intermediate category in which the rear of the vehicle is visible, the front of the vehicle is not visible, and the left side of the vehicle is visible; a sixth intermediate category in which the front of the vehicle is visible, the rear of the vehicle is not visible, and the sides of the vehicle are not visible; a seventh intermediate category in which the front of the vehicle is visible, the rear of the vehicle is not visible, and the right side of the vehicle is visible; and an eighth intermediate category in which the front of the vehicle is visible, the rear of the vehicle is not visible, and the left side of the vehicle is visible.
By adopting the method, the orientation of the target vehicle can be represented more accurately through the set first orientation information and/or second orientation information.
In one possible embodiment, in a case where the attribute information includes first position information of the demarcation point, acquiring two-dimensional annotation data of the sample image includes:
acquiring coordinate information of the demarcation point in the horizontal direction of an image coordinate system corresponding to the sample image;
and determining the coordinate information of the vertical direction indicated by the detection frame information of the target vehicle as the coordinate information of the vertical direction of the demarcation point in the image coordinate system corresponding to the sample image.
By adopting the method, the coordinate information of the demarcation point in the vertical direction of the image coordinate system corresponding to the sample image can be derived from the detection frame, and only the coordinate information in the horizontal direction needs to be acquired. Further, when the first position information of the demarcation point is determined by the target neural network, only the coordinate information in the horizontal direction needs to be regressed, not the coordinate information in the vertical direction. This reduces the number of regressed data types and avoids the situation in which too many regression targets degrade the accuracy of the other regression data (such as the detection frame information).
In one possible embodiment, in a case where the attribute information includes second position information of the contact point, acquiring two-dimensional annotation data of the sample image includes:
acquiring coordinate information of the contact point in the horizontal direction of an image coordinate system corresponding to the sample image; determining the coordinate information in the vertical direction indicated by the detection frame information of the target vehicle as the coordinate information in the vertical direction of the contact point in the image coordinate system corresponding to the sample image; and/or,
acquiring coordinate information of the contact point in the vertical direction of an image coordinate system corresponding to the sample image; and determining the coordinate information in the horizontal direction indicated by the detection frame information of the target vehicle as the coordinate information in the horizontal direction of the contact point in the image coordinate system corresponding to the sample image.
By adopting the method, only one piece of coordinate information in the second position information of the contact point needs to be determined, and the other piece of coordinate information can be obtained from the detection frame information. This reduces the number of regressed data types and avoids the situation in which too many regression targets degrade the accuracy of the other regression data (such as the detection frame information and the category of the detection frame).
In a second aspect, the present disclosure provides an image detection method, the method comprising:
acquiring an image to be detected;
inputting the image to be detected to a trained target neural network comprising a plurality of branch networks, and obtaining two-dimensional detection frame information of the vehicle in the image to be detected, which is output by the plurality of branch networks in parallel, and at least one attribute information capable of representing the three-dimensional pose of the vehicle;
and determining the detection result of the image to be detected based on the two-dimensional detection frame information of the vehicle and the at least one attribute information.
In the method, the target neural network is obtained based on the neural network training method of the first aspect, so that the trained target neural network can accurately output the two-dimensional detection frame information and the at least one attribute information of the vehicle, and further, the detection result of the image to be detected can be accurately determined based on the two-dimensional detection frame information and the at least one attribute information of the vehicle.
In a possible embodiment, determining the detection result of the image to be detected based on the two-dimensional detection frame information of the vehicle and the at least one attribute information includes:
determining depth information of the vehicle based on the two-dimensional detection frame information of the vehicle and the at least one attribute information.
In a possible embodiment, determining the detection result of the image to be detected based on the two-dimensional detection frame information of the vehicle and the at least one attribute information includes:
and determining the three-dimensional detection data of the vehicle based on the image to be detected, the two-dimensional detection frame information of the vehicle included in the image to be detected and at least one attribute information corresponding to the vehicle.
In a possible embodiment, in a case where the at least one attribute information includes first position information of any demarcation point on a boundary between adjacent visible surfaces of the vehicle and second position information of a contact point between at least one visible wheel of the vehicle and the ground, the determining three-dimensional detection data of the vehicle based on the image to be detected, two-dimensional detection frame information of the vehicle included in the image to be detected, and at least one attribute information corresponding to the vehicle includes:
determining two-dimensional compact frame information representing a single plane of the vehicle based on the first position information of the demarcation point, the second position information of the contact point and the two-dimensional detection frame information of the vehicle;
and determining the three-dimensional detection data of the vehicle based on the two-dimensional compact frame information and the image to be detected.
By adopting the method, the spatial information contained in the two-dimensional compact frame information is accurate, so that the three-dimensional detection data of the vehicle can be accurately determined based on the two-dimensional compact frame information and the image to be detected.
In a third aspect, the present disclosure provides a running control method including:
acquiring a road image acquired by a driving device in the driving process;
detecting the road image by using a target neural network obtained by training with the neural network training method of any one of the first aspect to obtain target detection data of a target vehicle included in the road image;
controlling the running device based on target detection data of a target vehicle included in the road image.
In a fourth aspect, the present disclosure provides a neural network training device, including:
the system comprises a first acquisition module, a second acquisition module and a third acquisition module, wherein the first acquisition module is used for acquiring a sample image and two-dimensional annotation data of the sample image; the two-dimensional labeling data comprise detection frame information of a target vehicle in the sample image and at least one attribute information capable of representing a three-dimensional pose of the target vehicle;
a training module for training a target neural network comprising a plurality of branch networks based on the sample image and the two-dimensional labeling data;
wherein, after the sample image is input to the target neural network, each branch network outputs one of the following information: the two-dimensional detection frame information of the target vehicle and at least one attribute information capable of representing the three-dimensional pose of the target vehicle.
In a fifth aspect, the present disclosure provides an image detection apparatus comprising:
the second acquisition module is used for acquiring an image to be detected;
the first generation module is used for inputting the image to be detected to a trained target neural network comprising a plurality of branch networks, so as to obtain two-dimensional detection frame information of the vehicle in the image to be detected, which is output by the plurality of branch networks in parallel, and at least one attribute information capable of representing the three-dimensional pose of the vehicle;
and the determining module is used for determining the detection result of the image to be detected based on the two-dimensional detection frame information of the vehicle and the at least one attribute information.
In a sixth aspect, the present disclosure provides a running control apparatus comprising:
the third acquisition module is used for acquiring a road image acquired by the driving device in the driving process;
a second generation module, configured to detect the road image by using a target neural network obtained by training with the neural network training method according to any one of the first aspects, so as to obtain target detection data of a target vehicle included in the road image;
a control module for controlling the travel device based on target detection data of a target vehicle included in the road image.
In a seventh aspect, the present disclosure provides an electronic device, comprising: a processor, a memory, and a bus, the memory storing machine-readable instructions executable by the processor, the processor and the memory communicating via the bus when the electronic device is running. The machine-readable instructions, when executed by the processor, perform the steps of the neural network training method as set forth in the first aspect or any one of its embodiments; or perform the steps of the image detection method as described in the second aspect or any one of its embodiments; or perform the steps of the running control method according to the third aspect.
In an eighth aspect, the present disclosure provides a computer-readable storage medium having stored thereon a computer program which, when executed by a processor, performs the steps of the neural network training method according to the first aspect or any one of its embodiments; or performs the steps of the image detection method as described in the second aspect or any one of its embodiments; or performs the steps of the running control method according to the third aspect.
In order to make the aforementioned objects, features and advantages of the present disclosure more comprehensible, preferred embodiments accompanied with figures are described in detail below.
Drawings
In order to more clearly illustrate the technical solutions of the embodiments of the present disclosure, the drawings required in the embodiments are briefly described below. The drawings, which are incorporated in and form a part of the specification, illustrate embodiments consistent with the present disclosure and, together with the description, serve to explain the technical solutions of the present disclosure. It should be appreciated that the following drawings depict only certain embodiments of the disclosure and are therefore not to be considered limiting of its scope; those skilled in the art can derive additional related drawings from them without inventive effort.
Fig. 1 is a schematic flow chart illustrating a neural network training method provided in an embodiment of the present disclosure;
FIG. 2 is a schematic diagram of a sample image in a neural network training method provided by an embodiment of the present disclosure;
FIG. 3 is a schematic diagram of a target vehicle in a neural network training method provided by an embodiment of the disclosure;
FIG. 4 is a schematic diagram illustrating a target neural network in a neural network training method provided by an embodiment of the present disclosure;
fig. 5 is a schematic flow chart illustrating an image detection method provided by an embodiment of the present disclosure;
FIG. 6 is a schematic diagram illustrating a two-dimensional compact frame in an image detection method provided by an embodiment of the present disclosure;
fig. 7 is a flowchart illustrating a driving control method according to an embodiment of the present disclosure;
FIG. 8 is a schematic diagram illustrating an architecture of a neural network training device provided in an embodiment of the present disclosure;
fig. 9 is a schematic diagram illustrating an architecture of an image detection apparatus provided in an embodiment of the present disclosure;
fig. 10 is a schematic diagram illustrating an architecture of a driving control device provided in an embodiment of the present disclosure;
fig. 11 shows a schematic structural diagram of an electronic device provided by an embodiment of the present disclosure;
fig. 12 is a schematic structural diagram of another electronic device provided in the embodiment of the present disclosure;
fig. 13 shows a schematic structural diagram of another electronic device provided in an embodiment of the present disclosure.
Detailed Description
In order to make the objects, technical solutions and advantages of the embodiments of the present disclosure more clear, the technical solutions of the embodiments of the present disclosure will be described clearly and completely with reference to the drawings in the embodiments of the present disclosure, and it is obvious that the described embodiments are only a part of the embodiments of the present disclosure, not all of the embodiments. The components of the embodiments of the present disclosure, generally described and illustrated in the figures herein, can be arranged and designed in a wide variety of different configurations. Thus, the following detailed description of the embodiments of the present disclosure, presented in the figures, is not intended to limit the scope of the claimed disclosure, but is merely representative of selected embodiments of the disclosure. All other embodiments, which can be derived by a person skilled in the art from the embodiments of the disclosure without making creative efforts, shall fall within the protection scope of the disclosure.
Generally, a camera may be disposed on a vehicle; an image is captured by the camera, a running vehicle is detected from the captured image to obtain a two-dimensional detection result of the running vehicle, and the obtained two-dimensional detection result is input to downstream modules such as a tracking module and a ranging module, so as to obtain three-dimensional information of the running vehicle. However, the two-dimensional detection result carries limited information, so the three-dimensional information determined from it may be inaccurate. In order to alleviate the above problem, embodiments of the present disclosure provide a neural network training method.
The above-mentioned drawbacks are the result of the inventor's practical and careful study. Therefore, the discovery of the above problems and the solutions proposed below by the present disclosure should be regarded as the inventor's contribution to the present disclosure.
It should be noted that: like reference numbers and letters refer to like items in the following figures, and thus, once an item is defined in one figure, it need not be further defined and explained in subsequent figures.
For facilitating understanding of the embodiments of the present disclosure, the neural network training method, the image detection method, and the driving control method disclosed in the embodiments of the present disclosure are first described in detail. The execution subject of these methods is generally a computer device with certain computing power, such as a terminal device, a server, or other processing device, where the terminal device may be user equipment (UE), a mobile device, a user terminal, a cellular phone, a cordless phone, a personal digital assistant (PDA), a handheld device, a computing device, a vehicle-mounted device, or a wearable device. In some possible implementations, the neural network training method, the image detection method, and the driving control method may be implemented by a processor calling computer-readable instructions stored in a memory.
Referring to fig. 1, a schematic flow chart of a neural network training method provided in the embodiment of the present disclosure is shown, the method includes S101-S102, where:
s101, obtaining a sample image and two-dimensional labeling data of the sample image; the two-dimensional labeling data comprises detection frame information of the target vehicle in the sample image and at least one attribute information capable of representing the three-dimensional pose of the target vehicle.
S102, training a target neural network comprising a plurality of branch networks based on the sample images and the two-dimensional labeling data.
Wherein, after the sample image is input to the target neural network, each branch network outputs one of the following information: the two-dimensional detection frame information of the target vehicle and at least one attribute information capable of representing the three-dimensional pose of the target vehicle.
In the method, a sample image and two-dimensional labeling data of the sample image are acquired, where the two-dimensional labeling data include detection frame information of a target vehicle and at least one attribute information capable of representing the three-dimensional pose of the target vehicle. Because the two-dimensional labeling data include at least one attribute information capable of representing the three-dimensional pose, the target neural network trained on the sample image and the two-dimensional labeling data can predict and output the at least one attribute information, which enriches the data types available for determining the three-dimensional pose information of the target vehicle. Subsequently, using the at least one attribute information and the detection frame information predicted and output by the target neural network, the three-dimensional pose information of the target vehicle can be accurately determined.
Meanwhile, since the target neural network includes a plurality of branch networks, after the sample image is input to the target neural network, each branch network outputs one of the following: the two-dimensional detection frame information of the target vehicle, or one of the at least one attribute information capable of representing the three-dimensional pose of the target vehicle. The information is thus output in parallel, which improves the information detection efficiency.
S101 to S102 will be specifically described below.
For S101:
the sample image may be any acquired image containing the target vehicle. The image may be an image containing a target vehicle in any scene, for example, the sample image may include an image of a target vehicle driving on a road, an image of a target vehicle parked in a parking space, and the like. The target vehicle may be any motor vehicle, for example, a truck, a car, a van, or the like.
The two-dimensional labeling data comprise detection frame information of the target vehicle in the sample image. The detection frame information may include position information of a detection frame, size information of the detection frame, a category of the detection frame, and the like. For example, the category of the detection frame may be small-sized vehicle, medium-sized vehicle, or large-sized vehicle; or car, van, off-road vehicle, and the like; or the categories may simply be a first category of motor vehicles and a second category of non-motor vehicles.
For example, the detection frame information may be position information of four vertices of the detection frame, a category of the detection frame; alternatively, the detection frame information may be position information of a center point of the detection frame, size information of the detection frame, a type of the detection frame, or the like.
Wherein the at least one attribute information capable of characterizing the three-dimensional pose of the target vehicle may include at least one of: first position information of any demarcation point on a boundary between adjacent visible surfaces of the target vehicle, second position information of a contact point between at least one visible wheel of the target vehicle and the ground, and orientation information of the target vehicle. The orientation information comprises first orientation information and/or second orientation information, and the second orientation indicated by the second orientation information is included in the first orientation indicated by the first orientation information. Setting the attribute information in this way enriches its content, so that the three-dimensional pose information of the target vehicle can be accurately determined from the at least one attribute information.
Any demarcation point of the target vehicle may be any point on the demarcation line between two adjacent visible surfaces of the target vehicle in the sample image; for example, the demarcation line may be the line, perpendicular to the ground, on which a vehicle lamp of the target vehicle is located, and the demarcation point may be the intersection of the demarcation line and the detection frame. A visible wheel may be a wheel on the side of a visible surface of the target vehicle contained in the sample image. Preferably, the number of contact points may be two, i.e. two visible contact points between the wheels and the ground.
Referring to fig. 2, a schematic diagram of a sample image in the neural network training method is shown, wherein the sample image includes a demarcation point 21 and the contact points between two wheels and the ground, i.e. a first contact point 22 and a second contact point 23.
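As a concrete illustration, such two-dimensional labeling data could be organized as below; this is a minimal Python sketch, and the field names and all numeric values are assumptions for illustration, not a format defined by the present disclosure.

```python
# Hypothetical annotation record for a sample image like fig. 2.
# Field names and values are assumptions for illustration only.
sample_annotation = {
    "box": (100, 50, 300, 200),   # detection frame vertices (x_1, y_1, x_2, y_2)
    "category": "car",            # category of the detection frame
    "demarcation_x": 180,         # horizontal coordinate of demarcation point 21
    "contact1_x": 150,            # horizontal coordinate of first contact point 22
    "contact2_y": 195,            # vertical coordinate of second contact point 23
    "first_orientation": "third",                  # front visible, rear invisible
    "second_orientation": "seventh_intermediate",  # front and right side visible
}
```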
In an alternative embodiment, in the case that the attribute information includes the first position information of the dividing point, in S101, acquiring the two-dimensional annotation data of the sample image may include:
s1011, acquiring coordinate information of the demarcation point in the horizontal direction of the image coordinate system corresponding to the sample image;
s1012, the coordinate information in the vertical direction indicated by the detection frame information of the target vehicle is determined as the coordinate information of the demarcation point in the vertical direction of the image coordinate system corresponding to the sample image.
Here, the first position information of the demarcation point includes abscissa information and ordinate information, that is, the abscissa information is coordinate information in the horizontal direction in the image coordinate system corresponding to the sample image, and the ordinate information is coordinate information in the vertical direction in the image coordinate system corresponding to the sample image.
Taking the demarcation point included in fig. 2 as an example, the detection frame information of the target vehicle in the sample image of fig. 2 may include the position information of four vertices, i.e. (x_1, y_1), (x_1, y_2), (x_2, y_1) and (x_2, y_2). Further, the coordinate information x_A of the demarcation point 21 in the horizontal direction of the image coordinate system corresponding to the sample image can be acquired, and the coordinate information in the vertical direction indicated by the detection frame information of the target vehicle, i.e. y_2, is determined as the coordinate information of the demarcation point in the vertical direction of the image coordinate system. That is, the first position information of the demarcation point is obtained as (x_A, y_2).
By adopting the method, the coordinate information of the demarcation point in the vertical direction of the image coordinate system corresponding to the sample image can be derived from the detection frame, and only the coordinate information in the horizontal direction needs to be acquired. Further, when the target neural network regresses the first position information of the demarcation point, only the coordinate information in the horizontal direction needs to be determined, not the coordinate information in the vertical direction. This reduces the number of regressed data types and avoids the situation in which too many regression targets degrade the accuracy of the other regression data (such as the detection frame information).
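A minimal sketch of this labeling rule follows; the helper name and example values are assumptions, and the vertical coordinate is taken from the bottom edge of the detection frame to match the example above.

```python
def demarcation_label(box, x_demarcation):
    """Build the first position information of a demarcation point as in
    S1011-S1012: only the horizontal coordinate x_A is annotated, and the
    vertical coordinate is taken from the detection frame information
    (here its bottom edge y_2, matching the example above).
    box is the detection frame (x_1, y_1, x_2, y_2) in image coordinates.
    """
    x1, y1, x2, y2 = box
    return (x_demarcation, y2)  # first position information (x_A, y_2)

# e.g. demarcation_label((100, 50, 300, 200), 180) -> (180, 200)
```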
In an alternative embodiment, in the case that the attribute information includes the second position information of the contact point, in S101, acquiring the two-dimensional annotation data of the sample image may include:
in a first mode, acquiring coordinate information of the contact point in the horizontal direction of the image coordinate system corresponding to the sample image, and determining the coordinate information in the vertical direction indicated by the detection frame information of the target vehicle as the coordinate information of the contact point in the vertical direction of the image coordinate system corresponding to the sample image;
in a second mode, acquiring coordinate information of the contact point in the vertical direction of the image coordinate system corresponding to the sample image, and determining the coordinate information in the horizontal direction indicated by the detection frame information of the target vehicle as the coordinate information of the contact point in the horizontal direction of the image coordinate system corresponding to the sample image.
In the first mode, the abscissa information of the contact point is acquired, and the coordinate information in the vertical direction indicated by the detection frame information is determined as the ordinate information of the contact point. In the second mode, the ordinate information of the contact point is acquired, and the coordinate information in the horizontal direction indicated by the detection frame information is determined as the abscissa information of the contact point.
When the number of the contact points is two, the second position information of one contact point may be determined in the first mode, and the second position information of the other contact point may be determined in the second mode.
Taking the contact points included in fig. 2 as an example, the coordinate information x_B1 of the first contact point 22 in the horizontal direction of the image coordinate system corresponding to the sample image may be acquired, and the coordinate information in the vertical direction indicated by the detection frame information of the target vehicle, i.e. y_2, is determined as the coordinate information of the contact point in the vertical direction of the image coordinate system. That is, the second position information of the first contact point 22 may be (x_B1, y_2).
For the second contact point 23, the coordinate information y_B2 of the second contact point 23 in the vertical direction of the image coordinate system corresponding to the sample image may be acquired, and the coordinate information in the horizontal direction indicated by the detection frame information of the target vehicle, i.e. x_1, is determined as the coordinate information of the second contact point 23 in the horizontal direction of the image coordinate system. That is, the second position information of the second contact point 23 may be (x_1, y_B2).
Wherein the choice between x_1 and x_2 may be determined based on the orientation information of the target vehicle in the sample image, or based on where the second contact point is located relative to the first contact point. For example, if the second contact point is located to the right of the first contact point, x_2 is selected; if the second contact point is located to the left of the first contact point, x_1 is selected. Alternatively, if the orientation information of the target vehicle is the fourth intermediate category or the seventh intermediate category, x_1 is selected; if the orientation information of the target vehicle is the fifth intermediate category or the eighth intermediate category, x_2 is selected.
By adopting the method, only one piece of coordinate information in the second position information of the contact point needs to be determined, and the other piece of coordinate information can be obtained from the detection frame information. This reduces the number of regressed data types and avoids the situation in which too many regression targets degrade the accuracy of the other regression data (such as the detection frame information and the category of the detection frame).
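A sketch of the two annotation modes and of the x_1/x_2 selection rule described above; the helper name, the category strings, and the default branch are assumptions.

```python
def contact_labels(box, x_contact1, y_contact2, second_orientation):
    """Build the second position information of the two wheel-ground
    contact points. Mode one: annotate the horizontal coordinate of the
    first contact point and take its vertical coordinate from the
    detection frame; mode two: annotate the vertical coordinate of the
    second contact point and take its horizontal coordinate from the
    frame, choosing x_1 or x_2 from the orientation information.
    """
    x1, y1, x2, y2 = box
    first_contact = (x_contact1, y2)          # mode one: (x_B1, y_2)
    if second_orientation in ("fourth_intermediate", "seventh_intermediate"):
        x_side = x1                           # right side visible -> x_1
    elif second_orientation in ("fifth_intermediate", "eighth_intermediate"):
        x_side = x2                           # left side visible -> x_2
    else:
        x_side = x1                           # assumed default for other cases
    second_contact = (x_side, y_contact2)     # mode two: (x_1 or x_2, y_B2)
    return first_contact, second_contact
```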
In an optional embodiment, in a case where the orientation information includes first orientation information, the first orientation information includes: a first category in which neither the front nor the rear of the vehicle is visible, a second category in which the rear of the vehicle is visible and the front of the vehicle is not visible, and a third category in which the front of the vehicle is visible and the rear of the vehicle is not visible.
In an optional embodiment, in a case where the orientation information includes second orientation information, the second orientation information includes: a first intermediate category in which the front and rear of the vehicle are not visible and the left side of the vehicle is visible; a second intermediate category in which the front and rear of the vehicle are not visible and the right side of the vehicle is visible; a third intermediate category in which the rear of the vehicle is visible, the front of the vehicle is not visible, and the sides of the vehicle are not visible; a fourth intermediate category in which the rear of the vehicle is visible, the front of the vehicle is not visible, and the right side of the vehicle is visible; a fifth intermediate category in which the rear of the vehicle is visible, the front of the vehicle is not visible, and the left side of the vehicle is visible; a sixth intermediate category in which the front of the vehicle is visible, the rear of the vehicle is not visible, and the sides of the vehicle are not visible; a seventh intermediate category in which the front of the vehicle is visible, the rear of the vehicle is not visible, and the right side of the vehicle is visible; and an eighth intermediate category in which the front of the vehicle is visible, the rear of the vehicle is not visible, and the left side of the vehicle is visible.
The second orientations indicated by the second orientation information are included in the first orientations indicated by the first orientation information. That is, the first category of the first orientation information may include the first intermediate category and the second intermediate category of the second orientation information; the second category of the first orientation information may include the third intermediate category, the fourth intermediate category, and the fifth intermediate category; and the third category of the first orientation information may include the sixth intermediate category, the seventh intermediate category, and the eighth intermediate category.
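This category hierarchy can be written down directly; the string keys below are illustrative names, not identifiers from the present disclosure.

```python
# Mapping from the eight intermediate (second orientation) categories
# to the three first orientation categories, as described above.
SECOND_TO_FIRST_ORIENTATION = {
    "first_intermediate":   "first",   # front & rear invisible, left side visible
    "second_intermediate":  "first",   # front & rear invisible, right side visible
    "third_intermediate":   "second",  # rear visible, front and sides invisible
    "fourth_intermediate":  "second",  # rear and right side visible
    "fifth_intermediate":   "second",  # rear and left side visible
    "sixth_intermediate":   "third",   # front visible, rear and sides invisible
    "seventh_intermediate": "third",   # front and right side visible
    "eighth_intermediate":  "third",   # front and left side visible
}
```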
It can be understood that the first orientation information of the target vehicle included in fig. 2 is of the third category, and the second orientation information is of the seventh intermediate category.
Referring to fig. 3, a schematic diagram of a target vehicle in a neural network training method is shown. For example, the first orientation information corresponding to the target vehicle 31 in fig. 3 may be a first category, and the second orientation information may be a first intermediate category; the first orientation information corresponding to the target vehicle 32 may be in the third category and the second orientation information may be in the eighth intermediate category; the first orientation information corresponding to the target vehicle 33 may be of the second category, and the second orientation information may be of the third intermediate category; the first orientation information corresponding to the target vehicle 34 may be in the second category and the second orientation information may be in the fourth intermediate category.
For S102:
the sample image containing the two-dimensional labeling data can be input into a target neural network to be trained, wherein the target neural network comprises a plurality of score networks, and the target neural network is trained for multiple times until the accuracy of the trained target neural network is higher than a set accuracy threshold value, or until the loss value of the trained target neural network is lower than a set loss threshold value, so that the trained target neural network is obtained.
Wherein, after the sample image is input to the target neural network, each branch network respectively outputs one of the following information: the two-dimensional detection frame information of the target vehicle, or one item of the at least one attribute information capable of representing the three-dimensional pose of the target vehicle. In this way, for any image to be detected, the trained target neural network can output, in parallel, the two-dimensional detection frame information of the vehicle and the at least one attribute information capable of representing the three-dimensional pose of the vehicle.
Referring to fig. 4, a schematic diagram of the target neural network in the neural network training method is shown. Fig. 4 includes a sample image 41, a trunk network 42, and a plurality of branch networks 43, which may include a branch network corresponding to the detection frame information, a branch network corresponding to the category of the target vehicle, a branch network corresponding to the first position information of the demarcation point, a branch network corresponding to the second position information of the first contact point, a branch network corresponding to the second position information of the second contact point, and a branch network corresponding to the orientation information.
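A minimal PyTorch-style sketch of a trunk network with parallel branch heads in the spirit of fig. 4; the backbone, layer sizes, and the single-target-vehicle-per-image simplification are assumptions, not the actual architecture of the present disclosure.

```python
import torch.nn as nn

class MultiBranchDetector(nn.Module):
    # Illustrative only: a shared trunk (42) followed by parallel branch
    # heads (43), one per output. Assumes one target vehicle per image.
    def __init__(self, num_classes: int = 3, feat_ch: int = 64):
        super().__init__()
        self.backbone = nn.Sequential(
            nn.Conv2d(3, feat_ch, 3, stride=2, padding=1), nn.ReLU(),
            nn.Conv2d(feat_ch, feat_ch, 3, stride=2, padding=1), nn.ReLU(),
            nn.AdaptiveAvgPool2d(1), nn.Flatten(),
        )
        self.heads = nn.ModuleDict({
            "box": nn.Linear(feat_ch, 4),            # 2D detection frame
            "cls": nn.Linear(feat_ch, num_classes),  # category of the frame
            "demarcation": nn.Linear(feat_ch, 1),    # x of demarcation point
            "contact1": nn.Linear(feat_ch, 1),       # x of first contact point
            "contact2": nn.Linear(feat_ch, 1),       # y of second contact point
            "orientation": nn.Linear(feat_ch, 8),    # 8 intermediate categories
        })

    def forward(self, img):
        feat = self.backbone(img)
        # each branch network outputs one kind of information, in parallel
        return {name: head(feat) for name, head in self.heads.items()}
```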
Referring to fig. 5, a schematic flow chart of an image detection method provided in the embodiment of the present disclosure is shown, the method includes S501-S503, where:
s501, acquiring an image to be detected;
s502, inputting the image to be detected into a trained target neural network comprising a plurality of branch networks, and obtaining two-dimensional detection frame information of the vehicle in the image to be detected, which is output by the plurality of branch networks in parallel, and at least one attribute information capable of representing the three-dimensional pose of the vehicle;
s503, determining the detection result of the image to be detected based on the two-dimensional detection frame information of the vehicle and at least one attribute information.
In the method, the target neural network is obtained based on the neural network training method of the first aspect, so that the trained target neural network can accurately output the two-dimensional detection frame information and the at least one attribute information of the vehicle, and further, the detection result of the image to be detected can be accurately determined based on the two-dimensional detection frame information and the at least one attribute information of the vehicle.
For S501 and S502:
the image to be detected can be any image, the acquired image to be detected is input into a trained target neural network comprising a plurality of branch networks, two-dimensional detection frame information of the vehicle included in the image to be detected and output in parallel by the plurality of branch networks is obtained, and at least one attribute information capable of representing the three-dimensional pose of the vehicle is obtained. For example, two-dimensional detection frame information, first position information of a demarcation point, second position information of a contact point, orientation information, and the like corresponding to the vehicle included in the image to be detected can be obtained.
For S503:
in an optional embodiment, determining a detection result of an image to be detected based on two-dimensional detection frame information of a vehicle and at least one attribute information includes:
and determining the depth information of the vehicle based on the two-dimensional detection frame information of the vehicle and the at least one attribute information.
The depth information of the vehicle may be the distance between the center of the vehicle and the image capturing apparatus that captured the image to be detected. For example, a bird's-eye view corresponding to the vehicle can be determined from the two-dimensional detection frame information and the at least one attribute information through a coordinate transformation, and the depth information of the vehicle can then be determined from the bird's-eye view. Alternatively, a trained neural network for determining depth information may be used to determine the depth information of the vehicle from the two-dimensional detection frame information of the vehicle and the at least one attribute information.
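As one hedged illustration of why the wheel-ground contact point helps here (an assumed realization, not the method of the present disclosure): under a flat-ground, pinhole-camera assumption, the contact point alone already constrains depth.

```python
import numpy as np

def depth_from_contact(contact_uv, K, camera_height):
    # Back-project a wheel-ground contact point onto the ground plane.
    # Assumes a pinhole camera with intrinsic matrix K, y-axis pointing
    # down, and the ground at y = camera_height in camera coordinates.
    u, v = contact_uv
    ray = np.linalg.inv(K) @ np.array([u, v, 1.0])  # viewing ray direction
    if ray[1] <= 0:
        raise ValueError("contact point must lie below the horizon")
    scale = camera_height / ray[1]  # intersect the ray with the ground plane
    point = scale * ray             # 3D contact point in camera coordinates
    return float(point[2])          # forward distance, i.e. depth
```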
In an optional embodiment, determining a detection result of an image to be detected based on two-dimensional detection frame information of a vehicle and at least one attribute information includes:
and determining the three-dimensional detection data of the vehicle based on the image to be detected, the two-dimensional detection frame information of the vehicle included in the image to be detected and at least one attribute information corresponding to the vehicle.
For example, the image to be detected, the two-dimensional detection frame information of the vehicle included in the image to be detected, and the at least one attribute information corresponding to the vehicle may be input into a trained three-dimensional detection neural network to determine the three-dimensional detection data of the vehicle. The three-dimensional detection data of the vehicle may include position information of a three-dimensional detection frame of the vehicle, size information of the three-dimensional detection frame, a category of the three-dimensional detection frame, and the like.
In an optional embodiment, in a case where the at least one attribute information includes first position information of any demarcation point on a boundary between adjacent visible surfaces of the vehicle and second position information of a contact point between at least one visible wheel of the vehicle and the ground, determining the three-dimensional detection data of the vehicle based on the image to be detected, the two-dimensional detection frame information of the vehicle included in the image to be detected, and the at least one attribute information corresponding to the vehicle includes:
firstly, determining two-dimensional compact frame information representing a single plane of the vehicle based on the first position information of the demarcation point, the second position information of the contact point, and the two-dimensional detection frame information of the vehicle;
secondly, determining the three-dimensional detection data of the vehicle based on the two-dimensional compact frame information and the image to be detected.
When the at least one attribute information includes the first position information of the demarcation point, the two-dimensional detection frame of the vehicle may be divided into two detection frames by using the first position information of the demarcation point corresponding to the vehicle, and the detection frame containing the contact points may be determined as the two-dimensional compact frame information representing a single plane of the vehicle. Referring to fig. 6, a schematic diagram of a two-dimensional compact frame in the image detection method, the detection frame on the right side of the demarcation point is the determined two-dimensional compact frame, where 21 in fig. 6 is the demarcation point, and 22 and 23 are the two contact points.
The two-dimensional compact frame information and the image to be detected are then input into a trained three-dimensional detection neural network to determine the three-dimensional detection data of the vehicle.
By adopting the method, the spatial information contained in the two-dimensional compact frame information is accurate, so that the three-dimensional detection data of the vehicle can be accurately determined based on the two-dimensional compact frame information and the image to be detected.
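A sketch of the compact-frame construction described above; the helper name is an assumption.

```python
def compact_frame(box, x_demarcation, contact_points):
    """Split the 2D detection frame of the vehicle at the demarcation
    point and keep the sub-frame that contains the wheel-ground contact
    points, i.e. the two-dimensional compact frame of a single plane of
    the vehicle (as in fig. 6, where points 22 and 23 select the frame
    on the right side of demarcation point 21).
    """
    x1, y1, x2, y2 = box
    left = (x1, y1, x_demarcation, y2)
    right = (x_demarcation, y1, x2, y2)
    # keep the side on which the contact points fall
    if all(x_demarcation <= u <= x2 for u, _ in contact_points):
        return right
    return left
```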
Referring to fig. 7, a flow chart of a driving control method provided in the embodiment of the present disclosure is schematically illustrated, and the method includes S701-S703, where:
s701, acquiring a road image acquired by a driving device in the driving process;
s702, detecting a road image by using a target neural network obtained by training through the neural network training method described in the embodiment to obtain target detection data of a target vehicle included in the road image;
s703 controls the running device based on the target detection data of the target vehicle included in the road image.
For example, the traveling device may be an autonomous vehicle, a vehicle equipped with an advanced driver assistance system (ADAS), a robot, or the like. The road image may be an image acquired by the traveling device in real time during traveling.
When the driving device is controlled, the driving device can be controlled to accelerate, decelerate, turn, brake and the like, or voice prompt information can be played to prompt a driver to control the driving device to accelerate, decelerate, turn, brake and the like.
In specific implementation, the road image can be input into the trained target neural network and detected, so as to obtain the target detection data of the target vehicle included in the road image; the traveling device may then be controlled based on the target detection data of the target vehicle. For example, controlling the traveling device based on the target detection data of the target vehicle may include: inputting the target detection data of the target vehicle into a trained three-dimensional detection neural network to obtain three-dimensional detection information of the target vehicle, and controlling the traveling device based on the detected three-dimensional detection information of the target vehicle.
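Putting S701-S703 together, a sketch of one control step; all four arguments are assumed interfaces for illustration, not APIs defined by the present disclosure.

```python
def control_step(road_image, target_net, det3d_net, controller):
    # S702: detect the road image with the trained target neural network
    detections = target_net(road_image)          # target detection data
    # lift to three-dimensional detection information (assumed interface)
    info_3d = det3d_net(road_image, detections)
    # S703: control the traveling device (accelerate, decelerate, turn,
    # brake) or play a voice prompt to the driver
    controller.act(info_3d)
```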
It will be understood by those skilled in the art that, in the methods of the present disclosure, the order in which the steps are written does not imply a strict execution order or constitute any limitation on the implementation; the specific execution order of the steps should be determined by their functions and possible inner logic.
Based on the same concept, an embodiment of the present disclosure further provides a neural network training device, as shown in fig. 8, which is an architecture schematic diagram of the neural network training device provided in the embodiment of the present disclosure, and includes a first obtaining module 801 and a training module 802, specifically:
a first obtaining module 801, configured to obtain a sample image and two-dimensional annotation data of the sample image; the two-dimensional labeling data comprise detection frame information of a target vehicle in the sample image and at least one attribute information capable of representing a three-dimensional pose of the target vehicle;
a training module 802 for training a target neural network comprising a plurality of branch networks based on the sample image and the two-dimensional labeling data;
wherein, after the sample image is input to the target neural network, each branch network outputs one of the following information: the two-dimensional detection frame information of the target vehicle and at least one attribute information capable of representing the three-dimensional pose of the target vehicle.
In one possible embodiment, the attribute information includes at least one of:
first position information of any demarcation point on a boundary between adjacent visible surfaces of the target vehicle, second position information of a contact point between at least one visible wheel of the target vehicle and a ground surface, orientation information of the target vehicle;
wherein the orientation information comprises first orientation information and/or second orientation information, and a second orientation indicated by the second orientation information is included in the first orientation indicated by the first orientation information.
In a possible implementation, in a case where the orientation information includes first orientation information, the first orientation information includes: a first category in which neither the front nor the rear of the vehicle is visible, a second category in which the rear of the vehicle is visible and the front of the vehicle is not visible, and a third category in which the front of the vehicle is visible and the rear of the vehicle is not visible.
In one possible embodiment, in a case where the orientation information includes second orientation information, the second orientation information includes: a first intermediate category in which the front and rear of the vehicle are not visible and the left side of the vehicle is visible; a second intermediate category in which the front and rear of the vehicle are not visible and the right side of the vehicle is visible; a third intermediate category in which the rear of the vehicle is visible, the front of the vehicle is not visible, and the sides of the vehicle are not visible; a fourth intermediate category in which the rear of the vehicle is visible, the front of the vehicle is not visible, and the right side of the vehicle is visible; a fifth intermediate category in which the rear of the vehicle is visible, the front of the vehicle is not visible, and the left side of the vehicle is visible; a sixth intermediate category in which the front of the vehicle is visible, the rear of the vehicle is not visible, and the sides of the vehicle are not visible; a seventh intermediate category in which the front of the vehicle is visible, the rear of the vehicle is not visible, and the right side of the vehicle is visible; and an eighth intermediate category in which the front of the vehicle is visible, the rear of the vehicle is not visible, and the left side of the vehicle is visible.
In a possible implementation manner, in the case that the attribute information includes first position information of the demarcation point, the first obtaining module 801, when obtaining the two-dimensional annotation data of the sample image, is configured to:
acquiring coordinate information of the demarcation point in the horizontal direction of an image coordinate system corresponding to the sample image;
and determining the coordinate information of the vertical direction indicated by the detection frame information of the target vehicle as the coordinate information of the vertical direction of the demarcation point in the image coordinate system corresponding to the sample image.
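The annotation rule just described — annotate the demarcation point's horizontal coordinate by hand and borrow the vertical coordinate from the detection frame — can be sketched as follows; taking the frame's bottom edge as the borrowed coordinate is an assumption for illustration:

```python
def demarcation_point_label(demarcation_u, box):
    # box = (x1, y1, x2, y2) in image coordinates.
    # Only the horizontal coordinate is hand-annotated; the vertical
    # coordinate is taken from the detection frame (assumed: bottom edge y2).
    x1, y1, x2, y2 = box
    return (demarcation_u, y2)
```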
In a possible implementation manner, in the case that the attribute information includes second position information of the contact point, the first obtaining module 801, when obtaining the two-dimensional annotation data of the sample image, is configured to:
acquiring coordinate information of the contact point in the horizontal direction of an image coordinate system corresponding to the sample image; determining the coordinate information in the vertical direction indicated by the detection frame information of the target vehicle as the coordinate information in the vertical direction of the contact point in the image coordinate system corresponding to the sample image; and/or,
acquiring coordinate information of the contact point in the vertical direction of an image coordinate system corresponding to the sample image; and determining the coordinate information in the horizontal direction indicated by the detection frame information of the target vehicle as the coordinate information in the horizontal direction of the contact point in the image coordinate system corresponding to the sample image.
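Both annotation branches can be captured in one helper; which frame edge supplies the borrowed coordinate is not specified above, so the bottom edge for the vertical case and the left edge for the horizontal case are assumptions:

```python
def contact_point_label(box, u=None, v=None):
    # Complete a wheel-ground contact point from one hand-annotated
    # coordinate, borrowing the other from the detection frame.
    x1, y1, x2, y2 = box
    if u is not None:
        return (u, y2)   # horizontal annotated, vertical from frame (assumed y2)
    if v is not None:
        return (x1, v)   # vertical annotated, horizontal from frame (assumed x1)
    raise ValueError("annotate at least one coordinate")
```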
Based on the same concept, an embodiment of the present disclosure further provides an image detection apparatus, as shown in fig. 9, which is an architecture schematic diagram of the image detection apparatus provided in the embodiment of the present disclosure, and includes a second obtaining module 901, a first generating module 902, and a determining module 903, specifically:
a second obtaining module 901, configured to obtain an image to be detected;
a first generating module 902, configured to input the image to be detected to a trained target neural network including a plurality of branch networks, so as to obtain two-dimensional detection frame information of a vehicle in the image to be detected, which is output in parallel by the plurality of branch networks, and at least one attribute information capable of representing a three-dimensional pose of the vehicle;
a determining module 903, configured to determine a detection result of the image to be detected based on the two-dimensional detection frame information of the vehicle and the at least one attribute information.
In a possible embodiment, the determining module 903, when determining the detection result of the image to be detected based on the two-dimensional detection frame information of the vehicle and the at least one attribute information, is configured to:
determining depth information of the vehicle based on the two-dimensional detection frame information of the vehicle and the at least one attribute information.
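The disclosure does not spell out the depth computation here. As a rough stand-in, a standard monocular cue relates frame height to distance through the pinhole model; the focal length and assumed real vehicle height below are invented constants, not values from the disclosure:

```python
def approximate_depth(box, focal_len_px, assumed_height_m=1.5):
    # Pinhole relation: depth ≈ focal_length * real_height / pixel_height.
    # The real vehicle height is an assumed constant, so this is only a
    # coarse illustration of a box-plus-attributes depth estimate.
    _, y1, _, y2 = box
    pixel_height = max(y2 - y1, 1e-6)
    return focal_len_px * assumed_height_m / pixel_height
```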
In a possible embodiment, the determining module 903, when determining the detection result of the image to be detected based on the two-dimensional detection frame information of the vehicle and the at least one attribute information, is configured to:
and determining the three-dimensional detection data of the vehicle based on the image to be detected, the two-dimensional detection frame information of the vehicle included in the image to be detected and at least one attribute information corresponding to the vehicle.
In a possible embodiment, in the case that the at least one attribute information includes first position information of any demarcation point on a boundary between adjacent visible surfaces of the vehicle and second position information of a contact point between at least one visible wheel of the vehicle and the ground, the determining module 903 is configured to, when determining three-dimensional detection data of the vehicle based on the image to be detected, two-dimensional detection frame information of the vehicle included in the image to be detected, and at least one attribute information corresponding to the vehicle:
determining two-dimensional compact frame information representing a single visible face of the vehicle based on the first position information of the demarcation point, the second position information of the contact point, and the two-dimensional detection frame information of the vehicle;
and determining the three-dimensional detection data of the vehicle based on the two-dimensional compact frame information and the image to be detected.
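How the single-face compact frame is cut out of the full detection frame is not detailed here; one plausible construction, under the assumptions noted in the comments, is:

```python
def compact_frame(box, demarcation_u, contact_v, left_face=True):
    # Assumptions for illustration: the demarcation point splits the
    # detection frame horizontally into two visible faces, and the
    # wheel-ground contact row bounds the face from below.
    x1, y1, x2, y2 = box
    if left_face:
        return (x1, y1, demarcation_u, contact_v)
    return (demarcation_u, y1, x2, contact_v)
```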
Based on the same concept, an embodiment of the present disclosure further provides a driving control device, as shown in fig. 10, which is a schematic structural diagram of the driving control device provided in the embodiment of the present disclosure, and includes a third obtaining module 1001, a second generating module 1002, and a control module 1003, specifically:
a third obtaining module 1001, configured to obtain a road image acquired by a driving device in a driving process;
a second generating module 1002, configured to detect the road image by using the target neural network obtained through training by using the neural network training method described in the foregoing embodiment, so as to obtain target detection data of a target vehicle included in the road image;
a control module 1003, configured to control the driving device based on the target detection data of the target vehicle included in the road image.
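As a hedged end-to-end illustration, one perception-to-control tick built on the sketches above might look like the following; the braking threshold and decision rule are invented placeholders, not the disclosure's control logic:

```python
def control_step(net, road_image, focal_len_px=1000.0, brake_m=10.0):
    # net: the MultiBranchVehicleNet sketched earlier.
    # road_image: a (3, H, W) tensor from the driving device's camera.
    out = net(road_image.unsqueeze(0))             # multi-branch outputs
    depth = approximate_depth(out["box"][0].tolist(), focal_len_px)
    return "brake" if depth < brake_m else "keep"  # placeholder decision
```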
In some embodiments, the functions of the apparatus provided in the embodiments of the present disclosure, or the modules it includes, may be used to execute the methods described in the above method embodiments; for specific implementation, reference may be made to the description of the above method embodiments, and details are not repeated here for brevity.
Based on the same technical concept, an embodiment of the present disclosure further provides an electronic device. Referring to fig. 11, a schematic structural diagram of an electronic device 1100 provided in an embodiment of the present disclosure includes a processor 1101, a memory 1102, and a bus 1103. The memory 1102 is used to store execution instructions and includes an internal memory 11021 and an external memory 11022; the internal memory 11021 temporarily stores operation data for the processor 1101 and data exchanged with the external memory 11022 (for example, a hard disk), and the processor 1101 exchanges data with the external memory 11022 through the internal memory 11021. When the electronic device 1100 runs, the processor 1101 communicates with the memory 1102 through the bus 1103, so that the processor 1101 executes the following instructions:
acquiring a sample image and two-dimensional labeling data of the sample image; the two-dimensional labeling data comprise detection frame information of a target vehicle in the sample image and at least one attribute information capable of representing a three-dimensional pose of the target vehicle;
training a target neural network comprising a plurality of branch networks based on the sample image and the two-dimensional labeling data;
wherein, after the sample image is input to the target neural network, each branch network outputs one of the following information: the two-dimensional detection frame information of the target vehicle and at least one attribute information capable of representing the three-dimensional pose of the target vehicle.
Based on the same technical concept, an embodiment of the present disclosure further provides another electronic device. Referring to fig. 12, a schematic structural diagram of an electronic device 1200 provided in an embodiment of the present disclosure includes a processor 1201, a memory 1202, and a bus 1203. The memory 1202 is used to store execution instructions and includes an internal memory 12021 and an external memory 12022; the internal memory 12021 temporarily stores operation data for the processor 1201 and data exchanged with the external memory 12022 (for example, a hard disk), and the processor 1201 exchanges data with the external memory 12022 through the internal memory 12021. When the electronic device 1200 runs, the processor 1201 communicates with the memory 1202 through the bus 1203, so that the processor 1201 executes the following instructions:
acquiring an image to be detected;
inputting the image to be detected to a trained target neural network comprising a plurality of branch networks, and obtaining two-dimensional detection frame information of the vehicle in the image to be detected, which is output by the plurality of branch networks in parallel, and at least one attribute information capable of representing the three-dimensional pose of the vehicle;
and determining the detection result of the image to be detected based on the two-dimensional detection frame information of the vehicle and the at least one attribute information.
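As a usage illustration of the detection flow these instructions describe, again using the sketched MultiBranchVehicleNet from above, with a random tensor standing in for the image to be detected:

```python
import torch

net = MultiBranchVehicleNet()
image = torch.rand(1, 3, 224, 224)  # placeholder "image to be detected"
with torch.no_grad():
    out = net(image)                # all branches output in parallel
print(out["box"], out["orientation"].argmax(dim=1))
```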
Based on the same technical concept, an embodiment of the present disclosure further provides yet another electronic device. Referring to fig. 13, a schematic structural diagram of an electronic device 1300 provided in an embodiment of the present disclosure includes a processor 1301, a memory 1302, and a bus 1303. The memory 1302 is used to store execution instructions and includes an internal memory 13021 and an external memory 13022; the internal memory 13021 temporarily stores operation data for the processor 1301 and data exchanged with the external memory 13022 (for example, a hard disk), and the processor 1301 exchanges data with the external memory 13022 through the internal memory 13021. When the electronic device 1300 runs, the processor 1301 communicates with the memory 1302 through the bus 1303, so that the processor 1301 executes the following instructions:
acquiring a road image acquired by a driving device in the driving process;
detecting the road image by using a target neural network obtained by training with the neural network training method described in the foregoing embodiments, to obtain target detection data of a target vehicle included in the road image;
controlling the driving device based on the target detection data of the target vehicle included in the road image.
In addition, the present disclosure also provides a computer-readable storage medium, where a computer program is stored on the computer-readable storage medium, and when the computer program is executed by a processor, the computer program performs the steps of the neural network training method, the image detection method, and the driving control method described in the above method embodiments. The storage medium may be a volatile or non-volatile computer-readable storage medium.
The embodiments of the present disclosure also provide a computer program product carrying program code, where the instructions included in the program code may be used to execute the steps of the neural network training method, the image detection method, and the driving control method described in the above method embodiments; for details, reference may be made to the above method embodiments, which are not repeated here.
The computer program product may be implemented by hardware, software, or a combination thereof. In an alternative embodiment, the computer program product is embodied as a computer storage medium; in another alternative embodiment, it is embodied as a software product, such as a Software Development Kit (SDK).
It is clear to those skilled in the art that, for convenience and brevity of description, the specific working processes of the system and the apparatus described above may refer to the corresponding processes in the foregoing method embodiments and are not described here again.
In the several embodiments provided in the present disclosure, it should be understood that the disclosed system, apparatus, and method may be implemented in other ways. The apparatus embodiments described above are merely illustrative; for example, the division of the units is only one logical division, and there may be other divisions in actual implementation; likewise, a plurality of units or components may be combined or integrated into another system, or some features may be omitted or not executed. In addition, the mutual coupling, direct coupling, or communication connection shown or discussed may be an indirect coupling or communication connection of devices or units through communication interfaces, and may be electrical, mechanical, or in other forms.
The units described as separate parts may or may not be physically separate, and parts displayed as units may or may not be physical units, may be located in one place, or may be distributed on a plurality of network units. Some or all of the units can be selected according to actual needs to achieve the purpose of the solution of the embodiment.
In addition, functional units in the embodiments of the present disclosure may be integrated into one processing unit, or each unit may exist alone physically, or two or more units are integrated into one unit.
The functions, if implemented in the form of software functional units and sold or used as a stand-alone product, may be stored in a processor-executable non-volatile computer-readable storage medium. Based on such understanding, the technical solution of the present disclosure may be embodied in the form of a software product, which is stored in a storage medium and includes several instructions for causing a computer device (which may be a personal computer, a server, or a network device) to execute all or part of the steps of the methods according to the embodiments of the present disclosure. The aforementioned storage medium includes various media capable of storing program code, such as a USB flash drive, a removable hard disk, a Read-Only Memory (ROM), a Random Access Memory (RAM), a magnetic disk, or an optical disk.
The above are only specific embodiments of the present disclosure, but the scope of the present disclosure is not limited thereto, and any person skilled in the art can easily conceive of changes or substitutions within the technical scope of the present disclosure, and shall be covered by the scope of the present disclosure. Therefore, the protection scope of the present disclosure shall be subject to the protection scope of the claims.

Claims (16)

1. A neural network training method, comprising:
acquiring a sample image and two-dimensional labeling data of the sample image; the two-dimensional labeling data comprise detection frame information of a target vehicle in the sample image and at least one attribute information capable of representing a three-dimensional pose of the target vehicle;
training a target neural network comprising a plurality of branch networks based on the sample image and the two-dimensional labeling data;
wherein, after the sample image is input to the target neural network, each branch network outputs one of the following information: the two-dimensional detection frame information of the target vehicle and at least one attribute information capable of representing the three-dimensional pose of the target vehicle.
2. The method of claim 1, wherein the attribute information comprises at least one of:
first position information of any demarcation point on a boundary between adjacent visible surfaces of the target vehicle, second position information of a contact point between at least one visible wheel of the target vehicle and a ground surface, orientation information of the target vehicle;
wherein the orientation information comprises first orientation information and/or second orientation information, and a second orientation indicated by the second orientation information is included in the first orientation indicated by the first orientation information.
3. The method according to claim 2, wherein in the case where the orientation information includes first orientation information, the first orientation information includes: a first category in which neither the front nor the rear of the vehicle is visible, a second category in which the rear of the vehicle is visible and the front is not, and a third category in which the front of the vehicle is visible and the rear is not.
4. The method according to claim 2 or 3, wherein in the case where the orientation information includes second orientation information, the second orientation information includes: a first intermediate category in which the front and rear of the vehicle are invisible and the left side is visible; a second intermediate category in which the front and rear of the vehicle are invisible and the right side is visible; a third intermediate category in which the rear of the vehicle is visible, the front is invisible, and the sides are invisible; a fourth intermediate category in which the rear of the vehicle is visible, the front is invisible, and the right side is visible; a fifth intermediate category in which the rear of the vehicle is visible, the front is invisible, and the left side is visible; a sixth intermediate category in which the front of the vehicle is visible, the rear is invisible, and the sides are invisible; a seventh intermediate category in which the front of the vehicle is visible, the rear is invisible, and the right side is visible; and an eighth intermediate category in which the front of the vehicle is visible, the rear is invisible, and the left side is visible.
5. The method according to claim 2, wherein in a case where the attribute information includes first position information of the demarcation point, acquiring the two-dimensional annotation data of the sample image includes:
acquiring coordinate information of the demarcation point in the horizontal direction of an image coordinate system corresponding to the sample image;
and determining the coordinate information of the vertical direction indicated by the detection frame information of the target vehicle as the coordinate information of the vertical direction of the demarcation point in the image coordinate system corresponding to the sample image.
6. The method according to claim 2, wherein in a case where the attribute information includes second position information of the contact point, acquiring two-dimensional annotation data of the sample image includes:
acquiring coordinate information of the contact point in the horizontal direction of an image coordinate system corresponding to the sample image; determining the coordinate information in the vertical direction indicated by the detection frame information of the target vehicle as the coordinate information in the vertical direction of the contact point in the image coordinate system corresponding to the sample image; and/or,
acquiring coordinate information of the contact point in the vertical direction of an image coordinate system corresponding to the sample image; and determining the coordinate information in the horizontal direction indicated by the detection frame information of the target vehicle as the coordinate information in the horizontal direction of the contact point in the image coordinate system corresponding to the sample image.
7. An image detection method, characterized in that the method comprises:
acquiring an image to be detected;
inputting the image to be detected to a trained target neural network comprising a plurality of branch networks, and obtaining two-dimensional detection frame information of the vehicle in the image to be detected, which is output by the plurality of branch networks in parallel, and at least one attribute information capable of representing the three-dimensional pose of the vehicle;
and determining the detection result of the image to be detected based on the two-dimensional detection frame information of the vehicle and the at least one attribute information.
8. The method of claim 7, wherein determining the detection result of the image to be detected based on the two-dimensional detection frame information of the vehicle and the at least one attribute information comprises:
determining depth information of the vehicle based on the two-dimensional detection frame information of the vehicle and the at least one attribute information.
9. The method of claim 7, wherein determining the detection result of the image to be detected based on the two-dimensional detection frame information of the vehicle and the at least one attribute information comprises:
and determining the three-dimensional detection data of the vehicle based on the image to be detected, the two-dimensional detection frame information of the vehicle included in the image to be detected and at least one attribute information corresponding to the vehicle.
10. The method according to claim 9, wherein in a case where the at least one attribute information includes first position information of any demarcation point on a boundary line between adjacent visible surfaces of the vehicle and second position information of a contact point between at least one visible wheel of the vehicle and a ground surface, the determining three-dimensional detection data of the vehicle based on the image to be detected, two-dimensional detection frame information of the vehicle included in the image to be detected, and at least one attribute information corresponding to the vehicle includes:
determining two-dimensional compact frame information representing a single visible face of the vehicle based on the first position information of the demarcation point, the second position information of the contact point, and the two-dimensional detection frame information of the vehicle;
and determining the three-dimensional detection data of the vehicle based on the two-dimensional compact frame information and the image to be detected.
11. A driving control method, characterized by comprising:
acquiring a road image acquired by a driving device in the driving process;
detecting the road image by using a target neural network obtained by training according to the neural network training method of any one of claims 1 to 6 to obtain target detection data of a target vehicle included in the road image;
controlling the driving device based on target detection data of the target vehicle included in the road image.
12. A neural network training device, comprising:
the system comprises a first acquisition module, a second acquisition module and a third acquisition module, wherein the first acquisition module is used for acquiring a sample image and two-dimensional annotation data of the sample image; the two-dimensional labeling data comprise detection frame information of a target vehicle in the sample image and at least one attribute information capable of representing a three-dimensional pose of the target vehicle;
a training module for training a target neural network comprising a plurality of branch networks based on the sample image and the two-dimensional labeling data;
wherein, after the sample image is input to the target neural network, each branch network outputs one of the following information: the two-dimensional detection frame information of the target vehicle and at least one attribute information capable of representing the three-dimensional pose of the target vehicle.
13. An image detection apparatus, characterized in that the apparatus comprises:
the second acquisition module is used for acquiring an image to be detected;
the first generation module is used for inputting the image to be detected to a trained target neural network comprising a plurality of branch networks, so as to obtain two-dimensional detection frame information of the vehicle in the image to be detected, which is output by the plurality of branch networks in parallel, and at least one attribute information capable of representing the three-dimensional pose of the vehicle;
and the determining module is used for determining the detection result of the image to be detected based on the two-dimensional detection frame information of the vehicle and the at least one attribute information.
14. A driving control device, characterized by comprising:
the third acquisition module is used for acquiring a road image acquired by the driving device in the driving process;
a second generation module, configured to detect the road image by using the target neural network trained by the neural network training method according to any one of claims 1 to 6, so as to obtain target detection data of a target vehicle included in the road image;
a control module for controlling the driving device based on target detection data of the target vehicle included in the road image.
15. An electronic device, comprising: a processor, a memory, and a bus, the memory storing machine-readable instructions executable by the processor, the processor and the memory communicating over the bus when the electronic device is operating, wherein the machine-readable instructions, when executed by the processor, perform the steps of the neural network training method according to any one of claims 1 to 6; or the steps of the image detection method according to any one of claims 7 to 10; or the steps of the driving control method according to claim 11.
16. A computer-readable storage medium, in which a computer program is stored which, when executed by a processor, performs the steps of the neural network training method according to any one of claims 1 to 6; or the steps of the image detection method according to any one of claims 7 to 10; or the steps of the driving control method according to claim 11.
CN202110713234.0A 2021-06-25 2021-06-25 Neural network training, image detection and driving control method and device Pending CN113435318A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202110713234.0A CN113435318A (en) 2021-06-25 2021-06-25 Neural network training, image detection and driving control method and device

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202110713234.0A CN113435318A (en) 2021-06-25 2021-06-25 Neural network training, image detection and driving control method and device

Publications (1)

Publication Number Publication Date
CN113435318A (en) 2021-09-24

Family

ID=77755330

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202110713234.0A Pending CN113435318A (en) 2021-06-25 2021-06-25 Neural network training, image detection and driving control method and device

Country Status (1)

Country Link
CN (1) CN113435318A (en)

Cited By (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
WO2024046053A1 (en) * 2022-08-30 2024-03-07 京东方科技集团股份有限公司 Vehicle violation detection method, apparatus and system, and storage medium

Similar Documents

Publication Publication Date Title
US10235332B2 (en) License plate distributed review systems and methods
JP5911165B2 (en) Image recognition device
CN108027243A (en) For operating the control error correction planing method of automatic driving vehicle
CN108345838A (en) Automatic traffic lamp detection model is trained using analog image
CN107985313A (en) The changing Lane method based on spring system for autonomous vehicle
CN112489126A (en) Vehicle key point information detection method, vehicle control method and device and vehicle
CN111462096A (en) Three-dimensional target detection method and device
CN108136867A (en) The vehicle location point retransmission method of automatic driving vehicle
CN113011364B (en) Neural network training, target object detection and driving control method and device
CN111928842B (en) Monocular vision based SLAM positioning method and related device
CN111931683B (en) Image recognition method, device and computer readable storage medium
CN111539484A (en) Method and device for training neural network
CN111928857B (en) Method and related device for realizing SLAM positioning in dynamic environment
CN112990200A (en) Data labeling method and device, computer equipment and storage medium
CN111062405A (en) Method and device for training image recognition model and image recognition method and device
CN116740668B (en) Three-dimensional object detection method, three-dimensional object detection device, computer equipment and storage medium
CN111553188B (en) End-to-end automatic driving vehicle steering control system based on deep learning
CN112926461A (en) Neural network training and driving control method and device
CN113435318A (en) Neural network training, image detection and driving control method and device
CN114091521B (en) Method, device and equipment for detecting vehicle course angle and storage medium
CN111382870A (en) Method and device for training neural network
CN113011517A (en) Positioning result detection method and device, electronic equipment and storage medium
CN112907757A (en) Navigation prompting method and device, electronic equipment and storage medium
CN113902047B (en) Image element matching method, device, equipment and storage medium
CN112184605A (en) Method, equipment and system for enhancing vehicle driving visual field

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination