CN111767843A

CN111767843A - Three-dimensional position prediction method, device, equipment and storage medium

Info

Publication number: CN111767843A
Application number: CN202010604409.XA
Authority: CN
Inventors: 舒茂
Original assignee: Beijing Baidu Netcom Science and Technology Co Ltd
Current assignee: Apollo Zhilian Beijing Technology Co Ltd
Priority date: 2020-06-29
Filing date: 2020-06-29
Publication date: 2020-10-13
Anticipated expiration: 2040-06-29
Also published as: CN111767843B

Abstract

The embodiment of the application discloses a three-dimensional position prediction method, a three-dimensional position prediction device, three-dimensional position prediction equipment and a storage medium, and relates to the technical field of intelligent transportation and vehicle-road cooperation. One embodiment of the method comprises: acquiring a two-dimensional image shot by a roadside camera and a ground depth map of the roadside camera, wherein the ground depth map stores the distance between a ground point corresponding to a pixel point in the two-dimensional image and the roadside camera, and the two-dimensional image comprises an image of an obstacle located on the ground point; extracting surface features of the obstacles from the two-dimensional image and extracting depth features of the obstacles from the ground depth map; fusing the surface features and the depth features to generate fused features; and predicting the three-dimensional position of the obstacle based on the fusion features. The embodiment provides a new obstacle position prediction method, and meanwhile prediction cost and prediction precision are considered.

Description

Three-dimensional position prediction method, device, equipment and storage medium

Technical Field

The embodiment of the application relates to the technical field of computers, in particular to the technical field of intelligent transportation and vehicle-road cooperation, and particularly relates to a three-dimensional position prediction method, a three-dimensional position prediction device, three-dimensional position prediction equipment and a storage medium.

Background

The V2X vehicle-road cooperation technology is an effective way for solving the intelligent limitation of the automatic driving automobile. V2X promotes the perception ability of crossing complex environment through deploying the sensor at the roadside, sends barrier information to the autopilot car, greatly promotes autopilot's security.

Implementation of the vehicle-road coordination technique relies on accurate detection of obstacles and accurate prediction of three-dimensional positions. The commonly used method for predicting the three-dimensional position of the obstacle mainly comprises the following two methods: firstly, a radar sensor is used for acquiring point cloud, and then the three-dimensional position of an obstacle is predicted based on the point cloud; secondly, a camera sensor is used for acquiring an image, information such as a two-dimensional frame position, an object length, width and height, an object orientation angle and the like of an obstacle in the image is detected by a visual method, and then the three-dimensional position of the object is calculated by post-processing and modeling by using an object, a camera coordinate system and a ground coordinate system.

Disclosure of Invention

The embodiment of the application provides a three-dimensional position prediction method, a three-dimensional position prediction device, three-dimensional position prediction equipment and a storage medium.

In a first aspect, an embodiment of the present application provides a three-dimensional position prediction method, including: acquiring a two-dimensional image shot by a roadside camera and a ground depth map of the roadside camera, wherein the ground depth map stores the distance between a ground point corresponding to a pixel point in the two-dimensional image and the roadside camera, and the two-dimensional image comprises an image of an obstacle located on the ground point; extracting surface features of the obstacles from the two-dimensional image and extracting depth features of the obstacles from the ground depth map; fusing the surface features and the depth features to generate fused features; and predicting the three-dimensional position of the obstacle based on the fusion features.

In a second aspect, an embodiment of the present application provides a three-dimensional position prediction apparatus, including: the system comprises an acquisition module and a display module, wherein the acquisition module is configured to acquire a two-dimensional image shot by a roadside camera and a ground depth map of the roadside camera, the ground depth map stores the distance between a ground point corresponding to a pixel point in the two-dimensional image and the roadside camera, and the two-dimensional image comprises an image of an obstacle located on the ground point; an extraction module configured to extract surface features of the obstacle from the two-dimensional image and extract depth features of the obstacle from the ground depth map; a fusion module configured to fuse the surface features with the depth features to generate fused features; a prediction module configured to predict a three-dimensional position of the obstacle based on the fused features.

In a third aspect, an embodiment of the present application provides an electronic device, including: at least one processor; and a memory communicatively coupled to the at least one processor; wherein the memory stores instructions executable by the at least one processor to enable the at least one processor to perform the method as described in any one of the implementations of the first aspect.

In a fourth aspect, embodiments of the present application propose a non-transitory computer-readable storage medium storing computer instructions for causing a computer to perform the method as described in any one of the implementations of the first aspect.

According to the three-dimensional position prediction method, the three-dimensional position prediction device, the three-dimensional position prediction equipment and the storage medium, firstly, a two-dimensional image shot by a roadside camera and a ground depth map of the roadside camera are obtained; then extracting surface features of the obstacles from the two-dimensional image, and extracting depth features of the obstacles from the ground depth map; then fusing the surface features and the depth features to generate fused features; and finally, predicting the three-dimensional position of the obstacle based on the fusion characteristics. A new obstacle position prediction method is provided, and both prediction cost and prediction accuracy are considered. The three-dimensional position of the obstacle can be predicted based on the two-dimensional image and the ground depth map of the camera only by deploying the roadside camera. Compared with the method for predicting the position of the obstacle based on the point cloud acquired by the radar sensor, the method reduces the prediction cost and can be deployed and applied in a large scale. Compared with the method for calculating the three-dimensional position based on the two-dimensional frame position modeling of the obstacle, the method does not depend on the detection precision of the two-dimensional frame position, avoids errors caused by post-processing geometric modeling, and improves the prediction precision. In addition, the three-dimensional position of the obstacle is sent to the automatic driving automobile, and the safety of automatic driving can be improved.

It should be understood that the statements in this section do not necessarily identify key or critical features of the embodiments of the present disclosure, nor do they limit the scope of the present disclosure. Other features of the present disclosure will become apparent from the following description.

Drawings

Other features, objects and advantages of the present application will become more apparent upon reading of the following detailed description of non-limiting embodiments thereof, made with reference to the accompanying drawings. The drawings are included to provide a better understanding of the present solution and are not intended to limit the present application. Wherein:

FIG. 1 is an exemplary system architecture to which the present application may be applied;

FIG. 2 is a flow diagram of one embodiment of a three-dimensional position prediction method according to the present application;

FIG. 3 is a flow diagram of yet another embodiment of a three-dimensional position prediction method according to the present application;

FIG. 4 is a block flow diagram of a three-dimensional position prediction method that may implement an embodiment of the present application;

FIG. 5 is a schematic block diagram of one embodiment of a three-dimensional position prediction apparatus according to the present application;

fig. 6 is a block diagram of an electronic device for implementing the three-dimensional position prediction method according to the embodiment of the present application.

Detailed Description

The following description of the exemplary embodiments of the present application, taken in conjunction with the accompanying drawings, includes various details of the embodiments of the application for the understanding of the same, which are to be considered exemplary only. Accordingly, those of ordinary skill in the art will recognize that various changes and modifications of the embodiments described herein can be made without departing from the scope and spirit of the present application. Also, descriptions of well-known functions and constructions are omitted in the following description for clarity and conciseness.

It should be noted that the embodiments and features of the embodiments in the present application may be combined with each other without conflict. The present application will be described in detail below with reference to the embodiments with reference to the attached drawings.

Fig. 1 shows an exemplary system architecture 100 to which embodiments of the three-dimensional position prediction method or the three-dimensional position prediction apparatus of the present application may be applied.

As shown in fig. 1, a roadside camera 101, a network 102, a server 103, and an autonomous automobile 104 may be included in the system architecture 100. Network 102 is the medium used to provide communication links between roadside cameras 101, server 103, and autonomous cars 104. Network 102 may include various connection types, such as wired, wireless communication links, or fiber optic cables, to name a few.

The roadside camera 101 may capture a two-dimensional image of a road and transmit it to the server 103 through the network 102. The server 103 may perform processing such as analysis on data such as a two-dimensional image of a road, and transmit the processing result (e.g., the three-dimensional position of an obstacle) to the autonomous vehicle 104.

It should be noted that the three-dimensional position prediction method provided in the embodiment of the present application is generally executed by the server 103, and accordingly, the three-dimensional position prediction apparatus is generally disposed in the server 103.

It should be understood that the number of roadside cameras, networks, servers, and autonomous cars in FIG. 1 are merely illustrative. There may be any number of roadside cameras, networks, servers, and autonomous vehicles, as desired for implementation.

With continued reference to FIG. 2, a flow 200 of one embodiment of a three-dimensional position prediction method according to the present application is shown. The three-dimensional position prediction method comprises the following steps:

step 201, acquiring a two-dimensional image shot by a roadside camera and a ground depth map of the roadside camera.

In the present embodiment, the execution subject of the three-dimensional position prediction method (e.g., the server 103 shown in fig. 1) may acquire a two-dimensional image taken by the roadside camera, and a ground depth map of the roadside camera.

The roadside camera may be a camera which is deployed on one side of the road in advance and is used for acquiring two-dimensional images including the ground when the automatic driving automobile passes through. In general, the three-dimensional position prediction of an obstacle is performed only when the obstacle exists on the ground, and thus the two-dimensional image here includes an image of the obstacle located on a ground point. The obstacle may be an object in the vicinity of the autonomous vehicle, including but not limited to a car, a person, an item, and so forth.

The ground depth map may be the same as a two-dimensional image captured by a road side camera in size, and the pixel points correspond to one another. The ground depth map may also correspond to a portion of a two-dimensional image captured by a roadside camera. The ground depth map may be used to store distances between ground points corresponding to pixel points in the two-dimensional image and the roadside camera, which may also be referred to as depths. After the road side camera is deployed, the shooting range of the road side camera is fixed, the road side camera can correspond to the determined ground depth map, and each pixel point in the two-dimensional image shot by the road side camera also has a unique corresponding point in the ground depth map. The ground depth map can be obtained by calculating camera internal parameters and camera external parameters of the road side camera, and is irrelevant to the content of the shot two-dimensional image.

Optionally, the ground depth map is calculated as follows:

firstly, calibrating a road side camera to obtain camera internal parameters and camera external parameters of the road side camera.

And then, fitting to obtain a ground equation under a world coordinate system based on the information of the ground points in the high-precision map.

In the high-precision map, various data information of ground points in the shooting range of the roadside camera is selected, and a ground equation can be obtained through fitting. The data information of the ground points may include, but is not limited to, information of lane lines, various ground signs, and the like.

And then, obtaining a ground equation under the camera coordinate system of the road side camera based on the camera external parameters of the road side camera and the ground equation under the world coordinate system.

Here, the ground equation in the world coordinate system is transformed based on the camera external reference of the roadside camera, and the ground equation in the camera coordinate system of the roadside camera can be obtained.

And finally, calculating a ground depth map based on the camera internal parameters and the ground equation under the camera coordinate system of the roadside camera.

Here, for a pixel point (u, v) in the ground depth map, the corresponding ground point is (x, y, z), and the following equation system should be satisfied:

ax+by+cz+d＝0

where ax + by + cz + d is 0, a, b, c, and d are parameters of the ground equation in the camera coordinate system of the roadside camera, K is a camera parameter, and λ is a scale factor. And solving the equation system to obtain z corresponding to each (u, v), thus obtaining the ground depth map.

Step 202, extracting surface features of the obstacles from the two-dimensional image, and extracting depth features of the obstacles from the ground depth map.

In this embodiment, the execution subject may extract surface features of the obstacle from the two-dimensional image, and extract depth features of the obstacle from the ground depth map. The surface features may store surface features of obstacles in the two-dimensional image, including but not limited to size, texture, material, and the like. The depth feature may be a distance depth feature of a ground point corresponding to an obstacle in the two-dimensional image and the roadside camera. Here, the execution body described above may extract the surface feature and the depth feature in various ways, including but not limited to a conventional feature extraction method and a feature extraction method based on depth learning.

And step 203, fusing the surface features and the depth features to generate fused features.

In this embodiment, the execution subject may fuse the surface feature and the depth feature to generate a fused feature. In general, the execution agent may dimensionally fuse surface features with depth features using, for example, a concat merge function.

And step 204, predicting the three-dimensional position of the obstacle based on the fusion characteristics.

In this embodiment, the execution subject may predict the three-dimensional position of the obstacle based on the fusion feature. In general, the coordinates of the obstacle on the two-dimensional image are first predicted based on the surface features in the fused features, and then the depth distance between the ground point corresponding to the obstacle and the roadside camera is predicted based on the corresponding depth features. Therefore, by combining the surface feature and the depth feature in the fusion feature, the three-dimensional position of the obstacle can be predicted.

In addition, the three-dimensional position of the obstacle is usually sent to the autonomous vehicle, so that the autonomous vehicle can avoid the obstacle based on the three-dimensional position of the obstacle in the driving process, and the safety of autonomous driving is improved.

According to the three-dimensional position prediction method provided by the embodiment of the application, firstly, a two-dimensional image shot by a roadside camera and a ground depth map of the roadside camera are obtained; then extracting surface features of the obstacles from the two-dimensional image, and extracting depth features of the obstacles from the ground depth map; then fusing the surface features and the depth features to generate fused features; and finally, predicting the three-dimensional position of the obstacle based on the fusion characteristics. A new obstacle position prediction method is provided, and both prediction cost and prediction accuracy are considered. The three-dimensional position of the obstacle can be predicted based on the two-dimensional image and the ground depth map of the camera only by deploying the roadside camera. Compared with the method for predicting the position of the obstacle based on the point cloud acquired by the radar sensor, the method reduces the prediction cost and can be deployed and applied in a large scale. Compared with the method for calculating the three-dimensional position based on the two-dimensional frame position modeling of the obstacle, the method does not depend on the detection precision of the two-dimensional frame position, avoids errors caused by post-processing geometric modeling, and improves the prediction precision. In addition, the three-dimensional position of the obstacle is sent to the automatic driving automobile, and the safety of automatic driving can be improved.

With further reference to fig. 3, a flow 300 of yet another embodiment of a three-dimensional position prediction method according to the present application is shown. The three-dimensional position prediction method comprises the following steps:

step 301, acquiring a two-dimensional image shot by a roadside camera and a ground depth map of the roadside camera.

In this embodiment, the specific operation of step 301 has been described in detail in step 201 in the embodiment shown in fig. 2, and is not described herein again.

Step 302, inputting the two-dimensional image into a first branch of the double-flow neural network, outputting the surface features, and inputting the ground depth map into a second branch of the double-flow neural network, outputting the depth features.

In the present embodiment, the execution subject of the three-dimensional position prediction method (for example, the server 103 shown in fig. 1) may extract the surface feature and the depth feature of the obstacle using the dual-flow neural network.

Generally, a dual-flow neural network includes two branches: a first branch and a second branch. The network structure of the two branches is the same, and the two branches are backbone networks (backbones) of the basic object classification network, such as DenseNet121, ResNet34, and the like. The input of the first branch is a two-dimensional image, the input of the second branch is a ground depth map, and the two inputs are extracted through two branches of a double-current neural network to obtain surface features and depth features.

Step 303, fusing the surface features and the depth features to generate fused features.

In this embodiment, the specific operation of step 303 has been described in detail in step 203 in the embodiment shown in fig. 2, and is not described herein again.

And step 304, predicting the reference three-dimensional coordinates of the center point of the obstacle based on the fusion characteristics.

In this embodiment, the execution subject may predict the reference three-dimensional coordinates of the center point of the obstacle based on the fusion feature. Wherein the reference three-dimensional coordinates may be represented as (cx, cy, Z). (cx, cy) is a projection coordinate of the center point of the obstacle on the two-dimensional image, and Z is a coordinate of the center point of the obstacle on the vertical axis of the camera coordinate system of the roadside camera.

Generally, the execution main body can process the fusion features by using a preset network layer to obtain a reference three-dimensional coordinate of a center point of the obstacle, so that the precision and the speed of the obtained reference three-dimensional coordinate are improved. The preset network layer may include, but is not limited to, a convolutional layer (conv), a batch normalization layer (bn), an activation function layer (relu), and the like.

Step 305, generating a real three-dimensional coordinate of the center point of the obstacle based on the reference three-dimensional coordinate and the camera internal reference of the road side camera.

In this embodiment, the execution body may generate a real three-dimensional coordinate of a center point of the obstacle based on the reference three-dimensional coordinate and the camera internal reference of the roadside camera.

Wherein, the real three-dimensional coordinates (X, Y, Z) of the center point of the obstacle can be calculated by the following formula:

as can be seen from fig. 3, compared with the embodiment corresponding to fig. 2, the flow 300 of the three-dimensional position prediction method in the present embodiment highlights the feature extraction step and the three-dimensional coordinate determination step. Therefore, in the scheme described in this embodiment, the double-flow neural network is used to predict the three-dimensional position of the obstacle end to end, so that the generalization problem of three-dimensional detection performed by different cameras is solved, and the position prediction accuracy is improved.

For ease of understanding, the following provides a flow chart of a three-dimensional position prediction method that may implement embodiments of the present application. As shown in fig. 4, a two-dimensional image 401 taken by a roadside camera and a ground depth map 402 of the roadside camera are acquired first; the two-dimensional image 401 is then input into a first branch 403 of a two-flow neural network, outputting the surface features F of the obstacle_image405, inputting the ground depth map 402 into a second branch 404 of the dual-flow neural network, and outputting a depth characteristic F of the obstacle_depth406; then F is put_image405 and F_depth406 to generate fused features 407; then, based on the fusion features 407, reference three-dimensional coordinates (cx, cy, Z) of the center point of the obstacle are predicted; finally, the real three-dimensional coordinates (X, Y, Z) of the center point of the obstacle are generated based on the reference three-dimensional coordinates (cx, cy, Z) and the camera parameters 408 of the roadside camera.

With further reference to fig. 5, as an implementation of the methods shown in the above-mentioned figures, the present application provides an embodiment of a three-dimensional position prediction apparatus, which corresponds to the embodiment of the method shown in fig. 2, and which can be applied in various electronic devices.

As shown in fig. 5, the three-dimensional position prediction apparatus 500 of the present embodiment may include: an acquisition module 501, an extraction module 502, a fusion module 503, and a prediction module 504. The acquiring module 501 is configured to acquire a two-dimensional image captured by a roadside camera and a ground depth map of the roadside camera, wherein the ground depth map stores distances between a ground point corresponding to a pixel point in the two-dimensional image and the roadside camera, and the two-dimensional image includes an image of an obstacle located on the ground point; an extraction module 502 configured to extract surface features of the obstacle from the two-dimensional image and extract depth features of the obstacle from the ground depth map; a fusion module 503 configured to fuse the surface features with the depth features to generate fused features; a prediction module 504 configured to predict a three-dimensional position of the obstacle based on the fused features.

In the present embodiment, the three-dimensional position prediction apparatus 500: the specific processing and the technical effects thereof of the obtaining module 501, the extracting module 502, the fusing module 503 and the predicting module 504 can refer to the related descriptions of step 201 and step 204 in the corresponding embodiment of fig. 2, which are not repeated herein.

In some optional implementations of the present embodiment, the dual-flow neural network includes a first branch and a second branch; and the extraction module 502 is further configured to: inputting the two-dimensional image into a first branch, outputting the surface features, and inputting the ground depth map into a second branch, outputting the depth features.

In some optional implementations of this embodiment, the prediction module 504 includes: a prediction sub-module (not shown in the drawings) configured to predict a reference three-dimensional coordinate of the center point of the obstacle based on the fusion feature, wherein an abscissa and an ordinate of the reference three-dimensional coordinate are projected coordinates of the center point of the obstacle on the two-dimensional image, and an ordinate of the reference three-dimensional coordinate is a coordinate of the center point of the obstacle on a vertical axis of a camera coordinate system of the roadside camera; a generation submodule (not shown in the figure) configured to generate real three-dimensional coordinates of a center point of the obstacle based on the reference three-dimensional coordinates and camera parameters of the roadside camera.

In some optional implementations of this embodiment, the prediction sub-module is further configured to: and processing the fusion characteristics by using a preset network layer to obtain a reference three-dimensional coordinate of the center point of the barrier.

In some optional implementations of this embodiment, the obtaining module 501 is further configured to: calibrating the roadside camera to obtain camera internal parameters and camera external parameters of the roadside camera; fitting to obtain a ground equation under a world coordinate system based on information of ground points in the high-precision map; obtaining a ground equation under a camera coordinate system of the road side camera based on camera external parameters of the road side camera and the ground equation under the world coordinate system; and calculating a ground depth map based on the camera internal parameters and the ground equation under the camera coordinate system of the roadside camera.

According to an embodiment of the present application, an electronic device and a readable storage medium are also provided.

Fig. 6 is a block diagram of an electronic device according to an embodiment of the present application. Electronic devices are intended to represent various forms of digital computers, such as laptops, desktops, workstations, personal digital assistants, servers, blade servers, mainframes, and other appropriate computers. The electronic device may also represent various forms of mobile devices, such as personal digital processing, cellular phones, smart phones, wearable devices, and other similar computing devices. The components shown herein, their connections and relationships, and their functions, are meant to be examples only, and are not meant to limit implementations of the present application that are described and/or claimed herein.

As shown in fig. 6, the electronic apparatus includes: one or more processors 601, memory 602, and interfaces for connecting the various components, including a high-speed interface and a low-speed interface. The various components are interconnected using different buses and may be mounted on a common motherboard or in other manners as desired. The processor may process instructions for execution within the electronic device, including instructions stored in or on the memory to display graphical information of a GUI on an external input/output apparatus (such as a display device coupled to the interface). In other embodiments, multiple processors and/or multiple buses may be used, along with multiple memories and multiple memories, as desired. Also, multiple electronic devices may be connected, with each device providing portions of the necessary operations (e.g., as a server array, a group of blade servers, or a multi-processor system). In fig. 6, one processor 601 is taken as an example.

The memory 602 is a non-transitory computer readable storage medium as provided herein. Wherein the memory stores instructions executable by at least one processor to cause the at least one processor to perform the three-dimensional position prediction method provided herein. A non-transitory computer-readable storage medium of the present application stores computer instructions for causing a computer to perform the three-dimensional position prediction method provided herein.

The memory 602, which is a non-transitory computer readable storage medium, may be used to store non-transitory software programs, non-transitory computer executable programs, and modules, such as program instructions/modules corresponding to the three-dimensional position prediction method in the embodiment of the present application (for example, the obtaining module 501, the extracting module 502, the fusing module 503, and the predicting module 504 shown in fig. 5). The processor 601 executes various functional applications of the server and data processing by running non-transitory software programs, instructions, and modules stored in the memory 602, that is, implements the three-dimensional position prediction method in the above-described method embodiments.

The memory 602 may include a storage program area and a storage data area, wherein the storage program area may store an operating system, an application program required for at least one function; the storage data area may store data created according to the use of the electronic device of the three-dimensional position prediction method, and the like. Further, the memory 602 may include high speed random access memory, and may also include non-transitory memory, such as at least one magnetic disk storage device, flash memory device, or other non-transitory solid state storage device. In some embodiments, the memory 602 optionally includes memory located remotely from the processor 601, and these remote memories may be connected over a network to the electronics of the three-dimensional position prediction method. Examples of such networks include, but are not limited to, the internet, intranets, local area networks, mobile communication networks, and combinations thereof.

The electronic device of the three-dimensional position prediction method may further include: an input device 603 and an output device 604. The processor 601, the memory 602, the input device 603 and the output device 604 may be connected by a bus or other means, and fig. 6 illustrates the connection by a bus as an example.

The input device 603 may receive input numeric or character information and generate key signal inputs related to user settings and function control of the electronic apparatus of the three-dimensional position prediction method, such as an input device of a touch screen, a keypad, a mouse, a track pad, a touch pad, a pointing stick, one or more mouse buttons, a track ball, a joystick, or the like. The output devices 604 may include a display device, auxiliary lighting devices (e.g., LEDs), and tactile feedback devices (e.g., vibrating motors), among others. The display device may include, but is not limited to, a Liquid Crystal Display (LCD), a Light Emitting Diode (LED) display, and a plasma display. In some implementations, the display device can be a touch screen.

Various implementations of the systems and techniques described here can be realized in digital electronic circuitry, integrated circuitry, application specific ASICs (application specific integrated circuits), computer hardware, firmware, software, and/or combinations thereof. These various embodiments may include: implemented in one or more computer programs that are executable and/or interpretable on a programmable system including at least one programmable processor, which may be special or general purpose, receiving data and instructions from, and transmitting data and instructions to, a storage system, at least one input device, and at least one output device.

These computer programs (also known as programs, software applications, or code) include machine instructions for a programmable processor, and may be implemented using high-level procedural and/or object-oriented programming languages, and/or assembly/machine languages. As used herein, the terms "machine-readable medium" and "computer-readable medium" refer to any computer program product, apparatus, and/or device (e.g., magnetic discs, optical disks, memory, Programmable Logic Devices (PLDs)) used to provide machine instructions and/or data to a programmable processor, including a machine-readable medium that receives machine instructions as a machine-readable signal. The term "machine-readable signal" refers to any signal used to provide machine instructions and/or data to a programmable processor.

To provide for interaction with a user, the systems and techniques described here can be implemented on a computer having: a display device (e.g., a CRT (cathode ray tube) or LCD (liquid crystal display) monitor) for displaying information to a user; and a keyboard and a pointing device (e.g., a mouse or a trackball) by which a user can provide input to the computer. Other kinds of devices may also be used to provide for interaction with a user; for example, feedback provided to the user can be any form of sensory feedback (e.g., visual feedback, auditory feedback, or tactile feedback); and input from the user may be received in any form, including acoustic, speech, or tactile input.

The systems and techniques described here can be implemented in a computing system that includes a back-end component (e.g., as a data server), or that includes a middleware component (e.g., an application server), or that includes a front-end component (e.g., a user computer having a graphical user interface or a web browser through which a user can interact with an implementation of the systems and techniques described here), or any combination of such back-end, middleware, or front-end components. The components of the system can be interconnected by any form or medium of digital data communication (e.g., a communication network). Examples of communication networks include: local Area Networks (LANs), Wide Area Networks (WANs), and the Internet.

The computer system may include clients and servers. A client and server are generally remote from each other and typically interact through a communication network. The relationship of client and server arises by virtue of computer programs running on the respective computers and having a client-server relationship to each other.

According to the technical scheme of the application, firstly, a two-dimensional image shot by a roadside camera and a ground depth map of the roadside camera are obtained; then extracting surface features of the obstacles from the two-dimensional image, and extracting depth features of the obstacles from the ground depth map; then fusing the surface features and the depth features to generate fused features; and finally, predicting the three-dimensional position of the obstacle based on the fusion characteristics. A new obstacle position prediction method is provided, and both prediction cost and prediction accuracy are considered. The three-dimensional position of the obstacle can be predicted based on the two-dimensional image and the ground depth map of the camera only by deploying the roadside camera. Compared with the method for predicting the position of the obstacle based on the point cloud acquired by the radar sensor, the method reduces the prediction cost and can be deployed and applied in a large scale. Compared with the method for calculating the three-dimensional position based on the two-dimensional frame position modeling of the obstacle, the method does not depend on the detection precision of the two-dimensional frame position, avoids errors caused by post-processing geometric modeling, and improves the prediction precision. In addition, the three-dimensional position of the obstacle is sent to the automatic driving automobile, and the safety of automatic driving can be improved.

It should be understood that various forms of the flows shown above may be used, with steps reordered, added, or deleted. For example, the steps described in the present application may be executed in parallel, sequentially, or in different orders, and the present invention is not limited thereto as long as the desired results of the technical solutions disclosed in the present application can be achieved.

The above-described embodiments should not be construed as limiting the scope of the present application. It should be understood by those skilled in the art that various modifications, combinations, sub-combinations and substitutions may be made in accordance with design requirements and other factors. Any modification, equivalent replacement, and improvement made within the spirit and principle of the present application shall be included in the protection scope of the present application.

Claims

1. A three-dimensional position prediction method, comprising:

acquiring a two-dimensional image shot by a roadside camera and a ground depth map of the roadside camera, wherein the ground depth map stores the distance between a ground point corresponding to a pixel point in the two-dimensional image and the roadside camera, and the two-dimensional image comprises an image of an obstacle located on the ground point;

extracting surface features of the obstacle from the two-dimensional image and extracting depth features of the obstacle from the ground depth map;

fusing the surface features with the depth features to generate fused features;

predicting a three-dimensional position of the obstacle based on the fused features.

2. The method of claim 1, wherein the bifilar neural network comprises a first branch and a second branch; and

the extracting surface features of the obstacle from the two-dimensional image and extracting depth features of the obstacle from the ground depth map comprises:

inputting the two-dimensional image to the first branch, outputting the surface feature, and inputting the ground depth map to the second branch, outputting the depth feature.

3. The method of claim 1, wherein said predicting a three-dimensional location of the obstacle based on the fused feature comprises:

predicting a reference three-dimensional coordinate of a center point of the obstacle based on the fused feature, wherein an abscissa and an ordinate of the reference three-dimensional coordinate are projected coordinates of the center point of the obstacle on the two-dimensional image, and an ordinate of the reference three-dimensional coordinate is a coordinate of the center point of the obstacle on a vertical axis of a camera coordinate system of the roadside camera;

generating a real three-dimensional coordinate of a center point of the obstacle based on the reference three-dimensional coordinate and camera parameters of the roadside camera.

4. The method of claim 3, wherein said predicting, based on the fused feature, reference three-dimensional coordinates of a center point of the obstacle comprises:

and processing the fusion characteristics by utilizing a preset network layer to obtain a reference three-dimensional coordinate of the center point of the barrier.

5. The method of claim 1, wherein the obtaining the ground depth map of the roadside camera comprises:

calibrating the roadside camera to obtain camera internal parameters and camera external parameters of the roadside camera;

fitting to obtain a ground equation under a world coordinate system based on the information of the ground points in the high-precision map;

obtaining a ground equation under the camera coordinate system of the road side camera based on the camera external parameters of the road side camera and the ground equation under the world coordinate system;

and calculating the ground depth map based on the camera internal parameters and the ground equation under the camera coordinate system of the roadside camera.

6. A three-dimensional position prediction apparatus comprising:

the system comprises an acquisition module and a display module, wherein the acquisition module is configured to acquire a two-dimensional image shot by a roadside camera and a ground depth map of the roadside camera, the ground depth map stores the distance between a ground point corresponding to a pixel point in the two-dimensional image and the roadside camera, and the two-dimensional image comprises an image of an obstacle located on the ground point;

an extraction module configured to extract surface features of the obstacle from the two-dimensional image and to extract depth features of the obstacle from the ground depth map;

a fusion module configured to fuse the surface features with the depth features, generating fused features;

a prediction module configured to predict a three-dimensional position of the obstacle based on the fused features.

7. The apparatus of claim 6, wherein the bifilar neural network comprises a first branch and a second branch; and

the extraction module is further configured to:

8. The apparatus of claim 6, wherein the prediction module comprises:

a prediction sub-module configured to predict a reference three-dimensional coordinate of the center point of the obstacle based on the fused feature, wherein an abscissa and an ordinate of the reference three-dimensional coordinate are projected coordinates of the center point of the obstacle on the two-dimensional image, and an ordinate of the reference three-dimensional coordinate is a coordinate of the center point of the obstacle on a vertical axis of a camera coordinate system of the roadside camera;

a generation submodule configured to generate real three-dimensional coordinates of a center point of the obstacle based on the reference three-dimensional coordinates and camera parameters of the roadside camera.

9. The apparatus of claim 8, wherein the prediction sub-module is further configured to:

10. The apparatus of claim 6, wherein the acquisition module is further configured to:

11. An electronic device, comprising:

one or more processors;

a storage device having one or more programs stored thereon,

when executed by the one or more processors, cause the one or more processors to implement the method of any one of claims 1-5.

12. A computer-readable medium, on which a computer program is stored, wherein the computer program, when being executed by a processor, carries out the method according to any one of claims 1-5.