CN114119757A - Image processing method, apparatus, device, medium, and computer program product


Info

Publication number
CN114119757A
Authority
CN
China
Prior art keywords
vehicle
image
current
mounted image
sample
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202111548086.8A
Other languages
Chinese (zh)
Inventor
李德辉 (Li Dehui)
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Tencent Technology Shenzhen Co Ltd
Original Assignee
Tencent Technology Shenzhen Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Tencent Technology Shenzhen Co Ltd filed Critical Tencent Technology Shenzhen Co Ltd
Priority to CN202111548086.8A
Publication of CN114119757A

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06T IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T7/00 Image analysis
    • G06T7/70 Determining position or orientation of objects or cameras
    • G06T7/73 Determining position or orientation of objects or cameras using feature-based methods
    • G06T7/75 Determining position or orientation of objects or cameras using feature-based methods involving models
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N20/00 Machine learning
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 Computing arrangements based on biological models
    • G06N3/02 Neural networks
    • G06N3/04 Architecture, e.g. interconnection topology
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 Computing arrangements based on biological models
    • G06N3/02 Neural networks
    • G06N3/08 Learning methods
    • G06N3/088 Non-supervised learning, e.g. competitive learning
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06T IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T7/00 Image analysis
    • G06T7/80 Analysis of captured images to determine intrinsic or extrinsic camera parameters, i.e. camera calibration
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06T IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T2207/00 Indexing scheme for image analysis or image enhancement
    • G06T2207/10 Image acquisition modality
    • G06T2207/10016 Video; Image sequence
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06T IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T2207/00 Indexing scheme for image analysis or image enhancement
    • G06T2207/10 Image acquisition modality
    • G06T2207/10028 Range image; Depth image; 3D point clouds
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06T IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T2207/00 Indexing scheme for image analysis or image enhancement
    • G06T2207/20 Special algorithmic details
    • G06T2207/20081 Training; Learning
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06T IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T2207/00 Indexing scheme for image analysis or image enhancement
    • G06T2207/20 Special algorithmic details
    • G06T2207/20084 Artificial neural networks [ANN]
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06T IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T2207/00 Indexing scheme for image analysis or image enhancement
    • G06T2207/30 Subject of image; Context of image processing
    • G06T2207/30244 Camera pose
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06T IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T2207/00 Indexing scheme for image analysis or image enhancement
    • G06T2207/30 Subject of image; Context of image processing
    • G06T2207/30248 Vehicle exterior or interior
    • G06T2207/30252 Vehicle exterior; Vicinity of vehicle

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • General Physics & Mathematics (AREA)
  • Software Systems (AREA)
  • Computing Systems (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Artificial Intelligence (AREA)
  • Mathematical Physics (AREA)
  • Data Mining & Analysis (AREA)
  • Evolutionary Computation (AREA)
  • General Engineering & Computer Science (AREA)
  • Biomedical Technology (AREA)
  • Molecular Biology (AREA)
  • General Health & Medical Sciences (AREA)
  • Computational Linguistics (AREA)
  • Biophysics (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Health & Medical Sciences (AREA)
  • Medical Informatics (AREA)
  • Image Analysis (AREA)

Abstract

The application relates to an image processing method, apparatus, device, medium, and computer program product, belonging to the technical fields of maps and the Internet of Vehicles. The method comprises the following steps: acquiring an image sequence comprising a plurality of sample vehicle-mounted images; taking each sample vehicle-mounted image in turn as the current sample vehicle-mounted image and inputting it into a depth prediction network to be trained, to obtain the depth information of each pixel point in the current sample vehicle-mounted image; generating a current three-dimensional point cloud of the current sample vehicle-mounted image based on the depth information; acquiring vehicle pose information from a vehicle-mounted pose sensor; reconstructing the next adjacent vehicle-mounted image of the current sample vehicle-mounted image based on the current three-dimensional point cloud and the vehicle pose information; and training the depth prediction network based on the difference between the next adjacent sample vehicle-mounted image and the reconstructed adjacent vehicle-mounted image. The trained depth prediction network is used for predicting the pixel depth of a target vehicle-mounted image. By adopting the method, the training cost of the depth prediction network can be reduced.

Description

Image processing method, apparatus, device, medium, and computer program product
Technical Field
The present application relates to artificial intelligence technology, in particular to the field of Internet of Vehicles technology, and more particularly to an image processing method, apparatus, device, medium, and computer program product.
Background
Artificial Intelligence (AI) is a theory, method, technology, and application system that uses digital computers, or machines controlled by digital computers, to simulate, extend, and expand human intelligence, perceive the environment, acquire knowledge, and use that knowledge to obtain the best results. In other words, artificial intelligence is a comprehensive branch of computer science that attempts to understand the essence of intelligence and to produce a new kind of intelligent machine that can react in a manner similar to human intelligence. Artificial intelligence studies the design principles and implementation methods of various intelligent machines, so that the machines have the functions of perception, reasoning, and decision-making.
With the development of artificial intelligence technology, predicting the depth information of an image with a deep learning network has become the mainstream way of acquiring image depth information. In the conventional technology, a deep learning network is usually trained in a supervised manner, that is, images annotated with depth information are used as training data. However, this training mode requires a large amount of manual labor to annotate the image depth information, so the training cost of the deep learning network is high.
Disclosure of Invention
In view of the foregoing, it is desirable to provide an image processing method, an apparatus, a device, a medium, and a computer program product capable of reducing the training cost of a depth prediction network.
A method of image processing, the method comprising:
acquiring an image sequence; the image sequence comprises a plurality of sample vehicle-mounted images collected in sequence;
respectively taking each sample vehicle-mounted image in the image sequence as a current sample vehicle-mounted image, inputting the current sample vehicle-mounted image into a depth prediction network to be trained, and predicting to obtain depth information of each pixel point in the current sample vehicle-mounted image;
generating a current three-dimensional point cloud of the current sample vehicle-mounted image based on the depth information;
acquiring vehicle pose information determined based on the current sample vehicle-mounted image and a next adjacent sample vehicle-mounted image from a vehicle-mounted pose sensor;
reconstructing a next adjacent vehicle-mounted image of the current sample vehicle-mounted image based on the current three-dimensional point cloud and the vehicle pose information;
training the depth prediction network based on a difference between the next adjacent sample vehicle-mounted image and the reconstructed adjacent vehicle-mounted image; and the trained depth prediction network is used for predicting the pixel depth of the target vehicle-mounted image.
An image processing apparatus, the apparatus comprising:
an acquisition module for acquiring an image sequence; the image sequence comprises a plurality of sample vehicle-mounted images collected in sequence;
the prediction module is used for respectively taking each sample vehicle-mounted image in the image sequence as a current sample vehicle-mounted image, inputting the current sample vehicle-mounted image into a depth prediction network to be trained, and predicting to obtain depth information of each pixel point in the current sample vehicle-mounted image;
the generating module is used for generating a current three-dimensional point cloud of the current sample vehicle-mounted image based on the depth information;
the acquisition module is further used for acquiring vehicle pose information determined based on the current sample vehicle-mounted image and a next adjacent sample vehicle-mounted image from a vehicle-mounted pose sensor;
the reconstruction module is used for reconstructing a next adjacent vehicle-mounted image of the current sample vehicle-mounted image based on the current three-dimensional point cloud and the vehicle pose information;
the training module is used for training the depth prediction network based on the difference between the adjacent sample vehicle-mounted image and the reconstructed adjacent vehicle-mounted image; and the trained depth prediction network is used for predicting the pixel depth of the target vehicle-mounted image.
In one embodiment, the current sample onboard image is acquired by an onboard camera; the depth information comprises depth coordinates of all pixel points in the current sample vehicle-mounted image under a camera coordinate system; the generating module is further used for acquiring plane coordinates of all pixel points in the current sample vehicle-mounted image under a camera coordinate system; the camera coordinate system is a coordinate system established by taking the optical center of the vehicle-mounted camera as an origin; and generating the current three-dimensional point cloud of the current sample vehicle-mounted image based on the depth coordinate and the plane coordinate of each pixel point.
In one embodiment, the generation module is further configured to determine a focus coordinate of the focus of the onboard camera in the camera coordinate system; determining the current pixel coordinate of each pixel point in the current sample vehicle-mounted image under the current pixel coordinate system; the current pixel coordinate system is a pixel coordinate system established based on the current sample vehicle-mounted image; and determining the plane coordinates of each pixel point in a camera coordinate system according to the focus coordinates and the current pixel coordinates of each pixel point in the current pixel coordinate system.
In one embodiment, the current sample onboard image is acquired by an onboard camera; the reconstruction module is further used for determining adjacent three-dimensional point clouds of the adjacent sample vehicle-mounted images based on the current three-dimensional point cloud and the vehicle pose information; and reconstructing a next adjacent vehicle-mounted image of the current sample vehicle-mounted image according to the adjacent three-dimensional point cloud and the internal parameters of the vehicle-mounted camera.
In one embodiment, the vehicle pose information includes an offset distance and a rotation angle that occur from a current position to a next adjacent position of the vehicle; the current position is the position of the vehicle when the vehicle-mounted image of the current sample is acquired; the next adjacent position is the position of the vehicle when the vehicle-mounted image of the next adjacent sample is acquired; and the reconstruction module is also used for respectively adjusting each point in the current three-dimensional point cloud according to the offset distance and the rotation angle to obtain the adjacent three-dimensional point cloud of the adjacent sample vehicle-mounted image.
In one embodiment, the reconstruction module is further configured to construct an offset matrix based on the offset distance; constructing a rotation matrix based on the rotation angle; and carrying out offset processing on each point in the current three-dimensional point cloud according to the offset matrix, and carrying out rotation processing on each point in the current three-dimensional point cloud according to the rotation matrix to obtain adjacent three-dimensional point clouds of the adjacent sample vehicle-mounted images.
In one embodiment, the internal parameters include focus coordinates of a focus of the onboard camera in a camera coordinate system; the reconstruction module is further used for constructing a reconstruction matrix based on the focus coordinates; converting the three-dimensional coordinates of each point in the adjacent three-dimensional point cloud into two-dimensional coordinates based on the reconstruction matrix; and constructing the next adjacent vehicle-mounted image of the current sample vehicle-mounted image based on the two-dimensional coordinates of each point.
In one embodiment, the sample onboard image comprises a sample onboard monocular image; the acquisition module is also used for acquiring a plurality of sample vehicle-mounted monocular images through a vehicle-mounted monocular camera; and generating the image sequence according to the sequence of the acquisition time corresponding to the plurality of sample vehicle-mounted monocular images respectively.
In one embodiment, the apparatus further comprises:
the navigation module is used for carrying out road element detection on a target vehicle-mounted image so as to identify road elements from the target vehicle-mounted image; predicting the depth information of each pixel point in the road element based on the trained depth prediction network; and generating navigation information of the target road element based on the depth information of each pixel point in the road element.
In one embodiment, the target vehicle-mounted image is acquired by a vehicle-mounted camera arranged on a current vehicle; the device further comprises:
the early warning module is used for carrying out vehicle detection on the target vehicle-mounted image so as to identify an image area corresponding to a front vehicle from the target vehicle-mounted image; acquiring depth information of each pixel point in the image area; the depth information of each pixel point in the image area is obtained by the depth prediction network prediction after the training is finished; determining a relative distance between the current vehicle and the front vehicle based on depth information of each pixel point in the image area; and carrying out collision early warning based on the relative distance.
In one embodiment, the apparatus further comprises:
the writing module is used for carrying out road element detection on the target vehicle-mounted image so as to identify road elements from the target vehicle-mounted image; acquiring depth information of each pixel point in the road element; the depth information of each pixel point in the road element is obtained by the depth prediction network prediction after the training is finished; and writing the target road element into a vehicle navigation map based on the depth information of each pixel point in the road element.
A computer device comprising a memory and a processor, the memory storing a computer program, the processor implementing the following steps when executing the computer program:
acquiring an image sequence; the image sequence comprises a plurality of sample vehicle-mounted images collected in sequence;
respectively taking each sample vehicle-mounted image in the image sequence as a current sample vehicle-mounted image, inputting the current sample vehicle-mounted image into a depth prediction network to be trained, and predicting to obtain depth information of each pixel point in the current sample vehicle-mounted image;
generating a current three-dimensional point cloud of the current sample vehicle-mounted image based on the depth information;
acquiring vehicle pose information determined based on the current sample vehicle-mounted image and a next adjacent sample vehicle-mounted image from a vehicle-mounted pose sensor;
reconstructing a next adjacent vehicle-mounted image of the current sample vehicle-mounted image based on the current three-dimensional point cloud and the vehicle pose information;
training the depth prediction network based on a difference between the next adjacent sample vehicle-mounted image and the reconstructed adjacent vehicle-mounted image; and the trained depth prediction network is used for predicting the pixel depth of the target vehicle-mounted image.
A computer-readable storage medium, on which a computer program is stored which, when executed by a processor, carries out the steps of:
acquiring an image sequence; the image sequence comprises a plurality of sample vehicle-mounted images collected in sequence;
respectively taking each sample vehicle-mounted image in the image sequence as a current sample vehicle-mounted image, inputting the current sample vehicle-mounted image into a depth prediction network to be trained, and predicting to obtain depth information of each pixel point in the current sample vehicle-mounted image;
generating a current three-dimensional point cloud of the current sample vehicle-mounted image based on the depth information;
acquiring vehicle pose information determined based on the current sample vehicle-mounted image and a next adjacent sample vehicle-mounted image from a vehicle-mounted pose sensor;
reconstructing a next adjacent vehicle-mounted image of the current sample vehicle-mounted image based on the current three-dimensional point cloud and the vehicle pose information;
training the depth prediction network based on a difference between the next adjacent sample vehicle-mounted image and the reconstructed adjacent vehicle-mounted image; and the trained depth prediction network is used for predicting the pixel depth of the target vehicle-mounted image.
A computer program product comprising a computer program which when executed by a processor performs the steps of:
acquiring an image sequence; the image sequence comprises a plurality of sample vehicle-mounted images collected in sequence;
respectively taking each sample vehicle-mounted image in the image sequence as a current sample vehicle-mounted image, inputting the current sample vehicle-mounted image into a depth prediction network to be trained, and predicting to obtain depth information of each pixel point in the current sample vehicle-mounted image;
generating a current three-dimensional point cloud of the current sample vehicle-mounted image based on the depth information;
acquiring vehicle pose information determined based on the current sample vehicle-mounted image and a next adjacent sample vehicle-mounted image from a vehicle-mounted pose sensor;
reconstructing a next adjacent vehicle-mounted image of the current sample vehicle-mounted image based on the current three-dimensional point cloud and the vehicle pose information;
training the depth prediction network based on a difference between the next adjacent sample vehicle-mounted image and the reconstructed adjacent vehicle-mounted image; and the trained depth prediction network is used for predicting the pixel depth of the target vehicle-mounted image.
According to the image processing method, apparatus, device, medium, and computer program product above, an image sequence comprising a plurality of sequentially collected sample vehicle-mounted images is obtained; each sample vehicle-mounted image in the image sequence is taken in turn as the current sample vehicle-mounted image and input into the depth prediction network to be trained, so that the depth information of each pixel point in the current sample vehicle-mounted image can be obtained by prediction. Based on the depth information, a current three-dimensional point cloud of the current sample vehicle-mounted image can be generated, and the vehicle pose information determined based on the current sample vehicle-mounted image and the next adjacent sample vehicle-mounted image can be acquired directly from the vehicle-mounted pose sensor. Based on the current three-dimensional point cloud and the vehicle pose information, the next adjacent vehicle-mounted image of the current sample vehicle-mounted image can be reconstructed, and the depth prediction network can then be trained in an unsupervised manner directly on the difference between the next adjacent sample vehicle-mounted image and the reconstructed adjacent vehicle-mounted image. Compared with the traditional supervised training mode, the adjacent vehicle-mounted image can be reconstructed directly from the current sample vehicle-mounted image in the image sequence, its depth information, and the vehicle pose information determined based on the current sample vehicle-mounted image and the next adjacent sample vehicle-mounted image; unsupervised training of the depth prediction network is realized directly from the difference between the adjacent sample vehicle-mounted image and the reconstructed adjacent vehicle-mounted image, without the need to annotate image depth information in advance, which greatly reduces the training cost of the depth prediction network.
Drawings
FIG. 1 is a diagram of an exemplary embodiment of an image processing method;
FIG. 2 is a flow diagram illustrating a method for image processing according to one embodiment;
FIG. 3 is a diagram illustrating an on-board image of a target in one embodiment;
FIG. 4 is a depth image schematic of a target in-vehicle image in one embodiment;
FIG. 5 is a schematic diagram illustrating a driving process of a current vehicle and a preceding vehicle in one embodiment;
FIG. 6 is a flowchart illustrating an image processing method according to another embodiment;
FIG. 7 is a flowchart illustrating an image processing method according to another embodiment;
FIG. 8 is a block diagram showing the configuration of an image processing apparatus according to an embodiment;
FIG. 9 is a block diagram showing the construction of an image processing apparatus according to another embodiment;
FIG. 10 is a diagram showing an internal structure of a computer device according to an embodiment.
Detailed Description
In order to make the objects, technical solutions and advantages of the present application more apparent, the present application is described in further detail below with reference to the accompanying drawings and embodiments. It should be understood that the specific embodiments described herein are merely illustrative of the present application and are not intended to limit the present application.
The image processing method provided by the application can be applied to the application environment shown in fig. 1, where the onboard camera 104 may capture sample onboard images and the computer device 102 may communicate with the onboard camera 104 to obtain them. The computer device 102 may be a server or a terminal. The terminal may be, but is not limited to, a personal computer, a notebook computer, a smartphone, a tablet computer, a portable wearable device, or a vehicle-mounted terminal. The server may be an independent physical server, a server cluster or distributed system formed by a plurality of physical servers, or a cloud server providing basic cloud computing services such as cloud services, cloud databases, cloud computing, cloud functions, cloud storage, network services, cloud communications, middleware services, domain name services, security services, CDNs, and big data and artificial intelligence platforms. The computer device 102 and the onboard camera 104 may be connected directly or indirectly through wired or wireless communication, which is not limited in this application.
The computer device 102 may acquire a sequence of images including a plurality of sample onboard images sequentially acquired by an onboard camera 104 in the onboard terminal. The computer device 102 may directly obtain the sample onboard image from the onboard camera 104, or after the onboard camera 104 stores the collected sample onboard image in a server, the computer device 102 obtains the sample onboard image from the server. The computer device 102 may use each sample vehicle-mounted image in the image sequence as a current sample vehicle-mounted image, and input the current sample vehicle-mounted image into the depth prediction network to be trained, so as to obtain depth information of each pixel point in the current sample vehicle-mounted image by prediction. The computer device 102 may generate a current three-dimensional point cloud of a current sample onboard image based on the depth information, and obtain vehicle pose information determined based on the current sample onboard image and a next adjacent sample onboard image from an onboard pose sensor. The computer device 102 may reconstruct a next adjacent on-board image of the current sample on-board image based on the current three-dimensional point cloud and the vehicle pose information, train the depth prediction network based on a difference between the adjacent sample on-board image and the reconstructed adjacent on-board image. The trained depth prediction network can be used for predicting the pixel depth of the target vehicle-mounted image.
It is understood that the computer device 102 may be a server or the in-vehicle terminal itself.
It should be noted that the image processing method in some embodiments of the present application uses artificial intelligence technology. For example, the depth information of each pixel point in the current sample vehicle-mounted image is predicted using artificial intelligence technology.
Artificial intelligence is a comprehensive discipline covering a wide range of fields, including both hardware-level and software-level technologies. Basic artificial intelligence technologies generally include sensors, dedicated artificial intelligence chips, cloud computing, distributed storage, big data processing, operation/interaction systems, mechatronics, and the like. Artificial intelligence software technologies mainly include computer vision, speech processing, natural language processing, machine learning/deep learning, automatic driving, intelligent transportation, and the like.
Computer Vision (CV) technology is a science that studies how to make machines "see": it uses cameras and computers instead of human eyes to perform machine vision tasks such as identification, tracking, and measurement on targets, and performs further image processing so that the processed image is more suitable for human observation or for transmission to instruments for detection. As a scientific discipline, computer vision studies related theories and techniques in an attempt to build artificial intelligence systems that can capture information from images or multidimensional data. Computer vision technology generally includes image processing, image recognition, image semantic understanding, image retrieval, OCR, video processing, video semantic understanding, video content/behavior recognition, three-dimensional object reconstruction, 3D technology, virtual reality, augmented reality, simultaneous localization and mapping, automatic driving, intelligent transportation, and other technologies, as well as common biometric identification technologies such as face recognition and fingerprint recognition. It should be noted that the image processing method in some embodiments of the present application uses computer vision technology. For example, the current three-dimensional point cloud of the current sample vehicle-mounted image is point cloud information generated using computer vision technology, and the next adjacent vehicle-mounted image of the current sample vehicle-mounted image is likewise an image reconstructed using computer vision technology.
Machine Learning (ML) is a multi-domain interdisciplinary subject involving probability theory, statistics, approximation theory, convex analysis, algorithmic complexity theory, and other disciplines. It specializes in studying how computers simulate or implement human learning behaviors to acquire new knowledge or skills and reorganize existing knowledge structures so as to continuously improve their own performance. Machine learning is the core of artificial intelligence and the fundamental way to make computers intelligent; it is applied in all fields of artificial intelligence. Machine learning and deep learning generally include techniques such as artificial neural networks, belief networks, reinforcement learning, transfer learning, inductive learning, and learning from instruction. It should be noted that the image processing method in some embodiments of the present application uses machine learning. For example, the trained depth prediction network is a neural network obtained by machine learning training.
Automatic driving technology generally includes high-precision maps, environment perception, behavior decision-making, path planning, motion control, and other technologies. It should be noted that the image processing method in some embodiments of the present application uses automatic driving technology. For example, the vehicle-mounted pose sensor in the present application may be a sensor provided in an autonomous vehicle, and the target vehicle-mounted image may be an image captured by a vehicle-mounted camera provided in the autonomous vehicle. The predicted pixel depth of the target vehicle-mounted image can be applied to obstacle early warning for the autonomous vehicle during automatic driving.
In one embodiment, as shown in fig. 2, an image processing method is provided, which is applicable to the computer device 102 and also to the interaction between the computer device 102 and the onboard camera 104. The embodiment is described by taking the method as applied to the computer device 102 in fig. 1, and includes the following steps:
step 202, acquiring an image sequence; the image sequence comprises a plurality of sequentially acquired sample vehicle-mounted images.
The sample vehicle-mounted image is a vehicle-mounted image used as training data; in other words, it is a vehicle-mounted image for training the depth prediction network. A vehicle-mounted image is an image captured in an in-vehicle scene by a vehicle-mounted camera provided in the vehicle. Sequential acquisition means that, while the vehicle is running, the vehicle-mounted camera captures images one after another in chronological order. The image sequence is a sequence comprising a plurality of such sequentially acquired sample vehicle-mounted images.
In one embodiment, the server has stored therein an image sequence comprising a plurality of sequentially acquired sample onboard images, and the computer device may be in communication with the server and may obtain the image sequence directly from the server.
In one embodiment, the computer device may be deployed on a vehicle including an onboard camera, and the computer device may obtain the plurality of sample onboard images in sequence through the onboard camera on the vehicle during the driving of the vehicle, and directly generate an image sequence including the plurality of sample onboard images in sequence based on the plurality of sample onboard images obtained in sequence.
And 204, taking each sample vehicle-mounted image in the image sequence as a current sample vehicle-mounted image, inputting the current sample vehicle-mounted image into a depth prediction network to be trained, and predicting to obtain depth information of each pixel point in the current sample vehicle-mounted image.
The current sample on-board image is a currently processed sample on-board image, and it can be understood that the image sequence includes a plurality of sample on-board images, and the currently processed sample on-board image is the current sample on-board image. The depth prediction network is a neural network used for predicting the depth information of each pixel point in the image. The depth information of each pixel point in the current sample vehicle-mounted image refers to the distance information between each pixel point and the vehicle-mounted camera for collecting the current sample vehicle-mounted image.
Specifically, the image sequence comprises a plurality of sample vehicle-mounted images, and the computer device can respectively take each sample vehicle-mounted image in the image sequence as a current sample vehicle-mounted image and input the current sample vehicle-mounted image into the depth prediction network to be trained. The computer equipment can carry out depth prediction on the current sample vehicle-mounted image through a depth prediction network to be trained to obtain the depth information of each pixel point in the current sample vehicle-mounted image. It can be understood that the computer device may respectively input each sample vehicle-mounted image in the image sequence to the depth prediction network to be trained, and perform depth prediction on each sample vehicle-mounted image in the image sequence sequentially input through the depth prediction network to be trained, so as to obtain depth information of each pixel point in each sample vehicle-mounted image in the image sequence.
In one embodiment, FIG. 3 shows a sample vehicle-mounted image including a vehicle 301, a person 302, and a white cloud in the sky 303. The computer device can predict the depth information of each pixel point in the sample vehicle-mounted image through the depth prediction network to be trained, obtaining the depth image of the sample vehicle-mounted image shown in fig. 4. As can be seen from fig. 4, the pixel points of the image area corresponding to the vehicle 301 are closest to the vehicle-mounted camera that captured the sample vehicle-mounted image, while the pixel points of the image areas corresponding to the person 302 and the white cloud 303 are farther away.
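To make the prediction step concrete, the following is a minimal sketch, in Python, of a depth prediction network of the kind described here. The encoder-decoder architecture, layer sizes, and depth scaling are illustrative assumptions, not the network disclosed in this application:

```python
import torch
import torch.nn as nn

class TinyDepthNet(nn.Module):
    # Illustrative encoder-decoder mapping an RGB image to a per-pixel depth map.
    def __init__(self):
        super().__init__()
        self.encoder = nn.Sequential(
            nn.Conv2d(3, 32, 3, stride=2, padding=1), nn.ReLU(),
            nn.Conv2d(32, 64, 3, stride=2, padding=1), nn.ReLU(),
        )
        self.decoder = nn.Sequential(
            nn.ConvTranspose2d(64, 32, 4, stride=2, padding=1), nn.ReLU(),
            nn.ConvTranspose2d(32, 1, 4, stride=2, padding=1),
        )

    def forward(self, image):
        # Sigmoid disparity rescaled into a depth range; the range is a
        # hypothetical choice, not one disclosed in this application.
        disp = torch.sigmoid(self.decoder(self.encoder(image)))
        return 1.0 / (disp * 10.0 + 0.01)

depth_net = TinyDepthNet()
frame = torch.rand(1, 3, 128, 416)   # one sample vehicle-mounted image (batch, RGB, H, W)
depth = depth_net(frame)             # (1, 1, 128, 416) per-pixel depth
```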
And step 206, generating a current three-dimensional point cloud of the current sample vehicle-mounted image based on the depth information.
The three-dimensional point cloud is a three-dimensional data expression mode, and it can be understood that the three-dimensional point cloud is a three-dimensional coordinate point of each pixel point in an image. And the current three-dimensional point cloud is the three-dimensional point cloud corresponding to the current sample vehicle-mounted image.
Specifically, the computer device may convert the two-dimensional coordinates of each pixel point in the current sample vehicle-mounted image into three-dimensional coordinates based on the depth information, so as to obtain a current three-dimensional point cloud of the current sample vehicle-mounted image.
In one embodiment, the computer device may coordinate convert the two-dimensional coordinates of each pixel point in the current sample onboard image. The computer device may generate a current three-dimensional point cloud of the current sample vehicle-mounted image based on the converted two-dimensional coordinates of the pixel points and depth information of each pixel point in the current sample vehicle-mounted image. It can be understood that the two-dimensional coordinates of the pixel points after the coordinate conversion are coordinates of two dimensions in a three-dimensional coordinate system, and the depth information of each pixel point in the current sample vehicle-mounted image can be directly used as the coordinates of the third dimension in the three-dimensional coordinate system, so as to obtain the current three-dimensional point cloud of the current sample vehicle-mounted image.
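As an illustration of this step, the following sketch lifts each pixel of a predicted depth map into a three-dimensional point cloud under a standard pinhole camera model. The intrinsics (fx, fy, cx, cy) are hypothetical values, and the inclusion of a principal point (cx, cy) is an assumption of the standard model rather than something stated in this application:

```python
import numpy as np

def backproject_to_point_cloud(depth, fx, fy, cx, cy):
    # Lift each pixel (u, v) with predicted depth d to a 3-D point in the
    # camera coordinate system: x = (u - cx) * d / fx, y = (v - cy) * d / fy, z = d.
    h, w = depth.shape
    u, v = np.meshgrid(np.arange(w), np.arange(h))         # pixel grids, each (h, w)
    x = (u - cx) * depth / fx
    y = (v - cy) * depth / fy
    return np.stack([x, y, depth], axis=-1).reshape(-1, 3)  # (h*w, 3) point cloud

# Hypothetical intrinsics and a constant 5 m depth map, for illustration only.
cloud = backproject_to_point_cloud(np.full((128, 416), 5.0),
                                   fx=200.0, fy=200.0, cx=208.0, cy=64.0)
```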
And 208, acquiring vehicle pose information determined based on the current sample vehicle-mounted image and the next adjacent sample vehicle-mounted image from the vehicle-mounted pose sensor.
The vehicle-mounted pose sensor (Inertial Measurement Unit, IMU) is a sensor provided in the vehicle for acquiring vehicle pose information. During the running of the vehicle, the vehicle-mounted pose sensor can automatically and accurately acquire the pose information of the vehicle without relying on any pose prediction algorithm. The vehicle pose information describes the change in the vehicle's pose in space between the acquisition of the current sample vehicle-mounted image and the acquisition of the next adjacent sample vehicle-mounted image. The next adjacent sample vehicle-mounted image is the next frame of sample vehicle-mounted image adjacent to the current sample vehicle-mounted image in the image sequence, that is, the sample vehicle-mounted image actually acquired immediately after the current sample vehicle-mounted image.
Specifically, a vehicle-mounted pose sensor is deployed in the vehicle, and the vehicle-mounted pose sensor deployed in the vehicle can automatically determine vehicle pose information based on a current sample vehicle-mounted image and a next adjacent sample vehicle-mounted image and send the vehicle pose information to the computer device. The computer device may receive vehicle pose information sent by the vehicle-mounted pose sensor.
And step 210, reconstructing a next adjacent vehicle-mounted image of the current sample vehicle-mounted image based on the current three-dimensional point cloud and the vehicle pose information.
The next adjacent vehicle-mounted image is the reconstructed vehicle-mounted image whose image content is substantially the same as that of the next adjacent sample vehicle-mounted image. Substantially the same image content means that the main content of the images is the same, with only slight differences in image details.
Specifically, the computer device may adjust a current three-dimensional point cloud of the current sample vehicle-mounted image based on the vehicle pose information to obtain a processed three-dimensional point cloud. The computer device may reconstruct a next adjacent onboard image of the current sample onboard image based on the processed three-dimensional point cloud.
In one embodiment, the computer device may convert a current three-dimensional point cloud of a current sample onboard image to a three-dimensional point cloud corresponding to an adjacent sample onboard image based on the vehicle pose information. The computer device may reconstruct a next adjacent onboard image of the current sample onboard image based on the three-dimensional point cloud corresponding to the adjacent sample onboard image.
Step 212, training a depth prediction network based on the difference between the adjacent sample vehicle-mounted image and the reconstructed adjacent vehicle-mounted image; and the trained depth prediction network is used for predicting the pixel depth of the target vehicle-mounted image.
The target vehicle-mounted image is the vehicle-mounted image whose depth is to be predicted; it is acquired by a vehicle-mounted camera deployed in a vehicle after the depth prediction network has been trained and put into practical application.
Specifically, the computer device may extract the image features of the adjacent sample vehicle-mounted image and of the reconstructed adjacent vehicle-mounted image, respectively, and determine the difference between the two based on these image features. The computer device may then iteratively train the depth prediction network in the direction of reducing the difference between the adjacent sample vehicle-mounted image and the reconstructed adjacent vehicle-mounted image until a training stop condition is reached, resulting in a trained depth prediction network.
In one embodiment, the training stop condition may be that a difference between the adjacent sample onboard image and the reconstructed adjacent onboard image is smaller than a preset difference threshold, or that the number of iterations reaches a preset number of learning times.
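The training loop implied by steps 202-212 can be sketched as follows. This is a hedged illustration: the L1 photometric loss is one common choice for the "difference" between images, and `reconstruct_fn` is a hypothetical stand-in for the point cloud transformation and re-projection described in the later sections; neither detail is specified by this application:

```python
import torch
import torch.nn.functional as F

def training_step(depth_net, optimizer, current_img, next_img, reconstruct_fn):
    # Predict per-pixel depth for the current sample vehicle-mounted image.
    depth = depth_net(current_img)
    # Reconstruct the next adjacent vehicle-mounted image from the current image,
    # its depth, and the IMU pose (view synthesis, sketched in later sections).
    reconstructed_next = reconstruct_fn(current_img, depth)
    # Photometric difference between the real and the reconstructed neighbour;
    # L1 is an illustrative choice, the application only speaks of a "difference".
    loss = F.l1_loss(reconstructed_next, next_img)
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
    return loss.item()
```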
In the image processing method, an image sequence comprising a plurality of sequentially acquired sample vehicle-mounted images is obtained; each sample vehicle-mounted image in the image sequence is taken in turn as the current sample vehicle-mounted image and input into the depth prediction network to be trained, so that the depth information of each pixel point in the current sample vehicle-mounted image can be obtained by prediction. Based on the depth information, a current three-dimensional point cloud of the current sample vehicle-mounted image can be generated, and the vehicle pose information determined based on the current sample vehicle-mounted image and the next adjacent sample vehicle-mounted image can be acquired directly from the vehicle-mounted pose sensor. Based on the current three-dimensional point cloud and the vehicle pose information, the next adjacent vehicle-mounted image of the current sample vehicle-mounted image can be reconstructed, and the depth prediction network can then be trained in an unsupervised manner directly on the difference between the next adjacent sample vehicle-mounted image and the reconstructed adjacent vehicle-mounted image. Compared with the traditional supervised training mode, the adjacent vehicle-mounted image can be reconstructed directly from the current sample vehicle-mounted image in the image sequence, its depth information, and the vehicle pose information determined based on the current sample vehicle-mounted image and the next adjacent sample vehicle-mounted image; unsupervised training of the depth prediction network is realized directly from the difference between the adjacent sample vehicle-mounted image and the reconstructed adjacent vehicle-mounted image, without the need to annotate image depth information in advance, which greatly reduces the training cost of the depth prediction network.
Meanwhile, compared with the costly unsupervised training mode based on binocular image pairs, the unsupervised training mode based on an image sequence in the present application can further reduce the training cost of the depth prediction network.
In addition, in the unsupervised training mode based on the image sequence, accurate vehicle pose information determined based on the current sample vehicle-mounted image and the next adjacent sample vehicle-mounted image can be acquired directly from the vehicle-mounted pose sensor rather than from an additional pose prediction network. That is, no pose prediction network needs to be trained; only the depth prediction network needs to be trained, which reduces the complexity of network training while improving the depth prediction accuracy of the trained depth prediction network.
In one embodiment, the current sample vehicle-mounted image is acquired by a vehicle-mounted camera; the depth information comprises the depth coordinates of all pixel points in the current sample vehicle-mounted image under a camera coordinate system; and generating the current three-dimensional point cloud of the current sample vehicle-mounted image based on the depth information comprises: acquiring the plane coordinates of each pixel point in the current sample vehicle-mounted image under the camera coordinate system, wherein the camera coordinate system is a coordinate system established with the optical center of the vehicle-mounted camera as the origin; and generating the current three-dimensional point cloud of the current sample vehicle-mounted image based on the depth coordinate and the plane coordinate of each pixel point.
The depth coordinate of each pixel point in the camera coordinate system is the coordinate representing the pixel depth information, and the plane coordinate is the coordinate representing the pixel plane information. For example, if the camera coordinate system includes an X axis, a Y axis, and a Z axis, the plane coordinates of each pixel point include an X value and a Y value and may be represented as (X, Y, 0), while the depth coordinate includes a Z value and may be represented as (0, 0, Z); the current three-dimensional point cloud (X, Y, Z) of the current sample vehicle-mounted image is then generated based on the plane coordinates (X, Y, 0) and the depth coordinates (0, 0, Z) of each pixel point.
Specifically, the computer device can perform coordinate conversion on the coordinates of each pixel point in the current sample vehicle-mounted image to obtain the plane coordinates of each pixel point in the camera coordinate system. The computer device can then fuse the depth coordinate and the plane coordinate of each pixel point to obtain the current three-dimensional point cloud of the current sample vehicle-mounted image.
In one embodiment, the computer device may perform coordinate transformation on coordinates of each pixel point in the current sample vehicle-mounted image based on internal parameters of the vehicle-mounted camera to obtain plane coordinates of each pixel point in the current sample vehicle-mounted image in the camera coordinate system.
In the above embodiment, the current three-dimensional point cloud of the current sample vehicle-mounted image can be generated by obtaining the plane coordinates of each pixel point in the current sample vehicle-mounted image in the camera coordinate system and based on the depth coordinates and the plane coordinates of each pixel point, so that the two-dimensional pixel points in the current sample vehicle-mounted image are converted into three-dimensional point cloud information.
In one embodiment, obtaining the plane coordinates of each pixel point in the current sample vehicle-mounted image in the camera coordinate system includes: determining a focus coordinate of a focus of the vehicle-mounted camera under a camera coordinate system; determining the current pixel coordinate of each pixel point in the current sample vehicle-mounted image under the current pixel coordinate system; the current pixel coordinate system is a pixel coordinate system established based on the current sample vehicle-mounted image; and determining the plane coordinates of each pixel point in the camera coordinate system according to the focus coordinates and the current pixel coordinates of each pixel point in the current pixel coordinate system.
And the current pixel coordinate is the pixel coordinate of each pixel point in the current sample vehicle-mounted image in the current pixel coordinate system.
Specifically, the computer device may determine focal point coordinates of a focal point of the onboard camera in a camera coordinate system based on internal parameters of the onboard camera. It is understood that the internal parameters of the vehicle-mounted camera include the focal coordinates of the focal point of the vehicle-mounted camera in the camera coordinate system, and the computer device may directly acquire the focal coordinates of the focal point of the vehicle-mounted camera in the camera coordinate system from the internal parameters of the vehicle-mounted camera. The computer device can determine the current pixel coordinates of each pixel point in the current sample vehicle-mounted image under the current pixel coordinate system. Furthermore, the computer device can determine the plane coordinates of each pixel point in the camera coordinate system according to the focus coordinates and the current pixel coordinates of each pixel point in the current pixel coordinate system.
In one embodiment, the computer device may construct a coordinate transformation matrix for the point cloud based on the focal point coordinates. Furthermore, the computer device can perform coordinate conversion on the current pixel coordinate of each pixel point in the current pixel coordinate system based on the constructed coordinate conversion matrix of the point cloud to obtain the plane coordinate of each pixel point in the camera coordinate system.
In the above embodiment, the focal point coordinate of the focal point of the vehicle-mounted camera in the camera coordinate system is determined, the current pixel coordinate of each pixel point in the current sample vehicle-mounted image in the current pixel coordinate system is determined, and then the plane coordinate of each pixel point in the camera coordinate system can be determined according to the focal point coordinate and the current pixel coordinate of each pixel point in the current pixel coordinate system, so that the current pixel coordinate in the two-dimensional coordinate system can be converted into the plane coordinate in the three-dimensional coordinate system.
In one embodiment, the computer device may construct a pixel matrix based on current pixel coordinates of pixels in the current sample in-vehicle image in the current pixel coordinate system. The computer device may generate a current three-dimensional point cloud of the current sample onboard image based on a product of a coordinate conversion matrix of the point cloud and a pixel matrix, and based on depth information of each pixel point. The pixel matrix is constructed based on coordinate values in the current pixel coordinate of each pixel point in the current sample vehicle-mounted image.
In one embodiment, the current three-dimensional point cloud of the current sample onboard image may be calculated by the following formula:
$$\begin{bmatrix} x \\ y \\ z \end{bmatrix} = d \cdot \begin{bmatrix} 1/f_x & 0 & 0 \\ 0 & 1/f_y & 0 \\ 0 & 0 & 1 \end{bmatrix} \cdot \begin{bmatrix} u \\ v \\ 1 \end{bmatrix}$$
wherein $(x, y, z)$ is the three-dimensional coordinate of each point in the current three-dimensional point cloud, $d$ is the depth of each pixel point in the current sample vehicle-mounted image, $f_x$ and $f_y$ are the focal lengths of the vehicle-mounted camera along the X axis and the Y axis of the camera coordinate system, respectively, and $(u, v)$ is the current pixel coordinate of each pixel point in the current sample vehicle-mounted image under the current pixel coordinate system.
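As a short numeric illustration of the formula above, with made-up focal lengths and a single pixel:

```python
import numpy as np

# Hypothetical focal lengths, one pixel (u, v), and its predicted depth d.
fx, fy = 1000.0, 1000.0
u, v, d = 400.0, 300.0, 8.0

K_inv = np.diag([1.0 / fx, 1.0 / fy, 1.0])   # coordinate conversion matrix of the point cloud
point = d * K_inv @ np.array([u, v, 1.0])    # -> array([3.2, 2.4, 8. ])
```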
In one embodiment, the current sample onboard image is acquired by an onboard camera; reconstructing a next adjacent vehicle-mounted image of the current sample vehicle-mounted image based on the current three-dimensional point cloud and the vehicle pose information, comprising: determining adjacent three-dimensional point clouds of adjacent sample vehicle-mounted images based on the current three-dimensional point clouds and vehicle pose information; and reconstructing a next adjacent vehicle-mounted image of the current sample vehicle-mounted image according to the adjacent three-dimensional point cloud and the internal parameters of the vehicle-mounted camera.
And the adjacent three-dimensional point cloud is the three-dimensional point cloud corresponding to the adjacent sample vehicle-mounted image.
In one embodiment, the computer device may perform an adjustment process on the current three-dimensional point cloud based on the vehicle pose information to generate an adjacent three-dimensional point cloud of adjacent sample onboard images. Further, the computer device may perform coordinate conversion on the adjacent three-dimensional point cloud based on the internal parameters of the onboard camera, and reconstruct a next adjacent onboard image of the current sample onboard image based on the converted coordinates.
In the above embodiment, based on the current three-dimensional point cloud and the vehicle pose information, the adjacent three-dimensional point cloud of the adjacent sample vehicle-mounted image can be determined, and then according to the adjacent three-dimensional point cloud and the internal parameters of the vehicle-mounted camera, the next adjacent vehicle-mounted image of the current sample vehicle-mounted image can be rapidly reconstructed, so that the depth prediction network can be trained subsequently.
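A sketch of the re-projection half of this step is given below, again under the standard pinhole model with hypothetical intrinsics; the reconstructed image is then obtained by sampling the pixel colors of the current sample vehicle-mounted image at the projected coordinates (for example, bilinearly):

```python
import numpy as np

def project_to_image(adjacent_cloud, fx, fy, cx, cy, h, w):
    # Project the adjacent three-dimensional point cloud back to pixel
    # coordinates with the pinhole model: u = fx*x/z + cx, v = fy*y/z + cy.
    z = np.clip(adjacent_cloud[:, 2], 1e-6, None)       # guard against division by zero
    u = fx * adjacent_cloud[:, 0] / z + cx
    v = fy * adjacent_cloud[:, 1] / z + cy
    inside = (u >= 0) & (u < w) & (v >= 0) & (v < h)    # points that land inside the image
    return np.stack([u, v], axis=-1), inside
```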
In one embodiment, the vehicle pose information includes an offset distance and a rotation angle that occur from the current position to the next adjacent position of the vehicle; the current position is the position of the vehicle when the vehicle-mounted image of the current sample is acquired; the next adjacent position is the position of the vehicle when the vehicle-mounted image of the next adjacent sample is acquired; determining an adjacent three-dimensional point cloud of an adjacent sample vehicle-mounted image based on the current three-dimensional point cloud and vehicle pose information, comprising: and respectively adjusting each point in the current three-dimensional point cloud according to the offset distance and the rotation angle to obtain the adjacent three-dimensional point cloud of the adjacent sample vehicle-mounted image.
The offset distance is the distance the vehicle moves from the current position to the next adjacent position. The rotation angle is the angle by which the vehicle's orientation changes between the current position and the next adjacent position.
In one embodiment, the computer device may offset each point in the current three-dimensional point cloud according to the offset distance and rotate each point in the current three-dimensional point cloud according to the rotation angle, to obtain the adjacent three-dimensional point cloud of the adjacent sample vehicle-mounted image.

In one embodiment, the computer device may first offset each point in the current three-dimensional point cloud according to the offset distance, and then rotate each point in the offset three-dimensional point cloud according to the rotation angle, to obtain the adjacent three-dimensional point cloud of the adjacent sample vehicle-mounted image.

In one embodiment, the computer device may first rotate each point in the current three-dimensional point cloud according to the rotation angle, and then offset each point in the rotated three-dimensional point cloud according to the offset distance, to obtain the adjacent three-dimensional point cloud of the adjacent sample vehicle-mounted image.
In the above embodiment, the adjacent three-dimensional point clouds of the adjacent sample vehicle-mounted images can be quickly and accurately obtained by respectively adjusting each point in the current three-dimensional point cloud according to the offset distance and the rotation angle.
In one embodiment, adjusting each point in the current three-dimensional point cloud according to the offset distance and the rotation angle to obtain an adjacent three-dimensional point cloud of an adjacent sample vehicle-mounted image, including: constructing an offset matrix based on the offset distance; constructing a rotation matrix based on the rotation angle; and carrying out offset processing on each point in the current three-dimensional point cloud according to the offset matrix, and carrying out rotation processing on each point in the current three-dimensional point cloud according to the rotation matrix to obtain the adjacent three-dimensional point cloud of the adjacent sample vehicle-mounted image.
The offset matrix is a matrix for controlling each point in the current three-dimensional point cloud to move. The rotation matrix is used for controlling each point in the current three-dimensional point cloud to rotate.
In one embodiment, the computer device may construct an offset matrix based on the offset distance and a rotation matrix based on the rotation angle. The computer device may perform coordinate transformation on the three-dimensional coordinates of each point in the current three-dimensional point cloud based on the constructed offset matrix, and perform coordinate transformation on the three-dimensional coordinates of each point in the current three-dimensional point cloud based on the constructed rotation matrix, thereby obtaining an adjacent three-dimensional point cloud of an adjacent sample vehicle-mounted image.
In one embodiment, the computer device may construct a first point cloud matrix based on a current three-dimensional point cloud of a current sample in-vehicle image, and calculate an offset three-dimensional point cloud based on a product of the offset matrix and the constructed first point cloud matrix. The first point cloud matrix is a matrix constructed based on coordinate values of all points in the current three-dimensional point cloud and supplemented dimension coordinate values.
In one embodiment, the offset three-dimensional point cloud can be computed by the following formula:

$$\begin{pmatrix} X' \\ Y' \\ Z' \end{pmatrix} = \begin{pmatrix} 1 & 0 & 0 & t_x \\ 0 & 1 & 0 & t_y \\ 0 & 0 & 1 & t_z \end{pmatrix} \begin{pmatrix} X \\ Y \\ Z \\ 1 \end{pmatrix}$$

where $(X', Y', Z')$ is the three-dimensional coordinate of each point in the offset three-dimensional point cloud, $(X, Y, Z, 1)$ is the column of the first point cloud matrix formed by the coordinate values of each point in the current three-dimensional point cloud and the supplemented dimension coordinate value, and $t_x$, $t_y$ and $t_z$ are the offsets of each point in the current three-dimensional point cloud on the X axis, the Y axis and the Z axis respectively.
In one embodiment, the computer device may construct a second point cloud matrix based on a current three-dimensional point cloud of a current sample onboard image and calculate a rotated three-dimensional point cloud based on a product of each rotation matrix and the constructed second point cloud matrix. And the second point cloud matrix is a matrix directly constructed based on the coordinate values of all points in the current three-dimensional point cloud.
In one embodiment, the rotated three-dimensional point cloud can be computed by the following formula:

$$\begin{pmatrix} X'' \\ Y'' \\ Z'' \end{pmatrix} = R_x R_y R_z \begin{pmatrix} X \\ Y \\ Z \end{pmatrix}$$

where $(X'', Y'', Z'')$ is the three-dimensional coordinate of each point in the rotated three-dimensional point cloud, and $R_x$, $R_y$ and $R_z$ are the rotation matrices of each point in the current three-dimensional point cloud for the X axis, the Y axis and the Z axis respectively. $R_x$, $R_y$ and $R_z$ are respectively:

$$R_x = \begin{pmatrix} 1 & 0 & 0 \\ 0 & \cos\alpha & -\sin\alpha \\ 0 & \sin\alpha & \cos\alpha \end{pmatrix}, \quad R_y = \begin{pmatrix} \cos\beta & 0 & \sin\beta \\ 0 & 1 & 0 \\ -\sin\beta & 0 & \cos\beta \end{pmatrix}, \quad R_z = \begin{pmatrix} \cos\gamma & -\sin\gamma & 0 \\ \sin\gamma & \cos\gamma & 0 \\ 0 & 0 & 1 \end{pmatrix}$$

where $\alpha$, $\beta$ and $\gamma$ are the rotation angles of each point in the current three-dimensional point cloud about the X axis, the Y axis and the Z axis respectively.
In one embodiment, the computer device may generate an adjacent three-dimensional point cloud of adjacent sample onboard images based on the offset three-dimensional point cloud and the rotated three-dimensional point cloud.
In the above embodiment, an offset matrix may be constructed based on the offset distance, a rotation matrix may be constructed based on the rotation angle, and then each point in the current three-dimensional point cloud is offset according to the offset matrix, and each point in the current three-dimensional point cloud is rotated according to the rotation matrix, so that the adjacent three-dimensional point cloud of the adjacent sample vehicle-mounted image can be obtained quickly and accurately.
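As an illustration of the offset and rotation processing, the sketch below builds the per-axis rotation matrices from the rotation angles and applies them together with the offset. The angle names and the order of composition (rotate, then offset) are assumptions for illustration; as the embodiments above note, the two adjustments may also be applied in other orders.

```python
import numpy as np

def transform_point_cloud(points, t, angles):
    """Adjust each point of the current 3D point cloud by the vehicle pose.

    points: (N, 3) current three-dimensional point cloud.
    t: (tx, ty, tz) offset distances along the X, Y and Z axes.
    angles: (alpha, beta, gamma) rotation angles about the X, Y and Z axes.
    Returns the (N, 3) adjacent three-dimensional point cloud.
    """
    a, b, g = angles
    # Rotation matrices for the X, Y and Z axes.
    rx = np.array([[1, 0, 0],
                   [0, np.cos(a), -np.sin(a)],
                   [0, np.sin(a),  np.cos(a)]])
    ry = np.array([[ np.cos(b), 0, np.sin(b)],
                   [0, 1, 0],
                   [-np.sin(b), 0, np.cos(b)]])
    rz = np.array([[np.cos(g), -np.sin(g), 0],
                   [np.sin(g),  np.cos(g), 0],
                   [0, 0, 1]])
    rotated = points @ (rx @ ry @ rz).T   # rotation processing
    return rotated + np.asarray(t)        # offset processing
```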
In one embodiment, the internal parameters include focus coordinates of a focus of the onboard camera in a camera coordinate system; reconstructing a next adjacent vehicle-mounted image of the current sample vehicle-mounted image according to the adjacent three-dimensional point cloud and the internal parameters of the vehicle-mounted camera, wherein the method comprises the following steps: constructing a reconstruction matrix based on the focus coordinates; converting the three-dimensional coordinates of each point in the adjacent three-dimensional point cloud into two-dimensional coordinates based on the reconstruction matrix; and constructing the next adjacent vehicle-mounted image of the current sample vehicle-mounted image based on the two-dimensional coordinates of each point.
And the reconstruction matrix is used for reconstructing a matrix of a next adjacent vehicle-mounted image of the current sample vehicle-mounted image.
In one embodiment, the computer device may construct a reconstruction matrix based on the focus coordinates and convert the three-dimensional coordinates of each point in the adjacent three-dimensional point cloud to two-dimensional coordinates based on the reconstruction matrix. Thus, the computer device can construct a next adjacent onboard image of the current sample onboard image based on the two-dimensional coordinates of each point. It can be understood that after the three-dimensional coordinates of each point in the adjacent three-dimensional point cloud are converted into two-dimensional coordinates, each point in the adjacent three-dimensional point cloud is converted into each pixel point in the next adjacent vehicle-mounted image of the current sample vehicle-mounted image.
In one embodiment, the computer device may construct a target point cloud matrix based on the adjacent three-dimensional point cloud of the adjacent sample vehicle-mounted image. The computer device may generate the next adjacent vehicle-mounted image of the current sample vehicle-mounted image based on the product of the reconstruction matrix and the target point cloud matrix. The target point cloud matrix is a matrix constructed based on the coordinate values of all points in the adjacent three-dimensional point cloud and the supplemented dimension coordinate values.
In one embodiment, the coordinates of each pixel point in the adjacent vehicle-mounted image can be calculated by the following formula:

$$Z_a \begin{pmatrix} u' \\ v' \\ 1 \end{pmatrix} = \begin{pmatrix} f_x & 0 & c_x & 0 \\ 0 & f_y & c_y & 0 \\ 0 & 0 & 1 & 0 \end{pmatrix} \begin{pmatrix} X_a \\ Y_a \\ Z_a \\ 1 \end{pmatrix}$$

where $(u', v')$ is the coordinate of each pixel point in the adjacent vehicle-mounted image, and $(X_a, Y_a, Z_a)$ is the three-dimensional coordinate of each point in the adjacent three-dimensional point cloud.
In the above embodiment, a reconstruction matrix may be constructed based on the focus coordinates, and based on the reconstruction matrix, the three-dimensional coordinates of each point in the adjacent three-dimensional point cloud may be converted into two-dimensional coordinates, and then based on the two-dimensional coordinates of each point, the next adjacent vehicle-mounted image of the current sample vehicle-mounted image may be quickly and accurately constructed.
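A corresponding sketch of the reconstruction step, assuming the same pinhole intrinsics as above; masking of points that fall behind the camera or outside the image bounds is omitted for brevity.

```python
import numpy as np

def project(points, fx, fy, cx, cy):
    """Convert 3D points of the adjacent point cloud to 2D pixel coordinates.

    points: (N, 3) adjacent three-dimensional point cloud.
    Returns an (N, 2) array of (u, v) pixel coordinates in the
    next adjacent vehicle-mounted image.
    """
    x, y, z = points[:, 0], points[:, 1], points[:, 2]
    # Perspective projection: u = fx * X / Z + cx, v = fy * Y / Z + cy.
    u = fx * x / z + cx
    v = fy * y / z + cy
    return np.stack([u, v], axis=-1)
```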
In one embodiment, the sample onboard image comprises a sample onboard monocular image; acquiring a sequence of images, comprising: collecting a plurality of sample vehicle-mounted monocular images through a vehicle-mounted monocular camera; and generating an image sequence according to the sequence of the acquisition time corresponding to the plurality of sample vehicle-mounted monocular images respectively.
The sample vehicle-mounted monocular image is a sample vehicle-mounted image acquired by a vehicle-mounted monocular camera. It can be understood that a binocular camera acquires a binocular image each time, which is a group of image pairs comprising two frames, whereas a vehicle-mounted monocular camera acquires a single frame of sample vehicle-mounted image each time; that is, the sample vehicle-mounted monocular image is a single-frame image.
In one embodiment, an onboard monocular camera may be deployed in the vehicle, and the computer device may capture a plurality of sample onboard monocular images via the onboard monocular camera. The computer equipment can generate an image sequence according to the sequence of the acquisition time corresponding to the plurality of sample vehicle-mounted monocular images respectively.
In the above embodiment, a plurality of sample vehicle-mounted monocular images are acquired by a lower-cost vehicle-mounted monocular camera, and the image sequence can be generated according to the order of the acquisition times corresponding to the sample vehicle-mounted monocular images, which further reduces the training cost of the depth prediction network.
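As a trivial illustration, assembling the image sequence might look like the sketch below; the timestamped-frame representation is an assumption made for the example.

```python
def build_image_sequence(frames):
    """Order captured monocular frames by acquisition time.

    frames: list of (acquisition_time, image) tuples captured by the
    vehicle-mounted monocular camera.
    """
    return [image for _, image in sorted(frames, key=lambda f: f[0])]
```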
In one embodiment, the method further comprises: carrying out road element detection on the target vehicle-mounted image so as to identify road elements from the target vehicle-mounted image; predicting the depth information of each pixel point in the road elements based on the trained depth prediction network; and generating navigation information of the target road element based on the depth information of each pixel point in the road element.
The road element is a physical object present on a road, such as the road itself, a bridge, a road sign, or a tunnel entrance. The target road element is the road element taken as the target. The navigation information of the target road element is information used for describing the target road element in the navigation scene; for example, if the target road element is a tunnel, the navigation information may include "enter the tunnel 50 meters ahead", and the like.
In one embodiment, the computer device may perform feature extraction on the target onboard image and perform road element detection based on the extracted features to identify road elements from the target onboard image. The computer device may predict depth information for each pixel point in the road element based on the trained depth prediction network. Further, the computer device may generate navigation information for the target road element based on the depth information for each pixel point in the road element.
In the above embodiment, the road element detection is performed on the target vehicle-mounted image, the road element can be identified from the target vehicle-mounted image, the depth information of each pixel point in the road element can be accurately predicted based on the trained depth prediction network, and the navigation information of the target road element can be generated based on the depth information of each pixel point in the road element, so that the navigation accuracy is improved.
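By way of a hedged illustration, the depth of the detected element could be turned into a navigation prompt as follows; the median aggregation and the message template are assumptions, not part of the claimed method.

```python
import numpy as np

def navigation_info(element_name, element_depths):
    """Generate a navigation prompt from a road element's per-pixel depth.

    element_depths: depth values of the pixels inside the detected road
    element, predicted by the trained depth prediction network (meters).
    """
    distance = float(np.median(element_depths))  # robust single distance
    return f"{element_name} {distance:.0f} meters ahead"

# Example: navigation_info("Enter the tunnel", depths) could yield
# "Enter the tunnel 50 meters ahead".
```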
In one embodiment, the target vehicle-mounted image is acquired by a vehicle-mounted camera arranged on the current vehicle; the method further comprises the following steps: carrying out vehicle detection on the target vehicle-mounted image so as to identify an image area corresponding to a front vehicle from the target vehicle-mounted image; acquiring depth information of each pixel point in an image area; the depth information of each pixel point in the image area is obtained by the prediction of the trained depth prediction network; determining the relative distance between the current vehicle and the front vehicle based on the depth information of each pixel point in the image area; and carrying out collision early warning based on the relative distance.
Specifically, the computer device may perform feature extraction on the target onboard image and perform vehicle detection based on the extracted features to identify an image area corresponding to the preceding vehicle from the target onboard image. And the computer equipment can predict the depth information of each pixel point in the image area through the trained depth prediction network. Further, the computer device may determine a relative distance between the current vehicle and the preceding vehicle based on the depth information of each pixel point in the image region. The computer device may generate collision warning information based on the relative distance and perform collision warning based on the generated collision warning information.
In one embodiment, performing collision warning based on relative distance includes: determining the relative speed of the current vehicle relative to the front vehicle, determining the relative time for the current vehicle to catch up with the front vehicle based on the relative distance and the relative speed, generating collision early warning information when the relative time is less than the preset safe time, and performing collision early warning based on the generated collision early warning information.
In one embodiment, as shown in fig. 5, the current vehicle A travels on the road in the same direction of travel as the preceding vehicle B. The computer device can predict the depth information of each pixel point in the image area corresponding to the preceding vehicle B through the trained depth prediction network. Based on that depth information, the computer device may determine the relative distance between the current vehicle A and the preceding vehicle B. The computer device may also determine the relative speed of the current vehicle A with respect to the preceding vehicle B, determine from the relative distance and the relative speed the relative time in which the current vehicle A would catch up with the preceding vehicle B, generate collision warning information when the relative time is less than a preset safe time, and transmit the collision warning information to the current vehicle A to perform collision warning.
In the above embodiment, vehicle detection is performed on the target vehicle-mounted image, an image area corresponding to the front vehicle can be identified from the target vehicle-mounted image, the depth information of each pixel point in the image area can be accurately predicted through the trained depth prediction network, and then, based on the depth information of each pixel point in the image area, the relative distance between the current vehicle and the front vehicle can be determined, collision early warning is performed based on the relative distance, and driving safety is improved.
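A minimal sketch of the collision early warning logic described above; the median depth aggregation and the default safe-time threshold value are illustrative assumptions.

```python
import numpy as np

def collision_warning(region_depths, current_speed, front_speed, safe_time=2.0):
    """Warn when the current vehicle would catch up with the front vehicle.

    region_depths: per-pixel depth of the image area of the front vehicle,
    predicted by the trained depth prediction network (meters).
    Speeds are along the direction of travel (m/s).
    Returns True when a collision warning should be issued.
    """
    relative_distance = float(np.median(region_depths))
    relative_speed = current_speed - front_speed
    if relative_speed <= 0:
        return False  # not closing in on the front vehicle
    relative_time = relative_distance / relative_speed
    return relative_time < safe_time
```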
In one embodiment, the method further comprises: carrying out road element detection on the target vehicle-mounted image so as to identify road elements from the target vehicle-mounted image; acquiring depth information of each pixel point in a road element; the depth information of each pixel point in the road element is obtained by the prediction of the trained depth prediction network; and writing the target road element into the vehicle-mounted navigation map based on the depth information of each pixel point in the road element.
The vehicle-mounted navigation map is a map which is arranged in a vehicle and is used for vehicle navigation.
In one embodiment, the computer device may perform feature extraction on the target onboard image and perform road element detection based on the extracted features to identify road elements from the target onboard image. The computer device may predict depth information for each pixel point in the road element based on the trained depth prediction network. Furthermore, the computer device can write the target road element into the vehicle-mounted navigation map based on the depth information of each pixel point in the road element so as to update the vehicle-mounted navigation map.
In the above embodiment, the road element detection is performed on the target vehicle-mounted image, the road element can be identified from the target vehicle-mounted image, and the depth information of each pixel point in the road element can be accurately predicted through the trained depth prediction network, so that the target road element can be written into the vehicle-mounted navigation map based on the depth information of each pixel point in the road element, and the vehicle-mounted navigation map can be accurately updated.
In one embodiment, as shown in fig. 6, the computer device may obtain, through a vehicle-mounted camera disposed on the vehicle, an image sequence including a plurality of sequentially acquired sample vehicle-mounted images, take each sample vehicle-mounted image in the image sequence in turn as the current sample vehicle-mounted image, and input the current sample vehicle-mounted image into the depth prediction network to be trained, so as to predict the depth information of each pixel point in the current sample vehicle-mounted image. The computer device may generate the current three-dimensional point cloud of the current sample vehicle-mounted image based on the depth information. The computer device may acquire, from a vehicle-mounted pose sensor disposed on the vehicle, the vehicle pose information determined based on the current sample vehicle-mounted image and the next adjacent sample vehicle-mounted image, and reconstruct the next adjacent vehicle-mounted image of the current sample vehicle-mounted image based on the current three-dimensional point cloud and the vehicle pose information. The computer device can iteratively train the depth prediction network through back propagation based on the difference between the adjacent sample vehicle-mounted image and the reconstructed adjacent vehicle-mounted image until an iteration stop condition is met, obtaining the trained depth prediction network. The computer device can then predict the pixel depth of the target vehicle-mounted image through the trained depth prediction network.
As shown in fig. 7, in an embodiment, an image processing method is provided, which specifically includes the following steps:
step 702, acquiring an image sequence; the image sequence comprises a plurality of sample vehicle-mounted images which are collected in sequence; the current sample vehicle-mounted image is acquired by a vehicle-mounted camera.
Step 704, taking each sample vehicle-mounted image in the image sequence as a current sample vehicle-mounted image, inputting the current sample vehicle-mounted image into a depth prediction network to be trained, and predicting to obtain depth information of each pixel point in the current sample vehicle-mounted image; the depth information comprises depth coordinates of all pixel points in the current sample vehicle-mounted image under a camera coordinate system; the camera coordinate system is a coordinate system established by taking the optical center of the vehicle-mounted camera as an origin.
And step 706, determining the focal point coordinate of the focal point of the vehicle-mounted camera in the camera coordinate system.
Step 708, determining the current pixel coordinate of each pixel point in the current sample vehicle-mounted image under the current pixel coordinate system; and the current pixel coordinate system is a pixel coordinate system established based on the current sample vehicle-mounted image.
And step 710, determining the plane coordinates of each pixel point in the camera coordinate system according to the focus coordinates and the current pixel coordinates of each pixel point in the current pixel coordinate system.
And 712, generating a current three-dimensional point cloud of the current sample vehicle-mounted image based on the depth coordinate and the plane coordinate of each pixel point.
Step 714, obtaining vehicle pose information determined based on the current sample vehicle-mounted image and the next adjacent sample vehicle-mounted image from the vehicle-mounted pose sensor; the vehicle pose information includes an offset distance and a rotation angle of the vehicle from the current position to the next adjacent position.
And step 716, constructing an offset matrix based on the offset distance, and constructing a rotation matrix based on the rotation angle.
And 718, performing offset processing on each point in the current three-dimensional point cloud according to the offset matrix, and performing rotation processing on each point in the current three-dimensional point cloud according to the rotation matrix to obtain the adjacent three-dimensional point cloud of the adjacent sample vehicle-mounted image.
And 720, constructing a reconstruction matrix based on the focus coordinate of the focus of the vehicle-mounted camera in the camera coordinate system, and converting the three-dimensional coordinates of each point in the adjacent three-dimensional point cloud into two-dimensional coordinates based on the reconstruction matrix.
And step 722, constructing the next adjacent vehicle-mounted image of the current sample vehicle-mounted image based on the two-dimensional coordinates of each point.
Step 724, training a depth prediction network based on the difference between the adjacent sample vehicle-mounted image and the reconstructed adjacent vehicle-mounted image; and the trained depth prediction network is used for predicting the pixel depth of the target vehicle-mounted image.
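Putting steps 702 to 724 together, one unsupervised training iteration might look like the following PyTorch-style sketch. The helper functions (back_project, transform_point_cloud, project, sample_pixels) are hypothetical stand-ins for the operations sketched above, with simplified signatures, and the L1 photometric loss is one plausible choice for the image difference in step 724.

```python
import torch
import torch.nn.functional as F

def train_step(depth_net, optimizer, current_img, next_img, pose, intrinsics):
    """One unsupervised training iteration (hypothetical helpers assumed).

    current_img, next_img: consecutive sample vehicle-mounted images,
    shaped (1, 3, H, W); pose holds the offset distance and rotation
    angle read from the vehicle-mounted pose sensor.
    """
    depth = depth_net(current_img)                      # step 704: per-pixel depth
    cloud = back_project(depth, intrinsics)             # steps 706-712: current point cloud
    adj_cloud = transform_point_cloud(cloud, pose)      # steps 714-718: adjacent point cloud
    pixels = project(adj_cloud, intrinsics)             # steps 720-722: 2D coordinates
    reconstructed = sample_pixels(current_img, pixels)  # reconstructed adjacent image
    loss = F.l1_loss(reconstructed, next_img)           # step 724: image difference
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
    return loss.item()
```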
In one embodiment, road element detection is performed on the target onboard image to identify road elements from the target onboard image; predicting the depth information of each pixel point in the road elements based on the trained depth prediction network; and generating navigation information of the target road element based on the depth information of each pixel point in the road element.
In one embodiment, the target vehicle-mounted image is acquired by a vehicle-mounted camera arranged on the current vehicle; carrying out vehicle detection on the target vehicle-mounted image so as to identify an image area corresponding to a front vehicle from the target vehicle-mounted image; acquiring depth information of each pixel point in an image area; the depth information of each pixel point in the image area is obtained by the prediction of the trained depth prediction network; determining the relative distance between the current vehicle and the front vehicle based on the depth information of each pixel point in the image area; and carrying out collision early warning based on the relative distance.
In one embodiment, road element detection is performed on the target onboard image to identify road elements from the target onboard image; acquiring depth information of each pixel point in a road element; the depth information of each pixel point in the road element is obtained by the prediction of the trained depth prediction network; and writing the target road element into the vehicle-mounted navigation map based on the depth information of each pixel point in the road element.
The application also provides an application scene, and the application scene applies the image processing method. In particular, the image processing method can be applied to a scene of vehicle-mounted monocular image processing. A computer device may acquire a sequence of images; the image sequence comprises a plurality of sample vehicle-mounted monocular images which are collected in sequence; the current sample vehicle-mounted monocular image is acquired by a vehicle-mounted monocular camera. Respectively taking each sample vehicle-mounted monocular image in the image sequence as a current sample vehicle-mounted monocular image, inputting the current sample vehicle-mounted monocular image into a depth prediction network to be trained, and predicting to obtain depth information of each pixel point in the current sample vehicle-mounted monocular image; the depth information comprises depth coordinates of all pixel points in the current sample vehicle-mounted monocular image under a camera coordinate system; the camera coordinate system is a coordinate system established by taking the optical center of the vehicle-mounted monocular camera as an origin. And determining the focal point coordinate of the focal point of the vehicle-mounted monocular camera in the camera coordinate system. Determining the current pixel coordinate of each pixel point in the current sample vehicle-mounted monocular image under the current pixel coordinate system; and the current pixel coordinate system is a pixel coordinate system established based on the current sample vehicle-mounted monocular image. And determining the plane coordinates of each pixel point in the camera coordinate system according to the focus coordinates and the current pixel coordinates of each pixel point in the current pixel coordinate system. And generating the current three-dimensional point cloud of the current sample vehicle-mounted monocular image based on the depth coordinate and the plane coordinate of each pixel point.
The computer device may acquire vehicle pose information determined based on the current sample vehicle monocular image and the next adjacent sample vehicle monocular image from the vehicle pose sensor; the vehicle pose information includes an offset distance and a rotation angle of the vehicle from the current position to the next adjacent position. And constructing an offset matrix based on the offset distance, and constructing a rotation matrix based on the rotation angle. And carrying out offset processing on each point in the current three-dimensional point cloud according to the offset matrix, and carrying out rotation processing on each point in the current three-dimensional point cloud according to the rotation matrix to obtain the adjacent three-dimensional point cloud of the adjacent sample vehicle-mounted monocular image.
The computer device can construct a reconstruction matrix based on the focal point coordinates of the focal point of the vehicle-mounted monocular camera in the camera coordinate system, and convert the three-dimensional coordinates of each point in the adjacent three-dimensional point cloud into two-dimensional coordinates based on the reconstruction matrix. And constructing the next adjacent vehicle-mounted monocular image of the current sample vehicle-mounted monocular image based on the two-dimensional coordinates of each point. Training a depth prediction network based on the difference between the adjacent sample vehicle-mounted monocular images and the reconstructed adjacent vehicle-mounted monocular images; and the trained depth prediction network is used for predicting the pixel depth of the target vehicle-mounted monocular image.
The application further provides an application scenario to which the above image processing method is applied. Specifically, the image processing method can be applied to a scenario of vehicle-mounted binocular image processing. It can be understood that the vehicle-mounted binocular image processing scenario of the present application is only used to provide more sample vehicle-mounted images, that is, richer training data for training the depth prediction network; depth prediction is not realized based on the image disparity between binocular images as in the conventional method.
It should be understood that, although the steps in the flowcharts of the above embodiments are shown in sequence, they are not necessarily executed in that sequence. Unless explicitly stated otherwise, these steps may be performed in other orders. Moreover, at least some of the steps in the above embodiments may include multiple sub-steps or stages, which are not necessarily performed at the same time but may be performed at different times; likewise, these sub-steps or stages are not necessarily performed sequentially, but may be performed in turn or alternately with other steps or with at least some of the sub-steps or stages of other steps.
In one embodiment, as shown in fig. 8, an image processing apparatus 800 is provided, which may be a part of a computer device using a software module or a hardware module, or a combination of the two modules, and specifically includes:
an obtaining module 801, configured to obtain an image sequence; the image sequence comprises a plurality of sequentially acquired sample vehicle-mounted images.
The prediction module 802 is configured to use each sample vehicle-mounted image in the image sequence as a current sample vehicle-mounted image, input the current sample vehicle-mounted image to a depth prediction network to be trained, and predict depth information of each pixel point in the current sample vehicle-mounted image.
A generating module 803, configured to generate a current three-dimensional point cloud of the current sample vehicle-mounted image based on the depth information;
the obtaining module 801 is further configured to obtain vehicle pose information determined based on the current sample vehicle-mounted image and the next adjacent sample vehicle-mounted image from the vehicle-mounted pose sensor.
And the reconstruction module 804 is used for reconstructing the next adjacent vehicle-mounted image of the current sample vehicle-mounted image based on the current three-dimensional point cloud and the vehicle pose information.
A training module 805 configured to train the depth prediction network based on a difference between the adjacent sample in-vehicle image and the reconstructed adjacent in-vehicle image; and the trained depth prediction network is used for predicting the pixel depth of the target vehicle-mounted image.
In one embodiment, the current sample onboard image is acquired by an onboard camera; the depth information comprises depth coordinates of all pixel points in the current sample vehicle-mounted image under a camera coordinate system; the generating module 803 is further configured to obtain a plane coordinate of each pixel point in the current sample vehicle-mounted image in the camera coordinate system; the camera coordinate system is a coordinate system established by taking the optical center of the vehicle-mounted camera as an original point; and generating the current three-dimensional point cloud of the current sample vehicle-mounted image based on the depth coordinate and the plane coordinate of each pixel point.
In one embodiment, the generation module 803 is further configured to determine focus coordinates of the focus of the onboard camera in a camera coordinate system; determining the current pixel coordinate of each pixel point in the current sample vehicle-mounted image under the current pixel coordinate system; the current pixel coordinate system is a pixel coordinate system established based on the current sample vehicle-mounted image; and determining the plane coordinates of each pixel point in the camera coordinate system according to the focus coordinates and the current pixel coordinates of each pixel point in the current pixel coordinate system.
In one embodiment, the current sample onboard image is acquired by an onboard camera; the reconstruction module 804 is further configured to determine an adjacent three-dimensional point cloud of an adjacent sample vehicle-mounted image based on the current three-dimensional point cloud and the vehicle pose information; and reconstructing a next adjacent vehicle-mounted image of the current sample vehicle-mounted image according to the adjacent three-dimensional point cloud and the internal parameters of the vehicle-mounted camera.
In one embodiment, the vehicle pose information includes an offset distance and a rotation angle that occur from the current position to the next adjacent position of the vehicle; the current position is the position of the vehicle when the vehicle-mounted image of the current sample is acquired; the next adjacent position is the position of the vehicle when the vehicle-mounted image of the next adjacent sample is acquired; the reconstruction module 804 is further configured to adjust each point in the current three-dimensional point cloud according to the offset distance and the rotation angle, so as to obtain an adjacent three-dimensional point cloud of an adjacent sample vehicle-mounted image.
In one embodiment, the reconstruction module 804 is further configured to construct an offset matrix based on the offset distance; constructing a rotation matrix based on the rotation angle; and carrying out offset processing on each point in the current three-dimensional point cloud according to the offset matrix, and carrying out rotation processing on each point in the current three-dimensional point cloud according to the rotation matrix to obtain the adjacent three-dimensional point cloud of the adjacent sample vehicle-mounted image.
In one embodiment, the internal parameters include focus coordinates of a focus of the onboard camera in a camera coordinate system; the reconstruction module 804 is further configured to construct a reconstruction matrix based on the focus coordinates; converting the three-dimensional coordinates of each point in the adjacent three-dimensional point cloud into two-dimensional coordinates based on the reconstruction matrix; and constructing the next adjacent vehicle-mounted image of the current sample vehicle-mounted image based on the two-dimensional coordinates of each point.
In one embodiment, the sample onboard image comprises a sample onboard monocular image; the obtaining module 801 is further configured to collect a plurality of sample vehicle-mounted monocular images through a vehicle-mounted monocular camera; and generating an image sequence according to the sequence of the acquisition time corresponding to the plurality of sample vehicle-mounted monocular images respectively.
In one embodiment, the apparatus further comprises:
a navigation module 806, configured to perform road element detection on the target vehicle-mounted image to identify a road element from the target vehicle-mounted image; predicting the depth information of each pixel point in the road elements based on the trained depth prediction network; and generating navigation information of the target road element based on the depth information of each pixel point in the road element.
In one embodiment, the target vehicle-mounted image is acquired by a vehicle-mounted camera arranged on the current vehicle; the apparatus further comprises:
the early warning module 807 is used for performing vehicle detection on the target vehicle-mounted image so as to identify an image area corresponding to a front vehicle from the target vehicle-mounted image; acquiring depth information of each pixel point in an image area; the depth information of each pixel point in the image area is obtained by the prediction of the trained depth prediction network; determining the relative distance between the current vehicle and the front vehicle based on the depth information of each pixel point in the image area; and carrying out collision early warning based on the relative distance.
In one embodiment, the apparatus further comprises:
a writing module 808, configured to perform road element detection on the target vehicle-mounted image to identify a road element from the target vehicle-mounted image; acquiring depth information of each pixel point in a road element; the depth information of each pixel point in the road element is obtained by the prediction of the trained depth prediction network; and writing the target road element into the vehicle-mounted navigation map based on the depth information of each pixel point in the road element.
Referring to fig. 9, in one embodiment, the image processing apparatus 800 further comprises a navigation module 806, an early warning module 807, and a writing module 808.
The image processing apparatus acquires an image sequence comprising a plurality of sequentially acquired sample vehicle-mounted images, takes each sample vehicle-mounted image in the image sequence as the current sample vehicle-mounted image, and inputs the current sample vehicle-mounted image into the depth prediction network to be trained, so that the depth information of each pixel point in the current sample vehicle-mounted image can be predicted. Based on the depth information, the current three-dimensional point cloud of the current sample vehicle-mounted image can be generated, and the vehicle pose information determined based on the current sample vehicle-mounted image and the next adjacent sample vehicle-mounted image can be acquired directly from the vehicle-mounted pose sensor. Based on the current three-dimensional point cloud and the vehicle pose information, the next adjacent vehicle-mounted image of the current sample vehicle-mounted image can be reconstructed. Based on the difference between the adjacent sample vehicle-mounted image and the reconstructed adjacent vehicle-mounted image, the depth prediction network can be trained directly in an unsupervised manner. Compared with the traditional supervised training mode, the adjacent vehicle-mounted image corresponding to the adjacent sample vehicle-mounted image can be reconstructed directly from the current sample vehicle-mounted image in the image sequence, its depth information, and the vehicle pose information determined based on the current sample vehicle-mounted image and the next adjacent sample vehicle-mounted image; unsupervised training of the depth prediction network is then realized directly from the difference between the adjacent sample vehicle-mounted image and the reconstructed adjacent vehicle-mounted image, so that image depth information does not need to be labeled in advance, which greatly reduces the training cost of the depth prediction network.
For specific limitations of the image processing apparatus, reference may be made to the above limitations of the image processing method, which are not repeated here. Each of the modules in the image processing apparatus described above may be implemented wholly or partially by software, by hardware, or by a combination of the two. The modules may be embedded in hardware form in, or independent of, a processor in the computer device, or may be stored in software form in a memory in the computer device, so that the processor can invoke them and execute the operations corresponding to the modules.
In one embodiment, a computer device is provided, which may be a server, and its internal structure diagram may be as shown in fig. 10. The computer device includes a processor, a memory, and a network interface connected by a system bus. Wherein the processor of the computer device is configured to provide computing and control capabilities. The memory of the computer device comprises a nonvolatile storage medium and an internal memory. The non-volatile storage medium stores an operating system, a computer program, and a database. The internal memory provides an environment for the operation of an operating system and computer programs in the non-volatile storage medium. The network interface of the computer device is used for communicating with an external terminal through a network connection. The computer program is executed by a processor to implement an image processing method.
Those skilled in the art will appreciate that the architecture shown in fig. 10 is merely a block diagram of some of the structures associated with the disclosed aspects and is not intended to limit the computing devices to which the disclosed aspects apply, as particular computing devices may include more or less components than those shown, or may combine certain components, or have a different arrangement of components.
In one embodiment, a computer device is further provided, which includes a memory and a processor, the memory stores a computer program, and the processor implements the steps of the above method embodiments when executing the computer program.
In an embodiment, a computer-readable storage medium is provided, in which a computer program is stored which, when being executed by a processor, carries out the steps of the above-mentioned method embodiments.
In an embodiment, a computer program product is provided, comprising a computer program which, when being executed by a processor, carries out the steps of the above-mentioned method embodiments.
It will be understood by those skilled in the art that all or part of the processes of the methods of the embodiments described above can be implemented by a computer program instructing the relevant hardware; the computer program can be stored in a non-volatile computer-readable storage medium and, when executed, can include the processes of the embodiments of the methods described above. Any reference to memory, storage, database or other medium used in the embodiments provided herein can include at least one of non-volatile and volatile memory. Non-volatile memory may include Read-Only Memory (ROM), magnetic tape, floppy disk, flash memory, optical storage, or the like. Volatile memory can include Random Access Memory (RAM) or external cache memory. By way of illustration and not limitation, RAM can take many forms, such as Static Random Access Memory (SRAM) or Dynamic Random Access Memory (DRAM), among others.
It should be noted that, the user information (including but not limited to user equipment information, user personal information, etc.) and data (including but not limited to data for analysis, stored data, displayed data, etc.) referred to in the present application are information and data authorized by the user or sufficiently authorized by each party, and the collection, use and processing of the related data need to comply with the relevant laws and regulations and standards of the relevant country and region.
The technical features of the above embodiments can be combined arbitrarily. For brevity, not all possible combinations of the technical features in the above embodiments are described; however, as long as there is no contradiction between them, such combinations should be considered to be within the scope of this specification.
The above-mentioned embodiments express only several implementations of the present application, and their descriptions are specific and detailed, but they should not therefore be construed as limiting the scope of the invention. It should be noted that a person skilled in the art can make several variations and modifications without departing from the concept of the present application, all of which fall within the protection scope of the present application. Therefore, the protection scope of this patent shall be subject to the appended claims.

Claims (14)

1. An image processing method, characterized in that the method comprises:
acquiring an image sequence; the image sequence comprises a plurality of sample vehicle-mounted images collected in sequence;
respectively taking each sample vehicle-mounted image in the image sequence as a current sample vehicle-mounted image, inputting the current sample vehicle-mounted image into a depth prediction network to be trained, and predicting to obtain depth information of each pixel point in the current sample vehicle-mounted image;
generating a current three-dimensional point cloud of the current sample vehicle-mounted image based on the depth information;
acquiring vehicle pose information determined based on the current sample vehicle-mounted image and a next adjacent sample vehicle-mounted image from a vehicle-mounted pose sensor;
reconstructing a next adjacent vehicle-mounted image of the current sample vehicle-mounted image based on the current three-dimensional point cloud and the vehicle pose information;
training the depth prediction network based on a difference between the adjacent sample onboard images and reconstructed adjacent onboard images; and the trained depth prediction network is used for predicting the pixel depth of the target vehicle-mounted image.
2. The method of claim 1, wherein the current sample onboard image is acquired by an onboard camera; the depth information comprises depth coordinates of all pixel points in the current sample vehicle-mounted image under a camera coordinate system; the generating a current three-dimensional point cloud of the current sample onboard image based on the depth information comprises:
acquiring plane coordinates of each pixel point in the current sample vehicle-mounted image under a camera coordinate system; the camera coordinate system is a coordinate system established by taking the optical center of the vehicle-mounted camera as an origin;
and generating the current three-dimensional point cloud of the current sample vehicle-mounted image based on the depth coordinate and the plane coordinate of each pixel point.
3. The method of claim 2, wherein the obtaining of the plane coordinates of each pixel point in the current sample on-board image in the camera coordinate system comprises:
determining a focus coordinate of a focus of the vehicle-mounted camera in the camera coordinate system;
determining the current pixel coordinate of each pixel point in the current sample vehicle-mounted image under the current pixel coordinate system; the current pixel coordinate system is a pixel coordinate system established based on the current sample vehicle-mounted image;
and determining the plane coordinates of each pixel point in a camera coordinate system according to the focus coordinates and the current pixel coordinates of each pixel point in the current pixel coordinate system.
4. The method of claim 1, wherein the current sample onboard image is acquired by an onboard camera; reconstructing a next adjacent vehicle-mounted image of the current sample vehicle-mounted image based on the current three-dimensional point cloud and the vehicle pose information, comprising:
determining an adjacent three-dimensional point cloud of the adjacent sample vehicle-mounted image based on the current three-dimensional point cloud and the vehicle pose information;
and reconstructing a next adjacent vehicle-mounted image of the current sample vehicle-mounted image according to the adjacent three-dimensional point cloud and the internal parameters of the vehicle-mounted camera.
5. The method according to claim 4, wherein the vehicle pose information includes an offset distance and a rotation angle of the vehicle from a current position to a next adjacent position; the current position is the position of the vehicle when the vehicle-mounted image of the current sample is acquired; the next adjacent position is the position of the vehicle when the vehicle-mounted image of the next adjacent sample is acquired;
the determining of the adjacent three-dimensional point cloud of the adjacent sample vehicle-mounted image based on the current three-dimensional point cloud and the vehicle pose information comprises:
and respectively adjusting each point in the current three-dimensional point cloud according to the offset distance and the rotation angle to obtain the adjacent three-dimensional point cloud of the adjacent sample vehicle-mounted image.
6. The method according to claim 5, wherein the adjusting each point in the current three-dimensional point cloud according to the offset distance and the rotation angle to obtain an adjacent three-dimensional point cloud of the adjacent sample vehicle-mounted image comprises:
constructing an offset matrix based on the offset distance; constructing a rotation matrix based on the rotation angle;
and carrying out offset processing on each point in the current three-dimensional point cloud according to the offset matrix, and carrying out rotation processing on each point in the current three-dimensional point cloud according to the rotation matrix to obtain adjacent three-dimensional point clouds of the adjacent sample vehicle-mounted images.
7. The method of claim 4, wherein the internal parameters include focus coordinates of a focus of the onboard camera in a camera coordinate system;
reconstructing a next adjacent vehicle-mounted image of the current sample vehicle-mounted image according to the adjacent three-dimensional point cloud and the internal parameters of the vehicle-mounted camera, including:
constructing a reconstruction matrix based on the focus coordinates;
converting the three-dimensional coordinates of each point in the adjacent three-dimensional point cloud into two-dimensional coordinates based on the reconstruction matrix;
and constructing the next adjacent vehicle-mounted image of the current sample vehicle-mounted image based on the two-dimensional coordinates of each point.
8. The method of claim 1, wherein the sample onboard image comprises a sample onboard monocular image; the acquiring of the sequence of images comprises:
collecting a plurality of sample vehicle-mounted monocular images through a vehicle-mounted monocular camera;
and generating the image sequence according to the sequence of the acquisition time corresponding to the plurality of sample vehicle-mounted monocular images respectively.
9. The method according to any one of claims 1 to 8, further comprising:
performing road element detection on a target vehicle-mounted image to identify road elements from the target vehicle-mounted image;
predicting the depth information of each pixel point in the road element based on the trained depth prediction network;
and generating navigation information of the road element based on the depth information of each pixel point in the road element.
10. The method according to any one of claims 1 to 8, wherein the target onboard image is acquired by an onboard camera provided on a current vehicle; the method further comprises the following steps:
carrying out vehicle detection on the target vehicle-mounted image so as to identify an image area corresponding to a front vehicle from the target vehicle-mounted image;
acquiring depth information of each pixel point in the image area; the depth information of each pixel point in the image area is obtained by the depth prediction network prediction after the training is finished;
determining a relative distance between the current vehicle and the front vehicle based on depth information of each pixel point in the image area;
and carrying out collision early warning based on the relative distance.
11. The method according to any one of claims 1 to 8, further comprising:
performing road element detection on the target vehicle-mounted image to identify road elements from the target vehicle-mounted image;
acquiring depth information of each pixel point in the road element; the depth information of each pixel point in the road element is obtained by the depth prediction network prediction after the training is finished;
and writing the road elements into a vehicle navigation map based on the depth information of each pixel point in the road elements.
12. An image processing apparatus, characterized in that the apparatus comprises:
an acquisition module for acquiring an image sequence; the image sequence comprises a plurality of sample vehicle-mounted images collected in sequence;
the prediction module is used for respectively taking each sample vehicle-mounted image in the image sequence as a current sample vehicle-mounted image, inputting the current sample vehicle-mounted image into a depth prediction network to be trained, and predicting to obtain depth information of each pixel point in the current sample vehicle-mounted image;
the generating module is used for generating a current three-dimensional point cloud of the current sample vehicle-mounted image based on the depth information;
the acquisition module is further used for acquiring vehicle pose information determined based on the current sample vehicle-mounted image and a next adjacent sample vehicle-mounted image from a vehicle-mounted pose sensor;
the reconstruction module is used for reconstructing a next adjacent vehicle-mounted image of the current sample vehicle-mounted image based on the current three-dimensional point cloud and the vehicle pose information;
the training module is used for training the depth prediction network based on the difference between the adjacent sample vehicle-mounted image and the reconstructed adjacent vehicle-mounted image; and the trained depth prediction network is used for predicting the pixel depth of the target vehicle-mounted image.
13. A computer device comprising a memory and a processor, the memory storing a computer program, characterized in that the processor realizes the steps of the method of any one of claims 1 to 11 when executing the computer program.
14. A computer-readable storage medium, in which a computer program is stored which, when being executed by a processor, carries out the steps of the method according to any one of claims 1 to 11.
CN202111548086.8A 2021-12-17 2021-12-17 Image processing method, apparatus, device, medium, and computer program product Pending CN114119757A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202111548086.8A CN114119757A (en) 2021-12-17 2021-12-17 Image processing method, apparatus, device, medium, and computer program product

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202111548086.8A CN114119757A (en) 2021-12-17 2021-12-17 Image processing method, apparatus, device, medium, and computer program product

Publications (1)

Publication Number Publication Date
CN114119757A true CN114119757A (en) 2022-03-01

Family

ID=80365462

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202111548086.8A Pending CN114119757A (en) 2021-12-17 2021-12-17 Image processing method, apparatus, device, medium, and computer program product

Country Status (1)

Country Link
CN (1) CN114119757A (en)

Cited By (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN115457088A (en) * 2022-10-31 2022-12-09 成都盛锴科技有限公司 Method and system for fixing axle of train
CN115457088B (en) * 2022-10-31 2023-03-24 成都盛锴科技有限公司 Method and system for fixing axle of train

Similar Documents

Publication Publication Date Title
US10762359B2 (en) Computer aided traffic enforcement using dense correspondence estimation with multi-level metric learning and hierarchical matching
CN111666921B (en) Vehicle control method, apparatus, computer device, and computer-readable storage medium
CN113819890B (en) Distance measuring method, distance measuring device, electronic equipment and storage medium
CN112419494B (en) Obstacle detection and marking method and device for automatic driving and storage medium
CN108764187A (en) Extract method, apparatus, equipment, storage medium and the acquisition entity of lane line
CN111488812B (en) Obstacle position recognition method and device, computer equipment and storage medium
JP2022003508A (en) Trajectory planing model training method and device, electronic apparatus, computer-readable storage medium, and computer program
CN112037142B (en) Image denoising method, device, computer and readable storage medium
CN111709471B (en) Object detection model training method and object detection method and device
CN113256699B (en) Image processing method, image processing device, computer equipment and storage medium
CN104463962B (en) Three-dimensional scene reconstruction method based on GPS information video
US11308324B2 (en) Object detecting system for detecting object by using hierarchical pyramid and object detecting method thereof
CN112257668A (en) Main and auxiliary road judging method and device, electronic equipment and storage medium
CN116097307A (en) Image processing method and related equipment
Li et al. Real-time tracking algorithm for aerial vehicles using improved convolutional neural network and transfer learning
CN114494395A (en) Depth map generation method, device and equipment based on plane prior and storage medium
CN114119757A (en) Image processing method, apparatus, device, medium, and computer program product
CN115203352B (en) Lane level positioning method and device, computer equipment and storage medium
CN116399360A (en) Vehicle path planning method
CN113450457B (en) Road reconstruction method, apparatus, computer device and storage medium
CN114802261A (en) Parking control method, obstacle recognition model training method and device
CN114332174A (en) Track image alignment method and device, computer equipment and storage medium
CN114332805A (en) Lane position acquisition method, lane position acquisition device, computer equipment and storage medium
Sun et al. Accurate deep direct geo-localization from ground imagery and phone-grade gps
CN111767839B (en) Vehicle driving track determining method, device, equipment and medium

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination