CN116704472B - Image processing method, device, apparatus, medium, and program product - Google Patents

Image processing method, device, apparatus, medium, and program product

Info

Publication number
CN116704472B
Authority
CN
China
Prior art keywords
road environment
feature
environment image
dimensional
dimensional road
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN202310546183.6A
Other languages
Chinese (zh)
Other versions
CN116704472A (en)
Inventor
冷汉超
俞昆
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Xiaomi Automobile Technology Co Ltd
Original Assignee
Xiaomi Automobile Technology Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Xiaomi Automobile Technology Co Ltd filed Critical Xiaomi Automobile Technology Co Ltd
Priority to CN202310546183.6A priority Critical patent/CN116704472B/en
Publication of CN116704472A publication Critical patent/CN116704472A/en
Application granted granted Critical
Publication of CN116704472B publication Critical patent/CN116704472B/en
Active legal-status Critical Current
Anticipated expiration legal-status Critical


Classifications

    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06VIMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V20/00Scenes; Scene-specific elements
    • G06V20/50Context or environment of the image
    • G06V20/56Context or environment of the image exterior to a vehicle by using sensors mounted on the vehicle
    • G06V20/58Recognition of moving objects or obstacles, e.g. vehicles or pedestrians; Recognition of traffic objects, e.g. traffic signs, traffic lights or roads
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06NCOMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00Computing arrangements based on biological models
    • G06N3/02Neural networks
    • G06N3/04Architecture, e.g. interconnection topology
    • G06N3/045Combinations of networks
    • G06N3/0455Auto-encoder networks; Encoder-decoder networks
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06VIMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V10/00Arrangements for image or video recognition or understanding
    • G06V10/40Extraction of image or video features
    • G06V10/44Local feature extraction by analysis of parts of the pattern, e.g. by detecting edges, contours, loops, corners, strokes or intersections; Connectivity analysis, e.g. of connected components
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06VIMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V10/00Arrangements for image or video recognition or understanding
    • G06V10/70Arrangements for image or video recognition or understanding using pattern recognition or machine learning
    • G06V10/77Processing image or video features in feature spaces; using data integration or data reduction, e.g. principal component analysis [PCA] or independent component analysis [ICA] or self-organising maps [SOM]; Blind source separation
    • G06V10/80Fusion, i.e. combining data from various sources at the sensor level, preprocessing level, feature extraction level or classification level
    • G06V10/806Fusion, i.e. combining data from various sources at the sensor level, preprocessing level, feature extraction level or classification level of extracted features
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06VIMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V10/00Arrangements for image or video recognition or understanding
    • G06V10/70Arrangements for image or video recognition or understanding using pattern recognition or machine learning
    • G06V10/82Arrangements for image or video recognition or understanding using pattern recognition or machine learning using neural networks
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06VIMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V20/00Scenes; Scene-specific elements
    • G06V20/70Labelling scene content, e.g. deriving syntactic or semantic representations
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06VIMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V2201/00Indexing scheme relating to image or video recognition or understanding
    • G06V2201/07Target detection
    • YGENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
    • Y02TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
    • Y02TCLIMATE CHANGE MITIGATION TECHNOLOGIES RELATED TO TRANSPORTATION
    • Y02T10/00Road transport of goods or passengers
    • Y02T10/10Internal combustion engine [ICE] based vehicles
    • Y02T10/40Engine management systems

Abstract

The present disclosure provides an image processing method, apparatus, device, medium, and program product, and relates to the field of autonomous driving technology. In some embodiments of the present disclosure, a vehicle acquires a three-dimensional road environment image and camera parameters corresponding to the three-dimensional road environment image; the vehicle inputs the three-dimensional road environment image and the camera parameters into an object feature recognition network to obtain the three-dimensional boundary of each object contained in the three-dimensional road environment image, the type of the object, and the semantic information of the object. The object feature recognition network fuses the occupancy grid features extracted from the three-dimensional road environment image with the target detection features, which improves the accuracy of target detection, enriches the semantic information of the objects, and improves the recognition accuracy of the road environment image.

Description

Image processing method, device, apparatus, medium, and program product
Technical Field
The present disclosure relates to the field of autonomous driving technology, and in particular to an image processing method, apparatus, device, medium, and program product.
Background
In recent years, autonomous driving technology has developed rapidly, and an important research direction in this field is to perceive the three-dimensional environment around an autonomous vehicle accurately and comprehensively.
At present, road environment images are recognized with object detection methods that can only detect specific objects around the vehicle, such as cars, bicycles, and pedestrians; as a result, the recognition accuracy of the road environment image is low.
Disclosure of Invention
The present disclosure provides an image processing method, apparatus, device, medium, and program product, which at least solve the problem that existing road environment image recognition has low accuracy.
The technical scheme of the present disclosure is as follows:
an embodiment of the present disclosure provides an image processing method, applied to a vehicle, including:
acquiring a three-dimensional road environment image and camera parameters corresponding to the three-dimensional road environment image;
inputting the three-dimensional road environment image and the camera parameters into an object feature recognition network to obtain the three-dimensional boundary of an object contained in the three-dimensional road environment image, the type of the object, and the semantic information of the object, wherein the object feature recognition network fuses the occupancy grid features extracted from the three-dimensional road environment image with the target detection features.
Optionally, after inputting the three-dimensional road environment image and the camera parameters into an object feature recognition network to obtain a three-dimensional boundary of an object contained in the three-dimensional road environment image, a type of the object, and semantic information of the object, the method further includes:
generating a navigation image according to the three-dimensional boundary of the object contained in the three-dimensional road environment image, the type of the object and the semantic information of the object;
and performing a navigation operation according to the navigation image.
Optionally, the inputting of the three-dimensional road environment image and the camera parameters into an object feature recognition network to obtain the three-dimensional boundary of an object contained in the three-dimensional road environment image, the type of the object, and the semantic information of the object includes:
performing image feature extraction on the three-dimensional road environment image in the object feature recognition network to obtain an image feature map;
performing top-view feature extraction on the image feature map, the camera parameters, and the image depth values to obtain a top-view feature map;
performing feature fusion processing on the top-view feature map to obtain fused target detection features and occupancy grid detection features;
performing object detection according to the target detection features to obtain the three-dimensional boundary of the object and the type of the object; and
performing occupancy grid detection according to the occupancy grid detection features to obtain the semantic information of the object.
Optionally, the object feature recognition network includes an image feature extraction sub-network, and performing image feature extraction on the three-dimensional road environment image to obtain an image feature map includes:
inputting the three-dimensional road environment image into the image feature extraction sub-network to obtain the image feature map.
Optionally, the object feature recognition network includes a top-view feature extraction sub-network, and performing top-view feature extraction on the image feature map, the camera parameters, and the image depth values to obtain a top-view feature map includes:
inputting the image feature map, the camera parameters, and the image depth values into the top-view feature extraction sub-network to obtain the top-view feature map.
Optionally, the object feature recognition network includes a task-stage feature extraction sub-network, where the task-stage feature extraction sub-network includes a first encoder-decoder module and a second encoder-decoder module, and performing feature fusion processing on the top-view feature map to obtain the fused target detection features and the occupancy grid detection features includes:
inputting the top-view feature map into the first encoder-decoder module for encoding and decoding operations to obtain the occupancy grid detection features; and
inputting the top-view feature map into the second encoder-decoder module, and performing a fusion decoding operation on the features decoded from the top-view feature map and the occupancy grid detection features in the second encoder-decoder module to obtain the fused target detection features.
Optionally, the object feature recognition network includes a target detector, and performing object detection according to the target detection features to obtain the three-dimensional boundary of the object and the type of the object includes:
inputting the target detection features into the target detector to obtain the three-dimensional boundary of the object and the type of the object.
Optionally, the object feature recognition network includes an occupancy grid detector, and performing occupancy grid detection according to the occupancy grid detection features to obtain the semantic information of the object includes:
inputting the occupancy grid detection features into the occupancy grid detector to obtain the semantic information of the object.
The embodiments of the present disclosure also provide an image processing apparatus, including:
an acquisition module, configured to acquire a three-dimensional road environment image and camera parameters corresponding to the three-dimensional road environment image; and
a feature recognition module, configured to input the three-dimensional road environment image and the camera parameters into an object feature recognition network to obtain the three-dimensional boundary of an object contained in the three-dimensional road environment image, the type of the object, and the semantic information of the object, wherein the object feature recognition network fuses the occupancy grid features extracted from the three-dimensional road environment image with the target detection features.
Optionally, after the three-dimensional road environment image and the camera parameters are input into an object feature recognition network, the feature recognition module is further configured to:
generating a navigation image according to the three-dimensional boundary of the object contained in the three-dimensional road environment image, the type of the object and the semantic information of the object;
and performing a navigation operation according to the navigation image.
Optionally, when inputting the three-dimensional road environment image and the camera parameters into an object feature recognition network to obtain the three-dimensional boundary of an object contained in the three-dimensional road environment image, the type of the object, and the semantic information of the object, the feature recognition module is configured to:
performing image feature extraction on the three-dimensional road environment image in the object feature recognition network to obtain an image feature map;
performing top-view feature extraction on the image feature map, the camera parameters, and the image depth values to obtain a top-view feature map;
performing feature fusion processing on the top-view feature map to obtain fused target detection features and occupancy grid detection features;
performing object detection according to the target detection features to obtain the three-dimensional boundary of the object and the type of the object; and
performing occupancy grid detection according to the occupancy grid detection features to obtain the semantic information of the object.
Optionally, the object feature recognition network includes an image feature extraction sub-network, and when performing image feature extraction on the three-dimensional road environment image to obtain an image feature map, the feature recognition module is configured to:
inputting the three-dimensional road environment image into the image feature extraction sub-network to obtain the image feature map.
Optionally, the object feature recognition network includes a top-view feature extraction sub-network, and when performing top-view feature extraction on the image feature map, the camera parameters, and the image depth values to obtain a top-view feature map, the feature recognition module is configured to:
inputting the image feature map, the camera parameters, and the image depth values into the top-view feature extraction sub-network to obtain the top-view feature map.
Optionally, the object feature recognition network includes a task-stage feature extraction sub-network, where the task-stage feature extraction sub-network includes a first encoder-decoder module and a second encoder-decoder module, and when performing feature fusion processing on the top-view feature map to obtain the fused target detection features and the occupancy grid detection features, the feature recognition module is configured to:
inputting the top-view feature map into the first encoder-decoder module for encoding and decoding operations to obtain the occupancy grid detection features; and
inputting the top-view feature map into the second encoder-decoder module, and performing a fusion decoding operation on the features decoded from the top-view feature map and the occupancy grid detection features in the second encoder-decoder module to obtain the fused target detection features.
Optionally, the object feature recognition network includes a target detector, and when performing object detection according to the target detection features to obtain the three-dimensional boundary of the object and the type of the object, the feature recognition module is configured to:
inputting the target detection features into the target detector to obtain the three-dimensional boundary of the object and the type of the object.
Optionally, the object feature recognition network includes an occupancy grid detector, and when performing occupancy grid detection according to the occupancy grid detection features to obtain the semantic information of the object, the feature recognition module is configured to:
inputting the occupancy grid detection features into the occupancy grid detector to obtain the semantic information of the object.
The disclosed embodiments also provide a vehicle including:
a processor;
a memory for storing the processor-executable instructions;
wherein the processor is configured to execute the instructions to implement the steps in the method described above.
The embodiment of the disclosure also provides an electronic device, including:
a processor;
a memory for storing the processor-executable instructions;
wherein the processor is configured to execute the instructions to implement the steps in the method described above.
The disclosed embodiments also provide a computer readable storage medium having a computer program stored thereon, characterized in that the computer program, when executed by a processor, implements the above-mentioned method.
The disclosed embodiments also provide a computer program product comprising a computer program/instruction which, when executed by a processor, implements the above-described method.
The technical solutions provided by the embodiments of the present disclosure at least bring the following beneficial effects:
in some embodiments of the present disclosure, a vehicle acquires a three-dimensional road environment image and camera parameters corresponding to the three-dimensional road environment image; the vehicle inputs the three-dimensional road environment image and the camera parameters into an object feature recognition network to obtain the three-dimensional boundary of each object contained in the three-dimensional road environment image, the type of the object, and the semantic information of the object. The object feature recognition network fuses the occupancy grid features extracted from the three-dimensional road environment image with the target detection features, which improves the accuracy of target detection, enriches the semantic information of the objects, and improves the recognition accuracy of the road environment image.
It is to be understood that both the foregoing general description and the following detailed description are exemplary and explanatory only and are not restrictive of the disclosure.
Drawings
The accompanying drawings, which are incorporated in and constitute a part of this specification, illustrate embodiments consistent with the disclosure and together with the description, serve to explain the principles of the disclosure and do not constitute an undue limitation on the disclosure.
FIG. 1 is a flowchart of an image processing method according to an exemplary embodiment of the present disclosure;
FIG. 2 is a network block diagram of an object feature recognition network according to an exemplary embodiment of the present disclosure;
FIG. 3 is a flowchart of another image processing method according to an exemplary embodiment of the present disclosure;
FIG. 4 is a schematic structural diagram of an image processing apparatus according to an exemplary embodiment of the present disclosure;
FIG. 5 is a schematic structural diagram of an electronic device according to an exemplary embodiment of the present disclosure.
Detailed Description
In order to enable those skilled in the art to better understand the technical solutions of the present disclosure, the technical solutions of the embodiments of the present disclosure will be clearly and completely described below with reference to the accompanying drawings.
It should be noted that the terms "first," "second," and the like in the description and claims of the present disclosure and in the foregoing figures are used for distinguishing between similar objects and not necessarily for describing a particular sequential or chronological order. It is to be understood that the data so used may be interchanged where appropriate such that the embodiments of the disclosure described herein may be capable of operation in sequences other than those illustrated or described herein. The implementations described in the following exemplary examples are not representative of all implementations consistent with the present disclosure. Rather, they are merely examples of apparatus and methods consistent with some aspects of the present disclosure as detailed in the accompanying claims.
It should be noted that the user information involved in the present disclosure includes, but is not limited to, user equipment information and user personal information; the collection, storage, use, processing, transmission, provision, and disclosure of the user information in the present disclosure all comply with the relevant laws and regulations and do not violate public order and good morals.
At present, road environment images are recognized with object detection methods that can only detect specific objects around the vehicle, such as cars, bicycles, and pedestrians. In a real autonomous driving scene, however, other physical structures in the environment are also important for perceiving the three-dimensional scene.
For example, roads, sidewalks, and vegetation play an important role in the overall understanding of a scene. Existing object detection methods do not capture such additional information, which may limit the performance of the autonomous driving system.
In view of the above technical problems, in some embodiments of the present disclosure, a vehicle acquires a three-dimensional road environment image and camera parameters corresponding to the three-dimensional road environment image; the vehicle inputs the three-dimensional road environment image and the camera parameters into an object feature recognition network to obtain the three-dimensional boundary of each object contained in the three-dimensional road environment image, the type of the object, and the semantic information of the object. The object feature recognition network fuses the occupancy grid features extracted from the three-dimensional road environment image with the target detection features, which improves the accuracy of target detection, enriches the semantic information of the objects, and improves the recognition accuracy of the road environment image.
The following describes in detail the technical solutions provided by the embodiments of the present disclosure with reference to the accompanying drawings.
FIG. 1 is a flowchart of an image processing method according to an exemplary embodiment of the present disclosure. As shown in FIG. 1, the method includes:
S101: acquiring a three-dimensional road environment image and camera parameters corresponding to the three-dimensional road environment image;
S102: inputting the three-dimensional road environment image and the camera parameters into an object feature recognition network to obtain the three-dimensional boundary of an object contained in the three-dimensional road environment image, the type of the object, and the semantic information of the object, wherein the object feature recognition network fuses the occupancy grid features extracted from the three-dimensional road environment image with the target detection features.
In this embodiment, the method is executed by an autonomous vehicle. The embodiments of the present disclosure do not limit the type of autonomous vehicle, which may be classified according to the level of autonomy, the usage scenario, the technical implementation, and the vehicle type. First, based on the level of autonomy: a fully autonomous vehicle can perform all driving tasks without human intervention; a partially autonomous vehicle requires a human driver to take over control when necessary; a driver-assistance vehicle only assists with part of the driving tasks, and the vehicle runs normally under the operation of a human driver. Second, based on the usage scenario: autonomous vehicles for public roads travel on public roads and require high safety and reliability; autonomous vehicles for special scenarios are designed for specific scenarios or tasks, such as agricultural operations or transportation and logistics. Third, based on the technical implementation: perception-guided vehicles acquire information about the surrounding environment with sensors and drive autonomously by selecting an optimal path; operation-guided vehicles control autonomous driving by tracking a navigation map and planning a path. Fourth, based on the vehicle type: autonomous passenger cars can be used in private, business, taxi, and other passenger scenarios; autonomous buses are mainly applied in public transportation to reduce the cost of human drivers; for autonomous trucks, automated transportation is becoming an important trend in the logistics industry.
It should be noted that three-dimensional object detection focuses on identifying the boundaries and types of objects, while three-dimensional semantic occupancy grid prediction enables a deeper understanding of the three-dimensional road environment by attaching semantic information to all objects. Combining three-dimensional object detection with three-dimensional semantic occupancy grid prediction yields finer geometric information and richer semantic information, and greatly improves the recognition accuracy of the road environment image.
In this embodiment, the vehicle acquires a three-dimensional road environment image and camera parameters corresponding to the three-dimensional road environment image; the vehicle inputs the three-dimensional road environment image and the camera parameters into an object feature recognition network to obtain the three-dimensional boundary of each object contained in the three-dimensional road environment image, the type of the object, and the semantic information of the object. The object feature recognition network fuses the occupancy grid features extracted from the three-dimensional road environment image with the target detection features, which improves the accuracy of target detection, enriches the semantic information of the objects, and improves the recognition accuracy of the road environment image.
The three-dimensional road environment image is an image that represents objects such as roads and buildings in a real scene in three dimensions. Typically, such images are generated by computer processing from data acquired by sensors such as lidar, cameras, and GPS. They can be used for environment perception in autonomous vehicles, virtual reality and game production, and city planning and design. In this disclosure, the three-dimensional road environment image is collected by a camera.
The camera parameters include camera intrinsic parameters and camera extrinsic parameters. The intrinsic parameters include the focal length, the principal point coordinates, and the distortion parameters, and serve to convert coordinates from the camera coordinate system into the pixel coordinate system. The extrinsic parameters include the translation and rotation matrices that convert coordinates from the vehicle coordinate system to the camera coordinate system.
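For illustration only, this transformation chain can be sketched with a minimal pinhole projection in Python. All numeric values (rotation, translation, focal length, principal point) are illustrative assumptions, lens distortion is ignored, and the sketch does not represent the specific camera model used in the patented method.

```python
import numpy as np

# Minimal pinhole-projection sketch (illustrative values only, no lens distortion).
# Extrinsics: rotation R and translation t map vehicle coordinates to camera coordinates
# (vehicle x-forward / y-left / z-up -> camera x-right / y-down / z-forward).
R = np.array([[0.0, -1.0,  0.0],
              [0.0,  0.0, -1.0],
              [1.0,  0.0,  0.0]])
t = np.array([0.0, 1.5, 0.0])            # assumed camera mounting offset, metres
# Intrinsics K: focal lengths (fx, fy) and principal point (cx, cy) in pixels.
K = np.array([[1000.0,    0.0, 960.0],
              [   0.0, 1000.0, 540.0],
              [   0.0,    0.0,   1.0]])

p_vehicle = np.array([10.0, 0.0, 1.5])   # a point 10 m ahead of the vehicle, 1.5 m up
p_camera = R @ p_vehicle + t             # vehicle frame -> camera frame (extrinsics)
uvw = K @ p_camera                       # camera frame -> image plane (intrinsics)
u, v = uvw[0] / uvw[2], uvw[1] / uvw[2]  # perspective divide -> pixel coordinates
print(f"pixel: ({u:.1f}, {v:.1f}), depth: {p_camera[2]:.1f} m")
```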
In one application scenario, the vehicle acquires a three-dimensional road environment image of its surroundings through an image acquisition device installed on the vehicle. The three-dimensional road environment image and the camera parameters are input into the object feature recognition network to obtain the three-dimensional boundary of each object contained in the image, the type of the object, and the semantic information of the object; a navigation image is generated according to these outputs and displayed on the central control screen of the vehicle so that the user can view it and control the vehicle; and a navigation operation is performed according to the navigation image.
In some embodiments of the present disclosure, a three-dimensional road environment image is acquired. In one implementation, the three-dimensional road environment image around the vehicle is acquired by an image acquisition device mounted on the vehicle. The type of the image acquisition device is not limited and can be chosen according to the actual situation; it may be an FOV camera mounted on the side or top of the vehicle, or a 360-degree wide-angle camera mounted on the top of the vehicle. The number of FOV cameras mounted on the vehicle is not limited either, and may be, for example, 6, 8, or 12.
In some embodiments of the present disclosure, the image depth is obtained by simultaneously acquiring data on the three-dimensional road environment around the vehicle with a radar and the image acquisition device mounted on the vehicle, so as to obtain the image depth values corresponding to the three-dimensional road environment image.
In some embodiments of the present disclosure, the three-dimensional road environment image and the camera parameters are input into the object feature recognition network to obtain the three-dimensional boundary of each object contained in the three-dimensional road environment image, the type of the object, and the semantic information of the object. In one implementation, image feature extraction is performed on the three-dimensional road environment image in the object feature recognition network to obtain an image feature map; top-view feature extraction is performed on the image feature map, the camera parameters, and the image depth values to obtain a top-view feature map; feature fusion processing is performed on the top-view feature map to obtain fused target detection features and occupancy grid detection features; object detection is performed according to the target detection features to obtain the three-dimensional boundary of the object and the type of the object; and occupancy grid detection is performed according to the occupancy grid detection features to obtain the semantic information of the object. By fusing the occupancy grid features with the target detection features, the target detection branch feature map carries finer geometric information and richer semantic information, which greatly improves the results of the detection task.
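The data flow described above can be summarized with a simplified Python (PyTorch-style) sketch. The class and argument names are illustrative assumptions, the five sub-networks are treated as opaque modules passed in by the caller, and the sketch is not the patented implementation itself.

```python
import torch
from torch import nn

class ObjectFeatureRecognitionNet(nn.Module):
    """Sketch of the data flow described above; the five sub-networks are
    supplied as arbitrary nn.Modules (illustrative, not the patented design)."""

    def __init__(self, img_backbone, img2bev, occ_codec, det_codec, od_head, oc_head):
        super().__init__()
        self.img_backbone = img_backbone   # image feature extraction sub-network
        self.img2bev = img2bev             # top-view (BEV) feature extraction sub-network
        self.occ_codec = occ_codec         # first encoder-decoder module (occupancy branch)
        self.det_codec = det_codec         # second encoder-decoder module (detection branch)
        self.od_head = od_head             # target detector
        self.oc_head = oc_head             # occupancy grid detector

    def forward(self, images, cam_params, depth):
        img_feats = self.img_backbone(images)                   # image feature map
        bev_feats = self.img2bev(img_feats, cam_params, depth)  # top-view feature map
        occ_feats = self.occ_codec(bev_feats)                   # occupancy grid detection features
        det_feats = self.det_codec(bev_feats, occ_feats)        # fused target detection features
        boxes, classes = self.od_head(det_feats)                # 3D boundaries and object types
        semantics = self.oc_head(occ_feats)                     # per-cell semantic information
        return boxes, classes, semantics
```

The point of the sketch is the two-branch layout: the detection decoder receives the occupancy features in addition to the top-view features, which is where the fusion described above takes place.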
FIG. 2 is a network block diagram of an object feature recognition network according to an exemplary embodiment of the present disclosure. As shown in FIG. 2, the object feature recognition network includes an image feature extraction sub-network, a top-view feature extraction sub-network, a task-stage feature extraction sub-network, a target detector, and an occupancy grid detector. The task-stage feature extraction sub-network includes a first encoder-decoder module and a second encoder-decoder module.
As shown in FIG. 2, image feature extraction is performed on the three-dimensional road environment image to obtain an image feature map. In one implementation, the three-dimensional road environment image is input into the image feature extraction sub-network to obtain the image feature map. The embodiments of the present disclosure do not limit the type of the image feature extraction sub-network, which may be any feature extraction network, for example, a backbone network.
As shown in FIG. 2, top-view feature extraction is performed on the image feature map, the camera parameters, and the image depth values to obtain a top-view feature map. In one implementation, the image feature map, the camera parameters, and the image depth values are input into the top-view feature extraction sub-network to obtain the top-view feature map. The embodiments of the present disclosure do not limit the type of the top-view feature extraction sub-network, which may be, for example, an Image2BEV block network.
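One plausible way to realise such a top-view (bird's-eye-view) feature extraction step is to back-project each pixel of the image feature map into the vehicle frame using the per-pixel depth values and the camera parameters, and then pool the features into a BEV grid. The toy sketch below illustrates that idea under those assumptions; it is not the Image2BEV block itself, and all shapes, grid sizes, and ranges are made up.

```python
import torch

def image_to_bev(img_feats, depth, K_inv, cam_to_ego,
                 bev_size=(128, 128), bev_range=50.0):
    """Toy lift of image features into a top-view (BEV) grid using per-pixel
    depth, camera intrinsics (K_inv) and extrinsics (cam_to_ego, 4x4).
    img_feats: (C, H, W) feature map; depth: (H, W) metric depth values."""
    C, H, W = img_feats.shape
    # Pixel grid in homogeneous coordinates (u, v, 1).
    v, u = torch.meshgrid(torch.arange(H), torch.arange(W), indexing="ij")
    pix = torch.stack([u, v, torch.ones_like(u)], dim=0).float().reshape(3, -1)
    # Back-project to camera coordinates with the per-pixel depth values.
    cam_pts = (K_inv @ pix) * depth.reshape(1, -1)
    # Camera -> ego (vehicle) coordinates with the 4x4 extrinsic matrix.
    cam_h = torch.cat([cam_pts, torch.ones(1, cam_pts.shape[1])], dim=0)
    ego_pts = (cam_to_ego @ cam_h)[:3]
    # Scatter features into BEV cells indexed by (x, y) in the ego frame.
    nx, ny = bev_size
    ix = ((ego_pts[0] + bev_range) / (2 * bev_range) * nx).long().clamp(0, nx - 1)
    iy = ((ego_pts[1] + bev_range) / (2 * bev_range) * ny).long().clamp(0, ny - 1)
    cell = iy * nx + ix
    bev = torch.zeros(C, nx * ny)
    bev.index_add_(1, cell, img_feats.reshape(C, -1))    # sum features per cell
    count = torch.zeros(nx * ny).index_add_(0, cell, torch.ones_like(cell, dtype=torch.float))
    bev = bev / count.clamp(min=1.0)                     # average pooling per cell
    return bev.reshape(C, ny, nx)
```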
As shown in FIG. 2, the task-stage feature extraction sub-network performs feature fusion processing on the top-view feature map to obtain the fused target detection features and the occupancy grid detection features. In one implementation, the top-view feature map is input into the first encoder-decoder module for encoding and decoding operations to obtain the occupancy grid detection features; the top-view feature map is also input into the second encoder-decoder module, where the features decoded from the top-view feature map and the occupancy grid detection features are fused and decoded to obtain the fused target detection features. The embodiments of the present disclosure do not limit the type of the task-stage feature extraction sub-network, which may be, for example, a Task stage network.
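A minimal sketch of such a two-branch task stage is shown below, assuming simple convolutional encoder-decoder modules and fusion by channel concatenation inside the detection decoder. The module design, depths, and fusion operator are illustrative assumptions and are not specified by the patent.

```python
import torch
from torch import nn
from torch.nn import functional as F

class TaskStage(nn.Module):
    """Sketch of the task-stage sub-network: the first encoder-decoder produces
    occupancy grid detection features; the second decodes the BEV features and
    fuses the occupancy features into its decoder. Channel sizes are illustrative."""

    def __init__(self, c=64):
        super().__init__()
        self.occ_enc = nn.Conv2d(c, c, 3, stride=2, padding=1)
        self.occ_dec = nn.ConvTranspose2d(c, c, 4, stride=2, padding=1)
        self.det_enc = nn.Conv2d(c, c, 3, stride=2, padding=1)
        # The detection decoder consumes its own encoded features concatenated
        # with the occupancy branch features, i.e. the fusion step described above.
        self.det_dec = nn.ConvTranspose2d(2 * c, c, 4, stride=2, padding=1)

    def forward(self, bev_feats):
        occ_feats = self.occ_dec(torch.relu(self.occ_enc(bev_feats)))
        det_hidden = torch.relu(self.det_enc(bev_feats))
        # Fusion decoding: pool the occupancy features to the encoded resolution,
        # concatenate with the detection branch, then decode.
        occ_small = F.avg_pool2d(occ_feats, 2)
        fused = torch.cat([det_hidden, occ_small], dim=1)
        det_feats = self.det_dec(fused)
        return det_feats, occ_feats
```

Performing the fusion inside the detection decoder, rather than at the output stage, lets the detection branch exploit the occupancy information while its features are still being reconstructed, which is one reasonable reading of the fusion decoding operation described above.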
As shown in FIG. 2, the object feature recognition network includes a target detector, and object detection is performed according to the target detection features to obtain the three-dimensional boundary of the object and the type of the object. In one implementation, the target detection features are input into the target detector to obtain the three-dimensional boundary of the object and the type of the object. The embodiments of the present disclosure do not limit the type of the target detector, which may be, for example, an OD head network.
As shown in FIG. 2, occupancy grid detection is performed according to the occupancy grid detection features to obtain the semantic information of the object. In one implementation, the occupancy grid detection features are input into the occupancy grid detector to obtain the semantic information of the object. The embodiments of the present disclosure do not limit the type of the occupancy grid detector, which may be, for example, an OC head network.
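For illustration only, the two heads can be sketched as lightweight convolutional layers on top of the corresponding feature maps: a target detection head predicting per-cell class logits and 3D box parameters, and an occupancy head predicting per-cell semantic logits over a set of height bins. The output parameterisation, class counts, and height bins are assumptions, not the OD head / OC head networks themselves.

```python
from torch import nn

class DetectionHead(nn.Module):
    """Toy target detector head: per-BEV-cell class logits and 3D box parameters
    (e.g. x, y, z, w, l, h, yaw). The parameterisation is an assumption."""
    def __init__(self, c=64, num_classes=10, box_params=7):
        super().__init__()
        self.cls = nn.Conv2d(c, num_classes, 1)
        self.box = nn.Conv2d(c, box_params, 1)

    def forward(self, det_feats):
        return self.box(det_feats), self.cls(det_feats)

class OccupancyHead(nn.Module):
    """Toy occupancy grid detector head: per-cell semantic logits over
    num_semantics classes (road, sidewalk, vegetation, ...) for each of
    num_heights vertical bins; the voxelisation scheme is an assumption."""
    def __init__(self, c=64, num_semantics=16, num_heights=8):
        super().__init__()
        self.sem = nn.Conv2d(c, num_semantics * num_heights, 1)

    def forward(self, occ_feats):
        return self.sem(occ_feats)
```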
In some embodiments of the present disclosure, the labeling of the target detection task is consistent with other detection tasks, using 3D bounding boxes and categories as labels. The occupancy grid task provides two kinds of supervision: occupancy grids without semantic labels, i.e., coarse labels, and occupancy grids with semantic labels, i.e., fine labels. The coarse labels can be generated directly from the point cloud data obtained by a lidar sensor without manual annotation cost, while the fine labels require the category of each grid cell to be annotated manually; both kinds of labels can significantly improve target detection performance. The embodiments of the present disclosure fuse the features of the semantic occupancy grid prediction task with the features of the target detection task, which helps achieve a better understanding of the scene and more accurate object detection.
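The coarse (class-free) occupancy labels mentioned above could, for example, be produced by simply voxelising the lidar point cloud, as in the following sketch; the voxel size and spatial ranges are illustrative assumptions, not values taken from the patent.

```python
import numpy as np

def coarse_occupancy_labels(points, voxel=0.5, x_range=(-50, 50),
                            y_range=(-50, 50), z_range=(-3, 5)):
    """Toy generation of coarse (class-free) occupancy labels from a lidar
    point cloud of shape (N, 3): a voxel is marked occupied if at least one
    point falls inside it."""
    nx = int((x_range[1] - x_range[0]) / voxel)
    ny = int((y_range[1] - y_range[0]) / voxel)
    nz = int((z_range[1] - z_range[0]) / voxel)
    grid = np.zeros((nx, ny, nz), dtype=bool)
    ix = ((points[:, 0] - x_range[0]) / voxel).astype(int)
    iy = ((points[:, 1] - y_range[0]) / voxel).astype(int)
    iz = ((points[:, 2] - z_range[0]) / voxel).astype(int)
    valid = (ix >= 0) & (ix < nx) & (iy >= 0) & (iy < ny) & (iz >= 0) & (iz < nz)
    grid[ix[valid], iy[valid], iz[valid]] = True
    return grid
```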
In conjunction with the above descriptions of the embodiments, FIG. 3 is a schematic flowchart of another image processing method according to an exemplary embodiment of the present disclosure. As shown in FIG. 3, the method includes:
S301: acquiring a three-dimensional road environment image and camera parameters corresponding to the three-dimensional road environment image;
S302: inputting the three-dimensional road environment image and the camera parameters into an object feature recognition network to obtain the three-dimensional boundary of an object contained in the three-dimensional road environment image, the type of the object and the semantic information of the object;
S303: generating a navigation image according to the three-dimensional boundary of the object, the type of the object and the semantic information of the object contained in the three-dimensional road environment image;
S304: performing a navigation operation according to the navigation image.
In the embodiments of the present disclosure, the method is executed by an autonomous vehicle, and the embodiments of the present disclosure do not limit the type of the autonomous vehicle. For the type of vehicle, reference may be made to the description of the corresponding parts of the foregoing embodiments.
In this embodiment, for the implementation of each step of the above method, reference may be made to the description of the corresponding parts of the foregoing embodiments, which is not repeated here.
FIG. 4 is a schematic structural diagram of an image processing apparatus 40 according to an exemplary embodiment of the present disclosure. As shown in FIG. 4, the image processing apparatus 40 includes: an acquisition module 41 and a feature recognition module 42.
The acquisition module 41 is configured to acquire a three-dimensional road environment image and camera parameters corresponding to the three-dimensional road environment image.
The feature recognition module 42 is configured to input the three-dimensional road environment image and the camera parameters into an object feature recognition network to obtain the three-dimensional boundary of an object contained in the three-dimensional road environment image, the type of the object, and the semantic information of the object, wherein the object feature recognition network fuses the occupancy grid features extracted from the three-dimensional road environment image with the target detection features.
Optionally, after inputting the three-dimensional road environment image and the camera parameters into the object feature recognition network, the feature recognition module 42 may be further configured to:
generating a navigation image according to the three-dimensional boundary of the object, the type of the object and the semantic information of the object contained in the three-dimensional road environment image;
and performing a navigation operation according to the navigation image.
Optionally, when inputting the three-dimensional road environment image and the camera parameters into the object feature recognition network to obtain the three-dimensional boundary of the object contained in the three-dimensional road environment image, the type of the object, and the semantic information of the object, the feature recognition module 42 is configured to:
performing image feature extraction on the three-dimensional road environment image in the object feature recognition network to obtain an image feature map;
performing top-view feature extraction on the image feature map, the camera parameters, and the image depth values to obtain a top-view feature map;
performing feature fusion processing on the top-view feature map to obtain fused target detection features and occupancy grid detection features;
performing object detection according to the target detection features to obtain the three-dimensional boundary of the object and the type of the object; and
performing occupancy grid detection according to the occupancy grid detection features to obtain the semantic information of the object.
Optionally, the object feature recognition network includes an image feature extraction sub-network, and when performing image feature extraction on the three-dimensional road environment image to obtain an image feature map, the feature recognition module 42 is configured to:
inputting the three-dimensional road environment image into the image feature extraction sub-network to obtain the image feature map.
Optionally, the object feature recognition network includes a top-view feature extraction sub-network, and when performing top-view feature extraction on the image feature map, the camera parameters, and the image depth values to obtain a top-view feature map, the feature recognition module 42 is configured to:
inputting the image feature map, the camera parameters, and the image depth values into the top-view feature extraction sub-network to obtain the top-view feature map.
Optionally, the object feature recognition network includes a task-stage feature extraction sub-network, where the task-stage feature extraction sub-network includes a first encoder-decoder module and a second encoder-decoder module, and when performing feature fusion processing on the top-view feature map to obtain the fused target detection features and the occupancy grid detection features, the feature recognition module 42 is configured to:
inputting the top-view feature map into the first encoder-decoder module for encoding and decoding operations to obtain the occupancy grid detection features; and
inputting the top-view feature map into the second encoder-decoder module, and performing a fusion decoding operation on the features decoded from the top-view feature map and the occupancy grid detection features in the second encoder-decoder module to obtain the fused target detection features.
Optionally, the object feature recognition network includes a target detector, and when performing object detection according to the target detection features to obtain the three-dimensional boundary of the object and the type of the object, the feature recognition module 42 is configured to:
inputting the target detection features into the target detector to obtain the three-dimensional boundary of the object and the type of the object.
Optionally, the object feature recognition network includes an occupancy grid detector, and when performing occupancy grid detection according to the occupancy grid detection features to obtain the semantic information of the object, the feature recognition module 42 is configured to:
inputting the occupancy grid detection features into the occupancy grid detector to obtain the semantic information of the object.
The specific manner in which the various modules perform operations in the apparatus of the above embodiments has been described in detail in the embodiments of the method, and is not described again here.
FIG. 5 is a schematic structural diagram of an electronic device according to an exemplary embodiment of the present disclosure. As shown in FIG. 5, the electronic device includes: a memory 51 and a processor 52. In addition, the electronic device further includes a power supply component 53 and a communication component 54.
The memory 51 is used for storing computer programs and may be configured to store various other data to support operations on the electronic device. Examples of such data include instructions for any application or method operating on an electronic device.
The memory 51 may be implemented by any type of volatile or non-volatile memory device or combination thereof, such as Static Random Access Memory (SRAM), electrically erasable programmable read-only memory (EEPROM), erasable programmable read-only memory (EPROM), programmable read-only memory (PROM), read-only memory (ROM), magnetic memory, flash memory, magnetic or optical disk.
A communication component 54 for data transmission with other devices.
A processor 52, configured to execute the computer instructions stored in the memory 51, for: acquiring a three-dimensional road environment image and camera parameters corresponding to the three-dimensional road environment image; and inputting the three-dimensional road environment image and the camera parameters into an object feature recognition network to obtain the three-dimensional boundary of an object contained in the three-dimensional road environment image, the type of the object, and the semantic information of the object, wherein the object feature recognition network fuses the occupancy grid features extracted from the three-dimensional road environment image with the target detection features.
Optionally, after inputting the three-dimensional road environment image and the camera parameters into the object feature recognition network, the processor 52 may be further configured to:
generating a navigation image according to the three-dimensional boundary of the object, the type of the object and the semantic information of the object contained in the three-dimensional road environment image;
and performing a navigation operation according to the navigation image.
Optionally, when inputting the three-dimensional road environment image and the camera parameters into the object feature recognition network to obtain the three-dimensional boundary of the object contained in the three-dimensional road environment image, the type of the object, and the semantic information of the object, the processor 52 is configured to:
performing image feature extraction on the three-dimensional road environment image in the object feature recognition network to obtain an image feature map;
performing top-view feature extraction on the image feature map, the camera parameters, and the image depth values to obtain a top-view feature map;
performing feature fusion processing on the top-view feature map to obtain fused target detection features and occupancy grid detection features;
performing object detection according to the target detection features to obtain the three-dimensional boundary of the object and the type of the object; and
performing occupancy grid detection according to the occupancy grid detection features to obtain the semantic information of the object.
Optionally, the object feature recognition network includes an image feature extraction sub-network, and when performing image feature extraction on the three-dimensional road environment image to obtain an image feature map, the processor 52 is configured to:
inputting the three-dimensional road environment image into the image feature extraction sub-network to obtain the image feature map.
Optionally, the object feature recognition network includes a top-view feature extraction sub-network, and when performing top-view feature extraction on the image feature map, the camera parameters, and the image depth values to obtain a top-view feature map, the processor 52 is configured to:
inputting the image feature map, the camera parameters, and the image depth values into the top-view feature extraction sub-network to obtain the top-view feature map.
Optionally, the object feature recognition network includes a task-stage feature extraction sub-network, where the task-stage feature extraction sub-network includes a first encoder-decoder module and a second encoder-decoder module, and when performing feature fusion processing on the top-view feature map to obtain the fused target detection features and the occupancy grid detection features, the processor 52 is configured to:
inputting the top-view feature map into the first encoder-decoder module for encoding and decoding operations to obtain the occupancy grid detection features; and
inputting the top-view feature map into the second encoder-decoder module, and performing a fusion decoding operation on the features decoded from the top-view feature map and the occupancy grid detection features in the second encoder-decoder module to obtain the fused target detection features.
Optionally, the object feature recognition network includes a target detector, and when performing object detection according to the target detection features to obtain the three-dimensional boundary of the object and the type of the object, the processor 52 is configured to:
inputting the target detection features into the target detector to obtain the three-dimensional boundary of the object and the type of the object.
Optionally, the object feature recognition network includes an occupancy grid detector, and when performing occupancy grid detection according to the occupancy grid detection features to obtain the semantic information of the object, the processor 52 is configured to:
inputting the occupancy grid detection features into the occupancy grid detector to obtain the semantic information of the object.
Accordingly, the embodiments of the present disclosure also provide a computer-readable storage medium storing a computer program that, when executed by one or more processors, causes the one or more processors to perform the steps in the method embodiment of FIG. 1.
Accordingly, the embodiments of the present disclosure also provide a computer program product including a computer program/instructions that, when executed by a processor, perform the steps of the method embodiment of FIG. 1.
The communication component of FIG. 5 is configured to facilitate wired or wireless communication between the device in which the communication component is located and other devices. The device in which the communication component is located can access a wireless network based on a communication standard, such as WiFi or a mobile communication network of 2G, 3G, 4G/LTE, 5G, etc., or a combination thereof. In one exemplary embodiment, the communication component receives a broadcast signal or broadcast-related information from an external broadcast management system via a broadcast channel. In one exemplary embodiment, the communication component further includes a Near Field Communication (NFC) module to facilitate short-range communications. For example, the NFC module may be implemented based on Radio Frequency Identification (RFID) technology, Infrared Data Association (IrDA) technology, Ultra Wideband (UWB) technology, Bluetooth (BT) technology, and other technologies.
The power supply assembly shown in FIG. 5 provides power for the various components of the device in which the power supply assembly is located. The power supply assembly may include a power management system, one or more power sources, and other components associated with generating, managing, and distributing power for the device in which the power supply assembly is located.
The electronic device further comprises a display screen and an audio component.
The display screen includes a screen, which may include a Liquid Crystal Display (LCD) and a Touch Panel (TP). If the screen includes a touch panel, the screen may be implemented as a touch screen to receive input signals from a user. The touch panel includes one or more touch sensors to sense touches, swipes, and gestures on the touch panel. The touch sensor may sense not only the boundary of a touch or sliding action, but also the duration and pressure associated with the touch or sliding operation.
An audio component, which may be configured to output and/or input an audio signal. For example, the audio component includes a Microphone (MIC) configured to receive external audio signals when the device in which the audio component is located is in an operational mode, such as a call mode, a recording mode, and a speech recognition mode. The received audio signal may be further stored in a memory or transmitted via a communication component. In some embodiments, the audio assembly further comprises a speaker for outputting audio signals.
In the above apparatus, storage medium, and program product embodiments of the present disclosure, the vehicle acquires a three-dimensional road environment image and camera parameters corresponding to the three-dimensional road environment image; the vehicle inputs the three-dimensional road environment image and the camera parameters into an object feature recognition network to obtain the three-dimensional boundary of each object contained in the three-dimensional road environment image, the type of the object, and the semantic information of the object. The object feature recognition network fuses the occupancy grid features extracted from the three-dimensional road environment image with the target detection features, which improves the accuracy of target detection, enriches the semantic information of the objects, and improves the recognition accuracy of the road environment image.
It will be appreciated by those skilled in the art that embodiments of the present invention may be provided as a method, system, or computer program product. Accordingly, the present invention may take the form of an entirely hardware embodiment, an entirely software embodiment or an embodiment combining software and hardware aspects. Furthermore, the present invention may take the form of a computer program product embodied on one or more computer-usable storage media (including, but not limited to, disk storage, CD-ROM, optical storage, and the like) having computer-usable program code embodied therein.
The present invention is described with reference to flowchart illustrations and/or block diagrams of methods, apparatus (systems) and computer program products according to embodiments of the invention. It will be understood that each flow and/or block of the flowchart illustrations and/or block diagrams, and combinations of flows and/or blocks in the flowchart illustrations and/or block diagrams, can be implemented by computer program instructions. These computer program instructions may be provided to a processor of a general purpose computer, special purpose computer, embedded processor, or other programmable data processing apparatus to produce a machine, such that the instructions, which execute via the processor of the computer or other programmable data processing apparatus, create means for implementing the functions specified in the flowchart flow or flows and/or block diagram block or blocks.
These computer program instructions may also be stored in a computer-readable memory that can direct a computer or other programmable data processing apparatus to function in a particular manner, such that the instructions stored in the computer-readable memory produce an article of manufacture including instruction means which implement the function specified in the flowchart flow or flows and/or block diagram block or blocks.
These computer program instructions may also be loaded onto a computer or other programmable data processing apparatus to cause a series of operational steps to be performed on the computer or other programmable apparatus to produce a computer implemented process such that the instructions which execute on the computer or other programmable apparatus provide steps for implementing the functions specified in the flowchart flow or flows and/or block diagram block or blocks.
In one typical configuration, a computing device includes one or more processors (CPUs), input/output interfaces, network interfaces, and memory.
The memory may include volatile memory in a computer-readable medium, random Access Memory (RAM) and/or nonvolatile memory, such as Read Only Memory (ROM) or flash memory (flash RAM). Memory is an example of computer-readable media.
Computer-readable media include permanent and non-permanent, removable and non-removable media, and may implement information storage by any method or technology. The information may be computer-readable instructions, data structures, program modules, or other data. Examples of computer storage media include, but are not limited to, phase-change memory (PRAM), static random access memory (SRAM), dynamic random access memory (DRAM), other types of random access memory (RAM), read-only memory (ROM), electrically erasable programmable read-only memory (EEPROM), flash memory or other memory technology, compact disc read-only memory (CD-ROM), digital versatile discs (DVD) or other optical storage, magnetic cassettes, magnetic tape, magnetic disk storage or other magnetic storage devices, or any other non-transmission medium that can be used to store information accessible by a computing device. As defined herein, computer-readable media do not include transitory computer-readable media (transmission media), such as modulated data signals and carrier waves.
It should be noted that, in this document, relational terms such as "first" and "second" are used solely to distinguish one entity or action from another and do not necessarily require or imply any actual relationship or order between such entities or actions. Moreover, the terms "comprises", "comprising", or any other variation thereof, are intended to cover a non-exclusive inclusion, such that a process, method, article, or apparatus that comprises a list of elements includes not only those elements but may also include other elements not expressly listed or inherent to such process, method, article, or apparatus. Without further limitation, an element defined by the phrase "comprising a …" does not exclude the presence of other like elements in a process, method, article, or apparatus that comprises the element.
The above is merely a specific embodiment of the disclosure to enable one skilled in the art to understand or practice the disclosure. Various modifications to these embodiments will be readily apparent to those skilled in the art, and the generic principles defined herein may be applied to other embodiments without departing from the spirit or scope of the disclosure. Thus, the present disclosure is not intended to be limited to the embodiments shown herein but is to be accorded the widest scope consistent with the principles and novel features disclosed herein.

Claims (10)

1. An image processing method, applied to a vehicle, comprising:
acquiring a three-dimensional road environment image and camera parameters corresponding to the three-dimensional road environment image;
inputting the three-dimensional road environment image and the camera parameters into an object feature recognition network to obtain a three-dimensional boundary of an object contained in the three-dimensional road environment image, the type of the object and semantic information of the object, wherein the object feature recognition network performs fusion processing on the occupied grid features extracted from the three-dimensional road environment image and the target detection features;
wherein inputting the three-dimensional road environment image and the camera parameters into the object feature recognition network to obtain the three-dimensional boundary of the object contained in the three-dimensional road environment image, the type of the object and the semantic information of the object comprises:
carrying out picture feature extraction on the three-dimensional road environment image in the object feature recognition network to obtain a picture feature map;
performing top view feature extraction on the picture feature map, the camera parameters and the image depth values to obtain a top view feature map;
performing feature fusion processing on the top view feature map to obtain fused target detection features and occupied grid detection features;
performing object detection according to the target detection features to obtain the three-dimensional boundary of the object and the type of the object; and
performing occupied grid detection according to the occupied grid detection features to obtain the semantic information of the object;
wherein the object feature recognition network comprises a task stage feature extraction sub-network, the task stage feature extraction sub-network comprises a first encoding and decoding module and a second encoding and decoding module, and performing feature fusion processing on the top view feature map to obtain the fused target detection features and the occupied grid detection features comprises:
inputting the top view feature map into the first encoding and decoding module to perform encoding and decoding operations to obtain the occupied grid detection features; and
inputting the top view feature map into the second encoding and decoding module, and performing, in the second encoding and decoding module, a fusion decoding operation on the features decoded from the top view feature map and the occupied grid detection features to obtain the fused target detection features.
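The following is a purely illustrative sketch of one way the first and second encoding and decoding modules recited above could be arranged: the first module encodes and decodes the top view feature map to produce the occupied grid detection features, and the second module's decoded top view features are then fused with those features to produce the target detection features. The class names, channel sizes and layer choices are assumptions and are not taken from the claims.

import torch
import torch.nn as nn


class Codec(nn.Module):
    """Minimal encoder-decoder over a top-view (bird's-eye-view) feature map."""

    def __init__(self, ch=64):
        super().__init__()
        self.enc = nn.Sequential(nn.Conv2d(ch, ch * 2, 3, stride=2, padding=1), nn.ReLU())
        self.dec = nn.Sequential(nn.ConvTranspose2d(ch * 2, ch, 4, stride=2, padding=1), nn.ReLU())

    def forward(self, x):
        return self.dec(self.enc(x))


class TaskStageFusionSketch(nn.Module):
    """First codec -> occupied grid detection features; the second codec's
    decoded features are fused with them to give the target detection features."""

    def __init__(self, ch=64):
        super().__init__()
        self.first_codec = Codec(ch)          # occupancy branch
        self.second_codec = Codec(ch)         # detection branch
        self.fusion = nn.Conv2d(ch * 2, ch, 1)

    def forward(self, bev):
        occ_feat = self.first_codec(bev)                                # occupied grid detection features
        dec_feat = self.second_codec(bev)                               # features decoded from the top view map
        det_feat = self.fusion(torch.cat([dec_feat, occ_feat], dim=1))  # fused target detection features
        return det_feat, occ_feat


fusion = TaskStageFusionSketch()
bev = torch.randn(1, 64, 32, 32)          # top view feature map (assumed size)
det_feat, occ_feat = fusion(bev)          # both (1, 64, 32, 32)

In this reading the occupancy branch runs first so that its decoded features can condition the decoding of the detection branch, which matches the fusion order recited above; other arrangements would also be consistent with the claim language.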
2. The method of claim 1, wherein after inputting the three-dimensional road environment image and the camera parameters into an object feature recognition network to obtain a three-dimensional boundary of an object contained in the three-dimensional road environment image, a type of the object, and semantic information of the object, the method further comprises:
generating a navigation image according to the three-dimensional boundary of the object contained in the three-dimensional road environment image, the type of the object and the semantic information of the object;
and performing navigation operation according to the navigation image.
3. The method according to claim 1, wherein the object feature recognition network includes a picture feature extraction sub-network, and the performing picture feature extraction on the three-dimensional road environment image to obtain a picture feature map includes:
and inputting the three-dimensional road environment image into the picture feature extraction sub-network to obtain the picture feature map.
4. The method according to claim 1, wherein the object feature recognition network includes a top view feature extraction sub-network, and the performing top view feature extraction on the picture feature map, the camera parameters, and the image depth values to obtain a top view feature map includes:
and inputting the picture feature map, the camera parameters and the image depth values into the top view feature extraction sub-network to obtain the top view feature map.
5. The method of claim 1, wherein the object feature recognition network comprises a target detector, and performing object detection according to the target detection features to obtain the three-dimensional boundary of the object and the type of the object comprises:
inputting the target detection features into the target detector to obtain the three-dimensional boundary of the object and the type of the object.
6. The method of claim 1, wherein the object feature recognition network comprises an occupied grid detector, and performing occupied grid detection according to the occupied grid detection features to obtain the semantic information of the object comprises:
inputting the occupied grid detection features into the occupied grid detector to obtain the semantic information of the object.
7. An image processing apparatus, comprising:
the acquisition module is used for acquiring a three-dimensional road environment image and camera parameters corresponding to the three-dimensional road environment image;
the feature recognition module is used for inputting the three-dimensional road environment image and the camera parameters into an object feature recognition network to obtain the three-dimensional boundary of an object contained in the three-dimensional road environment image, the type of the object and the semantic information of the object, wherein the object feature recognition network performs fusion processing on the occupied grid features extracted from the three-dimensional road environment image and the target detection features;
wherein the feature recognition module is used for inputting the three-dimensional road environment image and the camera parameters into the object feature recognition network to obtain the three-dimensional boundary of the object contained in the three-dimensional road environment image, the type of the object and the semantic information of the object by:
carrying out picture feature extraction on the three-dimensional road environment image in the object feature recognition network to obtain a picture feature map;
performing top view feature extraction on the picture feature map, the camera parameters and the image depth values to obtain a top view feature map;
performing feature fusion processing on the top view feature map to obtain fused target detection features and occupied grid detection features;
performing object detection according to the target detection features to obtain the three-dimensional boundary of the object and the type of the object; and
performing occupied grid detection according to the occupied grid detection features to obtain the semantic information of the object;
wherein the object feature recognition network comprises a task stage feature extraction sub-network, the task stage feature extraction sub-network comprises a first encoding and decoding module and a second encoding and decoding module, and performing feature fusion processing on the top view feature map to obtain the fused target detection features and the occupied grid detection features comprises:
inputting the top view feature map into the first encoding and decoding module to perform encoding and decoding operations to obtain the occupied grid detection features; and
inputting the top view feature map into the second encoding and decoding module, and performing, in the second encoding and decoding module, a fusion decoding operation on the features decoded from the top view feature map and the occupied grid detection features to obtain the fused target detection features.
8. A vehicle, characterized by comprising:
a processor;
a memory for storing the processor-executable instructions;
wherein the processor is configured to execute the instructions to implement the steps in the method of any of claims 1-6.
9. An electronic device, comprising:
a processor;
a memory for storing the processor-executable instructions;
wherein the processor is configured to execute the instructions to implement the steps in the method of any of claims 1-6.
10. A computer readable storage medium, on which a computer program is stored, characterized in that the computer program, when being executed by a processor, implements the method of any of claims 1-6.
CN202310546183.6A 2023-05-15 2023-05-15 Image processing method, device, apparatus, medium, and program product Active CN116704472B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202310546183.6A CN116704472B (en) 2023-05-15 2023-05-15 Image processing method, device, apparatus, medium, and program product

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202310546183.6A CN116704472B (en) 2023-05-15 2023-05-15 Image processing method, device, apparatus, medium, and program product

Publications (2)

Publication Number Publication Date
CN116704472A CN116704472A (en) 2023-09-05
CN116704472B true CN116704472B (en) 2024-04-02

Family

ID=87826760

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202310546183.6A Active CN116704472B (en) 2023-05-15 2023-05-15 Image processing method, device, apparatus, medium, and program product

Country Status (1)

Country Link
CN (1) CN116704472B (en)

Patent Citations (14)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN108369743A (en) * 2015-08-28 2018-08-03 帝国科技及医学学院 Use multi-directional camera map structuring space
CN110119148A (en) * 2019-05-14 2019-08-13 深圳大学 A kind of six-degree-of-freedom posture estimation method, device and computer readable storage medium
CN113728625A (en) * 2019-05-14 2021-11-30 英特尔公司 Immersive video coding techniques for metadata (3DoF +/MIV) and video-point cloud coding (V-PCC) for three-degree-of-freedom additive/immersive video
CN113554698A (en) * 2020-04-23 2021-10-26 杭州海康威视数字技术股份有限公司 Vehicle pose information generation method and device, electronic equipment and storage medium
CN112509115A (en) * 2020-11-26 2021-03-16 中国人民解放军战略支援部队信息工程大学 Three-dimensional time-varying unconstrained reconstruction method and system for dynamic scene of sequence image
CN114998856A (en) * 2022-06-17 2022-09-02 苏州浪潮智能科技有限公司 3D target detection method, device, equipment and medium of multi-camera image
CN115205365A (en) * 2022-07-14 2022-10-18 小米汽车科技有限公司 Vehicle distance detection method and device, vehicle, readable storage medium and chip
CN114973181A (en) * 2022-07-29 2022-08-30 武汉极目智能技术有限公司 Multi-view BEV (beam steering angle) visual angle environment sensing method, device, equipment and storage medium
CN115641581A (en) * 2022-09-30 2023-01-24 北京迈格威科技有限公司 Target detection method, electronic device, and storage medium
CN115620277A (en) * 2022-10-13 2023-01-17 北京京深深向科技有限公司 Monocular 3D environment sensing method and device, electronic equipment and storage medium
CN115830555A (en) * 2022-11-24 2023-03-21 国家石油天然气管网集团有限公司 Target identification method based on radar point cloud, storage medium and equipment
CN116051751A (en) * 2023-01-30 2023-05-02 清华大学 Three-dimensional semantic occupation prediction method and system based on three-plane representation
CN115965970A (en) * 2023-02-02 2023-04-14 清华大学 Method and system for realizing bird's-eye view semantic segmentation based on implicit set prediction
CN116110025A (en) * 2023-02-02 2023-05-12 清华大学 Method and system for constructing environment semantic occupation and velocity field by grid detection tracking framework

Non-Patent Citations (3)

* Cited by examiner, † Cited by third party
Title
M2BEV: Multi Camera Joint 3D Detection and Segmentation with Unified Birds Eye View Representation; Enze Xie et al.; arXiv; 1-13 *
Research on key technologies of three-dimensional reconstruction based on visible-light multi-view images; Wu Yuxuan; China Masters' Theses Full-text Database (Information Science and Technology); Vol. 2023, No. 01; I138-2473 *
Two-stage three-dimensional point cloud vehicle detection based on pyramid feature fusion; Zhang Mingfang et al.; Journal of Transportation Systems Engineering and Information Technology; Vol. 22, No. 5; 107-116 *

Also Published As

Publication number Publication date
CN116704472A (en) 2023-09-05

Similar Documents

Publication Publication Date Title
US20200192351A1 (en) Vehicle path updates via remote vehicle control
US20180319338A1 (en) Vehicle periphery monitoring device and vehicle periphery monitoring system
US20210365696A1 (en) Vehicle Intelligent Driving Control Method and Device and Storage Medium
CN111144211B (en) Point cloud display method and device
CN111311902B (en) Data processing method, device, equipment and machine readable medium
CN111157014A (en) Road condition display method and device, vehicle-mounted terminal and storage medium
CN107406072B (en) Vehicle assistance system
EP3038021A1 (en) Risk determination method, risk determination device, risk determination system, and risk output device
CN111516690B (en) Control method and device of intelligent automobile and storage medium
CN114764911A (en) Obstacle information detection method, obstacle information detection device, electronic device, and storage medium
KR102433345B1 (en) Method and apparatus for providing information using vehicle's camera
CN116704472B (en) Image processing method, device, apparatus, medium, and program product
KR102174863B1 (en) Autonomous Vehicle Exterior Display Interaction Apparatus and Method
CN115146013A (en) Vehicle parking memory map display method, device, equipment and storage medium
KR102370876B1 (en) Method and apparatus for providing driving information by using object recognition of cloud
US20210302991A1 (en) Method and system for generating an enhanced field of view for an autonomous ground vehicle
CN114511834A (en) Method and device for determining prompt information, electronic equipment and storage medium
WO2020258222A1 (en) Method and system for identifying object
KR20140106126A (en) Auto parking method based on around view image
CN110979319A (en) Driving assistance method, device and system
CN114061598A (en) Navigation method, device, system and storage medium
CN112526477A (en) Method and apparatus for processing information
CN115817161A (en) Speed limiting method, device, vehicle and medium
KR20220028941A (en) Method and apparatus for providing a parking location using vehicle's terminal
CN110926425A (en) Navigation logistics transportation system of 3D structured light camera and control method thereof

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant