CN116704472B - Image processing method, device, apparatus, medium, and program product - Google Patents

Image processing method, device, apparatus, medium, and program product

Info

Publication number
CN116704472B
Authority
CN
China
Prior art keywords
road environment
feature
environment image
dimensional
dimensional road
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN202310546183.6A
Other languages
Chinese (zh)
Other versions
CN116704472A (en)
Inventor
冷汉超
俞昆
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Xiaomi Automobile Technology Co Ltd
Original Assignee
Xiaomi Automobile Technology Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Xiaomi Automobile Technology Co Ltd filed Critical Xiaomi Automobile Technology Co Ltd
Priority to CN202310546183.6A priority Critical patent/CN116704472B/en
Publication of CN116704472A publication Critical patent/CN116704472A/en
Application granted granted Critical
Publication of CN116704472B publication Critical patent/CN116704472B/en
Active legal-status Critical Current
Anticipated expiration legal-status Critical


Classifications

    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06VIMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V20/00Scenes; Scene-specific elements
    • G06V20/50Context or environment of the image
    • G06V20/56Context or environment of the image exterior to a vehicle by using sensors mounted on the vehicle
    • G06V20/58Recognition of moving objects or obstacles, e.g. vehicles or pedestrians; Recognition of traffic objects, e.g. traffic signs, traffic lights or roads
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06NCOMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00Computing arrangements based on biological models
    • G06N3/02Neural networks
    • G06N3/04Architecture, e.g. interconnection topology
    • G06N3/045Combinations of networks
    • G06N3/0455Auto-encoder networks; Encoder-decoder networks
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06VIMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V10/00Arrangements for image or video recognition or understanding
    • G06V10/40Extraction of image or video features
    • G06V10/44Local feature extraction by analysis of parts of the pattern, e.g. by detecting edges, contours, loops, corners, strokes or intersections; Connectivity analysis, e.g. of connected components
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06VIMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V10/00Arrangements for image or video recognition or understanding
    • G06V10/70Arrangements for image or video recognition or understanding using pattern recognition or machine learning
    • G06V10/77Processing image or video features in feature spaces; using data integration or data reduction, e.g. principal component analysis [PCA] or independent component analysis [ICA] or self-organising maps [SOM]; Blind source separation
    • G06V10/80Fusion, i.e. combining data from various sources at the sensor level, preprocessing level, feature extraction level or classification level
    • G06V10/806Fusion, i.e. combining data from various sources at the sensor level, preprocessing level, feature extraction level or classification level of extracted features
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06VIMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V10/00Arrangements for image or video recognition or understanding
    • G06V10/70Arrangements for image or video recognition or understanding using pattern recognition or machine learning
    • G06V10/82Arrangements for image or video recognition or understanding using pattern recognition or machine learning using neural networks
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06VIMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V20/00Scenes; Scene-specific elements
    • G06V20/70Labelling scene content, e.g. deriving syntactic or semantic representations
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06VIMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V2201/00Indexing scheme relating to image or video recognition or understanding
    • G06V2201/07Target detection
    • YGENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
    • Y02TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
    • Y02TCLIMATE CHANGE MITIGATION TECHNOLOGIES RELATED TO TRANSPORTATION
    • Y02T10/00Road transport of goods or passengers
    • Y02T10/10Internal combustion engine [ICE] based vehicles
    • Y02T10/40Engine management systems

Abstract

The present disclosure provides an image processing method, apparatus, device, medium, and program product, and relates to the field of autonomous driving technology. In some embodiments of the present disclosure, a vehicle acquires a three-dimensional road environment image and camera parameters corresponding to the three-dimensional road environment image; the vehicle inputs the three-dimensional road environment image and the camera parameters into an object feature recognition network to obtain the three-dimensional boundary of each object contained in the three-dimensional road environment image, the type of the object, and the semantic information of the object. The object feature recognition network fuses the occupancy grid features extracted from the three-dimensional road environment image with the target detection features, which improves the accuracy of target detection, enriches the semantic information of the objects, and improves the recognition accuracy of the road environment image.

Description

Image processing method, device, apparatus, medium, and program product
Technical Field
The present disclosure relates to the field of autonomous driving technology, and in particular to an image processing method, apparatus, device, medium, and program product.
Background
In recent years, autonomous driving technology has developed rapidly, and an important research direction in this field is to perceive the three-dimensional environment around an autonomous vehicle accurately and comprehensively.
At present, road environment images are recognized with object detection methods that can only detect specific objects around the vehicle, such as cars, bicycles, and pedestrians; as a result, the recognition accuracy of the road environment image is low.
Disclosure of Invention
The present disclosure provides an image processing method, apparatus, device, medium, and program product, which at least solve the problem that existing road environment image recognition has low accuracy.
The technical scheme of the present disclosure is as follows:
an embodiment of the present disclosure provides an image processing method, applied to a vehicle, including:
acquiring a three-dimensional road environment image and camera parameters corresponding to the three-dimensional road environment image;
inputting the three-dimensional road environment image and the camera parameters into an object feature recognition network to obtain the three-dimensional boundary of an object contained in the three-dimensional road environment image, the type of the object, and the semantic information of the object, wherein the object feature recognition network fuses the occupancy grid features extracted from the three-dimensional road environment image with the target detection features.
Optionally, after inputting the three-dimensional road environment image and the camera parameters into an object feature recognition network to obtain a three-dimensional boundary of an object contained in the three-dimensional road environment image, a type of the object, and semantic information of the object, the method further includes:
generating a navigation image according to the three-dimensional boundary of the object contained in the three-dimensional road environment image, the type of the object and the semantic information of the object;
and performing a navigation operation according to the navigation image.
Optionally, the inputting of the three-dimensional road environment image and the camera parameters into an object feature recognition network to obtain the three-dimensional boundary of an object contained in the three-dimensional road environment image, the type of the object, and the semantic information of the object includes:
performing image feature extraction on the three-dimensional road environment image in the object feature recognition network to obtain an image feature map;
performing top-view feature extraction on the image feature map, the camera parameters, and the image depth values to obtain a top-view feature map;
performing feature fusion processing on the top-view feature map to obtain fused target detection features and occupancy grid detection features;
performing object detection according to the target detection features to obtain the three-dimensional boundary of the object and the type of the object; and
performing occupancy grid detection according to the occupancy grid detection features to obtain the semantic information of the object.
Optionally, the object feature recognition network includes an image feature extraction sub-network, and performing image feature extraction on the three-dimensional road environment image to obtain an image feature map includes:
inputting the three-dimensional road environment image into the image feature extraction sub-network to obtain the image feature map.
Optionally, the object feature recognition network includes a top-view feature extraction sub-network, and performing top-view feature extraction on the image feature map, the camera parameters, and the image depth values to obtain a top-view feature map includes:
inputting the image feature map, the camera parameters, and the image depth values into the top-view feature extraction sub-network to obtain the top-view feature map.
Optionally, the object feature recognition network includes a task-stage feature extraction sub-network, where the task-stage feature extraction sub-network includes a first encoder-decoder module and a second encoder-decoder module, and performing feature fusion processing on the top-view feature map to obtain the fused target detection features and the occupancy grid detection features includes:
inputting the top-view feature map into the first encoder-decoder module for encoding and decoding operations to obtain the occupancy grid detection features; and
inputting the top-view feature map into the second encoder-decoder module, and performing a fusion decoding operation on the features decoded from the top-view feature map and the occupancy grid detection features in the second encoder-decoder module to obtain the fused target detection features.
Optionally, the object feature recognition network includes a target detector, and performing object detection according to the target detection features to obtain the three-dimensional boundary of the object and the type of the object includes:
inputting the target detection features into the target detector to obtain the three-dimensional boundary of the object and the type of the object.
Optionally, the object feature recognition network includes an occupancy grid detector, and performing occupancy grid detection according to the occupancy grid detection features to obtain the semantic information of the object includes:
inputting the occupancy grid detection features into the occupancy grid detector to obtain the semantic information of the object.
The embodiments of the present disclosure also provide an image processing apparatus, including:
an acquisition module, configured to acquire a three-dimensional road environment image and camera parameters corresponding to the three-dimensional road environment image; and
a feature recognition module, configured to input the three-dimensional road environment image and the camera parameters into an object feature recognition network to obtain the three-dimensional boundary of an object contained in the three-dimensional road environment image, the type of the object, and the semantic information of the object, wherein the object feature recognition network fuses the occupancy grid features extracted from the three-dimensional road environment image with the target detection features.
Optionally, after the three-dimensional road environment image and the camera parameters are input into an object feature recognition network, the feature recognition module is further configured to:
generating a navigation image according to the three-dimensional boundary of the object contained in the three-dimensional road environment image, the type of the object and the semantic information of the object;
and performing a navigation operation according to the navigation image.
Optionally, when inputting the three-dimensional road environment image and the camera parameters into an object feature recognition network to obtain the three-dimensional boundary of an object contained in the three-dimensional road environment image, the type of the object, and the semantic information of the object, the feature recognition module is configured to:
performing image feature extraction on the three-dimensional road environment image in the object feature recognition network to obtain an image feature map;
performing top-view feature extraction on the image feature map, the camera parameters, and the image depth values to obtain a top-view feature map;
performing feature fusion processing on the top-view feature map to obtain fused target detection features and occupancy grid detection features;
performing object detection according to the target detection features to obtain the three-dimensional boundary of the object and the type of the object; and
performing occupancy grid detection according to the occupancy grid detection features to obtain the semantic information of the object.
Optionally, the object feature recognition network includes an image feature extraction sub-network, and when performing image feature extraction on the three-dimensional road environment image to obtain an image feature map, the feature recognition module is configured to:
inputting the three-dimensional road environment image into the image feature extraction sub-network to obtain the image feature map.
Optionally, the object feature recognition network includes a top-view feature extraction sub-network, and when performing top-view feature extraction on the image feature map, the camera parameters, and the image depth values to obtain a top-view feature map, the feature recognition module is configured to:
inputting the image feature map, the camera parameters, and the image depth values into the top-view feature extraction sub-network to obtain the top-view feature map.
Optionally, the object feature recognition network includes a task-stage feature extraction sub-network, where the task-stage feature extraction sub-network includes a first encoder-decoder module and a second encoder-decoder module, and when performing feature fusion processing on the top-view feature map to obtain the fused target detection features and the occupancy grid detection features, the feature recognition module is configured to:
inputting the top-view feature map into the first encoder-decoder module for encoding and decoding operations to obtain the occupancy grid detection features; and
inputting the top-view feature map into the second encoder-decoder module, and performing a fusion decoding operation on the features decoded from the top-view feature map and the occupancy grid detection features in the second encoder-decoder module to obtain the fused target detection features.
Optionally, the object feature recognition network includes a target detector, and when performing object detection according to the target detection features to obtain the three-dimensional boundary of the object and the type of the object, the feature recognition module is configured to:
inputting the target detection features into the target detector to obtain the three-dimensional boundary of the object and the type of the object.
Optionally, the object feature recognition network includes an occupancy grid detector, and when performing occupancy grid detection according to the occupancy grid detection features to obtain the semantic information of the object, the feature recognition module is configured to:
inputting the occupancy grid detection features into the occupancy grid detector to obtain the semantic information of the object.
The disclosed embodiments also provide a vehicle including:
a processor;
a memory for storing the processor-executable instructions;
wherein the processor is configured to execute the instructions to implement the steps in the method described above.
The embodiment of the disclosure also provides an electronic device, including:
a processor;
a memory for storing the processor-executable instructions;
wherein the processor is configured to execute the instructions to implement the steps in the method described above.
The disclosed embodiments also provide a computer readable storage medium having a computer program stored thereon, characterized in that the computer program, when executed by a processor, implements the above-mentioned method.
The disclosed embodiments also provide a computer program product comprising a computer program/instruction which, when executed by a processor, implements the above-described method.
The technical solutions provided by the embodiments of the present disclosure at least bring the following beneficial effects:
in some embodiments of the present disclosure, a vehicle acquires a three-dimensional road environment image and camera parameters corresponding to the three-dimensional road environment image; the vehicle inputs the three-dimensional road environment image and the camera parameters into an object feature recognition network to obtain the three-dimensional boundary of each object contained in the three-dimensional road environment image, the type of the object, and the semantic information of the object. The object feature recognition network fuses the occupancy grid features extracted from the three-dimensional road environment image with the target detection features, which improves the accuracy of target detection, enriches the semantic information of the objects, and improves the recognition accuracy of the road environment image.
It is to be understood that both the foregoing general description and the following detailed description are exemplary and explanatory only and are not restrictive of the disclosure.
Drawings
The accompanying drawings, which are incorporated in and constitute a part of this specification, illustrate embodiments consistent with the disclosure and together with the description, serve to explain the principles of the disclosure and do not constitute an undue limitation on the disclosure.
FIG. 1 is a flowchart of an image processing method according to an exemplary embodiment of the present disclosure;
FIG. 2 is a network block diagram of an object feature recognition network according to an exemplary embodiment of the present disclosure;
FIG. 3 is a flowchart of another image processing method according to an exemplary embodiment of the present disclosure;
FIG. 4 is a schematic structural diagram of an image processing apparatus according to an exemplary embodiment of the present disclosure;
FIG. 5 is a schematic structural diagram of an electronic device according to an exemplary embodiment of the present disclosure.
Detailed Description
In order to enable those skilled in the art to better understand the technical solutions of the present disclosure, the technical solutions of the embodiments of the present disclosure will be clearly and completely described below with reference to the accompanying drawings.
It should be noted that the terms "first," "second," and the like in the description and claims of the present disclosure and in the foregoing figures are used for distinguishing between similar objects and not necessarily for describing a particular sequential or chronological order. It is to be understood that the data so used may be interchanged where appropriate such that the embodiments of the disclosure described herein may be capable of operation in sequences other than those illustrated or described herein. The implementations described in the following exemplary examples are not representative of all implementations consistent with the present disclosure. Rather, they are merely examples of apparatus and methods consistent with some aspects of the present disclosure as detailed in the accompanying claims.
It should be noted that the user information involved in the present disclosure includes, but is not limited to, user equipment information and user personal information; the collection, storage, use, processing, transmission, provision, and disclosure of the user information in the present disclosure all comply with the relevant laws and regulations and do not violate public order and good morals.
At present, road environment images are recognized with object detection methods that can only detect specific objects around the vehicle, such as cars, bicycles, and pedestrians. In a real autonomous driving scene, however, other physical structures in the environment are also important for perceiving the three-dimensional scene.
For example, roads, sidewalks, and vegetation play an important role in the overall understanding of a scene. Existing object detection methods do not capture such additional information, which may limit the performance of the autonomous driving system.
In view of the above technical problems, in some embodiments of the present disclosure, a vehicle acquires a three-dimensional road environment image and camera parameters corresponding to the three-dimensional road environment image; the vehicle inputs the three-dimensional road environment image and the camera parameters into an object feature recognition network to obtain the three-dimensional boundary of each object contained in the three-dimensional road environment image, the type of the object, and the semantic information of the object. The object feature recognition network fuses the occupancy grid features extracted from the three-dimensional road environment image with the target detection features, which improves the accuracy of target detection, enriches the semantic information of the objects, and improves the recognition accuracy of the road environment image.
The following describes in detail the technical solutions provided by the embodiments of the present disclosure with reference to the accompanying drawings.
FIG. 1 is a flowchart of an image processing method according to an exemplary embodiment of the present disclosure. As shown in FIG. 1, the method includes:
S101: acquiring a three-dimensional road environment image and camera parameters corresponding to the three-dimensional road environment image;
S102: inputting the three-dimensional road environment image and the camera parameters into an object feature recognition network to obtain the three-dimensional boundary of an object contained in the three-dimensional road environment image, the type of the object, and the semantic information of the object, wherein the object feature recognition network fuses the occupancy grid features extracted from the three-dimensional road environment image with the target detection features.
In this embodiment, the method is executed by an autonomous vehicle. The embodiments of the present disclosure do not limit the type of autonomous vehicle, which may be classified according to the level of autonomy, the usage scenario, the technical implementation, and the vehicle type. First, based on the level of autonomy: a fully autonomous vehicle can perform all driving tasks without human intervention; a partially autonomous vehicle requires a human driver to take over control when necessary; a driver-assistance vehicle only assists with part of the driving tasks, and the vehicle runs normally under the operation of a human driver. Second, based on the usage scenario: autonomous vehicles for public roads travel on public roads and require high safety and reliability; autonomous vehicles for special scenarios are designed for specific scenarios or tasks, such as agricultural operations or transportation and logistics. Third, based on the technical implementation: perception-guided vehicles acquire information about the surrounding environment with sensors and drive autonomously by selecting an optimal path; operation-guided vehicles control autonomous driving by tracking a navigation map and planning a path. Fourth, based on the vehicle type: autonomous passenger cars can be used in private, business, taxi, and other passenger scenarios; autonomous buses are mainly applied in public transportation to reduce the cost of human drivers; for autonomous trucks, automated transportation is becoming an important trend in the logistics industry.
It should be noted that three-dimensional object detection focuses on identifying the boundaries and types of objects, while three-dimensional semantic occupancy grid prediction enables a deeper understanding of the three-dimensional road environment by attaching semantic information to all objects. Combining three-dimensional object detection with three-dimensional semantic occupancy grid prediction yields finer geometric information and richer semantic information, and greatly improves the recognition accuracy of the road environment image.
In this embodiment, the vehicle acquires a three-dimensional road environment image and camera parameters corresponding to the three-dimensional road environment image; the vehicle inputs the three-dimensional road environment image and the camera parameters into an object feature recognition network to obtain the three-dimensional boundary of each object contained in the three-dimensional road environment image, the type of the object, and the semantic information of the object. The object feature recognition network fuses the occupancy grid features extracted from the three-dimensional road environment image with the target detection features, which improves the accuracy of target detection, enriches the semantic information of the objects, and improves the recognition accuracy of the road environment image.
The three-dimensional road environment image is an image that represents objects such as roads and buildings in a real scene in three dimensions. Typically, such images are generated by computer processing from data acquired by sensors such as lidar, cameras, and GPS. They can be used for environment perception in autonomous vehicles, virtual reality and game production, and city planning and design. In this disclosure, the three-dimensional road environment image is collected by a camera.
The camera parameters include camera intrinsic parameters and camera extrinsic parameters. The intrinsic parameters include the focal length, the principal point coordinates, and the distortion parameters, and serve to convert coordinates from the camera coordinate system into the pixel coordinate system. The extrinsic parameters include the translation and rotation matrices that convert coordinates from the vehicle coordinate system to the camera coordinate system.
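For illustration only, this transformation chain can be sketched with a minimal pinhole projection in Python. All numeric values (rotation, translation, focal length, principal point) are illustrative assumptions, lens distortion is ignored, and the sketch does not represent the specific camera model used in the patented method.

```python
import numpy as np

# Minimal pinhole-projection sketch (illustrative values only, no lens distortion).
# Extrinsics: rotation R and translation t map vehicle coordinates to camera coordinates
# (vehicle x-forward / y-left / z-up -> camera x-right / y-down / z-forward).
R = np.array([[0.0, -1.0,  0.0],
              [0.0,  0.0, -1.0],
              [1.0,  0.0,  0.0]])
t = np.array([0.0, 1.5, 0.0])            # assumed camera mounting offset, metres
# Intrinsics K: focal lengths (fx, fy) and principal point (cx, cy) in pixels.
K = np.array([[1000.0,    0.0, 960.0],
              [   0.0, 1000.0, 540.0],
              [   0.0,    0.0,   1.0]])

p_vehicle = np.array([10.0, 0.0, 1.5])   # a point 10 m ahead of the vehicle, 1.5 m up
p_camera = R @ p_vehicle + t             # vehicle frame -> camera frame (extrinsics)
uvw = K @ p_camera                       # camera frame -> image plane (intrinsics)
u, v = uvw[0] / uvw[2], uvw[1] / uvw[2]  # perspective divide -> pixel coordinates
print(f"pixel: ({u:.1f}, {v:.1f}), depth: {p_camera[2]:.1f} m")
```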
In one application scenario, the vehicle acquires a three-dimensional road environment image of its surroundings through an image acquisition device installed on the vehicle. The three-dimensional road environment image and the camera parameters are input into the object feature recognition network to obtain the three-dimensional boundary of each object contained in the image, the type of the object, and the semantic information of the object; a navigation image is generated according to these outputs and displayed on the central control screen of the vehicle so that the user can view it and control the vehicle; and a navigation operation is performed according to the navigation image.
In some embodiments of the present disclosure, a three-dimensional road environment image is acquired. In one implementation, the three-dimensional road environment image around the vehicle is acquired by an image acquisition device mounted on the vehicle. The type of the image acquisition device is not limited and can be chosen according to the actual situation; it may be an FOV camera mounted on the side or top of the vehicle, or a 360-degree wide-angle camera mounted on the top of the vehicle. The number of FOV cameras mounted on the vehicle is not limited either, and may be, for example, 6, 8, or 12.
In some embodiments of the present disclosure, the image depth is obtained by simultaneously acquiring data on the three-dimensional road environment around the vehicle with a radar and the image acquisition device mounted on the vehicle, so as to obtain the image depth values corresponding to the three-dimensional road environment image.
In some embodiments of the present disclosure, the three-dimensional road environment image and the camera parameters are input into the object feature recognition network to obtain the three-dimensional boundary of each object contained in the three-dimensional road environment image, the type of the object, and the semantic information of the object. In one implementation, image feature extraction is performed on the three-dimensional road environment image in the object feature recognition network to obtain an image feature map; top-view feature extraction is performed on the image feature map, the camera parameters, and the image depth values to obtain a top-view feature map; feature fusion processing is performed on the top-view feature map to obtain fused target detection features and occupancy grid detection features; object detection is performed according to the target detection features to obtain the three-dimensional boundary of the object and the type of the object; and occupancy grid detection is performed according to the occupancy grid detection features to obtain the semantic information of the object. By fusing the occupancy grid features with the target detection features, the target detection branch feature map carries finer geometric information and richer semantic information, which greatly improves the results of the detection task.
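The data flow described above can be summarized with a simplified Python (PyTorch-style) sketch. The class and argument names are illustrative assumptions, the five sub-networks are treated as opaque modules passed in by the caller, and the sketch is not the patented implementation itself.

```python
import torch
from torch import nn

class ObjectFeatureRecognitionNet(nn.Module):
    """Sketch of the data flow described above; the five sub-networks are
    supplied as arbitrary nn.Modules (illustrative, not the patented design)."""

    def __init__(self, img_backbone, img2bev, occ_codec, det_codec, od_head, oc_head):
        super().__init__()
        self.img_backbone = img_backbone   # image feature extraction sub-network
        self.img2bev = img2bev             # top-view (BEV) feature extraction sub-network
        self.occ_codec = occ_codec         # first encoder-decoder module (occupancy branch)
        self.det_codec = det_codec         # second encoder-decoder module (detection branch)
        self.od_head = od_head             # target detector
        self.oc_head = oc_head             # occupancy grid detector

    def forward(self, images, cam_params, depth):
        img_feats = self.img_backbone(images)                   # image feature map
        bev_feats = self.img2bev(img_feats, cam_params, depth)  # top-view feature map
        occ_feats = self.occ_codec(bev_feats)                   # occupancy grid detection features
        det_feats = self.det_codec(bev_feats, occ_feats)        # fused target detection features
        boxes, classes = self.od_head(det_feats)                # 3D boundaries and object types
        semantics = self.oc_head(occ_feats)                     # per-cell semantic information
        return boxes, classes, semantics
```

The point of the sketch is the two-branch layout: the detection decoder receives the occupancy features in addition to the top-view features, which is where the fusion described above takes place.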
FIG. 2 is a network block diagram of an object feature recognition network according to an exemplary embodiment of the present disclosure. As shown in FIG. 2, the object feature recognition network includes an image feature extraction sub-network, a top-view feature extraction sub-network, a task-stage feature extraction sub-network, a target detector, and an occupancy grid detector. The task-stage feature extraction sub-network includes a first encoder-decoder module and a second encoder-decoder module.
As shown in FIG. 2, image feature extraction is performed on the three-dimensional road environment image to obtain an image feature map. In one implementation, the three-dimensional road environment image is input into the image feature extraction sub-network to obtain the image feature map. The embodiments of the present disclosure do not limit the type of the image feature extraction sub-network, which may be any feature extraction network, for example, a backbone network.
As shown in FIG. 2, top-view feature extraction is performed on the image feature map, the camera parameters, and the image depth values to obtain a top-view feature map. In one implementation, the image feature map, the camera parameters, and the image depth values are input into the top-view feature extraction sub-network to obtain the top-view feature map. The embodiments of the present disclosure do not limit the type of the top-view feature extraction sub-network, which may be, for example, an Image2BEV block network.
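One plausible way to realise such a top-view (bird's-eye-view) feature extraction step is to back-project each pixel of the image feature map into the vehicle frame using the per-pixel depth values and the camera parameters, and then pool the features into a BEV grid. The toy sketch below illustrates that idea under those assumptions; it is not the Image2BEV block itself, and all shapes, grid sizes, and ranges are made up.

```python
import torch

def image_to_bev(img_feats, depth, K_inv, cam_to_ego,
                 bev_size=(128, 128), bev_range=50.0):
    """Toy lift of image features into a top-view (BEV) grid using per-pixel
    depth, camera intrinsics (K_inv) and extrinsics (cam_to_ego, 4x4).
    img_feats: (C, H, W) feature map; depth: (H, W) metric depth values."""
    C, H, W = img_feats.shape
    # Pixel grid in homogeneous coordinates (u, v, 1).
    v, u = torch.meshgrid(torch.arange(H), torch.arange(W), indexing="ij")
    pix = torch.stack([u, v, torch.ones_like(u)], dim=0).float().reshape(3, -1)
    # Back-project to camera coordinates with the per-pixel depth values.
    cam_pts = (K_inv @ pix) * depth.reshape(1, -1)
    # Camera -> ego (vehicle) coordinates with the 4x4 extrinsic matrix.
    cam_h = torch.cat([cam_pts, torch.ones(1, cam_pts.shape[1])], dim=0)
    ego_pts = (cam_to_ego @ cam_h)[:3]
    # Scatter features into BEV cells indexed by (x, y) in the ego frame.
    nx, ny = bev_size
    ix = ((ego_pts[0] + bev_range) / (2 * bev_range) * nx).long().clamp(0, nx - 1)
    iy = ((ego_pts[1] + bev_range) / (2 * bev_range) * ny).long().clamp(0, ny - 1)
    cell = iy * nx + ix
    bev = torch.zeros(C, nx * ny)
    bev.index_add_(1, cell, img_feats.reshape(C, -1))    # sum features per cell
    count = torch.zeros(nx * ny).index_add_(0, cell, torch.ones_like(cell, dtype=torch.float))
    bev = bev / count.clamp(min=1.0)                     # average pooling per cell
    return bev.reshape(C, ny, nx)
```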
As shown in FIG. 2, the task-stage feature extraction sub-network performs feature fusion processing on the top-view feature map to obtain the fused target detection features and the occupancy grid detection features. In one implementation, the top-view feature map is input into the first encoder-decoder module for encoding and decoding operations to obtain the occupancy grid detection features; the top-view feature map is also input into the second encoder-decoder module, where the features decoded from the top-view feature map and the occupancy grid detection features are fused and decoded to obtain the fused target detection features. The embodiments of the present disclosure do not limit the type of the task-stage feature extraction sub-network, which may be, for example, a Task stage network.
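A minimal sketch of such a two-branch task stage is shown below, assuming simple convolutional encoder-decoder modules and fusion by channel concatenation inside the detection decoder. The module design, depths, and fusion operator are illustrative assumptions and are not specified by the patent.

```python
import torch
from torch import nn
from torch.nn import functional as F

class TaskStage(nn.Module):
    """Sketch of the task-stage sub-network: the first encoder-decoder produces
    occupancy grid detection features; the second decodes the BEV features and
    fuses the occupancy features into its decoder. Channel sizes are illustrative."""

    def __init__(self, c=64):
        super().__init__()
        self.occ_enc = nn.Conv2d(c, c, 3, stride=2, padding=1)
        self.occ_dec = nn.ConvTranspose2d(c, c, 4, stride=2, padding=1)
        self.det_enc = nn.Conv2d(c, c, 3, stride=2, padding=1)
        # The detection decoder consumes its own encoded features concatenated
        # with the occupancy branch features, i.e. the fusion step described above.
        self.det_dec = nn.ConvTranspose2d(2 * c, c, 4, stride=2, padding=1)

    def forward(self, bev_feats):
        occ_feats = self.occ_dec(torch.relu(self.occ_enc(bev_feats)))
        det_hidden = torch.relu(self.det_enc(bev_feats))
        # Fusion decoding: pool the occupancy features to the encoded resolution,
        # concatenate with the detection branch, then decode.
        occ_small = F.avg_pool2d(occ_feats, 2)
        fused = torch.cat([det_hidden, occ_small], dim=1)
        det_feats = self.det_dec(fused)
        return det_feats, occ_feats
```

Performing the fusion inside the detection decoder, rather than at the output stage, lets the detection branch exploit the occupancy information while its features are still being reconstructed, which is one reasonable reading of the fusion decoding operation described above.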
As shown in FIG. 2, the object feature recognition network includes a target detector, and object detection is performed according to the target detection features to obtain the three-dimensional boundary of the object and the type of the object. In one implementation, the target detection features are input into the target detector to obtain the three-dimensional boundary of the object and the type of the object. The embodiments of the present disclosure do not limit the type of the target detector, which may be, for example, an OD head network.
As shown in FIG. 2, occupancy grid detection is performed according to the occupancy grid detection features to obtain the semantic information of the object. In one implementation, the occupancy grid detection features are input into the occupancy grid detector to obtain the semantic information of the object. The embodiments of the present disclosure do not limit the type of the occupancy grid detector, which may be, for example, an OC head network.
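For illustration only, the two heads can be sketched as lightweight convolutional layers on top of the corresponding feature maps: a target detection head predicting per-cell class logits and 3D box parameters, and an occupancy head predicting per-cell semantic logits over a set of height bins. The output parameterisation, class counts, and height bins are assumptions, not the OD head / OC head networks themselves.

```python
from torch import nn

class DetectionHead(nn.Module):
    """Toy target detector head: per-BEV-cell class logits and 3D box parameters
    (e.g. x, y, z, w, l, h, yaw). The parameterisation is an assumption."""
    def __init__(self, c=64, num_classes=10, box_params=7):
        super().__init__()
        self.cls = nn.Conv2d(c, num_classes, 1)
        self.box = nn.Conv2d(c, box_params, 1)

    def forward(self, det_feats):
        return self.box(det_feats), self.cls(det_feats)

class OccupancyHead(nn.Module):
    """Toy occupancy grid detector head: per-cell semantic logits over
    num_semantics classes (road, sidewalk, vegetation, ...) for each of
    num_heights vertical bins; the voxelisation scheme is an assumption."""
    def __init__(self, c=64, num_semantics=16, num_heights=8):
        super().__init__()
        self.sem = nn.Conv2d(c, num_semantics * num_heights, 1)

    def forward(self, occ_feats):
        return self.sem(occ_feats)
```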
In some embodiments of the present disclosure, the labeling of the target detection task is consistent with other detection tasks, using 3D bounding boxes and categories as labels. The occupancy grid task provides two kinds of supervision: occupancy grids without semantic labels, i.e., coarse labels, and occupancy grids with semantic labels, i.e., fine labels. The coarse labels can be generated directly from the point cloud data obtained by a lidar sensor without manual annotation cost, while the fine labels require the category of each grid cell to be annotated manually; both kinds of labels can significantly improve target detection performance. The embodiments of the present disclosure fuse the features of the semantic occupancy grid prediction task with the features of the target detection task, which helps achieve a better understanding of the scene and more accurate object detection.
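The coarse (class-free) occupancy labels mentioned above could, for example, be produced by simply voxelising the lidar point cloud, as in the following sketch; the voxel size and spatial ranges are illustrative assumptions, not values taken from the patent.

```python
import numpy as np

def coarse_occupancy_labels(points, voxel=0.5, x_range=(-50, 50),
                            y_range=(-50, 50), z_range=(-3, 5)):
    """Toy generation of coarse (class-free) occupancy labels from a lidar
    point cloud of shape (N, 3): a voxel is marked occupied if at least one
    point falls inside it."""
    nx = int((x_range[1] - x_range[0]) / voxel)
    ny = int((y_range[1] - y_range[0]) / voxel)
    nz = int((z_range[1] - z_range[0]) / voxel)
    grid = np.zeros((nx, ny, nz), dtype=bool)
    ix = ((points[:, 0] - x_range[0]) / voxel).astype(int)
    iy = ((points[:, 1] - y_range[0]) / voxel).astype(int)
    iz = ((points[:, 2] - z_range[0]) / voxel).astype(int)
    valid = (ix >= 0) & (ix < nx) & (iy >= 0) & (iy < ny) & (iz >= 0) & (iz < nz)
    grid[ix[valid], iy[valid], iz[valid]] = True
    return grid
```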
In conjunction with the above descriptions of the embodiments, FIG. 3 is a schematic flowchart of another image processing method according to an exemplary embodiment of the present disclosure. As shown in FIG. 3, the method includes:
S301: acquiring a three-dimensional road environment image and camera parameters corresponding to the three-dimensional road environment image;
S302: inputting the three-dimensional road environment image and the camera parameters into an object feature recognition network to obtain the three-dimensional boundary of an object contained in the three-dimensional road environment image, the type of the object and the semantic information of the object;
S303: generating a navigation image according to the three-dimensional boundary of the object, the type of the object and the semantic information of the object contained in the three-dimensional road environment image;
S304: performing a navigation operation according to the navigation image.
In the embodiments of the present disclosure, the method is executed by an autonomous vehicle, and the embodiments of the present disclosure do not limit the type of the autonomous vehicle. For the type of vehicle, reference may be made to the description of the corresponding parts of the foregoing embodiments.
In this embodiment, for the implementation of each step of the above method, reference may be made to the description of the corresponding parts of the foregoing embodiments, which is not repeated here.
FIG. 4 is a schematic structural diagram of an image processing apparatus 40 according to an exemplary embodiment of the present disclosure. As shown in FIG. 4, the image processing apparatus 40 includes: an acquisition module 41 and a feature recognition module 42.
The acquisition module 41 is configured to acquire a three-dimensional road environment image and camera parameters corresponding to the three-dimensional road environment image.
The feature recognition module 42 is configured to input the three-dimensional road environment image and the camera parameters into an object feature recognition network to obtain the three-dimensional boundary of an object contained in the three-dimensional road environment image, the type of the object, and the semantic information of the object, wherein the object feature recognition network fuses the occupancy grid features extracted from the three-dimensional road environment image with the target detection features.
Optionally, after inputting the three-dimensional road environment image and the camera parameters into the object feature recognition network, the feature recognition module 42 may be further configured to:
generating a navigation image according to the three-dimensional boundary of the object, the type of the object and the semantic information of the object contained in the three-dimensional road environment image;
and performing a navigation operation according to the navigation image.
Optionally, when inputting the three-dimensional road environment image and the camera parameters into the object feature recognition network to obtain the three-dimensional boundary of the object contained in the three-dimensional road environment image, the type of the object, and the semantic information of the object, the feature recognition module 42 is configured to:
performing image feature extraction on the three-dimensional road environment image in the object feature recognition network to obtain an image feature map;
performing top-view feature extraction on the image feature map, the camera parameters, and the image depth values to obtain a top-view feature map;
performing feature fusion processing on the top-view feature map to obtain fused target detection features and occupancy grid detection features;
performing object detection according to the target detection features to obtain the three-dimensional boundary of the object and the type of the object; and
performing occupancy grid detection according to the occupancy grid detection features to obtain the semantic information of the object.
Optionally, the object feature recognition network includes an image feature extraction sub-network, and when performing image feature extraction on the three-dimensional road environment image to obtain an image feature map, the feature recognition module 42 is configured to:
inputting the three-dimensional road environment image into the image feature extraction sub-network to obtain the image feature map.
Optionally, the object feature recognition network includes a top-view feature extraction sub-network, and when performing top-view feature extraction on the image feature map, the camera parameters, and the image depth values to obtain a top-view feature map, the feature recognition module 42 is configured to:
inputting the image feature map, the camera parameters, and the image depth values into the top-view feature extraction sub-network to obtain the top-view feature map.
Optionally, the object feature recognition network includes a task-stage feature extraction sub-network, where the task-stage feature extraction sub-network includes a first encoder-decoder module and a second encoder-decoder module, and when performing feature fusion processing on the top-view feature map to obtain the fused target detection features and the occupancy grid detection features, the feature recognition module 42 is configured to:
inputting the top-view feature map into the first encoder-decoder module for encoding and decoding operations to obtain the occupancy grid detection features; and
inputting the top-view feature map into the second encoder-decoder module, and performing a fusion decoding operation on the features decoded from the top-view feature map and the occupancy grid detection features in the second encoder-decoder module to obtain the fused target detection features.
Optionally, the object feature recognition network includes a target detector, and when performing object detection according to the target detection features to obtain the three-dimensional boundary of the object and the type of the object, the feature recognition module 42 is configured to:
inputting the target detection features into the target detector to obtain the three-dimensional boundary of the object and the type of the object.
Optionally, the object feature recognition network includes an occupancy grid detector, and when performing occupancy grid detection according to the occupancy grid detection features to obtain the semantic information of the object, the feature recognition module 42 is configured to:
inputting the occupancy grid detection features into the occupancy grid detector to obtain the semantic information of the object.
The specific manner in which the various modules perform operations in the apparatus of the above embodiments has been described in detail in the embodiments of the method, and is not described again here.
FIG. 5 is a schematic structural diagram of an electronic device according to an exemplary embodiment of the present disclosure. As shown in FIG. 5, the electronic device includes: a memory 51 and a processor 52. In addition, the electronic device further includes a power supply component 53 and a communication component 54.
The memory 51 is used for storing computer programs and may be configured to store various other data to support operations on the electronic device. Examples of such data include instructions for any application or method operating on an electronic device.
The memory 51 may be implemented by any type of volatile or non-volatile memory device or combination thereof, such as Static Random Access Memory (SRAM), electrically erasable programmable read-only memory (EEPROM), erasable programmable read-only memory (EPROM), programmable read-only memory (PROM), read-only memory (ROM), magnetic memory, flash memory, magnetic or optical disk.
A communication component 54 for data transmission with other devices.
A processor 52, configured to execute the computer instructions stored in the memory 51, for: acquiring a three-dimensional road environment image and camera parameters corresponding to the three-dimensional road environment image; and inputting the three-dimensional road environment image and the camera parameters into an object feature recognition network to obtain the three-dimensional boundary of an object contained in the three-dimensional road environment image, the type of the object, and the semantic information of the object, wherein the object feature recognition network fuses the occupancy grid features extracted from the three-dimensional road environment image with the target detection features.
Optionally, after inputting the three-dimensional road environment image and the camera parameters into the object feature recognition network, the processor 52 may be further configured to:
generating a navigation image according to the three-dimensional boundary of the object, the type of the object and the semantic information of the object contained in the three-dimensional road environment image;
and performing a navigation operation according to the navigation image.
Optionally, when inputting the three-dimensional road environment image and the camera parameters into the object feature recognition network to obtain the three-dimensional boundary of the object contained in the three-dimensional road environment image, the type of the object, and the semantic information of the object, the processor 52 is configured to:
performing image feature extraction on the three-dimensional road environment image in the object feature recognition network to obtain an image feature map;
performing top-view feature extraction on the image feature map, the camera parameters, and the image depth values to obtain a top-view feature map;
performing feature fusion processing on the top-view feature map to obtain fused target detection features and occupancy grid detection features;
performing object detection according to the target detection features to obtain the three-dimensional boundary of the object and the type of the object; and
performing occupancy grid detection according to the occupancy grid detection features to obtain the semantic information of the object.
Optionally, the object feature recognition network includes an image feature extraction sub-network, and when performing image feature extraction on the three-dimensional road environment image to obtain an image feature map, the processor 52 is configured to:
inputting the three-dimensional road environment image into the image feature extraction sub-network to obtain the image feature map.
Optionally, the object feature recognition network includes a top-view feature extraction sub-network, and when performing top-view feature extraction on the image feature map, the camera parameters, and the image depth values to obtain a top-view feature map, the processor 52 is configured to:
inputting the image feature map, the camera parameters, and the image depth values into the top-view feature extraction sub-network to obtain the top-view feature map.
Optionally, the object feature recognition network includes a task-stage feature extraction sub-network, where the task-stage feature extraction sub-network includes a first encoder-decoder module and a second encoder-decoder module, and when performing feature fusion processing on the top-view feature map to obtain the fused target detection features and the occupancy grid detection features, the processor 52 is configured to:
inputting the top-view feature map into the first encoder-decoder module for encoding and decoding operations to obtain the occupancy grid detection features; and
inputting the top-view feature map into the second encoder-decoder module, and performing a fusion decoding operation on the features decoded from the top-view feature map and the occupancy grid detection features in the second encoder-decoder module to obtain the fused target detection features.
Optionally, the object feature recognition network includes a target detector, and when performing object detection according to the target detection features to obtain the three-dimensional boundary of the object and the type of the object, the processor 52 is configured to:
inputting the target detection features into the target detector to obtain the three-dimensional boundary of the object and the type of the object.
Optionally, the object feature recognition network includes an occupancy grid detector, and when performing occupancy grid detection according to the occupancy grid detection features to obtain the semantic information of the object, the processor 52 is configured to:
inputting the occupancy grid detection features into the occupancy grid detector to obtain the semantic information of the object.
Accordingly, the embodiments of the present disclosure also provide a computer-readable storage medium storing a computer program that, when executed by one or more processors, causes the one or more processors to perform the steps in the method embodiment of FIG. 1.
Accordingly, the embodiments of the present disclosure also provide a computer program product including a computer program/instructions that, when executed by a processor, perform the steps of the method embodiment of FIG. 1.
The communication component of FIG. 5 is configured to facilitate wired or wireless communication between the device in which the communication component is located and other devices. The device in which the communication component is located can access a wireless network based on a communication standard, such as WiFi or a mobile communication network of 2G, 3G, 4G/LTE, 5G, etc., or a combination thereof. In one exemplary embodiment, the communication component receives a broadcast signal or broadcast-related information from an external broadcast management system via a broadcast channel. In one exemplary embodiment, the communication component further includes a Near Field Communication (NFC) module to facilitate short-range communications. For example, the NFC module may be implemented based on Radio Frequency Identification (RFID) technology, Infrared Data Association (IrDA) technology, Ultra Wideband (UWB) technology, Bluetooth (BT) technology, and other technologies.
The power supply assembly shown in FIG. 5 provides power for the various components of the device in which the power supply assembly is located. The power supply assembly may include a power management system, one or more power sources, and other components associated with generating, managing, and distributing power for the device in which the power supply assembly is located.
The electronic device further comprises a display screen and an audio component.
The display screen includes a screen, which may include a Liquid Crystal Display (LCD) and a Touch Panel (TP). If the screen includes a touch panel, the screen may be implemented as a touch screen to receive input signals from a user. The touch panel includes one or more touch sensors to sense touches, swipes, and gestures on the touch panel. The touch sensor may sense not only the boundary of a touch or sliding action, but also the duration and pressure associated with the touch or sliding operation.
An audio component, which may be configured to output and/or input an audio signal. For example, the audio component includes a Microphone (MIC) configured to receive external audio signals when the device in which the audio component is located is in an operational mode, such as a call mode, a recording mode, and a speech recognition mode. The received audio signal may be further stored in a memory or transmitted via a communication component. In some embodiments, the audio assembly further comprises a speaker for outputting audio signals.
In the above apparatus, storage medium, and program product embodiments of the present disclosure, the vehicle acquires a three-dimensional road environment image and camera parameters corresponding to the three-dimensional road environment image; the vehicle inputs the three-dimensional road environment image and the camera parameters into an object feature recognition network to obtain the three-dimensional boundary of each object contained in the three-dimensional road environment image, the type of the object, and the semantic information of the object. The object feature recognition network fuses the occupancy grid features extracted from the three-dimensional road environment image with the target detection features, which improves the accuracy of target detection, enriches the semantic information of the objects, and improves the recognition accuracy of the road environment image.
It will be appreciated by those skilled in the art that embodiments of the present invention may be provided as a method, system, or computer program product. Accordingly, the present invention may take the form of an entirely hardware embodiment, an entirely software embodiment or an embodiment combining software and hardware aspects. Furthermore, the present invention may take the form of a computer program product embodied on one or more computer-usable storage media (including, but not limited to, disk storage, CD-ROM, optical storage, and the like) having computer-usable program code embodied therein.
The present invention is described with reference to flowchart illustrations and/or block diagrams of methods, apparatus (systems) and computer program products according to embodiments of the invention. It will be understood that each flow and/or block of the flowchart illustrations and/or block diagrams, and combinations of flows and/or blocks in the flowchart illustrations and/or block diagrams, can be implemented by computer program instructions. These computer program instructions may be provided to a processor of a general purpose computer, special purpose computer, embedded processor, or other programmable data processing apparatus to produce a machine, such that the instructions, which execute via the processor of the computer or other programmable data processing apparatus, create means for implementing the functions specified in the flowchart flow or flows and/or block diagram block or blocks.
These computer program instructions may also be stored in a computer-readable memory that can direct a computer or other programmable data processing apparatus to function in a particular manner, such that the instructions stored in the computer-readable memory produce an article of manufacture including instruction means which implement the function specified in the flowchart flow or flows and/or block diagram block or blocks.
These computer program instructions may also be loaded onto a computer or other programmable data processing apparatus to cause a series of operational steps to be performed on the computer or other programmable apparatus to produce a computer implemented process such that the instructions which execute on the computer or other programmable apparatus provide steps for implementing the functions specified in the flowchart flow or flows and/or block diagram block or blocks.
In one typical configuration, a computing device includes one or more processors (CPUs), input/output interfaces, network interfaces, and memory.
The memory may include volatile memory in a computer-readable medium, random Access Memory (RAM) and/or nonvolatile memory, such as Read Only Memory (ROM) or flash memory (flash RAM). Memory is an example of computer-readable media.
Computer-readable media include permanent and non-permanent, removable and non-removable media, and may implement information storage by any method or technology. The information may be computer-readable instructions, data structures, program modules, or other data. Examples of computer storage media include, but are not limited to, phase-change memory (PRAM), static random access memory (SRAM), dynamic random access memory (DRAM), other types of random access memory (RAM), read-only memory (ROM), electrically erasable programmable read-only memory (EEPROM), flash memory or other memory technology, compact disc read-only memory (CD-ROM), digital versatile discs (DVD) or other optical storage, magnetic cassettes, magnetic tape, magnetic disk storage or other magnetic storage devices, or any other non-transmission medium that can be used to store information accessible by a computing device. As defined herein, computer-readable media do not include transitory computer-readable media (transmission media), such as modulated data signals and carrier waves.
It should be noted that, in this document, relational terms such as "first" and "second" are used solely to distinguish one entity or action from another and do not necessarily require or imply any actual relationship or order between such entities or actions. Moreover, the terms "comprises", "comprising", or any other variation thereof, are intended to cover a non-exclusive inclusion, such that a process, method, article, or apparatus that comprises a list of elements includes not only those elements but may also include other elements not expressly listed or inherent to such process, method, article, or apparatus. Without further limitation, an element defined by the phrase "comprising a …" does not exclude the presence of other like elements in a process, method, article, or apparatus that comprises the element.
The above is merely a specific embodiment of the disclosure to enable one skilled in the art to understand or practice the disclosure. Various modifications to these embodiments will be readily apparent to those skilled in the art, and the generic principles defined herein may be applied to other embodiments without departing from the spirit or scope of the disclosure. Thus, the present disclosure is not intended to be limited to the embodiments shown herein but is to be accorded the widest scope consistent with the principles and novel features disclosed herein.

Claims (10)

1. An image processing method, applied to a vehicle, comprising:
acquiring a three-dimensional road environment image and camera parameters corresponding to the three-dimensional road environment image;
inputting the three-dimensional road environment image and the camera parameters into an object feature recognition network to obtain a three-dimensional boundary of an object contained in the three-dimensional road environment image, the type of the object and semantic information of the object, wherein the object feature recognition network performs fusion processing on the occupied grid features extracted from the three-dimensional road environment image and the target detection features;
wherein inputting the three-dimensional road environment image and the camera parameters into the object feature recognition network to obtain the three-dimensional boundary of the object contained in the three-dimensional road environment image, the type of the object and the semantic information of the object comprises:
carrying out picture feature extraction on the three-dimensional road environment image in the object feature recognition network to obtain a picture feature map;
performing top view feature extraction on the picture feature map, the camera parameters and the image depth values to obtain a top view feature map;
performing feature fusion processing on the top view feature map to obtain fused target detection features and occupied grid detection features;
performing object detection according to the target detection features to obtain the three-dimensional boundary of the object and the type of the object; and
performing occupied grid detection according to the occupied grid detection features to obtain the semantic information of the object;
wherein the object feature recognition network comprises a task stage feature extraction sub-network, the task stage feature extraction sub-network comprises a first encoding and decoding module and a second encoding and decoding module, and performing feature fusion processing on the top view feature map to obtain the fused target detection features and the occupied grid detection features comprises:
inputting the top view feature map into the first encoding and decoding module to perform encoding and decoding operations to obtain the occupied grid detection features; and
inputting the top view feature map into the second encoding and decoding module, and performing, in the second encoding and decoding module, a fusion decoding operation on the features decoded from the top view feature map and the occupied grid detection features to obtain the fused target detection features.
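The following is a purely illustrative sketch of one way the first and second encoding and decoding modules recited above could be arranged: the first module encodes and decodes the top view feature map to produce the occupied grid detection features, and the second module's decoded top view features are then fused with those features to produce the target detection features. The class names, channel sizes and layer choices are assumptions and are not taken from the claims.

import torch
import torch.nn as nn


class Codec(nn.Module):
    """Minimal encoder-decoder over a top-view (bird's-eye-view) feature map."""

    def __init__(self, ch=64):
        super().__init__()
        self.enc = nn.Sequential(nn.Conv2d(ch, ch * 2, 3, stride=2, padding=1), nn.ReLU())
        self.dec = nn.Sequential(nn.ConvTranspose2d(ch * 2, ch, 4, stride=2, padding=1), nn.ReLU())

    def forward(self, x):
        return self.dec(self.enc(x))


class TaskStageFusionSketch(nn.Module):
    """First codec -> occupied grid detection features; the second codec's
    decoded features are fused with them to give the target detection features."""

    def __init__(self, ch=64):
        super().__init__()
        self.first_codec = Codec(ch)          # occupancy branch
        self.second_codec = Codec(ch)         # detection branch
        self.fusion = nn.Conv2d(ch * 2, ch, 1)

    def forward(self, bev):
        occ_feat = self.first_codec(bev)                                # occupied grid detection features
        dec_feat = self.second_codec(bev)                               # features decoded from the top view map
        det_feat = self.fusion(torch.cat([dec_feat, occ_feat], dim=1))  # fused target detection features
        return det_feat, occ_feat


fusion = TaskStageFusionSketch()
bev = torch.randn(1, 64, 32, 32)          # top view feature map (assumed size)
det_feat, occ_feat = fusion(bev)          # both (1, 64, 32, 32)

In this reading the occupancy branch runs first so that its decoded features can condition the decoding of the detection branch, which matches the fusion order recited above; other arrangements would also be consistent with the claim language.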
2. The method of claim 1, wherein after inputting the three-dimensional road environment image and the camera parameters into an object feature recognition network to obtain a three-dimensional boundary of an object contained in the three-dimensional road environment image, a type of the object, and semantic information of the object, the method further comprises:
generating a navigation image according to the three-dimensional boundary of the object contained in the three-dimensional road environment image, the type of the object and the semantic information of the object;
and performing navigation operation according to the navigation image.
3. The method according to claim 1, wherein the object feature recognition network includes a picture feature extraction sub-network, and the performing picture feature extraction on the three-dimensional road environment image to obtain a picture feature map includes:
and inputting the three-dimensional road environment image into the picture feature extraction sub-network to obtain the picture feature map.
4. The method according to claim 1, wherein the object feature recognition network includes a top view feature extraction sub-network, and the performing top view feature extraction on the picture feature map, the camera parameters, and the image depth values to obtain a top view feature map includes:
and inputting the picture feature map, the camera parameters and the image depth values into the top view feature extraction sub-network to obtain the top view feature map.
5. The method of claim 1, wherein the object feature recognition network comprises a target detector, and performing object detection according to the target detection features to obtain the three-dimensional boundary of the object and the type of the object comprises:
inputting the target detection features into the target detector to obtain the three-dimensional boundary of the object and the type of the object.
6. The method of claim 1, wherein the object feature recognition network comprises an occupied grid detector, and performing occupied grid detection according to the occupied grid detection features to obtain the semantic information of the object comprises:
inputting the occupied grid detection features into the occupied grid detector to obtain the semantic information of the object.
7. An image processing apparatus, comprising:
the acquisition module is used for acquiring a three-dimensional road environment image and camera parameters corresponding to the three-dimensional road environment image;
the feature recognition module is used for inputting the three-dimensional road environment image and the camera parameters into an object feature recognition network to obtain the three-dimensional boundary of an object contained in the three-dimensional road environment image, the type of the object and the semantic information of the object, wherein the object feature recognition network performs fusion processing on the occupied grid features extracted from the three-dimensional road environment image and the target detection features;
wherein the feature recognition module is used for inputting the three-dimensional road environment image and the camera parameters into the object feature recognition network to obtain the three-dimensional boundary of the object contained in the three-dimensional road environment image, the type of the object and the semantic information of the object by:
carrying out picture feature extraction on the three-dimensional road environment image in the object feature recognition network to obtain a picture feature map;
performing top view feature extraction on the picture feature map, the camera parameters and the image depth values to obtain a top view feature map;
performing feature fusion processing on the top view feature map to obtain fused target detection features and occupied grid detection features;
performing object detection according to the target detection features to obtain the three-dimensional boundary of the object and the type of the object; and
performing occupied grid detection according to the occupied grid detection features to obtain the semantic information of the object;
wherein the object feature recognition network comprises a task stage feature extraction sub-network, the task stage feature extraction sub-network comprises a first encoding and decoding module and a second encoding and decoding module, and performing feature fusion processing on the top view feature map to obtain the fused target detection features and the occupied grid detection features comprises:
inputting the top view feature map into the first encoding and decoding module to perform encoding and decoding operations to obtain the occupied grid detection features; and
inputting the top view feature map into the second encoding and decoding module, and performing, in the second encoding and decoding module, a fusion decoding operation on the features decoded from the top view feature map and the occupied grid detection features to obtain the fused target detection features.
8. A vehicle, characterized by comprising:
a processor;
a memory for storing the processor-executable instructions;
wherein the processor is configured to execute the instructions to implement the steps in the method of any of claims 1-6.
9. An electronic device, comprising:
a processor;
a memory for storing the processor-executable instructions;
wherein the processor is configured to execute the instructions to implement the steps in the method of any of claims 1-6.
10. A computer readable storage medium, on which a computer program is stored, characterized in that the computer program, when being executed by a processor, implements the method of any of claims 1-6.
CN202310546183.6A 2023-05-15 2023-05-15 Image processing method, device, apparatus, medium, and program product Active CN116704472B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202310546183.6A CN116704472B (en) 2023-05-15 2023-05-15 Image processing method, device, apparatus, medium, and program product

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202310546183.6A CN116704472B (en) 2023-05-15 2023-05-15 Image processing method, device, apparatus, medium, and program product

Publications (2)

Publication Number Publication Date
CN116704472A CN116704472A (en) 2023-09-05
CN116704472B true CN116704472B (en) 2024-04-02

Family

ID=87826760

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202310546183.6A Active CN116704472B (en) 2023-05-15 2023-05-15 Image processing method, device, apparatus, medium, and program product

Country Status (1)

Country Link
CN (1) CN116704472B (en)

Patent Citations (14)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN108369743A (en) * 2015-08-28 2018-08-03 帝国科技及医学学院 Use multi-directional camera map structuring space
CN110119148A (en) * 2019-05-14 2019-08-13 深圳大学 A kind of six-degree-of-freedom posture estimation method, device and computer readable storage medium
CN113728625A (en) * 2019-05-14 2021-11-30 英特尔公司 Immersive video coding techniques for metadata (3DoF +/MIV) and video-point cloud coding (V-PCC) for three-degree-of-freedom additive/immersive video
CN113554698A (en) * 2020-04-23 2021-10-26 杭州海康威视数字技术股份有限公司 Vehicle pose information generation method and device, electronic equipment and storage medium
CN112509115A (en) * 2020-11-26 2021-03-16 中国人民解放军战略支援部队信息工程大学 Three-dimensional time-varying unconstrained reconstruction method and system for dynamic scene of sequence image
CN114998856A (en) * 2022-06-17 2022-09-02 苏州浪潮智能科技有限公司 3D target detection method, device, equipment and medium of multi-camera image
CN115205365A (en) * 2022-07-14 2022-10-18 小米汽车科技有限公司 Vehicle distance detection method and device, vehicle, readable storage medium and chip
CN114973181A (en) * 2022-07-29 2022-08-30 武汉极目智能技术有限公司 Multi-view BEV (beam steering angle) visual angle environment sensing method, device, equipment and storage medium
CN115641581A (en) * 2022-09-30 2023-01-24 北京迈格威科技有限公司 Target detection method, electronic device, and storage medium
CN115620277A (en) * 2022-10-13 2023-01-17 北京京深深向科技有限公司 Monocular 3D environment sensing method and device, electronic equipment and storage medium
CN115830555A (en) * 2022-11-24 2023-03-21 国家石油天然气管网集团有限公司 Target identification method based on radar point cloud, storage medium and equipment
CN116051751A (en) * 2023-01-30 2023-05-02 清华大学 Three-dimensional semantic occupation prediction method and system based on three-plane representation
CN115965970A (en) * 2023-02-02 2023-04-14 清华大学 Method and system for realizing bird's-eye view semantic segmentation based on implicit set prediction
CN116110025A (en) * 2023-02-02 2023-05-12 清华大学 Method and system for constructing environment semantic occupation and velocity field by grid detection tracking framework

Non-Patent Citations (3)

* Cited by examiner, † Cited by third party
Title
M2BEV: Multi Camera Joint 3D Detection and Segmentation with Unified Birds Eye View Representation; Enze Xie et al.; arXiv; 1-13 *
Research on key technologies of three-dimensional reconstruction based on visible-light multi-view images; Wu Yuxuan; China Masters' Theses Full-text Database (Information Science and Technology); Vol. 2023, No. 01; I138-2473 *
Two-stage three-dimensional point cloud vehicle detection based on pyramid feature fusion; Zhang Mingfang et al.; Journal of Transportation Systems Engineering and Information Technology; Vol. 22, No. 5; 107-116 *

Also Published As

Publication number Publication date
CN116704472A (en) 2023-09-05

Similar Documents

Publication Publication Date Title
US20200192351A1 (en) Vehicle path updates via remote vehicle control
US20180319338A1 (en) Vehicle periphery monitoring device and vehicle periphery monitoring system
US20210365696A1 (en) Vehicle Intelligent Driving Control Method and Device and Storage Medium
CN111144211B (en) Point cloud display method and device
CN111311902B (en) Data processing method, device, equipment and machine readable medium
CN111157014A (en) Road condition display method and device, vehicle-mounted terminal and storage medium
CN107406072B (en) Vehicle assistance system
EP3038021A1 (en) Risk determination method, risk determination device, risk determination system, and risk output device
CN111516690B (en) Control method and device of intelligent automobile and storage medium
CN114764911A (en) Obstacle information detection method, obstacle information detection device, electronic device, and storage medium
KR102433345B1 (en) Method and apparatus for providing information using vehicle's camera
CN116704472B (en) Image processing method, device, apparatus, medium, and program product
KR102174863B1 (en) Autonomous Vehicle Exterior Display Interaction Apparatus and Method
CN115146013A (en) Vehicle parking memory map display method, device, equipment and storage medium
KR102370876B1 (en) Method and apparatus for providing driving information by using object recognition of cloud
US20210302991A1 (en) Method and system for generating an enhanced field of view for an autonomous ground vehicle
CN114511834A (en) Method and device for determining prompt information, electronic equipment and storage medium
WO2020258222A1 (en) Method and system for identifying object
KR20140106126A (en) Auto parking method based on around view image
CN110979319A (en) Driving assistance method, device and system
CN114061598A (en) Navigation method, device, system and storage medium
CN112526477A (en) Method and apparatus for processing information
CN115817161A (en) Speed limiting method, device, vehicle and medium
KR20220028941A (en) Method and apparatus for providing a parking location using vehicle's terminal
CN110926425A (en) Navigation logistics transportation system of 3D structured light camera and control method thereof

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant