CN117485345A - Method, system, equipment and storage medium for determining forward-looking perception of parking scene - Google Patents

Method, system, equipment and storage medium for determining forward-looking perception of parking scene

Info

Publication number
CN117485345A
Authority
CN
China
Prior art keywords
information, lane line, looking, obstacle, drivable area
Prior art date
Legal status
Pending
Application number
CN202311643027.8A
Other languages
Chinese (zh)
Inventor
张龙
崔旭冰
Current Assignee
Wuhan Kotei Informatics Co Ltd
Original Assignee
Wuhan Kotei Informatics Co Ltd
Priority date
Filing date
Publication date
Application filed by Wuhan Kotei Informatics Co Ltd
Priority to CN202311643027.8A
Publication of CN117485345A

Classifications

    • B: PERFORMING OPERATIONS; TRANSPORTING
    • B60: VEHICLES IN GENERAL
    • B60W: CONJOINT CONTROL OF VEHICLE SUB-UNITS OF DIFFERENT TYPE OR DIFFERENT FUNCTION; CONTROL SYSTEMS SPECIALLY ADAPTED FOR HYBRID VEHICLES; ROAD VEHICLE DRIVE CONTROL SYSTEMS FOR PURPOSES NOT RELATED TO THE CONTROL OF A PARTICULAR SUB-UNIT
    • B60W40/00: Estimation or calculation of non-directly measurable driving parameters for road vehicle drive control systems not related to the control of a particular sub unit, e.g. by using mathematical models
    • B60W40/02: Estimation or calculation of non-directly measurable driving parameters related to ambient conditions
    • B60W40/04: Traffic conditions
    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06V: IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V20/00: Scenes; Scene-specific elements
    • G06V20/50: Context or environment of the image
    • G06V20/56: Context or environment of the image exterior to a vehicle by using sensors mounted on the vehicle
    • G06V20/58: Recognition of moving objects or obstacles, e.g. vehicles or pedestrians; Recognition of traffic objects, e.g. traffic signs, traffic lights or roads
    • G06V20/588: Recognition of the road, e.g. of lane markings; Recognition of the vehicle driving pattern in relation to the road


Abstract

The invention discloses a method, a system, a device and a storage medium for determining forward-looking perception of a parking scene. The method comprises the following steps: acquiring a previous-frame surrounding image and a current surrounding image; inputting the surrounding images into a multi-task network model to obtain obstacle key point information, lane line information and drivable area information; determining a lane line bounding box and a drivable area bounding box from the lane line information and the drivable area information, respectively; establishing a target association relationship between the previous-frame surrounding image and the current surrounding image based on the lane line bounding box and the drivable area bounding box; and performing coordinate conversion on the obstacle key point information, the lane line bounding box and the drivable area bounding box according to the target association relationship to obtain the forward-looking perception information. By applying coordinate conversion to the obstacle key point information, the lane line bounding box and the drivable area bounding box on the basis of the target association relationship, the invention perceives the surrounding environment information of the vehicle quickly and accurately.

Description

Method, system, equipment and storage medium for determining forward-looking perception of parking scene
Technical Field
The present invention relates to the field of parking technologies, and in particular, to a method, a system, an apparatus, and a storage medium for determining a forward-looking perception of a parking scene.
Background
In automatic parking, the vehicle to be parked collects information about its surroundings with on-board sensors; the collected data are passed to a perception module, which analyzes them to obtain obstacle coordinate positions and drivable area information; a planning module then computes a parking trajectory from the obstacles and drivable area resolved by the perception module; and finally a control module steers the vehicle into the parking space.
Because a surround-view sensor only acquires obstacle information within about 5 m, surround view alone cannot support timely obstacle avoidance, and parking planning during automatic parking requires sensing obstacles at a greater range. How to quickly and accurately perceive the surrounding environment information of the vehicle therefore becomes a problem to be solved.
The foregoing is provided merely for the purpose of facilitating understanding of the technical solutions of the present invention and is not intended to represent an admission that the foregoing is prior art.
Disclosure of Invention
The main purpose of the present invention is to provide a method, a system, a device and a storage medium for determining forward-looking perception of a parking scene, aiming to solve the technical problem of how to quickly and accurately perceive the surrounding environment information of a vehicle.
In order to achieve the above object, the present invention provides a method for determining a forward-looking perception of a parking scene, the method comprising:
acquiring a previous-frame surrounding image and a current surrounding image through a front-view camera;
inputting the surrounding environment image into a multi-task network model to obtain obstacle key point information, lane line information and drivable area information;
respectively determining a lane line boundary frame and a drivable area boundary frame according to the lane line information and the drivable area information;
establishing a target association relationship between the previous frame surrounding image and the current surrounding image based on the lane line bounding box and the drivable region bounding box;
and carrying out coordinate conversion processing on the obstacle key point information, the lane line boundary box and the drivable area boundary box according to the target association relation to obtain forward-looking perception information of the parking scene.
Optionally, the step of inputting the surrounding environment image into a multi-task network model to obtain obstacle key point information, lane line information and drivable area information includes:
performing image format conversion on the surrounding environment image to obtain a preset format environment image;
scaling and edge-repairing processing are carried out on the environment image in the preset format, and a forward-looking environment image is obtained;
and inputting the forward-looking environment image into a multi-task network model to obtain obstacle key point information, lane line information and drivable area information.
Optionally, the step of inputting the forward-looking environment image into a multi-task network model to obtain obstacle key point information, lane line information and drivable area information includes:
inputting the forward-looking environment image into a parking target detection module in a multi-task network model to obtain a plurality of obstacle feature maps, wherein the parking target detection module comprises a spatial pyramid module and a feature pyramid network module;
fusing a plurality of obstacle feature maps to obtain obstacle key point information, wherein the obstacle key point information comprises 2.5D obstacle key points, obstacle categories, obstacle coordinates and obstacle orientation information;
and inputting the forward-looking environment image into a parking target segmentation module in the multi-task network model to obtain lane line information and drivable area information, wherein the lane line information comprises a lane line mask map and a lane line type, and the drivable area information comprises a drivable area mask map and a drivable area type.
Optionally, the step of establishing a target association relationship between the previous frame surrounding image and the current surrounding image based on the lane line bounding box and the drivable region bounding box includes:
respectively determining lane line position information and drivable region position information corresponding to the lane line boundary frame and the drivable region boundary frame;
respectively carrying out filtering treatment on the lane line boundary frame and the drivable region boundary frame through a Kalman filter to obtain a filtered lane line boundary frame and a filtered drivable region boundary frame;
and establishing a target association relation between the surrounding image of the previous frame and the current surrounding image through a tracking algorithm according to the lane line position information, the filtered lane line boundary box, the drivable region position information and the filtered drivable region boundary box.
Optionally, the step of performing coordinate conversion processing on the obstacle key point information, the lane line bounding box and the drivable area bounding box according to the target association relationship to obtain forward-looking perception information of the parking scene includes:
performing scene fusion on the obstacle key point information, the lane line boundary frame and the drivable area boundary frame according to the target association relationship to obtain scene fusion information;
establishing a coordinate mapping relation between pixel points in the current surrounding environment image and 3D space ground points according to camera calibration external parameters;
and carrying out coordinate conversion processing on the scene fusion information based on the coordinate mapping relation to obtain the forward-looking perception information of the parking scene.
In addition, in order to achieve the above object, the present invention also provides a front view perception determination system of a parking scene, the front view perception determination system of the parking scene includes:
the acquisition module is used for acquiring a previous-frame surrounding image and a current surrounding image through the front-view camera;
the output module is used for inputting the surrounding environment image into a multi-task network model to obtain obstacle key point information, lane line information and drivable area information;
the determining module is used for determining a lane line boundary frame and a drivable area boundary frame according to the lane line information and the drivable area information respectively;
the establishing module is used for establishing a target association relation between the surrounding image of the previous frame and the current surrounding image based on the lane line boundary box and the drivable area boundary box;
and the conversion module is used for carrying out coordinate conversion processing on the obstacle key point information, the lane line boundary box and the drivable area boundary box according to the target association relation to obtain the forward-looking perception information of the parking scene.
In addition, to achieve the above object, the present invention also proposes a forward-looking perception determination apparatus of a parking scene, the apparatus comprising: the system comprises a memory, a processor and a forward-looking perception determination program of a parking scene stored on the memory and capable of running on the processor, wherein the forward-looking perception determination program of the parking scene is configured to realize the steps of the forward-looking perception determination method of the parking scene.
In addition, in order to achieve the above object, the present invention also proposes a storage medium having stored thereon a forward-looking-sense determining program of a parking scene, which when executed by a processor, implements the steps of the forward-looking-sense determining method of a parking scene as described above.
According to the method, a previous-frame surrounding image and a current surrounding image are acquired through a front-view camera; the surrounding images are input into a multi-task network model to obtain obstacle key point information, lane line information and drivable area information; a lane line bounding box and a drivable area bounding box are determined from the lane line information and the drivable area information, respectively; a target association relationship between the previous-frame surrounding image and the current surrounding image is established based on the lane line bounding box and the drivable area bounding box; and finally, forward-looking perception information of the parking scene is obtained by coordinate conversion of the obstacle key point information, the lane line bounding box and the drivable area bounding box according to the target association relationship. Because the multi-task network model yields the obstacle key point information, the lane line information and the drivable area information in a single pass, and the obstacle key point information, the lane line bounding box and the drivable area bounding box are accurately corrected according to the target association relationship before coordinate conversion, the forward-looking perception information of a longer-range parking scene is obtained quickly and accurately.
Drawings
FIG. 1 is a schematic structural diagram of a forward-looking perception determination device of a parking scene in a hardware running environment according to an embodiment of the present invention;
FIG. 2 is a flow chart of a first embodiment of a method for determining a forward-looking perception of a parking scene according to the present invention;
FIG. 3 is a schematic diagram of a multi-task network according to a first embodiment of a method for determining a forward-looking perception of a parking scenario according to the present invention;
FIG. 4 is a process flow diagram of a first embodiment of a forward looking awareness determination method of a parking scene of the present invention;
FIG. 5 is a before-and-after effect comparison chart of a first embodiment of the forward-looking perception determination method of a parking scene according to the present invention;
fig. 6 is a block diagram illustrating a first embodiment of a forward-looking awareness determination system for a parking scene according to the present invention.
The achievement of the objects, functional features and advantages of the present invention will be further described with reference to the accompanying drawings, in conjunction with the embodiments.
Detailed Description
It should be understood that the specific embodiments described herein are for purposes of illustration only and are not intended to limit the scope of the invention.
Referring to fig. 1, fig. 1 is a schematic structural diagram of the forward-looking perception determination device of a parking scene in a hardware running environment according to an embodiment of the present invention.
As shown in fig. 1, the forward-looking perception determination device of the parking scene may include: a processor 1001, such as a central processing unit (Central Processing Unit, CPU), a communication bus 1002, a user interface 1003, a network interface 1004 and a memory 1005. The communication bus 1002 is used to enable connected communication between these components. The user interface 1003 may include a display (Display) and an input unit such as a keyboard (Keyboard); optionally, it may further include a standard wired interface and a wireless interface. The network interface 1004 may optionally include a standard wired interface and a wireless interface (e.g., a Wireless-Fidelity (Wi-Fi) interface). The memory 1005 may be a high-speed random access memory (Random Access Memory, RAM) or a stable nonvolatile memory (NVM), such as disk storage. The memory 1005 may optionally also be a storage device separate from the aforementioned processor 1001.
It will be appreciated by those skilled in the art that the structure shown in fig. 1 does not constitute a limitation of the forward-looking perception determination apparatus of the parking scene, and may include more or fewer components than shown, or may combine certain components, or may be a different arrangement of components.
As shown in fig. 1, the memory 1005, as a storage medium, may include an operating system, a network communication module, a user interface module and a forward-looking perception determination program of a parking scene.
In the forward-looking perception determination device of the parking scene shown in fig. 1, the network interface 1004 is mainly used for data communication with a network server, and the user interface 1003 is mainly used for data interaction with a user. The device invokes, through the processor 1001, the forward-looking perception determination program of the parking scene stored in the memory 1005, and executes the forward-looking perception determination method of the parking scene provided by the embodiments of the present invention.
The embodiment of the invention provides a forward-looking perception determination method of a parking scene, and referring to fig. 2, fig. 2 is a flow chart of a first embodiment of the forward-looking perception determination method of the parking scene.
In this embodiment, the method for determining the forward-looking perception of the parking scene includes the following steps:
step S10: and acquiring a surrounding image of a previous frame and a current surrounding image through a front-view camera.
It is to be understood that the execution subject of the present embodiment may be a front view sensing determination system of a parking scene with functions of data processing, network communication, program running, etc., or may be other computer devices with similar functions, etc., and the present embodiment is not limited thereto.
The front-view camera is mounted on the current vehicle, and the forward-looking surrounding image of the previous frame (i.e., the previous-frame surrounding image) and the forward-looking surrounding image of the current frame (i.e., the current surrounding image) are acquired through the front-view camera.
Step S20: and inputting the surrounding environment image into a multi-task network model to obtain obstacle key point information, lane line information and travelable area information.
It should also be understood that, before model training, the open-source labeling software labelme is used to directly label the obstacle key points, lane lines and drivable area on images captured by the front-view camera, yielding the category, coordinates, key point information and the like of the obstacles, lane lines and drivable area. The labeled data are then used to train the multi-task network model, which can be divided into an Encoder and three Decoders.
The Encoder comprises a backbone and a neck; the neck uses a spatial pyramid pooling (SPP) module and a feature pyramid network (FPN) module (i.e., the parking target detection module). The Decoders are three task heads: the target detection head uses a path aggregation network (PAN) structure, which fuses feature maps from multiple scales, while the drivable area segmentation head and the lane line detection head are semantic segmentation tasks (i.e., the parking target segmentation module) whose output feature maps are restored to (W, H, 2) through three upsampling steps before task-specific processing. All tasks share one backbone in the multi-task network model, so the related information shares network parameters. Referring to fig. 3 and 4, fig. 3 is a schematic diagram of the multi-task network structure according to the first embodiment of the forward-looking perception determination method of a parking scene of the present invention, and fig. 4 is a process flowchart of the first embodiment.
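To make the shared-encoder, three-decoder layout concrete, here is a minimal PyTorch-style sketch. It is not the patent's actual network: the layer counts, channel widths, activation choices and head internals are illustrative assumptions, and only the overall topology (one shared backbone feeding a detection head and two segmentation heads that upsample back to a 2-channel map) follows the description above.

```python
import torch.nn as nn

class MultiTaskParkingNet(nn.Module):
    """Sketch of the shared-encoder, three-decoder layout described above.
    Backbone depth, channel widths and head internals are assumptions."""

    def __init__(self, num_obstacle_classes: int = 4):
        super().__init__()
        # Shared Encoder (backbone + neck); the SPP/FPN neck is abstracted away
        self.backbone = nn.Sequential(
            nn.Conv2d(3, 32, 3, stride=2, padding=1), nn.SiLU(),
            nn.Conv2d(32, 64, 3, stride=2, padding=1), nn.SiLU(),
        )
        # Detection head: per-cell box (4) + objectness (1) + class scores
        self.det_head = nn.Conv2d(64, num_obstacle_classes + 5, 1)
        # Two segmentation heads restoring a 2-channel (bg/fg) full-size map
        self.lane_head = self._seg_head()
        self.area_head = self._seg_head()

    def _seg_head(self) -> nn.Module:
        return nn.Sequential(
            nn.Upsample(scale_factor=2), nn.Conv2d(64, 32, 3, padding=1), nn.SiLU(),
            nn.Upsample(scale_factor=2), nn.Conv2d(32, 2, 1),
        )

    def forward(self, x):
        feats = self.backbone(x)  # one shared feature map feeds all three heads
        return self.det_head(feats), self.lane_head(feats), self.area_head(feats)
```

Sharing the backbone this way is what lets the three tasks share network parameters, as noted above.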
Further, the surrounding environment image is input into the multi-task network model (i.e., the trained multi-task network model) to obtain the obstacle key point information, lane line information and drivable area information as follows: the surrounding environment image is format-converted to obtain a preset-format environment image; the preset-format environment image is scaled and edge-repaired to obtain a forward-looking environment image; and the forward-looking environment image is input into the multi-task network model to obtain the obstacle key point information, lane line information and drivable area information.
In a specific implementation, the front-view camera acquires 3840x2160 images (namely the previous-frame surrounding image and the current surrounding image). Each acquired image is converted from YUV to RGB format to obtain the preset-format environment image, which is then resized to the 640x640 network input size; the edges left missing by the aspect-preserving scaling are padded (edge-repaired) to obtain the forward-looking environment image.
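As a sketch of this preprocessing step, the snippet below converts a YUV frame to RGB with OpenCV and letterboxes it to the 640x640 network input. The NV12 pixel layout and the gray padding value are assumptions; the patent states only YUV-to-RGB conversion, scaling and edge-repair.

```python
import cv2
import numpy as np

def preprocess(frame_yuv: np.ndarray, size: int = 640) -> np.ndarray:
    """YUV -> RGB, aspect-preserving resize, then pad to a square model input.
    Assumes an NV12-packed frame; the camera's actual YUV layout may differ."""
    rgb = cv2.cvtColor(frame_yuv, cv2.COLOR_YUV2RGB_NV12)
    h, w = rgb.shape[:2]                     # e.g. 2160 x 3840
    scale = size / max(h, w)                 # keep the aspect ratio
    resized = cv2.resize(rgb, (int(w * scale), int(h * scale)))
    pad_h = size - resized.shape[0]          # rows missing after scaling
    pad_w = size - resized.shape[1]          # columns missing after scaling
    # "Edge-repair": pad the missing borders with a neutral gray
    return cv2.copyMakeBorder(resized, pad_h // 2, pad_h - pad_h // 2,
                              pad_w // 2, pad_w - pad_w // 2,
                              cv2.BORDER_CONSTANT, value=(114, 114, 114))
```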
Further, the forward-looking environment image is input into the multi-task network model to obtain the obstacle key point information, lane line information and drivable area information as follows: the forward-looking environment image is input into the parking target detection module in the multi-task network model to obtain a plurality of obstacle feature maps, wherein the parking target detection module comprises a spatial pyramid module and a feature pyramid network module; the plurality of obstacle feature maps are fused to obtain the obstacle key point information, which comprises 2.5D obstacle key points, obstacle category, obstacle coordinates and obstacle orientation information; and the forward-looking environment image is input into the parking target segmentation module in the multi-task network model to obtain the lane line information and drivable area information, wherein the lane line information comprises a lane line mask map and lane line type, and the drivable area information comprises a drivable area mask map and drivable area type.
It should be further noted that the network model (i.e., the multi-task network model) predicts 2.5D key points of vehicles (i.e., vehicles other than the host vehicle), pedestrians, traffic cones and the like, together with the corresponding obstacle category, coordinates and orientation information; lane lines are segmented into a mask map plus category information, and the drivable area is likewise segmented into a mask map plus category information.
It should be understood that the obstacle target detection output by the network model includes 2.5D vehicle key point information, pedestrian detection boxes and other obstacle information; non-maximum suppression is performed on the output detection boxes, and the related results (coordinates, category and confidence) are output. The target segmentation output of the network model comprises lane line segmentation information and drivable area segmentation information; the segmentation output is binarized, and the original-image mask and pixel category information are produced. Detecting vehicle key points and outputting 2.5D vehicle information captures the vehicle's orientation well and yields more accurate vehicle position and world coordinate information; outputting lane line segmentation information supports better vehicle trajectory localization and parking control; and outputting drivable area segmentation information gives a better picture of where the vehicle can drive.
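The non-maximum suppression step is standard; the sketch below is a plain greedy IoU variant over [x1, y1, x2, y2] boxes sorted by confidence. The IoU threshold is an assumption, as the patent does not specify one.

```python
import numpy as np

def nms(boxes: np.ndarray, scores: np.ndarray, iou_thr: float = 0.5) -> list:
    """Greedy IoU-based non-maximum suppression; returns indices to keep."""
    order = scores.argsort()[::-1]           # highest confidence first
    keep = []
    while order.size > 0:
        i = order[0]
        keep.append(int(i))
        rest = order[1:]
        xx1 = np.maximum(boxes[i, 0], boxes[rest, 0])
        yy1 = np.maximum(boxes[i, 1], boxes[rest, 1])
        xx2 = np.minimum(boxes[i, 2], boxes[rest, 2])
        yy2 = np.minimum(boxes[i, 3], boxes[rest, 3])
        inter = np.maximum(0, xx2 - xx1) * np.maximum(0, yy2 - yy1)
        area_i = (boxes[i, 2] - boxes[i, 0]) * (boxes[i, 3] - boxes[i, 1])
        areas = ((boxes[rest, 2] - boxes[rest, 0]) *
                 (boxes[rest, 3] - boxes[rest, 1]))
        iou = inter / (area_i + areas - inter)
        order = rest[iou <= iou_thr]         # drop boxes overlapping the kept one
    return keep
```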
Step S30: and respectively determining a lane line boundary box and a drivable area boundary box according to the lane line information and the drivable area information.
In a specific implementation, a target detector is used to determine the position and bounding box of each target object in each frame, where the target objects are vehicles, lane lines and drivable areas.
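For the lane line and drivable area, the bounding boxes can be derived from the segmentation masks produced in step S20; the sketch below does this with OpenCV contours. The minimum-area filter for suppressing noise blobs is an illustrative assumption.

```python
import cv2
import numpy as np

def mask_to_boxes(mask: np.ndarray, min_area: int = 50) -> list:
    """Derive bounding boxes from a binary mask (lane line or drivable area).
    Returns (x, y, w, h) tuples, one per sufficiently large connected blob."""
    binary = (mask > 0).astype(np.uint8)
    contours, _ = cv2.findContours(binary, cv2.RETR_EXTERNAL,
                                   cv2.CHAIN_APPROX_SIMPLE)
    return [cv2.boundingRect(c) for c in contours
            if cv2.contourArea(c) >= min_area]
```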
Step S40: and establishing a target association relation between the previous frame surrounding image and the current surrounding image based on the lane line boundary box and the drivable area boundary box.
Further, the target association relationship between the previous-frame surrounding image and the current surrounding image is established based on the lane line bounding box and the drivable area bounding box as follows: the lane line position information and drivable area position information corresponding to the lane line bounding box and the drivable area bounding box are determined, respectively; the lane line bounding box and the drivable area bounding box are each filtered by a Kalman filter to obtain a filtered lane line bounding box and a filtered drivable area bounding box; and a tracking algorithm establishes the target association relationship between the previous-frame surrounding image and the current surrounding image from the lane line position information, the filtered lane line bounding box, the drivable area position information and the filtered drivable area bounding box.
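The patent names only "a Kalman filter" and "a tracking algorithm", so the sketch below shows one common realization of the association step: match the (Kalman-filtered) previous-frame boxes to current-frame boxes by Hungarian assignment on pairwise IoU, as SORT-style trackers do. Both this matching scheme and the IoU threshold are assumptions.

```python
import numpy as np
from scipy.optimize import linear_sum_assignment

def iou_matrix(prev_boxes: np.ndarray, curr_boxes: np.ndarray) -> np.ndarray:
    """Pairwise IoU between previous- and current-frame [x1,y1,x2,y2] boxes."""
    x1 = np.maximum(prev_boxes[:, None, 0], curr_boxes[None, :, 0])
    y1 = np.maximum(prev_boxes[:, None, 1], curr_boxes[None, :, 1])
    x2 = np.minimum(prev_boxes[:, None, 2], curr_boxes[None, :, 2])
    y2 = np.minimum(prev_boxes[:, None, 3], curr_boxes[None, :, 3])
    inter = np.clip(x2 - x1, 0, None) * np.clip(y2 - y1, 0, None)
    area_p = ((prev_boxes[:, 2] - prev_boxes[:, 0]) *
              (prev_boxes[:, 3] - prev_boxes[:, 1]))
    area_c = ((curr_boxes[:, 2] - curr_boxes[:, 0]) *
              (curr_boxes[:, 3] - curr_boxes[:, 1]))
    return inter / (area_p[:, None] + area_c[None, :] - inter)

def associate(prev_boxes, curr_boxes, min_iou: float = 0.3) -> list:
    """Hungarian matching on IoU; returns (prev_index, curr_index) pairs."""
    iou = iou_matrix(np.asarray(prev_boxes, float), np.asarray(curr_boxes, float))
    rows, cols = linear_sum_assignment(-iou)   # maximize total IoU
    return [(r, c) for r, c in zip(rows, cols) if iou[r, c] >= min_iou]
```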
It should also be appreciated that, since the target association relationship reflects the position correspondence relationship between the obstacle keypoint information, the lane line information, and the drivable region information between the previous frame surrounding image and the current surrounding image, the obstacle keypoint information, the lane line bounding box, and the drivable region bounding box can be more accurately adjusted according to the target association relationship.
Step S50: and carrying out coordinate conversion processing on the obstacle key point information, the lane line boundary box and the drivable area boundary box according to the target association relation to obtain forward-looking perception information of the parking scene.
Further, the coordinate conversion of the obstacle key point information, the lane line bounding box and the drivable area bounding box according to the target association relationship proceeds as follows: scene fusion is performed on the obstacle key point information, the lane line bounding box and the drivable area bounding box according to the target association relationship to obtain scene fusion information; a coordinate mapping between pixel points in the current surrounding image and ground points in 3D space is established from the camera's calibrated extrinsic parameters; and coordinate conversion is applied to the scene fusion information based on this mapping to obtain the forward-looking perception information of the parking scene, which comprises the coordinate-converted obstacle information, lane line information, drivable area information and the like.
In a specific implementation, the camera is calibrated to obtain the R and T extrinsic parameters, and a mapping between pixel points in the image and ground points in 3D space is established. Combining the 2.5D vehicle key point information, lane line bounding boxes and drivable area bounding boxes output by the network, their counterparts in 3D space are obtained by fusion; the vehicle key point information, lane line bounding boxes and drivable area bounding boxes are corrected to yield the 2.5D vehicle detection box, lane line detection box and drivable area detection box in space, which are then projected into the image coordinate system to obtain the 2.5D vehicle-related coordinate information, i.e., the forward-looking perception information of the parking scene. Referring to fig. 5, fig. 5 is a before-and-after effect comparison chart of the first embodiment of the forward-looking perception determination method of a parking scene of the present invention.
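A minimal sketch of the pixel-to-ground mapping follows, assuming a pinhole camera with intrinsics K and world-to-camera extrinsics R, t, and a flat ground plane z = 0. The patent states only that the mapping is built from calibrated extrinsics, so this exact formulation is an assumption.

```python
import numpy as np

def pixel_to_ground(u: float, v: float, K: np.ndarray,
                    R: np.ndarray, t: np.ndarray) -> np.ndarray:
    """Back-project pixel (u, v) onto the ground plane z = 0.
    K: 3x3 intrinsics; R, t: world-to-camera rotation and translation."""
    ray_cam = np.linalg.inv(K) @ np.array([u, v, 1.0])  # viewing ray, camera frame
    ray_world = R.T @ ray_cam                            # same ray in world frame
    cam_center = -R.T @ t                                # camera position in world
    s = -cam_center[2] / ray_world[2]                    # scale where ray hits z = 0
    return cam_center + s * ray_world                    # ground point [x, y, 0]
```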
In this embodiment, a previous-frame surrounding image and a current surrounding image are acquired through a front-view camera; the surrounding images are input into a multi-task network model to obtain obstacle key point information, lane line information and drivable area information; a lane line bounding box and a drivable area bounding box are determined from the lane line information and the drivable area information, respectively; a target association relationship between the previous-frame surrounding image and the current surrounding image is established based on the lane line bounding box and the drivable area bounding box; and finally, forward-looking perception information of the parking scene is obtained by coordinate conversion of the obstacle key point information, the lane line bounding box and the drivable area bounding box according to the target association relationship. Because the multi-task network model yields the obstacle key point information, the lane line information and the drivable area information in a single pass, and the obstacle key point information, the lane line bounding box and the drivable area bounding box are accurately corrected according to the target association relationship before coordinate conversion, the forward-looking perception information of a longer-range parking scene is obtained quickly and accurately.
Referring to fig. 6, fig. 6 is a block diagram illustrating a first embodiment of a front view sensing determination system of a parking scene according to the present invention.
As shown in fig. 6, a front view sensing determining system for a parking scene according to an embodiment of the present invention includes:
an acquisition module 6001 for acquiring a previous frame surrounding image and a current surrounding image by a front view camera.
The front-view camera is mounted on the current vehicle, and the forward-looking surrounding image of the previous frame (i.e., the previous-frame surrounding image) and the forward-looking surrounding image of the current frame (i.e., the current surrounding image) are acquired through the front-view camera.
The output module 6002 is configured to input the surrounding environment image into the multi-task network model to obtain obstacle key point information, lane line information and drivable area information.
It should also be understood that, before model training, the open-source labeling software labelme is used to directly label the obstacle key points, lane lines and drivable area on images captured by the front-view camera, yielding the category, coordinates, key point information and the like of the obstacles, lane lines and drivable area. The labeled data are then used to train the multi-task network model, which can be divided into an Encoder and three Decoders.
The Encoder comprises a backbone and a neck; the neck uses a spatial pyramid pooling (SPP) module and a feature pyramid network (FPN) module (i.e., the parking target detection module). The Decoders are three task heads: the target detection head uses a path aggregation network (PAN) structure, which fuses feature maps from multiple scales, while the drivable area segmentation head and the lane line detection head are semantic segmentation tasks (i.e., the parking target segmentation module) whose output feature maps are restored to (W, H, 2) through three upsampling steps before task-specific processing. All tasks share one backbone in the multi-task network model, so the related information shares network parameters. Referring to fig. 3 and 4, fig. 3 is a schematic diagram of the multi-task network structure according to the first embodiment of the forward-looking perception determination method of a parking scene of the present invention, and fig. 4 is a process flowchart of the first embodiment.
Further, the surrounding environment image is input into the multi-task network model (i.e., the trained multi-task network model) to obtain the obstacle key point information, lane line information and drivable area information as follows: the surrounding environment image is format-converted to obtain a preset-format environment image; the preset-format environment image is scaled and edge-repaired to obtain a forward-looking environment image; and the forward-looking environment image is input into the multi-task network model to obtain the obstacle key point information, lane line information and drivable area information.
In a specific implementation, the front-view camera acquires 3840x2160 images (namely the previous-frame surrounding image and the current surrounding image). Each acquired image is converted from YUV to RGB format to obtain the preset-format environment image, which is then resized to the 640x640 network input size; the edges left missing by the aspect-preserving scaling are padded (edge-repaired) to obtain the forward-looking environment image.
Further, the forward-looking environment image is input into the multi-task network model to obtain the obstacle key point information, lane line information and drivable area information as follows: the forward-looking environment image is input into the parking target detection module in the multi-task network model to obtain a plurality of obstacle feature maps, wherein the parking target detection module comprises a spatial pyramid module and a feature pyramid network module; the plurality of obstacle feature maps are fused to obtain the obstacle key point information, which comprises 2.5D obstacle key points, obstacle category, obstacle coordinates and obstacle orientation information; and the forward-looking environment image is input into the parking target segmentation module in the multi-task network model to obtain the lane line information and drivable area information, wherein the lane line information comprises a lane line mask map and lane line type, and the drivable area information comprises a drivable area mask map and drivable area type.
It should be further noted that the network model (i.e., the multi-task network model) predicts 2.5D key points of vehicles (i.e., vehicles other than the host vehicle), pedestrians, traffic cones and the like, together with the corresponding obstacle category, coordinates and orientation information; lane lines are segmented into a mask map plus category information, and the drivable area is likewise segmented into a mask map plus category information.
It should be understood that the obstacle target detection output by the network model includes 2.5D vehicle key point information, pedestrian detection boxes and other obstacle information; non-maximum suppression is performed on the output detection boxes, and the related results (coordinates, category and confidence) are output. The target segmentation output of the network model comprises lane line segmentation information and drivable area segmentation information; the segmentation output is binarized, and the original-image mask and pixel category information are produced. Detecting vehicle key points and outputting 2.5D vehicle information captures the vehicle's orientation well and yields more accurate vehicle position and world coordinate information; outputting lane line segmentation information supports better vehicle trajectory localization and parking control; and outputting drivable area segmentation information gives a better picture of where the vehicle can drive.
A determining module 6003 configured to determine a lane line bounding box and a drivable region bounding box according to the lane line information and the drivable region information, respectively.
In a specific implementation, a target detector is used to determine the position and bounding box of each target object in each frame, where the target objects are vehicles, lane lines and drivable areas.
A building module 6004, configured to build a target association relationship between the previous frame surrounding image and the current surrounding image based on the lane line bounding box and the drivable region bounding box.
Further, the target association relationship between the previous-frame surrounding image and the current surrounding image is established based on the lane line bounding box and the drivable area bounding box as follows: the lane line position information and drivable area position information corresponding to the lane line bounding box and the drivable area bounding box are determined, respectively; the lane line bounding box and the drivable area bounding box are each filtered by a Kalman filter to obtain a filtered lane line bounding box and a filtered drivable area bounding box; and a tracking algorithm establishes the target association relationship between the previous-frame surrounding image and the current surrounding image from the lane line position information, the filtered lane line bounding box, the drivable area position information and the filtered drivable area bounding box.
It should also be appreciated that, since the target association relationship reflects the position correspondence relationship between the obstacle keypoint information, the lane line information, and the drivable region information between the previous frame surrounding image and the current surrounding image, the obstacle keypoint information, the lane line bounding box, and the drivable region bounding box can be more accurately adjusted according to the target association relationship.
The conversion module 6005 is configured to perform coordinate conversion processing on the obstacle key point information, the lane line bounding box, and the drivable area bounding box according to the target association relationship, so as to obtain forward-looking perception information of a parking scene.
Further, the coordinate conversion of the obstacle key point information, the lane line bounding box and the drivable area bounding box according to the target association relationship proceeds as follows: scene fusion is performed on the obstacle key point information, the lane line bounding box and the drivable area bounding box according to the target association relationship to obtain scene fusion information; a coordinate mapping between pixel points in the current surrounding image and ground points in 3D space is established from the camera's calibrated extrinsic parameters; and coordinate conversion is applied to the scene fusion information based on this mapping to obtain the forward-looking perception information of the parking scene.
In a specific implementation, the camera is calibrated to obtain the R and T extrinsic parameters, and a mapping between pixel points in the image and ground points in 3D space is established. Combining the 2.5D vehicle key point information, lane line bounding boxes and drivable area bounding boxes output by the network, their counterparts in 3D space are obtained by fusion; the vehicle key point information, lane line bounding boxes and drivable area bounding boxes are corrected to yield the 2.5D vehicle detection box, lane line detection box and drivable area detection box in space, which are then projected into the image coordinate system to obtain the 2.5D vehicle-related coordinate information, i.e., the forward-looking perception information of the parking scene. Referring to fig. 5, fig. 5 is a before-and-after effect comparison chart of the first embodiment of the forward-looking perception determination method of a parking scene of the present invention.
In this embodiment, a previous-frame surrounding image and a current surrounding image are acquired through a front-view camera; the surrounding images are input into a multi-task network model to obtain obstacle key point information, lane line information and drivable area information; a lane line bounding box and a drivable area bounding box are determined from the lane line information and the drivable area information, respectively; a target association relationship between the previous-frame surrounding image and the current surrounding image is established based on the lane line bounding box and the drivable area bounding box; and finally, forward-looking perception information of the parking scene is obtained by coordinate conversion of the obstacle key point information, the lane line bounding box and the drivable area bounding box according to the target association relationship. Because the multi-task network model yields the obstacle key point information, the lane line information and the drivable area information in a single pass, and the obstacle key point information, the lane line bounding box and the drivable area bounding box are accurately corrected according to the target association relationship before coordinate conversion, the forward-looking perception information of a longer-range parking scene is obtained quickly and accurately.
Other embodiments or specific implementations of the forward-looking perception determination system for a parking scene according to the present invention may refer to the above method embodiments, and will not be described herein.
It should be noted that, in this document, the terms "comprises", "comprising" or any other variation thereof are intended to cover a non-exclusive inclusion, such that a process, method, article or system that comprises a list of elements includes not only those elements but may also include other elements not expressly listed or inherent to such process, method, article or system. Without further limitation, an element qualified by the phrase "comprising a(n) ..." does not exclude the presence of other identical elements in the process, method, article or system that comprises the element.
The foregoing embodiment numbers of the present invention are merely for the purpose of description, and do not represent the advantages or disadvantages of the embodiments.
From the above description of the embodiments, it will be clear to those skilled in the art that the above-described embodiment method may be implemented by means of software plus a necessary general hardware platform, but of course may also be implemented by means of hardware, but in many cases the former is a preferred embodiment. Based on such understanding, the technical solution of the present invention may be embodied essentially or in a part contributing to the prior art in the form of a software product stored in a storage medium (e.g. read-only memory/random-access memory, magnetic disk, optical disk), comprising instructions for causing a terminal device (which may be a mobile phone, a computer, a server, or a network device, etc.) to perform the method according to the embodiments of the present invention.
The foregoing description is only of the preferred embodiments of the present invention, and is not intended to limit the scope of the invention, but rather is intended to cover any equivalents of the structures or equivalent processes disclosed herein or in the alternative, which may be employed directly or indirectly in other related arts.

Claims (8)

1. A method for determining a forward-looking perception of a parking scene, the method comprising the steps of:
acquiring a previous-frame surrounding image and a current surrounding image through a front-view camera;
inputting the surrounding environment image into a multi-task network model to obtain obstacle key point information, lane line information and drivable area information;
respectively determining a lane line boundary frame and a drivable area boundary frame according to the lane line information and the drivable area information;
establishing a target association relationship between the previous frame surrounding image and the current surrounding image based on the lane line bounding box and the drivable region bounding box;
and carrying out coordinate conversion processing on the obstacle key point information, the lane line boundary box and the drivable area boundary box according to the target association relation to obtain forward-looking perception information of the parking scene.
2. The method of claim 1, wherein the step of inputting the surrounding environment image into a multi-task network model to obtain obstacle key point information, lane line information and drivable area information comprises:
performing image format conversion on the surrounding environment image to obtain a preset format environment image;
scaling and edge-repairing processing are carried out on the environment image in the preset format, and a forward-looking environment image is obtained;
and inputting the forward-looking environment image into a multi-task network model to obtain obstacle key point information, lane line information and drivable area information.
3. The method of claim 2, wherein the step of inputting the forward-looking environment image into a multi-task network model to obtain obstacle key point information, lane line information and drivable area information comprises:
inputting the forward-looking environment image into a parking target detection module in a multi-task network model to obtain a plurality of obstacle feature maps, wherein the parking target detection module comprises a spatial pyramid module and a feature pyramid network module;
fusing a plurality of obstacle feature maps to obtain obstacle key point information, wherein the obstacle key point information comprises 2.5D obstacle key points, obstacle categories, obstacle coordinates and obstacle orientation information;
and inputting the forward-looking environment image into a parking target segmentation module in the multi-task network model to obtain lane line information and drivable area information, wherein the lane line information comprises a lane line mask map and a lane line type, and the drivable area information comprises a drivable area mask map and a drivable area type.
4. The method according to any one of claims 1-3, wherein the step of establishing a target association relationship between the previous-frame surrounding image and the current surrounding image based on the lane line bounding box and the drivable area bounding box comprises:
respectively determining lane line position information and drivable region position information corresponding to the lane line boundary frame and the drivable region boundary frame;
respectively carrying out filtering treatment on the lane line boundary frame and the drivable region boundary frame through a Kalman filter to obtain a filtered lane line boundary frame and a filtered drivable region boundary frame;
and establishing a target association relation between the surrounding image of the previous frame and the current surrounding image through a tracking algorithm according to the lane line position information, the filtered lane line boundary box, the drivable region position information and the filtered drivable region boundary box.
5. The method of claim 4, wherein the step of performing coordinate conversion processing on the obstacle keypoint information, the lane line bounding box, and the drivable region bounding box according to the target association relationship to obtain forward-looking perception information of a parking scene comprises:
performing scene fusion on the obstacle key point information, the lane line boundary frame and the drivable area boundary frame according to the target association relationship to obtain scene fusion information;
establishing a coordinate mapping relation between pixel points in the current surrounding environment image and 3D space ground points according to camera calibration external parameters;
and carrying out coordinate conversion processing on the scene fusion information based on the coordinate mapping relation to obtain the forward-looking perception information of the parking scene.
6. A forward-looking-sensation determination system for a parking scene, the forward-looking-sensation determination system for a parking scene comprising:
the acquisition module is used for acquiring a previous-frame surrounding image and a current surrounding image through the front-view camera;
the output module is used for inputting the surrounding environment image into a multi-task network model to obtain obstacle key point information, lane line information and drivable area information;
the determining module is used for determining a lane line boundary frame and a drivable area boundary frame according to the lane line information and the drivable area information respectively;
the establishing module is used for establishing a target association relation between the surrounding image of the previous frame and the current surrounding image based on the lane line boundary box and the drivable area boundary box;
and the conversion module is used for carrying out coordinate conversion processing on the obstacle key point information, the lane line boundary box and the drivable area boundary box according to the target association relation to obtain the forward-looking perception information of the parking scene.
7. A forward-looking perception determination apparatus of a parking scene, the apparatus comprising: a memory, a processor and a forward-looking-sense determining program of a parking scene stored on the memory and executable on the processor, the forward-looking-sense determining program of a parking scene being configured to implement the steps of the forward-looking-sense determining method of a parking scene as claimed in any one of claims 1 to 5.
8. A storage medium having stored thereon a forward-looking-sense determining program of a parking scene, which when executed by a processor, implements the steps of the forward-looking-sense determining method of a parking scene as claimed in any one of claims 1 to 5.
CN202311643027.8A 2023-11-30 2023-11-30 Method, system, equipment and storage medium for determining forward-looking perception of parking scene Pending CN117485345A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202311643027.8A CN117485345A (en) 2023-11-30 2023-11-30 Method, system, equipment and storage medium for determining forward-looking perception of parking scene


Publications (1)

Publication Number Publication Date
CN117485345A: 2024-02-02

Family

ID=89674638

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202311643027.8A Pending CN117485345A (en) 2023-11-30 2023-11-30 Method, system, equipment and storage medium for determining forward-looking perception of parking scene

Country Status (1)

Country Link
CN (1) CN117485345A (en)


Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination