CN117636307A - Object detection method and device based on semantic information and automatic driving vehicle - Google Patents

Info

Publication number: CN117636307A
Application number: CN202311754638.XA
Authority: CN (China)
Original language: Chinese (zh)
Legal status: Pending (an assumption, not a legal conclusion)
Prior art keywords: point cloud, information, special-shaped object, detection frame
Inventors: 丁光耀, 董嘉蓉, 乔延琦
Applicant and current assignee: Beijing Baidu Netcom Science and Technology Co Ltd
Abstract

The disclosure provides a target detection method and device based on semantic information, and an autonomous driving vehicle. It relates to the technical field of artificial intelligence, and in particular to the fields of autonomous driving, autonomous parking, the Internet of Things, intelligent transportation, deep learning, and the like. A specific implementation scheme is as follows: target detection is performed on a special-shaped (irregularly shaped) object to obtain an original detection frame; the original detection frame is expanded according to semantic information of the special-shaped object to obtain an expansion area, where the special-shaped object includes a main body portion located inside the original detection frame and other portions located outside it, and the semantic information characterizes the relative positional relationship between the other portions and the main body portion; a concave hull ("concave packet") detection frame of the other portions is determined according to the position information of the point cloud area, characterized by the point cloud information, within the expansion area; and the original detection frame and the concave hull detection frame are fused to obtain a target detection frame of the special-shaped object. The method facilitates target detection of special-shaped objects, optimizes boundary detection for them, and improves the accuracy of target detection.

Description

Object detection method and device based on semantic information and automatic driving vehicle
Technical Field
The disclosure relates to the technical field of artificial intelligence, in particular to the fields of autonomous driving, autonomous parking, the Internet of Things, intelligent transportation, deep learning, and the like, and specifically to a target detection method and device based on semantic information and an autonomous driving vehicle.
Background
An autonomous vehicle, also known as a robotic vehicle, self-driving vehicle, or driverless vehicle, is a vehicle that is capable of sensing its environment and traveling with little or no human input. Autonomous vehicles incorporate a variety of sensors to sense the surrounding environment, such as radar, lidar, sonar, global positioning systems, odometry, and inertial measurement units. Advanced control systems interpret the sensory information to identify appropriate navigation paths, obstacles, and relevant signage.
Object detection, also called object extraction, is image segmentation based on the geometric and statistical characteristics of the object; it combines segmentation and recognition into one step, and its accuracy and real-time performance are important capabilities of the whole system.
Disclosure of Invention
The disclosure provides a target detection method and device based on semantic information, and an autonomous driving vehicle.
According to an aspect of the present disclosure, there is provided a target detection method including: performing target detection on a special-shaped object to obtain an original detection frame; expanding the original detection frame according to semantic information of the special-shaped object to obtain an expansion area, wherein the special-shaped object includes a main body portion located inside the original detection frame and other portions located outside the original detection frame, and the semantic information characterizes the relative positional relationship between the other portions and the main body portion; determining a concave hull detection frame of the other portions according to position information of the point cloud area characterized by point cloud information within the expansion area, wherein the point cloud information characterizes the point cloud of the other portions; and fusing the original detection frame and the concave hull detection frame to obtain a target detection frame of the special-shaped object.
According to another aspect of the present disclosure, there is provided an object detection apparatus including: a target detection module configured to perform target detection on a special-shaped object to obtain an original detection frame; an expansion module configured to expand the original detection frame according to semantic information of the special-shaped object to obtain an expansion area, wherein the special-shaped object includes a main body portion located inside the original detection frame and other portions located outside the original detection frame, and the semantic information characterizes the relative positional relationship between the other portions and the main body portion; a concave hull determining module configured to determine a concave hull detection frame of the other portions according to position information of the point cloud area characterized by point cloud information within the expansion area, wherein the point cloud information characterizes the point cloud of the other portions; and a fusion module configured to fuse the original detection frame and the concave hull detection frame to obtain a target detection frame of the special-shaped object.
According to another aspect of the present disclosure, there is provided an electronic device including: at least one processor; and a memory communicatively coupled to the at least one processor; wherein the memory stores instructions executable by the at least one processor to enable the at least one processor to perform the object detection methods of the present disclosure.
According to another aspect of the present disclosure, there is provided a non-transitory computer-readable storage medium storing computer instructions for causing a computer to perform the object detection method of the present disclosure.
According to another aspect of the present disclosure, there is provided a computer program product including a computer program stored on at least one of a readable storage medium and an electronic device; when executed by a processor, the computer program implements the object detection method of the present disclosure.
According to another aspect of the present disclosure, an autonomous vehicle is provided that includes the electronic device of the present disclosure.
It should be understood that the description in this section is not intended to identify key or critical features of the embodiments of the disclosure, nor is it intended to be used to limit the scope of the disclosure. Other features of the present disclosure will become apparent from the following specification.
Drawings
The drawings are for a better understanding of the present solution and are not to be construed as limiting the present disclosure. Wherein:
FIG. 1 schematically illustrates an exemplary system architecture to which target detection methods and apparatus may be applied, according to embodiments of the present disclosure;
FIG. 2 schematically illustrates a flow chart of a target detection method according to an embodiment of the disclosure;
FIG. 3 schematically illustrates a schematic diagram of determining an expansion area according to an embodiment of the present disclosure;
FIG. 4 schematically illustrates a schematic diagram of an expanded effect of an original detection frame for a left side door opening vehicle in accordance with an embodiment of the present disclosure;
FIG. 5 schematically illustrates a block diagram of an object detection apparatus according to an embodiment of the disclosure; and
FIG. 6 illustrates a schematic block diagram of an example electronic device that may be used to implement embodiments of the present disclosure.
Detailed Description
Exemplary embodiments of the present disclosure are described below in conjunction with the accompanying drawings, which include various details of the embodiments of the present disclosure to facilitate understanding, and should be considered as merely exemplary. Accordingly, one of ordinary skill in the art will recognize that various changes and modifications of the embodiments described herein can be made without departing from the scope and spirit of the present disclosure. Also, descriptions of well-known functions and constructions are omitted in the following description for clarity and conciseness.
In the technical solution of the present disclosure, the collection, storage, use, processing, transmission, provision, disclosure, and application of users' personal information all comply with relevant laws and regulations, necessary security measures are taken, and public order and good morals are not violated.
In the technical scheme of the disclosure, the authorization or consent of the user is obtained before the personal information of the user is obtained or acquired.
ORU (Other Road Users) boundary optimization refers to optimization aimed at special obstacle boundaries in autonomous driving scenes, such as vehicles with open doors or deployed tail ramps. When an autonomous driving data set is constructed, most scenes contain only ORUs with smooth edges, i.e., ORUs without open doors, ramps, and the like, so trained models easily miss protruding parts of an ORU such as doors and ramps, creating potential safety hazards.
Autonomous driving detection schemes are generally based on detection boxes: for each obstacle in a scene, the model outputs a rotated box representing information such as the obstacle's position and size.
In the process of realizing the concept of the present disclosure, the inventors found that when an ORU has a protruding boundary, a detection box obtained by such a method tends to jitter, the detection quality is poor, and risks such as emergency braking and collision of the autonomous vehicle easily arise.
Fig. 1 schematically illustrates an exemplary system architecture to which target detection methods and apparatus may be applied, according to embodiments of the present disclosure.
It should be noted that fig. 1 is only an example of a system architecture to which embodiments of the present disclosure may be applied to assist those skilled in the art in understanding the technical content of the present disclosure, but does not mean that embodiments of the present disclosure may not be used in other devices, systems, environments, or scenarios. For example, in another embodiment, an exemplary system architecture to which the target detection method and apparatus may be applied may include a terminal device, but the terminal device may implement the target detection method and apparatus provided by the embodiments of the present disclosure without interaction with a server.
As shown in fig. 1, a system architecture 100 according to this embodiment may include an autonomous vehicle 101, a network 102, and a server 103. The network 102 serves as a medium for providing a communication link between the autonomous vehicle 101 and the server 103. Network 102 may include various connection types, such as wired and/or wireless communication links.
The autonomous vehicle 101 may interact with a server 103 through a network 102 to receive or transmit data or the like.
The autonomous vehicle 101 may be provided with a display screen for implementing a human-machine interface, and may further be provided with information acquisition devices such as various cameras, infrared scanning sensors, and/or lidar, for acquiring information about the surrounding environment.
The server 103 may be a server that provides various services, such as a background management server (merely an example) that provides support for information collected by the autonomous vehicle 101. The background management server may analyze and otherwise process the received data and feed the processing results (e.g., web pages, information, or data acquired or generated according to a request) back to the autonomous vehicle 101. The server may be a cloud server, also called a cloud computing server or cloud host, a host product in a cloud computing service system that overcomes the defects of high management difficulty and weak service scalability found in traditional physical hosts and VPS ("Virtual Private Server") services. The server may also be a server of a distributed system or a server incorporating a blockchain.
It should be noted that, the object detection method provided in the embodiments of the present disclosure may be generally performed by the autonomous vehicle 101. Accordingly, the object detection device provided by the embodiment of the present disclosure may also be provided in the autonomous vehicle 101.
Alternatively, the object detection method provided by the embodiments of the present disclosure may be generally performed by the server 103. Accordingly, the object detection apparatus provided by the embodiments of the present disclosure may be generally provided in the server 103. The object detection method provided by the embodiments of the present disclosure may also be performed by a server or cluster of servers other than the server 103 and capable of communicating with the autonomous vehicle 101 and/or the server 103. Accordingly, the object detection apparatus provided by the embodiments of the present disclosure may also be provided in a server or a server cluster that is different from the server 103 and is capable of communicating with the autonomous vehicle 101 and/or the server 103.
For example, when performing object detection, the autonomous vehicle 101 may acquire information about a special-shaped object and send it to the server 103. The server 103 then performs target detection on the special-shaped object to obtain an original detection frame; expands the original detection frame according to semantic information of the special-shaped object to obtain an expansion area, where the special-shaped object includes a main body portion located inside the original detection frame and other portions located outside it, and the semantic information characterizes the relative positional relationship between the other portions and the main body portion; determines the concave hull detection frame of the other portions according to the position information of the point cloud area characterized by the point cloud information within the expansion area, where the point cloud information characterizes the point cloud of the other portions; and fuses the original detection frame and the concave hull detection frame to obtain the target detection frame of the special-shaped object. Alternatively, a server or server cluster capable of communicating with the autonomous vehicle 101 and/or the server 103 may analyze the special-shaped object information and determine the target detection frame of the special-shaped object.
It should be understood that the number of terminal devices, networks and servers in fig. 1 is merely illustrative. There may be any number of terminal devices, networks, and servers, as desired for implementation.
Fig. 2 schematically illustrates a flow chart of a target detection method according to an embodiment of the disclosure.
As shown in fig. 2, the method includes operations S210 to S240.
In operation S210, target detection is performed on the irregular object, and an original detection frame is obtained.
In operation S220, the original detection frame is expanded according to the semantic information of the special-shaped object, so as to obtain an expansion area, the special-shaped object includes a main body portion located in the original detection frame and other portions located outside the original detection frame, and the semantic information characterizes the relative positional relationship between the other portions and the main body portion.
In operation S230, the concave hull detection frame of the other portions is determined according to the position information of the point cloud area characterized by the point cloud information within the expansion area, where the point cloud information characterizes the point cloud of the other portions.
In operation S240, the original detection frame and the concave detection frame are fused to obtain a target detection frame of the special-shaped object.
According to embodiments of the present disclosure, the special-shaped object may be a three-dimensional object having an irregular shape from at least one viewing angle, and may also include a two-dimensional object of irregular shape acquired from an image, without being limited thereto. Target detection may be realized based on various target detection algorithms, which may include at least one of: Faster R-CNN (Region-based Convolutional Neural Network, deep-learning-based target detection), SSD (Single Shot MultiBox Detector), YOLO (You Only Look Once, real-time target detection), etc., without being limited thereto.
According to embodiments of the present disclosure, in the data labeling stage, only the main body portion of the special-shaped object may be labeled, or a relatively complete and regular part of the special-shaped object may be selected for labeling, while the other portions of the special-shaped object are left unlabeled, producing a training data set. For example, for an image of an open-door vehicle acquired from an overhead view, only the vehicle body may be labeled and the open door left unlabeled. A target detection model trained on such a data set recognizes the more complete, main part of the special-shaped object and determines a fairly regular original detection frame.
According to embodiments of the present disclosure, based on the above labeling principle, the part enclosed by the original detection frame can be determined as the main body portion of the special-shaped object, and by expanding the original detection frame, the parts not enclosed can be determined as the other portions needing detection.
According to embodiments of the present disclosure, the semantic information may characterize a semantic description describing at least one of an appearance, shape, feature, structure, orientation, etc. of the shaped object. The semantic model may assign a semantic tag to the object. By inputting the three-dimensional information or the two-dimensional information of the special-shaped object into the semantic model, the semantic information can be obtained. The semantic information may include, for example: the positional information of each part, whether the door is opened in the front, rear, left and right, whether the slope is included, and the like, and may not be limited thereto.
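As an illustration only, the semantic information described above might be carried in a small structure like the one below; the field names are assumptions for the sketch, not terms from the patent.

```python
from dataclasses import dataclass
from typing import Optional

@dataclass
class SemanticInfo:
    # Which side has an open door relative to the main body:
    # "front" | "rear" | "left" | "right", or None when all doors are closed.
    door_open_side: Optional[str]
    has_ramp: bool  # whether a tail ramp is deployed

info = SemanticInfo(door_open_side="left", has_ramp=False)
print(info.door_open_side)  # left
```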
Fig. 3 schematically illustrates a schematic diagram of determining an extension area according to an embodiment of the present disclosure.
As shown in fig. 3, the detection of the shaped object may be performed based on the object detection model 310 and the semantic model 320, resulting in an extended region.
For example, an obstacle predicted by a detection model may be given as the special-shaped object 300. By inputting the related information 301 of the special-shaped object 300 into the target detection model 310, an original detection frame 311 can be obtained. By inputting the special-shaped object 300 or its related information 301 into the semantic model 320, semantic information 321 can be obtained. After the semantic information 321 is acquired, the original detection frame 311 may be expanded according to the semantic information 321 to obtain an expansion area 312. The related information 301 may be image information, point cloud information, or the like of the special-shaped object 300, without being limited thereto.
According to embodiments of the present disclosure, by analyzing the semantic information, the relative position information of the main body portion and the other portions divided by the original detection frame can be obtained, and how to expand the original detection frame can be determined. For example, when the semantic information characterizes at which locations, relative to the main body portion, there are other portions not enclosed by the original detection frame, the original detection frame can be expanded accordingly. The expansion distance is not limited: it may be unbounded, may be a default distance, or may be determined according to the size of the special-shaped object. The expansion area characterizes the region of the expanded frame other than the original region characterized by the original detection frame.
According to embodiments of the present disclosure, a concave hull ("concave packet") is a polygon that, unlike a convex hull, is not uniquely determined; it represents the area occupied by a set of points as faithfully as possible, and the concave hull detection frame represents the detection frame delimiting that area. The expansion area may first be scanned, for example by lidar, to obtain the point cloud information within it. Then, using the point cloud information as input to a concave hull algorithm, the concave hull detection frame of the other portions can be obtained.
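As a hedged sketch of this step: a real implementation would run a concave hull algorithm (e.g., alpha shapes) over the scanned points; the stand-in below merely bounds the point cloud region with a tight axis-aligned box, the frame that would ultimately delimit the hull's area. All names and values are illustrative.

```python
def region_box(points):
    """Tight axis-aligned box (x0, y0, x1, y1) around 2D lidar returns."""
    xs = [p[0] for p in points]
    ys = [p[1] for p in points]
    return (min(xs), min(ys), max(xs), max(ys))

door_points = [(1.9, 2.1), (2.3, 2.6), (2.0, 2.4)]  # returns off an open door
print(region_box(door_points))  # (1.9, 2.1, 2.3, 2.6)
```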
It should be noted that, because the expansion area may be unbounded, the point cloud information within it may include point cloud information characterizing the other portions of the special-shaped object as well as point cloud information characterizing other objects. In this case, the point cloud information characterizing the other portions can be obtained by calculating the distance between a point cloud region and the original detection frame, or by checking whether the region is connected to the original detection frame. For example, when a point cloud region is connected to the original detection frame, or its distance from the original detection frame is smaller than a distance threshold, the region may be determined as point cloud information characterizing the other portions; otherwise, the region may be filtered out.
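The filtering rule above can be sketched as follows; the helper names and the 0.5 m threshold are illustrative assumptions, not values from the patent.

```python
def point_to_box_distance(p, box):
    """Euclidean distance from 2D point p to axis-aligned box (x0, y0, x1, y1)."""
    x0, y0, x1, y1 = box
    dx = max(x0 - p[0], 0.0, p[0] - x1)
    dy = max(y0 - p[1], 0.0, p[1] - y1)
    return (dx * dx + dy * dy) ** 0.5

def filter_clusters(clusters, box, threshold=0.5):
    # A cluster belongs to the "other portions" if any of its points touches
    # or lies within the threshold of the original box; otherwise discard it.
    return [c for c in clusters
            if min(point_to_box_distance(p, box) for p in c) <= threshold]

box = (0.0, 0.0, 4.0, 2.0)
door = [(4.2, 1.0), (4.4, 1.2)]   # near the box: kept as "other portions"
pedestrian = [(9.0, 9.0)]         # far away: filtered out
print(len(filter_clusters([door, pedestrian], box)))  # 1
```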
According to embodiments of the present disclosure, after the original detection frame and the concave hull detection frame are determined: when the two frames are connected, they can simply be spliced and fused to obtain the target detection frame; when they are not connected, a larger detection frame containing both, with their positions fixed, can be delineated and used as the target detection frame. The determination method of the target detection frame is not limited to these.
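The disconnected case above can be sketched as computing the smallest box enclosing both frames; the function name and box format are assumptions.

```python
def fuse_boxes(a, b):
    """Smallest axis-aligned box containing boxes a and b (x0, y0, x1, y1)."""
    return (min(a[0], b[0]), min(a[1], b[1]), max(a[2], b[2]), max(a[3], b[3]))

original = (0.0, 0.0, 4.0, 2.0)   # vehicle body
concave = (1.0, 2.0, 2.5, 2.8)    # open door protruding from the body's side
print(fuse_boxes(original, concave))  # (0.0, 0.0, 4.0, 2.8)
```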
Through the embodiments of the present disclosure, a boundary optimization method can be realized. Combined with the manner of determining the concave hull detection frame, this facilitates target detection of special-shaped objects, optimizes their boundary detection, applies to more target detection scenarios, improves the general applicability of the target detection method, and can effectively improve detection accuracy.
The method shown in fig. 2 is further described below in connection with the specific examples.
According to an embodiment of the present disclosure, operation S210 may include: performing target detection on the special-shaped object to obtain a three-dimensional detection frame; extracting two-dimensional information from the three-dimensional detection frame at a target viewing angle to obtain a two-dimensional detection frame; and determining the two-dimensional detection frame as the original detection frame.
According to embodiments of the present disclosure, target detection can be performed on the special-shaped object to obtain, for example, a three-dimensional detection frame corresponding to its main body portion. The target viewing angle may be an arbitrary viewing angle and may include, for example, a vertical (i.e., overhead) viewing angle, a horizontal viewing angle, etc., without being limited thereto. By extracting two-dimensional information from the three-dimensional detection frame at a fixed viewing angle, a two-dimensional original detection frame can be obtained.
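A minimal sketch of the overhead-view extraction, assuming an axis-aligned (x0, y0, z0, x1, y1, z1) box format that the patent does not specify: the z extent is dropped and only the ground-plane footprint remains.

```python
def bev_footprint(box3d):
    """box3d: (x0, y0, z0, x1, y1, z1) -> 2D overhead footprint (x0, y0, x1, y1)."""
    x0, y0, _z0, x1, y1, _z1 = box3d
    return (x0, y0, x1, y1)

print(bev_footprint((0.0, 0.0, 0.0, 4.0, 2.0, 1.5)))  # (0.0, 0.0, 4.0, 2.0)
```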
Before the above process is executed, the main body portion of the special-shaped object may be labeled, similar to the foregoing data labeling process, to obtain a training data set, and train the target detection model, so that the determined three-dimensional detection frame is a detection frame determined for the main body portion of the special-shaped object.
According to embodiments of the present disclosure, expanding the original detection frame in all directions may result in an inaccurate detection frame size after expansion. In view of this, operation S220 may include: determining an expansion direction according to the relative positional relationship and the orientation information of the special-shaped object characterized by the original detection frame; and expanding the original detection frame along the expansion direction to obtain the expansion area.
According to an embodiment of the present disclosure, the original detection frame may carry attribute information related to the special-shaped object. For example, when the special-shaped object includes a plurality of body parts, the original detection frame may carry attribute information characterizing a certain body part, attribute information characterizing the orientation of the special-shaped object, and the like, without being limited thereto.
For example, the profiled object may be a door opening vehicle and the semantic information may be left side door opening.
Fig. 4 schematically illustrates a schematic diagram of an expanded effect of an original detection frame for a left-side door-opening vehicle according to an embodiment of the present disclosure.
As shown in fig. 4, a door-opening vehicle 400 has an original detection frame 410. The original detection frame 410 carries attribute information indicating that the head of the vehicle 400 is oriented horizontally to the right in the page direction. When semantic information characterizing "left door open" is obtained, it can be determined that the upper frame line 411 of the original detection frame 410 is the left side of the vehicle 400, and expansion can be performed upward along the page direction from the upper frame line 411, so that the original detection frame 410 is expanded toward the left side of the vehicle 400, obtaining an expansion area 420 indicating the door portion.
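The directional expansion of FIG. 4 can be sketched as follows, assuming a 2D bird's-eye-view box and a unit heading vector; the margin value and function names are illustrative, not from the patent.

```python
def side_direction(heading, side):
    """Unit vector pointing toward the given side of a 2D unit heading."""
    hx, hy = heading
    if side == "left":           # 90 degrees counter-clockwise from heading
        return (-hy, hx)
    if side == "right":          # 90 degrees clockwise from heading
        return (hy, -hx)
    if side == "front":
        return (hx, hy)
    return (-hx, -hy)            # "rear"

def expand(box, direction, margin):
    """Grow an axis-aligned box (x0, y0, x1, y1) by margin along direction only."""
    x0, y0, x1, y1 = box
    dx, dy = direction
    return (x0 + min(dx, 0.0) * margin, y0 + min(dy, 0.0) * margin,
            x1 + max(dx, 0.0) * margin, y1 + max(dy, 0.0) * margin)

body_box = (0.0, 0.0, 4.0, 2.0)           # vehicle body, heading along +x
d = side_direction((1.0, 0.0), "left")    # left of a +x heading is +y
print(expand(body_box, d, 1.0))           # (0.0, 0.0, 4.0, 3.0)
```

Only the side named by the semantic information grows, which is the point of the directional expansion: no redundant area is added on the other three sides.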
Through the embodiment of the disclosure, by implementing directional expansion, redundant and useless region expansion can be reduced, the calculation amount for the expansion region can be reduced, and the post-processing efficiency can be effectively improved.
According to embodiments of the present disclosure, the expansion area contains a large blank region in addition to the door region, which would needlessly reduce the drivable area. In view of this, a concave hull may be used to represent the protruding portion of the special-shaped object, i.e., the other portions described above. The process may include, for example: first collecting the point cloud in the expansion area, and then computing the concave hull detection frame at a suitable viewing angle.
In practice, the inventors found that if only a single frame's point cloud is used to aggregate the concave hull, the hulls of different frames tend to jitter. In view of this, operation S230 may be optimized based on a temporal smoothing strategy.
According to an embodiment of the present disclosure, operation S230 may include: in response to determining that the absolute speed of the special-shaped object is smaller than a first threshold, acquiring first current point cloud region position information related to the other portions according to first current frame information collected for the special-shaped object; acquiring first historical point cloud region position information related to the other portions according to first historical frame information collected for the special-shaped object, wherein a first historical frame is any frame before the first current frame; fusing the first current point cloud region position information and the first historical point cloud region position information to obtain first fused region position information; and determining the concave hull detection frame according to the first fused region position information.
According to embodiments of the present disclosure, the absolute speed characterizes the rate of movement of the special-shaped object in the world coordinate system. The first threshold may be a low-speed threshold, denoted for example by α, set to distinguish a stationary state from a low-speed motion state. The first current frame information characterizes information about the special-shaped object collected at the current moment; the first historical frame information characterizes information collected at any historical moment before the current moment.
According to embodiments of the present disclosure, considering that most vehicles with open doors or deployed tail ramps are stationary, the concave hull may be smoothed by collecting multi-frame information and using the historical frames' point clouds in combination.
For example, let the absolute speed of the special-shaped object be v. When v ≤ α, the special-shaped object can be considered stationary, and the concave hull of the current frame can be computed by fusing the point cloud region position information within the expansion area across all historical frames, obtaining the concave hull detection frame of the current frame.
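The stationary case can be sketched as follows; the threshold value and names are illustrative assumptions.

```python
ALPHA = 0.1  # m/s; illustrative low-speed threshold (the patent's alpha)

def fused_points(current_points, history, v):
    """Fuse the current frame's points with all history when the object is static."""
    pts = list(current_points)
    if v <= ALPHA:               # stationary: fuse every historical frame
        for frame in history:
            pts.extend(frame)
    return pts

history = [[(2.0, 2.4)], [(2.1, 2.5)]]   # per-frame points in the expansion area
print(len(fused_points([(2.2, 2.6)], history, v=0.05)))  # 3
```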
According to an embodiment of the present disclosure, the above operation S230 may include: in response to determining that the absolute speed of the special-shaped object is greater than a second threshold and less than a third threshold, acquiring second current point cloud region position information related to the other portions according to second current frame information acquired for the special-shaped object; acquiring second historical point cloud region position information related to the other portions according to second historical frame information acquired for the special-shaped object, where a second historical frame represents any frame among a preset number of frames located before and adjacent to the second current frame; fusing the second current point cloud region position information and the second historical point cloud region position information to obtain second fused region position information; and determining the concave packet detection frame according to the second fused region position information.
According to embodiments of the present disclosure, the second threshold may represent a low-speed threshold for distinguishing a stationary state from a low-speed motion state; it may be the same as or different from the first threshold, which is not limited herein. The second threshold may also be denoted, for example, by α. The third threshold may characterize a high-speed threshold set to distinguish a low-speed motion state from a high-speed motion state, and may be denoted, for example, by β. The second current frame information may characterize information of the special-shaped object acquired at the current moment. The second historical frame information may characterize information of the special-shaped object acquired within a preset time period before the current moment.
For example, let the absolute speed of the special-shaped object be v. When α < v ≤ β, the special-shaped object may be considered to be in a low-speed running state, and the concave packet of the current frame may be calculated by fusing the point cloud region position information within the extended region over the k historical frames immediately preceding the current frame, thereby obtaining the concave packet detection frame of the current frame.
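A minimal sketch of this low-speed policy, assuming a hypothetical window size k and simple (x, y) points (neither the value of k nor the point representation is fixed by the disclosure):

```python
# Hedged sketch of the low-speed branch (alpha < v <= beta): only the k
# historical frames immediately preceding the current frame are fused.
from collections import deque

K = 5  # assumed window size; the text only says "a preset number of frames"


class LowSpeedFuser:
    """Keeps the k most recent frames adjacent to the current frame and fuses
    them with the current frame's in-region point cloud."""

    def __init__(self, k=K):
        # deque with maxlen drops the oldest frame automatically
        self.window = deque(maxlen=k)

    def fuse(self, current_points):
        """Fuse the current frame with the buffered adjacent historical frames,
        then push the current frame into the window for the next call."""
        fused = list(current_points)
        for frame in self.window:
            fused.extend(frame)
        self.window.append(list(current_points))
        return fused
```

Each call returns the point set from which the current frame's concave packet detection frame would be computed.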
According to an embodiment of the present disclosure, the above operation S230 may include: in response to determining that the absolute speed of the special-shaped object is greater than a fourth threshold, acquiring third current point cloud region position information related to the other portions according to third current frame information acquired for the special-shaped object; and determining the concave packet detection frame of the special-shaped object at the moment corresponding to the third current frame according to the third current point cloud region position information.
According to the above embodiment of the present disclosure, the fourth threshold may represent a high-speed threshold for distinguishing a low-speed motion state from a high-speed motion state; it may be the same as or different from the third threshold, which is not limited herein. The fourth threshold may also be denoted, for example, by β. The third current frame information may characterize information of the special-shaped object acquired at the current moment.
For example, let the absolute speed of the special-shaped object be v. When β < v, the special-shaped object may be considered to be in a high-speed running state. At each moment, only the point cloud region position information within the extended region of the current frame acquired at that moment is used to calculate the concave packet of the current frame, thereby obtaining the concave packet detection frame of the current frame.
It should be noted that dividing the speed v of the special-shaped object into three segments using the low-speed threshold α and the high-speed threshold β, and smoothing the concave packet with a different policy for each segment, is only an exemplary embodiment. In practical applications, more categories may be distinguished and more policies used according to actual service requirements, which is not limited herein.
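The three-segment policy described above can be summarized as a small dispatch function; the concrete values of α and β below are illustrative assumptions only, since the disclosure leaves both thresholds unspecified:

```python
# Hedged sketch of the speed-band dispatch: map the absolute speed v of the
# special-shaped object to the concave-packet fusion policy used for it.


def select_fusion_policy(v, alpha=0.2, beta=3.0):
    """Return a label for the fusion policy of the speed band containing v.

    alpha: assumed low-speed threshold (stationary vs. low-speed motion).
    beta:  assumed high-speed threshold (low-speed vs. high-speed motion).
    """
    if v <= alpha:
        return "all_history"   # stationary: fuse every stored historical frame
    if v <= beta:
        return "recent_k"      # low speed: fuse the k adjacent historical frames
    return "current_only"      # high speed: use the current frame alone
```

A finer segmentation would simply add further branches with additional thresholds.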
Through the embodiments of the present disclosure, the problem of concave packet jitter across multiple frames of a special-shaped object can be effectively alleviated, maximizing the practical value of the concave packet detection frame.
In practice, the inventors have also found that for a high-density lidar sensor, the point cloud of a single obstacle may contain tens of thousands of points, and directly computing the concave packet would introduce excessive time delay. In view of this, the above operation S230 may be further optimized based on a grid down-sampling strategy.
According to an embodiment of the present disclosure, before performing the above operation S230, the point cloud may first undergo sparsification processing, which may include: performing sparsification processing on the point cloud information to obtain sparse point cloud information, and storing the sparse point cloud information.
According to embodiments of the present disclosure, during computation of the concave packet, only a small portion of the edge point cloud actually supports the concave packet. The point cloud density may therefore be reduced by a sparsification method, which is not limited thereto.
According to an embodiment of the present disclosure, the above operation S230 may include: the extended area is divided into a plurality of grids of a preset size. In response to determining that the number of points characterized by the point cloud information within the grid is greater than a preset threshold, the grid is determined to be a target grid. And determining a concave detection frame according to the target grid.
According to embodiments of the present disclosure, a grid method may also be proposed to reduce the point cloud density.
For example, given a grid size h × w, where h is the height of a grid and w is its width, the extended region may be divided into ⌈H/h⌉ × ⌈W/w⌉ grids, where H is the height of the extended region and W is its width. Within each grid, only n points are retained, where n is a positive integer that may characterize a preset threshold. Considering the characteristics of lidar-scanned point clouds, points tend to be highly concentrated on one or two visible faces of an obstacle. The concave packet detection frame can therefore be determined from the target grids, and the error of the resulting concave packet detection frame is at most max(h, w).
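A possible sketch of this grid down-sampling step, assuming 2-D points and a simple "keep the first n points per cell" retention rule (the disclosure fixes neither the point representation nor how the n retained points are chosen):

```python
# Hedged sketch of grid down-sampling: divide the extended region into
# ceil(H/h) x ceil(W/w) cells and keep at most n points per cell, so the
# error of the concave packet computed afterwards is bounded by max(h, w).
import math


def grid_downsample(points, region_w, region_h, w, h, n=1):
    """Return a thinned copy of `points` with at most n points per grid cell.

    points: iterable of (x, y) coordinates inside the extended region,
            with the region's origin at (0, 0).
    region_w, region_h: width W and height H of the extended region.
    w, h: width and height of a single grid cell.
    """
    cols = math.ceil(region_w / w)
    rows = math.ceil(region_h / h)
    counts = {}  # cell index -> number of points kept so far
    kept = []
    for (x, y) in points:
        # clamp so points on the far boundary fall into the last cell
        key = (min(int(x // w), cols - 1), min(int(y // h), rows - 1))
        if counts.get(key, 0) < n:
            counts[key] = counts.get(key, 0) + 1
            kept.append((x, y))
    return kept
```

Cells whose point count exceeds the preset threshold would be the target grids from which the concave packet detection frame is determined.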
Through the embodiments of the present disclosure, the grid down-sampling method can effectively reduce the surface point cloud density and improve computing efficiency.
Based on the above method, an ORU boundary optimization algorithm can be constructed, which can effectively detect scenes such as an open vehicle door or a tail ramp, reducing the risks of related collisions, sudden braking, and the like on the road.
Fig. 5 schematically illustrates a block diagram of an object detection apparatus according to an embodiment of the present disclosure.
As shown in fig. 5, the object detection apparatus 500 may include an object detection module 510, an extension module 520, a recess determination module 530, and a fusion module 540.
The target detection module 510 is configured to perform target detection on the special-shaped object to obtain an original detection frame.
The expansion module 520 is configured to expand the original detection frame according to semantic information of the special-shaped object, so as to obtain an expansion area, where the special-shaped object includes a main body portion located in the original detection frame and other portions located outside the original detection frame, and the semantic information characterizes a relative positional relationship between the other portions and the main body portion.
The concave determining module 530 is configured to determine concave detection frames of other portions according to the point cloud area location information represented by the point cloud information in the extension area, where the point cloud information represents the point cloud information of the other portions.
The fusion module 540 is configured to fuse the original detection frame and the concave detection frame to obtain a target detection frame of the special-shaped object.
According to an embodiment of the present disclosure, an expansion module includes an expansion direction determining unit and an expansion unit.
The expansion direction determining unit is configured to determine the expansion direction according to the relative positional relationship and the orientation information of the special-shaped object represented by the original detection frame.
The expansion unit is configured to expand the original detection frame along the expansion direction to obtain the expansion area.
According to an embodiment of the disclosure, the concave packet determining module includes a first current point cloud region position information obtaining unit, a first history point cloud region position information obtaining unit, a first fusion unit, and a first concave packet determining unit.
The first current point cloud region position information acquisition unit is configured to acquire, in response to determining that the absolute speed of the special-shaped object is less than a first threshold, first current point cloud region position information related to the other portions according to first current frame information acquired for the special-shaped object.
The first historical point cloud region position information acquisition unit is used for acquiring first historical point cloud region position information related to other parts according to first historical frame information acquired for the special-shaped object, and the first historical frame represents any frame positioned before the first current frame.
The first fusion unit is used for fusing the first current point cloud region position information and the first historical point cloud region position information to obtain first fusion region position information.
The first concave packet determining unit is configured to determine the concave packet detection frame according to the first fusion region position information.
According to an embodiment of the disclosure, the concave packet determining module includes a second current point cloud region position information obtaining unit, a second history point cloud region position information obtaining unit, a second fusion unit, and a second concave packet determining unit.
The second current point cloud region position information acquisition unit is configured to acquire, in response to determining that the absolute speed of the special-shaped object is greater than a second threshold and less than a third threshold, second current point cloud region position information related to the other portions according to second current frame information acquired for the special-shaped object.
The second historical point cloud area position information acquisition unit is used for acquiring second historical point cloud area position information related to other parts according to second historical frame information acquired for the special-shaped object, and the second historical frame represents any frame in a preset number of frames which are positioned before and adjacent to the second current frame.
The second fusion unit is configured to fuse the second current point cloud region position information and the second historical point cloud region position information to obtain second fusion region position information.
The second concave packet determining unit is configured to determine the concave packet detection frame according to the second fusion region position information.
According to an embodiment of the disclosure, the concave packet determining module includes a third current point cloud region position information obtaining unit and a third concave packet determining unit.
The third current point cloud region position information acquisition unit is configured to acquire, in response to determining that the absolute speed of the special-shaped object is greater than a fourth threshold, third current point cloud region position information related to the other portions according to third current frame information acquired for the special-shaped object.
The third concave packet determining unit is configured to determine the concave packet detection frame of the special-shaped object at the moment corresponding to the third current frame according to the third current point cloud region position information.
According to an embodiment of the disclosure, the object detection device further includes a thinning module and a storage module.
The sparsification module is configured to perform sparsification processing on the point cloud information to obtain sparse point cloud information.
The storage module is configured to store the sparse point cloud information.
According to an embodiment of the present disclosure, the pocket determination module includes a dividing unit, a target grid determination unit, and a fourth pocket determination unit.
The dividing unit is configured to divide the extension area into a plurality of grids of a preset size.
The target grid determining unit is configured to determine a grid as a target grid in response to determining that the number of points characterized by the point cloud information within the grid is greater than a preset threshold.
The fourth concave packet determining unit is configured to determine the concave packet detection frame according to the target grid.
According to an embodiment of the present disclosure, a target detection module includes a target detection unit, an extraction unit, and an original detection frame determination unit.
The target detection unit is configured to perform target detection on the special-shaped object to obtain a three-dimensional detection frame.
The extraction unit is configured to extract two-dimensional information from the three-dimensional detection frame at the target viewing angle to obtain the two-dimensional detection frame.
The original detection frame determining unit is configured to determine the two-dimensional detection frame as the original detection frame.
According to embodiments of the present disclosure, the present disclosure also provides an electronic device, a readable storage medium and a computer program product.
According to an embodiment of the present disclosure, an electronic device includes: at least one processor; and a memory communicatively coupled to the at least one processor; wherein the memory stores instructions executable by the at least one processor to enable the at least one processor to perform the object detection methods of the present disclosure.
According to an embodiment of the present disclosure, a non-transitory computer-readable storage medium storing computer instructions for causing a computer to perform the object detection method of the present disclosure.
According to an embodiment of the present disclosure, a computer program product comprises a computer program stored on at least one of a readable storage medium and an electronic device; when executed by a processor, the computer program implements the object detection method of the present disclosure.
Fig. 6 illustrates a schematic block diagram of an example electronic device 600 that may be used to implement embodiments of the present disclosure. Electronic devices are intended to represent various forms of digital computers, such as laptops, desktops, workstations, personal digital assistants, servers, blade servers, mainframes, and other appropriate computers. The electronic device may also represent various forms of mobile devices, such as personal digital processing, cellular telephones, smartphones, wearable devices, and other similar computing devices. The components shown herein, their connections and relationships, and their functions, are meant to be exemplary only, and are not meant to limit implementations of the disclosure described and/or claimed herein.
As shown in fig. 6, the apparatus 600 includes a computing unit 601 that can perform various appropriate actions and processes according to a computer program stored in a Read Only Memory (ROM) 602 or a computer program loaded from a storage unit 608 into a Random Access Memory (RAM) 603. In the RAM 603, various programs and data required for the operation of the device 600 may also be stored. The computing unit 601, ROM 602, and RAM 603 are connected to each other by a bus 604. An input/output (I/O) interface 605 is also connected to bus 604.
The various components in device 600 are connected to an input/output (I/O) interface 605, including: an input unit 606 such as a keyboard, mouse, etc.; an output unit 607 such as various types of displays, speakers, and the like; a storage unit 608, such as a magnetic disk, optical disk, or the like; and a communication unit 609 such as a network card, modem, wireless communication transceiver, etc. The communication unit 609 allows the device 600 to exchange information/data with other devices via a computer network, such as the internet, and/or various telecommunication networks.
The computing unit 601 may be a variety of general and/or special purpose processing components having processing and computing capabilities. Some examples of computing unit 601 include, but are not limited to, a Central Processing Unit (CPU), a Graphics Processing Unit (GPU), various specialized Artificial Intelligence (AI) computing chips, various computing units running machine learning model algorithms, a Digital Signal Processor (DSP), and any suitable processor, controller, microcontroller, etc. The computing unit 601 performs the respective methods and processes described above, such as the target detection method. For example, in some embodiments, the object detection method may be implemented as a computer software program tangibly embodied on a machine-readable medium, such as storage unit 608. In some embodiments, part or all of the computer program may be loaded and/or installed onto the device 600 via the ROM 602 and/or the communication unit 609. When the computer program is loaded into the RAM 603 and executed by the computing unit 601, one or more steps of the object detection method described above may be performed. Alternatively, in other embodiments, the computing unit 601 may be configured to perform the target detection method by any other suitable means (e.g. by means of firmware).
Based on the foregoing electronic device, the disclosure further provides an automatic driving vehicle, which may include the electronic device, and may further include a communication component, a display screen for implementing a human-machine interface, an information acquisition device for acquiring surrounding environment information, and the like. The communication part, the display screen, the information acquisition equipment and the electronic equipment can be in communication connection.
The electronic equipment can be integrated with the communication part, the display screen and the information acquisition equipment, and can be arranged separately from the communication part, the display screen and the information acquisition equipment.
Various implementations of the systems and techniques described above may be implemented in digital electronic circuitry, integrated circuit systems, Field Programmable Gate Arrays (FPGAs), Application Specific Integrated Circuits (ASICs), Application Specific Standard Products (ASSPs), Systems on Chip (SOCs), Complex Programmable Logic Devices (CPLDs), computer hardware, firmware, software, and/or combinations thereof. These various embodiments may include: implementation in one or more computer programs that may be executed and/or interpreted on a programmable system including at least one programmable processor, which may be a special-purpose or general-purpose programmable processor capable of receiving data and instructions from, and transmitting data and instructions to, a storage system, at least one input device, and at least one output device.
Program code for carrying out methods of the present disclosure may be written in any combination of one or more programming languages. These program code may be provided to a processor or controller of a general purpose computer, special purpose computer, or other programmable data processing apparatus such that the program code, when executed by the processor or controller, causes the functions/operations specified in the flowchart and/or block diagram to be implemented. The program code may execute entirely on the machine, partly on the machine, as a stand-alone software package, partly on the machine and partly on a remote machine or entirely on the remote machine or server.
In the context of this disclosure, a machine-readable medium may be a tangible medium that can contain, or store a program for use by or in connection with an instruction execution system, apparatus, or device. The machine-readable medium may be a machine-readable signal medium or a machine-readable storage medium. The machine-readable medium may include, but is not limited to, an electronic, magnetic, optical, electromagnetic, infrared, or semiconductor system, apparatus, or device, or any suitable combination of the foregoing. More specific examples of a machine-readable storage medium would include an electrical connection based on one or more wires, a portable computer diskette, a hard disk, a Random Access Memory (RAM), a read-only memory (ROM), an erasable programmable read-only memory (EPROM or flash memory), an optical fiber, a portable compact disc read-only memory (CD-ROM), an optical storage device, a magnetic storage device, or any suitable combination of the foregoing.
To provide for interaction with a user, the systems and techniques described here can be implemented on a computer having: a display device (e.g., a CRT (cathode ray tube) or LCD (liquid crystal display) monitor) for displaying information to a user; and a keyboard and pointing device (e.g., a mouse or trackball) by which a user can provide input to the computer. Other kinds of devices may also be used to provide for interaction with a user; for example, feedback provided to the user may be any form of sensory feedback (e.g., visual feedback, auditory feedback, or tactile feedback); and input from the user may be received in any form, including acoustic input, speech input, or tactile input.
The systems and techniques described here can be implemented in a computing system that includes a background component (e.g., as a data server), or that includes a middleware component (e.g., an application server), or that includes a front-end component (e.g., a user computer having a graphical user interface or a web browser through which a user can interact with an implementation of the systems and techniques described here), or any combination of such background, middleware, or front-end components. The components of the system can be interconnected by any form or medium of digital data communication (e.g., a communication network). Examples of communication networks include: local Area Networks (LANs), wide Area Networks (WANs), and the internet.
The computer system may include a client and a server. The client and server are typically remote from each other and typically interact through a communication network. The relationship of client and server arises by virtue of computer programs running on the respective computers and having a client-server relationship to each other. The server may be a cloud server, a server of a distributed system, or a server incorporating a blockchain.
It should be appreciated that various forms of the flows shown above may be used to reorder, add, or delete steps. For example, the steps recited in the present disclosure may be performed in parallel or sequentially or in a different order, provided that the desired results of the technical solutions of the present disclosure are achieved, and are not limited herein.
The above detailed description should not be taken as limiting the scope of the present disclosure. It will be apparent to those skilled in the art that various modifications, combinations, sub-combinations and alternatives are possible, depending on design requirements and other factors. Any modifications, equivalent substitutions and improvements made within the spirit and principles of the present disclosure are intended to be included within the scope of the present disclosure.

Claims (20)

1. A target detection method comprising:
performing target detection on the special-shaped object to obtain an original detection frame;
expanding the original detection frame according to semantic information of the special-shaped object to obtain an expanded region, wherein the special-shaped object comprises a main body part positioned in the original detection frame and other parts positioned outside the original detection frame, and the semantic information characterizes the relative position relationship between the other parts and the main body part;
determining a concave detection frame of the other part according to the position information of the point cloud area represented by the point cloud information in the expansion area, wherein the point cloud information represents the point cloud information of the other part; and
fusing the original detection frame and the concave detection frame to obtain the target detection frame of the special-shaped object.
2. The method of claim 1, wherein expanding the original detection frame according to the semantic information of the special-shaped object to obtain an expanded region comprises:
determining an expansion direction according to the relative position relation and the orientation information of the special-shaped object represented by the original detection frame; and
expanding the original detection frame along the expansion direction to obtain the expansion area.
3. The method of claim 1 or 2, wherein determining the concave detection frame of the other portion according to the point cloud region position information characterized by the point cloud information within the extension region comprises:
in response to determining that the absolute rate of the special-shaped object is smaller than a first threshold, acquiring first current point cloud region position information related to the other part according to first current frame information acquired for the special-shaped object;
acquiring first historical point cloud region position information related to the other parts according to first historical frame information acquired for the special-shaped object, wherein the first historical frame represents any frame positioned before a first current frame;
fusing the first current point cloud region position information and the first historical point cloud region position information to obtain first fused region position information; and
determining the concave detection frame according to the first fused region position information.
4. The method of claim 1 or 2, wherein determining the concave detection frame of the other portion according to the point cloud region position information characterized by the point cloud information within the extension region comprises:
in response to determining that the absolute rate of the special-shaped object is greater than a second threshold and less than a third threshold, acquiring second current point cloud region position information related to the other part according to second current frame information acquired for the special-shaped object;
acquiring second historical point cloud region position information related to the other portions according to second historical frame information acquired for the special-shaped object, wherein the second historical frame represents any frame among a preset number of frames located before and adjacent to a second current frame;
fusing the second current point cloud region position information and the second historical point cloud region position information to obtain second fused region position information; and
determining the concave detection frame according to the second fused region position information.
5. The method of claim 1 or 2, wherein determining the concave detection frame of the other portion according to the point cloud region position information characterized by the point cloud information within the extension region comprises:
in response to determining that the absolute rate of the special-shaped object is greater than a fourth threshold, acquiring third current point cloud region position information related to the other part according to third current frame information acquired for the special-shaped object; and
determining the concave detection frame of the special-shaped object at the moment corresponding to the third current frame according to the third current point cloud region position information.
6. The method of any of claims 1-5, further comprising, before determining the concave detection frames of the other portions according to the point cloud region position information characterized by the point cloud information within the expansion area:
performing sparsification processing on the point cloud information to obtain sparse point cloud information; and
storing the sparse point cloud information.
7. The method of any of claims 1-6, wherein determining the concave packet detection frame of the other portion according to the point cloud region position information characterized by the point cloud information within the extended region comprises:
dividing the expansion area into a plurality of grids with preset sizes;
determining the grid as a target grid in response to determining that the number of points characterized by the point cloud information in the grid is greater than a preset threshold; and
determining the concave packet detection frame according to the target grid.
8. The method according to any one of claims 1-7, wherein the performing target detection on the shaped object to obtain an original detection frame comprises:
performing target detection on the special-shaped object to obtain a three-dimensional detection frame;
extracting two-dimensional information from the three-dimensional detection frame at the target viewing angle to obtain a two-dimensional detection frame; and
determining the two-dimensional detection frame as the original detection frame.
9. An object detection apparatus, comprising:
a target detection module configured to perform target detection on a special-shaped object to obtain an original detection frame;
an expansion module configured to expand the original detection frame according to semantic information of the special-shaped object to obtain an expansion area, wherein the special-shaped object comprises a main body portion located within the original detection frame and another portion located outside the original detection frame, and the semantic information characterizes a relative position relationship between the other portion and the main body portion;
a concave hull determination module configured to determine a concave hull detection frame of the other portion according to point cloud region position information characterized by point cloud information within the expansion area, wherein the point cloud information is point cloud information of the other portion; and
a fusion module configured to fuse the original detection frame and the concave hull detection frame to obtain a target detection frame of the special-shaped object.
10. The apparatus of claim 9, wherein the expansion module comprises:
an expansion direction determination unit configured to determine an expansion direction according to the relative position relationship and orientation information of the special-shaped object characterized by the original detection frame; and
an expansion unit configured to expand the original detection frame along the expansion direction to obtain the expansion area.
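Claim 10's expansion direction combines the detected heading with the semantic relative position. As an example, for a crane truck or trailer whose extra part sits behind the main body, the box would be stretched backwards along the heading axis; this is a hedged sketch for that one case, and `expand_box_rear` and its parameters are invented for illustration:

```python
import numpy as np

def expand_box_rear(center, size, yaw, extend_len):
    """Expand an original BEV detection frame along a direction derived
    from the object's heading and the semantic relative position.
    Assumes the semantic information says the other portion lies *behind*
    the main body, so the expansion direction is the reversed heading."""
    # Unit vector pointing backwards relative to the heading.
    rear_dir = -np.array([np.cos(yaw), np.sin(yaw)])
    # Shift the center half the added length so the front edge stays put.
    new_center = np.array(center) + rear_dir * extend_len / 2.0
    new_size = (size[0] + extend_len, size[1])  # lengthen along heading
    return new_center, new_size

c, s = expand_box_rear(center=(0.0, 0.0), size=(5.0, 2.5), yaw=0.0, extend_len=3.0)
```

Other relative positions (e.g. an arm extending to one side) would pick a different unit vector from the same heading information.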
11. The apparatus of claim 9 or 10, wherein the concave hull determination module comprises:
a first current point cloud region position information acquisition unit configured to acquire, in response to determining that an absolute speed of the special-shaped object is less than a first threshold, first current point cloud region position information related to the other portion according to first current frame information acquired for the special-shaped object;
a first historical point cloud region position information acquisition unit configured to acquire first historical point cloud region position information related to the other portion according to first historical frame information acquired for the special-shaped object, wherein the first historical frame is any frame preceding the first current frame;
a first fusion unit configured to fuse the first current point cloud region position information and the first historical point cloud region position information to obtain first fused region position information; and
a first concave hull determination unit configured to determine the concave hull detection frame according to the first fused region position information.
12. The apparatus of claim 9 or 10, wherein the concave hull determination module comprises:
a second current point cloud region position information acquisition unit configured to acquire, in response to determining that the absolute speed of the special-shaped object is greater than a second threshold and less than a third threshold, second current point cloud region position information related to the other portion according to second current frame information acquired for the special-shaped object;
a second historical point cloud region position information acquisition unit configured to acquire second historical point cloud region position information related to the other portion according to second historical frame information acquired for the special-shaped object, wherein the second historical frame is any frame among a preset number of frames preceding and adjacent to the second current frame;
a second fusion unit configured to fuse the second current point cloud region position information and the second historical point cloud region position information to obtain second fused region position information; and
a second concave hull determination unit configured to determine the concave hull detection frame according to the second fused region position information.
13. The apparatus of claim 9 or 10, wherein the concave hull determination module comprises:
a third current point cloud region position information acquisition unit configured to acquire, in response to determining that the absolute speed of the special-shaped object is greater than a fourth threshold, third current point cloud region position information related to the other portion according to third current frame information acquired for the special-shaped object; and
a third concave hull determination unit configured to determine, according to the third current point cloud region position information, a concave hull detection frame of the special-shaped object at the moment corresponding to the third current frame.
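Claims 11-13 select which frames' point cloud region information to fuse based on the object's absolute speed: a near-static object can fuse arbitrarily old history, a slowly moving one only a preset number of adjacent recent frames, and a fast one uses the current frame alone. A sketch of that branching; the threshold values and names are illustrative, and the patent leaves the gaps between thresholds unspecified:

```python
def frames_for_fusion(abs_speed, history,
                      thr_static=0.5, thr_slow=2.0, thr_fast=5.0, recent_n=5):
    """Return the history frames to fuse with the current frame, keyed on
    the object's absolute speed. history: past frames, oldest first."""
    if abs_speed < thr_static:            # claim 11: any earlier frame
        return list(history)
    if thr_slow < abs_speed < thr_fast:   # claim 12: preset number of adjacent frames
        return list(history[-recent_n:])
    return []                             # claim 13: current frame only

slow = frames_for_fusion(0.1, [1, 2, 3])
medium = frames_for_fusion(3.0, list(range(10)))
fast = frames_for_fusion(10.0, [1, 2])
```

The rationale is that a fast-moving object's historical point cloud regions no longer overlap its current position, so fusing them would smear the concave hull.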
14. The apparatus of any one of claims 9-13, further comprising:
a sparsification module configured to perform sparsification processing on the point cloud information to obtain sparse point cloud information; and
a storage module configured to store the sparse point cloud information.
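The sparsification in claim 14 bounds the memory cost of storing point cloud information across history frames. The patent does not fix the method; voxel-grid downsampling is one common choice, so the following is an assumed implementation with illustrative names:

```python
import numpy as np

def sparsify(points, voxel=0.1):
    """Thin a point cloud by keeping one point per occupied voxel."""
    keys = np.floor(points / voxel).astype(np.int64)
    # np.unique over rows yields the first index of each occupied voxel.
    _, first = np.unique(keys, axis=0, return_index=True)
    return points[np.sort(first)]

pts = np.array([[0.01, 0.02, 0.0], [0.03, 0.04, 0.0], [0.5, 0.5, 0.5]])
sparse = sparsify(pts)   # the first two points share a voxel, so one is kept
```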
15. The apparatus of any one of claims 9-14, wherein the concave hull determination module comprises:
a division unit configured to divide the expansion area into a plurality of grids of a preset size;
a target grid determination unit configured to determine a grid as a target grid in response to determining that the number of points characterized by the point cloud information in the grid is greater than a preset threshold; and
a fourth concave hull determination unit configured to determine the concave hull detection frame according to the target grid.
16. The apparatus of any one of claims 9-15, wherein the target detection module comprises:
a target detection unit configured to perform target detection on the special-shaped object to obtain a three-dimensional detection frame;
an extraction unit configured to extract two-dimensional information from the three-dimensional detection frame at a target viewing angle to obtain a two-dimensional detection frame; and
an original detection frame determination unit configured to determine the two-dimensional detection frame as the original detection frame.
17. An electronic device, comprising:
at least one processor; and
a memory communicatively coupled to the at least one processor; wherein,
the memory stores instructions executable by the at least one processor to enable the at least one processor to perform the method of any one of claims 1-8.
18. A non-transitory computer-readable storage medium storing computer instructions for causing a computer to perform the method of any one of claims 1-8.
19. A computer program product comprising a computer program stored on at least one of a readable storage medium and an electronic device, wherein the computer program, when executed by a processor, implements the method of any one of claims 1-8.
20. An autonomous vehicle comprising the electronic device of claim 17.
CN202311754638.XA 2023-12-19 2023-12-19 Object detection method and device based on semantic information and automatic driving vehicle Pending CN117636307A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202311754638.XA CN117636307A (en) 2023-12-19 2023-12-19 Object detection method and device based on semantic information and automatic driving vehicle

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202311754638.XA CN117636307A (en) 2023-12-19 2023-12-19 Object detection method and device based on semantic information and automatic driving vehicle

Publications (1)

Publication Number Publication Date
CN117636307A true CN117636307A (en) 2024-03-01

Family

ID=90033994

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202311754638.XA Pending CN117636307A (en) 2023-12-19 2023-12-19 Object detection method and device based on semantic information and automatic driving vehicle

Country Status (1)

Country Link
CN (1) CN117636307A (en)

Similar Documents

Publication Publication Date Title
US11643076B2 (en) Forward collision control method and apparatus, electronic device, program, and medium
EP3944213A2 (en) Method and apparatus of controlling traffic, roadside device and cloud control platform
WO2021249071A1 (en) Lane line detection method, and related apparatus
JP2021523443A (en) Association of lidar data and image data
JP2018530825A (en) System and method for non-obstacle area detection
US20220058818A1 (en) Object-centric three-dimensional auto labeling of point cloud data
EP3951741B1 (en) Method for acquiring traffic state, relevant apparatus, roadside device and cloud control platform
EP3937077B1 (en) Lane marking detecting method, apparatus, electronic device, storage medium, and vehicle
CN113378760A (en) Training target detection model and method and device for detecting target
US20220027639A1 (en) Lane line detection method, electronic device, and computer storage medium
EP4211651A1 (en) Efficient three-dimensional object detection from point clouds
WO2022206414A1 (en) Three-dimensional target detection method and apparatus
CN114091515A (en) Obstacle detection method, obstacle detection device, electronic apparatus, and storage medium
CN116499487B (en) Vehicle path planning method, device, equipment and medium
Chen et al. Multitarget vehicle tracking and motion state estimation using a novel driving environment perception system of intelligent vehicles
CN115147809B (en) Obstacle detection method, device, equipment and storage medium
CN114549961B (en) Target object detection method, device, equipment and storage medium
CN114429631B (en) Three-dimensional object detection method, device, equipment and storage medium
CN115995075A (en) Vehicle self-adaptive navigation method and device, electronic equipment and storage medium
EP4207066A1 (en) Object tracking method and apparatus, device, and a computer-readable storage medium
CN117636307A (en) Object detection method and device based on semantic information and automatic driving vehicle
CN115359332A (en) Data fusion method and device based on vehicle-road cooperation, electronic equipment and system
CN115431968B (en) Vehicle controller, vehicle and vehicle control method
CN117125057B (en) Collision detection method, device, equipment and storage medium based on lane change of vehicle
CN113516013B (en) Target detection method, target detection device, electronic equipment, road side equipment and cloud control platform

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination