WO2023155580A1 - Object recognition method and apparatus - Google Patents

Object recognition method and apparatus

Info

Publication number
WO2023155580A1
WO2023155580A1 PCT/CN2022/139873 CN2022139873W
Authority
WO
WIPO (PCT)
Prior art keywords
point cloud
coordinates
image
category
cluster
Application number
PCT/CN2022/139873
Other languages
French (fr)
Chinese (zh)
Inventor
张宝丰
Original Assignee
京东鲲鹏(江苏)科技有限公司
Application filed by 京东鲲鹏(江苏)科技有限公司 filed Critical 京东鲲鹏(江苏)科技有限公司
Publication of WO2023155580A1 publication Critical patent/WO2023155580A1/en

Classifications

    • GPHYSICS
    • G01MEASURING; TESTING
    • G01SRADIO DIRECTION-FINDING; RADIO NAVIGATION; DETERMINING DISTANCE OR VELOCITY BY USE OF RADIO WAVES; LOCATING OR PRESENCE-DETECTING BY USE OF THE REFLECTION OR RERADIATION OF RADIO WAVES; ANALOGOUS ARRANGEMENTS USING OTHER WAVES
    • G01S13/00Systems using the reflection or reradiation of radio waves, e.g. radar systems; Analogous systems using reflection or reradiation of waves whose nature or wavelength is irrelevant or unspecified
    • G01S13/88Radar or analogous systems specially adapted for specific applications
    • G01S13/89Radar or analogous systems specially adapted for specific applications for mapping or imaging

Definitions

  • the present disclosure relates to the technical field of automatic driving, and in particular to an object recognition method and device.
  • Autonomous driving combines artificial intelligence, machine vision, radar, navigation and positioning, communication, and other technologies so that a vehicle can drive automatically and safely under a computer control system without any active human operation.
  • in existing autonomous-driving target detection, in order to identify targets more accurately, the learning results of multiple machine learning models are usually fused to determine the final target; alternatively, multiple machine learning models are fused, and the final target is determined from the output of the fused model.
  • the embodiments of the present disclosure provide an object recognition method and device that can use multiple kinds of simulated feature data as the input of a detection model; because the feature data are complementary, the input of the model is more multi-dimensional, the accuracy of the recognition result determined from the model output is greatly improved, and each target object in the automatic driving scene can be accurately identified, thereby providing a reference for the simulation of automatic driving.
  • an object recognition method, including: collecting multiple frames of point clouds and multiple frames of images in the detection area, where the point cloud includes multiple point cloud coordinates and each frame of the image corresponds to one or more image instances; constructing a mapping between the point cloud and the image instances according to a preset conversion relationship between the first coordinate system of the point cloud and the second coordinate system of the image; determining, according to the mapping result, the point cloud cluster corresponding to the image instance and the category coordinates of the point cloud cluster, and determining the center coordinates of the point cloud cluster; and inputting the point cloud coordinates, category coordinates, and center coordinates of the point cloud into a preset detection model to identify the target object of the point cloud.
  • the determining of the point cloud cluster corresponding to the image instance includes: obtaining the current frame point cloud and the previous frame image; and, in the current frame point cloud, forming the point cloud cluster from the multiple point cloud coordinates to which any image instance included in the previous frame image is mapped.
  • forming the point cloud cluster from the multiple point cloud coordinates to which any image instance included in the previous frame image is mapped includes: projecting the current frame point cloud to the second coordinate system; and forming the point cloud cluster from the projected point cloud coordinates corresponding to the multiple pixels of any image instance included in the previous frame image.
  • the image instance indicates category information; the determining of the category coordinates of the point cloud cluster includes: determining the category coordinates of the point cloud cluster corresponding to the image instance according to the category information of the image instance included in the previous frame image.
  • the method also includes: splicing the point cloud coordinates, category coordinates, and center coordinates of the point cloud; the inputting of the point cloud coordinates, category coordinates, and center coordinates of the point cloud into the preset detection model includes: inputting the spliced result into the detection model.
  • the splicing of the point cloud coordinates, category coordinates, and center coordinates of the point cloud includes: for each point cloud coordinate of the same point cloud cluster, splicing the category coordinates and the center coordinates to that point cloud coordinate.
  • the method further includes: for the point cloud coordinates in the point cloud other than those of the point cloud clusters, determining the category coordinates and center coordinates of those point cloud coordinates to be the default category coordinates and default center coordinates, and splicing the default category coordinates and the default center coordinates to those point cloud coordinates.
  • the method further includes denoising the point cloud by clustering; the step of determining the center coordinates of the point cloud clusters is then performed on the point cloud after the noise reduction processing.
  • the point cloud and the image are respectively obtained by a radar sensor and a camera, wherein the radar sensor and the camera collect synchronously.
  • the category information is mapped to corresponding binary coordinates; determining the category coordinates of the point cloud cluster corresponding to the image instance includes: determining the binary coordinates of the category information of the image instance as the category coordinates of the point cloud cluster.
  • an object recognition device, including: a collection module configured to collect multiple frames of point clouds and multiple frames of images in the detection area, where the point cloud includes multiple point cloud coordinates and each frame of the image corresponds to one or more image instances; a mapping module configured to construct a mapping between the point cloud and the image instances according to a preset conversion relationship between the first coordinate system of the point cloud and the second coordinate system of the image; a data processing module configured to determine, according to the mapping result, the point cloud cluster corresponding to the image instance and the category coordinates of the point cloud cluster, and to determine the center coordinates of the point cloud cluster; and a recognition module configured to input the point cloud coordinates, category coordinates, and center coordinates of the point cloud into a preset detection model to identify the target object of the point cloud.
  • an object recognition electronic device, including:
  • one or more processors;
  • a storage device configured to store one or more programs which, when executed by the one or more processors, cause the one or more processors to implement the object recognition method provided in the present disclosure.
  • a computer-readable medium on which a computer program is stored, and when the program is executed by a processor, the object recognition method provided in the present disclosure is implemented.
  • An embodiment of the above disclosure has the following advantages or beneficial effects: by exploiting the real-time availability of the current frame point cloud and the previous frame image, the current frame point cloud is converted to the camera coordinate system, thereby obtaining the category coordinates of the point cloud cluster corresponding to the image instance; the center coordinates of the point cloud cluster are calculated, the point cloud coordinates, category coordinates, and center coordinates are spliced as the input of the detection model, and the target object is identified from the output of the detection model. This overcomes the technical problem that existing target recognition results have low accuracy and cannot provide a reference for the simulation of automatic driving.
  • FIG. 1 is a schematic diagram of the main flow of an object recognition method according to an embodiment of the present disclosure
  • FIG. 2 is a schematic diagram of the main flow of a method for determining a point cloud cluster according to an embodiment of the present disclosure
  • FIG. 3 is a schematic diagram of the main flow of a method for determining category coordinates of a point cloud according to an embodiment of the present disclosure
  • FIG. 4 is a schematic diagram of the main flow of a method for determining the center coordinates of a point cloud according to an embodiment of the present disclosure
  • FIG. 5 is a schematic diagram of the main flow of a point cloud coordinate splicing method according to an embodiment of the present disclosure
  • FIG. 6 is a schematic diagram of main modules of an object recognition device according to an embodiment of the disclosure.
  • FIG. 7 shows an exemplary system architecture diagram of an object recognition method or an object recognition device suitable for application in an embodiment of the present disclosure
  • Fig. 8 is a schematic structural diagram of a computer system suitable for implementing a terminal device or a server according to an embodiment of the present disclosure.
  • Fig. 1 is a schematic diagram of the main flow of an object recognition method according to an embodiment of the present disclosure. As shown in Fig. 1, the object recognition method of the present disclosure includes the following steps:
  • Unmanned driving integrates sensors, computers, artificial intelligence, communications, navigation and positioning, pattern recognition, machine vision, intelligent control and other cutting-edge disciplines, and can achieve environmental perception, navigation and positioning, path planning, decision-making control and other goals.
  • Driverless vehicles use sensor technology, signal processing technology, communication technology, and computer technology; by integrating various on-board sensors such as cameras, lidars, ultrasonic sensors, microwave radars, GPS, odometers, and magnetic compasses, they identify the environment and state in which the vehicle is located, analyze and make judgments according to the obtained road information, traffic signal information, vehicle location information, and obstacle information, and control the driving path of the vehicle, thereby realizing human-like driving.
  • Step S101 collecting multi-frame point clouds and multi-frame images in the detection area; wherein, the point cloud includes a plurality of point cloud coordinates, and each frame of the image corresponds to one or more image instances.
  • the point cloud and the image are respectively obtained by the radar sensor and the camera, and the radar sensor and the camera collect synchronously during the driving of the vehicle.
  • the image is the output obtained by feeding the picture captured by the camera into an image instance segmentation model; therefore, each frame of image includes one or more image instances.
  • the object recognition server of the present disclosure is equipped with an image instance segmentation model; wherein, the image instance segmentation model can adopt methods such as Mask-RCNN, RetinaMask, CenterMask, DeepMask, PANet, and YOLACT.
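  • As a non-limiting illustration, the following is a minimal sketch of producing image instances with an off-the-shelf instance segmentation model (Mask R-CNN, one of the methods named above). The torchvision model, weights, and score threshold are assumptions for illustration, not details fixed by this disclosure.

```python
import torch
import torchvision

# Pretrained Mask R-CNN as the image instance segmentation model (assumed choice).
model = torchvision.models.detection.maskrcnn_resnet50_fpn(weights="DEFAULT")
model.eval()

def segment_frame(image_chw: torch.Tensor, score_thresh: float = 0.5):
    """image_chw: float tensor (3, H, W) in [0, 1] from one camera frame.
    Returns per-instance boolean masks and category labels."""
    with torch.no_grad():
        out = model([image_chw])[0]          # dict with boxes, labels, scores, masks
    keep = out["scores"] > score_thresh      # drop low-confidence instances
    masks = out["masks"][keep, 0] > 0.5      # (N, H, W) boolean instance masks
    labels = out["labels"][keep]             # (N,) category ids
    return masks, labels
```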
  • Step S102 constructing a mapping between the point cloud and the image instance according to the preset conversion relationship between the first coordinate system of the point cloud and the second coordinate system of the image.
  • the first coordinate system is a point cloud coordinate system, or called a radar coordinate system;
  • the second coordinate system is a camera coordinate system, or called an image coordinate system.
  • the point cloud includes multiple point cloud coordinates, and the point cloud coordinates are usually three-dimensional coordinates, such as (x, y, z);
  • the image includes multiple pixel points, and the pixel points are usually two-dimensional coordinates.
  • one pixel can usually map to one or more point cloud coordinates.
  • the mapping between point cloud coordinates and pixels is realized through the camera's intrinsic and extrinsic parameters.
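  • A minimal sketch of this conversion is given below: point cloud coordinates in the first (radar) coordinate system are projected into pixels of the second (image) coordinate system using the camera's extrinsic matrix T and intrinsic matrix K. Both matrices are assumed to come from prior sensor calibration; the function and variable names are illustrative only.

```python
import numpy as np

def project_points(points_xyz: np.ndarray, T: np.ndarray, K: np.ndarray):
    """points_xyz: (N, 3) radar-frame coordinates; T: (4, 4) extrinsics
    (radar frame -> camera frame); K: (3, 3) intrinsics.
    Returns (N, 2) pixel coordinates and a mask of points in front of the camera."""
    pts_h = np.hstack([points_xyz, np.ones((len(points_xyz), 1))])  # homogeneous
    cam = (T @ pts_h.T).T[:, :3]      # coordinates in the camera frame
    in_front = cam[:, 2] > 0          # only points with positive depth project
    uv = (K @ cam.T).T
    uv = uv[:, :2] / uv[:, 2:3]       # perspective division -> pixel coordinates
    return uv, in_front
```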
  • Step S103 determine the point cloud cluster corresponding to the image instance and the category coordinates of the point cloud cluster, and determine the center coordinates of the point cloud cluster.
  • the method for determining the point cloud cluster of the present disclosure includes the following steps:
  • Step S201 acquiring the point cloud of the current frame and the image of the previous frame.
  • on the one hand, the method for determining the point cloud cluster in the present disclosure does not need to wait for the image instance segmentation model to finish running in order to obtain the current frame image; it directly obtains the previous frame image and the current frame point cloud for subsequent processing, which ensures the real-time performance of object recognition. On the other hand, the interval between the current frame and the previous frame is negligible; therefore, when converting between the first coordinate system of the point cloud and the second coordinate system of the image, the mapping between the current frame point cloud and the previous frame image remains good.
  • the current frame point cloud and the current frame image can also be used for subsequent processing.
  • in that case, however, the real-time performance is slightly worse, since it is necessary to wait for the processing result of the image instance segmentation model in order to obtain the current frame image.
  • the object recognition server can save the image of each frame, so that the point cloud of the current frame and the image of the previous frame can be obtained in real time when the point cloud cluster is determined.
  • Step S202 in the current frame point cloud, the plurality of point cloud coordinates to which any image instance included in the previous frame image is mapped form a point cloud cluster.
  • Step S2021 project the point cloud of the current frame to the second coordinate system.
  • the point cloud of the current frame is projected into the second coordinate system according to the conversion relationship between the first coordinate system and the second coordinate system.
  • Step S2022 form the point cloud cluster from the projected point cloud coordinates corresponding to the plurality of pixels of any image instance included in the previous frame image.
  • the point cloud cluster includes multiple points corresponding to the pixel points of the image instance.
  • the point cloud cluster corresponding to the image instance can be determined according to the transformation relationship between the point cloud and the coordinate system of the image, so as to facilitate the analysis of the point cloud.
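  • A minimal sketch of step S202, under the same assumptions and reusing project_points from the sketch above: the current frame points whose projections fall inside an instance mask of the previous frame image form that instance's point cloud cluster. The nearest-pixel rounding policy is an assumption.

```python
import numpy as np

def clusters_from_masks(points_xyz, uv, in_front, masks):
    """masks: (N_inst, H, W) boolean instance masks from the previous frame image.
    Returns, per instance, the indices of the point cloud coordinates in its cluster."""
    h, w = masks.shape[1:]
    u = np.round(uv[:, 0]).astype(int)           # pixel column of each point
    v = np.round(uv[:, 1]).astype(int)           # pixel row of each point
    valid = in_front & (u >= 0) & (u < w) & (v >= 0) & (v < h)
    clusters = []
    for mask in masks:
        hit = np.zeros(len(points_xyz), dtype=bool)
        hit[valid] = mask[v[valid], u[valid]]    # projection lands on this instance
        clusters.append(np.flatnonzero(hit))
    return clusters
```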
  • the category coordinates and center coordinates of the point cloud can be determined.
  • the method for determining the category coordinates of the point cloud of the present disclosure includes the following steps:
  • Step S301 obtaining the determination result of the point cloud cluster.
  • one or more point cloud clusters of the current frame point cloud corresponding to one or more image instances of the previous frame image determined in step S202 are obtained.
  • Step S302 according to the determination result of the point cloud cluster, judge whether the point cloud coordinates of the current frame point cloud correspond to the image instance, if yes, go to step S303; if not, go to step S304.
  • Step S303 according to the category information of the image instance included in the previous frame image, determine the category coordinates of the point cloud cluster corresponding to the image instance.
  • the image instance indicates category information.
  • the image is the output result of the image instance segmentation model, which has details such as texture and color and can accurately determine the category information to which the image instance belongs; the category information includes any one of car (car), truck (truck), bus (bus), trailer (trailer), cart (c_vehicle), pedestrian (pedestrian), motorcycle (motorcycle), bicycle (bicycle), traffic cone (traffic_cone), and roadblock (barrier).
  • one category may correspond to one or more image instances, and one image instance may only correspond to one category.
  • the category information is mapped with corresponding binary coordinates
  • the category coordinates of the point cloud cluster are the binary coordinates of the category information of the image instance.
  • the binary coordinates corresponding to the category information are 10-bit binary coordinates.
  • for example, if the category of the image instance is a motorcycle and the corresponding binary coordinates are (0,0,0,0,0,0,1,0,0,0), the category coordinates of the point cloud cluster are (0,0,0,0,0,0,1,0,0,0); for another example, if the category of the image instance is a car and the corresponding binary coordinates are (1,0,0,0,0,0,0,0,0,0), the category coordinates of the point cloud cluster are (1,0,0,0,0,0,0,0,0,0).
  • when assigning category coordinates, either of the following two methods can be used: one point cloud cluster corresponds to one category coordinate; or each point cloud coordinate of a point cloud cluster corresponds to a category coordinate, and the category coordinates are all the same.
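  • A minimal sketch of the category-to-binary-coordinate mapping follows, with the ten categories in the order listed above mapped to 10-dimensional one-hot vectors; the exact ordering and dtype are assumptions.

```python
import numpy as np

CATEGORIES = ["car", "truck", "bus", "trailer", "c_vehicle",
              "pedestrian", "motorcycle", "bicycle", "traffic_cone", "barrier"]

def category_coordinates(category: str) -> np.ndarray:
    """Return the 10-dimensional binary (one-hot) category coordinates."""
    coords = np.zeros(len(CATEGORIES), dtype=np.float32)
    coords[CATEGORIES.index(category)] = 1.0
    return coords

# category_coordinates("motorcycle") -> [0,0,0,0,0,0,1,0,0,0]
```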
  • Step S304 for each point cloud coordinate in the point cloud except the point cloud cluster, determine the category coordinate of the point cloud coordinate as the default category coordinate.
  • the default category coordinates are (0,0,0,0,0,0,0,0,0,0), that is, not belonging to any of the above 10 categories, which can represent the background, blank space, mis-projected points, and so on.
  • each point cloud coordinate corresponds to a default category coordinate.
  • in this way, the category coordinates of the point cloud can be determined by exploiting the complementarity between the feature data of the image and the point cloud: the image has details such as color and texture, while the point cloud has depth information. This expands the data dimensionality of the point cloud from 3 dimensions to 13 dimensions, which is equivalent to providing prior knowledge of the category for the points in the point cloud, thus making the input of the model more multi-dimensional, facilitating the learning of the subsequent detection model, and improving the accuracy of the recognition results output by the model.
  • the method for determining the center coordinates of the point cloud of the present disclosure includes the following steps:
  • Step S401 obtaining the determination result of the point cloud cluster.
  • one or more point cloud clusters of the current frame point cloud corresponding to one or more image instances of the previous frame image determined in step S202 are obtained.
  • Step S402 according to the determination result of the point cloud cluster, judge whether the point cloud coordinates of the current frame point cloud correspond to the image instance, if yes, go to step S403; if not, go to step S404.
  • Step S403 according to the point cloud coordinates of the point cloud cluster, determine the center coordinates of the point cloud cluster corresponding to the image instance.
  • in some embodiments, the per-axis average of the point cloud coordinates of the point cloud cluster is calculated, and this average value is the center coordinates (x_center, y_center, z_center) of the point cloud cluster corresponding to the image instance.
  • when assigning center coordinates, either of the following two methods can be used: each point cloud cluster corresponds to one center coordinate; or each point cloud coordinate of a point cloud cluster corresponds to a center coordinate, and the center coordinates are all the same.
  • Step S404 for each point cloud coordinate in the point cloud except the point cloud cluster, determine the center coordinate of the point cloud coordinate as the default center coordinate.
  • the default center coordinate is (0,0,0), which can represent the background, blank, wrong projection point and so on.
  • each point cloud coordinate corresponds to a default center coordinate.
  • in this way, the center coordinates of the point cloud can be determined, further expanding the data dimensionality of the point cloud from 13 dimensions to 16 dimensions, so that the input of the model is more multi-dimensional, which facilitates the learning of the subsequent detection model and improves the accuracy of the recognition results output by the model.
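  • A minimal sketch of steps S403 and S404, under the assumptions of the earlier sketches: each cluster's center coordinates are the per-axis average of its point cloud coordinates, and points outside every cluster keep the default center (0, 0, 0).

```python
import numpy as np

def center_coordinates(points_xyz: np.ndarray, clusters: list) -> np.ndarray:
    """clusters: per-instance index arrays, e.g. from clusters_from_masks.
    Returns an (N, 3) array of (x_center, y_center, z_center) per point,
    defaulting to zeros for points outside every cluster."""
    centers = np.zeros_like(points_xyz)
    for idx in clusters:
        if len(idx):
            centers[idx] = points_xyz[idx].mean(axis=0)  # cluster centroid
    return centers
```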
  • Step S104 input the point cloud coordinates, category coordinates and center coordinates of the point cloud into a preset detection model to identify the target object of the point cloud.
  • the category coordinates and center coordinates of the point cloud obtained through data-dimension expansion are spliced with the point cloud coordinates of the point cloud; the spliced result is used as the input of the preset detection model, and the target object of the point cloud is identified according to the output of the detection model.
  • the target object includes a bounding box of each object in the detection area and a category of each object.
  • the detection model is a 3D object detection model.
  • the splicing method of the coordinates of the point cloud of the present disclosure includes the following steps:
  • Step S501 obtaining the determination result of the point cloud cluster.
  • one or more point cloud clusters of the current frame point cloud corresponding to one or more image instances of the previous frame image determined in step S202 are obtained.
  • Step S502 according to the determination result of the point cloud cluster, judge whether the point cloud coordinates of the current frame point cloud correspond to the image instance, if yes, go to step S503; if not, go to step S504.
  • Step S503 for each point cloud coordinate of the same point cloud cluster, perform: stitching the category coordinates and center coordinates into the point cloud coordinates.
  • the result of splicing the category coordinates and center coordinates to the point cloud coordinates is (point cloud coordinates (x, y, z), category coordinates (10-bit binary), center coordinates (x_center, y_center, z_center)).
  • the category coordinates and center coordinates can also be spliced into the point cloud cluster.
  • Step S504 for each point cloud coordinate in the point cloud except the point cloud cluster, the default category coordinates and default center coordinates are spliced into the point cloud coordinates.
  • the result of splicing the default category coordinates and default center coordinates to the point cloud coordinates is (x, y, z, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0).
  • the input of the detection model can be determined through the point cloud coordinate splicing method of the present disclosure; the model input is the 16-dimensional coordinates obtained through data-dimension expansion, which makes the detection model easier to train and the recognition results output by the model more accurate.
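  • A minimal sketch of the splicing step under the same assumptions: the 3-dimensional point cloud coordinates, 10-dimensional category coordinates, and 3-dimensional center coordinates are concatenated per point into the 16-dimensional model input; points outside all clusters carry all-zero category and center coordinates.

```python
import numpy as np

def splice_features(points_xyz, category_coords, center_coords):
    """points_xyz: (N, 3); category_coords: (N, 10); center_coords: (N, 3).
    Returns the (N, 16) spliced input for the detection model."""
    features = np.concatenate([points_xyz, category_coords, center_coords], axis=1)
    assert features.shape[1] == 16  # 3 + 10 + 3 dimensions per point
    return features
```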
  • the point cloud can also be denoised by clustering to eliminate the noise introduced by the data-dimension expansion; based on the denoised point cloud, the method for determining the center coordinates of the point cloud cluster in the present disclosure is executed again to update the center coordinates of the point cloud, and the point cloud coordinate splicing method of the present disclosure is then executed with the updated center coordinates.
  • the stitching result determined according to the center coordinates of the updated point cloud is input into the preset detection model, so that the recognition result of the target object is more accurate.
  • the clustering algorithm can use DBSCAN, K-medians, and other methods.
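  • A minimal sketch of the clustering-based noise reduction, using DBSCAN as named above: points labeled as noise are dropped from a cluster before its center coordinates are recomputed. The eps and min_samples values are assumptions to be tuned per sensor.

```python
import numpy as np
from sklearn.cluster import DBSCAN

def denoise_cluster(points_xyz: np.ndarray, idx: np.ndarray) -> np.ndarray:
    """idx: indices of one point cloud cluster.
    Returns the indices kept after discarding DBSCAN noise points."""
    if len(idx) < 2:
        return idx
    labels = DBSCAN(eps=0.8, min_samples=5).fit_predict(points_xyz[idx])
    return idx[labels != -1]  # label -1 marks noise in scikit-learn's DBSCAN
```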
  • through the steps of collecting multiple frames of point clouds and multiple frames of images in the detection area, where the point cloud includes multiple point cloud coordinates and each frame of the image corresponds to one or more image instances; constructing a mapping between the point cloud and the image instances according to a preset conversion relationship between the first coordinate system of the point cloud and the second coordinate system of the image; determining, according to the mapping result, the point cloud cluster corresponding to the image instance and the category coordinates of the point cloud cluster, and determining the center coordinates of the point cloud cluster; and inputting the point cloud coordinates, category coordinates, and center coordinates of the point cloud into a preset detection model to identify the target object of the point cloud, multiple kinds of simulated feature data can be used as the input of the detection model. Because the feature data are complementary, the input of the model is more multi-dimensional, the accuracy of the recognition result determined from the model output is greatly improved, and each target object in the autonomous driving scene can be accurately identified, thereby providing a reference for the simulation of automatic driving.
  • FIG. 6 is a schematic diagram of main modules of an object recognition device according to an embodiment of the present disclosure.
  • the object recognition device 600 of the present disclosure includes:
  • the collection module 601 is configured to collect multiple frames of point clouds and multiple frames of images within the detection area; wherein, the point cloud includes multiple point cloud coordinates, and each frame of the image corresponds to one or more image instances.
  • the collection module 601 is used to collect multi-frame point clouds and multi-frame images in the detection area; the point clouds and images are obtained through the radar sensor and the camera, respectively, which collect synchronously during the driving of the vehicle.
  • the image is the output result of inputting the picture captured by the camera into the image instance segmentation model, therefore, each frame of image includes one or more image instances.
  • the object recognition server of the present disclosure is equipped with an image instance segmentation model; wherein, the image instance segmentation model can adopt methods such as Mask-RCNN, RetinaMask, CenterMask, DeepMask, PANet, and YOLACT.
  • the mapping module 602 is configured to construct a mapping between the point cloud and the image instance according to the preset conversion relationship between the first coordinate system of the point cloud and the second coordinate system of the image.
  • the mapping module 602 is used to construct the point cloud and the image instance according to the preset conversion relationship between the first coordinate system of the point cloud and the second coordinate system of the image mapping.
  • the first coordinate system is a point cloud coordinate system, or called a radar coordinate system;
  • the second coordinate system is a camera coordinate system, or called an image coordinate system.
  • the point cloud includes multiple point cloud coordinates, and the point cloud coordinates are usually three-dimensional coordinates, such as (x, y, z);
  • the image includes multiple pixel points, and the pixel points are usually two-dimensional coordinates.
  • one pixel point can usually map one or more point cloud coordinates.
  • the point cloud realizes the mapping between point cloud coordinates and pixel points through the internal and external parameters of the camera.
  • the data processing module 603 is configured to determine the point cloud cluster corresponding to the image instance and the category coordinates of the point cloud cluster according to the mapping result, and determine the center coordinates of the point cloud cluster.
  • the data processing module 603 is used to determine, on the basis of the mapping result between the point cloud and the image instances, the point cloud cluster corresponding to each image instance in the current frame point cloud according to the point cloud cluster determination method of the present disclosure; a point cloud cluster usually includes multiple point cloud coordinates.
  • the data processing module 603 is further configured to determine the category coordinates of the current frame point cloud according to the method for determining the category coordinates of the point cloud in the present disclosure.
  • the data processing module 603 is further configured to determine the center coordinates of the current frame point cloud according to the method for determining the center coordinates of the point cloud in the present disclosure.
  • the identification module 604 is configured to input the point cloud coordinates, category coordinates and center coordinates of the point cloud into a preset detection model to identify the target object of the point cloud.
  • the identification module 604 is used to splice the category coordinates and center coordinates of the point cloud obtained through data-dimension expansion with the point cloud coordinates of the point cloud, use the spliced result as the input of the preset detection model, and identify the target object of the point cloud according to the output of the detection model.
  • the target object includes a bounding box of each object in the detection area and a category of each object.
  • through the collection module, the mapping module, the data processing module, and the identification module, multiple kinds of simulated feature data can be used as the input of the detection model; because the feature data are complementary, the input of the model is more multi-dimensional, the accuracy of the recognition results determined from the model output is greatly improved, and each target object in the autonomous driving scene can be accurately identified, thereby providing a reference for the simulation of automatic driving.
  • Fig. 7 shows an exemplary system architecture diagram of an object recognition method or an object recognition device applicable to an embodiment of the present disclosure.
  • as shown in FIG. 7, the exemplary system architecture of the object recognition method or object recognition device in an embodiment of the present disclosure includes:
  • a system architecture 700 may include detection devices 701 , 702 , and 703 , a network 704 and a server 705 .
  • the network 704 is used to provide a medium for communication links between the detection devices 701, 702, 703 and the server 705.
  • Network 704 may include various connection types, such as wires, wireless communication links, or fiber optic cables, among others.
  • the detection devices 701, 702, 703 interact with the server 705 through the network 704 to collect or send messages and the like.
  • the detection devices 701, 702, and 703 may be various electronic devices with detection functions, including but not limited to lidar sensors, cameras, and so on.
  • the server 705 may be a server that provides various services, such as a background management server that processes the point clouds and images collected and sent by the detection devices 701, 702, and 703.
  • the background management server can analyze and process the collected data such as point clouds and images, and output the processing results (such as the object to which the point cloud cluster belongs).
  • the object recognition method provided by the embodiment of the present disclosure is generally executed by the server 705 , and correspondingly, the object recognition device is generally disposed in the server 705 .
  • FIG. 8 is a schematic structural diagram of a computer system suitable for implementing a terminal device or a server in an embodiment of the present disclosure.
  • the computer system 800 of a terminal device or a server in an embodiment of the present disclosure includes:
  • a central processing unit (CPU) 801 that can execute various appropriate actions and processes according to programs stored in a read only memory (ROM) 802 or loaded from a storage section 808 into a random access memory (RAM) 803 .
  • in the RAM 803, various programs and data necessary for the operation of the system 800 are also stored.
  • the CPU 801 , ROM 802 , and RAM 803 are connected to each other via a bus 804 .
  • An input/output (I/O) interface 805 is also connected to the bus 804 .
  • the following components are connected to the I/O interface 805: an input section 806 including a keyboard, a mouse, etc.; an output section 807 including a cathode ray tube (CRT), a liquid crystal display (LCD), etc., and a speaker; a storage section 808 including a hard disk, etc.; and a communication section 809 including a network interface card such as a LAN card or a modem.
  • the communication section 809 performs communication processing via a network such as the Internet.
  • a drive 810 is also connected to the I/O interface 805 as needed.
  • a removable medium 811 such as a magnetic disk, an optical disk, a magneto-optical disk, a semiconductor memory, etc., is mounted on the drive 810 as necessary so that a computer program read therefrom is installed into the storage section 808 as necessary.
  • the processes described above with reference to the flowcharts can be implemented as computer software programs.
  • embodiments of the present disclosure include a computer program product, which includes a computer program carried on a computer-readable medium, where the computer program includes program code for executing the methods shown in the flowcharts.
  • the computer program may be downloaded and installed from a network via communication portion 809 and/or installed from removable media 811 .
  • when this computer program is executed by the central processing unit (CPU) 801, the above-mentioned functions defined in the system of the present disclosure are performed.
  • the computer-readable medium shown in the present disclosure may be a computer-readable signal medium or a computer-readable storage medium or any combination of the above two.
  • a computer readable storage medium may be, for example, but not limited to, an electrical, magnetic, optical, electromagnetic, infrared, or semiconductor system, apparatus, or device, or any combination thereof. More specific examples of computer-readable storage media may include, but are not limited to: an electrical connection with one or more wires, a portable computer diskette, a hard disk, random access memory (RAM), read-only memory (ROM), erasable programmable read-only memory (EPROM or flash memory), optical fiber, portable compact disk read-only memory (CD-ROM), an optical storage device, a magnetic storage device, or any suitable combination of the above.
  • a computer-readable storage medium may be any tangible medium that contains or stores a program that can be used by or in conjunction with an instruction execution system, apparatus, or device.
  • a computer-readable signal medium may include a data signal propagated in baseband or as part of a carrier wave, carrying computer-readable program code therein. Such propagated data signals may take many forms, including but not limited to electromagnetic signals, optical signals, or any suitable combination of the foregoing.
  • a computer-readable signal medium may also be any computer-readable medium other than a computer-readable storage medium, which can send, propagate, or transmit a program for use by or in conjunction with an instruction execution system, apparatus, or device.
  • Program code embodied on a computer readable medium may be transmitted using any appropriate medium, including but not limited to wireless, wireline, optical fiber cable, RF, etc., or any suitable combination of the foregoing.
  • each block in a flowchart or block diagram may represent a module, program segment, or portion of code that contains one or more executable instructions for implementing the specified logical function.
  • the functions noted in the block may occur out of the order noted in the figures. For example, two blocks shown in succession may, in fact, be executed substantially concurrently, or they may sometimes be executed in the reverse order, depending upon the functionality involved.
  • each block in the block diagrams or flowchart illustrations, and combinations of blocks therein, can be implemented by a dedicated hardware-based system that performs the specified function or operation, or by a combination of dedicated hardware and computer instructions.
  • the modules involved in the embodiments described in the present disclosure may be implemented by software or by hardware.
  • the described modules can also be set in a processor; for example, it can be described as: a processor includes an acquisition module, a mapping module, a data processing module, and an identification module.
  • the names of these modules do not constitute a limitation of the module itself under certain circumstances.
  • for example, the recognition module can also be described as "a module for inputting the point cloud coordinates, category coordinates, and center coordinates of the point cloud into a preset detection model to identify target objects of the point cloud".
  • the present disclosure also provides a computer-readable medium, which may be included in the device described in the above embodiments, or may exist independently without being assembled into the device.
  • the above-mentioned computer-readable medium carries one or more programs; when the one or more programs are executed by the device, the device is caused to: collect multiple frames of point clouds and multiple frames of images in the detection area, where the point cloud includes a plurality of point cloud coordinates and each frame of the image corresponds to one or more image instances; construct a mapping between the point cloud and the image instances according to a preset conversion relationship between the first coordinate system of the point cloud and the second coordinate system of the image; determine, according to the mapping result, the point cloud cluster corresponding to the image instance and the category coordinates of the point cloud cluster, and determine the center coordinates of the point cloud cluster; and input the point cloud coordinates, category coordinates, and center coordinates of the point cloud into a preset detection model to identify the target object of the point cloud.
  • in existing target detection methods, the output results of multiple models (such as image-based 3D target detection, point cloud-based 3D target detection, image-based semantic segmentation, image-based instance segmentation, and point cloud-based semantic segmentation) are fused to determine the final perception result; or multiple models are fused and the final perception result is determined from the fused model. Because the fusion logic over the output results is rough, or the input features of multi-model fusion cannot be aligned, the accuracy of the target recognition results is low and their value as a simulation reference for autonomous driving is small.
  • in the present disclosure, the problem of feature alignment between modalities is solved, and the complementary relationship between point clouds and images is fully utilized to optimize the input of the 3D detection model, greatly improving the accuracy of the model while keeping its operation real-time.
  • point cloud data lacks details such as color and texture but is very accurate in depth (distance); image data lacks depth information but is very accurate in texture and color. The two are therefore complementary.
  • the accuracy of the recognition result is greatly improved.
  • the 16-dimensional coordinates are used as input to improve the accuracy of the model while promoting the convergence of the model.
  • using the current frame point cloud together with the previous frame image greatly improves the operating efficiency of the model.
  • multiple kinds of simulated feature data can be used as the input of the detection model; because the feature data are complementary, the input of the model is more multi-dimensional, the accuracy of the recognition result determined from the model output is greatly improved, and each target object in the autonomous driving scene can be accurately identified, thereby providing a reference for the simulation of automatic driving.

Landscapes

  • Engineering & Computer Science (AREA)
  • Remote Sensing (AREA)
  • Radar, Positioning & Navigation (AREA)
  • Physics & Mathematics (AREA)
  • Electromagnetism (AREA)
  • Computer Networks & Wireless Communication (AREA)
  • General Physics & Mathematics (AREA)
  • Image Analysis (AREA)

Abstract

An object recognition method and apparatus, relating to the technical field of computers. A specific embodiment of the method comprises: collecting multiple point clouds and multiple images in a detection area; constructing a mapping between the point clouds and image instances according to a preset conversion relation between a first coordinate system of the point clouds and a second coordinate system of the images; determining, according to the mapping result, a point cloud cluster corresponding to the image instances and category coordinates of the point cloud cluster, and determining center coordinates of the point cloud cluster; and inputting point cloud coordinates, category coordinates, and center coordinates of the point clouds into a preset detection model to recognize target objects of the point clouds. According to the method, various simulated feature data can be used as the input of the detection model, and the feature data are complementary, so that the input of the model is more multi-dimensional, the accuracy of the recognition result determined from the model output is greatly improved, and each target object in an autonomous driving scenario can be accurately recognized.

Description

Object recognition method and apparatus

Cross-Reference to Related Applications

This application claims priority to Chinese patent application No. 202210147726.2, entitled "Object recognition method and apparatus" and filed on February 17, 2022, the content disclosed in which is hereby incorporated by reference in its entirety as part or all of this application.

Technical Field

The present disclosure relates to the technical field of automatic driving, and in particular to an object recognition method and device.

Background

Autonomous driving combines artificial intelligence, machine vision, radar, navigation and positioning, communication, and other technologies so that a vehicle can drive automatically and safely under a computer control system without any active human operation.

When performing target detection, existing autonomous driving systems usually fuse the learning results of multiple machine learning models to determine the final target, or fuse multiple machine learning models and determine the final target from the output of the fused model, in order to identify targets more accurately.

Among existing target recognition results, those obtained by fusing the learning results of multiple models rely on relatively rough fusion logic, which makes the final recognition result inaccurate; alternatively, when the final target is determined from the output of a fused model, the feature data of the fused models cannot be fully aligned, which likewise lowers the accuracy of the final target, sometimes below that of a single model.
Summary of the Invention

In view of this, embodiments of the present disclosure provide an object recognition method and device that can use multiple kinds of simulated feature data as the input of a detection model. Because the feature data are complementary, the input of the model is more multi-dimensional, the accuracy of the recognition result determined from the model output is greatly improved, and each target object in the automatic driving scene can be accurately identified, thereby providing a reference for the simulation of automatic driving.

To achieve the above object, according to one aspect of the embodiments of the present disclosure, an object recognition method is provided, including: collecting multiple frames of point clouds and multiple frames of images in a detection area, where the point cloud includes multiple point cloud coordinates and each frame of the image corresponds to one or more image instances; constructing a mapping between the point cloud and the image instances according to a preset conversion relationship between a first coordinate system of the point cloud and a second coordinate system of the image; determining, according to the mapping result, the point cloud cluster corresponding to the image instance and the category coordinates of the point cloud cluster, and determining the center coordinates of the point cloud cluster; and inputting the point cloud coordinates, category coordinates, and center coordinates of the point cloud into a preset detection model to identify the target object of the point cloud.

According to one or more embodiments of the present disclosure, determining the point cloud cluster corresponding to the image instance includes: obtaining the current frame point cloud and the previous frame image; and, in the current frame point cloud, forming the point cloud cluster from the multiple point cloud coordinates to which any image instance included in the previous frame image is mapped.

According to one or more embodiments of the present disclosure, forming the point cloud cluster from the multiple point cloud coordinates to which any image instance included in the previous frame image is mapped includes: projecting the current frame point cloud to the second coordinate system; and forming the point cloud cluster from the projected point cloud coordinates corresponding to the multiple pixels of any image instance included in the previous frame image.

According to one or more embodiments of the present disclosure, the image instance indicates category information, and determining the category coordinates of the point cloud cluster includes: determining the category coordinates of the point cloud cluster corresponding to the image instance according to the category information of the image instance included in the previous frame image.

According to one or more embodiments of the present disclosure, the method further includes splicing the point cloud coordinates, category coordinates, and center coordinates of the point cloud; and inputting the point cloud coordinates, category coordinates, and center coordinates of the point cloud into the preset detection model includes inputting the spliced result into the detection model.

According to one or more embodiments of the present disclosure, splicing the point cloud coordinates, category coordinates, and center coordinates of the point cloud includes: for each point cloud coordinate of the same point cloud cluster, splicing the category coordinates and the center coordinates to that point cloud coordinate.

According to one or more embodiments of the present disclosure, the method further includes: for the point cloud coordinates in the point cloud other than those of the point cloud clusters, determining the category coordinates and center coordinates of those point cloud coordinates to be default category coordinates and default center coordinates, and splicing the default category coordinates and the default center coordinates to those point cloud coordinates.

According to one or more embodiments of the present disclosure, the method further includes denoising the point cloud by clustering, and the step of determining the center coordinates of the point cloud clusters is performed on the denoised point cloud.

According to one or more embodiments of the present disclosure, the point cloud and the image are obtained by a radar sensor and a camera, respectively, and the radar sensor and the camera collect synchronously.

According to one or more embodiments of the present disclosure, the category information is mapped to corresponding binary coordinates, and determining the category coordinates of the point cloud cluster corresponding to the image instance includes determining the binary coordinates of the category information of the image instance as the category coordinates of the point cloud cluster.

According to still another aspect of the embodiments of the present disclosure, an object recognition device is provided, including: a collection module configured to collect multiple frames of point clouds and multiple frames of images in a detection area, where the point cloud includes multiple point cloud coordinates and each frame of the image corresponds to one or more image instances; a mapping module configured to construct a mapping between the point cloud and the image instances according to a preset conversion relationship between a first coordinate system of the point cloud and a second coordinate system of the image; a data processing module configured to determine, according to the mapping result, the point cloud cluster corresponding to the image instance and the category coordinates of the point cloud cluster, and to determine the center coordinates of the point cloud cluster; and a recognition module configured to input the point cloud coordinates, category coordinates, and center coordinates of the point cloud into a preset detection model to identify the target object of the point cloud.

According to another aspect of the embodiments of the present disclosure, an object recognition electronic device is provided, including: one or more processors; and a storage device configured to store one or more programs which, when executed by the one or more processors, cause the one or more processors to implement the object recognition method provided in the present disclosure.

According to yet another aspect of the embodiments of the present disclosure, a computer-readable medium is provided, on which a computer program is stored; when the program is executed by a processor, the object recognition method provided in the present disclosure is implemented.

An embodiment of the above disclosure has the following advantages or beneficial effects: by exploiting the real-time availability of the current frame point cloud and the previous frame image, the current frame point cloud is converted to the camera coordinate system, thereby obtaining the category coordinates of the point cloud cluster corresponding to the image instance; the center coordinates of the point cloud cluster are calculated, the point cloud coordinates, category coordinates, and center coordinates are spliced as the input of the detection model, and the target object is identified from the output of the detection model. This overcomes the technical problem that existing target recognition results have low accuracy and cannot provide a reference for the simulation of automatic driving, and achieves the technical effect that multiple kinds of simulated feature data can be used as the input of the detection model; because the feature data are complementary, the input of the model is more multi-dimensional, the accuracy of the recognition result determined from the model output is greatly improved, and each target object in the automatic driving scene can be accurately identified, thereby providing a reference for the simulation of automatic driving.

Further effects of the above non-conventional alternatives will be described below in conjunction with specific embodiments.
附图说明Description of drawings
附图用于更好地理解本公开,不构成对本公开的不当限定。其中:The accompanying drawings are for better understanding of the present disclosure, and do not constitute an improper limitation of the present disclosure. in:
FIG. 1 is a schematic diagram of the main flow of an object recognition method according to an embodiment of the present disclosure;
FIG. 2 is a schematic diagram of the main flow of a method for determining a point cloud cluster according to an embodiment of the present disclosure;
FIG. 3 is a schematic diagram of the main flow of a method for determining the category coordinates of a point cloud according to an embodiment of the present disclosure;
FIG. 4 is a schematic diagram of the main flow of a method for determining the center coordinates of a point cloud according to an embodiment of the present disclosure;
FIG. 5 is a schematic diagram of the main flow of a method for concatenating the coordinates of a point cloud according to an embodiment of the present disclosure;
FIG. 6 is a schematic diagram of the main modules of an object recognition apparatus according to an embodiment of the present disclosure;
FIG. 7 shows an exemplary system architecture diagram to which the object recognition method or object recognition apparatus of an embodiment of the present disclosure may be applied;
FIG. 8 is a schematic structural diagram of a computer system suitable for implementing a terminal device or a server according to an embodiment of the present disclosure.
Detailed Description
Exemplary embodiments of the present disclosure are described below in conjunction with the accompanying drawings, including various details of the embodiments of the present disclosure to facilitate understanding, which should be regarded as merely exemplary. Accordingly, those of ordinary skill in the art will recognize that various changes and modifications can be made to the embodiments described herein without departing from the scope and spirit of the present disclosure. Likewise, descriptions of well-known functions and constructions are omitted in the following description for clarity and conciseness.
FIG. 1 is a schematic diagram of the main flow of an object recognition method according to an embodiment of the present disclosure. As shown in FIG. 1, the object recognition method of the present disclosure includes the following steps:
Unmanned driving integrates multiple frontier disciplines such as sensors, computers, artificial intelligence, communications, navigation and positioning, pattern recognition, machine vision, and intelligent control, and can achieve goals such as environment perception, navigation and positioning, path planning, and decision-making control.
Driverless vehicles use sensor technology, signal processing technology, communication technology, computer technology and the like, and identify the environment and state of the vehicle by integrating various on-board sensors such as cameras, lidars, ultrasonic sensors, microwave radars, GPS, odometers, and magnetic compasses. They analyze and make judgments based on the obtained road information, traffic signal information, vehicle position information, and obstacle information, and control the driving path of the vehicle, thereby realizing human-like driving.
Step S101: collect multiple frames of point clouds and multiple frames of images in the detection area; wherein the point cloud includes a plurality of point cloud coordinates, and each frame of the image corresponds to one or more image instances.
In the embodiments of the present disclosure, the point cloud and the image are obtained by a radar sensor and a camera, respectively, and the radar sensor and the camera collect data synchronously while the vehicle is driving.
In the embodiments of the present disclosure, the image is the result output after the picture captured by the camera is input into an image instance segmentation model; therefore, each frame of the image includes one or more image instances. The object recognition server of the present disclosure is equipped with an image instance segmentation model, which may adopt methods such as Mask-RCNN, RetinaMask, CenterMask, DeepMask, PANet, or YOLACT.
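By way of illustration only and not as part of the original disclosure, the following minimal sketch shows one way to obtain per-frame instance masks with one of the listed methods (Mask-RCNN, here via torchvision); the score and mask thresholds are assumed values:

```python
import torch
import torchvision

# A pre-trained Mask-RCNN as a stand-in for the image instance segmentation model.
model = torchvision.models.detection.maskrcnn_resnet50_fpn(weights="DEFAULT")
model.eval()

def segment(image_tensor, score_thresh=0.5):
    """Return boolean instance masks and labels for one CxHxW image scaled to [0, 1]."""
    with torch.no_grad():
        out = model([image_tensor])[0]
    keep = out["scores"] > score_thresh
    masks = out["masks"][keep, 0] > 0.5   # N x H x W boolean masks, one per image instance
    return masks, out["labels"][keep]
```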
Step S102: construct a mapping between the point cloud and the image instances according to a preset conversion relationship between a first coordinate system of the point cloud and a second coordinate system of the image.
In the embodiments of the present disclosure, the first coordinate system is the point cloud coordinate system, also called the radar coordinate system; the second coordinate system is the camera coordinate system, also called the image coordinate system. The point cloud includes a plurality of point cloud coordinates, which are usually three-dimensional, for example (x, y, z); the image includes a plurality of pixels, which are usually two-dimensional.
In the embodiments of the present disclosure, because of the difference in dimensionality between point cloud coordinates and pixels, one pixel can usually be mapped to one or more point cloud coordinates. When converting between the first coordinate system and the second coordinate system, the point cloud coordinates are mapped to pixels through the intrinsic and extrinsic parameters of the camera.
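A minimal sketch of this coordinate mapping, offered for illustration only (the matrices T_cam_lidar and K are assumptions standing in for the calibrated extrinsic and intrinsic parameters):

```python
import numpy as np

def project_points(points_lidar, T_cam_lidar, K):
    """Project Nx3 lidar points into pixel coordinates via camera extrinsics/intrinsics."""
    n = points_lidar.shape[0]
    pts_h = np.hstack([points_lidar, np.ones((n, 1))])  # homogeneous lidar coordinates
    pts_cam = (T_cam_lidar @ pts_h.T).T[:, :3]          # first coordinate system -> camera
    in_front = pts_cam[:, 2] > 0                        # keep points in front of the camera
    uvw = (K @ pts_cam[in_front].T).T                   # perspective projection
    uv = uvw[:, :2] / uvw[:, 2:3]                       # pixel coordinates (u, v)
    return uv, in_front
```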
Step S103: according to the result of the mapping, determine the point cloud cluster corresponding to the image instance and the category coordinates of the point cloud cluster, and determine the center coordinates of the point cloud cluster.
In the embodiments of the present disclosure, as shown in FIG. 2, the method for determining a point cloud cluster of the present disclosure includes the following steps:
Step S201: acquire the current-frame point cloud and the previous-frame image.
In the embodiments of the present disclosure, on the one hand, because the image is the output of the image instance segmentation model, obtaining the image requires a certain processing time. Therefore, the method for determining a point cloud cluster of the present disclosure does not need to wait for the image instance segmentation model to run in order to obtain the current-frame image; it directly acquires the previous-frame image and the current-frame point cloud for subsequent processing, which ensures the real-time performance of object recognition. On the other hand, the time interval between the current frame and the previous frame is negligible; therefore, when converting between the first coordinate system of the point cloud and the second coordinate system of the image, the mapping relationship between the current-frame point cloud and the previous-frame image remains good.
Further, the current-frame point cloud and the current-frame image may also be used for subsequent processing. However, compared with using the current-frame point cloud and the previous-frame image, obtaining the current-frame point cloud and the current-frame image is slightly less real-time, because it is necessary to wait for the processing result of the image instance segmentation model to obtain the current-frame image.
In the embodiments of the present disclosure, the object recognition server may save the image of each frame, so that the current-frame point cloud and the previous-frame image can be obtained in real time when determining the point cloud cluster.
Step S202: in the current-frame point cloud, the plurality of point cloud coordinates to which any image instance included in the previous-frame image is mapped form a point cloud cluster.
Step S2021: project the current-frame point cloud into the second coordinate system.
In the embodiments of the present disclosure, the current-frame point cloud is projected into the second coordinate system according to the conversion relationship between the first coordinate system and the second coordinate system.
Step S2022: the projected point cloud coordinates corresponding to the plurality of pixels of any image instance included in the previous-frame image form a point cloud cluster.
In the embodiments of the present disclosure, a point cloud cluster includes a plurality of points corresponding to the pixels of an image instance.
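Continuing the illustrative sketch above (reusing project_points; the instance mask is assumed to come from the previous-frame segmentation result), the cluster formation can be expressed as:

```python
def cluster_for_instance(points_lidar, instance_mask, T_cam_lidar, K):
    """Gather the point cloud coordinates whose projections fall on an instance's pixels."""
    uv, in_front = project_points(points_lidar, T_cam_lidar, K)
    cols = np.round(uv[:, 0]).astype(int)
    rows = np.round(uv[:, 1]).astype(int)
    h, w = instance_mask.shape
    inside = (rows >= 0) & (rows < h) & (cols >= 0) & (cols < w)
    on_instance = np.zeros(len(uv), dtype=bool)
    on_instance[inside] = instance_mask[rows[inside], cols[inside]]
    idx = np.where(in_front)[0][on_instance]   # indices back into the original point cloud
    return points_lidar[idx], idx
```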
In the embodiments of the present disclosure, through the method for determining a point cloud cluster of the present disclosure, the point cloud cluster corresponding to an image instance can be determined according to the conversion relationship between the coordinate systems of the point cloud and the image, which facilitates analysis of the point cloud; subsequently, the category coordinates and center coordinates of the point cloud can be determined.
In the embodiments of the present disclosure, as shown in FIG. 3, the method for determining the category coordinates of a point cloud of the present disclosure includes the following steps:
Step S301: obtain the determination result of the point cloud clusters.
In the embodiments of the present disclosure, the one or more point cloud clusters of the current-frame point cloud corresponding to the one or more image instances of the previous-frame image, as determined in step S202, are obtained.
Step S302: according to the determination result of the point cloud clusters, judge whether a point cloud coordinate of the current-frame point cloud corresponds to an image instance; if yes, go to step S303; if not, go to step S304.
Step S303: determine the category coordinates of the point cloud cluster corresponding to the image instance according to the category information of the image instance included in the previous-frame image.
In the embodiments of the present disclosure, the image instance indicates category information. The image is the output of the image instance segmentation model and has details such as texture and color, so the category information to which an image instance belongs can be accurately determined. The category information includes any one of car, truck, bus, trailer, cart (c_vehicle), pedestrian, motorcycle, bicycle, traffic cone (traffic_cone), and barrier.
In the embodiments of the present disclosure, one category may correspond to one or more image instances, and one image instance corresponds to only one category.
Further, each piece of category information is mapped to corresponding binary coordinates, and the category coordinates of a point cloud cluster are the binary coordinates of the category information of the image instance. Corresponding to the above 10 categories of image instances, the binary coordinates corresponding to the category information are 10-bit binary coordinates. For example, if the category of an image instance is motorcycle, the corresponding binary coordinates are (0,0,0,0,0,0,1,0,0,0), so the category coordinates of the point cloud cluster are (0,0,0,0,0,0,1,0,0,0); as another example, if the category of an image instance is car, the corresponding binary coordinates are (1,0,0,0,0,0,0,0,0,0), so the category coordinates of the point cloud cluster are (1,0,0,0,0,0,0,0,0,0).
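A minimal sketch of this one-hot encoding, for illustration only (the category ordering is an assumption inferred from the two examples above):

```python
CATEGORIES = ["car", "truck", "bus", "trailer", "c_vehicle",
              "pedestrian", "motorcycle", "bicycle", "traffic_cone", "barrier"]

def category_coordinates(category):
    """Return the 10-bit binary category coordinates; unknown names get the all-zero default."""
    coords = [0] * len(CATEGORIES)
    if category in CATEGORIES:
        coords[CATEGORIES.index(category)] = 1
    return coords

# category_coordinates("motorcycle") -> [0, 0, 0, 0, 0, 0, 1, 0, 0, 0]
```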
In the embodiments of the present disclosure, when determining the category coordinates of the point cloud cluster corresponding to the image instance, either of the following two manners may be used: one point cloud cluster corresponds to one set of category coordinates; or each point cloud coordinate of a point cloud cluster corresponds to one set of category coordinates, and the category coordinates are all the same.
Step S304: for each point cloud coordinate in the point cloud other than the point cloud clusters, determine the category coordinates of the point cloud coordinate to be the default category coordinates.
In the embodiments of the present disclosure, the default category coordinates are (0,0,0,0,0,0,0,0,0,0), i.e. not belonging to any of the above 10 categories, which can represent background, blanks, mis-projected points, and the like. For the point cloud coordinates in the point cloud other than the point cloud clusters, each point cloud coordinate corresponds to one set of default category coordinates.
In the embodiments of the present disclosure, through the method for determining the category coordinates of a point cloud of the present disclosure, the category coordinates of the point cloud can be determined. The complementarity between the feature data of the image and the point cloud is exploited: the image has details such as color and texture, while the point cloud has depth details. The data dimensionality of the point cloud is thus expanded from 3 dimensions to 13 dimensions, which is equivalent to providing prior knowledge of the category for the points in the point cloud, making the input of the model more multi-dimensional, facilitating the learning of the subsequent detection model, and improving the accuracy of the recognition results output by the model.
In the embodiments of the present disclosure, as shown in FIG. 4, the method for determining the center coordinates of a point cloud of the present disclosure includes the following steps:
Step S401: obtain the determination result of the point cloud clusters.
In the embodiments of the present disclosure, the one or more point cloud clusters of the current-frame point cloud corresponding to the one or more image instances of the previous-frame image, as determined in step S202, are obtained.
Step S402: according to the determination result of the point cloud clusters, judge whether a point cloud coordinate of the current-frame point cloud corresponds to an image instance; if yes, go to step S403; if not, go to step S404.
Step S403: determine the center coordinates of the point cloud cluster corresponding to the image instance according to the point cloud coordinates of the point cloud cluster.
In the embodiments of the present disclosure, the average of the plurality of point cloud coordinates (x_1, y_1, z_1), (x_2, y_2, z_2), ..., (x_n, y_n, z_n) of the point cloud cluster is determined, and this average is the center coordinate (x_center, y_center, z_center) of the point cloud cluster corresponding to the image instance.
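Illustratively, the averaging can be written as follows (a sketch only; the default center for non-cluster points is included for completeness):

```python
DEFAULT_CENTER = (0.0, 0.0, 0.0)   # for background, blanks, mis-projected points

def cluster_center(cluster_points):
    """Mean of the cluster's Nx3 coordinates: (x_center, y_center, z_center)."""
    return tuple(np.asarray(cluster_points, dtype=float).mean(axis=0))
```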
In the embodiments of the present disclosure, when determining the center coordinates of the point cloud cluster corresponding to the image instance, either of the following two manners may be used: one point cloud cluster corresponds to one center coordinate; or each point cloud coordinate of a point cloud cluster corresponds to one center coordinate, and the center coordinates are all the same.
Step S404: for each point cloud coordinate in the point cloud other than the point cloud clusters, determine the center coordinate of the point cloud coordinate to be the default center coordinate.
In the embodiments of the present disclosure, the default center coordinate is (0,0,0), which can represent background, blanks, mis-projected points, and the like. For the point cloud coordinates in the point cloud other than the point cloud clusters, each point cloud coordinate corresponds to one default center coordinate.
In the embodiments of the present disclosure, through the method for determining the center coordinates of a point cloud of the present disclosure, the center coordinates of the point cloud can be determined, so that the data dimensionality of the point cloud is further expanded from 13 dimensions to 16 dimensions, which makes the input of the model more multi-dimensional, facilitates the learning of the subsequent detection model, and improves the accuracy of the recognition results output by the model.
Step S104: input the point cloud coordinates, category coordinates and center coordinates of the point cloud into a preset detection model to identify the target object of the point cloud.
In the embodiments of the present disclosure, the category coordinates and center coordinates of the point cloud obtained after the data dimension expansion are concatenated with the point cloud coordinates of the point cloud, the concatenated result is used as the input of the preset detection model, and the target object of the point cloud is identified according to the output of the detection model.
In the embodiments of the present disclosure, the target object includes the bounding box (box) of each object in the detection area and the category of each object.
In the embodiments of the present disclosure, the detection model is a 3D object detection model.
In the embodiments of the present disclosure, as shown in FIG. 5, the method for concatenating the coordinates of a point cloud of the present disclosure includes the following steps:
Step S501: obtain the determination result of the point cloud clusters.
In the embodiments of the present disclosure, the one or more point cloud clusters of the current-frame point cloud corresponding to the one or more image instances of the previous-frame image, as determined in step S202, are obtained.
Step S502: according to the determination result of the point cloud clusters, judge whether a point cloud coordinate of the current-frame point cloud corresponds to an image instance; if yes, go to step S503; if not, go to step S504.
Step S503: for each point cloud coordinate of the same point cloud cluster, perform: concatenating the category coordinates and the center coordinates to the point cloud coordinate.
In the embodiments of the present disclosure, the result of concatenating the category coordinates and the center coordinates to a point cloud coordinate is (point cloud coordinates (x, y, z), category coordinates (10-bit binary), center coordinates (x_center, y_center, z_center)).
In the embodiments of the present disclosure, alternatively, for the same point cloud cluster, the category coordinates and the center coordinates may be concatenated to the point cloud cluster as a whole.
Step S504: for each point cloud coordinate in the point cloud other than the point cloud clusters, concatenate the default category coordinates and the default center coordinates to the point cloud coordinate.
In the embodiments of the present disclosure, the result of concatenating the default category coordinates and the default center coordinates to a point cloud coordinate is (x, y, z, 0,0,0,0,0,0,0,0,0,0, 0,0,0).
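Putting the pieces together, a sketch of building the 16-dimensional model input, reusing the illustrative helpers above (the (indices, category) cluster representation is an assumption):

```python
def build_features(points_lidar, clusters):
    """Concatenate (x, y, z) + 10-bit category coordinates + 3-D cluster center per point."""
    n = points_lidar.shape[0]
    features = np.zeros((n, 16), dtype=np.float32)
    features[:, :3] = points_lidar                 # non-cluster points keep the all-zero defaults
    for idx, category in clusters:                 # clusters: list of (point indices, category name)
        features[idx, 3:13] = category_coordinates(category)
        features[idx, 13:16] = cluster_center(points_lidar[idx])
    return features
```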
In the embodiments of the present disclosure, through the method for concatenating the coordinates of a point cloud of the present disclosure, the input of the detection model can be determined. The model input is the 16-dimensional coordinates obtained after the data dimension expansion, which makes the learning of the detection model easier and makes the recognition results output by the model more accurate.
Further, noise reduction may also be performed on the point cloud by means of clustering, so as to remove the noise introduced by the data dimension expansion. Based on the noise-reduced point cloud, the method for determining the center coordinates of a point cloud cluster of the present disclosure is performed to update the center coordinates of the point cloud, and the method for concatenating the coordinates of a point cloud of the present disclosure is performed according to the updated center coordinates of the point cloud. The concatenated result determined according to the updated center coordinates of the point cloud is input into the preset detection model, making the recognition result of the target object more accurate. The clustering algorithm may adopt methods such as DBSCAN or K-median.
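A sketch of such clustering-based noise reduction with DBSCAN, for illustration only (eps and min_samples are assumed values that would need tuning):

```python
from sklearn.cluster import DBSCAN

def denoise_cluster(cluster_points, eps=0.5, min_samples=5):
    """Drop DBSCAN outliers (label -1) before recomputing the cluster center."""
    labels = DBSCAN(eps=eps, min_samples=min_samples).fit_predict(cluster_points)
    kept = cluster_points[labels != -1]
    return kept if len(kept) else cluster_points   # fall back if everything was flagged as noise
```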
In the embodiments of the present disclosure, through the steps of collecting multiple frames of point clouds and multiple frames of images in the detection area, where the point cloud includes a plurality of point cloud coordinates and each frame of the image corresponds to one or more image instances; constructing a mapping between the point cloud and the image instances according to a preset conversion relationship between a first coordinate system of the point cloud and a second coordinate system of the image; determining, according to the result of the mapping, the point cloud cluster corresponding to the image instance and the category coordinates of the point cloud cluster, and determining the center coordinates of the point cloud cluster; and inputting the point cloud coordinates, category coordinates and center coordinates of the point cloud into a preset detection model to identify the target object of the point cloud, multiple modalities of feature data can be used as the input of the detection model. Because the feature data are complementary, the input of the model becomes more multi-dimensional, the accuracy of the recognition results determined from the model output is greatly improved, and each target object in the autonomous driving scene can be accurately identified, thereby providing a reference for the simulation of autonomous driving.
FIG. 6 is a schematic diagram of the main modules of an object recognition apparatus according to an embodiment of the present disclosure. As shown in FIG. 6, the object recognition apparatus 600 of the present disclosure includes:
a collection module 601, configured to collect multiple frames of point clouds and multiple frames of images in the detection area; wherein the point cloud includes a plurality of point cloud coordinates, and each frame of the image corresponds to one or more image instances.
In the embodiments of the present disclosure, the collection module 601 is configured to collect multiple frames of point clouds and multiple frames of images in the detection area; the point cloud and the image are obtained by a radar sensor and a camera, respectively, and the radar sensor and the camera collect data synchronously while the vehicle is driving.
In the embodiments of the present disclosure, the image is the result output after the picture captured by the camera is input into an image instance segmentation model; therefore, each frame of the image includes one or more image instances. The object recognition server of the present disclosure is equipped with an image instance segmentation model, which may adopt methods such as Mask-RCNN, RetinaMask, CenterMask, DeepMask, PANet, or YOLACT.
a mapping module 602, configured to construct a mapping between the point cloud and the image instances according to a preset conversion relationship between the first coordinate system of the point cloud and the second coordinate system of the image.
In the embodiments of the present disclosure, the mapping module 602 is configured to construct the mapping between the point cloud and the image instances according to the preset conversion relationship between the first coordinate system of the point cloud and the second coordinate system of the image. The first coordinate system is the point cloud coordinate system, also called the radar coordinate system; the second coordinate system is the camera coordinate system, also called the image coordinate system. The point cloud includes a plurality of point cloud coordinates, which are usually three-dimensional, for example (x, y, z); the image includes a plurality of pixels, which are usually two-dimensional.
In the embodiments of the present disclosure, because of the difference in dimensionality between point cloud coordinates and pixels, one pixel can usually be mapped to one or more point cloud coordinates. When converting between the first coordinate system and the second coordinate system, the point cloud coordinates are mapped to pixels through the intrinsic and extrinsic parameters of the camera.
a data processing module 603, configured to determine, according to the result of the mapping, the point cloud cluster corresponding to the image instance and the category coordinates of the point cloud cluster, and determine the center coordinates of the point cloud cluster.
In the embodiments of the present disclosure, the data processing module 603 is configured to determine, according to the method for determining a point cloud cluster of the present disclosure and on the basis of the mapping result between the point cloud and the image instances, the point cloud cluster in the current-frame point cloud corresponding to the image instance; a point cloud cluster usually includes a plurality of point cloud coordinates.
In the embodiments of the present disclosure, the data processing module 603 is further configured to determine the category coordinates of the current-frame point cloud according to the method for determining the category coordinates of a point cloud of the present disclosure.
In the embodiments of the present disclosure, the data processing module 603 is further configured to determine the center coordinates of the current-frame point cloud according to the method for determining the center coordinates of a point cloud of the present disclosure.
an identification module 604, configured to input the point cloud coordinates, category coordinates and center coordinates of the point cloud into a preset detection model to identify the target object of the point cloud.
In the embodiments of the present disclosure, the identification module 604 is configured to concatenate the category coordinates and center coordinates of the point cloud obtained after the data dimension expansion with the point cloud coordinates of the point cloud, use the concatenated result as the input of the preset detection model, and identify the target object of the point cloud according to the output of the detection model.
In the embodiments of the present disclosure, the target object includes the bounding box (box) of each object in the detection area and the category of each object.
In the embodiments of the present disclosure, through the collection module, the mapping module, the data processing module, the identification module and other modules, multiple modalities of feature data can be used as the input of the detection model. Because the feature data are complementary, the input of the model becomes more multi-dimensional, the accuracy of the recognition results determined from the model output is greatly improved, and each target object in the autonomous driving scene can be accurately identified, thereby providing a reference for the simulation of autonomous driving.
FIG. 7 shows an exemplary system architecture diagram to which the object recognition method or object recognition apparatus of an embodiment of the present disclosure may be applied. As shown in FIG. 7, the exemplary system architecture of the object recognition method or object recognition apparatus of the embodiment of the present disclosure includes:
As shown in FIG. 7, the system architecture 700 may include detection devices 701, 702 and 703, a network 704 and a server 705. The network 704 is a medium used to provide communication links between the detection devices 701, 702, 703 and the server 705. The network 704 may include various connection types, such as wired or wireless communication links, or fiber-optic cables.
The detection devices 701, 702, 703 interact with the server 705 through the network 704 to collect or send messages and the like.
The detection devices 701, 702, 703 may be various electronic devices with detection functions, including but not limited to lidar sensors, cameras, and the like.
The server 705 may be a server providing various services, for example a background management server that supports the collected point clouds and images sent by the detection devices 701, 702, 703. The background management server may analyze and otherwise process the collected data such as point clouds and images, and output the processing results (for example, the object to which a point cloud cluster belongs).
It should be noted that the object recognition method provided by the embodiments of the present disclosure is generally executed by the server 705; correspondingly, the object recognition apparatus is generally disposed in the server 705.
It should be understood that the numbers of detection devices, networks and servers in FIG. 7 are merely illustrative. There may be any number of detection devices, networks and servers according to implementation requirements.
FIG. 8 is a schematic structural diagram of a computer system suitable for implementing a terminal device or a server according to an embodiment of the present disclosure. As shown in FIG. 8, the computer system 800 of the terminal device or server of the embodiment of the present disclosure includes:
a central processing unit (CPU) 801, which can execute various appropriate actions and processes according to a program stored in a read-only memory (ROM) 802 or a program loaded from a storage section 808 into a random access memory (RAM) 803. The RAM 803 also stores various programs and data necessary for the operation of the system 800. The CPU 801, the ROM 802 and the RAM 803 are connected to one another via a bus 804. An input/output (I/O) interface 805 is also connected to the bus 804.
The following components are connected to the I/O interface 805: an input section 806 including a keyboard, a mouse and the like; an output section 807 including a cathode ray tube (CRT), a liquid crystal display (LCD) and the like, as well as a speaker; a storage section 808 including a hard disk and the like; and a communication section 809 including a network interface card such as a LAN card or a modem. The communication section 809 performs communication processing via a network such as the Internet. A drive 810 is also connected to the I/O interface 805 as needed. A removable medium 811, such as a magnetic disk, an optical disk, a magneto-optical disk or a semiconductor memory, is mounted on the drive 810 as needed, so that a computer program read therefrom is installed into the storage section 808 as needed.
In particular, according to the disclosed embodiments of the present disclosure, the processes described above with reference to the flowcharts may be implemented as computer software programs. For example, the disclosed embodiments of the present disclosure include a computer program product, which includes a computer program carried on a computer-readable medium, the computer program containing program code for executing the methods shown in the flowcharts. In such an embodiment, the computer program may be downloaded and installed from a network via the communication section 809 and/or installed from the removable medium 811. When the computer program is executed by the central processing unit (CPU) 801, the above-described functions defined in the system of the present disclosure are performed.
It should be noted that the computer-readable medium shown in the present disclosure may be a computer-readable signal medium or a computer-readable storage medium or any combination of the two. A computer-readable storage medium may be, for example but not limited to, an electrical, magnetic, optical, electromagnetic, infrared or semiconductor system, apparatus or device, or any combination of the above. More specific examples of computer-readable storage media may include, but are not limited to: an electrical connection with one or more wires, a portable computer diskette, a hard disk, a random access memory (RAM), a read-only memory (ROM), an erasable programmable read-only memory (EPROM or flash memory), an optical fiber, a portable compact disk read-only memory (CD-ROM), an optical storage device, a magnetic storage device, or any suitable combination of the above. In the present disclosure, a computer-readable storage medium may be any tangible medium that contains or stores a program that can be used by or in combination with an instruction execution system, apparatus or device. In the present disclosure, a computer-readable signal medium may include a data signal propagated in baseband or as part of a carrier wave, carrying computer-readable program code therein. Such a propagated data signal may take many forms, including but not limited to an electromagnetic signal, an optical signal, or any suitable combination of the above. A computer-readable signal medium may also be any computer-readable medium other than a computer-readable storage medium, which can send, propagate or transmit a program for use by or in combination with an instruction execution system, apparatus or device. The program code contained on the computer-readable medium may be transmitted using any appropriate medium, including but not limited to wireless, wireline, optical fiber cable, RF and the like, or any suitable combination of the above.
The flowcharts and block diagrams in the accompanying drawings illustrate the architecture, functionality and operation of possible implementations of systems, methods and computer program products according to various embodiments of the present disclosure. In this regard, each block in a flowchart or block diagram may represent a module, a program segment or a portion of code, which contains one or more executable instructions for implementing the specified logical functions. It should also be noted that, in some alternative implementations, the functions noted in the blocks may occur in an order different from that noted in the drawings. For example, two blocks shown in succession may in fact be executed substantially in parallel, or they may sometimes be executed in the reverse order, depending upon the functionality involved. It should also be noted that each block in the block diagrams or flowcharts, and combinations of blocks in the block diagrams or flowcharts, can be implemented by a dedicated hardware-based system that performs the specified functions or operations, or by a combination of dedicated hardware and computer instructions.
The modules involved in the embodiments described in the present disclosure may be implemented by software or by hardware. The described modules may also be provided in a processor; for example, it may be described as: a processor includes a collection module, a mapping module, a data processing module and an identification module. The names of these modules do not, under certain circumstances, constitute a limitation of the modules themselves; for example, the identification module may also be described as "a module that inputs the point cloud coordinates, category coordinates and center coordinates of the point cloud into a preset detection model and identifies the target object of the point cloud".
As another aspect, the present disclosure also provides a computer-readable medium, which may be included in the device described in the above embodiments, or may exist separately without being assembled into the device. The above computer-readable medium carries one or more programs, and when the one or more programs are executed by the device, the device is caused to: collect multiple frames of point clouds and multiple frames of images in the detection area, where the point cloud includes a plurality of point cloud coordinates and each frame of the image corresponds to one or more image instances; construct a mapping between the point cloud and the image instances according to a preset conversion relationship between a first coordinate system of the point cloud and a second coordinate system of the image; determine, according to the result of the mapping, the point cloud cluster corresponding to the image instance and the category coordinates of the point cloud cluster, and determine the center coordinates of the point cloud cluster; and input the point cloud coordinates, category coordinates and center coordinates of the point cloud into a preset detection model to identify the target object of the point cloud.
When performing target detection, existing autonomous driving systems usually fuse the output results of multiple models (such as image-based 3D object detection, point-cloud-based 3D object detection, image-based semantic segmentation, image-based instance segmentation, and point-cloud-based semantic segmentation) to determine the final perception result, or fuse multiple models to determine the final perception result. Because the fusion logic of the output results is coarse, or the input features of multi-model fusion cannot be aligned, existing target detection methods produce target recognition results of low accuracy, which are of little significance as a reference for autonomous driving simulation.
According to the technical solutions of the embodiments of the present disclosure, the multi-modal fusion based on radar point clouds and camera images solves the problem of feature alignment between modalities, makes full use of the complementary relationship between point clouds and images, optimizes the input of the 3D detection model, substantially improves the accuracy of the model, and allows the model to run in real time.
According to the technical solutions of the embodiments of the present disclosure, point cloud data has no details such as color and texture but is very accurate in depth (distance), while image data has no depth details but is very accurate in texture and color. Making full use of the data complementarity between the two modalities, the point cloud and the image instance segmentation results are fused, and the category information of the image instances is provided to the point cloud according to the coordinate correspondence between the two, which makes the learning of the 3D detection model easier. Compared with the existing 3-dimensional point cloud coordinate input, the accuracy of the recognition results is substantially improved. With the addition of the center coordinates of the point cloud cluster corresponding to the image instance, 16-dimensional coordinates are used as the input, which not only improves the accuracy of the model but also promotes model convergence; on the basis of using the current-frame point cloud and the previous-frame image, the operating efficiency of the model is greatly improved.
According to the technical solutions of the embodiments of the present disclosure, multiple modalities of feature data can be used as the input of the detection model. Because the feature data are complementary, the input of the model becomes more multi-dimensional, the accuracy of the recognition results determined from the model output is greatly improved, and each target object in the autonomous driving scene can be accurately identified, thereby providing a reference for the simulation of autonomous driving.
The specific implementations described above do not limit the protection scope of the present disclosure. Those skilled in the art should understand that various modifications, combinations, sub-combinations and substitutions may occur depending on design requirements and other factors. Any modifications, equivalent replacements, improvements and the like made within the spirit and principles of the present disclosure shall be included within the protection scope of the present disclosure.

Claims (13)

  1. An object recognition method, comprising:
    collecting multiple frames of point clouds and multiple frames of images in a detection area; wherein the point cloud comprises a plurality of point cloud coordinates, and each frame of the image corresponds to one or more image instances;
    constructing a mapping between the point cloud and the image instances according to a preset conversion relationship between a first coordinate system of the point cloud and a second coordinate system of the image;
    determining, according to a result of the mapping, a point cloud cluster corresponding to the image instance and category coordinates of the point cloud cluster, and determining center coordinates of the point cloud cluster;
    inputting the point cloud coordinates, the category coordinates and the center coordinates of the point cloud into a preset detection model to identify a target object of the point cloud.
  2. The method according to claim 1, wherein the determining the point cloud cluster corresponding to the image instance comprises:
    acquiring a current-frame point cloud and a previous-frame image;
    in the current-frame point cloud, forming the point cloud cluster from a plurality of point cloud coordinates to which any image instance included in the previous-frame image is mapped.
  3. The method according to claim 2, wherein the forming the point cloud cluster from the plurality of point cloud coordinates to which any image instance included in the previous-frame image is mapped comprises:
    projecting the current-frame point cloud into the second coordinate system;
    forming the point cloud cluster from the projected point cloud coordinates corresponding to a plurality of pixels of any image instance included in the previous-frame image.
  4. The method according to claim 2, wherein the image instance indicates category information; and the determining the category coordinates of the point cloud cluster comprises:
    determining the category coordinates of the point cloud cluster corresponding to the image instance according to the category information of the image instance included in the previous-frame image.
  5. The method according to claim 1, further comprising:
    concatenating the point cloud coordinates, the category coordinates and the center coordinates of the point cloud;
    wherein the inputting the point cloud coordinates, the category coordinates and the center coordinates of the point cloud into the preset detection model comprises:
    inputting a concatenated result into the detection model.
  6. The method according to claim 5, wherein the concatenating the point cloud coordinates, the category coordinates and the center coordinates of the point cloud comprises:
    for each point cloud coordinate of a same point cloud cluster, performing:
    concatenating the category coordinates and the center coordinates to the point cloud coordinate.
  7. The method according to claim 6, further comprising:
    for point cloud coordinates in the point cloud other than the point cloud cluster, determining the category coordinates and the center coordinates of the point cloud coordinates to be default category coordinates and default center coordinates respectively, and concatenating the default category coordinates and the default center coordinates to the point cloud coordinates.
  8. The method according to claim 1, further comprising:
    performing noise reduction processing on the point cloud by means of clustering;
    performing, based on the point cloud after the noise reduction processing, the step of determining the center coordinates of the point cloud cluster.
  9. The method according to claim 2, wherein the point cloud and the image are obtained by a radar sensor and a camera respectively, and the radar sensor and the camera collect synchronously.
  10. The method according to claim 4, wherein the category information is mapped to corresponding binary coordinates; and determining the category coordinates of the point cloud cluster corresponding to the image instance comprises:
    determining the binary coordinates of the category information of the image instance to be the category coordinates of the point cloud cluster.
  11. An object recognition apparatus, comprising:
    a collection module, configured to collect multiple frames of point clouds and multiple frames of images in a detection area; wherein the point cloud comprises a plurality of point cloud coordinates, and each frame of the image corresponds to one or more image instances;
    a mapping module, configured to construct a mapping between the point cloud and the image instances according to a preset conversion relationship between a first coordinate system of the point cloud and a second coordinate system of the image;
    a data processing module, configured to determine, according to a result of the mapping, a point cloud cluster corresponding to the image instance and category coordinates of the point cloud cluster, and determine center coordinates of the point cloud cluster;
    an identification module, configured to input the point cloud coordinates, the category coordinates and the center coordinates of the point cloud into a preset detection model to identify a target object of the point cloud.
  12. An electronic device for object recognition, comprising:
    one or more processors;
    a storage apparatus, configured to store one or more programs,
    wherein when the one or more programs are executed by the one or more processors, the one or more processors are caused to implement the method according to any one of claims 1-10.
  13. A computer-readable medium, on which a computer program is stored, wherein when the program is executed by a processor, the method according to any one of claims 1-10 is implemented.
PCT/CN2022/139873 2022-02-17 2022-12-19 Object recognition method and apparatus WO2023155580A1 (en)

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
CN202210147726.2 2022-02-17
CN202210147726.2A CN114550116A (en) 2022-02-17 2022-02-17 Object identification method and device

Publications (1)

Publication Number Publication Date
WO2023155580A1 true WO2023155580A1 (en) 2023-08-24

Family

ID=81675563

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/CN2022/139873 WO2023155580A1 (en) 2022-02-17 2022-12-19 Object recognition method and apparatus

Country Status (2)

Country Link
CN (1) CN114550116A (en)
WO (1) WO2023155580A1 (en)

Families Citing this family (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN114550116A (en) * 2022-02-17 2022-05-27 京东鲲鹏(江苏)科技有限公司 Object identification method and device

Citations (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20190287254A1 (en) * 2018-03-16 2019-09-19 Honda Motor Co., Ltd. Lidar noise removal using image pixel clusterings
CN111340797A (en) * 2020-03-10 2020-06-26 山东大学 Laser radar and binocular camera data fusion detection method and system
CN111709343A (en) * 2020-06-09 2020-09-25 广州文远知行科技有限公司 Point cloud detection method and device, computer equipment and storage medium
CN113887376A (en) * 2021-09-27 2022-01-04 中汽创智科技有限公司 Target detection method, device, medium and equipment
CN114550116A (en) * 2022-02-17 2022-05-27 京东鲲鹏(江苏)科技有限公司 Object identification method and device

Cited By (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN117152199A (en) * 2023-08-30 2023-12-01 成都信息工程大学 Dynamic target motion vector estimation method, system, equipment and storage medium
CN117152199B (en) * 2023-08-30 2024-05-31 成都信息工程大学 Dynamic target motion vector estimation method, system, equipment and storage medium

Also Published As

Publication number Publication date
CN114550116A (en) 2022-05-27

Similar Documents

Publication Publication Date Title
EP3779358B1 (en) Map element extraction method and apparatus
EP3627180B1 (en) Sensor calibration method and device, computer device, medium, and vehicle
EP3505866B1 (en) Method and apparatus for creating map and positioning moving entity
WO2023155580A1 (en) Object recognition method and apparatus
US20210302585A1 (en) Smart navigation method and system based on topological map
CN115540896B (en) Path planning method and device, electronic equipment and computer readable medium
EP3822852B1 (en) Method, apparatus, computer storage medium and program for training a trajectory planning model
JP7440005B2 (en) High-definition map creation method, apparatus, device and computer program
US20220343758A1 (en) Data Transmission Method and Apparatus
EP4155679A2 (en) Positioning method and apparatus based on lane line and feature point
WO2021017072A1 (en) Laser radar-based slam closed-loop detection method and detection system
WO2022222647A1 (en) Method and apparatus for predicting vehicle intention, device, and storage medium
EP4105600A2 (en) Method for automatically producing map data, related apparatus and computer program product
CN115879060B (en) Multi-mode-based automatic driving perception method, device, equipment and medium
WO2023155581A1 (en) Image detection method and apparatus
WO2023088486A1 (en) Lane line extraction method and apparatus, vehicle and storage medium
CN115339453A (en) Vehicle lane change decision information generation method, device, equipment and computer medium
CN114972758A (en) Instance segmentation method based on point cloud weak supervision
CN113378605A (en) Multi-source information fusion method and device, electronic equipment and storage medium
CN113932796A (en) High-precision map lane line generation method and device and electronic equipment
CN115866229B (en) Viewing angle conversion method, device, equipment and medium for multi-viewing angle image
CN115965961B (en) Local-global multi-mode fusion method, system, equipment and storage medium
CN115937449A (en) High-precision map generation method and device, electronic equipment and storage medium
CN116664498A (en) Training method of parking space detection model, parking space detection method, device and equipment
CN114724116B (en) Vehicle traffic information generation method, device, equipment and computer readable medium

Legal Events

Date Code Title Description
121 Ep: the epo has been informed by wipo that ep was designated in this application

Ref document number: 22926880

Country of ref document: EP

Kind code of ref document: A1