CN114550116A - Object identification method and device - Google Patents

Object identification method and device

Info

Publication number
CN114550116A
Authority
CN
China
Prior art keywords
point cloud
coordinates
image
category
coordinate
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202210147726.2A
Other languages
Chinese (zh)
Inventor
张宝丰
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Jingdong Kunpeng Jiangsu Technology Co Ltd
Original Assignee
Jingdong Kunpeng Jiangsu Technology Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Jingdong Kunpeng Jiangsu Technology Co Ltd filed Critical Jingdong Kunpeng Jiangsu Technology Co Ltd
Priority to CN202210147726.2A priority Critical patent/CN114550116A/en
Publication of CN114550116A publication Critical patent/CN114550116A/en
Priority to PCT/CN2022/139873 priority patent/WO2023155580A1/en
Pending legal-status Critical Current

Classifications

    • GPHYSICS
    • G01MEASURING; TESTING
    • G01SRADIO DIRECTION-FINDING; RADIO NAVIGATION; DETERMINING DISTANCE OR VELOCITY BY USE OF RADIO WAVES; LOCATING OR PRESENCE-DETECTING BY USE OF THE REFLECTION OR RERADIATION OF RADIO WAVES; ANALOGOUS ARRANGEMENTS USING OTHER WAVES
    • G01S13/00Systems using the reflection or reradiation of radio waves, e.g. radar systems; Analogous systems using reflection or reradiation of waves whose nature or wavelength is irrelevant or unspecified
    • G01S13/88Radar or analogous systems specially adapted for specific applications
    • G01S13/89Radar or analogous systems specially adapted for specific applications for mapping or imaging

Landscapes

  • Engineering & Computer Science (AREA)
  • Remote Sensing (AREA)
  • Radar, Positioning & Navigation (AREA)
  • Physics & Mathematics (AREA)
  • Electromagnetism (AREA)
  • Computer Networks & Wireless Communication (AREA)
  • General Physics & Mathematics (AREA)
  • Image Analysis (AREA)

Abstract

The invention discloses an object identification method and device, and relates to the field of computer technology. A specific implementation of the method comprises the following steps: collecting multiple frames of point clouds and multiple frames of images in a detection area; constructing a mapping between the point cloud and an image instance according to a preset conversion relation between a first coordinate system of the point cloud and a second coordinate system of the image; determining, according to the mapping result, a point cloud cluster corresponding to the image instance and the category coordinates of the point cloud cluster, and determining the center coordinates of the point cloud cluster; and inputting the point cloud coordinates, the category coordinates and the center coordinates of the point cloud into a preset detection model to identify a target object of the point cloud. This embodiment can use feature data from multiple modalities as the input of the detection model; because the feature data are complementary, the model input is more multidimensional, the accuracy of the recognition result determined from the model output is greatly improved, and each target object in an autonomous driving scene can be accurately identified.

Description

Object identification method and device
Technical Field
The invention relates to the technical field of automatic driving, in particular to an object identification method and device.
Background
Automatic driving combines artificial intelligence, machine vision, radar, navigation positioning, communication and other technologies so that a vehicle can run automatically and safely under a computer control system, without any active human operation.
When existing automatic driving systems detect a target, in order to identify it more accurately, the learning results of multiple machine learning models are generally fused to determine the final target; alternatively, multiple machine learning models are fused, and the final target is determined from the output of the fused model.
In existing target recognition, either the result is obtained by fusing the learning results of multiple models, in which case the fusion logic is coarse and the final recognition result is inaccurate; or the final target is determined from the output of a fused model, in which case the feature data of the fused models cannot be fully aligned, so the accuracy of the final target is also low, sometimes even lower than the recognition result of a single model.
Disclosure of Invention
In view of this, embodiments of the present invention provide an object identification method and apparatus, which can use feature data from multiple modalities as the input of a detection model. Because the feature data are complementary, the model input is more multidimensional, the accuracy of the recognition result determined from the model output is greatly improved, and each target object in an autonomous driving scene can be accurately identified, thereby providing a reference for the simulation of automatic driving.
To achieve the above object, according to an aspect of an embodiment of the present invention, there is provided an object recognition method including: collecting multiple frames of point clouds and multiple frames of images in a detection area; wherein the point cloud comprises a plurality of point cloud coordinates, and each frame of the image corresponds to one or more image instances; constructing mapping of the point cloud and the image instance according to a preset transformation relation between a first coordinate system of the point cloud and a second coordinate system of the image; according to the mapping result, determining a point cloud cluster corresponding to the image instance and the category coordinates of the point cloud cluster, and determining the center coordinates of the point cloud cluster; and inputting the point cloud coordinates, the category coordinates and the center coordinates of the point cloud into a preset detection model, and identifying a target object of the point cloud.
Optionally, the determining a point cloud cluster corresponding to the image instance includes: acquiring a current frame point cloud and a previous frame image; in the current frame point cloud, a plurality of point cloud coordinates mapped by any image instance included in the previous frame image form the point cloud cluster.
Optionally, the step of forming the point cloud cluster by the plurality of point cloud coordinates to which any image instance included in the previous frame of image is mapped includes: projecting the current frame point cloud to the second coordinate system; and forming the point cloud cluster by point cloud coordinates corresponding to a plurality of pixel points of any image instance included in the previous frame of image after projection.
Optionally, the image instance indicates category information; the determining the category coordinates of the point cloud cluster comprises: and determining the category coordinates of the point cloud cluster corresponding to the image example according to the category information of the image example included in the previous frame of image.
Optionally, the method further comprises: splicing the point cloud coordinates, the category coordinates and the center coordinates of the point cloud; the method for inputting the point cloud coordinates, the category coordinates and the center coordinates of the point cloud into a preset detection model comprises the following steps: and inputting the spliced result into the detection model.
Optionally, the stitching the point cloud coordinates, the category coordinates, and the center coordinates of the point cloud includes: executing the following steps for each point cloud coordinate of the same point cloud cluster:
and splicing the category coordinate and the center coordinate to the point cloud coordinate.
Optionally, the method further comprises: and determining category coordinates and center coordinates of the point cloud coordinates except the point cloud cluster as default category coordinates and default center coordinates respectively, and splicing the default category coordinates and the default center coordinates to the point cloud coordinates.
Optionally, the method further comprises: carrying out noise reduction processing on the point cloud in a clustering mode;
and executing the step of determining the center coordinates of the point cloud cluster based on the point cloud after the noise reduction treatment.
Optionally, the point cloud and the image are obtained through a radar sensor and a camera respectively, wherein the radar sensor and the camera acquire data synchronously.
Optionally, the category information is mapped with corresponding binary coordinates; determining category coordinates of a point cloud cluster corresponding to the image instance, comprising: and determining the binary coordinate of the category information of the image instance as the category coordinate of the point cloud cluster.
According to still another aspect of an embodiment of the present invention, there is provided an object recognition apparatus including: the acquisition module is used for acquiring multi-frame point clouds and multi-frame images in the detection area; wherein the point cloud comprises a plurality of point cloud coordinates, and each frame of the image corresponds to one or more image instances; the mapping module is used for constructing mapping between the point cloud and the image instance according to a preset conversion relation between a first coordinate system of the point cloud and a second coordinate system of the image; the data processing module is used for determining a point cloud cluster corresponding to the image instance and the category coordinates of the point cloud cluster according to the mapping result, and determining the center coordinates of the point cloud cluster; and the identification module is used for inputting the point cloud coordinates, the category coordinates and the center coordinates of the point cloud into a preset detection model and identifying the target object of the point cloud.
According to another aspect of an embodiment of the present invention, there is provided an object recognition electronic device including:
one or more processors;
a storage device for storing one or more programs,
when the one or more programs are executed by the one or more processors, the one or more processors implement the object recognition method provided by the present invention.
According to still another aspect of embodiments of the present invention, there is provided a computer-readable medium on which a computer program is stored, the program, when executed by a processor, implementing an object recognition method provided by the present invention.
One embodiment of the above invention has the following advantages or benefits: by exploiting the real-time availability of the current frame point cloud and the previous frame image, the current frame point cloud is converted into the camera coordinate system, thereby obtaining the category coordinates of the point cloud cluster corresponding to the image instance; the center coordinates of the point cloud cluster are calculated; the point cloud coordinates, category coordinates and center coordinates are stitched together as the input of a detection model; and the target object is identified from the output of the detection model. This overcomes the technical problems that existing target recognition results have low accuracy and cannot provide a reference for the simulation of automatic driving. Feature data from multiple modalities serve as the input of the detection model, and because the feature data are complementary, the model input is more multidimensional, the accuracy of the recognition result determined from the model output is greatly improved, and each target object in an autonomous driving scene can be accurately identified, thereby providing a reference for the simulation of automatic driving.
Further effects of the above-mentioned non-conventional alternatives will be described below in connection with the embodiments.
Drawings
The drawings are included to provide a better understanding of the invention and are not to be construed as unduly limiting the invention. Wherein:
fig. 1 is a schematic diagram of a main flow of an object recognition method according to an embodiment of the present invention;
FIG. 2 is a schematic diagram of a main flow of a method of determining a point cloud cluster according to an embodiment of the invention;
FIG. 3 is a schematic diagram of a main flow of a method of determining category coordinates of a point cloud according to an embodiment of the invention;
fig. 4 is a schematic diagram of a main flow of a method of determining center coordinates of a point cloud according to an embodiment of the present invention;
FIG. 5 is a schematic diagram of a main flow of a method for stitching coordinates of a point cloud according to an embodiment of the present invention;
fig. 6 is a schematic diagram of main blocks of an object recognition apparatus according to an embodiment of the present invention;
fig. 7 shows an exemplary system architecture diagram of an object recognition method or object recognition apparatus suitable for application to embodiments of the present invention;
fig. 8 is a schematic structural diagram of a computer system suitable for implementing a terminal device or a server according to an embodiment of the present invention.
Detailed Description
Exemplary embodiments of the present invention are described below with reference to the accompanying drawings, in which various details of embodiments of the invention are included to assist understanding, and which are to be considered as merely exemplary. Accordingly, those of ordinary skill in the art will recognize that various changes and modifications of the embodiments described herein can be made without departing from the scope and spirit of the invention. Also, descriptions of well-known functions and constructions are omitted in the following description for clarity and conciseness.
Fig. 1 is a schematic diagram of a main flow of an object identification method according to an embodiment of the present invention, and as shown in fig. 1, the object identification method of the present invention includes the following steps:
the unmanned driving integrates multiple leading-edge subjects such as sensors, computers, artificial intelligence, communication, navigation positioning, mode recognition, machine vision, intelligent control and the like, and can realize the goals of environment perception, navigation positioning, path planning, decision control and the like.
The unmanned automobile utilizes sensor technology, signal processing technology, communication technology, computer technology and the like, identifies the environment and the state of the automobile by integrating various vehicle-mounted sensors such as a camera, a laser radar, an ultrasonic sensor, a microwave radar, a GPS, a speedometer, a magnetic compass and the like, analyzes and judges according to the obtained road information, traffic signal information, vehicle position information and obstacle information, controls the driving path of the automobile, and realizes the anthropomorphic driving.
Step S101, collecting multiple frames of point clouds and multiple frames of images in a detection area; wherein the point cloud comprises a plurality of point cloud coordinates, and each frame of the image corresponds to one or more image instances.
In the embodiment of the invention, the point cloud and the image are obtained through the radar sensor and the camera respectively, and the radar sensor and the camera acquire data synchronously during the driving of the vehicle.
In the embodiment of the invention, each image is the result of inputting the picture captured by the camera into an image instance segmentation model, so each frame of image includes one or more image instances. The object recognition server of the invention is loaded with an image instance segmentation model; the instance segmentation model can adopt methods such as Mask-RCNN, RetinaMask, CenterMask, DeepMask, PANet and YOLACT.
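For illustration only, the following is a minimal sketch of how such per-frame image instances might be produced with an off-the-shelf instance segmentation model; torchvision's Mask R-CNN and the 0.5 thresholds are assumptions for this sketch, not a method prescribed by the patent.

```python
import torch
import torchvision

# Pre-trained Mask R-CNN as one possible instance segmentation model
# (the text also lists RetinaMask, CenterMask, DeepMask, PANet, YOLACT).
model = torchvision.models.detection.maskrcnn_resnet50_fpn(pretrained=True)
model.eval()

def segment_frame(image_chw: torch.Tensor, score_thresh: float = 0.5):
    """image_chw: float tensor of shape (3, H, W), values in [0, 1].
    Returns per-instance binary masks (N, H, W) and class labels (N,)."""
    with torch.no_grad():
        out = model([image_chw])[0]
    keep = out["scores"] > score_thresh          # drop low-confidence instances
    masks = out["masks"][keep, 0] > 0.5          # soft masks -> binary masks
    labels = out["labels"][keep]
    return masks, labels
```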
And S102, constructing mapping of the point cloud and the image instance according to a preset conversion relation between a first coordinate system of the point cloud and a second coordinate system of the image.
In the embodiment of the invention, the first coordinate system is the point cloud coordinate system, also called the radar coordinate system; the second coordinate system is the camera coordinate system, also called the image coordinate system. The point cloud comprises a plurality of point cloud coordinates, which are typically three-dimensional, such as (x, y, z); the image comprises a plurality of pixel points, whose coordinates are typically two-dimensional.
In the embodiment of the present invention, due to this dimensional difference between point cloud coordinates and pixel points, one pixel point may generally map to one or more point cloud coordinates. When converting between the first coordinate system and the second coordinate system, the mapping between point cloud coordinates and pixel points is realized through the intrinsic and extrinsic parameters of the camera.
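As a hedged sketch of this first-to-second coordinate system conversion, the projection below assumes a standard pinhole camera model; K (3x3 camera intrinsics) and T (4x4 radar-to-camera extrinsics) are hypothetical calibration inputs.

```python
import numpy as np

def project_points(points_xyz: np.ndarray, K: np.ndarray, T: np.ndarray):
    """points_xyz: (N, 3) coordinates in the first (radar) coordinate system.
    Returns (N, 2) pixel coordinates in the second (camera/image) coordinate
    system and a mask of points lying in front of the camera."""
    pts_h = np.hstack([points_xyz, np.ones((len(points_xyz), 1))])  # (N, 4)
    cam = (T @ pts_h.T).T[:, :3]          # extrinsics: radar -> camera frame
    in_front = cam[:, 2] > 0              # only points with positive depth
    uv = (K @ cam.T).T                    # intrinsics: perspective projection
    uv = uv[:, :2] / uv[:, 2:3]           # divide by depth to get pixels
    return uv, in_front
```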
Step S103, according to the mapping result, determining a point cloud cluster corresponding to the image instance and the category coordinates of the point cloud cluster, and determining the center coordinates of the point cloud cluster.
In the embodiment of the present invention, as shown in fig. 2, the method for determining a point cloud cluster of the present invention includes the following steps:
step S201, a current frame point cloud and a previous frame image are acquired.
In the embodiment of the invention, on the one hand, because the image is the output of the image instance segmentation model and obtaining it requires a certain processing time, the point cloud cluster determination method of the invention does not need to wait for the segmentation model to produce the current frame image: it can directly take the previous frame image together with the current frame point cloud for subsequent processing, which guarantees the real-time performance of object identification. On the other hand, the interval between the current frame and the previous frame is negligible, so the mapping between the current frame point cloud and the previous frame image remains good when converting between the first coordinate system of the point cloud and the second coordinate system of the image.
Alternatively, the current frame point cloud and the current frame image could be used for subsequent processing; however, compared with using the current frame point cloud and the previous frame image, real-time performance is slightly worse, because obtaining the current frame image requires waiting for the processing result of the image instance segmentation model.
In the embodiment of the invention, the object recognition server can store the image of each frame, so that the current frame point cloud and the previous frame image can be acquired in real time when the point cloud cluster is determined.
Step S202, in the current frame point cloud, a plurality of point cloud coordinates mapped by any image example included in the previous frame image form a point cloud cluster.
Step S2021, projecting the current frame point cloud to a second coordinate system.
In the embodiment of the invention, the current frame point cloud is projected into the second coordinate system according to the conversion relation between the first coordinate system and the second coordinate system.
Step S2022, forming point cloud coordinates corresponding to the plurality of pixel points of any image instance included in the previous frame image after projection into a point cloud cluster.
In an embodiment of the invention, the point cloud cluster includes a plurality of points corresponding to pixel points of the image instance.
In the embodiment of the invention, the point cloud cluster determination method can determine the point cloud cluster corresponding to an image instance according to the conversion relation between the coordinate systems of the point cloud and the image, which facilitates subsequent analysis of the point cloud and determination of its category coordinates and center coordinates.
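Continuing the sketch above (same assumed helpers, with the masks taken as numpy boolean arrays), step S202 could be realized as follows: the current frame point cloud is projected with project_points, and the points that fall inside each instance mask of the previous frame image form one point cloud cluster.

```python
import numpy as np

def clusters_from_masks(points_xyz, uv, in_front, masks):
    """masks: list of (H, W) boolean instance masks from the previous frame.
    Returns one (M_i, 3) array of point cloud coordinates per image instance."""
    H, W = masks[0].shape
    cols = np.round(uv[:, 0]).astype(int)
    rows = np.round(uv[:, 1]).astype(int)
    # Only projected points that land inside the image can be assigned.
    valid = in_front & (cols >= 0) & (cols < W) & (rows >= 0) & (rows < H)
    clusters = []
    for mask in masks:
        hit = valid.copy()
        hit[valid] = mask[rows[valid], cols[valid]]  # inside this instance?
        clusters.append(points_xyz[hit])
    return clusters
```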
In the embodiment of the present invention, as shown in fig. 3, the method for determining the category coordinates of the point cloud of the present invention includes the following steps:
step S301, obtaining a determination result of the point cloud cluster.
In the embodiment of the present invention, one or more point cloud clusters of the current frame point cloud corresponding to one or more image instances of the previous frame image determined in step S202 are obtained.
Step S302, judging whether the point cloud coordinate of the current frame point cloud corresponds to the image instance or not according to the determination result of the point cloud cluster, and if so, turning to step S303; if not, go to step S304.
Step S303, determining the category coordinates of the point cloud cluster corresponding to the image instance according to the category information of the image instance included in the previous frame of image.
In the embodiment of the present invention, the image instance indicates category information. The image is the result output by the instance segmentation model and carries details such as texture and color, so the category information of the image instance can be determined accurately; the category information includes any one of car (car), truck (truck), bus (bus), trailer (trailer), construction vehicle (c_vehicle), pedestrian (pedestrian), motorcycle (motorcycle), bicycle (bicycle), traffic cone (traffic_cone) and barrier (barrier).
In embodiments of the present invention, one category may correspond to one or more image instances, and one image instance may correspond to only one category.
Further, each kind of category information is mapped to a corresponding binary coordinate, and the category coordinate of the point cloud cluster is the binary coordinate of the category information of the image instance. With 10 categories of image instances, the binary coordinate is a 10-dimensional one-hot coordinate. For example, if the category of the image instance is motorcycle, the corresponding binary coordinate is (0,0,0,0,0,0,1,0,0,0), and the category coordinate of the point cloud cluster is (0,0,0,0,0,0,1,0,0,0); for another example, if the category of the image instance is car, the corresponding binary coordinate is (1,0,0,0,0,0,0,0,0,0), and the category coordinate of the point cloud cluster is (1,0,0,0,0,0,0,0,0,0).
In the embodiment of the present invention, when determining the category coordinates of the point cloud cluster corresponding to the image instance, the point cloud cluster may be determined in the following two ways: one point cloud cluster corresponds to one category coordinate; or each point cloud coordinate of one point cloud cluster corresponds to one category coordinate, and all the category coordinates are the same.
Step S304, determining the category coordinate of the point cloud coordinate as a default category coordinate for each point cloud coordinate except the point cloud cluster in the point cloud.
In the embodiment of the present invention, the default category coordinate is (0,0,0,0,0,0,0,0,0,0); that is, it does not belong to any of the above 10 categories and can represent the background, blank areas, erroneous projection points, and the like. For point cloud coordinates outside the point cloud clusters, each point cloud coordinate corresponds to the default category coordinate.
In the embodiment of the invention, the category coordinate determination method can determine the category coordinates of the point cloud by exploiting the complementarity between the feature data of the image and the point cloud: the image carries details such as color and texture, while the point cloud carries depth detail. The dimensionality of the point cloud data is thereby expanded from 3 to 13 dimensions, which is equivalent to providing prior category knowledge for the points in the point cloud; the model input thus becomes more multidimensional, subsequent learning of the detection model is easier, and the accuracy of the recognition result output by the model is improved.
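A small sketch of the 10-dimensional one-hot category coordinate described above; the exact category order is an assumption following the list given earlier, and the all-zero vector is the default category coordinate.

```python
# Category order as listed in the text (the exact ordering is assumed).
CATEGORIES = ["car", "truck", "bus", "trailer", "c_vehicle",
              "pedestrian", "motorcycle", "bicycle", "traffic_cone", "barrier"]

def category_coordinate(name: str) -> list:
    """One-hot binary coordinate for a known category; all zeros for
    background, blanks, or erroneous projection points."""
    vec = [0] * len(CATEGORIES)
    if name in CATEGORIES:
        vec[CATEGORIES.index(name)] = 1
    return vec

assert category_coordinate("motorcycle") == [0, 0, 0, 0, 0, 0, 1, 0, 0, 0]
assert category_coordinate("background") == [0, 0, 0, 0, 0, 0, 0, 0, 0, 0]
```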
In the embodiment of the present invention, as shown in fig. 4, the method for determining the center coordinates of the point cloud of the present invention includes the following steps:
step S401, obtaining a determination result of the point cloud cluster.
In the embodiment of the present invention, one or more point cloud clusters of the current frame point cloud corresponding to one or more image instances of the previous frame image determined in step S202 are obtained.
Step S402, judging whether the point cloud coordinate of the current frame point cloud corresponds to the image instance or not according to the determination result of the point cloud cluster, and if so, turning to step S403; if not, go to step S404.
And S403, determining the center coordinates of the point cloud cluster corresponding to the image instance according to the point cloud coordinates of the point cloud cluster.
In an embodiment of the invention, the average of the point cloud coordinates (x_1, y_1, z_1), (x_2, y_2, z_2), ..., (x_n, y_n, z_n) of the point cloud cluster is determined, and this average is the center coordinate (x_center, y_center, z_center) of the point cloud cluster corresponding to the image instance.
In the embodiment of the present invention, when determining the center coordinates of the point cloud cluster corresponding to the image instance, the following two ways may be used: one point cloud cluster corresponds to one central coordinate, or each point cloud coordinate of one point cloud cluster corresponds to one central coordinate, and all the central coordinates are the same.
Step S404, determining the center coordinate of the point cloud coordinate as a default center coordinate for each point cloud coordinate except the point cloud cluster in the point cloud.
In the embodiment of the present invention, the default center coordinate is (0,0,0), and may represent a background, a blank, an erroneous projected point, and the like. And for point cloud coordinates except the point cloud clusters in the point cloud, each point cloud coordinate corresponds to a default center coordinate.
In the embodiment of the invention, by the method for determining the center coordinates of the point cloud, the center coordinates can be determined, so that the data dimensionality of the point cloud is further expanded from 13 to 16 dimensions; the model input thus becomes more multidimensional, subsequent learning of the detection model is easier, and the accuracy of the recognition result output by the model is improved.
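As a minimal sketch of steps S403/S404: the center coordinate is simply the mean of the cluster's point cloud coordinates, with (0, 0, 0) as the default for points outside any cluster.

```python
import numpy as np

def center_coordinate(cluster_xyz: np.ndarray) -> np.ndarray:
    """cluster_xyz: (M, 3) point cloud coordinates of one cluster.
    Returns (x_center, y_center, z_center)."""
    if len(cluster_xyz) == 0:
        return np.zeros(3)            # default center coordinate
    return cluster_xyz.mean(axis=0)   # average over the cluster's points
```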
And step S104, inputting the point cloud coordinates, the category coordinates and the center coordinates of the point cloud into a preset detection model, and identifying a target object of the point cloud.
In the embodiment of the invention, the category coordinates and the center coordinates of the point cloud after data dimension expansion are spliced with the point cloud coordinates of the point cloud, the spliced result is used as the input of a preset detection model, and the target object of the point cloud is identified according to the output of the detection model.
In an embodiment of the present invention, the target object includes a box of each object within the detection area and a category of each object.
In the embodiment of the invention, the detection model is a 3D target detection model.
In the embodiment of the present invention, as shown in fig. 5, the method for stitching point cloud coordinates of the present invention includes the following steps:
step S501, a determination result of the point cloud cluster is obtained.
In the embodiment of the present invention, one or more point cloud clusters of the current frame point cloud corresponding to one or more image instances of the previous frame image determined in step S202 are obtained.
Step S502, judging whether the point cloud coordinate of the current frame point cloud corresponds to the image instance or not according to the determination result of the point cloud cluster, and if so, turning to step S503; if not, go to step S504.
Step S503, for each point cloud coordinate of the same point cloud cluster, executing: and splicing the category coordinate and the center coordinate to the point cloud coordinate.
In the embodiment of the present invention, the result of stitching the category coordinate and the center coordinate to the point cloud coordinate is (point cloud coordinate (x, y, z), 10-dimensional binary category coordinate, center coordinate (x_center, y_center, z_center)).
In the embodiment of the invention, the category coordinate and the center coordinate can also be spliced to the point cloud cluster for the same point cloud cluster.
Step S504, for each point cloud coordinate except the point cloud cluster, the default category coordinate and the default center coordinate are spliced to the point cloud coordinate.
In the embodiment of the present invention, the result of stitching the default category coordinate and the default center coordinate to the point cloud coordinate is (x, y, z, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0).
In the embodiment of the invention, the input of the detection model can be determined by the coordinate stitching method; the model input is a 16-dimensional coordinate obtained through data dimensionality expansion, so the detection model learns more easily and the accuracy of the recognition result output by the model is higher.
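A hedged sketch of the stitching in steps S503/S504, under the assumption that the per-point category and center coordinates have already been gathered into arrays: each point becomes a 16-dimensional input vector (3 point coordinates + 10 one-hot category dimensions + 3 center coordinates).

```python
import numpy as np

def stitch_features(points_xyz, category_vecs, centers):
    """points_xyz: (N, 3); category_vecs: (N, 10) one-hot rows (all zeros
    for points outside any cluster); centers: (N, 3) cluster centers
    ((0, 0, 0) outside any cluster). Returns the (N, 16) model input."""
    return np.concatenate([points_xyz, category_vecs, centers], axis=1)
```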
Furthermore, the point cloud can be denoised by clustering to eliminate the noise introduced by the data dimension expansion. The method for determining the center coordinates of the point cloud cluster is then executed on the denoised point cloud, the center coordinates of the point cloud are updated, and the coordinate stitching method is executed with the updated center coordinates. The stitching result determined from the updated center coordinates is input into the preset detection model, making the recognition of the target object more accurate. The clustering algorithm can adopt methods such as DBSCAN and K-means.
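One possible form of this clustering-based noise reduction, sketched with scikit-learn's DBSCAN (the eps and min_samples values are hypothetical tuning choices): points labeled -1 are treated as noise and dropped before the center coordinates are recomputed.

```python
import numpy as np
from sklearn.cluster import DBSCAN

def denoise_cluster(cluster_xyz: np.ndarray,
                    eps: float = 0.5, min_samples: int = 5) -> np.ndarray:
    """Removes outlier points from one point cloud cluster; DBSCAN assigns
    the label -1 to points it considers noise."""
    labels = DBSCAN(eps=eps, min_samples=min_samples).fit_predict(cluster_xyz)
    return cluster_xyz[labels != -1]
```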
In the embodiment of the invention, multiple frames of point clouds and multiple frames of images in a detection area are collected, where the point cloud comprises a plurality of point cloud coordinates and each frame of image corresponds to one or more image instances; a mapping between the point cloud and the image instances is constructed according to the preset conversion relation between the first coordinate system of the point cloud and the second coordinate system of the image; according to the mapping result, the point cloud cluster corresponding to an image instance and its category coordinates are determined, together with the center coordinates of the point cloud cluster; and the point cloud coordinates, category coordinates and center coordinates are input into a preset detection model to identify the target object of the point cloud. Through these steps, feature data from multiple modalities can be used as the input of the detection model; because the feature data are complementary, the model input is more multidimensional, the accuracy of the recognition result determined from the model output is greatly improved, each target object in an autonomous driving scene can be accurately identified, and a reference is provided for the simulation of automatic driving.
Fig. 6 is a schematic diagram of main blocks of an object recognition apparatus according to an embodiment of the present invention, and as shown in fig. 6, an object recognition apparatus 600 of the present invention includes:
the acquisition module 601 is used for acquiring multi-frame point clouds and multi-frame images in the detection area; wherein the point cloud comprises a plurality of point cloud coordinates, and each frame of the image corresponds to one or more image instances.
In the embodiment of the present invention, the acquisition module 601 is configured to collect multiple frames of point clouds and multiple frames of images in a detection area; the point cloud and the image are obtained through the radar sensor and the camera respectively, and the radar sensor and the camera acquire data synchronously during the driving of the vehicle.
In the embodiment of the invention, each image is the result of inputting the picture captured by the camera into an image instance segmentation model, so each frame of image includes one or more image instances. The object recognition server of the invention is loaded with an image instance segmentation model; the instance segmentation model can adopt methods such as Mask-RCNN, RetinaMask, CenterMask, DeepMask, PANet and YOLACT.
A mapping module 602, configured to construct a mapping between the point cloud and the image instance according to a preset transformation relationship between the first coordinate system of the point cloud and the second coordinate system of the image.
In an embodiment of the present invention, the mapping module 602 is configured to construct a mapping between the point cloud and the image instance according to a preset transformation relationship between the first coordinate system of the point cloud and the second coordinate system of the image. The first coordinate system is the point cloud coordinate system, also called the radar coordinate system; the second coordinate system is the camera coordinate system, also called the image coordinate system. The point cloud includes a plurality of point cloud coordinates, which are typically three-dimensional, such as (x, y, z); the image includes a plurality of pixel points, whose coordinates are typically two-dimensional.
In the embodiment of the present invention, due to this dimensional difference between point cloud coordinates and pixel points, one pixel point may generally map to one or more point cloud coordinates. When converting between the first coordinate system and the second coordinate system, the mapping between point cloud coordinates and pixel points is realized through the intrinsic and extrinsic parameters of the camera.
And the data processing module 603 is configured to determine, according to the mapping result, a point cloud cluster corresponding to the image instance and the category coordinates of the point cloud cluster, and determine the center coordinates of the point cloud cluster.
In an embodiment of the present invention, the data processing module 603 is configured to determine a point cloud cluster corresponding to the image instance in the current frame point cloud based on the mapping result of the point cloud and the image instance according to the method for determining a point cloud cluster of the present invention, where the point cloud cluster generally includes a plurality of point cloud coordinates.
In this embodiment of the present invention, the data processing module 603 is further configured to determine the category coordinate of the current frame point cloud according to the method for determining the category coordinate of the point cloud of the present invention.
In this embodiment of the present invention, the data processing module 603 is further configured to determine the center coordinate of the current frame point cloud according to the method for determining the center coordinate of the point cloud of the present invention.
The identifying module 604 is configured to input the point cloud coordinates, the category coordinates, and the center coordinates of the point cloud into a preset detection model, and identify a target object of the point cloud.
In the embodiment of the present invention, the identification module 604 is configured to splice the category coordinates and the center coordinates of the point cloud after data dimension expansion and the point cloud coordinates of the point cloud, use the spliced result as an input of a preset detection model, and identify a target object of the point cloud according to an output of the detection model.
In an embodiment of the present invention, the target object includes a box of each object within the detection area and a category of each object.
In the embodiment of the invention, through the acquisition module, the mapping module, the data processing module and the recognition module, feature data from multiple modalities can be used as the input of the detection model; because the feature data are complementary, the model input is more multidimensional, the accuracy of the recognition result determined from the model output is greatly improved, each target object in an autonomous driving scene can be accurately identified, and a reference is provided for the simulation of automatic driving.
Fig. 7 shows an exemplary system architecture diagram of an object recognition method or an object recognition apparatus suitable for application to an embodiment of the present invention, and as shown in fig. 7, the exemplary system architecture of the object recognition method or the object recognition apparatus of the embodiment of the present invention includes:
as shown in fig. 7, the system architecture 700 may include detection devices 701, 702, 703, a network 704, and a server 705. The network 704 is used to provide a medium for communication links between the detection devices 701, 702, 703 and the server 105. Network 704 may include various types of connections, such as wire, wireless communication links, or fiber optic cables, to name a few.
The detection devices 701, 702, 703 interact with a server 705 over a network 704 to collect or send messages, etc.
The detection devices 701, 702, 703 may be various electronic devices having detection functionality, including but not limited to lidar sensors, cameras, and the like.
The server 705 may be a server providing various services, such as a background management server providing support for the collected point clouds and images sent by the detection devices 701, 702, 703. The background management server may analyze and otherwise process the acquired data such as the point cloud and the image, and output a processing result (e.g., an object to which the point cloud cluster belongs).
It should be noted that the object identification method provided by the embodiment of the present invention is generally executed by the server 705, and accordingly, the object identification apparatus is generally disposed in the server 705.
It should be understood that the number of detection devices, networks, and servers in fig. 7 is merely illustrative. There may be any number of detection devices, networks, and servers, as desired for an implementation.
Fig. 8 is a schematic structural diagram of a computer system suitable for implementing the terminal device or the server according to the embodiment of the present invention, and as shown in fig. 8, the computer system 800 of the terminal device or the server according to the embodiment of the present invention includes:
a Central Processing Unit (CPU) 801, which can perform various appropriate actions and processes in accordance with a program stored in a Read Only Memory (ROM) 802 or a program loaded from a storage section 808 into a Random Access Memory (RAM) 803. In the RAM 803, various programs and data necessary for the operation of the system 800 are also stored. The CPU 801, ROM 802, and RAM 803 are connected to each other via a bus 804. An input/output (I/O) interface 805 is also connected to the bus 804.
The following components are connected to the I/O interface 805: an input section 806 including a keyboard, a mouse, and the like; an output section 807 including a display such as a Cathode Ray Tube (CRT) or a Liquid Crystal Display (LCD), and a speaker; a storage section 808 including a hard disk and the like; and a communication section 809 including a network interface card such as a LAN card or a modem. The communication section 809 performs communication processing via a network such as the Internet. A drive 810 is also connected to the I/O interface 805 as necessary. A removable medium 811, such as a magnetic disk, an optical disk, a magneto-optical disk, or a semiconductor memory, is mounted on the drive 810 as necessary, so that a computer program read out therefrom is installed into the storage section 808 as needed.
In particular, according to the embodiments of the present disclosure, the processes described above with reference to the flowcharts may be implemented as computer software programs. For example, embodiments of the present disclosure include a computer program product comprising a computer program embodied on a computer readable medium, the computer program comprising program code for performing the method illustrated in the flow chart. In such an embodiment, the computer program can be downloaded and installed from a network through the communication section 809 and/or installed from the removable medium 811. The computer program executes the above-described functions defined in the system of the present invention when executed by the Central Processing Unit (CPU) 801.
It should be noted that the computer readable medium shown in the present invention can be a computer readable signal medium or a computer readable storage medium or any combination of the two. A computer readable storage medium may be, for example, but not limited to, an electronic, magnetic, optical, electromagnetic, infrared, or semiconductor system, apparatus, or device, or any combination of the foregoing. More specific examples of the computer readable storage medium may include, but are not limited to: an electrical connection having one or more wires, a portable computer diskette, a hard disk, a Random Access Memory (RAM), a read-only memory (ROM), an erasable programmable read-only memory (EPROM or flash memory), an optical fiber, a portable compact disc read-only memory (CD-ROM), an optical storage device, a magnetic storage device, or any suitable combination of the foregoing. In the present invention, a computer readable storage medium may be any tangible medium that can contain, or store a program for use by or in connection with an instruction execution system, apparatus, or device. In the present invention, however, a computer readable signal medium may include a propagated data signal with computer readable program code embodied therein, for example, in baseband or as part of a carrier wave. Such a propagated data signal may take many forms, including, but not limited to, electro-magnetic, optical, or any suitable combination thereof. A computer readable signal medium may also be any computer readable medium that is not a computer readable storage medium and that can communicate, propagate, or transport a program for use by or in connection with an instruction execution system, apparatus, or device. Program code embodied on a computer readable medium may be transmitted using any appropriate medium, including but not limited to: wireless, wire, fiber optic cable, RF, etc., or any suitable combination of the foregoing.
The flowchart and block diagrams in the figures illustrate the architecture, functionality, and operation of possible implementations of systems, methods and computer program products according to various embodiments of the present invention. In this regard, each block in the flowchart or block diagrams may represent a module, segment, or portion of code, which comprises one or more executable instructions for implementing the specified logical function(s). It should also be noted that, in some alternative implementations, the functions noted in the block may occur out of the order noted in the figures. For example, two blocks shown in succession may, in fact, be executed substantially concurrently, or the blocks may sometimes be executed in the reverse order, depending upon the functionality involved. It will also be noted that each block of the block diagrams or flowchart illustration, and combinations of blocks in the block diagrams or flowchart illustration, can be implemented by special purpose hardware-based systems which perform the specified functions or acts, or combinations of special purpose hardware and computer instructions.
The modules described in the embodiments of the present invention may be implemented by software or hardware. The described modules may also be provided in a processor, which may be described as: a processor includes an acquisition module, a mapping module, a data processing module, and an identification module. The names of these modules do not constitute a limitation to the module itself in some cases, and for example, the identification module may be further described as a module that inputs point cloud coordinates, category coordinates, and center coordinates of the point cloud into a preset detection model and identifies a target object of the point cloud.
As another aspect, the present invention also provides a computer-readable medium that may be contained in the apparatus described in the above embodiments; or may be separate and not incorporated into the device. The computer readable medium carries one or more programs which, when executed by a device, cause the device to comprise: collecting multiple frames of point clouds and multiple frames of images in a detection area; wherein the point cloud comprises a plurality of point cloud coordinates, and each frame of the image corresponds to one or more image instances; constructing mapping of the point cloud and the image instance according to a preset transformation relation between a first coordinate system of the point cloud and a second coordinate system of the image; according to the mapping result, determining a point cloud cluster corresponding to the image instance and the category coordinates of the point cloud cluster, and determining the center coordinates of the point cloud cluster; and inputting the point cloud coordinates, the category coordinates and the center coordinates of the point cloud into a preset detection model, and identifying a target object of the point cloud.
When an existing automatic driving system detects a target, the output results of multiple models (such as image-based 3D target detection, point cloud-based 3D target detection, image-based semantic segmentation, image-based instance segmentation, and point cloud-based semantic segmentation) are generally fused to determine the final perception result; alternatively, multiple models are fused, and the final perception result is determined from the fused model. The disadvantage of existing target detection methods is that the output-fusion logic is coarse, or the input features of the multi-model fusion cannot be aligned, so the accuracy of the target recognition result is low and its value as a reference for autonomous driving simulation is small.
According to the technical scheme of the embodiment of the invention, based on multimodal fusion of the radar point cloud and the camera image, the problem of feature alignment between modalities is solved, the complementary relation between the point cloud and the image is fully utilized, the input of the 3D detection model is optimized, the accuracy of the model is greatly improved, and the model runs in real time.
According to the technical scheme of the embodiment of the invention, point cloud data lacks details such as color and texture but is very accurate in depth (distance), while image data lacks depth detail but is very accurate in texture and color. The data complementarity between the two modalities is fully exploited: the point cloud is fused with the image instance segmentation result, and the category information of the image instance is supplied to the point cloud according to the coordinate correspondence between the point cloud and the image instance, so the 3D detection model learns more easily. Compared with the existing 3-dimensional point cloud coordinate input, the accuracy of the recognition result is greatly improved. Moreover, the center coordinates of the point cloud cluster corresponding to the image instance are used, and the 16-dimensional coordinates serve as the input, which improves model accuracy and promotes model convergence; and on the basis of using the current frame point cloud and the previous frame image, the operating efficiency of the model is greatly improved.
According to the technical scheme of the embodiment of the invention, feature data from multiple modalities can be used as the input of the detection model; because the feature data are complementary, the model input is more multidimensional, the accuracy of the recognition result determined from the model output is greatly improved, each target object in an autonomous driving scene can be accurately identified, and a reference is provided for the simulation of automatic driving.
The above-described embodiments should not be construed as limiting the scope of the invention. Those skilled in the art will appreciate that various modifications, combinations, sub-combinations, and substitutions can occur, depending on design requirements and other factors. Any modification, equivalent replacement, and improvement made within the spirit and principle of the present invention should be included in the protection scope of the present invention.

Claims (13)

1. An object recognition method, comprising:
collecting multiple frames of point clouds and multiple frames of images in a detection area; wherein the point cloud comprises a plurality of point cloud coordinates, and each frame of the image corresponds to one or more image instances;
constructing mapping of the point cloud and the image instance according to a preset transformation relation between a first coordinate system of the point cloud and a second coordinate system of the image;
according to the mapping result, determining a point cloud cluster corresponding to the image instance and the category coordinates of the point cloud cluster, and determining the center coordinates of the point cloud cluster;
and inputting the point cloud coordinates, the category coordinates and the center coordinates of the point cloud into a preset detection model, and identifying a target object of the point cloud.
2. The method of claim 1, wherein determining the point cloud cluster corresponding to the image instance comprises:
acquiring a current frame point cloud and a previous frame image;
in the current frame point cloud, a plurality of point cloud coordinates mapped by any image instance included in the previous frame image form the point cloud cluster.
3. The method of claim 2, wherein the step of composing the point cloud cluster from a plurality of point cloud coordinates to which any image instance included in the previous frame of image is mapped comprises:
projecting the current frame point cloud to the second coordinate system;
and forming the point cloud cluster by point cloud coordinates corresponding to a plurality of pixel points of any image instance included in the previous frame of image after projection.
4. The method of claim 2, wherein the image instance indicates category information; the determining the category coordinates of the point cloud cluster comprises:
and determining the category coordinates of the point cloud cluster corresponding to the image instance according to the category information of the image instance included in the previous frame of image.
5. The method of claim 1, further comprising:
splicing the point cloud coordinates, the category coordinates and the center coordinates of the point cloud;
the method for inputting the point cloud coordinates, the category coordinates and the center coordinates of the point cloud into a preset detection model comprises the following steps:
and inputting the spliced result into the detection model.
6. The method of claim 5, wherein the stitching point cloud coordinates, category coordinates, and center coordinates of the point cloud comprises:
executing the following steps for each point cloud coordinate of the same point cloud cluster:
and splicing the category coordinate and the center coordinate to the point cloud coordinate.
7. The method of claim 6, further comprising:
and determining the category coordinate and the center coordinate of the point cloud coordinate to be a default category coordinate and a default center coordinate respectively for the point cloud coordinates except the point cloud cluster in the point cloud, and splicing the default category coordinate and the default center coordinate to the point cloud coordinate.
8. The method of claim 1, further comprising:
carrying out noise reduction processing on the point cloud in a clustering mode;
and executing the step of determining the center coordinates of the point cloud cluster based on the point cloud after the noise reduction treatment.
9. The method of claim 2, wherein the point cloud and the image are obtained by a radar sensor and a camera, respectively, wherein the radar sensor and the camera are synchronously acquired.
10. The method of claim 4, wherein the category information is mapped with corresponding binary coordinates; determining category coordinates of a point cloud cluster corresponding to the image instance, comprising:
and determining the binary coordinate of the category information of the image instance as the category coordinate of the point cloud cluster.
11. An object recognition apparatus, comprising:
the acquisition module is used for acquiring multi-frame point clouds and multi-frame images in the detection area; wherein the point cloud comprises a plurality of point cloud coordinates, and each frame of the image corresponds to one or more image instances;
the mapping module is used for constructing mapping between the point cloud and the image instance according to a preset conversion relation between a first coordinate system of the point cloud and a second coordinate system of the image;
the data processing module is used for determining a point cloud cluster corresponding to the image instance and the category coordinates of the point cloud cluster according to the mapping result, and determining the center coordinates of the point cloud cluster;
and the identification module is used for inputting the point cloud coordinates, the category coordinates and the center coordinates of the point cloud into a preset detection model and identifying the target object of the point cloud.
12. An electronic device for object recognition, comprising:
one or more processors;
a storage device for storing one or more programs,
when executed by the one or more processors, cause the one or more processors to implement the method of any one of claims 1-10.
13. A computer-readable medium, on which a computer program is stored, which, when being executed by a processor, carries out the method according to any one of claims 1-10.
CN202210147726.2A 2022-02-17 2022-02-17 Object identification method and device Pending CN114550116A (en)

Priority Applications (2)

Application Number Priority Date Filing Date Title
CN202210147726.2A CN114550116A (en) 2022-02-17 2022-02-17 Object identification method and device
PCT/CN2022/139873 WO2023155580A1 (en) 2022-02-17 2022-12-19 Object recognition method and apparatus

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202210147726.2A CN114550116A (en) 2022-02-17 2022-02-17 Object identification method and device

Publications (1)

Publication Number Publication Date
CN114550116A true CN114550116A (en) 2022-05-27

Family

ID=81675563

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202210147726.2A Pending CN114550116A (en) 2022-02-17 2022-02-17 Object identification method and device

Country Status (2)

Country Link
CN (1) CN114550116A (en)
WO (1) WO2023155580A1 (en)


Families Citing this family (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN117152199B (en) * 2023-08-30 2024-05-31 成都信息工程大学 Dynamic target motion vector estimation method, system, equipment and storage medium

Family Cites Families (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US10650531B2 (en) * 2018-03-16 2020-05-12 Honda Motor Co., Ltd. Lidar noise removal using image pixel clusterings
CN111340797B (en) * 2020-03-10 2023-04-28 山东大学 Laser radar and binocular camera data fusion detection method and system
CN111709343B (en) * 2020-06-09 2023-11-10 广州文远知行科技有限公司 Point cloud detection method and device, computer equipment and storage medium
CN113887376A (en) * 2021-09-27 2022-01-04 中汽创智科技有限公司 Target detection method, device, medium and equipment
CN114550116A (en) * 2022-02-17 2022-05-27 京东鲲鹏(江苏)科技有限公司 Object identification method and device

Cited By (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
WO2023155580A1 (en) * 2022-02-17 2023-08-24 京东鲲鹏(江苏)科技有限公司 Object recognition method and apparatus

Also Published As

Publication number Publication date
WO2023155580A1 (en) 2023-08-24

Similar Documents

Publication Publication Date Title
EP3627180B1 (en) Sensor calibration method and device, computer device, medium, and vehicle
CN109739236B (en) Vehicle information processing method and device, computer readable medium and electronic equipment
CN111874006B (en) Route planning processing method and device
EP3822852B1 (en) Method, apparatus, computer storage medium and program for training a trajectory planning model
WO2022104774A1 (en) Target detection method and apparatus
WO2021155685A1 (en) Map updating method, apparatus and device
CN115540896B (en) Path planning method and device, electronic equipment and computer readable medium
WO2023155580A1 (en) Object recognition method and apparatus
CN112258519B (en) Automatic extraction method and device for way-giving line of road in high-precision map making
CN114758502B (en) Dual-vehicle combined track prediction method and device, electronic equipment and automatic driving vehicle
CN115761702B (en) Vehicle track generation method, device, electronic equipment and computer readable medium
CN115339453B (en) Vehicle lane change decision information generation method, device, equipment and computer medium
CN113326826A (en) Network model training method and device, electronic equipment and storage medium
CN114550117A (en) Image detection method and device
CN114972758A (en) Instance segmentation method based on point cloud weak supervision
CN114092909A (en) Lane line extraction method and device, vehicle and storage medium
WO2024008086A1 (en) Trajectory prediction method as well as apparatus therefor, medium, program product, and electronic device
CN115468578B (en) Path planning method and device, electronic equipment and computer readable medium
CN116880462A (en) Automatic driving model, training method, automatic driving method and vehicle
US11544899B2 (en) System and method for generating terrain maps
CN114913329A (en) Image processing method, semantic segmentation network training method and device
CN116295469B (en) High-precision map generation method, device, equipment and storage medium
CN116994145B (en) Lane change point identification method and device, storage medium and computer equipment
CN115507873B (en) Route planning method, device, equipment and medium based on bus tail traffic light
CN114407916B (en) Vehicle control and model training method and device, vehicle, equipment and storage medium

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination