CN111191582B - Three-dimensional target detection method, detection device, terminal device and computer readable storage medium - Google Patents

Three-dimensional target detection method, detection device, terminal device and computer readable storage medium Download PDF

Info

Publication number
CN111191582B
CN111191582B (application number CN201911383359.0A)
Authority
CN
China
Prior art keywords
image
point cloud
cloud data
instance
data corresponding
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN201911383359.0A
Other languages
Chinese (zh)
Other versions
CN111191582A (en)
Inventor
刘培超
徐培
郎需林
刘主福
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Shenzhen Yuejiang Technology Co Ltd
Original Assignee
Shenzhen Yuejiang Technology Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Shenzhen Yuejiang Technology Co Ltd filed Critical Shenzhen Yuejiang Technology Co Ltd
Priority to CN201911383359.0A priority Critical patent/CN111191582B/en
Publication of CN111191582A publication Critical patent/CN111191582A/en
Application granted granted Critical
Publication of CN111191582B publication Critical patent/CN111191582B/en
Active legal-status Critical Current
Anticipated expiration legal-status Critical

Links

Images

Classifications

    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06VIMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V20/00Scenes; Scene-specific elements
    • G06V20/60Type of objects
    • G06V20/64Three-dimensional objects
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06TIMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T7/00Image analysis
    • G06T7/10Segmentation; Edge detection
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06TIMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T7/00Image analysis
    • G06T7/50Depth or shape recovery
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06VIMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V10/00Arrangements for image or video recognition or understanding
    • G06V10/20Image preprocessing
    • G06V10/25Determination of region of interest [ROI] or a volume of interest [VOI]
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06TIMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T2207/00Indexing scheme for image analysis or image enhancement
    • G06T2207/10Image acquisition modality
    • G06T2207/10028Range image; Depth image; 3D point clouds
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06TIMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T2207/00Indexing scheme for image analysis or image enhancement
    • G06T2207/20Special algorithmic details
    • G06T2207/20081Training; Learning
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06TIMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T2207/00Indexing scheme for image analysis or image enhancement
    • G06T2207/20Special algorithmic details
    • G06T2207/20092Interactive image processing based on input by user
    • G06T2207/20104Interactive definition of region of interest [ROI]

Abstract

The application is applicable to the technical field of computer vision, and provides a three-dimensional target detection method, a detection device, a terminal device and a computer readable storage medium. The three-dimensional target detection method comprises the following steps: acquiring a depth image and an RGB image of a scene to be detected, and converting the depth image into first point cloud data; carrying out example segmentation on the first point cloud data to obtain second point cloud data corresponding to at least one example; mapping the second point cloud data corresponding to each instance to the RGB image to obtain an instance image corresponding to each instance; and performing image recognition on the example image to obtain a recognition result of the example corresponding to the example image. The method and the device can improve the accuracy of target detection while keeping computational complexity and cost low.

Description

Three-dimensional target detection method, detection device, terminal device and computer readable storage medium
Technical Field
The present application belongs to the field of computer vision technology, and in particular, to a three-dimensional target detection method, a detection apparatus, a terminal device, and a computer-readable storage medium.
Background
3D target detection detects and identifies the pose and the category of a target of interest in an image or a video. Compared with 2D target detection, the position of the target in 3D space also needs to be detected, which places higher requirements on the detection algorithm.
At present, 3D target detection algorithms are mainly based on deep neural networks or on feature engineering: the deep neural network algorithm directly processes point cloud data, while the feature engineering algorithm constructs 3D features of an object and then matches the target against a template. However, for both the deep neural network algorithm and the feature engineering algorithm, high target detection accuracy comes at the cost of computational complexity, and the high computational complexity results in high cost. Therefore, there is a need for a method that can improve the accuracy of target detection with low computational complexity and low cost.
Disclosure of Invention
The embodiments of the application provide a three-dimensional target detection method, a detection device, a terminal device and a computer readable storage medium, which can improve target detection accuracy while keeping computational complexity and cost low.
In a first aspect, an embodiment of the present application provides a three-dimensional target detection method, including:
acquiring a depth image and an RGB image of a scene to be detected, and converting the depth image into first point cloud data;
example segmentation is carried out on the first point cloud data to obtain second point cloud data corresponding to at least one example;
mapping the second point cloud data corresponding to each example to the RGB image to obtain an example image corresponding to each example;
and carrying out image recognition on the example image to obtain a recognition result of the example corresponding to the example image.
In a second aspect, an embodiment of the present application provides a three-dimensional target detection apparatus, including:
the image acquisition module is used for acquiring a depth image and an RGB image of a scene to be detected and converting the depth image into first point cloud data;
the example segmentation module is used for carrying out example segmentation on the first point cloud data to obtain second point cloud data corresponding to at least one example;
a first mapping module, configured to map the second point cloud data corresponding to each instance to the RGB image to obtain an instance image corresponding to each instance;
and the identification module is used for carrying out image identification on the example image to obtain an identification result of the example corresponding to the example image.
In a third aspect, an embodiment of the present application provides a terminal device, which includes a memory, a processor, and a computer program stored in the memory and executable on the processor, where the processor implements the steps of the method according to the first aspect when executing the computer program.
In a fourth aspect, the present application provides a computer-readable storage medium, which stores a computer program, and the computer program implements the steps of the method according to the first aspect when executed by a processor.
In a fifth aspect, the present application provides a computer program product, which includes a computer program that, when executed by one or more processors, implements the steps of the method according to the first aspect.
It is to be understood that, for the beneficial effects of the second aspect to the fifth aspect, reference may be made to the relevant description in the first aspect, and details are not described herein again.
In view of the above, the present application provides a three-dimensional target detection method. First, a depth image and an RGB image of a scene to be detected are acquired (the scene to be detected contains a target of interest; for example, the scene may be a lawn with a plurality of balloons, the balloons being the targets of interest), and the depth image is converted into first point cloud data. Example segmentation is then performed on the first point cloud data to obtain second point cloud data corresponding to at least one example (for example, if the lawn contains a plurality of balloons, second point cloud data corresponding to each balloon is obtained after the example segmentation). The second point cloud data corresponding to each example is mapped to the RGB image to obtain an example image corresponding to each example, and image recognition is performed on the example image to obtain a recognition result of the example corresponding to the example image.
In this application, the point cloud data is first subjected to example segmentation and the segmented point cloud data is then mapped to the RGB image, rather than the point cloud data being processed directly, which reduces the computational complexity. Mapping the segmented point cloud data to the RGB image allows the image region corresponding to each example to be located on the RGB image more accurately; at the same time, identifying each example image separately improves the accuracy of example recognition. Therefore, the technical scheme of the application can improve the accuracy of three-dimensional target detection with low computational complexity and low cost.
Drawings
In order to more clearly illustrate the technical solutions in the embodiments of the present application, the drawings needed in the description of the embodiments or the prior art are briefly introduced below. It is apparent that the drawings in the following description show only some embodiments of the present application, and other drawings can be obtained by those skilled in the art from these drawings without inventive effort.
Fig. 1 is a schematic flow chart illustrating an implementation of a three-dimensional target detection method according to an embodiment of the present application;
FIG. 2 is a schematic flow chart illustrating an implementation of a method for obtaining an example image according to an embodiment of the present disclosure;
fig. 3 is a schematic diagram of a mask image bounding box according to an embodiment of the present application;
FIG. 4 is a flowchart illustrating a method for training a neural network according to an embodiment of the present disclosure;
fig. 5 is a schematic flowchart of a three-dimensional target detection method according to a second embodiment of the present application;
FIG. 6 is a schematic structural diagram of a three-dimensional object detection apparatus provided in an embodiment of the present application;
fig. 7 is a schematic structural diagram of a terminal device according to an embodiment of the present application.
Detailed Description
In the following description, for purposes of explanation and not limitation, specific details are set forth, such as particular system structures, techniques, etc. in order to provide a thorough understanding of the embodiments of the present application. It will be apparent, however, to one skilled in the art that the present application may be practiced in other embodiments that depart from these specific details. In other instances, detailed descriptions of well-known systems, devices, circuits, and methods are omitted so as not to obscure the description of the present application with unnecessary detail.
It will be understood that the terms "comprises" and/or "comprising," when used in this specification and the appended claims, specify the presence of stated features, integers, steps, operations, elements, and/or components, but do not preclude the presence or addition of one or more other features, integers, steps, operations, elements, components, and/or groups thereof.
It should also be understood that the term "and/or" as used in this specification and the appended claims refers to and includes any and all possible combinations of one or more of the associated listed items.
As used in this specification and the appended claims, the term "if" may be interpreted contextually as "when", "upon", "in response to determining" or "in response to detecting". Similarly, the phrase "if it is determined" or "if a [described condition or event] is detected" may be interpreted contextually to mean "upon determining", "in response to determining", "upon detecting [the described condition or event]" or "in response to detecting [the described condition or event]".
Furthermore, in the description of the present application and the appended claims, the terms "first," "second," and the like are used for distinguishing between descriptions and not necessarily for describing or implying relative importance.
Reference throughout this specification to "one embodiment" or "some embodiments," or the like, means that a particular feature, structure, or characteristic described in connection with the embodiment is included in one or more embodiments of the present application. Thus, appearances of the phrases "in one embodiment," "in some embodiments," "in other embodiments," or the like, in various places throughout this specification are not necessarily all referring to the same embodiment, but rather "one or more but not all embodiments" unless specifically stated otherwise. The terms "comprising," "including," "having," and variations thereof mean "including, but not limited to," unless expressly specified otherwise.
The three-dimensional target detection method provided by the embodiment of the application can be applied to terminal devices such as a mobile phone, a tablet personal computer, a wearable device, a vehicle-mounted device, an Augmented Reality (AR)/Virtual Reality (VR) device, a notebook computer, an ultra-mobile personal computer (UMPC), a netbook, a Personal Digital Assistant (PDA) and the like, and the embodiment of the application does not limit the specific type of the terminal device at all.
In order to explain the technical means described in the present application, the following description will be given by way of specific examples.
Example one
Step S101, obtaining a depth image and an RGB image of a scene to be detected, and converting the depth image into first point cloud data;
In step S101, the depth image and the RGB image of the scene to be detected may be acquired by a multi-view camera. For example, a binocular camera may be used, in which one camera acquires the depth image and the other acquires the RGB image; alternatively, a trinocular camera may be used, in which two cameras acquire the depth image and one camera acquires the RGB image, so that a depth image with more comprehensive depth information can be obtained. The scene to be detected is a scene containing a target of interest. For example, if the target of interest is a balloon and the balloon is on a lawn, the scene to be detected is the lawn, and the balloon needs to be identified from the lawn.
In addition, the depth image and the RGB image may be collected by the terminal device of this embodiment, or may be collected by another terminal device and then sent to the terminal device of this embodiment for processing. In this embodiment, the terminal device for acquiring the depth image and the RGB image is not limited.
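As an illustration of the depth-to-point-cloud conversion mentioned in step S101, the following is a minimal sketch assuming a pinhole camera model; the function name and the example intrinsic parameters (fu, fv, u0, v0) are hypothetical and are not prescribed by this embodiment.

```python
import numpy as np

def depth_to_point_cloud(depth, fu, fv, u0, v0):
    """Back-project a depth image (H x W, in metres) into an N x 3 point cloud
    expressed in the depth camera's coordinate system (pinhole model)."""
    h, w = depth.shape
    u, v = np.meshgrid(np.arange(w), np.arange(h))  # pixel grid
    z = depth
    x = (u - u0) * z / fu
    y = (v - v0) * z / fv
    points = np.stack([x, y, z], axis=-1).reshape(-1, 3)
    return points[points[:, 2] > 0]  # drop pixels with no valid depth reading

# Example call with hypothetical calibration values:
# cloud = depth_to_point_cloud(depth_image, fu=615.0, fv=615.0, u0=320.0, v0=240.0)
```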
Step S102, example segmentation is carried out on the first point cloud data to obtain second point cloud data corresponding to at least one example;
In step S102, the specific type of segmentation algorithm may be selected according to actual requirements. For example, Locally Convex Connected Patches (LCCP) or a Euclidean clustering algorithm may be selected as the segmentation algorithm employed in this embodiment. The embodiments of the present application do not limit the type of segmentation algorithm.
Example segmentation of the first point cloud data yields second point cloud data corresponding to different individuals. For example, when the target of interest is a balloon on a lawn, the scene to be detected is the lawn; after the balloons in the image are segmented, second point cloud data corresponding to each balloon is obtained.
In this embodiment, instead of directly processing the point cloud data, the point cloud data is first subjected to instance segmentation, and then the segmented point cloud data is subjected to subsequent processing, so that the complexity of calculation is reduced.
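As a rough illustration of the example segmentation in step S102, the sketch below uses scikit-learn's DBSCAN as a stand-in for the Euclidean clustering mentioned above (points closer than a distance threshold are grouped into the same example); the eps and min_samples values are illustrative assumptions, and LCCP would require a different, supervoxel-based implementation.

```python
import numpy as np
from sklearn.cluster import DBSCAN

def segment_instances(points, eps=0.02, min_samples=50):
    """Split the first point cloud (N x 3) into per-instance second point clouds
    by density-based Euclidean clustering (DBSCAN used here as a stand-in)."""
    labels = DBSCAN(eps=eps, min_samples=min_samples).fit_predict(points)
    instances = []
    for label in np.unique(labels):
        if label == -1:  # -1 marks noise points, not an instance
            continue
        instances.append(points[labels == label])
    return instances
```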
Step S103, mapping the second point cloud data corresponding to each instance to the RGB image to obtain an instance image corresponding to each instance;
In step S103, the second point cloud data corresponding to each instance is mapped to the RGB image according to the following mapping relationship:

    p_W = M · p_D,   i.e.   (z_D·u, z_D·v, z_D)^T = M · (x_D, y_D, z_D)^T

where the point p_D(x_D, y_D, z_D) is the coordinate representation of the second point cloud data corresponding to each instance in the coordinate system of the camera used to acquire the depth image, p_W(z_D·u, z_D·v, z_D) is the corresponding pixel coordinate representation of the second point cloud data, and

    M = | f_u   0    u_0 |
        |  0   f_v   v_0 |
        |  0    0     1  |

is the coefficient matrix, where f_u = f/dX and f_v = f/dY are referred to as the normalized focal lengths on the u-axis and v-axis, respectively; f is the focal length of the camera, dX and dY represent the physical size of a unit pixel on the u-axis and v-axis of the sensor, respectively, and (u_0, v_0) are the principal point coordinates.
In this embodiment, the segmented point cloud data is mapped to the RGB image, so that the corresponding example image can be found on the RGB image more accurately by the example, and after the example image corresponding to each example is found, the step S104 is executed.
In this embodiment, the method as shown in fig. 2 may be used to obtain the example image corresponding to each of the above examples.
As shown in fig. 2, an example image corresponding to each of the above examples is obtained through steps S201 to S203.
Step S201, mapping the second point cloud data corresponding to each instance to the RGB image to obtain a mask image corresponding to each instance;
In step S201, a mask operation is performed on the image obtained by mapping the second point cloud data corresponding to each example to the RGB image, so as to obtain a mask image. The mask operation removes the regions of no interest from the mapped image, thereby reducing the amount of calculation.
Step S202, determining a bounding box of the mask image corresponding to each instance;
In some possible implementations, the bounding box of the mask image corresponding to each example may be the outer contour of the mask image (e.g., 301 in fig. 3); in this case the calculation amount is relatively small.
In other possible implementations, the bounding box may be the minimum rectangular bounding box of the mask image (e.g., 302 in fig. 3). The calculation amount is then greater than when the outer contour is used, but the segmentation accuracy of the RGB image in the subsequent step S203 is improved.
In other possible implementations, the bounding box may be a rectangular bounding box that is larger than the mask image by a preset ratio (e.g., 303 in fig. 3). In this case, the segmentation accuracy of the RGB image in the subsequent step S203 is higher still, but the calculation amount increases accordingly. The preset ratio may be set by the user according to actual needs and is not specifically limited herein.
The user may determine the specific form in the bounding box of the mask image according to actual needs, which is not specifically limited herein.
Step S203, segmenting the RGB image according to the surrounding frame to obtain an example image corresponding to each example;
in step S203, the RGB image is segmented according to the bounding box to obtain an example image corresponding to each example, so that the example image can be input to the image recognition module for recognition.
In this embodiment, obtaining a mask image corresponding to each example removes the regions of no interest from the image obtained by mapping the second point cloud data corresponding to each example to the RGB image, thereby reducing the amount of calculation. The RGB image is then segmented according to the bounding box of the mask image, which improves the accuracy of the RGB image segmentation.
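A minimal sketch of steps S201 to S203, assuming the projected pixel coordinates (u, v) from the previous mapping are available; it builds the mask, takes the minimum rectangular bounding box (option 302 in fig. 3, optionally enlarged by a margin in the spirit of option 303), and crops the example image from the RGB image. All function and variable names are hypothetical.

```python
import numpy as np

def crop_instance_image(rgb, u, v, margin=0):
    """Build a mask from projected pixel coordinates (u, v), take its minimum
    rectangular bounding box (optionally enlarged by `margin` pixels), and cut
    the corresponding instance image out of the RGB image."""
    h, w = rgb.shape[:2]
    mask = np.zeros((h, w), dtype=bool)
    valid = (u >= 0) & (u < w) & (v >= 0) & (v < h)
    mask[v[valid], u[valid]] = True

    ys, xs = np.nonzero(mask)
    y0, y1 = max(ys.min() - margin, 0), min(ys.max() + margin, h - 1)
    x0, x1 = max(xs.min() - margin, 0), min(xs.max() + margin, w - 1)
    return rgb[y0:y1 + 1, x0:x1 + 1], (x0, y0, x1, y1)
```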
It should be noted that, in some possible implementation manners, before the first point cloud data is segmented to obtain the second point cloud data corresponding to the at least one example, the first point cloud data is mapped to the RGB space to obtain third point cloud data, then the third point cloud data is segmented to obtain the second point cloud data corresponding to the at least one example, and finally step S103 is executed.
The first point cloud data is mapped to the RGB space according to the following relation:

    P_R = [R t] · (X_D, Y_D, Z_D, 1)^T,   i.e.   P_R = R·P_D + t

where P_D(X_D, Y_D, Z_D) is the coordinate representation of the first point cloud data in the coordinate system of the camera used to acquire the depth image, P_R(X_R, Y_R, Z_R) is the coordinate representation of the third point cloud data in the coordinate system of the camera used to acquire the RGB image, and [R t] is the system matrix, where R is a rotation matrix and t is a translation matrix.
Correspondingly, the mapping relationship for mapping the second point cloud data corresponding to each example to the RGB image is:

    p_W = M · p_R,   i.e.   (z_R·u, z_R·v, z_R)^T = M · (x_R, y_R, z_R)^T

where the point p_R(x_R, y_R, z_R) is the coordinate representation of the second point cloud data corresponding to each instance in the coordinate system of the camera used to acquire the RGB image, p_W(z_R·u, z_R·v, z_R) is the corresponding pixel coordinate representation of the second point cloud data, and

    M = | f_u   0    u_0 |
        |  0   f_v   v_0 |
        |  0    0     1  |

is the coefficient matrix, where f_u = f/dX and f_v = f/dY are referred to as the normalized focal lengths on the u-axis and v-axis, respectively; f is the focal length of the camera, dX and dY represent the physical size of a unit pixel on the u-axis and v-axis of the sensor, respectively, and (u_0, v_0) are the principal point coordinates.
In other possible implementation manners, the first point cloud data may be segmented to obtain second point cloud data corresponding to at least one example, then the second point cloud data corresponding to each example is mapped to an RGB space to obtain fourth point cloud data corresponding to each example, and then step S103 is performed according to the fourth point cloud data. Correspondingly, in step S103, mapping the fourth point cloud data corresponding to each instance to the RGB image to obtain an instance image corresponding to each instance. The mapping relationship of the second point cloud data corresponding to each instance to the RGB space is the same as that of the first point cloud data to the RGB space, and is not described herein again.
Because the coordinate system of the camera used to acquire the depth image differs from the coordinate system of the camera used to acquire the RGB image, the first point cloud data is mapped to the RGB space to obtain third point cloud data expressed in the coordinate system of the camera used to acquire the RGB image, and the third point cloud data is then subjected to instance segmentation to obtain second point cloud data corresponding to at least one instance; alternatively, the second point cloud data corresponding to each instance is mapped to the RGB space to obtain fourth point cloud data corresponding to each instance. In either case, the second point cloud data (or fourth point cloud data) corresponding to each instance can be mapped to the RGB image more accurately.
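The transformation between the two camera coordinate systems described above amounts to applying the system matrix [R t]; the sketch below shows this step in isolation, assuming R and t are available from extrinsic calibration of the depth and RGB cameras.

```python
import numpy as np

def depth_frame_to_rgb_frame(points, R, t):
    """Map N x 3 points from the depth camera's coordinate system to the RGB
    camera's coordinate system: P_R = R * P_D + t.
    R is a 3x3 rotation matrix and t a length-3 translation vector,
    both obtained from extrinsic calibration."""
    return points @ R.T + t
```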
And step S104, carrying out image recognition on the example image to obtain a recognition result of the example corresponding to the example image.
In step S104, image recognition is performed on the example image according to a preset neural network model. Referring to fig. 4, in the present embodiment, a preset neural network model may be trained through steps S401-S402.
S401, obtaining training sample data, wherein the training sample data comprises a sample image and a label of the sample image;
Before training the preset neural network model, the training sample images are labeled. The training sample images are the segmented example images. The example images are first classified, and the number of example images in each category can be set according to the actual needs of the user, for example, to 100-200.
S402, inputting the training sample data into a preset neural network model for training to obtain a trained preset neural network model;
and inputting the marked example image into a preset neural network model for training to obtain the trained preset neural network model.
In addition, a validation data set can be prepared to verify whether the preset neural network model classifies the example images correctly; if the preset neural network model classifies an example image incorrectly, the misclassified image is used for further training.
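A minimal PyTorch sketch of the training procedure in steps S401 to S402, under the assumption of a small CNN classifier; the architecture, batch size, learning rate and number of epochs are illustrative choices, since the embodiment does not fix a specific network model.

```python
import torch
import torch.nn as nn
from torch.utils.data import DataLoader

def build_model(num_classes):
    """A small CNN classifier for cropped instance images (illustrative assumption)."""
    return nn.Sequential(
        nn.Conv2d(3, 16, 3, padding=1), nn.ReLU(), nn.MaxPool2d(2),
        nn.Conv2d(16, 32, 3, padding=1), nn.ReLU(), nn.MaxPool2d(2),
        nn.AdaptiveAvgPool2d(1), nn.Flatten(),
        nn.Linear(32, num_classes),
    )

def train(dataset, num_classes, epochs=10, lr=1e-3):
    """dataset yields (image_tensor, label) pairs built from the labeled instance images."""
    model = build_model(num_classes)
    loader = DataLoader(dataset, batch_size=16, shuffle=True)
    optimizer = torch.optim.Adam(model.parameters(), lr=lr)
    criterion = nn.CrossEntropyLoss()
    for _ in range(epochs):
        for images, labels in loader:
            optimizer.zero_grad()
            loss = criterion(model(images), labels)
            loss.backward()
            optimizer.step()
    return model
```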
In summary, the present application provides a three-dimensional target detection method. First, a depth image and an RGB image of a scene to be detected are acquired (the scene to be detected contains a target of interest; for example, the scene may be a lawn with a plurality of balloons, the balloons being the targets of interest), and the depth image is converted into first point cloud data. Example segmentation is then performed on the first point cloud data to obtain second point cloud data corresponding to at least one example (for example, if the lawn contains a plurality of balloons, second point cloud data corresponding to each balloon is obtained after the example segmentation). The second point cloud data corresponding to each example is mapped to the RGB image to obtain an example image corresponding to each example, and image recognition is performed on the example image to obtain a recognition result of the example corresponding to the example image.
Obviously, in this application the point cloud data is first subjected to example segmentation and the segmented point cloud data is then mapped to the RGB image, rather than the point cloud data being processed directly, which reduces the computational complexity. Mapping the segmented point cloud data to the RGB image allows the image region corresponding to each example to be located on the RGB image more accurately; at the same time, identifying each example image separately improves the accuracy of example recognition. Therefore, the technical scheme of the application can improve the accuracy of three-dimensional target detection with low computational complexity and low cost.
Example two
The following describes a three-dimensional target detection method provided in the second embodiment of the present application, with reference to fig. 5, where the method includes:
steps S501 to S504 are the same as steps S101 to S104 in the first embodiment, and are not described again here.
In step S505, a grabbing pose is determined according to the second point cloud data corresponding to the target instance;
In step S505, after the recognition result corresponding to the example image is obtained, for example, the example image is classified as a balloon, that is, the target instance is a balloon, the second point cloud data corresponding to each balloon is extracted using a clustering algorithm, and the pose of each balloon is determined by calculating the principal curvature and the principal direction of the second point cloud data corresponding to that balloon.
The way the principal curvature is calculated can be chosen according to actual requirements. For example, the principal curvature may be calculated by a cubic surface fitting algorithm, an algorithm based on mesh data, or an algorithm that computes differential geometric quantities directly on the point cloud data.
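As a simplified stand-in for the principal curvature and principal direction computation described above, the sketch below estimates a coarse pose from a PCA of the instance's second point cloud (centroid plus principal axes of the covariance matrix); this is an assumption-laden simplification, not the curvature-based method itself.

```python
import numpy as np

def estimate_pose(points):
    """Estimate a coarse pose for one instance: centroid plus principal directions
    obtained from a PCA of the point cloud (stand-in for the principal curvature /
    principal direction computation described above)."""
    centroid = points.mean(axis=0)
    centered = points - centroid
    # Eigen-decomposition of the 3x3 covariance matrix, sorted by decreasing eigenvalue.
    eigvals, eigvecs = np.linalg.eigh(np.cov(centered.T))
    order = np.argsort(eigvals)[::-1]
    axes = eigvecs[:, order]  # columns: 1st, 2nd, 3rd principal direction
    # Note: one axis may need a sign flip to form a right-handed frame before
    # being used as an orientation for the robot arm.
    return centroid, axes
```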
In step S506, the grabbing pose is sent to the mechanical arm to instruct the mechanical arm to execute a grabbing operation corresponding to the grabbing pose;
In step S506, the grabbing pose corresponding to each instance is sent to the mechanical arm. For example, when a balloon is to be grabbed, the pose of the balloon is sent to the mechanical arm, so that the mechanical arm performs the grabbing operation corresponding to the grabbing pose of each instance.
The recognition result of the example image only contains the category of the example and does not contain the pose of the example in three-dimensional space. Therefore, after the recognition result corresponding to the example image is obtained, the three-dimensional pose corresponding to each example is determined according to its category. This three-dimensional pose can be obtained from the depth image alone, and it is then sent to the mechanical arm so as to control the mechanical arm to sort the examples.
It should be understood that, the sequence numbers of the steps in the foregoing embodiments do not imply an execution sequence, and the execution sequence of each process should be determined by functions and internal logic of the process, and should not constitute any limitation to the implementation process of the embodiments of the present application.
EXAMPLE III
Fig. 6 shows a block diagram of the three-dimensional object detection device provided in the embodiments of the present application, corresponding to the three-dimensional object detection method described in the above embodiments. For convenience of description, only the parts relevant to the embodiments of the present application are shown.
Referring to fig. 6, the apparatus 600 includes:
the image acquisition module 601 is configured to acquire a depth image and an RGB image of a scene to be detected, and convert the depth image into first point cloud data;
an example segmentation module 602, configured to perform example segmentation on the first point cloud data to obtain second point cloud data corresponding to at least one example;
a first mapping module 603, configured to map the second point cloud data corresponding to each instance to the RGB image, so as to obtain an instance image corresponding to each instance;
the identifying module 604 is configured to perform image identification on the example image to obtain an identification result of the example corresponding to the example image.
Optionally, the apparatus 600 further comprises:
the second mapping module is used for mapping the first point cloud data to an RGB space to obtain third point cloud data;
correspondingly, example segmentation is performed on the first point cloud data, and obtaining second point cloud data corresponding to at least one example includes:
and example segmentation is carried out on the third point cloud data to obtain second point cloud data corresponding to at least one example.
Optionally, the first mapping module 603 is specifically configured to:
mapping the second point cloud data corresponding to each example to an RGB space to obtain fourth point cloud data corresponding to each example;
and mapping the fourth point cloud data corresponding to each example to the RGB image to obtain an example image corresponding to each example.
Optionally, the first mapping module 603 includes:
a mapping unit, configured to map the second point cloud data corresponding to each instance to the RGB image to obtain a mask image corresponding to each instance;
a bounding box determining unit, configured to determine a bounding box of the mask image corresponding to each of the instances;
and the segmentation unit is used for segmenting the RGB image according to the surrounding frame to obtain an example image corresponding to each example.
Optionally, the identifying module 604 is specifically configured to:
and performing image recognition on the example image according to a preset neural network model.
Optionally, the identifying module 604 comprises:
a training sample data obtaining unit, configured to obtain training sample data, where the training sample data includes a sample image and a label of the sample image;
and the training sample data input unit is used for inputting the training sample data into a preset neural network model for training to obtain the trained preset neural network model.
Optionally, the apparatus 600 further comprises:
the grabbing pose determining module is used for determining a grabbing pose according to the second point cloud data corresponding to the target example;
and the grabbing pose sending module is used for sending the grabbing pose to the mechanical arm so as to instruct the mechanical arm to execute grabbing operation corresponding to the grabbing pose.
It should be noted that, for the information interaction, execution process, and other contents between the above-mentioned devices/units, the specific functions and technical effects thereof are based on the same concept as those of the embodiment of the method of the present application, and specific reference may be made to the part of the embodiment of the method, which is not described herein again.
Example four
Fig. 7 is a schematic diagram of a terminal device provided in the third embodiment of the present application. As shown in fig. 7, the terminal device 700 of this embodiment includes: a processor 701, a memory 702, and a computer program 703 stored in the memory 702 and executable on the processor 701. The steps in the various method embodiments described above are implemented when the processor 701 executes the computer program 703 described above. Alternatively, the processor 701 implements the functions of each module/unit in each device embodiment when executing the computer program 703.
Illustratively, the computer program 703 may be divided into one or more modules/units, which are stored in the memory 702 and executed by the processor 701 to complete the present application. The one or more modules/units may be a series of computer program instruction segments capable of performing specific functions, and the instruction segments are used for describing the execution process of the computer program 703 in the terminal device 700. For example, the computer program 703 may be divided into an image acquisition module, an instance division module, a first mapping module, and an identification module, and the specific functions of each module are as follows:
acquiring a depth image and an RGB image of a scene to be detected, and converting the depth image into first point cloud data;
carrying out example segmentation on the first point cloud data to obtain second point cloud data corresponding to at least one example;
mapping the second point cloud data corresponding to each instance to the RGB image to obtain an instance image corresponding to each instance;
and carrying out image recognition on the example image to obtain a recognition result of the example corresponding to the example image.
The terminal device may include, but is not limited to, a processor 701 and a memory 702. Those skilled in the art will appreciate that fig. 7 is merely an example of a terminal device 700, and does not constitute a limitation of the terminal device 700, and may include more or fewer components than those shown, or some of the components may be combined, or different components, e.g., the terminal device may further include an input-output device, a network access device, a bus, etc.
The Processor 701 may be a Central Processing Unit (CPU), other general purpose Processor, a Digital Signal Processor (DSP), an Application Specific Integrated Circuit (ASIC), a Field Programmable Gate Array (FPGA) or other Programmable logic device, discrete Gate or transistor logic, discrete hardware components, etc. A general purpose processor may be a microprocessor or the processor may be any conventional processor or the like.
The memory 702 may be an internal storage unit of the terminal device 700, such as a hard disk or a memory of the terminal device 700. The memory 702 may also be an external storage device of the terminal device 700, such as a plug-in hard disk, a Smart Media Card (SMC), a Secure Digital (SD) Card, or a Flash memory Card (Flash Card) provided on the terminal device 700. Further, the memory 702 may include both an internal storage unit and an external storage device of the terminal device 700. The memory 702 is used to store the computer program and other programs and data required by the terminal device. The memory 702 may also be used to temporarily store data that has been output or is to be output.
It should be clear to those skilled in the art that, for convenience and simplicity of description, the foregoing division of the functional units and modules is only used for illustration, and in practical applications, the above functions may be distributed as different functional units and modules according to needs, that is, the internal structure of the apparatus may be divided into different functional units or modules to implement all or part of the above described functions. Each functional unit and module in the embodiments may be integrated in one processing unit, or each unit may exist alone physically, or two or more units are integrated in one unit, and the integrated unit may be implemented in a form of hardware, or in a form of software functional unit. In addition, specific names of the functional units and modules are only for convenience of distinguishing from each other, and are not used for limiting the protection scope of the present application. The specific working processes of the units and modules in the system may refer to the corresponding processes in the foregoing method embodiments, and are not described herein again.
In the above embodiments, the descriptions of the respective embodiments have respective emphasis, and reference may be made to the related descriptions of other embodiments for parts that are not described or illustrated in a certain embodiment.
Those of ordinary skill in the art will appreciate that the various illustrative elements and algorithm steps described in connection with the embodiments disclosed herein may be implemented as electronic hardware or combinations of computer software and electronic hardware. Whether such functionality is implemented as hardware or software depends upon the particular application and design constraints imposed on the implementation. Skilled artisans may implement the described functionality in varying ways for each particular application, but such implementation decisions should not be interpreted as causing a departure from the scope of the present application.
In the embodiments provided in the present application, it should be understood that the disclosed apparatus/terminal device and method may be implemented in other ways. For example, the above-described embodiments of the apparatus/terminal device are merely illustrative, and for example, the division of the above modules or units is only one logical function division, and there may be other division manners in actual implementation, for example, a plurality of units or components may be combined or integrated into another system, or some features may be omitted, or not executed. In addition, the shown or discussed mutual coupling or direct coupling or communication connection may be an indirect coupling or communication connection through some interfaces, devices or units, and may be in an electrical, mechanical or other form.
The units described as separate parts may or may not be physically separate, and parts displayed as units may or may not be physical units, may be located in one position, or may be distributed on multiple network units. Some or all of the units can be selected according to actual needs to achieve the purpose of the solution of the embodiment.
In addition, functional units in the embodiments of the present application may be integrated into one processing unit, or each unit may exist alone physically, or two or more units are integrated into one unit. The integrated unit may be implemented in the form of hardware, or may also be implemented in the form of a software functional unit.
The integrated modules/units described above, if implemented in the form of software functional units and sold or used as separate products, may be stored in a computer readable storage medium. Based on such understanding, all or part of the flow of the above method embodiments may be implemented by a computer program, which may be stored in a computer readable storage medium and, when executed by a processor, implements the steps of the above method embodiments. The computer program comprises computer program code, which may be in the form of source code, object code, an executable file or some intermediate form, etc. The computer readable medium may include: any entity or device capable of carrying the above computer program code, a recording medium, a USB flash disk, a removable hard disk, a magnetic disk, an optical disk, a computer memory, a Read-Only Memory (ROM), a Random Access Memory (RAM), an electrical carrier signal, a telecommunication signal, a software distribution medium, etc. It should be noted that the content contained in the computer readable medium may be subject to appropriate increase or decrease as required by legislation and patent practice in the relevant jurisdictions; for example, in some jurisdictions, computer readable media do not include electrical carrier signals and telecommunication signals in accordance with legislation and patent practice.
The above embodiments are only used for illustrating the technical solutions of the present application, and not for limiting the same; although the present application has been described in detail with reference to the foregoing embodiments, it should be understood by those of ordinary skill in the art that: the technical solutions described in the foregoing embodiments may still be modified, or some technical features may be equivalently replaced; such modifications and substitutions do not substantially depart from the spirit and scope of the embodiments of the present application and are intended to be included within the scope of the present application.

Claims (9)

1. A three-dimensional target detection method is characterized by comprising the following steps:
acquiring a depth image and an RGB image of a scene to be detected, and converting the depth image into first point cloud data;
example segmentation is carried out on the first point cloud data to obtain second point cloud data corresponding to at least one example;
mapping the second point cloud data corresponding to each instance to the RGB image to obtain an instance image corresponding to each instance;
performing image recognition on the example image to obtain a recognition result of an example corresponding to the example image;
the mapping the second point cloud data corresponding to each instance to the RGB image to obtain an instance image corresponding to each instance includes:
mapping the second point cloud data corresponding to each example to an RGB space to obtain fourth point cloud data corresponding to each example;
and mapping the fourth point cloud data corresponding to each instance to the RGB image to obtain an instance image corresponding to each instance.
2. The three-dimensional target detection method of claim 1, wherein the performing instance segmentation on the first point cloud data to obtain second point cloud data corresponding to at least one instance comprises:
mapping the first point cloud data to an RGB space to obtain third point cloud data;
and example segmentation is carried out on the third point cloud data to obtain second point cloud data corresponding to at least one example.
3. The method as claimed in claim 1, wherein the mapping the second point cloud data corresponding to each instance to the RGB image to obtain an instance image corresponding to each instance comprises:
mapping the second point cloud data corresponding to each instance to the RGB image to obtain a mask image corresponding to each instance;
determining a bounding box of the mask image corresponding to each instance;
and segmenting the RGB image according to the surrounding frame to obtain an example image corresponding to each example.
4. The three-dimensional object detection method of claim 1, wherein said image recognizing the instance image comprises:
and performing image recognition on the example image according to a preset neural network model.
5. The three-dimensional object detection method according to claim 4, wherein the preset neural network model is trained by the following method:
acquiring training sample data, wherein the training sample data comprises a sample image and an annotation of the sample image;
inputting the training sample data into a preset neural network model for training to obtain the trained preset neural network model.
6. The three-dimensional object detection method according to claim 1, wherein after the performing image recognition on the instance image to obtain a recognition result of an instance corresponding to the instance image, the method comprises:
determining a grabbing pose according to the second point cloud data corresponding to the target example;
and sending the grabbing pose to a mechanical arm to instruct the mechanical arm to execute grabbing operation corresponding to the grabbing pose.
7. A three-dimensional object detecting device, comprising:
the system comprises an image acquisition module, a first point cloud data acquisition module and a second point cloud data acquisition module, wherein the image acquisition module is used for acquiring a depth image and an RGB (red, green and blue) image of a scene to be detected and converting the depth image into first point cloud data;
the instance segmentation module is used for performing instance segmentation on the first point cloud data to obtain second point cloud data corresponding to at least one instance;
the first mapping module is used for mapping the second point cloud data corresponding to each instance to the RGB image to obtain an instance image corresponding to each instance;
the identification module is used for carrying out image identification on the example image to obtain an identification result of the example corresponding to the example image;
the first mapping module is specifically configured to:
mapping the second point cloud data corresponding to each example to an RGB space to obtain fourth point cloud data corresponding to each example;
and mapping the fourth point cloud data corresponding to each instance to the RGB image to obtain an instance image corresponding to each instance.
8. A terminal device comprising a memory, a processor and a computer program stored in the memory and executable on the processor, characterized in that the processor implements the method according to any of claims 1 to 6 when executing the computer program.
9. A computer-readable storage medium, in which a computer program is stored which, when being executed by a processor, carries out the method according to any one of claims 1 to 6.
CN201911383359.0A 2019-12-27 2019-12-27 Three-dimensional target detection method, detection device, terminal device and computer readable storage medium Active CN111191582B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201911383359.0A CN111191582B (en) 2019-12-27 2019-12-27 Three-dimensional target detection method, detection device, terminal device and computer readable storage medium

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201911383359.0A CN111191582B (en) 2019-12-27 2019-12-27 Three-dimensional target detection method, detection device, terminal device and computer readable storage medium

Publications (2)

Publication Number Publication Date
CN111191582A CN111191582A (en) 2020-05-22
CN111191582B true CN111191582B (en) 2022-11-01

Family

ID=70707779

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201911383359.0A Active CN111191582B (en) 2019-12-27 2019-12-27 Three-dimensional target detection method, detection device, terminal device and computer readable storage medium

Country Status (1)

Country Link
CN (1) CN111191582B (en)

Families Citing this family (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN113761999B (en) * 2020-09-07 2024-03-05 北京京东乾石科技有限公司 Target detection method and device, electronic equipment and storage medium
CN112102397B (en) * 2020-09-10 2021-05-11 敬科(深圳)机器人科技有限公司 Method, equipment and system for positioning multilayer part and readable storage medium
CN112215861A (en) * 2020-09-27 2021-01-12 深圳市优必选科技股份有限公司 Football detection method and device, computer readable storage medium and robot
CN112785714A (en) * 2021-01-29 2021-05-11 北京百度网讯科技有限公司 Point cloud instance labeling method and device, electronic equipment and medium
CN113447923A (en) * 2021-06-29 2021-09-28 上海高德威智能交通系统有限公司 Target detection method, device, system, electronic equipment and storage medium

Citations (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN109886272A (en) * 2019-02-25 2019-06-14 腾讯科技(深圳)有限公司 Point cloud segmentation method, apparatus, computer readable storage medium and computer equipment

Family Cites Families (7)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN107576960B (en) * 2017-09-04 2021-03-16 赵建辉 Target detection method and system for visual radar space-time information fusion
CN107748890A (en) * 2017-09-11 2018-03-02 汕头大学 A kind of visual grasping method, apparatus and its readable storage medium storing program for executing based on depth image
CN108198145B (en) * 2017-12-29 2020-08-28 百度在线网络技术(北京)有限公司 Method and device for point cloud data restoration
CN108171748B (en) * 2018-01-23 2021-12-07 哈工大机器人(合肥)国际创新研究院 Visual identification and positioning method for intelligent robot grabbing application
CN108734120B (en) * 2018-05-15 2022-05-10 百度在线网络技术(北京)有限公司 Method, device and equipment for labeling image and computer readable storage medium
CN109614889B (en) * 2018-11-23 2020-09-18 华为技术有限公司 Object detection method, related device and computer storage medium
CN110472534A (en) * 2019-07-31 2019-11-19 厦门理工学院 3D object detection method, device, equipment and storage medium based on RGB-D data

Patent Citations (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN109886272A (en) * 2019-02-25 2019-06-14 腾讯科技(深圳)有限公司 Point cloud segmentation method, apparatus, computer readable storage medium and computer equipment

Also Published As

Publication number Publication date
CN111191582A (en) 2020-05-22

Similar Documents

Publication Publication Date Title
CN111191582B (en) Three-dimensional target detection method, detection device, terminal device and computer readable storage medium
EP3620966A1 (en) Object detection method and apparatus for object detection
CN111178250B (en) Object identification positioning method and device and terminal equipment
CN109117773B (en) Image feature point detection method, terminal device and storage medium
CN111723691B (en) Three-dimensional face recognition method and device, electronic equipment and storage medium
CN110705405A (en) Target labeling method and device
CN109948397A (en) A kind of face image correcting method, system and terminal device
WO2022170844A1 (en) Video annotation method, apparatus and device, and computer readable storage medium
CN110852311A (en) Three-dimensional human hand key point positioning method and device
CN112348778B (en) Object identification method, device, terminal equipment and storage medium
CN114724120A (en) Vehicle target detection method and system based on radar vision semantic segmentation adaptive fusion
CN112200056B (en) Face living body detection method and device, electronic equipment and storage medium
CN111223065A (en) Image correction method, irregular text recognition device, storage medium and equipment
CN114219855A (en) Point cloud normal vector estimation method and device, computer equipment and storage medium
CN112001317A (en) Lead defect identification method and system based on semantic information and terminal equipment
CN114842466A (en) Object detection method, computer program product and electronic device
CN113570725A (en) Three-dimensional surface reconstruction method and device based on clustering, server and storage medium
CN110633630B (en) Behavior identification method and device and terminal equipment
CN112380978A (en) Multi-face detection method, system and storage medium based on key point positioning
CN113592015B (en) Method and device for positioning and training feature matching network
CN113721240B (en) Target association method, device, electronic equipment and storage medium
CN111815683A (en) Target positioning method and device, electronic equipment and computer readable medium
CN112819953B (en) Three-dimensional reconstruction method, network model training method, device and electronic equipment
CN110674817B (en) License plate anti-counterfeiting method and device based on binocular camera
CN111753766A (en) Image processing method, device, equipment and medium

Legal Events

Date Code Title Description
PB01 Publication
PB01 Publication
SE01 Entry into force of request for substantive examination
SE01 Entry into force of request for substantive examination
GR01 Patent grant
GR01 Patent grant
CP01 Change in the name or title of a patent holder
CP01 Change in the name or title of a patent holder

Address after: 518000 1003, building 2, Chongwen Park, Nanshan wisdom Park, 3370 Liuxian Avenue, Fuguang community, Taoyuan Street, Nanshan District, Shenzhen City, Guangdong Province

Patentee after: Shenzhen Yuejiang Technology Co.,Ltd.

Address before: 518000 1003, building 2, Chongwen Park, Nanshan wisdom Park, 3370 Liuxian Avenue, Fuguang community, Taoyuan Street, Nanshan District, Shenzhen City, Guangdong Province

Patentee before: SHENZHEN YUEJIANG TECHNOLOGY Co.,Ltd.