CN111191582B - Three-dimensional target detection method, detection device, terminal device and computer readable storage medium - Google Patents

Three-dimensional target detection method, detection device, terminal device and computer readable storage medium Download PDF

Info

Publication number
CN111191582B
CN111191582B (application number CN201911383359.0A)
Authority
CN
China
Prior art keywords
image
point cloud
cloud data
instance
data corresponding
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN201911383359.0A
Other languages
Chinese (zh)
Other versions
CN111191582A (en)
Inventor
刘培超
徐培
郎需林
刘主福
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Shenzhen Yuejiang Technology Co Ltd
Original Assignee
Shenzhen Yuejiang Technology Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Shenzhen Yuejiang Technology Co Ltd filed Critical Shenzhen Yuejiang Technology Co Ltd
Priority to CN201911383359.0A priority Critical patent/CN111191582B/en
Publication of CN111191582A publication Critical patent/CN111191582A/en
Application granted granted Critical
Publication of CN111191582B publication Critical patent/CN111191582B/en
Active legal-status Critical Current
Anticipated expiration legal-status Critical

Links

Images

Classifications

    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06VIMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V20/00Scenes; Scene-specific elements
    • G06V20/60Type of objects
    • G06V20/64Three-dimensional objects
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06TIMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T7/00Image analysis
    • G06T7/10Segmentation; Edge detection
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06TIMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T7/00Image analysis
    • G06T7/50Depth or shape recovery
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06VIMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V10/00Arrangements for image or video recognition or understanding
    • G06V10/20Image preprocessing
    • G06V10/25Determination of region of interest [ROI] or a volume of interest [VOI]
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06TIMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T2207/00Indexing scheme for image analysis or image enhancement
    • G06T2207/10Image acquisition modality
    • G06T2207/10028Range image; Depth image; 3D point clouds
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06TIMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T2207/00Indexing scheme for image analysis or image enhancement
    • G06T2207/20Special algorithmic details
    • G06T2207/20081Training; Learning
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06TIMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T2207/00Indexing scheme for image analysis or image enhancement
    • G06T2207/20Special algorithmic details
    • G06T2207/20092Interactive image processing based on input by user
    • G06T2207/20104Interactive definition of region of interest [ROI]

Abstract

The application is applicable to the technical field of computer vision, and provides a three-dimensional target detection method, a detection device, a terminal device and a computer readable storage medium. The three-dimensional target detection method comprises the following steps: acquiring a depth image and an RGB image of a scene to be detected, and converting the depth image into first point cloud data; carrying out example segmentation on the first point cloud data to obtain second point cloud data corresponding to at least one example; mapping the second point cloud data corresponding to each instance to the RGB image to obtain an instance image corresponding to each instance; and performing image recognition on the example image to obtain a recognition result of the example corresponding to the example image. The method and the device can improve the accuracy of target detection while keeping computational complexity and cost low.

Description

Three-dimensional target detection method, detection device, terminal device and computer readable storage medium
Technical Field
The present application belongs to the field of computer vision technology, and in particular, to a three-dimensional target detection method, a detection apparatus, a terminal device, and a computer-readable storage medium.
Background
3D target detection detects and identifies the pose and the category of a target of interest in an image or a video. Compared with 2D target detection, the position of the target in 3D space also needs to be detected, which places higher requirements on the detection algorithm.
At present, 3D target detection algorithms are mainly based on deep neural networks or on feature engineering: the deep neural network algorithm directly processes point cloud data, while the feature engineering algorithm constructs 3D features of an object and then matches the target against a template. However, for both the deep neural network algorithm and the feature engineering algorithm, high target detection accuracy comes at the cost of computational complexity, and the high computational complexity results in high cost. Therefore, there is a need for a method that can improve the accuracy of target detection with low computational complexity and low cost.
Disclosure of Invention
The embodiments of the application provide a three-dimensional target detection method, a detection device, a terminal device and a computer readable storage medium, which can improve target detection accuracy while keeping computational complexity and cost low.
In a first aspect, an embodiment of the present application provides a three-dimensional target detection method, including:
acquiring a depth image and an RGB image of a scene to be detected, and converting the depth image into first point cloud data;
example segmentation is carried out on the first point cloud data to obtain second point cloud data corresponding to at least one example;
mapping the second point cloud data corresponding to each example to the RGB image to obtain an example image corresponding to each example;
and carrying out image recognition on the example image to obtain a recognition result of the example corresponding to the example image.
In a second aspect, an embodiment of the present application provides a three-dimensional target detection apparatus, including:
the image acquisition module is used for acquiring a depth image and an RGB image of a scene to be detected and converting the depth image into first point cloud data;
the example segmentation module is used for carrying out example segmentation on the first point cloud data to obtain second point cloud data corresponding to at least one example;
a first mapping module, configured to map the second point cloud data corresponding to each instance to the RGB image to obtain an instance image corresponding to each instance;
and the identification module is used for carrying out image identification on the example image to obtain an identification result of the example corresponding to the example image.
In a third aspect, an embodiment of the present application provides a terminal device, which includes a memory, a processor, and a computer program stored in the memory and executable on the processor, where the processor implements the steps of the method according to the first aspect when executing the computer program.
In a fourth aspect, the present application provides a computer-readable storage medium, which stores a computer program, and the computer program implements the steps of the method according to the first aspect when executed by a processor.
In a fifth aspect, the present application provides a computer program product, which includes a computer program that, when executed by one or more processors, implements the steps of the method according to the first aspect.
It is to be understood that, for the beneficial effects of the second aspect to the fifth aspect, reference may be made to the relevant description in the first aspect, and details are not described herein again.
In view of the above, the present application provides a three-dimensional target detection method. First, a depth image and an RGB image of a scene to be detected are acquired (the scene to be detected contains a target of interest; for example, the scene may be a lawn with a plurality of balloons, the balloons being the targets of interest), and the depth image is converted into first point cloud data. Example segmentation is then performed on the first point cloud data to obtain second point cloud data corresponding to at least one example (for example, if the lawn contains a plurality of balloons, second point cloud data corresponding to each balloon is obtained after the example segmentation). The second point cloud data corresponding to each example is mapped to the RGB image to obtain an example image corresponding to each example, and image recognition is performed on the example image to obtain a recognition result of the example corresponding to the example image.
In this application, the point cloud data is first subjected to example segmentation and the segmented point cloud data is then mapped to the RGB image, rather than the point cloud data being processed directly, which reduces the computational complexity. Mapping the segmented point cloud data to the RGB image allows the image region corresponding to each example to be located on the RGB image more accurately; at the same time, identifying each example image separately improves the accuracy of example recognition. Therefore, the technical scheme of the application can improve the accuracy of three-dimensional target detection with low computational complexity and low cost.
Drawings
In order to more clearly illustrate the technical solutions in the embodiments of the present application, the drawings needed in the description of the embodiments or the prior art are briefly introduced below. It is apparent that the drawings in the following description show only some embodiments of the present application, and other drawings can be obtained by those skilled in the art from these drawings without inventive effort.
Fig. 1 is a schematic flow chart illustrating an implementation of a three-dimensional target detection method according to an embodiment of the present application;
FIG. 2 is a schematic flow chart illustrating an implementation of a method for obtaining an example image according to an embodiment of the present disclosure;
fig. 3 is a schematic diagram of a mask image bounding box according to an embodiment of the present application;
FIG. 4 is a flowchart illustrating a method for training a neural network according to an embodiment of the present disclosure;
fig. 5 is a schematic flowchart of a three-dimensional target detection method according to a second embodiment of the present application;
FIG. 6 is a schematic structural diagram of a three-dimensional object detection apparatus provided in an embodiment of the present application;
fig. 7 is a schematic structural diagram of a terminal device according to an embodiment of the present application.
Detailed Description
In the following description, for purposes of explanation and not limitation, specific details are set forth, such as particular system structures, techniques, etc. in order to provide a thorough understanding of the embodiments of the present application. It will be apparent, however, to one skilled in the art that the present application may be practiced in other embodiments that depart from these specific details. In other instances, detailed descriptions of well-known systems, devices, circuits, and methods are omitted so as not to obscure the description of the present application with unnecessary detail.
It will be understood that the terms "comprises" and/or "comprising," when used in this specification and the appended claims, specify the presence of stated features, integers, steps, operations, elements, and/or components, but do not preclude the presence or addition of one or more other features, integers, steps, operations, elements, components, and/or groups thereof.
It should also be understood that the term "and/or" as used in this specification and the appended claims refers to and includes any and all possible combinations of one or more of the associated listed items.
As used in this specification and the appended claims, the term "if" may be interpreted contextually as "when", "upon", "in response to determining" or "in response to detecting". Similarly, the phrase "if it is determined" or "if a [described condition or event] is detected" may be interpreted contextually to mean "upon determining", "in response to determining", "upon detecting [the described condition or event]" or "in response to detecting [the described condition or event]".
Furthermore, in the description of the present application and the appended claims, the terms "first," "second," and the like are used for distinguishing between descriptions and not necessarily for describing or implying relative importance.
Reference throughout this specification to "one embodiment" or "some embodiments," or the like, means that a particular feature, structure, or characteristic described in connection with the embodiment is included in one or more embodiments of the present application. Thus, appearances of the phrases "in one embodiment," "in some embodiments," "in other embodiments," or the like, in various places throughout this specification are not necessarily all referring to the same embodiment, but rather "one or more but not all embodiments" unless specifically stated otherwise. The terms "comprising," "including," "having," and variations thereof mean "including, but not limited to," unless expressly specified otherwise.
The three-dimensional target detection method provided by the embodiment of the application can be applied to terminal devices such as a mobile phone, a tablet personal computer, a wearable device, a vehicle-mounted device, an Augmented Reality (AR)/Virtual Reality (VR) device, a notebook computer, an ultra-mobile personal computer (UMPC), a netbook, a Personal Digital Assistant (PDA) and the like, and the embodiment of the application does not limit the specific type of the terminal device at all.
In order to explain the technical means described in the present application, the following description will be given by way of specific examples.
Example one
Step S101, obtaining a depth image and an RGB image of a scene to be detected, and converting the depth image into first point cloud data;
In step S101, the depth image and the RGB image of the scene to be detected may be acquired by a multi-view camera. For example, a binocular camera may be used, in which one camera acquires the depth image and the other acquires the RGB image; alternatively, a trinocular camera may be used, in which two cameras acquire the depth image and one camera acquires the RGB image, so that a depth image with more comprehensive depth information can be obtained. The scene to be detected is a scene containing a target of interest. For example, if the target of interest is a balloon and the balloon is on a lawn, the scene to be detected is the lawn, and the balloon needs to be identified from the lawn.
In addition, the depth image and the RGB image may be collected by the terminal device of this embodiment, or may be collected by another terminal device and then sent to the terminal device of this embodiment for processing. In this embodiment, the terminal device for acquiring the depth image and the RGB image is not limited.
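As an illustration of the depth-to-point-cloud conversion mentioned in step S101, the following is a minimal sketch assuming a pinhole camera model; the function name and the example intrinsic parameters (fu, fv, u0, v0) are hypothetical and are not prescribed by this embodiment.

```python
import numpy as np

def depth_to_point_cloud(depth, fu, fv, u0, v0):
    """Back-project a depth image (H x W, in metres) into an N x 3 point cloud
    expressed in the depth camera's coordinate system (pinhole model)."""
    h, w = depth.shape
    u, v = np.meshgrid(np.arange(w), np.arange(h))  # pixel grid
    z = depth
    x = (u - u0) * z / fu
    y = (v - v0) * z / fv
    points = np.stack([x, y, z], axis=-1).reshape(-1, 3)
    return points[points[:, 2] > 0]  # drop pixels with no valid depth reading

# Example call with hypothetical calibration values:
# cloud = depth_to_point_cloud(depth_image, fu=615.0, fv=615.0, u0=320.0, v0=240.0)
```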
Step S102, example segmentation is carried out on the first point cloud data to obtain second point cloud data corresponding to at least one example;
In step S102, the specific type of segmentation algorithm may be selected according to actual requirements. For example, Locally Convex Connected Patches (LCCP) or a Euclidean clustering algorithm may be selected as the segmentation algorithm employed in this embodiment. The embodiments of the present application do not limit the type of segmentation algorithm.
Example segmentation of the first point cloud data yields second point cloud data corresponding to different individuals. For example, when the target of interest is a balloon on a lawn, the scene to be detected is the lawn; after the balloons in the image are segmented, second point cloud data corresponding to each balloon is obtained.
In this embodiment, instead of directly processing the point cloud data, the point cloud data is first subjected to instance segmentation, and then the segmented point cloud data is subjected to subsequent processing, so that the complexity of calculation is reduced.
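As a rough illustration of the example segmentation in step S102, the sketch below uses scikit-learn's DBSCAN as a stand-in for the Euclidean clustering mentioned above (points closer than a distance threshold are grouped into the same example); the eps and min_samples values are illustrative assumptions, and LCCP would require a different, supervoxel-based implementation.

```python
import numpy as np
from sklearn.cluster import DBSCAN

def segment_instances(points, eps=0.02, min_samples=50):
    """Split the first point cloud (N x 3) into per-instance second point clouds
    by density-based Euclidean clustering (DBSCAN used here as a stand-in)."""
    labels = DBSCAN(eps=eps, min_samples=min_samples).fit_predict(points)
    instances = []
    for label in np.unique(labels):
        if label == -1:  # -1 marks noise points, not an instance
            continue
        instances.append(points[labels == label])
    return instances
```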
Step S103, mapping the second point cloud data corresponding to each instance to the RGB image to obtain an instance image corresponding to each instance;
In step S103, the second point cloud data corresponding to each instance is mapped to the RGB image according to the following mapping relationship:

    p_W = M · p_D,   i.e.   (z_D·u, z_D·v, z_D)^T = M · (x_D, y_D, z_D)^T

where the point p_D(x_D, y_D, z_D) is the coordinate representation of the second point cloud data corresponding to each instance in the coordinate system of the camera used to acquire the depth image, p_W(z_D·u, z_D·v, z_D) is the corresponding pixel coordinate representation of the second point cloud data, and

    M = | f_u   0    u_0 |
        |  0   f_v   v_0 |
        |  0    0     1  |

is the coefficient matrix, where f_u = f/dX and f_v = f/dY are referred to as the normalized focal lengths on the u-axis and v-axis, respectively; f is the focal length of the camera, dX and dY represent the physical size of a unit pixel on the u-axis and v-axis of the sensor, respectively, and (u_0, v_0) are the principal point coordinates.
In this embodiment, the segmented point cloud data is mapped to the RGB image, so that the corresponding example image can be found on the RGB image more accurately by the example, and after the example image corresponding to each example is found, the step S104 is executed.
In this embodiment, the method as shown in fig. 2 may be used to obtain the example image corresponding to each of the above examples.
As shown in fig. 2, an example image corresponding to each of the above examples is obtained through steps S201 to S203.
Step S201, mapping the second point cloud data corresponding to each instance to the RGB image to obtain a mask image corresponding to each instance;
In step S201, a mask operation is performed on the image obtained by mapping the second point cloud data corresponding to each example to the RGB image, so as to obtain a mask image. The mask operation removes the regions of no interest from the mapped image, thereby reducing the amount of calculation.
Step S202, determining a bounding box of the mask image corresponding to each instance;
In some possible implementations, the bounding box of the mask image corresponding to each example may be the outer contour of the mask image (e.g., 301 in fig. 3); in this case the calculation amount is relatively small.
In other possible implementations, the bounding box may be the minimum rectangular bounding box of the mask image (e.g., 302 in fig. 3). The calculation amount is then greater than when the outer contour is used, but the segmentation accuracy of the RGB image in the subsequent step S203 is improved.
In other possible implementations, the bounding box may be a rectangular bounding box that is larger than the mask image by a preset ratio (e.g., 303 in fig. 3). In this case, the segmentation accuracy of the RGB image in the subsequent step S203 is higher still, but the calculation amount increases accordingly. The preset ratio may be set by the user according to actual needs and is not specifically limited herein.
The user may determine the specific form in the bounding box of the mask image according to actual needs, which is not specifically limited herein.
Step S203, segmenting the RGB image according to the surrounding frame to obtain an example image corresponding to each example;
in step S203, the RGB image is segmented according to the bounding box to obtain an example image corresponding to each example, so that the example image can be input to the image recognition module for recognition.
In this embodiment, obtaining a mask image corresponding to each example removes the regions of no interest from the image obtained by mapping the second point cloud data corresponding to each example to the RGB image, thereby reducing the amount of calculation. The RGB image is then segmented according to the bounding box of the mask image, which improves the accuracy of the RGB image segmentation.
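A minimal sketch of steps S201 to S203, assuming the projected pixel coordinates (u, v) from the previous mapping are available; it builds the mask, takes the minimum rectangular bounding box (option 302 in fig. 3, optionally enlarged by a margin in the spirit of option 303), and crops the example image from the RGB image. All function and variable names are hypothetical.

```python
import numpy as np

def crop_instance_image(rgb, u, v, margin=0):
    """Build a mask from projected pixel coordinates (u, v), take its minimum
    rectangular bounding box (optionally enlarged by `margin` pixels), and cut
    the corresponding instance image out of the RGB image."""
    h, w = rgb.shape[:2]
    mask = np.zeros((h, w), dtype=bool)
    valid = (u >= 0) & (u < w) & (v >= 0) & (v < h)
    mask[v[valid], u[valid]] = True

    ys, xs = np.nonzero(mask)
    y0, y1 = max(ys.min() - margin, 0), min(ys.max() + margin, h - 1)
    x0, x1 = max(xs.min() - margin, 0), min(xs.max() + margin, w - 1)
    return rgb[y0:y1 + 1, x0:x1 + 1], (x0, y0, x1, y1)
```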
It should be noted that, in some possible implementation manners, before the first point cloud data is segmented to obtain the second point cloud data corresponding to the at least one example, the first point cloud data is mapped to the RGB space to obtain third point cloud data, then the third point cloud data is segmented to obtain the second point cloud data corresponding to the at least one example, and finally step S103 is executed.
The first point cloud data is mapped to the RGB space according to the following relation:

    P_R = [R t] · (X_D, Y_D, Z_D, 1)^T,   i.e.   P_R = R·P_D + t

where P_D(X_D, Y_D, Z_D) is the coordinate representation of the first point cloud data in the coordinate system of the camera used to acquire the depth image, P_R(X_R, Y_R, Z_R) is the coordinate representation of the third point cloud data in the coordinate system of the camera used to acquire the RGB image, and [R t] is the system matrix, where R is a rotation matrix and t is a translation matrix.
Correspondingly, the mapping relationship for mapping the second point cloud data corresponding to each example to the RGB image is:

    p_W = M · p_R,   i.e.   (z_R·u, z_R·v, z_R)^T = M · (x_R, y_R, z_R)^T

where the point p_R(x_R, y_R, z_R) is the coordinate representation of the second point cloud data corresponding to each instance in the coordinate system of the camera used to acquire the RGB image, p_W(z_R·u, z_R·v, z_R) is the corresponding pixel coordinate representation of the second point cloud data, and

    M = | f_u   0    u_0 |
        |  0   f_v   v_0 |
        |  0    0     1  |

is the coefficient matrix, where f_u = f/dX and f_v = f/dY are referred to as the normalized focal lengths on the u-axis and v-axis, respectively; f is the focal length of the camera, dX and dY represent the physical size of a unit pixel on the u-axis and v-axis of the sensor, respectively, and (u_0, v_0) are the principal point coordinates.
In other possible implementation manners, the first point cloud data may be segmented to obtain second point cloud data corresponding to at least one example, then the second point cloud data corresponding to each example is mapped to an RGB space to obtain fourth point cloud data corresponding to each example, and then step S103 is performed according to the fourth point cloud data. Correspondingly, in step S103, mapping the fourth point cloud data corresponding to each instance to the RGB image to obtain an instance image corresponding to each instance. The mapping relationship of the second point cloud data corresponding to each instance to the RGB space is the same as that of the first point cloud data to the RGB space, and is not described herein again.
Because the coordinate system of the camera used to acquire the depth image differs from the coordinate system of the camera used to acquire the RGB image, the first point cloud data is mapped to the RGB space to obtain third point cloud data expressed in the coordinate system of the camera used to acquire the RGB image, and the third point cloud data is then subjected to instance segmentation to obtain second point cloud data corresponding to at least one instance; alternatively, the second point cloud data corresponding to each instance is mapped to the RGB space to obtain fourth point cloud data corresponding to each instance. In either case, the second point cloud data (or fourth point cloud data) corresponding to each instance can be mapped to the RGB image more accurately.
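The transformation between the two camera coordinate systems described above amounts to applying the system matrix [R t]; the sketch below shows this step in isolation, assuming R and t are available from extrinsic calibration of the depth and RGB cameras.

```python
import numpy as np

def depth_frame_to_rgb_frame(points, R, t):
    """Map N x 3 points from the depth camera's coordinate system to the RGB
    camera's coordinate system: P_R = R * P_D + t.
    R is a 3x3 rotation matrix and t a length-3 translation vector,
    both obtained from extrinsic calibration."""
    return points @ R.T + t
```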
And step S104, carrying out image recognition on the example image to obtain a recognition result of the example corresponding to the example image.
In step S104, image recognition is performed on the example image according to a preset neural network model. Referring to fig. 4, in the present embodiment, a preset neural network model may be trained through steps S401-S402.
S401, obtaining training sample data, wherein the training sample data comprises a sample image and a label of the sample image;
Before training the preset neural network model, the training sample images are labeled. The training sample images are the segmented example images. The example images are first classified, and the number of example images in each category can be set according to the actual needs of the user, for example, to 100-200.
S402, inputting the training sample data into a preset neural network model for training to obtain a trained preset neural network model;
and inputting the marked example image into a preset neural network model for training to obtain the trained preset neural network model.
In addition, a validation data set can be prepared to verify whether the preset neural network model classifies the example images correctly; if the preset neural network model classifies an example image incorrectly, the misclassified image is used for further training.
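A minimal PyTorch sketch of the training procedure in steps S401 to S402, under the assumption of a small CNN classifier; the architecture, batch size, learning rate and number of epochs are illustrative choices, since the embodiment does not fix a specific network model.

```python
import torch
import torch.nn as nn
from torch.utils.data import DataLoader

def build_model(num_classes):
    """A small CNN classifier for cropped instance images (illustrative assumption)."""
    return nn.Sequential(
        nn.Conv2d(3, 16, 3, padding=1), nn.ReLU(), nn.MaxPool2d(2),
        nn.Conv2d(16, 32, 3, padding=1), nn.ReLU(), nn.MaxPool2d(2),
        nn.AdaptiveAvgPool2d(1), nn.Flatten(),
        nn.Linear(32, num_classes),
    )

def train(dataset, num_classes, epochs=10, lr=1e-3):
    """dataset yields (image_tensor, label) pairs built from the labeled instance images."""
    model = build_model(num_classes)
    loader = DataLoader(dataset, batch_size=16, shuffle=True)
    optimizer = torch.optim.Adam(model.parameters(), lr=lr)
    criterion = nn.CrossEntropyLoss()
    for _ in range(epochs):
        for images, labels in loader:
            optimizer.zero_grad()
            loss = criterion(model(images), labels)
            loss.backward()
            optimizer.step()
    return model
```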
In summary, the present application provides a three-dimensional target detection method. First, a depth image and an RGB image of a scene to be detected are acquired (the scene to be detected contains a target of interest; for example, the scene may be a lawn with a plurality of balloons, the balloons being the targets of interest), and the depth image is converted into first point cloud data. Example segmentation is then performed on the first point cloud data to obtain second point cloud data corresponding to at least one example (for example, if the lawn contains a plurality of balloons, second point cloud data corresponding to each balloon is obtained after the example segmentation). The second point cloud data corresponding to each example is mapped to the RGB image to obtain an example image corresponding to each example, and image recognition is performed on the example image to obtain a recognition result of the example corresponding to the example image.
Obviously, in this application the point cloud data is first subjected to example segmentation and the segmented point cloud data is then mapped to the RGB image, rather than the point cloud data being processed directly, which reduces the computational complexity. Mapping the segmented point cloud data to the RGB image allows the image region corresponding to each example to be located on the RGB image more accurately; at the same time, identifying each example image separately improves the accuracy of example recognition. Therefore, the technical scheme of the application can improve the accuracy of three-dimensional target detection with low computational complexity and low cost.
Example two
The following describes a three-dimensional target detection method provided in the second embodiment of the present application, with reference to fig. 5, where the method includes:
steps S501 to S504 are the same as steps S101 to S104 in the first embodiment, and are not described again here.
In step S505, a grabbing pose is determined according to the second point cloud data corresponding to the target instance;
In step S505, after the recognition result corresponding to the example image is obtained, for example, the example image is classified as a balloon, that is, the target instance is a balloon, the second point cloud data corresponding to each balloon is extracted using a clustering algorithm, and the pose of each balloon is determined by calculating the principal curvature and the principal direction of the second point cloud data corresponding to that balloon.
The way the principal curvature is calculated can be chosen according to actual requirements. For example, the principal curvature may be calculated by a cubic surface fitting algorithm, an algorithm based on mesh data, or an algorithm that computes differential geometric quantities directly on the point cloud data.
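As a simplified stand-in for the principal curvature and principal direction computation described above, the sketch below estimates a coarse pose from a PCA of the instance's second point cloud (centroid plus principal axes of the covariance matrix); this is an assumption-laden simplification, not the curvature-based method itself.

```python
import numpy as np

def estimate_pose(points):
    """Estimate a coarse pose for one instance: centroid plus principal directions
    obtained from a PCA of the point cloud (stand-in for the principal curvature /
    principal direction computation described above)."""
    centroid = points.mean(axis=0)
    centered = points - centroid
    # Eigen-decomposition of the 3x3 covariance matrix, sorted by decreasing eigenvalue.
    eigvals, eigvecs = np.linalg.eigh(np.cov(centered.T))
    order = np.argsort(eigvals)[::-1]
    axes = eigvecs[:, order]  # columns: 1st, 2nd, 3rd principal direction
    # Note: one axis may need a sign flip to form a right-handed frame before
    # being used as an orientation for the robot arm.
    return centroid, axes
```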
In step S506, the grabbing pose is sent to the mechanical arm to instruct the mechanical arm to execute a grabbing operation corresponding to the grabbing pose;
In step S506, the grabbing pose corresponding to each instance is sent to the mechanical arm. For example, when a balloon is to be grabbed, the pose of the balloon is sent to the mechanical arm, so that the mechanical arm performs the grabbing operation corresponding to the grabbing pose of each instance.
The recognition result of the example image only contains the category of the example and does not contain the pose of the example in three-dimensional space. Therefore, after the recognition result corresponding to the example image is obtained, the three-dimensional pose corresponding to each example is determined according to its category. This three-dimensional pose can be obtained from the depth image alone, and it is then sent to the mechanical arm so as to control the mechanical arm to sort the examples.
It should be understood that, the sequence numbers of the steps in the foregoing embodiments do not imply an execution sequence, and the execution sequence of each process should be determined by functions and internal logic of the process, and should not constitute any limitation to the implementation process of the embodiments of the present application.
EXAMPLE III
Fig. 6 shows a block diagram of the three-dimensional object detection device provided in the embodiments of the present application, corresponding to the three-dimensional object detection method described in the above embodiments. For convenience of description, only the parts relevant to the embodiments of the present application are shown.
Referring to fig. 6, the apparatus 600 includes:
the image acquisition module 601 is configured to acquire a depth image and an RGB image of a scene to be detected, and convert the depth image into first point cloud data;
an example segmentation module 602, configured to perform example segmentation on the first point cloud data to obtain second point cloud data corresponding to at least one example;
a first mapping module 603, configured to map the second point cloud data corresponding to each instance to the RGB image, so as to obtain an instance image corresponding to each instance;
the identifying module 604 is configured to perform image identification on the example image to obtain an identification result of the example corresponding to the example image.
Optionally, the apparatus 600 further comprises:
the second mapping module is used for mapping the first point cloud data to an RGB space to obtain third point cloud data;
correspondingly, example segmentation is performed on the first point cloud data, and obtaining second point cloud data corresponding to at least one example includes:
and example segmentation is carried out on the third point cloud data to obtain second point cloud data corresponding to at least one example.
Optionally, the first mapping module 603 is specifically configured to:
mapping the second point cloud data corresponding to each example to an RGB space to obtain fourth point cloud data corresponding to each example;
and mapping the fourth point cloud data corresponding to each example to the RGB image to obtain an example image corresponding to each example.
Optionally, the first mapping module 603 includes:
a mapping unit, configured to map the second point cloud data corresponding to each instance to the RGB image to obtain a mask image corresponding to each instance;
a bounding box determining unit, configured to determine a bounding box of the mask image corresponding to each of the instances;
and the segmentation unit is used for segmenting the RGB image according to the surrounding frame to obtain an example image corresponding to each example.
Optionally, the identifying module 604 is specifically configured to:
and performing image recognition on the example image according to a preset neural network model.
Optionally, the identifying module 604 comprises:
a training sample data obtaining unit, configured to obtain training sample data, where the training sample data includes a sample image and a label of the sample image;
and the training sample data input unit is used for inputting the training sample data into a preset neural network model for training to obtain the trained preset neural network model.
Optionally, the apparatus 600 further comprises:
the grabbing pose determining module is used for determining a grabbing pose according to the second point cloud data corresponding to the target example;
and the grabbing pose sending module is used for sending the grabbing pose to the mechanical arm so as to instruct the mechanical arm to execute grabbing operation corresponding to the grabbing pose.
It should be noted that, for the information interaction, execution process, and other contents between the above-mentioned devices/units, the specific functions and technical effects thereof are based on the same concept as those of the embodiment of the method of the present application, and specific reference may be made to the part of the embodiment of the method, which is not described herein again.
Example four
Fig. 7 is a schematic diagram of a terminal device provided in the third embodiment of the present application. As shown in fig. 7, the terminal device 700 of this embodiment includes: a processor 701, a memory 702, and a computer program 703 stored in the memory 702 and executable on the processor 701. The steps in the various method embodiments described above are implemented when the processor 701 executes the computer program 703 described above. Alternatively, the processor 701 implements the functions of each module/unit in each device embodiment when executing the computer program 703.
Illustratively, the computer program 703 may be divided into one or more modules/units, which are stored in the memory 702 and executed by the processor 701 to complete the present application. The one or more modules/units may be a series of computer program instruction segments capable of performing specific functions, and the instruction segments are used for describing the execution process of the computer program 703 in the terminal device 700. For example, the computer program 703 may be divided into an image acquisition module, an instance division module, a first mapping module, and an identification module, and the specific functions of each module are as follows:
acquiring a depth image and an RGB image of a scene to be detected, and converting the depth image into first point cloud data;
carrying out example segmentation on the first point cloud data to obtain second point cloud data corresponding to at least one example;
mapping the second point cloud data corresponding to each instance to the RGB image to obtain an instance image corresponding to each instance;
and carrying out image recognition on the example image to obtain a recognition result of the example corresponding to the example image.
The terminal device may include, but is not limited to, a processor 701 and a memory 702. Those skilled in the art will appreciate that fig. 7 is merely an example of a terminal device 700, and does not constitute a limitation of the terminal device 700, and may include more or fewer components than those shown, or some of the components may be combined, or different components, e.g., the terminal device may further include an input-output device, a network access device, a bus, etc.
The Processor 701 may be a Central Processing Unit (CPU), other general purpose Processor, a Digital Signal Processor (DSP), an Application Specific Integrated Circuit (ASIC), a Field Programmable Gate Array (FPGA) or other Programmable logic device, discrete Gate or transistor logic, discrete hardware components, etc. A general purpose processor may be a microprocessor or the processor may be any conventional processor or the like.
The memory 702 may be an internal storage unit of the terminal device 700, such as a hard disk or a memory of the terminal device 700. The memory 702 may also be an external storage device of the terminal device 700, such as a plug-in hard disk, a Smart Media Card (SMC), a Secure Digital (SD) Card, or a Flash memory Card (Flash Card) provided on the terminal device 700. Further, the memory 702 may include both an internal storage unit and an external storage device of the terminal device 700. The memory 702 is used to store the computer program and other programs and data required by the terminal device. The memory 702 may also be used to temporarily store data that has been output or is to be output.
It should be clear to those skilled in the art that, for convenience and simplicity of description, the foregoing division of the functional units and modules is only used for illustration, and in practical applications, the above functions may be distributed as different functional units and modules according to needs, that is, the internal structure of the apparatus may be divided into different functional units or modules to implement all or part of the above described functions. Each functional unit and module in the embodiments may be integrated in one processing unit, or each unit may exist alone physically, or two or more units are integrated in one unit, and the integrated unit may be implemented in a form of hardware, or in a form of software functional unit. In addition, specific names of the functional units and modules are only for convenience of distinguishing from each other, and are not used for limiting the protection scope of the present application. The specific working processes of the units and modules in the system may refer to the corresponding processes in the foregoing method embodiments, and are not described herein again.
In the above embodiments, the descriptions of the respective embodiments have respective emphasis, and reference may be made to the related descriptions of other embodiments for parts that are not described or illustrated in a certain embodiment.
Those of ordinary skill in the art will appreciate that the various illustrative elements and algorithm steps described in connection with the embodiments disclosed herein may be implemented as electronic hardware or combinations of computer software and electronic hardware. Whether such functionality is implemented as hardware or software depends upon the particular application and design constraints imposed on the implementation. Skilled artisans may implement the described functionality in varying ways for each particular application, but such implementation decisions should not be interpreted as causing a departure from the scope of the present application.
In the embodiments provided in the present application, it should be understood that the disclosed apparatus/terminal device and method may be implemented in other ways. For example, the above-described embodiments of the apparatus/terminal device are merely illustrative, and for example, the division of the above modules or units is only one logical function division, and there may be other division manners in actual implementation, for example, a plurality of units or components may be combined or integrated into another system, or some features may be omitted, or not executed. In addition, the shown or discussed mutual coupling or direct coupling or communication connection may be an indirect coupling or communication connection through some interfaces, devices or units, and may be in an electrical, mechanical or other form.
The units described as separate parts may or may not be physically separate, and parts displayed as units may or may not be physical units, may be located in one position, or may be distributed on multiple network units. Some or all of the units can be selected according to actual needs to achieve the purpose of the solution of the embodiment.
In addition, functional units in the embodiments of the present application may be integrated into one processing unit, or each unit may exist alone physically, or two or more units are integrated into one unit. The integrated unit may be implemented in the form of hardware, or may also be implemented in the form of a software functional unit.
The integrated modules/units described above, if implemented in the form of software functional units and sold or used as separate products, may be stored in a computer readable storage medium. Based on such understanding, all or part of the flow of the above method embodiments may be implemented by a computer program, which may be stored in a computer readable storage medium and, when executed by a processor, implements the steps of the above method embodiments. The computer program comprises computer program code, which may be in the form of source code, object code, an executable file or some intermediate form, etc. The computer readable medium may include: any entity or device capable of carrying the above computer program code, a recording medium, a USB flash disk, a removable hard disk, a magnetic disk, an optical disk, a computer memory, a Read-Only Memory (ROM), a Random Access Memory (RAM), an electrical carrier signal, a telecommunication signal, a software distribution medium, etc. It should be noted that the content contained in the computer readable medium may be subject to appropriate increase or decrease as required by legislation and patent practice in the relevant jurisdictions; for example, in some jurisdictions, computer readable media do not include electrical carrier signals and telecommunication signals in accordance with legislation and patent practice.
The above embodiments are only used for illustrating the technical solutions of the present application, and not for limiting the same; although the present application has been described in detail with reference to the foregoing embodiments, it should be understood by those of ordinary skill in the art that: the technical solutions described in the foregoing embodiments may still be modified, or some technical features may be equivalently replaced; such modifications and substitutions do not substantially depart from the spirit and scope of the embodiments of the present application and are intended to be included within the scope of the present application.

Claims (9)

1. A three-dimensional target detection method is characterized by comprising the following steps:
acquiring a depth image and an RGB image of a scene to be detected, and converting the depth image into first point cloud data;
example segmentation is carried out on the first point cloud data to obtain second point cloud data corresponding to at least one example;
mapping the second point cloud data corresponding to each instance to the RGB image to obtain an instance image corresponding to each instance;
performing image recognition on the example image to obtain a recognition result of an example corresponding to the example image;
the mapping the second point cloud data corresponding to each instance to the RGB image to obtain an instance image corresponding to each instance includes:
mapping the second point cloud data corresponding to each example to an RGB space to obtain fourth point cloud data corresponding to each example;
and mapping the fourth point cloud data corresponding to each instance to the RGB image to obtain an instance image corresponding to each instance.
2. The three-dimensional target detection method of claim 1, wherein the performing instance segmentation on the first point cloud data to obtain second point cloud data corresponding to at least one instance comprises:
mapping the first point cloud data to an RGB space to obtain third point cloud data;
and example segmentation is carried out on the third point cloud data to obtain second point cloud data corresponding to at least one example.
3. The method as claimed in claim 1, wherein the mapping the second point cloud data corresponding to each instance to the RGB image to obtain an instance image corresponding to each instance comprises:
mapping the second point cloud data corresponding to each instance to the RGB image to obtain a mask image corresponding to each instance;
determining a bounding box of the mask image corresponding to each instance;
and segmenting the RGB image according to the surrounding frame to obtain an example image corresponding to each example.
4. The three-dimensional object detection method of claim 1, wherein said image recognizing the instance image comprises:
and performing image recognition on the example image according to a preset neural network model.
5. The three-dimensional object detection method according to claim 4, wherein the preset neural network model is trained by the following method:
acquiring training sample data, wherein the training sample data comprises a sample image and an annotation of the sample image;
inputting the training sample data into a preset neural network model for training to obtain the trained preset neural network model.
6. The three-dimensional object detection method according to claim 1, wherein after the performing image recognition on the instance image to obtain a recognition result of an instance corresponding to the instance image, the method comprises:
determining a grabbing pose according to the second point cloud data corresponding to the target example;
and sending the grabbing pose to a mechanical arm to instruct the mechanical arm to execute grabbing operation corresponding to the grabbing pose.
7. A three-dimensional object detecting device, comprising:
the system comprises an image acquisition module, a first point cloud data acquisition module and a second point cloud data acquisition module, wherein the image acquisition module is used for acquiring a depth image and an RGB (red, green and blue) image of a scene to be detected and converting the depth image into first point cloud data;
the instance segmentation module is used for performing instance segmentation on the first point cloud data to obtain second point cloud data corresponding to at least one instance;
the first mapping module is used for mapping the second point cloud data corresponding to each instance to the RGB image to obtain an instance image corresponding to each instance;
the identification module is used for carrying out image identification on the example image to obtain an identification result of the example corresponding to the example image;
the first mapping module is specifically configured to:
mapping the second point cloud data corresponding to each example to an RGB space to obtain fourth point cloud data corresponding to each example;
and mapping the fourth point cloud data corresponding to each instance to the RGB image to obtain an instance image corresponding to each instance.
8. A terminal device comprising a memory, a processor and a computer program stored in the memory and executable on the processor, characterized in that the processor implements the method according to any of claims 1 to 6 when executing the computer program.
9. A computer-readable storage medium, in which a computer program is stored which, when being executed by a processor, carries out the method according to any one of claims 1 to 6.
CN201911383359.0A 2019-12-27 2019-12-27 Three-dimensional target detection method, detection device, terminal device and computer readable storage medium Active CN111191582B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201911383359.0A CN111191582B (en) 2019-12-27 2019-12-27 Three-dimensional target detection method, detection device, terminal device and computer readable storage medium

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201911383359.0A CN111191582B (en) 2019-12-27 2019-12-27 Three-dimensional target detection method, detection device, terminal device and computer readable storage medium

Publications (2)

Publication Number Publication Date
CN111191582A CN111191582A (en) 2020-05-22
CN111191582B true CN111191582B (en) 2022-11-01

Family

ID=70707779

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201911383359.0A Active CN111191582B (en) 2019-12-27 2019-12-27 Three-dimensional target detection method, detection device, terminal device and computer readable storage medium

Country Status (1)

Country Link
CN (1) CN111191582B (en)

Families Citing this family (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN113761999B (en) * 2020-09-07 2024-03-05 北京京东乾石科技有限公司 Target detection method and device, electronic equipment and storage medium
CN112102397B (en) * 2020-09-10 2021-05-11 敬科(深圳)机器人科技有限公司 Method, equipment and system for positioning multilayer part and readable storage medium
CN112215861A (en) * 2020-09-27 2021-01-12 深圳市优必选科技股份有限公司 Football detection method and device, computer readable storage medium and robot
CN112785714A (en) * 2021-01-29 2021-05-11 北京百度网讯科技有限公司 Point cloud instance labeling method and device, electronic equipment and medium
CN113447923A (en) * 2021-06-29 2021-09-28 上海高德威智能交通系统有限公司 Target detection method, device, system, electronic equipment and storage medium

Citations (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN109886272A (en) * 2019-02-25 2019-06-14 腾讯科技(深圳)有限公司 Point cloud segmentation method, apparatus, computer readable storage medium and computer equipment

Family Cites Families (7)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN107576960B (en) * 2017-09-04 2021-03-16 赵建辉 Target detection method and system for visual radar space-time information fusion
CN107748890A (en) * 2017-09-11 2018-03-02 汕头大学 A kind of visual grasping method, apparatus and its readable storage medium storing program for executing based on depth image
CN108198145B (en) * 2017-12-29 2020-08-28 百度在线网络技术(北京)有限公司 Method and device for point cloud data restoration
CN108171748B (en) * 2018-01-23 2021-12-07 哈工大机器人(合肥)国际创新研究院 Visual identification and positioning method for intelligent robot grabbing application
CN108734120B (en) * 2018-05-15 2022-05-10 百度在线网络技术(北京)有限公司 Method, device and equipment for labeling image and computer readable storage medium
CN109614889B (en) * 2018-11-23 2020-09-18 华为技术有限公司 Object detection method, related device and computer storage medium
CN110472534A (en) * 2019-07-31 2019-11-19 厦门理工学院 3D object detection method, device, equipment and storage medium based on RGB-D data

Patent Citations (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN109886272A (en) * 2019-02-25 2019-06-14 腾讯科技(深圳)有限公司 Point cloud segmentation method, apparatus, computer readable storage medium and computer equipment

Also Published As

Publication number Publication date
CN111191582A (en) 2020-05-22

Similar Documents

Publication Publication Date Title
CN111191582B (en) Three-dimensional target detection method, detection device, terminal device and computer readable storage medium
EP3620966A1 (en) Object detection method and apparatus for object detection
CN111178250B (en) Object identification positioning method and device and terminal equipment
CN109117773B (en) Image feature point detection method, terminal device and storage medium
CN111723691B (en) Three-dimensional face recognition method and device, electronic equipment and storage medium
CN110705405A (en) Target labeling method and device
CN109948397A (en) A kind of face image correcting method, system and terminal device
WO2022170844A1 (en) Video annotation method, apparatus and device, and computer readable storage medium
CN110852311A (en) Three-dimensional human hand key point positioning method and device
CN112348778B (en) Object identification method, device, terminal equipment and storage medium
CN114724120A (en) Vehicle target detection method and system based on radar vision semantic segmentation adaptive fusion
CN112200056B (en) Face living body detection method and device, electronic equipment and storage medium
CN111223065A (en) Image correction method, irregular text recognition device, storage medium and equipment
CN114219855A (en) Point cloud normal vector estimation method and device, computer equipment and storage medium
CN112001317A (en) Lead defect identification method and system based on semantic information and terminal equipment
CN114842466A (en) Object detection method, computer program product and electronic device
CN113570725A (en) Three-dimensional surface reconstruction method and device based on clustering, server and storage medium
CN110633630B (en) Behavior identification method and device and terminal equipment
CN112380978A (en) Multi-face detection method, system and storage medium based on key point positioning
CN113592015B (en) Method and device for positioning and training feature matching network
CN113721240B (en) Target association method, device, electronic equipment and storage medium
CN111815683A (en) Target positioning method and device, electronic equipment and computer readable medium
CN112819953B (en) Three-dimensional reconstruction method, network model training method, device and electronic equipment
CN110674817B (en) License plate anti-counterfeiting method and device based on binocular camera
CN111753766A (en) Image processing method, device, equipment and medium

Legal Events

Date Code Title Description
PB01 Publication
PB01 Publication
SE01 Entry into force of request for substantive examination
SE01 Entry into force of request for substantive examination
GR01 Patent grant
GR01 Patent grant
CP01 Change in the name or title of a patent holder
CP01 Change in the name or title of a patent holder

Address after: 518000 1003, building 2, Chongwen Park, Nanshan wisdom Park, 3370 Liuxian Avenue, Fuguang community, Taoyuan Street, Nanshan District, Shenzhen City, Guangdong Province

Patentee after: Shenzhen Yuejiang Technology Co.,Ltd.

Address before: 518000 1003, building 2, Chongwen Park, Nanshan wisdom Park, 3370 Liuxian Avenue, Fuguang community, Taoyuan Street, Nanshan District, Shenzhen City, Guangdong Province

Patentee before: SHENZHEN YUEJIANG TECHNOLOGY Co.,Ltd.