CN112541948B - Object detection method, device, terminal equipment and storage medium - Google Patents


Info

Publication number
CN112541948B
Authority
CN
China
Prior art keywords
image
rgbd
detection result
rgbd image
detection
Prior art date
Legal status
Active
Application number
CN202011453067.2A
Other languages
Chinese (zh)
Other versions
CN112541948A (en)
Inventor
黄冠文
程骏
庞建新
谭欢
Current Assignee
Ubtech Robotics Corp
Original Assignee
Ubtech Robotics Corp
Priority date
Filing date
Publication date
Application filed by Ubtech Robotics Corp filed Critical Ubtech Robotics Corp
Priority to CN202011453067.2A
Publication of CN112541948A
Application granted
Publication of CN112541948B


Classifications

    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06T: IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T7/00: Image analysis
    • G06T7/70: Determining position or orientation of objects or cameras
    • G06T7/73: Determining position or orientation of objects or cameras using feature-based methods
    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06T: IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T7/00: Image analysis
    • G06T7/90: Determination of colour characteristics

Landscapes

  • Engineering & Computer Science (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Physics & Mathematics (AREA)
  • General Physics & Mathematics (AREA)
  • Theoretical Computer Science (AREA)
  • Image Analysis (AREA)

Abstract

The invention relates to the technical field of object detection and provides an object detection method, an object detection apparatus, a terminal device, and a storage medium. An RGBD image is preprocessed to determine whether it is available, which effectively improves the success rate of object detection. When the RGBD image is available, object detection is performed on the preprocessed RGBD image to determine whether an object exists in it. When an object exists, an object detection result comprising the three-dimensional position information and category information of the object is output. The three-dimensional position and category of an object in an RGBD image are thus obtained without three-dimensional modeling of the object; the method is fast and can be widely applied to detecting objects of many categories.

Description

Object detection method, device, terminal equipment and storage medium
Technical Field
The present invention relates to the field of object detection technologies, and in particular, to an object detection method, an object detection device, a terminal device, and a storage medium.
Background
Object detection is one of the classical problems in computer vision. Its task is to mark the position of each object in an image with a bounding box and to give the object's class. Object detection has matured steadily, moving from frameworks based on traditional hand-crafted features and shallow classifiers to end-to-end detection frameworks based on deep learning. At present, object detection is widely applied in robots to realize hand-eye coordinated grasping: when a robot grasps an object with hand-eye coordination, it needs to know not only the object's position but also its category, so that it can grasp a specified object at the corresponding position. Traditional object detection methods require three-dimensional modeling of the object, are time-consuming, and support detection of only a small number of object classes.
Disclosure of Invention
In view of the above, the embodiments of the present invention provide an object detection method, an object detection apparatus, a terminal device, and a storage medium, so as to solve the problem that conventional object detection methods require three-dimensional modeling of the object, are time-consuming, and support detection of only a small number of object classes.
A first aspect of an embodiment of the present invention provides an object detection method, including:
preprocessing an RGBD image, and determining whether the RGBD image is available;
when the RGBD image is available, performing object detection on the preprocessed RGBD image, and determining whether an object exists in the preprocessed RGBD image;
and outputting an object detection result when an object exists in the preprocessed RGBD image, wherein the object detection result comprises three-dimensional position information and category information of the object.
A second aspect of an embodiment of the present invention provides an object detection apparatus, including:
an image preprocessing unit, configured to preprocess an RGBD image, and determine whether the RGBD image is available;
an object detection unit, configured to perform object detection on the preprocessed RGBD image when the RGBD image is available, and determine whether an object exists in the preprocessed RGBD image;
and a result output unit for outputting an object detection result when an object exists in the preprocessed RGBD image, wherein the object detection result comprises three-dimensional position information and category information of the object.
A third aspect of the embodiments of the present invention provides a terminal device comprising a memory, a processor, and a computer program stored in the memory and executable on the processor, wherein the processor implements the steps of the object detection method according to the first aspect when executing the computer program.
A fourth aspect of the embodiments of the present invention provides a computer readable storage medium storing a computer program which, when executed by a processor, implements the steps of the object detection method according to the first aspect of the embodiments of the present invention.
According to the object detection method provided by the first aspect of the embodiments of the present invention, an RGBD image is preprocessed to determine whether it is available, which effectively improves the success rate of object detection. When the RGBD image is available, object detection is performed on the preprocessed RGBD image to determine whether an object exists in it. When an object exists, an object detection result comprising the three-dimensional position information and category information of the object is output. The three-dimensional position and category of an object in an RGBD image are thus obtained without three-dimensional modeling of the object; the method is fast and can be widely applied to detecting objects of many categories.
It will be appreciated that the advantages of the second to fourth aspects may be found in the relevant description of the first aspect and are not repeated here.
Drawings
In order to more clearly illustrate the technical solutions of the embodiments of the present invention, the drawings needed for the description of the embodiments or the prior art are briefly introduced below. Obviously, the drawings in the following description show only some embodiments of the present invention; a person skilled in the art can obtain other drawings from them without inventive effort.
Fig. 1 is a first schematic flowchart of an object detection method according to an embodiment of the present invention;
Fig. 2 is a second schematic flowchart of an object detection method according to an embodiment of the present invention;
Fig. 3 is a third schematic flowchart of an object detection method according to an embodiment of the present invention;
Fig. 4 is a fourth schematic flowchart of an object detection method according to an embodiment of the present invention;
Fig. 5 is a schematic structural diagram of an object detection apparatus according to an embodiment of the present invention;
Fig. 6 is a schematic structural diagram of a terminal device according to an embodiment of the present invention.
Detailed Description
In the following description, for purposes of explanation and not limitation, specific details such as particular system architectures and techniques are set forth in order to provide a thorough understanding of the embodiments of the present invention. It will be apparent, however, to one skilled in the art that the present invention may be practiced in other embodiments that depart from these specific details. In other instances, detailed descriptions of well-known systems, devices, circuits, and methods are omitted so as not to obscure the description of the present invention with unnecessary detail.
It should be understood that the terms "comprises" and/or "comprising," when used in this specification and the appended claims, specify the presence of stated features, integers, steps, operations, elements, and/or components, but do not preclude the presence or addition of one or more other features, integers, steps, operations, elements, components, and/or groups thereof.
It should also be understood that the term "and/or" as used in the present specification and the appended claims refers to any and all possible combinations of one or more of the associated listed items, and includes such combinations.
As used in the present description and the appended claims, the term "if" may be interpreted, depending on the context, as "when", "once", "in response to determining", or "in response to detecting". Similarly, the phrase "if it is determined" or "if [a described condition or event] is detected" may be interpreted, depending on the context, as "upon determining", "in response to determining", "upon detecting [the described condition or event]", or "in response to detecting [the described condition or event]".
Furthermore, the terms "first", "second", "third", and the like in the description and the appended claims are used to distinguish between descriptions and do not indicate or imply relative importance.
Reference in the specification to "one embodiment" or "some embodiments" or the like means that a particular feature, structure, or characteristic described in connection with the embodiment is included in one or more embodiments of the invention. Thus, appearances of the phrases "in one embodiment," "in some embodiments," "in other embodiments," and the like in the specification are not necessarily all referring to the same embodiment, but mean "one or more but not all embodiments" unless expressly specified otherwise. The terms "comprising," "including," "having," and variations thereof mean "including but not limited to," unless expressly specified otherwise.
The object detection method provided by the embodiments of the present invention can be applied to terminal devices such as robots, tablet computers, notebook computers, netbooks, personal digital assistants (PDAs), personal computers (PCs), industrial computers, and servers that are equipped with, or communicatively connected to, a manipulator and an RGBD camera. The embodiments of the present invention do not limit the specific type of the terminal device.
In application, the RGBD camera may be disposed on the manipulator itself, following the manipulator as it moves (an eye-in-hand configuration), or fixed in the workspace relative to the manipulator (an eye-to-hand configuration).
As shown in fig. 1, the object detection method provided by the embodiment of the present invention includes the following steps S101 to S103 executed by a processor of a terminal device:
step S101, preprocessing an RGBD image, and determining whether the RGBD image is available.
In application, the RGBD image may be a frame acquired by the RGBD camera while the terminal device controls the manipulator and the RGBD camera to perform a hand-eye coordination operation, or a frame previously acquired and stored by the terminal device. That is, the object detection method provided by the present invention may be executed at any time before, during, or after the hand-eye coordination operation.
In one embodiment, prior to step S101, the method includes:
RGBD images of the scene are acquired by an RGBD camera.
In application, the RGBD camera can be controlled to acquire RGBD images of any scene within its field of view in which a hand-eye coordination operation can be performed, for example a production line, an express warehouse, a workbench surface, or the ground. The scene may or may not contain objects; that is, the RGBD image may contain both objects and background regions, or only objects, or only a background region.
In application, when the RGBD image is not available, this indicates that an object cannot be detected, or can only be detected with difficulty, from the RGBD image. The terminal device may then automatically control the RGBD camera to re-acquire an RGBD image, or wait for the user to input an instruction and perform the next operation according to that instruction.
In one embodiment, after step S101, the method includes:
adjusting parameters of the RGBD camera when the RGBD image is not available, and re-acquiring the RGBD image of the scene through the RGBD camera;
or when the RGBD image is not available, acquiring the RGBD image of the next scene through the RGBD camera.
In application, when the RGBD image is not available, the cause may be that performance parameters of the RGBD camera, such as its resolution or focal length, are unsuitable, in which case those parameters can be adjusted to improve the camera's output. If the RGBD image of the same scene re-acquired after adjusting the camera's performance parameters is still not available, the RGBD camera can be controlled to acquire an RGBD image of the next scene.
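The capture-check-adjust-retry behaviour described above can be sketched as a small loop. The `capture`/`adjust` camera interface below is hypothetical; the patent does not define a camera API:

```python
def acquire_available_image(camera, is_available, max_tries=3):
    """Capture an image; if it is unavailable, adjust camera parameters
    (e.g. resolution or focal length) and retry; give up after max_tries
    so the caller can move on to the next scene."""
    for _ in range(max_tries):
        image = camera.capture()     # hypothetical camera API
        if is_available(image):
            return image
        camera.adjust()              # hypothetical parameter adjustment
    return None
```

A `None` return corresponds to the fallback above: the terminal device abandons the current scene and acquires an RGBD image of the next one.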
In one embodiment, after step S101, the method includes:
when the RGBD image is not available, outputting prompt information representing that the RGBD image is not available.
In application, the prompt information indicating that the RGBD image is not available can be output through any human-machine interaction mode supported by the terminal device, for example sound output by a voice device, text or images displayed on a screen, light indications from an indicator lamp, or somatosensory feedback output by a manipulator or a vibration motor.
In application, before object detection is performed on the RGBD image, the RGBD image may be preprocessed to determine whether it is available, which facilitates the subsequent object detection; if the RGBD image is not available, prompt information indicating this may be output. Availability may be determined from the image features of the RGBD image: when the image features meet the requirements, the RGBD image is determined to be available; otherwise it is determined to be unavailable.
As shown in fig. 2, in one embodiment, step S101 includes the following steps S201 to S203:
step S201, converting the RGBD image into an RGB image.
In application, since image features such as sharpness, brightness, chromaticity, and resolution depend only on the image information of the three RGB channels of the RGBD image, the RGBD image can be converted into an RGB image by removing its depth information, to facilitate detection of the features related to the RGB channels.
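As an illustration of this conversion step, assuming the RGBD image is stored as an H x W x 4 array with depth in the fourth channel (a representation the patent does not specify), the RGB image is obtained by dropping that channel:

```python
import numpy as np

def rgbd_to_rgb(rgbd: np.ndarray) -> np.ndarray:
    """Convert an H x W x 4 RGBD array to an H x W x 3 RGB array
    by discarding the depth channel (the channel layout is an assumption)."""
    assert rgbd.ndim == 3 and rgbd.shape[2] == 4, "expected an H x W x 4 array"
    return rgbd[:, :, :3]
```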
Step S202, detecting the image characteristics of the RGB image; wherein the image features include at least one of sharpness, chromaticity, brightness, and resolution;
step S203, when the image feature of the RGB image passes the detection, determining that the RGBD image is available.
In application, the image features may include, but are not limited to, sharpness, brightness, chromaticity, size, and resolution; if any feature does not meet the requirements, the RGBD image can be determined to be unavailable and prompt information indicating this is output. When the image features include at least two of sharpness, chromaticity, brightness, and resolution, each feature of the RGB image can be checked in turn: when one feature meets the requirements, the next feature is checked; if any feature is found not to meet the requirements, the RGBD image is directly determined to be unavailable and the corresponding prompt information is output; and when all features meet the requirements (i.e., the detection passes), the RGBD image is determined to be available.
As shown in fig. 3, in one embodiment, prior to step S202, the following steps S301 and S302 are included:
step S301, cropping the RGB image to extract the RGB image of a preset area within it;
step S302, scaling the RGB image of the preset area to obtain an RGB image with a preset size.
In application, when detecting objects the terminal device usually only considers a preset area within the RGBD camera's field of view rather than the entire view, so the RGB image can be cropped to remove the irrelevant background and keep only the preset area. The preset area may be the central area of the camera's field of view. Furthermore, since the object detection algorithm adopted by the terminal device generally only supports input images of a fixed size, the cropped RGB image is scaled to the preset size supported by the algorithm.
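A minimal sketch of steps S301 and S302, using a centre crop for the preset area and nearest-neighbour sampling for the scaling (both are assumptions; the patent fixes neither the area nor the interpolation method):

```python
import numpy as np

def crop_center(img: np.ndarray, crop_h: int, crop_w: int) -> np.ndarray:
    """Step S301: keep only a centred preset area of the RGB image."""
    h, w = img.shape[:2]
    top, left = (h - crop_h) // 2, (w - crop_w) // 2
    return img[top:top + crop_h, left:left + crop_w]

def resize_nearest(img: np.ndarray, out_h: int, out_w: int) -> np.ndarray:
    """Step S302: scale to the preset size via nearest-neighbour sampling."""
    h, w = img.shape[:2]
    rows = np.arange(out_h) * h // out_h   # source row for each output row
    cols = np.arange(out_w) * w // out_w   # source column for each output column
    return img[rows][:, cols]
```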
As shown in fig. 3, in one embodiment, step S202 includes the following steps S303 to S305:
step S303, detecting the sharpness of the RGB image;
step S304, detecting the chromaticity of the RGB image when the sharpness of the RGB image is greater than a preset sharpness threshold;
step S305, detecting the brightness of the RGB image when the chromaticity of the RGB image is greater than a preset chromaticity threshold;
step S203 includes:
step S306, when the brightness of the RGB image is larger than a preset brightness threshold value, determining that the RGBD image is available.
In application, the image features may include sharpness, chromaticity, and brightness, which are detected in sequence: when a feature meets its corresponding threshold requirement, the next feature is detected; when any feature fails its threshold requirement, the RGBD image can be determined to be unavailable and prompt information indicating this is output.
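The sequential gating of steps S303 to S306 can be sketched as follows. The concrete metrics are assumptions chosen for illustration: variance of a Laplacian filter as a sharpness proxy, mean per-pixel channel spread as a chromaticity proxy, and mean grey level as brightness. The patent specifies thresholds but not the metrics themselves, and the threshold values below are not taken from it.

```python
import numpy as np

def laplacian_variance(gray: np.ndarray) -> float:
    """Sharpness proxy: variance of a 4-neighbour Laplacian response."""
    lap = (-4.0 * gray[1:-1, 1:-1] + gray[:-2, 1:-1] + gray[2:, 1:-1]
           + gray[1:-1, :-2] + gray[1:-1, 2:])
    return float(lap.var())

def image_available(rgb: np.ndarray,
                    sharp_t: float = 10.0,    # illustrative thresholds,
                    chroma_t: float = 10.0,   # not taken from the patent
                    bright_t: float = 40.0) -> bool:
    rgb = rgb.astype(float)
    gray = rgb.mean(axis=2)
    if laplacian_variance(gray) <= sharp_t:     # step S303 fails
        return False
    chroma = float((rgb.max(axis=2) - rgb.min(axis=2)).mean())
    if chroma <= chroma_t:                      # step S304 fails
        return False
    return bool(gray.mean() > bright_t)         # steps S305/S306
```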
In application, when a certain feature of the RGBD image does not meet the requirements, the RGBD image may first be optimized to improve that feature, after which the feature is checked again; if it still does not meet the requirements, the RGBD image is determined to be unavailable and prompt information indicating this is output.
As shown in fig. 3, in one embodiment, step S101 further includes the following steps performed after step S305:
step S307, when the brightness of the RGB image is less than or equal to the preset brightness threshold and is within the preset brightness range, performing high dynamic illumination rendering on the RGB image to obtain a high dynamic range image, and determining that the RGBD image is available.
In application, when the brightness of the RGB image does not meet the threshold requirement (i.e., it is less than or equal to the preset brightness threshold), it can be further determined whether the brightness lies within a range that can be improved (i.e., the preset brightness range). If so, the brightness can be improved so that the corrected RGB image becomes available; specifically, high dynamic illumination rendering may be used to improve the brightness, yielding a high dynamic range (HDR) image. When the brightness of the RGB image is less than or equal to the preset brightness threshold and outside the preset brightness range, the RGBD image is determined to be unavailable and prompt information indicating this is output.
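The patent names high dynamic illumination rendering as the brightness-improvement step but gives no algorithm. As a stand-in, the sketch below brightens an under-exposed image with simple gamma correction; this is an assumption for illustration, not the patent's method:

```python
import numpy as np

def brighten(rgb: np.ndarray, gamma: float = 0.5) -> np.ndarray:
    """Gamma correction (gamma < 1 brightens); a simple proxy for the
    HDR rendering step, which the patent leaves unspecified."""
    x = np.clip(rgb.astype(float), 0.0, 255.0) / 255.0
    return (np.power(x, gamma) * 255.0).astype(np.uint8)
```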
And step S102, when the RGBD image is available, performing object detection on the preprocessed RGBD image, and determining whether an object exists in the preprocessed RGBD image.
In application, once preprocessing determines that the RGBD image is available, object detection is performed on the preprocessed RGBD image (that is, the RGB image of the preset size) to determine whether an object exists in it. When no object exists in the preprocessed RGBD image, no subsequent grasping operation needs to be performed; the terminal device may then automatically control the RGBD camera to acquire an RGBD image of the next scene, or wait for the user to input an instruction and perform the next operation according to that instruction.
In one embodiment, after step S102, the method includes:
and acquiring an RGBD image of the next scene through the RGBD camera when no object exists in the preprocessed RGBD image.
In one embodiment, after step S102, the method includes:
and outputting, when no object exists in the preprocessed RGBD image, prompt information indicating that no object exists in the preprocessed RGBD image.
In application, the prompt information indicating that no object exists in the preprocessed RGBD image can be output through any human-machine interaction mode supported by the terminal device, for example sound output by a voice device, text or images displayed on a screen, light indications from an indicator lamp, or somatosensory feedback output by a manipulator or a vibration motor.
As shown in fig. 4, in one embodiment, step S102 includes steps S401 and S402 as follows:
step S401, when the RGBD image is available, performing object detection on the preprocessed RGBD image to obtain at least one first detection result, where the first detection result includes an object frame, a category corresponding to the object frame, and a confidence level of the category;
step S402, when at least one second detection result with the confidence coefficient greater than the preset confidence coefficient threshold exists in the at least one first detection result, determining that an object exists in the preprocessed RGBD image.
In application, whether an object exists in the preprocessed RGBD image may be determined by performing object detection on it to obtain at least one first detection result. If only one first detection result is obtained and it is empty, it can be determined that no object exists in the preprocessed RGBD image, and prompt information indicating this is output. If the obtained first detection results are not empty, whether an object exists in the preprocessed RGBD image is further determined according to the confidence of the category in each first detection result.
In application, an object frame (bounding box) identifies the position of an object in the preprocessed RGBD image. Since the preprocessed RGBD image is an RGB image and contains no depth information, only the two-dimensional position of the object, i.e., its x-axis and y-axis information, can be obtained from the object frame, along with the object's height h and width w. The category of the object frame indicates the class of the object it identifies and describes the object's category attributes, for example a coarse category such as person, animal, or article, or a fine category such as football, cup, or pen.
In application, when at least one second detection result whose confidence is greater than the preset confidence threshold exists among the first detection results, it can be determined that objects exist in the preprocessed RGBD image; all second detection results are retained, and the first detection results whose confidence is less than or equal to the preset confidence threshold are discarded. When no such second detection result exists, it can be determined that no object exists in the preprocessed RGBD image, and prompt information indicating this is output. The preset confidence threshold may be set according to actual needs, for example any value in the range 0.5 to 0.95, and may specifically be 0.9.
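Steps S401 and S402 amount to thresholding the detector's raw output on confidence. A minimal sketch, representing each first detection result as a (box, class_id, confidence) tuple (the tuple layout is an assumption):

```python
def filter_detections(first_results, conf_threshold=0.9):
    """Keep only the 'second detection results': those whose class
    confidence exceeds the preset confidence threshold."""
    return [det for det in first_results if det[2] > conf_threshold]
```

An object is considered present exactly when the returned list is non-empty.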
Step S103, when an object exists in the preprocessed RGBD image, outputting an object detection result, where the object detection result includes three-dimensional position information and category information of the object.
In application, when it is determined that an object exists in the preprocessed RGBD image, an object detection result including the three-dimensional position information and category information of the detected object is output. The three-dimensional position information may specifically include x-axis, y-axis, and z-axis information. The x-axis and y-axis information can be identified by the object frame, and the z-axis information can be obtained from the depth information contained in the RGBD image: based on the object frame, the two-dimensional position (x-axis and y-axis information) of the object in the RGBD image is obtained, and the corresponding depth information (z-axis information) in the RGBD image is then read at that position. The category information includes the category corresponding to the object frame and may further include the confidence of that category.
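Combining the object frame with the depth channel as described above can be sketched as follows; using the frame centre for (x, y) and the median depth inside the frame for z are illustrative choices the patent does not mandate:

```python
import numpy as np

def object_xyz(box, depth):
    """box = (x, y, w, h) in pixels; depth is the H x W depth channel
    of the RGBD image. Returns (x, y, z) for the detected object."""
    x, y, w, h = box
    cx, cy = x + w // 2, y + h // 2                 # 2-D position from the frame
    z = float(np.median(depth[y:y + h, x:x + w]))   # z from the depth map
    return cx, cy, z
```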
As shown in fig. 4, in one embodiment, step S103 includes the following steps S403 to S408:
step S403, performing non-maximum suppression on all the second detection results with the same category to obtain at least one third detection result;
step S404, computing the intersection-over-union between the object frames of all the third detection results of different categories;
step S405, for object frames of different categories whose intersection-over-union is greater than a preset threshold, retaining the third detection result with the highest confidence, obtaining at least one fourth detection result;
step S406, retaining the third detection results whose intersection-over-union is less than or equal to the preset threshold, obtaining at least one fourth detection result;
step S407, obtaining z-axis information of the object in each fourth detection result according to the object frame and the RGBD image in each fourth detection result;
step S408, outputting an object detection result, where the object detection result includes all the fourth detection results and all the z-axis information of the objects in the fourth detection results.
In application, the second detection results include results of the same category and results of different categories. Non-maximum suppression (NMS) is performed on the second detection results of the same category to remove redundant object frames. Specifically, all object frames of the same category are sorted by confidence; the object frame with the highest confidence is selected, and its intersection-over-union (IoU) with each other frame in the sorted sequence is calculated. If the IoU between the highest-confidence frame and another frame is greater than a preset IoU threshold, the lower-confidence frame is removed from the sequence; if the IoU is less than or equal to the threshold, the two frames are considered to identify distinct objects of the same category and the lower-confidence frame is retained. These steps are repeated until all object frames of the category have been traversed, yielding at least one third detection result. The IoU is then computed between the third detection results of different categories: for any pair of different-category frames whose IoU is greater than the preset threshold, the lower-confidence frame is removed and the higher-confidence frame is retained; frames of different categories whose IoU is less than or equal to the threshold are both retained. This yields at least one fourth detection result.
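The per-class non-maximum suppression of step S403 can be sketched as the standard greedy procedure; the (x1, y1, x2, y2) corner format and the 0.5 threshold are illustrative assumptions:

```python
def iou(a, b):
    """Intersection-over-union of two (x1, y1, x2, y2) boxes."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0, ix2 - ix1) * max(0, iy2 - iy1)
    if inter == 0:
        return 0.0
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    return inter / (area_a + area_b - inter)

def nms(detections, iou_threshold=0.5):
    """Greedy per-class NMS: keep the highest-confidence frame, drop any
    lower-confidence frame overlapping a kept frame beyond the threshold."""
    detections = sorted(detections, key=lambda d: d[1], reverse=True)
    kept = []
    for box, score in detections:
        if all(iou(box, kept_box) <= iou_threshold for kept_box, _ in kept):
            kept.append((box, score))
    return kept
```

The cross-category filtering of steps S404 to S406 can reuse the same `iou` function over the surviving frames of different categories, again keeping the higher-confidence frame of any pair whose IoU exceeds the threshold.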
In application, since a fourth detection result includes only the object frame, the category corresponding to the frame, and the confidence of that category, only the two-dimensional position of the object in the RGB image (its x-axis and y-axis information) can be obtained from the object frame; the z-axis information of the object must therefore be obtained from the depth information contained in the RGBD image.
In one embodiment, before step S101, the method further includes:
the object detection model is trained from RGBD images in a variety of scenarios.
In application, the object detection model may be a Darknet YOLOv4 model. After the object detection model is trained on RGBD images from various scenes, it is able to detect objects of different categories in various scenes, and the trained object detection model is then used to execute the object detection method. It should be understood that the procedure for training the object detection model is the same as the object detection method described above; that is, each execution of the object detection method by the object detection model is equivalent to one round of training, so each time the object detection model is applied to perform the object detection method, its detection performance can be improved to some extent.
According to the object detection method provided by the embodiments of the present invention, preprocessing the RGBD image to determine whether it is available can effectively improve the success rate of object detection; when the RGBD image is available, object detection is performed on the preprocessed RGBD image to determine whether an object exists in it; and when an object exists in the preprocessed RGBD image, an object detection result including three-dimensional position information and category information of the object is output. The three-dimensional position information and category information of an object in the RGBD image can thus be obtained without three-dimensional modeling of the object, which is less time-consuming, so the method can be widely applied to detecting objects of various categories.
It should be understood that the sequence numbers of the steps in the foregoing embodiments do not imply an order of execution; the execution order of the processes should be determined by their functions and internal logic, and should not constitute any limitation on the implementation process of the embodiments of the present invention.
The embodiment of the invention also provides an object detection device which is used for executing the steps in the embodiment of the object detection method. The object detection means may be virtual means (virtual appliance) in the terminal device, which are executed by a processor of the terminal device, or the terminal device itself.
As shown in fig. 5, an object detection apparatus 100 according to an embodiment of the present invention includes:
an image preprocessing unit 101, configured to preprocess an RGBD image, and determine whether the RGBD image is available;
an object detection unit 102, configured to perform object detection on the preprocessed RGBD image when the RGBD image is available, and determine whether an object exists in the preprocessed RGBD image;
and a result output unit 103 for outputting an object detection result including three-dimensional position information and category information of the object when the object exists in the preprocessed RGBD image.
In one embodiment, the object detection apparatus further includes:
and the image acquisition unit is used for acquiring RGBD images of the scene through the RGBD camera.
In one embodiment, the object detection apparatus further includes:
a parameter adjustment unit, configured to adjust parameters of the RGBD camera when the RGBD image is unavailable, and re-acquire the RGBD image of the scene by the RGBD camera;
and the image acquisition unit is used for acquiring RGBD images of the next scene through the RGBD camera when the RGBD images are not available.
In one embodiment, the object detection apparatus further includes:
and the prompting unit is used for outputting prompting information representing that the RGBD image is unavailable when the RGBD image is unavailable.
In one embodiment, the object detection apparatus further includes:
and the image acquisition unit is used for acquiring RGBD images of the next scene through the RGBD camera when no object exists in the preprocessed RGBD images.
In one embodiment, the object detection apparatus further includes:
and the prompting unit is used for outputting prompting information representing that no object exists in the preprocessed RGBD image when the object does not exist in the preprocessed RGBD image.
In one embodiment, the object detection apparatus further includes:
and the training unit is used for training the object detection model through RGBD images in various scenes.
In application, each unit in the object detection apparatus may be a software program unit, or may be implemented by different logic circuits integrated in a processor or by a plurality of distributed processors.
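As one possible software realization of the unit division above, the sketch below wires the three units of fig. 5 (image preprocessing, object detection, result output) into a pipeline. The callable interfaces and the early-return behavior when the image is unavailable or no object exists are assumptions made here for illustration.

```python
class ObjectDetector:
    """Minimal pipeline mirroring units 101-103 of the embodiment."""

    def __init__(self, preprocess, detect, output):
        self.preprocess = preprocess   # image preprocessing unit 101
        self.detect = detect           # object detection unit 102
        self.output = output           # result output unit 103

    def run(self, rgbd_image):
        image, usable = self.preprocess(rgbd_image)
        if not usable:
            return None                # e.g. re-acquire the image or prompt
        detections = self.detect(image)
        if not detections:
            return None                # no object in the preprocessed image
        return self.output(detections)
```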
As shown in fig. 6, an embodiment of the present invention further provides a terminal device 200, including: at least one processor 201 (only one processor is shown in fig. 6), a memory 202 and a computer program 203 stored in the memory 202 and executable on the at least one processor 201, further comprising a robot 204 and a camera 205 connected to the processor 201, the processor 201 implementing the steps in any of the method embodiments described above when the computer program 203 is executed.
In application, the terminal device may include, but is not limited to, a processor, a memory, an ultra-wideband transceiver module, a radar sensor, a vision sensor, and an odometer. It will be appreciated by those skilled in the art that fig. 6 is merely an example of the terminal device 200 and does not constitute a limitation on it; the terminal device may include more or fewer components than shown, combine certain components, or use different components, and may, for example, also include input-output devices, network access devices, etc.
In application, the processor may be a central processing unit (Central Processing Unit, CPU), which may also be other general purpose processors, digital signal processors (Digital Signal Processor, DSP), application specific integrated circuits (Application Specific Integrated Circuit, ASIC), off-the-shelf programmable gate arrays (Field-Programmable Gate Array, FPGA) or other programmable logic devices, discrete gate or transistor logic devices, discrete hardware components, or the like. A general purpose processor may be a microprocessor or the processor may be any conventional processor or the like.
In application, the memory may in some embodiments be an internal storage unit of the terminal device, such as a hard disk or a memory of the terminal device. In other embodiments, the memory may also be an external storage device of the terminal device, for example a plug-in hard disk, a Smart Media Card (SMC), a Secure Digital (SD) card, or a Flash Card provided on the terminal device. Further, the memory may include both an internal storage unit of the terminal device and an external storage device. The memory is used to store an operating system, application programs, a boot loader, data, and other programs, such as the program code of a computer program. The memory may also be used to temporarily store data that has been output or is to be output.
It should be noted that, because the content of information interaction and execution process between the above devices/units is based on the same concept as the method embodiment of the present invention, specific functions and technical effects thereof may be referred to in the method embodiment section, and will not be described herein.
It will be apparent to those skilled in the art that, for convenience and brevity of description, only the above division of functional units is illustrated; in practical application, the above functions may be distributed among different functional units or modules as needed, i.e., the internal structure of the apparatus may be divided into different functional units or modules to perform all or part of the functions described above. The functional units in the embodiments may be integrated into one processing unit, each unit may exist alone physically, or two or more units may be integrated into one unit; the integrated unit may be implemented in the form of hardware or in the form of a software functional unit. In addition, the specific names of the functional units are only for distinguishing them from each other and are not used to limit the protection scope of the present invention. For the specific working process of the units in the above system, reference may be made to the corresponding process in the foregoing method embodiments, which is not repeated here.
The embodiment of the invention also provides a network device, which comprises: at least one processor, a memory, and a computer program stored in the memory and executable on the at least one processor, the processor executing the computer program to perform the steps of the method embodiments described above.
The embodiments of the present invention also provide a computer readable storage medium storing a computer program, which when executed by a processor implements steps of the above-described respective method embodiments.
Embodiments of the present invention provide a computer program product enabling a terminal device to carry out the steps of the method embodiments described above when the computer program product is run on the terminal device.
The integrated units, if implemented in the form of software functional units and sold or used as stand-alone products, may be stored in a computer-readable storage medium. Based on this understanding, the present invention may implement all or part of the flow of the methods of the above embodiments by instructing related hardware through a computer program, which may be stored in a computer-readable storage medium; when executed by a processor, the computer program may implement the steps of each of the method embodiments described above. The computer program comprises computer program code, which may be in the form of source code, object code, an executable file, or some intermediate form. The computer-readable medium may include at least: any entity or device capable of carrying computer program code to the apparatus/device, a recording medium, computer memory, Read-Only Memory (ROM), Random Access Memory (RAM), electrical carrier signals, telecommunications signals, and software distribution media, such as a USB flash drive, removable hard disk, magnetic disk, or optical disk. In some jurisdictions, according to legislation and patent practice, computer-readable media may not include electrical carrier signals and telecommunications signals.
In the foregoing embodiments, each embodiment is described with its own emphasis; for parts not detailed or recorded in a particular embodiment, reference may be made to the related descriptions of other embodiments.
Those of ordinary skill in the art will appreciate that the various illustrative elements and algorithm steps described in connection with the embodiments disclosed herein may be implemented as electronic hardware, or combinations of computer software and electronic hardware. Whether such functionality is implemented as hardware or software depends upon the particular application and design constraints imposed on the solution. Skilled artisans may implement the described functionality in varying ways for each particular application, but such implementation decisions should not be interpreted as causing a departure from the scope of the present invention.
In the embodiments provided in the present invention, it should be understood that the disclosed apparatus/device and method may be implemented in other manners. For example, the apparatus/device embodiments described above are merely illustrative, e.g., the division of modules or elements is merely a logical functional division, and there may be additional divisions when actually implemented, e.g., multiple elements or components may be combined or integrated into another system, or some features may be omitted or not performed. Alternatively, the coupling or direct coupling or communication connection shown or discussed may be an indirect coupling or communication connection via interfaces, devices or units, which may be in electrical, mechanical or other forms.
The units described as separate units may or may not be physically separate, and units shown as units may or may not be physical units, may be located in one place, or may be distributed over a plurality of network units. Some or all of the units may be selected according to actual needs to achieve the purpose of the solution of this embodiment.
The above embodiments are only for illustrating the technical solution of the present invention, and are not limiting; although the invention has been described in detail with reference to the foregoing embodiments, it will be understood by those of ordinary skill in the art that: the technical scheme described in the foregoing embodiments can be modified or some technical features thereof can be replaced by equivalents; such modifications and substitutions do not depart from the spirit and scope of the technical solutions of the embodiments of the present invention, and are intended to be included in the scope of the present invention.

Claims (8)

1. An object detection method, comprising:
preprocessing an RGBD image, and determining whether the RGBD image is available;
when the RGBD image is available, carrying out object detection on the preprocessed RGBD image to obtain at least one first detection result, wherein the first detection result comprises an object frame, a category corresponding to the object frame and a confidence level of the category;
when at least one second detection result with the confidence coefficient larger than a preset confidence coefficient threshold exists in the at least one first detection result, determining that an object exists in the preprocessed RGBD image;
performing non-maximum suppression on all the second detection results in the same category to obtain at least one third detection result;
detecting the cross-over ratio between the object frames of all the third detection results of different categories;
acquiring the third detection result with the intersection ratio larger than a preset intersection ratio threshold value and highest confidence coefficient, and acquiring at least one fourth detection result;
acquiring the third detection result with the cross-over ratio smaller than or equal to a preset cross-over ratio threshold value, and obtaining at least one fourth detection result;
acquiring the z-axis information of the object in each fourth detection result according to the object frame and the RGBD image in each fourth detection result;
outputting object detection results, wherein the object detection results comprise all fourth detection results and z-axis information of objects in all fourth detection results;
the object detection result includes three-dimensional position information and category information of the object.
2. The object detection method of claim 1, wherein the performing image preprocessing on the RGBD image to determine whether the RGBD image is available comprises:
converting the RGBD image into an RGB image;
detecting image characteristics of the RGB image; wherein the image features include at least one of sharpness, chromaticity, brightness, and resolution;
when the image feature detection of the RGB image passes, it is determined that the RGBD image is available.
3. The object detection method according to claim 2, wherein the detecting of the image characteristics of the RGB image includes:
detecting the definition of the RGB image;
detecting chromaticity of the RGB image when the definition of the RGB image is larger than a preset definition threshold;
detecting the brightness of the RGB image when the chromaticity of the RGB image is larger than a preset chromaticity threshold value;
the determining that the RGBD image is available when the image feature detection of the RGB image passes includes:
and when the brightness of the RGB image is larger than a preset brightness threshold value, determining that the RGBD image is available.
4. The object detection method as claimed in claim 3, wherein the performing image preprocessing on the RGBD image to determine whether the RGBD image is available further comprises:
and when the brightness of the RGB image is smaller than or equal to a preset brightness threshold value and is within a preset brightness range, performing high dynamic illumination rendering on the RGB image to obtain a high dynamic range image, and determining that the RGBD image is available.
5. The object detection method according to any one of claims 2 to 4, characterized by comprising, before the detection of the image features of the RGB image:
clipping the RGB image to intercept the RGB image of a preset area in the RGB image;
and scaling the RGB image of the preset area to obtain the RGB image with the preset size.
6. An object detection device, characterized by comprising:
an image preprocessing unit, configured to preprocess an RGBD image, and determine whether the RGBD image is available;
an object detection unit, configured to perform object detection on the preprocessed RGBD image when the RGBD image is available, and determine whether an object exists in the preprocessed RGBD image;
a result output unit configured to output an object detection result when an object exists in the preprocessed RGBD image, the object detection result including three-dimensional position information and category information of the object;
and when the RGBD image is available, performing object detection on the preprocessed RGBD image to determine whether an object exists in the preprocessed RGBD image, including:
when the RGBD image is available, carrying out object detection on the preprocessed RGBD image to obtain at least one first detection result, wherein the first detection result comprises an object frame, a category corresponding to the object frame and a confidence level of the category;
when at least one second detection result with the confidence coefficient larger than a preset confidence coefficient threshold exists in the at least one first detection result, determining that an object exists in the preprocessed RGBD image;
and when an object exists in the preprocessed RGBD image, outputting an object detection result, wherein the object detection result comprises:
performing non-maximum suppression on all the second detection results in the same category to obtain at least one third detection result;
detecting the cross-over ratio between the object frames of all the third detection results of different categories;
acquiring the third detection result with the intersection ratio larger than a preset intersection ratio threshold value and highest confidence coefficient, and acquiring at least one fourth detection result;
acquiring the third detection result with the cross-over ratio smaller than or equal to a preset cross-over ratio threshold value, and obtaining at least one fourth detection result;
acquiring the z-axis information of the object in each fourth detection result according to the object frame and the RGBD image in each fourth detection result;
and outputting object detection results, wherein the object detection results comprise all the fourth detection results and z-axis information of objects in all the fourth detection results.
7. A terminal device comprising a memory, a processor and a computer program stored in the memory and executable on the processor, characterized in that the processor implements the steps of the object detection method according to any one of claims 1 to 5 when the computer program is executed.
8. A computer-readable storage medium storing a computer program, characterized in that the computer program when executed by a processor implements the steps of the object detection method according to any one of claims 1 to 5.
CN202011453067.2A 2020-12-11 2020-12-11 Object detection method, device, terminal equipment and storage medium Active CN112541948B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202011453067.2A CN112541948B (en) 2020-12-11 2020-12-11 Object detection method, device, terminal equipment and storage medium

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202011453067.2A CN112541948B (en) 2020-12-11 2020-12-11 Object detection method, device, terminal equipment and storage medium

Publications (2)

Publication Number Publication Date
CN112541948A CN112541948A (en) 2021-03-23
CN112541948B true CN112541948B (en) 2023-11-21

Family

ID=75018362

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202011453067.2A Active CN112541948B (en) 2020-12-11 2020-12-11 Object detection method, device, terminal equipment and storage medium

Country Status (1)

Country Link
CN (1) CN112541948B (en)

Citations (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN108702452A (en) * 2017-06-09 2018-10-23 华为技术有限公司 A kind of image capturing method and device
CN109544540A (en) * 2018-11-28 2019-03-29 东北大学 A kind of diabetic retina picture quality detection method based on image analysis technology
CN110149482A (en) * 2019-06-28 2019-08-20 Oppo广东移动通信有限公司 Focusing method, device, electronic equipment and computer readable storage medium
CN110852258A (en) * 2019-11-08 2020-02-28 北京字节跳动网络技术有限公司 Object detection method, device, equipment and storage medium
CN110991201A (en) * 2019-11-25 2020-04-10 浙江大华技术股份有限公司 Bar code detection method and related device
CN111626350A (en) * 2020-05-25 2020-09-04 腾讯科技(深圳)有限公司 Target detection model training method, target detection method and device


Also Published As

Publication number Publication date
CN112541948A (en) 2021-03-23

Similar Documents

Publication Publication Date Title
CN111950723B (en) Neural network model training method, image processing method, device and terminal equipment
CN110008806B (en) Information processing device, learning processing method, learning device, and object recognition device
CN106971185B (en) License plate positioning method and device based on full convolution network
CN110582783B (en) Training device, image recognition device, training method, and computer-readable information storage medium
CN112149694B (en) Image processing method, system, storage medium and terminal based on convolutional neural network pooling module
CN104751405A (en) Method and device for blurring image
CN109117806B (en) Gesture recognition method and device
EP4068220A1 (en) Image processing device, image processing method, moving device, and storage medium
CN107204044A (en) A kind of picture display process and relevant device based on virtual reality
CN116051391B (en) Image processing method and electronic equipment
CN111598065A (en) Depth image acquisition method, living body identification method, apparatus, circuit, and medium
CN115810133B (en) Welding control method based on image processing and point cloud processing and related equipment
CN112215861A (en) Football detection method and device, computer readable storage medium and robot
CN112541948B (en) Object detection method, device, terminal equipment and storage medium
CN117710921A (en) Training method, detection method and related device of target detection model
CN110490165B (en) Dynamic gesture tracking method based on convolutional neural network
CN112434581A (en) Outdoor target color identification method and system, electronic device and storage medium
CN111917986A (en) Image processing method, medium thereof, and electronic device
CN113807407B (en) Target detection model training method, model performance detection method and device
CN116415652A (en) Data generation method and device, readable storage medium and terminal equipment
CN112686851B (en) Image detection method, device and storage medium
CN112633065A (en) Face detection method, system, storage medium and terminal based on data enhancement
CN114677319A (en) Stem cell distribution determination method and device, electronic equipment and storage medium
CN111476834B (en) Method and device for generating image and electronic equipment
CN116308996A (en) Graphic display method, graphic display device, graphic display apparatus, graphic display storage medium and graphic display program product

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant