WO2022062238A1 - 一种足球检测方法、装置、计算机可读存储介质及机器人 - Google Patents

Info

Publication number
WO2022062238A1
Authority
WO
WIPO (PCT)
Prior art keywords
football
point cloud
detection
model
preset
Prior art date
Application number
PCT/CN2020/139859
Other languages
English (en)
French (fr)
Inventor
王东
张惊涛
胡淑萍
白杰
麻星星
程骏
郭渺辰
顾在旺
庞建新
Original Assignee
深圳市优必选科技股份有限公司
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by 深圳市优必选科技股份有限公司 filed Critical 深圳市优必选科技股份有限公司
Publication of WO2022062238A1 publication Critical patent/WO2022062238A1/zh

Classifications

    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06TIMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T7/00Image analysis
    • G06T7/10Segmentation; Edge detection
    • G06T7/136Segmentation; Edge detection involving thresholding
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06TIMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T7/00Image analysis
    • G06T7/10Segmentation; Edge detection
    • G06T7/13Edge detection
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06TIMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T7/00Image analysis
    • G06T7/90Determination of colour characteristics
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06TIMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T2207/00Indexing scheme for image analysis or image enhancement
    • G06T2207/10Image acquisition modality
    • G06T2207/10004Still image; Photographic image
    • G06T2207/10012Stereo images

Definitions

  • the present application belongs to the field of robotics technology, and in particular relates to a football detection method, device, computer-readable storage medium and robot.
  • traditional football detection methods are based on a single two-dimensional image and find the pose information of the football by training a target detection model.
  • improving the accuracy of football detection requires a heavyweight model with a large number of parameters and high computing power.
  • however, the hardware platform of a robot generally does not support GPU acceleration, so, on the premise of ensuring a high frame rate, only a lightweight model can be used, resulting in low accuracy of the football detection results.
  • the embodiments of the present application provide a football detection method, device, computer-readable storage medium, and robot, so as to solve the problem of low accuracy of the existing football detection methods.
  • a first aspect of the embodiments of the present application provides a football detection method, which may include:
  • the football pose is determined according to the three-dimensional point cloud data in the football detection frame.
  • determining the football pose according to the three-dimensional point cloud data in the football detection frame includes:
  • the football pose is determined according to the candidate sphere model with the smallest error rate.
  • calculating the error rate of the candidate sphere model according to the point cloud set includes:
  • the error rate of the candidate sphere model is calculated according to the number of inlier points.
  • before using the preset deep learning target detection model to perform football detection in the two-dimensional image, the method may also include:
  • the deep learning target detection model is trained using the VOC data set to obtain a trained deep learning target detection model.
  • the football detection method may also include:
  • the soccer pose is determined according to the soccer outline.
  • before performing binarization processing on the two-dimensional image in the football detection frame, the method may further include:
  • the deep learning target detection model is a MobileNet-SSD model.
  • a second aspect of the embodiments of the present application provides a football detection device, which may include:
  • the data acquisition module is used to collect the 2D image and 3D point cloud data of the target area through the depth camera of the robot;
  • a football detection module for detecting football in the two-dimensional image using a preset deep learning target detection model, and outputting a football detection frame and a confidence level
  • the pose determination module is configured to determine the soccer pose according to the three-dimensional point cloud data in the soccer detection frame if the confidence is greater than a preset confidence threshold.
  • the pose determination module may include:
  • a point cloud set construction unit configured to construct the three-dimensional point cloud data in the football detection frame into a point cloud set
  • a point cloud subset selection unit for selecting a point cloud subset from the point cloud set
  • a spherical model fitting unit used for performing spherical model fitting on the point cloud subset to obtain candidate spherical models
  • an error rate calculation unit configured to calculate the error rate of the candidate sphere model according to the point cloud set
  • a pose determination unit configured to determine the soccer pose according to the candidate sphere model with the smallest error rate if the error rate is smaller than the error threshold or the number of iterations is greater than the iteration threshold.
  • the error rate calculation unit may include:
  • a deviation value calculation subunit used to calculate the deviation value between each sample point in the point cloud set and the candidate sphere model
  • an inlier determination subunit configured to determine a sample point whose deviation value is less than a preset deviation threshold as an inlier belonging to the candidate sphere model
  • an error rate calculation subunit configured to calculate the error rate of the candidate sphere model according to the number of the inliers.
  • the football detection device may also include:
  • the data set building module is used to build each football image obtained from the preset data source into the original data set;
  • the data set conversion module is used to clean and preprocess the original data set, and convert it into a VOC data set;
  • a model training module is used to train the deep learning target detection model by using the VOC data set to obtain a trained deep learning target detection model.
  • the football detection device may also include:
  • the binarization processing module is used to perform binarization processing on the two-dimensional image in the football detection frame to obtain a binarized image
  • a contour detection module used for segmenting the football contour in the binarized image according to the maximum contour detection algorithm
  • a soccer pose determination module configured to determine the soccer pose according to the soccer outline.
  • the football detection device may also include:
  • the color space transformation module is used for performing color space transformation on the two-dimensional image in the football detection frame, from RGB color space to HSV color space.
  • a third aspect of the embodiments of the present application provides a computer-readable storage medium, where the computer-readable storage medium stores a computer program, and when the computer program is executed by a processor, implements the steps of any of the foregoing football detection methods.
  • a fourth aspect of the embodiments of the present application provides a robot, including a memory, a processor, and a computer program stored in the memory and executable on the processor, where the processor, when executing the computer program, implements the steps of any one of the above football detection methods.
  • a fifth aspect of the embodiments of the present application provides a computer program product, which, when the computer program product runs on a robot, causes the robot to perform the steps of any one of the above football detection methods.
  • the embodiments of the present application have the following beneficial effects: a two-dimensional image and three-dimensional point cloud data of the target area are collected through the depth camera of the robot; a preset deep learning target detection model is used to perform football detection in the two-dimensional image, and a football detection frame and a confidence level are output; if the confidence level is greater than a preset confidence threshold, the football pose is determined according to the three-dimensional point cloud data in the football detection frame.
  • based on deep learning target detection technology and three-dimensional point cloud technology, the embodiments of the present application fully combine the advantages of two-dimensional images and three-dimensional point cloud data, so that an extremely high accuracy rate can be achieved even when a lightweight model is used.
  • FIG. 1 is a flowchart of an embodiment of a football detection method in the embodiment of the application.
  • FIG. 2 is a schematic structural diagram of a depthwise separable convolution
  • FIG. 3 is a schematic diagram of the network structure of MobileNet-SSD
  • FIG. 4 is a schematic flowchart of model training for a deep learning target detection model
  • FIG. 5 is a schematic diagram of a part of a football image
  • Fig. 6 is the schematic flow chart of determining the football pose according to the three-dimensional point cloud data in the football detection frame
  • Fig. 7 is the visualization schematic diagram of point cloud collection
  • FIG. 8 is a schematic diagram of point cloud data on a football surface
  • Fig. 9 is a schematic diagram of determining soccer pose information such as centroid and radius by soccer model fitting
  • Fig. 10 is the schematic flow chart of football segmentation based on color space threshold method
  • Figure 11 is a schematic diagram of football segmentation based on HSV color space
  • FIG. 12 is a structural diagram of an embodiment of a football detection device in the embodiment of the application.
  • FIG. 13 is a schematic block diagram of a robot in an embodiment of the present application.
  • the term “if” may be contextually interpreted as “when” or “once” or “in response to determining” or “in response to detecting” .
  • the phrases "if it is determined" or "if the [described condition or event] is detected" may be interpreted, depending on the context, to mean "once it is determined", "in response to the determination", "once the [described condition or event] is detected" or "in response to detection of the [described condition or event]".
  • an embodiment of a football detection method in the embodiment of the present application may include:
  • Step S101 collecting a two-dimensional image and three-dimensional point cloud data of a target area through a depth camera of the robot.
  • the depth camera may be an internal device built into the robot, or may be an external device external to the robot.
  • the depth camera is preferably built into the crotch position of the robot and is used to collect the two-dimensional image and three-dimensional point cloud data of the target area, that is, the area in the forward direction of the robot.
  • Step S102 using a preset deep learning target detection model to perform football detection in the two-dimensional image, and output a football detection frame and a confidence level.
  • the deep learning target detection model may be any lightweight model in the prior art.
  • the MobileNet-SSD model is preferably used. MobileNet-SSD is modeled on VGG16-SSD: it keeps the multi-scale convolution structure of VGG16-SSD but replaces the VGG16 backbone with MobileNet V1.
  • MobileNet V1 is a lightweight convolutional neural network model suitable for mobile phones and other mobile terminals.
  • the network is mainly used to extract image convolution features in lightweight deep learning tasks such as classification, detection, embedding and segmentation.
  • the core contribution and innovation of the MobileNet V1 network is that a depthwise separable convolution (Depthwise Separable Convolution) is proposed to decompose the convolution kernel, which reduces the amount of computation and parameters of the network.
  • Figure 2 shows a schematic diagram of the structure of the depthwise separable convolution.
  • the depthwise separable convolution can be further divided into a depthwise convolution (Depthwise Convolution) and a 1*1 pointwise convolution (Pointwise Convolution).
  • MobileNet V1 has a total of 28 layers. Compared with a standard convolution network of the same depth, MobileNet V1 with 3*3 convolution kernels sacrifices only a small amount of accuracy while reducing the amount of computation by a factor of 8 to 9.
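The 8-9x saving quoted above follows from the cost ratio of a depthwise-separable convolution to a standard one; a minimal sketch (the function name and the 256-channel example are illustrative, not from the patent):

```python
def conv_cost_ratio(dk: int, n: int) -> float:
    """Cost of a depthwise-separable convolution relative to a standard
    convolution with the same Dk x Dk kernel and N output channels.

    Standard:  Dk*Dk*M*N*Df*Df
    Separable: Dk*Dk*M*Df*Df + M*N*Df*Df
    The shared M*Df*Df factor cancels, leaving 1/N + 1/Dk^2.
    """
    return 1.0 / n + 1.0 / (dk * dk)

# With 3x3 kernels and a typical output-channel count, the separable
# form needs roughly 1/8 to 1/9 of the computation.
reduction_factor = 1.0 / conv_cost_ratio(3, 256)
```

For 3*3 kernels the ratio is dominated by the 1/Dk^2 = 1/9 term, which is where the 8-9x figure comes from.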
  • the SSD (Single Shot MultiBox Detector) target detection network uses VGG16 as the basic feature extraction model, adds a multi-scale feature detection method based on a feature pyramid to generate a set of fixed-size detection boxes (Bounding Boxes) together with a score for the instance contained in each detection box, and then completes final detection through Non-Maximum Suppression (NMS); it is a one-stage target detection network.
  • the MobileNet-SSD network imitates the multi-scale convolution structure of VGG16-SSD and replaces VGG16 with MobileNet V1 (as shown in Figure 3). The differences between the two are briefly compared here.
  • the parameter amount and computation amount of the VGG16 network are 1.38*10^8 parameters and 1.55*10^10 FLOPs (floating-point operations), respectively, while those of the MobileNet V1 network are 4.2*10^6 parameters and 0.569*10^9 FLOPs. The parameter amount of the MobileNet V1 network is thus about 3.04% of that of VGG16, and its computation amount is about 3.67% of that of VGG16, which greatly reduces the parameters and computation of the network.
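The two percentages can be checked directly from the figures quoted above:

```python
# Parameter and computation figures quoted above for the two backbones.
vgg16_params, vgg16_flops = 1.38e8, 1.55e10
mbn_params, mbn_flops = 4.2e6, 0.569e9

param_pct = mbn_params / vgg16_params * 100  # about 3.04 %
flops_pct = mbn_flops / vgg16_flops * 100    # about 3.67 %
```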
  • FIG. 3 shows the schematic diagram of the network structure of MobileNet-SSD.
  • the convolutional layers of MobileNet V1 are recorded as Conv0, Conv1, Conv2, ..., Conv13 in turn.
  • the MobileNet-SSD network adds 8 convolutional layers after Conv13 of MobileNet V1, marked as Conv14_1, Conv14_2, Conv15_1, Conv15_2, Conv16_1, Conv16_2, Conv17_1 and Conv17_2. The feature maps (Feature Maps) of 6 convolutional layers in total are used for detection (Detection), namely Conv11, Conv13, Conv14_2, Conv15_2, Conv16_2 and Conv17_2. The default numbers of boxes generated by each unit (cell) of these 6 feature maps are 3, 6, 6, 6, 6 and 6, respectively; that is, the output numbers of the 3*3 convolution kernels used for coordinate regression after these 6 convolutional layers are 12, 24, 24, 24, 24 and 24, respectively, followed by a non-maximum suppression layer to achieve target detection.
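The relationship between the default-box counts and the localisation-head channel counts described above can be sketched as:

```python
# The six detection feature maps and the default boxes per cell
# described above for MobileNet-SSD.
detection_layers = ["Conv11", "Conv13", "Conv14_2",
                    "Conv15_2", "Conv16_2", "Conv17_2"]
default_boxes = [3, 6, 6, 6, 6, 6]

# Each default box regresses 4 coordinates, so the 3x3 localisation
# convolution after each layer outputs boxes * 4 channels.
loc_channels = [b * 4 for b in default_boxes]
```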
  • model training can be performed on it through the process shown in Figure 4:
  • Step S401 construct each football image obtained from a preset data source into an original data set.
  • the specific data source used and the number of football images may be set according to actual conditions, which are not specifically limited in this embodiment of the present application.
  • the original data set includes a total of 16018 football images: 7688 football images obtained from OpenImage, 1300 football images obtained from ImageNet, 3102 football images self-collected and annotated from RoboCup matches, and 3928 football images crawled from the web.
  • FIG. 5 is a schematic diagram of some acquired football images.
  • Step S402 cleaning and preprocessing the original data set, and converting it into a VOC data set.
  • the original data set covers different scenes, including outdoor football fields, sky, beach, indoor ground and desktops, and football images with different light intensities (day, evening and night), different colors and different texture features. Because the data and its features determine the upper limit of the model algorithm, the original data set can first be cleaned and preprocessed. Specifically, images that do not meet the requirements, such as those with a blurred football, too small a size, a single color or serious occlusion, are removed.
  • the remaining images are preprocessed: the size of each image is unified to 300*300, the pixel intensities of the images are normalized and mean-subtracted, and data augmentation is performed by mirror flipping and other methods.
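The normalization, mean subtraction and mirror-flip steps above can be sketched as follows (a minimal sketch; resizing to 300*300 is assumed to have been done beforehand with an image library, and the function names are illustrative):

```python
import numpy as np

def preprocess(img: np.ndarray) -> np.ndarray:
    """Normalise 8-bit pixel intensities to [0, 1] and subtract the
    per-channel mean, as the text describes."""
    x = img.astype(np.float32) / 255.0
    return x - x.mean(axis=(0, 1), keepdims=True)

def mirror_augment(img: np.ndarray) -> np.ndarray:
    """Horizontal mirror flip used for data augmentation."""
    return img[:, ::-1, :]
```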
  • the cleaned and preprocessed dataset can also be converted into VOC format to obtain a VOC dataset.
  • in VOC format, the data set is divided into three folders: Annotations, ImageSets and JPEGImages.
  • the JPEGImages folder stores the original images, named 000000.jpg to 016017.jpg; Annotations stores, in one XML file per image, the target detection box (Bounding Box) and the category (class) label; ImageSets stores the division into training set, validation set and test set, with a ratio of 70% : 10% : 20%. The training and test data sets in lmdb format are then generated through create_data.sh. At this point, the preparation stage of the data set is complete.
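The 70% : 10% : 20% split above, applied to the 16018-image data set, can be sketched as follows (the shuffle seed is an illustrative assumption; the patent does not specify how the split is drawn):

```python
import random

# File names 000000.jpg ... 016017.jpg as described in the text.
names = [f"{i:06d}.jpg" for i in range(16018)]
random.Random(0).shuffle(names)  # fixed seed for reproducibility (assumption)

n_train = int(len(names) * 0.7)
n_val = int(len(names) * 0.1)
train_set = names[:n_train]
val_set = names[n_train:n_train + n_val]
test_set = names[n_train + n_val:]
```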
  • Step S403 using the VOC data set to train the deep learning target detection model to obtain a trained deep learning target detection model.
  • the specific training parameters can be set according to the actual situation.
  • the initial learning rate is 0.001, the learning rate decay factor is 0.4, the weight decay factor is 0.00005, the batch size is 24, and the network optimizer is the RMSProp algorithm.
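The training hyper-parameters listed above, gathered into a single solver configuration; the field names are illustrative and not tied to a particular framework's API:

```python
# Hyper-parameters from the text; key names are an assumption.
solver_config = {
    "base_lr": 0.001,        # initial learning rate
    "lr_decay": 0.4,         # learning-rate decay factor
    "weight_decay": 0.00005,
    "batch_size": 24,
    "optimizer": "RMSProp",
}
```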
  • after training, the trained deep learning target detection model can be used to detect the football in the two-dimensional image. The model outputs a rectangular detection frame containing the football, that is, the coordinates of the four vertices of the football detection frame, together with the confidence that the detection frame contains the football.
  • the greater the confidence level, the greater the probability that the detection frame contains a football; the lower the confidence level, the lower the probability that the detection frame contains a football.
  • a confidence threshold may be preset, and the specific value of the confidence threshold may be set according to the actual situation, which is not specifically limited in the embodiment of the present application.
  • if the confidence level is less than or equal to the confidence threshold, the reliability of the detection frame is considered low; the subsequent steps are not performed, and data collection and football detection are carried out again. If the confidence level is greater than the confidence threshold, step S103 is performed.
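The confidence gate described above amounts to a single comparison; a minimal sketch (the 0.5 default threshold is an assumption, since the text leaves the value to the implementer):

```python
def gate_detection(confidence: float, threshold: float = 0.5) -> bool:
    """Return True when the detection frame is reliable enough to pass
    on to the 3-D pose stage (step S103); otherwise the caller should
    collect data and run detection again."""
    return confidence > threshold
```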
  • Step S103 Determine the football pose according to the three-dimensional point cloud data in the football detection frame.
  • step S103 may specifically include the following process:
  • Step S1031 constructing the three-dimensional point cloud data in the football detection frame into a point cloud set.
  • Each 3D point cloud data includes coordinates on the preset X-axis, Y-axis and Z-axis.
  • the football detection frame includes both the three-dimensional point cloud data of the football and the three-dimensional point cloud data of the ground and surrounding environment; the set composed of these three-dimensional point cloud data is the point cloud set. FIG. 7 is a visual schematic diagram of the point cloud set.
  • Step S1032 Select a point cloud subset from the point cloud set.
  • Step S1033 performing sphere model fitting on the point cloud subset to obtain a candidate sphere model.
  • the minimum variance estimation method can be used to perform sphere model fitting on the point cloud subset; the parameters of the sphere model are fitted to obtain a sphere model, that is, the candidate sphere model, denoted as Mi.
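One common way to realise such a fit is an algebraic least-squares sphere fit (a sketch; the patent does not fix the exact estimator):

```python
import numpy as np

def fit_sphere(points: np.ndarray):
    """Least-squares sphere fit.  points: (N, 3) array of 3-D samples.
    Returns (center, radius).

    Each point satisfies |p - c|^2 = r^2, which is linear in the
    unknowns (2c, r^2 - |c|^2):  2 p.c + (r^2 - |c|^2) = |p|^2.
    """
    A = np.hstack([2.0 * points, np.ones((len(points), 1))])
    b = (points ** 2).sum(axis=1)
    sol, *_ = np.linalg.lstsq(A, b, rcond=None)
    center = sol[:3]
    radius = float(np.sqrt(sol[3] + center @ center))
    return center, radius
```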
  • Step S1034 Calculate the error rate of the candidate sphere model according to the point cloud set.
  • the deviation value between each sample point in the point cloud set and the candidate sphere model may be calculated separately.
  • for each sample point, its shortest distance from the surface of the candidate sphere model is taken as its deviation value.
  • the inliers and outliers may be determined according to the deviation values: sample points whose deviation value is less than a preset deviation threshold are determined to be inliers belonging to the candidate sphere model, and sample points whose deviation value is greater than or equal to the deviation threshold are determined to be outliers that do not belong to the candidate sphere model.
  • the specific value of the deviation threshold may be set according to the actual situation, which is not specifically limited in this embodiment of the present application.
  • the number of inliers is recorded, denoted as Numi, and the error rate of the candidate sphere model is calculated according to the number of inliers.
  • the number of inliers is negatively correlated with the error rate of the candidate sphere model; that is, the more inliers, the smaller the error rate, and the fewer inliers, the greater the error rate.
  • for example, the ratio of the number of outliers to the total number of sample points in the point cloud set may be used as the error rate.
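The deviation, inlier and error-rate computation can be sketched as follows (the 0.01 m deviation threshold is an assumption, and the outlier-ratio formula is one concrete choice consistent with the description):

```python
import numpy as np

def sphere_error_rate(points, center, radius, deviation_threshold=0.01):
    """Deviation of a sample point = its shortest distance to the
    candidate sphere surface; points within the threshold are inliers.
    The error rate is taken here as the outlier ratio, so that more
    inliers give a smaller error."""
    deviations = np.abs(np.linalg.norm(points - center, axis=1) - radius)
    inlier_count = int((deviations < deviation_threshold).sum())
    return 1.0 - inlier_count / len(points)
```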
  • Step S1035 judging whether the loop termination condition is satisfied.
  • the loop termination condition may be that the error rate is less than a preset error threshold or that the number of iterations is greater than a preset iteration threshold.
  • the specific values of the error threshold and the iteration threshold may be set according to actual conditions, which are not specifically limited in this embodiment of the present application.
  • if the loop termination condition is not met, the process returns to step S1032 and the subsequent steps, that is, a point cloud subset is randomly selected again and sphere model fitting is performed; if the loop termination condition is met, step S1036 is executed.
  • Step S1036 Determine the soccer pose according to the candidate sphere model with the smallest error rate.
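Taken together, steps S1031-S1036 follow a RANSAC-style sample-fit-score loop; a compact sketch under assumed numeric thresholds (subset size, iteration cap and both thresholds are illustrative, not values from the patent):

```python
import numpy as np

def ransac_sphere(cloud, subset_size=4, max_iters=100,
                  error_threshold=0.05, deviation_threshold=0.01, seed=0):
    """Sketch of steps S1032-S1036: repeatedly sample a subset, fit a
    candidate sphere, score it on the full cloud, and keep the model
    with the smallest error rate."""
    rng = np.random.default_rng(seed)

    def fit(pts):  # algebraic least-squares sphere fit
        A = np.hstack([2.0 * pts, np.ones((len(pts), 1))])
        sol, *_ = np.linalg.lstsq(A, (pts ** 2).sum(axis=1), rcond=None)
        c = sol[:3]
        return c, float(np.sqrt(sol[3] + c @ c))

    best, best_err = None, np.inf
    for _ in range(max_iters):
        subset = cloud[rng.choice(len(cloud), subset_size, replace=False)]
        center, radius = fit(subset)
        dev = np.abs(np.linalg.norm(cloud - center, axis=1) - radius)
        err = 1.0 - (dev < deviation_threshold).sum() / len(cloud)
        if err < best_err:
            best, best_err = (center, radius), err
        if best_err < error_threshold:  # loop termination condition
            break
    return best, best_err
```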
  • in another embodiment, the method described in step S103 may not be used; instead, football segmentation is performed based on the color-space threshold method, and the football pose is determined accordingly.
  • grayscale processing and binarization processing may be performed on the two-dimensional image in the football detection frame to obtain a binarized image.
  • the background and the football can be distinguished.
  • FIG 11 shows a schematic diagram of football segmentation based on HSV color space.
  • HSV is closer to people's perceptual experience of color than RGB: it expresses the hue, vividness, and lightness or darkness of a color very intuitively, which is convenient for color comparison.
  • H represents hue
  • S represents saturation
  • V represents lightness. Each color is in a fixed range in HSV space.
  • for example, pure green is [60, 255, 255], and all greens lie in the range [45, 100, 50] to [75, 255, 255], that is, [60-15, 100, 50] to [60+15, 255, 255], where 15 is an approximate tolerance.
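The range check above can be sketched with the standard library, converting to the OpenCV-style HSV scale the text uses (H in [0, 180), S and V in [0, 255]); the function names are illustrative:

```python
import colorsys

def to_hsv_cv(r: int, g: int, b: int):
    """Convert an 8-bit RGB pixel to the OpenCV-style HSV scale used in
    the text: H in [0, 180), S and V in [0, 255]."""
    h, s, v = colorsys.rgb_to_hsv(r / 255.0, g / 255.0, b / 255.0)
    return h * 180.0, s * 255.0, v * 255.0

def is_green(r: int, g: int, b: int) -> bool:
    """True when the pixel falls inside the green range [45, 100, 50]
    to [75, 255, 255] quoted above."""
    h, s, v = to_hsv_cv(r, g, b)
    return 45 <= h <= 75 and 100 <= s <= 255 and 50 <= v <= 255
```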
  • the two-dimensional image in the football detection frame may first be transformed from the RGB color space to the HSV color space; the steps for segmenting the football are then the same as in the previous process and will not be repeated here.
  • the premise of segmenting the football with the threshold segmentation method is that there is a significant difference between the color or gray value of the football and that of the background. This condition cannot always be satisfied in real scenes: for example, on a floor of white tiles, the reflective effect of the lights on the ground makes the white parts of the football blend with the ground. The detected contour is then no longer the circular area of the football, and the performance of the algorithm becomes very poor. In such cases, it is more suitable to segment the football using the sphere model fitting method based on three-dimensional point cloud data described in step S103.
  • experimental data show that, after the MobileNet-SSD network code is accelerated, the football detection part takes 60-70 milliseconds and the sphere model fitting part takes 0.1-5 milliseconds, giving a detection frame rate of 13 Hz. Combined with the acceleration of a tracking algorithm, the detection frame rate can be increased to 25 Hz, which meets the requirements of real-time detection. In addition, with the help of the screening provided by sphere model fitting on the three-dimensional point cloud data, false detections of non-spheres and of spheres whose diameters do not meet the requirements can be eliminated, which greatly improves the precision, recall and accuracy of football detection in actual three-dimensional scenes.
  • in the embodiments of the present application, the two-dimensional image and three-dimensional point cloud data of the target area are collected by the depth camera of the robot; football detection is performed in the two-dimensional image using the preset deep learning target detection model, and a football detection frame and a confidence level are output; if the confidence level is greater than a preset confidence threshold, the football pose is determined according to the three-dimensional point cloud data in the football detection frame.
  • FIG. 12 shows a structural diagram of an embodiment of a football detection apparatus provided by an embodiment of the present application.
  • a football detection device may include:
  • the data acquisition module 1201 is used to acquire the 2D image and 3D point cloud data of the target area through the depth camera of the robot;
  • a football detection module 1202 configured to use a preset deep learning target detection model to perform football detection in the two-dimensional image, and output a football detection frame and a confidence level;
  • the pose determination module 1203 is configured to determine the soccer pose according to the three-dimensional point cloud data in the soccer detection frame if the confidence is greater than a preset confidence threshold.
  • the pose determination module may include:
  • a point cloud set construction unit configured to construct the three-dimensional point cloud data in the football detection frame into a point cloud set
  • a point cloud subset selection unit for selecting a point cloud subset from the point cloud set
  • a spherical model fitting unit used for performing spherical model fitting on the point cloud subset to obtain candidate spherical models
  • an error rate calculation unit configured to calculate the error rate of the candidate sphere model according to the point cloud set
  • a pose determination unit configured to determine the soccer pose according to the candidate sphere model with the smallest error rate if the error rate is smaller than the error threshold or the number of iterations is greater than the iteration threshold.
  • the error rate calculation unit may include:
  • a deviation value calculation subunit used to calculate the deviation value between each sample point in the point cloud set and the candidate sphere model
  • an inlier determination subunit configured to determine a sample point whose deviation value is less than a preset deviation threshold as an inlier belonging to the candidate sphere model
  • an error rate calculation subunit configured to calculate the error rate of the candidate sphere model according to the number of the inliers.
  • the football detection device may also include:
  • the data set building module is used to build each football image obtained from the preset data source into the original data set;
  • the data set conversion module is used to clean and preprocess the original data set, and convert it into a VOC data set;
  • a model training module is used to train the deep learning target detection model by using the VOC data set to obtain a trained deep learning target detection model.
  • the football detection device may also include:
  • the binarization processing module is used to perform binarization processing on the two-dimensional image in the football detection frame to obtain a binarized image
  • a contour detection module used for segmenting the football contour in the binarized image according to the maximum contour detection algorithm
  • a soccer pose determination module configured to determine the soccer pose according to the soccer outline.
  • the football detection device may also include:
  • the color space transformation module is used for performing color space transformation on the two-dimensional image in the football detection frame, from RGB color space to HSV color space.
  • FIG. 13 shows a schematic block diagram of a robot provided by an embodiment of the present application. For convenience of description, only parts related to the embodiment of the present application are shown.
  • the robot 13 of this embodiment includes a processor 130 , a memory 131 , and a computer program 132 stored in the memory 131 and executable on the processor 130 .
  • the processor 130 executes the computer program 132
  • the steps in each of the above embodiments of the soccer detection method are implemented, for example, steps S101 to S103 shown in FIG. 1 .
  • the processor 130 executes the computer program 132
  • the functions of the modules/units in the foregoing device embodiments are implemented, for example, the functions of the modules 1201 to 1203 shown in FIG. 12 .
  • the computer program 132 may be divided into one or more modules/units, and the one or more modules/units are stored in the memory 131 and executed by the processor 130 to complete the present application.
  • the one or more modules/units may be a series of computer program instruction segments capable of performing specific functions, and the instruction segments are used to describe the execution process of the computer program 132 in the robot 13 .
  • FIG. 13 is only an example of the robot 13 and does not constitute a limitation on the robot 13; the robot may include more or fewer components than shown, combine some components, or have different components. For example, the robot 13 may also include input and output devices, network access devices, buses, and the like.
  • the processor 130 may be a central processing unit (Central Processing Unit, CPU), or other general-purpose processors, digital signal processors (Digital Signal Processor, DSP), application specific integrated circuit (Application Specific Integrated Circuit, ASIC), Field-Programmable Gate Array (FPGA) or other programmable logic devices, discrete gate or transistor logic devices, discrete hardware components, etc.
  • a general purpose processor may be a microprocessor or the processor may be any conventional processor or the like.
  • the memory 131 may be an internal storage unit of the robot 13 , such as a hard disk or a memory of the robot 13 .
  • the memory 131 can also be an external storage device of the robot 13, such as a plug-in hard disk equipped on the robot 13, a smart memory card (Smart Media Card, SMC), a secure digital (Secure Digital, SD) card, Flash card (Flash Card) and so on.
  • the memory 131 may also include both an internal storage unit of the robot 13 and an external storage device.
  • the memory 131 is used to store the computer program and other programs and data required by the robot 13 .
  • the memory 131 may also be used to temporarily store data that has been output or will be output.
  • the disclosed apparatus/robot and method may be implemented in other ways.
  • the device/robot embodiments described above are only illustrative.
  • the division of the modules or units is only a logical function division.
  • the shown or discussed mutual coupling or direct coupling or communication connection may be through some interfaces, indirect coupling or communication connection of devices or units, and may be in electrical, mechanical or other forms.
  • the units described as separate components may or may not be physically separated, and components displayed as units may or may not be physical units, that is, may be located in one place, or may be distributed to multiple network units. Some or all of the units may be selected according to actual needs to achieve the purpose of the solution in this embodiment.
  • each functional unit in each embodiment of the present application may be integrated into one processing unit, or each unit may exist physically alone, or two or more units may be integrated into one unit.
  • the above-mentioned integrated units may be implemented in the form of hardware, or may be implemented in the form of software functional units.
  • the integrated modules/units if implemented in the form of software functional units and sold or used as independent products, may be stored in a computer-readable storage medium.
  • the present application can implement all or part of the processes in the methods of the above embodiments, and can also be completed by instructing the relevant hardware through a computer program.
  • the computer program can be stored in a computer-readable storage medium, and the computer When the program is executed by the processor, the steps of the foregoing method embodiments can be implemented.
  • the computer program includes computer program code, and the computer program code may be in the form of source code, object code, executable file or some intermediate form, and the like.
  • the computer-readable storage medium may include: any entity or device capable of carrying the computer program code, a recording medium, a U disk, a removable hard disk, a magnetic disk, an optical disk, a computer memory, a read-only memory (ROM, Read-Only Memory) ), random access memory (RAM, Random Access Memory), electrical carrier signals, telecommunication signals, and software distribution media, etc. It should be noted that the content contained in the computer-readable storage medium may be appropriately increased or decreased according to the requirements of legislation and patent practice in the jurisdiction, for example, in some jurisdictions, according to legislation and patent practice, computer-readable Storage media exclude electrical carrier signals and telecommunications signals.

Abstract

A football detection method and apparatus, a computer-readable storage medium, and a robot, belonging to the technical field of robotics. The method collects a two-dimensional image and three-dimensional point cloud data of a target region through a depth camera of a robot (S101); performs football detection in the two-dimensional image using a preset deep-learning object detection model, outputting a football detection box and a confidence (S102); and, if the confidence is greater than a preset confidence threshold, determines the football pose according to the three-dimensional point cloud data within the football detection box (S103). The method, built on deep-learning object detection and three-dimensional point cloud techniques, fully combines the strengths of two-dimensional images and three-dimensional point cloud data, and achieves very high accuracy even with a lightweight model.

Description

A football detection method and apparatus, computer-readable storage medium, and robot
This application claims priority to Chinese patent application No. 202011030724.2, filed with the China Patent Office on September 27, 2020, the entire contents of which are incorporated herein by reference.
Technical Field
This application belongs to the technical field of robotics, and in particular relates to a football detection method and apparatus, a computer-readable storage medium, and a robot.
Background
Traditional football detection methods are based on a single two-dimensional image: an object detection model is trained to find the pose of the football. However, improving detection accuracy requires a heavyweight model with a large parameter count and high computational demand, while the hardware platform of a humanoid robot generally does not support GPU acceleration. To keep the frame rate high, only a lightweight model can be used, which lowers the accuracy of football detection.
Technical Problem
In view of this, embodiments of this application provide a football detection method and apparatus, a computer-readable storage medium, and a robot, to solve the problem of low accuracy in existing football detection methods.
Technical Solution
A first aspect of the embodiments of this application provides a football detection method, which may include:
collecting a two-dimensional image and three-dimensional point cloud data of a target region through a depth camera of a robot;
performing football detection in the two-dimensional image using a preset deep-learning object detection model, and outputting a football detection box and a confidence;
if the confidence is greater than a preset confidence threshold, determining the football pose according to the three-dimensional point cloud data within the football detection box.
Further, determining the football pose according to the three-dimensional point cloud data within the football detection box includes:
constructing the three-dimensional point cloud data within the football detection box into a point cloud set;
selecting a point cloud subset from the point cloud set;
fitting a sphere model to the point cloud subset to obtain a candidate sphere model;
calculating an error rate of the candidate sphere model from the point cloud set;
if the error rate is greater than or equal to a preset error threshold, returning to the step of selecting a point cloud subset from the point cloud set and the subsequent steps, until the error rate is less than the error threshold or the number of iterations exceeds a preset iteration threshold;
if the error rate is less than the error threshold or the number of iterations exceeds the iteration threshold, determining the football pose from the candidate sphere model with the smallest error rate.
Further, calculating the error rate of the candidate sphere model from the point cloud set includes:
calculating a deviation between each sample point in the point cloud set and the candidate sphere model;
determining the sample points whose deviation is less than a preset deviation threshold as inliers of the candidate sphere model;
calculating the error rate of the candidate sphere model from the number of inliers.
Further, before performing football detection in the two-dimensional image using a preset deep-learning object detection model, the method further includes:
constructing an original data set from football images obtained from preset data sources;
cleaning and preprocessing the original data set and converting it into a VOC data set;
training the deep-learning object detection model with the VOC data set to obtain a trained deep-learning object detection model.
Optionally, the football detection method may further include:
binarizing the two-dimensional image within the football detection box to obtain a binary image;
segmenting the football contour in the binary image with a maximum-contour detection algorithm;
determining the football pose from the football contour.
Optionally, before binarizing the two-dimensional image within the football detection box, the method may further include:
performing a color space transformation on the two-dimensional image within the football detection box, from the RGB color space to the HSV color space.
Preferably, the deep-learning object detection model is a MobileNet-SSD model.
A second aspect of the embodiments of this application provides a football detection apparatus, which may include:
a data collection module, configured to collect a two-dimensional image and three-dimensional point cloud data of a target region through a depth camera of a robot;
a football detection module, configured to perform football detection in the two-dimensional image using a preset deep-learning object detection model, and output a football detection box and a confidence;
a pose determination module, configured to determine the football pose according to the three-dimensional point cloud data within the football detection box if the confidence is greater than a preset confidence threshold.
Further, the pose determination module may include:
a point cloud set construction unit, configured to construct the three-dimensional point cloud data within the football detection box into a point cloud set;
a point cloud subset selection unit, configured to select a point cloud subset from the point cloud set;
a sphere model fitting unit, configured to fit a sphere model to the point cloud subset to obtain a candidate sphere model;
an error rate calculation unit, configured to calculate an error rate of the candidate sphere model from the point cloud set;
a pose determination unit, configured to determine the football pose from the candidate sphere model with the smallest error rate if the error rate is less than the error threshold or the number of iterations exceeds the iteration threshold.
Further, the error rate calculation unit may include:
a deviation calculation subunit, configured to calculate a deviation between each sample point in the point cloud set and the candidate sphere model;
an inlier determination subunit, configured to determine the sample points whose deviation is less than a preset deviation threshold as inliers of the candidate sphere model;
an error rate calculation subunit, configured to calculate the error rate of the candidate sphere model from the number of inliers.
Further, the football detection apparatus may further include:
a data set construction module, configured to construct an original data set from football images obtained from preset data sources;
a data set conversion module, configured to clean and preprocess the original data set and convert it into a VOC data set;
a model training module, configured to train the deep-learning object detection model with the VOC data set to obtain a trained deep-learning object detection model.
Optionally, the football detection apparatus may further include:
a binarization module, configured to binarize the two-dimensional image within the football detection box to obtain a binary image;
a contour detection module, configured to segment the football contour in the binary image with a maximum-contour detection algorithm;
a football pose determination module, configured to determine the football pose from the football contour.
Optionally, the football detection apparatus may further include:
a color space transformation module, configured to perform a color space transformation on the two-dimensional image within the football detection box, from the RGB color space to the HSV color space.
A third aspect of the embodiments of this application provides a computer-readable storage medium storing a computer program which, when executed by a processor, implements the steps of any of the above football detection methods.
A fourth aspect of the embodiments of this application provides a robot, including a memory, a processor, and a computer program stored in the memory and executable on the processor, where the processor implements the steps of any of the above football detection methods when executing the computer program.
A fifth aspect of the embodiments of this application provides a computer program product which, when run on a robot, causes the robot to execute the steps of any of the above football detection methods.
Beneficial Effects
Compared with the prior art, the embodiments of this application have the following beneficial effects: a two-dimensional image and three-dimensional point cloud data of a target region are collected through a depth camera of a robot; football detection is performed in the two-dimensional image using a preset deep-learning object detection model, outputting a football detection box and a confidence; if the confidence is greater than a preset confidence threshold, the football pose is determined according to the three-dimensional point cloud data within the football detection box. Based on deep-learning object detection and three-dimensional point cloud techniques, the embodiments fully combine the strengths of two-dimensional images and three-dimensional point cloud data, achieving very high accuracy even with a lightweight model.
Brief Description of the Drawings
To describe the technical solutions in the embodiments of this application more clearly, the drawings needed in the description of the embodiments or the prior art are briefly introduced below. Obviously, the drawings described below are only some embodiments of this application; those of ordinary skill in the art may derive other drawings from them without creative effort.
FIG. 1 is a flowchart of an embodiment of a football detection method in an embodiment of this application;
FIG. 2 is a schematic diagram of the structure of a depthwise separable convolution;
FIG. 3 is a schematic diagram of the network structure of MobileNet-SSD;
FIG. 4 is a schematic flowchart of training the deep-learning object detection model;
FIG. 5 is a schematic diagram of some football images;
FIG. 6 is a schematic flowchart of determining the football pose from the three-dimensional point cloud data within the football detection box;
FIG. 7 is a visualization of the point cloud set;
FIG. 8 is a schematic diagram of the point cloud data on the football surface;
FIG. 9 is a schematic diagram of determining football pose information such as centroid and radius by football model fitting;
FIG. 10 is a schematic flowchart of football segmentation based on color space thresholding;
FIG. 11 is a schematic diagram of football segmentation based on the HSV color space;
FIG. 12 is a structural diagram of an embodiment of a football detection apparatus in an embodiment of this application;
FIG. 13 is a schematic block diagram of a robot in an embodiment of this application.
Embodiments of the Invention
To make the purposes, features, and advantages of this application more obvious and understandable, the technical solutions in the embodiments of this application are described clearly and completely below with reference to the drawings. Obviously, the described embodiments are only some, not all, embodiments of this application. All other embodiments obtained by those of ordinary skill in the art without creative effort based on the embodiments of this application fall within the scope of protection of this application.
It should be understood that, when used in this specification and the appended claims, the term "comprising" indicates the presence of the described features, wholes, steps, operations, elements, and/or components, but does not exclude the presence or addition of one or more other features, wholes, steps, operations, elements, components, and/or collections thereof.
It should also be understood that the terms used in this specification are for the purpose of describing particular embodiments only and are not intended to limit this application. As used in this specification and the appended claims, the singular forms "a", "an", and "the" are intended to include the plural forms unless the context clearly indicates otherwise.
It should be further understood that the term "and/or" used in this specification and the appended claims refers to and includes any and all possible combinations of one or more of the associated listed items.
As used in this specification and the appended claims, the term "if" may be interpreted, depending on the context, as "when" or "once" or "in response to determining" or "in response to detecting". Similarly, the phrases "if determined" or "if [the described condition or event] is detected" may be interpreted, depending on the context, as "once determined" or "in response to determining" or "once [the described condition or event] is detected" or "in response to detecting [the described condition or event]".
In addition, in the description of this application, the terms "first", "second", "third", etc. are used only to distinguish descriptions and shall not be understood as indicating or implying relative importance.
Referring to FIG. 1, an embodiment of a football detection method in an embodiment of this application may include:
Step S101: collecting a two-dimensional image and three-dimensional point cloud data of a target region through a depth camera of a robot.
The depth camera may be an internal component built into the robot or an external device attached to it. In one specific implementation of the embodiments of this application, the depth camera is preferably built into the robot's hip, to capture the region in the robot's direction of travel, i.e. the two-dimensional image and three-dimensional point cloud data of the target region.
Step S102: performing football detection in the two-dimensional image using a preset deep-learning object detection model, and outputting a football detection box and a confidence.
The deep-learning object detection model may be any lightweight model in the prior art. In one specific implementation of the embodiments of this application, a MobileNet-SSD model is preferred; MobileNet-SSD is obtained by replacing VGG16 with MobileNet V1 in the multi-scale convolution structure of VGG16-SSD.
MobileNet V1 is a lightweight convolutional neural network suitable for mobile devices such as phones; it is mainly used to extract convolutional image features in lightweight deep-learning tasks such as classification, detection, embedding, and segmentation. The core contribution and innovation of MobileNet V1 is the depthwise separable convolution, which factorizes the convolution kernel to reduce the network's computation and parameter count. FIG. 2 shows the structure of a depthwise separable convolution, which further decomposes into a depthwise convolution followed by a 1*1 pointwise convolution. Counting the depthwise and pointwise convolutions as separate layers, MobileNet V1 has 28 layers; compared with standard convolutions of the same depth, MobileNet V1 with 3*3 kernels can reduce computation by a factor of 8-9 at the cost of about one percentage point of accuracy.
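The 8-9x saving quoted above follows from the standard cost model for depthwise separable convolutions. The sketch below (an illustration, not code from the application) compares the multiply-add counts of a standard convolution and its depthwise-plus-pointwise factorization; the channel counts and feature-map size used in the example are arbitrary assumptions chosen only to show a typical ratio.

```python
def conv_cost_ratio(dk, m, n, df):
    """Multiply-add ratio of a standard conv over its depthwise separable form.

    dk: kernel size, m: input channels, n: output channels, df: feature-map side.
    """
    standard = dk * dk * m * n * df * df   # one full Dk*Dk*M*N convolution
    depthwise = dk * dk * m * df * df      # per-channel spatial convolution
    pointwise = m * n * df * df            # 1*1 channel-mixing convolution
    return standard / (depthwise + pointwise)

# 3*3 kernel, 32 -> 256 channels, 14*14 feature map: roughly an 8.7x saving,
# consistent with the 8-9x reduction cited for 3*3 kernels.
ratio = conv_cost_ratio(3, 32, 256, 14)
```

For large output channel counts the ratio approaches dk*dk, which is why 3*3 kernels top out near a 9x reduction.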
The SSD (Single Shot MultiBox Detector) object detection network uses VGG16 as the base feature extractor, adds a feature-pyramid-based multi-scale detection scheme that generates a fixed-size set of bounding boxes together with scores for the object instances they contain, and then applies non-maximum suppression (NMS) to complete detection in a one-stage network. The MobileNet-SSD network follows the multi-scale convolution structure of VGG16-SSD but replaces VGG16 with MobileNet V1 (as shown in FIG. 3). Comparing the two briefly: VGG16 has about 1.38*10^8 parameters and 1.55*10^10 FLOPs (floating-point operations), while MobileNet V1 has about 4.2*10^6 parameters and 0.569*10^9 FLOPs. MobileNet V1 thus has about 3.04% of the parameters and about 3.67% of the computation of VGG16, greatly reducing both.
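The two percentages above follow directly from the quoted parameter and FLOP counts; as a quick arithmetic check:

```python
# Parameter and FLOP counts as quoted in the text
vgg16_params, vgg16_flops = 1.38e8, 1.55e10
mobilenet_params, mobilenet_flops = 4.2e6, 0.569e9

param_ratio = mobilenet_params / vgg16_params  # about 3.04%
flop_ratio = mobilenet_flops / vgg16_flops     # about 3.67%
```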
FIG. 3 shows the network structure of MobileNet-SSD. Denoting the convolutional layers of MobileNet V1 as Conv0, Conv1, Conv2, ..., Conv13, MobileNet-SSD appends 8 more convolutional layers after Conv13, denoted Conv14_1, Conv14_2, Conv15_1, Conv15_2, Conv16_1, Conv16_2, Conv17_1, and Conv17_2. Feature maps from 6 convolutional layers are extracted for detection: Conv11, Conv13, Conv14_2, Conv15_2, Conv16_2, and Conv17_2. The numbers of default boxes generated per cell of these 6 feature maps are 3, 6, 6, 6, 6, and 6 respectively, so the output counts of the 3*3 convolution kernels attached to these 6 layers for coordinate regression are 12, 24, 24, 24, 24, and 24, followed by a non-maximum suppression layer to complete detection.
Before using the deep-learning object detection model, it can be trained through the process shown in FIG. 4:
Step S401: constructing an original data set from football images obtained from preset data sources.
The specific data sources and number of football images can be set according to actual conditions, and are not limited in the embodiments of this application. In one specific implementation, the original data set contains 16018 football images: 7688 obtained from OpenImage, 1300 obtained from ImageNet, 3102 self-collected and annotated images from RoboCup matches, and 3928 crawled from the web. FIG. 5 shows some of the obtained football images.
Step S402: cleaning and preprocessing the original data set and converting it into a VOC data set.
The original data set covers football images of different scenes (outdoor pitches, sky, beach, indoor floors, tabletops), different light intensities (daytime, dusk, night), and different colors and texture features. Since data and features set the upper bound of a model algorithm's performance, the original data set can be cleaned and preprocessed to improve model precision. Specifically, unsatisfactory images (blurry footballs, footballs that are too small, single-colored, heavily occluded, etc.) are cleaned out; the remaining images are then preprocessed by resizing each to 300*300, normalizing and mean-subtracting the pixel intensities, and applying data augmentation such as mirror flipping.
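The preprocessing above (resize to 300*300, mean subtraction, normalization, mirror flip) can be sketched in a few lines of NumPy. This is an illustrative sketch only: the nearest-neighbour resize stands in for a proper interpolating resize, and the mean/scale values are assumptions following the common MobileNet-SSD convention, not values stated in the application.

```python
import numpy as np

def preprocess(img, size=300, mean=127.5, scale=1 / 127.5):
    # Nearest-neighbour resize to size x size (a minimal stand-in for a
    # proper interpolating resize).
    h, w = img.shape[:2]
    ys = np.arange(size) * h // size
    xs = np.arange(size) * w // size
    resized = img[ys][:, xs].astype(np.float32)
    # Mean subtraction and normalization to roughly [-1, 1].
    normalized = (resized - mean) * scale
    # Horizontal mirror flip, one common data-augmentation transform.
    flipped = normalized[:, ::-1]
    return normalized, flipped
```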
Further, the cleaned and preprocessed data set can be converted into VOC format to obtain a VOC data set. The VOC format is organized into three folders: Annotations, ImageSets, and JPEGImages. JPEGImages stores the original images, named 000000.jpg-016017.jpg; Annotations stores each image's bounding box and class labels in XML format; ImageSets stores the split into training, validation, and test sets, with a ratio of 70%:10%:20%. The training and test data sets in lmdb format are then generated with create_data.sh, completing the data preparation stage.
Step S403: training the deep-learning object detection model with the VOC data set to obtain a trained deep-learning object detection model.
The specific training parameters can be set according to actual conditions. In one specific implementation, during the fine-tuning stage of the Caffe-based MobileNet-SSD model, the initial learning rate is 0.001, the learning rate decay factor is 0.4, the weight decay factor is 0.00005, the batch size is 24, and the network optimizer is the RMSProp algorithm. The learning rate is lowered whenever the loss stops decreasing (twice in total during training), and the total number of training iterations is 57000 steps.
After training, the trained deep-learning object detection model can be used to perform football detection in the two-dimensional image. The model outputs a rectangular detection box containing the football, i.e. the coordinates of the four vertices of the football detection box, together with the confidence that the box contains a football. The larger the confidence, the higher the probability that the box contains a football, and vice versa. In the embodiments of this application, a confidence threshold can be preset; its specific value can be set according to actual conditions and is not limited here. If the confidence is less than or equal to the confidence threshold, the detection box is considered unreliable; the subsequent steps are not executed, and data is re-collected for a new detection. If the confidence is greater than the confidence threshold, step S103 is executed.
Step S103: determining the football pose according to the three-dimensional point cloud data within the football detection box.
As shown in FIG. 6, step S103 may specifically include the following process:
Step S1031: constructing the three-dimensional point cloud data within the football detection box into a point cloud set.
Each three-dimensional point cloud datum includes coordinates on preset X, Y, and Z axes. In general, the football detection box contains point cloud data of the football as well as of the ground and surrounding environment; the set formed by these data is the point cloud set, visualized in FIG. 7.
Step S1032: selecting a point cloud subset from the point cloud set.
Denote the point cloud set by S. In each iteration, a point cloud subset may be randomly selected from S according to actual conditions and denoted Si, i.e. Si ⊆ S.
Step S1033: fitting a sphere model to the point cloud subset to obtain a candidate sphere model.
Specifically, a sphere model can be fitted to the point cloud subset by minimum-variance (least-squares) estimation, yielding the parameters of a sphere model, i.e. the candidate sphere model, denoted Mi.
Step S1034: calculating the error rate of the candidate sphere model from the point cloud set.
Specifically, the deviation between each sample point in the point cloud set and the candidate sphere model is calculated. In one specific implementation of the embodiments of this application, for a given sample point, its shortest distance to the surface of the candidate sphere model can serve as its deviation. Inliers and outliers are then determined from the deviations: sample points whose deviation is less than a preset deviation threshold are determined as inliers of the candidate sphere model, and those whose deviation is greater than or equal to the threshold as outliers. The specific value of the deviation threshold can be set according to actual conditions and is not limited here. Next, the number of inliers is recorded, denoted Numi, and the error rate of the candidate sphere model is calculated from it. The number of inliers is inversely related to the error rate: the more inliers, the smaller the error rate, and vice versa. For example, the error rate can be taken as one minus the ratio of the number of inliers to the total number of sample points in the point cloud set.
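The deviation and error-rate computation of step S1034 can be sketched as follows. This is an illustrative sketch, not the application's code; since the text leaves the exact error-rate formula open, the sketch uses the outlier fraction (one minus the inlier ratio), which matches the stated relationship that more inliers means a lower error rate.

```python
import numpy as np

def sphere_error_rate(points, center, radius, dev_thresh):
    # Deviation: shortest distance from each sample point to the sphere surface.
    dev = np.abs(np.linalg.norm(points - center, axis=1) - radius)
    inliers = dev < dev_thresh
    # More inliers -> lower error rate; here the error rate is the outlier fraction.
    return 1.0 - inliers.mean(), inliers

# Three of these four points lie on the unit sphere, so the error rate is 0.25.
pts = np.array([[1.0, 0, 0], [0, 1.0, 0], [0, 0, 1.0], [0, 0, 2.0]])
err, mask = sphere_error_rate(pts, np.array([0.0, 0, 0]), 1.0, 0.1)
```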
Step S1035: judging whether the loop termination condition is satisfied.
Steps S1032 to S1034 constitute one iteration. After each iteration, it is necessary to judge whether the preset loop termination condition is satisfied; the condition may be that the error rate is less than a preset error threshold, or that the number of iterations exceeds a preset iteration threshold. The specific values of the error threshold and the iteration threshold can be set according to actual conditions, and neither is limited in the embodiments of this application.
If the loop termination condition is not satisfied, execution returns to step S1032 and its subsequent steps, i.e. a new point cloud subset is randomly selected and a sphere model is fitted to it; if the condition is satisfied, step S1036 is executed.
Step S1036: determining the football pose from the candidate sphere model with the smallest error rate.
After each iteration, the set of inliers of that iteration is recorded, and the best model parameters (those with the smallest error rate, i.e. the most inliers) are updated. The best model parameters obtained when the loop terminates are the estimate of the final sphere model, and the corresponding set of inliers is the point cloud data on the football surface, as shown in FIG. 8. Finally, the pose information of the whole football can be recovered from the parameters of the sphere model, as shown in FIG. 9.
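The iterative procedure of steps S1032 to S1036 is essentially RANSAC-style sphere fitting, and can be sketched in NumPy as below. This is a minimal illustration under stated assumptions, not the application's implementation: it uses a linearized least-squares sphere fit on 4-point samples and a fixed iteration count rather than the error-rate stopping rule.

```python
import numpy as np

def fit_sphere(pts):
    """Least-squares sphere fit via the linear form ||p||^2 = 2 c.p + k,
    where c is the center and r^2 = k + ||c||^2."""
    A = np.hstack([2.0 * pts, np.ones((len(pts), 1))])
    b = (pts ** 2).sum(axis=1)
    sol, *_ = np.linalg.lstsq(A, b, rcond=None)
    center, k = sol[:3], sol[3]
    r2 = k + center @ center
    radius = np.sqrt(r2) if r2 > 0 else np.nan
    return center, radius

def ransac_sphere(pts, n_iters=100, tol=0.01, seed=0):
    """Repeatedly fit candidate spheres to random subsets and keep the one
    with the most inliers (i.e. the lowest error rate)."""
    rng = np.random.default_rng(seed)
    best_center, best_radius, best_inliers = None, None, -1
    for _ in range(n_iters):
        subset = pts[rng.choice(len(pts), 4, replace=False)]
        center, radius = fit_sphere(subset)
        if not np.isfinite(radius):
            continue  # degenerate sample, try again
        # Deviation of every sample point from the candidate sphere surface.
        dev = np.abs(np.linalg.norm(pts - center, axis=1) - radius)
        n_inliers = int((dev < tol).sum())
        if n_inliers > best_inliers:
            best_center, best_radius, best_inliers = center, radius, n_inliers
    return best_center, best_radius, best_inliers
```

The inliers of the best model correspond to the football-surface points of FIG. 8, and the recovered center and radius give the pose information of FIG. 9.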
Optionally, in another specific implementation of the embodiments of this application, instead of the method described in step S103, football segmentation can be performed with color-space thresholding to determine the football pose. Specifically, as shown in FIG. 10, the two-dimensional image within the football detection box can be converted to grayscale and binarized to obtain a binary image. Since the pixel grayscale values of the football generally differ significantly from those of backgrounds such as grass or the ground, and change sharply at the football's boundary, setting a reasonable threshold can separate the background from the football. In the embodiments of this application, the OTSU maximum between-class variance threshold selection method is preferably used for binarization. After the binary image is obtained, the football contour can be segmented from it with a maximum-contour detection algorithm, and the football pose, including centroid coordinates and radius, determined from the contour.
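The OTSU criterion mentioned above can be sketched in pure NumPy as below. In practice a library call (e.g. OpenCV's Otsu-flagged thresholding) would normally be used; this dependency-free sketch only illustrates the criterion itself, namely exhaustively searching for the threshold that maximizes the between-class variance.

```python
import numpy as np

def otsu_threshold(gray):
    """Return the OTSU threshold of a uint8 grayscale image:
    the value t maximizing the between-class variance of pixels <= t vs > t."""
    hist = np.bincount(gray.ravel(), minlength=256).astype(float)
    total = hist.sum()
    sum_all = np.dot(np.arange(256), hist)
    best_t, best_var = 0, -1.0
    w0 = 0.0   # weight (pixel count) of the dark class
    sum0 = 0.0  # intensity sum of the dark class
    for t in range(256):
        w0 += hist[t]
        sum0 += t * hist[t]
        if w0 == 0 or w0 == total:
            continue
        w1 = total - w0
        mu0, mu1 = sum0 / w0, (sum_all - sum0) / w1
        var_between = w0 * w1 * (mu0 - mu1) ** 2
        if var_between > best_var:
            best_var, best_t = var_between, t
    return best_t
```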
A drawback of the above method is that when the football's texture is very irregular and uneven, the contour detection algorithm yields a great many contours; therefore, in the embodiments of this application, a football segmentation method based on the HSV color space can also be used. FIG. 11 is a schematic diagram of football segmentation based on the HSV color space. HSV is closer than RGB to human color perception: it expresses a color's hue, vividness, and brightness very directly, making color comparison convenient. In HSV, H is the hue, S the saturation, and V the value (brightness), and each color occupies a fixed range in HSV space. For example, given that green is [60,255,255], all greens lie between [45,100,50] and [75,255,255], i.e. [60-15,100,50] to [60+15,255,255], where 15 is an approximate value. Before binarizing the two-dimensional image within the football detection box, a color space transformation can be applied to it, from the RGB color space to the HSV color space; after the transformation, the football segmentation steps are the same as described above and are not repeated here.
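The green range quoted above can be checked with a small sketch. One assumption is made explicit here: the quoted ranges ([45,100,50] to [75,255,255]) follow the OpenCV-style HSV convention, with H in [0,179] and S, V in [0,255]; the conversion below uses the standard-library colorsys module scaled to that convention.

```python
import colorsys

def rgb_to_hsv_cv(r, g, b):
    """Convert 8-bit RGB to OpenCV-style HSV: H in [0,179], S and V in [0,255].

    Assumption: the ranges quoted in the text use this convention.
    """
    h, s, v = colorsys.rgb_to_hsv(r / 255.0, g / 255.0, b / 255.0)
    return h * 179.0, s * 255.0, v * 255.0

def is_green(rgb, lo=(45, 100, 50), hi=(75, 255, 255)):
    # A pixel is "green" if each HSV channel lies inside the quoted range.
    hsv = rgb_to_hsv_cv(*rgb)
    return all(lo[i] <= hsv[i] <= hi[i] for i in range(3))
```

Pure green (0,255,0) maps to H close to 60 and falls inside the range, while pure red (255,0,0) maps to H = 0 and falls outside, matching the example in the text.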
The premise of threshold-based football segmentation is that the color or grayscale values of the football and the background differ significantly; in real scenes this condition is not always met. For example, on a white tiled floor, with light reflecting off the ground, the white parts of the football blend into the floor; the detected contour is then no longer the circular region of the football, and the algorithm performs poorly. In such cases, the sphere-model fitting method on three-dimensional point cloud data described in step S103 is the appropriate way to segment the football.
Experimental data show that, after code acceleration, the MobileNet-SSD football detection part takes 60-70 milliseconds and the sphere-model fitting part takes 0.1-5 milliseconds, for a detection frame rate of 13 Hz. Combined with an accelerated tracking algorithm in a detect-while-tracking scheme, the detection frame rate can be raised to 25 Hz, meeting real-time requirements. Moreover, the screening provided by sphere-model fitting on three-dimensional point cloud data can rule out false detections of non-spherical objects and of spheres whose diameters do not meet the requirements, greatly improving the precision, recall, and accuracy of football detection in real three-dimensional scenes.
In summary, the embodiments of this application collect a two-dimensional image and three-dimensional point cloud data of a target region through a depth camera of a robot; perform football detection in the two-dimensional image using a preset deep-learning object detection model, outputting a football detection box and a confidence; and, if the confidence is greater than a preset confidence threshold, determine the football pose according to the three-dimensional point cloud data within the football detection box. Based on deep-learning object detection and three-dimensional point cloud techniques, the embodiments fully combine the strengths of two-dimensional images and three-dimensional point cloud data, achieving very high accuracy even with a lightweight model.
It should be understood that the sequence numbers of the steps in the above embodiments do not imply an order of execution; the execution order of the processes should be determined by their functions and internal logic, and constitutes no limitation on the implementation of the embodiments of this application.
Corresponding to the football detection method described in the above embodiments, FIG. 12 shows a structural diagram of an embodiment of a football detection apparatus provided by an embodiment of this application.
In this embodiment, a football detection apparatus may include:
a data collection module 1201, configured to collect a two-dimensional image and three-dimensional point cloud data of a target region through a depth camera of a robot;
a football detection module 1202, configured to perform football detection in the two-dimensional image using a preset deep-learning object detection model, and output a football detection box and a confidence;
a pose determination module 1203, configured to determine the football pose according to the three-dimensional point cloud data within the football detection box if the confidence is greater than a preset confidence threshold.
Further, the pose determination module may include:
a point cloud set construction unit, configured to construct the three-dimensional point cloud data within the football detection box into a point cloud set;
a point cloud subset selection unit, configured to select a point cloud subset from the point cloud set;
a sphere model fitting unit, configured to fit a sphere model to the point cloud subset to obtain a candidate sphere model;
an error rate calculation unit, configured to calculate an error rate of the candidate sphere model from the point cloud set;
a pose determination unit, configured to determine the football pose from the candidate sphere model with the smallest error rate if the error rate is less than the error threshold or the number of iterations exceeds the iteration threshold.
Further, the error rate calculation unit may include:
a deviation calculation subunit, configured to calculate a deviation between each sample point in the point cloud set and the candidate sphere model;
an inlier determination subunit, configured to determine the sample points whose deviation is less than a preset deviation threshold as inliers of the candidate sphere model;
an error rate calculation subunit, configured to calculate the error rate of the candidate sphere model from the number of inliers.
Further, the football detection apparatus may further include:
a data set construction module, configured to construct an original data set from football images obtained from preset data sources;
a data set conversion module, configured to clean and preprocess the original data set and convert it into a VOC data set;
a model training module, configured to train the deep-learning object detection model with the VOC data set to obtain a trained deep-learning object detection model.
Optionally, the football detection apparatus may further include:
a binarization module, configured to binarize the two-dimensional image within the football detection box to obtain a binary image;
a contour detection module, configured to segment the football contour in the binary image with a maximum-contour detection algorithm;
a football pose determination module, configured to determine the football pose from the football contour.
Optionally, the football detection apparatus may further include:
a color space transformation module, configured to perform a color space transformation on the two-dimensional image within the football detection box, from the RGB color space to the HSV color space.
Those skilled in the art can clearly understand that, for convenience and brevity of description, the specific working processes of the apparatus, modules, and units described above may refer to the corresponding processes in the foregoing method embodiments, and are not repeated here.
In the above embodiments, the description of each embodiment has its own emphasis; for parts not detailed or recorded in one embodiment, refer to the related descriptions of other embodiments.
FIG. 13 shows a schematic block diagram of a robot provided by an embodiment of this application; for convenience of description, only the parts related to the embodiment of this application are shown.
As shown in FIG. 13, the robot 13 of this embodiment includes a processor 130, a memory 131, and a computer program 132 stored in the memory 131 and executable on the processor 130. When executing the computer program 132, the processor 130 implements the steps in each of the above football detection method embodiments, such as steps S101 to S103 shown in FIG. 1; alternatively, the processor 130 implements the functions of the modules/units in the above apparatus embodiments, such as the functions of modules 1201 to 1203 shown in FIG. 12.
Exemplarily, the computer program 132 may be divided into one or more modules/units, which are stored in the memory 131 and executed by the processor 130 to complete this application. The one or more modules/units may be a series of computer program instruction segments capable of performing specific functions, used to describe the execution of the computer program 132 in the robot 13.
Those skilled in the art can understand that FIG. 13 is only an example of the robot 13 and does not constitute a limitation on it; the robot may include more or fewer components than shown, combine some components, or use different components. For example, the robot 13 may also include input and output devices, network access devices, buses, and the like.
The processor 130 may be a central processing unit (CPU), another general-purpose processor, a digital signal processor (DSP), an application-specific integrated circuit (ASIC), a field-programmable gate array (FPGA) or other programmable logic device, a discrete gate or transistor logic device, a discrete hardware component, etc. A general-purpose processor may be a microprocessor, or any conventional processor.
The memory 131 may be an internal storage unit of the robot 13, such as its hard disk or memory. The memory 131 may also be an external storage device of the robot 13, such as a plug-in hard disk, a Smart Media Card (SMC), a Secure Digital (SD) card, or a flash card equipped on the robot 13. Further, the memory 131 may include both an internal storage unit and an external storage device of the robot 13. The memory 131 is used to store the computer program and other programs and data required by the robot 13, and may also be used to temporarily store data that has been or will be output.
Those skilled in the art can clearly understand that, for convenience and brevity of description, the division into the above functional units and modules is used only as an example. In practical applications, the above functions can be allocated to different functional units and modules as needed, i.e. the internal structure of the apparatus can be divided into different functional units or modules to complete all or part of the functions described above. The functional units and modules in the embodiments may be integrated into one processing unit, or each unit may exist physically alone, or two or more units may be integrated into one unit; the integrated units may be implemented in the form of hardware or in the form of software functional units. In addition, the specific names of the functional units and modules are only for ease of mutual distinction and are not used to limit the scope of protection of this application. For the specific working processes of the units and modules in the above system, refer to the corresponding processes in the foregoing method embodiments, which are not repeated here.
In the above embodiments, the description of each embodiment has its own emphasis; for parts not detailed or recorded in one embodiment, refer to the related descriptions of other embodiments.
Those of ordinary skill in the art can realize that the units and algorithm steps of the examples described in connection with the embodiments disclosed herein can be implemented by electronic hardware, or a combination of computer software and electronic hardware. Whether these functions are executed by hardware or software depends on the specific application and design constraints of the technical solution. Professionals may use different methods to implement the described functions for each specific application, but such implementation should not be considered beyond the scope of this application.
In the embodiments provided in this application, it should be understood that the disclosed apparatus/robot and method may be implemented in other ways. For example, the apparatus/robot embodiments described above are only illustrative; e.g., the division of the modules or units is only a logical function division, and there may be other divisions in actual implementation, e.g., multiple units or components may be combined or integrated into another system, or some features may be omitted or not executed. Furthermore, the mutual coupling or direct coupling or communication connection shown or discussed may be indirect coupling or communication connection through some interfaces, apparatuses, or units, and may be electrical, mechanical, or in other forms.
The units described as separate components may or may not be physically separated, and components shown as units may or may not be physical units, i.e. they may be located in one place or distributed over multiple network units. Some or all of the units may be selected according to actual needs to achieve the purpose of the solution of this embodiment.
In addition, the functional units in the embodiments of this application may be integrated into one processing unit, or each unit may exist physically alone, or two or more units may be integrated into one unit. The integrated unit may be implemented in the form of hardware or in the form of a software functional unit.
If the integrated module/unit is implemented in the form of a software functional unit and sold or used as an independent product, it may be stored in a computer-readable storage medium. Based on this understanding, this application may implement all or part of the processes in the methods of the above embodiments by instructing the relevant hardware through a computer program, which may be stored in a computer-readable storage medium; when executed by a processor, the computer program can implement the steps of each of the above method embodiments. The computer program includes computer program code, which may be in source code form, object code form, an executable file, some intermediate form, etc. The computer-readable storage medium may include: any entity or device capable of carrying the computer program code, a recording medium, a USB flash drive, a removable hard disk, a magnetic disk, an optical disk, a computer memory, a read-only memory (ROM), a random access memory (RAM), an electrical carrier signal, a telecommunication signal, a software distribution medium, etc. It should be noted that the content contained in the computer-readable storage medium may be appropriately increased or decreased according to the requirements of legislation and patent practice in the jurisdiction; for example, in some jurisdictions, according to legislation and patent practice, computer-readable storage media exclude electrical carrier signals and telecommunication signals.
The above embodiments are only intended to illustrate the technical solutions of this application, not to limit them. Although this application has been described in detail with reference to the foregoing embodiments, those of ordinary skill in the art should understand that they may still modify the technical solutions recorded in the foregoing embodiments or make equivalent replacements for some of the technical features; such modifications or replacements do not cause the essence of the corresponding technical solutions to depart from the spirit and scope of the technical solutions of the embodiments of this application, and shall all be included within the scope of protection of this application.

Claims (10)

  1. A football detection method, comprising:
    collecting a two-dimensional image and three-dimensional point cloud data of a target region through a depth camera of a robot;
    performing football detection in the two-dimensional image using a preset deep-learning object detection model, and outputting a football detection box and a confidence;
    if the confidence is greater than a preset confidence threshold, determining the football pose according to the three-dimensional point cloud data within the football detection box.
  2. The football detection method according to claim 1, wherein determining the football pose according to the three-dimensional point cloud data within the football detection box comprises:
    constructing the three-dimensional point cloud data within the football detection box into a point cloud set;
    selecting a point cloud subset from the point cloud set;
    fitting a sphere model to the point cloud subset to obtain a candidate sphere model;
    calculating an error rate of the candidate sphere model from the point cloud set;
    if the error rate is greater than or equal to a preset error threshold, returning to the step of selecting a point cloud subset from the point cloud set and the subsequent steps, until the error rate is less than the error threshold or the number of iterations exceeds a preset iteration threshold;
    if the error rate is less than the error threshold or the number of iterations exceeds the iteration threshold, determining the football pose from the candidate sphere model with the smallest error rate.
  3. The football detection method according to claim 2, wherein calculating the error rate of the candidate sphere model from the point cloud set comprises:
    calculating a deviation between each sample point in the point cloud set and the candidate sphere model;
    determining the sample points whose deviation is less than a preset deviation threshold as inliers of the candidate sphere model;
    calculating the error rate of the candidate sphere model from the number of inliers.
  4. The football detection method according to claim 1, wherein before performing football detection in the two-dimensional image using a preset deep-learning object detection model, the method further comprises:
    constructing an original data set from football images obtained from preset data sources;
    cleaning and preprocessing the original data set and converting it into a VOC data set;
    training the deep-learning object detection model with the VOC data set to obtain a trained deep-learning object detection model.
  5. The football detection method according to claim 1, further comprising:
    binarizing the two-dimensional image within the football detection box to obtain a binary image;
    segmenting the football contour in the binary image with a maximum-contour detection algorithm;
    determining the football pose from the football contour.
  6. The football detection method according to claim 5, wherein before binarizing the two-dimensional image within the football detection box, the method further comprises:
    performing a color space transformation on the two-dimensional image within the football detection box, from the RGB color space to the HSV color space.
  7. The football detection method according to any one of claims 1 to 6, wherein the deep-learning object detection model is a MobileNet-SSD model.
  8. A football detection apparatus, comprising:
    a data collection module, configured to collect a two-dimensional image and three-dimensional point cloud data of a target region through a depth camera of a robot;
    a football detection module, configured to perform football detection in the two-dimensional image using a preset deep-learning object detection model, and output a football detection box and a confidence;
    a pose determination module, configured to determine the football pose according to the three-dimensional point cloud data within the football detection box if the confidence is greater than a preset confidence threshold.
  9. A computer-readable storage medium storing a computer program, wherein the computer program, when executed by a processor, implements the steps of the football detection method according to any one of claims 1 to 7.
  10. A robot, comprising a memory, a processor, and a computer program stored in the memory and executable on the processor, wherein the processor implements the steps of the football detection method according to any one of claims 1 to 7 when executing the computer program.
PCT/CN2020/139859 2020-09-27 2020-12-28 Football detection method and apparatus, computer-readable storage medium, and robot WO2022062238A1 (zh)

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
CN202011030724.2 2020-09-27
CN202011030724.2A CN112215861A (zh) 2020-09-27 2020-09-27 Football detection method and apparatus, computer-readable storage medium, and robot

Publications (1)

Publication Number Publication Date
WO2022062238A1 true WO2022062238A1 (zh) 2022-03-31

Family

ID=74050774

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/CN2020/139859 WO2022062238A1 (zh) 2020-09-27 2020-12-28 Football detection method and apparatus, computer-readable storage medium, and robot

Country Status (2)

Country Link
CN (1) CN112215861A (zh)
WO (1) WO2022062238A1 (zh)

Families Citing this family (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN113160324A (zh) * 2021-03-31 2021-07-23 北京京东乾石科技有限公司 Bounding box generation method and apparatus, electronic device, and computer-readable medium
CN113052835B (zh) * 2021-04-20 2024-02-27 江苏迅捷装具科技有限公司 Medicine box detection method and detection system based on fusion of three-dimensional point cloud and image data
CN113591901A (zh) * 2021-06-10 2021-11-02 中国航天时代电子有限公司 Anchor-box-based object detection method

Citations (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN108229548A (zh) * 2017-12-27 2018-06-29 华为技术有限公司 Object detection method and apparatus
CN109102547A (zh) * 2018-07-20 2018-12-28 上海节卡机器人科技有限公司 Robot grasping pose estimation method based on an object-recognition deep learning model
CN110032962A (zh) * 2019-04-03 2019-07-19 腾讯科技(深圳)有限公司 Object detection method and apparatus, network device, and storage medium
US20190304134A1 (en) * 2018-03-27 2019-10-03 J. William Mauchly Multiview Estimation of 6D Pose
CN110544279A (zh) * 2019-08-26 2019-12-06 华南理工大学 Pose estimation method combining image recognition and genetic-algorithm fine registration

Family Cites Families (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN109541632B (zh) * 2018-09-30 2022-06-03 天津大学 Improved method for missed target detection assisted by four-line lidar
CN110634161B (zh) * 2019-08-30 2023-05-05 哈尔滨工业大学(深圳) Fast high-precision workpiece pose estimation method and apparatus based on point cloud data
CN110942449B (zh) * 2019-10-30 2023-05-23 华南理工大学 Vehicle detection method based on laser and vision fusion
CN111178250B (zh) * 2019-12-27 2024-01-12 深圳市越疆科技有限公司 Object recognition and positioning method and apparatus, and terminal device
CN111191582B (zh) * 2019-12-27 2022-11-01 深圳市越疆科技有限公司 Three-dimensional object detection method, detection apparatus, terminal device, and computer-readable storage medium

Patent Citations (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN108229548A (zh) * 2017-12-27 2018-06-29 华为技术有限公司 Object detection method and apparatus
US20190304134A1 (en) * 2018-03-27 2019-10-03 J. William Mauchly Multiview Estimation of 6D Pose
CN109102547A (zh) * 2018-07-20 2018-12-28 上海节卡机器人科技有限公司 Robot grasping pose estimation method based on an object-recognition deep learning model
CN110032962A (zh) * 2019-04-03 2019-07-19 腾讯科技(深圳)有限公司 Object detection method and apparatus, network device, and storage medium
CN110544279A (zh) * 2019-08-26 2019-12-06 华南理工大学 Pose estimation method combining image recognition and genetic-algorithm fine registration

Also Published As

Publication number Publication date
CN112215861A (zh) 2021-01-12

Similar Documents

Publication Publication Date Title
WO2022062238A1 (zh) Football detection method and apparatus, computer-readable storage medium, and robot
US20200349765A1 (en) Object modeling and movement method and apparatus, and device
CN107066935B (zh) 基于深度学习的手部姿态估计方法及装置
CN110196053B (zh) 一种基于fpga的实时田间机器人视觉导航方法与系统
CN109146948B (zh) 基于视觉的作物长势表型参数量化与产量相关性分析方法
US9275277B2 (en) Using a combination of 2D and 3D image data to determine hand features information
CN108388882B (zh) 基于全局-局部rgb-d多模态的手势识别方法
EP3545497B1 (en) System for acquiring a 3d digital representation of a physical object
CN110827398B (zh) 基于深度神经网络的室内三维点云自动语义分割方法
WO2017132636A1 (en) Systems and methods for extracting information about objects from scene information
EP4036790A1 (en) Image display method and device
Pound et al. A patch-based approach to 3D plant shoot phenotyping
CN109448086B (zh) 基于稀疏实采数据的分拣场景平行数据集构建方法
WO2023024441A1 (zh) 模型重建方法及相关装置、电子设备和存储介质
CN107292896A (zh) 基于Snake模型的轮廓提取方法
CN206021358U (zh) 一种物体成像装置及其机器人
WO2024088445A1 (zh) 一种基于视觉语义矢量的车辆导引方法、系统、设备和介质
CN111598149B (zh) 一种基于注意力机制的回环检测方法
WO2024088071A1 (zh) 三维场景重建方法、装置、设备及存储介质
Maltezos et al. Improving the visualisation of 3D textured models via shadow detection and removal
CN113658274B (zh) 用于灵长类动物种群行为分析的个体间距自动计算方法
CN115761191A (zh) 一种基于增强现实的发动机数字化装配系统
Nguyen et al. High resolution 3d content creation using unconstrained and uncalibrated cameras
CN205692214U (zh) 一种单目视觉位姿测量系统
CN113361475A (zh) 一种基于多阶段特征融合信息复用的多光谱行人检测方法

Legal Events

Date Code Title Description
121 Ep: the epo has been informed by wipo that ep was designated in this application

Ref document number: 20955080

Country of ref document: EP

Kind code of ref document: A1

NENP Non-entry into the national phase

Ref country code: DE

122 Ep: pct application non-entry in european phase

Ref document number: 20955080

Country of ref document: EP

Kind code of ref document: A1

32PN Ep: public notification in the ep bulletin as address of the adressee cannot be established

Free format text: NOTING OF LOSS OF RIGHTS PURSUANT TO RULE 112(1) EPC (EPO FORM 1205A DATED 18.09.2023)