CN112215861A - Football detection method and device, computer readable storage medium and robot - Google Patents

Info

Publication number
CN112215861A
Authority
CN
China
Prior art keywords
football
point cloud
detection
model
preset
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202011030724.2A
Other languages
Chinese (zh)
Inventor
王东
张惊涛
胡淑萍
白杰
麻星星
程骏
郭渺辰
顾在旺
庞建新
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Ubtech Robotics Corp
Original Assignee
Ubtech Robotics Corp
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Ubtech Robotics Corp filed Critical Ubtech Robotics Corp
Priority to CN202011030724.2A priority Critical patent/CN112215861A/en
Priority to PCT/CN2020/139859 priority patent/WO2022062238A1/en
Publication of CN112215861A publication Critical patent/CN112215861A/en
Pending legal-status Critical Current

Classifications

    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06TIMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T7/00Image analysis
    • G06T7/10Segmentation; Edge detection
    • G06T7/136Segmentation; Edge detection involving thresholding
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06TIMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T7/00Image analysis
    • G06T7/10Segmentation; Edge detection
    • G06T7/13Edge detection
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06TIMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T7/00Image analysis
    • G06T7/90Determination of colour characteristics
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06TIMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T2207/00Indexing scheme for image analysis or image enhancement
    • G06T2207/10Image acquisition modality
    • G06T2207/10004Still image; Photographic image
    • G06T2207/10012Stereo images

Landscapes

  • Engineering & Computer Science (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Physics & Mathematics (AREA)
  • General Physics & Mathematics (AREA)
  • Theoretical Computer Science (AREA)
  • Image Analysis (AREA)

Abstract

The application belongs to the technical field of robots, and particularly relates to a football detection method and device, a computer-readable storage medium, and a robot. The method includes: collecting a two-dimensional image and three-dimensional point cloud data of a target area through a depth camera of a robot; performing football detection in the two-dimensional image using a preset deep learning target detection model, and outputting a football detection frame and a confidence; and, if the confidence is greater than a preset confidence threshold, determining the football pose according to the three-dimensional point cloud data in the football detection frame. Based on deep learning target detection and three-dimensional point cloud processing, the method fully combines the advantages of the two-dimensional image and the three-dimensional point cloud data, and achieves high accuracy even when a lightweight model is used.

Description

Football detection method and device, computer readable storage medium and robot
Technical Field
The application belongs to the technical field of robots, and particularly relates to a football detection method, a football detection device, a computer-readable storage medium and a robot.
Background
Traditional football detection is based on a single two-dimensional image: pose information of the football is obtained by training a target detection model. However, improving the detection accuracy requires a heavy model with a large number of parameters and high computing power, while the hardware platform of a humanoid robot generally does not support GPU acceleration. On the premise of maintaining a high frame rate, only a lightweight model can therefore be used, and the accuracy of the football detection result is low.
Disclosure of Invention
In view of this, embodiments of the present application provide a football detection method and device, a computer-readable storage medium, and a robot, so as to solve the problem of the low accuracy of traditional football detection methods.
A first aspect of an embodiment of the present application provides a football detection method, which may include:
acquiring a two-dimensional image and three-dimensional point cloud data of a target area through a depth camera of a robot;
performing football detection in the two-dimensional image by using a preset deep learning target detection model, and outputting a football detection frame and confidence;
and if the confidence is greater than a preset confidence threshold, determining the football pose according to the three-dimensional point cloud data in the football detection frame.
Further, the determining the football pose according to the three-dimensional point cloud data in the football detection frame includes:
constructing three-dimensional point cloud data in the football detection frame into a point cloud set;
selecting a point cloud subset from the point cloud set;
performing sphere model fitting on the point cloud subset to obtain a candidate sphere model;
calculating an error rate of the candidate sphere model according to the point cloud set;
if the error rate is greater than or equal to a preset error threshold, returning to the step of selecting a point cloud subset from the point cloud set and the subsequent steps, until the error rate is smaller than the error threshold or the number of iterations exceeds a preset iteration threshold;
and if the error rate is smaller than the error threshold or the number of iterations exceeds the iteration threshold, determining the football pose according to the candidate sphere model with the minimum error rate.
Further, the calculating an error rate of the candidate sphere model from the point cloud set includes:
respectively calculating deviation values between each sample point in the point cloud set and the candidate sphere model;
determining the sample points with deviation values smaller than a preset deviation threshold value as local interior points belonging to the candidate sphere model;
and calculating the error rate of the candidate sphere model according to the number of the local interior points.
Further, before performing football detection in the two-dimensional image by using a preset deep learning target detection model, the method further includes:
constructing each football image acquired from a preset data source into an original data set;
cleaning and preprocessing the original data set, and converting the original data set into a VOC data set;
and training the deep learning target detection model by using the VOC data set to obtain the trained deep learning target detection model.
Optionally, the football detection method may further include:
performing binarization processing on the two-dimensional image in the football detection frame to obtain a binarized image;
segmenting the football outline in the binary image according to a maximum outline detection algorithm;
and determining the football pose according to the football outline.
Optionally, before performing binarization processing on the two-dimensional image in the football detection frame, the method may further include:
and carrying out color space transformation on the two-dimensional image in the football detection frame, and transforming the two-dimensional image from the RGB color space to the HSV color space.
Preferably, the deep learning target detection model is a MobileNet-SSD model.
A second aspect of embodiments of the present application provides a football detection apparatus, which may include:
the data acquisition module is used for acquiring a two-dimensional image and three-dimensional point cloud data of a target area through a depth camera of the robot;
the football detection module is used for carrying out football detection in the two-dimensional image by using a preset deep learning target detection model and outputting a football detection frame and confidence;
and the pose determining module is used for determining the pose of the football according to the three-dimensional point cloud data in the football detection frame if the confidence is greater than a preset confidence threshold.
Further, the pose determination module may include:
the point cloud set constructing unit is used for constructing three-dimensional point cloud data in the football detection frame into a point cloud set;
the point cloud subset selecting unit is used for selecting a point cloud subset from the point cloud set;
the sphere model fitting unit is used for performing sphere model fitting on the point cloud subset to obtain a candidate sphere model;
the error rate calculation unit is used for calculating the error rate of the candidate sphere model according to the point cloud set;
and the pose determining unit is used for determining the pose of the football according to the candidate sphere model with the minimum error rate if the error rate is smaller than the error threshold or the number of iterations exceeds the iteration threshold.
Further, the error rate calculation unit may include:
the deviation value calculation operator unit is used for respectively calculating the deviation value between each sample point in the point cloud set and the candidate sphere model;
the local interior point determining subunit is used for determining the sample points with the deviation values smaller than a preset deviation threshold value as local interior points belonging to the candidate sphere model;
and the error rate calculating subunit is used for calculating the error rate of the candidate sphere model according to the number of the local interior points.
Further, the football detection device may further include:
the data set construction module is used for constructing each football image acquired from a preset data source into an original data set;
the data set conversion module is used for cleaning and preprocessing the original data set and converting the original data set into a VOC data set;
and the model training module is used for training the deep learning target detection model by using the VOC data set to obtain a trained deep learning target detection model.
Optionally, the football detection device may further include:
the binarization processing module is used for carrying out binarization processing on the two-dimensional image in the football detection frame to obtain a binarization image;
the contour detection module is used for segmenting the football contour in the binarized image according to a maximum contour detection algorithm;
and the football pose determining module is used for determining the football pose according to the football outline.
Optionally, the football detection device may further include:
and the color space transformation module is used for carrying out color space transformation on the two-dimensional image in the football detection frame, and transforming the two-dimensional image into an HSV color space from an RGB color space.
A third aspect of embodiments of the present application provides a computer-readable storage medium storing a computer program which, when executed by a processor, implements the steps of any one of the above football detection methods.
A fourth aspect of the embodiments of the present application provides a robot, including a memory, a processor, and a computer program stored in the memory and executable on the processor, wherein the processor implements the steps of any one of the above football detection methods when executing the computer program.
A fifth aspect of embodiments of the present application provides a computer program product which, when run on a robot, causes the robot to perform the steps of any one of the above football detection methods.
Compared with the prior art, the embodiment of the application has the following advantages: a two-dimensional image and three-dimensional point cloud data of a target area are collected through a depth camera of a robot; football detection is performed in the two-dimensional image using a preset deep learning target detection model, which outputs a football detection frame and a confidence; and if the confidence is greater than a preset confidence threshold, the football pose is determined according to the three-dimensional point cloud data in the football detection frame. Based on deep learning target detection and three-dimensional point cloud processing, the embodiment fully combines the advantages of the two-dimensional image and the three-dimensional point cloud data, and achieves high accuracy even when a lightweight model is used.
Drawings
In order to more clearly illustrate the technical solutions in the embodiments of the present application, the drawings needed for describing the embodiments or the prior art are briefly introduced below. Obviously, the drawings in the following description are only some embodiments of the present application; for those skilled in the art, other drawings can be obtained from these drawings without creative effort.
FIG. 1 is a flowchart of an embodiment of a football detection method in an embodiment of the present application;
FIG. 2 is a schematic diagram of the structure of a depthwise separable convolution;
FIG. 3 is a schematic diagram of a network structure of a MobileNet-SSD;
FIG. 4 is a schematic flow diagram of model training for a deep learning target detection model;
FIG. 5 is a schematic view of some of the collected football images;
FIG. 6 is a schematic flow chart of determining a pose of a soccer ball from three-dimensional point cloud data in a soccer ball detection box;
FIG. 7 is a schematic view of a visualization of a point cloud collection;
FIG. 8 is a schematic representation of point cloud data for a soccer ball surface;
FIG. 9 is a schematic diagram of determining football pose information, such as the centroid and radius, by sphere model fitting;
FIG. 10 is a schematic flow diagram of football segmentation based on color space thresholding;
FIG. 11 is a schematic diagram of football segmentation based on the HSV color space;
FIG. 12 is a block diagram of an embodiment of a football detection device in an embodiment of the present application;
fig. 13 is a schematic block diagram of a robot in an embodiment of the present application.
Detailed Description
In order to make the objects, features and advantages of the present invention more apparent and understandable, the technical solutions in the embodiments of the present application will be clearly and completely described below with reference to the accompanying drawings in the embodiments of the present application, and it is apparent that the embodiments described below are only a part of the embodiments of the present application, and not all of the embodiments. All other embodiments, which can be derived by a person skilled in the art from the embodiments given herein without making any creative effort, shall fall within the protection scope of the present application.
It will be understood that the terms "comprises" and/or "comprising," when used in this specification and the appended claims, specify the presence of stated features, integers, steps, operations, elements, and/or components, but do not preclude the presence or addition of one or more other features, integers, steps, operations, elements, components, and/or groups thereof.
It is also to be understood that the terminology used in the description of the present application herein is for the purpose of describing particular embodiments only and is not intended to be limiting of the application. As used in the specification of the present application and the appended claims, the singular forms "a," "an," and "the" are intended to include the plural forms as well, unless the context clearly indicates otherwise.
It should be further understood that the term "and/or" as used in this specification and the appended claims refers to and includes any and all possible combinations of one or more of the associated listed items.
As used in this specification and the appended claims, the term "if" may be interpreted contextually as "when", "upon" or "in response to a determination" or "in response to a detection". Similarly, the phrase "if it is determined" or "if a [ described condition or event ] is detected" may be interpreted contextually to mean "upon determining" or "in response to determining" or "upon detecting [ described condition or event ]" or "in response to detecting [ described condition or event ]".
In addition, in the description of the present application, the terms "first," "second," "third," and the like are used solely to distinguish one from another and are not to be construed as indicating or implying relative importance.
Referring to FIG. 1, an embodiment of a football detection method in an embodiment of the present application may include:
Step S101: collecting a two-dimensional image and three-dimensional point cloud data of a target area through a depth camera of the robot.
The depth camera may be an internal device built into the robot or an external device mounted on it. In a specific implementation of the embodiment of the present application, the depth camera is preferably built into the robot at the crotch position, and is used to capture the region in the robot's direction of travel, namely the target area, as a two-dimensional image and three-dimensional point cloud data.
Step S102: performing football detection in the two-dimensional image using a preset deep learning target detection model, and outputting a football detection frame and a confidence.
The deep learning target detection model may be any lightweight model in the prior art. In a specific implementation of the embodiment of the present application, a MobileNet-SSD model is preferably adopted; the MobileNet-SSD model imitates the multi-scale convolution structure of VGG16-SSD but replaces the VGG16 backbone with MobileNet V1.
MobileNet V1 is a lightweight convolutional neural network model suitable for mobile terminals such as mobile phones; the network is mainly used to extract image convolution features in lightweight deep learning tasks such as classification, detection, embedding and segmentation. The core contribution and innovation of the MobileNet V1 network is the Depthwise Separable Convolution, which decomposes a standard convolution kernel so as to reduce the computation and parameter count of the network. FIG. 2 is a schematic diagram of the structure of the depthwise separable convolution, which can be further divided into a depthwise convolution (Depthwise Convolution) and a 1×1 pointwise convolution (Pointwise Convolution). Counting the depthwise and pointwise convolutions as separate layers, MobileNet V1 has 28 layers in total; compared with standard convolutions of the same depth, MobileNet V1 with 3×3 convolution kernels reduces the computation by a factor of 8 to 9 at the cost of about one percent of accuracy.
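As an illustrative sketch only (the patent contains no code, and the channel sizes below are placeholders), a depthwise separable convolution block can be written in PyTorch as follows:

```python
import torch
import torch.nn as nn

class DepthwiseSeparableConv(nn.Module):
    """Depthwise separable convolution as used in MobileNet V1:
    a per-channel 3x3 depthwise convolution followed by a 1x1
    pointwise convolution that mixes information across channels."""

    def __init__(self, in_channels: int, out_channels: int, stride: int = 1):
        super().__init__()
        # Depthwise: one 3x3 filter per input channel (groups=in_channels).
        self.depthwise = nn.Conv2d(in_channels, in_channels, kernel_size=3,
                                   stride=stride, padding=1,
                                   groups=in_channels, bias=False)
        self.bn1 = nn.BatchNorm2d(in_channels)
        # Pointwise: 1x1 convolution across channels.
        self.pointwise = nn.Conv2d(in_channels, out_channels, kernel_size=1,
                                   bias=False)
        self.bn2 = nn.BatchNorm2d(out_channels)
        self.relu = nn.ReLU(inplace=True)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        x = self.relu(self.bn1(self.depthwise(x)))
        return self.relu(self.bn2(self.pointwise(x)))
```

For a 3×3 kernel, a standard convolution costs about C_in·C_out·9·H·W multiply-accumulates per feature map, whereas the separable version costs C_in·9·H·W + C_in·C_out·H·W; their ratio, 1/C_out + 1/9, is where the 8 to 9 times saving cited above comes from.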
An SSD (Single Shot MultiBox Detector) target detection network adopts a VGG16 model as the basis for feature extraction, adds multi-scale feature detection based on a feature pyramid to generate a fixed-size set of detection boxes (Bounding Boxes) together with a score for each detection box instance, and then completes detection through Non-Maximum Suppression (NMS), making it a one-stage target detection network. The MobileNet-SSD network imitates the multi-scale convolution structure of VGG16-SSD; a simple comparison of VGG16 and MobileNet V1 shows that the parameter count and computation of the VGG16 network are 1.38×10^8 and 1.55×10^10 FLOPs (floating-point operations) respectively, while those of the MobileNet V1 network are 4.2×10^6 and 0.569×10^9 FLOPs. The parameter count of MobileNet V1 is thus about 3.04% of that of VGG16 and its computation about 3.67%, which greatly reduces the parameter and computation requirements of the network.
FIG. 3 is a schematic diagram of the network structure of the MobileNet-SSD. With the convolutional layers of MobileNet V1 labeled Conv0, Conv1, Conv2, ..., Conv13 in sequence, the MobileNet-SSD network appends 8 further convolutional layers after Conv13, labeled Conv14_1, Conv14_2, Conv15_1, Conv15_2, Conv16_1, Conv16_2, Conv17_1 and Conv17_2 in sequence. The feature maps (Feature Maps) of 6 of these convolutional layers, namely Conv11, Conv13, Conv14_2, Conv15_2, Conv16_2 and Conv17_2, are extracted for detection (Detection). The numbers of default boxes generated per cell on these 6 feature maps are 3, 6, 6, 6, 6 and 6 respectively, so the output channel numbers of the 3×3 convolution kernels for coordinate regression attached after these 6 layers are 12, 24, 24, 24, 24 and 24 respectively; a non-maximum suppression layer is then attached to realize target detection.
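As a quick check of the channel figures above (illustrative Python, not patent text): each default box regresses 4 coordinate offsets, so a feature map with k default boxes per cell needs a 3×3 coordinate-regression convolution with 4k output channels.

```python
# Default boxes per cell on the 6 detection feature maps
# (Conv11, Conv13, Conv14_2, Conv15_2, Conv16_2, Conv17_2).
default_boxes = [3, 6, 6, 6, 6, 6]

# Each default box regresses 4 coordinate offsets, which gives the
# output channel count of each 3x3 coordinate-regression convolution.
loc_channels = [4 * k for k in default_boxes]
print(loc_channels)  # [12, 24, 24, 24, 24, 24]
```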
Before the deep learning target detection model is used, it may be trained by the process shown in FIG. 4:
Step S401: constructing each football image acquired from a preset data source into an original data set.
The specific data sources and the number of football images may be set according to the actual situation, and this is not specifically limited in this embodiment of the present application. In a specific implementation of the embodiment of the present application, the original data set contains 16018 football images in total: 7688 football images acquired from OpenImage, 1300 acquired from ImageNet, 3102 images of RoboCup (robot world cup) matches collected and labeled by the applicant, and 3928 crawled from the web. FIG. 5 shows some of the collected football images.
Step S402: cleaning and preprocessing the original data set, and converting it into a VOC data set.
The original data set covers football images from different scenes, such as outdoor football fields, sky, beaches, indoor floors and desktops, and with different light intensities, colors and texture characteristics in the daytime, evening, night and so on. Since the data and features determine the upper limit of what a model algorithm can achieve, the original data set can be cleaned and preprocessed to improve the accuracy of the model. Specifically, unqualified images, such as those in which the football is blurred, undersized, of a single color or severely occluded, are removed; the remaining images are then preprocessed: each image is resized to a uniform 300×300, the pixel intensities are normalized and mean-subtracted, and data augmentation (Augmentation) is performed by mirroring and similar transforms.
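A minimal sketch of this preprocessing in Python with OpenCV (the patent gives no code; the exact normalization constants are assumptions):

```python
import cv2
import numpy as np

def preprocess(image_path: str, size: int = 300) -> np.ndarray:
    """Resize to 300x300, normalize pixel intensities and subtract the mean."""
    img = cv2.imread(image_path).astype(np.float32)
    img = cv2.resize(img, (size, size))
    img /= 255.0                   # normalize intensities to [0, 1]
    img -= img.mean(axis=(0, 1))   # per-channel mean subtraction
    return img

def augment_mirror(img: np.ndarray) -> np.ndarray:
    """Horizontal mirroring, one of the augmentations mentioned above."""
    return cv2.flip(img, 1)
```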
Further, the cleaned and preprocessed data set can also be converted into the VOC format, thereby obtaining a VOC data set. The VOC format is organized into three folders: Annotations, ImageSets and JPEGImages. The JPEGImages folder stores the original images, named 000000.jpg to 016017.jpg; the Annotations folder stores, for each image, its object detection box (Bounding Box) and class tag in XML format; and ImageSets stores the division into training, validation and test sets, in the proportion 70%:10%:20%. The training and test data sets in lmdb format are then generated through create_data.sh, which completes the data set preparation stage.
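A sketch of the 70%/10%/20% split into ImageSets lists (the file layout follows the usual VOC convention and is an assumption here):

```python
import random

names = [f"{i:06d}" for i in range(16018)]  # 000000 ... 016017
random.seed(0)
random.shuffle(names)

n_train = int(0.7 * len(names))
n_val = int(0.1 * len(names))
splits = {
    "train": names[:n_train],
    "val": names[n_train:n_train + n_val],
    "test": names[n_train + n_val:],
}
for split, items in splits.items():
    # One image name per line, as expected by the VOC tooling.
    with open(f"ImageSets/Main/{split}.txt", "w") as f:
        f.write("\n".join(items))
```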
Step S403: training the deep learning target detection model with the VOC data set to obtain the trained deep learning target detection model.
The specific training parameters can be set according to the actual situation. In a specific implementation of the embodiment of the present application, in the fine-tuning training phase of a MobileNet-SSD model based on the Caffe framework, the initial learning rate is 0.001, the learning rate decay factor is 0.4, the weight decay factor is 0.00005, the batch size is 24, and the network optimizer is the RMSProp algorithm; whenever the loss function (loss) stops decreasing, the learning rate is reduced, which happens twice during training, and the total number of training iterations is 57000 steps.
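These hyperparameters map naturally onto a Caffe solver configuration; a sketch follows. The field names are standard Caffe solver options, but the step boundaries at which the learning rate drops are assumptions, since the patent only says the rate was reduced twice; the batch size of 24 belongs in the data layer of the training prototxt, not here.

```
net: "MobileNetSSD_train.prototxt"
base_lr: 0.001
lr_policy: "multistep"
gamma: 0.4              # learning-rate decay factor
stepvalue: 30000        # assumed first decay point
stepvalue: 45000        # assumed second decay point
weight_decay: 0.00005
type: "RMSProp"
max_iter: 57000
snapshot: 5000
solver_mode: GPU
```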
After training is complete, the trained deep learning target detection model can be used to perform football detection in the two-dimensional image. The model outputs the rectangular detection frame containing the football in the two-dimensional image, namely the coordinates of the four vertices of the football detection frame, together with the confidence that the detection frame contains a football. The larger the confidence, the higher the probability that the detection frame contains a football; conversely, the smaller the confidence, the lower that probability. In this embodiment of the present application, a confidence threshold may be preset; its specific value may be set according to the actual situation and is not specifically limited in this embodiment of the present application. If the confidence is less than or equal to the confidence threshold, the detection frame is considered unreliable, the subsequent steps are not executed, and data is collected again for football detection. If the confidence is greater than the confidence threshold, step S103 is executed.
Step S103: determining the football pose according to the three-dimensional point cloud data in the football detection frame.
As shown in FIG. 6, step S103 may specifically include the following processes.
Step S1031: constructing the three-dimensional point cloud data in the football detection frame into a point cloud set.
Each three-dimensional point cloud datum includes coordinates on the preset X, Y and Z axes. In general, the football detection frame contains both the three-dimensional point cloud data of the football and that of the ground and surrounding environment; the set formed by these data is the point cloud set. FIG. 7 is a visualization of the point cloud set.
Step S1032: selecting a point cloud subset from the point cloud set.
Denote the point cloud set as S. At each iteration, a point cloud subset is randomly selected from the set S according to the actual situation and denoted S_i, i.e., S_i ⊆ S.
Step S1033: performing sphere model fitting on the point cloud subset to obtain a candidate sphere model.
Specifically, a least squares error estimation method may be used to fit a sphere model to the point cloud subset, obtaining the parameters of the sphere model and hence the sphere model itself, namely the candidate sphere model, denoted M_i.
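One standard way to realize such a least-squares sphere fit (a sketch under the usual linearization, not a formulation prescribed by the patent) is to rewrite the sphere equation as x² + y² + z² = 2ax + 2by + 2cz + d, with center (a, b, c) and d = r² − a² − b² − c², which is linear in (a, b, c, d):

```python
import numpy as np

def fit_sphere(points: np.ndarray):
    """Least-squares sphere fit to an (N, 3) array of points.
    Returns (center, radius)."""
    # Linear system: [2x 2y 2z 1] @ [a b c d]^T = x^2 + y^2 + z^2
    A = np.hstack([2.0 * points, np.ones((points.shape[0], 1))])
    b = (points ** 2).sum(axis=1)
    sol, *_ = np.linalg.lstsq(A, b, rcond=None)
    center = sol[:3]
    radius = np.sqrt(sol[3] + center @ center)  # r^2 = d + a^2 + b^2 + c^2
    return center, radius
```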
Step S1034: calculating the error rate of the candidate sphere model according to the point cloud set.
Specifically, the deviation value between each sample point in the point cloud set and the candidate sphere model may be calculated; in a specific implementation of the embodiment of the present application, the shortest distance between a sample point and the surface of the candidate sphere model may be used as its deviation value. Local interior points (Inliers) and local exterior points (Outliers) may then be determined from the deviation values: sample points whose deviation value is smaller than a preset deviation threshold are determined to be local interior points belonging to the candidate sphere model, and sample points whose deviation value is greater than or equal to the deviation threshold are determined to be local exterior points not belonging to the candidate sphere model. The specific value of the deviation threshold may be set according to the actual situation and is not specifically limited in this embodiment of the present application. Next, the number of local interior points is recorded and denoted Num_i, and the error rate of the candidate sphere model is calculated from it. The number of local interior points is negatively correlated with the error rate of the candidate sphere model: the more interior points, the smaller the error rate, and the fewer interior points, the larger the error rate. For example, the error rate may be taken as the ratio of the number of local exterior points to the total number of sample points in the point cloud set, i.e., one minus the inlier ratio.
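Continuing the sketch above, the inlier test and an error rate taken as the outlier ratio could look like this (the thresholds are placeholders):

```python
import numpy as np

def error_rate(points: np.ndarray, center: np.ndarray, radius: float,
               deviation_threshold: float) -> float:
    """Error rate of a candidate sphere over the whole point cloud set:
    the fraction of points whose distance to the sphere surface is at
    least the deviation threshold (i.e. the outlier ratio)."""
    deviations = np.abs(np.linalg.norm(points - center, axis=1) - radius)
    num_inliers = int((deviations < deviation_threshold).sum())
    return 1.0 - num_inliers / len(points)
```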
Step S1035: judging whether a loop termination condition is satisfied.
Steps S1032 to S1034 constitute one iteration. After each iteration it is necessary to judge whether a preset loop termination condition is satisfied; the loop termination condition may be that the error rate is smaller than a preset error threshold, or that the number of iterations exceeds a preset iteration threshold. The specific values of the error threshold and the iteration threshold may be set according to the actual situation and are not specifically limited in the embodiments of the present application.
If the loop termination condition is not met, the process returns to step S1032 and the subsequent steps, i.e., a point cloud subset is randomly selected again and sphere model fitting is performed; if the loop termination condition is met, step S1036 is executed.
Step S1036: determining the pose of the football according to the candidate sphere model with the minimum error rate.
After each iteration, the set of local interior points of that iteration is recorded, and the optimal model parameters, namely those with the minimum error rate and hence the maximum number of local interior points, are updated. The optimal model parameters obtained when the loop terminates are the estimate of the final sphere model, and the corresponding set of local interior points is the point cloud data of the football surface, as shown in FIG. 8. Finally, the pose information of the whole football can be recovered from the parameters of the sphere model, as shown in FIG. 9.
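Taken together, steps S1031 to S1036 amount to a RANSAC-style sphere estimation. A sketch reusing the fit_sphere and error_rate helpers above (the sample size and thresholds are placeholder values):

```python
import numpy as np

def ransac_sphere(cloud: np.ndarray, sample_size: int = 10,
                  error_threshold: float = 0.05,
                  deviation_threshold: float = 0.01,
                  max_iterations: int = 100):
    """Iteratively fit spheres to random subsets of the point cloud set
    and keep the candidate with the minimum error rate."""
    rng = np.random.default_rng()
    best = None  # (error_rate, center, radius)
    for _ in range(max_iterations):
        idx = rng.choice(len(cloud), size=sample_size, replace=False)
        center, radius = fit_sphere(cloud[idx])
        err = error_rate(cloud, center, radius, deviation_threshold)
        if best is None or err < best[0]:
            best = (err, center, radius)
        if err < error_threshold:   # loop termination condition
            break
    return best
```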
Optionally, in another specific implementation of the embodiment of the present application, instead of the method described in step S103, the football may be segmented and its pose determined by a color space thresholding method. Specifically, as shown in FIG. 10, the two-dimensional image in the football detection frame may be converted to grayscale and binarized to obtain a binarized image. In general there is a significant difference between the pixel gray values of the football and of backgrounds such as grass or the ground, so the gray value changes sharply at the boundary of the football, and the background and the football can be separated by setting a reasonable threshold. Once the binarized image is obtained, the football contour can be segmented from it by a maximum contour detection algorithm, and the football pose, including the centroid coordinates and the radius, determined from the contour.
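A compact OpenCV sketch of this pipeline (Otsu's method stands in for the "reasonable threshold", which is an assumption; the findContours signature shown is OpenCV 4):

```python
import cv2
import numpy as np

def segment_ball(roi_bgr: np.ndarray):
    """Binarize the detection-frame ROI, take the maximum contour and
    fit its minimum enclosing circle as (centroid, radius)."""
    gray = cv2.cvtColor(roi_bgr, cv2.COLOR_BGR2GRAY)
    _, binary = cv2.threshold(gray, 0, 255,
                              cv2.THRESH_BINARY + cv2.THRESH_OTSU)
    contours, _ = cv2.findContours(binary, cv2.RETR_EXTERNAL,
                                   cv2.CHAIN_APPROX_SIMPLE)
    if not contours:
        return None
    largest = max(contours, key=cv2.contourArea)  # maximum contour
    (cx, cy), radius = cv2.minEnclosingCircle(largest)
    return (cx, cy), radius
```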
A disadvantage of the above method is that if the texture of the football is very irregular and uneven, the contour detection algorithm will produce a great many contours; for this reason, the embodiment of the present application may also use a football segmentation method based on the HSV color space. FIG. 11 is a schematic diagram of football segmentation based on the HSV color space. HSV is closer to the human perception of color than RGB: it directly expresses the hue, saturation and lightness of a color, which makes color comparison convenient. In HSV, H denotes hue, S denotes saturation and V denotes value (lightness), and each color lies in a fixed range in HSV space. For example, green is taken as [60, 255, 255], and all greens lie between [45, 100, 50] and [75, 255, 255], i.e., between [60−15, 100, 50] and [60+15, 255, 255], where 15 is an approximation. Before the binarization processing is performed on the two-dimensional image in the football detection frame, the image can be transformed from the RGB color space to the HSV color space; after this transformation, the segmentation steps are the same as described above and are not repeated here.
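Continuing the sketch, the color-range test in OpenCV could look as follows; the bounds simply reuse the green range quoted in the text, which happens to match OpenCV's 8-bit convention of H in [0, 179], where green sits near 60:

```python
import cv2
import numpy as np

def hsv_mask(roi_bgr: np.ndarray) -> np.ndarray:
    """Transform the ROI from the RGB/BGR color space to HSV and keep
    pixels inside the example green range from the text."""
    hsv = cv2.cvtColor(roi_bgr, cv2.COLOR_BGR2HSV)
    lower = np.array([45, 100, 50])    # [60-15, 100, 50]
    upper = np.array([75, 255, 255])   # [60+15, 255, 255]
    return cv2.inRange(hsv, lower, upper)  # binary mask for thresholding
```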
The premise of segmenting the football by a threshold segmentation method is that the colors or gray values of the football and the background differ significantly; this condition cannot always be met in a real scene. For example, on a white tiled floor, with light reflecting off the ground, the white parts of the football merge with the floor, the detected contour is no longer the round region of the football, and the algorithm performs poorly. In such cases it is appropriate to segment the football using the sphere model fitting method based on three-dimensional point cloud data described in step S103.
Experimental data show that, after code acceleration, the football detection part of the MobileNet-SSD network takes 60 to 70 milliseconds and the sphere model fitting part takes 0.1 to 5 milliseconds, for a detection frame rate of 13 Hz. Combined with a detect-while-tracking scheme accelerated by a tracking algorithm, the detection frame rate can be increased to 25 Hz, meeting the requirement of real-time detection. In addition, the screening provided by sphere model fitting on the three-dimensional point cloud data eliminates false detections of non-spheres and of spheres whose diameter does not meet the requirement, greatly improving the precision, recall and accuracy of football detection in an actual three-dimensional scene.
In summary, in the embodiment of the application, a two-dimensional image and three-dimensional point cloud data of a target area are collected through a depth camera of a robot; football detection is performed in the two-dimensional image using a preset deep learning target detection model, which outputs a football detection frame and a confidence; and if the confidence is greater than a preset confidence threshold, the football pose is determined according to the three-dimensional point cloud data in the football detection frame. Based on deep learning target detection and three-dimensional point cloud processing, the embodiment fully combines the advantages of the two-dimensional image and the three-dimensional point cloud data, and achieves high accuracy even when a lightweight model is used.
It should be understood that, the sequence numbers of the steps in the foregoing embodiments do not imply an execution sequence, and the execution sequence of each process should be determined by its function and inherent logic, and should not constitute any limitation to the implementation process of the embodiments of the present application.
FIG. 12 is a structural diagram of an embodiment of a football detection device in an embodiment of the present application, corresponding to the football detection method described in the foregoing embodiments.
In this embodiment, a football detection device may include:
a data acquisition module 1201, configured to acquire a two-dimensional image and three-dimensional point cloud data of a target area through a depth camera of a robot;
a football detection module 1202, configured to perform football detection in the two-dimensional image by using a preset deep learning target detection model, and output a football detection frame and a confidence level;
a pose determining module 1203, configured to determine a football pose according to the three-dimensional point cloud data in the football detection frame if the confidence is greater than a preset confidence threshold.
Further, the pose determination module may include:
the point cloud set constructing unit is used for constructing three-dimensional point cloud data in the football detection frame into a point cloud set;
the point cloud subset selecting unit is used for selecting a point cloud subset from the point cloud set;
the sphere model fitting unit is used for performing sphere model fitting on the point cloud subset to obtain a candidate sphere model;
the error rate calculation unit is used for calculating the error rate of the candidate sphere model according to the point cloud set;
and the pose determining unit is used for determining the pose of the football according to the candidate sphere model with the minimum error rate if the error rate is smaller than the error threshold or the number of iterations exceeds the iteration threshold.
Further, the error rate calculation unit may include:
the deviation value calculation operator unit is used for respectively calculating the deviation value between each sample point in the point cloud set and the candidate sphere model;
the local interior point determining subunit is used for determining the sample points with the deviation values smaller than a preset deviation threshold value as local interior points belonging to the candidate sphere model;
and the error rate calculating subunit is used for calculating the error rate of the candidate sphere model according to the number of the local interior points.
Further, the football detection device may further include:
the data set construction module is used for constructing each football image acquired from a preset data source into an original data set;
the data set conversion module is used for cleaning and preprocessing the original data set and converting the original data set into a VOC data set;
and the model training module is used for training the deep learning target detection model by using the VOC data set to obtain a trained deep learning target detection model.
Optionally, the football detection device may further include:
the binarization processing module is used for carrying out binarization processing on the two-dimensional image in the football detection frame to obtain a binarization image;
the contour detection module is used for segmenting the football contour in the binarized image according to a maximum contour detection algorithm;
and the football pose determining module is used for determining the football pose according to the football outline.
Optionally, the football detection device may further include:
and the color space transformation module is used for carrying out color space transformation on the two-dimensional image in the football detection frame, and transforming the two-dimensional image into an HSV color space from an RGB color space.
It can be clearly understood by those skilled in the art that, for convenience and brevity of description, the specific working processes of the above-described apparatuses, modules and units may refer to the corresponding processes in the foregoing method embodiments, and are not described herein again.
In the above embodiments, the descriptions of the respective embodiments have respective emphasis, and reference may be made to the related descriptions of other embodiments for parts that are not described or illustrated in a certain embodiment.
Fig. 13 shows a schematic block diagram of a robot provided in an embodiment of the present application, and only a part related to the embodiment of the present application is shown for convenience of explanation.
As shown in fig. 13, the robot 13 of this embodiment includes: a processor 130, a memory 131 and a computer program 132 stored in the memory 131 and executable on the processor 130. The processor 130 implements the steps in the above-mentioned various embodiments of the football detection method, such as the steps S101 to S103 shown in fig. 1, when executing the computer program 132. Alternatively, the processor 130 implements the functions of the modules/units in the above device embodiments, for example, the functions of the modules 1201 to 1203 shown in fig. 12, when executing the computer program 132.
Illustratively, the computer program 132 may be partitioned into one or more modules/units that are stored in the memory 131 and executed by the processor 130 to accomplish the present application. The one or more modules/units may be a series of computer program instruction segments capable of performing specific functions, which are used to describe the execution of the computer program 132 in the robot 13.
Those skilled in the art will appreciate that fig. 13 is merely an example of a robot 13, and does not constitute a limitation of the robot 13, and may include more or fewer components than those shown, or some components in combination, or different components, for example, the robot 13 may also include input and output devices, network access devices, buses, etc.
The Processor 130 may be a Central Processing Unit (CPU), other general purpose Processor, a Digital Signal Processor (DSP), an Application Specific Integrated Circuit (ASIC), a Field Programmable Gate Array (FPGA) or other Programmable logic device, discrete Gate or transistor logic device, discrete hardware component, etc. A general purpose processor may be a microprocessor or the processor may be any conventional processor or the like.
The memory 131 may be an internal storage unit of the robot 13, such as a hard disk or a memory of the robot 13. The memory 131 may also be an external storage device of the robot 13, such as a plug-in hard disk, a Smart Media Card (SMC), a Secure Digital (SD) Card, a Flash memory Card (Flash Card), and the like, which are provided on the robot 13. Further, the memory 131 may also include both an internal storage unit and an external storage device of the robot 13. The memory 131 is used to store the computer program and other programs and data required by the robot 13. The memory 131 may also be used to temporarily store data that has been output or is to be output.
It will be apparent to those skilled in the art that, for convenience and brevity of description, only the above-mentioned division of the functional units and modules is illustrated, and in practical applications, the above-mentioned function distribution may be performed by different functional units and modules according to needs, that is, the internal structure of the apparatus is divided into different functional units or modules to perform all or part of the above-mentioned functions. Each functional unit and module in the embodiments may be integrated in one processing unit, or each unit may exist alone physically, or two or more units are integrated in one unit, and the integrated unit may be implemented in a form of hardware, or in a form of software functional unit. In addition, specific names of the functional units and modules are only for convenience of distinguishing from each other, and are not used for limiting the protection scope of the present application. The specific working processes of the units and modules in the system may refer to the corresponding processes in the foregoing method embodiments, and are not described herein again.
In the above embodiments, the descriptions of the respective embodiments have respective emphasis, and reference may be made to the related descriptions of other embodiments for parts that are not described or illustrated in a certain embodiment.
Those of ordinary skill in the art will appreciate that the various illustrative elements and algorithm steps described in connection with the embodiments disclosed herein may be implemented as electronic hardware or combinations of computer software and electronic hardware. Whether such functionality is implemented as hardware or software depends upon the particular application and design constraints imposed on the implementation. Skilled artisans may implement the described functionality in varying ways for each particular application, but such implementation decisions should not be interpreted as causing a departure from the scope of the present application.
In the embodiments provided in the present application, it should be understood that the disclosed apparatus/robot and method may be implemented in other ways. For example, the above-described embodiments of the apparatus/robot are merely illustrative, and for example, the division of the modules or units is only one logical division, and there may be other divisions when actually implemented, for example, a plurality of units or components may be combined or integrated into another system, or some features may be omitted, or not executed. In addition, the shown or discussed mutual coupling or direct coupling or communication connection may be an indirect coupling or communication connection through some interfaces, devices or units, and may be in an electrical, mechanical or other form.
The units described as separate parts may or may not be physically separate, and parts displayed as units may or may not be physical units, may be located in one place, or may be distributed on a plurality of network units. Some or all of the units can be selected according to actual needs to achieve the purpose of the solution of the embodiment.
In addition, functional units in the embodiments of the present application may be integrated into one processing unit, or each unit may exist alone physically, or two or more units are integrated into one unit. The integrated unit can be realized in a form of hardware, and can also be realized in a form of a software functional unit.
The integrated modules/units, if implemented in the form of software functional units and sold or used as separate products, may be stored in a computer readable storage medium. Based on such understanding, all or part of the flow in the method of the embodiments described above can be realized by a computer program, which can be stored in a computer-readable storage medium and can realize the steps of the embodiments of the methods described above when the computer program is executed by a processor. Wherein the computer program comprises computer program code, which may be in the form of source code, object code, an executable file or some intermediate form, etc. The computer-readable storage medium may include: any entity or device capable of carrying the computer program code, recording medium, usb disk, removable hard disk, magnetic disk, optical disk, computer Memory, Read-Only Memory (ROM), Random Access Memory (RAM), electrical carrier wave signals, telecommunications signals, software distribution medium, and the like. It should be noted that the computer readable storage medium may contain content that is subject to appropriate increase or decrease as required by legislation and patent practice in jurisdictions, for example, in some jurisdictions, computer readable storage media that does not include electrical carrier signals and telecommunications signals in accordance with legislation and patent practice.
The above-mentioned embodiments are only used for illustrating the technical solutions of the present application, and not for limiting the same; although the present application has been described in detail with reference to the foregoing embodiments, it should be understood by those of ordinary skill in the art that: the technical solutions described in the foregoing embodiments may still be modified, or some technical features may be equivalently replaced; such modifications and substitutions do not substantially depart from the spirit and scope of the embodiments of the present application and are intended to be included within the scope of the present application.

Claims (10)

1. A football detection method, comprising:
acquiring a two-dimensional image and three-dimensional point cloud data of a target area through a depth camera of a robot;
performing football detection in the two-dimensional image by using a preset deep learning target detection model, and outputting a football detection frame and confidence;
and if the confidence is greater than a preset confidence threshold, determining the football pose according to the three-dimensional point cloud data in the football detection frame.
2. The football detection method of claim 1, wherein said determining a football pose from three-dimensional point cloud data in the football detection frame comprises:
constructing three-dimensional point cloud data in the football detection frame into a point cloud set;
selecting a point cloud subset from the point cloud set;
performing sphere model fitting on the point cloud subset to obtain a candidate sphere model;
calculating an error rate of the candidate sphere model according to the point cloud set;
if the error rate is greater than or equal to a preset error threshold, returning to the step of selecting a point cloud subset from the point cloud set and the subsequent steps, until the error rate is smaller than the error threshold or the number of iterations exceeds a preset iteration threshold;
and if the error rate is smaller than the error threshold or the number of iterations exceeds the iteration threshold, determining the football pose according to the candidate sphere model with the minimum error rate.
3. The football detection method of claim 2, wherein the calculating an error rate of the candidate sphere model according to the point cloud set comprises:
respectively calculating deviation values between each sample point in the point cloud set and the candidate sphere model;
determining the sample points with deviation values smaller than a preset deviation threshold value as local interior points belonging to the candidate sphere model;
and calculating the error rate of the candidate sphere model according to the number of the local interior points.
4. The method of claim 1, further comprising, before performing football detection in the two-dimensional image using a preset deep learning target detection model:
constructing each football image acquired from a preset data source into an original data set;
cleaning and preprocessing the original data set, and converting the original data set into a VOC data set;
and training the deep learning target detection model by using the VOC data set to obtain the trained deep learning target detection model.
5. The football detection method according to claim 1, further comprising:
performing binarization processing on the two-dimensional image in the football detection frame to obtain a binarized image;
segmenting the football outline in the binary image according to a maximum outline detection algorithm;
and determining the football pose according to the football outline.
6. The football detection method according to claim 5, further comprising, before binarizing the two-dimensional image in the football detection frame:
and carrying out color space transformation on the two-dimensional image in the football detection frame, and transforming the two-dimensional image from the RGB color space to the HSV color space.
7. The football detection method of any one of claims 1 to 6, characterized in that the deep learning target detection model is a MobileNet-SSD model.
8. A football detection device, comprising:
the data acquisition module is used for acquiring a two-dimensional image and three-dimensional point cloud data of a target area through a depth camera of the robot;
the football detection module is used for carrying out football detection in the two-dimensional image by using a preset deep learning target detection model and outputting a football detection frame and confidence;
and the pose determining module is used for determining the pose of the football according to the three-dimensional point cloud data in the football detection frame if the confidence is greater than a preset confidence threshold.
9. A computer-readable storage medium, in which a computer program is stored which, when being executed by a processor, carries out the steps of the football detection method according to one of claims 1 to 7.
10. A robot comprising a memory, a processor and a computer program stored in the memory and executable on the processor, characterized in that the processor implements the steps of the method for detecting a soccer ball according to any one of claims 1 to 7 when executing the computer program.
CN202011030724.2A 2020-09-27 2020-09-27 Football detection method and device, computer readable storage medium and robot Pending CN112215861A (en)

Priority Applications (2)

Application Number Priority Date Filing Date Title
CN202011030724.2A CN112215861A (en) 2020-09-27 2020-09-27 Football detection method and device, computer readable storage medium and robot
PCT/CN2020/139859 WO2022062238A1 (en) 2020-09-27 2020-12-28 Football detection method and apparatus, and computer-readable storage medium and robot

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202011030724.2A CN112215861A (en) 2020-09-27 2020-09-27 Football detection method and device, computer readable storage medium and robot

Publications (1)

Publication Number Publication Date
CN112215861A true CN112215861A (en) 2021-01-12

Family

ID=74050774

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202011030724.2A Pending CN112215861A (en) 2020-09-27 2020-09-27 Football detection method and device, computer readable storage medium and robot

Country Status (2)

Country Link
CN (1) CN112215861A (en)
WO (1) WO2022062238A1 (en)

Citations (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN109102547A (en) * 2018-07-20 2018-12-28 上海节卡机器人科技有限公司 Robot based on object identification deep learning model grabs position and orientation estimation method
CN109541632A (en) * 2018-09-30 2019-03-29 天津大学 A kind of target detection missing inspection improved method based on four line laser radars auxiliary
CN110634161A (en) * 2019-08-30 2019-12-31 哈尔滨工业大学(深圳) Method and device for quickly and accurately estimating pose of workpiece based on point cloud data
CN110942449A (en) * 2019-10-30 2020-03-31 华南理工大学 Vehicle detection method based on laser and vision fusion
CN111178250A (en) * 2019-12-27 2020-05-19 深圳市越疆科技有限公司 Object identification positioning method and device and terminal equipment
CN111191582A (en) * 2019-12-27 2020-05-22 深圳市越疆科技有限公司 Three-dimensional target detection method, detection device, terminal device and computer-readable storage medium

Family Cites Families (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN108229548A (en) * 2017-12-27 2018-06-29 华为技术有限公司 A kind of object detecting method and device
US10977827B2 (en) * 2018-03-27 2021-04-13 J. William Mauchly Multiview estimation of 6D pose
CN110032962B (en) * 2019-04-03 2022-07-08 腾讯科技(深圳)有限公司 Object detection method, device, network equipment and storage medium
CN110544279B (en) * 2019-08-26 2023-06-23 华南理工大学 Pose estimation method combining image recognition and genetic algorithm fine registration

Cited By (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN113160324A (en) * 2021-03-31 2021-07-23 北京京东乾石科技有限公司 Bounding box generation method and device, electronic equipment and computer readable medium
CN113052835A (en) * 2021-04-20 2021-06-29 江苏迅捷装具科技有限公司 Medicine box detection method and detection system based on three-dimensional point cloud and image data fusion
CN113052835B (en) * 2021-04-20 2024-02-27 江苏迅捷装具科技有限公司 Medicine box detection method and system based on three-dimensional point cloud and image data fusion
CN113591901A (en) * 2021-06-10 2021-11-02 中国航天时代电子有限公司 Target detection method based on anchor frame

Also Published As

Publication number Publication date
WO2022062238A1 (en) 2022-03-31

Similar Documents

Publication Publication Date Title
CN109584248B (en) Infrared target instance segmentation method based on feature fusion and dense connection network
CN106803267B (en) Kinect-based indoor scene three-dimensional reconstruction method
US9275277B2 (en) Using a combination of 2D and 3D image data to determine hand features information
EP3545497B1 (en) System for acquiring a 3d digital representation of a physical object
CN112215861A (en) Football detection method and device, computer readable storage medium and robot
Khan et al. Localization of radiance transformation for image dehazing in wavelet domain
CN111931764B (en) Target detection method, target detection frame and related equipment
Geng et al. Using deep learning in infrared images to enable human gesture recognition for autonomous vehicles
CN110751620B (en) Method for estimating volume and weight, electronic device, and computer-readable storage medium
CN107341837B (en) Grid-vector data conversion and continuous scale expression method based on image pyramid
CN108274476B (en) Method for grabbing ball by humanoid robot
CN115330940B (en) Three-dimensional reconstruction method, device, equipment and medium
CN115861409B (en) Soybean leaf area measuring and calculating method, system, computer equipment and storage medium
CN113052923B (en) Tone mapping method, tone mapping apparatus, electronic device, and storage medium
CN114782645A (en) Virtual digital person making method, related equipment and readable storage medium
CN109658523A (en) The method for realizing each function operation instruction of vehicle using the application of AR augmented reality
CN116883303A (en) Infrared and visible light image fusion method based on characteristic difference compensation and fusion
Maltezos et al. Improving the visualisation of 3D textured models via shadow detection and removal
CN116051808A (en) YOLOv 5-based lightweight part identification and positioning method
CN115861532A (en) Vegetation ground object model reconstruction method and system based on deep learning
CN115131384A (en) Bionic robot 3D printing method, device and medium based on edge preservation
CN114581389A (en) Point cloud quality analysis method based on three-dimensional edge similarity characteristics
CN113963178A (en) Method, device, equipment and medium for detecting infrared dim and small target under ground-air background
CN112949641A (en) Image segmentation method, electronic device and computer-readable storage medium
CN113011506A (en) Texture image classification method based on depth re-fractal spectrum network

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination