CN116758461A - Volleyball match intelligent penalty method, equipment and storage medium based on neural network - Google Patents

Volleyball match intelligent penalty method, equipment and storage medium based on neural network

Info

Publication number
CN116758461A
CN116758461A
Authority
CN
China
Prior art keywords
target
neural network
sphere
detection
intelligent
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Granted
Application number
CN202311002609.8A
Other languages
Chinese (zh)
Other versions
CN116758461B (en)
Inventor
常沛炜
吴星辰
李友高
许朝智
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Ji Hua Laboratory
Original Assignee
Ji Hua Laboratory
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Ji Hua Laboratory filed Critical Ji Hua Laboratory
Priority to CN202311002609.8A priority Critical patent/CN116758461B/en
Publication of CN116758461A publication Critical patent/CN116758461A/en
Application granted granted Critical
Publication of CN116758461B publication Critical patent/CN116758461B/en
Active legal-status Critical Current
Anticipated expiration legal-status Critical

Links

Classifications

    • G06V20/40: Scenes; scene-specific elements in video content
    • G06N3/0464: Convolutional networks [CNN, ConvNet]
    • G06N3/096: Transfer learning
    • G06T7/13: Edge detection
    • G06V10/25: Determination of region of interest [ROI] or a volume of interest [VOI]
    • G06V10/761: Proximity, similarity or dissimilarity measures in feature spaces
    • G06V10/776: Validation; performance evaluation
    • G06V10/82: Image or video recognition or understanding using neural networks
    • G06V2201/07: Target detection
    • Y02T10/40: Engine management systems

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • General Physics & Mathematics (AREA)
  • Evolutionary Computation (AREA)
  • Software Systems (AREA)
  • Multimedia (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Health & Medical Sciences (AREA)
  • Artificial Intelligence (AREA)
  • Computing Systems (AREA)
  • General Health & Medical Sciences (AREA)
  • Databases & Information Systems (AREA)
  • Medical Informatics (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Biomedical Technology (AREA)
  • Biophysics (AREA)
  • Computational Linguistics (AREA)
  • Data Mining & Analysis (AREA)
  • Molecular Biology (AREA)
  • General Engineering & Computer Science (AREA)
  • Mathematical Physics (AREA)
  • Image Analysis (AREA)

Abstract

The invention discloses a volleyball match intelligent penalty method, equipment and a storage medium based on a neural network. The method comprises: acquiring a sampling image from a real-time video stream and extracting a target detection area image; calculating non-zero gradients of the target detection area image based on the Hough gradient method, forming a circular data set and calculating a target sphere prediction result; normalizing the detection data set to obtain the coordinates of the target sphere; and, based on a multi-layer convolutional neural network structure, extracting features of the target detection area image to generate a target human body feature map, and calculating and outputting a human-object interaction detection sequence by combining the coordinates of the target sphere. Because the sampling image is obtained from the real-time video stream and the result is obtained by the target detection algorithm, detection of smaller sphere targets has better generalization capability; the self-adaptive sampling method improves the real-time performance of detection; and the multi-layer convolutional neural network structure improves the accuracy of the target human body feature map and the target sphere prediction result.

Description

Volleyball match intelligent penalty method, equipment and storage medium based on neural network
Technical Field
The invention relates to the technical field of artificial intelligence algorithms, in particular to a volleyball match intelligent penalty method, equipment and a storage medium based on a neural network.
Background
In recent years, the introduction of intelligent auxiliary penalty systems to help judges make accurate decisions has become more and more important in major sporting events. In the field of football, the video assistant referee has been widely applied to events at all levels, and the Qatar World Cup held in 2022 incorporated AI into this technology; specifically, semi-automatic adjudication was realized for offside calls and the like, reliably ensuring the fairness of the event. It can be anticipated that intelligent referees will be applied more widely to sporting events in the future.
Initial studies have explored the current situation and future application scenarios of intelligent penalty in fields such as basketball and football, and have preliminarily proved its feasibility. However, artificial intelligence penalty is still in a development stage: obvious shortcomings such as delay and low sampling accuracy remain in the real-time performance of penalty decisions, and the technical problems and barriers in the application of intelligent referees still need continued exploration.
It can be seen that there is a need for improvement in the art.
Disclosure of Invention
In view of the shortcomings of the prior art, the invention aims to provide a neural network-based intelligent penalty method, equipment and storage medium for volleyball match, which are used for solving the problem of low real-time performance and accuracy of intelligent auxiliary penalty in the prior art.
In order to achieve the above purpose, the invention adopts the following technical scheme: an intelligent penalty method for volleyball match based on a neural network comprises the following steps: S1, acquiring a sampling image from a real-time video stream, performing edge detection on the sampling image, and extracting a target detection area image; S2, calculating non-zero gradients of the target detection area image based on the Hough gradient method, and acquiring the centers and radii of all circles in the target detection area to form a circular data set; S3, inputting the circular data set into the ROI (region of interest) of a target detection algorithm, and calculating a target sphere prediction result; S4, generating a detection data set according to the target sphere prediction result, and normalizing the detection data set to obtain the coordinates of the target sphere; S5, based on a multi-layer convolutional neural network structure, extracting features of the target detection area image to generate a target human body feature map; and S6, calculating and outputting a human-object interaction detection sequence according to the target human body feature map and the coordinates of the target sphere.
In an embodiment of the present invention, in step S1, a Canny edge detection operator is used to perform edge detection on the sampled image.
In an embodiment of the present invention, in step S3, the previous frame of the target detection area image is obtained from the real-time video stream as an auxiliary image, and the auxiliary image is combined with the target sphere prediction result to assist target detection.
In an embodiment of the present invention, in step S3, a sphere feature map is generated according to the target detection algorithm, and the similarity of the target sphere feature map and the confidence of the target sphere are calculated according to the sphere feature map. The loss function is as follows:

$$L_{CIoU} = 1 - IoU + \frac{\rho^2}{d^2} + \alpha v, \qquad v = \frac{4}{\pi^2}\left(\arctan\frac{w_{gt}}{h_{gt}} - \arctan\frac{w}{h}\right)^2$$

wherein A represents the area of each region, w is the width of the region, h is the height of the region, gt is the center point of the ground-truth region, ρ represents the distance between the center points of the two regions, d represents the diagonal length of the minimum closure region containing both the target detection region and the real region, α represents the weight factor of the CIoU position loss function, and v measures the similarity of the aspect ratios of the two regions.
In one embodiment of the present invention, the confidence loss function of the target sphere is:

$$L_{conf}(x, c) = -\sum_{i \in Pos} x_{ij}^{p} \log \hat{c}_i^{p} - \sum_{i \in Neg} \log \hat{c}_i^{0}$$

wherein x represents the category corresponding to the target position, and c represents the confidence prediction value of a corresponding category; i denotes the i-th prior prediction box in the sample, and j denotes the j-th ground-truth region.
In an embodiment of the present invention, the method further includes step S51, in which a convolution layer performs dimension reduction processing on the target human body feature map; and step S52, in which a Transformer encoder flattens the feature map obtained in step S51 to form a sequence of length Hf × Wf.
In one embodiment of the present invention, in step S52, a position encoding is input at each attention layer of the Transformer, expressed as follows:

$$PE_{(pos,\,2i)} = \sin\!\left(\frac{pos}{10000^{2i/d_f}}\right), \qquad PE_{(pos,\,2i+1)} = \cos\!\left(\frac{pos}{10000^{2i/d_f}}\right)$$

wherein PE represents the position encoding, pos is the input position, 2i denotes the even coordinates, 2i+1 denotes the odd coordinates, and d_f is the feature dimension.
In an embodiment of the present invention, in step S6, the KM (Kuhn-Munkres) algorithm is adopted to perform bipartite matching, specifically expressed as follows:

$$\hat{\epsilon} = \underset{\epsilon \in \Theta_N}{\arg\min}\; L(O, T)$$

where Θ_N represents the solution space of the entire bipartite matching problem and L represents the matching loss of the bipartite matching problem.
In order to achieve the above purpose, the present invention also adopts the following technical scheme: an intelligent penalty device for volleyball matches based on a neural network, comprising a memory and at least one processor, wherein the memory stores computer readable instructions; the at least one processor invokes the computer readable instructions in the memory to perform the steps of the neural network based volleyball match intelligent penalty method described above.
In order to achieve the above purpose, the present invention also adopts the following technical scheme: a computer readable storage medium having stored thereon computer readable instructions which when executed by a processor implement the steps of a neural network based intelligent penalty method for volleyball events as described above.
As described above, the intelligent penalty method, equipment and storage medium for volleyball match based on the neural network have the following beneficial effects: the sampling image is obtained from the real-time video stream and the target detection area image is determined, from which the target sphere prediction result is obtained by the target detection algorithm; detection of smaller sphere targets thus has better generalization capability across scales; the self-adaptive sampling method improves the real-time performance of detection; and the multi-layer convolutional neural network structure improves the accuracy of the target human body feature map and the target sphere prediction result, so that real-time performance is markedly improved and the requirements of ball-sport events can be fully met.
Drawings
In order to more clearly illustrate the embodiments of the invention or the technical solutions in the prior art, the drawings that are required in the embodiments or the description of the prior art will be briefly described, it being obvious that the drawings in the following description are only some embodiments of the invention, and that other drawings may be obtained according to these drawings without inventive effort for a person skilled in the art.
FIG. 1 is a flow chart of a method for intelligent penalty for volleyball match based on neural network;
FIG. 2 is a flowchart of another method of intelligent penalty method for volleyball match based on neural network;
FIG. 3 is a schematic diagram of a network structure of a target detection neural network of the intelligent penalty method for volleyball matches based on the neural network;
FIG. 4 is a schematic diagram of an image processing process of the intelligent penalty method for volleyball match based on the neural network;
fig. 4 (a) is an original input image;
fig. 4 (b) is a preprocessed image;
fig. 4 (c) shows a real-time detection result image;
FIG. 5 is a schematic diagram of a network structure of the intelligent penalty method for volleyball events based on the neural network;
FIG. 6 is another network schematic diagram of the neural network-based intelligent penalty method for volleyball events;
fig. 7 is an image output schematic diagram of human-object interaction detection of the intelligent penalty method for volleyball match based on the neural network.
Table 1: comparison of the target detection algorithm with existing methods on the volleyball data set;
Table 2: comparison of the human-object interaction detection algorithm with existing methods in average precision.
Meaning of special terms and their resolution:
AI (Artificial Intelligence): artificial intelligence.
FCMNet (Full Communication Memory Net): a neural network structure optimizing the exchange of channel information.
PDNet (Polysemy Deciphering Network): a neural network structure.
PPDM (Parallel Point Detection and Matching): parallel processing of the point detection and point matching problems in human-object interaction detection.
DETR (Detection Transformer): handling the target detection problem with a Transformer.
SSD (Single Shot MultiBox Detector): an improved target detection network structure completing target classification and localization in one pass.
IoU (Intersection over Union): a measure of the degree of overlap between two rectangles.
HOI (Human-Object Interaction): the interaction problem between a person and an object, generally described as a five-tuple: human detection box, human confidence, object detection box, object confidence, interaction confidence.
CNN (Convolutional Neural Networks): a deep feedforward neural network including convolution computations.
mAP (mean Average Precision): a common metric for target detection tasks.
Detailed Description
The invention provides a volleyball match intelligent penalty method, equipment and a storage medium based on a neural network, and in order to make the purposes, the technical scheme and the effects of the invention clearer and more definite, the invention is further described in detail below by referring to the accompanying drawings and the embodiments.
In the description of the present invention, it should be understood that terms indicating orientation or positional relationship such as "up, down, left, right" are based on the orientation or positional relationship shown in the drawings, are merely for convenience of description and simplification, and are not to be construed as limiting the present invention; furthermore, terms such as "mounted" and "connected" are to be construed broadly, and the specific meaning of these terms in the present invention will be understood by those of ordinary skill in the art as appropriate.
Referring to fig. 1 to 7, the invention provides an intelligent penalty method for volleyball match based on a neural network, which comprises the following steps: s1, acquiring a sampling image from a real-time video stream, performing edge detection on the sampling image, and extracting a target detection area image; s2, calculating a non-zero gradient of the target detection area image based on a Hough gradient method, and acquiring circle centers and radiuses of all circles in the target detection area to form a circular data set; s3, inputting the circular data set into an ROI (region of interest) of a target detection algorithm, and calculating to obtain a target sphere prediction result; and S4, generating a detection data set according to the target sphere prediction result, and normalizing the detection data set to obtain the coordinates of the target sphere.
In this embodiment, a neural network with a multi-scale architecture is adopted, and the resolution of the picture is raised through deconvolution operations, which improves the network's performance on small-target detection. Compared with the CNN practice of outputting through fully connected layers after all convolution layers, this embodiment performs detection directly on the feature maps generated by each convolution layer; a convolution can be regarded as cropping after up-sampling the picture, and each such operation is a scale transformation, so detection of the original target on multiple scales can be guaranteed. Unlike SSD, a residual module is implemented as convolution layers at the bottom of the network; the residual module ensures that the model has better generalization capability across scales for detecting small targets.
Optionally, in step S1, a Canny edge detection operator is used to perform edge detection on the sampled image.
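The following is a minimal sketch of steps S1-S2 using OpenCV; the file name, Canny thresholds and Hough parameters are illustrative assumptions rather than values specified by the invention:

```python
import cv2
import numpy as np

# S1: sample a frame and run Canny edge detection (thresholds are assumed)
frame = cv2.imread("sample_frame.jpg")
gray = cv2.cvtColor(frame, cv2.COLOR_BGR2GRAY)
edges = cv2.Canny(gray, 100, 200)

# S2: Hough gradient method - collect centers and radii of candidate circles
circles = cv2.HoughCircles(
    gray, cv2.HOUGH_GRADIENT, dp=1.2, minDist=40,
    param1=200,      # upper Canny threshold used internally by the detector
    param2=30,       # accumulator threshold; smaller values find more circles
    minRadius=5, maxRadius=60)
# each row of circle_dataset is (center_x, center_y, radius)
circle_dataset = [] if circles is None else np.round(circles[0]).astype(int)
```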
In this embodiment, in step S3, a previous frame image of the target detection area image is obtained from the real-time video stream as an auxiliary image, and the auxiliary image is combined with the target sphere prediction result to assist in target detection.
Further, in step S3, a sphere feature map is generated according to the target detection algorithm, and the similarity of the target sphere feature map and the confidence of the target sphere are calculated according to the sphere feature map. The loss function is as follows:

$$L_{CIoU} = 1 - IoU + \frac{\rho^2}{d^2} + \alpha v, \qquad v = \frac{4}{\pi^2}\left(\arctan\frac{w_{gt}}{h_{gt}} - \arctan\frac{w}{h}\right)^2$$

wherein A represents the area of each region, whose number of pixel points can be confirmed from the mask of the region; w is the width of the region, h is the height of the region, gt is the center point of the ground-truth region, ρ represents the distance between the center points of the two regions, d represents the diagonal length of the minimum closure region containing both the target detection region and the real region, α represents the weight factor of the CIoU position loss function, and v measures the similarity of the aspect ratios of the two regions.
In detail, the confidence loss function of the target sphere is:

$$L_{conf}(x, c) = -\sum_{i \in Pos} x_{ij}^{p} \log \hat{c}_i^{p} - \sum_{i \in Neg} \log \hat{c}_i^{0}$$

wherein x represents the category corresponding to the target position, and c represents the confidence prediction value of a corresponding category; i denotes the i-th prior prediction box in the sample, and j denotes the j-th ground-truth region.
Preferably, for the normalization processing of the confidence of the target sphere, this embodiment uses softmax as the classifier error, with the following expression:

$$\hat{c}_i^{p} = \frac{\exp(c_i^{p})}{\sum_{p} \exp(c_i^{p})}$$

where p represents the corresponding category.
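As a small illustration of this normalization (a sketch, not the patent's own code):

```python
import numpy as np

def softmax_confidence(c):
    """Normalize raw per-category confidence scores c into probabilities."""
    e = np.exp(c - np.max(c))   # subtract the max for numerical stability
    return e / e.sum()
```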
The final loss function is obtained as a weighted average of the two:

$$L = \frac{1}{N}\left(L_{conf} + \beta L_{CIoU}\right)$$

wherein N is the number of target detection frames, and β is a weight factor between the position error and the confidence error.
The CIoU position loss measures the similarity (including center-point distance, overlapping area, etc.) between the target detection region and the ground-truth region, while the confidence loss indicates, to some extent, the confidence that the framed range contains the target object.
Referring to fig. 3, for detecting small objects such as volleyballs, objects with larger errors are amplified to a certain extent; this choice of loss function allows the aspect-ratio similarity and the distance of the two regions to be taken as measures of their consistency, and quantifies whether the region includes the target object.
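As an illustration of the CIoU position loss given above, a minimal Python sketch follows; boxes are assumed to be axis-aligned (x1, y1, x2, y2) rectangles, and α follows the common CIoU definition α = v / (1 − IoU + v), which the patent does not spell out:

```python
import math

def ciou_loss(box_pred, box_gt):
    """CIoU position loss between a predicted box and a ground-truth box."""
    # intersection and union areas
    ix1, iy1 = max(box_pred[0], box_gt[0]), max(box_pred[1], box_gt[1])
    ix2, iy2 = min(box_pred[2], box_gt[2]), min(box_pred[3], box_gt[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    area_p = (box_pred[2] - box_pred[0]) * (box_pred[3] - box_pred[1])
    area_g = (box_gt[2] - box_gt[0]) * (box_gt[3] - box_gt[1])
    iou = inter / (area_p + area_g - inter)

    # rho^2: squared distance between the two box center points
    cpx, cpy = (box_pred[0] + box_pred[2]) / 2, (box_pred[1] + box_pred[3]) / 2
    cgx, cgy = (box_gt[0] + box_gt[2]) / 2, (box_gt[1] + box_gt[3]) / 2
    rho2 = (cpx - cgx) ** 2 + (cpy - cgy) ** 2

    # d^2: squared diagonal of the minimum closure region of both boxes
    ex1, ey1 = min(box_pred[0], box_gt[0]), min(box_pred[1], box_gt[1])
    ex2, ey2 = max(box_pred[2], box_gt[2]), max(box_pred[3], box_gt[3])
    d2 = (ex2 - ex1) ** 2 + (ey2 - ey1) ** 2

    # v: aspect-ratio similarity term
    wp, hp = box_pred[2] - box_pred[0], box_pred[3] - box_pred[1]
    wg, hg = box_gt[2] - box_gt[0], box_gt[3] - box_gt[1]
    v = (4 / math.pi ** 2) * (math.atan(wg / hg) - math.atan(wp / hp)) ** 2

    alpha = v / (1 - iou + v + 1e-9)   # assumed weighting, per the CIoU paper
    return 1 - iou + rho2 / d2 + alpha * v
```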
For a continuous video-frame input F = {f_1, f_2, …, f_n}, if the predicted position and estimated speed of the previous frame are known, they can be used as a prediction of the target sphere's position in the next frame, weighted as a confidence term; if only the predicted position is known, the object's speed can be estimated from the position of the target sphere between two frames; if both are unknown, the estimated speed is determined by combining the predicted position of the current frame with the position of the next frame.
Considering the real-time performance of the algorithm, the embodiment introduces a dynamic sampling method to balance the reduction of the computational complexity and the improvement of the algorithm accuracy as much as possible.
In the start-up phase, a full-scale algorithm is run to obtain the positions of the first few frames while predicting the velocity of the target sphere. The sampling interval period γ is initially set to 1 second; on non-key frames of the video stream only real-time circle detection is performed, and objects meeting the condition are judged according to the position and speed of the previous frame and updated accordingly.
This process requires determining whether the speed has changed dramatically, with the following expression:

$$\theta = \left| v_n - v_{n-1} \right|$$

where v is the speed of the target sphere and n denotes the current frame, i.e. v_n represents the speed of the target sphere in the current frame, and v_{n-1} the speed of the target sphere in the previous frame; when θ exceeds a set threshold, one pass of target detection is forcibly triggered, and on this basis the movement track of the ball can be outlined.
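A sketch of this dynamic sampling loop is shown below; the detector callables, the threshold value and the frame-rate handling are assumptions for illustration only:

```python
def track_stream(frames, fps, full_detection, circle_detection, estimate_speed,
                 gamma_seconds=1.0, theta_threshold=50.0):
    """Dynamic sampling: full detection on key frames, circle detection otherwise."""
    interval = max(1, int(gamma_seconds * fps))   # sampling interval gamma
    position, velocity = None, None
    for n, frame in enumerate(frames):
        if n % interval == 0 or position is None:
            position = full_detection(frame)              # full-scale detector pass
        else:
            position = circle_detection(frame, position)  # lightweight Hough update
        new_velocity = estimate_speed(position, velocity)
        if velocity is not None:
            theta = abs(new_velocity - velocity)          # speed-change measure
            if theta > theta_threshold:                   # abrupt change: force detection
                position = full_detection(frame)
        velocity = new_velocity
        yield position, velocity
```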
Referring to fig. 4, in the image processing procedure, the data enhancement methods of rotation, width translation, height translation, shear transformation, scaling and horizontal flipping are performed in sequence. The input image is preprocessed, scaled to 512 × 512, color-normalized according to the average RGB value of the data set, and then input into the network; the output of ball target detection is the four coordinates of the ball detection frame and the corresponding confidence.
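A sketch of this preprocessing; MEAN_RGB is a placeholder for the data set's actual per-channel mean, which the patent derives from its training data:

```python
import cv2
import numpy as np

MEAN_RGB = np.array([123.7, 116.3, 103.5], dtype=np.float32)  # assumed values

def preprocess(frame_bgr):
    """Scale to 512 x 512 and color-normalize by the data set's average RGB."""
    img = cv2.resize(frame_bgr, (512, 512))
    rgb = cv2.cvtColor(img, cv2.COLOR_BGR2RGB).astype(np.float32)
    return rgb - MEAN_RGB
```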
Referring to table 1, comparing the improved multi-scale architecture neural network with the R-CNN and SSD networks respectively, the overall mean average precision (mAP) index is improved, and data enhancement also brings a significant improvement in network performance; considering the high-frequency motion, deformation, etc. that a volleyball may undergo in a real-time competition environment, data enhancement is necessary for model training. The target detection task only needs to detect the volleyball in a limited scene space, so its detection precision is higher than that of full-class detection. For real-time video-stream optimization, the sampling interval in this embodiment is set to 1 s as a trade-off between the running performance of practical applications and the accuracy of the algorithm.
Table 1 target detection algorithm and results of the prior art method on volleyball datasets
In detail, in step S5, based on a multi-layer convolutional neural network structure, features of the target detection area image are extracted to generate a target human body feature map; and in step S6, a human-object interaction detection sequence is calculated and output according to the target human body feature map and the coordinates of the target sphere.
When the target human body and the target sphere are predicted, continuous video frames F = {f_1, f_2, …, f_t} are input, the mask layer of the target human body in each frame is H = {h_1, h_2, …, h_t}, and the mask layer of the key target is M = {m_1, m_2, …, m_t}. The task is to predict the contact state between the target human body and the target sphere, so as to judge, over the continuous video frames, contact states of the human body such as stepping on the line, touching the net, touching the ball, or no contact.
Detecting net touches, ball touches and line treads is completed as a human-object interaction (HOI) task; this embodiment acquires the context information of the target human body and the target sphere from the video to assist learning, and defines a target human-object interaction as (C_human, C_interaction, C_object, P_human, P_object), wherein P represents the position corresponding to each category and C represents the corresponding confidence.
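For illustration, such a five-tuple can be held in a small Python structure; the field names are assumptions, not identifiers from the invention:

```python
from dataclasses import dataclass
from typing import Tuple

Box = Tuple[float, float, float, float]  # (x1, y1, x2, y2)

@dataclass
class HOITuple:
    c_human: float        # person confidence
    c_interaction: float  # interaction confidence
    c_object: float       # object confidence
    p_human: Box          # person detection box
    p_object: Box         # object detection box
```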
More specifically, the method further comprises step S51 after step S5, in which a convolution layer performs dimension reduction on the target human body feature map. In detail, after the input image passes through the CNN, a feature map with high-dimensional latent semantic information is generated, of size (Hf, Wf, Cf). The channel number of the input feature map is then reduced to d_f through a 1 × 1 convolution layer; this dimension reduction reduces the parameter count and improves the information interaction between image channels and the nonlinear fitting capability of the network.
In step S52, a Transformer encoder flattens the feature map (Hf, Wf, d_f) obtained in step S51 to form a sequence of length Hf × Wf. Optionally, in step S52, a position encoding is input at each attention layer of the Transformer, expressed as follows:

$$PE_{(pos,\,2i)} = \sin\!\left(\frac{pos}{10000^{2i/d_f}}\right), \qquad PE_{(pos,\,2i+1)} = \cos\!\left(\frac{pos}{10000^{2i/d_f}}\right)$$

wherein PE represents the position encoding, pos is the input position, 2i denotes the even coordinates, 2i+1 denotes the odd coordinates, and d_f is the feature dimension after dimension reduction.
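A NumPy sketch of this sinusoidal position encoding, assuming the standard Transformer formulation and an even feature dimension d_f:

```python
import numpy as np

def position_encoding(seq_len, d_f):
    """Sinusoidal position encoding for a sequence of length seq_len."""
    pe = np.zeros((seq_len, d_f), dtype=np.float32)
    pos = np.arange(seq_len)[:, None]          # input positions
    i = np.arange(0, d_f, 2)[None, :]          # even coordinate indices 2i
    angle = pos / np.power(10000.0, i / d_f)
    pe[:, 0::2] = np.sin(angle)                # even coordinates
    pe[:, 1::2] = np.cos(angle)                # odd coordinates
    return pe

# e.g. a flattened Hf x Wf = 16 x 16 feature map with d_f = 256:
pe = position_encoding(16 * 16, 256)
```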
In this embodiment, the decoder side of the Transformer includes a multi-head cross-attention layer. At the decoder side, the key (Key), value (Value) and query (Query) vectors of the Transformer are, respectively, the feature serialization vectors combined with the position encoding, the serialization vectors themselves, and the combination of the position encoding with the human-object interaction query vectors.
Referring to fig. 5 and 6, the output of the Transformer decoder is used as the input of the embedding layer of the multi-layer perceptron, which is responsible for decoding the query vectors into human-object interaction five-tuples. Three-layer perceptrons output, respectively, the confidences of the person, the interaction category and the object category, and further three-layer perceptrons identify the human target detection box and the object detection box. The output layers of the confidence perceptrons adopt softmax classification: the output vectors for the person confidence and the interaction confidence have length 2, representing person foreground/background and interaction occurring/not occurring respectively; the output vector for the object-category confidence has length 4, representing the confidences of the ball, the net, the line and the background; and the output vectors of the perceptrons for the person and object detection boxes have length 4, representing the four coordinates of the detection box.
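Sketched in PyTorch, such perceptron heads might look as follows; the embedding width d_f, the hidden size and the exact layer layout are assumptions for illustration, not the invention's specified architecture:

```python
import torch.nn as nn

class HOIHeads(nn.Module):
    """Multi-layer perceptron heads decoding decoder embeddings into HOI five-tuples."""
    def __init__(self, d_f=256, hidden=256):
        super().__init__()
        def mlp(out_dim):  # three-layer perceptron head
            return nn.Sequential(
                nn.Linear(d_f, hidden), nn.ReLU(),
                nn.Linear(hidden, hidden), nn.ReLU(),
                nn.Linear(hidden, out_dim))
        self.human_conf = mlp(2)        # person foreground / background
        self.interaction_conf = mlp(2)  # interaction occurs / does not occur
        self.object_conf = mlp(4)       # ball, net, line, background
        self.human_box = mlp(4)         # four coordinates of the human box
        self.object_box = mlp(4)        # four coordinates of the object box

    def forward(self, decoder_out):     # decoder_out: (num_queries, d_f)
        return {
            "human_conf": self.human_conf(decoder_out).softmax(-1),
            "interaction_conf": self.interaction_conf(decoder_out).softmax(-1),
            "object_conf": self.object_conf(decoder_out).softmax(-1),
            "human_box": self.human_box(decoder_out),
            "object_box": self.object_box(decoder_out),
        }
```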
In the decoder model training phase, the predicted human-object interaction (HOI) tuples output by the decoder need to be matched with the true values.
The invention defines the human-object interaction tuples output by the decoder model as O = {o_i}, i = 1, 2, …, N, and the real human-object interaction tuples as T = {t_i}, i = 1, 2, …, M. Since the decoder model may wrongly judge that an interaction occurs, the lengths of the two sets may be unequal, so the set of real tuples is padded from M to N elements to make the two lengths equal.
On this basis, a mapping function ϵ(O, T) from the sequence indices of the decoder model output to the sequence indices of the real values can be defined, where o_{ϵ(i)} is the decoder output corresponding to the i-th real value.
The loss function of decoder model matching is as follows:

$$L(O, T) = \sum_{i=1}^{N} l\left(t_i, o_{\epsilon(i)}\right)$$

wherein l(t_i, o_{ϵ(i)}) represents the matching loss of a specific matched pair, with

$$l\left(t_i, o_{\epsilon(i)}\right) = a \sum_{j} l^{j}_{class} + b \sum_{k} l^{k}_{box}$$

wherein l^j_class represents the classification losses of the person, the target and the interaction, for which the general softmax classification loss function is used; l^k_box represents the distance measure between the detection boxes, for which the CIoU position loss function is used; and a and b are weight hyperparameters controlling the ratio between the classification loss and the detection loss.
In detail, the method further comprises step S7 after step S6, in which the KM (Kuhn-Munkres) algorithm is adopted for bipartite matching, specifically expressed as follows:

$$\hat{\epsilon} = \underset{\epsilon \in \Theta_N}{\arg\min}\; L(O, T)$$

wherein ϵ̂ represents the matching solved by the KM algorithm, Θ_N is the solution space of the entire bipartite matching problem, and L is the matching loss of the bipartite matching problem.
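For illustration, SciPy's linear_sum_assignment (a Hungarian-family solver of the same assignment problem the KM algorithm addresses) can perform this bipartite matching; pair_loss below is an assumed callable computing l(t_i, o_ϵ(i)) for one pair:

```python
import numpy as np
from scipy.optimize import linear_sum_assignment

def match_tuples(outputs, truths, pair_loss):
    """Match predicted HOI tuples to real tuples by minimizing total loss."""
    cost = np.array([[pair_loss(t, o) for o in outputs] for t in truths])
    truth_idx, out_idx = linear_sum_assignment(cost)   # optimal assignment
    return list(zip(truth_idx, out_idx)), cost[truth_idx, out_idx].sum()
```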
Unlike conventional one-stage/two-stage human-object interaction detection methods, this embodiment directly completes the process from inputting an image to outputting a human-object interaction detection sequence, with the confidence output and the detection-box output performed simultaneously.
Experiments on the human-object interaction detection problem were performed with the above method on the HICO-DET dataset, the V-COCO dataset and a self-built dataset. Mean average precision (mAP) is taken as the evaluation index of the model, which is well suited to the small number of interaction categories occurring in actual volleyball events. Only if the IoU between the predicted HOI human and object detection boxes and the true values is greater than 0.5 and the model correctly outputs the interaction class (i.e., touch) is the output counted as a true positive (TP).
Referring to fig. 7, the feature extraction layer of the image uses a classical ResNet, followed by channel-number dimension reduction. In detail, transfer learning is adopted: the pre-trained parameters of DETR are mainly used to initialize the weight parameters of the ResNet backbone and the Transformer encoder and decoder.
Referring to table 2, in the target detection link of human-object interaction detection, in an actual implementation the output of the multi-scale target detection network is used as a reference for the target detection box in the multi-layer perceptron: the detection result is compared with the output of the multi-layer perceptron, and the CIoU position loss function is calculated to readjust the confidence of the ball-class detection. Experiments prove that this auxiliary input improves the accuracy of ball target detection in the multi-layer perceptron.
Table 2 Comparison of the human-object interaction detection algorithm with existing methods in average precision
In practical application, after the image is subjected to edge detection, Hough line detection is applied to identify the boundary lines of the court: the set of lines Y = {y_1, y_2, …, y_N}, y_i = k_i x + b_i, is found in the image. After the target detection box of the ball is obtained, the moment when the ball's speed changes abruptly in the vertical direction is detected in real time, and the system then waits to judge whether the following 5 frames contain a person interacting with the ball. If so, the judgment is skipped; otherwise, the ball is considered to have contacted the ground. At this moment, the bottom coordinate pb = (xb, yb) of the ball's detection box, i.e. the contact point, is obtained, and it is then judged according to the same-side method whether the point lies within the boundary lines. If not, the ball is considered to have gone out of bounds.
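A sketch of this same-side judgment, assuming each boundary line is given by its slope-intercept pair (k, b) and that a reference point known to lie inside the court is available (court_center and the line parameters below are illustrative):

```python
def same_side(line, p, q):
    """True if points p and q lie on the same side of the line y = k*x + b."""
    k, b = line
    # the sign of (y - k*x - b) tells which side of the line a point is on
    side_p = p[1] - k * p[0] - b
    side_q = q[1] - k * q[0] - b
    return side_p * side_q >= 0        # >= 0 also accepts points on the line

def in_bounds(contact_point, boundary_lines, court_center):
    """Contact point is in bounds if it shares a side with the court center for every line."""
    return all(same_side(l, contact_point, court_center) for l in boundary_lines)

# usage: pb = (xb, yb) taken from the bottom of the ball's detection box
lines = [(0.02, 120.0), (-0.01, 380.0)]   # illustrative (k, b) pairs
print(in_bounds((310.0, 240.0), lines, court_center=(320.0, 250.0)))
```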
In summary, the intelligent penalty method, equipment and storage medium for volleyball match based on the neural network acquire the sampling image from the real-time video stream, determine the target detection area image, and obtain the target sphere prediction result from the target detection algorithm, so that detection of smaller sphere targets has better generalization capability across scales; the self-adaptive sampling method improves the real-time performance of detection; and the multi-layer convolutional neural network structure improves the accuracy of the target human body feature map and the target sphere prediction result, markedly raising the level of real-time performance and fully meeting the requirements of ball-sport events. The invention thus effectively overcomes various defects in the prior art and has high industrial utilization value.
It will be understood that equivalents and modifications will occur to those skilled in the art based on the present invention and its spirit, and all such modifications and substitutions are intended to be included within the scope of the present invention.

Claims (9)

1. The intelligent penalty method for volleyball match based on the neural network is characterized by comprising the following steps:
s1, acquiring a sampling image from a real-time video stream, performing edge detection on the sampling image, and extracting a target detection area image;
s2, calculating a non-zero gradient of the target detection area image based on a Hough gradient method, and acquiring circle centers and radiuses of all circles in the target detection area to form a circular data set;
s3, inputting the circular data set into an ROI (region of interest) of a target detection algorithm, and calculating to obtain a target sphere prediction result;
s4, generating a detection data set according to the target sphere prediction result, and normalizing the detection data set to obtain coordinates of a target sphere;
s5, based on a multi-layer convolutional neural network structure, extracting features of the target detection area image to generate a target human body feature map;
and S6, calculating and outputting a human-object interaction detection sequence according to the target human body characteristic diagram and the coordinates of the target sphere.
2. The intelligent penalty method for volleyball matches based on neural network according to claim 1, wherein in step S1, the sampling image is edge-detected by using a Canny edge-detection operator.
3. The intelligent penalty method for volleyball match based on neural network according to claim 1, wherein said step S3 further comprises the steps of: generating a sphere feature map according to the target detection algorithm, and calculating the similarity of the target sphere feature map and the confidence of the target sphere according to the sphere feature map, wherein the CIoU position loss function for calculating the similarity of the feature map is as follows:

$$L_{CIoU} = 1 - IoU + \frac{\rho^2}{d^2} + \alpha v, \qquad v = \frac{4}{\pi^2}\left(\arctan\frac{w_{gt}}{h_{gt}} - \arctan\frac{w}{h}\right)^2$$

wherein A represents the area of each region, gt is the center point of the ground-truth region, ρ represents the distance between the center points of the two regions, d represents the diagonal length of the minimum closure region containing both the target detection region and the real region, α represents the weight factor of the CIoU position loss function, v measures the similarity of the aspect ratios of the two regions, w is the width of the region, and h is the height of the region.
4. The neural network-based intelligent penalty method for volleyball events of claim 3, wherein the loss function for calculating the confidence of the target sphere is:

$$L_{conf}(x, c) = -\sum_{i \in Pos} x_{ij}^{p} \log \hat{c}_i^{p} - \sum_{i \in Neg} \log \hat{c}_i^{0}$$

wherein x represents the category corresponding to the target position, and c represents the confidence prediction value of a corresponding category; i denotes the i-th prior prediction box in the sample, and j denotes the j-th ground-truth region.
5. The intelligent penalty method for volleyball match based on neural network according to claim 1, wherein the step S5 further comprises the steps of: S51, performing dimension reduction processing on the target human body feature map through a convolution layer to obtain a dimension-reduced feature map; and S52, flattening the dimension-reduced feature map by using a Transformer encoder to form a sequence of length Hf × Wf.
6. The intelligent penalty method for volleyball games based on neural network according to claim 5, wherein in step S52, a position encoding is input at each attention layer of the Transformer, expressed as follows:

$$PE_{(pos,\,2i)} = \sin\!\left(\frac{pos}{10000^{2i/d_f}}\right), \qquad PE_{(pos,\,2i+1)} = \cos\!\left(\frac{pos}{10000^{2i/d_f}}\right)$$

wherein PE represents the position encoding, pos is the input position, 2i denotes the even coordinates, 2i+1 denotes the odd coordinates, and d_f is the feature dimension.
7. The intelligent penalty method for volleyball match based on neural network according to claim 1, wherein the step S6 further comprises the steps of: S7, performing bipartite matching by adopting the KM (Kuhn-Munkres) algorithm, specifically expressed as follows:

$$\hat{\epsilon} = \underset{\epsilon \in \Theta_N}{\arg\min}\; L(O, T)$$

where Θ_N represents the solution space of the entire bipartite matching problem and L represents the matching loss of the bipartite matching problem.
8. An intelligent penalty device for volleyball matches based on a neural network, characterized by comprising a memory and at least one processor, wherein the memory stores computer readable instructions; the at least one processor invokes the computer readable instructions in the memory to perform the steps of the neural network based intelligent penalty method for volleyball events of any one of claims 1-7.
9. A computer readable storage medium having computer readable instructions stored thereon, which when executed by a processor, implement the steps of the neural network based intelligent penalty method of a volleyball game of any one of claims 1-7.
CN202311002609.8A 2023-08-10 2023-08-10 Volleyball match intelligent penalty method, equipment and storage medium based on neural network Active CN116758461B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202311002609.8A CN116758461B (en) 2023-08-10 2023-08-10 Volleyball match intelligent penalty method, equipment and storage medium based on neural network

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202311002609.8A CN116758461B (en) 2023-08-10 2023-08-10 Volleyball match intelligent penalty method, equipment and storage medium based on neural network

Publications (2)

Publication Number Publication Date
CN116758461A true CN116758461A (en) 2023-09-15
CN116758461B CN116758461B (en) 2024-01-05

Family

ID=87951686

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202311002609.8A Active CN116758461B (en) 2023-08-10 2023-08-10 Volleyball match intelligent penalty method, equipment and storage medium based on neural network

Country Status (1)

Country Link
CN (1) CN116758461B (en)


Patent Citations (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20210019627A1 (en) * 2018-09-14 2021-01-21 Tencent Technology (Shenzhen) Company Limited Target tracking method and apparatus, medium, and device
US20220012591A1 (en) * 2020-07-08 2022-01-13 UMNAI Limited Method for detecting and mitigating bias and weakness in artificial intelligence training data and models
CN113780181A (en) * 2021-09-13 2021-12-10 浙江大学 Football match offside judgment method and device based on unmanned aerial vehicle and electronic equipment
CN113947757A (en) * 2021-11-09 2022-01-18 福州大学 Volleyball sport touch screen detection method based on OpenPose

Non-Patent Citations (2)

* Cited by examiner, † Cited by third party
Title
何皇: "Design and Implementation of a Badminton Serve Violation Detection System Based on Machine Vision", China Masters' Theses Full-text Database, Social Sciences II, pages 134-436 *
卢伟明: "Research on Road Detection and Tracking Algorithms Based on Monocular Vision", China Masters' Theses Full-text Database, Information Science and Technology, pages 138-1285 *

Also Published As

Publication number Publication date
CN116758461B (en) 2024-01-05


Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant