WO2024041319A1 - Basketball shot recognition method and apparatus, device and storage medium - Google Patents

Basketball shot recognition method and apparatus, device and storage medium

Info

Publication number: WO2024041319A1
Application number: PCT/CN2023/110320
Authority: WIPO (PCT)
Prior art keywords: feature, features, self-attention, classification, shot
Other languages: French (fr), Chinese (zh)
Inventors: 王杰, 孔繁昊
Original assignees: 京东方科技集团股份有限公司, 成都京东方智慧科技有限公司
Application filed by 京东方科技集团股份有限公司 and 成都京东方智慧科技有限公司
Publication of WO2024041319A1

Classifications

    • G06V 20/42: Higher-level, semantic clustering, classification or understanding of video scenes (e.g. detection, labelling or Markovian modelling of sport events or news items) of sport video content
    • G06N 3/08: Computing arrangements based on biological models; neural networks; learning methods
    • G06V 10/26: Image preprocessing; segmentation of patterns in the image field; cutting or merging of image elements to establish the pattern region (e.g. clustering-based techniques); detection of occlusion
    • G06V 10/44: Extraction of image or video features; local feature extraction by analysis of parts of the pattern (e.g. by detecting edges, contours, loops, corners, strokes or intersections); connectivity analysis (e.g. of connected components)
    • G06V 10/765: Image or video recognition or understanding using pattern recognition or machine learning; classification using rules for classification or partitioning the feature space
    • G06V 10/82: Image or video recognition or understanding using pattern recognition or machine learning, using neural networks

Definitions

  • the present invention relates to the field of computer application technology, and in particular to a basketball shot recognition method, apparatus, device and storage medium.
  • sensors can be installed on the backboard, and the sensors can automatically determine whether the shot is scored.
  • the deployment cost of this method is relatively high, and a low-cost shot recognition method is urgently needed.
  • the present invention provides a shot recognition method, apparatus, device and storage medium to remedy deficiencies in the related art.
  • a shot recognition method including:
  • the image sequence of the backboard area to be identified is input into the shot classification network, and the shot recognition result output by the shot classification network is obtained; wherein the shot recognition result is used to represent whether the shot is successful.
  • the shot classification network is used to:
  • a feature map is extracted for each backboard area image in the input backboard area image sequence to obtain a feature map sequence; classification features are extracted based on the self-attention mechanism for the feature map sequence; and the shot recognition result is determined based on the classification features.
  • extracting classification features based on a self-attention mechanism includes:
  • a position code is added; wherein the position code includes: information characterizing the spatial position relationship between different feature points of each feature map in the feature map sequence, and information characterizing the temporal position relationship between different feature maps in the feature map sequence;
  • features are extracted from the spatial dimension and the temporal dimension to obtain classification features.
  • adding position coding to the feature map sequence includes:
  • each feature map in the feature map sequence is converted into a one-dimensional feature; the one-dimensional features are stacked to obtain a two-dimensional feature; and position encoding is added to the two-dimensional feature.
  • the shot classification network includes N cascaded preset self-attention modules, N ≥ 2; for the i-th preset self-attention module, 1 ≤ i ≤ N-1, its output is cascaded to the input of the (i+1)-th preset self-attention module; the preset self-attention module is used to extract features from the spatial dimension and the temporal dimension based on the self-attention mechanism;
  • extracting features from the spatial dimension and the temporal dimension based on the self-attention mechanism, for the feature map sequence after position encoding is added, includes:
  • the feature map sequence after position coding is added is input into the first preset self-attention module, and the classification features are determined based on the output of the N-th preset self-attention module.
  • the preset self-attention module is used to:
  • for the input features, features are serially extracted at least once from the spatial dimension based on the self-attention mechanism, and then, for the extracted features, features are serially extracted at least once from the time dimension based on the self-attention mechanism, and the extracted features are output; or
  • for the input features, features are serially extracted at least once from the time dimension based on the self-attention mechanism, and then, for the extracted features, features are serially extracted at least once from the spatial dimension based on the self-attention mechanism, and the extracted features are output.
  • extracting classification features based on a self-attention mechanism includes:
  • initial classification features are added to the feature map sequence, and features are extracted based on the self-attention mechanism for the feature map sequence after the initial classification features are added;
  • the current representation corresponding to the initial classification features is determined as the classification feature.
  • determining the shot recognition result based on the classification features includes:
  • the classification features are input into the pre-trained fully connected network, and the shot recognition result output by the fully connected network is obtained (see the sketch below).
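  • As a hedged illustration of this classification step: a minimal fully connected head in PyTorch, assuming a 256-dimensional classification feature and two output logits (shot made / missed); all names and sizes here are illustrative and not fixed by the patent.

```python
import torch

# Hypothetical fully connected head; the feature size 256 and the two
# output classes (shot successful / not successful) are assumptions.
fc_head = torch.nn.Linear(256, 2)

def recognize(classification_feature: torch.Tensor) -> int:
    logits = fc_head(classification_feature)  # scores for the two outcomes
    return int(logits.argmax())               # e.g. 1 = successful, 0 = not
```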
  • extracting classification features based on a self-attention mechanism includes:
  • the first m feature maps are determined as the feature map subsequence contained in the current sliding window; m ≥ 1 (see the sketch below);
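  • As a hedged reading of the sliding-window idea: a short Python generator that yields length-m subsequences of the feature map sequence, starting from the first m maps. The stride of 1 is an assumption; the text only fixes that the first m maps form the initial window.

```python
# Illustrative sliding window over the feature map sequence; stride 1 is
# an assumption not fixed by the text.
def sliding_windows(feature_maps, m):
    for start in range(len(feature_maps) - m + 1):
        yield feature_maps[start:start + m]  # current window's subsequence
```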
  • obtaining the image sequence of the backboard area to be identified includes:
  • for each determined backboard area, the image content containing the backboard area is cropped, the cropping result is adjusted to a preset image size, and the adjustment result is added to the backboard area image sequence to be identified; the backboard area image sequence to be identified is sorted by the temporal order between the video frames where the backboard area images are located (see the sketch below).
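  • A minimal sketch of this crop-and-resize step, assuming OpenCV; the function name, the 224x224 preset size, and pixel-aligned integer boxes are illustrative assumptions.

```python
import cv2

def build_region_sequence(frames, boxes, preset_size=(224, 224)):
    """frames: video frames (arrays) in temporal order; boxes: matching
    (x1, y1, x2, y2) backboard boxes, one per frame."""
    sequence = []
    for frame, (x1, y1, x2, y2) in zip(frames, boxes):
        crop = frame[y1:y2, x1:x2]            # image content containing the backboard area
        crop = cv2.resize(crop, preset_size)  # adjust to the preset image size
        sequence.append(crop)
    return sequence                           # keeps the frames' temporal order
```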
  • a shot recognition device including:
  • a backboard identification unit configured to: obtain a video to be identified; determine the backboard area in the video to be identified, and obtain an image sequence of the backboard area to be identified;
  • a classification network unit configured to: input the to-be-identified backboard area image sequence into a shot classification network, and obtain a shot recognition result output by the shot classification network; wherein the shot recognition result is used to characterize whether the shot is successful.
  • Figure 1 is a schematic flow chart of a shot recognition method according to an embodiment of the present invention
  • Figure 2 is a schematic structural diagram of a shot classification network according to an embodiment of the present invention.
  • Figure 3 is a schematic diagram of the principle of a shot classification network according to an embodiment of the present invention.
  • Figure 4 is a schematic structural diagram of a shot recognition device according to an embodiment of the present invention.
  • FIG. 5 is a schematic diagram of the hardware structure of a computer device for implementing a method according to an embodiment of the present invention.
  • sensors can be installed on the backboard, and the sensors can automatically determine whether the shot is scored.
  • the deployment cost of this method is relatively high, and a low-cost shot recognition method is urgently needed.
  • the present invention provides a shot recognition method.
  • videos captured during basketball games can be analyzed to determine whether a shot scores.
  • using captured video for shot recognition does not require pre-deployment of hardware, which can reduce deployment costs.
  • the image content including the backboard area is often closely related to whether the basketball is thrown into the basket.
  • Other image content outside the backboard area has less to do with whether the basketball is thrown into the basket.
  • the continuous image content containing the backboard area can be used to determine whether the basket in the backboard area is vibrating, whether the mesh pocket under the basket is shaking, the basketball's movement trajectory in the backboard area, and so on, all of which can be used to determine whether the shot was successful.
  • the backboard area in the captured video can therefore be determined and the corresponding images combined into an image sequence for shot recognition.
  • the amount of data that needs to be recognized can be reduced without reducing the accuracy of shot recognition, the efficiency of shot recognition can be improved, and the calculation cost of shot recognition can also be reduced.
  • the recognition itself can be performed using deep learning.
  • the above method can perform shot recognition based on video without pre-deployed sensors, reducing deployment costs; and shot recognition can be performed on the backboard area determined in the video, which reduces redundant data and improves the efficiency of shot recognition.
  • a shot recognition method provided by an embodiment of the present invention will be explained in detail below.
  • Figure 1 is a schematic flow chart of a shot recognition method according to an embodiment of the present invention.
  • the embodiments of the present invention do not limit the execution subject of the method flow.
  • the execution subject can be a mobile device or a server.
  • the method may include the following steps.
  • S102 Determine the backboard area in the video to be identified, and obtain an image sequence of the backboard area to be identified.
  • S103 Input the image sequence of the backboard area to be identified into the shot classification network, and obtain the shot recognition result output by the shot classification network.
  • the shot classification network can be used to predict the shot recognition results for the input backboard area image sequence.
  • the shot recognition result can specifically be used to characterize whether the shot is successful.
  • the shot classification network can be trained in advance based on the shot samples whose sample characteristics are the backboard area image sequence and the corresponding shot labels; the shot labels are used to characterize whether the shot is successful.
  • the results of shot recognition can be used to calculate the score of basketball games.
  • the above method flow can identify shots based on the video to be identified, without pre-deployment of hardware such as sensors, using software algorithms directly. It only requires a device that can capture video, which may specifically be the camera of a handheld device or a surveillance camera on a basketball court, thus reducing deployment costs.
  • shot recognition can also be performed on the backboard area determined in the video to be recognized, reducing redundant data and thereby improving the efficiency of shot recognition.
  • Shot recognition through deep learning can improve the efficiency and accuracy of shot recognition.
  • S101 Obtain the video to be identified.
  • This method process does not limit the specific method of obtaining the video to be identified.
  • the video to be identified may specifically be a surveillance video of a basketball game, or a video of a basketball game captured by a camera of a handheld device.
  • the video to be identified can be a video clip of a basketball game, so as to identify whether the shot in the clip was successful.
  • the video to be recognized can be a clip of a shot in a basketball game, so that it can be directly and quickly identified whether the shot is successful.
  • the video to be recognized can be a 2-4 second shooting video clip.
  • the video to be identified can be a complete video of a basketball game, so that the number of successful shots in the basketball game can be easily identified, and the video location of successful shots can be conveniently located.
  • the specific method can be found in the explanation below.
  • This method process does not limit the shooting method of the video to be identified.
  • the shooting can be done with a handheld device or with a fixed angle of view.
  • the basketball game can be shot with a surveillance camera, or the basketball match can be shot with a fixed angle of view of the mobile phone.
  • a fixed angle of view can be used to shoot the backboard area.
  • S102 Determine the backboard area in the video to be identified, and obtain an image sequence of the backboard area to be identified.
  • This method process does not limit the way to determine the backboard area.
  • target detection can be used to identify the backboard area in each video frame of the video to be identified.
  • a pre-trained backboard area detection model can be used to identify each video frame in the video to be recognized and determine the backboard area.
  • the backboard area detection model can be trained using image samples and corresponding backboard area position labels.
  • the backboard area detection model may specifically use a target detection network trained based on YOLOX or RetinaNet.
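  • As a hedged sketch of such a detector: torchvision ships a RetinaNet implementation that could be fine-tuned on backboard annotations. The two-class setup (background plus backboard) and the score threshold are assumptions; the patent names YOLOX or RetinaNet but fixes no implementation.

```python
import torch
from torchvision.models.detection import retinanet_resnet50_fpn

# Hypothetical backboard detector: RetinaNet with 2 classes
# (background + backboard), to be fine-tuned on labeled image samples.
detector = retinanet_resnet50_fpn(num_classes=2)
detector.eval()

@torch.no_grad()
def detect_backboard(frame, score_thresh=0.5):
    """frame: float tensor of shape (3, H, W) with values in [0, 1]."""
    out = detector([frame])[0]
    keep = out["scores"] > score_thresh
    return out["boxes"][keep]  # (N, 4) boxes for detected backboard areas
```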
  • video frames determined to contain the backboard area can be added to the backboard area image sequence to be identified.
  • to further reduce redundant data, the video frames containing the backboard area can be cropped to obtain the image content containing the backboard area, which is then added to the image sequence of the backboard area to be identified.
  • This embodiment does not limit the specific cropping method, as long as the cropping result includes the determined backboard area image.
  • to facilitate using image data near the backboard area for shot recognition, the crop can be chosen so that it contains the basket.
  • the size of the cropping result is larger than the determined size of the backboard area, so that the cropping result can include image data near the backboard area.
  • the position of a rectangular frame in the video frames can be determined such that, across the consecutive video frames, the determined backboard area is contained within the rectangular frame; cropping can then be performed according to the position of the rectangular frame.
  • the position of the rectangular frame can be expanded and then cropped, so that the image content near the basket area can be cropped.
  • specifically, the minimum outer bounding box of the backboard areas detected at the same position across these video frames can be determined; after expanding the outer bounding box by a factor of 1.5, the corresponding image content is cropped from the corresponding position of each image (see the sketch below).
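  • A hedged sketch of this bounding-box step; clamping the expanded box to the frame borders is an added safeguard not spelled out in the text, and the function name is illustrative.

```python
# Union of per-frame backboard boxes, expanded by a ratio (1.5 in the text).
def expanded_union_box(boxes, frame_w, frame_h, ratio=1.5):
    x1 = min(b[0] for b in boxes); y1 = min(b[1] for b in boxes)
    x2 = max(b[2] for b in boxes); y2 = max(b[3] for b in boxes)
    cx, cy = (x1 + x2) / 2, (y1 + y2) / 2        # center of the union box
    w, h = (x2 - x1) * ratio, (y2 - y1) * ratio  # expanded width / height
    return (max(0, int(cx - w / 2)), max(0, int(cy - h / 2)),
            min(frame_w, int(cx + w / 2)), min(frame_h, int(cy + h / 2)))
```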
  • to facilitate shot recognition, the different cropping results can be adjusted to the same size, specifically to the same resolution.
  • obtaining the image sequence of the backboard area to be identified may include: for each backboard area determined, crop the image content containing the backboard area, adjust the cropping result to the preset image size, and add the adjustment result into the image sequence of the backboard area to be identified.
  • different backboard areas in the backboard area image sequence to be identified may have the same image size.
  • Obtaining the image sequence of the backboard area to be identified may include: for the determined different backboard areas, cropping the image content containing each backboard area, adjusting the crops to the same image size, and adding the adjustment results to the image sequence of the backboard area to be identified.
  • the backboard area images in the sequence of backboard area images to be identified may have a sequence, so that shot recognition can be facilitated based on the sequence.
  • the sequence of backboard area images to be identified may be sorted in the temporal order of the video frames where the backboard area images are located.
  • the backboard area determined for the video to be identified may be the same backboard area captured in the video to be identified.
  • the backboard area in the video to be identified can be determined through a backboard area detection model, and then the same backboard area captured in the video to be identified can be determined using a tracking algorithm. Specifically, it may be the same backboard area contained between different video frames in the video to be identified.
  • one or more backboard areas may be determined for the video to be identified.
  • the video to be identified can be shot from a fixed angle so that it captures the same backboard area, and the shooting situation at that backboard area can then be easily determined through the shot recognition method.
  • the video to be identified can capture multiple different backboard areas, and then the shooting conditions of each backboard area can be determined through the shot recognition method for different backboard areas.
  • an image sequence of the backboard area to be identified can be constructed separately.
  • the method may be to crop the image content containing the backboard area, adjust the cropping result to a preset image size, and add the adjustment result to the image sequence of the backboard area to be identified.
  • the determined different backboard areas can correspond to different backboard area image sequences to be identified, and subsequent shot classification networks can be used for shot recognition respectively.
  • each video frame in the video to be identified can be checked for whether it contains a backboard area; for the frames that do, the different video frames containing the same backboard area can be determined, and thereby the one or more backboard areas photographed in the video to be identified can be determined.
  • different video frames containing the same backboard area can be determined through a tracking algorithm.
  • this method process does not limit the method of obtaining the backboard area image sequence.
  • an image sequence of the backboard area to be identified can be obtained.
  • the image content containing the backboard area can be cropped out, and the image sequence of the backboard area to be identified corresponding to that backboard area can then be obtained.
  • specifically, the cropped image content can be adjusted to a preset size, and the adjustment result is then added to the image sequence of the backboard area to be identified.
  • an image sequence of the backboard area to be identified can be obtained respectively.
  • for each backboard area, the image content containing it can be cropped out from the one or more video frames containing that backboard area, and the corresponding image sequence of the backboard area to be identified can then be obtained.
  • the cropped image content can be adjusted to a preset size, and then the adjustment result is added to the image sequence of the backboard area to be identified.
  • S103 Input the image sequence of the backboard area to be identified into the shot classification network, and obtain the shot recognition result output by the shot classification network.
  • the shot classification network can be used to predict the shot recognition results for the input backboard area image sequence.
  • the shot recognition result can specifically be used to characterize whether the shot is successful.
  • the shot classification network can be trained in advance based on the shot samples whose sample characteristics are the backboard area image sequence and the corresponding shot labels; the shot labels can be used to characterize whether the shot is successful.
  • the results of shot recognition can be used to calculate the score of basketball games.
  • the process of this method does not specifically limit the shot classification network's processing of the image sequence of the backboard area to be identified.
  • the shot classification network can perform image recognition on each backboard area image in the image sequence of the backboard area to be identified, and identify whether the shot is successful.
  • alternatively, the shot classification network can perform image recognition by integrating at least two consecutive backboard area images in the image sequence of the backboard area to be identified, and identify whether the shot is successful.
  • specifically, the image recognition may be performed by integrating 8 or more consecutive frames of backboard area images.
  • the shot classification network can use a self-attention mechanism to extract features for shot recognition.
  • in this way, the associations between different backboard area images within at least two consecutive backboard area images can be learned.
  • used as features for shot recognition, these associations can better distinguish successful shots from failed shots, thereby improving shot recognition accuracy.
  • the shot classification network can be used to: extract classification features based on the self-attention mechanism for the input backboard area image sequence; and determine the shot recognition result based on the classification features.
  • the classification features may be features used to predict shot recognition results.
  • the process of this method does not limit the specific way to extract features based on the self-attention mechanism.
  • to facilitate extracting classification features based on the self-attention mechanism, a feature map can first be extracted for each backboard area image in the backboard area image sequence, and features can then be extracted from the feature maps based on the self-attention mechanism.
  • specifically, extracting classification features based on the self-attention mechanism for the input backboard area image sequence may include: extracting a feature map for each backboard area image in the input backboard area image sequence to obtain a feature map sequence; and, for the feature map sequence, extracting classification features based on the self-attention mechanism.
  • the order of the feature maps in the feature map sequence can be the same as the order of the corresponding backboard area images in the backboard area image sequence.
  • the order of the feature maps in the feature map sequence can be the same as the temporal order between the video frames where the corresponding backboard area images are located.
  • features can also be extracted directly from the backboard area image sequence based on the self-attention mechanism.
  • This embodiment does not limit the method of extracting the feature map of the backboard area image.
  • feature maps can be extracted using pre-trained convolutional networks.
  • the extracted feature map may include at least one of the following kinds of information: detail information, edge information, noise information, spatial relationship information, etc.; this embodiment does not specifically limit this.
  • extracting a feature map for each backboard area image in the input backboard area image sequence may include: extracting a feature map based on a pre-trained convolutional network for each backboard area image in the input backboard area image sequence.
  • This embodiment does not limit the structure of the convolutional network; specifically, it can be a two-dimensional convolutional network, which offers high performance and fast speed in image feature extraction (see the sketch below).
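  • A minimal sketch of per-image feature-map extraction with a pretrained 2D convolutional backbone; the choice of ResNet-18 (with its classifier head removed) is an assumption, not mandated by the patent.

```python
import torch
import torchvision

# Hypothetical 2D convolutional backbone: ResNet-18 without its average
# pooling and fully connected layers, so it outputs spatial feature maps.
backbone = torch.nn.Sequential(
    *list(torchvision.models.resnet18(weights="DEFAULT").children())[:-2]
)
backbone.eval()

@torch.no_grad()
def extract_feature_maps(images: torch.Tensor) -> torch.Tensor:
    """images: (n, 3, H, W) backboard area images in temporal order.
    Returns (n, C, h, w): one feature map per image, forming the sequence."""
    return backbone(images)
```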
  • the shot classification network can extract features from the time dimension based on the self-attention mechanism.
  • since each backboard area image is determined from a video frame, and the video frames have a temporal sequence in the video to be identified, a temporal sequence relationship (that is, a temporal position relationship) can be determined between the different backboard area images in the sequence to be identified; features can therefore be extracted in the time dimension based on the self-attention mechanism, and the associations between different backboard area images can be learned in the time dimension.
  • the position encoding may include information characterizing the temporal positional relationship between different backboard area images in the backboard area image sequence to be identified.
  • the temporal position relationship between different backboard area images can be the same as the temporal position relationship between video frames where the backboard area image is located.
  • the shot classification network can be used to: add the above position coding to the backboard area image sequence to be identified, so as to determine the temporal position relationship between different backboard area images; and, for the backboard area image sequence to be identified after position coding is added, extract features from the time dimension based on the self-attention mechanism to obtain classification features.
  • in some embodiments, the shot classification network can extract feature maps for the backboard area images in advance to obtain a feature map sequence.
  • in that case, the shot classification network can be used to add position coding to the feature map sequence; the position coding includes information characterizing the temporal position relationship between different feature maps in the sequence; the temporal position relationship between different feature maps can be the same as the temporal position relationship between the video frames where the corresponding backboard area images are located.
  • the shot classification network can be used to extract features from the time dimension based on the self-attention mechanism for the feature map sequence after adding position encoding to obtain classification features.
  • features can be extracted from the time dimension based on the self-attention mechanism to improve the feature extraction effect of the shot classification network and improve the recognition accuracy of the shot classification network.
  • the position coding may include information characterizing the temporal positional relationship between different backboard area images in the backboard area image sequence to be identified, and may specifically include the timestamp, in the video to be identified, of the video frame where each backboard area image is located.
  • information characterizing the temporal positional relationship between the video frames where the backboard area images are located can thus be added as position coding, so that this relationship can be used to improve the feature extraction effect of the shot classification network and improve its recognition accuracy.
  • the shot classification network can extract features from the spatial dimension based on the self-attention mechanism.
  • there is a spatial position relationship between different pixels in a single backboard area image, which can be characterized by two-dimensional spatial coordinates.
  • features can be extracted based on the self-attention mechanism in the spatial dimension, and the association between different image contents in a single backboard area image can be learned.
  • the position encoding may include information characterizing the spatial positional relationship between image content at different positions in each backboard area image in the backboard area image sequence to be identified.
  • it may include information characterizing the spatial positional relationship between different pixels in each backboard area image in the backboard area image sequence to be identified.
  • the shot classification network can be used to: add the above position coding to the backboard area image sequence to be identified, so as to determine the spatial position relationship between the image content at different positions in each backboard area image; and, for the backboard area image sequence to be identified after position coding is added, extract features from the spatial dimension based on the self-attention mechanism to obtain classification features.
  • in some embodiments, the shot classification network can extract feature maps for the backboard area images in advance to obtain a feature map sequence.
  • in that case, the shot classification network can be used to: add position coding to the feature map sequence, where the position coding includes information characterizing the spatial position relationship between different feature points in each feature map; and, for the feature map sequence after position encoding is added, extract features from the spatial dimension based on the self-attention mechanism to obtain classification features.
  • features can be extracted from the spatial dimension based on the self-attention mechanism to improve the feature extraction effect of the shot classification network and improve the recognition accuracy of the shot classification network.
  • the shot classification network can extract features from the spatial dimension and the temporal dimension based on the self-attention mechanism.
  • the feature extraction effect of the shot classification network can be improved, and the recognition accuracy of the shot classification network can be improved.
  • the position coding may include information characterizing the spatial positional relationship between image contents at different positions in each backboard area image in the backboard area image sequence to be identified, and information characterizing the temporal positional relationship between different backboard area images in that sequence.
  • the temporal position relationship between different backboard area images can be the same as the temporal position relationship between video frames where the backboard area image is located.
  • the shot classification network can be used to: add the above position coding to the image sequence of the backboard area to be identified; and, for the image sequence after position coding is added, extract features from the spatial dimension and the time dimension based on the self-attention mechanism to obtain classification features.
  • This embodiment does not limit the order and times of extracting features from the spatial dimension and extracting features from the time dimension.
  • in some embodiments, the shot classification network can extract feature maps for the backboard area images in advance to obtain a feature map sequence.
  • in that case, the shot classification network can be used to add position coding to the feature map sequence;
  • the position coding includes: information characterizing the spatial position relationship between different feature points of each feature map in the feature map sequence, and information characterizing the temporal position relationship between different feature maps in the feature map sequence; for the feature map sequence after position encoding is added, features are extracted from the spatial dimension and the time dimension based on the self-attention mechanism to obtain classification features.
  • that is, extracting classification features based on the self-attention mechanism may include: adding position coding to the feature map sequence, the position coding including information characterizing the spatial position relationship between different feature points of each feature map in the feature map sequence and information characterizing the temporal position relationship between different feature maps in the feature map sequence; and, for the feature map sequence after position coding is added, extracting features from the spatial dimension and the time dimension based on the self-attention mechanism to obtain the classification features.
  • to this end, the feature maps can be integrated into an overall feature.
  • this overall feature has both a time dimension and a space dimension, so that, after a simple feature conversion, features can conveniently be extracted from the temporal and spatial dimensions based on the self-attention mechanism.
  • each feature map in the feature map sequence can be converted into a one-dimensional feature; the one-dimensional features can be stacked and converted to obtain a two-dimensional feature.
  • the two-dimensional feature is the integrated overall feature.
  • This embodiment does not limit the method of converting feature maps into one-dimensional features.
  • converting a feature map into a one-dimensional feature can be done by adding all the feature points in the feature map, in order, to the one-dimensional feature.
  • for a feature map of size a*b, a one-dimensional feature of length c can be obtained through conversion, where c = a*b.
  • This embodiment does not limit the way of stacking one-dimensional features.
  • the one-dimensional features can be stacked according to the temporal relationship between the corresponding feature maps to obtain a two-dimensional feature.
  • for example, stacking a one-dimensional features of length b yields a two-dimensional feature of size a*b.
  • the feature map sequence as a whole can be regarded as an overall feature and then adjusted.
  • the feature map sequence includes n feature maps of size a*b, and the entire feature map sequence can be regarded as a three-dimensional feature of a*b*n.
  • each a*b feature map can be converted into a 1*c one-dimensional feature, that is, the a*b*n three-dimensional feature can be converted into a c*n two-dimensional feature, where c = a*b.
  • adding position coding to the feature map sequence may thus include: converting each feature map in the feature map sequence into a one-dimensional feature; performing stack conversion processing on the one-dimensional features to obtain a two-dimensional feature; and adding position encoding to the obtained two-dimensional feature.
  • specifically, the one-dimensional features converted from the individual feature maps in the feature map sequence can be stacked to obtain the two-dimensional feature.
  • adding position coding to the two-dimensional feature can mean: for each one-dimensional feature stacked in the two-dimensional feature, adding to each feature point information representing the spatial position relationship, which may specifically include the feature point's coordinate information in the feature map; and, for each one-dimensional feature as a whole, adding information representing the temporal position relationship, which may specifically include timestamps (see the sketch below).
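  • A hedged PyTorch reading of the flatten, stack, and encode steps, using learned position embeddings added along the spatial and temporal axes; the patent leaves the concrete form of the encoding open, so the zero-initialized parameters below are an assumption, and the sizes are illustrative.

```python
import torch

n, C, h, w = 8, 256, 7, 7                # illustrative: n maps of C x h x w
maps = torch.randn(n, C, h, w)           # feature map sequence

tokens = maps.flatten(2).transpose(1, 2) # (n, h*w, C): one feature per map position
spatial_pos = torch.nn.Parameter(torch.zeros(1, h * w, C))  # spatial position code
temporal_pos = torch.nn.Parameter(torch.zeros(n, 1, C))     # temporal position code
encoded = tokens + spatial_pos + temporal_pos  # broadcast over maps and positions
```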
  • extracting features from the spatial dimension and the time dimension can then include: for the two-dimensional feature after position encoding is added, extracting features from the spatial dimension and the time dimension based on the self-attention mechanism.
  • the sequence in the spatial dimension or the sequence in the time dimension can be obtained by transposing, so as to facilitate extracting features based on the self-attention mechanism from the time dimension and the spatial dimension respectively.
  • for example, a two-dimensional feature of size a*b can be obtained by stacking and then adding position encoding; between the a one-dimensional features there is a temporal position relationship, and within each one-dimensional feature of length b there is a spatial position relationship.
  • features can be extracted from the time dimension based on the self-attention mechanism.
  • the a*b two-dimensional feature can then be transposed to obtain a b*a two-dimensional feature, so that features can be extracted from the spatial dimension based on the self-attention mechanism.
  • since the self-attention mechanism keeps the input and output features the same size, features can also be extracted serially from the spatial dimension based on the self-attention mechanism first, the extracted features can then be transposed, and features can then be serially extracted from the temporal dimension based on the self-attention mechanism.
  • for example, a two-dimensional feature of size a*b can be obtained by stacking and then adding position encoding; between the a one-dimensional features there is a temporal position relationship, and within each one-dimensional feature of length b there is a spatial position relationship.
  • features can first be extracted from the time dimension based on the self-attention mechanism to obtain a first feature of size a*b; the first feature can then be transposed to obtain a second feature of size b*a; and, for the second feature, features can be extracted from the spatial dimension based on the self-attention mechanism (see the sketch below).
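  • A hedged sketch of this transpose trick with standard multi-head self-attention; the 8-step, 49-position, 256-channel sizes are illustrative, and nn.MultiheadAttention stands in for whatever attention layer the network actually trains.

```python
import torch

dim = 256
attn_time = torch.nn.MultiheadAttention(dim, num_heads=4, batch_first=True)
attn_space = torch.nn.MultiheadAttention(dim, num_heads=4, batch_first=True)

x = torch.randn(49, 8, dim)  # 49 spatial positions x 8 time steps x channels
x, _ = attn_time(x, x, x)    # attend over the time dimension ("first feature")
x = x.transpose(0, 1)        # transpose: 8 x 49 x channels ("second feature")
x, _ = attn_space(x, x, x)   # attend over the spatial dimension
```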
  • the process of this method does not limit the order between extracting features based on the self-attention mechanism from the spatial dimension and extracting features based on the self-attention mechanism from the temporal dimension.
  • features can be extracted from the spatial dimension based on the self-attention mechanism in parallel, and features can be extracted from the temporal dimension based on the self-attention mechanism, and then the extracted features can be synthesized to obtain classification features.
  • This embodiment does not limit the method of integrating features.
  • for example, the features can be spliced: the features extracted from the spatial dimension based on the self-attention mechanism and the features extracted from the temporal dimension based on the self-attention mechanism can be spliced into one feature, which serves as the classification feature.
  • features can be serially extracted at least once from the spatial dimension based on the self-attention mechanism, and features can be serially extracted at least once from the temporal dimension based on the self-attention mechanism, and then the extracted features can be synthesized to obtain classification features.
  • serially extracting features at least once from the spatial dimension based on the self-attention mechanism may include: determining the two-dimensional feature with position encoding added as the current feature; and looping over the following steps until a preset number of cycles is reached: extracting features from the spatial dimension based on the self-attention mechanism for the current feature, and determining the extracted features as the current feature.
  • the preset number of times may be at least once.
  • serially extracting features at least once from the time dimension based on the self-attention mechanism may include: determining the two-dimensional feature with position encoding added as the current feature; and looping over the following steps until a preset number of cycles is reached: extracting features from the time dimension based on the self-attention mechanism for the current feature, and determining the extracted features as the current feature.
  • the preset number of times may be at least once.
  • features can be extracted serially from the spatial dimension and the temporal dimension based on the self-attention mechanism.
  • This embodiment does not limit the order and times between extracting features from the spatial dimension based on the self-attention mechanism and extracting features from the temporal dimension based on the self-attention mechanism.
  • multiple features can be extracted serially from the spatial dimension based on the self-attention mechanism, multiple features can be extracted serially from the time dimension based on the self-attention mechanism, or features can be extracted serially multiple times based on the self-attention mechanism while alternating between the spatial dimension and the time dimension.
  • for example, features can be serially extracted at least once from the spatial dimension based on the self-attention mechanism, and then, for the extracted features, features can be serially extracted at least once from the time dimension based on the self-attention mechanism.
  • for the features so extracted, features can again be extracted at least once in series from the spatial dimension based on the self-attention mechanism, and then at least once in series from the time dimension based on the self-attention mechanism.
  • specifically, a preset number of feature extraction steps can be performed serially for the feature map sequence with position encoding added.
  • the feature extraction step may include: for the input features, serially extracting features at least once from the spatial dimension based on the self-attention mechanism, and then, for the extracted features, serially extracting features at least once from the time dimension based on the self-attention mechanism.
  • alternatively, the feature extraction step may include: for the input features, serially extracting features at least once from the time dimension based on the self-attention mechanism, and then, for the extracted features, serially extracting features at least once from the spatial dimension based on the self-attention mechanism.
  • the output of any feature extraction step can be input into the subsequent feature extraction step.
  • different feature extraction steps executed serially may be different from each other.
  • for example, the weights of the self-attention mechanism may differ, or the number or order of feature extractions may differ.
  • specifically, the feature map sequence with position coding added can be determined as the current feature, and the following steps can be executed in a loop until a preset loop stop condition is met: for the current feature, serially extract features at least once from the spatial dimension based on the self-attention mechanism; then, for the extracted features, serially extract features at least once from the time dimension based on the self-attention mechanism; and determine the extracted features as the current feature.
  • This embodiment does not limit specific preset loop stop conditions.
  • the preset loop stop condition may include at least one of the following: the loop reaches a preset number of times, the total number of feature extraction times reaches a preset number of times, the time taken to extract features reaches a preset length of time, etc.
  • in different loop iterations, the way features are serially extracted from the spatial dimension based on the self-attention mechanism can differ; specifically, the weights of the self-attention mechanism may differ, or the number of serial extractions may differ.
  • when the preset loop stop condition is met, the loop is stopped; for example, the loop may be stopped after features have been serially extracted at least once from the spatial dimension based on the self-attention mechanism.
  • the number of times of feature extraction can be different in different loop processes.
  • the number of times to extract features is not specifically limited.
  • for example, in the first cycle, features can be serially extracted three times from the spatial dimension based on the self-attention mechanism for the current feature, and then features can be serially extracted twice from the time dimension based on the self-attention mechanism for the extracted features.
  • in the second cycle, features can be serially extracted once from the spatial dimension based on the self-attention mechanism for the current feature, and then features can be serially extracted five times from the time dimension based on the self-attention mechanism for the extracted features.
  • alternatively, the feature map sequence with position coding added can be determined as the current feature, and the following steps can be executed in a loop until the preset loop stop condition is met: for the current feature, serially extract features at least once from the time dimension based on the self-attention mechanism; then, for the extracted features, serially extract features at least once from the spatial dimension based on the self-attention mechanism; and determine the extracted features as the current feature.
  • in implementation, features can be serially extracted at least once from the spatial dimension based on the self-attention mechanism, and the extracted features can then be transposed, so that features can be serially extracted at least once from the time dimension based on the self-attention mechanism; the extracted features can then be transposed again, and, for the transposed features, features can be serially extracted at least once from the spatial dimension based on the self-attention mechanism.
  • a preset self-attention module in the shot classification network can be used to implement the steps of extracting features based on the self-attention mechanism.
  • the preset self-attention module can be used to extract features from the spatial dimension and/or the temporal dimension and output them based on the self-attention mechanism for the input features.
  • one or more preset self-attention modules may be included in the shot classification network.
  • if the shot classification network needs to extract features from the spatial dimension based on the self-attention mechanism,
  • one or more preset self-attention modules can be preset.
  • extracting features from the spatial dimension based on the self-attention mechanism may then include: inputting the feature map sequence with position coding added into the one or more preset self-attention modules, and obtaining the output features.
  • in the case of N cascaded preset self-attention modules, N ≥ 2,
  • for the i-th preset self-attention module, 1 ≤ i ≤ N-1,
  • its output can be cascaded to the input of the (i+1)-th preset self-attention module.
  • similarly, if the shot classification network needs to extract features from the time dimension based on the self-attention mechanism,
  • one or more preset self-attention modules can be preset.
  • if the shot classification network needs to extract features from the spatial and temporal dimensions based on the self-attention mechanism,
  • the preset self-attention modules can include one or more modules that, for the input features, serially extract features at least once from the spatial dimension based on the self-attention mechanism and output them.
  • specifically, the shot classification network may include N cascaded preset self-attention modules, N ≥ 2; for the i-th preset self-attention module, 1 ≤ i ≤ N-1, its output is cascaded to the input of the (i+1)-th preset self-attention module.
  • the preset self-attention module can be used to extract features from the spatial and temporal dimensions based on the self-attention mechanism.
  • extracting features from the spatial dimension and the time dimension based on the self-attention mechanism may include: inputting the feature map sequence with position coding added into the first preset self-attention module, and determining the classification features based on the output of the N-th preset self-attention module (see the sketch below).
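  • A hedged sketch of N cascaded modules, with a simple residual space-time attention block standing in for the trained "preset self-attention module"; all sizes and the choice N = 4 are illustrative.

```python
import torch

class SpaceTimeBlock(torch.nn.Module):
    """Illustrative preset self-attention module: spatial attention, then
    temporal attention, each with a residual connection."""
    def __init__(self, dim=256, heads=4):
        super().__init__()
        self.attn_space = torch.nn.MultiheadAttention(dim, heads, batch_first=True)
        self.attn_time = torch.nn.MultiheadAttention(dim, heads, batch_first=True)

    def forward(self, x):                    # x: (time, space, dim)
        x = x + self.attn_space(x, x, x)[0]  # attend over spatial positions
        x = x.transpose(0, 1)                # (space, time, dim)
        x = x + self.attn_time(x, x, x)[0]   # attend over time steps
        return x.transpose(0, 1)             # back to (time, space, dim)

N = 4  # the patent only requires N >= 2
cascade = torch.nn.Sequential(*[SpaceTimeBlock() for _ in range(N)])
# the output of module i feeds the input of module i+1; the classification
# features are then determined from the N-th module's output.
```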
  • This embodiment does not limit the order and number of times, within a preset self-attention module, of extracting features from the spatial dimension and extracting features from the temporal dimension based on the self-attention mechanism.
  • for example, the preset self-attention module can be used to serially execute a preset number of feature extraction steps for the input features.
  • the preset self-attention module can be used to: for the input features, serially extract features at least once from the spatial dimension based on the self-attention mechanism; then, for the extracted features, serially extract features at least once from the temporal dimension based on the self-attention mechanism; and output the extracted features.
  • alternatively, the preset self-attention module can be used to: for the input features, serially extract features at least once from the time dimension based on the self-attention mechanism; then, for the extracted features, serially extract features at least once from the spatial dimension based on the self-attention mechanism; and output the extracted features.
  • the preset self-attention module can also be used to: determine the input feature as the current feature; and perform the following steps in a loop until the preset loop stop condition is met: for the current feature, serially extract features at least once from the spatial dimension based on the self-attention mechanism, then serially extract features at least once from the time dimension based on the self-attention mechanism for the extracted features, and determine the extracted features as the current feature.
  • alternatively, the preset self-attention module can be used to: determine the input feature as the current feature; and perform the following steps in a loop until the preset loop stop condition is met: for the current feature, serially extract features at least once from the time dimension based on the self-attention mechanism, then serially extract features at least once from the spatial dimension based on the self-attention mechanism for the extracted features, and determine the extracted features as the current feature.
  • the weights of the self-attention mechanism can differ between different preset self-attention modules; specifically, the weights and other parameters of each preset self-attention module can be determined through model training.
  • the order and number of features extracted from the spatial dimension based on the self-attention mechanism and from the temporal dimension based on the self-attention mechanism can be different between different preset self-attention modules.
  • for example, one preset self-attention module can be used to: first serially extract features at least once from the spatial dimension based on the self-attention mechanism for the input features, and then serially extract features at least once from the time dimension based on the self-attention mechanism for the extracted features, and output them.
  • another preset self-attention module can be used to: first serially extract features at least once from the time dimension based on the self-attention mechanism for the input features, and then serially extract features at least once from the spatial dimension based on the self-attention mechanism for the extracted features, and output them.
  • the preset self-attention module may be a self-attention layer, which is used to extract features based on the self-attention mechanism for input features.
  • the shot classification network can include one self-attention layer, or multiple self-attention layers in series, so that the parameters of the self-attention layers can be determined through model training.
  • the parameters of a self-attention layer include its weights; after model training, the parameters of different self-attention layers usually differ.
  • the self-attention layer can be used to extract features based on the self-attention mechanism from the spatial and temporal dimensions of the input features.
  • for example, features can first be serially extracted at least once from the spatial dimension based on the self-attention mechanism for the input features, and then features can be serially extracted at least once from the time dimension based on the self-attention mechanism for the extracted features and output; it is also possible to first serially extract features at least once from the time dimension based on the self-attention mechanism for the input features, and then serially extract features at least once from the spatial dimension based on the self-attention mechanism for the extracted features and output them.
  • This method flow does not limit the specific way to obtain classification features.
  • the classification features may be extracted based on the self-attention mechanism for the backboard area image sequence.
  • the above embodiment explains the features extracted based on the self-attention mechanism for the backboard area image sequence.
  • This embodiment does not limit the way classification features are obtained from the features extracted by the self-attention mechanism.
  • features extracted based on the self-attention mechanism can be synthesized to obtain classification features.
  • features extracted based on the self-attention mechanism can be directly determined as classification features.
  • the features extracted based on the self-attention mechanism can also be pooled, specifically average pooling or maximum pooling, to obtain classification features.
  • classification initial features can be added to the features input to the self-attention mechanism.
  • the classification initial features are not features from the backboard area image sequence to be identified.
  • the classification initial features can be used to comprehensively represent the feature information in the backboard area image sequence to be identified that the self-attention mechanism learns through multiple rounds of feature extraction, without affecting the original features in the backboard area image sequence to be identified.
  • for example, the self-attention mechanism may learn which backboard area image content has the greater correlation with the shot recognition result in the spatial dimension, or which image content has the greater correlation with the shot recognition result in the time dimension.
  • the current representation corresponding to the initial feature of classification can be determined among the features extracted based on the self-attention mechanism.
  • the current representation corresponding to the initial classification feature can be determined as the classification feature and used for subsequent prediction of the shot recognition result.
  • extracting classification features based on the self-attention mechanism for the feature map sequence may include: adding classification initial features to the feature map sequence; extracting features based on the self-attention mechanism for the feature map sequence after the classification initial features are added; and, from the extracted features, determining the current representation corresponding to the classification initial features as the classification feature.
  • the initial classification features are not features in the image sequence of the backboard area to be identified. Therefore, position coding can be set for the initial classification features to facilitate subsequent feature extraction based on the self-attention mechanism.
  • position coding can be added to the feature map sequence after adding the initial features for classification.
  • the classification initial features can be added first, and then the position coding can be added.
  • the position encoding set for the classification initial features usually lies outside the position range of the feature map sequence itself.
  • the classification initial features can be added before the first feature map in the feature map sequence, or after the last feature map.
  • the classification initial features can be added before the first feature point in each feature map in the feature map sequence, or after the last feature point.
  • this method process does not limit the specific form.
  • the classification initial features may belong to parameters in the shot classification network, and the specific values of the classification initial features may be determined through model training.
  • during training, the values of the classification initial features can be continuously adjusted; after training is completed, the final adjusted values are fixed and used for shot recognition.
  • the smaller the size of the classification initial features, the fewer computing resources are required for their adjustment, which can also improve the stability of model training.
  • the size of the initial classification features can be 1*1.
  • the classification initial features can be copied multiple times to meet the feature-size requirements of the self-attention mechanism. Specifically, this can be implemented as a broadcast operation.
  • if the feature size that needs to be expanded is 1*N, the 1*1 classification initial feature can be copied N times and combined into a 1*N feature, which is then added to the features used in the subsequent self-attention feature extraction steps.
  • the classification initial features can be easily adjusted during the model training process to improve the stability of the model training.
  • classification initial features can also be of other sizes.
  • the size of the initial features for classification can include one or more feature maps, which can be added to the feature map sequence as new features in the time dimension.
  • the size of the initial features for classification may include one or more newly added feature points at the same position in each feature map as new features in the spatial dimension of the feature map sequence.
  • the size of the classification initial features can include one or more feature maps, plus one or more new feature points at the same position in each feature map, serving as new features of the feature map sequence in both the time and space dimensions.
  • the features required to be extended by the self-attention mechanism can be one or more feature maps. This allows multiple copies of the initial classification features to be combined into a feature map for feature expansion.
  • the features that the self-attention mechanism requires to be expanded can be one or more feature points added at the same position in each feature map. Therefore, multiple copies of the classification initial features can be combined into feature points at the same position in each feature map for feature expansion.
  • one or more one-dimensional features can be added based on the one-dimensional features that have temporal relationships among the two-dimensional features, and the newly added one-dimensional features are the initial classification features.
  • one or more feature points can be added based on each one-dimensional feature that has a temporal relationship in the two-dimensional feature, and the added feature points are the initial features for classification.
  • one or more one-dimensional features can be added alongside the temporally related one-dimensional features in the two-dimensional features, and one or more feature points can further be added to each one-dimensional feature of the current two-dimensional features.
  • the newly added feature part is the classification initial feature.
  • a two-dimensional feature of size a*b can be obtained by stacking, with position encoding then added to the two-dimensional feature. Among the a one-dimensional features there is a temporal position relationship; within each one-dimensional feature of length b there is a spatial position relationship.
  • the initial feature for classification can be the feature of 1*b, which is added to the two-dimensional feature of a*b to obtain the two-dimensional feature of (a+1)*b.
  • the initial feature for classification can be the feature of 1*a, which is added to the two-dimensional feature of a*b to obtain the two-dimensional feature of a*(b+1).
  • the classification initial feature can also be a 1*1 feature. For an a*b two-dimensional feature that needs to be expanded to (a+1)*b, the classification initial feature can be copied b times and combined into a 1*b feature, which is added to the a*b two-dimensional feature to obtain the (a+1)*b two-dimensional feature. Specifically, the copies can be produced by a broadcast operation.
  • the two-dimensional features after adding the classification initial features can be the two-dimensional features of (a+1)*(b+1), and the new part compared to the two-dimensional features of a*b can be the classification initial features.
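  • A minimal sketch of the broadcast-and-concatenate step described above, assuming the a*b two-dimensional feature is a plain tensor and using a learnable 1*1 parameter for the classification initial feature:

```python
import torch
import torch.nn as nn

class ClassTokenExpander(nn.Module):
    """Sketch: broadcast a 1*1 learnable classification initial feature to a
    1*b row and prepend it to an a*b two-dimensional feature, giving (a+1)*b."""

    def __init__(self):
        super().__init__()
        self.cls_init = nn.Parameter(torch.zeros(1, 1))  # 1*1 trainable initial feature

    def forward(self, feats: torch.Tensor) -> torch.Tensor:
        a, b = feats.shape
        cls_row = self.cls_init.expand(1, b)        # broadcast copy to 1*b
        return torch.cat([cls_row, feats], dim=0)   # (a+1)*b two-dimensional feature

# After self-attention, row 0 holds the current representation of the
# classification initial feature, i.e. the classification feature.
```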
  • This method flow does not limit the specific method of determining the shot recognition result based on the classification features.
  • when the classification features are obtained, they can be input into a pre-trained fully connected network to obtain the shot recognition result output by the fully connected network.
  • a fully connected network can be used to predict the shot recognition result based on the input features; specifically, it can predict the shot recognition result based on the input classification features.
  • other model structures can also be used to predict the shot recognition result from the classification features; this embodiment does not specifically limit the choice.
  • classification features can be processed, and then a fully connected network can be used to predict the shot recognition result.
  • the shot classification network includes a fully connected network.
  • the fully connected network is also trained, so that the fully connected network can be used to predict the shot recognition results and improve the prediction accuracy and prediction efficiency.
  • the fully connected network trained in the shot classification network can be used to predict the shot recognition results based on the classification features.
  • the classification features can be pooled.
  • average pooling or maximum pooling can be performed to obtain classification features of the preset feature size, and the classification features of the preset feature size can then be input into the pre-trained fully connected network to obtain the shot recognition result output by the fully connected network.
  • the preset feature size can be smaller than the original size of the classification feature, thereby reducing the amount of data and calculation and improving prediction efficiency.
  • determining the shot recognition result based on the classification features may include: performing pooling processing on the classification features to obtain to-be-input features of a preset feature size; and inputting the to-be-input features into the pre-trained fully connected network to obtain the shot recognition result output by the fully connected network.
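  • A hedged sketch of such a prediction head, assuming average pooling to a preset size of 64 and a two-class output (shot successful / unsuccessful); all layer widths are assumptions:

```python
import torch
import torch.nn as nn

class PredictionHead(nn.Module):
    """Sketch: average-pool the classification feature to a preset size,
    then predict the shot recognition result with a fully connected network."""

    def __init__(self, pooled_dim: int = 64):
        super().__init__()
        self.pool = nn.AdaptiveAvgPool1d(pooled_dim)   # preset feature size
        self.fc = nn.Sequential(
            nn.Linear(pooled_dim, 32),
            nn.ReLU(),
            nn.Linear(32, 2),                          # successful / unsuccessful shot
        )

    def forward(self, cls_feature: torch.Tensor) -> torch.Tensor:
        # cls_feature: (batch, dim); pool down to the preset size, then classify
        pooled = self.pool(cls_feature.unsqueeze(1)).squeeze(1)
        return self.fc(pooled)                         # logits over the two classes
```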
  • This embodiment does not limit the specific form of the shot recognition result, as long as it can indicate whether the shot is successful.
  • the process of this method can be based on the shot classification network to perform shot recognition on the video to be recognized.
  • a video clip of the shooting process can be obtained as the video to be identified, and the shot classification network is used to perform shot recognition.
  • the video to be identified may also include longer basketball game clips, which may include one or more shooting processes.
  • the video to be identified can be divided into multiple segments, so that shot recognition can be performed on multiple segments respectively.
  • when the video to be recognized includes multiple shooting processes, dividing it into segments for separate shot recognition can improve the accuracy of shot recognition, and also makes it possible to locate the video position of each shooting process and of each successful shot.
  • This embodiment does not limit the way in which the video to be recognized is divided into segments.
  • the video to be recognized can be directly divided into multiple segments of preset segment duration.
  • a sliding window mechanism can be used to divide the segments.
  • this embodiment does not limit the order in which shot recognition is performed.
  • shot recognition can be performed in parallel for different divided video segments, or shot recognition can be performed serially.
  • the image sequence of the backboard area to be identified can be directly divided and shot recognition can be performed separately.
  • inputting the backboard area image sequence to be identified into the shot classification network and obtaining the shot recognition result output by the shot classification network may include: dividing the backboard area image sequence to be identified into multiple backboard area image subsequences, inputting each resulting subsequence into the shot classification network, and obtaining the shot recognition result output by the shot classification network.
  • This embodiment does not limit the way of dividing the backboard area image sub-sequence.
  • a sliding window mechanism can be used. Since there may be overlapping parts between different segments divided by the sliding window mechanism, when the backboard area image sequence to be identified is divided into subsequences based on the sliding window mechanism, there is no need to repeatedly determine the backboard area for the video frames in the overlapping parts, which improves efficiency and saves computing resources.
  • extracting classification features based on the self-attention mechanism may include: for the feature map sequence, determining the first m feature maps as the feature map subsequence contained in the current sliding window, m≥1; then performing the following steps in a loop until the current sliding window can no longer move backward: extracting classification features based on the self-attention mechanism for the feature map subsequence contained in the current sliding window; moving the sliding window backward by a preset sliding step.
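  • A minimal sketch of this sliding-window loop over the feature map sequence (the helper name is hypothetical):

```python
def sliding_windows(feature_maps, m, step=1):
    """Yield length-m feature-map subsequences, sliding by `step`, until the
    window can no longer move backward (i.e. would run past the sequence)."""
    start = 0
    while start + m <= len(feature_maps):
        yield feature_maps[start:start + m]
        start += step

# e.g. 10 feature maps with m=3 and step=1 -> (10 - 3) // 1 + 1 = 8 subsequences
```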
  • This embodiment does not limit the order in which subsequent shot recognition steps are performed for different feature map subsequences divided by the sliding window mechanism.
  • the subsequent shot recognition steps can be performed in parallel for the different feature map subsequences divided by the sliding window mechanism, or executed serially. Executing the subsequent shot recognition steps in parallel can improve the efficiency of shot recognition.
  • classification features are extracted based on the self-attention mechanism for the feature map subsequence contained in the current sliding window.
  • extracting classification features based on the self-attention mechanism for the feature map subsequence contained in the current sliding window may include: adding position coding to the feature map subsequence contained in the current sliding window, where the position coding may include information characterizing the spatial position relationship between different feature points of each feature map in the subsequence, and information characterizing the temporal position relationship between different feature maps in the subsequence; and, for the subsequence after position coding is added, extracting features from the spatial and temporal dimensions based on the self-attention mechanism to obtain classification features.
  • adding position coding to the feature map subsequence may include: converting each feature map in the feature map subsequence into a one-dimensional feature; performing stack conversion processing on the one-dimensional features to obtain a two-dimensional feature; and adding position coding to the obtained two-dimensional feature.
  • stacking conversion processing may be performed on all or part of the converted one-dimensional features to obtain two-dimensional features, and then position coding may be added to the obtained two-dimensional features.
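  • A toy sketch of this flatten-stack-encode preprocessing, assuming the feature maps arrive as a (T, H, W) tensor; the additive index-based encoding below is a stand-in for whatever learned or fixed position encoding the network actually uses:

```python
import torch

def flatten_stack_encode(fmaps: torch.Tensor) -> torch.Tensor:
    """Flatten each H*W feature map into a one-dimensional feature, stack the
    results into a two-dimensional feature, and add an additive position
    encoding carrying temporal (row) and spatial (column) information."""
    T, H, W = fmaps.shape
    stacked = fmaps.reshape(T, H * W)                             # T one-dimensional features
    t_pos = torch.arange(T, dtype=fmaps.dtype).unsqueeze(1)       # temporal positions per row
    s_pos = torch.arange(H * W, dtype=fmaps.dtype).unsqueeze(0)   # spatial positions per column
    return stacked + t_pos + s_pos                                # position-coded 2D feature
```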
  • extracting classification features based on the self-attention mechanism for the feature map subsequence may include: adding classification initial features to the feature map subsequence; extracting features based on the self-attention mechanism for the feature map subsequence after the classification initial features are added; and, from the extracted features, determining the current representation corresponding to the classification initial features as the classification feature.
  • the video clips in the video to be identified represented by the corresponding sub-sequences can be determined based on the shot recognition results, so that the video location of the successful shot can be located.
  • the number of successful shots in the video to be recognized can also be determined based on the number of shot recognition results that represent successful shots, so as to facilitate subsequent calculation of scores in basketball games.
  • the structure of the shot classification network is explained below.
  • the process of this method does not limit the specific structure of the shot classification network.
  • the following explanations are provided for illustrative purposes.
  • the shot classification network can be used to: extract classification features based on the self-attention mechanism for the input backboard area image sequence; and determine the shot recognition result based on the classification features.
  • the shot classification network may include a module for extracting classification features based on a self-attention mechanism, and a module for determining a shot recognition result based on the classification features.
  • the extracted feature map sequence needs to be preprocessed, for example with operations such as adding position coding, dividing subsequences, and adding classification initial features.
  • the shot classification network is also used to further extract classification features based on the self-attention mechanism based on the preprocessed feature map sequence, and finally predict the shot recognition results based on the classification features.
  • the structure of the shot classification network may include a feature map extraction module, a feature preprocessing module, a self-attention feature extraction module and a prediction module.
  • the structure of the shot classification network is not specifically limited, and this embodiment is only used for illustrative explanation.
  • the feature map extraction module may be used to extract a feature map sequence of the input backboard area image sequence, and output the extracted feature map sequence.
  • the feature preprocessing module can be used to preprocess the feature map sequence output by the feature map extraction module.
  • the feature preprocessing module can be used to: add position coding to the feature map sequence output by the feature map extraction module; and output the feature map sequence after adding position coding.
  • the position coding may include: information characterizing the spatial position relationship between different feature points of each feature map in the feature map sequence, and information characterizing the temporal position relationship between different feature maps in the feature map sequence.
  • the feature preprocessing module can also be used to: add classification initial features to the feature map sequence output by the feature map extraction module; and output the feature map sequence after adding the classification initial features.
  • the feature preprocessing module can also be used to: add classification initial features and position coding to the feature map sequence output by the feature map extraction module; and output the feature map sequence after adding classification initial features and position coding.
  • the order in which the classification initial features and position codes are added is not limited. Specifically, the initial classification features can be added first, and then the position coding can be added.
  • the preprocessing in the feature preprocessing module may include at least adding position coding, thereby facilitating subsequent feature extraction based on the self-attention mechanism.
  • the feature preprocessing module can also be used to divide the feature map sequence output by the feature map extraction module into feature map subsequences, and the divided feature map subsequences can be directly output.
  • the feature preprocessing module can also be used to: divide the feature map subsequences for the feature map sequence output by the feature map extraction module; for each divided feature map subsequence, add classification initial features and/or position codes and output them.
  • the sliding window mechanism can be used to divide the feature map subsequences.
  • the feature preprocessing module can be used to: for the feature map sequence output by the feature map extraction module, determine the first m feature maps as the feature map subsequence contained in the current sliding window, m≥1; then perform the following steps in a loop until the current sliding window can no longer move backward: for the feature map subsequence contained in the current sliding window, add classification initial features and/or position coding and output the result; move the sliding window backward by the preset sliding step.
  • the self-attention feature extraction module can be used to: take the position-coded feature map sequence or feature map subsequence output by the feature preprocessing module, perform feature extraction based on the self-attention mechanism, and output classification features.
  • Whether feature extraction is performed from the spatial dimension or the time dimension can be determined based on the added position coding.
  • alternatively, the self-attention feature extraction module can be used to: add position coding to the feature map sequence or feature map subsequence output by the feature preprocessing module, perform feature extraction based on the self-attention mechanism, and output classification features.
  • the self-attention feature extraction module includes a preset self-attention module, or a plurality of cascaded preset self-attention modules.
  • the prediction module can be used to predict the shot recognition result based on the classification features output by the self-attention feature extraction module.
  • a fully connected network can be used for prediction.
  • Figure 2 is a schematic structural diagram of a shot classification network according to an embodiment of the present invention.
  • the shot classification network can include: feature map extraction module, feature preprocessing module, self-attention feature extraction module and prediction module.
  • the output of the feature map extraction module can be cascaded to the input of the feature preprocessing module.
  • the output of the feature preprocessing module can be cascaded to the input of the self-attention feature extraction module.
  • the self-attention feature extraction module can include multiple cascaded preset self-attention modules, and the output of the self-attention feature extraction module can be cascaded to the input of the prediction module.
  • the backboard area image sequence can be input into the shot classification network, that is, into the feature map extraction module.
  • the output of the shot classification network, that is, the output of the prediction module, is the shot recognition result, which can be used to determine whether a shot was successful.
  • Figure 3 is a schematic principle diagram of a shot classification network according to an embodiment of the present invention.
  • the shot classification network can include: feature map extraction module, feature preprocessing module, self-attention feature extraction module and prediction module.
  • the feature map extraction module can be used to: for each backboard area image in the input backboard area image sequence, use the pre-trained two-dimensional CNN network to extract a two-dimensional feature map, thereby obtaining and outputting the feature map sequence.
  • the output of the feature map extraction module can be cascaded to the input of the feature preprocessing module.
  • the feature preprocessing module can be used to: divide the input feature map sequence into multiple feature map subsequences based on the sliding window mechanism. For each feature map in each feature map subsequence, it is converted into one-dimensional features, and then all the converted one-dimensional features are stacked to obtain two-dimensional features. Then, classification initial features and position coding are added to the two-dimensional features.
  • the feature preprocessing module can be used to: output two-dimensional features that add classification initial features and position encoding.
  • the output of the feature preprocessing module can be cascaded to the input of the self-attention feature extraction module.
  • the sliding window length is 3 and the sliding step size is 1, and 8 feature map subsequences can be divided.
  • the classification initial features can be added from the time dimension to obtain 4*n two-dimensional features, and then position coding can be added. You can also add classification initial features from the spatial dimension to obtain 3*(n+1) two-dimensional features, and then add position coding.
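  • As a worked example (the sequence length of 10 feature maps is an assumption consistent with the 8 subsequences above): with a window length of 3 and a sliding step of 1, the number of subsequences is (10 - 3) / 1 + 1 = 8; and if each subsequence stacks into a 3*n two-dimensional feature, prepending the classification initial feature in the time dimension yields a 4*n feature, while prepending it in the spatial dimension yields 3*(n+1).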
  • the position coding can include: information characterizing the spatial position relationship between different feature points of each feature map in the feature map sequence, and characterizing different feature maps in the feature map sequence. information about the temporal relationship between them.
  • the self-attention feature extraction module can include 3 cascaded preset self-attention modules.
  • in each preset self-attention module, features can be extracted once from the spatial dimension based on the self-attention mechanism for the input features, and then extracted once from the time dimension based on the self-attention mechanism.
  • the output of the self-attention feature extraction module can be the classification features. Specifically, from the features output by the last cascaded preset self-attention module, the current representation corresponding to the classification initial feature is determined as the classification feature.
  • the current representation corresponding to the classification initial feature can also be pooled to obtain a fixed-size feature, which is determined as a classification feature.
  • the output of the self-attention feature extraction module can be cascaded to the input of the prediction module.
  • the prediction module can be used to predict the shot recognition result, using the pre-trained fully connected network, based on the classification features output by the self-attention feature extraction module.
  • the embodiment of the present invention also provides a specific method embodiment.
  • this method embodiment proposes a vision-based shot recognition algorithm, which can run on a mobile phone or an ordinary personal computer. Shot recognition can be realized using a mobile phone camera or a stadium surveillance camera, which has the advantages of simple operation and low cost.
  • shot recognition is used to determine whether a player's shot scores.
  • the shot recognition algorithm allows automatic scoring, letting players focus on the game.
  • the video data of the basketball game can be divided into multiple video segments based on the sliding window mechanism.
  • the length of the sliding window may be 2 seconds, and the sliding step may be 1 second.
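  • A minimal sketch of this segmentation, with the 2-second window and 1-second step from above (the helper name is hypothetical):

```python
def split_video_segments(duration_s, window_s=2.0, step_s=1.0):
    """Divide a video of `duration_s` seconds into overlapping segments
    using a sliding window (2 s window, 1 s step by default)."""
    segments, start = [], 0.0
    while start + window_s <= duration_s:
        segments.append((start, start + window_s))
        start += step_s
    return segments

# e.g. a 5 s clip -> [(0.0, 2.0), (1.0, 3.0), (2.0, 4.0), (3.0, 5.0)]
```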
  • a target detection network trained on an open-source architecture (such as YOLOX or RetinaNet) can be used to detect backboards.
  • the shot classification network is a custom deep neural network, which includes a two-dimensional convolutional network and a self-attention mechanism.
  • two-dimensional convolutional networks have proven their advantages of high performance and fast speed in image feature extraction over many years of practice, for example on image data sets such as ImageNet.
  • the self-attention module has shown great potential in processing sequence data in natural language.
  • the shot classification network can be divided into two sections during network design.
  • the first section uses a two-dimensional convolutional network to extract underlying image-level features.
  • the second section uses self-attention to fuse multi-frame image features in the time domain.
  • the spatial features can be fine-tuned through transposition + self-attention mechanism.
  • the extracted features can be reused in subsequent processing, reducing unnecessary repeated calculations.
  • the self-attention in the second section can effectively perform multi-frame image fusion processing in the time domain to obtain optimal results.
  • Each frame of image can be processed using a three-layer two-dimensional convolutional network with shared weights.
  • the kernel size of each layer is 3; the stride of the first layer is 2, and the stride of the last two layers is 1.
  • each layer applies batch normalization, uses ReLU as the activation function, and outputs 32, 64, and 128 channels respectively.
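  • Under the stated hyper-parameters, the per-frame network could be sketched in PyTorch as follows; the RGB input (3 channels) and the padding of 1 are assumptions:

```python
import torch.nn as nn

# kernel size 3; strides 2, 1, 1; batch norm + ReLU; 32/64/128 output channels
frame_cnn = nn.Sequential(
    nn.Conv2d(3, 32, kernel_size=3, stride=2, padding=1),
    nn.BatchNorm2d(32), nn.ReLU(),
    nn.Conv2d(32, 64, kernel_size=3, stride=1, padding=1),
    nn.BatchNorm2d(64), nn.ReLU(),
    nn.Conv2d(64, 128, kernel_size=3, stride=1, padding=1),
    nn.BatchNorm2d(128), nn.ReLU(),
)
# Weight sharing across frames: fold time into the batch dimension, e.g.
# feats = frame_cnn(frames.reshape(B * T, 3, H, W))
```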
  • the features are organized as T x S x C time-dimension data, which is processed simultaneously, where T is the number of image frames (typical value 32), S is the spatial dimension (typical value 1024), and C is the number of feature channels, i.e. the number of output channels of the above convolutional network (typical value 128).
  • a self-attention module can be used to process the spliced features after position encoding and classification initial features are added; specifically, features can be extracted from the spatial dimension based on the self-attention mechanism. After this processing is completed, the extracted features are transposed to S x T x C and another self-attention module is used to process them, specifically extracting features from the time dimension based on the self-attention mechanism. After that processing is completed, the feature sequence is restored to T x S x C.
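  • A sketch of this transpose trick with the typical shapes above; the head count is an assumption, and nn.MultiheadAttention stands in for whatever attention block the network actually uses:

```python
import torch
import torch.nn as nn

spatial_attn = nn.MultiheadAttention(embed_dim=128, num_heads=4, batch_first=True)
temporal_attn = nn.MultiheadAttention(embed_dim=128, num_heads=4, batch_first=True)

x = torch.randn(32, 1024, 128)   # (T, S, C) with the typical values above
x, _ = spatial_attn(x, x, x)     # attend across S within each frame
x = x.transpose(0, 1)            # transpose to (S, T, C)
x, _ = temporal_attn(x, x, x)    # attend across T at each spatial position
x = x.transpose(0, 1)            # restore (T, S, C)
```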
  • the output of the classification network indicates whether the corresponding shot is scored; if a goal is scored, the corresponding points are counted.
  • the input used is multiple continuous frame images around the backboard.
  • the backboard is a large target and does not move, so the detection difficulty is extremely low; there is no need to explicitly detect the basketball or the basket.
  • This method embodiment provides a vision-based shot recognition algorithm, which can determine whether a goal is scored based on the video input of a camera, thereby achieving automatic scoring of the game. It can run directly on a mobile phone or an ordinary personal computer, and has the advantages of simple deployment and portability.
  • the embodiment of the present invention also provides an apparatus embodiment.
  • Figure 4 is a schematic structural diagram of a shot recognition device according to an embodiment of the present invention.
  • the device may include the following units.
  • the backboard identification unit 401 is used to: obtain the video to be identified; determine the backboard area in the video to be identified, and obtain an image sequence of the backboard area to be identified.
  • the classification network unit 402 is used to: input the image sequence of the backboard area to be identified into the shot classification network, and obtain the shot recognition result output by the shot classification network; the shot recognition result is used to characterize whether the shot is successful.
  • Embodiments of the present invention also provide a computer device, which at least includes a memory, a processor, and a computer program stored in the memory and executable on the processor, wherein when the processor executes the program, any one of the above method embodiments is implemented.
  • An embodiment of the present invention also provides an electronic device, including: at least one processor; and a memory communicatively connected to the at least one processor; wherein the memory stores instructions executable by the at least one processor, and the instructions are executed by the at least one processor so that the at least one processor can perform any of the above method embodiments.
  • Figure 5 is a schematic hardware structure diagram of a computer device configured with a method according to an embodiment of the present invention.
  • the device may include: a processor 1010, a memory 1020, an input/output interface 1030, a communication interface 1040 and a bus 1050.
  • the processor 1010, the memory 1020, the input/output interface 1030 and the communication interface 1040 implement communication connections between each other within the device through the bus 1050.
  • the processor 1010 can be implemented using a general-purpose CPU (Central Processing Unit), a microprocessor, an application-specific integrated circuit (ASIC), or one or more integrated circuits, and is used to execute relevant programs to implement the technical solutions provided by the embodiments of the present invention.
  • the memory 1020 can be implemented in the form of ROM (Read Only Memory), RAM (Random Access Memory), static storage device, dynamic storage device, etc.
  • the memory 1020 can store operating systems and other application programs. When the technical solution provided by the embodiment of the present invention is implemented through software or firmware, the relevant program code is stored in the memory 1020 and called and executed by the processor 1010.
  • the input/output interface 1030 is used to connect the input/output module to realize information input and output.
  • the input/output module can be configured in the device as a component (not shown in the figure), or can be externally connected to the device to provide corresponding functions.
  • Input devices can include keyboards, mice, touch screens, microphones, various sensors, etc., and output devices can include monitors, speakers, vibrators, indicator lights, etc.
  • the communication interface 1040 is used to connect a communication module (not shown in the figure) to realize communication interaction between this device and other devices.
  • the communication module can realize communication through wired means (such as USB, network cable, etc.) or wireless means (such as mobile network, WIFI, Bluetooth, etc.).
  • Bus 1050 includes a path that carries information between various components of the device (eg, processor 1010, memory 1020, input/output interface 1030, and communication interface 1040).
  • although the above device only shows the processor 1010, the memory 1020, the input/output interface 1030, the communication interface 1040 and the bus 1050, during specific implementation the device may also include other components necessary for normal operation.
  • the above-mentioned device may also include only the components necessary to implement the embodiments of the present invention, and does not necessarily include all the components shown in the figures.
  • Embodiments of the present invention also provide a computer-readable storage medium on which a computer program is stored. When the program is executed by a processor, any one of the above method embodiments can be implemented.
  • Computer-readable media include permanent and non-permanent, removable and non-removable media, in which information storage can be implemented by any method or technology.
  • Information may be computer-readable instructions, data structures, modules of programs, or other data.
  • Examples of computer storage media include, but are not limited to, phase change memory (PRAM), static random access memory (SRAM), dynamic random access memory (DRAM), other types of random access memory (RAM), read-only memory (ROM), electrically erasable programmable read-only memory (EEPROM), flash memory or other memory technology, compact disc read-only memory (CD-ROM), digital versatile disc (DVD) or other optical storage, magnetic tape cartridges, magnetic disk storage or other magnetic storage devices, or any other non-transmission medium, which may be used to store information that can be accessed by a computing device.
  • computer-readable media does not include transient computer-readable media (transitory media), such as modulated data signals and carrier waves.
  • the embodiments of the present invention can be implemented by means of software plus a necessary general hardware platform. Based on this understanding, the essence or the contribution part of the technical solutions of the embodiments of the present invention can be embodied in the form of software products.
  • the computer software product can be stored in a storage medium, such as ROM/RAM, a magnetic disk, or an optical disc, and includes a number of instructions to cause a computer device (which may be a personal computer, a server, or a network device, etc.) to execute the methods described in the various embodiments, or certain parts of the embodiments, of the present invention.
  • a typical implementation device is a computer, which may be in the form of a personal computer, a laptop, a cellular phone, a camera phone, a smart phone, a personal digital assistant, a media player, a navigation device, an email transceiver, a game console, a tablet computer, a wearable device, or a combination of any of these devices.
  • each embodiment in this specification is described in a progressive manner; for the same or similar parts among the various embodiments, reference may be made to one another, and each embodiment focuses on its differences from the other embodiments. Where embodiments are similar, the description is kept relatively brief.
  • the device embodiments described above are only illustrative.
  • the modules described as separate components may or may not be physically separated.
  • the functions of each module may be integrated in the same device or implemented in multiple pieces of software and/or hardware. Some or all of the modules can also be selected according to actual needs to achieve the purpose of the solution of this embodiment, and persons of ordinary skill in the art can understand and implement it without creative effort.
  • terms such as "first" and "second" are used for descriptive purposes only and cannot be understood as indicating or implying relative importance.
  • "plurality" refers to two or more, unless expressly limited otherwise.


Abstract

Disclosed in the present invention are a basketball shot recognition method and apparatus, a device and a storage medium. The method comprises: acquiring a video to be recognized; determining a backboard region in the video to be recognized, so as to obtain a backboard region image sequence to be recognized; and inputting into a basketball shot classification network the backboard region image sequence to be recognized, and acquiring a basketball shot recognition result output by the basketball shot classification network, the basketball shot recognition result being used for representing whether a basketball shot is successful.

Description

A shot recognition method, apparatus, device and storage medium

Technical Field

The present invention relates to the field of computer application technology, and in particular to a shot recognition method, apparatus, device and storage medium.

Background

In basketball games, referees usually need to manually judge whether a shot is scored. However, in many cases it is difficult to find a referee to make the judgment. For example, in amateur games or basketball training, considering the labor cost of referees, it is often difficult to find a referee, and players have to distract themselves from play to judge whether a shot scores.

In the related art, sensors can be installed on the backboard to automatically determine whether a shot is scored. However, the deployment cost of this approach is relatively high, and a low-cost shot recognition method is urgently needed.

Summary of the Invention

The present invention provides a shot recognition method, apparatus, device and storage medium to address the deficiencies in the related art.

According to a first aspect of the embodiments of the present invention, a shot recognition method is provided, including:

acquiring a video to be identified;

determining the backboard area in the video to be identified to obtain a backboard area image sequence to be identified;

inputting the backboard area image sequence to be identified into a shot classification network, and obtaining a shot recognition result output by the shot classification network, where the shot recognition result is used to represent whether a shot is successful.
Optionally, the shot classification network is used to:

extract a feature map for each backboard area image in the input backboard area image sequence to obtain a feature map sequence; extract classification features for the feature map sequence based on a self-attention mechanism; and determine the shot recognition result based on the classification features.

Optionally, extracting classification features for the feature map sequence based on the self-attention mechanism includes:

adding position coding to the feature map sequence, where the position coding includes information characterizing the spatial position relationship between different feature points of each feature map in the feature map sequence, and information characterizing the temporal position relationship between different feature maps in the feature map sequence;

extracting features from the spatial dimension and the temporal dimension based on the self-attention mechanism for the feature map sequence after position coding is added, to obtain classification features.

Optionally, adding position coding to the feature map sequence includes:

converting each feature map in the feature map sequence into a one-dimensional feature;

stacking the one-dimensional features to obtain a two-dimensional feature;

adding position coding to the two-dimensional feature.

Optionally, the shot classification network includes N cascaded preset self-attention modules, N≥2; for the i-th preset self-attention module, 1≤i≤N-1, its output is cascaded to the input of the (i+1)-th preset self-attention module; the preset self-attention modules are used to extract features from the spatial dimension and the temporal dimension based on the self-attention mechanism;

extracting features from the spatial and temporal dimensions based on the self-attention mechanism for the feature map sequence after position coding is added includes:

inputting the feature map sequence after position coding into the first preset self-attention module, and determining the classification features based on the output of the N-th preset self-attention module.

Optionally, the preset self-attention module is used to:

for input features, serially extract features at least once from the spatial dimension based on the self-attention mechanism, then serially extract features at least once from the temporal dimension based on the self-attention mechanism for the extracted features, and output the extracted features;

or,

for input features, serially extract features at least once from the temporal dimension based on the self-attention mechanism, then serially extract features at least once from the spatial dimension based on the self-attention mechanism for the extracted features, and output the extracted features.
Optionally, extracting classification features for the feature map sequence based on the self-attention mechanism includes:

adding classification initial features to the feature map sequence;

extracting features based on the self-attention mechanism for the feature map sequence after the classification initial features are added;

determining, from the extracted features, the current representation corresponding to the classification initial features as the classification feature.

Optionally, determining the shot recognition result based on the classification features includes:

performing pooling processing on the classification features to obtain to-be-input features of a preset feature size;

inputting the to-be-input features into a pre-trained fully connected network, and obtaining the shot recognition result output by the fully connected network.

Optionally, extracting classification features for the feature map sequence based on the self-attention mechanism includes:

for the feature map sequence, determining the first m feature maps as the feature map subsequence contained in the current sliding window, m≥1;

performing the following steps in a loop until the current sliding window can no longer move backward: extracting classification features based on the self-attention mechanism for the feature map subsequence contained in the current sliding window; moving the sliding window backward by a preset sliding step.

Optionally, obtaining the backboard area image sequence to be identified includes:

for each determined backboard area, cropping the image content containing the backboard area, adjusting the cropping result to a preset image size, and adding the adjustment result to the backboard area image sequence to be identified, in which the backboard area images are ordered by the temporal order of the video frames in which they are located.
According to a second aspect of the embodiments of the present invention, a shot recognition apparatus is provided, including:

a backboard identification unit configured to: acquire a video to be identified; determine the backboard area in the video to be identified, and obtain a backboard area image sequence to be identified;

a classification network unit configured to: input the backboard area image sequence to be identified into a shot classification network, and obtain a shot recognition result output by the shot classification network, where the shot recognition result is used to characterize whether a shot is successful.

According to the above embodiments, by performing shot recognition on the video to be identified, there is no need to pre-deploy hardware such as sensors, which reduces deployment costs.

It should be understood that the above general description and the following detailed description are exemplary and explanatory only, and do not limit the present invention.

Brief Description of the Drawings

The accompanying drawings, which are incorporated in and constitute a part of this specification, illustrate embodiments consistent with the invention and, together with the description, serve to explain the principles of the invention.
Figure 1 is a schematic flow chart of a shot recognition method according to an embodiment of the present invention;

Figure 2 is a schematic structural diagram of a shot classification network according to an embodiment of the present invention;

Figure 3 is a schematic principle diagram of a shot classification network according to an embodiment of the present invention;

Figure 4 is a schematic structural diagram of a shot recognition apparatus according to an embodiment of the present invention;

Figure 5 is a schematic hardware structure diagram of a computer device configured with a method according to an embodiment of the present invention.

Detailed Description

Exemplary embodiments will be described in detail herein, examples of which are illustrated in the accompanying drawings. When the following description refers to the drawings, the same numbers in different drawings refer to the same or similar elements unless otherwise indicated. The implementations described in the following exemplary embodiments do not represent all implementations consistent with the invention; rather, they are merely examples of apparatus and methods consistent with aspects of the invention as detailed in the appended claims.
In basketball games, referees usually need to manually judge whether a shot is scored. However, in many cases it is difficult to find a referee to make the judgment. For example, in amateur games or basketball training, considering the labor cost of referees, it is often difficult to find a referee, and players have to distract themselves from play to judge whether a shot scores.

In the related art, sensors can be installed on the backboard to automatically determine whether a shot is scored. However, the deployment cost of this approach is relatively high, and a low-cost shot recognition method is urgently needed.

In order to reduce the cost of shot recognition, the present invention provides a shot recognition method.

In this method, videos captured during basketball games can be analyzed to determine whether a shot scores.

Compared with approaches such as deploying sensors, shooting video for shot recognition does not require pre-deployed hardware, which reduces deployment costs.

Furthermore, since judging whether a shot scores usually amounts to judging whether the basketball passes through the basket in the backboard area, the image content containing the backboard area is closely related to whether the basketball enters the basket, while image content outside the backboard area is only weakly related.

For example, continuous image content containing the backboard area can be used to judge whether the basket in the backboard area vibrates, whether the net under the basket sways, the trajectory of the basketball within the backboard area, and so on, and can therefore be used to judge whether a shot scores.

Therefore, in this method, the backboard area in the captured video can be determined and assembled into an image sequence for shot recognition.

By determining the backboard area, the amount of data that needs to be processed can be reduced without lowering the accuracy of shot recognition, improving the efficiency of shot recognition and reducing its computational cost.

Specifically, deep learning can be used for the recognition.

The above method performs shot recognition on video without pre-deployed sensors, reducing deployment costs; and by determining the backboard area in the video, it removes redundant data and thereby improves the efficiency of shot recognition.

A shot recognition method provided by an embodiment of the present invention is explained in detail below.
如图1所示,图1是根据本发明实施例示出的一种投篮识别方法的流程示意图。As shown in Figure 1, Figure 1 is a schematic flow chart of a shot recognition method according to an embodiment of the present invention.
本发明实施例并不限定本方法流程的执行主体。可选地,执行主体可以是移动设备,也可以是服务端。The embodiments of the present invention do not limit the execution subject of the method flow. Optionally, the execution subject can be a mobile device or a server.
该方法可以包括以下步骤。The method may include the following steps.
S101:获取待识别视频。S101: Obtain the video to be identified.
S102:确定待识别视频中的篮板区域,得到待识别篮板区域图像序列。S102: Determine the backboard area in the video to be identified, and obtain an image sequence of the backboard area to be identified.
S103:将待识别篮板区域图像序列输入投篮分类网络,获取投篮分类网络输出的投篮识别结果。S103: Input the image sequence of the backboard area to be identified into the shot classification network, and obtain the shot recognition result output by the shot classification network.
可选地,投篮分类网络可以用于针对输入的篮板区域图像序列,预测投篮识别结果。投篮识别结果具体可以用于表征投篮是否成功。Alternatively, the shot classification network can be used to predict the shot recognition results for the input backboard area image sequence. The shot recognition result can specifically be used to characterize whether the shot is successful.
可选地,可以预先根据样本特征为篮板区域图像序列的投篮样本,和对应的投篮标签训练投篮分类网络;投篮标签用于表征投篮是否成功。Optionally, the shot classification network can be trained in advance based on the shot samples whose sample characteristics are the backboard area image sequence and the corresponding shot labels; the shot labels are used to characterize whether the shot is successful.
其中,投篮识别的结果可以用于计算篮球比赛的得分。Among them, the results of shot recognition can be used to calculate the score of basketball games.
In the above method flow, shot recognition is performed on the video to be recognized, so there is no need to deploy hardware such as sensors in advance; shot recognition is carried out directly by software algorithms, and all that is required is a device capable of capturing video, which may specifically be the camera of a handheld device, a surveillance camera at a basketball court, or the like, thereby reducing deployment cost.
In addition, shot recognition can be performed on the backboard area determined from the video to be recognized, reducing redundant data and thus improving the efficiency of shot recognition.
Performing shot recognition through deep learning can improve both the efficiency and the accuracy of shot recognition.
Each step is explained in detail below.
1. S101: Obtain a video to be recognized.
This method flow does not limit the specific way of obtaining the video to be recognized.
Optionally, the video to be recognized may specifically be surveillance video of a basketball game, or video of a basketball game captured by the camera of a handheld device.
Optionally, the video to be recognized may be a video clip of a basketball game, so as to recognize whether a shot in that clip scored. Specifically, the video to be recognized may be a shot-attempt clip from a basketball game, so that whether the shot scored can be recognized directly and quickly.
For example, the video to be recognized may be a 2-4 second clip of a shot attempt.
Optionally, the video to be recognized may be the complete video of a basketball game, so that the number of successful shots in the game can be counted and the video positions of the successful shots can be conveniently located. The specific approach is explained later.
This method flow does not limit how the video to be recognized is captured.
Optionally, it may be captured by a handheld device or from a fixed viewpoint; for example, a surveillance camera may film the basketball game, or a mobile phone may film the game after its viewpoint has been fixed.
Optionally, to facilitate the subsequent determination of the backboard area, the backboard area may be filmed from a fixed viewpoint.
2. S102: Determine the backboard area in the video to be recognized, and obtain a backboard area image sequence to be recognized.
This method flow does not limit the way of determining the backboard area.
Optionally, object detection may be used to recognize the backboard area in each video frame of the video to be recognized.
Optionally, a pre-trained backboard area detection model may be used to perform recognition on each video frame of the video to be recognized and determine the backboard area.
The backboard area detection model may be trained using image samples and corresponding backboard area position labels.
Optionally, the backboard area detection model may specifically be an object detection network trained based on YOLOX or RetinaNet.
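As a concrete illustration, the following is a minimal sketch of such per-frame detection, assuming a hypothetical RetinaNet fine-tuned so that class index 1 means "backboard"; the torchvision constructor is real, but the weights, class index, and 0.5 score threshold are illustrative assumptions rather than part of this disclosure.

```python
import torch
import torchvision

# Hypothetical fine-tuned detector: class 1 = "backboard". weights=None only
# builds the architecture; in practice the fine-tuned weights would be loaded.
model = torchvision.models.detection.retinanet_resnet50_fpn(weights=None, num_classes=2)
model.eval()

def detect_backboards(frames, score_thresh=0.5):
    """frames: list of (3, H, W) float tensors in [0, 1], one per video frame.
    Returns, per frame, the (x1, y1, x2, y2) boxes of candidate backboard areas."""
    with torch.no_grad():
        outputs = model(frames)  # list of {"boxes", "scores", "labels"}
    return [out["boxes"][out["scores"] > score_thresh] for out in outputs]
```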
As for the backboard area image sequence to be recognized, optionally, the video frames determined to contain the backboard area may be added to the backboard area image sequence to be recognized.
Optionally, to further reduce redundant data, the video frames containing the backboard area may be cropped to obtain the image content containing the backboard area, which is then added to the backboard area image sequence to be recognized.
This embodiment does not limit the specific cropping method, as long as the cropping result contains the determined backboard area image.
Optionally, to make it convenient to use the image data near the backboard area for shot recognition, when cropping the video frames containing the backboard area, the size of the cropping result may be made larger than the size of the determined backboard area, so that the cropping result contains image data near the backboard area.
Optionally, for consecutive video frames containing the backboard area, a rectangular box position in the video frames may be determined such that, across the consecutive frames, every determined backboard area is contained within the rectangular box. Cropping can then be performed according to the rectangular box position.
Specifically, the rectangular box may be enlarged before cropping, so that the image content near the hoop area is obtained by cropping.
Specifically, for consecutive video frames, the minimum bounding box enclosing the backboard areas detected at the same position across these frames may be determined; the bounding box is then enlarged by a factor of 1.5, and the corresponding image is cropped from the corresponding position of each frame.
Optionally, since the cropped image contents are not of uniform size, to facilitate subsequent computation and model input, the cropping results may be adjusted so that different cropping results have the same size. Specifically, different cropping results may be adjusted to the same resolution.
Therefore, optionally, obtaining the backboard area image sequence to be recognized may include: for each determined backboard area, cropping the image content containing that backboard area, adjusting the cropping result to a preset image size, and adding the adjusted result to the backboard area image sequence to be recognized.
Optionally, different backboard areas in the backboard area image sequence to be recognized may have the same image size. Obtaining the backboard area image sequence to be recognized may include: for the different determined backboard areas, cropping out the image contents containing the different backboard areas, adjusting them to the same image size, and adding the adjusted results to the backboard area image sequence to be recognized.
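A minimal sketch of the cropping procedure described above, assuming that detections of the same backboard are given per frame as (x1, y1, x2, y2) boxes; the 1.5 enlargement factor follows the example in the text, while the 224 x 224 output size stands in for the unspecified "preset image size" and is an illustrative assumption:

```python
import numpy as np
import cv2

def crop_backboard_sequence(frames, boxes, out_size=(224, 224), scale=1.5):
    """frames: list of H x W x 3 uint8 arrays; boxes: per-frame (x1, y1, x2, y2)
    detections of the same backboard. Takes the minimum box enclosing all
    detections, enlarges it by `scale` around its center, crops every frame at
    that position, and resizes the crops to a common size."""
    b = np.asarray(boxes, dtype=np.float32)
    x1, y1 = b[:, 0].min(), b[:, 1].min()      # minimum enclosing box
    x2, y2 = b[:, 2].max(), b[:, 3].max()
    cx, cy = (x1 + x2) / 2, (y1 + y2) / 2      # enlarge around the center
    w, h = (x2 - x1) * scale, (y2 - y1) * scale
    H, W = frames[0].shape[:2]
    xa, ya = max(int(cx - w / 2), 0), max(int(cy - h / 2), 0)
    xb, yb = min(int(cx + w / 2), W), min(int(cy + h / 2), H)
    return [cv2.resize(f[ya:yb, xa:xb], out_size) for f in frames]
```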
In an optional embodiment, the backboard area images in the backboard area image sequence to be recognized may be ordered, so that shot recognition can conveniently be performed according to this order.
Optionally, the backboard area image sequence to be recognized may be ordered by the temporal order of the video frames in which the backboard area images are located.
In an optional embodiment, the backboard area determined for the video to be recognized may be the same backboard area filmed throughout the video to be recognized.
Optionally, the backboard area in the video to be recognized may be determined by the backboard area detection model, and a tracking algorithm may then be used to determine the same backboard area filmed in the video to be recognized; specifically, the same backboard area contained across different video frames of the video to be recognized.
Optionally, one or more backboard areas may be determined for the video to be recognized.
For example, the video to be recognized may keep filming the same backboard area from a fixed position, after which the shooting situation at that backboard area can conveniently be determined by the shot recognition method.
For example, the video to be recognized may capture multiple different backboard areas, after which the shooting situation at each backboard area can be determined separately by the shot recognition method.
Optionally, for each backboard area determined in the video to be recognized, a separate backboard area image sequence to be recognized may be constructed. Specifically, the image content containing that backboard area may be cropped, the cropping result adjusted to the preset image size, and the adjusted result added to that backboard area image sequence to be recognized.
The different determined backboard areas may correspond to different backboard area image sequences to be recognized, each of which can be fed to the subsequent shot classification network for shot recognition.
Optionally, each video frame of the video to be recognized may be checked for whether it contains a backboard area; then, among the frames containing a backboard area, the different video frames containing the same backboard area may be determined, so that the one or more backboard areas filmed in the video to be recognized can be determined.
Specifically, the different video frames containing the same backboard area can be determined by a tracking algorithm.
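The disclosure does not fix a particular tracking algorithm; since a backboard is essentially static in the frame, a simple greedy IoU association is one plausible sketch (the 0.5 IoU threshold is an illustrative assumption):

```python
def iou(a, b):
    """Intersection-over-union of two (x1, y1, x2, y2) boxes."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(ix2 - ix1, 0) * max(iy2 - iy1, 0)
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    return inter / (area_a + area_b - inter + 1e-9)

def group_same_backboard(per_frame_boxes, iou_thresh=0.5):
    """per_frame_boxes: list (one entry per frame) of detected boxes.
    Greedily links detections across frames whose IoU exceeds the threshold,
    yielding one track per physical backboard."""
    tracks = []  # each track: list of (frame_idx, box)
    for t, boxes in enumerate(per_frame_boxes):
        for box in boxes:
            for track in tracks:
                if iou(track[-1][1], box) > iou_thresh:
                    track.append((t, box))
                    break
            else:
                tracks.append([(t, box)])
    return tracks
```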
For the one or more backboard areas determined in the video to be recognized, this method flow does not limit the method of obtaining the backboard area image sequence.
Optionally, a backboard area image sequence to be recognized may be obtained for any one backboard area determined in the video to be recognized. Specifically, for any determined backboard area, the image content containing that backboard area may be cropped from the one or more video frames containing it, so as to obtain the backboard area image sequence to be recognized corresponding to that backboard area. Specifically, the cropped image content may be adjusted to a preset size, and the adjusted result added to the backboard area image sequence to be recognized.
Optionally, a backboard area image sequence to be recognized may be obtained separately for each backboard area determined in the video to be recognized. Specifically, for each determined backboard area, the image content containing that backboard area may be cropped from the one or more video frames containing it, so as to obtain the backboard area image sequence to be recognized corresponding to that backboard area. Specifically, the cropped image content may be adjusted to a preset size, and the adjusted result added to the backboard area image sequence to be recognized.
3. S103: Input the backboard area image sequence to be recognized into the shot classification network, and obtain the shot recognition result output by the shot classification network.
Optionally, the shot classification network may be used to predict a shot recognition result for the input backboard area image sequence. The shot recognition result may specifically be used to characterize whether the shot scored.
Optionally, the shot classification network may be trained in advance on shot samples whose sample features are backboard area image sequences, together with corresponding shot labels; a shot label may be used to characterize whether the shot scored.
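A hedged sketch of one possible training step for this binary made/missed classification, assuming the network outputs one logit per input sequence and that shot labels are encoded as 1.0 for a made shot and 0.0 for a miss; the batching, loss, and optimizer choices are illustrative assumptions, not prescribed by this disclosure:

```python
import torch.nn as nn

def training_step(model, clips, labels, optimizer):
    """clips: (B, T, C, H, W) batch of backboard area image sequences;
    labels: (B,) float tensor, 1.0 = made shot, 0.0 = miss."""
    criterion = nn.BCEWithLogitsLoss()
    logits = model(clips).squeeze(-1)  # (B,) raw scores, one per sequence
    loss = criterion(logits, labels)
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
    return loss.item()
```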
Among other uses, the shot recognition results can be used to compute the score of a basketball game.
The shot classification network is explained in detail below.
1) Self-attention mechanism.
This method flow does not specifically limit how the shot classification network processes the backboard area image sequence to be recognized.
Optionally, the shot classification network may perform image recognition on each backboard area in the backboard area image sequence to be recognized separately, to recognize whether the shot scored.
Optionally, the shot classification network may combine at least two consecutive backboard areas in the backboard area image sequence to be recognized for image recognition, to recognize whether the shot scored. Specifically, 8 or more consecutive frames of backboard area images may be combined for image recognition.
Since determining that a shot scored requires determining that the basketball passed through the hoop in the backboard area, and the basketball passing through the hoop is usually a motion process, combining multiple consecutive frames for shot recognition can improve the accuracy of shot recognition.
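For instance, grouping the sequence into runs of consecutive images could look like the following sketch; the window length of 8 follows the example above, while the stride is an illustrative assumption:

```python
def consecutive_windows(images, window=8, stride=1):
    """Split the backboard area image sequence into runs of `window`
    consecutive images, so the network can reason over the ball's motion
    rather than a single frame."""
    return [images[i:i + window] for i in range(0, len(images) - window + 1, stride)]
```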
In an optional embodiment, for at least two consecutive backboard area images in the backboard area image sequence to be recognized, the shot classification network may use a self-attention mechanism to extract features for shot recognition.
Based on the self-attention mechanism, the associations between the different backboard area images among the at least two consecutive backboard area images can be learned; used as features for shot recognition, these can better distinguish successful shots from missed shots, thereby improving the accuracy of shot recognition.
Therefore, optionally, the shot classification network may be used to: extract classification features based on a self-attention mechanism for the input backboard area image sequence; and determine the shot recognition result according to the classification features.
The classification features may be the features used to predict the shot recognition result.
This method flow does not limit the specific way of extracting features based on the self-attention mechanism.
In an optional embodiment, to facilitate extracting classification features based on the self-attention mechanism, feature maps may first be extracted from the backboard area images in the backboard area image sequence, and features may then be extracted from the feature maps based on the self-attention mechanism.
Optionally, extracting classification features based on the self-attention mechanism for the input backboard area image sequence may include: extracting a feature map from each backboard area image in the input backboard area image sequence to obtain a feature map sequence; and extracting classification features from the feature map sequence based on the self-attention mechanism.
The order of the feature maps in the feature map sequence may be the same as the order of the corresponding backboard area images in the backboard area image sequence.
Optionally, the order of the feature maps in the feature map sequence may be the same as the temporal order of the video frames in which the corresponding backboard area images are located.
In this embodiment, by extracting feature maps from the backboard area images, more of the information in the backboard area images can be learned and mined, so that classification features can be extracted better based on the self-attention mechanism and the accuracy of shot recognition can be improved.
Of course, optionally, features may also be extracted directly from the backboard area image sequence based on the self-attention mechanism.
This embodiment does not limit the way of extracting feature maps from the backboard area images.
Optionally, a pre-trained convolutional network may be used to extract the feature maps. The extracted feature maps may include at least one of the following kinds of information: detail information, edge information, noise information, spatial relationship information, and so on. This embodiment imposes no specific limitation.
Therefore, optionally, extracting a feature map from each backboard area image in the input backboard area image sequence may include: for each backboard area image in the input backboard area image sequence, extracting a feature map based on a pre-trained convolutional network.
This embodiment does not limit the structure of the convolutional network. It may specifically be a two-dimensional convolutional network, which has the advantages of high performance and high speed in image feature extraction.
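As one possible realization, the sketch below uses a ResNet-18 trunk from torchvision as the two-dimensional convolutional network; the specific backbone and the 224 x 224 input resolution are illustrative assumptions, since the disclosure does not fix the architecture:

```python
import torch
import torch.nn as nn
import torchvision

# 2D convolutional backbone: a ResNet-18 with its average-pooling and
# fully-connected head removed, so it outputs spatial feature maps.
resnet = torchvision.models.resnet18(weights=None)
backbone = nn.Sequential(*list(resnet.children())[:-2])

def extract_feature_maps(images: torch.Tensor) -> torch.Tensor:
    """images: (T, 3, 224, 224) backboard area image sequence.
    Returns (T, 512, 7, 7) feature maps, one per image, in the same order."""
    with torch.no_grad():
        return backbone(images)
```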
2) Temporal dimension and spatial dimension.
When extracting features based on the self-attention mechanism, it is usually necessary to add a position encoding to the input sequence to determine the positional relationships between the different elements of the sequence, so that the associations between the different elements of the sequence can be learned.
a) Temporal dimension.
In an optional embodiment, the shot classification network may extract features from the temporal dimension based on the self-attention mechanism.
For the backboard area image sequence to be recognized, since the backboard area images are determined from video frames, and the video frames have a temporal order in the video to be recognized, a temporal order, that is, a temporal positional relationship, can be determined between the different backboard area images in the sequence. Features can therefore be extracted in the temporal dimension based on the self-attention mechanism, learning the associations between different backboard area images in the temporal dimension.
Through the self-attention mechanism, the several backboard area images most strongly associated with the shot recognition result in the temporal dimension can be learned.
Optionally, the position encoding may include information characterizing the temporal positional relationships between different backboard area images in the backboard area image sequence to be recognized. The temporal positional relationships between different backboard area images may be the same as the temporal positional relationships between the video frames in which the backboard area images are located.
Correspondingly, optionally, the shot classification network may be used to: add the above position encoding to the backboard area image sequence to be recognized, determining the temporal positional relationships between the different backboard area images; and, for the backboard area image sequence to be recognized with the position encoding added, extract features from the temporal dimension based on the self-attention mechanism to obtain the classification features.
Optionally, since the shot classification network may first extract feature maps from the backboard area images to obtain a feature map sequence, the shot classification network may be used to: add a position encoding to the feature map sequence, where the position encoding includes information characterizing the temporal positional relationships between different feature maps in the feature map sequence; the temporal positional relationships between different feature maps may be the same as the temporal positional relationships between the video frames in which the feature maps' corresponding backboard area images are located, that is, the temporal positional relationships between the feature maps' corresponding video frames.
Furthermore, the shot classification network may be used to extract features from the temporal dimension based on the self-attention mechanism for the feature map sequence with the position encoding added, obtaining the classification features.
In this embodiment, features can be extracted from the temporal dimension based on the self-attention mechanism, improving the feature extraction effect of the shot classification network and improving its recognition accuracy.
Optionally, the position encoding may include information characterizing the temporal positional relationships between different backboard area images in the backboard area image sequence to be recognized; specifically, it may include the timestamp, within the video to be recognized, of the video frame in which each backboard area image is located.
In this embodiment, it can be determined based on the timestamps whether adjacent backboard area images come from consecutive or nearby video frames; in other words, information characterizing the temporal positional relationships between the video frames in which the backboard area images are located can be added as the position encoding, so that these temporal positional relationships can be exploited to improve the feature extraction effect of the shot classification network and improve its recognition accuracy.
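One common way to realize such a timestamp-based position encoding is a sinusoidal code evaluated at the frame timestamps, so that consecutive or nearby frames receive nearby codes; the sinusoidal form itself is an illustrative assumption, since the disclosure only requires that the encoding carry the temporal positional information:

```python
import math
import torch

def temporal_encoding(timestamps: torch.Tensor, dim: int) -> torch.Tensor:
    """Sinusoidal position encoding evaluated at the frame timestamps
    (frame indices or seconds). timestamps: (T,); returns (T, dim).
    `dim` is assumed to be even."""
    pos = timestamps.float().unsqueeze(1)  # (T, 1)
    div = torch.exp(torch.arange(0, dim, 2).float() * (-math.log(10000.0) / dim))
    pe = torch.zeros(len(timestamps), dim)
    pe[:, 0::2] = torch.sin(pos * div)
    pe[:, 1::2] = torch.cos(pos * div)
    return pe
```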
b) Spatial dimension.
In another optional embodiment, the shot classification network may extract features from the spatial dimension based on the self-attention mechanism.
For a single backboard area image in the backboard area image sequence to be recognized, spatial positional relationships exist between the image contents at different positions.
Optionally, spatial positional relationships exist between different pixels of a single backboard area image, which may specifically be characterized by two-dimensional spatial coordinates.
Therefore, for a backboard area image, features can be extracted in the spatial dimension based on the self-attention mechanism, learning the associations between different image contents within a single backboard area image.
Through the self-attention mechanism, the image content of a backboard area image most strongly associated with the shot recognition result in the spatial dimension can be learned.
By extracting features in the spatial dimension based on the self-attention mechanism for each backboard area image in the backboard area image sequence to be recognized, the image content of each backboard area image most strongly associated with the shot recognition result in the spatial dimension can be learned.
Optionally, the position encoding may include information characterizing the spatial positional relationships between image contents at different positions within each backboard area image in the backboard area image sequence to be recognized.
Specifically, it may include information characterizing the spatial positional relationships between different pixels within each backboard area image in the backboard area image sequence to be recognized.
Correspondingly, optionally, the shot classification network may be used to: add the above position encoding to the backboard area image sequence to be recognized, determining the spatial positional relationships between image contents at different positions within each backboard area image; and, for the backboard area image sequence to be recognized with the position encoding added, extract features from the spatial dimension based on the self-attention mechanism to obtain the classification features.
Optionally, since the shot classification network may first extract feature maps from the backboard area images to obtain a feature map sequence, the shot classification network may be used to: add a position encoding to the feature map sequence, where the position encoding includes information characterizing the spatial positional relationships between different feature points of each feature map in the feature map sequence; and, for the feature map sequence with the position encoding added, extract features from the spatial dimension based on the self-attention mechanism to obtain the classification features.
In this embodiment, features can be extracted from the spatial dimension based on the self-attention mechanism, improving the feature extraction effect of the shot classification network and improving its recognition accuracy.
c) Combining the temporal and spatial dimensions.
In another optional embodiment, the shot classification network may extract features from both the spatial and temporal dimensions based on the self-attention mechanism.
Through the self-attention mechanism, on the one hand, the image content of each backboard area image most strongly associated with the shot recognition result in the spatial dimension can be learned; on the other hand, the several backboard area images most strongly associated with the shot recognition result in the temporal dimension can be learned.
Since features can be extracted by combining the temporal and spatial dimensions, in this embodiment the feature extraction effect of the shot classification network can be improved, and its recognition accuracy can be improved.
Optionally, the position encoding may include information characterizing the spatial positional relationships between image contents at different positions within each backboard area image in the backboard area image sequence to be recognized, as well as information characterizing the temporal positional relationships between different backboard area images in the sequence.
The temporal positional relationships between different backboard area images may be the same as the temporal positional relationships between the video frames in which the backboard area images are located.
Correspondingly, optionally, the shot classification network may be used to: add the above position encoding to the backboard area image sequence to be recognized; and, for the sequence with the position encoding added, extract features from the spatial and temporal dimensions based on the self-attention mechanism to obtain the classification features.
This embodiment does not limit the order or the number of times of extracting features from the spatial dimension and from the temporal dimension.
Optionally, since the shot classification network may first extract feature maps from the backboard area images to obtain a feature map sequence, the shot classification network may be used to: add a position encoding to the feature map sequence, where the position encoding includes information characterizing the spatial positional relationships between different feature points of each feature map in the feature map sequence, and information characterizing the temporal positional relationships between different feature maps in the sequence; and, for the feature map sequence with the position encoding added, extract features from the spatial and temporal dimensions based on the self-attention mechanism to obtain the classification features.
Therefore, optionally, extracting classification features from the feature map sequence based on the self-attention mechanism may include: adding a position encoding to the feature map sequence, the position encoding including information characterizing the spatial positional relationships between different feature points of each feature map in the feature map sequence and information characterizing the temporal positional relationships between different feature maps in the sequence; and, for the feature map sequence with the position encoding added, extracting features from the spatial and temporal dimensions based on the self-attention mechanism to obtain the classification features.
For the feature map sequence, in an optional embodiment, since features need to be extracted from both the spatial and temporal dimensions, to facilitate feature extraction based on the self-attention mechanism the feature maps may be integrated into a single overall feature. This overall feature can have a temporal dimension and a spatial dimension, so that simple feature transformations make it convenient to extract features from the temporal and spatial dimensions based on the self-attention mechanism.
Optionally, each feature map in the feature map sequence may be converted into a one-dimensional feature, and the one-dimensional features may be stacked to obtain a two-dimensional feature. The two-dimensional feature is the integrated overall feature.
This embodiment does not limit the way of converting a feature map into a one-dimensional feature.
Optionally, converting a feature map into a one-dimensional feature may consist of adding all the feature points of the feature map into the one-dimensional feature.
For example, for a feature map of size a*b, a one-dimensional feature of length c can be obtained through conversion, where c = a*b.
This embodiment does not limit the way of stacking the one-dimensional features.
Optionally, the one-dimensional features may be stacked according to the temporal relationships between the corresponding feature maps, obtaining the two-dimensional feature.
For example, from a one-dimensional features of length b, an a*b two-dimensional feature can be obtained by stacking.
As for the conversion of the feature map sequence, optionally, the feature map sequence as a whole may be regarded as one overall feature and then adjusted.
For example, if the feature map sequence contains n feature maps of size a*b, the sequence as a whole can be regarded as an a*b*n three-dimensional feature.
For this three-dimensional feature, each a*b feature map can be converted into a 1*c one-dimensional feature; that is, the a*b*n three-dimensional feature is converted into a c*n two-dimensional feature, where c = a*b.
Optionally, adding a position encoding to the feature map sequence may specifically include: converting each feature map in the feature map sequence into a one-dimensional feature; stacking the one-dimensional features to obtain a two-dimensional feature; and adding a position encoding to the resulting two-dimensional feature.
Specifically, the one-dimensional features converted from the individual feature maps in the feature map sequence may be stacked to obtain the two-dimensional feature.
Optionally, adding a position encoding to the two-dimensional feature may consist of, for each one-dimensional feature stacked in the two-dimensional feature, adding to each of its feature points information characterizing spatial positional relationships, which may specifically include the feature point's coordinates in the feature map; and, for each one-dimensional feature as a whole, adding information characterizing temporal positional relationships, which may specifically include a timestamp.
Correspondingly, for the feature map sequence with the position encoding added, extracting features from the spatial and temporal dimensions based on the self-attention mechanism may include: for the two-dimensional feature with the position encoding added, extracting features from the spatial and temporal dimensions based on the self-attention mechanism.
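A minimal sketch of this flatten-and-stack conversion; the concrete sizes in the usage example are illustrative:

```python
import torch

def stack_feature_maps(maps: torch.Tensor) -> torch.Tensor:
    """maps: (n, a, b) tensor holding n feature maps of size a x b in temporal
    order. Each map is flattened into a one-dimensional feature of length
    c = a * b; stacking the n results gives the two-dimensional overall
    feature described above."""
    n, a, b = maps.shape
    return maps.reshape(n, a * b)  # (n, c) with c = a * b

# Example: 16 feature maps of size 7 x 7 become a 16 x 49 two-dimensional
# feature; a position encoding of the same shape (temporal index per row,
# spatial coordinate per column) can then simply be added to it.
x = stack_feature_maps(torch.randn(16, 7, 7))  # shape (16, 49)
```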
In an optional embodiment, for the two-dimensional feature with the position encoding added, a sequence in the spatial dimension or a sequence in the temporal dimension can be obtained by transposition, making it convenient to extract features based on the self-attention mechanism from the temporal dimension and the spatial dimension respectively.
For example, from a one-dimensional features of length b, an a*b two-dimensional feature can be obtained by stacking, to which the position encoding is then added. Between the a one-dimensional features there are temporal positional relationships; within a one-dimensional feature of length b there are spatial positional relationships.
Therefore, for the a*b two-dimensional feature, features can be extracted from the temporal dimension based on the self-attention mechanism. In addition, the a*b two-dimensional feature can be transposed to obtain a b*a two-dimensional feature, so that features can be extracted from the spatial dimension based on the self-attention mechanism.
Optionally, since the input features and output features of the self-attention mechanism have the same size, features can be extracted serially from the spatial dimension based on the self-attention mechanism, the extracted features can then be transposed, and features can then be extracted serially from the temporal dimension based on the self-attention mechanism.
Of course, features can also first be extracted serially from the temporal dimension based on the self-attention mechanism, the extracted features transposed, and features then extracted serially from the spatial dimension based on the self-attention mechanism.
For example, from a one-dimensional features of length b, an a*b two-dimensional feature can be obtained by stacking, to which the position encoding is then added. Between the a one-dimensional features there are temporal positional relationships; within a one-dimensional feature of length b there are spatial positional relationships.
Therefore, for the a*b two-dimensional feature, features can be extracted from the temporal dimension based on the self-attention mechanism, obtaining an a*b first feature. The first feature can then be transposed to obtain a b*a second feature, and features can be extracted from the second feature from the spatial dimension based on the self-attention mechanism.
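The sketch below illustrates this transposition trick with standard multi-head self-attention. An embedding dimension D is added to each feature point, which is a practical assumption since attention layers operate on vectors, and the sizes T=8, S=49, D=64 are illustrative; as noted above, input and output sizes match, which is what makes repeated transposition possible:

```python
import torch
import torch.nn as nn

attn = nn.MultiheadAttention(embed_dim=64, num_heads=4, batch_first=True)
tokens = torch.randn(8, 49, 64)              # (T, S, D): frames x positions x channels

# Spatial attention: each of the T frames is one sequence of S tokens.
space_out, _ = attn(tokens, tokens, tokens)  # (8, 49, 64)

# Temporal attention: transpose so each of the S spatial positions is one
# sequence of T tokens, attend, then transpose back. (In practice the spatial
# and temporal steps would use separately trained attention layers.)
t = tokens.transpose(0, 1)                   # (49, 8, 64)
time_out, _ = attn(t, t, t)
time_out = time_out.transpose(0, 1)          # back to (8, 49, 64)
```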
This method flow does not limit the order between extracting features from the spatial dimension based on the self-attention mechanism and extracting features from the temporal dimension based on the self-attention mechanism.
In an optional embodiment, features may be extracted in parallel from the spatial dimension based on the self-attention mechanism and from the temporal dimension based on the self-attention mechanism, after which the extracted features may be combined to obtain the classification features.
This embodiment does not limit the way of combining the features. Optionally, it may be feature concatenation; specifically, the features extracted from the spatial dimension based on the self-attention mechanism and the features extracted from the temporal dimension based on the self-attention mechanism may be concatenated into one feature, which serves as the classification feature.
This embodiment does not limit the number of feature extractions. Optionally, features may be serially extracted at least once from the spatial dimension based on the self-attention mechanism, and serially extracted at least once from the temporal dimension based on the self-attention mechanism, after which the extracted features may be combined to obtain the classification features.
Optionally, serially extracting features at least once from the spatial dimension based on the self-attention mechanism may include: determining the two-dimensional feature with the position encoding added as the current feature; and cyclically executing the following step until a preset number of cycles is reached: extracting features from the current feature in the spatial dimension based on the self-attention mechanism, and determining the extracted features as the current feature. The preset number of cycles may be at least one.
Optionally, serially extracting features at least once from the temporal dimension based on the self-attention mechanism may include: determining the two-dimensional feature with the position encoding added as the current feature; and cyclically executing the following step until a preset number of cycles is reached: extracting features from the current feature in the temporal dimension based on the self-attention mechanism, and determining the extracted features as the current feature. The preset number of cycles may be at least one.
In another optional embodiment, features may be extracted serially from the spatial and temporal dimensions based on the self-attention mechanism.
This embodiment does not limit the order or the number of times of extracting features from the spatial dimension based on the self-attention mechanism and from the temporal dimension based on the self-attention mechanism.
Optionally, features may be serially extracted multiple times from the spatial dimension based on the self-attention mechanism, or serially extracted multiple times from the temporal dimension based on the self-attention mechanism, or serially extracted multiple times alternating between the spatial and temporal dimensions based on the self-attention mechanism.
Optionally, for the feature map sequence with the position encoding added, features may be serially extracted at least once from the spatial dimension based on the self-attention mechanism, and then, for the extracted features, serially extracted at least once from the temporal dimension based on the self-attention mechanism.
Optionally, for the feature map sequence with the position encoding added, features may be serially extracted at least once from the spatial dimension based on the self-attention mechanism, and then, for the extracted features, serially extracted at least once from the temporal dimension based on the self-attention mechanism; then, for the extracted features, features may again be serially extracted at least once from the spatial dimension based on the self-attention mechanism and, for the extracted features, serially extracted at least once from the temporal dimension based on the self-attention mechanism.
Optionally, a preset number of feature extraction steps may be executed serially on the feature map sequence with the position encoding added.
Optionally, the feature extraction step may include: for the input features, serially extracting features at least once from the spatial dimension based on the self-attention mechanism, and then, for the extracted features, serially extracting features at least once from the temporal dimension based on the self-attention mechanism.
Optionally, the feature extraction step may include: for the input features, serially extracting features at least once from the temporal dimension based on the self-attention mechanism, and then, for the extracted features, serially extracting features at least once from the spatial dimension based on the self-attention mechanism.
Optionally, there may be a cascade relationship between the different serially executed feature extraction steps. Specifically, the features extracted in any feature extraction step may be input into the next feature extraction step.
Optionally, the different serially executed feature extraction steps may differ from one another. Specifically, the weights of the self-attention mechanism may differ, or the number or order of feature extractions may differ.
Optionally, the feature map sequence with the position encoding added may be determined as the current feature, and the following steps executed cyclically until a preset loop stop condition is met: for the current feature, serially extract features at least once from the spatial dimension based on the self-attention mechanism, and then, for the extracted features, serially extract features at least once from the temporal dimension based on the self-attention mechanism; determine the extracted features as the current feature.
This embodiment does not limit the specific preset loop stop condition.
Optionally, the preset loop stop condition may include at least one of the following: the loop reaching a preset number of iterations, the total number of feature extractions reaching a preset number, the time spent extracting features reaching a preset duration, and so on.
Optionally, in different iterations of the loop, the method of serially extracting features from the spatial dimension based on the self-attention mechanism may differ; specifically, the weights of the self-attention mechanism may differ, or the number of serial extractions may differ.
Optionally, the loop may be allowed to stop partway through an iteration; specifically, during the loop, the loop may be stopped after features have been serially extracted at least once from the spatial dimension based on the self-attention mechanism.
In different iterations of the loop, the number of feature extractions may differ; the number of feature extractions is not specifically limited.
For example, in the first iteration, features may be serially extracted three times from the spatial dimension based on the self-attention mechanism for the current feature, and then serially extracted twice from the temporal dimension based on the self-attention mechanism for the extracted features. In the second iteration, features may be serially extracted once from the spatial dimension based on the self-attention mechanism for the current feature, and then serially extracted five times from the temporal dimension based on the self-attention mechanism for the extracted features.
Optionally, the feature map sequence with the position encoding added may be determined as the current feature, and the following steps executed cyclically until the preset loop stop condition is met: for the current feature, serially extract features at least once from the temporal dimension based on the self-attention mechanism, and then, for the extracted features, serially extract features at least once from the spatial dimension based on the self-attention mechanism; determine the extracted features as the current feature.
For a specific explanation, refer to the above embodiments.
Taking the two-dimensional feature obtained by stacking one-dimensional features as an example, optionally, for the two-dimensional feature with the position encoding added, features may be serially extracted at least once from the spatial dimension based on the self-attention mechanism; the extracted features may then be transposed, and, for the transposed features, features may be serially extracted at least once from the temporal dimension based on the self-attention mechanism. The extracted features may be transposed again, and, for the transposed features, features may be serially extracted at least once from the spatial dimension based on the self-attention mechanism. For an explanation of serially extracting features at least once, refer to the above embodiments.
d) Preset self-attention modules.
In an optional embodiment, in the shot classification network, preset self-attention modules may be used to implement the steps of extracting features based on the self-attention mechanism.
Optionally, a preset self-attention module may be used to extract features from the spatial dimension and/or the temporal dimension for the input features based on the self-attention mechanism, and to output them.
For a specific explanation of extracting features from the spatial dimension and/or the temporal dimension based on the self-attention mechanism, refer to the above embodiments.
Optionally, the shot classification network may include one or more preset self-attention modules.
Optionally, in the case where the shot classification network needs to extract features from the spatial dimension based on the self-attention mechanism, one or more preset self-attention modules may be selected that are used to serially extract features at least once from the spatial dimension for the input features based on the self-attention mechanism and to output them.
Correspondingly, for the feature map sequence with the position encoding added, extracting features from the spatial dimension based on the self-attention mechanism may include: inputting the feature map sequence with the position encoding added into the one or more preset self-attention modules and obtaining the output features.
The multiple preset self-attention modules may be in a cascade relationship.
Optionally, among N cascaded preset self-attention modules, with N ≥ 2, the output of the i-th preset self-attention module, for 1 ≤ i ≤ N-1, may be cascaded to the input of the (i+1)-th preset self-attention module.
Optionally, in the case where the shot classification network needs to extract features from the temporal dimension based on the self-attention mechanism, one or more preset self-attention modules may be selected that are used to serially extract features at least once from the temporal dimension for the input features based on the self-attention mechanism and to output them.
Optionally, in the case where the shot classification network needs to extract features from the spatial and temporal dimensions based on the self-attention mechanism, one or more preset self-attention modules may be selected that are used to extract features from the spatial and temporal dimensions for the input features based on the self-attention mechanism and to output them.
Of course, multiple preset self-attention modules may also be selected, including one or more preset self-attention modules used to serially extract features at least once from the spatial dimension for the input features based on the self-attention mechanism and output them, and one or more preset self-attention modules used to serially extract features at least once from the temporal dimension for the input features based on the self-attention mechanism and output them.
For a specific explanation, refer to the above embodiments.
In an optional embodiment, the shot classification network may include N cascaded preset self-attention modules, with N ≥ 2; the output of the i-th preset self-attention module, for 1 ≤ i ≤ N-1, is cascaded to the input of the (i+1)-th preset self-attention module.
A preset self-attention module may be used to extract features from the spatial and temporal dimensions based on the self-attention mechanism.
Correspondingly, optionally, for the feature map sequence with the position encoding added, extracting features from the spatial and temporal dimensions based on the self-attention mechanism may include: inputting the feature map sequence with the position encoding added into the first preset self-attention module, and determining the classification features based on the output of the N-th preset self-attention module.
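A minimal sketch of such a cascade, where `block_factory` builds one preset self-attention module (for example, an instance of the space/time block sketched at the end of this section); the factory-based construction is an illustrative assumption:

```python
import torch.nn as nn

class CascadedAttention(nn.Module):
    """N cascaded preset self-attention modules: the output of module i feeds
    the input of module i+1, and the classification feature is taken from the
    Nth module's output. After training, each module holds its own weights."""
    def __init__(self, block_factory, n_blocks: int):
        super().__init__()
        self.blocks = nn.ModuleList([block_factory() for _ in range(n_blocks)])

    def forward(self, x):
        for block in self.blocks:
            x = block(x)  # output of block i becomes input of block i+1
        return x          # output of the Nth block -> classification feature

# Usage (with the SpaceTimeBlock sketched below):
#   net = CascadedAttention(lambda: SpaceTimeBlock(dim=64, heads=4), n_blocks=4)
```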
本实施例并不限定预设自注意力模块中,从空间维度基于自注意力机制提取特征,和从时间维度基于自注意力机制提取特征之间的次序和次数。This embodiment is not limited to the order and number of times in the preset self-attention module between extracting features from the spatial dimension based on the self-attention mechanism and extracting features from the temporal dimension based on the self-attention mechanism.
Optionally, the preset self-attention module may be used to serially execute a preset number of feature extraction steps on the input features. For a detailed explanation, refer to the foregoing embodiments.
Optionally, the preset self-attention module may be used to: for the input features, serially extract features at least once from the spatial dimension based on the self-attention mechanism; then, for the extracted features, serially extract features at least once from the temporal dimension based on the self-attention mechanism; and output the extracted features.
Optionally, the preset self-attention module may be used to: for the input features, serially extract features at least once from the temporal dimension based on the self-attention mechanism; then, for the extracted features, serially extract features at least once from the spatial dimension based on the self-attention mechanism; and output the extracted features.
Optionally, the preset self-attention module may be used to: determine the input features as the current features, and repeat the following steps until a preset loop stop condition is met: for the current features, serially extract features at least once from the spatial dimension based on the self-attention mechanism, and then, for the extracted features, serially extract features at least once from the temporal dimension based on the self-attention mechanism; determine the extracted features as the current features. For a detailed explanation, refer to the foregoing embodiments.
Optionally, the preset self-attention module may be used to: determine the input features as the current features, and repeat the following steps until a preset loop stop condition is met: for the current features, serially extract features at least once from the temporal dimension based on the self-attention mechanism, and then, for the extracted features, serially extract features at least once from the spatial dimension based on the self-attention mechanism; determine the extracted features as the current features. For a detailed explanation, refer to the foregoing embodiments.
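As an illustration of the spatial-then-temporal variant above, the following is a minimal sketch of one preset self-attention module, assuming PyTorch, a (T, S, C) feature layout (T feature maps, S spatial positions, C channels), and a chosen head count; the class name and the omission of residual connections and normalization are assumptions of the sketch, not details from this disclosure.

```python
import torch
import torch.nn as nn

class PresetSelfAttentionModule(nn.Module):
    """Sketch: one spatial-then-temporal preset self-attention module."""

    def __init__(self, channels: int, num_heads: int = 4):
        super().__init__()
        self.spatial_attn = nn.MultiheadAttention(channels, num_heads, batch_first=True)
        self.temporal_attn = nn.MultiheadAttention(channels, num_heads, batch_first=True)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # x: (T, S, C). Spatial dimension: each of the T frames attends
        # across its S positions (T acts as the batch axis).
        x, _ = self.spatial_attn(x, x, x)
        # Temporal dimension: transpose so attention runs across the T frames.
        x = x.transpose(0, 1)                 # (S, T, C)
        x, _ = self.temporal_attn(x, x, x)
        return x.transpose(0, 1)              # restore (T, S, C)
```

Swapping the two attention calls yields the temporal-then-spatial variant, and wrapping the body in a loop with a stop condition yields the repeated-extraction variants described above.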
Optionally, the weights of the self-attention mechanism may differ between different preset self-attention modules; specifically, the weights and other parameters of each preset self-attention module may be determined through model training.
Optionally, the order and the number of times that features are extracted from the spatial dimension and from the temporal dimension based on the self-attention mechanism may also differ between different preset self-attention modules.
For example, one preset self-attention module may first serially extract features at least once from the spatial dimension based on the self-attention mechanism for the input features, and then serially extract features at least once from the temporal dimension for the extracted features and output them; another preset self-attention module may first serially extract features at least once from the temporal dimension for the input features, and then serially extract features at least once from the spatial dimension for the extracted features and output them.
In a specific embodiment, the preset self-attention module may be a self-attention layer used to extract features from the input features based on the self-attention mechanism.
The shot classification network may include one self-attention layer or multiple self-attention layers in series, so that the parameters of the self-attention layers, including their weights, can be determined by training the shot classification network. After model training, the parameters of different self-attention layers usually differ.
Optionally, the self-attention layer may be used to extract features from the input features in the spatial and temporal dimensions based on the self-attention mechanism. Specifically, features may first be serially extracted at least once from the spatial dimension for the input features, and then serially extracted at least once from the temporal dimension for the extracted features and output; alternatively, features may first be serially extracted at least once from the temporal dimension, and then at least once from the spatial dimension, and output.
3) Classification features.
This method flow does not limit the specific way in which the classification features are obtained.
In an optional embodiment, the classification features may be extracted from the backboard area image sequence based on the self-attention mechanism.
The foregoing embodiments explain the features extracted from the backboard area image sequence based on the self-attention mechanism; this embodiment does not limit the way in which the classification features are obtained from them.
Optionally, the features extracted based on the self-attention mechanism may be aggregated to obtain the classification features.
For example, the features extracted based on the self-attention mechanism may be directly determined as the classification features.
The features extracted based on the self-attention mechanism may also be pooled, specifically by average pooling or max pooling, to obtain the classification features.
Optionally, a classification initial feature may be added to the features input into the self-attention mechanism.
The classification initial feature is not a feature of the backboard area image sequence to be recognized. Through repeated feature extraction, it can be used to aggregate the feature information that the self-attention mechanism has learned from the backboard area image sequence to be recognized, without affecting the original features of that sequence.
For example, it may aggregate the image content of the backboard area images that the self-attention mechanism has learned to be strongly associated with the shot recognition result in the spatial dimension, or the several backboard area images that it has learned to be strongly associated with the shot recognition result in the temporal dimension.
Since the input and output features of the self-attention mechanism have the same size, the current representation corresponding to the classification initial feature can be located among the features extracted based on the self-attention mechanism.
Specifically, the current representation corresponding to the classification initial feature may be determined as the classification feature and used for the subsequent prediction of the shot recognition result.
In the process of training the shot classification network, the classification initial feature can be used, via the self-attention mechanism, to learn the feature information in the backboard area image sequence to be recognized.
Optionally, in the case where a feature map sequence has been extracted from the backboard area image sequence to be recognized, extracting the classification features from the feature map sequence based on the self-attention mechanism may include: adding a classification initial feature to the feature map sequence; extracting features from the feature map sequence with the added classification initial feature based on the self-attention mechanism; and, from the extracted features, determining the current representation corresponding to the classification initial feature as the classification features.
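This use of a trainable classification initial feature resembles the class-token pattern of transformer classifiers. Below is a minimal sketch, assuming PyTorch, a single attention layer, and a (B, T, C) sequence of flattened feature maps; all names, sizes, and the zero initialization are illustrative assumptions.

```python
import torch
import torch.nn as nn

class ClsTokenHead(nn.Module):
    """Sketch: prepend a learned classification initial feature, run
    self-attention, then read the token's representation back out."""

    def __init__(self, channels: int, num_heads: int = 4):
        super().__init__()
        # The classification initial feature is a trainable parameter.
        self.cls_token = nn.Parameter(torch.zeros(1, 1, channels))
        self.attn = nn.MultiheadAttention(channels, num_heads, batch_first=True)

    def forward(self, seq: torch.Tensor) -> torch.Tensor:
        # seq: (B, T, C), i.e. T flattened feature maps per sample.
        cls = self.cls_token.expand(seq.size(0), -1, -1)   # (B, 1, C)
        x = torch.cat([cls, seq], dim=1)                   # (B, T+1, C)
        x, _ = self.attn(x, x, x)
        # The token's current representation becomes the classification feature.
        return x[:, 0]                                     # (B, C)
```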
Optionally, in combination with the foregoing embodiments on position encoding: since the classification initial feature is not a feature of the backboard area image sequence to be recognized, a position code may be set for it to facilitate subsequent feature extraction based on the self-attention mechanism.
Optionally, position encoding may be added to the feature map sequence after the classification initial feature has been added.
Specifically, taking the two-dimensional feature obtained by stacking multiple one-dimensional features as an example, the classification initial feature may be added first, followed by the position encoding.
The position code set for the classification initial feature usually lies outside the feature map sequence proper. For example, in the temporal dimension, the classification initial feature may be added before the first feature map in the sequence or after the last one; in the spatial dimension, it may be added before the first feature point or after the last feature point of each feature map in the sequence.
This method flow does not limit the specific form of the classification initial feature.
Optionally, the classification initial feature may be a parameter of the shot classification network, and its specific values may be determined through model training.
Specifically, the values of the classification initial feature can be adjusted continuously while training the shot classification network; when training ends, the finally adjusted values are fixed and used for shot recognition.
Optionally, since the values of the classification initial feature need to be adjusted, a smaller classification initial feature requires fewer computing resources to adjust, which can improve the stability of model training.
This method flow does not limit the size of the classification initial feature. Optionally, its size may be 1*1; when it is actually used, it can be replicated multiple times to meet the requirements of the self-attention mechanism, specifically by a broadcast operation.
For example, if the self-attention mechanism requires an extended feature of size 1*N, the 1*1 classification initial feature can be replicated N times and assembled into a 1*N feature, which is then added to the features for the subsequent self-attention feature extraction steps.
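A minimal sketch of this broadcast operation, assuming PyTorch; N and the feature sizes are illustrative values, not from this disclosure.

```python
import torch

# A 1*1 trainable classification initial feature (illustrative values).
cls_init = torch.nn.Parameter(torch.zeros(1, 1))

N = 1024  # assumed width required by the self-attention input
# Broadcast (expand) the single value to a 1*N row without copying memory;
# gradients still flow back to the one underlying parameter.
cls_row = cls_init.expand(1, N)

features = torch.randn(32, N)                 # e.g. 32 stacked one-dimensional features
extended = torch.cat([cls_row, features], 0)  # (32+1) x N, as described above
```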
In this embodiment, a 1*1 classification initial feature can be adjusted conveniently during model training, improving the stability of training.
Of course, the classification initial feature may also have other sizes.
Optionally, the classification initial feature may comprise one or more feature maps, which can be added to the feature map sequence as features newly added in the temporal dimension.
Optionally, the classification initial feature may comprise one or more feature points newly added at the same position in each feature map, as features newly added in the spatial dimension of the feature map sequence.
Optionally, the classification initial feature may comprise one or more feature maps together with one or more feature points newly added at the same position in each feature map, serving as features newly added in both the temporal and the spatial dimension of the feature map sequence.
Of course, optionally, the features that the self-attention mechanism requires to be extended may be one or more feature maps; the classification initial feature can then be replicated multiple times and assembled into feature maps used to extend the features.
Optionally, the features that the self-attention mechanism requires to be extended may be one or more feature points newly added at the same position in each feature map; the classification initial feature can then be replicated multiple times and assembled into feature points at the same position in each feature map, used to extend the features.
In an optional embodiment, take the two-dimensional feature obtained by stacking multiple one-dimensional features as an example. Since the one-dimensional features stacked in the two-dimensional feature correspond to feature maps, a temporal relationship exists between them, so new feature points can be added to the two-dimensional feature in both dimensions.
Optionally, one or more one-dimensional features may be added alongside the temporally related one-dimensional features in the two-dimensional feature; the newly added one-dimensional features are the classification initial feature.
Optionally, one or more feature points may be added to each of the temporally related one-dimensional features in the two-dimensional feature; the newly added feature points are the classification initial feature.
Optionally, one or more one-dimensional features may be added alongside the temporally related one-dimensional features in the two-dimensional feature, and one or more feature points may further be added to each one-dimensional feature of the current two-dimensional feature; the newly added portion is the classification initial feature.
For example, for a one-dimensional features of length b, an a*b two-dimensional feature can be obtained by stacking, and position encoding is then added to it. Among the a one-dimensional features there is a temporal position relationship; within each one-dimensional feature of length b there is a spatial position relationship.
The classification initial feature may be a 1*b feature, added to the a*b two-dimensional feature to obtain an (a+1)*b two-dimensional feature.
The classification initial feature may be a 1*a feature, added to the a*b two-dimensional feature to obtain an a*(b+1) two-dimensional feature.
The classification initial feature may also be a 1*1 feature. If the a*b two-dimensional feature needs to be extended to an (a+1)*b two-dimensional feature, the classification initial feature can be replicated b times, assembled into a 1*b feature, and added to the a*b two-dimensional feature to obtain the (a+1)*b two-dimensional feature; specifically, the classification initial feature can be replicated by a broadcast operation.
Likewise, if the a*b two-dimensional feature needs to be extended to an a*(b+1) two-dimensional feature, the classification initial feature can be replicated a times, assembled into a 1*a feature, and added to obtain the a*(b+1) two-dimensional feature, again replicating the classification initial feature by a broadcast operation.
The two-dimensional feature after adding the classification initial feature may also be an (a+1)*(b+1) two-dimensional feature, in which the portion newly added relative to the a*b two-dimensional feature is the classification initial feature.
Regarding the method of determining the shot recognition result based on the classification features:
This method flow does not limit the specific method of determining the shot recognition result based on the classification features.
In an optional embodiment, once the classification features are obtained, they can be input into a pre-trained fully connected network, and the shot recognition result output by the fully connected network obtained.
Optionally, the fully connected network may be used to predict the shot recognition result from the input features, specifically from the input classification features.
Of course, other model structures may also be used to predict the shot recognition result from the classification features; this embodiment does not specifically limit this.
Optionally, in order to reduce the amount of computation and improve prediction efficiency, the classification features may first be processed and the fully connected network then used to predict the shot recognition result.
The shot classification network includes the fully connected network, and in the process of training the shot classification network the fully connected network is trained as well, so that it can be used to predict the shot recognition result, improving prediction accuracy and efficiency.
When the shot classification network is used for shot recognition, the trained fully connected network within it can be used to predict the shot recognition result based on the classification features.
Specifically, the classification features may be pooled, optionally by average pooling or max pooling, to obtain classification features of a preset feature size, which are then input into the pre-trained fully connected network to obtain the shot recognition result it outputs.
The preset feature size may be smaller than the original size of the classification features, thereby reducing the amount of data and computation and improving prediction efficiency.
Therefore, optionally, determining the shot recognition result based on the classification features may include: pooling the classification features to obtain features to be input of a preset feature size; and inputting the features to be input into the pre-trained fully connected network to obtain the shot recognition result it outputs.
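A minimal sketch of this pooling-plus-fully-connected prediction step, assuming PyTorch; the layer width, the choice of average pooling, and the two-class (made/missed) output are illustrative assumptions.

```python
import torch
import torch.nn as nn

class PredictionHead(nn.Module):
    """Sketch: pool the classification feature to a preset size, then let
    a small fully connected network emit made/missed logits."""

    def __init__(self, channels: int = 128, num_classes: int = 2):
        super().__init__()
        self.pool = nn.AdaptiveAvgPool1d(1)        # average pooling to 1 x C
        self.fc = nn.Linear(channels, num_classes)

    def forward(self, cls_feat: torch.Tensor) -> torch.Tensor:
        # cls_feat: (B, L, C) representation of the classification initial feature.
        x = self.pool(cls_feat.transpose(1, 2)).squeeze(-1)  # (B, C)
        return self.fc(x)  # logits: shot successful vs. not
```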
This embodiment does not limit the specific form of the shot recognition result, as long as it can indicate whether the shot is successful.
4) Sliding window.
As explained in the foregoing embodiments, this method flow can perform shot recognition on the video to be recognized based on the shot classification network.
In an optional embodiment, since the shooting process in a basketball game is usually short, a clip of the shooting process can be obtained as the video to be recognized and shot recognition performed using the shot classification network.
Optionally, the video to be recognized may also comprise a longer basketball game segment, which may contain one or more shooting processes.
In order to improve the accuracy of shot recognition, in an optional embodiment the video to be recognized may be divided into multiple segments, so that shot recognition can be performed on each segment separately.
Since the video to be recognized may contain multiple shooting processes, dividing it into segments and performing shot recognition on each makes it easier to improve the accuracy of shot recognition, and also makes it possible to locate the video positions of the shooting processes and of the successful shots.
This embodiment does not limit the way in which the video to be recognized is divided into segments.
Optionally, the video to be recognized may be directly divided into multiple segments of a preset segment duration.
Optionally, in order to reduce the possibility that a single shooting process is split across different segments, a sliding window mechanism may be used for the division.
Specifically, starting from the first video frame of the video to be recognized, a sliding window length and a sliding step may be determined; the video clip contained in the window at each slide is then obtained through the sliding window mechanism as a divided video segment.
This embodiment does not limit the order in which shot recognition is performed on the different divided video segments.
Optionally, shot recognition may be performed on the divided video segments in parallel or serially.
Performing shot recognition in parallel can improve the efficiency of shot recognition.
In an optional embodiment, since the backboard area needs to be determined for subsequent shot recognition, the backboard area image sequence to be recognized can be divided directly and shot recognition performed on each division separately.
Optionally, inputting the backboard area image sequence to be recognized into the shot classification network and obtaining the shot recognition result it outputs may include: dividing the backboard area image sequence to be recognized into multiple backboard area image subsequences, inputting each divided subsequence into the shot classification network separately, and obtaining the shot recognition results output by the shot classification network.
This embodiment does not limit the way of dividing the backboard area image subsequences; specifically, a sliding window mechanism may be used. Since the different segments divided by the sliding window mechanism may overlap, when the backboard area image sequence to be recognized is divided into subsequences based on the sliding window mechanism, the backboard area need not be determined repeatedly for the video frames in the overlapping parts, which improves efficiency and saves computing resources.
In an optional embodiment, feature maps may need to be extracted from the backboard area images, and the different segments divided by the sliding window mechanism may overlap.
Therefore, to reduce the cost of feature extraction and improve the efficiency of feature extraction and shot recognition, the feature map sequence itself can be divided into multiple subsequences for subsequent shot recognition. On the basis of subsequences divided by the sliding window mechanism, feature maps need not be extracted repeatedly for the backboard area images in the overlapping segments, which improves efficiency and saves computing resources.
Optionally, extracting the classification features from the feature map sequence based on the self-attention mechanism may include: determining the first m feature maps of the feature map sequence as the feature map subsequence contained in the current sliding window, with m≥1; and repeating the following steps until the current sliding window can no longer move backward: extracting the classification features based on the self-attention mechanism for the feature map subsequence contained in the current sliding window, and moving the sliding window backward by a preset sliding step.
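The windowing loop above can be sketched as a small generator; a minimal sketch in Python, where the names and the commented downstream call are illustrative assumptions.

```python
from typing import Iterator, List, Sequence, TypeVar

T = TypeVar("T")

def sliding_subsequences(feature_maps: Sequence[T], m: int, step: int) -> Iterator[List[T]]:
    """Yield the feature-map subsequence held by each sliding window:
    start with the first m feature maps, then move backward by `step`
    until the window can no longer move. Over 10 maps, m=3 and step=1
    give 8 windows, matching the example later in this description."""
    start = 0
    while start + m <= len(feature_maps):
        yield list(feature_maps[start:start + m])
        start += step

# Usage sketch: each window's subsequence is fed to the self-attention
# stage separately (possibly in parallel), reusing the shared feature maps.
# for sub in sliding_subsequences(feature_map_sequence, m=3, step=1):
#     cls_feat = attention_stage(sub)   # hypothetical downstream call
```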
This embodiment does not limit the order in which the subsequent shot recognition steps are performed for the different feature map subsequences divided by the sliding window mechanism.
Optionally, the subsequent steps of shot recognition may be performed in parallel for the different feature map subsequences divided by the sliding window mechanism, or serially; performing them in parallel can improve the efficiency of shot recognition.
It should be noted that, in an optional embodiment, the classification features are extracted based on the self-attention mechanism for the feature map subsequence contained in the current sliding window; for the specific way of extracting the classification features, refer to the foregoing embodiments.
Optionally, extracting the classification features based on the self-attention mechanism for the feature map subsequence contained in the current sliding window may include: adding position encoding to the feature map subsequence contained in the current sliding window, where the position encoding may include information characterizing the spatial position relationship between different feature points of each feature map in the subsequence and information characterizing the temporal position relationship between different feature maps in the subsequence; and, for the position-encoded feature map subsequence, extracting features from the spatial and temporal dimensions based on the self-attention mechanism to obtain the classification features.
Optionally, adding position encoding to the feature map subsequence may include: converting each feature map in the subsequence into a one-dimensional feature; stacking the one-dimensional features to obtain a two-dimensional feature; and adding position encoding to the resulting two-dimensional feature.
Optionally, the stacking may specifically be performed on all or part of the converted one-dimensional features to obtain the two-dimensional feature, after which position encoding may be added to it.
Optionally, extracting the classification features based on the self-attention mechanism for the feature map subsequence may include: adding a classification initial feature to the feature map subsequence; extracting features based on the self-attention mechanism for the feature map subsequence with the added classification initial feature; and, from the extracted features, determining the current representation corresponding to the classification initial feature as the classification features.
For a detailed explanation, refer to the foregoing embodiments.
In this embodiment, by dividing the feature map sequence into multiple subsequences based on the sliding window mechanism for subsequent shot recognition, computing resources can be saved and both the efficiency and the accuracy of shot recognition improved.
Since multiple shot recognition results can be obtained for the multiple divided subsequences, the video clip of the video to be recognized represented by the corresponding subsequence can be determined based on each shot recognition result, so that the video position of a successful shot can be located.
The number of successful shots in the video to be recognized can also be determined from the number of shot recognition results indicating a successful shot, facilitating the subsequent calculation of the score of the basketball game.
5) Structure of the shot classification network.
On the basis of the function of the shot classification network explained in the foregoing embodiments, its structure can be specified.
This method flow does not limit the specific structure of the shot classification network; the following explanation is illustrative.
In an optional embodiment, the shot classification network may be used to: extract classification features from the input backboard area image sequence based on the self-attention mechanism; and determine the shot recognition result based on the classification features.
Accordingly, optionally, the shot classification network may include a module for extracting classification features based on the self-attention mechanism and a module for determining the shot recognition result based on the classification features.
Optionally, in its processing the shot classification network usually needs to extract the feature map sequence of the backboard area image sequence and preprocess the extracted feature map sequence, for example by adding position encoding, dividing subsequences, adding a classification initial feature, and other preprocessing operations.
The shot classification network is also used to further extract classification features from the preprocessed feature map sequence based on the self-attention mechanism, and finally to predict the shot recognition result based on the classification features.
Therefore, optionally, the structure of the shot classification network may include a feature map extraction module, a feature preprocessing module, a self-attention feature extraction module, and a prediction module.
Of course, the structure of the shot classification network is not specifically limited; this embodiment is illustrative only.
The feature map extraction module may be used to extract the feature map sequence of the input backboard area image sequence and to output the extracted feature map sequence.
The feature preprocessing module may be used to preprocess the feature map sequence output by the feature map extraction module.
Specifically, the feature preprocessing module may be used to add position encoding to the feature map sequence output by the feature map extraction module and to output the position-encoded feature map sequence.
Optionally, the position encoding may include information characterizing the spatial position relationship between different feature points of each feature map in the feature map sequence and information characterizing the temporal position relationship between different feature maps in the sequence.
The feature preprocessing module may also be used to add a classification initial feature to the feature map sequence output by the feature map extraction module and to output the feature map sequence with the added classification initial feature.
The feature preprocessing module may also be used to add both a classification initial feature and position encoding to the feature map sequence output by the feature map extraction module and to output the resulting feature map sequence.
The order in which the classification initial feature and the position encoding are added is not limited; specifically, the classification initial feature may be added first, followed by the position encoding.
Optionally, the preprocessing in the feature preprocessing module may at least include adding position encoding, so as to facilitate the subsequent feature extraction based on the self-attention mechanism.
The feature preprocessing module may also be used to divide the feature map sequence output by the feature map extraction module into feature map subsequences, which can be output directly.
The feature preprocessing module may also be used to divide the feature map sequence output by the feature map extraction module into feature map subsequences, and to add a classification initial feature and/or position encoding to each divided subsequence before outputting it.
Specifically, the sliding window mechanism may be used to divide the feature map subsequences.
The feature preprocessing module may be used to: for the feature map sequence output by the feature map extraction module, determine the first m feature maps as the feature map subsequence contained in the current sliding window, with m≥1; and repeat the following steps until the current sliding window can no longer move backward: add a classification initial feature and/or position encoding to the feature map subsequence contained in the current sliding window and output it, then move the sliding window backward by a preset sliding step.
Optionally, in the case where the feature preprocessing module adds position encoding to the feature map sequence or subsequence, the self-attention feature extraction module may be used to perform feature extraction based on the self-attention mechanism on the position-encoded feature map sequence or subsequence output by the feature preprocessing module, and to output the classification features.
Whether feature extraction is performed in the spatial dimension or the temporal dimension can be determined from the added position encoding.
Optionally, in the case where the feature preprocessing module does not add position encoding to the feature map sequence or subsequence, the self-attention feature extraction module may be used to add position encoding to the feature map sequence or subsequence output by the feature preprocessing module, perform feature extraction based on the self-attention mechanism, and output the classification features.
Regarding the self-attention feature extraction module: in an optional embodiment, it includes one preset self-attention module or multiple cascaded preset self-attention modules.
For an explanation of the preset self-attention module, refer to the foregoing embodiments.
Optionally, the prediction module may be used to predict the shot recognition result from the classification features output by the self-attention feature extraction module; specifically, a fully connected network may be used for the prediction.
For ease of understanding, Fig. 2 is a schematic structural diagram of a shot classification network according to an embodiment of the present invention.
The shot classification network may include a feature map extraction module, a feature preprocessing module, a self-attention feature extraction module, and a prediction module.
The output of the feature map extraction module may be cascaded to the input of the feature preprocessing module, and the output of the feature preprocessing module may be cascaded to the input of the self-attention feature extraction module; the self-attention feature extraction module may include multiple cascaded preset self-attention modules, and its output may be cascaded to the input of the prediction module.
In addition, the backboard area image sequence can be input into the shot classification network, that is, into the feature map extraction module.
The output of the shot classification network, namely the output of the prediction module, is the shot recognition result, which can be used to determine whether the shot is successful.
For ease of understanding, Fig. 3 is a schematic diagram of the principle of a shot classification network according to an embodiment of the present invention.
The shot classification network may include a feature map extraction module, a feature preprocessing module, a self-attention feature extraction module, and a prediction module.
The feature map extraction module may be used to extract a two-dimensional feature map from each backboard area image of the input backboard area image sequence using a pre-trained two-dimensional CNN network, thereby obtaining and outputting the feature map sequence.
The output of the feature map extraction module may be cascaded to the input of the feature preprocessing module.
The feature preprocessing module may be used to: divide the input feature map sequence into multiple feature map subsequences based on the sliding window mechanism; convert each feature map in each subsequence into a one-dimensional feature; stack all the converted one-dimensional features to obtain a two-dimensional feature; and then add a classification initial feature and position encoding to the two-dimensional feature.
The feature preprocessing module may be used to output the two-dimensional feature with the added classification initial feature and position encoding; its output may be cascaded to the input of the self-attention feature extraction module.
Specifically, for an input feature map sequence containing 10 feature maps, 8 feature map subsequences can be divided based on the sliding window mechanism with a sliding window length of 3 and a sliding step of 1.
For feature maps of size a*b, each of the 3 feature maps in a feature map subsequence can be converted into a 1*n one-dimensional feature with n = a*b, and the 3 resulting one-dimensional features stacked into a 3*n two-dimensional feature.
The classification initial feature can then be added in the temporal dimension to obtain a 4*n two-dimensional feature, after which position encoding is added; alternatively, the classification initial feature can be added in the spatial dimension to obtain a 3*(n+1) two-dimensional feature, after which position encoding is added.
Position encoding is added to the two-dimensional feature after the classification initial feature has been added; the position encoding may include information characterizing the spatial position relationship between different feature points of each feature map in the feature map sequence and information characterizing the temporal position relationship between different feature maps in the sequence.
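A minimal sketch of this preprocessing for one window, assuming PyTorch, the temporal-dimension variant (prepending a 1*n classification initial feature), and a single learned position-encoding table; the names and the learned-table choice are illustrative assumptions.

```python
import torch
import torch.nn as nn

class FeaturePreprocess(nn.Module):
    """Sketch: flatten each a*b feature map to 1*n (n = a*b), stack the
    T maps of one window into a T*n feature, prepend a 1*n classification
    initial feature, and add a learned position encoding."""

    def __init__(self, n: int, T: int = 3):
        super().__init__()
        self.cls_row = nn.Parameter(torch.zeros(1, n))   # classification initial feature
        self.pos = nn.Parameter(torch.zeros(T + 1, n))   # position-encoding table

    def forward(self, maps: torch.Tensor) -> torch.Tensor:
        # maps: (T, a, b) feature maps of one sliding-window subsequence.
        flat = maps.reshape(maps.size(0), -1)            # T x n
        x = torch.cat([self.cls_row, flat], dim=0)       # (T+1) x n
        return x + self.pos                              # add position encoding
```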
The self-attention feature extraction module may include 3 cascaded preset self-attention modules; in each preset self-attention module, features can be extracted once from the spatial dimension based on the self-attention mechanism for the input features, and then once from the temporal dimension based on the self-attention mechanism for the extracted features.
The output of the self-attention feature extraction module may be the classification feature. Specifically, from the features output by the last cascaded preset self-attention module, the current representation corresponding to the classification initial feature is determined as the classification feature.
Specifically, the current representation corresponding to the classification initial feature may also be pooled to obtain a fixed-size feature, which is determined as the classification feature.
The output of the self-attention feature extraction module may be cascaded to the input of the prediction module.
The prediction module may be used to predict the shot recognition result from the classification features output by the self-attention feature extraction module, using a pre-trained fully connected network.
To facilitate understanding, an embodiment of the present invention further provides a specific method embodiment.
In professional basketball games, referees keep the score, but in ordinary amateur games or in training there are usually no referees and the players have to keep score themselves.
Most current solutions require additional sensors, and some even require a wearable device on the player's wrist, which is inconvenient and costly.
This method embodiment proposes a vision-based shot recognition algorithm that can run on a mobile phone or an ordinary personal computer; shot recognition can be achieved using the phone's built-in camera or the court's surveillance camera, with the advantages of simple operation and low cost.
In a basketball game, shot recognition is used to judge whether a player's shot goes in, that is, whether it scores. A shot recognition algorithm enables automatic scoring, allowing players to focus on the game.
In this method embodiment, the video data of a basketball game can be divided into multiple video segments based on the sliding window mechanism.
Specifically, the sliding window length may be 2 seconds and the sliding step 1 second.
For each video segment, the following steps may be performed.
1. For each frame of the input video, detect the backboard using a target detection network trained on an open-source architecture (e.g., YOLOX or RetinaNet).
2. For the input video, take the minimal bounding box enclosing the backboard detected at the same position across all frames.
3. Enlarge the bounding box by a factor of 1.5, then crop the image content of the corresponding backboard area from the corresponding position of each video frame.
4. Resize the cropped image content to a fixed-size image (e.g., 64x64). Use the shot classification network to perform shot recognition, that is, to classify whether a goal was scored.
The shot classification network is a custom deep neural network comprising a two-dimensional convolutional network and a self-attention mechanism.
a) Years of practice have shown that two-dimensional convolutional networks offer high performance and high speed in image feature extraction. Large image datasets such as ImageNet can provide pre-training data for two-dimensional convolutional networks, reducing the amount of target-domain data required while improving network generalization.
b) The self-attention module has shown great potential for processing sequential data in natural language processing.
c) Based on the above, the shot classification network can be divided into two stages at design time.
The first stage uses a two-dimensional convolutional network for low-level, image-level feature extraction.
The second stage uses self-attention to fuse the multi-frame image features in the temporal domain.
To further improve the extraction of spatial features, the second stage can fine-tune the spatial features by means of transposition plus self-attention.
Since the first stage performs only image-level feature processing, the features it extracts can be reused in subsequent processing, reducing unnecessary repeated computation.
The self-attention of the second stage can effectively fuse multi-frame images in the temporal domain to obtain optimal results.
The network is defined as follows.
a) Each frame image can be processed by a weight-shared three-layer two-dimensional convolutional network. Each layer has a kernel size of 3; the first layer has a stride of 2 and the last two layers a stride of 1; each layer uses batch norm with a ReLU activation; and the output channel counts are 32, 64, and 128, respectively.
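A minimal sketch of this weight-shared backbone, assuming PyTorch; padding=1 and a 3-channel input are assumptions not stated in the text.

```python
import torch
import torch.nn as nn

class Backbone2D(nn.Module):
    """Sketch of the three-layer 2-D CNN: kernel 3, strides 2/1/1,
    batch norm and ReLU per layer, output channels 32/64/128."""

    def __init__(self, in_channels: int = 3):
        super().__init__()
        layers, strides, channels = [], (2, 1, 1), (32, 64, 128)
        c_in = in_channels
        for c_out, s in zip(channels, strides):
            layers += [
                nn.Conv2d(c_in, c_out, kernel_size=3, stride=s, padding=1),
                nn.BatchNorm2d(c_out),
                nn.ReLU(inplace=True),
            ]
            c_in = c_out
        self.net = nn.Sequential(*layers)

    def forward(self, frame: torch.Tensor) -> torch.Tensor:
        # frame: (B, 3, 64, 64) cropped backboard image -> (B, 128, 32, 32)
        return self.net(frame)
```

Under these assumptions, a 64x64 crop yields a 128-channel 32x32 map, consistent with the typical spatial dimension S = 1024 and channel count C = 128 given below.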
b) Reshape the two-dimensional features generated in the previous step into one-dimensional features and concatenate them along the temporal dimension. The resulting concatenated feature has size T x S x C, where T is the temporal dimension, i.e. the number of image frames processed at once (typically 32), S is the spatial dimension (typically 1024), and C is the number of feature channels, i.e. the number of output channels of the above convolutional network (typically 128).
c) Process the result with 8 mixed self-attention layers.
Before the mixed self-attention processing, position encoding and a classification initial feature are added to the input features.
One self-attention module processes the concatenated features after the position encoding and classification initial feature have been added, specifically extracting features in the spatial dimension based on the self-attention mechanism. After this processing, the extracted features are transposed to S x T x C and processed by another self-attention module, specifically extracting features in the temporal dimension based on the self-attention mechanism; after processing, the feature order is restored to T x S x C.
The above processing is repeated 8 times, finally yielding a T x S x C feature.
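A minimal sketch of step c), assuming PyTorch and the typical sizes above; the head count, and the omission of the position encoding, classification initial feature, residual connections, and normalization, are simplifying assumptions of the sketch.

```python
import torch
import torch.nn as nn

class MixedAttentionStack(nn.Module):
    """Sketch: 8 mixed layers, each running self-attention over the
    spatial axis of a T x S x C feature, transposing to S x T x C for
    temporal self-attention, then restoring T x S x C."""

    def __init__(self, C: int = 128, depth: int = 8, heads: int = 8):
        super().__init__()
        self.spatial = nn.ModuleList(
            nn.MultiheadAttention(C, heads, batch_first=True) for _ in range(depth))
        self.temporal = nn.ModuleList(
            nn.MultiheadAttention(C, heads, batch_first=True) for _ in range(depth))

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # x: (T, S, C), e.g. T=32, S=1024, C=128.
        for sp, tp in zip(self.spatial, self.temporal):
            x, _ = sp(x, x, x)          # attention across the S positions
            x = x.transpose(0, 1)       # S x T x C
            x, _ = tp(x, x, x)          # attention across the T frames
            x = x.transpose(0, 1)       # restore T x S x C
        return x
```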
d) From the resulting features, extract the current representation corresponding to the classification initial feature and determine it as the classification feature. Resize the classification feature to 1 x C by pooling, input it into a fully connected network, and obtain the classification result output by the fully connected network.
5. The result output by the classification network indicates whether the corresponding shot scored; if it did, the score is recorded accordingly.
With this method embodiment, the input is a set of consecutive frame images around the backboard. In practical use only the backboard needs to be detected; the backboard is a large, immobile target, so detection is extremely easy, and neither the basketball nor the hoop needs to be detected explicitly.
At the same time, because consecutive frames are used as input, whether a goal was scored can be judged automatically from auxiliary cues following the shot, for example the swinging of the net.
This method embodiment provides a vision-based shot recognition algorithm that can judge from a camera's video input whether a goal was scored, thereby achieving automatic scoring of the game. It can run directly on a mobile phone or an ordinary personal computer and has the advantages of simple deployment and portability.
Furthermore, it requires no modification of the hoop or backboard, does not interfere with the players' shooting, runs on a mobile phone or an ordinary personal computer, needs no professional installation, and is easy to use.
Corresponding to the above method embodiments, an embodiment of the present invention further provides an apparatus embodiment.
As shown in Fig. 4, Fig. 4 is a schematic structural diagram of a shot recognition apparatus according to an embodiment of the present invention.
The apparatus may include the following units.
A backboard recognition unit 401, configured to: obtain a video to be recognized; and determine the backboard area in the video to be recognized to obtain a backboard area image sequence to be recognized.
A classification network unit 402, configured to: input the backboard area image sequence to be recognized into the shot classification network and obtain the shot recognition result output by the shot classification network, the shot recognition result being used to indicate whether the shot is successful.
For a specific explanation, refer to the above method embodiments.
An embodiment of the present invention further provides a computer device, which at least includes a memory, a processor, and a computer program stored in the memory and executable on the processor, wherein the processor implements any of the above method embodiments when executing the program.
An embodiment of the present invention further provides an electronic device, including: at least one processor; and a memory communicatively connected to the at least one processor; wherein the memory stores instructions executable by the at least one processor, and the instructions are executed by the at least one processor so that the at least one processor can perform any of the above method embodiments.
图5是根据本发明实施例示出的一种配置本发明实施例方法的计算机设备硬件结构示意图,该设备可以包括:处理器1010、存储器1020、输入/输出接口1030、通信接口1040和总线1050。其中处理器1010、存储器1020、输入/输出接口1030和通信接口1040通过总线1050实现彼此之间在设备内部的通信连接。Figure 5 is a schematic hardware structure diagram of a computer device configured to configure a method according to an embodiment of the present invention. The device may include: a processor 1010, a memory 1020, an input/output interface 1030, a communication interface 1040 and a bus 1050. The processor 1010, the memory 1020, the input/output interface 1030 and the communication interface 1040 implement communication connections between each other within the device through the bus 1050.
The processor 1010 may be implemented as a general-purpose CPU (Central Processing Unit), a microprocessor, an application-specific integrated circuit (ASIC), or one or more integrated circuits, and executes the relevant programs to implement the technical solutions provided by the embodiments of the present invention.
The memory 1020 may be implemented as ROM (Read-Only Memory), RAM (Random Access Memory), a static storage device, a dynamic storage device, or the like. The memory 1020 may store an operating system and other application programs. When the technical solutions provided by the embodiments of the present invention are implemented in software or firmware, the relevant program code is stored in the memory 1020 and is invoked and executed by the processor 1010.
The input/output interface 1030 is used to connect input/output modules for information input and output. An input/output module may be configured within the device as a component (not shown in the figure) or externally connected to the device to provide the corresponding functions. Input devices may include keyboards, mice, touch screens, microphones, and various sensors; output devices may include displays, speakers, vibrators, and indicator lights.
The communication interface 1040 is used to connect a communication module (not shown in the figure) to enable the device to communicate and interact with other devices. The communication module may communicate by wired means (such as USB or a network cable) or by wireless means (such as a mobile network, WiFi, or Bluetooth).
The bus 1050 includes a path that carries information between the components of the device (for example, the processor 1010, the memory 1020, the input/output interface 1030, and the communication interface 1040).
It should be noted that although the above device shows only the processor 1010, the memory 1020, the input/output interface 1030, the communication interface 1040, and the bus 1050, in specific implementations the device may also include other components necessary for normal operation. In addition, those skilled in the art will understand that the above device may alternatively include only the components necessary to implement the embodiments of the present invention, rather than all of the components shown in the figure.
Embodiments of the present invention further provide a computer-readable storage medium storing a computer program which, when executed by a processor, implements any of the above method embodiments.
Computer-readable media include permanent and non-permanent, removable and non-removable media, and may implement information storage by any method or technology. The information may be computer-readable instructions, data structures, program modules, or other data. Examples of computer storage media include, but are not limited to, phase-change memory (PRAM), static random access memory (SRAM), dynamic random access memory (DRAM), other types of random access memory (RAM), read-only memory (ROM), electrically erasable programmable read-only memory (EEPROM), flash memory or other memory technology, compact disc read-only memory (CD-ROM), digital versatile disc (DVD) or other optical storage, magnetic cassettes, magnetic disk storage or other magnetic storage devices, or any other non-transmission medium that can store information accessible by a computing device. As defined herein, computer-readable media do not include transitory media, such as modulated data signals and carrier waves.
From the description of the above embodiments, those skilled in the art can clearly understand that the embodiments of the present invention can be implemented by software plus a necessary general-purpose hardware platform. Based on this understanding, the essence of the technical solutions of the embodiments of the present invention, or the part that makes a contribution, can be embodied in the form of a software product. The computer software product may be stored in a storage medium, such as ROM/RAM, a magnetic disk, or an optical disc, and includes a number of instructions that cause a computer device (which may be a personal computer, a server, a network device, or the like) to execute the methods described in the various embodiments, or in certain parts of the embodiments, of the present invention.
The systems, apparatuses, modules, or units described in the above embodiments may be implemented by computer chips or entities, or by products having certain functions. A typical implementation device is a computer, which may take the form of a personal computer, a laptop computer, a cellular phone, a camera phone, a smart phone, a personal digital assistant, a media player, a navigation device, an e-mail transceiver, a game console, a tablet computer, a wearable device, or a combination of any of these devices.
The embodiments in this specification are described in a progressive manner; for identical or similar parts, the embodiments may refer to one another, and each embodiment focuses on its differences from the others. In particular, since the apparatus embodiment is substantially similar to the method embodiment, its description is relatively brief, and the relevant parts of the description of the method embodiment may be consulted. The apparatus embodiments described above are merely illustrative; the modules described as separate components may or may not be physically separate, and when implementing the embodiments of the present invention, the functions of the modules may be realized in the same piece, or in multiple pieces, of software and/or hardware. Some or all of the modules may also be selected according to actual needs to achieve the purpose of the solution of this embodiment. Those of ordinary skill in the art can understand and implement this without creative effort.
The above are only specific implementations of the embodiments of the present invention. It should be pointed out that those of ordinary skill in the art may make several improvements and refinements without departing from the principles of the embodiments of the present invention, and these improvements and refinements shall also be regarded as falling within the protection of the embodiments of the present invention.
In the present invention, the terms "first" and "second" are used for descriptive purposes only and shall not be understood as indicating or implying relative importance. The term "plurality" means two or more, unless expressly limited otherwise.
Other embodiments of the invention will readily occur to those skilled in the art from consideration of the specification and practice of the disclosure herein. The present invention is intended to cover any variations, uses, or adaptations of the invention that follow its general principles and include common knowledge or customary technical means in the art not disclosed herein. The specification and examples are to be regarded as exemplary only, with the true scope and spirit of the invention indicated by the following claims.
It should be understood that the present invention is not limited to the precise structures described above and shown in the accompanying drawings, and that various modifications and changes may be made without departing from its scope. The scope of the invention is limited only by the appended claims.

Claims (13)

  1. A shot recognition method, comprising:
    obtaining a video to be recognized;
    determining the backboard region in the video to be recognized, and obtaining a sequence of backboard region images to be recognized;
    inputting the sequence of backboard region images into a shot classification network, and obtaining a shot recognition result output by the shot classification network; wherein the shot recognition result is used to indicate whether the shot was successful.
  2. The method according to claim 1, wherein the shot classification network is configured to:
    extract a feature map for each backboard region image in the input sequence of backboard region images, obtaining a feature map sequence;
    extract classification features from the feature map sequence based on a self-attention mechanism;
    determine the shot recognition result according to the classification features.
  3. The method according to claim 2, wherein extracting classification features from the feature map sequence based on a self-attention mechanism comprises:
    adding a position encoding to the feature map sequence; wherein the position encoding includes: information characterizing the spatial positional relationships between different feature points within each feature map in the sequence, and information characterizing the temporal positional relationships between different feature maps in the sequence;
    extracting features, based on the self-attention mechanism, from the position-encoded feature map sequence along the spatial dimension and the temporal dimension, obtaining the classification features.
  4. The method according to claim 3, wherein adding the position encoding to the feature map sequence comprises:
    converting each feature map in the feature map sequence into a one-dimensional feature;
    stacking the one-dimensional features to obtain a two-dimensional feature;
    adding the position encoding to the two-dimensional feature.
  5. The method according to claim 3, wherein the shot classification network includes N cascaded preset self-attention modules, N ≥ 2; for the i-th preset self-attention module, 1 ≤ i ≤ N−1, its output is cascaded to the input of the (i+1)-th preset self-attention module; each preset self-attention module is used to extract features, based on the self-attention mechanism, along the spatial dimension and the temporal dimension;
    extracting features, based on the self-attention mechanism, from the position-encoded feature map sequence along the spatial dimension and the temporal dimension comprises:
    inputting the position-encoded feature map sequence into the first preset self-attention module, and determining the classification features based on the output of the N-th preset self-attention module.
  6. The method according to claim 5, wherein the preset self-attention module is configured to:
    for the input features, serially extract features at least once along the spatial dimension based on the self-attention mechanism, then, for the extracted features, serially extract features at least once along the temporal dimension based on the self-attention mechanism, and output the extracted features;
    or,
    for the input features, serially extract features at least once along the temporal dimension based on the self-attention mechanism, then, for the extracted features, serially extract features at least once along the spatial dimension based on the self-attention mechanism, and output the extracted features.
  7. The method according to claim 2, wherein extracting classification features from the feature map sequence based on a self-attention mechanism comprises:
    adding an initial classification feature to the feature map sequence;
    extracting features, based on the self-attention mechanism, from the feature map sequence with the added initial classification feature;
    determining, from the extracted features, the current representation corresponding to the initial classification feature as the classification feature.
  8. The method according to claim 2 or 7, wherein determining the shot recognition result according to the classification features comprises:
    pooling the classification features to obtain an input feature of a preset feature size;
    inputting the input feature into a pre-trained fully connected network, and obtaining the shot recognition result output by the fully connected network.
  9. The method according to claim 2, wherein extracting classification features from the feature map sequence based on a self-attention mechanism comprises:
    determining the first m feature maps of the feature map sequence as the feature map subsequence contained in the current sliding window, m ≥ 1;
    repeating the following steps until the current sliding window can no longer move backward: extracting classification features, based on the self-attention mechanism, from the feature map subsequence contained in the current sliding window; and moving the sliding window backward by a preset sliding step.
  10. The method according to claim 1, wherein obtaining the sequence of backboard region images to be recognized comprises:
    for each determined backboard region, cropping the image content containing that backboard region, resizing the cropped result to a preset image size, and adding the result to the sequence of backboard region images to be recognized; the sequence is ordered by the temporal order of the video frames in which the backboard region images are located.
  11. A shot recognition apparatus, comprising:
    a backboard identification unit, configured to: obtain a video to be recognized; and determine the backboard region in the video to be recognized, obtaining a sequence of backboard region images to be recognized;
    a classification network unit, configured to: input the sequence of backboard region images into a shot classification network, and obtain a shot recognition result output by the shot classification network; wherein the shot recognition result is used to indicate whether the shot was successful.
  12. An electronic device, comprising:
    at least one processor; and,
    a memory communicatively connected to the at least one processor; wherein,
    the memory stores instructions executable by the at least one processor, and the instructions are executed by the at least one processor to enable the at least one processor to perform the method according to any one of claims 1 to 10.
  13. A computer-readable storage medium storing a computer program which, when executed by a processor, implements the method according to any one of claims 1 to 10.
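By way of illustration only, claim 2's per-image feature extraction could be realized with any convolutional backbone; the patent does not name one. The following sketch assumes a torchvision ResNet-18 with its pooling and classification layers removed:

```python
# Hedged sketch of claim 2: extract a feature map for each backboard-region
# image, yielding a feature map sequence. The ResNet-18 backbone is an
# assumption, not part of the disclosure.
import torch
import torch.nn as nn
from torchvision.models import resnet18

class FeatureMapExtractor(nn.Module):
    def __init__(self):
        super().__init__()
        backbone = resnet18(weights=None)
        # Keep everything up to the last convolutional stage; drop the
        # global pooling and classification layers so feature maps survive.
        self.stem = nn.Sequential(*list(backbone.children())[:-2])

    def forward(self, clips):
        # clips: (batch, frames, 3, H, W) backboard-region image sequence
        b, t, c, h, w = clips.shape
        maps = self.stem(clips.reshape(b * t, c, h, w))  # (b*t, C', H', W')
        return maps.reshape(b, t, *maps.shape[1:])       # feature map sequence
```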
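Claims 3 and 4 flatten each feature map into a one-dimensional feature, stack the results into a two-dimensional layout, and add a position encoding carrying both within-frame spatial relationships and across-frame temporal relationships. A minimal sketch, assuming learned embeddings (one plausible choice; the claims do not fix the encoding scheme):

```python
# Sketch of claims 3-4: flatten each (C, H, W) feature map to one-dimensional
# token features, stack per-frame tokens into a two-dimensional (tokens x
# channels) layout, and add spatial and temporal position encodings.
import torch
import torch.nn as nn

class SpatioTemporalPositionEncoding(nn.Module):
    def __init__(self, num_frames, height, width, channels):
        super().__init__()
        # One embedding per spatial location (shared across frames) and one
        # per frame index (shared across locations); both broadcast in forward.
        self.spatial = nn.Parameter(torch.zeros(1, 1, height * width, channels))
        self.temporal = nn.Parameter(torch.zeros(1, num_frames, 1, channels))

    def forward(self, feature_maps):
        # feature_maps: (batch, frames, channels, height, width)
        tokens = feature_maps.flatten(3)    # one-dimensional: (b, t, c, h*w)
        tokens = tokens.transpose(2, 3)     # stacked 2-D layout: (b, t, h*w, c)
        return tokens + self.spatial + self.temporal
```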
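The cascaded modules of claims 5 and 6 extract features along the spatial and temporal dimensions in series, in either order. The sketch below shows one assumed realization, spatial attention within each frame followed by temporal attention across frames, each applied once; claim 6's alternative ordering would simply swap the two steps, and "at least once" permits repeating either step. The layer sizes and the use of nn.MultiheadAttention are assumptions.

```python
import torch
import torch.nn as nn

class PresetSelfAttentionModule(nn.Module):
    """One of the N cascaded modules: attends over space, then over time."""
    def __init__(self, channels, num_heads=4):
        super().__init__()
        self.spatial_attn = nn.MultiheadAttention(channels, num_heads, batch_first=True)
        self.temporal_attn = nn.MultiheadAttention(channels, num_heads, batch_first=True)
        self.norm1 = nn.LayerNorm(channels)
        self.norm2 = nn.LayerNorm(channels)

    def forward(self, x):
        # x: (batch, frames, tokens, channels)
        b, t, n, c = x.shape
        # Spatial step: tokens within each frame attend to one another.
        s = x.reshape(b * t, n, c)
        q = self.norm1(s)
        s = s + self.spatial_attn(q, q, q, need_weights=False)[0]
        # Temporal step: each spatial location attends across frames.
        u = s.reshape(b, t, n, c).transpose(1, 2).reshape(b * n, t, c)
        q = self.norm2(u)
        u = u + self.temporal_attn(q, q, q, need_weights=False)[0]
        return u.reshape(b, n, t, c).transpose(1, 2)

# Claim 5's cascade of N modules is then a sequential stack, e.g.
# cascade = build_cascade(4, 256); y = cascade(tokens)
def build_cascade(n_modules, channels):
    return nn.Sequential(*[PresetSelfAttentionModule(channels) for _ in range(n_modules)])
```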
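Claim 7's "initial classification feature" behaves like a learnable classification token whose final representation becomes the classification feature, and claim 8 pools that feature to a preset size before a pre-trained fully connected network. A minimal sketch, assuming a two-class made/missed output:

```python
import torch
import torch.nn as nn

class ClassificationHead(nn.Module):
    def __init__(self, channels, pooled_size=256, num_classes=2):
        super().__init__()
        # Claim 7: a learnable initial classification feature prepended to
        # the token sequence; its final representation is the classification feature.
        self.cls_token = nn.Parameter(torch.zeros(1, 1, channels))
        # Claim 8: pooling to a preset feature size, then a fully connected net.
        self.pool = nn.AdaptiveAvgPool1d(pooled_size)
        self.fc = nn.Linear(pooled_size, num_classes)

    def prepend_cls(self, tokens):
        # tokens: (batch, sequence, channels)
        cls = self.cls_token.expand(tokens.shape[0], -1, -1)
        return torch.cat([cls, tokens], dim=1)

    def classify(self, attended):
        # attended: (batch, 1 + sequence, channels); position 0 holds the
        # current representation of the initial classification feature.
        cls_feature = attended[:, 0, :]                          # (batch, channels)
        pooled = self.pool(cls_feature.unsqueeze(1)).squeeze(1)  # preset size
        return self.fc(pooled)                                   # made/missed logits
```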
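Finally, claim 9's sliding-window traversal of the feature map sequence and claim 10's crop-and-resize preprocessing might look as follows. The window length m, stride, and helper names are illustrative assumptions; the claims fix only that m ≥ 1, that the window advances by a preset step until it can no longer move, and that crops keep the temporal order of their source frames.

```python
import cv2
import numpy as np

def sliding_window_classify(feature_maps, extract_classification, m=8, stride=4):
    """Claim 9: start from the first m feature maps, classify the current
    window, then slide by the preset step until the window cannot move."""
    results = []
    start = 0
    while start + m <= len(feature_maps):
        window = feature_maps[start:start + m]
        results.append(extract_classification(window))
        start += stride
    return results

def build_crop_sequence(frames, boxes, size=(112, 112)):
    """Claim 10: crop each detected backboard region, resize it to the preset
    image size, and keep the crops in the temporal order of their frames."""
    crops = []
    for frame, (x, y, w, h) in zip(frames, boxes):  # frames already time-ordered
        crops.append(cv2.resize(frame[y:y + h, x:x + w], size))
    return np.stack(crops)
```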
PCT/CN2023/110320 2022-08-23 2023-07-31 Basketball shot recognition method and apparatus, device and storage medium WO2024041319A1 (en)

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
CN202211014399.XA CN115376047A (en) 2022-08-23 2022-08-23 Shooting identification method, device, equipment and storage medium
CN202211014399.X 2022-08-23

Publications (1)

Publication Number Publication Date
WO2024041319A1 true WO2024041319A1 (en) 2024-02-29

Family

ID=84067387

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/CN2023/110320 WO2024041319A1 (en) 2022-08-23 2023-07-31 Basketball shot recognition method and apparatus, device and storage medium

Country Status (2)

Country Link
CN (1) CN115376047A (en)
WO (1) WO2024041319A1 (en)

Families Citing this family (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN115376047A (en) * 2022-08-23 2022-11-22 京东方科技集团股份有限公司 Shooting identification method, device, equipment and storage medium
CN116109981B (en) * 2023-01-31 2024-04-12 北京智芯微电子科技有限公司 Shooting recognition method, basketball recognition device, electronic equipment and storage medium

Citations (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN105701460A (en) * 2016-01-07 2016-06-22 王跃明 Video-based basketball goal detection method and device
US20170262995A1 (en) * 2016-03-11 2017-09-14 Qualcomm Incorporated Video analysis with convolutional attention recurrent neural networks
CN110942022A (en) * 2019-11-25 2020-03-31 维沃移动通信有限公司 Shooting data output method and electronic equipment
CN115376047A (en) * 2022-08-23 2022-11-22 京东方科技集团股份有限公司 Shooting identification method, device, equipment and storage medium


Also Published As

Publication number Publication date
CN115376047A (en) 2022-11-22


Legal Events

Date Code Title Description
121 Ep: the epo has been informed by wipo that ep was designated in this application

Ref document number: 23856409

Country of ref document: EP

Kind code of ref document: A1