TW202309772A - Behavior recognition method and apparatus, electronic device, and storage medium - Google Patents


Info

Publication number
TW202309772A
Authority
TW
Taiwan
Prior art keywords
features
character
feature
group
updated
Prior art date
Application number
TW111108739A
Other languages
Chinese (zh)
Inventor
李帥成
楊昆霖
侯軍
伊帥
Original Assignee
大陸商上海商湯科技開發有限公司
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by 大陸商上海商湯科技開發有限公司
Publication of TW202309772A


Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F 18/00 Pattern recognition
    • G06F 18/20 Analysing
    • G06F 18/22 Matching criteria, e.g. proximity measures
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F 18/00 Pattern recognition
    • G06F 18/20 Analysing
    • G06F 18/23 Clustering techniques
    • G06F 18/232 Non-hierarchical techniques
    • G06F 18/2321 Non-hierarchical techniques using statistics or function optimisation, e.g. modelling of probability density functions
    • G06F 18/23213 Non-hierarchical techniques using statistics or function optimisation, e.g. modelling of probability density functions with fixed number of clusters, e.g. K-means clustering
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F 18/00 Pattern recognition
    • G06F 18/20 Analysing
    • G06F 18/24 Classification techniques
    • G06F 18/241 Classification techniques relating to the classification model, e.g. parametric or non-parametric approaches
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N 3/00 Computing arrangements based on biological models
    • G06N 3/02 Neural networks
    • G06N 3/04 Architecture, e.g. interconnection topology
    • G06N 3/045 Combinations of networks
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N 3/00 Computing arrangements based on biological models
    • G06N 3/02 Neural networks
    • G06N 3/08 Learning methods

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Data Mining & Analysis (AREA)
  • Physics & Mathematics (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Artificial Intelligence (AREA)
  • General Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • Evolutionary Computation (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Evolutionary Biology (AREA)
  • Bioinformatics & Computational Biology (AREA)
  • Bioinformatics & Cheminformatics (AREA)
  • Molecular Biology (AREA)
  • Health & Medical Sciences (AREA)
  • Biomedical Technology (AREA)
  • Biophysics (AREA)
  • Computational Linguistics (AREA)
  • General Health & Medical Sciences (AREA)
  • Computing Systems (AREA)
  • Mathematical Physics (AREA)
  • Software Systems (AREA)
  • Probability & Statistics with Applications (AREA)
  • Image Analysis (AREA)
  • Character Discrimination (AREA)

Abstract

A behavior recognition method and apparatus, an electronic device, and a storage medium. The method comprises: receiving an input video frame and extracting character features in the video frame; clustering the multiple character features in the video frame to obtain a clustering result; determining attention allocation weights of the character features in the video frame on the basis of the clustering result; updating the character features on the basis of the attention allocation weights; extracting character spatio-temporal features on the basis of the updated character features; and performing behavior recognition on the video frame on the basis of the character spatio-temporal features to obtain a recognition result.

Description

Behavior recognition method, electronic device, and storage medium

The present invention relates to the field of computer technology, and in particular to a behavior recognition method, an electronic device, and a storage medium.

In group activity recognition (GAR), computer vision algorithms are used to identify the action category of each person in a video frame, as well as the group activity category the frame depicts; the technique is commonly applied to behavior recognition in scenes such as sports events. For example, for a volleyball match video, the task requires identifying each player's action category and the group activity category the clip depicts (left pass, right pass, left spike, etc.). For this task, one can typically first detect the human bodies in the video, and then, through individual action recognition, further infer the video's group activity category from the individuals' actions.

In recent years, with the development of deep learning in computer vision, much prior work has used convolutional neural networks to detect each person's action in a video, and global pooling to obtain an overall crowd feature for recognizing the group activity category. Group activity recognition depends not only on individual actions and video background information, but also on the relational information among individual actions. Besides convolutional neural networks, some methods also employ models such as graph convolutional networks, recurrent neural networks, and Transformers to capture and analyze the relational information among individual actions.

Accordingly, the present invention proposes a technical solution for behavior recognition.

Therefore, according to a first object of the present invention, a behavior recognition method is provided, comprising: receiving an input video frame and extracting character features from the video frame; clustering the multiple character features in the video frame to obtain a clustering result; determining attention allocation weights of the character features in the video frame based on the clustering result; updating the character features based on the attention allocation weights; extracting character spatio-temporal features based on the updated character features; and performing behavior recognition on the video frame based on the character spatio-temporal features to obtain a recognition result.
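The six claimed steps can be tied together in a toy end-to-end sketch. Everything here is hypothetical scaffolding rather than the patent's implementation: the nearest-seed clustering stand-in, the softmax normalization, the mean pooling in place of real spatio-temporal feature extraction, and the argmax in place of a trained classifier head.

```python
import numpy as np

def softmax(x):
    # Numerically stable row-wise softmax.
    e = np.exp(x - x.max(axis=-1, keepdims=True))
    return e / e.sum(axis=-1, keepdims=True)

def cluster(feats, k):
    # Simplistic stand-in for clustering: assign each feature to the
    # nearest of the first k rows used as seeds (hypothetical choice).
    seeds = feats[:k]
    d = ((feats[:, None, :] - seeds[None, :, :]) ** 2).sum(-1)
    return d.argmin(axis=1)

def recognize_behavior(person_feats, n_groups=2):
    """Toy sketch of the claimed six-step pipeline (names are hypothetical)."""
    # Step 2: cluster the character features into groups.
    labels = cluster(person_feats, n_groups)
    updated = np.copy(person_feats)
    for g in range(n_groups):
        idx = np.where(labels == g)[0]
        # Step 3: attention allocation weights from within-group similarity.
        sims = person_feats[idx] @ person_feats[idx].T
        weights = softmax(sims)
        # Step 4: update each member as a weighted sum of its group.
        updated[idx] = weights @ person_feats[idx]
    # Steps 5-6 (stand-ins): pool to a "spatio-temporal" feature, classify.
    spatiotemporal = updated.mean(axis=0)
    return int(np.argmax(spatiotemporal))
```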

In a possible implementation, determining the attention allocation weights of the character features in the video frame based on the clustering result includes: determining the attention allocation weights among the character features based on the association relationships among the character features in the clustering result.

In a possible implementation, determining the attention allocation weights among the character features based on the association relationships among the character features in the clustering result includes: determining a first similarity between character features within the same group obtained by clustering; and determining, based on the first similarity, first attention allocation weights among the character features within the group.
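The claim does not fix a similarity measure; a common choice (our assumption) is scaled dot-product similarity within each group, normalized row-wise with a softmax to give the first attention allocation weights:

```python
import numpy as np

def intra_group_attention(group_feats):
    """Pairwise similarity within one cluster -> first attention weights.

    group_feats: (M, D) features of the M characters in one group.
    Returns an (M, M) row-stochastic weight matrix.
    """
    d = group_feats.shape[1]
    # First similarity: scaled dot product between every pair of features.
    sims = group_feats @ group_feats.T / np.sqrt(d)
    # First attention allocation weights: softmax over each row.
    e = np.exp(sims - sims.max(axis=1, keepdims=True))
    return e / e.sum(axis=1, keepdims=True)
```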

In a possible implementation, determining the first similarity between character features within the same group obtained by clustering includes: dividing the feature matrix of each character feature into N parts, and computing similarities between the corresponding N parts of different character features to obtain N first similarities. Determining, based on the first similarity, the first attention allocation weights among the character features within the group then includes: determining, based on the N first similarities, N first attention allocation weights among the character features within the group.
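Splitting each feature vector into N sub-vectors and computing one similarity per part resembles the head-splitting of multi-head attention. A sketch under that reading (the dot-product similarity is our assumption):

```python
import numpy as np

def n_part_similarities(feats, n_parts):
    """Split an (M, D) feature matrix into N chunks along D and compute
    one (M, M) similarity matrix per chunk -> N first similarities."""
    m, d = feats.shape
    assert d % n_parts == 0, "feature dim must divide evenly into N parts"
    chunks = feats.reshape(m, n_parts, d // n_parts)   # (M, N, D/N)
    # sim[n, i, j] = dot product of part n of character i and character j.
    return np.einsum('mnc,knc->nmk', chunks, chunks)   # (N, M, M)
```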

In a possible implementation, determining the attention allocation weights among the character features based on the association relationships among the character features in the clustering result includes: determining an overall feature for each group obtained by clustering; determining a second similarity between the overall features of the groups; and determining, based on the second similarity, second attention allocation weights among the character features.

In a possible implementation, updating the character features based on the attention allocation weights includes: for a target character feature within a single group, performing a weighted sum over the character features in the group using the first attention allocation weights between the target character feature and each character feature in the group, to obtain an intra-group updated feature for each character in the group as the updated character feature.
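A minimal sketch of the intra-group update for one cluster, assuming dot-product similarities normalized with a softmax (the patent fixes neither choice):

```python
import numpy as np

def update_group_features(group_feats):
    """Weighted-sum update of every member feature in one cluster."""
    sims = group_feats @ group_feats.T              # first similarities
    e = np.exp(sims - sims.max(axis=1, keepdims=True))
    weights = e / e.sum(axis=1, keepdims=True)      # first attention weights
    # Row i of the result is sum_j weights[i, j] * group_feats[j]:
    # the intra-group updated feature for character i.
    return weights @ group_feats
```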

In a possible implementation, updating the character features based on the attention allocation weights includes: for the target overall feature of a target group among the groups, obtaining an inter-group update feature for each group using the second attention allocation weights between the target overall feature and the overall features of the groups; and adding the inter-group update features to the intra-group updated features of the characters in the target group, to obtain the updated character features.
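Assuming each group's overall feature is the mean of its members (the claim leaves the pooling unspecified), the inter-group step can be sketched as follows; each group's update feature is broadcast-added to every member's intra-group updated feature:

```python
import numpy as np

def add_inter_group_update(intra_updated, labels, n_groups):
    """Add a per-group inter-group update feature to each member's
    intra-group updated feature. intra_updated: (M, D); labels: (M,)."""
    # Overall feature of each group: mean of its members (assumption).
    overall = np.stack([intra_updated[labels == g].mean(axis=0)
                        for g in range(n_groups)])            # (G, D)
    sims = overall @ overall.T                                # second similarities
    e = np.exp(sims - sims.max(axis=1, keepdims=True))
    w = e / e.sum(axis=1, keepdims=True)                      # second attention weights
    inter_update = w @ overall                                # (G, D) update per group
    # Broadcast each group's update onto its members.
    return intra_updated + inter_update[labels]
```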

In a possible implementation, extracting the character spatio-temporal features based on the updated character features includes: spatially decoding the updated character features to obtain the character spatio-temporal features.

In a possible implementation, spatially decoding the updated character features to obtain the character spatio-temporal features includes: spatially decoding the updated character features to obtain character spatial features; temporally encoding and decoding the character features of multiple video frames to obtain character temporal features; and fusing the character spatial features with the character temporal features to obtain the character spatio-temporal features.

In a possible implementation, temporally encoding and decoding the character features of multiple video frames to obtain the character temporal features includes: encoding the character features of the multiple video frames based on a self-attention mechanism to obtain temporally encoded features; and decoding the temporally encoded features based on a self-attention mechanism, and/or decoding the temporally encoded features based on spatially encoded features, to obtain the character temporal features, wherein the spatially encoded features are the updated character features.
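A single-head, projection-free sketch of the temporal self-attention encoding for one character's track across frames; a real Transformer encoder would add learned Q/K/V projections, residual connections, and a feed-forward sublayer:

```python
import numpy as np

def temporal_self_attention(track):
    """Self-attention of one character's features across T frames.

    track: (T, D) -> (T, D) temporally encoded features; each output
    frame is an attention-weighted mixture of all frames in the track.
    """
    t, d = track.shape
    scores = track @ track.T / np.sqrt(d)          # (T, T) frame-to-frame scores
    e = np.exp(scores - scores.max(axis=1, keepdims=True))
    attn = e / e.sum(axis=1, keepdims=True)        # row-stochastic attention
    return attn @ track
```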

In a possible implementation, spatially decoding the updated character features to obtain the character spatial features includes: decoding the spatially encoded features based on a self-attention mechanism, and/or decoding the spatially encoded features based on the temporally encoded features, to obtain the character spatial features.

In a possible implementation, the method further includes: extracting global features of the video frame; determining third attention allocation weights in the global features using the character spatio-temporal features; and updating the global features using the third attention allocation weights. Performing behavior recognition on the video frame based on the character spatio-temporal features to obtain a recognition result then includes: performing group behavior recognition on the video frame based on the updated global features, to obtain a group behavior recognition result.

In a possible implementation, after updating the global features using the third attention allocation weights, the method further includes: taking the updated global features as new global features and the character spatio-temporal features as new character features, and iteratively updating the global features and the character spatio-temporal features until an iteration stop condition is satisfied, to obtain iteratively updated global features and character spatio-temporal features. Performing behavior recognition on the video frame based on the character spatio-temporal features to obtain a recognition result then includes: performing group behavior recognition on the video frame based on the iteratively updated global features, to obtain a group behavior recognition result.
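The alternating refinement can be sketched as a fixed-point loop. The stop condition (an iteration cap, or a small change in the global feature) and the toy person-feature update rule are our assumptions; the patent only requires that some stop condition be satisfied.

```python
import numpy as np

def iterative_refine(global_feat, person_feats, max_iters=10, tol=1e-4):
    """Alternately update global and character features until convergence."""
    for _ in range(max_iters):
        # Third attention weights: how much each character contributes globally.
        scores = person_feats @ global_feat
        e = np.exp(scores - scores.max())
        w = e / e.sum()
        new_global = w @ person_feats                        # updated global feature
        # Toy character update: pull each feature slightly toward the global one.
        person_feats = 0.9 * person_feats + 0.1 * new_global
        if np.linalg.norm(new_global - global_feat) < tol:   # stop condition
            global_feat = new_global
            break
        global_feat = new_global
    return global_feat, person_feats
```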

In a possible implementation, after obtaining the iteratively updated global features and character spatio-temporal features, the method further includes: performing individual behavior recognition based on the iteratively updated character spatio-temporal features, to obtain an individual behavior recognition result.

In a possible implementation, extracting the character features from the video frame includes: performing human body detection on the video frame to obtain a target bounding box for each character; and extracting features from the video frame, and matching the corresponding character features from the extracted frame features using the target bounding boxes.
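One common realization of this box-to-feature matching (our assumption; akin to simple ROI pooling) crops each detected box out of the frame's feature map and average-pools it into one fixed-size feature per character:

```python
import numpy as np

def person_features_from_boxes(feature_map, boxes):
    """Match per-character features from a frame feature map using boxes.

    feature_map: (H, W, C) features extracted from the whole frame.
    boxes: list of (x1, y1, x2, y2) integer boxes in feature-map coords.
    Returns a (len(boxes), C) array: one average-pooled feature per character.
    """
    feats = []
    for x1, y1, x2, y2 in boxes:
        crop = feature_map[y1:y2, x1:x2, :]       # region covering this character
        feats.append(crop.mean(axis=(0, 1)))      # simple average pooling
    return np.stack(feats)
```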

According to a second object of the present invention, a behavior recognition apparatus is provided, including: a character feature extraction unit for receiving an input video frame and extracting character features from the video frame; a clustering unit for clustering the multiple character features in the video frame to obtain a clustering result; an attention allocation unit for determining attention allocation weights of the character features in the video frame based on the clustering result; a character feature update unit for updating the character features based on the attention allocation weights; a character spatio-temporal feature extraction unit for extracting character spatio-temporal features based on the updated character features; and a behavior recognition unit for performing behavior recognition on the video frame based on the character spatio-temporal features to obtain a recognition result.

In a possible implementation, the attention allocation unit is configured to determine the attention allocation weights among the character features based on the association relationships among the character features in the clustering result.

In a possible implementation, the attention allocation unit includes: a first similarity determination unit for determining a first similarity between character features within the same group obtained by clustering; and a first attention determination unit for determining, based on the first similarity, first attention allocation weights among the character features within the group.

In a possible implementation, the first similarity determination unit is configured to divide the feature matrix of each character feature into N parts and compute similarities between the corresponding N parts of different character features to obtain N first similarities; and the first attention determination unit is configured to determine, based on the N first similarities, N first attention allocation weights among the character features within the group.

In a possible implementation, the attention allocation unit includes: an overall feature determination unit for determining an overall feature for each group obtained by clustering; a second similarity determination unit for determining a second similarity between the overall features of the groups; and a second attention determination unit for determining, based on the second similarity, second attention allocation weights among the character features.

In a possible implementation, the character feature update unit is configured to, for a target character feature within a single group, perform a weighted sum over the character features in the group using the first attention allocation weights between the target character feature and each character feature in the group, to obtain an intra-group updated feature for each character in the group as the updated character feature.

In a possible implementation, the character feature update unit includes: an inter-group update feature determination unit for obtaining, for the target overall feature of a target group among the groups, an inter-group update feature for each group using the second attention allocation weights between the target overall feature and the overall features of the groups; and a character feature update subunit for adding the inter-group update features to the intra-group updated features of the characters in the target group, to obtain the updated character features.

In a possible implementation, the character spatio-temporal feature extraction unit is configured to spatially decode the updated character features to obtain the character spatio-temporal features.

In a possible implementation, the character spatio-temporal feature extraction unit includes: a spatial decoding unit for spatially decoding the updated character features to obtain character spatial features; a temporal encoding-decoding unit for temporally encoding and decoding the character features of multiple video frames to obtain character temporal features; and a fusion unit for fusing the character spatial features with the character temporal features to obtain the character spatio-temporal features.

In a possible implementation, the temporal encoding-decoding unit includes: a temporal encoding unit for encoding the character features of multiple video frames based on a self-attention mechanism to obtain temporally encoded features; and a temporal decoding unit for decoding the temporally encoded features based on a self-attention mechanism, and/or decoding the temporally encoded features based on spatially encoded features, to obtain the character temporal features, wherein the spatially encoded features are the updated character features.

In a possible implementation, the spatial decoding unit is configured to decode the spatially encoded features based on a self-attention mechanism, and/or decode the spatially encoded features based on the temporally encoded features, to obtain the character spatial features.

In a possible implementation, the apparatus further includes: a global feature extraction unit for extracting global features of the video frame; a third attention determination unit for determining third attention allocation weights in the global features using the character spatio-temporal features; and a global feature update unit for updating the global features using the third attention allocation weights. The behavior recognition unit includes: a group behavior recognition unit for performing group behavior recognition on the video frame based on the updated global features, to obtain a group behavior recognition result.

In a possible implementation, the apparatus further includes: an iterative update unit for taking the updated global features as new global features and the character spatio-temporal features as new character features, and iteratively updating the global features and the character spatio-temporal features until an iteration stop condition is satisfied, to obtain iteratively updated global features and character spatio-temporal features. The behavior recognition unit is configured to perform group behavior recognition on the video frame based on the iteratively updated global features, to obtain a group behavior recognition result.

In a possible implementation, the apparatus further includes: an individual behavior recognition unit for performing individual behavior recognition based on the iteratively updated character spatio-temporal features, to obtain an individual behavior recognition result.

In a possible implementation, the character feature extraction unit includes: a target bounding box determination unit for performing human body detection on the video frame to obtain a target bounding box for each character; and a character feature extraction subunit for extracting features from the video frame and matching the corresponding character features from the extracted frame features using the target bounding boxes.

According to a third object of the present invention, an electronic device is provided, including: a processor; and a memory for storing processor-executable instructions, wherein the processor is configured to invoke the instructions stored in the memory to perform the above method.

According to a fourth object of the present invention, a computer-readable storage medium is provided, on which computer program instructions are stored; when the computer program instructions are executed by a processor, the above method is implemented.

According to a fifth object of the present invention, a computer program product is provided, including computer-readable code, or a non-volatile computer-readable storage medium carrying computer-readable code; when the computer-readable code runs in a processor of an electronic device, the processor in the electronic device performs the above method.

The effect of the present invention is as follows. In the embodiments of the present invention, after an input video frame is received, character features are extracted from the video frame; a clustering result is obtained by clustering the multiple character features in the video frame; attention allocation weights of the character features in the video frame are then determined based on the clustering result; the character features are updated based on the attention allocation weights; character spatio-temporal features are extracted based on the updated character features; and behavior recognition is performed on the video frame based on the character spatio-temporal features to obtain a recognition result. In this way, the relationships among the individual features (actions) are obtained through clustering, and the attention allocation weights of the character features are derived from the clustering result to highlight the relative importance of different character features, so that the important action information among the character features stands out. Performing behavior recognition on the character spatio-temporal features extracted from the updated character features thus reduces the computational redundancy and information interference caused by analyzing the relationships among unimportant individual actions, and improves the accuracy of behavior recognition.

It should be understood that the foregoing general description and the following detailed description are merely exemplary and explanatory, and do not limit the present invention. Other features and aspects of the present invention will become apparent from the following detailed description of exemplary embodiments with reference to the accompanying drawings.

Before the present invention is described in detail, it should be noted that in the following description, similar elements are denoted by the same reference numerals.

Various exemplary embodiments, features, and aspects of the present invention will be described in detail below with reference to the drawings. The same reference numerals in the drawings denote elements with identical or similar functions. Although various aspects of the embodiments are shown in the drawings, the drawings are not necessarily drawn to scale unless specifically indicated.

The word "exemplary" is used herein to mean "serving as an example, embodiment, or illustration." Any embodiment described herein as "exemplary" is not necessarily to be construed as preferable to or better than other embodiments.

As used herein, the term "and/or" merely describes an association relationship between associated objects and indicates that three relationships are possible; for example, "A and/or B" can mean: A alone, both A and B, or B alone. In addition, the term "at least one" herein means any one of multiple items, or any combination of at least two of them; for example, "including at least one of A, B, and C" can mean including any one or more elements selected from the set consisting of A, B, and C.

In addition, numerous specific details are given in the following detailed description in order to better illustrate the present invention. Those skilled in the art will understand that the present invention can be practiced without certain of these specific details. In some instances, methods, means, elements, and circuits well known to those skilled in the art are not described in detail, in order to highlight the gist of the present invention.

In the related art, deep-learning-based group behavior recognition methods often attempt to build larger-scale spatio-temporal relationship models and use more diverse input features (video optical flow, human body keypoint information) to improve the accuracy of group behavior recognition.

An embodiment of the present invention provides a behavior recognition method that further takes into account that, in group behavior recognition, not every person's action is important for recognizing the group behavior. For example, in a volleyball match, usually only the actions of the players approaching or touching the ball are decisive for identifying the group behavior category. Therefore, the present invention obtains a clustering result by clustering the multiple character features in a video frame, determines the attention allocation weights of the character features in the video frame based on the clustering result, updates the character features based on the attention allocation weights, extracts character spatio-temporal features based on the updated character features, and performs behavior recognition on the video frame based on the character spatio-temporal features to obtain a recognition result. In this way, the relationships among the individual features (actions) are obtained through clustering, and the attention allocation weights of the character features are derived from the clustering result to highlight the relative importance of different character features, so that important action information stands out; this reduces the computational redundancy and information interference caused by analyzing the relationships among unimportant individual actions, and improves the accuracy of behavior recognition.
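The patent's CPC classification (G06F 18/23213) points at fixed-cluster-count methods such as K-means; a compact K-means over character features, purely as an illustration of how the grouping step could be realized:

```python
import numpy as np

def kmeans(feats, k, n_iters=20, seed=0):
    """Cluster (M, D) character features into k groups (Lloyd's algorithm)."""
    rng = np.random.default_rng(seed)
    # Initialize centers from k distinct feature rows.
    centers = feats[rng.choice(len(feats), size=k, replace=False)]
    for _ in range(n_iters):
        # Assign each feature to its nearest center.
        d2 = ((feats[:, None, :] - centers[None, :, :]) ** 2).sum(-1)
        labels = d2.argmin(axis=1)
        # Recompute centers; keep the old center if a cluster empties.
        for j in range(k):
            if (labels == j).any():
                centers[j] = feats[labels == j].mean(axis=0)
    return labels, centers
```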

In a possible implementation, the behavior recognition method may be executed by an electronic device such as a terminal device or a server. The terminal device may be user equipment (User Equipment, UE), a mobile device, a user terminal, a terminal, a cellular phone, a cordless phone, a personal digital assistant (Personal Digital Assistant, PDA), a handheld device, a computing device, a vehicle-mounted device, a wearable device, or the like. The method may be implemented by a processor invoking computer-readable instructions stored in a memory.

For ease of description, in one or more embodiments of this specification the execution subject of the behavior recognition method may be a server, and the implementation of the method is introduced below by taking a server as the execution subject as an example. It should be understood that taking a server as the execution subject is merely an illustrative example and should not be construed as a limitation on the method.

Fig. 1 shows a flowchart of the behavior recognition method according to an embodiment of the present invention. As shown in Fig. 1, the behavior recognition method includes: in step S11, receiving an input video frame and extracting person features from the video frame.

The video frame here may be any video frame in a video frame sequence, or may be multiple video frames in the sequence. Video frames may be input in the form of video frame sequences, and the length of a single video frame sequence may be predetermined, for example, 20 frames.

The video frame may be one stored in a local storage space, in which case it can be read from the terminal's local storage to realize the input of the video frame. For example, it may be a video frame from a locally stored recording of a sports event or, as another example, a video frame from a locally stored shopping mall management video.

Alternatively, the video frame may be one captured in real time by an image acquisition device. For example, it may be a video frame from a live broadcast of a sports event or, as another example, a video frame captured in real time by an image acquisition device located at the entrance of a shopping mall.

In a possible implementation, extracting the person features from the video frame includes: performing human body recognition on the video frame to obtain a target rectangular box for each person; and extracting features from the video frame and, using the target rectangular boxes in the video frame, matching the extracted features to obtain the corresponding person features.

Specifically, for the persons in the video frame, human body recognition technology can be used to identify the region where each person is located. This region is usually represented by a rectangular box, and the area enclosed by the box is the region of the recognized person. Since multiple rectangular boxes may be obtained when recognizing the same person, a non-maximum suppression (Non-Maximum Suppression, NMS) algorithm can be used to deduplicate the boxes, so that one rectangular box is retained per person in a single video frame as the region where the recognized person is located.
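The NMS deduplication step can be sketched in a few lines. This is a minimal illustration rather than the patent's implementation; the corner-coordinate box format, confidence scores, and the overlap threshold of 0.5 are assumptions for the example.

```python
def iou(a, b):
    # Boxes are (x1, y1, x2, y2); IoU = intersection area / union area.
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0, ix2 - ix1) * max(0, iy2 - iy1)
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    return inter / (area_a + area_b - inter)

def nms(boxes, scores, thresh=0.5):
    # Keep the highest-scoring box, drop boxes overlapping it above thresh, repeat.
    order = sorted(range(len(boxes)), key=lambda i: scores[i], reverse=True)
    keep = []
    while order:
        best = order.pop(0)
        keep.append(best)
        order = [i for i in order if iou(boxes[best], boxes[i]) <= thresh]
    return keep
```

With two heavily overlapping boxes on one person and a third box elsewhere, only one box per person survives, matching the "one rectangular box per person" behavior described above.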

For the multiple video frames in a video frame sequence, the regions where the persons in each frame are located can all be obtained in the above manner.

In an embodiment of the present invention, global feature extraction can be performed on the video frame to obtain global features of the entire frame. For example, an Inflated 3D ConvNet (I3D) can be used to extract the global features of the video frame, taking the output of the last layer of the I3D network as the global features. Feature extraction is then performed on the intermediate features output by an intermediate layer of the I3D network to obtain the features of each person. Specifically, the position of a person in the video frame is given by the rectangular boxes obtained after the NMS deduplication described above; the positions of the boxes in the video frame are mapped onto the intermediate features extracted by the I3D network, and RoIAlign is used to extract, from the intermediate features, the features corresponding to each box's position, yielding the person features in the video frame. In addition, the person features may also be obtained in other ways; the present invention does not limit the specific manner of obtaining the person features.

In step S12, the multiple person features in the video frame are clustered to obtain a clustering result.

In the process of clustering, the multiple person features are divided into different groups according to a specific criterion (such as similarity), so that the similarity between person features within the same group is as large as possible, while the difference between person features in different groups is also as large as possible. That is, after clustering, person features of the same group are gathered together as much as possible, and person features of different groups are separated as much as possible.

In an embodiment of the present invention, the person features can be clustered based on the K-means algorithm. Specifically, the person features to be classified form a feature set, and the number of categories K to divide them into is specified (for example, K may be 3). K person features are randomly selected from the feature set as the initial cluster centers of the K categories. For each person feature in the feature set other than the K initial cluster centers, the distance between that person feature and the feature of each of the K initial cluster centers is computed (for example, the Euclidean distance, which is used to characterize the similarity between features), and the person feature is assigned to the category corresponding to the nearest initial cluster center. Then, based on the person features included in the K categories, new cluster centers of the K categories are recomputed, and the person features in the feature set are reclassified, until for each of the K categories the distance between the cluster centers of two consecutive iterations is within a preset distance.
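The K-means procedure described above can be sketched as follows. The 2-D stand-in feature vectors, the fixed random seed, and the convergence tolerance are assumptions made for the example.

```python
import math
import random

def euclidean(a, b):
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def kmeans(features, k, tol=1e-6, seed=0):
    # Randomly pick k features as the initial cluster centers.
    rng = random.Random(seed)
    centers = rng.sample(features, k)
    while True:
        # Assign every feature to the category of its nearest center.
        groups = [[] for _ in range(k)]
        for f in features:
            nearest = min(range(k), key=lambda i: euclidean(f, centers[i]))
            groups[nearest].append(f)
        # Recompute each center as the mean of its group's features.
        new_centers = [
            [sum(dim) / len(g) for dim in zip(*g)] if g else centers[i]
            for i, g in enumerate(groups)
        ]
        # Stop once consecutive centers move less than the preset distance.
        if all(euclidean(c, n) < tol for c, n in zip(centers, new_centers)):
            return groups, new_centers
        centers = new_centers
```

On two well-separated point clouds with k = 2, the loop converges to one group per cloud, with the final centers playing the role of the overall group features discussed later.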

After the person features are clustered, the resulting clustering result divides the multiple person features into multiple groups, and the cluster center of each group is finally determined after multiple updates. For example, group A contains person features {a, b, c}; group B contains person features {d, e, f, g}; group C contains person features {h, i}; and the finally determined cluster-center feature of group A is α, that of group B is β, and that of group C is γ.

In step S13, attention distribution weights of the person features in the video frame are determined based on the clustering result.

The clustering result can characterize the latent relationships among the multiple person features, helping to capture more critical latent features, gathering the key features into one group, and enhancing their representational ability, which can improve the accuracy of behavior recognition on the video frame. Therefore, attention can be distributed over the person features in the video frame based on the clustering result, so as to enhance the expressiveness of the key features.

There may be multiple specific ways to determine the attention distribution weights. For example, they may be determined according to the similarity between person features within a group; as another example, they may be determined according to the similarity between features across groups. For details, refer to the possible implementations provided by the present invention, which are not elaborated here.

In step S14, the person features are updated based on the attention distribution weights.

After the attention distribution weights are determined, the person features can be updated to add attention information to them. Specifically, the attention distribution weights can be used to weight the person features so as to update them. For example, in computer terms a person feature is represented by a feature matrix, so the attention weight can be multiplied with the feature matrix to apply the attention distribution weight to the person feature, thereby updating it.

In an embodiment of the present invention, a first attention distribution weight can be obtained based on the first similarity of person features within a group obtained by clustering, and a second attention distribution weight can be obtained based on the similarity of overall features between groups. For the specific process of updating the person features using the first attention distribution weight and the second attention distribution weight, refer to the possible implementations provided by the present invention, which are not elaborated here.

In step S15, person spatio-temporal features are extracted based on the updated person features.

The person spatio-temporal features can characterize a person in both the temporal and spatial dimensions. The person features within the same video frame can characterize the spatial distribution information of the persons' features, so person spatial features can be obtained based on the updated person features. The person spatial features can then be fused with person temporal features to obtain the person spatio-temporal features; for the specific fusion process, refer to the possible implementations provided by the present invention, which are not elaborated here.

In step S16, behavior recognition is performed on the video frame based on the person spatio-temporal features, to obtain a recognition result.

The person features updated based on the attention distribution weights have enhanced expressiveness of the key features, so the person spatio-temporal features extracted from those person features likewise have enhanced expressiveness of the key features. Therefore, performing behavior recognition on the video frame based on the extracted person spatio-temporal features can improve the accuracy of the recognition result.

Behavior recognition in the embodiments of the present invention may include individual person behavior recognition and/or crowd behavior recognition for a group of people, and there may be multiple ways to perform behavior recognition on the video frame based on the person spatio-temporal features. For example, individual behavior recognition can be performed directly on the extracted person spatio-temporal features; as another example, the person spatio-temporal features can be used to apply attention weighting to the global features of the video frame, and crowd behavior recognition can then be performed using the attention-weighted global features. For details, refer to the possible implementations provided by the present invention, which are not elaborated here.

In the embodiments of the present invention, after an input video frame is received, person features are extracted from the video frame; a clustering result is obtained by clustering the multiple person features in the video frame; attention distribution weights of the person features in the video frame are then determined based on the clustering result; the person features are updated based on the attention distribution weights; person spatio-temporal features are extracted based on the updated person features; and behavior recognition is performed on the video frame based on the person spatio-temporal features to obtain a recognition result. In this way, the relationships between individual features (actions) are obtained through clustering, and the attention distribution weights of the person features are derived from the clustering result to highlight the relative importance of different person features, so that the important action information in the person features stands out. Performing behavior recognition on the person spatio-temporal features extracted from the updated person features then reduces the computational redundancy and information interference caused by analyzing relationships between unimportant individual actions, improving the accuracy of behavior recognition.

In a possible implementation, determining the attention distribution weights of the person features in the video frame based on the clustering result includes: determining the attention distribution weights among the person features based on the association relationships among the person features in the clustering result.

The association relationship here is used to characterize the correlation between person features. For example, it may be the similarity between person features, in which case the attention distribution weights among the person features can be determined based on that similarity.

In the embodiments of the present invention, two implementations of determining the attention distribution weights based on the association relationships among the person features in the clustering result are provided: determining a first attention distribution weight based on the similarity of person features within a group, and determining a second attention distribution weight based on the similarity between the overall features of the groups. These two implementations are described in detail below.

In a possible implementation, determining the attention distribution weights among the person features based on the association relationships among the person features in the clustering result includes: determining a first similarity between person features within the same group obtained by clustering; and determining, based on the first similarity, a first attention distribution weight among the person features within the group.

The first similarity between person features within the same group may be the pairwise similarity between the multiple person features in the group. There may be multiple specific ways to compute the feature similarity, for example, a similarity computation based on Euclidean distance or, as another example, a similarity computation based on cosine similarity, and so on. The present invention does not specifically limit the way in which the similarity is computed.

After the similarities between person features are determined, the similarities can be normalized, specifically through a normalization function (for example, a softmax function). After normalization, the first attention distribution weights of the person features are obtained.
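The similarity-then-normalize step can be illustrated with cosine similarity followed by a softmax; the small feature vectors below are invented for the example, and cosine similarity is just one of the similarity measures the text permits.

```python
import math

def cosine(a, b):
    # Cosine similarity between two feature vectors.
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return dot / (na * nb)

def softmax(xs):
    # Subtract the max for numerical stability; the weights sum to 1.
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    s = sum(exps)
    return [e / s for e in exps]

# Similarities between a target feature and every feature in its group.
target = [1.0, 0.0, 1.0]
group = [[1.0, 0.0, 1.0], [0.9, 0.1, 1.0], [0.0, 1.0, 0.0]]
sims = [cosine(target, g) for g in group]
weights = softmax(sims)  # first attention distribution weights
```

The third, dissimilar feature receives the smallest weight, which is exactly how normalization suppresses unrelated actions while highlighting similar ones.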

The first attention distribution weights can be applied to the person features to update them, adding attention information to the person features. After the multiple person features are updated using the first attention distribution weights, the updated person features are obtained. The person features updated based on the attention distribution weights have enhanced expressiveness of the key features; therefore, performing behavior recognition on the video frame based on the updated person features can improve the accuracy of the recognition result.

In the embodiments of the present invention, the first similarity between person features within the same group obtained by clustering is determined, and based on the first similarity, the first attention distribution weight among the person features within the group is determined. Thus, based on the similarity between person features of the same group, the association relationships among the person features within the group can be determined, and the first attention distribution weight determined from the similarity can enhance the expressiveness of the key features among the person features. Therefore, performing behavior recognition on the video frame based on the updated person features can improve the accuracy of the recognition result.

In a possible implementation, determining the first similarity between person features within the same group obtained by clustering includes: dividing the feature matrix of each person feature into N parts; and computing similarities correspondingly between the N parts of different person features, to obtain N first similarities. Determining, based on the first similarity, the first attention distribution weight among the person features within the group then includes: determining, based on the N first similarities, N first attention distribution weights among the person features within the group.

In computer technology, a person feature is concretely represented as a feature matrix; for example, the feature matrix of a person feature has size T×1024. The feature matrix can then be divided into N parts, where N is an integer greater than 1. For example, when N is 8, the T×1024 matrix is divided into 8 parts, and the divided matrix can be expressed as 8×T×128; as another example, when N is 4, the T×1024 matrix is divided into 4 parts, and the divided matrix can be expressed as 4×T×256. Here, T is one dimension of the feature matrix.

Then, when computing the similarity between different person features, the corresponding similarities can be computed separately for the N parts, yielding N first similarities. For example, for feature matrices of size T×1024 with N set to 8, the similarity between each of the 8 sub-feature matrices of size T×128 and the corresponding T×128 sub-feature matrix of another feature matrix is computed; this yields 8 similarities, which can be represented by a matrix of size 8. By contrast, computing on the two T×1024 feature matrices directly yields only 1 similarity. Thus, compared with 1 similarity, 8 similarities can enhance the diversity of the relationships between person features and describe those relationships more accurately.
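Splitting a feature into N parts and computing one similarity per part can be sketched as follows. Flat vectors of length 4 and a plain dot-product similarity stand in for the T×1024 matrices; both choices are assumptions for the example.

```python
def split_heads(vec, n):
    # Divide a flat feature of length D into n equal sub-features of length D // n.
    step = len(vec) // n
    return [vec[i * step:(i + 1) * step] for i in range(n)]

def headwise_similarity(a, b, n):
    # One dot-product similarity per sub-feature pair: n values instead of 1.
    return [
        sum(x * y for x, y in zip(ha, hb))
        for ha, hb in zip(split_heads(a, n), split_heads(b, n))
    ]

a = [1.0, 2.0, 3.0, 4.0]
b = [1.0, 0.0, 0.0, 1.0]
sims = headwise_similarity(a, b, 2)  # two first similarities, one per half
```

The two halves report different similarities (1.0 versus 4.0), illustrating how the N-way split exposes relationship structure that a single whole-vector similarity would average away.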

After the N first similarities are obtained, the N first attention distribution weights among the person features within the group can be determined based on the N first similarities. For the specific way of determining the attention distribution weights, refer to the relevant description above, which is not repeated here.

In the embodiments of the present invention, the feature matrix of each person feature is divided into N parts, similarities are then computed correspondingly between the N parts of different person features to obtain N first similarities, and the N first attention distribution weights among the person features within the group are determined based on the N first similarities. This enhances the diversity of the relationships between person features so as to describe those relationships more accurately, which in turn can improve the accuracy of behavior recognition based on the person features.

In a possible implementation, determining the attention distribution weights among the person features based on the association relationships among the person features in the clustering result includes: determining the overall feature of each group obtained by clustering; determining a second similarity between the overall features of the groups obtained by clustering; and determining, based on the second similarity, a second attention distribution weight among the person features.

The overall feature of a group in the clustering result can characterize the person features of that group as a whole. The overall feature is computed based on the person features within the group; for example, it may be obtained by performing an average pooling operation on the person features within the group, or it may be obtained by performing a random pooling operation on them. The embodiments of the present invention do not specifically limit the way in which the overall feature is determined.

In addition, the overall feature of each group may also be the feature of that group's cluster center in the clustering result. Accordingly, in a possible implementation, determining the overall feature of each group obtained by clustering includes: taking the feature of each group's cluster center in the clustering result as the overall feature of that group. Since, during clustering, the other person features are assigned to a category by computing their distance (similarity) to the cluster center, the similarity between the cluster-center feature and all person features in the group is relatively high, so the cluster-center feature can accurately characterize the overall feature of the group.

The second similarity of the overall features between different groups may be the pairwise similarity between the overall features of the groups. There may be multiple specific ways to compute the second similarity of the overall features, for example, a similarity computation based on Euclidean distance or, as another example, a similarity computation based on cosine similarity, and so on. The present invention does not specifically limit the way in which the second similarity of the overall features is computed.

After the second similarities between the overall features of the groups are determined, the second similarities can be normalized, specifically through a normalization function (for example, a softmax function). After normalization, the second attention distribution weights of the person features are obtained.

The second attention distribution weights can be applied to the person features to update them, adding attention information to the person features. After the multiple person features are updated using the second attention distribution weights, the updated person features are obtained. The person features updated based on the attention distribution weights have enhanced expressiveness of the key features; therefore, performing behavior recognition on the video frame based on the updated person features can improve the accuracy of the recognition result.

In the embodiments of the present invention, the overall feature of each group is obtained from the person features within that group; the second similarity between the overall features of the groups obtained by clustering is determined; and based on the second similarity, the second attention distribution weight among the person features is determined. Thus, based on the similarity between the overall features of different groups, the association relationships among person features across groups can be determined, and the second attention distribution weight determined from the second similarity of the overall features can enhance the expressiveness of the key features among the person features. Therefore, performing behavior recognition on the video frame based on the updated person features can improve the accuracy of the recognition result.

It can be understood that "first" and "second" in the embodiments of the present invention are used to distinguish the described objects, and should not be understood as indicating or implying an order of the described objects, relative importance, or any other limitation.

After the attention distribution weights are determined, the person features can be updated to add attention information to them. As described above, the present invention can obtain the first attention distribution weight and the second attention distribution weight. The processes of updating the person features using the first attention distribution weight and the second attention distribution weight are described separately below.

In a possible implementation, updating the person features based on the attention distribution weights includes: for a target person feature among the person features of a single group, performing a weighted sum over the person features in the group using the first attention distribution weights between the target person feature and each person feature in the group, to obtain an intra-group updated feature corresponding to each person in the group as the updated person feature.

For a target person feature among the person features of a certain group, the first attention distribution weights between the target person feature and each person feature in the group are used to perform a weighted sum over the person features in the group (including the target person feature itself), obtaining the intra-group updated feature corresponding to each person in the group as the updated person feature.

For example, suppose the first similarities between the target person feature V1 and the n person features (V1, V2, V3, …, Vn) are P1, P2, P3, …, Pn in turn, and the first attention distribution weights obtained by normalizing the first similarities are W1, W2, W3, …, Wn. Then the updated target person feature V1' can be determined by the following formula (1): V1' = V1·W1 + V2·W2 + V3·W3 + … + Vn·Wn (1)

Thus, the updated target person feature V1' comprehensively takes into account its similarity to all person features in the group; when it is highly similar to the other features in the group, the expressive power of key features is strengthened. Accordingly, performing behavior recognition on the video frames based on the updated person features can improve the accuracy of the recognition results.

In a possible implementation, updating the person features based on the attention distribution weights includes: for the target overall feature of a target group among the groups, obtaining an inter-group updated feature corresponding to each group using the second attention distribution weights between the target overall feature and the overall features of the groups; and adding the inter-group updated feature to the intra-group updated feature corresponding to each person in the target group, to obtain the updated person features.

The inter-group updated feature of the target group can be obtained by computing a weighted sum over the overall features of the groups using the second attention distribution weights; the weighted-sum result is then added to each person feature in the target group.

For example, suppose the target overall feature of the target group is C1, the second similarities between it and the m overall features of the groups (C1, C2, C3, …, Cm) are Q1, Q2, Q3, …, Qm, and the second attention distribution weights obtained by normalizing the second similarities are U1, U2, U3, …, Um. The inter-group updated feature C1' can then be determined by formula (2):

C1' = C1·U1 + C2·U2 + C3·U3 + … + Cm·Um    (2)

Further, the inter-group updated feature is added to the intra-group updated feature corresponding to each person in the target group; that is, the values of the n person features in the target group become V1'+C1', V2'+C1', V3'+C1', …, Vn'+C1'.
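Formula (2) and the subsequent add step can be sketched together. As above, dot-product similarity and softmax normalization are assumptions standing in for the unspecified second similarity and normalization.

```python
import numpy as np

def inter_group_update(overall, group_feats):
    """Apply formula (2) for the target group, then add C_1' to each person.

    overall:     (m, d) overall features C_1..C_m; row 0 is the target group's.
    group_feats: (n, d) intra-group updated features V_1'..V_n' of the target group.
    """
    sim = overall @ overall[0]            # second similarities Q_1..Q_m
    u = np.exp(sim) / np.exp(sim).sum()   # normalized weights U_1..U_m
    c1_updated = u @ overall              # C_1' = sum_k U_k * C_k
    return group_feats + c1_updated       # V_i' + C_1' for every person in the group

overall = np.array([[1.0, 0.0], [0.5, 0.5]])
group_feats = np.array([[0.2, 0.1], [0.3, 0.4]])
out = inter_group_update(overall, group_feats)
```

Every person in the target group receives the same inter-group correction C1', so only the shared group-level context changes.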

After the multiple person features are updated using the second attention distribution weights, the updated person features are obtained. Person features updated based on the second attention distribution weights have enhanced expressive power for key features; therefore, performing behavior recognition on the video frames based on the updated person features can improve the accuracy of the recognition results.

In a possible implementation, extracting the person spatio-temporal features based on the updated person features includes: spatially decoding the updated person features to obtain the person spatio-temporal features.

The person spatio-temporal features include person spatial features. The person features within the same video frame can characterize the spatial distribution in that frame; therefore, spatially decoding the person features updated based on the attention distribution weights yields the person spatial features, which can characterize the spatial distribution information of the person features in the video frame. For the specific manner of spatially decoding the updated person features, reference may be made to the possible implementations provided in the present invention, which are not repeated here.

In addition, the person spatio-temporal features also include person temporal features. In a possible implementation, spatially decoding the updated person features to obtain the person spatio-temporal features includes: spatially decoding the updated person features to obtain person spatial features; performing temporal encoding and decoding on the person features of multiple video frames to obtain person temporal features; and fusing the person spatial features with the person temporal features to obtain the person spatio-temporal features.

In the embodiments of the present invention, person temporal features can also be determined. In a sequence of video frames, the person features of the same person vary across different frames; the person temporal features here can characterize how the person features of the same person change over time across frames. Therefore, temporal encoding and decoding can be performed on the person features of the same person in different video frames to obtain the person temporal features.

The person temporal features can be fused with the person spatial features to obtain the person spatio-temporal features; a specific fusion manner may be, for example, adding the person temporal features and the person spatial features. Behavior recognition is then performed on the video frames based on the person spatio-temporal features. In this way, both the spatial features and the temporal features of the persons are taken into account during behavior recognition, which can improve the accuracy of the recognition results.

In a possible implementation, performing temporal encoding and decoding on the person features of multiple video frames to obtain the person temporal features includes: encoding the person features of the multiple video frames based on a self-attention mechanism to obtain temporally encoded features; and decoding the temporally encoded features based on the self-attention mechanism, and/or decoding the temporally encoded features based on spatially encoded features, to obtain the person temporal features; here, the spatially encoded features are the updated person features.

After the person features of multiple video frames are extracted, they can be input into a temporal encoder and encoded based on the self-attention mechanism to obtain the temporally encoded features. In this encoding process, the self-attention mechanism can be used to compute the alignment probabilities between the person features in each video frame and the person features at other time instants; these probabilities are then used to compute a weighted sum with the person features at the corresponding time instants, giving the temporally encoded features.
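The temporal encoding step can be sketched as follows, for a single person tracked across T frames. Scaled dot products with a softmax are a common way to model the alignment probabilities; the patent does not fix their exact form, so treat this as one plausible reading.

```python
import numpy as np

def temporal_encode(seq):
    """Self-attention over time for one person's features.

    seq: (T, d) features of the same person in T consecutive frames.
    Returns (T, d) temporally encoded features, each a probability-weighted
    sum over the features at all time instants.
    """
    d = seq.shape[1]
    scores = seq @ seq.T / np.sqrt(d)                                # frame-to-frame alignment scores
    probs = np.exp(scores) / np.exp(scores).sum(axis=1, keepdims=True)  # alignment probabilities
    return probs @ seq                                               # weighted sum over time

seq = np.array([[1.0, 0.0], [0.8, 0.2], [0.1, 0.9]])
encoded = temporal_encode(seq)
```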

In the process of decoding the temporally encoded features, the temporally encoded features can be decoded based on the self-attention mechanism to obtain first temporal features, and can also be decoded based on the spatially encoded features to obtain second temporal features; the first temporal features and the second temporal features are then fused to obtain the person temporal features.

In the process of decoding the temporally encoded features based on the spatially encoded features, the similarity between the spatially encoded features and the temporally encoded features can be determined, and the temporally encoded features are weighted using this similarity to obtain the second temporal features. Decoding the temporally encoded features based on the spatially encoded features fuses spatial context information into the resulting second temporal features, enhancing their feature representation and improving the accuracy of the final behavior recognition results.

In addition, the spatially encoded features can be decoded in various ways. In a possible implementation, spatially decoding the updated person features to obtain the person spatial features includes: decoding the spatially encoded features based on the self-attention mechanism, and/or decoding the spatially encoded features based on the temporally encoded features, to obtain the person spatial features.

In the process of decoding the spatially encoded features, they can be decoded based on the self-attention mechanism to obtain first spatial features, and can also be decoded based on the temporally encoded features to obtain second spatial features; the first spatial features and the second spatial features are then fused to obtain the person spatial features.

In the process of decoding the spatially encoded features based on the self-attention mechanism, the degree of association (for example, similarity) between the spatially encoded features of different persons can be determined, and each spatially encoded feature is then weighted using this degree of association to obtain the first spatial features.

In the process of decoding the spatially encoded features based on the temporally encoded features, the similarity between the temporally encoded features and the spatially encoded features can be determined, and the spatially encoded features are then weighted using this similarity to obtain the second spatial features. Decoding the spatially encoded features based on the temporally encoded features fuses temporal context information into the resulting second spatial features, enhancing their feature representation and improving the accuracy of the final behavior recognition results.
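The two cross-decoding directions described above share one shape: weight one feature set by its similarity to the other. A minimal sketch, assuming dot-product similarity with softmax normalization (neither is fixed by the text):

```python
import numpy as np

def cross_decode(queries, keys_values):
    """Decode `queries` against `keys_values` via similarity-weighted sums.

    Passing (spatially encoded, temporally encoded) yields the second
    spatial features; swapping the arguments yields the second temporal
    features.
    """
    scores = queries @ keys_values.T                                 # cross similarities
    w = np.exp(scores) / np.exp(scores).sum(axis=1, keepdims=True)   # normalized weights
    return w @ keys_values                                           # weighted sum of the other set

spatial = np.array([[1.0, 0.0], [0.0, 1.0]])          # spatially encoded features
temporal = np.array([[0.5, 0.5], [0.9, 0.1], [0.2, 0.8]])  # temporally encoded features
second_spatial = cross_decode(spatial, temporal)
second_temporal = cross_decode(temporal, spatial)
```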

In the embodiments of the present invention, the temporally encoded features are decoded based on the spatially encoded features to obtain the person temporal features, the spatially encoded features are decoded based on the temporally encoded features to obtain the person spatial features, and the person spatial features are fused with the person temporal features. In the resulting person spatio-temporal features, semantic associations based on both spatial context and temporal context are used to strengthen the feature representation of each person, so that behavior recognition based on the person spatio-temporal features can improve the accuracy of the behavior recognition results.

In a possible implementation, the method further includes: extracting global features of the video frames; determining third attention distribution weights within the global features using the person spatio-temporal features; and updating the global features using the third attention distribution weights. Performing behavior recognition on the video frames based on the person spatio-temporal features to obtain a recognition result includes: performing group behavior recognition on the video frames based on the updated global features to obtain a group behavior recognition result.

The global feature of a video frame can be obtained by performing feature extraction on the entire picture of the frame, and may also be called the scene feature of the video frame. For example, it can be extracted through an I3D network: the output of the last layer of the I3D network is fed into a Group Representation Generator (GRG), a preprocessing component used to initialize the features output by I3D, giving the initialized global features. The specific extraction manner is not detailed here.

For the extracted global features, attention can be allocated over them; specifically, the person spatio-temporal features can be used to allocate attention over the global features. Therefore, the person spatio-temporal features can be used to determine the third attention distribution weights within the global features, and the global features are then updated using the third attention distribution weights.

Specifically, a transformer model can be used to allocate attention over the global features using the person spatio-temporal features. The global features can serve as the query parameter of the transformer model and the person spatio-temporal features as the key parameter, so as to obtain the third attention distribution weights and realize attention allocation within the global features. In this way, the global features can be optimized according to the features of each person in the video frame.
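A single-head sketch of this query/key arrangement, with the global feature as the query and the person spatio-temporal features as keys and values. The projection matrices, the residual addition, and the use of the person features as values are assumptions; a trained transformer would learn the projections.

```python
import numpy as np

rng = np.random.default_rng(0)
d = 4

# Hypothetical learned projections (randomly initialized here for illustration).
Wq, Wk, Wv = (rng.standard_normal((d, d)) for _ in range(3))

global_feat = rng.standard_normal((1, d))    # query: one global (scene) feature
person_feats = rng.standard_normal((5, d))   # keys/values: person spatio-temporal features

q = global_feat @ Wq
k = person_feats @ Wk
v = person_feats @ Wv
scores = q @ k.T / np.sqrt(d)                # third attention logits
w3 = np.exp(scores) / np.exp(scores).sum()   # third attention distribution weights
updated_global = global_feat + w3 @ v        # global feature optimized per person features
```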

Thus, the optimized global features are the features after attention allocation via the person spatio-temporal features, which can highlight the key features within the global features and weaken the irrelevant ones. Performing behavior recognition on the video frames based on the updated global features can then improve the accuracy of the recognition results.

In the process of performing group behavior recognition on the video frames, the updated global features can be fed into a fully connected layer of a neural network, and the fully connected layer performs the classification. Multiple group behavior categories are preset in the fully connected layer; given the global features, the fully connected layer can output a confidence that the global features correspond to each group behavior category, and the category with the highest confidence can serve as the group behavior recognition result.

For example, for video frames from a volleyball match, the extracted global features are fed into the fully connected layer, giving a confidence of 0.9 for the group action category "left serve", 0.3 for "left pass", 0.4 for "left spike", 0.1 for "right serve", and 0.1 for "right block"; "left serve", having the highest confidence, can then be output as the recognition result.
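The final classification step in the volleyball example amounts to an argmax over per-category confidences; the confidences below are the illustrative values from the example, not real model outputs:

```python
import numpy as np

# Preset group behavior categories and the example confidences produced by
# the fully connected layer for one set of global features.
categories = ["left serve", "left pass", "left spike", "right serve", "right block"]
confidences = np.array([0.9, 0.3, 0.4, 0.1, 0.1])

# The highest-confidence category is output as the recognition result.
result = categories[int(np.argmax(confidences))]
```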

In a possible implementation, after the global features are updated using the third attention distribution weights, the method further includes: taking the updated global features as new global features and the person spatio-temporal features as new person features, and iteratively updating the global features and the person spatio-temporal features until an iteration stop condition is satisfied, obtaining iteratively updated global features and person spatio-temporal features. Performing behavior recognition on the video frames based on the person spatio-temporal features to obtain a recognition result includes: performing group behavior recognition on the video frames based on the iteratively updated global features to obtain a group behavior recognition result.

Taking the person spatio-temporal features as new person features, the new person features are iteratively updated. Specifically, steps S12-S15 can be executed iteratively to obtain the iteratively updated person spatio-temporal features. Of course, the iterative process may also execute one or more possible implementations of steps S12-S15 in the embodiments of the present invention, for example: updating the person features based on the first attention distribution weights between the person features within a group; updating the person features based on the second attention distribution weights between the inter-group features; spatially decoding the updated person features to obtain the person spatio-temporal features; and so on, which are not enumerated here one by one. For details, reference may be made to the possible implementations of steps S12-S15 provided in the present invention.

Likewise, the updated global features can be taken as new global features, and the global features are iteratively updated. When the iteration stop condition is satisfied, the iteration can be stopped to obtain the iteratively updated global features and person spatio-temporal features; the iteration stop condition may be, for example, reaching a preset number of iterations.

After the iterative updating, group behavior recognition can be performed on the video frames based on the iteratively updated global features to obtain the group behavior recognition result. Since the iteratively updated global features highlight the key features and weaken the irrelevant features within them, the accuracy of group behavior recognition is improved.

In a possible implementation, after the iteratively updated global features and person spatio-temporal features are obtained, the method further includes: performing individual behavior recognition based on the iteratively updated person spatio-temporal features to obtain an individual behavior recognition result.

In the embodiments of the present invention, the behavior of an individual person can also be recognized. For example, in a volleyball match, each person performs corresponding match actions such as serving, digging, passing, spiking, blocking, and so on. Therefore, the behavior of a single person can also be recognized based on the person spatio-temporal features.

Specifically, the person spatio-temporal features can be fed into a fully connected layer of a neural network, and the fully connected layer performs the classification. Multiple match actions are preset in the fully connected layer; given the person spatio-temporal features, the fully connected layer can output a confidence that the features correspond to each action, and the match action with the highest confidence can serve as the individual behavior recognition result.

For example, for video frames from a volleyball match, the extracted person spatio-temporal features are fed into the fully connected layer, giving a confidence of 0.9 for the action "serve", 0.3 for "dig", 0.4 for "pass", 0.1 for "spike", and 0.1 for "block"; the "serve" action, having the highest confidence, can then be output as the recognition result.

In the embodiments of the present invention, since the person spatio-temporal features are features obtained after attention allocation, the key features can be highlighted and the irrelevant features weakened; therefore, performing individual action recognition based on the person spatio-temporal features yields action recognition results of high accuracy.

An application scenario of the embodiments of the present invention is described below. In this scenario, the behavior recognition method of the present invention can be implemented based on an end-to-end network that takes a sequence of video frames as input; in this application scenario, the input video frames are from a volleyball match, and the sequence contains multiple frames. The global features of each video frame and the features of each person are then extracted.

For the extracted person features, the person features in the same video frame are clustered to obtain a clustering result, and the attention distribution weights of the person features are determined based on the clustering result; the person features are updated based on the attention distribution weights to obtain the spatially encoded features, which are then decoded to obtain the person spatial features. In addition, the person temporal features can be obtained based on the person features in different video frames. The obtained person spatial features and person temporal features can be fused to obtain the fused person spatio-temporal features. The person spatio-temporal features can then be iteratively updated as new person features until the obtained person spatio-temporal features satisfy a preset convergence condition.
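The clustering step above can be realized in several ways; k-means is one plausible choice (the text does not mandate a specific algorithm). A minimal sketch grouping per-frame person features:

```python
import numpy as np

def kmeans(feats, k, iters=10, seed=0):
    """Minimal k-means over person features of one frame.

    feats: (n, d) person features; returns (labels, centers), where each
    center can serve as a group's overall feature.
    """
    rng = np.random.default_rng(seed)
    centers = feats[rng.choice(len(feats), size=k, replace=False)]
    for _ in range(iters):
        # Assign each person feature to its nearest center.
        dists = ((feats[:, None, :] - centers[None, :, :]) ** 2).sum(-1)
        labels = dists.argmin(axis=1)
        # Move each center to the mean of its assigned features.
        for j in range(k):
            if (labels == j).any():
                centers[j] = feats[labels == j].mean(axis=0)
    return labels, centers

feats = np.array([[0.0, 0.0], [0.1, 0.0], [5.0, 5.0], [5.1, 4.9]])
labels, centers = kmeans(feats, k=2)
```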

For the extracted global features, attention can be allocated over the global features based on the person spatio-temporal features, and the person spatio-temporal features are updated, realizing their optimization. The optimized person spatio-temporal features can again serve as input for iterative optimization of the global features; this process can be executed in a loop multiple times until the obtained global features satisfy a preset convergence condition.

The updated person spatio-temporal features can be fed into a fully connected layer, which performs the classification, recognizing the action of each individual person and obtaining the individual action recognition result. The updated global features can likewise be fed into a fully connected layer for classification, recognizing the group behavior and obtaining the group behavior recognition result for the entire video frame sequence.

For example, the iteratively updated global features are fed into the fully connected layer, giving a confidence of 0.9 for the group action category "left serve", 0.3 for "left pass", 0.4 for "left spike", 0.1 for "right serve", and 0.1 for "right block"; "left serve", having the highest confidence, can then be output as the group behavior recognition result for the video frames of the volleyball match. The extracted person spatio-temporal features are fed into the fully connected layer, giving a confidence of 0.9 for the action "serve", 0.3 for "dig", 0.4 for "pass", 0.1 for "spike", and 0.1 for "block"; the "serve" action, having the highest confidence, can then be output as the individual behavior recognition result.

It can be understood that the method embodiments mentioned above in the present invention can be combined with one another to form combined embodiments without departing from their principles and logic; due to space limitations, these are not detailed in the present invention. Those skilled in the art can understand that, in the above methods of the specific implementations, the specific execution order of the steps should be determined by their functions and possible internal logic.

In addition, the present invention further provides a behavior recognition apparatus, an electronic device, a computer-readable storage medium, and a program, all of which can be used to implement any behavior recognition method provided by the present invention; for the corresponding technical solutions and descriptions, reference may be made to the corresponding records in the method section, which are not repeated here.

Fig. 2 shows a block diagram of a behavior recognition apparatus according to an embodiment of the present invention. As shown in Fig. 2, the apparatus 20 includes: a person feature extraction unit 21, configured to receive input video frames and extract the person features in the video frames; a clustering unit 22, configured to cluster the multiple person features in the video frames to obtain a clustering result; an attention allocation unit 23, configured to determine the attention distribution weights of the person features in the video frames based on the clustering result; a person feature update unit 24, configured to update the person features based on the attention distribution weights; a person spatio-temporal feature extraction unit 25, configured to extract the person spatio-temporal features based on the updated person features; and a behavior recognition unit 26, configured to perform behavior recognition on the video frames based on the person spatio-temporal features to obtain a recognition result.

In a possible implementation, the attention allocation unit 23 is configured to determine the attention distribution weights between the person features based on the association relationships between the person features in the clustering result.

In a possible implementation, the attention allocation unit 23 includes: a first similarity determination unit, configured to determine the first similarities between the person features within the same group obtained by clustering; and a first attention determination unit, configured to determine the first attention distribution weights between the person features within the group based on the first similarities.

In a possible implementation, the first similarity determination unit is configured to divide the feature matrix of the person features into N parts, and to compute similarities correspondingly between the N parts of different person features to obtain N first similarities; the first attention determination unit is configured to determine N first attention distribution weights between the person features within the group based on the N first similarities.
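The N-part split can be sketched as follows; cosine similarity per part is an illustrative choice, as the similarity measure is not specified here.

```python
import numpy as np

def split_similarities(f1, f2, n_parts):
    """Split two person-feature vectors into N equal parts and compute one
    similarity per corresponding pair of parts, giving N first similarities.
    """
    parts1 = np.split(f1, n_parts)
    parts2 = np.split(f2, n_parts)
    sims = []
    for x, y in zip(parts1, parts2):
        # Cosine similarity per part; the small epsilon guards division by zero.
        sims.append(float(x @ y / (np.linalg.norm(x) * np.linalg.norm(y) + 1e-8)))
    return sims

f1 = np.array([1.0, 0.0, 0.5, 0.5])
f2 = np.array([1.0, 0.0, 0.5, 0.5])
sims = split_similarities(f1, f2, n_parts=2)
```

Identical vectors yield a similarity close to 1 for every part, and each of the N similarities can then be normalized into its own attention distribution weight.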

In a possible implementation, the attention allocation unit 23 includes: an overall feature determination unit, configured to determine the overall feature of each group obtained by clustering; a second similarity determination unit, configured to determine the second similarities between the overall features of the groups obtained by clustering; and a second attention determination unit, configured to determine the second attention distribution weights between the person features based on the second similarities.

In a possible implementation, the person feature update unit 24 is configured to, for a target person feature among the person features of a single group, perform a weighted summation over the person features in the group using the first attention distribution weights between the target person feature and each person feature in the group, to obtain the intra-group updated feature corresponding to each person in the group as the updated person feature.

In a possible implementation, the person feature update unit 24 includes: an inter-group update feature determination unit, configured to, for a target overall feature of a target group among the groups, use the second attention allocation weights between the target overall feature and the overall features of the groups to obtain an inter-group updated feature for each group; and a person feature update subunit, configured to add the inter-group updated feature to the intra-group updated feature of each person in the target group, to obtain the updated person features.
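A sketch of the inter-group step; the function name and the layout of the inputs are illustrative rather than taken from the embodiment:

```python
import numpy as np

def add_inter_group_update(intra_updates, overall, second_weights, group_ids):
    """For each target group, the inter-group updated feature is the
    second-attention-weighted sum of all groups' overall features; it is then
    added to the intra-group updated feature of every person in that group.

    intra_updates: (P, D) intra-group updated features;
    overall: (G, D) overall features; second_weights: (G, G), rows summing
    to 1; group_ids: (P,) group index of each person.
    """
    inter = second_weights @ overall          # (G, D), one update per group
    return intra_updates + inter[group_ids]   # broadcast onto group members
```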

In a possible implementation, the person spatio-temporal feature extraction unit is configured to perform spatial decoding on the updated person features to obtain person spatio-temporal features.

In a possible implementation, the person spatio-temporal feature extraction unit includes: a spatial decoding unit, configured to perform spatial decoding on the updated person features to obtain person spatial features; a temporal encoding-decoding unit, configured to perform temporal encoding and decoding on the person features of multiple video frames to obtain person temporal features; and a fusion unit, configured to fuse the person spatial features with the person temporal features to obtain person spatio-temporal features.

In a possible implementation, the temporal encoding-decoding unit includes: a temporal encoding unit, configured to encode the person features of multiple video frames based on a self-attention mechanism to obtain temporal encoding features; and a temporal decoding unit, configured to decode the temporal encoding features based on a self-attention mechanism and/or based on spatial encoding features, to obtain person temporal features; wherein the spatial encoding features are the updated person features.
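The temporal encoding and decoding can be sketched with a minimal scaled dot-product attention. This is a generic single-head formulation, not the embodiment's exact architecture, and the random arrays stand in for real person features across frames; with query = key = value it is the self-attention used for encoding, and feeding the spatial encoding features as key/value gives the cross-stream decoding the unit describes.

```python
import numpy as np

def softmax(x):
    e = np.exp(x - x.max(axis=-1, keepdims=True))
    return e / e.sum(axis=-1, keepdims=True)

def attention(query, key, value):
    """Scaled dot-product attention over rows of query/key/value."""
    scores = query @ key.T / np.sqrt(key.shape[1])
    return softmax(scores) @ value

# One person's features across T = 5 frames, D = 8 dims (stand-in data).
rng = np.random.default_rng(1)
temporal = rng.normal(size=(5, 8))
encoded = attention(temporal, temporal, temporal)   # temporal self-attention
spatial = rng.normal(size=(5, 8))                   # stand-in spatial encoding features
decoded = attention(encoded, spatial, spatial)      # decode against spatial features
print(encoded.shape, decoded.shape)  # (5, 8) (5, 8)
```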

In a possible implementation, the spatial decoding unit is configured to decode the spatial encoding features based on a self-attention mechanism and/or based on the temporal encoding features, to obtain person spatial features.

In a possible implementation, the apparatus further includes: a global feature extraction unit, configured to extract global features of the video frames; a third attention determination unit, configured to determine, using the person spatio-temporal features, third attention allocation weights over the global features; and a global feature update unit, configured to update the global features using the third attention allocation weights. The behavior recognition unit includes: a crowd behavior recognition unit, configured to perform crowd behavior recognition on the video frames based on the updated global features to obtain a crowd behavior recognition result.

In a possible implementation, the apparatus further includes: an iterative update unit, configured to take the updated global features as new global features and the person spatio-temporal features as new person features, and to iteratively update the global features and the person spatio-temporal features until an iteration stop condition is satisfied, obtaining iteratively updated global features and person spatio-temporal features. The behavior recognition unit is configured to perform crowd behavior recognition on the video frames based on the iteratively updated global features to obtain a crowd behavior recognition result.
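The iterative update can be sketched as a simple alternation loop. The stop condition is unspecified in the embodiment, so a fixed iteration count is used here as a placeholder, and the two update callables stand in for the global-feature and person-feature update steps.

```python
def iterative_refine(global_feat, person_feats, update_global, update_persons,
                     max_iters=3):
    """Alternately refresh the global features and the person spatio-temporal
    features, treating each round's outputs as the next round's inputs."""
    for _ in range(max_iters):
        global_feat = update_global(global_feat, person_feats)
        person_feats = update_persons(person_feats, global_feat)
    return global_feat, person_feats

# Toy updates showing the alternation (real updates would be attention-based).
g, p = iterative_refine(0, 0, lambda g, p: g + 1, lambda p, g: p + 2, max_iters=3)
print(g, p)  # 3 6
```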

In a possible implementation, the apparatus further includes: a person behavior recognition unit, configured to perform person behavior recognition based on the iteratively updated person spatio-temporal features to obtain a person behavior recognition result.

In a possible implementation, the person feature extraction unit 21 includes: a target rectangular box determination unit, configured to perform human body detection on the video frames to obtain a target rectangular box for each person; and a person feature extraction subunit, configured to extract features from the video frames and to match, using the target rectangular boxes in the video frames, the corresponding person features from the extracted frame features.
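One way to realise the matching step, assuming average pooling over the feature-map cells inside each target rectangular box; the embodiment does not fix the pooling, and RoIAlign-style sampling would serve the same role.

```python
import numpy as np

def person_features_from_boxes(feature_map, boxes):
    """Match each person's target rectangular box to the frame's feature map
    by average-pooling the cells inside the box (pooling choice is an
    assumption).

    feature_map: (H, W, C) features extracted from one video frame.
    boxes: iterable of (x0, y0, x1, y1) in feature-map coordinates.
    Returns: (len(boxes), C) person features.
    """
    return np.stack([feature_map[y0:y1, x0:x1].mean(axis=(0, 1))
                     for x0, y0, x1, y1 in boxes])
```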

In some embodiments, the functions of, or the modules included in, the apparatus provided by the embodiments of the present invention may be used to perform the methods described in the method embodiments above; for their specific implementation and technical effects, reference may be made to the description of those method embodiments, which is not repeated here for brevity.

An embodiment of the present invention further provides a computer-readable storage medium on which computer program instructions are stored, the computer program instructions implementing the above method when executed by a processor. The computer-readable storage medium may be a volatile or a non-volatile computer-readable storage medium.

An embodiment of the present invention further provides an electronic device, including: a processor; and a memory for storing processor-executable instructions; wherein the processor is configured to invoke the instructions stored in the memory to perform the above method.

An embodiment of the present invention further provides a computer program product, including computer-readable code, or a non-volatile computer-readable storage medium carrying computer-readable code; when the computer-readable code runs in a processor of an electronic device, the processor in the electronic device performs the above method.

The electronic device may be provided as a terminal, a server, or a device of another form.

FIG. 3 shows a block diagram of an electronic device 800 according to an embodiment of the present invention. For example, the electronic device 800 may be a terminal such as a mobile phone, a computer, a digital broadcast terminal, a messaging device, a game console, a tablet device, a medical device, a fitness device, or a personal digital assistant.

Referring to FIG. 3, the electronic device 800 may include one or more of the following components: a processing component 802, a memory 804, a power component 806, a multimedia component 808, an audio component 810, an input/output (I/O) interface 812, a sensor component 814, and a communication component 816.

The processing component 802 generally controls the overall operation of the electronic device 800, such as operations associated with display, telephone calls, data communication, camera operation, and recording. The processing component 802 may include one or more processors 820 to execute instructions so as to complete all or some of the steps of the above method. In addition, the processing component 802 may include one or more modules that facilitate interaction between the processing component 802 and the other components. For example, the processing component 802 may include a multimedia module to facilitate interaction between the multimedia component 808 and the processing component 802.

The memory 804 is configured to store various types of data to support operation of the electronic device 800. Examples of such data include instructions for any application or method operating on the electronic device 800, contact data, phonebook data, messages, pictures, videos, and the like. The memory 804 may be implemented by any type of volatile or non-volatile storage device or a combination thereof, such as static random-access memory (SRAM), electrically erasable programmable read-only memory (EEPROM), erasable programmable read-only memory (EPROM), programmable read-only memory (PROM), read-only memory (ROM), magnetic memory, flash memory, magnetic disk, or optical disc.

The power component 806 provides power to the various components of the electronic device 800. The power component 806 may include a power management system, one or more power supplies, and other components associated with generating, managing, and distributing power for the electronic device 800.

The multimedia component 808 includes a screen that provides an output interface between the electronic device 800 and the user. In some embodiments, the screen may include a liquid crystal display (LCD) and a touch panel (TP). If the screen includes a touch panel, the screen may be implemented as a touch screen to receive input signals from the user. The touch panel includes one or more touch sensors to sense touches, swipes, and gestures on the touch panel. The touch sensors may not only sense the boundary of a touch or swipe action but also detect the duration and pressure associated with the touch or swipe operation. In some embodiments, the multimedia component 808 includes a front camera and/or a rear camera. When the electronic device 800 is in an operation mode, such as a shooting mode or a video mode, the front camera and/or the rear camera can receive external multimedia data. Each front or rear camera may be a fixed optical lens system or have focal length and optical zoom capability.

The audio component 810 is configured to output and/or input audio signals. For example, the audio component 810 includes a microphone (MIC) configured to receive external audio signals when the electronic device 800 is in an operation mode, such as a call mode, a recording mode, or a speech recognition mode. The received audio signals may be further stored in the memory 804 or transmitted via the communication component 816. In some embodiments, the audio component 810 further includes a speaker for outputting audio signals.

The I/O interface 812 provides an interface between the processing component 802 and peripheral interface modules, which may be a keyboard, a click wheel, buttons, and the like. These buttons may include, but are not limited to, a home button, volume buttons, a start button, and a lock button.

The sensor component 814 includes one or more sensors for providing status assessments of various aspects of the electronic device 800. For example, the sensor component 814 may detect the open/closed state of the electronic device 800 and the relative positioning of components, such as the display and keypad of the electronic device 800; the sensor component 814 may also detect a change in position of the electronic device 800 or of one of its components, the presence or absence of user contact with the electronic device 800, the orientation or acceleration/deceleration of the electronic device 800, and a change in temperature of the electronic device 800. The sensor component 814 may include a proximity sensor configured to detect the presence of nearby objects without any physical contact. The sensor component 814 may also include a light sensor, such as a complementary metal-oxide-semiconductor (CMOS) or charge-coupled device (CCD) image sensor, for use in imaging applications. In some embodiments, the sensor component 814 may also include an acceleration sensor, a gyroscope sensor, a magnetic sensor, a pressure sensor, or a temperature sensor.

The communication component 816 is configured to facilitate wired or wireless communication between the electronic device 800 and other devices. The electronic device 800 can access a wireless network based on a communication standard, such as Wi-Fi, second-generation (2G) or third-generation (3G) mobile communication technology, or a combination thereof. In an exemplary embodiment, the communication component 816 receives broadcast signals or broadcast-related information from an external broadcast management system via a broadcast channel. In an exemplary embodiment, the communication component 816 further includes a near-field communication (NFC) module to facilitate short-range communication. For example, the NFC module may be implemented based on radio-frequency identification (RFID) technology, Infrared Data Association (IrDA) technology, ultra-wideband (UWB) technology, Bluetooth (BT) technology, and other technologies.

In an exemplary embodiment, the electronic device 800 may be implemented by one or more application-specific integrated circuits (ASICs), digital signal processors (DSPs), digital signal processing devices (DSPDs), programmable logic devices (PLDs), field-programmable gate arrays (FPGAs), controllers, microcontrollers, microprocessors, or other electronic components, for performing the above method.

In an exemplary embodiment, a non-volatile computer-readable storage medium is also provided, such as the memory 804 including computer program instructions, which can be executed by the processor 820 of the electronic device 800 to complete the above method.

FIG. 4 shows a block diagram of an electronic device 1900 according to an embodiment of the present invention. For example, the electronic device 1900 may be provided as a server. Referring to FIG. 4, the electronic device 1900 includes a processing component 1922, which further includes one or more processors, and a memory resource represented by a memory 1932 for storing instructions executable by the processing component 1922, such as application programs. The application programs stored in the memory 1932 may include one or more modules, each corresponding to a set of instructions. In addition, the processing component 1922 is configured to execute instructions to perform the above method.

The electronic device 1900 may also include a power component 1926 configured to perform power management of the electronic device 1900, a wired or wireless network interface 1950 configured to connect the electronic device 1900 to a network, and an input/output (I/O) interface 1958. The electronic device 1900 may operate based on an operating system stored in the memory 1932, such as the Microsoft server operating system (Windows Server™), the graphical-user-interface-based operating system from Apple (Mac OS X™), the multi-user multi-process computer operating system (Unix™), the free and open-source Unix-like operating system (Linux™), the open-source Unix-like operating system (FreeBSD™), or the like.

In an exemplary embodiment, a non-volatile computer-readable storage medium is also provided, such as the memory 1932 including computer program instructions, which can be executed by the processing component 1922 of the electronic device 1900 to complete the above method.

The present invention may be a system, a method, and/or a computer program product. The computer program product may include a computer-readable storage medium loaded with computer-readable program instructions for causing a processor to implement various aspects of the present invention.

The computer-readable storage medium may be a tangible device that can hold and store instructions for use by an instruction execution device. The computer-readable storage medium may be, for example, but is not limited to, an electrical storage device, a magnetic storage device, an optical storage device, an electromagnetic storage device, a semiconductor storage device, or any suitable combination of the above. More specific examples (a non-exhaustive list) of the computer-readable storage medium include: a USB flash drive, a hard disk, random-access memory (RAM), read-only memory (ROM), erasable programmable read-only memory (EPROM or flash memory), static random-access memory (SRAM), compact disc read-only memory (CD-ROM), digital versatile disc (DVD), a memory stick, a floppy disk, a mechanically encoded device such as a punched card or a raised structure in a groove with instructions stored thereon, and any suitable combination of the above. A computer-readable storage medium, as used here, is not to be construed as transient signals per se, such as radio waves or other freely propagating electromagnetic waves, electromagnetic waves propagating through a waveguide or other transmission media (for example, light pulses through a fiber-optic cable), or electrical signals transmitted through a wire.

The computer-readable program instructions described here may be downloaded from a computer-readable storage medium to each computing/processing device, or downloaded to an external computer or an external storage device via a network, such as the Internet, a local area network, a wide area network, and/or a wireless network. The network may include copper transmission cables, optical fiber transmission, wireless transmission, routers, firewalls, switches, gateway computers, and/or edge servers. A network adapter card or network interface in each computing/processing device receives the computer-readable program instructions from the network and forwards them for storage in a computer-readable storage medium in that computing/processing device.

The computer program instructions for performing the operations of the present invention may be assembly instructions, instruction set architecture (ISA) instructions, machine instructions, machine-dependent instructions, microcode, firmware instructions, state-setting data, or source code or object code written in any combination of one or more programming languages, including object-oriented programming languages such as Smalltalk and C++, and conventional procedural programming languages such as the "C" language or similar programming languages. The computer-readable program instructions may execute entirely on the user's computer, partly on the user's computer, as a stand-alone software package, partly on the user's computer and partly on a remote computer, or entirely on a remote computer or server. In cases involving a remote computer, the remote computer may be connected to the user's computer through any kind of network, including a local area network (LAN) or a wide area network (WAN), or may be connected to an external computer (for example, through the Internet using an Internet service provider). In some embodiments, electronic circuits, such as programmable logic circuits, field-programmable gate arrays (FPGAs), or programmable logic arrays (PLAs), are personalized by utilizing state information of the computer-readable program instructions, and these electronic circuits can execute the computer-readable program instructions so as to implement various aspects of the present invention.

Aspects of the present invention are described here with reference to flowcharts and/or block diagrams of methods, apparatus (systems), and computer program products according to embodiments of the present invention. It should be understood that each block of the flowcharts and/or block diagrams, and combinations of blocks in the flowcharts and/or block diagrams, can be implemented by computer-readable program instructions.

These computer-readable program instructions may be provided to a processor of a general-purpose computer, a special-purpose computer, or another programmable data processing apparatus to produce a machine, so that these instructions, when executed by the processor of the computer or the other programmable data processing apparatus, produce an apparatus that implements the functions/actions specified in one or more blocks of the flowcharts and/or block diagrams. These computer-readable program instructions may also be stored in a computer-readable storage medium; these instructions cause a computer, a programmable data processing apparatus, and/or other devices to work in a specific manner, so that the computer-readable medium storing the instructions includes an article of manufacture that includes instructions implementing various aspects of the functions/actions specified in one or more blocks of the flowcharts and/or block diagrams.

The computer-readable program instructions may also be loaded onto a computer, another programmable data processing apparatus, or other devices, so that a series of operational steps is performed on the computer, the other programmable data processing apparatus, or the other devices to produce a computer-implemented process, so that the instructions executed on the computer, the other programmable data processing apparatus, or the other devices implement the functions/actions specified in one or more blocks of the flowcharts and/or block diagrams.

The flowcharts and block diagrams in the drawings illustrate the architecture, functionality, and operation of possible implementations of systems, methods, and computer program products according to multiple embodiments of the present invention. In this regard, each block in the flowcharts or block diagrams may represent a module, a program segment, or a portion of instructions, which contains one or more executable instructions for implementing the specified logical functions. In some alternative implementations, the functions noted in the blocks may also occur in an order different from that noted in the drawings. For example, two consecutive blocks may in fact be executed substantially in parallel, and they may sometimes be executed in the reverse order, depending on the functions involved. It should also be noted that each block of the block diagrams and/or flowcharts, and combinations of blocks in the block diagrams and/or flowcharts, can be implemented by a dedicated hardware-based system that performs the specified functions or actions, or can be implemented by a combination of dedicated hardware and computer instructions.

The computer program product may be specifically implemented by hardware, software, or a combination thereof. In an optional embodiment, the computer program product is embodied as a computer storage medium; in another optional embodiment, the computer program product is embodied as a software product, such as a software development kit (SDK).

The embodiments of the present invention have been described above. The above description is exemplary rather than exhaustive, and is not limited to the disclosed embodiments. Many modifications and variations will be apparent to those of ordinary skill in the art without departing from the scope and spirit of the described embodiments. The terminology used here was chosen to best explain the principles of the embodiments, their practical application, or improvements over technologies in the market, or to enable others of ordinary skill in the art to understand the embodiments disclosed here.

The foregoing is merely illustrative of embodiments of the present invention and is not intended thereby to limit the scope of implementation of the present invention; all simple equivalent changes and modifications made in accordance with the claims and the content of the specification of the present invention remain within the scope covered by the patent of the present invention.

S11~S16: steps
20: behavior recognition apparatus
21: person feature extraction unit
22: clustering unit
23: attention allocation unit
24: person feature update unit
25: person spatio-temporal feature extraction unit
26: behavior recognition unit
800, 1900: electronic device
802, 1922: processing component
804, 1932: memory
806, 1926: power component
808: multimedia component
810: audio component
812: input/output interface
814: sensor component
816: communication component
820: processor
1950: network interface
1958: input/output interface

Other features and effects of the present invention will be clearly presented in the embodiments described with reference to the drawings, in which: FIG. 1 shows a flowchart of a behavior recognition method according to an embodiment of the present invention; FIG. 2 shows a block diagram of a behavior recognition apparatus according to an embodiment of the present invention; FIG. 3 shows a block diagram of an electronic device according to an embodiment of the present invention; and FIG. 4 shows a block diagram of an electronic device according to an embodiment of the present invention.

S11~S16: steps

Claims (17)

1. A behavior recognition method, comprising:
receiving an input video frame and extracting person features from the video frame;
clustering a plurality of the person features in the video frame to obtain a clustering result;
determining attention allocation weights for the person features in the video frame based on the clustering result;
updating the person features based on the attention allocation weights;
extracting person spatio-temporal features based on the updated person features; and
performing behavior recognition on the video frame based on the person spatio-temporal features to obtain a recognition result.

2. The method of claim 1, wherein determining the attention allocation weights for the person features in the video frame based on the clustering result comprises:
determining the attention allocation weights among the person features based on association relationships among the person features in the clustering result.

3. The method of claim 2, wherein determining the attention allocation weights among the person features based on the association relationships among the person features in the clustering result comprises:
determining a first similarity between person features within a same group obtained by clustering; and
determining, based on the first similarity, first attention allocation weights among the person features within the group.
4. The method according to claim 3, wherein determining the first similarity between person features within the same group obtained by the clustering comprises:
dividing the feature matrix of each person feature into N parts; and
computing similarities between the corresponding N parts of different person features, to obtain N first similarities; and
wherein determining, based on the first similarity, the first attention distribution weights among the person features within the group comprises:
determining, based on the N first similarities, N first attention distribution weights among the person features within the group.

5. The method according to any one of claims 2 to 4, wherein determining the attention distribution weights among the person features based on the association relationships among the person features in the clustering result comprises:
determining an overall feature of each group obtained by the clustering;
determining a second similarity between the overall features of the groups; and
determining, based on the second similarity, second attention distribution weights among the person features.
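The N-part split of claim 4 resembles multi-head attention, and the per-group overall feature of claim 5 can be obtained by pooling. A sketch under two assumptions not fixed by the claims: the per-part similarity is a dot product, and the overall group feature is the mean of its members; both helper names are hypothetical.

```python
import numpy as np

def split_similarities(f_a: np.ndarray, f_b: np.ndarray, n_parts: int) -> np.ndarray:
    """Split two feature vectors into N corresponding parts and compute one
    similarity per part (claim 4); assumes dot-product similarity."""
    a_parts = np.split(f_a, n_parts)
    b_parts = np.split(f_b, n_parts)
    return np.array([a @ b for a, b in zip(a_parts, b_parts)])

def group_overall_features(person_feats: np.ndarray, labels: np.ndarray) -> np.ndarray:
    """Pool each cluster into an overall group feature (claim 5);
    assumes mean pooling over group members."""
    groups = sorted(set(labels.tolist()))
    return np.stack([person_feats[labels == g].mean(axis=0) for g in groups])
```

With dot-product similarity, the N part-wise similarities sum to the full-vector similarity, so the split refines rather than replaces the single-similarity case.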
6. The method according to claim 1, wherein updating the person features based on the attention distribution weights comprises:
for a target person feature among the person features of a single group, performing a weighted sum over the person features within the group using the first attention distribution weights between the target person feature and each person feature within the group, to obtain an intra-group updated feature for each person in the group as the updated person features.

7. The method according to claim 1, wherein updating the person features based on the attention distribution weights comprises:
for a target overall feature of a target group among the groups, obtaining an inter-group updated feature for each group using the second attention distribution weights between the target overall feature and the overall features of the groups; and
adding the inter-group updated features to the intra-group updated features of the persons in the target group, respectively, to obtain the updated person features.

8. The method according to claim 1, wherein extracting the person spatio-temporal features based on the updated person features comprises:
spatially decoding the updated person features to obtain the person spatio-temporal features.
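The feature update of claims 6 and 7 reduces to an attention-weighted sum within the group plus a broadcast inter-group term. A sketch assuming the weights are already row-normalized and the inter-group update is a single vector added to every member of the target group (the function name is illustrative):

```python
import numpy as np

def update_person_features(feats: np.ndarray, weights: np.ndarray,
                           group_update: np.ndarray) -> np.ndarray:
    """Claim 6: each row of `weights` holds a target person's first attention
    distribution weights over group members, so `weights @ feats` is the
    weighted sum. Claim 7: the inter-group updated feature is then added
    to every member's intra-group updated feature."""
    intra = weights @ feats          # (m, d) intra-group updated features
    return intra + group_update      # broadcast the inter-group update
```

With uniform weights this degenerates to mean pooling, which is a useful sanity check for an implementation.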
9. The method according to claim 8, wherein spatially decoding the updated person features to obtain the person spatio-temporal features comprises:
spatially decoding the updated person features to obtain person spatial features;
performing temporal encoding and decoding on the person features of a plurality of video frames to obtain person temporal features; and
fusing the person spatial features with the person temporal features to obtain the person spatio-temporal features.

10. The method according to claim 9, wherein performing temporal encoding and decoding on the person features of the plurality of video frames to obtain the person temporal features comprises:
encoding the person features of the plurality of video frames based on a self-attention mechanism to obtain temporally encoded features; and
decoding the temporally encoded features based on a self-attention mechanism, and/or decoding the temporally encoded features based on spatially encoded features, to obtain the person temporal features; wherein the spatially encoded features are the updated person features.

11. The method according to claim 10, wherein spatially decoding the updated person features to obtain the person spatial features comprises:
decoding the spatially encoded features based on a self-attention mechanism, and/or decoding the spatially encoded features based on the temporally encoded features, to obtain the person spatial features.
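Claims 10 and 11 describe encoding with self-attention and cross-decoding between the spatial and temporal branches. A minimal single-head scaled dot-product attention sketch: with `q = k = v` it acts as the self-attention encoder, and with queries from one branch against keys/values from the other it acts as the cross-branch decoder. The scaling and single-head form are assumptions; the claims do not specify them.

```python
import numpy as np

def scaled_dot_attention(q: np.ndarray, k: np.ndarray, v: np.ndarray) -> np.ndarray:
    """Single-head scaled dot-product attention. Self-attention when
    q, k, v are the same features (encoding, claim 10); cross-attention
    when q comes from one branch and k, v from the other (decoding,
    claims 10-11)."""
    d = q.shape[-1]
    scores = q @ k.T / np.sqrt(d)
    scores -= scores.max(axis=-1, keepdims=True)      # numerical stability
    w = np.exp(scores)
    w /= w.sum(axis=-1, keepdims=True)                # softmax over keys
    return w @ v
```

For example, decoding temporally encoded features based on the spatially encoded features would use the spatial features as queries and the temporal features as keys and values.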
12. The method according to any one of claims 8 to 11, further comprising:
extracting global features of the video frame;
determining third attention distribution weights in the global features using the person spatio-temporal features; and
updating the global features using the third attention distribution weights;
wherein performing behavior recognition on the video frame based on the person spatio-temporal features to obtain the recognition result comprises:
performing crowd behavior recognition on the video frame based on the updated global features, to obtain a crowd behavior recognition result.

13. The method according to claim 12, wherein after updating the global features using the third attention distribution weights, the method further comprises:
taking the updated global features as new global features and the person spatio-temporal features as new person features, and iteratively updating the global features and the person spatio-temporal features until an iteration stop condition is satisfied, to obtain iteratively updated global features and person spatio-temporal features; and
wherein performing behavior recognition on the video frame based on the person spatio-temporal features to obtain the recognition result comprises:
performing crowd behavior recognition on the video frame based on the iteratively updated global features, to obtain a crowd behavior recognition result.
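The iterative refinement of claims 12 and 13 alternates between updating a global feature from the person spatio-temporal features and feeding the result back. A sketch with several assumptions the claims leave open: the third attention weights are a softmax over person-to-global similarities, the person features absorb the global context additively, and the iteration stop condition is a fixed iteration count.

```python
import numpy as np

def softmax(x: np.ndarray) -> np.ndarray:
    e = np.exp(x - x.max())
    return e / e.sum()

def iterative_refine(global_feat: np.ndarray, person_feats: np.ndarray,
                     n_iters: int = 3):
    """Claims 12-13 sketch: update the global feature with third attention
    distribution weights derived from person spatio-temporal features,
    then iterate. Stop condition assumed to be a fixed count."""
    for _ in range(n_iters):
        w = softmax(person_feats @ global_feat)   # third attention weights (assumed form)
        global_feat = w @ person_feats            # attention-weighted global update
        person_feats = person_feats + global_feat # feed global context back (assumed form)
    return global_feat, person_feats
```

The refined global feature would then drive crowd behavior recognition, while the refined person features drive per-person recognition.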
14. The method according to claim 13, wherein after obtaining the iteratively updated global features and person spatio-temporal features, the method further comprises:
performing individual behavior recognition based on the iteratively updated person spatio-temporal features, to obtain an individual behavior recognition result.

15. The method according to claim 1, wherein extracting the person features from the video frame comprises:
performing human body detection on the video frame to obtain a target bounding box for each person; and
extracting features from the video frame, and matching the extracted features of the video frame against the target bounding boxes to obtain the corresponding person features.

16. An electronic device, comprising:
a processor; and
a memory for storing processor-executable instructions;
wherein the processor is configured to invoke the instructions stored in the memory to execute the method according to any one of claims 1 to 15.

17. A computer-readable storage medium having computer program instructions stored thereon, wherein the computer program instructions, when executed by a processor, implement the method according to any one of claims 1 to 15.
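Claim 15's matching of detected bounding boxes to frame features is commonly realized as region pooling over a feature map. A sketch assuming integer box coordinates in feature-map space and mean pooling (a simplification of ROI pooling; the claims do not name a pooling scheme, and `roi_person_features` is a hypothetical helper):

```python
import numpy as np

def roi_person_features(feature_map: np.ndarray, boxes) -> np.ndarray:
    """Claim 15 sketch: given an (H, W, C) frame feature map and target
    bounding boxes (x0, y0, x1, y1) from human body detection, pool each
    box region into one person feature vector."""
    feats = []
    for x0, y0, x1, y1 in boxes:
        region = feature_map[y0:y1, x0:x1]       # crop by the target bounding box
        feats.append(region.mean(axis=(0, 1)))   # mean-pool H x W down to a C-vector
    return np.stack(feats)
```

Each detected person thus contributes one feature vector, which is the input to the clustering step of claim 1.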
TW111108739A 2021-08-24 2022-03-10 Behavior recognition method and apparatus, electronic device, and storage medium TW202309772A (en)

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
CN202110974723.1A CN113688729B (en) 2021-08-24 2021-08-24 Behavior recognition method and device, electronic equipment and storage medium
CN202110974723.1 2021-08-24

Publications (1)

Publication Number Publication Date
TW202309772A true TW202309772A (en) 2023-03-01

Family

ID=78581865

Family Applications (1)

Application Number Title Priority Date Filing Date
TW111108739A TW202309772A (en) 2021-08-24 2022-03-10 Behavior recognition method and apparatus, electronic device, and storage medium

Country Status (3)

Country Link
CN (1) CN113688729B (en)
TW (1) TW202309772A (en)
WO (1) WO2023024438A1 (en)

Families Citing this family (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN113688729B (en) * 2021-08-24 2023-04-07 上海商汤科技开发有限公司 Behavior recognition method and device, electronic equipment and storage medium
CN114764897A (en) * 2022-03-29 2022-07-19 深圳市移卡科技有限公司 Behavior recognition method, behavior recognition device, terminal equipment and storage medium
CN116665308B (en) * 2023-06-21 2024-01-23 石家庄铁道大学 Double interaction space-time feature extraction method

Family Cites Families (12)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN109241834A (en) * 2018-07-27 2019-01-18 中山大学 A kind of group behavior recognition methods of the insertion based on hidden variable
CN109492157B (en) * 2018-10-24 2021-08-31 华侨大学 News recommendation method and theme characterization method based on RNN and attention mechanism
CN111310516B (en) * 2018-12-11 2023-08-29 杭州海康威视数字技术股份有限公司 Behavior recognition method and device
CN109858406B (en) * 2019-01-17 2023-04-07 西北大学 Key frame extraction method based on joint point information
CN109800737B (en) * 2019-02-02 2021-06-25 深圳市商汤科技有限公司 Face recognition method and device, electronic equipment and storage medium
CN110287879B (en) * 2019-06-26 2023-01-17 天津大学 Attention mechanism-based video behavior identification method
JP7284872B2 (en) * 2019-10-09 2023-05-31 トヨタ モーター ヨーロッパ A method for recognizing activity using separate spatial and temporal attentional weights
CN112651267A (en) * 2019-10-11 2021-04-13 阿里巴巴集团控股有限公司 Recognition method, model training, system and equipment
CN111125406B (en) * 2019-12-23 2023-08-04 天津大学 Visual relation detection method based on self-adaptive cluster learning
CN112580557A (en) * 2020-12-25 2021-03-30 深圳市优必选科技股份有限公司 Behavior recognition method and device, terminal equipment and readable storage medium
CN113139484B (en) * 2021-04-28 2023-07-11 上海商汤科技开发有限公司 Crowd positioning method and device, electronic equipment and storage medium
CN113688729B (en) * 2021-08-24 2023-04-07 上海商汤科技开发有限公司 Behavior recognition method and device, electronic equipment and storage medium

Also Published As

Publication number Publication date
CN113688729B (en) 2023-04-07
CN113688729A (en) 2021-11-23
WO2023024438A1 (en) 2023-03-02
