TW202145131A - Video processing method and device, electronic equipment and storage medium - Google Patents


Info

Publication number: TW202145131A
Authority: TW (Taiwan)
Prior art keywords: video, target object, target, learning, detection
Prior art date
Application number: TW110100570A
Other languages: Chinese (zh)
Inventors: 孫賀然, 王磊, 白登峰, 夏建明, 曹軍
Original Assignee: 大陸商北京市商湯科技開發有限公司 (Beijing SenseTime Technology Development Co., Ltd.)
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by 大陸商北京市商湯科技開發有限公司
Publication of TW202145131A

Classifications

    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06V: IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V20/00: Scenes; Scene-specific elements
    • G06V20/50: Context or environment of the image
    • G06V20/52: Surveillance or monitoring of activities, e.g. for recognising suspicious objects
    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06Q: INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES; SYSTEMS OR METHODS SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES, NOT OTHERWISE PROVIDED FOR
    • G06Q50/00: Information and communication technology [ICT] specially adapted for implementation of business processes of specific business sectors, e.g. utilities or tourism
    • G06Q50/10: Services
    • G06Q50/20: Education
    • G06Q50/205: Education administration or guidance
    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06V: IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V20/00: Scenes; Scene-specific elements
    • G06V20/40: Scenes; Scene-specific elements in video content
    • G06V20/41: Higher-level, semantic clustering, classification or understanding of video scenes, e.g. detection, labelling or Markovian modelling of sport events or news items
    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06V: IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V40/00: Recognition of biometric, human-related or animal-related patterns in image or video data
    • G06V40/20: Movements or behaviour, e.g. gesture recognition

Landscapes

  • Engineering & Computer Science (AREA)
  • Business, Economics & Management (AREA)
  • Physics & Mathematics (AREA)
  • General Physics & Mathematics (AREA)
  • Theoretical Computer Science (AREA)
  • Multimedia (AREA)
  • Strategic Management (AREA)
  • Tourism & Hospitality (AREA)
  • Educational Technology (AREA)
  • Educational Administration (AREA)
  • General Health & Medical Sciences (AREA)
  • Health & Medical Sciences (AREA)
  • Primary Health Care (AREA)
  • Economics (AREA)
  • Human Resources & Organizations (AREA)
  • Marketing (AREA)
  • General Business, Economics & Management (AREA)
  • Human Computer Interaction (AREA)
  • Software Systems (AREA)
  • Computational Linguistics (AREA)
  • Social Psychology (AREA)
  • Psychiatry (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Image Analysis (AREA)
  • Electrically Operated Instructional Devices (AREA)

Abstract

The invention relates to a video processing method and device, electronic equipment and a storage medium. The method comprises the following steps: acquiring a video, wherein at least some of the video frames in the video contain a target object; detecting, according to the video, at least one type of learning behavior of the target object while it watches a teaching course; and, when the target object is detected performing the at least one type of learning behavior, generating learning state information according to video frames that at least partially contain the at least one type of learning behavior and/or the duration for which the target object performs it.

Description

Video processing method and device, electronic device and computer-readable storage medium

This application claims priority to Chinese patent application No. 202010442733.6, titled "Video processing method and device, electronic equipment and storage medium", filed with the Chinese Patent Office on May 22, 2020, the entire contents of which are incorporated herein by reference.

The present invention relates to the field of computer vision, and in particular to a video processing method and device, an electronic device and a computer-readable storage medium.

During teaching, teachers must concentrate on delivering the lesson, which makes it difficult for institutions or teachers to track each student's attentiveness in class, and parents have no way of knowing how their children perform at school. Whether students actually attend class, whether they listen attentively, and how well they interact in class cannot be quantitatively assessed.

Therefore, how to track each student's learning state during teaching while maintaining teaching quality has become an urgent problem to be solved.

The present invention proposes a video processing solution.

According to an aspect of the present invention, a video processing method is provided, comprising:

acquiring a video, wherein at least some of the video frames in the video contain a target object; detecting, according to the video, at least one type of learning behavior of the target object while it watches a teaching course; and, when the target object is detected performing at least one type of learning behavior, generating learning state information according to video frames that at least partially contain the at least one type of learning behavior and/or the duration for which the target object performs it.

According to an aspect of the present invention, a video processing apparatus is provided, comprising:

a video acquisition module, configured to acquire a video, wherein at least some of the video frames in the video contain a target object;

a detection module, configured to detect, according to the video, at least one type of learning behavior of the target object while it watches a teaching course; and

a generation module, configured to, when the target object is detected performing at least one type of learning behavior, generate learning state information according to video frames that at least partially contain the at least one type of learning behavior and/or the duration for which the target object performs it.

According to an aspect of the present invention, an electronic device is provided, comprising:

a processor; and a memory for storing processor-executable instructions; wherein the processor is configured to execute the above video processing method.

According to an aspect of the present invention, a computer-readable storage medium is provided, on which computer program instructions are stored; when the computer program instructions are executed by a processor, the above video processing method is implemented.

According to an aspect of the present invention, a computer program is provided, comprising computer-readable code; when the computer-readable code runs in an electronic device, a processor in the electronic device executes the above video processing method.

In the embodiments of the present invention, when at least one type of learning behavior of the target object is detected, the video frames containing the learning behavior can be used to generate intuitive learning state information, and quantified learning state information can be generated according to the duration of the learning behavior. In this way, learning state information of evaluative value can be obtained flexibly, enabling teachers, parents and other relevant persons and institutions to grasp students' learning states effectively and accurately.

It should be understood that the foregoing general description and the following detailed description are exemplary and explanatory only, and do not limit the invention. Other features and aspects of the present invention will become apparent from the following detailed description of exemplary embodiments with reference to the drawings.

Various exemplary embodiments, features and aspects of the present invention are described in detail below with reference to the drawings. The same reference numerals in the drawings denote elements with the same or similar functions. Although various aspects of the embodiments are shown in the drawings, the drawings are not necessarily drawn to scale unless otherwise indicated.

The word "exemplary" is used herein to mean "serving as an example, embodiment, or illustration". Any embodiment described herein as "exemplary" is not necessarily to be construed as preferred over or more advantageous than other embodiments.

The term "and/or" herein merely describes an association between associated objects and indicates that three relationships are possible; for example, "A and/or B" may mean: A exists alone, A and B both exist, or B exists alone. In addition, the term "at least one" herein means any one of a plurality, or any combination of at least two of a plurality; for example, "including at least one of A, B and C" may mean including any one or more elements selected from the set consisting of A, B and C.

In addition, in order to better illustrate the present invention, numerous specific details are given in the following detailed description. Those skilled in the art will understand that the present invention can be practiced without certain specific details. In some instances, methods, means, components and circuits well known to those skilled in the art are not described in detail, so as not to obscure the subject matter of the present invention.

FIG. 1 shows a flowchart of a video processing method according to an embodiment of the present invention. The method can be applied to a video processing apparatus, which may be a terminal device, a server or other processing equipment. The terminal device may be user equipment (UE), a mobile device, a user terminal, a terminal, a mobile phone, a cordless phone, a personal digital assistant (PDA), a handheld device, a computing device, an in-vehicle device, a wearable device, or the like. In one example, the method may be applied to a cloud server or a local server; the cloud server may be a public cloud server or a private cloud server, selected flexibly according to the actual situation.

In some possible implementations, the video processing method may also be implemented by a processor calling computer-readable instructions stored in a memory.

As shown in FIG. 1, in one possible implementation, the video processing method may include:

Step S11: acquiring a video, wherein at least some of the video frames in the video contain a target object.

Step S12: detecting, according to the video, at least one type of learning behavior of the target object while it watches a teaching course.

Step S13: when the target object is detected performing at least one type of learning behavior, generating learning state information according to video frames that at least partially contain the at least one type of learning behavior and/or the duration for which the target object performs it.
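
As an illustration only, steps S11 to S13 can be sketched in Python. The frame representation, behavior labels and frame rate below are assumptions made for the sketch, not part of the claimed method:

```python
from dataclasses import dataclass, field

@dataclass
class Frame:
    index: int
    behaviors: set = field(default_factory=set)  # behavior labels detected in this frame (assumed)

def detect_behaviors(frames, behavior_types):
    # Step S12: per-frame detection of the configured learning behaviors
    hits = {b: [] for b in behavior_types}
    for f in frames:
        for b in f.behaviors & set(behavior_types):
            hits[b].append(f.index)
    return hits

def build_state_info(hits, fps=1.0):
    # Step S13: learning state info from evidence frames and/or duration
    return {
        b: {"frames": idxs, "duration_s": len(idxs) / fps}
        for b, idxs in hits.items() if idxs
    }

# usage: three frames (step S11), one behavior detected in two of them
video = [Frame(0, {"hand_raise"}), Frame(1, {"hand_raise"}), Frame(2)]
info = build_state_info(detect_behaviors(video, ["hand_raise", "eyes_closed"]))
```

Behaviors with no evidence frames are simply omitted from the resulting state info, matching the "when it is detected that the target object performs" condition of step S13.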

The target object may be any object whose learning state information is to be obtained, i.e., an object whose learning state needs to be evaluated; its specific form can be determined flexibly according to the actual situation. In one possible implementation, the target object may be a student, such as a primary school, secondary school or university student; in another, it may be an adult in continuing education, such as an adult in vocational training, or a senior studying at a university for the elderly.

In the embodiments of the present invention, the video may be recorded while the target object watches a teaching course. The form of the teaching course is not limited: it may be a pre-recorded course video, a live-streamed course, or a course taught by a teacher in person. At least some of the video frames may contain the target object; that is, how the target object appears in the recorded video depends on the actual situation. In one possible implementation, the target object may appear in the video throughout; in another, the target object may be absent from the video frames at certain moments or during certain periods.

The scene in which the target object watches the teaching course can also be determined flexibly. In one possible implementation, it may be an online scene, i.e., the target object watches the course through online education such as a web classroom. In another, it may be an offline scene: the target object watches a teacher lecture in person in a traditional face-to-face class, or watches a course played via video or other multimedia in a dedicated teaching venue such as a classroom.

The specific form of the video can be determined flexibly according to the application scenario of the video processing method. In one possible implementation, the video may be real-time video, such as video recorded in real time while the target object studies in an online class, or real-time video captured by a camera deployed in the classroom while the target object attends class. In another, the video may be a recording, such as a playback video of the target object's online study recorded after the class, or the complete classroom video captured by a camera deployed in the classroom after the target object's class ends.

For ease of description, the following embodiments describe the video processing flow using, as an example, video recorded in real time while the target object studies in an online class. Video processing in other application scenarios can be extended flexibly with reference to the subsequent embodiments and is not repeated here.

After the video is acquired in step S11 as described in the above embodiments, at least one type of learning behavior of the target object while watching the teaching course can be detected in step S12. The types and number of detected learning behaviors can be determined flexibly and are not limited to the following embodiments. In one possible implementation, the learning behaviors performed by the target object may include at least one of the following: performing at least one target gesture, expressing a target emotion, paying attention to the display area of the teaching course, engaging in at least one interactive behavior with other objects, being absent from at least some of the video frames, closing the eyes, and making eye contact within the display area of the teaching course.
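
For illustration, the behavior classes listed above could be encoded as a set of labels. The enum and its member names below are assumptions, since the embodiments do not prescribe any particular encoding:

```python
from enum import Enum, auto

class LearningBehavior(Enum):
    TARGET_GESTURE = auto()     # performing at least one target gesture
    TARGET_EMOTION = auto()     # expressing a target emotion
    ATTENDING_DISPLAY = auto()  # paying attention to the course display area
    INTERACTION = auto()        # interacting with the teacher or classmates
    ABSENT = auto()             # absent from at least some video frames
    EYES_CLOSED = auto()        # eyes closed
    GAZE_AT_DISPLAY = auto()    # eye contact within the display area
```

A detector for each class can then report hits against a stable label, which keeps downstream state-info generation independent of how each behavior is detected.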

The target gesture may be a preset gesture that the target object is likely to make while watching the teaching course; its specific form can be set flexibly according to the actual situation and is described in the subsequent embodiments, so it is not expanded upon here.

The target emotion may be an emotion that reflects the target object's true feelings about the teaching course while watching it; its specific form can likewise be set flexibly according to the actual situation and is not expanded upon here.

Paying attention to the display area of the teaching course can reflect the target object's attentiveness while watching the course. The specific extent of the display area can be set flexibly according to the actual situation and is not limited to the following embodiments. In one possible implementation, the display area may be the area in which the course video is shown in an online class; for example, when students study online through terminal devices such as computers, mobile phones or tablets, the display area may be the screen on which the course is played. In another, the display area may be the teacher's teaching area in an offline class, such as the podium or blackboard in a classroom.

Engaging in at least one interactive behavior with other objects may refer to learning-related interactions between the target object and other objects involved in the teaching course. The other objects can be determined flexibly according to the actual situation: in one possible implementation they may be teaching objects, such as teachers; in another they may be learning objects other than the target object in the teaching process, such as the target object's classmates. The interactive behaviors vary flexibly with the object. In one possible implementation, where the other object is the teacher, the interaction may include receiving a reward sent by the teacher, such as a virtual red flower or public praise by name; in another, it may include answering the teacher's questions or speaking when called upon. Where the other objects are classmates, the interaction may include group mutual assistance, group discussion or group study.

Being absent from at least some of the video frames may mean that the learning object has left the teaching course at certain moments or during certain periods; for example, during online study the target object may temporarily leave the current online learning device for personal reasons, or move out of that device's field of view.

Closing the eyes may be an eye-closing action performed by the target object while watching the teaching course. Eye contact within the display area of the teaching course may mean looking at the display area of the course; correspondingly, from the target object's eye contact within the display area in the video, the periods in which the target object is not watching the display area can be further determined.

Through the various learning behaviors mentioned in the above embodiments, comprehensive and flexible behavior detection can be performed on the target object's learning process, improving the comprehensiveness and accuracy of the resulting learning state information and allowing the target object's learning state to be grasped more flexibly and accurately.

Specifically, which type or types of detection step S12 performs on the above learning behaviors can be set flexibly according to the actual situation. In one possible implementation, all of the learning behaviors mentioned above may be detected simultaneously; the specific detection methods and processes are described in the following embodiments and are not expanded upon here.

When the target object is detected performing at least one type of learning behavior, learning state information can be generated according to video frames that at least partially contain the at least one type of learning behavior and/or the duration for which the target object performs it. The specific form of the learning state information can be determined flexibly according to the type of learning behavior and the corresponding operation. In one possible implementation, where the learning state information is generated from video frames at least partially containing at least one type of learning behavior, it may consist of those video frames; where it is generated from the duration for which the target object performs the behavior, it may be numerical data; it may also contain both video-frame information and numerical data, or information about other states. How the learning state information is generated and its specific forms are described in the subsequent embodiments and are not expanded upon here.
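
The alternative forms described above (frame evidence only, numerical duration only, or both combined) can be sketched as follows; the function name, `form` parameter and dictionary keys are illustrative assumptions:

```python
def learning_state_info(evidence_frames, duration_s, form="both"):
    # "frames": intuitive info built from the evidence frames themselves
    # "duration": quantified info as numerical data
    # "both": the two combined
    if form == "frames":
        return {"evidence_frames": evidence_frames}
    if form == "duration":
        return {"duration_s": duration_s}
    return {"evidence_frames": evidence_frames, "duration_s": duration_s}

# usage: a duration-only message for three seconds of a detected behavior
info = learning_state_info([12, 13, 14], 3.0, form="duration")
```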

In the embodiments of the present invention, when at least one type of learning behavior of the target object is detected, the video frames containing the learning behavior can be used to generate intuitive learning state information, and quantified learning state information can be generated according to the duration of the learning behavior. In this way, learning state information of evaluative value can be obtained flexibly, enabling teachers, parents and other relevant persons and institutions to grasp students' learning states effectively and accurately.

As mentioned in the above embodiments, the video may be recorded while the target object watches the teaching course, and the viewing scene can be determined flexibly according to the actual situation; accordingly, the way the video is acquired in step S11 may also vary with the scene.

In one possible implementation, where the target object watches the teaching course in an online scene, i.e., through an online classroom, acquiring the video may include: if the video processing apparatus and the device on which the target object studies online are the same device, capturing video of the target object watching the course directly on that device; if they are different devices, capturing video on the target object's study device and transmitting it to the video processing apparatus in real time and/or non-real time. In another possible implementation, where the target object watches the teaching course in an offline scene, i.e., attends face-to-face teaching or watches a course video in a dedicated teaching venue, acquiring the video may include capturing video of the target object with image acquisition equipment deployed on site (such as an ordinary camera, or a camera deployed for security purposes). Further, if that equipment can itself perform video processing, i.e., can serve as the video processing apparatus, the acquisition in step S11 is complete; if it cannot, the captured video can be transmitted to the video processing apparatus in real time and/or non-real time.

As described in the above embodiments, the way the target object's learning behavior is detected in step S12 can be determined flexibly according to the actual situation. In one possible implementation, step S12 may include:

Step S121: performing target object detection on the video to obtain video frames containing the target object.

Step S122: performing at least one type of learning behavior detection on the video frames containing the target object.

As can be seen from the above embodiments, in one possible implementation, the video frames containing the target object can be determined by performing target object detection on the video. Once it is determined which video frames contain the target object, at least one type of learning behavior detection can be performed on the target object in those frames.

The manner of target object detection can be determined flexibly according to the actual situation and is not limited to the following embodiments. In one possible implementation, the target object in the video may be detected by face detection or face tracking. After the video frames are processed by face detection or face tracking, multiple objects may be detected; in that case, the detected face images can be further screened to select one or more objects as the target object. The specific screening method can be set flexibly according to the actual situation and is not limited in the embodiments of the present invention.
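
A minimal sketch of step S121 with screening follows. The stub detector and the largest-face screening rule are both assumptions for illustration; a real implementation would run a face detection or face tracking model on each image:

```python
def faces_in(frame):
    # stub detector: each "frame" here is already a list of (x, y, w, h)
    # face boxes, standing in for the output of a real face detector
    return frame

def pick_target(faces):
    # screening rule (assumed): when several faces are found, keep the largest
    return max(faces, key=lambda b: b[2] * b[3]) if faces else None

def frames_with_target(frames):
    # Step S121: keep only the frames in which a target object is found
    kept = []
    for i, frame in enumerate(frames):
        target = pick_target(faces_in(frame))
        if target is not None:
            kept.append((i, target))
    return kept

kept = frames_with_target([
    [(0, 0, 40, 40), (50, 50, 20, 20)],  # two faces: the larger is screened in
    [],                                   # no face: frame dropped
    [(10, 10, 30, 30)],
])
```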

在一種可能的實現方式中,在得到了包含目標對象的視訊幀後,可以通過步驟S122,對包含目標對象的視訊幀進行至少一類學習行爲檢測。步驟S122的實現方式可以根據學習行爲的不同而靈活發生變化,詳見下述各發明實施例,在此先不做展開。在需要對目標對象的多類學習行爲進行檢測的情況下,可以同時採用多種方式進行組合來實現多類學習行爲檢測。In a possible implementation manner, after the video frame containing the target object is obtained, at least one type of learning behavior detection may be performed on the video frame containing the target object through step S122. The implementation manner of step S122 can be flexibly changed according to different learning behaviors. For details, please refer to the following embodiments of the invention, which will not be expanded here. In the case of needing to detect multiple types of learning behaviors of the target object, a combination of multiple methods can be used to realize the detection of multiple types of learning behaviors at the same time.

In some possible implementations, the detection of the target object's learning behavior while watching the teaching course can be completed once target object detection has been performed on the video. That is, by performing target object detection on the video, the learning behavior of not appearing in at least some of the video frames, as mentioned in the above embodiments, can be determined. The learning state information can then be obtained from the video frames in which the target object is not detected; for example, the time during which the target object does not appear in at least some of the video frames can be counted from those frames as the learning state information.

In the embodiments of the present invention, target object detection is performed on the video to obtain video frames containing the target object, and at least one type of learning behavior detection is performed on those frames. Through this process, the target object detection performed on the video makes the detection of at least one type of learning behavior of the target object more targeted, so that the learning behavior detection is more accurate, further improving the accuracy and reliability of the subsequently obtained learning state information.
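The two-stage flow of steps S121/S122 can be illustrated with a minimal sketch. This is not code from the disclosure: `detect_faces` and the entries of `behavior_detectors` are assumed callables standing in for whatever face-detection and behavior-detection models an implementation chooses.

```python
# Illustrative sketch of steps S121/S122: frames containing the target
# object are selected first, and each type of learning behavior detection
# then runs only on those frames.

def detect_learning_behaviors(frames, detect_faces, behavior_detectors):
    """Run per-behavior detection over frames containing the target object."""
    target_frames = [f for f in frames if detect_faces(f)]       # step S121
    results = {
        name: [detector(f) for f in target_frames]               # step S122
        for name, detector in behavior_detectors.items()
    }
    # Frames without the target object can separately feed absence statistics.
    absent_count = len(frames) - len(target_frames)
    return results, absent_count
```

The count of frames without the target object corresponds to the "absence" learning behavior described above, which can be converted into learning state information such as absence time.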

如上述各發明實施例所述,步驟S122的實現方式可以根據學習行爲的不同而靈活發生變化。在一種可能的實現方式中,學習行爲可以包括:執行至少一種目標手勢;As described in the above embodiments of the invention, the implementation of step S122 can be flexibly changed according to different learning behaviors. In a possible implementation manner, the learning behavior may include: performing at least one target gesture;

在這種情況下,對包含目標對象的視訊幀進行至少一類學習行爲檢測,可以包括:In this case, performing at least one class of learned behavior detection on video frames containing target objects may include:

對包含目標對象的視訊幀進行至少一種目標手勢的檢測;Detecting at least one target gesture on the video frame containing the target object;

在檢測到包含至少一種目標手勢的連續視訊幀的數量超過第一閾值的情況下,將包含目標手勢的視訊幀中的至少一幀記錄爲手勢開始幀;When it is detected that the number of consecutive video frames containing at least one target gesture exceeds the first threshold, recording at least one frame in the video frames containing the target gesture as a gesture start frame;

在手勢開始幀以後的視訊幀中,不包含目標手勢的連續視訊幀的數量超過第二閾值的情況下,將不包含目標手勢的視訊幀中的至少一幀記錄爲手勢結束幀;In the video frames after the gesture start frame, when the number of consecutive video frames not including the target gesture exceeds the second threshold, recording at least one frame in the video frames not including the target gesture as the gesture end frame;

根據手勢開始幀與手勢結束幀的數量,確定視訊中所述目標對象執行至少一種目標手勢的次數和/或時間。The number of times and/or the time that the target object in the video performs at least one target gesture is determined according to the number of gesture start frames and gesture end frames.
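The start/end-frame bookkeeping above can be sketched as a small state machine over per-frame detections. This is an illustrative sketch only; the function name is hypothetical, and the default thresholds (6 and 10) are the example values given in the embodiments below, not fixed by the disclosure.

```python
def count_gesture_events(detections, first_threshold=6, second_threshold=10):
    """Count gesture occurrences from per-frame True/False detections.

    A gesture is recorded once `first_threshold` consecutive positive frames
    are seen (here the gesture start frame is taken as the first frame of
    that run), and ends once `second_threshold` consecutive negative frames
    follow (the gesture end frame is taken as the last frame of that run).
    Returns (count, start_frames, end_frames).
    """
    start_frames, end_frames = [], []
    pos_run = neg_run = 0
    in_gesture = False
    for i, hit in enumerate(detections):
        if hit:
            pos_run, neg_run = pos_run + 1, 0
            if not in_gesture and pos_run >= first_threshold:
                in_gesture = True
                start_frames.append(i - first_threshold + 1)
        else:
            neg_run, pos_run = neg_run + 1, 0
            if in_gesture and neg_run >= second_threshold:
                in_gesture = False
                end_frames.append(i)
    return len(start_frames), start_frames, end_frames
```

Note that a brief dropout shorter than `second_threshold` frames does not end the gesture, matching the tolerance the embodiments describe; the choice of which frame within a run to record is one of the options the disclosure leaves open.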

通過上述發明實施例可以看出,在學習行爲包括執行至少一種目標手勢的情況下,對目標對象的視訊幀進行的學習行爲檢測可以包括目標手勢檢測。It can be seen from the above embodiments of the invention that in the case that the learning behavior includes performing at least one target gesture, the learning behavior detection performed on the video frame of the target object may include target gesture detection.

其中,目標手勢具體包含哪些手勢,可以根據實際情況進行靈活設定,不局限於下述發明實施例。示例性的,目標手勢包括舉手手勢、點讚手勢、OK手勢以及勝利手勢中的一種或多種。The gestures specifically included in the target gesture can be flexibly set according to the actual situation, and are not limited to the following inventive embodiments. Exemplarily, the target gesture includes one or more of a raise hand gesture, a like gesture, an OK gesture, and a victory gesture.

In a possible implementation, the target gestures may include learning-related gestures that the target object shows, according to how the lesson is received, while watching the teaching course, such as a hand-raising gesture for answering a question, a like gesture (a thumbs-up, etc.) expressing appreciation of the teaching content or the teacher, an OK gesture expressing understanding of or agreement with the teaching content, and a victory gesture (such as a "Yeah" gesture) for interacting with the teacher.

Specifically, the manner of performing detection of at least one target gesture on the video frames containing the target object can be flexibly determined according to the actual situation and is not limited to the following embodiments. In a possible implementation, target gesture detection can be realized by related gesture recognition algorithms: for example, the hand key points of the target object in a video frame, or the image region corresponding to a hand detection box, can be identified; gesture detection is then performed based on the hand key points or on the image region corresponding to the hand detection box, and whether the target object is performing a target gesture is determined from the gesture detection result. In a possible implementation, target gesture detection can be realized through a neural network with a gesture detection function. The specific structure and implementation of such a network can be flexibly set according to the actual situation. Where the target gestures include multiple gestures, in one possible implementation the video frames containing the target object can be input into a neural network capable of detecting multiple gestures simultaneously to realize target gesture detection; in another possible implementation, the video frames containing the target object can be input separately into multiple neural networks each with a single-gesture detection function, to realize detection of multiple target gestures.

In the process of target gesture detection by any of the above embodiments, if the number of consecutive video frames containing at least one target gesture is detected to exceed a first threshold, at least one frame may be selected from these consecutive video frames containing the target gesture as the gesture start frame. The value of the first threshold can be flexibly set according to the actual situation, and the first thresholds corresponding to different target gestures may be the same or different. For example, the first threshold corresponding to a hand-raising gesture may be set to 6 and that corresponding to a like gesture to 7; then, when the number of consecutive video frames containing the hand-raising gesture is detected to be no less than 6, at least one frame may be selected from the video frames containing the hand-raising gesture as the gesture start frame of the hand-raising gesture, and when the number of consecutive video frames containing the like gesture is detected to be no less than 7, at least one frame may be selected from the video frames containing the like gesture as the gesture start frame of the like gesture. In a possible implementation, to facilitate target gesture detection, the first thresholds corresponding to different target gestures may be set to the same value; in one example, the first threshold may be set to 6.

手勢開始幀的選定方式同樣可以根據實際情況靈活設定,在一種可能的實現方式中,可以將檢測到的包含目標手勢的連續視訊幀中的第一幀,作爲該目標手勢的手勢開始幀,在一種可能的實現方式中,爲了減少手勢檢測的誤差,也可以將檢測到的包含目標手勢的連續視訊幀中的第一幀以後的某一幀,作爲該目標手勢的手勢開始幀。The selection method of the gesture start frame can also be flexibly set according to the actual situation. In a possible implementation, the first frame of the detected continuous video frames containing the target gesture can be used as the gesture start frame of the target gesture. In a possible implementation manner, in order to reduce the error of gesture detection, a frame after the first frame in the detected continuous video frames containing the target gesture may also be used as the gesture start frame of the target gesture.

After the gesture start frame has been determined, the gesture end frame can be determined from the video frames after the gesture start frame, i.e., the end time of the target gesture in the gesture start frame is determined. The specific manner of determination can be flexibly selected according to the actual situation and is not limited to the following embodiments. In a possible implementation, among the video frames after the gesture start frame, when the number of consecutive video frames not containing the target gesture of the gesture start frame exceeds a second threshold, at least one of the video frames not containing the target gesture may be recorded as the gesture end frame. The value of the second threshold can likewise be flexibly set according to the actual situation, and the second thresholds corresponding to different target gestures may be the same or different; the specific setting can follow that of the first threshold and is not repeated here. In one example, the second thresholds corresponding to different target gestures may be the same, for instance 10, i.e., after the gesture start frame, if 10 consecutive frames are detected that do not contain the target gesture of the gesture start frame, the target object may be considered to have finished performing the target gesture.
In this case, at least one frame can be selected from the consecutive video frames not containing the target gesture as the gesture end frame; the selection can likewise follow that of the gesture start frame. In one example, the last frame of the consecutive video frames not containing the target gesture may be used as the gesture end frame; in another example, a frame before that last frame may be used as the gesture end frame. In a possible implementation, if after the gesture start frame there are one or more video frames that do not contain the target object, those video frames may also be used as the gesture end frame.

After the gesture start frames and gesture end frames have been determined, the number of times the target object performs one or more target gestures can be determined from the numbers of gesture start frames and gesture end frames contained in the video frames; further, the duration for which one or more target gestures are performed can also be determined. Which gesture-related content is specifically determined can be flexibly decided according to the requirements of the learning state information in step S13; see the subsequent embodiments for details, which are not expanded here.
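Deriving counts and durations from paired start/end frames can be sketched as below. The function name and the frame rate are assumptions for illustration; the disclosure leaves the concrete derivation of times to the implementation.

```python
def gesture_times_and_duration(start_frames, end_frames, fps=25.0):
    """Derive the number of gesture events and their total duration in
    seconds from paired start/end frame indices; the frame rate `fps`
    is an assumed example value, not specified by the disclosure."""
    count = min(len(start_frames), len(end_frames))
    total_frames = sum(end - start + 1
                       for start, end in zip(start_frames, end_frames))
    return count, total_frames / fps
```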

By performing detection of at least one target gesture on the video frames containing the target object, and determining the gesture start frames and gesture end frames according to the detection results, the number of times and/or the time that the target object in the video performs at least one target gesture can be further determined. Through the above process, the gestures fed back by the target object in the video according to its learning state can be detected comprehensively and accurately, thereby improving the comprehensiveness and precision of the subsequently obtained learning state information, so that the learning state of the target object can be grasped accurately.

在一種可能的實現方式中,學習行爲可以包括:表現目標情緒;In one possible implementation, the learning behavior may include: expressing the target emotion;

在這種情況下,對包含目標對象的視訊幀進行至少一類學習行爲檢測,可以包括:In this case, performing at least one class of learned behavior detection on video frames containing target objects may include:

對包含目標對象的視訊幀進行表情檢測和/或微笑值檢測;Perform expression detection and/or smile detection on video frames containing target objects;

在檢測到視訊幀中目標對象展示至少一種第一目標表情或微笑值檢測的結果超過目標微笑值情況下,將檢測到的視訊幀作爲第一檢測幀;In the case where it is detected that the target object in the video frame shows at least one first target expression or the smile value detection result exceeds the target smile value, the detected video frame is used as the first detection frame;

在檢測到連續的第一檢測幀的數量超過第三閾值的情況下,確定目標對象産生目標情緒。In a case where it is detected that the number of consecutive first detection frames exceeds the third threshold, it is determined that the target object produces the target emotion.
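The emotion-detection steps above can be sketched as a consecutive-run check over per-frame results. This is an illustrative sketch only: the function name is hypothetical, and the default target expression, target smile value, and third threshold are example values (the threshold of 6 is the example given in the embodiments below).

```python
def detect_target_emotion(per_frame, target_expressions=("happy",),
                          target_smile=60, third_threshold=6):
    """per_frame: (expression, smile_value) pairs, one per video frame.

    A frame is a "first detection frame" if its expression is a first
    target expression OR its smile value exceeds the target smile value;
    the target emotion is confirmed once `third_threshold` consecutive
    first detection frames are seen.
    """
    run = 0
    for expression, smile_value in per_frame:
        if expression in target_expressions or smile_value > target_smile:
            run += 1
            if run >= third_threshold:
                return True
        else:
            run = 0
    return False
```

The OR between the expression condition and the smile-value condition reflects the "and/or" combination of expression detection and smile value detection described above; an implementation using only one of the two signals simply drops the other operand.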

其中,目標情緒可以爲根據實際需求設定的任意情緒,比如可以爲表明目標對象在專注學習的開心情緒,或是表明目標對象學習狀態不佳的厭煩情緒等。下述各發明實施例以目標情緒爲開心情緒爲例進行說明,目標情緒爲其他情緒的情況可以參考後續各發明實施例進行相應擴展。The target emotion may be any emotion set according to actual needs, for example, it may be a happy emotion indicating that the target object is focusing on learning, or an annoying emotion indicating that the target object is in a poor learning state. The following embodiments of the invention are described by taking the target emotion as a happy emotion as an example, and the case where the target emotion is other emotions may be expanded accordingly with reference to subsequent invention embodiments.

It can be seen from the above embodiments that, where the learning behavior includes expressing a target emotion, the learning behavior detection of the target object can be realized through expression detection and/or smile value detection. In a possible implementation, the detection of the learning behavior of expressing the target emotion can be realized through expression detection or smile value detection alone; in another possible implementation, expression detection and smile value detection can jointly determine whether the target object expresses the target emotion. The subsequent embodiments are described taking the joint determination, through expression detection and smile value detection, of whether the target object expresses the target emotion as an example; the other implementations can be expanded accordingly with reference to the subsequent embodiments and are not repeated here.

Expression detection may include detecting the expression displayed by the target object, for example which expression the target object shows; the specific division of expressions can be flexibly set according to the actual situation, and in a possible implementation the expressions may be divided into happy, calm, and other. Smile value detection may include detecting the smile intensity of the target object, for example how large the target object's smile is; the result of smile value detection can be fed back as a numerical value, for example set to lie in [0, 100], where a higher value indicates a higher smile intensity or amplitude of the target object. The specific manners of expression detection and smile value detection can be flexibly determined according to the actual situation; any manner that can detect the expression or degree of smile of the target object can serve as the corresponding detection manner and is not limited to the following embodiments. In a possible implementation, the expression detection of the target object can be realized through an expression recognition neural network; in a possible implementation, the smile value detection of the target object can be realized through a smile value detection neural network.
Specifically, the structures and implementations of the expression recognition neural network and the smile value detection neural network are not limited in the embodiments of the present invention; any neural network that can realize the expression recognition function through training, and any neural network that can realize the smile value detection function through training, can be applied to the embodiments of the present invention. In a possible implementation, expression detection and smile value detection can also be realized respectively by detecting the face key points and mouth key points of the target object in the video.

Specifically, under what detection results of expression detection and smile value detection the target object is determined to produce the target emotion can be flexibly set according to the actual situation. In a possible implementation, when it is detected that the target object in a video frame shows at least one first target expression, or that the result of smile value detection exceeds the target smile value, the target object in that video frame may be considered to express the target emotion; in this case, the video frame may be used as a first detection frame. The specific expression type of the first target expression can be flexibly set according to the actual situation and is not limited to the following embodiments. In a possible implementation, happy may be used as the first target expression, i.e., all video frames in which the detected expression of the target object is happy may be used as first detection frames. In a possible implementation, both happy and calm may be used as first target expressions, i.e., video frames in which the detected expression of the target object is happy or calm may all be used as first detection frames. Similarly, the specific value of the target smile value can also be flexibly set according to the actual situation and is not specifically limited here.
Thus, in a possible implementation, the video frames in which the smile value detection result exceeds the target smile value may also be used as first detection frames.

In a possible implementation, when a certain video frame is detected as a first detection frame, the target object may be determined to produce the target emotion. In a possible implementation, to improve detection accuracy and reduce the influence of detection errors on the learning behavior detection result, the target object may be determined to produce the target emotion when the number of consecutive first detection frames detected exceeds a third threshold. Here, a video frame sequence in which every frame of the consecutive video frames is a first detection frame may be regarded as consecutive first detection frames. The value of the third threshold can be flexibly set according to the actual situation; it may be the same as or different from the first threshold or the second threshold. In one example, the third threshold may be 6, i.e., when 6 consecutive frames are all detected as first detection frames, the target object may be considered to produce the target emotion.

Further, after it is determined that the target object produces the target emotion, one frame may be selected from the consecutive first detection frames as the target emotion start frame. Then, after the target emotion start frame, when the expression of the target object is not detected as a first target expression for 10 consecutive frames, or the smile value detection result of the target object does not exceed the target smile value for 10 consecutive frames, or the target object cannot be detected in one or more frames, a target emotion end frame can be further determined. The number of times and/or the time that the target object produces the target emotion can then be determined from the target emotion start frames and target emotion end frames; the specific process can refer to the corresponding process for target gestures and is not repeated here.

By performing expression detection and/or smile value detection on the video frames containing the target object, and determining the first detection frames according to the results of expression detection and smile value detection, the target object is determined to produce the target emotion when the number of consecutive first detection frames detected exceeds the third threshold. Through the above process, the emotion of the target object during the learning process can be flexibly determined based on the target object's expression and degree of smile, so that the emotional state of the target object during learning can be perceived more comprehensively and accurately, generating more accurate learning state information.

在一種可能的實現方式中,學習行爲可以包括:關注教學課程的展示區域;In a possible implementation manner, the learning behavior may include: paying attention to the display area of the teaching course;

在這種情況下,對包含目標對象的視訊幀進行至少一類學習行爲檢測,可以包括:In this case, performing at least one class of learned behavior detection on video frames containing target objects may include:

對包含目標對象的視訊幀進行表情檢測和人臉角度檢測;Perform expression detection and face angle detection on video frames containing target objects;

在檢測到視訊幀中目標對象展示至少一種第二目標表情且人臉角度在目標人臉角度範圍以內的情況下,將檢測到的視訊幀作爲第二檢測幀;When it is detected that the target object in the video frame shows at least one second target expression and the face angle is within the range of the target face angle, the detected video frame is used as the second detection frame;

在檢測到連續的第二檢測幀的數量超過第四閾值的情況下,確定目標對象關注教學課程的展示區域。When it is detected that the number of consecutive second detection frames exceeds the fourth threshold, it is determined that the target object pays attention to the display area of the teaching course.
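The attention-detection steps above can be sketched analogously to the emotion check, with the face-angle range expressed as yaw/pitch bounds. This is an illustrative sketch only: the function name, the angle ranges, and the default threshold (the 6 used as an example in the embodiments below) are assumptions, and a dynamic target face angle range would supply per-frame bounds instead of fixed ones.

```python
def detect_attention(per_frame, target_expressions=("happy", "calm"),
                     yaw_range=(-30.0, 30.0), pitch_range=(-20.0, 20.0),
                     fourth_threshold=6):
    """per_frame: (expression, yaw, pitch) triples, one per video frame.

    A frame is a "second detection frame" if its expression is a second
    target expression AND its face angle lies within the target face
    angle range; attention to the display area is confirmed once
    `fourth_threshold` consecutive second detection frames are seen.
    """
    run = 0
    for expression, yaw, pitch in per_frame:
        in_range = (yaw_range[0] <= yaw <= yaw_range[1]
                    and pitch_range[0] <= pitch <= pitch_range[1])
        if expression in target_expressions and in_range:
            run += 1
            if run >= fourth_threshold:
                return True
        else:
            run = 0
    return False
```

Unlike the emotion check, the two conditions here are combined with AND, matching the requirement above that the target object show a second target expression and have a face angle within the target range.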

其中,教學課程的展示區域的實現形式可以參考上述各發明實施例,在此不再贅述。The implementation form of the display area of the teaching course may refer to the above-mentioned embodiments of the invention, which will not be repeated here.

It can be seen from the above embodiments that, where the learning behavior includes paying attention to the display area of the teaching course, the learning behavior detection of the target object can be realized through expression detection and face angle detection. In a possible implementation, the detection of the learning behavior of paying attention to the display area of the teaching course can also be realized through face angle detection alone. The subsequent embodiments are described taking the determination, through expression detection and face angle detection, of whether the target object pays attention to the display area of the teaching course as an example; the other implementations can be expanded accordingly with reference to the subsequent embodiments and are not repeated here.

The implementation of expression detection can refer to the above embodiments and is not repeated here; face angle detection may be the detection of the orientation angle of the face, etc. The specific manner of face angle detection can be flexibly determined according to the actual situation; any manner that can detect the face angle of the target object can serve as the detection manner for face angle detection and is not limited to the following embodiments. In a possible implementation, the face angle detection of the target object can be realized through a face angle detection neural network. The specific structure and implementation of the face angle detection neural network are not limited in the embodiments of the present invention; any neural network that can realize the face angle detection function through training can be applied to the embodiments of the present invention. In a possible implementation, the face angle of the target object can also be determined by detecting the face key points of the target object in the video. The form of the face angle that face angle detection can output can likewise be flexibly determined according to the actual situation; in a possible implementation, the face angle of the target object can be determined by detecting the yaw angle and pitch angle of the target object's face.

Specifically, under what detection results of expression detection and face angle detection the target object is determined to pay attention to the display area of the teaching course can be flexibly set according to the actual situation. In a possible implementation, when it is detected that the target object in a video frame shows at least one second target expression and the detected face angle is within the target face angle range, the target object in that video frame may be considered to pay attention to the display area of the teaching course; in this case, the video frame may be used as a second detection frame. The specific expression type of the second target expression can be flexibly set according to the actual situation; it may be the same as or different from the first target expression mentioned in the above embodiments, and is not limited to the following embodiments.
In a possible implementation, calm may be used as the second target expression, i.e., video frames in which the detected expression of the target object is calm and the face angle is within the target face angle range may be used as second detection frames. In a possible implementation, any expression other than "other" may be used as a second target expression, i.e., video frames in which the detected face angle of the target object is within the target face angle range and the expression is not "other" may all be used as second detection frames. Similarly, the specific range values of the target face angle range can also be flexibly set according to the actual situation and are not specifically limited here. In a possible implementation, the target face angle range may be static; in one example, the overall region to which the teacher may move while teaching (such as the podium area where the teacher is located in an offline scene) may be used to define the target face angle range, and in another example, the fixed region the target object is likely to look at while watching the teaching course (such as the display screen the target object watches in an online scene) may be used as the target face angle range. In a possible implementation, the target face angle range may also be dynamic; in one example, the target face angle range may be flexibly determined according to the teacher's current position while moving during teaching, i.e., the values of the target face angle range may be changed dynamically as the teacher moves.

In a possible implementation manner, it may be determined that the target object is paying attention to the display area of the teaching course as soon as a certain video frame is detected to be a second detection frame. In a possible implementation manner, in order to improve the detection accuracy and reduce the influence of detection errors on the learning behavior detection result, it may instead be determined that the target object is paying attention to the display area of the teaching course only when the number of consecutive second detection frames exceeds a fourth threshold. Here, a sequence of video frames in which every frame is a second detection frame may be regarded as consecutive second detection frames. The fourth threshold can be set flexibly according to the actual situation; its value may be the same as or different from the first, second or third threshold. In one example, the fourth threshold may be 6, that is, when 6 consecutive frames are all detected to be second detection frames, the target object may be considered to be paying attention to the display area of the teaching course.
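Counting consecutive second detection frames against the fourth threshold can be sketched as follows (a minimal sketch; the threshold value 6 mirrors the example above, and the per-frame boolean flags are assumed to come from the expression and face angle detection):

```python
def attention_detected(frame_flags, fourth_threshold=6):
    """frame_flags: per-frame booleans, True if the frame is a second
    detection frame. Returns True once a run of consecutive second
    detection frames reaches the fourth threshold (6 consecutive
    frames in the example from the text)."""
    run = 0
    for flag in frame_flags:
        run = run + 1 if flag else 0  # a non-matching frame resets the run
        if run >= fourth_threshold:
            return True
    return False
```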

Further, after it is determined that the target object is paying attention to the display area of the teaching course, one frame may be selected from the consecutive second detection frames as an attention start frame. After the attention start frame, when the second target expression is not detected for 10 consecutive frames, or the face angle of the target object is outside the target face angle range for 10 consecutive frames, or the target object cannot be detected in one or more frames, an attention end frame may then be determined. The number of times and/or the duration for which the target object pays attention to the display area of the teaching course may then be determined from the attention start frames and attention end frames. For the specific process, reference may be made to the corresponding processes for target gestures and target emotions, which will not be repeated here.
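The start/end segmentation described above can be sketched as follows. This is a minimal sketch: the 10-frame tolerance follows the text, while the choice of start frame (the first frame of the qualifying run) and the (start, end) event representation are assumptions.

```python
def attention_events(frame_flags, start_threshold=6, tolerance=10):
    """Segment per-frame second-detection flags into attention events.
    An event starts once `start_threshold` consecutive flagged frames
    are seen; it ends after `tolerance` consecutive unflagged frames.
    Returns (start_index, end_index) pairs, where end_index is the
    last flagged frame of the event."""
    events, run, miss, start, last_hit, active = [], 0, 0, None, None, False
    for i, flag in enumerate(frame_flags):
        if flag:
            run += 1
            miss = 0
            last_hit = i
            if not active and run >= start_threshold:
                active = True
                start = i - run + 1  # first frame of the qualifying run
        else:
            run = 0
            if active:
                miss += 1
                if miss >= tolerance:  # attention end detected
                    events.append((start, last_hit))
                    active, miss = False, 0
    if active:  # close an event still open at the end of the video
        events.append((start, last_hit))
    return events
```

The number of attention events is then `len(events)`, and the per-event duration follows from the frame indices and the video frame rate.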

By performing expression detection and face angle detection on the video frames containing the target object, determining second detection frames according to the results of the expression detection and the face angle detection, and determining that the target object is paying attention to the display area of the teaching course when the number of consecutive second detection frames exceeds the fourth threshold, it can be determined flexibly, based on the expression and face angle of the target object, whether the target object is paying attention to the display area of the teaching course. The concentration of the target object during the learning process can thus be perceived more comprehensively and accurately, and more accurate learning status information can be generated.

In a possible implementation manner, the learning behavior may further include producing at least one interactive behavior with other objects. For the implementation of the interactive behavior, reference may be made to the above embodiments of the invention, and details are not repeated here. In this case, the manner of performing interactive behavior detection on the video frames containing the target object can be determined flexibly according to the actual situation. In a possible implementation manner, if the interactive behavior is an online interactive behavior, such as receiving a little red flower sent by the teacher through the online classroom, or speaking after being called on by the teacher in the online classroom, the interactive behavior may be detected directly from signals transmitted by the other objects, to determine whether the target object produces the interactive behavior. In a possible implementation manner, if the interactive behavior is an offline interactive behavior, for example the target object speaking in the classroom after being called on by the teacher, detecting whether the target object produces the interactive behavior may include: recognizing a target action of the target object to determine whether the interactive behavior occurs. The target action can be set flexibly according to the actual situation of the interactive behavior; for example, the target action may include standing up and then speaking, or facing another object and speaking for longer than a certain amount of time.

In a possible implementation manner, the learning behavior may further include being absent from at least some of the video frames in the video. In this case, step S12 may include:

performing target object detection on the video to obtain the video frames containing the target object, and taking the video frames other than the video frames containing the target object as the video frames in which the target object is not detected;

when the number of video frames in which the target object is not detected exceeds a preset number of video frames, detecting the learning behavior of being absent from at least some of the video frames in the video.

For the manner of performing target object detection on the video, see the above embodiments of the invention; details are not repeated here. In a possible implementation manner, besides the video frames containing the target object, the video may also contain video frames that do not contain the target object. These video frames may therefore be taken as the video frames in which the target object is not detected, and when their number exceeds the preset number of video frames, it is confirmed that the learning behavior of "being absent from at least some of the video frames in the video" is detected. The preset number of video frames can be set flexibly according to the actual situation. In a possible implementation manner, the preset number of video frames may be set to 0, that is, whenever the video contains any video frame in which the target object is not detected, this learning behavior is considered detected. In a possible implementation manner, the preset number of video frames may also be a number greater than 0; the specific setting can be decided flexibly according to the actual situation.
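A minimal sketch of the absence check (the per-frame detection flags are assumed to come from the target object detection described above):

```python
def absence_detected(detected_flags, preset_count=0):
    """detected_flags: per-frame booleans, True if the target object
    was detected in the frame. The learning behavior 'absent from at
    least some of the video frames' is reported when the number of
    frames without the target object exceeds the preset number of
    video frames (0 by default, as in the example above)."""
    missing = sum(1 for flag in detected_flags if not flag)
    return missing > preset_count
```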

In a possible implementation manner, the learning behavior may further include closing the eyes, in which case the learning behavior detection may be closed-eye detection. The specific process of closed-eye detection can be set flexibly according to the actual situation. In one example, it may be implemented by a neural network with a closed-eye detection function. In another example, whether the target object has closed eyes may be determined by detecting key points of the eye and of the eyeball: for example, when key points within the eyeball are detected, the eyes of the target object are determined to be open; when only eye key points are detected and no key points within the eyeball are detected, the eyes of the target object are determined to be closed. In a possible implementation manner, the learning behavior may further include making eye contact within the display area of the teaching course. In this case, the learning behavior detection may refer to the process of paying attention to the display area of the teaching course in the above embodiments of the invention, and the specific detection manner can vary flexibly. For example, closed-eye detection and face angle detection may be performed on the target object at the same time, video frames in which the face angle is within the target face angle range and the eyes are not closed may be taken as third detection frames, and when the number of third detection frames exceeds a certain set threshold, the target object is determined to be making eye contact within the display area of the teaching course.
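The keypoint-based closed-eye rule can be sketched as follows (the keypoint detector itself is assumed; only the open/closed decision from the text is shown):

```python
def eyes_closed(eye_keypoints, eyeball_keypoints):
    """Keypoint-based closed-eye rule from the text: if key points
    within the eyeball are detected, the eyes are judged open; if only
    eye (contour) key points are detected, the eyes are judged closed.
    Key points are (x, y) tuples produced by an assumed detector."""
    if eyeball_keypoints:       # eyeball/iris key points visible -> open
        return False
    return bool(eye_keypoints)  # eye contour only, no eyeball -> closed
```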

After the detection of at least one type of learning behavior of the target object is implemented through any combination of the implementation manners of the above embodiments of the invention, the learning status information may be generated through step S13 when the target object is detected to perform at least one type of learning behavior. The specific implementation of step S13 is not limited, can vary flexibly according to the actual situation of the detected learning behavior, and is not limited to the following embodiments of the invention.

It can be seen from the content of step S13 in the above embodiments of the invention that, in generating the learning status information, step S13 may proceed in several ways: the learning status information may be generated from the video frames containing at least one type of learning behavior; or from the duration for which the target object performs at least one type of learning behavior; or the two may be combined, with part of the learning status information generated from the video frames containing at least one type of learning behavior and another part generated from the duration for which the target object performs at least one type of learning behavior. When both manners are available, which type of learning behavior maps to which kind of learning status information can be set flexibly according to the actual situation.
In a possible implementation manner, positive learning behaviors may be mapped to generating learning status information from the video frames containing those behaviors: for example, when the target object performs at least one target gesture, shows a positive target emotion, pays attention to the display area of the teaching course, or produces at least one interactive behavior with other objects, the learning status information may be generated from the video frames containing these learning behaviors. In a possible implementation manner, negative learning behaviors, such as the target object being absent from at least some of the video frames in the video, closing the eyes, or making no eye contact within the display area of the teaching course, may be mapped to generating learning status information from the duration of those behaviors.

In a possible implementation manner, generating the learning status information according to the video frames at least part of which contain at least one type of learning behavior may include:

Step S1311, acquiring the video frames in the video that contain at least one type of learning behavior, as a target video frame set;

Step S1312, performing face quality detection on at least one video frame in the target video frame set, and taking a video frame whose face quality is greater than a face quality threshold as a target video frame;

Step S1313, generating the learning status information according to the target video frame.

Here, a video frame containing at least one type of learning behavior may be a video frame in which, during the learning behavior detection, the target object is detected to perform at least one type of behavior, such as the first, second and third detection frames mentioned in the above embodiments of the invention, or the video frames containing a target gesture between a gesture start frame and a gesture end frame.

After the video frames containing at least one type of learning behavior are determined, how the target video frame set is obtained can be decided flexibly. In a possible implementation manner, every video frame containing each type of learning behavior may be acquired, by type, to form a target video frame set for each type of learning behavior. In a possible implementation manner, only some of the frames containing each type of learning behavior may be acquired, by type, and the target video frame set of that type of learning behavior may be obtained from these partial frames; which frames are selected can be decided flexibly.

After the target video frame set corresponding to the learning behavior is obtained, the target video frame may be selected from the target video frame set through step S1312. It can be seen from step S1312 that, in a possible implementation manner, face quality detection may be performed on the video frames in the target video frame set, and the video frames whose face quality is greater than the face quality threshold may then be taken as target video frames.

The manner of detecting face quality can be set flexibly according to the actual situation and is not limited to the following embodiments of the invention. In a possible implementation manner, the face quality may be determined from the completeness of the face in the video frame, obtained by performing face recognition on the face in the video frame. In a possible implementation manner, the face quality may also be determined based on the clarity of the face in the video frame. In a possible implementation manner, the face quality in the video frame may also be judged comprehensively from multiple parameters such as the completeness, clarity and brightness of the face. In a possible implementation manner, the face quality in the video frame may also be obtained by inputting the video frame into a face quality neural network, which can be trained on a large number of face pictures annotated with face quality scores; its specific implementation form can be selected flexibly according to the actual situation and is not limited in the embodiments of the present invention.

The specific value of the face quality threshold can be decided flexibly according to the actual situation, which is not limited in the embodiments of the present invention. In a possible implementation manner, a different face quality threshold may be set for each type of learning behavior; in a possible implementation manner, the same face quality threshold may also be set for every type of learning behavior. In a possible implementation manner, the face quality threshold may also be set to the maximum face quality in the target video frame set; in this case, the video frame with the highest face quality under each type of learning behavior may be taken directly as the target video frame.
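The selection in step S1312 can be sketched as follows (a minimal sketch; the quality scores are assumed to come from one of the face quality detection manners described above):

```python
def select_target_frames(scored_frames, quality_threshold=None):
    """scored_frames: (frame_id, face_quality) pairs forming the target
    video frame set of one type of learning behavior. With an explicit
    threshold, frames strictly above it are kept ('greater than the
    face quality threshold'); with no threshold, the threshold is taken
    as the maximum quality in the set, so only the best-quality
    frame(s) are kept."""
    if not scored_frames:
        return []
    if quality_threshold is None:
        best = max(quality for _, quality in scored_frames)
        return [frame for frame, quality in scored_frames if quality >= best]
    return [frame for frame, quality in scored_frames
            if quality > quality_threshold]
```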

In some possible implementation manners, some video frames may contain multiple types of learning behaviors at the same time. In this case, the manner of processing such video frames can vary flexibly according to the actual situation. In a possible implementation manner, these video frames may be attributed to each type of learning behavior separately, and the target video frame may then be selected from the video frame set corresponding to each type of learning behavior according to step S1312. In a possible implementation manner, a video frame containing multiple types of learning behaviors at the same time may also be selected directly as a target video frame.

After the target video frame is determined through any of the above embodiments, the learning status information may be generated from the target video frame through step S1313. The implementation of step S1313 can be selected flexibly according to the actual situation; see the following embodiments of the invention for details, which are not expanded here.

In the embodiments of the present invention, the video frames containing at least one type of learning behavior are acquired as a target video frame set, video frames with higher face quality are then selected from the target video frame set of each type of learning behavior as target video frames, and the learning status information is generated from the target video frames. Through the above process, the generated learning status information is based on video frames that have higher face quality and contain learning behaviors, and therefore has higher accuracy, so that the learning status of the target object can be grasped more precisely.

As described in the above embodiments of the invention, the implementation of step S1313 can vary flexibly. In a possible implementation manner, step S1313 may include:

taking at least one of the target video frames as the learning status information; and/or,

identifying the region where the target object is located in at least one target video frame, and generating the learning status information based on the region where the target object is located.

It can be seen from the above embodiments of the invention that, in a possible implementation manner, at least one of the target video frames may be taken directly as the learning status information. In one example, the obtained target video frames may be further selected, either randomly or under certain conditions, and the selected target video frames taken directly as the learning status information; in another example, every obtained target video frame may also be taken directly as the learning status information.

In a possible implementation manner, the region where the target object is located in the target video frame may be further identified, so that the learning status information is generated according to the region where the target object is located. The manner of identifying the region of the target object is not limited in the embodiments of the present invention; in a possible implementation manner, it may be implemented by the neural network with the target object detection function mentioned in the above embodiments of the invention. After the region of the target object in the target video frame is determined, the target video frame may be processed accordingly to obtain the learning status information. The manner of processing can be decided flexibly. In one example, the image of the region where the target object is located in the target video frame may be taken as the learning status information. In another example, the background region of the target video frame outside the region where the target object is located may be rendered, for example by adding stickers, adding a mosaic to the background region, or replacing the image of the background region, to obtain learning status information that does not show the current background of the target object. This provides better privacy protection for the target object, and rendering manners such as stickers can also increase the diversity and visual appeal of the learning status information.
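The background-rendering variant can be sketched as a simple mosaic applied outside the object's bounding box. This is an illustrative sketch: the bounding box is assumed to come from the target object detection, and a coarse per-tile average stands in for any real sticker or replacement effect.

```python
import numpy as np

def mask_background(frame, bbox, block=8):
    """Mosaic everything outside the target object's bounding box while
    keeping the object region intact. frame: H x W x C uint8 array;
    bbox: (x0, y0, x1, y1) from an assumed object detector."""
    x0, y0, x1, y1 = bbox
    out = frame.copy()
    h, w = frame.shape[:2]
    for y in range(0, h, block):
        for x in range(0, w, block):
            tile = out[y:y + block, x:x + block]
            tile[:] = tile.mean(axis=(0, 1), keepdims=True)  # flatten tile
    out[y0:y1, x0:x1] = frame[y0:y1, x0:x1]  # restore the object region
    return out
```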

By taking at least one of the target video frames as the learning status information, and/or generating the learning status information according to the region where the target object is located in the target video frame, the finally obtained learning status information can be made more flexible, so that, according to the needs of the target object, learning status information that highlights the target object more prominently, or that better protects the privacy of the target object, can be obtained.

The above embodiments of the invention can be combined arbitrarily to obtain learning status information generated on the basis of video frames containing learning behaviors. For example, Table 1 shows learning status information generation rules according to an embodiment of the present invention.

| Learning behavior | Detection rule | Data processing | Image processing | Image upload |
| --- | --- | --- | --- | --- |
| Hand-raising gesture | A raised hand is detected in N of M consecutive frames of the same face track (frame sequence) | The timestamp of the first frame in which the raised hand is detected is taken as the hand-raising start time. If no raised hand is detected for Z consecutive frames, or the face track is lost, the timestamp of the last frame in which the raised hand was detected is taken as the hand-raising end time. The start time and end time form one hand-raising event. | From the start to the end of the hand-raising, among all frames in which the raised hand is detected, take the original image or cutout with the highest face quality score that is above the highlight-moment quality score threshold | Encode as MJPEG and upload to the backend. Upload Y images every X minutes. |
| Like gesture | A thumbs-up is detected in N of M consecutive frames of the same face track (frame sequence) | Same as above | From the start to the end of the like gesture, among all frames in which it is detected, take the original image or cutout with the highest face quality score that is above the highlight-moment quality score threshold | Encode as MJPEG and upload to the backend. Upload Y images every X minutes. |
| OK gesture | The OK gesture is detected in N of M consecutive frames of the same face track (frame sequence) | Same as above | From the start to the end of the OK gesture, among all frames in which it is detected, take the original image or cutout with the highest face quality score that is above the highlight-moment quality score threshold | Encode as MJPEG and upload to the backend. Upload Y images every X minutes. |
| Victory gesture | The "Yeah" gesture is detected in N of M consecutive frames of the same face track (frame sequence) | Same as above | From the start to the end of the "Yeah" gesture, among all frames in which it is detected, take the original image or cutout with the highest face quality score that is above the highlight-moment quality score threshold | Encode as MJPEG and upload to the backend. Upload Y images every X minutes. |
| Interactive behavior: teacher sends a little red flower | The SDK (software development kit) invokes rendering of the little-red-flower sticker | The rendering start time is taken as the start time and the rendering end time as the end time | None | None |
| Interactive behavior: teacher calls on the student to speak | The SDK invokes rendering of the speaking sticker | Same as above | None | None |
| Showing a happy emotion | In N of M consecutive frames of the same face track (frame sequence), a happy expression or a smile value above a certain threshold is detected | The timestamp of the first such frame is taken as the start time. If no happy expression or above-threshold smile value is detected for Z consecutive frames, or the face track is lost, the timestamp of the last such frame is taken as the end time. The start time and end time form one happy event. | From start to end, among all frames in which a happy expression or an above-threshold smile value is detected, take the original image or cutout with the highest face quality score that is above the highlight-moment quality score threshold and upload it | Encode as MJPEG and upload to the backend. Upload Y images every X minutes. |
| Paying attention to the display area of the teaching course (concentration) | In N of M consecutive frames of the same face track (frame sequence), a calm expression is detected and the head pose (face angle) is within the viewing threshold range | The timestamp of the first such frame is taken as the start time. If no calm expression with the head pose within the viewing threshold range is detected for Z consecutive frames, or the face track is lost, the timestamp of the last such frame is taken as the end time. The start time and end time form one concentration event. | From start to end, among all frames in which a calm expression is detected and the head pose is within the viewing threshold range, take the original image or cutout with the highest face quality score that is above the highlight-moment quality score threshold and upload it | Encode as MJPEG and upload to the backend. Upload Y images every X minutes. |

Table 1. Learning status information generation rules

Here, M, N, X, Y and Z are all positive integers whose specific values can be set according to actual needs. Parameters such as M appearing in different rows of Table 1 may be the same or different; these parameters are only illustrative and do not limit the content of the present invention.

Here, a highlight moment is a moment at which the target object produces a positive learning behavior. It can be seen from Table 1 that, in one example, when the target object is detected to perform a target gesture such as raising a hand, to produce the target emotion of happiness, to concentrate on the display area of the teaching course, or to interact with the teacher, for example by speaking after being called on, in the case of these learning behaviors, certain data processing may be performed on the video, followed by further image processing on the video frames, so as to obtain a target video frame as the learning status information.
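The "N of M consecutive frames" detection rule used throughout Table 1 can be sketched as follows (M and N are illustrative placeholders, as in the table):

```python
def rule_triggered(flags, m=10, n=6):
    """Table 1 style detection rule: within some window of M
    consecutive frames of the same face track, the behavior is
    detected in at least N frames. flags: per-frame booleans from
    an assumed per-frame detector (gesture, smile value, etc.)."""
    if len(flags) < m:
        return False
    return any(sum(flags[i:i + m]) >= n
               for i in range(len(flags) - m + 1))
```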

在一種可能的實現方式中,根據目標對象執行至少一類學習行爲的持續時間,生成學習狀態訊息,可以包括:In a possible implementation manner, the learning status information is generated according to the duration of at least one type of learning behavior performed by the target object, which may include:

步驟S1321,在檢測到目標對象執行至少一類學習行爲的時間不小於時間閾值的情況下,記錄至少一類學習行爲的持續時間;Step S1321, when it is detected that the time when the target object performs at least one type of learning behavior is not less than a time threshold, record the duration of at least one type of learning behavior;

步驟S1322,將至少一類學習行爲對應的持續時間,作爲學習狀態訊息。In step S1322, the duration corresponding to at least one type of learning behavior is used as the learning status message.

其中,時間閾值可以是根據實際情況靈活設定的某一數值,不同類學習行爲的時間閾值可以相同,也可以不同。在檢測到目標對象在一定時間內執行某一類學習行爲的情況下,可以統計目標對象執行這些學習行爲的時間,從而作爲學習狀態訊息反饋到老師或家長處。具體的統計條件以及在哪些學習行爲下統計時間,其實現方式均可以根據實際情況靈活設定。The time threshold may be a value flexibly set according to the actual situation, and the time thresholds of different types of learning behaviors may be the same or different. When it is detected that the target object performs a certain type of learning behavior within a certain period of time, the time for which the target object performs these learning behaviors can be counted and fed back to the teacher or parent as learning status information. The specific statistical conditions, and the learning behaviors for which time is counted, can both be flexibly set according to the actual situation.
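Steps S1321–S1322 (keep only behaviors whose duration reaches the time threshold and report the duration per behavior type) can be sketched as follows. The span representation and function name are assumptions for illustration:

```python
def summarize_durations(behavior_spans, time_threshold):
    """S1321: keep behaviors whose duration is not less than the threshold.
    S1322: report the total duration per behavior type as status info.

    behavior_spans: list of (behavior_type, start_seconds, end_seconds).
    Returns a dict mapping behavior type -> total recorded seconds.
    """
    status = {}
    for behavior, start, end in behavior_spans:
        duration = end - start
        if duration >= time_threshold:          # S1321: threshold check
            status[behavior] = status.get(behavior, 0.0) + duration
    return status                               # S1322: durations as status info
```

For example, with spans of 30 s and 60 s of absence plus a 5 s eyes-closed span and a 10 s threshold, only the 90 s of absence is reported.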

在一種可能的實現方式中,在檢測到目標對象的未出現在視訊中(比如視訊中無人、視訊幀中有人但無法確定是否爲目標對象或是鏡頭中有人但並非目標對象)的時間超過一定時長、目標對象閉眼或是目標對象未觀看教學課程的展示區域的情況下,可以統計這些學習行爲的時長並將其作爲學習狀態訊息。In a possible implementation, when it is detected that the target object has been absent from the video (for example, no one is in the video, someone is in the video frame but cannot be confirmed to be the target object, or someone is in the shot but is not the target object) for more than a certain duration, that the target object has closed his or her eyes, or that the target object is not watching the display area of the teaching course, the duration of these learning behaviors can be counted and used as the learning status information.

在本發明實施例中,通過在檢測到目標對象執行至少一類學習行爲的時間不小於時間閾值的情況下,記錄至少一類學習行爲的持續時間並作爲學習狀態訊息,通過上述過程,可以將學習狀態訊息進行量化,更爲直觀且精確地掌握目標對象的學習狀態。In the embodiment of the present invention, when it is detected that the time for which the target object performs at least one type of learning behavior is not less than the time threshold, the duration of the at least one type of learning behavior is recorded as the learning status information. Through the above process, the learning status information can be quantified, so that the learning state of the target object can be grasped more intuitively and accurately.

在一種可能的實現方式中,本發明實施例中提出的視訊處理方法,還可以包括:In a possible implementation manner, the video processing method proposed in the embodiment of the present invention may further include:

對視訊中的至少部分視訊幀中的背景區域進行渲染,其中,背景區域爲視訊幀中目標對象以外的區域。Rendering a background area in at least a part of the video frame in the video, wherein the background area is an area other than the target object in the video frame.

其中,背景區域的分割方式,以及對背景區域的渲染方式,可以參考上述發明實施例中,對目標視訊幀中目標對象所在區域進行識別以及識別後的渲染過程,在此不再贅述。對背景區域進行渲染的過程中,在一個示例中,可以通過當前的視訊處理裝置中預設的通用模板進行渲染;在一個示例中,也可以通過調用非視訊處理裝置的數據庫中的其他模板或定製模板等進行渲染,比如可以從非視訊處理裝置的雲端伺服器中,調用其他的背景模板等,對視訊中的背景區域進行渲染等。For the way the background area is segmented and rendered, reference may be made to the process, in the above embodiments of the invention, of identifying the area where the target object is located in the target video frame and rendering it after identification, which will not be repeated here. In the process of rendering the background area, in one example, rendering can be performed using a general template preset in the current video processing device; in another example, rendering can also be performed by calling other templates or customized templates from a database outside the video processing device, for example by calling other background templates from a cloud server outside the video processing device to render the background area in the video.
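Once a per-pixel foreground mask for the target object is available (for example, from a person-segmentation model, which is outside this sketch), compositing the template over the background region is straightforward. The pure-Python sketch below is illustrative; the nested-list image representation and function name are assumptions:

```python
def render_background(frame, fg_mask, template):
    """Replace the background region (pixels outside the target object)
    with the corresponding pixels of a template image.

    frame, template: H x W lists of pixel values of the same size;
    fg_mask: H x W booleans, True where the target object is.
    """
    return [
        [frame[y][x] if fg_mask[y][x] else template[y][x]
         for x in range(len(frame[0]))]
        for y in range(len(frame))
    ]
```

In a real pipeline the same masked copy would typically be done on image arrays in one vectorized operation, but the per-pixel rule is identical: foreground pixels come from the frame, everything else from the template.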

通過對視訊中的至少部分視訊幀中的背景區域進行渲染,一方面可以保護視訊中目標對象的隱私,減小目標對象由於沒有合適的視訊採集位置導致隱私洩漏的可能性,另一方面,也可以增强目標對象觀看教學課程過程的趣味性。By rendering the background area in at least some of the video frames of the video, on the one hand, the privacy of the target object in the video can be protected, reducing the possibility of privacy leakage caused by the target object not having a suitable video capture location; on the other hand, it can also make the process of watching the teaching course more interesting for the target object.

在一種可能的實現方式中,本發明實施例中提出的視訊處理方法,還可以包括:In a possible implementation manner, the video processing method proposed in the embodiment of the present invention may further include:

統計至少一個目標對象的學習狀態訊息,得到至少一個目標對象的統計結果;Count the learning status information of at least one target object, and obtain the statistical result of at least one target object;

根據至少一個目標對象的統計結果,生成學習狀態統計數據。Based on the statistical results of at least one target object, learning state statistics are generated.

在本發明實施例中,一個視訊中包含的目標對象可以爲一個,也可以爲多個,另外,本發明實施例中的視訊處理方法,可以用於對單個視訊進行處理,也可以用於對多個視訊進行處理。因此,相應的,可以得到一個目標對象的學習狀態訊息,也可以得到多個目標對象的學習狀態訊息。在這種情況下,可以對至少一個目標對象的學習狀態訊息進行統計,來得到至少一個目標對象的統計結果。其中,統計結果可以包含有目標對象的學習狀態訊息以外,還可以包含有其他與目標對象觀看教學課程所相關的訊息。比如,在一種可能的實現方式中,在步驟S12以前,即對目標對象進行學習行爲檢測以前,還可以獲取目標對象的簽到數據。目標對象的簽到數據可以包含有目標對象的身份訊息以及簽到時間等,具體簽到數據的獲取方式可以根據目標對象的實際簽到方式所靈活決定,在本發明實施例中不做限定。In the embodiment of the present invention, a video may contain one target object or multiple target objects. In addition, the video processing method in the embodiment of the present invention can be used to process a single video or multiple videos. Correspondingly, the learning status information of one target object or of multiple target objects can be obtained. In this case, the learning status information of at least one target object can be counted to obtain a statistical result for the at least one target object. The statistical result may include, in addition to the learning status information of the target object, other information related to the target object watching the teaching course. For example, in a possible implementation, before step S12, that is, before learning behavior detection is performed on the target object, check-in data of the target object may also be obtained. The check-in data of the target object may include the identity information of the target object, the check-in time, and the like. The specific way of obtaining the check-in data can be flexibly determined according to the actual check-in method of the target object, which is not limited in this embodiment of the present invention.

在得到了至少一個目標對象的統計結果以後,可以根據至少一個統計結果生成學習狀態統計數據。具體地,學習狀態統計數據的生成方式與內容,可以根據統計結果的實現形式所靈活變化。詳見下述各發明實施例,在此先不做展開。After the statistical result of the at least one target object is obtained, the statistical data of the learning state may be generated according to the at least one statistical result. Specifically, the generation method and content of the learning state statistical data can be flexibly changed according to the realization form of the statistical results. Please refer to the following embodiments of the invention for details, which will not be expanded here.

在本發明實施例中,通過統計至少一個目標對象的學習狀態訊息,得到至少一個目標對象的統計結果,從而根據至少一個目標對象的統計結果來生成學習狀態統計數據,通過上述過程,可以有效地對多個目標對象的學習狀態進行綜合評估,從而更加便於教師掌握整個課堂的整體學習情況,也便於其他相關人員更加全面的瞭解目標對象當前所處的學習位置等。In the embodiment of the present invention, the statistical result of at least one target object is obtained by counting the learning status information of the at least one target object, and learning status statistical data is then generated according to the statistical result. Through the above process, the learning states of multiple target objects can be comprehensively and effectively evaluated, making it easier for teachers to grasp the overall learning situation of the whole class, and for other relevant personnel to understand more fully where the target object currently stands in his or her learning.

在一種可能的實現方式中,根據至少一個所述目標對象的統計結果,生成學習狀態統計數據,包括:In a possible implementation manner, according to the statistical result of at least one of the target objects, the statistical data of the learning state is generated, including:

根據至少一個目標對象所屬的類別,獲取至少一個類別包含的目標對象的統計結果,生成至少一個類別的學習狀態統計數據,其中,目標對象所屬的類別包括目標對象參與的課程、目標對象註冊的機構以及目標對象使用的設備中的至少一種;和/或,According to the category to which at least one target object belongs, obtaining the statistical results of the target objects included in at least one category, and generating learning status statistical data of the at least one category, wherein the category to which a target object belongs includes at least one of the course the target object participates in, the institution with which the target object is registered, and the device used by the target object; and/or,

將至少一個目標對象的統計結果進行可視化處理,生成至少一個目標對象的學習狀態統計數據。The statistical result of the at least one target object is visualized to generate the statistical data of the learning state of the at least one target object.

其中,目標對象所屬的類別可以是根據目標對象的身份所劃分的類別,舉例來說,目標對象所屬的類別可以包括目標對象參與的課程、目標對象註冊的機構以及目標對象使用的設備中的至少一種,其中,目標對象參與的課程可以是上述發明實施例中提到的目標對象觀看的教學課程,目標對象註冊的機構可以是目標對象所在的教育機構、或是目標對象所在的年級或是目標對象所在的班級等,目標對象使用的設備可以是線上場景中,目標對象參加在線課程所使用的終端設備等。The category to which a target object belongs may be a category classified according to the identity of the target object. For example, it may include at least one of the course the target object participates in, the institution with which the target object is registered, and the device used by the target object. The course the target object participates in may be the teaching course watched by the target object mentioned in the above embodiments of the invention; the institution with which the target object is registered may be the educational institution, the grade, or the class to which the target object belongs; and the device used by the target object may be, in an online scenario, the terminal device used by the target object to attend the online course.

在本發明實施例中,可以根據目標對象所屬的類別,來獲取至少一個類別包含的目標對象的統計結果,即可以將目標對象所屬類別下的至少一個統計結果進行匯總,來得到該類別下的整體學習狀態統計數據。舉例來說,可以按照使用設備、課程、教育機構等類別進行劃分,分別得到同一設備下不同目標對象的統計結果、同一課程下不同目標對象的統計結果以及同一教育機構中不同目標對象的統計結果等。在一個示例中,還可以將這些統計結果以報表的形式進行展現。在一個示例中,報表中每個類別下的統計結果,既可以包含有每個目標對象的總體學習狀態訊息,還可以包含有每個目標對象的具體學習狀態訊息,比如關注教學課程展示區域的時間長度、微笑的時間長度等,除此以外,還可以包含有其他與觀看教學課程相關的訊息,比如目標對象的簽到時間、簽到次數、目標對象和預設數據庫中的人臉匹配的情況、簽到設備以及簽到課程等。In this embodiment of the present invention, the statistical results of the target objects included in at least one category may be obtained according to the category to which each target object belongs; that is, at least one statistical result under the category may be aggregated to obtain the overall learning status statistical data for that category. For example, the results can be divided by device, course, educational institution and other categories, respectively obtaining the statistical results of different target objects on the same device, of different target objects in the same course, and of different target objects in the same educational institution. In one example, these statistical results can also be presented in the form of a report. In one example, the statistical results under each category in the report may include not only the overall learning status information of each target object but also the specific learning status information of each target object, such as the length of time spent focusing on the display area of the teaching course and the length of time spent smiling; in addition, they may include other information related to watching the teaching course, such as the target object's check-in time, number of check-ins, whether the target object matches a face in the preset database, the check-in device, and the check-in course.
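Aggregating per-student statistical results by category (device, course, or institution) into per-category report rows can be sketched as follows. The record fields (`course`, `focus_seconds`, `smile_count`, `check_ins`) are assumed names for the indicators mentioned in the text, not a defined schema:

```python
from collections import defaultdict

def build_category_report(records, category_key):
    """Group per-student statistics by one category dimension (e.g. 'course',
    'institution' or 'device') and sum a few indicators per category.

    records: list of dicts, each describing one target object's results.
    Returns {category value: aggregated indicators}.
    """
    report = defaultdict(lambda: {"students": 0, "focus_seconds": 0,
                                  "smile_count": 0, "check_ins": 0})
    for r in records:
        row = report[r[category_key]]
        row["students"] += 1
        row["focus_seconds"] += r["focus_seconds"]
        row["smile_count"] += r["smile_count"]
        row["check_ins"] += r["check_ins"]
    return dict(report)
```

The same records can be regrouped along a different dimension simply by passing a different `category_key`, which matches the per-device / per-course / per-institution views described above.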

除此之外,還可以對至少一個目標對象的統計結果進行可視化處理,來得到至少一個目標對象的學習狀態統計數據。其中,可視化處理的方式可以根據實際情況靈活決定,比如可以將數據整理成圖表或視訊等形式。學習狀態統計數據中包含的內容可以根據實際情況靈活決定,比如可以包含有目標對象的總體學習狀態訊息、目標對象觀看的教學課程名稱以及目標對象的具體學習狀態訊息等,具體包含哪些數據可以根據實際情況靈活設定。在一個示例中,可以將目標對象的身份、目標對象觀看的教學課程名稱、目標對象的關注教學課程展示區域的時長、目標對象的關注程度强弱、目標對象與其他目標對象之間的數據比較結果、目標對象的互動次數以及目標對象的情緒等內容,整理成可視化的報告,並發送給目標對象或目標對象的其他相關人員,比如目標對象的家長等。In addition, the statistical result of the at least one target object can be visualized to obtain the learning status statistical data of the at least one target object. The visualization method can be flexibly determined according to the actual situation; for example, the data can be organized into charts or videos. The content included in the learning status statistical data can likewise be flexibly determined, and may include, for example, the overall learning status information of the target object, the name of the teaching course watched by the target object, and the specific learning status information of the target object. In one example, the identity of the target object, the name of the teaching course watched, the length of time the target object focused on the display area of the teaching course, the strength of the target object's concentration, comparison results between the target object and other target objects, the number of interactions of the target object, and the emotions of the target object can be organized into a visual report and sent to the target object or to other relevant personnel, such as the target object's parents.

在一個示例中,可視化處理後的學習狀態統計數據除了圖片與視訊以外,包含的文字內容的形式可以爲“上課科目爲XX,A學生專注時長30分鐘,專注力爲集中,高於班上10%的同學,互動次數3次,微笑5次,特此提出表揚,願繼續努力”或是“上課科目爲XX,B學生注意力較不集中,舉手等手勢互動頻次較低,建議家長密切關注,及時調整孩子的學習習慣”等。In one example, besides pictures and videos, the text content contained in the visualized learning status statistical data may take a form such as: "The class subject is XX. Student A stayed focused for 30 minutes with a high level of concentration, higher than 10% of the classmates, interacted 3 times and smiled 5 times; praise is hereby given, and we hope the student keeps up the good work," or "The class subject is XX. Student B's attention was rather unfocused, and the frequency of gesture interactions such as raising a hand was low; parents are advised to pay close attention and adjust the child's study habits in time."
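A text summary of this kind can be generated mechanically from the aggregated indicators. The sketch below is purely illustrative: the function name, parameters, wording and the praise/advice cutoffs are all assumptions for the example, not the patented report format:

```python
def format_student_summary(name, subject, focus_minutes, percentile,
                           interactions, smiles):
    """Render a one-line textual summary from a student's indicators,
    in the spirit of the example messages above (illustrative wording)."""
    if focus_minutes >= 25 and interactions >= 3:
        tone = "praise is hereby given; keep up the good work"
    else:
        tone = ("parents are advised to pay close attention and help adjust "
                "study habits in time")
    return (f"Subject: {subject}. Student {name} stayed focused for "
            f"{focus_minutes} minutes (better than {percentile}% of the class), "
            f"interacted {interactions} times and smiled {smiles} times; {tone}.")
```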

在本發明實施例中,通過獲取至少一個目標對象所屬的類別,從而生成至少一個類別的學習狀態統計數據,和/或,將至少一個目標對象的統計結果進行可視化處理,生成至少一個目標對象的學習狀態統計數據。通過上述過程,可以通過不同的數據統計方式,更爲直觀與全面地掌握目標對象的學習狀態。In this embodiment of the present invention, by obtaining the category to which at least one target object belongs, learning status statistical data of at least one category is generated, and/or the statistical result of at least one target object is visualized to generate learning status statistical data of the at least one target object. Through the above process, the learning state of the target object can be grasped more intuitively and comprehensively through different data statistics methods.

圖2示出根據本發明實施例的視訊處理裝置的方塊圖。如圖所示,所述視訊處理裝置20可以包括:FIG. 2 shows a block diagram of a video processing apparatus according to an embodiment of the present invention. As shown in the figure, the video processing apparatus 20 may include:

視訊獲取模組21,用於獲取視訊,其中,視訊中的至少部分視訊幀包含目標對象;a video acquisition module 21, used for acquiring video, wherein at least some of the video frames in the video contain a target object;

檢測模組22,用於根據視訊,對目標對象在觀看教學課程過程中的至少一類學習行爲進行檢測;The detection module 22 is used to detect at least one type of learning behavior of the target object in the process of watching the teaching course according to the video;

生成模組23,用於在檢測到目標對象執行至少一類學習行爲的情況下,根據至少部分包含至少一類學習行爲的視訊幀和/或目標對象執行至少一類學習行爲的持續時間,生成學習狀態訊息。The generation module 23 is used to generate learning status information according to at least part of the video frames that contain at least one type of learning behavior and/or the duration of the target object to perform at least one type of learning behavior when it is detected that the target object performs at least one type of learning behavior .

在一種可能的實現方式中,學習行爲包括以下行爲中的至少一類:執行至少一種目標手勢、表現目標情緒、關注教學課程的展示區域、與其他對象産生至少一種互動行爲、在視訊中的至少部分視訊幀中未出現、閉眼以及在教學課程的展示區域內的目光交流。In a possible implementation, the learning behavior includes at least one of the following behaviors: performing at least one target gesture, expressing a target emotion, focusing on the display area of the teaching course, engaging in at least one type of interaction with other objects, being absent from at least some of the video frames of the video, closing the eyes, and making eye contact within the display area of the teaching course.

在一種可能的實現方式中,檢測模組22用於:對視訊進行目標對象檢測,得到包含目標對象的視訊幀;對包含目標對象的視訊幀進行至少一類學習行爲檢測。In a possible implementation manner, the detection module 22 is configured to: perform target object detection on the video to obtain a video frame containing the target object; and perform at least one type of learning behavior detection on the video frame containing the target object.

在一種可能的實現方式中,學習行爲包括執行至少一種目標手勢;檢測模組22進一步用於:對包含目標對象的視訊幀進行至少一種目標手勢的檢測;在檢測到包含至少一種目標手勢的連續視訊幀的數量超過第一閾值的情況下,將包含目標手勢的視訊幀中的至少一幀記錄爲手勢開始幀;在手勢開始幀以後的視訊幀中,不包含目標手勢的連續視訊幀的數量超過第二閾值的情況下,將不包含目標手勢的視訊幀中的至少一幀記錄爲手勢結束幀;根據手勢開始幀與手勢結束幀的數量,確定視訊中目標對象執行至少一種目標手勢的次數和/或時間。In a possible implementation, the learning behavior includes performing at least one target gesture; the detection module 22 is further configured to: detect at least one target gesture in the video frames containing the target object; when the number of consecutive video frames containing at least one target gesture exceeds a first threshold, record at least one of the video frames containing the target gesture as a gesture start frame; when, among the video frames after the gesture start frame, the number of consecutive video frames not containing the target gesture exceeds a second threshold, record at least one of the video frames not containing the target gesture as a gesture end frame; and determine, according to the numbers of gesture start frames and gesture end frames, the number of times and/or the time for which the target object in the video performs the at least one target gesture.
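The gesture start/end rule above can be sketched over a per-frame boolean sequence produced by a gesture detector. The function name, the choice of the first positive frame of the run as the start frame, and the last positive frame as the end frame are illustrative assumptions:

```python
def count_gesture_events(has_gesture, first_threshold, second_threshold):
    """Extract gesture events from per-frame detection results.

    A gesture starts once more than `first_threshold` consecutive frames
    contain it, and ends once more than `second_threshold` consecutive
    frames no longer contain it. Returns (start_index, end_index) pairs;
    the event count and duration follow directly from these pairs.
    """
    events = []
    run_pos = run_neg = 0
    start = None
    for i, g in enumerate(has_gesture):
        if g:
            run_pos += 1
            run_neg = 0
            if start is None and run_pos > first_threshold:
                start = i - run_pos + 1            # first frame of the positive run
        else:
            run_neg += 1
            run_pos = 0
            if start is not None and run_neg > second_threshold:
                events.append((start, i - run_neg))  # last frame with the gesture
                start = None
    if start is not None:                            # video ended mid-gesture
        events.append((start, len(has_gesture) - 1 - run_neg))
    return events
```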

在一種可能的實現方式中,學習行爲包括表現目標情緒;檢測模組22進一步用於:對包含目標對象的視訊幀進行表情檢測和/或微笑值檢測;在檢測到視訊幀中目標對象展示至少一種第一目標表情或微笑值檢測的結果超過目標微笑值情況下,將檢測到的視訊幀作爲第一檢測幀;在檢測到連續的第一檢測幀的數量超過第三閾值的情況下,確定目標對象産生目標情緒。In a possible implementation, the learning behavior includes expressing a target emotion; the detection module 22 is further configured to: perform expression detection and/or smile-value detection on the video frames containing the target object; when it is detected that the target object in a video frame shows at least one first target expression, or the result of the smile-value detection exceeds a target smile value, take the detected video frame as a first detection frame; and when the number of consecutive first detection frames exceeds a third threshold, determine that the target object expresses the target emotion.
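The first-detection-frame rule reduces to a consecutive-run check over per-frame detector outputs. In the sketch below, the `(expression_label, smile_value)` tuples stand in for the outputs of the expression and smile-value detectors, whose interfaces are assumptions:

```python
def has_target_emotion(frames, target_smile, third_threshold):
    """Decide whether the target object expresses the target emotion.

    frames: per-frame (expression_label, smile_value) detector results.
    A frame is a 'first detection frame' when the label is 'happy' or the
    smile value exceeds target_smile; the emotion is confirmed once the
    number of consecutive first detection frames exceeds third_threshold.
    """
    run = 0
    for label, smile in frames:
        if label == "happy" or smile > target_smile:
            run += 1
            if run > third_threshold:
                return True
        else:
            run = 0                 # a miss breaks the consecutive run
    return False
```

The concentration check described next follows the same pattern, with the per-frame condition replaced by "calm expression and face angle within the target range" and the third threshold replaced by the fourth.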

在一種可能的實現方式中,學習行爲包括關注教學課程的展示區域;檢測模組22進一步用於:對包含目標對象的視訊幀進行表情檢測和人臉角度檢測;在檢測到視訊幀中目標對象展示至少一種第二目標表情且人臉角度在目標人臉角度範圍以內的情況下,將檢測到的視訊幀作爲第二檢測幀;在檢測到連續的第二檢測幀的數量超過第四閾值的情況下,確定目標對象關注教學課程的展示區域。In a possible implementation, the learning behavior includes focusing on the display area of the teaching course; the detection module 22 is further configured to: perform expression detection and face-angle detection on the video frames containing the target object; when it is detected that the target object in a video frame shows at least one second target expression and the face angle is within a target face-angle range, take the detected video frame as a second detection frame; and when the number of consecutive second detection frames exceeds a fourth threshold, determine that the target object is focusing on the display area of the teaching course.

在一種可能的實現方式中,生成模組23用於:獲取視訊中包含至少一類學習行爲的視訊幀,作爲目標視訊幀集合;對目標視訊幀集合中的至少一個視訊幀進行人臉質量檢測,將人臉質量大於人臉質量閾值的視訊幀作爲目標視訊幀;根據目標視訊幀,生成學習狀態訊息。In a possible implementation manner, the generating module 23 is used to: obtain a video frame containing at least one type of learning behavior in the video, as a target video frame set; perform face quality detection on at least one video frame in the target video frame set, The video frame whose face quality is greater than the face quality threshold is used as the target video frame; the learning status message is generated according to the target video frame.
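Selecting target video frames by face quality, as the generating module 23 does, amounts to filtering the candidate set by the quality threshold and preferring the highest-scoring frames. The frame-id/score pairs and function name are assumptions; the face-quality scoring model itself is outside this sketch:

```python
def pick_target_frames(candidate_frames, quality_threshold):
    """From the frames containing the learning behavior, keep those whose
    face-quality score exceeds the threshold, best score first.

    candidate_frames: list of (frame_id, face_quality_score) pairs.
    Returns the qualifying frame ids in descending quality order.
    """
    kept = [(fid, q) for fid, q in candidate_frames if q > quality_threshold]
    return [fid for fid, q in sorted(kept, key=lambda t: -t[1])]
```

Taking the first element of the result corresponds to the Table 1 rule of uploading the single frame with the highest face-quality score above the threshold.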

在一種可能的實現方式中,生成模組23進一步用於:將目標視訊幀中的至少一幀作爲學習狀態訊息;和/或,識別在至少一幀目標視訊幀中目標對象所在區域,基於目標對象所在區域,生成學習狀態訊息。In a possible implementation, the generating module 23 is further configured to: use at least one of the target video frames as the learning status information; and/or identify the area where the target object is located in at least one target video frame, and generate the learning status information based on the area where the target object is located.

在一種可能的實現方式中,檢測模組22用於:對視訊進行目標對象檢測,得到包含目標對象的視訊幀,並將視訊中包含目標對象的視訊幀以外的視訊幀,作爲未檢測到目標對象的視訊幀;在未檢測到目標對象的視訊幀的數量超過預設視訊幀數量的情況下,檢測到學習行爲包括:在視訊中的至少部分視訊幀中未出現。In a possible implementation, the detection module 22 is configured to: perform target object detection on the video to obtain video frames containing the target object, and treat the video frames of the video other than those containing the target object as video frames in which the target object is not detected; when the number of video frames in which the target object is not detected exceeds a preset number of video frames, the detected learning behavior includes: being absent from at least some of the video frames of the video.

在一種可能的實現方式中,生成模組23用於:在檢測到目標對象執行至少一類學習行爲的時間不小於時間閾值的情況下,記錄至少一類學習行爲的持續時間;將至少一類學習行爲對應的持續時間,作爲學習狀態訊息。In a possible implementation, the generating module 23 is configured to: when it is detected that the time for which the target object performs at least one type of learning behavior is not less than a time threshold, record the duration of the at least one type of learning behavior; and use the duration corresponding to the at least one type of learning behavior as the learning status information.

在一種可能的實現方式中,裝置還用於:對視訊中的至少部分視訊幀中的背景區域進行渲染,其中,背景區域爲視訊幀中目標對象以外的區域。In a possible implementation manner, the apparatus is further configured to: render a background area in at least part of the video frame in the video, where the background area is an area other than the target object in the video frame.

在一種可能的實現方式中,裝置還用於:統計至少一個目標對象的學習狀態訊息,得到至少一個目標對象的統計結果;根據至少一個目標對象的統計結果,生成學習狀態統計數據。In a possible implementation manner, the apparatus is further configured to: count the learning state information of the at least one target object to obtain a statistical result of the at least one target object; and generate learning state statistical data according to the statistical result of the at least one target object.

在一種可能的實現方式中,裝置還用於:根據至少一個目標對象所屬的類別,獲取至少一個類別包含的目標對象的統計結果,生成至少一個類別的學習狀態統計數據,其中,目標對象所屬的類別包括目標對象參與的課程、目標對象註冊的機構以及目標對象使用的設備中的至少一種;和/或,將至少一個目標對象的統計結果進行可視化處理,生成至少一個目標對象的學習狀態統計數據。In a possible implementation, the apparatus is further configured to: obtain, according to the category to which at least one target object belongs, the statistical results of the target objects included in at least one category, and generate learning status statistical data of the at least one category, wherein the category to which a target object belongs includes at least one of the course the target object participates in, the institution with which the target object is registered, and the device used by the target object; and/or visualize the statistical result of the at least one target object to generate learning status statistical data of the at least one target object.

在不違背邏輯的情況下,本申請不同實施例之間可以相互結合,不同實施例描述有所側重,未側重描述的部分可參見其他實施例的記載。In the case of not violating the logic, different embodiments of the present application may be combined with each other, and the descriptions of different embodiments have some emphasis, and for the parts that are not described, reference may be made to the records of other embodiments.

在本發明的一些實施例中,本發明實施例提供的裝置具有的功能或包含的模組可以用於執行上文方法實施例描述的方法,其具體實現和技術效果可參照上文方法實施例的描述,爲了簡潔,這裏不再贅述。In some embodiments of the present invention, the functions or modules included in the devices provided in the embodiments of the present invention may be used to execute the methods described in the above method embodiments, and the specific implementation and technical effects may refer to the above method embodiments The description, for the sake of brevity, will not be repeated here.

應用場景示例Application Scenario Example

學生學習的方式通常是老師授課,學生聽課,課堂缺少互動和趣味性,學生不容易提起興趣聽課,不能通過學生的實時表現對學生形成正向激勵。同時,機構或者老師也無法掌握學生的聽課狀態,家長也無法瞭解孩子在學校的表現,尤其是受疫情影響,學生在線上課的時間非常多,然而,學生是否真正上課以及是否在認真聽課、課堂互動表現如何,都無法量化評估。因此,如何有效地把握學生的學習狀態,成爲目前一個極待解決的問題。The usual way students learn is that the teacher lectures and the students listen. Such classes lack interaction and interest, it is hard for students to become engaged, and students cannot be positively motivated through their real-time performance. At the same time, institutions and teachers cannot grasp how well students are following the class, and parents cannot understand their children's performance at school. Especially under the impact of the epidemic, students spend a great deal of time in online classes; however, whether students actually attend class, whether they listen attentively, and how they perform in classroom interaction cannot be quantitatively evaluated. Therefore, how to effectively grasp students' learning state has become a pressing problem to be solved.

本發明應用示例提出了一套學習系統,該系統可以通過上述發明實施例中提出的視訊處理方法,來有效地掌握學生的學習狀態。The application example of the present invention proposes a learning system, which can effectively grasp the learning status of students through the video processing method proposed in the above-mentioned embodiments of the present invention.

圖3示出根據本發明一應用示例的示意圖。如圖所示,在一個示例中,學習系統可以由用戶端、教育軟體服務化(SaaS,Software-as-a-Service)後臺以及互動課堂後臺等三部分所構成。其中,學生通過用戶端觀看教學課程,用戶端可以包含兩部分,分別是用於學習的硬體設備(比如圖中安裝了Windows系統或是IOS系統以及SDK的客戶端),以及學生登入在線課堂的應用程式(即圖中的用戶APP)。教育SaaS後臺可以是學生所在的教育機構的伺服器所搭建的平臺,互動課堂後臺可以是匯總不同教育機構的數據並進行數據維護的伺服器所搭建的平臺,無論是教育SaaS後臺還是互動課堂後臺,均可以通過API介面,與用戶端之間進行數據交互。從而實現上述各發明實施例中所提到的學習狀態訊息生成以及學習狀態統計數據的生成。FIG. 3 shows a schematic diagram of an application example according to the present invention. As shown in the figure, in one example, the learning system can be composed of three parts: a client, an education Software-as-a-Service (SaaS) backend, and an interactive classroom backend. Students watch the teaching course through the client, which can comprise two parts: the hardware device used for learning (for example, a client with a Windows or IOS system and the SDK installed, as in the figure) and the application through which students log in to the online classroom (the user APP in the figure). The education SaaS backend can be a platform built on the servers of the educational institution where the students are enrolled, and the interactive classroom backend can be a platform built on servers that aggregate and maintain data from different educational institutions. Both the education SaaS backend and the interactive classroom backend can exchange data with the client through the API interface, thereby realizing the generation of the learning status information and of the learning status statistical data mentioned in the above embodiments of the invention.

在本發明應用示例中,學習狀態訊息的生成過程可以包括:In an application example of the present invention, the generation process of the learning status message may include:

用戶端通過採集學生觀看教學課程過程的視訊,並對採集的視訊進行處理,從而獲取每個學生的學習狀態訊息,教育SaaS後臺以及互動課堂後臺通過API介面,調用不同用戶端中生成的學習狀態訊息,並對這些學習狀態訊息通過上述發明實施例中提到的任意方式進行統計處理,生成學習狀態統計數據。The client obtains the learning status information of each student by collecting the video of the students watching the teaching course and processing the collected video. The education SaaS background and the interactive classroom background call the learning status generated in different clients through the API interface. information, and perform statistical processing on the learning status information in any manner mentioned in the above embodiments of the invention to generate learning status statistical data.

在一個示例中,用戶端對採集的視訊進行處理,獲取每個學生的學習狀態訊息的過程可以包括:In an example, the user terminal processes the collected video, and the process of acquiring the learning status information of each student may include:

A.獲取學生上課的精彩時刻(即上述發明實施例中提到的積極的學習行爲)。A. Obtain the wonderful moments of the students in class (that is, the active learning behavior mentioned in the above-mentioned embodiments of the invention).

在一個示例中,可以通過定義一定的規則,製作學生的精彩視訊集錦,可以將學生的表現剪輯成一小段視訊或者是一些精彩圖片並提供給家長,這樣家長可以及時評估學生的上課表現,如果效果好,可能會鼓勵學生繼續參加相關課程。In one example, by defining certain rules, a highlight video collection of a student can be produced; the student's performance can be edited into a short video clip or a set of highlight pictures and provided to parents, so that parents can evaluate the student's classroom performance in time. If the results are good, the student may be encouraged to continue taking related courses.

在一個示例中,獲取學生的精彩時刻可以在學生簽到成功後進行,後去的精彩時刻的視訊或圖片會上傳後臺或雲端,同時,還可以選擇學生是否實時可見上傳的精彩時刻的內容。在一個示例中,精彩時刻定義規則可以包括:産生至少一種目標手勢,目標手勢可以包括舉手、點讚、手勢OK以及手勢Yeah等,在一段時間範圍內如果檢測到學生執行以上的手勢,則可以對包含有手勢的視訊進行圖片或視訊幀抽取。表現開心的目標情緒,在一段時間範圍內如果檢測到學生的表情是高興,且微笑值達到某一目標微笑值(比如99分),則可以有高興標籤的視訊幀或是達到目標微笑值的視訊幀進行圖片或視訊幀抽取。關注教學課程的展示區域,在一段時間範圍內如果學生人臉朝向一直較正,即headpose在某個閾值範圍內,則可以對這段時間範圍內的視訊進行圖片或視訊幀抽取。In one example, capturing a student's wonderful moments can be performed after the student has successfully checked in; the captured videos or pictures of the wonderful moments are uploaded to the backend or the cloud, and it can also be selected whether the uploaded wonderful-moment content is visible to the student in real time. In one example, the rules defining a wonderful moment may include: performing at least one target gesture, where the target gestures may include raising a hand, a thumbs-up, an OK gesture, a "Yeah" gesture and the like; if the student is detected performing one of these gestures within a period of time, pictures or video frames can be extracted from the video containing the gesture. Expressing the target emotion of happiness: if, within a period of time, the student's expression is detected to be happy and the smile value reaches a certain target smile value (for example, 99 points), pictures or video frames can be extracted from the video frames labeled happy or whose smile value reaches the target smile value. Focusing on the display area of the teaching course: if, within a period of time, the student's face remains facing the screen, that is, the head pose stays within a certain threshold range, pictures or video frames can be extracted from the video within that period.

B.對學生的學習情況進行學情檢測(針對上述發明實施例中提到的消極的學習行爲)。B. Perform learning situation detection on the students' learning (addressing the negative learning behaviors mentioned in the above embodiments of the invention).

在一個示例中,可以將學生可能不在畫面中,或者有不專注的情況,通過學情檢測,將數據實時推送給家長 ,便於家長第一時間關注孩子,及時糾正孩子的不良學習習慣,起到輔助監督作用。In one example, students may not be on the screen or are not focused, and the data can be pushed to parents in real time through learning situation detection, so that parents can pay attention to their children at the first time and correct their children's bad study habits in a timely manner. Auxiliary supervision.

在一個示例中,對學生進行學情檢測的過程可以在學生簽到成功後進行,如鏡頭前多長時間範圍內無人出現、未觀看螢幕、閉眼等,則判斷該人專注度較低,在這種情況下,可以統計學生出現上述學習行爲的時長,並將其作爲學情檢測的結果,得到相應的學習狀態數據。具體的學情檢測配置規則可以參考上述各發明實施例,在此不再贅述。In an example, the process of student learning situation detection can be carried out after the student has successfully signed in. If no one appears in front of the camera for a long time, does not watch the screen, closes his eyes, etc., it is judged that the person's concentration is low. In this case, the duration of the student's above-mentioned learning behavior can be counted, and it can be used as the result of the learning situation detection to obtain the corresponding learning status data. For specific configuration rules of learning situation detection, reference may be made to the foregoing embodiments of the invention, and details are not described herein again.

Through the examples above, learning-state information covering both highlight moments and learning-state detection can be obtained. Further, the education SaaS backend and the interactive-classroom backend can, through an API, retrieve the learning-state information generated on different clients to produce learning-state statistics. This process can include:

C. Report generation (i.e., generating learning-state statistics of at least one category, as in the embodiments of the invention above).

In one example, the backend or cloud API can present students' check-in information and learning-state information along different dimensions such as device, course, and institution. The main data indicators can include: check-in time, number of check-ins, face-database matches (i.e., cases in the embodiments above where the target object matches a face in the preset database), check-in device, check-in course, focus duration, smile duration, and so on.
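One hedged sketch of grouping those indicators by a chosen dimension (device, course, or institution): the record fields below are assumptions about what the backend might store, not the patent's schema.

```python
from collections import defaultdict

def summarize(records, dimension):
    """Group records by one dimension and total the main indicators."""
    summary = defaultdict(lambda: {"check_ins": 0, "focus_s": 0, "smile_s": 0})
    for r in records:
        key = r[dimension]
        summary[key]["check_ins"] += 1
        summary[key]["focus_s"] += r["focus_seconds"]
        summary[key]["smile_s"] += r["smile_seconds"]
    return dict(summary)

# Illustrative records; fields are assumed.
records = [
    {"device": "pad-1", "course": "math", "focus_seconds": 1200, "smile_seconds": 90},
    {"device": "pad-1", "course": "art",  "focus_seconds": 800,  "smile_seconds": 40},
    {"device": "pad-2", "course": "math", "focus_seconds": 600,  "smile_seconds": 10},
]
by_course = summarize(records, "course")
print(by_course["math"])  # {'check_ins': 2, 'focus_s': 1800, 'smile_s': 100}
```

Passing `"device"` or an institution field instead of `"course"` gives the other reporting dimensions mentioned above.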

D. Analysis report (i.e., the visualization processing in the embodiments of the invention above, generating learning-state statistics for at least one target object).

In one example, the education SaaS backend or the interactive-classroom backend can consolidate a student's performance in online classes into a complete learning-analysis report. The report describes the student's in-class behavior through a visual graphical interface. Further, the backend can also select notable cases to push to parents or teachers, so that institution teachers can analyze a student's situation and gradually help the child improve their learning behavior.

In addition to the above, while a student is learning through the client, the learning system can also apply background segmentation to the student's learning video. In one example, the client can provide a background-segmentation function for students who lack a setting suitable for live streaming or who, for privacy reasons, prefer not to show their background. In one example, the client SDK can support several different background templates; for instance, several general-purpose templates can be preset, and in one example students can also call customized templates from the interactive-classroom backend through the client. In one example, the SDK can provide a background-template preview interface to the client app, so that students can preview the customized templates available to them. During class, students can also apply background-segmentation "stickers" in the client app to render the live-stream background; in one example, if a student is not satisfied with a sticker, they can manually turn it off. The client app can report sticker-usage data to the corresponding backend (the education SaaS backend or the interactive-classroom backend), and the backend can analyze which background stickers students used and how often, as additional learning-state information.
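As a toy illustration of the background-replacement step: a real SDK would compute the per-pixel foreground mask with a segmentation model, but here the mask, frame, and template are supplied directly as nested lists of pixel values (all illustrative assumptions).

```python
def replace_background(frame, mask, template):
    """Keep foreground pixels where mask is 1, template pixels elsewhere."""
    return [
        [fg if m else bg for fg, m, bg in zip(frow, mrow, brow)]
        for frow, mrow, brow in zip(frame, mask, template)
    ]

frame    = [[1, 2], [3, 4]]   # toy 2x2 "image" of pixel values
mask     = [[1, 0], [0, 1]]   # 1 = student (foreground), 0 = background
template = [[9, 9], [9, 9]]   # chosen background template
print(replace_background(frame, mask, template))  # [[1, 9], [9, 4]]
```

Swapping `template` corresponds to selecting a different preset or customized background template.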

Besides online classes, the learning system proposed in this application example of the present invention can also be extended to other related fields, such as online conferencing.

It can be understood that the method embodiments of the present invention mentioned above can, without violating their principles and logic, be combined with one another to form combined embodiments; due to space limitations, these are not elaborated here.

Those skilled in the art can understand that, in the methods of the specific implementations above, the order in which the steps are written does not imply a strict execution order or impose any limitation on the implementation process; the specific execution order of the steps should be determined by their functions and possible internal logic.

An embodiment of the present invention further provides a computer-readable storage medium on which computer program instructions are stored; when the computer program instructions are executed by a processor, the above method is implemented. The computer-readable storage medium may be a volatile or a non-volatile computer-readable storage medium.

An embodiment of the present invention further provides an electronic device, including: a processor; and a memory for storing processor-executable instructions; wherein the processor is configured to perform the above method.

An embodiment of the present invention further provides a computer program including computer-readable code; when the computer-readable code runs in an electronic device, a processor in the electronic device executes it to implement the above method.

In practical applications, the above memory may be volatile memory, such as RAM; or non-volatile memory, such as ROM, flash memory, a hard disk drive (HDD), or a solid-state drive (SSD); or a combination of these types of memory, and it provides instructions and data to the processor.

The above processor may be at least one of an ASIC, DSP, DSPD, PLD, FPGA, CPU, controller, microcontroller, or microprocessor. It can be understood that, for different devices, other electronic components may implement the above processor functions; this is not specifically limited in the embodiments of the present invention.

The electronic device may be provided as a terminal, a server, or a device of another form.

Based on the same technical concept as the foregoing embodiments, an embodiment of the present invention further provides a computer program that, when executed by a processor, implements the above method.

FIG. 4 is a block diagram of an electronic device 800 according to an embodiment of the present invention. For example, the electronic device 800 may be a mobile phone, a computer, a digital broadcast terminal, a messaging device, a game console, a tablet device, a medical device, a fitness device, a personal digital assistant, or another terminal.

Referring to FIG. 4, the electronic device 800 may include one or more of the following components: a processing component 802, a memory 804, a power component 806, a multimedia component 808, an audio component 810, an input/output (I/O) interface 812, a sensor component 814, and a communication component 816.

The processing component 802 generally controls the overall operation of the electronic device 800, such as operations associated with display, phone calls, data communication, camera operation, and recording. The processing component 802 may include one or more processors 820 to execute instructions to complete all or some of the steps of the above method. In addition, the processing component 802 may include one or more modules to facilitate interaction between the processing component 802 and other components. For example, the processing component 802 may include a multimedia module to facilitate interaction between the multimedia component 808 and the processing component 802.

The memory 804 is configured to store various types of data to support the operation of the electronic device 800. Examples of such data include instructions for any application or method operating on the electronic device 800, contact data, phonebook data, messages, pictures, videos, and so on. The memory 804 may be implemented by any type of volatile or non-volatile storage device, or a combination thereof, such as static random-access memory (SRAM), electrically erasable programmable read-only memory (EEPROM), erasable programmable read-only memory (EPROM), programmable read-only memory (PROM), read-only memory (ROM), magnetic memory, flash memory, a magnetic disk, or an optical disc.

The power component 806 provides power to the various components of the electronic device 800. The power component 806 may include a power-management system, one or more power supplies, and other components associated with generating, managing, and distributing power for the electronic device 800.

The multimedia component 808 includes a screen that provides an output interface between the electronic device 800 and the user. In some embodiments, the screen may include a liquid-crystal display (LCD) and a touch panel (TP). If the screen includes a touch panel, the screen may be implemented as a touch screen to receive input signals from the user. The touch panel includes one or more touch sensors to sense touches, swipes, and gestures on the touch panel. The touch sensors may sense not only the boundary of a touch or swipe action, but also the duration and pressure associated with the touch or swipe operation. In some embodiments, the multimedia component 808 includes a front camera and/or a rear camera. When the electronic device 800 is in an operating mode, such as a shooting mode or a video mode, the front camera and/or the rear camera may receive external multimedia data. Each front camera and rear camera may be a fixed optical lens system or have focal-length and optical-zoom capability.

The audio component 810 is configured to output and/or input audio signals. For example, the audio component 810 includes a microphone (MIC) configured to receive external audio signals when the electronic device 800 is in an operating mode, such as a call mode, a recording mode, or a voice-recognition mode. The received audio signal may be further stored in the memory 804 or sent via the communication component 816. In some embodiments, the audio component 810 further includes a speaker for outputting audio signals.

The I/O interface 812 provides an interface between the processing component 802 and peripheral interface modules, which may be a keyboard, a mouse, buttons, and the like. These buttons may include, but are not limited to: a home button, volume buttons, a start button, and a lock button.

The sensor component 814 includes one or more sensors for providing status assessments of various aspects of the electronic device 800. For example, the sensor component 814 can detect the open/closed state of the electronic device 800 and the relative positioning of components (for example, the display and keypad of the electronic device 800); the sensor component 814 can also detect a change in position of the electronic device 800 or of one of its components, the presence or absence of user contact with the electronic device 800, the orientation or acceleration/deceleration of the electronic device 800, and changes in the temperature of the electronic device 800. The sensor component 814 may include a proximity sensor configured to detect the presence of nearby objects without any physical contact. The sensor component 814 may also include a light sensor, such as a CMOS or CCD image sensor, for use in imaging applications. In some embodiments, the sensor component 814 may also include an acceleration sensor, a gyroscope sensor, a magnetic sensor, a pressure sensor, or a temperature sensor.

The communication component 816 is configured to facilitate wired or wireless communication between the electronic device 800 and other devices. The electronic device 800 can access a wireless network based on a communication standard, such as WiFi, 2G, 3G, 4G, or 5G, or a combination thereof. In an exemplary embodiment, the communication component 816 receives broadcast signals or broadcast-related information from an external broadcast-management system via a broadcast channel. In an exemplary embodiment, the communication component 816 further includes a near-field communication (NFC) module to facilitate short-range communication. For example, the NFC module may be implemented based on radio-frequency identification (RFID) technology, Infrared Data Association (IrDA) technology, ultra-wideband (UWB) technology, Bluetooth (BT) technology, and other technologies.

In an exemplary embodiment, the electronic device 800 may be implemented by one or more application-specific integrated circuits (ASICs), digital signal processors (DSPs), digital signal processing devices (DSPDs), programmable logic devices (PLDs), field-programmable gate arrays (FPGAs), controllers, microcontrollers, microprocessors, or other electronic components for performing the above method.

In an exemplary embodiment, a non-volatile computer-readable storage medium is also provided, such as the memory 804 including computer program instructions executable by the processor 820 of the electronic device 800 to complete the above method.

FIG. 5 is a block diagram of an electronic device 1900 according to an embodiment of the present invention. For example, the electronic device 1900 may be provided as a server. Referring to FIG. 5, the electronic device 1900 includes a processing component 1922, which further includes one or more processors, and memory resources represented by a memory 1932 for storing instructions executable by the processing component 1922, such as applications. An application stored in the memory 1932 may include one or more modules, each corresponding to a set of instructions. In addition, the processing component 1922 is configured to execute the instructions to perform the above method.

The electronic device 1900 may also include a power component 1926 configured to perform power management of the electronic device 1900, a wired or wireless network interface 1950 configured to connect the electronic device 1900 to a network, and an input/output (I/O) interface 1958. The electronic device 1900 may operate based on an operating system stored in the memory 1932, such as Windows Server™, Mac OS X™, Unix™, Linux™, FreeBSD™, or the like.

In an exemplary embodiment, a non-volatile computer-readable storage medium is also provided, such as the memory 1932 including computer program instructions executable by the processing component 1922 of the electronic device 1900 to complete the above method.

The present invention may be a system, a method, and/or a computer program product. The computer program product may include a computer-readable storage medium bearing computer-readable program instructions for causing a processor to implement various aspects of the present invention.

The computer-readable storage medium may be a tangible device that can hold and store instructions for use by an instruction-execution device. The computer-readable storage medium may be, for example, but is not limited to, an electrical storage device, a magnetic storage device, an optical storage device, an electromagnetic storage device, a semiconductor storage device, or any suitable combination of the foregoing. More specific examples (a non-exhaustive list) of computer-readable storage media include: a portable computer disk, a hard disk, random-access memory (RAM), read-only memory (ROM), erasable programmable read-only memory (EPROM or flash memory), static random-access memory (SRAM), portable compact-disc read-only memory (CD-ROM), digital versatile disc (DVD), a memory card, a floppy disk, a mechanical encoding device such as a punched card or an in-groove raised structure with instructions stored thereon, and any suitable combination of the foregoing. As used herein, a computer-readable storage medium is not to be construed as a transient signal per se, such as a radio wave or other freely propagating electromagnetic wave, an electromagnetic wave propagating through a waveguide or other transmission medium (for example, a light pulse through a fiber-optic cable), or an electrical signal transmitted through a wire.

The computer-readable program instructions described here can be downloaded from a computer-readable storage medium to each computing/processing device, or downloaded to an external computer or external storage device over a network, such as the Internet, a local area network, a wide area network, and/or a wireless network. The network may include copper transmission cables, optical-fiber transmission, wireless transmission, routers, firewalls, switches, gateway computers, and/or edge servers. A network interface card or network interface in each computing/processing device receives the computer-readable program instructions from the network and forwards them for storage in a computer-readable storage medium in the respective computing/processing device.

The computer program instructions for carrying out the operations of the present invention may be assembly instructions, instruction-set-architecture (ISA) instructions, machine instructions, machine-dependent instructions, microcode, firmware instructions, state-setting data, or source code or object code written in any combination of one or more programming languages, including object-oriented programming languages such as Smalltalk and C++, and conventional procedural programming languages such as the "C" language or similar programming languages. The computer-readable program instructions may execute entirely on the user's computer, partly on the user's computer, as a stand-alone software package, partly on the user's computer and partly on a remote computer, or entirely on a remote computer or server. Where a remote computer is involved, the remote computer may be connected to the user's computer through any kind of network, including a local area network (LAN) or a wide area network (WAN), or may be connected to an external computer (for example, through the Internet using an Internet service provider). In some embodiments, an electronic circuit, such as a programmable logic circuit, a field-programmable gate array (FPGA), or a programmable logic array (PLA), is personalized using state information of the computer-readable program instructions, and the electronic circuit can execute the computer-readable program instructions to implement various aspects of the present invention.

Various aspects of the present invention are described here with reference to flowcharts and/or block diagrams of methods, apparatuses (systems), and computer program products according to embodiments of the present invention. It should be understood that each block of the flowcharts and/or block diagrams, and combinations of blocks in the flowcharts and/or block diagrams, can be implemented by computer-readable program instructions.

These computer-readable program instructions may be provided to the processor of a general-purpose computer, a special-purpose computer, or another programmable data-processing apparatus to produce a machine, such that the instructions, when executed by the processor of the computer or other programmable data-processing apparatus, create means for implementing the functions/actions specified in one or more blocks of the flowcharts and/or block diagrams. These computer-readable program instructions may also be stored in a computer-readable storage medium; these instructions cause a computer, a programmable data-processing apparatus, and/or other devices to work in a particular manner, so that the computer-readable medium storing the instructions comprises an article of manufacture that includes instructions implementing various aspects of the functions/actions specified in one or more blocks of the flowcharts and/or block diagrams.

The computer-readable program instructions may also be loaded onto a computer, another programmable data-processing apparatus, or another device, so that a series of operational steps are performed on the computer, other programmable data-processing apparatus, or other device to produce a computer-implemented process, such that the instructions executed on the computer, other programmable data-processing apparatus, or other device implement the functions/actions specified in one or more blocks of the flowcharts and/or block diagrams.

The flowcharts and block diagrams in the figures show the possible architectures, functions, and operations of systems, methods, and computer program products according to multiple embodiments of the present invention. In this regard, each block in a flowchart or block diagram may represent a module, a program segment, or part of an instruction, which contains one or more executable instructions for implementing a specified logical function. In some alternative implementations, the functions noted in the blocks may also occur in an order different from that noted in the figures. For example, two consecutive blocks may in fact be executed substantially in parallel, and they may sometimes be executed in the reverse order, depending on the functions involved. It should also be noted that each block of the block diagrams and/or flowcharts, and combinations of blocks in the block diagrams and/or flowcharts, can be implemented by a dedicated hardware-based system that performs the specified functions or actions, or by a combination of dedicated hardware and computer instructions.

The embodiments of the present invention have been described above. The above description is exemplary, not exhaustive, and is not limited to the disclosed embodiments. Many modifications and variations will be apparent to those of ordinary skill in the art without departing from the scope and spirit of the described embodiments. The terminology used here was chosen to best explain the principles of the embodiments, their practical application, or technical improvements over technologies in the marketplace, or to enable others of ordinary skill in the art to understand the embodiments disclosed here.

20: video processing apparatus; 21: video acquisition module; 22: detection module; 23: generation module; 800: electronic device; 802: processing component; 804: memory; 806: power component; 808: multimedia component; 810: audio component; 812: input/output interface; 814: sensor component; 816: communication component; 820: processor; 1900: electronic device; 1922: processing component; 1926: power component; 1932: memory; 1950: network interface; 1958: input/output interface; S11–S13: steps

The drawings here are incorporated into and constitute part of this specification; they show embodiments consistent with the present invention and, together with the specification, serve to explain the technical solutions of the present invention: FIG. 1 shows a flowchart of a video processing method according to an embodiment of the present invention; FIG. 2 shows a block diagram of a video processing apparatus according to an embodiment of the present invention; FIG. 3 shows a schematic diagram of an application example of the present invention; FIG. 4 shows a block diagram of an electronic device according to an embodiment of the present invention; and FIG. 5 shows a block diagram of an electronic device according to an embodiment of the present invention.

S11–S13: steps

Claims (12)

1. A video processing method, comprising: acquiring a video, wherein at least some video frames of the video contain a target object; detecting, according to the video, at least one category of learning behavior of the target object while the target object watches a teaching course; and, when the target object is detected performing at least one category of learning behavior, generating learning-state information according to video frames at least partially containing the at least one category of learning behavior and/or the duration for which the target object performs the at least one category of learning behavior.

2. The method according to claim 1, wherein the learning behavior includes at least one of the following categories of behavior: performing at least one target gesture; expressing a target emotion; paying attention to a display area of the teaching course; engaging in at least one kind of interactive behavior with other objects; being absent from at least some video frames of the video; closing the eyes; and making eye contact within the display area of the teaching course.

3. The method according to claim 1 or 2, wherein detecting at least one category of learning behavior of the target object according to the video comprises: performing target-object detection on the video to obtain video frames containing the target object; and performing at least one category of learning-behavior detection on the video frames containing the target object.
4. The method according to claim 3, wherein:
in a case where the learning behavior comprises performing at least one target gesture, performing at least one class of learning behavior detection on the video frames containing the target object comprises: detecting at least one target gesture in the video frames containing the target object; in a case where the number of consecutive video frames containing at least one target gesture exceeds a first threshold, recording at least one of the video frames containing the target gesture as a gesture start frame; in a case where, among the video frames after the gesture start frame, the number of consecutive video frames not containing the target gesture exceeds a second threshold, recording at least one of the video frames not containing the target gesture as a gesture end frame; and determining, according to the numbers of gesture start frames and gesture end frames, the number of times and/or the time for which the target object performs at least one target gesture in the video;
and/or,
in a case where the learning behavior comprises expressing a target emotion, performing at least one class of learning behavior detection on the video frames containing the target object comprises: performing expression detection and/or smile-value detection on the video frames containing the target object; in a case where it is detected that the target object in a video frame shows at least one first target expression, or a result of the smile-value detection exceeds a target smile value, taking the detected video frame as a first detection frame; and in a case where the number of consecutive first detection frames exceeds a third threshold, determining that the target object expresses the target emotion;
and/or,
in a case where the learning behavior comprises paying attention to the display area of the teaching course, performing at least one class of learning behavior detection on the video frames containing the target object comprises: performing expression detection and face-angle detection on the video frames containing the target object; in a case where it is detected that the target object in a video frame shows at least one second target expression and the face angle is within a target face-angle range, taking the detected video frame as a second detection frame; and in a case where the number of consecutive second detection frames exceeds a fourth threshold, determining that the target object pays attention to the display area of the teaching course.

5. The method according to claim 1 or 2, wherein generating the learning state message according to video frames at least partially containing the at least one class of learning behavior comprises: acquiring the video frames of the video that contain at least one class of learning behavior as a target video frame set; performing face quality detection on at least one video frame in the target video frame set, and taking video frames whose face quality is greater than a face quality threshold as target video frames; and generating the learning state message according to the target video frames;
and/or,
generating the learning state message according to the duration for which the target object performs the at least one class of learning behavior comprises: in a case where the time for which the target object is detected performing at least one class of learning behavior is not less than a time threshold, recording the duration of the at least one class of learning behavior; and taking the duration corresponding to the at least one class of learning behavior as the learning state message.

6. The method according to claim 5, wherein generating the learning state message according to the target video frames comprises: taking at least one of the target video frames as the learning state message; and/or identifying a region in which the target object is located in at least one of the target video frames, and generating the learning state message based on the region in which the target object is located.
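The gesture start/end-frame scheme of claim 4 amounts to debounced run-length counting over per-frame detections. A minimal sketch, assuming the per-frame gesture detector has already produced a boolean per frame (the thresholds and the flattening of frames to booleans are assumptions for illustration):

```python
def count_gesture_events(gesture_flags, start_threshold, end_threshold):
    """Count gesture occurrences from per-frame booleans.

    A gesture 'starts' once more than `start_threshold` consecutive frames
    contain the gesture (claim 4's first threshold), and 'ends' once more
    than `end_threshold` consecutive frames lack it (the second threshold).
    The number of start/end pairs gives the number of times the gesture
    was performed.
    """
    events = 0
    in_gesture = False
    run = 0  # length of the current consecutive run being counted
    for present in gesture_flags:
        if not in_gesture:
            run = run + 1 if present else 0       # count consecutive positives
            if run > start_threshold:
                in_gesture = True                 # gesture start frame reached
                run = 0
        else:
            run = run + 1 if not present else 0   # count consecutive negatives
            if run > end_threshold:
                in_gesture = False                # gesture end frame reached
                events += 1
                run = 0
    if in_gesture:
        events += 1  # gesture still active when the video ends
    return events
```

The two thresholds make the count robust to single-frame detector flicker: a brief false positive never opens a gesture, and a brief false negative never closes one.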
7. The method according to claim 1 or 2, wherein detecting at least one class of learning behavior of the target object according to the video comprises: performing target object detection on the video to obtain video frames containing the target object, and taking the video frames of the video other than the video frames containing the target object as video frames in which no target object is detected; and in a case where the number of video frames in which no target object is detected exceeds a preset number of video frames, detecting the learning behavior of being absent from at least some video frames of the video.

8. The method according to claim 1 or 2, further comprising: rendering a background region in at least some video frames of the video, wherein the background region is the region of a video frame other than the target object; and/or collecting statistics on the learning state messages of at least one target object to obtain a statistical result for the at least one target object, and generating learning state statistics according to the statistical result of the at least one target object.
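The absence check of claim 7 reduces to counting the frames in which target detection failed and comparing against a preset count. A hedged sketch, again assuming detection results have been flattened to per-frame booleans:

```python
def is_absent(target_present_flags, max_missing_frames):
    """Report the 'absent from at least some video frames' behavior when
    the number of frames without a detected target object exceeds the
    preset frame count (claim 7). Inputs are illustrative: a list of
    per-frame detection booleans and an assumed threshold."""
    missing = sum(1 for present in target_present_flags if not present)
    return missing > max_missing_frames
```

A production system might instead require the missing frames to be consecutive, or weight them by timestamp gaps; the claim only specifies a count against a preset number, which is what this sketch implements.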
9. The method according to claim 8, wherein generating learning state statistics according to the statistical result of at least one target object comprises: acquiring, according to a category to which at least one target object belongs, the statistical results of the target objects included in at least one category, and generating learning state statistics for the at least one category, wherein the category to which a target object belongs includes at least one of a course the target object participates in, an institution with which the target object is registered, and a device used by the target object; and/or visualizing the statistical result of at least one target object to generate learning state statistics for the at least one target object.

10. A video processing apparatus, comprising:
a video acquisition module, configured to acquire a video, wherein at least some video frames of the video contain a target object;
a detection module, configured to detect, according to the video, at least one class of learning behavior of the target object while the target object watches a teaching course; and
a generation module, configured to, in a case where the target object is detected performing at least one class of learning behavior, generate a learning state message according to video frames at least partially containing the at least one class of learning behavior and/or a duration for which the target object performs the at least one class of learning behavior.
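The per-category aggregation of claim 9 is a group-by over the targets' statistical results, keyed by course, institution, or device. A minimal sketch under assumed data shapes (a per-target numeric statistic and a target-to-category mapping; reporting the per-category mean is one illustrative choice, not mandated by the claim):

```python
from collections import defaultdict

def aggregate_by_category(per_target_stats, category_of):
    """Group per-target statistics by category and report the mean per
    category. `per_target_stats` maps target id -> numeric statistic
    (e.g. attention duration); `category_of` maps target id -> category
    (course, institution, or device). Both shapes are assumptions."""
    totals = defaultdict(float)
    counts = defaultdict(int)
    for target_id, value in per_target_stats.items():
        cat = category_of[target_id]
        totals[cat] += value
        counts[cat] += 1
    return {cat: totals[cat] / counts[cat] for cat in totals}
```

The same grouping output would feed the visualization step of claim 9 (e.g. bar charts of per-course attention).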
11. An electronic device, comprising:
a processor; and
a memory for storing processor-executable instructions;
wherein the processor is configured to invoke the instructions stored in the memory to execute the method according to any one of claims 1 to 9.

12. A computer-readable storage medium having computer program instructions stored thereon, wherein the computer program instructions, when executed by a processor, implement the method according to any one of claims 1 to 9.
TW110100570A 2020-05-22 2021-01-07 Video processing method and device, electronic equipment and storage medium TW202145131A (en)

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
CN202010442733.6 2020-05-22
CN202010442733.6A CN111553323A (en) 2020-05-22 2020-05-22 Video processing method and device, electronic equipment and storage medium

Publications (1)

Publication Number Publication Date
TW202145131A true TW202145131A (en) 2021-12-01

Family

ID=72000950

Family Applications (1)

Application Number Title Priority Date Filing Date
TW110100570A TW202145131A (en) 2020-05-22 2021-01-07 Video processing method and device, electronic equipment and storage medium

Country Status (5)

Country Link
JP (1) JP2022537475A (en)
KR (1) KR20210144658A (en)
CN (1) CN111553323A (en)
TW (1) TW202145131A (en)
WO (1) WO2021232775A1 (en)

Families Citing this family (14)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN111553323A (en) * 2020-05-22 2020-08-18 北京市商汤科技开发有限公司 Video processing method and device, electronic equipment and storage medium
CN112270231A (en) * 2020-10-19 2021-01-26 北京大米科技有限公司 Method for determining target video attribute characteristics, storage medium and electronic equipment
CN112287844B (en) * 2020-10-30 2023-04-18 北京市商汤科技开发有限公司 Student situation analysis method and device, electronic device and storage medium
CN112652200A (en) * 2020-11-16 2021-04-13 北京家有课堂科技有限公司 Man-machine interaction system, man-machine interaction method, server, interaction control device and storage medium
TWI759016B (en) * 2020-12-17 2022-03-21 正文科技股份有限公司 Testee learning status detection method and testee learning status detection system
CN112598551B (en) * 2020-12-24 2022-11-29 北京市商汤科技开发有限公司 Behavior guidance scheme generation method and device, computer equipment and storage medium
CN112613780B (en) * 2020-12-29 2022-11-25 北京市商汤科技开发有限公司 Method and device for generating learning report, electronic equipment and storage medium
CN112866808B (en) * 2020-12-31 2022-09-06 北京市商汤科技开发有限公司 Video processing method and device, electronic equipment and storage medium
CN112990723B (en) * 2021-03-24 2021-11-30 食安快线信息技术(深圳)有限公司 Online education platform student learning force analysis feedback method based on user learning behavior deep analysis
CN113052088A (en) * 2021-03-29 2021-06-29 北京大米科技有限公司 Image processing method and device, readable storage medium and electronic equipment
CN114663261B (en) * 2022-05-18 2022-08-23 火焰蓝(浙江)信息科技有限公司 Data processing method suitable for training and examination system
CN114677751B (en) * 2022-05-26 2022-09-09 深圳市中文路教育科技有限公司 Learning state monitoring method, monitoring device and storage medium
CN116128453B (en) * 2023-02-18 2024-05-03 广州市点易资讯科技有限公司 Online course inspection method, system, equipment and medium
CN117636219B (en) * 2023-12-04 2024-06-14 浙江大学 Collaborative state analysis method and system in family sibling interaction process

Family Cites Families (10)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
JP2013097311A (en) * 2011-11-04 2013-05-20 Zenrin Datacom Co Ltd Learning support device, learning support method and learning support program
US20160148515A1 (en) * 2014-11-20 2016-05-26 MyChild, Inc. Web and mobile parent engagement and learning management system
US20190371189A1 (en) * 2016-11-24 2019-12-05 Gaia System Solutions Inc. Engagement measurement system
CN108399376B (en) * 2018-02-07 2020-11-06 华中师范大学 Intelligent analysis method and system for classroom learning interest of students
CN109815795A (en) * 2018-12-14 2019-05-28 深圳壹账通智能科技有限公司 Classroom student's state analysis method and device based on face monitoring
CN110033400A (en) * 2019-03-26 2019-07-19 深圳先进技术研究院 A kind of classroom monitoring analysis system
JP6636670B1 (en) * 2019-07-19 2020-01-29 株式会社フォーサイト Learning system, learning lecture providing method, and program
CN110991381B (en) * 2019-12-12 2023-04-25 山东大学 Real-time classroom student status analysis and indication reminding system and method based on behavior and voice intelligent recognition
CN111553323A (en) * 2020-05-22 2020-08-18 北京市商汤科技开发有限公司 Video processing method and device, electronic equipment and storage medium
CN112287844B (en) * 2020-10-30 2023-04-18 北京市商汤科技开发有限公司 Student situation analysis method and device, electronic device and storage medium

Also Published As

Publication number Publication date
JP2022537475A (en) 2022-08-26
KR20210144658A (en) 2021-11-30
CN111553323A (en) 2020-08-18
WO2021232775A1 (en) 2021-11-25
