WO2021218194A1 - Data processing method and apparatus, electronic device, and storage medium
- Publication number: WO2021218194A1
- Application: PCT/CN2020/137678 (CN2020137678W)
- Authority: WIPO (PCT)
Classifications
- G06V40/70—Multimodal biometrics, e.g. combining information from different biometric modalities
- G06V40/20—Movements or behaviour, e.g. gesture recognition
- G06V40/28—Recognition of hand or arm movements, e.g. recognition of deaf sign language
- G10L25/51—Speech or voice analysis techniques not restricted to a single one of groups G10L15/00-G10L21/00, specially adapted for particular use, for comparison or discrimination
Definitions
- the present disclosure relates to the field of computer vision, and in particular to a data processing method and apparatus, an electronic device, and a storage medium.
- evaluation of a target object's behavioral state is widely applicable in many fields, and the obtained evaluation result can be used to analyze the target object or its behavior.
- the more accurate the evaluation result, the more realistic and meaningful the corresponding analysis.
- the present disclosure proposes a data processing solution.
- a data processing method including:
- acquiring multimedia data of a target object; performing behavioral state detection on the target object in at least one detection dimension according to the multimedia data, and obtaining an intermediate detection result of the target object in the at least one detection dimension;
- processing the intermediate detection result in the at least one detection dimension to obtain a target detection result of the target object, where the target detection result is used to indicate the behavioral state of the target object.
- a data processing device including:
- an obtaining module, configured to obtain multimedia data of a target object; a detection module, configured to perform behavioral state detection on the target object in at least one detection dimension according to the multimedia data, and obtain an intermediate detection result of the target object in the at least one detection dimension;
- a processing module, configured to process the intermediate detection result in the at least one detection dimension to obtain a target detection result of the target object, where the target detection result is used to represent the behavioral state of the target object.
- an electronic device including: a processor; a memory for storing executable instructions of the processor; wherein the processor is configured to execute the above-mentioned data processing method.
- a computer-readable storage medium having computer program instructions stored thereon, and when the computer program instructions are executed by a processor, the foregoing data processing method is implemented.
- a computer program including computer-readable code, where when the computer-readable code runs in an electronic device, a processor in the electronic device executes the above-mentioned data processing method.
- in the embodiments of the present disclosure, behavioral state detection is performed on the target object according to the multimedia data to obtain an intermediate detection result of the target object in at least one detection dimension, and the intermediate detection result in the at least one dimension is then processed to obtain the target detection result used to represent the behavioral state of the target object.
- Fig. 1 shows a flowchart of a data processing method according to an embodiment of the present disclosure.
- Fig. 2 shows a block diagram of a data processing device according to an embodiment of the present disclosure.
- Fig. 3 shows a schematic diagram of a target detection result according to an application example of the present disclosure.
- Fig. 4 shows a block diagram of an electronic device according to an embodiment of the present disclosure.
- Fig. 5 shows a block diagram of an electronic device according to an embodiment of the present disclosure.
- Fig. 1 shows a flowchart of a data processing method according to an embodiment of the present disclosure.
- the method may be applied to a data processing apparatus, which may be a terminal device, a server, or other processing equipment.
- terminal devices can be User Equipment (UE), mobile devices, user terminals, terminals, cellular phones, cordless phones, Personal Digital Assistants (PDAs), handheld devices, computing devices, vehicle-mounted devices, and the like.
- the data processing method can be applied to a cloud server or a local server
- the cloud server can be a public cloud server or a private cloud server, which can be flexibly selected according to actual conditions.
- the data processing method may also be implemented in a manner in which a processor calls computer-readable instructions stored in a memory.
- the data processing method may include:
- Step S11 Acquire multimedia data of the target object.
- Step S12 Perform behavior state detection on the target object in at least one detection dimension according to the multimedia data, and obtain an intermediate detection result of the target object in at least one detection dimension.
- Step S13 Process the intermediate detection result in at least one detection dimension to obtain the target detection result of the target object, where the target detection result is used to indicate the behavior state of the target object.
- the target object can be any object whose behavioral state needs to be represented or evaluated, and its specific form can be flexibly determined according to the application scenario of the target object's behavior.
- the specific behavior performed by the target object is not limited in the embodiments of the present disclosure.
- the behavior may be a teaching behavior, a management behavior, or a work behavior.
- as the behavior differs, the realization form of the target object also changes.
- for example, when the behavior is a teaching behavior, the target object can be a teacher; further, the teaching behavior can be a formal teaching behavior or a simulated teaching behavior.
- accordingly, the target object can be a teacher giving a formal lecture, a teacher giving a simulated lecture, or a teacher candidate in the interview stage who is not yet on the job.
- in the case where the behavior is a management behavior, the target object may be an object with management functions, such as a teaching administrator.
- in the case where the behavior is a work behavior, the target object may be a related work object, such as an educator.
- subsequent disclosed embodiments all take the case where the target object is a teacher and the behavior performed is a simulated teaching behavior (hereinafter referred to as a model lesson behavior) as an example.
- for target objects and behaviors in other realization forms, the corresponding extension can be made with reference to the subsequent disclosed embodiments and is not repeated one by one.
- the multimedia data of the target object may be data acquired while the target object performs the corresponding behavior, and its realization form may be flexibly determined according to the actual situation.
- the multimedia data of the target object may include video data and/or audio data.
- the specific method of obtaining the multimedia data of the target object can be flexibly determined according to the actual situation. For details, please refer to the subsequent disclosed embodiments, which will not be expanded here.
- after the multimedia data of the target object is acquired, step S12 may be performed to detect the behavioral state of the target object in at least one detection dimension and obtain an intermediate detection result in the at least one detection dimension.
- the detection dimensions may cover various aspects of the target object's teaching behavior, such as gestures, emotions, eye contact, fluency, speaking rate, pauses, or volume during the teaching process; which dimensions are specifically included, and how the behavioral state is detected in each of them, can be found in the subsequent disclosed embodiments and is not expanded here.
- step S13 may be used to process the intermediate detection result in at least one detection dimension to obtain the target detection result of the target object.
- the number of target detection results is not limited in the embodiments of the present disclosure, and can be flexibly set according to actual needs.
- the target detection result may include an overall detection result, used to reflect the overall situation of the target object's behavioral state; in a possible implementation manner, the target detection result may also include both an overall detection result and multiple detailed subdivision results, used to reflect the overall situation and the details of the target object's behavioral state at the same time.
- for the target detection result and the method of obtaining it, please refer to the subsequent disclosed embodiments, which will not be expanded here.
- in this way, behavioral state detection is performed on the target object according to the multimedia data to obtain an intermediate detection result in at least one detection dimension, and the intermediate detection result in the at least one dimension is then processed to obtain the target detection result used to represent the behavioral state of the target object.
- in a possible implementation manner, the multimedia data may include only audio data; in a possible implementation manner, the multimedia data may include only video data, such as a silent video; in a possible implementation manner, the multimedia data may include both video data and audio data, such as a video with sound.
- the resolution of the video data is not limited, and can be flexibly selected according to actual conditions, such as 640P, 720P, and 1080P.
- the audio sampling frequency of the audio data is also not limited, and can be flexibly selected, such as 8000 Hz or 16000 Hz.
- the way in which the multimedia data is generated can also be flexibly changed.
- the audio data can be generated by recording the audio of the teacher's model lesson,
- and the video data can be generated by filming the teacher's actions during the model lesson.
- therefore, in one example, the multimedia data can be generated by video-recording the process of the teacher's model lesson.
- in a possible implementation manner, the multimedia data can be obtained while the target object performs a teaching operation according to preset text data, where the preset text data includes at least one instruction mark, and the instruction mark is used to divide and/or label at least part of the preset text data.
- the preset text data may be text content used by the teacher for teaching or model lesson, such as the verbatim draft of the model lesson, which contains relevant content that the teacher needs to tell in the model lesson.
- the instruction mark may be a mark located in the preset text data and used to divide or label part of the content of the preset text data. The position, specific content, and function of the instruction mark can be flexibly selected according to actual conditions, and is not limited to the following disclosed embodiments.
- for example, the instruction mark may be a marker used to divide the model lesson process to which the verbatim draft belongs, that is, it may be a structural annotation of parts of the verbatim draft of the model lesson.
- the specific implementation form of the instruction mark can be flexibly determined according to the process division of the model class.
- in an example, the model lesson process can be divided into pre-class warm-up, knowledge teaching, in-class training, and classroom testing,
- and the content of the model lesson verbatim draft can accordingly be divided into these four stages.
- the implementation form can also be flexibly selected.
- in an example, the corresponding stages can be marked by annotations such as <start instruction start>, <start instruction end>, <end instruction start>, and <end instruction end>, thereby realizing the structural division of the verbatim draft of the model lesson.
- the specific realization form of annotations such as <start instruction start> and <end instruction start> can also be flexibly determined according to the actual situation; for example, some specific words or action descriptions can serve as the concrete realization of annotations such as <start instruction start> or <end instruction start>.
- in an example, the content of the verbatim draft of the model lesson can be as follows: "<pre-class warm-up session start instruction start>: Next is our pre-class warm-up session. <pre-class warm-up session start instruction end> In the middle is a large section of the course content. <pre-class warm-up session end instruction start>: Okay, let's move on to the next section. <pre-class warm-up session end instruction end> This is followed by another large section of the course content."
- the corresponding instruction marks can likewise divide the text content of the knowledge teaching link out of the verbatim draft of the model lesson.
- which specific vocabulary or action descriptions the instruction marks correspond to is not limited in the embodiments of the present disclosure and can be flexibly selected according to actual needs.
- when the teacher conducts the model lesson according to preset text data whose instruction marks divide its structure, the resulting multimedia data carries marks of the different stages (such as specific words or actions). These marks can be automatically recognized by the data processing device, so that the device can automatically divide the multimedia data according to the corresponding structure.
- the divided multimedia data can be used to obtain the target detection results of each stage of the teacher's model lesson, that is, target detection results can be obtained separately for each stage of the model lesson process. This not only improves the degree of automation of the data processing but also improves the pertinence and practicality of the obtained target detection results.
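For illustration only, the following minimal Python sketch shows how a data processing device might split a verbatim draft into stages by matching such instruction marks; the concrete marker strings and the stage name used here are hypothetical stand-ins, since the disclosure leaves the exact mark text open.

```python
import re

# Hypothetical marker syntax; the disclosure does not fix the concrete mark text.
START_MARK = re.compile(r"<(?P<stage>[^<>]+) start instruction end>")
END_MARK_TEMPLATE = "<{stage} end instruction start>"

def split_draft_by_stage(draft: str) -> dict:
    """Return {stage name: stage content} parsed from a marked verbatim draft."""
    stages = {}
    for m in START_MARK.finditer(draft):
        stage = m.group("stage")
        end = draft.find(END_MARK_TEMPLATE.format(stage=stage), m.end())
        if end != -1:
            # Stage content runs from the end of its start mark to its end mark.
            stages[stage] = draft[m.end():end].strip()
    return stages

draft = ("<pre-class warm-up start instruction start>: Next is our warm-up. "
         "<pre-class warm-up start instruction end> Course content here. "
         "<pre-class warm-up end instruction start>: On to the next section. "
         "<pre-class warm-up end instruction end>")
print(split_draft_by_stage(draft))  # {'pre-class warm-up': 'Course content here.'}
```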
- in a possible implementation manner, in the case where the preset text data is a verbatim draft of a model lesson, the instruction mark can also be used to mark the key content of the verbatim draft or positions that require interaction, that is, it can annotate the knowledge points and interactions of the verbatim draft.
- the specific implementation form of the instruction mark can be flexibly determined according to the location of the key content and the required interaction. For example, when there are multiple key paragraphs or interactive positions, the number of instruction marks can be multiple.
- how the key content and interactive positions in the verbatim draft of the model lesson are marked through instruction marks can also be flexibly selected.
- the specific realization form of annotations such as <emphasis start> and <need to add interaction> can also be flexibly determined according to the actual situation;
- for example, some specific words or action descriptions can serve as the concrete realization of annotations such as <emphasis start> or <need to add interaction>.
- in an example, the verbatim content of a model lesson can also read as follows: "This is part of the lecture content. <emphasis start> Students, looking at the pictures, they all show the case of intersecting lines. With the help of the protractor in your hands, can you see whether you make any new discoveries? <need to add interaction> The student in blue, this question is for you."
- when the teacher teaches according to preset text data whose instruction marks annotate knowledge points and interactions, the resulting multimedia data is marked at certain important stages of the model lesson (such as the teaching of important knowledge points or the stages that require interaction).
- these marks can be automatically recognized by the data processing device, so that the device can automatically identify the teaching of important knowledge points or the interactive processes in the multimedia data.
- through the above process, multimedia data generated from preset text data with instruction marks can be conveniently recognized and processed automatically, thereby increasing the degree of automation of the data processing method and improving the pertinence and practicality of the final target detection results.
- both the realization form and the generation method of multimedia data can have multiple realization forms.
- the realization method of obtaining multimedia data can also be flexibly changed.
- in a possible implementation manner, the multimedia data can be pre-recorded multimedia data; in this case, it can be obtained from the storage location of the multimedia data, such as a Uniform Resource Locator (URL) link.
- in a possible implementation manner, the multimedia data may be data still being recorded, such as a live video; in this case, the multimedia data may be obtained from the live-streaming link or address of the multimedia data.
- the multimedia data may include video data and/or audio data. Therefore, as the specific content of the multimedia data is different, the way of obtaining it can also be flexibly changed.
- in the case where the multimedia data contains both video data and audio data that are integrated together, the integrated audio-video data can be obtained directly,
- and the video data and the audio data can then be separated from the integrated data.
- the specific separation method is not limited in the embodiment of the present disclosure, and can be flexibly selected according to actual conditions.
- the multimedia data includes both video data and audio data, and the audio data and the video data are independent of each other, the video data and the audio data can be obtained separately for subsequent detection.
- as mentioned in the above disclosed embodiments, the multimedia data may cover multiple stages of the model lesson process, such as pre-class warm-up, knowledge teaching, in-class training, and classroom testing, and these stages may correspond to specific instruction marks. Therefore, in a possible implementation manner, when acquiring the multimedia data, the multimedia data can also be segmented according to specific words or actions in it, so as to obtain the required part of the multimedia data.
- for example, in one example, the multimedia data of the pre-class warm-up part can be obtained by identifying the pre-class warm-up start instruction and the pre-class warm-up end instruction in the multimedia data,
- and the subsequent target detection result can then be obtained based on the multimedia data of the warm-up stage.
- similarly, in one example, the multimedia data of each part can be obtained by identifying the start and end instructions of the multiple stages in the multimedia data, so that, through steps S12 and S13, the target detection result of each part of the multimedia data is obtained.
- the multimedia data of different stages can also be obtained based on the recording time of each stage of the model lesson process.
- the teacher can record multimedia data through the client.
- the implementation form of the client is not limited in the embodiments of the present disclosure, and it can be a mobile phone, a computer, or other user equipment.
- in an example, the client can display the different stages of the model lesson as tabs on the client interface; the teacher can then enter a stage by clicking its tab and record the multimedia data of the model lesson for that stage.
- in this case, the multimedia data of each stage contains not only video and audio but also the timestamp of the recording,
- so the data processing device can determine the model lesson stage corresponding to the multimedia data through the timestamp it contains, and thereby obtain the multimedia data of each part.
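As a rough sketch of this timestamp-based variant (the schedule values and stage names below are hypothetical examples, not data from the disclosure), the stage of a recorded clip could be looked up as follows:

```python
from datetime import datetime
from typing import Optional

# Hypothetical per-stage recording windows reported by the client; in practice
# the client supplies a timestamp with each recorded clip.
STAGE_WINDOWS = {
    "pre-class warm-up":  (datetime(2020, 12, 1, 9, 0),  datetime(2020, 12, 1, 9, 10)),
    "knowledge teaching": (datetime(2020, 12, 1, 9, 10), datetime(2020, 12, 1, 9, 40)),
    "in-class training":  (datetime(2020, 12, 1, 9, 40), datetime(2020, 12, 1, 9, 55)),
}

def stage_for_timestamp(ts: datetime) -> Optional[str]:
    """Map a clip's recording timestamp to the model-lesson stage it falls in."""
    for stage, (start, end) in STAGE_WINDOWS.items():
        if start <= ts < end:
            return stage
    return None

print(stage_for_timestamp(datetime(2020, 12, 1, 9, 20)))  # knowledge teaching
```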
- for convenience, the subsequent disclosed embodiments all illustrate the data processing process without dividing the multimedia data into stages; after the multimedia data is divided into multiple parts, the manner of obtaining the target detection result of each part can be extended with reference to the subsequent disclosed embodiments and will not be repeated.
- the number of multimedia data obtained for the target object is also not limited, and may be multimedia data corresponding to one target object, or multimedia data corresponding to multiple target objects. That is, the data processing method in the embodiment of the present disclosure can process the multimedia data of only one target object at a time, or can process the multimedia data of multiple target objects in batches at the same time. In the case of multiple multimedia data, in order to distinguish the multimedia data of different target objects, other information may be added to the multimedia data to determine the target object to which the multimedia data belongs.
- in a possible implementation manner, the multimedia data may also contain identity information, such as a teacher ID (teacherID), a course ID (model lesson ID), and a group ID (such as the vendor ID of the teacher's company or school).
- the multimedia data may also contain other related information, such as the multimedia data address (URL link), the structure of the multimedia data (such as the model lesson phase corresponding to the multimedia data mentioned in the above disclosed embodiment, and each Phase start timestamp or end timestamp, etc.) or multimedia detection related information (such as video detection frame rate), etc.
- subsequent disclosed embodiments take the acquired multimedia data as the multimedia data of one target object as an example; the process of simultaneously acquiring the multimedia data of multiple target objects for data processing can be extended with reference to the subsequent disclosed embodiments and will not be repeated.
- step S12 may be used to perform behavioral state detection on the target object in at least one detection dimension, so as to obtain an intermediate detection result of the target object in at least one detection dimension .
- the implementation of step S12 is not limited, and can be flexibly selected according to the actual situation of the multimedia data, and is not limited to the following disclosed embodiments.
- the multimedia data may include video data.
- step S12 may be to perform behavior state detection on the target object based on the video data. Therefore, in a possible implementation manner, step S12 may include:
- Step S1211: Determine the target object in the video data.
- Step S1212: Perform behavioral state detection on the target object in at least one of the detection dimensions of gesture, emotion, and eye contact, to obtain an intermediate detection result of the target object in the at least one detection dimension.
- the method of determining the target object in step S1211 is not limited, and can be flexibly determined according to the actual realization method of the target object.
- as mentioned in the above disclosed embodiments, the target object can be a teacher and the behavior performed can be a model lesson behavior; in this case, the teacher giving the lesson can be determined from the video data through face detection or face tracking, thereby determining the target object.
- the human action SDK's face detection and face tracking models can be invoked to determine the target object from the video data.
- after the target object is determined, step S1212 can detect the behavioral state of the target object in at least one of the detection dimensions of gesture, emotion, and eye contact. Which detection dimensions are specifically included, and in what order they are detected, can be flexibly selected according to the actual situation; the details of how detection is performed in each dimension can be found in the subsequent disclosed embodiments and are not expanded here. In the present disclosure, the subsequent disclosed embodiments are described by taking simultaneous behavioral state detection of the video data in the three detection dimensions of gesture, emotion, and eye contact as an example; the remaining implementation manners can be flexibly extended with reference to the subsequent disclosed embodiments and are not repeated one by one.
- in the case where the multimedia data contains video data, computer vision processing can be performed on the video data, so as to detect the behavioral state of the target object in the video data in multiple detection dimensions such as gesture, emotion, and eye contact.
- in this way, the video data in the multimedia data can be fully and effectively used to detect the target object in multiple different detection dimensions, improving the diversity of the intermediate detection results and, in turn, the comprehensiveness and reliability of the subsequent target detection results.
- the specific method for obtaining the corresponding intermediate detection results in each detection dimension can be flexibly determined.
- the intermediate detection result of the target object in the gesture detection dimension can be obtained according to the following steps:
- determining the number of times the target object performs at least one target gesture within a gesture detection period, and obtaining the gesture detection result of the gesture detection period, where the target gesture includes one or more of holding out a hand, raising a hand, and raising a thumb;
- according to the gesture detection result of at least one gesture detection period, obtaining the intermediate detection result of the target object in the gesture detection dimension.
- the video data can be divided into multiple segments in chronological order, and each segment of the divided video is recorded as a gesture detection period.
- the division method and the length of each gesture detection cycle after division are not limited.
- the video data can be divided according to the same duration.
- in this case, the durations of the gesture detection periods corresponding to different video segments are the same; in a possible implementation manner, the video data can also be randomly divided according to different durations;
- in this case, the durations of the gesture detection periods corresponding to different video segments differ.
- for convenience, the embodiments of the present disclosure are described by taking gesture detection periods of equal duration as an example.
- in an example, the gesture detection period can be set to one minute; that is, the number of times the target object performs at least one target gesture in each minute is obtained
- to yield the gesture detection result of the target object for each minute, and the intermediate detection result of the target object in the gesture detection dimension over the complete video data is then obtained according to the per-minute gesture detection results.
- the target gesture can be set as a gesture that counts as valid during the teacher's model lesson, such as holding out a hand (indicating that a student is invited to answer a question), raising a hand (indicating that a student is prompted to answer a question), or raising a thumb (indicating praise of a student's behavior); which specific gestures serve as target gestures can be flexibly set according to the actual situation.
- by determining the number of times the target object performs at least one target gesture within the gesture detection period, the gesture detection result of the gesture detection period is obtained, and the gesture detection results of at least one gesture detection period then yield the intermediate detection result of the target object in the gesture detection dimension.
- this intermediate detection result can effectively reflect the degree of the teacher's physical movement and gesture interaction with students during the model lesson; it improves the efficiency of the data processing method while also improving the accuracy and reliability of the data processing results.
- the manner of acquiring the number of times the target object performs at least one target gesture in the gesture detection period can be flexibly determined according to the actual situation.
- in a possible implementation manner, acquiring the number of times the target object performs at least one target gesture during the gesture detection period may include the following process.
- in a possible implementation manner, a fixed detection frame rate can be used to perform gesture detection on the video data within each gesture detection period.
- the value can be flexibly set according to the actual situation.
- in an example, the detection frame rate can be set to 10 FPS, that is, gesture detection is performed on 10 frames of video data per second.
- with a fixed detection frame rate, the number of frames detected in a gesture detection period is fixed; that is, the video data in a gesture detection period corresponds to a complete frame sequence, and the number of frames contained in the frame sequence is determined by the product of the length of the gesture detection period and the detection frame rate.
- in a possible implementation manner, gesture detection can be performed directly on the complete frame sequence corresponding to the gesture detection period; for example, the number of frames containing the target gesture in the complete frame sequence can be used to determine the number of times the target gesture is performed in the gesture detection period.
- in a possible implementation manner, at least one gesture detection frame sequence can be obtained from the complete frame sequence corresponding to the gesture detection period, and the detection result of each gesture detection frame sequence is then obtained separately to determine the number of target gestures in the gesture detection period.
- the gesture detection frame sequence may be one of multiple frame sequences selected from the complete frame sequence corresponding to the gesture detection period.
- the specific selection method can be flexibly selected, and is not limited to the following disclosed embodiments.
- in a possible implementation manner, multiple gesture detection frame sequences can be obtained by sliding over frames.
- the specific process can be: set the length of each gesture detection frame sequence to X; in the complete frame sequence corresponding to the gesture detection period, take the first frame as the start frame and the Xth frame as the end frame of the first gesture detection frame sequence, to obtain the first gesture detection frame sequence; then slide backward by one frame in the complete frame sequence to obtain the second gesture detection frame sequence, that is, take the second frame of the complete frame sequence as its start frame and the (X+1)th frame as its end frame; and so on, to obtain multiple gesture detection frame sequences.
- the value of X can be flexibly selected according to the actual situation and is not limited in the embodiments of the present disclosure.
- in an example, X can be consistent with the detection frame rate; that is, when the detection frame rate is 10 FPS, X can be set to 10 frames.
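A minimal Python sketch of the sliding-window construction described above (stride of one frame, window length X):

```python
def gesture_detection_frame_sequences(frames: list, x: int = 10) -> list:
    """Slide a window of length X over the complete frame sequence of one
    gesture detection period, moving one frame at a time."""
    return [frames[i:i + x] for i in range(len(frames) - x + 1)]

# A one-minute period at a 10 FPS detection frame rate yields 600 frames,
# and therefore 591 gesture detection frame sequences of 10 frames each.
windows = gesture_detection_frame_sequences(list(range(600)), x=10)
print(len(windows), len(windows[0]))  # 591 10
```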
- after the multiple gesture detection frame sequences are obtained, the gesture start frame and gesture end frame can be determined based on them. As described in the above disclosed embodiments, in a possible implementation manner, gesture detection can be performed on each frame in each gesture detection frame sequence to determine the number of frames containing the target gesture; if the number of frames containing the target gesture exceeds the first threshold, it can be considered that a target gesture exists in the current gesture detection frame sequence, and at this time at least one frame can be selected from the current gesture detection frame sequence as the gesture start frame.
- the manner of performing gesture detection for each frame can be flexibly selected according to the actual situation.
- a neural network with a gesture detection function can be used to realize the gesture detection of the frame image.
- the implementation of the neural network with gesture detection function can also be flexibly determined.
- the target gesture may include multiple gestures. In a possible implementation manner, one neural network capable of recognizing multiple target gestures at the same time can be used to perform gesture detection on each frame of image; in a possible implementation manner, a corresponding neural network can also be used to detect each target gesture separately.
- in an example, the human body detection and hand-raising detection models of the Insight SDK can be called to detect the hand-raising target gesture of the target object,
- and the gesture detection model of the human action SDK can be called to detect the other target gestures of the target object.
- in the case of multiple target gestures, it can be determined separately whether the number of frames of each kind of target gesture exceeds the first threshold; if all of them exceed the first threshold, multiple kinds of target gestures exist in the current gesture detection frame sequence, and if only the frame counts of some kinds of target gestures exceed the first threshold, only those kinds of target gestures exist in the current gesture detection frame sequence.
- the value of the first threshold can be flexibly set according to actual conditions and is not limited in the embodiments of the present disclosure. In an example, when the gesture detection frame sequence includes 10 frames, the first threshold can be set to 6.
- the Nth frame containing the target gesture in the gesture detection frame sequence may be used as the gesture start frame, and the time corresponding to the gesture start frame may be recorded as the start time of the gesture interaction.
- the value of N can be flexibly selected. In an example, N can be consistent with the value of the first threshold.
- in an example, when the gesture detection frame sequence includes 10 frames and the first threshold is set to 6, if the number of frames containing the target gesture detected in the current gesture detection frame sequence is not less than 6, the sixth frame containing the target gesture in the current gesture detection frame sequence can be used as the gesture start frame, and the time of the gesture start frame within the video data is recorded as the start time of the gesture.
- after the gesture start frame is determined, the time when the gesture ends can be further determined.
- the determination of the gesture end frame is similar to that of the gesture start frame.
- in a possible implementation manner, gesture detection can be performed in the gesture detection frame sequences after the gesture start frame; if the number of frames not containing the target gesture in one of them exceeds the second threshold, it can be considered that no target gesture exists in that gesture detection frame sequence, and at least one frame is selected from it as the gesture end frame.
- the value of the second threshold can be flexibly determined according to the actual situation and can be the same as or different from the first threshold. In an example, the second threshold may be equal to the first threshold, both being 6.
- the process of selecting the gesture end frame from the gesture detection frame sequence can refer to the process of selecting the gesture start frame, which will not be repeated here.
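The start/end rule above can be sketched as follows. Here `detect_gesture` stands in for a per-frame detector (such as the SDK models mentioned above), and selecting the Nth miss as the end frame is an assumption that mirrors the start-frame rule, since the disclosure only says the end frame is selected similarly.

```python
from typing import Callable, List, Optional

def gesture_start_index(window: List, detect_gesture: Callable,
                        first_threshold: int = 6, n: int = 6) -> Optional[int]:
    """If at least `first_threshold` frames in the window contain the target
    gesture, return the index of the Nth such frame as the gesture start frame."""
    hits = [i for i, frame in enumerate(window) if detect_gesture(frame)]
    return hits[n - 1] if len(hits) >= first_threshold else None

def gesture_end_index(window: List, detect_gesture: Callable,
                      second_threshold: int = 6, n: int = 6) -> Optional[int]:
    """Mirror rule for the gesture end frame, counting frames WITHOUT the
    target gesture (assumed Nth-miss selection)."""
    misses = [i for i, frame in enumerate(window) if not detect_gesture(frame)]
    return misses[n - 1] if len(misses) >= second_threshold else None
```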
- Table 1 shows a gesture detection rule according to an embodiment of the present disclosure.
- the interactive gesture detection period in the table corresponds to the gesture detection frame sequence in the above disclosed embodiments;
- the interactive gesture rule threshold corresponds to the first threshold and the second threshold in the above disclosed embodiments;
- the interactive gesture start time corresponds to the time of the gesture start frame in the above disclosed embodiments;
- and the interactive gesture end time corresponds to the time of the gesture end frame. It can be seen from Table 1 that, in an example, every 10 frames in the gesture detection period can be regarded as a gesture detection frame sequence, so that gesture detection is performed on each frame in each gesture detection frame sequence to determine the gesture start frame and the gesture end frame, and the number of target gestures in each gesture detection period is then obtained.
- through the above process, the number of target gestures can be detected based on the multiple gesture detection frame sequences in the gesture detection period, which effectively reduces the influence of inaccurate gesture detection results of individual frames, improves the accuracy of gesture detection, and in turn improves the accuracy and reliability of the entire data processing process.
- after the number of times is acquired, the gesture detection result corresponding to the gesture detection period can be obtained according to it.
- in a possible implementation manner, the number of times the target gesture is performed in the gesture detection period can be used directly as the gesture detection result; in a possible implementation manner, the number of times the target gesture is performed in the gesture detection period can also be mapped into a score according to a certain rule and used as the gesture detection result. The mapping rule is not limited in the embodiments of the present disclosure; Table 2 shows a mapping rule for the gesture detection result according to an embodiment of the present disclosure.
- the interactive gesture scoring period corresponds to the gesture detection period in the above disclosed embodiments. It can be seen from Table 2 that, in an example, within one gesture detection period each target gesture can be counted as 1 point, so that the score of the gesture detection period is determined by the number of target gestures; if there are more than 10 target gestures in a gesture detection period, the gesture detection result of that period is recorded as 10 points.
- through the above process, the gesture detection result can be standardized, thereby improving the standardization of the intermediate detection result determined based on it and facilitating the fusion of the intermediate detection result of the gesture dimension with the intermediate detection results of other dimensions to obtain a more intuitive target detection result.
- after the gesture detection result of at least one gesture detection period is obtained, the intermediate detection result in the gesture detection dimension can be further obtained.
- the manner of obtaining the intermediate detection result from the gesture detection results can be flexibly determined and is not limited to the following disclosed embodiments.
- in a possible implementation manner, the average of the gesture detection results of the gesture detection periods can be used as the intermediate detection result in the gesture detection dimension.
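Combining the Table 2 mapping with the averaging step, a short sketch (the 1-point-per-gesture rule and the cap of 10 follow the example above):

```python
def gesture_period_score(gesture_count: int) -> int:
    """Table 2 example rule: 1 point per target gesture, capped at 10 points."""
    return min(gesture_count, 10)

def gesture_dimension_result(counts_per_period: list) -> float:
    """Intermediate result in the gesture dimension: the average of the
    per-period gesture detection scores."""
    scores = [gesture_period_score(c) for c in counts_per_period]
    return sum(scores) / len(scores)

print(gesture_dimension_result([3, 12, 7]))  # (3 + 10 + 7) / 3 ≈ 6.67
```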
- the intermediate detection result of the target object in the emotion detection dimension can be obtained according to the following steps:
- obtaining, according to the video data, the expression detection result of the target object within an emotion detection period, where the expression detection result includes the emotion result determined based on the expression of the target object;
- obtaining, according to the video data, the smile detection result of the target object within the emotion detection period, where the smile detection result includes the smile intensity of the target object;
- obtaining the intermediate detection result of the target object in the emotion detection dimension according to the expression detection result and the smile detection result.
- the implementation form of the emotion detection cycle can refer to the implementation form of the gesture detection cycle in the above-mentioned disclosed embodiment, which will not be repeated here.
- the length of the emotion detection cycle can be the same as or different from the length of the gesture detection cycle, and it can be selected flexibly according to the actual situation.
- the emotion detection period can be set to be the same as the gesture detection period, both of which are one minute.
- the expression detection result may be an emotional result determined by performing expression detection on the target object, such as whether the emotion of the target object is happy, calm, or sad.
- the implementation form can be flexibly set, and the acquisition method and implementation form of the expression detection result can refer to the subsequent disclosed embodiments, which will not be expanded here.
- the smile detection result may be a related result determined by performing smile detection on the target object, which may reflect the smile intensity or smile amplitude of the target object.
- the implementation form can be flexibly set, and the method of obtaining the smile detection result and the implementation form can refer to the subsequent disclosed embodiments, which will not be expanded here.
- the manner of obtaining the intermediate detection result of the target object in the emotion detection dimension from the expression detection result and the smile detection result can be determined according to the actual situation of the two results; details can likewise be found in the subsequent disclosed embodiments.
- smile detection and expression detection can be two independent detections; although both can be used to indicate the emotional state of the target object, they do so from two different perspectives.
- therefore, the intermediate detection result, jointly determined from the expression detection result and the smile detection result, can indicate the emotional state of the target object in the emotion detection dimension more comprehensively and reliably, thereby improving the comprehensiveness and reliability of the final target detection result.
- obtaining the expression detection result of the target object during the emotion detection period according to the video data may include: performing expression detection on the target object during the emotion detection period, and determining the number of times the target object displays at least one target expression, to obtain the expression detection result; the target expression includes one or more of happy, calm, and others.
- the expression detection result can be obtained based on the number of times that the target object displays different target expressions in the emotion detection cycle.
- the target expression can be flexibly set according to the actual situation.
- in an example, the target expression can be set as happy, calm, or others; in a possible implementation manner, the target expressions can also be further refined, for example by setting the target expression as happy, calm, sad, or angry.
- the number of target expressions in the emotion detection cycle can be determined by detecting the number of frames containing the target expression.
- in a possible implementation manner, the detection frame rate of emotion detection can be the same as that of gesture detection; since the detection frame rate is a fixed value, the number of frames detected in each emotion detection period is fixed.
- therefore, in a possible implementation manner, the number of times the target object displays each target expression in the emotion detection period can be determined according to the number of frames in which each target expression is detected in the emotion detection period.
- in a possible implementation manner, the emotion detection period can also be divided into multiple emotion detection sub-periods, and in each sub-period the target expression detected in the largest number of frames is taken as the expression of that sub-period, so that the number of times of each target expression in the emotion detection period is determined based on the expressions of the sub-periods.
- Table 3 shows an expression detection rule according to an embodiment of the present disclosure.
- the specific method of performing expression detection on each frame is not limited.
- in a possible implementation manner, a neural network with an expression detection function can be used to perform expression detection on each frame of image; that is, each frame of image is input into the neural network with the expression detection function, which outputs the target expression corresponding to the target object.
- the specific implementation of the neural network with the expression detection function is not limited in the embodiments of the present disclosure, and a suitable neural network can be flexibly selected according to the actual situation.
- the facial expression detection of the target object can be realized by calling the human action SDK's face detection or face attribute models.
- after the number of times of each target expression in the emotion detection period is determined, the expression detection result of the emotion detection period can be obtained. How the counts of the different target expressions are converted into the expression detection result, i.e. the mapping rule, can be flexibly decided according to the actual situation and is not limited to the following disclosed embodiments. Table 4 shows a correspondence rule for the emotion detection result according to an embodiment of the present disclosure.
- Expression scoring rule (10-point scale): one minute contains 60 seconds, corresponding to 60 expression detections, assigned respectively as: happy 10 points, calm 5 points, others 0 points.
- the expression score corresponds to the expression detection result in the above disclosed embodiment
- the smile score corresponds to the smile detection result in the above disclosed embodiment.
- it can be seen from Table 4 that, in an example, different target expressions in the emotion detection period can be recorded as different points; for example, happy can be recorded as 10 points, calm as 5 points, and others as 0 points, and the average score of the target expressions in the emotion detection period is then used as the expression detection result of the emotion detection period.
- through the expression detection of the target object in the emotion detection period, the number of times the target object displays at least one target expression is determined, and the expression detection result is obtained.
- based on the multiple different target expressions displayed by the target object in the emotion detection period, a comparatively comprehensive and reliable expression detection result can be obtained, which more accurately reflects the emotion of the target object and improves the accuracy of the emotion detection result.
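A sketch of the Table 4 expression-scoring rule, assuming one expression detection per second over a one-minute period (the per-second detection cadence follows the table's example):

```python
EXPRESSION_POINTS = {"happy": 10, "calm": 5}  # any other expression scores 0

def expression_detection_result(per_second_expressions: list) -> float:
    """Average the per-second target-expression scores over the emotion
    detection period (60 detections per minute in the Table 4 example)."""
    points = [EXPRESSION_POINTS.get(e, 0) for e in per_second_expressions]
    return sum(points) / len(points)

# 30 s of "happy" and 30 s of "calm": (30 * 10 + 30 * 5) / 60 = 7.5
print(expression_detection_result(["happy"] * 30 + ["calm"] * 30))
```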
- obtaining the smile detection result of the target object during the emotion detection cycle according to the video data may include:
- in the emotion detection period, performing smile detection on the target object according to at least one frame of the video data to obtain a smile detection result corresponding to the at least one frame; and determining the smile detection result of the target object within the emotion detection period according to the smile detection result corresponding to the at least one frame. It can be seen from the above disclosed embodiments that, in a possible implementation manner, smile detection can be performed on each frame of the video data within the emotion detection period, and the smile detection results of some or all of the frames are then averaged to obtain the smile detection result of the target object in the emotion detection period.
- in a possible implementation manner, the emotion detection period can also be divided into multiple emotion detection sub-periods, the smile detection result of each sub-period is obtained, and the smile detection result of the emotion detection period is then obtained based on the smile detection results of the sub-periods.
- the manner of dividing the emotion detection period can refer to the above-mentioned disclosed embodiment, which will not be repeated here.
- the method for determining the smile detection result in each emotion detection sub-period can be flexibly determined according to actual conditions. Table 5 shows a smile detection rule according to an embodiment of the present disclosure.
- it can be seen from Table 5 that, in an example, the emotion detection period can be further divided into multiple emotion detection sub-periods by seconds;
- smile detection is performed on each frame within an emotion detection sub-period, and the average of the smile detection results of all its frames is then used as the smile detection result of that sub-period.
- the method of performing smile detection on each frame of image is not limited in the embodiments of the present disclosure.
- in a possible implementation manner, a frame image can be passed through a neural network with a smile detection function to output the smile value corresponding to the frame image.
- the implementation of the neural network with the smile detection function is not limited in the embodiments of the present disclosure; any neural network that can reflect the smile amplitude or intensity of the target object in an image can serve as its implementation.
- the manner of obtaining the smile detection result of the emotion detection period from the smile detection results of the emotion detection sub-periods can also be flexibly determined. It can be seen from Table 4 mentioned in the above disclosed embodiments that, in an example, the average of the 60 smile detection results within one minute can be used as the smile detection result of the emotion detection period; that is, the smile detection result of the emotion detection period can be obtained by averaging the smile detection results of its emotion detection sub-periods.
- through the above process, the smile detection results of different frames in the emotion detection period can be combined, which reduces the influence of inaccurate smile detection results of some frames, so that the smile detection result of the emotion detection period is more reliable,
- which in turn improves the reliability and accuracy of the final target detection result.
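A sketch of the two-level averaging described above; the per-frame smile values would come from a neural network with a smile detection function, and the 10-frames-per-second grouping follows the earlier detection frame rate example:

```python
def smile_detection_result(per_frame_smile_values: list,
                           frames_per_second: int = 10) -> float:
    """Average per-frame smile values within each one-second sub-period, then
    average the sub-period results into the period-level smile result."""
    sub_results = []
    for i in range(0, len(per_frame_smile_values), frames_per_second):
        second = per_frame_smile_values[i:i + frames_per_second]
        sub_results.append(sum(second) / len(second))
    return sum(sub_results) / len(sub_results)
```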
- after the expression detection result and the smile detection result are obtained, the intermediate detection result of the target object in the emotion detection dimension can be further obtained based on the two.
- in a possible implementation manner, the emotion detection result of each emotion detection period can be obtained based on the expression detection result and/or smile detection result within that period, and the emotion detection results of the different emotion detection periods are then averaged to obtain the intermediate detection result of the target object in the emotion detection dimension.
- the method of obtaining the emotion detection results of the target object in each emotion detection cycle is not limited.
- the average value of the expression detection result and the smile detection result in the emotion detection cycle can be used as the emotion detection result of the cycle; in a possible implementation, the expression detection result and the smile detection result in the emotion detection cycle can also be combined in a weighted average to obtain the emotion detection result of the cycle. The weights of the expression detection result and the smile detection result can be flexibly set according to the actual situation and are not limited to the following disclosed embodiments, as long as the two weights sum to 1.
- in an example, the weight of the expression detection result can be set to 1 and the weight of the smile detection result to 0, that is, the expression detection result can be directly used as the emotion detection result of the emotion detection cycle; in another example, the weight of the expression detection result can be set to 0 and the weight of the smile detection result to 1, that is, the smile detection result can be directly used as the emotion detection result of the emotion detection cycle.
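- for illustration only, a weighted combination with weights summing to 1 could look like the following sketch:

```python
def emotion_result(expression_score: float, smile_score: float,
                   w_expression: float = 0.5, w_smile: float = 0.5) -> float:
    """Weighted average of the two per-cycle results; the two weights must
    sum to 1. Using (1, 0) or (0, 1) reduces to taking only the expression
    result or only the smile result, as described above."""
    assert abs(w_expression + w_smile - 1.0) < 1e-9, "weights must sum to 1"
    return w_expression * expression_score + w_smile * smile_score
```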
- the intermediate detection result of the target object in the eye contact detection dimension can be obtained according to the following steps:
- face angle detection is performed on the target object, and the time during which the face angle of the target object is within the face angle threshold is determined as the face angle detection result;
- closed-eye detection is performed on the target object, and the time during which the target object performs the closed-eye operation is determined as the closed-eye detection result;
- according to the face angle detection result and the closed-eye detection result, the length of time during which the face angle of the target object is within the face angle threshold and no closed-eye operation is performed is determined; according to this time length, the intermediate detection result of the target object in the eye contact detection dimension is obtained.
- the detection of the target object in the eye contact detection dimension can be composed of two parts, namely, face angle detection and closed eye detection.
- the face orientation of the target object can be determined through face angle detection. If the face orientation of the target object is within the face angle threshold, it can be considered that the viewing angle of the target object is within the range of eye contact.
- the specific value of the face angle threshold can be flexibly set according to the actual situation.
- the face angle threshold can be a static value, that is, its value does not change in any time period of the video data; in a possible implementation, the face angle threshold can also be set to a dynamic value, that is, it can change flexibly as the position of the target object in the video data changes.
- closed-eye detection can further be performed on the target object to determine whether the target object is in the closed-eye state. If the viewing angle of the target object is within the range of eye contact and the target object is in the open-eye state (i.e., the non-closed-eye state), it can be considered that the target object is currently performing an eye contact action. Therefore, in a possible implementation manner, the length of time during which the face angle of the target object is within the face angle threshold and no closed-eye operation is performed can be determined through face angle detection and closed-eye detection, and the proportion of this time in the video data can be used to obtain the intermediate detection result of the target object in the eye contact detection dimension.
- in this way, the length of time during which the face angle of the target object is within the face angle threshold and no closed-eye operation is performed is determined, and the intermediate detection result of the target object in the eye contact detection dimension is then obtained.
- in the process of eye contact detection, it is possible to consider both whether the target object is facing the direction of eye contact and whether the target object has closed eyes while facing that direction, so as to comprehensively evaluate the degree of eye contact of the target object, which greatly improves the accuracy of the intermediate detection result in the eye contact detection dimension and in turn the accuracy of the subsequent target detection result.
- in a possible implementation manner, the video data can be divided according to an eye contact detection period, and the length of time during which the face angle is within the face angle threshold and no closed-eye operation is performed can be determined for the target object in each eye contact detection period, to obtain the intermediate detection result of at least one eye contact detection period; based on the intermediate detection result of the at least one eye contact detection period, the intermediate detection result of the target object in the eye contact detection dimension is then obtained.
- the realization form of the eye contact detection cycle can refer to the gesture detection cycle and the emotion detection cycle in the above disclosed embodiments, which will not be repeated here.
- the length of the eye contact detection period can be set to one minute.
- the process of performing face angle detection can refer to the process of gesture detection. Therefore, in a possible implementation manner, the process of performing face angle detection during the eye contact detection cycle can include:
- obtaining at least one face angle detection frame sequence of the video data in the eye contact detection period; when, in the face angle detection frame sequence, the number of frames with the face angle within the face angle threshold exceeds the third threshold, recording at least one frame in the face angle detection frame sequence as the face orientation start frame; when, in the face angle detection frame sequence located after the face orientation start frame, the number of frames whose face angle is outside the face angle threshold exceeds the fourth threshold, recording at least one frame in that face angle detection frame sequence as the face orientation end frame; and obtaining, according to the number and time of the face orientation start frames and the face orientation end frames, the time during which the face angle of the target object is within the face angle threshold in the eye contact period.
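- a minimal sketch of this start/end-frame rule (the window length and thresholds below are illustrative placeholders, and recording the first qualifying frame is only one of several possible choices):

```python
from typing import List, Tuple

def detect_events(flags: List[bool], window: int = 10,
                  start_thresh: int = 8, end_thresh: int = 8) -> List[Tuple[int, int]]:
    """Scan fixed-length frame windows over per-frame condition flags
    (e.g. "face angle within threshold"). Open an event when at least
    start_thresh frames in a window satisfy the condition; close it when
    at least end_thresh frames in a later window do not. Simplified scan."""
    events: List[Tuple[int, int]] = []
    start = None
    i = 0
    while i + window <= len(flags):
        seq = flags[i:i + window]
        if start is None and sum(seq) >= start_thresh:
            start = i + seq.index(True)    # record a qualifying frame as start
            i += window
        elif start is not None and (window - sum(seq)) >= end_thresh:
            events.append((start, i + seq.index(False)))
            start = None
            i += window
        else:
            i += 1
    if start is not None:                  # event still open at end of data
        events.append((start, len(flags) - 1))
    return events
```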
- the method for obtaining the face angle detection frame sequence can refer to the gesture detection frame sequence, the method for determining the face orientation start frame can refer to the gesture start frame, and the method for determining the face orientation end frame can refer to the gesture end frame.
- the third threshold and the fourth threshold may be the same as or different from the first threshold and the second threshold, and both may be flexibly set according to actual conditions.
- the detection method of the face angle can be flexibly determined according to the actual situation.
- each frame of image can be input into the neural network with the face angle detection function to realize the face angle detection.
- the implementation form of the neural network for the face angle detection function is not limited in the embodiments of the present disclosure.
- the face detection or face tracking models in the human action SDK can be called to obtain the neural network for face angle detection. Table 6 shows a face angle detection rule according to an embodiment of the present disclosure.
- the viewing threshold may correspond to the face angle threshold in the above disclosed embodiment, the viewing detection period may correspond to the face angle detection frame sequence in the above disclosed embodiment, the viewing rule threshold may correspond to the third threshold and the fourth threshold in the above disclosed embodiment, the start time of the viewing event may correspond to the time of the face orientation start frame in the above disclosed embodiment, and the end time of the viewing event may correspond to the time of the face orientation end frame in the above disclosed embodiment.
- the face angle threshold can include four parameters, namely positive yaw angle, negative yaw angle, positive pitch angle and negative pitch angle. The specific values can be flexibly determined according to the actual situation.
- when the detected face angle in a certain frame image falls within the ranges defined by these four parameters, the face angle in that frame can be considered to be within the face angle threshold;
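- a sketch of such a per-frame check (the default angle values here are placeholders, since the specific values are left to the application):

```python
def face_angle_within_threshold(yaw: float, pitch: float,
                                yaw_neg: float = -30.0, yaw_pos: float = 30.0,
                                pitch_neg: float = -20.0, pitch_pos: float = 20.0) -> bool:
    """A frame's face angle is within the threshold when both the yaw and
    the pitch fall inside their configured (negative, positive) ranges."""
    return yaw_neg <= yaw <= yaw_pos and pitch_neg <= pitch <= pitch_pos
```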
- the detection frame rate of face angle detection can be set to 10 FPS, the length of the face angle detection frame sequence can be set to 10, and the third and fourth thresholds can both be set to 8.
- in this case, the eighth frame in which the face angle is within the face angle threshold can be used as the face orientation start frame, and the corresponding time is the start time at which the face angle is within the face angle threshold; similarly, the end time at which the face angle is within the face angle threshold can be determined, and the time range during which the face angle is within the face angle threshold in the eye contact period can then be obtained.
- the process of performing closed-eye detection can refer to the above-mentioned gesture detection and face angle detection processes. Therefore, in a possible implementation manner, the process of performing closed-eye detection during the eye contact detection cycle can include:
- obtaining at least one closed-eye detection frame sequence of the video data in the eye contact detection period; when the number of frames in the closed-eye state exceeds the fifth threshold, recording at least one frame in the closed-eye detection frame sequence as the closed-eye start frame; when, in the closed-eye detection frame sequence located after the closed-eye start frame, the number of frames in which neither eye is closed or only one eye is closed exceeds the sixth threshold, recording at least one frame in that closed-eye detection frame sequence as the closed-eye end frame; and obtaining, according to the number and time of the closed-eye start frames and the closed-eye end frames, the amount of time the target object is in the closed-eye state during the eye contact period.
- the method for obtaining the closed-eye detection frame sequence can refer to the above-mentioned disclosed embodiments, and details are not described herein again.
- the fifth threshold and the sixth threshold may be the same as or different from the above-mentioned thresholds, and both can be set flexibly according to actual conditions.
- the method of detecting whether the target object has closed eyes can be flexibly determined according to the actual situation.
- each frame of image can be input into a neural network with closed eye detection function to achieve closed eye detection.
- the implementation form of the neural network with the closed-eye detection function is not limited in the embodiments of the present disclosure. In one example, the face detection or face attribute models in the human action SDK can be called to obtain the neural network for closed-eye detection. Table 7 shows a closed-eye detection rule according to an embodiment of the present disclosure.
- the closed-eye detection period can correspond to the closed-eye detection frame sequence in the above disclosed embodiment, the closed-eye rule threshold can correspond to the fifth threshold and the sixth threshold in the above disclosed embodiment, the closed-eye start time can correspond to the time of the closed-eye start frame in the above disclosed embodiment, and the end time of the closed-eye event can correspond to the time of the closed-eye end frame in the above disclosed embodiment.
- the state in which both eyes of the target object are closed can be set as the closed-eye state, and the remaining states can be set as the non-closed-eye state;
- the detection frame rate of closed-eye detection can be set to 10 FPS, the length of the closed-eye detection frame sequence can be set to 10, the fifth threshold can be set to 6, and the sixth threshold can be set to 8. That is, in a closed-eye detection frame sequence, if the number of frames detected in the closed-eye state is not less than 6, the first frame in the closed-eye state can be used as the closed-eye start frame, and the corresponding time is the closed-eye start time; if, in a subsequent closed-eye detection frame sequence, the number of frames detected in the non-closed-eye state is not less than 8, the first frame in the non-closed-eye state can be used as the closed-eye end frame, so that the closed-eye end time can be determined, and the time range during which the target object is in the closed-eye state in the eye contact period can then be obtained.
- based on the above results, the time range in the eye contact cycle during which the target object's face angle is within the face angle threshold and the target object is not in the closed-eye state, that is, the time range during which the target object performs eye contact in the eye contact cycle, can be determined, and the intermediate detection result of the eye contact cycle can then be determined.
- the mapping rule can be flexibly set according to the actual situation, and is not limited to the following disclosed embodiments.
- Table 8 shows a rule for the eye contact detection result according to an embodiment of the present disclosure, where the eye contact scoring period can correspond to the eye contact detection period in the above disclosed embodiment, and the eye contact score can correspond to the intermediate detection result in the eye contact detection dimension in the above disclosed embodiment.
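- one possible mapping rule, shown only as a sketch (a linear mapping of the eye contact time ratio to a 10-point score; as noted above, the actual rule can be set flexibly):

```python
def eye_contact_score(contact_seconds: float,
                      period_seconds: float = 60.0,
                      scale: float = 10.0) -> float:
    """Map the share of the detection period spent in eye contact to a score."""
    ratio = min(max(contact_seconds / period_seconds, 0.0), 1.0)
    return round(ratio * scale, 1)
```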
- the multimedia data may include audio data.
- in step S12, behavior state detection may also be performed on the target object based on the audio data. Therefore, in a possible implementation manner, step S12 may include:
- Step S1221: segment the audio data according to sentences to obtain at least one piece of audio sub-data;
- Step S1222: perform, on the at least one piece of audio sub-data, behavior state detection in at least one of the fluency, speaking rate, pause, and volume detection dimensions, to obtain an intermediate detection result of the target object in the at least one detection dimension.
- the implementation of segmenting audio data according to sentences is not limited in the embodiments of the present disclosure, and is not limited to the following disclosed embodiments.
- the audio data can be recognized through an audio recognition neural network with a text recognition function, so as to obtain the recognition result for each sentence in the audio data, such as the sentences in the audio data, the words contained in each sentence, the start timestamp and duration of each sentence, and the start timestamp and duration of each word.
- the specific implementation of the audio data recognition neural network can be flexibly determined, and any neural network that can recognize audio data can be used as the implementation of the audio data recognition neural network.
- each piece of the obtained audio sub-data can correspond to a complete sentence in the audio data.
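- a sketch of sentence-level segmentation driven by such recognition results (the field names of the recognition result are assumptions for illustration):

```python
from typing import Dict, List
import numpy as np

def split_by_sentences(audio: np.ndarray, sample_rate: int,
                       sentences: List[Dict]) -> List[np.ndarray]:
    """Cut a 1-D sample array into one chunk per recognized sentence,
    using each sentence's start timestamp and duration (in seconds)."""
    chunks: List[np.ndarray] = []
    for s in sentences:                   # e.g. {"start": 1.2, "duration": 3.4}
        begin = int(s["start"] * sample_rate)
        end = begin + int(s["duration"] * sample_rate)
        chunks.append(audio[begin:end])
    return chunks
```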
- behavior state detection can then be performed on some or all of the obtained audio sub-data.
- the detection of the audio sub-data can also be performed in different dimensions; for example, one or more of fluency, speech rate, pause, and volume can be detected. Which dimensions are specifically selected can be flexibly determined according to actual conditions and is not limited in the embodiments of the present disclosure.
- the method for detecting at least one of the fluency, speech rate, pause, and volume of the audio sub-data is not limited.
- multiple neural networks with different functions can be obtained through training, such as a fluency detection neural network, a speech rate detection neural network, a pause detection neural network, and a volume detection neural network; inputting the audio sub-data into these neural networks can output the corresponding fluency, speech rate, pause, and volume detection results.
- the specific implementation form of the foregoing neural networks can be flexibly determined according to actual conditions, and is not limited in the embodiments of the present disclosure.
- the intermediate detection results of the individual pieces of audio sub-data can be weighted and fused in each detection dimension according to the proportion of time each piece occupies, so that the weighted fusion result can be regarded as the intermediate detection result of the complete audio data in each detection dimension.
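- a sketch of this duration-weighted fusion for one detection dimension:

```python
from typing import Sequence

def fuse_by_duration(results: Sequence[float], durations: Sequence[float]) -> float:
    """Weight each sentence-level result by the share of total speaking
    time that its sentence occupies."""
    total = sum(durations)
    return sum(r * d for r, d in zip(results, durations)) / total
```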
- before the audio data is detected, it can also be format-transcoded, so that the subsequent audio data detection process is easier to implement.
- the method of transcoding and the format after transcoding can be flexibly determined according to actual detection requirements.
- the audio data can be transcoded into pcm format (such as an uncompressed pcm file or a wav file), for example a mono format with a 16-bit sample width.
- similarly, before the video data is detected, it can also be transcoded into a suitable video format.
- through the above process, the audio data is segmented according to sentences into at least one piece of audio sub-data, so that behavior state detection can be performed on the at least one piece of audio sub-data in one or more of the fluency, speaking rate, pause, and volume detection dimensions.
- in this way, the detection process of the audio data is transformed into the detection of each piece of sub-data in the audio data, which reduces the difficulty of detection and the amount of data to be processed in each detection, thereby improving the detection efficiency and accuracy for the audio data, and in turn the efficiency and precision of the data processing.
- after the intermediate detection result in at least one detection dimension is obtained, it can be processed through step S13 to obtain the target detection result of the target object. The implementation of step S13 can be flexibly determined according to actual conditions and is not limited to the following disclosed embodiments.
- step S13 may include: combining the intermediate detection results of at least one detection dimension according to the preset weight of the detection dimension to obtain the target detection result of the target object.
- multimedia data can be detected in one or more detection dimensions of gesture, emotion, eye communication, fluency, speech rate, pause, and volume.
- the intermediate detection results in these detection dimensions can be fused or combined to obtain the target detection result.
- the process of fusion or merging can be flexibly selected according to the actual situation.
- in a possible implementation, the intermediate detection results of these detection dimensions can be weighted and averaged according to the preset weight of each detection dimension to obtain the target detection result of the target object.
- the preset weight value of each detection dimension can be flexibly set according to actual needs.
- the detection dimension that has a greater impact on the state evaluation of the target object can be set to a higher preset weight.
- the preset weights of the detection dimensions can also be set to be the same; in this case, the average value of the intermediate detection results in each detection dimension can be directly used as the target detection result of the target object.
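- a sketch of the weighted combination across detection dimensions (the weight values below are placeholders; equal weights reduce to a plain average):

```python
from typing import Dict

DEFAULT_WEIGHTS: Dict[str, float] = {     # placeholder values, configurable
    "gesture": 0.2, "emotion": 0.2, "eye_contact": 0.2,
    "fluency": 0.1, "speech_rate": 0.1, "pause": 0.1, "volume": 0.1,
}

def target_detection_result(intermediate: Dict[str, float],
                            weights: Dict[str, float] = DEFAULT_WEIGHTS) -> float:
    """Weighted average of the intermediate results over the detection
    dimensions that were actually evaluated."""
    used = {k: w for k, w in weights.items() if k in intermediate}
    norm = sum(used.values())
    return sum(intermediate[k] * w for k, w in used.items()) / norm
```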
- the final target detection result can be obtained based on the intermediate detection results of the individual detection dimensions. Since the preset weights can be adjusted according to the actual needs of behavior state evaluation, the obtained target detection result can better reflect the behavior state of the target object and has high reliability.
- step S13 may also include:
- Step S131: according to the time of the audio sub-data in the audio data, determine the video sub-data corresponding to the audio sub-data from the video data included in the multimedia data;
- Step S132: according to preset weights, combine the intermediate detection result of the audio sub-data in at least one detection dimension with the intermediate detection result of the corresponding video sub-data in at least one detection dimension, to obtain the target detection result of at least one audio sub-data or video sub-data;
- Step S133: combine the target detection results of the at least one audio sub-data or video sub-data to obtain the target detection result of the target object.
- multimedia data can include both video data and audio data.
- there can be a one-to-one correspondence between the video data and the audio data; that is, the two can be separated from the same piece of data that contains both video and audio.
- multiple pieces of audio sub-data can be obtained by segmenting the audio data according to sentences, and the intermediate detection results of the audio sub-data in the fluency, speech rate, pause, and volume detection dimensions can be obtained.
- the video data can be segmented in the same way as the audio sub-data is segmented from the audio data, so as to obtain multiple pieces of video sub-data. Since the audio data and the video data correspond to each other and the splitting method is the same, the obtained video sub-data corresponds one-to-one to the audio sub-data. Since behavior state detection can be performed on the video data through any of the above disclosed embodiments, intermediate detection results in multiple detection dimensions can be obtained; further, by mapping these intermediate detection results to each piece of video sub-data according to the segmentation, the intermediate detection result of each piece of video sub-data in at least one detection dimension can be obtained.
- the intermediate results of each dimension of the video sub-data and the intermediate results of each dimension of the corresponding audio sub-data can then be merged to obtain the target detection result of each piece of audio sub-data. Since the audio sub-data corresponds to the video sub-data, this target detection result may also serve as the target detection result of each piece of video sub-data.
- the method of merging can refer to the above disclosed embodiments, and details are not described herein again.
- the target detection results of the different pieces of audio sub-data or video sub-data can be merged again, in the reverse of the way the audio sub-data or video sub-data was segmented, so as to obtain the overall target detection result of the target object.
- the target detection result of the target object for each individual sentence can also be obtained, so as to better reflect the behavior state of the target object and improve the reference value and scope of use of the target detection result.
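- a sketch of the per-sentence merging described above (dimension names and weights are illustrative; the audio-side and video-side results are assumed to cover disjoint dimensions):

```python
from typing import Dict, List

def merge_sentence_results(audio_results: List[Dict[str, float]],
                           video_results: List[Dict[str, float]],
                           weights: Dict[str, float]) -> List[float]:
    """Combine, sentence by sentence, the audio-side and video-side
    intermediate results into one per-sentence target detection result."""
    merged: List[float] = []
    for a, v in zip(audio_results, video_results):  # one-to-one by sentence
        dims = {**a, **v}
        norm = sum(weights[k] for k in dims)
        merged.append(sum(val * weights[k] for k, val in dims.items()) / norm)
    return merged
```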
- Fig. 2 shows a block diagram of a data processing device according to an embodiment of the present disclosure.
- the data processing device 20 may include:
- the acquiring module 21 is used to acquire the multimedia data of the target object.
- the detection module 22 is configured to detect the behavior state of the target object in at least one detection dimension according to the multimedia data, and obtain an intermediate detection result of the target object in at least one detection dimension.
- the processing module 23 is configured to process the intermediate detection result in the at least one detection dimension to obtain the target detection result of the target object, wherein the target detection result is used to indicate the behavior state of the target object.
- the multimedia data includes video data; the detection module is used to: determine a target object in the video data; and perform behavior state detection on the target object in at least one of the gesture, emotion, and eye contact detection dimensions, to obtain an intermediate detection result of the target object in the at least one detection dimension.
- the at least one detection dimension includes a gesture detection dimension; the detection module is further configured to: acquire, according to the video data, the number of times the target object performs at least one target gesture in the gesture detection cycle, to obtain the gesture detection result of the gesture detection period, wherein the target gesture includes one or more of holding out a hand, raising a hand, and raising a thumb; and obtain, according to the gesture detection result of at least one gesture detection period, the intermediate detection result of the target object in the gesture detection dimension.
- the detection module is further configured to: obtain at least one gesture detection frame sequence of the video data in the gesture detection period; when the number of frames containing the target gesture in the gesture detection frame sequence exceeds the first threshold, record at least one frame in the gesture detection frame sequence as a gesture start frame; when the number of frames not containing the target gesture in the gesture detection frame sequence located after the gesture start frame exceeds the second threshold, record at least one frame in the gesture detection frame sequence located after the gesture start frame as a gesture end frame; and obtain, according to the numbers of the gesture start frames and the gesture end frames, the number of times the target object performs at least one target gesture in the gesture detection period.
- the at least one detection dimension includes an emotion detection dimension
- the detection module is further configured to: obtain, according to the video data, the expression detection result and/or the smile detection result of the target object in the emotion detection cycle, wherein the expression detection result includes an emotion result determined based on the expression of the target object, and the smile detection result includes the smile intensity of the target object; and obtain, according to the expression detection result and/or smile detection result of the target object in at least one emotion detection cycle, the intermediate detection result of the target object in the emotion detection dimension.
- the detection module is further configured to: perform expression detection on the target object during the emotion detection period, and determine the number of times the target object displays at least one target expression, to obtain the expression detection result; wherein the target expression includes one or more of happy, calm, and others.
- the detection module is further configured to: in the emotion detection period, perform smile detection on the target object according to at least one frame of the video data, to obtain a smile detection result corresponding to the at least one frame; and determine, according to the smile detection result corresponding to the at least one frame, the smile detection result of the target object in the emotion detection period.
- the at least one detection dimension includes an eye contact detection dimension; the detection module is further configured to: perform face angle detection on the target object according to the video data, and determine the time during which the face angle of the target object is within the face angle threshold, as the face angle detection result; perform closed-eye detection on the target object according to the video data, and determine the time during which the target object performs the closed-eye operation, as the closed-eye detection result; determine, according to the face angle detection result and the closed-eye detection result, the length of time during which the face angle of the target object is within the face angle threshold and no closed-eye operation is performed; and obtain, according to this time length, the intermediate detection result of the target object in the eye contact detection dimension.
- the multimedia data includes audio data; the detection module is configured to: segment the audio data according to sentences to obtain at least one piece of audio sub-data; and perform, on the at least one piece of audio sub-data, behavior state detection in at least one of the fluency, speaking rate, pause, and volume detection dimensions, to obtain an intermediate detection result of the target object in the at least one detection dimension.
- the processing module is configured to combine the intermediate detection results of at least one detection dimension according to the preset weight of the detection dimension to obtain the target detection result of the target object.
- the processing module is configured to: determine, according to the time of the audio sub-data in the audio data, the video sub-data corresponding to the audio sub-data from the video data included in the multimedia data; combine, according to preset weights, the intermediate detection result of the audio sub-data in at least one detection dimension with the intermediate detection result of the corresponding video sub-data in at least one detection dimension, to obtain the target detection result of at least one audio sub-data or video sub-data; and combine the target detection results of the at least one audio sub-data or video sub-data to obtain the target detection result of the target object.
- the multimedia data is obtained by the target object performing a teaching operation according to preset text data, wherein the preset text data includes at least one instruction mark, and the instruction mark is used to divide and/or mark at least part of the content of the preset text data.
- the functions or modules contained in the device provided in the embodiments of the present disclosure can be used to execute the methods described in the above method embodiments.
- a teacher model lesson, that is, a simulated class given by a teacher, can be conducted face to face in an offline scene, where several teachers each give a simulated lesson and evaluate one another.
- model lessons can also be moved online; that is, teachers can record or live-broadcast the simulated lesson process through terminal devices (such as mobile phones or computers).
- Model lessons can help teachers rehearse the process of a formal class, and the evaluation of model lessons has high guiding value for teachers' teaching work. Therefore, a highly reliable model lesson evaluation method can be effectively applied to the teacher's online model lesson process and play a better role in assisting teachers' formal teaching.
- the application example of the present disclosure proposes a teacher model lesson system, which can realize effective evaluation of the teacher's behavior state in a model lesson through the data processing method proposed in the above disclosed embodiments.
- the teacher model lesson system proposed in the embodiments of the present disclosure may include two parts: a client (such as a mobile phone, computer, user equipment, etc.) and a server (such as a local server or a cloud server, etc.).
- the teacher can perform a model lesson on the client; the process is recorded or live-broadcast, and the result of the recording or live broadcast is uploaded to the server as multimedia data.
- the server can receive the multimedia data uploaded by the client and process it through the data processing methods of the above disclosed embodiments, thereby obtaining the target detection result of the target object.
- the process of the teacher's model lesson on the client can include:
- the client can display the four parts of the model lesson process through the display interface, namely: pre-class warm-up, knowledge teaching, in-class training, and classroom testing.
- each part corresponds to a tab in the display interface, and teachers can enter a part by tapping its tab.
- the server can collect the timestamp of each tab the teacher taps, so as to map the multimedia data recorded by the teacher to one or more of the four parts.
- the teacher can conduct a model lesson based on the existing verbatim manuscript (that is, the preset text data in the above disclosed embodiment).
- the verbatim manuscript can be a txt format text file
- the verbatim manuscript can contain instruction marks, including structured annotations as well as knowledge point and interaction annotations, so that the verbatim manuscript can be divided into the above four parts, and within each part the teacher can be prompted to interact at the appropriate positions, including voice content and interactive gestures.
- the structured annotations can divide the verbatim manuscript according to the different parts of the model lesson; for example, specific start and end instructions can be used in the verbatim manuscript to mark the start and end of the four parts (pre-class warm-up, knowledge teaching, in-class training, and classroom testing).
- the specific implementation form of the instruction mark of the structure label can refer to the above disclosed embodiment.
- Knowledge points and interactive annotations can mark the model lesson knowledge points and interactive positions in the verbatim manuscript.
- for example, <emphasis start> and <emphasis end> marks can be used to delimit key content, so as to facilitate the detection of the key paragraphs during the model lesson process; if interaction is required in the course of the model lesson, a <need to add interaction> mark can be used at the corresponding teaching content of the verbatim manuscript.
- the specific implementation form of the instruction mark of the knowledge point and the interactive mark can refer to the above-mentioned disclosed embodiment.
- after the teacher conducts a model lesson using the verbatim manuscript, the model lesson process can be recorded through the client, so as to obtain the teacher's multimedia data and upload it to the server.
- the process for the server to perform data processing on the multimedia data uploaded by the client may include:
- the server obtains the multimedia data to be processed by initiating a request.
- the request initiated by the server may include the URL link of the multimedia data (such as an MP4 file), the vendorID, the teacherID, the model lesson ID, the multimedia data structure (that is, the division of the multimedia data into parts according to the model lesson process, together with the start timestamp and end timestamp of each part), the video detection frame rate, etc.
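- a hypothetical request payload mirroring the fields listed above (the key names are illustrative, not taken from an actual interface):

```python
request = {
    "media_url": "https://example.com/lesson.mp4",  # URL of the multimedia data
    "vendorID": "vendor-001",
    "teacherID": "teacher-042",
    "lessonID": "lesson-007",
    "structure": [  # parts of the model lesson with start/end timestamps (s)
        {"part": "pre-class warm-up", "start": 0, "end": 300},
        {"part": "knowledge teaching", "start": 300, "end": 1500},
        {"part": "in-class training", "start": 1500, "end": 2100},
        {"part": "classroom testing", "start": 2100, "end": 2400},
    ],
    "detect_fps": 10,
}
```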
- the multimedia data includes video data; the resolution of the video may include multiple formats, such as 640p, 720p, or 1080p, and the audio data may include multiple audio sampling rates, such as 8000 Hz or 16000 Hz.
- the server can also obtain multimedia data (ie, video and audio data) in real time.
- Multimedia data preprocessing (such as video transcoding or audio transcoding, etc.):
- the server can separate the video stream and the audio stream from the obtained multimedia data, and respectively transcode them into formats supported by video detection, voice recognition or voice evaluation.
- the separated audio stream can be converted into pcm format (an uncompressed pcm file or a wav file), for example a mono format with a 16-bit sample width.
- the human action SDK's face detection, face tracking, face attribute and gesture detection models, and the insight SDK's human detection and hand-raising detection models can be called to perform multi-dimensional detection on video data.
- the multi-dimensional detection of video data may include gesture detection, emotion detection, and eye contact detection.
- gesture detection can reflect the degree of interaction of the teacher's model lesson.
- Gesture detection can support the detection of three kinds of gestures, namely: holding out a hand (inviting a student to answer a question), raising a hand (prompting students to answer a question), and raising a thumb (giving a like).
- the detection can be performed using a gesture detection neural network, so that the number of occurrences of each gesture and the timestamp of each gesture detection can be output.
- the specific implementation of the gesture detection can refer to the above-mentioned disclosed embodiments, and the rules for obtaining the intermediate detection result in the gesture detection dimension can refer to Table 1 and Table 2 in the above-mentioned disclosed embodiments, which will not be repeated here.
- Emotion detection can reflect the affinity of the teacher's model lesson, which can include two aspects, namely expression detection and smile detection.
- expression detection can be performed by an expression detection neural network.
- the expression detection result is output according to the emotion detection cycle (tentatively set to one minute).
- an exemplary rule can be: the expression detected the most times in the emotion detection cycle can be used as the expression detection result of the emotion detection cycle.
- smile detection can output smile detection results according to the emotion detection cycle (tentatively set to one minute) based on single-frame detection results.
- an exemplary rule can be: the arithmetic average of all single-frame smile detection results in the emotion detection cycle can be used as the smile detection result of the emotion detection cycle.
- eye contact detection can reflect the teacher's eye contact with the students in the course of the model lesson. It can include two aspects, namely face angle detection (head pose orientation) and closed-eye detection. Here, eye contact detection can be defined in terms of an eye contact event, face angle detection in terms of a viewing event, and closed-eye detection in terms of a closed-eye event.
- the eye contact event can be the intersection of a viewing event and a non-closed eye event.
- the start time of an eye contact event can be set as the first moment that is within the time range of a viewing event and not within the time range of a closed-eye event, and the end time of the eye contact event can be set as the end time of the viewing event or the start time of a closed-eye event, whichever comes first.
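- a sketch of deriving eye contact events as the parts of viewing events that lie outside all closed-eye events (intervals are (start, end) pairs in seconds):

```python
from typing import List, Tuple

Interval = Tuple[float, float]

def eye_contact_events(viewing: List[Interval],
                       closed_eye: List[Interval]) -> List[Interval]:
    """Subtract closed-eye intervals from each viewing interval; what
    remains are the eye contact events."""
    contact: List[Interval] = []
    for vs, ve in viewing:
        cursor = vs
        for cs, ce in sorted(closed_eye):
            if ce <= cursor or cs >= ve:
                continue                      # no overlap with remaining span
            if cs > cursor:
                contact.append((cursor, cs))  # contact until eyes close
            cursor = max(cursor, ce)
        if cursor < ve:
            contact.append((cursor, ve))      # contact until viewing ends
    return contact
```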
- the relevant recognition model for speech recognition can be called, and the audio data can be input to obtain the speech recognition result in real time, including the sentences in the audio data, the words in each sentence, and the start timestamp and duration of each sentence and each word.
- the sentence audio can be segmented based on the start timestamp and duration of each sentence in the speech recognition result, and the detection results of the sentence audio can be obtained and returned, including fluency, speech rate, pause, volume, etc.
- audio recognition and audio detection can reflect the intermediate detection results in the fluency, speech rate, and volume dimensions during the teacher's model lesson.
- audio detection can support Chinese speech recognition for the evaluation of non-English subject courses; it can also support speech recognition of mixed reading of Chinese and English for evaluation of English courses.
- audio recognition can call the neural network model related to speech recognition, and return the recognition result in real time.
- the recognition result is divided into sentences and words in the sentence.
- for audio detection, the sentences returned by speech recognition can be detected to obtain the detection results in the above dimensions; further, audio detection for paragraphs can also be added.
- the target detection result can include the overall target detection result and the subdivided target detection result.
- the overall target detection result can include: interaction, fluency, speech rate, and volume.
- the interaction can be further divided into gesture interaction, emotional interaction, and eye contact interaction.
- Fig. 3 shows a schematic diagram of a target detection result according to an application example of the present disclosure. It can be seen from the figure that the overall target detection result can include the overall score calculated based on the intermediate detection results of each dimension, as well as the score of the intermediate detection result of each dimension. It should be noted that Fig. 3 is only an exemplary schematic diagram of the target detection result; in the actual application process, the target detection result can be visually displayed in any form according to actual needs.
- the segmented target detection result may be the detection result output for each sentence based on speech recognition.
- the segmented target detection result may include: sentence ID, sentence text, sentence start timestamp, sentence duration, sentence fluency, sentence speech rate, sentence volume, sentence gestures (multiple gestures supported), sentence expression, sentence smile value, etc.
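- a sketch of such a per-sentence record as a data structure (the field names are illustrative):

```python
from dataclasses import dataclass, field
from typing import List

@dataclass
class SentenceResult:
    sentence_id: int
    text: str
    start_ts: float                    # seconds from the start of the lesson
    duration: float                    # seconds
    fluency: float
    speech_rate: float
    volume: float
    gestures: List[str] = field(default_factory=list)  # multiple supported
    expression: str = ""
    smile_value: float = 0.0
```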
- the system proposed in the application examples of the present disclosure can not only be applied to teacher model lesson analysis, but also be applied to other related fields, such as teacher formal teaching analysis, or trial lecture evaluation of teacher candidates.
- the writing order of the steps does not imply a strict execution order or constitute any limitation on the implementation process; the specific execution order of each step should be determined by its function and possible internal logic.
- the embodiments of the present disclosure also provide a computer-readable storage medium on which computer program instructions are stored, and the computer program instructions implement the foregoing method when executed by a processor.
- the computer-readable storage medium may be a volatile computer-readable storage medium or a non-volatile computer-readable storage medium.
- An embodiment of the present disclosure also provides an electronic device, including: a processor; and a memory for storing instructions executable by the processor; wherein the processor is configured to execute the above-mentioned method.
- the embodiment of the present disclosure also provides a computer program, including computer-readable code; when the computer-readable code runs in an electronic device, a processor in the electronic device executes it to implement the above method.
- the above-mentioned memory may be a volatile memory, such as RAM; or a non-volatile memory, such as ROM, flash memory, a hard disk drive (HDD), or a solid-state drive (SSD); or a combination of the above types of memory, and provides instructions and data to the processor.
- the foregoing processor may be at least one of an ASIC, a DSP, a DSPD, a PLD, an FPGA, a CPU, a controller, a microcontroller, or a microprocessor. It is understandable that, for different devices, the electronic device used to implement the above processor function may also be something else, which is not specifically limited in the embodiments of the present disclosure.
- the electronic device can be provided as a terminal, server or other form of device.
- the embodiment of the present disclosure also provides a computer program, which implements the foregoing method when the computer program is executed by a processor.
- FIG. 4 is a block diagram of an electronic device 800 according to an embodiment of the present disclosure.
- the electronic device 800 may be a mobile phone, a computer, a digital broadcasting terminal, a messaging device, a game console, a tablet device, a medical device, a fitness device, a personal digital assistant, and other terminals.
- the electronic device 800 may include one or more of the following components: a processing component 802, a memory 804, a power component 806, a multimedia component 808, an audio component 810, an input/output (I/O) interface 812, and a sensor component 814 , And communication component 816.
- the processing component 802 generally controls the overall operations of the electronic device 800, such as operations associated with display, telephone calls, data communications, camera operations, and recording operations.
- the processing component 802 may include one or more processors 820 to execute instructions to complete all or part of the steps of the foregoing method.
- the processing component 802 may include one or more modules to facilitate the interaction between the processing component 802 and other components.
- the processing component 802 may include a multimedia module to facilitate the interaction between the multimedia component 808 and the processing component 802.
- the memory 804 is configured to store various types of data to support operations in the electronic device 800. Examples of these data include instructions for any application or method to operate on the electronic device 800, contact data, phone book data, messages, pictures, videos, etc.
- the memory 804 can be implemented by any type of volatile or non-volatile storage device or a combination thereof, such as static random access memory (SRAM), electrically erasable programmable read-only memory (EEPROM), erasable programmable read-only memory (EPROM), programmable read-only memory (PROM), read-only memory (ROM), magnetic memory, flash memory, a magnetic disk, or an optical disk.
- the power supply component 806 provides power for various components of the electronic device 800.
- the power supply component 806 may include a power management system, one or more power supplies, and other components associated with the generation, management, and distribution of power for the electronic device 800.
- the multimedia component 808 includes a screen that provides an output interface between the electronic device 800 and the user.
- the screen may include a liquid crystal display (LCD) and a touch panel (TP). If the screen includes a touch panel, the screen may be implemented as a touch screen to receive input signals from the user.
- the touch panel includes one or more touch sensors to sense touch, sliding, and gestures on the touch panel. The touch sensor may not only sense the boundary of a touch or slide action, but also detect the duration and pressure related to the touch or slide operation.
- the multimedia component 808 includes a front camera and/or a rear camera. When the electronic device 800 is in an operation mode, such as a shooting mode or a video mode, the front camera and/or the rear camera can receive external multimedia data. Each front camera and rear camera can be a fixed optical lens system or have focal length and optical zoom capabilities.
- the audio component 810 is configured to output and/or input audio signals.
- the audio component 810 includes a microphone (MIC), and when the electronic device 800 is in an operation mode, such as a call mode, a recording mode, and a voice recognition mode, the microphone is configured to receive an external audio signal.
- the received audio signal may be further stored in the memory 804 or transmitted via the communication component 816.
- the audio component 810 further includes a speaker for outputting audio signals.
- the I/O interface 812 provides an interface between the processing component 802 and a peripheral interface module.
- the above-mentioned peripheral interface module may be a keyboard, a click wheel, a button, and the like. These buttons may include, but are not limited to: home button, volume button, start button, and lock button.
- the sensor component 814 includes one or more sensors for providing the electronic device 800 with various aspects of state evaluation.
- the sensor component 814 can detect the on/off status of the electronic device 800 and the relative positioning of components, for example, the display and the keypad of the electronic device 800; the sensor component 814 can also detect a position change of the electronic device 800 or of a component of the electronic device 800, the presence or absence of contact between the user and the electronic device 800, the orientation or acceleration/deceleration of the electronic device 800, and the temperature change of the electronic device 800.
- the sensor component 814 may include a proximity sensor configured to detect the presence of nearby objects without any physical contact.
- the sensor component 814 may also include a light sensor, such as a CMOS or CCD image sensor, for use in imaging applications.
- the sensor component 814 may also include an acceleration sensor, a gyroscope sensor, a magnetic sensor, a pressure sensor, or a temperature sensor.
- the communication component 816 is configured to facilitate wired or wireless communication between the electronic device 800 and other devices.
- the electronic device 800 can access a wireless network based on a communication standard, such as WiFi, 2G, 3G, 4G, or 5G, or a combination thereof.
- the communication component 816 receives a broadcast signal or broadcast-related information from an external broadcast management system via a broadcast channel.
- the communication component 816 further includes a near field communication (NFC) module to facilitate short-range communication.
- the NFC module can be implemented based on radio frequency identification (RFID) technology, infrared data association (IrDA) technology, ultra-wideband (UWB) technology, Bluetooth (BT) technology and other technologies.
- the electronic device 800 may be implemented by one or more application-specific integrated circuits (ASIC), digital signal processors (DSP), digital signal processing devices (DSPD), programmable logic devices (PLD), field programmable gate arrays (FPGA), controllers, microcontrollers, microprocessors, or other electronic components, to implement the above methods.
- a non-volatile computer-readable storage medium is also provided, such as the memory 804 including computer program instructions, which can be executed by the processor 820 of the electronic device 800 to complete the foregoing method.
- FIG. 5 is a block diagram of an electronic device 1900 according to an embodiment of the present disclosure.
- the electronic device 1900 may be provided as a server. Referring to FIG. 5,
- the electronic device 1900 includes a processing component 1922, which further includes one or more processors, and a memory resource represented by a memory 1932, for storing instructions executable by the processing component 1922, such as application programs.
- the application program stored in the memory 1932 may include one or more modules each corresponding to a set of instructions.
- the processing component 1922 is configured to execute instructions to perform the above-described methods.
- the electronic device 1900 may also include a power supply component 1926 configured to perform power management of the electronic device 1900, a wired or wireless network interface 1950 configured to connect the electronic device 1900 to a network, and an input/output (I/O) interface 1958.
- the electronic device 1900 can operate based on an operating system stored in the memory 1932, such as Windows ServerTM, Mac OS XTM, UnixTM, LinuxTM, FreeBSDTM or the like.
- a non-volatile computer-readable storage medium is also provided, such as the memory 1932 including computer program instructions, which can be executed by the processing component 1922 of the electronic device 1900 to complete the foregoing method.
- the present disclosure may be a system, method and/or computer program product.
- the computer program product may include a computer-readable storage medium loaded with computer-readable program instructions for enabling a processor to implement various aspects of the present disclosure.
- the computer-readable storage medium may be a tangible device that can hold and store instructions used by the instruction execution device.
- the computer-readable storage medium may be, for example, but not limited to, an electrical storage device, a magnetic storage device, an optical storage device, an electromagnetic storage device, a semiconductor storage device, or any suitable combination of the foregoing.
- a non-exhaustive list of computer-readable storage media includes: a portable computer disk, a hard disk, random access memory (RAM), read-only memory (ROM), erasable programmable read-only memory (EPROM or flash memory), static random access memory (SRAM), portable compact disk read-only memory (CD-ROM), digital versatile disk (DVD), a memory stick, a floppy disk, a mechanical encoding device such as a punch card or a raised structure in a groove on which instructions are stored, and any suitable combination of the foregoing.
- the computer-readable storage medium used here is not to be interpreted as a transient signal itself, such as a radio wave or other freely propagating electromagnetic wave, an electromagnetic wave propagating through a waveguide or other transmission medium (for example, a light pulse through a fiber optic cable), or an electrical signal transmitted through a wire.
- the computer-readable program instructions described herein can be downloaded from a computer-readable storage medium to various computing/processing devices, or downloaded to an external computer or external storage device via a network, such as the Internet, a local area network, a wide area network, and/or a wireless network.
- the network may include copper transmission cables, optical fiber transmission, wireless transmission, routers, firewalls, switches, gateway computers, and/or edge servers.
- the network adapter card or network interface in each computing/processing device receives computer-readable program instructions from the network, and forwards the computer-readable program instructions for storage in the computer-readable storage medium in each computing/processing device .
- the computer program instructions used to perform the operations of the present disclosure may be assembly instructions, instruction set architecture (ISA) instructions, machine instructions, machine-related instructions, microcode, firmware instructions, state setting data, or source code or object code written in any combination of one or more programming languages, including object-oriented programming languages such as Smalltalk and C++, and conventional procedural programming languages such as the "C" language or similar programming languages.
- Computer-readable program instructions can be executed entirely on the user's computer, partly on the user's computer, executed as a stand-alone software package, partly on the user's computer and partly executed on a remote computer, or entirely on the remote computer or server implement.
- the remote computer can be connected to the user's computer through any kind of network, including a local area network (LAN) or a wide area network (WAN), or it can be connected to an external computer (for example, through the Internet using an Internet service provider).
- in some embodiments, an electronic circuit, such as a programmable logic circuit, a field programmable gate array (FPGA), or a programmable logic array (PLA), can be personalized by using the state information of the computer-readable program instructions, and the electronic circuit can execute the computer-readable program instructions to implement various aspects of the present disclosure.
- These computer-readable program instructions can be provided to the processor of a general-purpose computer, a special-purpose computer, or other programmable data processing apparatus, thereby producing a machine such that, when these instructions are executed by the processor of the computer or other programmable data processing apparatus, an apparatus that implements the functions/actions specified in one or more blocks of the flowcharts and/or block diagrams is produced. These computer-readable program instructions can also be stored in a computer-readable storage medium; these instructions make computers, programmable data processing apparatuses, and/or other devices work in a specific manner, so that the computer-readable medium storing the instructions includes an article of manufacture that includes instructions implementing various aspects of the functions/actions specified in one or more blocks of the flowcharts and/or block diagrams.
- Each block in the flowcharts or block diagrams may represent a module, program segment, or portion of instructions, which contains one or more executable instructions for implementing the specified logical function. In some alternative implementations, the functions noted in the blocks may also occur in an order different from that marked in the drawings. For example, two consecutive blocks may actually be executed substantially in parallel, or sometimes in the reverse order, depending on the functions involved.
- Each block of the block diagrams and/or flowcharts, and combinations of blocks in the block diagrams and/or flowcharts, can be implemented by a dedicated hardware-based system that performs the specified functions or actions, or by a combination of dedicated hardware and computer instructions.
Landscapes
- Engineering & Computer Science (AREA)
- Human Computer Interaction (AREA)
- Physics & Mathematics (AREA)
- Multimedia (AREA)
- Health & Medical Sciences (AREA)
- General Physics & Mathematics (AREA)
- Theoretical Computer Science (AREA)
- General Health & Medical Sciences (AREA)
- Computer Vision & Pattern Recognition (AREA)
- Psychiatry (AREA)
- Social Psychology (AREA)
- Computational Linguistics (AREA)
- Signal Processing (AREA)
- Audiology, Speech & Language Pathology (AREA)
- Acoustics & Sound (AREA)
- Image Analysis (AREA)
- Television Signal Processing For Recording (AREA)
- Two-Way Televisions, Distribution Of Moving Picture Or The Like (AREA)
Abstract
Description
Rule | Default value | Remarks |
---|---|---|
Emotion scoring period | 1 minute | |
Emotion score | 10-point scale | The emotion score is the average of the expression score and the smile score. |
Expression score rule | 10-point scale | One minute has 60 seconds, corresponding to 60 expression detections, each assigned a value: happy 10 points, calm 5 points, other 0 points. The average of all 60 scores is the affinity score for that minute. |
Smile score rule | 10-point scale | The average of all 60 scores is the smile score for that minute. |
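A minimal sketch of the scoring rules above, assuming one expression label and one 10-point smile intensity per second; all names here are illustrative, not from the disclosure:

```python
# Illustrative only: per-second detector outputs for one one-minute period.
EXPRESSION_POINTS = {"happy": 10, "calm": 5}  # any other expression scores 0


def expression_score(labels):
    """Average of the per-second expression scores (10-point scale)."""
    return sum(EXPRESSION_POINTS.get(label, 0) for label in labels) / len(labels)


def emotion_score(labels, smile_intensities):
    """Emotion score = mean of the expression score and the smile score."""
    smile = sum(smile_intensities) / len(smile_intensities)
    return (expression_score(labels) + smile) / 2


# Example: a minute of mostly calm expressions with moderate smiles.
labels = ["happy"] * 10 + ["calm"] * 40 + ["other"] * 10
smiles = [6.0] * 60
print(emotion_score(labels, smiles))  # expression 5.0 averaged with smile 6.0 -> 5.5
```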
Claims (16)
- A data processing method, characterized by comprising: acquiring multimedia data of a target object; performing, according to the multimedia data, behavior state detection on the target object in at least one detection dimension to obtain an intermediate detection result of the target object in the at least one detection dimension; and processing the intermediate detection result in the at least one detection dimension to obtain a target detection result of the target object, wherein the target detection result is used to represent the behavior state of the target object. (An illustrative sketch of this pipeline follows the claims list.)
- The method according to claim 1, characterized in that the multimedia data comprises video data; and performing, according to the multimedia data, behavior state detection on the target object in at least one detection dimension to obtain the intermediate detection result of the target object in the at least one detection dimension comprises: determining the target object in the video data; and performing behavior state detection on the target object in at least one of the gesture, emotion, and eye-contact detection dimensions to obtain the intermediate detection result of the target object in the at least one detection dimension.
- The method according to claim 2, characterized in that the at least one detection dimension comprises a gesture detection dimension, and the intermediate detection result of the target object in the gesture detection dimension is obtained through the following steps: acquiring, according to the video data, the number of times the target object performs at least one target gesture within a gesture detection period to obtain a gesture detection result for the gesture detection period, wherein the target gesture comprises one or more of holding out a hand, raising a hand, and giving a thumbs-up; and obtaining the intermediate detection result of the target object in the gesture detection dimension according to the gesture detection result of at least one gesture detection period.
- The method according to claim 3, characterized in that acquiring, according to the video data, the number of times the target object performs at least one target gesture within the gesture detection period comprises: acquiring at least one gesture detection frame sequence of the video data within the gesture detection period; in a case where the number of frames containing the target gesture in the gesture detection frame sequence exceeds a first threshold, recording at least one frame of the gesture detection frame sequence as a gesture start frame; in a case where the number of frames not containing the target gesture in the gesture detection frame sequence following the gesture start frame exceeds a second threshold, recording at least one frame of the gesture detection frame sequence following the gesture start frame as a gesture end frame; and obtaining, according to the numbers of gesture start frames and gesture end frames, the number of times the target object performs at least one target gesture within the gesture detection period. (An illustrative gesture-counting sketch follows the claims list.)
- The method according to any one of claims 2 to 4, characterized in that the at least one detection dimension comprises an emotion detection dimension, and the intermediate detection result of the target object in the emotion detection dimension is obtained through the following steps: acquiring, according to the video data, an expression detection result and/or a smile detection result of the target object within an emotion detection period, wherein the expression detection result comprises an emotion result determined based on the expression of the target object, and the smile detection result comprises the smile intensity of the target object; and obtaining the intermediate detection result of the target object in the emotion detection dimension according to the expression detection result and/or smile detection result of the target object in at least one emotion detection period.
- The method according to claim 5, characterized in that acquiring, according to the video data, the expression detection result of the target object within the emotion detection period comprises: performing expression detection on the target object within the emotion detection period, and determining the number of times the target object displays at least one target expression to obtain the expression detection result, wherein the target expression comprises one or more of happy, calm, and other.
- The method according to claim 5 or 6, characterized in that acquiring, according to the video data, the smile detection result of the target object within the emotion detection period comprises: performing, within the emotion detection period, smile detection on the target object according to at least one frame of the video data to obtain a smile detection result corresponding to the at least one frame; and determining the smile detection result of the target object within the emotion detection period according to the smile detection result corresponding to the at least one frame.
- The method according to any one of claims 3 to 7, characterized in that the at least one detection dimension comprises an eye-contact detection dimension, and the intermediate detection result of the target object in the eye-contact detection dimension is obtained through the following steps: performing face angle detection on the target object according to the video data, and determining the time during which the face angle of the target object is within a face angle threshold as a face angle detection result; performing closed-eye detection on the target object according to the video data, and determining the time during which the target object performs a closed-eye operation as a closed-eye detection result; determining, according to the face angle detection result and the closed-eye detection result, the length of time during which the face angle of the target object is within the face angle threshold and no closed-eye operation is performed; and obtaining the intermediate detection result of the target object in the eye-contact detection dimension according to the length of time. (An illustrative eye-contact sketch follows the claims list.)
- The method according to any one of claims 2 to 8, characterized in that the multimedia data comprises audio data; and performing, according to the multimedia data, behavior state detection on the target object in at least one detection dimension to obtain the intermediate detection result of the target object in the at least one detection dimension comprises: segmenting the audio data by sentence to obtain at least one piece of audio sub-data; and performing, on the at least one piece of audio sub-data, behavior state detection in at least one of the fluency, speech rate, pause, and volume detection dimensions to obtain the intermediate detection result of the target object in the at least one detection dimension. (An illustrative audio-segmentation sketch follows the claims list.)
- The method according to any one of claims 1 to 9, characterized in that processing the intermediate detection result in the at least one detection dimension to obtain the target detection result of the target object comprises: merging the intermediate detection results of at least one detection dimension according to preset weights of the detection dimensions to obtain the target detection result of the target object. (An illustrative weighted-merge sketch follows the claims list.)
- The method according to claim 9, characterized in that processing the intermediate detection result in the at least one detection dimension to obtain the target detection result of the target object comprises: determining, according to the time of the audio sub-data within the audio data, video sub-data corresponding to the audio sub-data from the video data included in the multimedia data; merging, according to preset weights, the intermediate detection result of the audio sub-data in at least one detection dimension with the intermediate detection result of the corresponding video sub-data in at least one detection dimension to obtain a target detection result of at least one piece of the audio sub-data or the video sub-data; and merging the target detection results of the at least one piece of audio sub-data or video sub-data to obtain the target detection result of the target object. (An illustrative audio-video alignment sketch follows the claims list.)
- The method according to any one of claims 1 to 11, characterized in that the multimedia data is obtained through the target object performing a teaching operation according to preset text data, wherein the preset text data comprises at least one instruction mark, and the instruction mark is used to divide and/or annotate at least part of the content of the preset text data. (An illustrative instruction-mark example follows the claims list.)
- A data processing apparatus, characterized by comprising: an acquisition module configured to acquire multimedia data of a target object; a detection module configured to perform, according to the multimedia data, behavior state detection on the target object in at least one detection dimension to obtain an intermediate detection result of the target object in the at least one detection dimension; and a processing module configured to process the intermediate detection result in the at least one detection dimension to obtain a target detection result of the target object, wherein the target detection result is used to represent the behavior state of the target object.
- An electronic device, characterized by comprising: a processor; and a memory for storing processor-executable instructions, wherein the processor is configured to invoke the instructions stored in the memory to perform the method according to any one of claims 1 to 12.
- A computer-readable storage medium having computer program instructions stored thereon, characterized in that the computer program instructions, when executed by a processor, implement the method according to any one of claims 1 to 12.
- A computer program, comprising computer-readable code, wherein when the computer-readable code runs in an electronic device, a processor in the electronic device executes the method according to any one of claims 1 to 12.
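The following sketches are editorial illustrations of the claimed processing, not the applicant's implementation; every function, parameter, and variable name is an assumption. First, the overall pipeline of claim 1, with the per-dimension detectors and the merge step injected as callables:

```python
def process(multimedia_data, detectors, merge):
    """Claim-1 pipeline sketch: per-dimension behavior state detection,
    then processing of the intermediate results into a target result.

    detectors: mapping from detection-dimension name to a callable that
    returns an intermediate detection result for that dimension.
    merge: callable turning the per-dimension intermediate results into
    the target detection result representing the behavior state.
    """
    intermediate = {dim: detect(multimedia_data) for dim, detect in detectors.items()}
    return merge(intermediate)
```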
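One possible reading of the gesture counting in claim 4, treating the two thresholds as counts of consecutive frames (an assumption; the claim leaves the exact windowing open):

```python
def count_target_gestures(frame_has_gesture, first_threshold, second_threshold):
    """Count target-gesture occurrences within one gesture detection period.

    frame_has_gesture: per-frame booleans from a per-frame gesture detector.
    A run of more than `first_threshold` gesture frames records a gesture
    start frame; a following run of more than `second_threshold`
    gesture-free frames records a gesture end frame.
    """
    count, in_gesture, run = 0, False, 0
    for present in frame_has_gesture:
        # While outside a gesture, count consecutive gesture frames;
        # while inside one, count consecutive gesture-free frames.
        run = run + 1 if present != in_gesture else 0
        threshold = second_threshold if in_gesture else first_threshold
        if run > threshold:
            if not in_gesture:
                count += 1  # a gesture start frame was recorded
            in_gesture, run = not in_gesture, 0
    return count
```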
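A sketch of the eye-contact measure of claim 8, assuming per-frame face angles and closed-eye flags at a known frame rate:

```python
def eye_contact_duration(face_angles, eyes_closed, angle_threshold, fps):
    """Seconds during which the face angle stays within the threshold and
    no closed-eye operation is detected; inputs are per-frame values."""
    qualifying_frames = sum(
        1 for angle, closed in zip(face_angles, eyes_closed)
        if abs(angle) <= angle_threshold and not closed
    )
    return qualifying_frames / fps
```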
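A sketch of the audio path of claim 9, with sentence segmentation and the per-dimension detectors left as injected callables, since the claim does not fix them:

```python
def audio_intermediate_results(audio_data, split_sentences, detectors):
    """Segment audio by sentence, then run fluency / speech-rate / pause /
    volume detection on each audio sub-datum.

    split_sentences: callable producing audio sub-data, one per sentence.
    detectors: mapping from dimension name to a callable on one sub-datum.
    """
    return [
        {dim: detect(sub) for dim, detect in detectors.items()}
        for sub in split_sentences(audio_data)
    ]
```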
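Claim 10's merge step, read as a weighted combination over dimensions and assuming numeric intermediate results:

```python
def merge_by_preset_weights(intermediate, preset_weights):
    """Combine per-dimension intermediate results using preset weights."""
    total = sum(preset_weights[dim] for dim in intermediate)
    return sum(
        preset_weights[dim] * score for dim, score in intermediate.items()
    ) / total
```

For example, with `intermediate = {"gesture": 8.0, "emotion": 5.5}` and `preset_weights = {"gesture": 1, "emotion": 3}`, the target result is (8.0 + 16.5) / 4 = 6.125.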
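Claim 11 aligns each audio sub-datum with the video covering the same time span before merging. A sketch under the assumptions that segments expose start/end times and that per-segment results are numeric:

```python
def merge_audio_video(audio_segments, video_for_span, score_audio, score_video,
                      audio_weight, video_weight):
    """Pair each audio sub-datum with its co-timed video sub-datum, merge the
    two scores by preset weights, then average the per-segment results."""
    per_segment = []
    for seg in audio_segments:  # each seg carries .start and .end in seconds
        video_sub = video_for_span(seg.start, seg.end)
        merged = audio_weight * score_audio(seg) + video_weight * score_video(video_sub)
        merged /= audio_weight + video_weight
        per_segment.append(merged)
    # Assumes at least one segment; the final merge here is a plain average.
    return sum(per_segment) / len(per_segment)
```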
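Claim 12 leaves the instruction-mark syntax open; purely as an invented illustration, preset text data whose marks divide and annotate content might look like:

```python
# Hypothetical marker syntax, not taken from the disclosure.
PRESET_TEXT = (
    "<section name='warm-up'>"
    "Hello everyone, let's begin."
    "<mark gesture='raise_hand'>Raise your hand if you are ready.</mark>"
    "</section>"
    "<section name='lesson'>Today we practice fractions.</section>"
)
```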
Priority Applications (3)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
KR1020217024179A KR20210134614A (ko) | 2020-04-26 | 2020-12-18 | Data processing method and apparatus, electronic device and storage medium |
JP2021544171A JP2022534345A (ja) | 2020-04-26 | 2020-12-18 | Data processing method and apparatus, electronic device and storage medium |
SG11202109528SA SG11202109528SA (en) | 2020-04-26 | 2020-12-18 | Data processing method and apparatus, electronic device and storage medium |
Applications Claiming Priority (2)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202010339381.1A CN111539339A (zh) | 2020-04-26 | 2020-04-26 | Data processing method and apparatus, electronic device and storage medium |
CN202010339381.1 | 2020-04-26 |
Publications (1)
Publication Number | Publication Date |
---|---|
WO2021218194A1 true WO2021218194A1 (zh) | 2021-11-04 |
Family
ID=71967577
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
PCT/CN2020/137678 WO2021218194A1 (zh) | 2020-04-26 | 2020-12-18 | 数据处理方法及装置、电子设备和存储介质 |
Country Status (6)
Country | Link |
---|---|
JP (1) | JP2022534345A (zh) |
KR (1) | KR20210134614A (zh) |
CN (1) | CN111539339A (zh) |
SG (1) | SG11202109528SA (zh) |
TW (1) | TW202141240A (zh) |
WO (1) | WO2021218194A1 (zh) |
Families Citing this family (2)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN111539339A (zh) * | 2020-04-26 | 2020-08-14 | 北京市商汤科技开发有限公司 | Data processing method and apparatus, electronic device and storage medium |
CN112883782B (zh) * | 2021-01-12 | 2023-03-24 | 上海肯汀通讯科技有限公司 | Delivery behavior recognition method, apparatus, device and storage medium |
Family Cites Families (6)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
KR101731461B1 (ko) * | 2015-12-09 | 2017-05-11 | 고려대학교 산학협력단 | Apparatus for detecting behavior of an object and behavior detection method using the same |
US20180218308A1 (en) * | 2017-01-31 | 2018-08-02 | International Business Machines Corporation | Modeling employee productivity based on speech and ambient noise monitoring |
CN109766770A (zh) * | 2018-12-18 | 2019-05-17 | 深圳壹账通智能科技有限公司 | Service quality evaluation method, apparatus, computer device and storage medium |
CN110378228A (zh) * | 2019-06-17 | 2019-10-25 | 深圳壹账通智能科技有限公司 | Interview video data processing method, apparatus, computer device and storage medium |
CN110443487A (zh) * | 2019-07-31 | 2019-11-12 | 浙江工商职业技术学院 | Teaching evaluation method and device |
CN110968239B (zh) * | 2019-11-28 | 2022-04-05 | 北京市商汤科技开发有限公司 | Control method, apparatus and device for a displayed object, and storage medium |
-
2020
- 2020-04-26 CN CN202010339381.1A patent/CN111539339A/zh active Pending
- 2020-12-18 SG SG11202109528SA patent/SG11202109528SA/en unknown
- 2020-12-18 WO PCT/CN2020/137678 patent/WO2021218194A1/zh active Application Filing
- 2020-12-18 KR KR1020217024179A patent/KR20210134614A/ko unknown
- 2020-12-18 JP JP2021544171A patent/JP2022534345A/ja active Pending
-
2021
- 2021-01-11 TW TW110100963A patent/TW202141240A/zh unknown
Patent Citations (5)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US7512537B2 (en) * | 2005-03-22 | 2009-03-31 | Microsoft Corporation | NLP tool to dynamically create movies/animated scenes |
CN102523502A (zh) * | 2011-12-15 | 2012-06-27 | 四川长虹电器股份有限公司 | Smart television interaction system and interaction method |
CN110598632A (zh) * | 2019-09-12 | 2019-12-20 | 深圳市商汤科技有限公司 | Target object monitoring method and apparatus, electronic device and storage medium |
CN111046819A (zh) * | 2019-12-18 | 2020-04-21 | 浙江大华技术股份有限公司 | Behavior recognition processing method and apparatus |
CN111539339A (zh) * | 2020-04-26 | 2020-08-14 | 北京市商汤科技开发有限公司 | Data processing method and apparatus, electronic device and storage medium |
Cited By (2)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN115953715A (zh) * | 2022-12-22 | 2023-04-11 | 北京字跳网络技术有限公司 | Video detection method, apparatus, device and storage medium |
CN115953715B (zh) * | 2022-12-22 | 2024-04-19 | 北京字跳网络技术有限公司 | Video detection method, apparatus, device and storage medium |
Also Published As
Publication number | Publication date |
---|---|
JP2022534345A (ja) | 2022-07-29 |
SG11202109528SA (en) | 2021-12-30 |
TW202141240A (zh) | 2021-11-01 |
CN111539339A (zh) | 2020-08-14 |
KR20210134614A (ko) | 2021-11-10 |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
WO2021232775A1 (zh) | Video processing method and apparatus, electronic device and storage medium | |
WO2021218194A1 (zh) | Data processing method and apparatus, electronic device and storage medium | |
WO2020215966A1 (zh) | Remote teaching interaction method, server, terminal and system | |
CN112287844B (zh) | Learning situation analysis method and apparatus, electronic device and storage medium | |
US10395545B2 (en) | Analyzing speech delivery | |
US10614298B2 (en) | Generating auxiliary information for a media presentation | |
CN108875785B (zh) | Attention detection method and apparatus based on behavioral feature comparison | |
RU2615632C2 (ru) | Method and device for recognition of communication messages | |
US20190244427A1 (en) | Switching realities for better task efficiency | |
CN109191940B (zh) | Smart-device-based interaction method and smart device | |
US20190147760A1 (en) | Cognitive content customization | |
CN109191939B (zh) | Smart-device-based three-dimensional projection interaction method and smart device | |
CN111833861A (zh) | Artificial intelligence-based event evaluation report generation | |
US20230222932A1 (en) | Methods, systems, and media for context-aware estimation of student attention in online learning | |
US20210225185A1 (en) | Method and apparatus for determining key learning content, device and storage medium | |
EP4075411A1 (en) | Device and method for providing interactive audience simulation | |
Nasereddin | MMLSL: modelling mobile learning for sign language | |
CN113591678B (zh) | Classroom attention determination method, apparatus, device, storage medium and program product | |
CN113391745A (zh) | Method, apparatus, device and storage medium for processing key content of an online course | |
WO2023279699A1 (zh) | Experiment generation method and apparatus, electronic device, storage medium, and program | |
Hirt et al. | Measuring emotions during learning: lack of coherence between automated facial emotion recognition and emotional experience | |
WO2023079370A1 (en) | System and method for enhancing quality of a teaching-learning experience | |
CN113409766A (zh) | Recognition method, apparatus, device for recognition, and speech synthesis method | |
CN111144255B (zh) | Method and apparatus for analyzing teachers' non-verbal behavior | |
Sümer et al. | Estimating Presentation Competence using Multimodal Nonverbal Behavioral Cues |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
ENP | Entry into the national phase |
Ref document number: 2021544171 Country of ref document: JP Kind code of ref document: A |
|
121 | Ep: the epo has been informed by wipo that ep was designated in this application |
Ref document number: 20933838 Country of ref document: EP Kind code of ref document: A1 |
|
NENP | Non-entry into the national phase |
Ref country code: DE |
|
122 | Ep: pct application non-entry in european phase |
Ref document number: 20933838 Country of ref document: EP Kind code of ref document: A1 |
|
WWE | Wipo information: entry into national phase |
Ref document number: 521430564 Country of ref document: SA |