WO2020007097A1 - Data processing method, storage medium and electronic device - Google Patents

Data processing method, storage medium and electronic device

Info

Publication number
WO2020007097A1
Authority
WO
WIPO (PCT)
Prior art keywords
information
structured
evaluation
data
video data
Prior art date
Application number
PCT/CN2019/083368
Other languages
English (en)
French (fr)
Inventor
沈亮
张连杰
赵明明
张保福
王正博
Original Assignee
北京大米科技有限公司
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Priority claimed from CN201810718955.9A (CN108898115B)
Priority claimed from CN201810759328.XA (CN109063587B)
Application filed by 北京大米科技有限公司
Publication of WO2020007097A1

Classifications

    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00Pattern recognition

Definitions

  • the present invention relates to data processing and machine learning technologies, and in particular to a data processing method, a storage medium, and an electronic device, and more particularly to a method and related apparatus for evaluating a learner's online learning effect or an online teaching state based on video data and audio data.
  • Knowledge providers or knowledge sharers (also referred to as instructors) can communicate and interact with learners in real time over the network.
  • When understanding and evaluating the learning effect of online teaching, one usually has to rely on manual evaluation by the instructor and manual feedback from the learner, or on test-based means such as knowledge-point tests.
  • However, knowledge-point tests evaluate along a single dimension, while manual evaluation and feedback lack objectivity.
  • When understanding and evaluating how online teaching is delivered, one usually has to rely on manual video review or online supervision. However, if the number of online classrooms is large, a large amount of audio and video data is involved; manual methods consume substantial human resources and may not be feasible at all.
  • In view of this, embodiments of the present invention provide a data processing method, a storage medium, and an electronic device to automatically process video data and audio data recorded online, and to perform a relatively accurate automated assessment of a learner's learning effect during online teaching or of the online teaching itself.
  • According to a first aspect of the embodiments of the present invention, a data processing method is provided, which includes:
  • extracting first structured information from video data, where the video data is a learner video recorded during online teaching, and the first structured information includes face information and/or action information in the video data;
  • extracting second structured information from audio data corresponding to the video data, where the second structured information includes speech recognition information in the audio data; and
  • obtaining a first evaluation parameter according to the first structured information and the second structured information.
  • According to a second aspect, a computer-readable storage medium is provided, on which computer program instructions are stored, where the computer program instructions, when executed by a processor, implement the method according to the first aspect.
  • According to a third aspect, an electronic device is provided, including a memory and a processor, where the memory is used to store one or more computer program instructions, and the one or more computer program instructions are executed by the processor to implement the method described in the first aspect.
  • In the technical solutions of the embodiments of the present invention, the first structured information and the second structured information are respectively extracted from the recorded video data and corresponding audio data to obtain the learner's performance information or the instructor's state, and the online teaching is evaluated based on the obtained performance information or instructor state to obtain a first evaluation parameter. As a result, massive amounts of online teaching video data and audio data can be evaluated quickly, objectively, and accurately.
  • FIG. 1 is a schematic diagram of an online teaching system to which a data processing method according to an embodiment of the present invention is applicable;
  • FIG. 2 is a schematic diagram of an interface of a client application of an online teaching system according to an embodiment of the present invention
  • FIG. 3 is a flowchart of a data processing method according to the first embodiment of the present invention;
  • FIG. 4 is a flowchart of obtaining evaluation parameters in the method of the first embodiment of the present invention;
  • FIG. 5 is a data flow diagram of a data processing method according to a second embodiment of the present invention;
  • FIG. 6 is a flowchart of extracting first structured information in combination with courseware operation data according to the second embodiment of the present invention;
  • FIG. 7 is a flowchart of extracting second structured information in combination with courseware operation data according to the second embodiment of the present invention;
  • FIG. 8 is a flowchart of a data processing method according to a third embodiment of the present invention.
  • FIG. 9 is a schematic diagram of an electronic device according to a fourth embodiment of the present invention.
  • FIG. 1 is a schematic diagram of an online teaching system to which a data processing method according to an embodiment of the present invention is applicable.
  • the online teaching system includes a first client 1, a second client 2, and a server 3.
  • the first client 1, the second client 2, and the server 3 are connected through a network communication.
  • the first client 1 and the second client 2 can directly or indirectly establish a communication connection through the server 3 to perform online teaching activities after real-time communication.
  • the first client 1 may be operated by a teacher.
  • the second client 2 may be operated by a learner.
  • the server 3 forms a communication connection with the first client 1 and the second client 2 at the same time, and stores the data exchanged between the two.
  • the first client 1 and the second client 2 can access the server 3 to obtain courseware data for display, thereby implementing online teaching based on the courseware.
  • the content of the courseware displayed by the first client 1 and the second client 2 changes synchronously, so that the teacher and the learner can synchronously communicate based on the same part of the courseware.
  • the first client 1 and the second client 2 may be any general-purpose data processing device running a predetermined computer application program, such as a desktop computer, a portable computer, a tablet computer, a smart phone, or the like.
  • the server 3 is a high-performance data processing device for running a predetermined computer application program.
  • the server 3 may be a server, a server cluster deployed in a distributed manner, or a virtual server cluster deployed in a virtual machine or container manner. It should be understood that in the online teaching system according to the embodiment of the present invention, a large number of first clients 1 establish communication connections with the second client 2 in a one-to-one, one-to-many, or many-to-many manner to communicate.
  • FIG. 2 is a schematic diagram of an interface of a client application of an online teaching system according to an embodiment of the present invention.
  • the client application in this embodiment may display the courseware in the main window 21 of the application interface, and display a real-time image collected by the counterpart image acquisition device in the sub-window 22 of the application interface.
  • Usually, a video of the counterpart's upper body is displayed in the sub-window 22 of the application interface.
  • both communicating parties can see the status of the courseware and the other party at the same time.
  • The courseware content displayed in the main window is switched, or has trajectories drawn on it, under the control of operations at the instructor's end.
  • the teacher on the first client 1 may perform page switching (ie, page turning) on the courseware or perform trajectory operations on the content of the courseware.
  • A trajectory operation refers to marking content or drawing images by means of a trajectory on the courseware.
  • For example, the instructor can highlight certain courseware content with lines or circles, and can also draw graphics or text as trajectories through handwriting or mouse operations.
  • When understanding and evaluating the learning effect of online teaching, the server 3 may record the collected video data of the instructor and the video data of the learner.
  • The server 3 can also record the audio data of the instructor during the entire teaching process and the audio data of the learner during the teaching process.
  • the audio data includes corresponding voice information. Therefore, the video data and audio data of the learner recorded by the server 3 can be processed to automatically evaluate the learning effect of the learner's online learning.
  • the classroom performance of the learner can be reflected in two aspects: the learner's facial expression (visual performance) and the process of voice communication with the teacher (audio performance).
  • Video can make learners have a face-to-face communication experience.
  • learners' facial (face) expressions can convey their feedback on the content being explained in various teaching scenarios. For example, if the learner's facial expression is a positive expression such as a smile or concentration, it indicates that the learner has a better learning motivation in the classroom.
  • During online teaching, if the learner performs well in class, this is reflected in more frequent communication with the instructor and a longer learner speech duration in the audio data. Thus, the learner's performance information can be extracted from the video data and audio data.
  • the server 3 can record all courseware operations (including page switching operations and trajectory operations) applied by the teacher on the first client 1 during the teaching process.
  • The server 3 can also record the audio data of the instructor during the entire teaching process and the audio data of the learner during the teaching process.
  • the audio data includes corresponding voice information. Therefore, the video data and audio data of the teacher recorded by the server 3 can be processed to automatically evaluate the situation of the teaching process.
  • During online teaching, learners mainly obtain information from three dimensions: first, the content of the courseware; second, the voice of the instructor's explanation; and third, the instructor's video.
  • Video can make learners have a face-to-face communication experience.
  • For language or music learning scenarios, learners can learn pronunciation skills from the instructor's mouth shape in the video.
  • In various teaching scenarios, the instructor's facial expressions and movements can convey to the learner information that speech cannot, enliven the teaching atmosphere, and improve teaching quality.
  • FIG. 3 is a flowchart of a data processing method according to the first embodiment of the present invention.
  • the method of this embodiment is suitable for understanding and evaluating the teaching situation of online teaching.
  • the data processing method in this embodiment includes the following steps:
  • step S100 first structured information is extracted from the video data.
  • the video data is a video of a teacher recorded during an online teaching process.
  • the first structured information includes face information and / or motion information in video data.
  • Video data can be viewed as a time series of images. By performing recognition processing on each frame image or some key frame images, the face image information in the image can be identified. Further, according to the face image information of different images arranged along the time axis, the face information in the video data can be obtained. At the same time, various existing technologies can be used to identify motion information in the video. This embodiment evaluates the visual performance of the teacher during the teaching process by extracting the first structured information from the video data.
  • the first structured information includes face information and motion information.
  • the face information includes at least one of face position information, information representing a detected face, and facial expression classification information.
  • the face detection algorithm can effectively detect whether the face appears in the image and the specific position of the face.
  • Face detection algorithms include, for example, a reference template method, a face rule method, a feature sub-face method, and a sample recognition method.
  • The obtained face position information may be represented by a data structure R(X, Y, W, H) for the face region, where R(X, Y, W, H) defines a rectangular area in the image containing the main part of the face, X and Y define the coordinates of one corner of the rectangle, and W and H define the width and height of the rectangle, respectively. Since the layout of facial features is highly similar across faces, once the face region has been detected, the image within that region can be further analyzed to locate the facial features.
  • For example, Dlib can be used to perform the above face detection and to obtain key-point information such as lip key points.
  • Dlib is an open-source C++ toolkit containing machine learning algorithms.
  • In Dlib, the facial features and contour of a face are identified by 68 key points, as sketched below.
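  • As a minimal illustration of this step (a sketch only, assuming frames have been exported from the video and that the standard shape_predictor_68_face_landmarks.dat model file is available), the face region R(X, Y, W, H) and the 68 key points could be obtained per frame with Dlib's Python bindings roughly as follows:

```python
import dlib

detector = dlib.get_frontal_face_detector()
predictor = dlib.shape_predictor("shape_predictor_68_face_landmarks.dat")  # assumed model path

def detect_face(frame_path):
    """Return the face rectangle R(X, Y, W, H) and the 68 landmark points for one frame,
    or None if no face is detected (useful as the 'face present' flag)."""
    img = dlib.load_rgb_image(frame_path)
    rects = detector(img, 1)                     # upsample once to find smaller faces
    if not rects:
        return None
    rect = rects[0]                              # assume the first detected face is the subject
    shape = predictor(img, rect)
    landmarks = [(shape.part(i).x, shape.part(i).y) for i in range(68)]
    return (rect.left(), rect.top(), rect.width(), rect.height()), landmarks
```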
  • Since the facial features occupy different relative positions and states under different expressions, expressions can be recognized and classified using a self-trained classifier or a classifier provided by a related development library.
  • expression recognition can be implemented based on the OpenCV library.
  • OpenCV is a cross-platform computer vision library released under the BSD (open-source) license that runs on Linux, Windows, Android, and Mac OS. It consists of a series of C functions and a small number of C++ classes, provides interfaces for languages such as Python, Ruby, and MATLAB, and implements many general-purpose algorithms in image processing and computer vision.
  • A method of expression recognition using OpenCV is described in the prior art ("Design and Implementation of an OpenCV-based Facial Expression Recognition System", Qin Xuyang, Master's thesis, Zhengzhou University, 2013).
  • an existing commercial expression recognition software interface can also be called to perform expression recognition.
  • Existing image recognition service providers, such as Baidu AI and SenseTime (Shangtang Technology), provide service interfaces for expression recognition.
  • After the face position information and expression classification information have been obtained for each image, a time series of these two kinds of information corresponding to the video data is available.
  • From this time series, the face information can be derived through statistics or other means for further processing and evaluation, for example as sketched below.
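  • The following sketch shows one way such a per-video summary could be computed; the field names ("rect", "expression") and the set of positive expression labels are illustrative assumptions, not fixed by this document:

```python
from statistics import mean, pvariance

POSITIVE_LABELS = {"smile", "focused"}          # assumed positive expression classes

def summarize_face_series(frames):
    """frames: per-frame results, e.g. {"rect": (x, y, w, h) or None, "expression": "smile"}.
    Returns simple statistics that can serve as part of the first structured information."""
    detected = [f for f in frames if f["rect"] is not None]
    xs = [f["rect"][0] for f in detected]
    ys = [f["rect"][1] for f in detected]
    return {
        "detection_ratio": len(detected) / max(len(frames), 1),
        "positive_ratio": sum(f["expression"] in POSITIVE_LABELS for f in detected) / max(len(detected), 1),
        "mean_x": mean(xs) if xs else 0.0,
        "mean_y": mean(ys) if ys else 0.0,
        "var_x": pvariance(xs) if len(xs) > 1 else 0.0,
        "var_y": pvariance(ys) if len(ys) > 1 else 0.0,
    }
```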
  • the prior art can also recognize human motion based on video data to obtain human motion information in the video.
  • the motion information may include a limb motion of a human body.
  • the position of the human hand in the video data can be identified through hand recognition, the movement of the human hand can be tracked, and the relevant information of its movement trajectory is used as the movement information.
  • Through the first structured information, which includes face information and action information, the visual performance of the instructor during the teaching process can be evaluated.
  • step S200 second structured information is extracted from audio data corresponding to the video data, and the second structured information includes speech recognition information in the audio data.
  • step S100 and step S200 may be performed simultaneously or sequentially. When executed sequentially, the execution order of the two is not limited.
  • Voice-based communication is an important means of online teaching.
  • all the voice information of the conversation between the lecturer and the learner is recorded as audio files with different audio tracks.
  • The audio data collected by the instructor-side terminal and the audio data collected by the learner-side terminal are stored in different audio tracks. Therefore, the instructor's audio data can be analyzed and evaluated separately.
  • the performance of the teacher in the speech process of the teaching process is evaluated by extracting the second structured information from the audio data.
  • the second structured information includes speech recognition information obtained by performing speech recognition on audio data.
  • Speech recognition technology is a technology that processes audio data containing speech information to obtain information related to speech content.
  • In this embodiment, the speech recognition information obtained through speech recognition may be speech duration information, text information corresponding to the speech, or information on the number of conversation turns.
  • the text information can reflect the specific content explained by the instructor during the teaching process, which can be used as the basis for subsequent evaluation.
  • the speech duration information refers to the time axis information of the detected speech in the audio data. Because the lecturer may not be continuously explaining during the teaching process, the information about the length of speech and the number of conversations can reflect the intensity of the exchange between the lecturer and the learner to a certain extent.
  • the speech recognition information obtained in this step also carries time axis information.
  • For text information, the time axis information indicates the time on the time axis to which each piece of text content corresponds.
  • For speech duration information, the time axis information represents the start and end times of each speech segment.
  • For conversation count information, the time axis information represents the time points at which the speaking party switches within the conversation, for example as in the sketch below.
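  • As a sketch of how such time axis information might be aggregated (assuming an upstream speech recognition or voice activity detection step has already produced timestamped, speaker-labelled segments):

```python
def speech_statistics(segments):
    """segments: list of (start_sec, end_sec, speaker) tuples on a shared time axis.
    Returns total speech duration per speaker and the number of speaker switches."""
    duration, switches, prev = {}, 0, None
    for start, end, speaker in sorted(segments):
        duration[speaker] = duration.get(speaker, 0.0) + (end - start)
        if prev is not None and speaker != prev:
            switches += 1
        prev = speaker
    return duration, switches

# Example: durations, turns = speech_statistics([(0.0, 5.2, "instructor"), (5.5, 7.0, "learner")])
```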
  • step S300 a first evaluation parameter is acquired according to the first structured information and the second structured information.
  • the first evaluation parameter is an evaluation parameter for video data and audio data. Specifically, the first evaluation parameter is obtained according to the first structured information, the second structured information, and the classification evaluation model.
  • the first structured information includes face information and / or motion information in the video data.
  • the second structured information includes speech recognition information of audio data corresponding to the video data.
  • the voice recognition information may include text information, voice duration information, and conversation times information.
  • the expectations of teaching organizers or supervisors are usually that the performance of the teaching staff should not deviate significantly from the average performance.
  • This means that across the video data of different online classrooms, the statistics of the face information and/or action information are expected to be close to one another, and across the audio data of different online classrooms, the statistics of the speech recognition information are likewise expected to be close. Therefore, in an optional implementation, the evaluation parameters for the video data and audio data are obtained by comparing the extracted information with the corresponding average state information.
  • step S300 may include the following steps:
  • step S310 the first structured information is compared with the first average state information of the classification evaluation model to obtain a first comparison parameter.
  • the first average state information is obtained according to the first structured information corresponding to the historical video data. Specifically, it can be obtained by statistical average or weighted average.
  • For example, the first structured information includes face information and action information, where the face information includes a positive expression ratio (facial expression classification information) as well as the average coordinates and the coordinate variance of the face position.
  • the motion information includes the duration of the hand trajectory in the video data.
  • The first average state information may then include the averages of the above parameters computed from historical video data, that is, the average of the positive expression ratio, the average of the mean face position coordinates, the average of the coordinate variance, and the average duration of the hand trajectory.
  • the above average value can be obtained by separately extracting the first structured information from the historical video data, and then calculating the average value for all the first structured information.
  • the first structured information may constitute a one-dimensional vector, and each element of the vector is one of the above parameters.
  • the average state information also constitutes a one-dimensional vector.
  • By computing the angle between the two vectors (or between their projections onto a particular plane), or the distance between them, a first comparison parameter representing the degree of difference between the first structured information and the first average state information can be obtained.
  • the acquisition method of the first average state information is not limited to averaging, and different historical video data may be given different weights and obtained by weighted average.
  • For example, a weighted sum may be computed over the elements of the first structured information and over the elements of the first average state data, and the difference between the two weighted sums may be used as the first comparison parameter, as in the sketch below.
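  • A sketch of the comparison described above (angle, distance, or difference of weighted sums; the choice of variant and the weights are assumptions, not prescribed here):

```python
import numpy as np

def first_comparison_parameter(info, avg_state, mode="angle", weights=None):
    """info and avg_state are one-dimensional vectors of the same length."""
    v, a = np.asarray(info, float), np.asarray(avg_state, float)
    if mode == "angle":                                    # angle between the two vectors
        cos = np.dot(v, a) / (np.linalg.norm(v) * np.linalg.norm(a) + 1e-12)
        return float(np.arccos(np.clip(cos, -1.0, 1.0)))
    if mode == "distance":                                 # Euclidean distance
        return float(np.linalg.norm(v - a))
    w = np.ones_like(v) if weights is None else np.asarray(weights, float)
    return float(abs(np.dot(w, v) - np.dot(w, a)))         # difference of weighted sums
```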
  • Step S320 Compare the second structured information with the second average state information of the classification evaluation model to obtain a second comparison parameter.
  • the second average state information is obtained according to the second structured information corresponding to the historical audio data. Specifically, it can be obtained by statistical average or weighted average.
  • step S310 and step S320 may be performed simultaneously or sequentially. When executed sequentially, the execution order of the two is not limited.
  • the second structured information includes text information corresponding to a voice in the audio data.
  • the average status information of text information can be obtained in the following ways.
  • In text processing, a vector space model (VSM) is often used to represent text.
  • the vector space model uses a vector to represent a piece of text information, and each item in the vector is the weight of the feature item.
  • Feature items can be characters, words, or phrases of the text. Through word segmentation and word-frequency statistics, the feature items of the text and their weights can be obtained. If necessary, feature extraction can be applied to the vector to reduce its dimensionality and thus the computational cost of data processing.
  • the extracted feature vector is a mapping of text information in a predetermined feature space, which can uniquely characterize the text information.
  • a feature vector corresponding to each text can be obtained.
  • the average of these feature vectors can be used as the average state information of this type of text information.
  • the word segmentation, word frequency statistics, vectorized expression of text, and feature extraction in the above process can be implemented using various existing text processing technologies.
  • Comparing the text information with the average state information can be implemented by computing, in the feature space, the distance between the feature vector of the text information and the average state vector, or the angle between them on a certain projection plane.
  • The value of this distance or angle represents the degree of difference between the text information and the average state information and thus serves as the second comparison parameter; a sketch follows.
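  • The text part of the comparison could be sketched as follows, using TF-IDF weights as the feature-item weights; the corpus variables are placeholders, and Chinese transcripts would first need word segmentation so that terms are space-separated:

```python
import numpy as np
from sklearn.feature_extraction.text import TfidfVectorizer

historical_texts = ["unit one vocabulary review", "unit one grammar practice"]  # placeholder corpus
current_text = "unit one vocabulary quiz"                                       # placeholder transcript

vectorizer = TfidfVectorizer()
hist = vectorizer.fit_transform(historical_texts).toarray()
avg_state = hist.mean(axis=0)                               # average state of this text class
cur = vectorizer.transform([current_text]).toarray()[0]

# angle between the current text vector and the average state vector in feature space
cos = np.dot(cur, avg_state) / (np.linalg.norm(cur) * np.linalg.norm(avg_state) + 1e-12)
text_comparison_parameter = float(np.arccos(np.clip(cos, -1.0, 1.0)))
```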
  • the second structured information includes speech length information and dialogue times information of audio data.
  • the second average state information may be an average value of the speech length information and an average value of the number of times of dialogue information obtained according to the historical audio data extraction.
  • the second comparison parameter can be obtained by comparing the difference between two vectors or the weighted sum of the above information. The way to obtain the second comparison parameter in this case is similar to the way to obtain the first comparison parameter.
  • the corresponding text comparison parameters may be obtained based on the text information, and then the non-text comparison parameters may be obtained based on the voice length information and the conversation times information.
  • the second comparison parameter can be obtained by weighted summing or weighted average of the text comparison parameter and the non-text comparison parameter.
  • Step S330 Obtain a first evaluation parameter according to a weighted summation of the first comparison parameter and the second comparison parameter.
  • The first comparison parameter characterizes the difference between the video data, which reflects the instructor's performance, and the average state of the historical video data.
  • The second comparison parameter characterizes the difference between the corresponding audio data and the average state of the historical audio data. By weighting the two together, a first evaluation parameter for the video data and audio data can be obtained. Based on the first evaluation parameter, the teaching process recorded in the video and audio data can be evaluated quickly and objectively.
  • the weights of the first comparison parameter and the second comparison parameter may be set according to the relative importance between video and audio in the application scenario.
  • the above implementation mode provides an unsupervised classification evaluation model for classification.
  • other unsupervised classification methods can also be used to obtain the first evaluation parameter.
  • the first structured information and the second structured information extracted from all video data and audio data may be subjected to unsupervised clustering, respectively, and the first evaluation parameter may be calculated based on the unsupervised clustering result.
  • methods such as K-means clustering, kernel K-means clustering, and spectral clustering can be used.
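  • A sketch of such unsupervised clustering follows; the feature dimensions and cluster count are illustrative, and the first evaluation parameter is taken here as the distance of each classroom's feature vector from its cluster centre:

```python
import numpy as np
from sklearn.cluster import KMeans

rng = np.random.default_rng(0)
features = rng.random((200, 6))     # one row per classroom: concatenated structured information

km = KMeans(n_clusters=3, n_init=10, random_state=0).fit(features)
centres = km.cluster_centers_[km.labels_]
first_evaluation = np.linalg.norm(features - centres, axis=1)   # deviation from cluster centre
```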
  • the first evaluation parameter is obtained by using a supervised classification evaluation model.
  • the supervised classification evaluation model is obtained by training based on the labeled first structured information sample and the labeled second structured information sample.
  • the classification evaluation model takes the first structured information and the second structured information as input parameters, and uses the first evaluation parameter as output parameters.
  • the first structured information sample includes first structured information corresponding to historical video data and a first evaluation parameter manually labeled.
  • the second structured information sample includes second structured information corresponding to historical audio data and a first evaluation parameter manually labeled.
  • The supervised classification evaluation model may be, for example, a support vector machine (SVM), linear regression, logistic regression, naive Bayes, linear discriminant analysis, a decision tree, or K-nearest neighbors (K-NN), as in the sketch below.
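  • A sketch of the supervised variant with an SVM; the training data shapes, label values, and preprocessing are assumptions for illustration only:

```python
import numpy as np
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC

rng = np.random.default_rng(0)
X = rng.random((300, 8))              # concatenated first/second structured information samples
y = rng.integers(0, 3, size=300)      # manually labelled first evaluation parameter (as classes)

model = make_pipeline(StandardScaler(), SVC(kernel="rbf"))
model.fit(X, y)
predicted = model.predict(rng.random((1, 8)))   # first evaluation parameter for new data
```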
  • In the technical solution of this embodiment, the first structured information and the second structured information are respectively extracted from the recorded video data and corresponding audio data, so that the instructor's state is obtained from the two dimensions of image and speech, and the online teaching is evaluated based on this state through the classification evaluation model to obtain the first evaluation parameter. As a result, massive amounts of online teaching video and audio data can be evaluated quickly, objectively, and accurately.
  • video data and the corresponding audio data can be divided based on the structure of the courseware, and the obtained video data fragments and audio data fragments actually correspond to a page or a part of the courseware.
  • Structured data extraction may then be performed on the video data fragments and audio data fragments in the same manner as in the above embodiment, and the structured data of the different fragments may be combined to obtain the first structured information and the second structured information.
  • the division of video data and audio data can be based on courseware operation data.
  • the courseware operation data includes an operation record of the courseware, in which a time point at which a teacher performs a page switching operation on the courseware is recorded.
  • FIG. 5 is a data flowchart of a data processing method according to a second embodiment of the present invention.
  • the method of this embodiment is suitable for understanding and evaluating the teaching situation of online teaching.
  • the first structured information is extracted from the video data in combination with courseware operation data.
  • the first structured information includes face information and / or action information corresponding to different courseware operation intervals.
  • step S100 ' includes the following steps:
  • step S110 the time axis is divided into a plurality of courseware operation blocks according to the courseware operation data.
  • the time axis corresponding to each page of courseware can be used as a courseware operation block according to the page switching data in the courseware operation data.
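  • For example (a sketch only; the timestamps are assumed to come from the recorded courseware operation data):

```python
def courseware_operation_blocks(page_switch_times, lesson_end):
    """page_switch_times: sorted times (seconds) at which the instructor switched pages.
    Returns one (start, end) interval per courseware page on the lesson's time axis."""
    boundaries = [0.0] + list(page_switch_times) + [lesson_end]
    return [(boundaries[i], boundaries[i + 1]) for i in range(len(boundaries) - 1)]

# Example: courseware_operation_blocks([120.0, 310.0], 600.0)
#          -> [(0.0, 120.0), (120.0, 310.0), (310.0, 600.0)]
```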
  • step S120 the corresponding first related information is extracted from the corresponding part of the video data according to the courseware operation block obtained by the division.
  • the first related information includes face information and / or motion information in a corresponding part of the video data.
  • video data can be segmented, and structured data extraction is performed on each piece of video data to obtain first relevant information.
  • This extraction process is the same as the way of extracting face information or motion information from the entire video data.
  • step S130 the first structured information is acquired according to the first related information of each courseware operation block.
  • the first structured information of this embodiment can be obtained. That is, in this embodiment, the first structured information is a vector composed of the first related information.
  • step S200 ' the second structured information is extracted from the audio data in conjunction with courseware operation data.
  • the second structured information includes speech recognition information of different courseware operation intervals.
  • step S210 the time axis is divided into a plurality of courseware operation blocks according to the courseware operation data.
  • step S220 the corresponding second related information is extracted from the corresponding portion of the audio data according to the courseware operation block obtained by the division.
  • the second related information includes speech recognition information in a corresponding part of the audio data.
  • the extraction method of the second related information is the same as the extraction method of the speech recognition information in the previous embodiment.
  • step S230 the second structured information is acquired according to the second related information of each courseware operation block.
  • the second structured information of this embodiment can be obtained. That is, in this embodiment, the second structured information is a vector composed of the second related information.
  • step S300 ' a first evaluation parameter is obtained according to the first structured information and the second structured information.
  • the first evaluation parameter is obtained according to the first structured information, the second structured information, and the classification evaluation model.
  • In the same manner as in the first embodiment of the present invention, a first sub-evaluation parameter may be obtained for each courseware operation section from the first related information of the corresponding video data segment and the second related information of the corresponding audio data segment in the first structured information and the second structured information, and the first sub-evaluation parameters may then be weighted and summed according to predetermined weights of the different courseware operation sections to obtain the first evaluation parameter, as in the sketch below.
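  • A sketch of this weighted combination; the per-section weights are illustrative assumptions:

```python
import numpy as np

def combine_sub_evaluations(sub_evaluations, section_weights):
    """sub_evaluations: one first sub-evaluation parameter per courseware operation section.
    section_weights: predetermined weight of each section, e.g. higher for key pages."""
    return float(np.dot(np.asarray(sub_evaluations, float), np.asarray(section_weights, float)))
```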
  • the first structured information and the second structured information may also be input into the classification evaluation model as a whole, and the first evaluation parameter may be directly obtained according to the output of the classification evaluation model.
  • the classification evaluation model may be an unsupervised model or a supervised model.
  • the video data and the audio data are divided based on the courseware operation data, so that benchmarking analysis can be effectively performed on the same part of the teaching content, and the accuracy of the evaluation is improved.
  • FIG. 8 is a flowchart of a data processing method according to a third embodiment of the present invention.
  • the method of this embodiment is suitable for understanding and evaluating the learning effect of online teaching.
  • the data processing method in this embodiment includes the following steps:
  • step S1000 first structured information is extracted from the video data.
  • the first structured information includes face information in video data.
  • the video data is video data recorded by the learner during the online learning process.
  • the video data may be selected according to an evaluation period.
  • the selection method has greater flexibility. For example, it may be video data of one online teaching process, a collection of video data of multiple online teachings corresponding to one teaching unit, or a segment of video data corresponding to one part of one online teaching process.
  • step S2000 second structured information is extracted from audio data corresponding to the video data.
  • the second structured information includes voice recognition information in audio data.
  • In step S3000, a first evaluation parameter is obtained according to the first structured information and the second structured information, where the first evaluation parameter is used to characterize the classification of the current performance information relative to the historical performance information of the same learner.
  • the current performance information of the learner is obtained according to the first structured information and the second structured information, and the first evaluation parameter is obtained according to the current performance information.
  • The technical solution of this embodiment extracts the first structured information and the second structured information from the recorded video data and corresponding audio data, so that the learner's performance information can be obtained from the two dimensions of image and speech, and the obtained performance information is compared longitudinally with the historical performance information of the same learner to obtain the first evaluation parameter.
  • the method in this embodiment may further include step S4000, obtaining a second evaluation parameter according to the current performance information.
  • the second evaluation parameter is used to represent classification information of the current performance information relative to performance information of different learners.
  • the classification of the current performance information of learners in all classroom performances can be further obtained through horizontal comparison, so as to obtain more data support for objectively evaluating the learning effects of learners.
  • the video data can be viewed as a time series of images.
  • the face image information in the image can be identified. Further, according to the face image information of different images arranged along the time axis, the face information in the video data can be obtained. In this step, the face image information can be acquired in the same manner as the first embodiment of the present invention.
  • a time series of the above two information corresponding to the video data can be obtained.
  • corresponding performance information can be obtained through statistics or other means, and further processing and evaluation can be performed.
  • the visual performance of the learner can be evaluated.
  • step S1000 and step S2000 may be performed simultaneously or sequentially. When executed sequentially, the execution order of the two is not limited.
  • voice-based communication is an important means of online teaching.
  • During online teaching, all the voice information of the conversation between the instructor and the learner is recorded as audio files with different audio tracks.
  • The audio data collected by the instructor-side terminal and the audio data collected by the learner-side terminal are stored in different audio tracks. Therefore, the learner's audio data can be analyzed and evaluated separately.
  • the performance of the learner in terms of speech is evaluated by extracting the second structured information from the audio data.
  • the second structured information includes speech recognition information obtained by performing speech recognition on audio data.
  • Speech recognition technology is a technology that processes audio data containing speech information to obtain information related to speech content.
  • In this embodiment, the speech recognition information obtained through speech recognition may be speech duration information, text information corresponding to the speech, conversation count information, or information on the pause time of the learner's voice when the speaking party switches.
  • the text information can reflect the specific content explained by the instructor during the teaching process, which can be used as the basis for subsequent evaluation.
  • the voice duration information refers to the length of time during which voice is detected in the audio data. Because the instructor may not be continuously explaining during the teaching process, the information about the length of speech and the number of conversations can reflect the enthusiasm of the learner to a certain extent.
  • the pause time information of the learner's voice when the dialogue party switches can reflect the learner's response speed when the teacher asks a question or asks the learner to repeat it, which can also reflect the classroom performance of the learner.
  • the first structured information characterizing the visual features and the second structured information characterizing the voice features are further obtained to obtain the current performance information of the learner.
  • the current performance information is feature information suitable for classification, and the feature information represents a learner's performance in video data and audio data currently being analyzed and evaluated.
  • related feature information may be extracted from the first structured information and the second structured information in a statistical manner, and merged to obtain performance information.
  • the performance information may include at least one of information about the number of facial expressions of a predetermined category and information about the number of predetermined facial gestures obtained according to the first structured information.
  • the performance information may further include at least one of information on the number of conversations obtained based on the second structured information, information on the duration of the learner's voice, and ratio information of the duration of the learner's voice to the length of the instructor's voice.
  • the performance information may further include a feature vector of text information in the second structured information.
  • the performance information may also include a vector of the pause time information of the learner's voice each time the conversation party switches, or the total duration of the pause time.
  • each element of the vector is a corresponding type of information.
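  • As an illustration only (the exact fields and their order are not fixed by this document), the current performance information could be assembled as:

```python
def performance_vector(face_stats, audio_stats, pause_seconds_total):
    """Concatenate visual and audio features into a single performance-information vector.
    All dictionary keys are hypothetical field names used for illustration."""
    return [
        face_stats["positive_expression_count"],            # predetermined-category expressions
        face_stats["predetermined_gesture_count"],           # e.g. gestures detected in the video
        audio_stats["conversation_count"],
        audio_stats["learner_speech_seconds"],
        audio_stats["learner_speech_seconds"] / max(audio_stats["instructor_speech_seconds"], 1e-6),
        pause_seconds_total,                                  # total pause when the speaker switches
    ]
```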
  • the first evaluation parameter may reflect a comparison between the current performance and historical performance of the learner, that is, classification information of the current performance information relative to the historical performance information of the same learner.
  • the historical performance information can be obtained by analyzing the learner's historical video data and corresponding historical audio data.
  • Historical performance information is a set of vectors in the same format as the current performance information.
  • the set consisting of the current performance information and the historical performance information may be subjected to unsupervised cluster analysis to obtain the difference information between the current performance information and the historical performance information as classification information.
  • unsupervised clustering methods such as K-means clustering, kernel K-means clustering, and spectral clustering can be used.
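  • A sketch of this longitudinal comparison; the dimensions, cluster count, and the use of the distance to the cluster centre as the classification signal are assumptions:

```python
import numpy as np
from sklearn.cluster import KMeans

rng = np.random.default_rng(0)
history = rng.random((40, 6))        # historical performance vectors of the same learner
current = rng.random((1, 6))         # current performance vector

km = KMeans(n_clusters=3, n_init=10, random_state=0).fit(history)
cluster = int(km.predict(current)[0])                             # historical "performance level"
deviation = float(np.linalg.norm(current - km.cluster_centers_[cluster]))
```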
  • historical performance information and corresponding first evaluation parameters may be used as samples to obtain a classification model through training.
  • the input of this classification model is a vector as performance information, and the output is a first evaluation parameter.
  • the first evaluation parameter in the sample may be manually labeled, or part of it may be calculated by the original classification model, and part of it may be manually labeled.
  • The classification model may be, for example, a support vector machine (SVM), linear regression, logistic regression, naive Bayes, linear discriminant analysis, a decision tree, or K-nearest neighbors (K-NN).
  • the first evaluation parameter may be an evaluation score value, or a vector composed of multiple evaluation scores of different dimensions.
  • a vector of evaluation scores including aspects such as learning attitude, initiative, and scalability.
  • the above implementation mode provides an unsupervised classification evaluation model for classification.
  • other unsupervised classification methods can also be used to obtain the evaluation parameters.
  • the first structured information and the second structured information extracted from all video data and audio data may be subjected to unsupervised clustering, respectively, and evaluation parameters may be calculated based on the unsupervised clustering results.
  • methods such as K-means clustering, kernel K-means clustering, and spectral clustering can be used.
  • In step S4000, a horizontal comparison of the performance information of different learners is performed according to the current performance information to obtain the second evaluation parameter of the learner being evaluated.
  • The second evaluation parameter may represent a comparison of the classroom performance of the learner being evaluated with that of other learners participating in the same online course.
  • The performance information of the different learners can be obtained from the video data and audio data of one or more other learners.
  • a set consisting of current performance information and performance information of different learners may be subjected to unsupervised cluster analysis to obtain difference information of current performance information and performance information of other learners as classification information.
  • unsupervised clustering methods such as K-means clustering, kernel K-means clustering, and spectral clustering can be used.
  • the performance information of different learners and corresponding second evaluation parameters can be used as samples to train and obtain a classification model.
  • The input of the classification model is a vector of performance information, and the output is a second evaluation parameter.
  • the current performance information can be input into the classification model, and corresponding second evaluation parameters can be obtained.
  • The classification model may be, for example, a support vector machine (SVM), linear regression, logistic regression, naive Bayes, linear discriminant analysis, a decision tree, or K-nearest neighbors (K-NN).
  • the learning effect evaluation information may be acquired only according to the first evaluation parameter and the second evaluation parameter.
  • the first evaluation parameter and the second evaluation parameter may be directly weighted and summed, or the elements thereof may be weighted and summed to obtain learning effect evaluation information.
  • the first evaluation parameter and the second evaluation parameter may be combined with other parameters related to the learning effect to obtain the learning effect evaluation information.
  • the learning effect evaluation information may be acquired according to the first evaluation parameter, the second evaluation parameter, and evaluation result information of a corresponding course standard.
  • the assessment result information corresponding to the course standard may be knowledge point test information obtained based on computer testing.
  • the learning effect evaluation information can be obtained by weighting and summing the above parameters.
  • the elements can be weighted and summed to obtain learning effect evaluation information.
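  • One possible form of this weighting (the weights and the treatment of the optional inputs are illustrative assumptions):

```python
def learning_effect(first_eval, second_eval, test_score=0.0, manual_score=0.0,
                    weights=(0.4, 0.3, 0.2, 0.1)):
    """Weighted sum of the automatic evaluation parameters with the course-standard test
    result and the teacher's manual score; all inputs are assumed to be scalar scores."""
    parts = (first_eval, second_eval, test_score, manual_score)
    return sum(w * p for w, p in zip(weights, parts))
```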
  • the learning effect evaluation information may be acquired according to the first evaluation parameter, the second evaluation parameter, and manual evaluation information.
  • The manual evaluation information is the teacher's post-lesson scoring, along different dimensions, of the learner's classroom performance.
  • the scoring operation may be performed by the first client 1 operated by the teacher.
  • the learning effect evaluation information may be acquired according to the first evaluation parameter, the second evaluation parameter, and learner attribute information.
  • The learner attribute information may include information such as the learner's personality classification and original learning level. This information can be used to influence or adjust the weights of the first and second evaluation parameters or of some of their elements. For example, when the learner's personality is classified as introverted, the speech duration in the performance information may be relatively short even though the classroom performance is still at a good level.
  • the second evaluation parameter can be adjusted by introducing learner attribute information, so that the learner's inherent personality can be used to more accurately evaluate the learning effect.
  • the learning effect evaluation information may be obtained according to the first evaluation parameter, the second evaluation parameter, learner attribute information, assessment result information corresponding to a course standard, and manual evaluation information. As a result, the accuracy and objectivity of the learning effect evaluation information is maximized.
  • the first evaluation parameter, the second evaluation parameter, the learner attribute information, the evaluation result information corresponding to the course standard, the manual evaluation information, and the corresponding learning effect evaluation information may also be presented to a data analyst through an output device.
  • the learner attribute information may also be presented to a data analyst through an output device.
  • FIG. 9 is a schematic diagram of an electronic device according to a fourth embodiment of the present invention.
  • the electronic device shown in FIG. 9 is a general-purpose data processing apparatus including a general-purpose computer hardware structure including at least a processor 91 and a memory 92.
  • the processor 91 and the memory 92 are connected through a bus 93.
  • the memory 92 is adapted to store instructions or programs executable by the processor 91.
  • the processor 91 may be an independent microprocessor or a collection of multiple microprocessors. Therefore, the processor 91 executes the commands stored in the memory 92, thereby executing the method flow of the embodiment of the present invention as described above to implement data processing and control on other devices.
  • The bus 93 connects the above components together and also connects them to a display controller and display device 94 and to input/output (I/O) devices 95.
  • The input/output (I/O) devices 95 may be a mouse, keyboard, modem, network interface, touch input device, somatosensory input device, printer, or other devices known in the art.
  • an input / output (I / O) device 95 is connected to the system through an input / output (I / O) controller 96.
  • the memory 92 may store software components, such as an operating system, a communication module, an interaction module, and an application program. Each module and application described above corresponds to a set of executable program instructions that perform one or more functions and methods described in the embodiments of the invention.
  • Aspects of the embodiments of the present invention may be implemented as a system, method, or computer program product. Therefore, various aspects of the embodiments of the present invention may take the following forms: an entirely hardware implementation, an entirely software implementation (including firmware, resident software, microcode, etc.), or an implementation combining software and hardware aspects that may generally be referred to herein as a "circuit," "module," or "system." Furthermore, aspects of the invention may take the form of a computer program product embodied in one or more computer-readable media having computer-readable program code embodied thereon.
  • the computer-readable medium may be a computer-readable signal medium or a computer-readable storage medium.
  • the computer-readable storage medium may be, for example, but not limited to, an electronic, magnetic, optical, electromagnetic, infrared, or semiconductor system, apparatus, or device, or any suitable combination of the foregoing.
  • a computer-readable storage medium may be any tangible medium capable of containing or storing a program used by or in conjunction with an instruction execution system, device, or device.
  • the computer-readable signal medium may include a propagated data signal having computer-readable program code implemented therein, such as in baseband or as part of a carrier wave. Such a propagated signal may take any of a variety of forms, including, but not limited to, electromagnetic, optical, or any suitable combination thereof.
  • The computer-readable signal medium may be any computer-readable medium that is not a computer-readable storage medium and that can communicate, propagate, or transport a program for use by or in connection with an instruction execution system, apparatus, or device.
  • Computer program code for carrying out operations for aspects of the present invention may be written in any combination of one or more programming languages, including object-oriented programming languages such as Java, Smalltalk, C++, PHP, and Python, and conventional procedural programming languages such as the "C" programming language or similar programming languages.
  • The program code may execute entirely on the user's computer, partly on the user's computer as a stand-alone software package, partly on the user's computer and partly on a remote computer, or entirely on a remote computer or server. In the latter case, the remote computer may be connected to the user's computer through any type of network, including a local area network (LAN) or wide area network (WAN), or the connection may be made to an external computer (for example, through the Internet using an Internet service provider).

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Data Mining & Analysis (AREA)
  • Bioinformatics & Cheminformatics (AREA)
  • Bioinformatics & Computational Biology (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Evolutionary Biology (AREA)
  • Evolutionary Computation (AREA)
  • Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Artificial Intelligence (AREA)
  • Electrically Operated Instructional Devices (AREA)

Abstract

According to the technical solution of the embodiments of the present invention, first structured information and second structured information are respectively extracted from recorded video data and corresponding audio data so as to obtain the learner's performance information or the instructor's state from the two dimensions of image and speech, and the online teaching is evaluated on the basis of the extracted learner performance information or instructor state to obtain a first evaluation parameter. As a result, massive amounts of online teaching video data and audio data can be evaluated quickly, objectively, and accurately.

Description

Data processing method, storage medium and electronic device
This application claims priority to the Chinese patent application filed on July 3, 2018 with application number 2018107189559 and invention title "Data processing method, storage medium and electronic device", and to the Chinese patent application filed on July 11, 2018 with application number 201810759328X and invention title "Data processing method, storage medium and electronic device", the entire contents of which are incorporated herein by reference.
Technical Field
The present invention relates to data processing and machine learning technologies, and in particular to a data processing method, a storage medium, and an electronic device, and more particularly to a method and related apparatus for evaluating a learner's online learning effect or an online teaching state based on video data and audio data.
Background
With the development of Internet technology, network-based online teaching has become more and more widely used. Knowledge providers or knowledge sharers (also referred to as instructors) can communicate and interact with learners in real time over the network. When understanding and evaluating the learning effect of online teaching, one usually has to rely on manual evaluation by the instructor and manual feedback from the learner, or on test-based means such as knowledge-point tests. However, knowledge-point tests evaluate along a single dimension, while manual evaluation and feedback lack objectivity. When understanding and evaluating how online teaching is delivered, one usually has to rely on manual video review or online supervision. However, if the number of online classrooms is large, a large amount of audio and video data is involved; manual methods consume substantial human resources and may not be feasible at all.
Summary of the Invention
In view of this, embodiments of the present invention provide a data processing method, a storage medium, and an electronic device to automatically process video data and audio data recorded online and to perform a relatively accurate automated assessment of a learner's learning effect during online teaching or of the online teaching itself.
According to a first aspect of the embodiments of the present invention, a data processing method is provided, the method including:
extracting first structured information from video data, where the video data is a learner video recorded during online teaching, and the first structured information includes face information and/or action information in the video data;
extracting second structured information from audio data corresponding to the video data, where the second structured information includes speech recognition information in the audio data; and
obtaining a first evaluation parameter according to the first structured information and the second structured information.
According to a second aspect of the embodiments of the present invention, a computer-readable storage medium is provided, on which computer program instructions are stored, where the computer program instructions, when executed by a processor, implement the method according to the first aspect.
According to a third aspect of the embodiments of the present invention, an electronic device is provided, including a memory and a processor, where the memory is used to store one or more computer program instructions, and the one or more computer program instructions are executed by the processor to implement the method according to the first aspect.
In the technical solutions of the embodiments of the present invention, first structured information and second structured information are respectively extracted from recorded video data and corresponding audio data so as to obtain the learner's performance information or the instructor's state from the two dimensions of image and speech, and the online teaching is evaluated on the basis of the extracted performance information or instructor state to obtain a first evaluation parameter. As a result, massive amounts of online teaching video data and audio data can be evaluated quickly, objectively, and accurately.
Brief Description of the Drawings
The above and other objects, features, and advantages of the present invention will become clearer from the following description of embodiments of the present invention with reference to the accompanying drawings, in which:
FIG. 1 is a schematic diagram of an online teaching system to which the data processing method of an embodiment of the present invention is applicable;
FIG. 2 is a schematic diagram of the interface of a client application of an online teaching system according to an embodiment of the present invention;
FIG. 3 is a flowchart of a data processing method according to the first embodiment of the present invention;
FIG. 4 is a flowchart of obtaining evaluation parameters in the method of the first embodiment of the present invention;
FIG. 5 is a data flow diagram of a data processing method according to the second embodiment of the present invention;
FIG. 6 is a flowchart of extracting first structured information in combination with courseware operation data according to the second embodiment of the present invention;
FIG. 7 is a flowchart of extracting second structured information in combination with courseware operation data according to the second embodiment of the present invention;
FIG. 8 is a flowchart of a data processing method according to the third embodiment of the present invention;
FIG. 9 is a schematic diagram of an electronic device according to the fourth embodiment of the present invention.
Detailed Description
The present invention is described below on the basis of embodiments, but the present invention is not limited to these embodiments. In the following detailed description of the present invention, certain specific details are set out in full; the present invention can, however, be fully understood by those skilled in the art without the description of these details. To avoid obscuring the essence of the present invention, well-known methods, procedures, processes, elements, and circuits are not described in detail.
In addition, those of ordinary skill in the art should understand that the drawings provided herein are for illustrative purposes and are not necessarily drawn to scale.
Unless the context clearly requires otherwise, words such as "include" and "comprise" throughout the specification and claims should be interpreted in an inclusive rather than an exclusive or exhaustive sense, that is, in the sense of "including but not limited to".
In the description of the present invention, it should be understood that the terms "first", "second", and the like are used for descriptive purposes only and cannot be understood as indicating or implying relative importance. In addition, in the description of the present invention, unless otherwise stated, "a plurality of" means two or more.
The data processing method of the present invention is applicable to scenarios in which online audio and video teaching is conducted on the basis of predetermined courseware. FIG. 1 is a schematic diagram of an online teaching system to which the data processing method of an embodiment of the present invention is applicable. As shown in FIG. 1, the online teaching system includes a first client 1, a second client 2, and a server 3, which are connected through network communication. The first client 1 and the second client 2 can establish a communication connection directly or indirectly through the server 3 and then communicate in real time to carry out online teaching activities. The first client 1 may be operated by an instructor, and the second client 2 may be operated by a learner. At the same time, the server 3 forms communication connections with both the first client 1 and the second client 2 and stores the data exchanged between them. The first client 1 and the second client 2 can access the server 3 to obtain courseware data for display, thereby implementing courseware-based online teaching. In the online teaching system used in this embodiment, the courseware content displayed by the first client 1 and the second client 2 changes synchronously, so that the instructor and the learner can communicate synchronously on the basis of the same part of the courseware. It should be understood that the first client 1 and the second client 2 may be any general-purpose data processing device running a predetermined computer application program, such as a desktop computer, a portable computer, a tablet computer, or a smartphone. The server 3 is a high-performance data processing device for running a predetermined computer application program; it may be a single server, a server cluster deployed in a distributed manner, or a virtual server cluster deployed by means of virtual machines or containers. It should be understood that in the online teaching system of the embodiments of the present invention, a large number of first clients 1 establish communication connections with second clients 2 in a one-to-one, one-to-many, or many-to-many manner.
FIG. 2 is a schematic diagram of the interface of a client application of an online teaching system according to an embodiment of the present invention. As shown in FIG. 2, the client application of this embodiment can display the courseware in the main window 21 of the application interface and display, in the sub-window 22, the real-time image collected by the counterpart's image acquisition device. Usually, a video of the counterpart's upper body is displayed in the sub-window 22, so that both parties can see the courseware and each other's state at the same time. Meanwhile, the courseware content displayed in the main window is switched, or has trajectories drawn on it, under the control of operations at the instructor's end. Specifically, on the first client 1 the instructor can switch pages of the courseware (that is, turn pages) or perform trajectory operations on the courseware content. A trajectory operation refers to marking content or drawing images by means of a trajectory on the courseware. For example, the instructor can highlight certain courseware content with lines or circles, and can also draw graphics or text as trajectories through handwriting or mouse operations.
When understanding and evaluating the learning effect of online teaching, the server 3 may record the collected video data of the instructor and the video data of the learner. The server 3 may also record the audio data of the instructor during the entire teaching process and the audio data of the learner during the teaching process; the audio data includes the corresponding voice information. The learner's video data and audio data recorded by the server 3 can therefore be processed to automatically evaluate the learning effect of the learner's online learning.
As stated above, during online teaching the learner mainly obtains information from three dimensions: first, the content of the courseware; second, the voice of the instructor's explanation; and third, the instructor's video. Correspondingly, the learner's classroom performance can be reflected in two aspects: the learner's facial expressions (visual performance) and the process of voice communication with the instructor (audio performance). Video gives learners the experience of face-to-face communication. On the one hand, in various teaching scenarios the learner's facial expressions can convey feedback on the content being explained; for example, a positive expression such as a smile or concentration indicates that the learner is well motivated in class. On the other hand, during online teaching a learner who performs well in class will communicate with the instructor more often and will have a longer speech duration in the audio data. The learner's performance information can therefore be extracted from the video data and audio data.
When understanding and evaluating how online teaching is delivered, the server 3 can record all courseware operations (including page switching operations and trajectory operations) applied by the instructor on the first client 1 during the teaching process. The server 3 may also record the audio data of the instructor during the entire teaching process and the audio data of the learner during the teaching process; the audio data includes the corresponding voice information. The instructor's video data and audio data recorded by the server 3 can therefore be processed to automatically evaluate the teaching process.
As stated above, during online teaching the learner mainly obtains information from three dimensions: the content of the courseware, the voice of the instructor's explanation, and the instructor's video. Video gives learners the experience of face-to-face communication. On the one hand, for language or music learning scenarios, learners can learn pronunciation skills from the instructor's mouth shape in the video. On the other hand, in various teaching scenarios the instructor's facial expressions and movements can convey to the learner information that speech cannot, enliven the teaching atmosphere, and improve teaching quality. From the viewpoint of improving teaching quality and learner satisfaction, the instructor is expected to make appropriate use of expressions and movements during teaching to adjust the atmosphere and enhance communication, and the instructor's face is expected to remain within the video, visible to the learner, at all times.
Fig. 3 is a flowchart of the data processing method of the first embodiment of the present invention. The method of this embodiment is applicable to understanding and evaluating the lecturing of online teaching. As shown in Fig. 3, the data processing method of this embodiment includes the following steps:
In step S100, first structured information is extracted from video data, where the video data is an instructor video recorded during online teaching, and the first structured information includes face information and/or action information in the video data.
Video data can be regarded as a time series of images. Face image information in an image can be obtained by performing recognition processing on every frame or on certain key frames. Further, the face information in the video data can be obtained from the face image information of the different images arranged along the time axis. Meanwhile, various existing techniques can be used to recognize action information in the video. In this embodiment, the instructor's visual performance during teaching is evaluated by extracting the first structured information from the video data.
In an optional implementation, the first structured information includes face information and action information.
The face information includes at least one of face position information, information indicating that a face is detected, and facial expression classification information. A face detection algorithm can effectively detect whether a face appears in an image and where it is located. Face detection algorithms include, for example, the reference template method, the face rule method, the eigenface method and the sample learning method. The obtained face position information can be represented by a face region data structure R(X, Y, W, H), which defines a rectangular region of the image containing the main part of the face, where X and Y define the coordinates of one corner of the rectangular region, and W and H define its width and height, respectively. Since the layout of facial features is highly similar across faces, once the face region information is obtained, the image within the face region can be further analyzed to locate the facial features. For example, Dlib can be used for the above face detection and for obtaining key point information such as lip key points. Dlib is an open-source C++ toolkit containing machine learning algorithms, in which the facial features and contour are identified by 68 key points. Since the facial features lie in different relative positions and states under different expressions, expressions can be recognized and classified by a self-trained classifier or by classifiers provided by related development libraries. As another example, expression recognition can be implemented based on the OpenCV library. OpenCV is a cross-platform computer vision library released under the BSD (open-source) license that runs on Linux, Windows, Android and Mac OS. It consists of a series of C functions and a small number of C++ classes, provides interfaces for languages such as Python, Ruby and MATLAB, and implements many general algorithms in image processing and computer vision. A method of expression recognition using OpenCV is described in the prior art ("Design and Implementation of a Facial Expression Recognition System Based on OpenCV", Qin Xuyang, master's thesis, Zhengzhou University, 2013). As yet another example, existing commercial expression recognition software interfaces can be called for expression recognition; existing image recognition service providers such as Baidu AI and SenseTime both provide expression recognition service interfaces.
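By way of a non-limiting illustration, the following sketch shows per-frame face detection and 68-point landmark extraction with Dlib's Python bindings; the frame-sampling step, the landmark model file path and the returned tuple layout are assumptions made for illustration and are not prescribed by this embodiment.

```python
import cv2
import dlib

detector = dlib.get_frontal_face_detector()
# Assumed path to the publicly available 68-point landmark model file.
predictor = dlib.shape_predictor("shape_predictor_68_face_landmarks.dat")

def extract_face_info(video_path, frame_step=10):
    """Return a time series of (timestamp, R(X, Y, W, H), landmarks) entries."""
    cap = cv2.VideoCapture(video_path)
    fps = cap.get(cv2.CAP_PROP_FPS) or 25.0
    series, idx = [], 0
    while True:
        ok, frame = cap.read()
        if not ok:
            break
        if idx % frame_step == 0:  # sample key frames only
            gray = cv2.cvtColor(frame, cv2.COLOR_BGR2GRAY)
            faces = detector(gray)
            if faces:
                f = faces[0]
                region = (f.left(), f.top(), f.width(), f.height())  # R(X, Y, W, H)
                shape = predictor(gray, f)
                pts = [(shape.part(i).x, shape.part(i).y) for i in range(68)]
                series.append((idx / fps, region, pts))
            else:
                series.append((idx / fps, None, None))  # no face detected
        idx += 1
    cap.release()
    return series
```

The returned time series corresponds to the face position information R(X, Y, W, H) and the landmark data from which expression classification can subsequently be derived.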
After the face position information and facial expression classification information of each image are obtained, the time series of these two pieces of information corresponding to the video data can be obtained. Based on this time series, the face information can be obtained by statistics or other means for further processing and evaluation.
Meanwhile, existing techniques can also recognize human body actions from the video data to obtain the action information of the human body in the video. The action information may include body movements. In an optional implementation, the position of the hand in the video data can be obtained by hand recognition, the hand movement can be tracked, and information about its movement trajectory can be used as the action information.
With the first structured information including face information and action information, the instructor's visual performance during teaching can be evaluated.
It should be understood that only face information or only action information may also be collected as the first structured information.
In step S200, second structured information is extracted from the audio data corresponding to the video data, where the second structured information includes speech recognition information in the audio data.
It should be understood that step S100 and step S200 may be performed simultaneously or sequentially; when performed sequentially, their order is not limited.
Speech-based communication is an important means of online teaching. During online teaching, all speech of the conversation between the instructor and the learner is recorded as an audio file with different audio tracks, in which the audio data captured by the instructor-side terminal and that captured by the learner-side terminal are stored in separate tracks. Therefore, the instructor's audio data can be analyzed and evaluated on its own. In this embodiment, the instructor's performance in the speech dimension during teaching is evaluated by extracting the second structured information from the audio data, where the second structured information includes speech recognition information obtained by performing speech recognition on the audio data. Speech recognition is a technique for processing audio data containing speech to obtain information related to the speech content. In this embodiment, the speech recognition information obtained through speech recognition may be speech duration information, text information corresponding to the speech, or dialogue count information. The text information reflects the specific content explained by the instructor during teaching and can serve as a basis for subsequent evaluation. The speech duration information refers to the time-axis information of the speech detected in the audio data. Since the instructor may not be speaking continuously during teaching, the speech duration information and the dialogue count information can, to a certain extent, reflect the intensity of communication between the instructor and the learner. The speech recognition information obtained in this step also carries time-axis information: for text information, it indicates the time on the time axis corresponding to the text content; for speech duration information, it indicates the start and end times of each speech segment; and for dialogue count information, it indicates the time points at which the speaker changes during the dialogue.
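Purely for illustration, the sketch below derives speech-duration and dialogue-count statistics from a list of recognized speech segments; the Segment structure (speaker label, start time, end time, text) is an assumed output format of an upstream speech recognizer rather than something this embodiment specifies.

```python
from dataclasses import dataclass

@dataclass
class Segment:
    speaker: str   # "instructor" or "learner" (separate audio tracks)
    start: float   # seconds on the time axis
    end: float
    text: str

def speech_features(segments):
    """Aggregate speech duration per speaker and count speaker turns."""
    segments = sorted(segments, key=lambda s: s.start)
    duration = {}
    turns = 0
    prev_speaker = None
    for seg in segments:
        duration[seg.speaker] = duration.get(seg.speaker, 0.0) + (seg.end - seg.start)
        if prev_speaker is not None and seg.speaker != prev_speaker:
            turns += 1  # a speaker change marks one dialogue turn
        prev_speaker = seg.speaker
    return {"duration": duration, "dialogue_turns": turns}
```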
In step S300, a first evaluation parameter is obtained according to the first structured information and the second structured information.
In this embodiment, the first evaluation parameter is an evaluation parameter for the video data and the audio data. Specifically, the first evaluation parameter is obtained according to the first structured information, the second structured information and a classification evaluation model.
As described above, the first structured information includes the face information and/or action information in the video data, and the second structured information includes the speech recognition information of the audio data corresponding to the video data; the speech recognition information may include text information, speech duration information and dialogue count information. For online teaching, the expectation of the teaching organizer or supervisor is usually that an instructor's performance should not deviate greatly from the average performance. This means that across the video data of different online classes, the statistics of the face information and/or action information are expected to be close, and across the audio data of different online classes, the statistics of the speech recognition data are also expected to be close. Therefore, in an optional implementation, the evaluation parameter for the video data and the audio data is obtained by comparing the extracted information with the corresponding average state information. Specifically, as shown in Fig. 4, step S300 may include the following steps:
In step S310, the first structured information is compared with first average state information of the classification evaluation model to obtain a first comparison parameter, where the first average state information is obtained according to the first structured information corresponding to historical video data, specifically by statistical averaging or weighted averaging.
For example, the first structured information includes face information and action information, where the face information includes the proportion of positive expressions (facial expression classification information) and the mean and variance of the face position coordinates, and the action information includes the duration of hand trajectories in the video data. The first average state information may then include the averages of the above parameters computed from historical video data, i.e., the average proportion of positive expressions, the average of the mean face position coordinates, the average of the coordinate variance, and the average hand trajectory duration. These averages can be obtained by extracting the first structured information from the historical video data and then averaging over all of it. The first structured information can form a one-dimensional vector, each element of which is one of the above parameters; the average state information likewise forms a one-dimensional vector. The first comparison parameter, which characterizes the degree of difference between the first structured information and the first average state information, can be obtained by computing the angle between the two vectors (or between their projections onto a specific plane), or the distance between the two vectors.
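A minimal numpy sketch of this comparison is given below; the feature ordering, the example values and the choice between the cosine angle and the Euclidean distance are illustrative assumptions.

```python
import numpy as np

def comparison_parameter(features, average_state, use_angle=True):
    """Difference between a structured-information vector and the average state.

    features / average_state: 1-D vectors, e.g.
    [positive_expression_ratio, mean_x, mean_y, coord_variance, hand_track_duration]
    """
    f = np.asarray(features, dtype=float)
    a = np.asarray(average_state, dtype=float)
    if use_angle:
        cos = np.dot(f, a) / (np.linalg.norm(f) * np.linalg.norm(a) + 1e-12)
        return float(np.arccos(np.clip(cos, -1.0, 1.0)))  # angle between vectors
    return float(np.linalg.norm(f - a))  # Euclidean distance

# Average state estimated from first structured information of historical videos.
history = np.array([[0.62, 310, 240, 85, 120.0],
                    [0.58, 305, 250, 90, 135.0]])
first_cmp = comparison_parameter([0.40, 280, 260, 160, 60.0], history.mean(axis=0))
```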
It should be understood that the first average state information is not limited to simple averaging; different weights may also be assigned to different historical video data and a weighted average used.
It should also be understood that the comparison between the first structured information and the first average state data may be performed in other ways, as long as a first comparison parameter characterizing the degree of difference between the two can be obtained. For example, a weighted sum may be computed over the elements of the first structured information and over the elements of the first average state data, and the difference between the two weighted sums used as the first comparison parameter.
In step S320, the second structured information is compared with second average state information of the classification evaluation model to obtain a second comparison parameter, where the second average state information is obtained according to the second structured information corresponding to historical audio data, specifically by statistical averaging or weighted averaging.
It should be understood that step S310 and step S320 may be performed simultaneously or sequentially; when performed sequentially, their order is not limited.
For example, the second structured information includes the text information corresponding to the speech in the audio data. The average state information of the text information can be obtained as follows. In text processing, the vector space model (VSM) is usually used to represent text: a piece of text is represented by a vector, each component of which is the weight of a feature term. Feature terms may be characters, words or phrases in the text. Through word segmentation and term-frequency statistics, the feature terms of the text and their weights can be obtained. If necessary, feature extraction can be applied to the vector to reduce its dimensionality and thus the computation required. The extracted feature vector is the mapping of the text information into a predetermined feature space and can uniquely characterize the text. Thus, after a large number of texts of the same type are vectorized and feature extraction is applied, a feature vector is obtained for each text, and the average of these feature vectors can serve as the average state information of this type of text. The word segmentation, term-frequency statistics, vectorized representation of text and feature extraction in the above process can all be implemented with various existing text processing techniques. The comparison between the text information and the average state information can be implemented by computing the distance in the feature space between the feature vector of the text information and the average state information, or the angle between them in a projection plane. The value of the distance or angle characterizes the degree of difference between the text information and the average state information and thus serves as the second comparison parameter.
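As one possible concrete realization of this text comparison, the sketch below represents transcripts with scikit-learn's TfidfVectorizer and measures the difference from the mean historical vector via cosine similarity; it assumes the transcripts have already been word-segmented into whitespace-separated tokens, which the embodiment leaves to existing text processing techniques.

```python
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.metrics.pairwise import cosine_similarity

# Historical transcripts, pre-segmented into whitespace-separated tokens.
history_texts = ["today we review the past tense",
                 "let us practise the past tense together"]
current_text = "now read this sentence after me"

vectorizer = TfidfVectorizer()                       # VSM with TF-IDF term weights
hist_vectors = vectorizer.fit_transform(history_texts).toarray()
average_state = hist_vectors.mean(axis=0, keepdims=True)

current_vector = vectorizer.transform([current_text]).toarray()
# Lower similarity (larger angle) means a larger difference from the average state.
second_cmp = 1.0 - float(cosine_similarity(current_vector, average_state)[0, 0])
```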
As another example, the second structured information includes the speech duration information and dialogue count information of the audio data. The second average state information may then be the averages of the speech duration information and the dialogue count information extracted from historical audio data. The second comparison parameter can be obtained by comparing the two vectors, or the difference between the weighted sums of the above information; in this case the second comparison parameter is obtained in a manner similar to the first comparison parameter.
As yet another example, when the second structured information includes both text information and speech duration and dialogue count information, a text comparison parameter may first be obtained based on the text information, a non-text comparison parameter may then be obtained based on the speech duration information and the dialogue count information, and the second comparison parameter is obtained by weighted summation or weighted averaging of the text comparison parameter and the non-text comparison parameter.
In step S330, the first evaluation parameter is obtained by weighted summation of the first comparison parameter and the second comparison parameter.
The first comparison parameter characterizes the difference between the instructor-related data in the video data and the average state of historical video data; the second comparison parameter characterizes the difference between the instructor-related data in the audio data and the average state of historical audio data. The first evaluation parameter for the video data and the audio data is obtained by weighted summation of the two, and based on it, the teaching process recorded in the video data and audio data can be evaluated quickly and objectively. The weights of the first comparison parameter and the second comparison parameter can be set according to the relative importance of video and audio in the application scenario.
The above implementation provides an unsupervised classification evaluation model for classification. In fact, other unsupervised classification approaches can also be used to obtain the first evaluation parameter. For example, unsupervised clustering may be performed separately on the first structured information and the second structured information extracted from all video data and audio data, and the first evaluation parameter computed based on the clustering results. Unsupervised clustering may use, for example, K-means clustering, kernel K-means clustering or spectral clustering.
In another optional implementation, the first evaluation parameter is obtained through a supervised classification evaluation model, which is trained from labeled first structured information samples and labeled second structured information samples. The classification evaluation model takes the first structured information and the second structured information as input parameters and the first evaluation parameter as the output parameter. The first structured information samples include the first structured information corresponding to historical video data and a manually labeled first evaluation parameter; the second structured information samples include the second structured information corresponding to historical audio data and a manually labeled first evaluation parameter. For this embodiment, the classification model can be built with various existing modeling approaches such as SVM (Support Vector Machines), linear regression, logistic regression, naive Bayes, linear discriminant analysis, decision trees, or K-NN (K-nearest neighbor analysis).
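A sketch of such a supervised evaluation model built with scikit-learn is shown below; the feature layout, the five-level score labels and the choice of a linear SVM are illustrative assumptions, and any of the model families listed above could be substituted.

```python
import numpy as np
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC

# Each row concatenates the first and second structured information of one class:
# [positive_expr_ratio, face_pos_var, hand_track_s, speech_s, dialogue_turns]
X_train = np.array([[0.65, 80, 130, 1500, 42],
                    [0.30, 220, 20, 700, 10],
                    [0.55, 95, 90, 1300, 35],
                    [0.20, 260, 10, 500, 6]])
y_train = np.array([5, 2, 4, 1])  # manually labeled first evaluation parameter

model = make_pipeline(StandardScaler(), SVC(kernel="linear"))
model.fit(X_train, y_train)

new_class = np.array([[0.50, 110, 70, 1200, 30]])
first_evaluation_parameter = model.predict(new_class)[0]
```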
In the technical solution of this embodiment of the present invention, the first structured information and the second structured information are respectively extracted from the recorded video data and the corresponding audio data, so that the instructor's status is obtained from the two dimensions of image and speech, and the online teaching situation is evaluated through the classification evaluation model based on the extracted status to obtain the first evaluation parameter. In this way, massive online teaching video and audio data can be evaluated quickly, objectively and accurately.
Furthermore, online teaching activities are usually carried out based on the presentation of courseware. When different instructors teach based on the same courseware, the teaching activities tend to be more structured and standardized because of the courseware. On this premise, the video data and the corresponding audio data can be divided according to the structure of the courseware, so that each resulting video data segment and audio data segment actually corresponds to one page or one part of the courseware. In another embodiment of the present invention, structured data can be extracted from the video data segments and the audio data segments in the same manner as in the above embodiment; the structured data of the different video data segments are then merged to obtain the first structured information, and the structured data of the different audio data segments are merged to obtain the second structured information. The division of the video data and the audio data can be performed according to courseware operation data, which includes the operation records of the courseware and, in particular, the time points at which the instructor performs page switching operations on the courseware.
图5为本发明第二实施例的数据处理方法的数据流程图。本实施例的方法适用于对在线教学的讲解情况进行了解和评估。如图5所示,在步骤S100’,结合课件操作数据从视频数据中提取所述第一结构化信息。其中,第一结构化信息包括不同课件操作区间对应的人脸信息和/或动作信息。在一个可选实现方式中,如图6所示,步骤S100’包括如下步骤:
在步骤S110,根据课件操作数据将时间轴划分为多个课件操作区块。
具体地,可以根据课件操作数据中的页面切换数据将每一页课件对应的时间轴作为一个课件操作区块。
在步骤S120,根据划分获得的课件操作区块从所述视频数据的对应部分中提取对应的第一相关信息。其中,所述第一相关信息包括所述视频数据的对应部分中的人脸信息和/或动作信息。
根据划分后的时间轴(也即不同的课件操作区块)就可以对视频数据进行分段,对每一段视频数据进行结构化数据的提取获得第一相关信息。这一提取的过程和对整个视频数据提取人脸信息或动作信息的方式相同。
在步骤S130,根据各课件操作区块的所述第一相关信息获取所述第一结构化信息。
通过将各课件操作区块的第一相关信息按顺序合并为一个数组或一维向量,就可以获得本实施例的第一结构化信息。也就是说,在本实施例中,第一结构化信息为第一相关信息组成的向量。
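For illustration only, the sketch below divides the time axis into courseware operation blocks from recorded page-switch time points, collects per-block structured data, and also shows the per-block weighted summation used later in step S300'; the timed-item format and the extract callback are assumptions made for this sketch.

```python
def courseware_blocks(page_switch_times, lesson_end):
    """Turn page-switch time points into [start, end) courseware operation blocks."""
    bounds = [0.0] + sorted(page_switch_times) + [lesson_end]
    return list(zip(bounds[:-1], bounds[1:]))

def per_block_features(blocks, timed_items, extract):
    """For each block, extract structured data from the items falling inside it."""
    features = []
    for start, end in blocks:
        in_block = [item for item in timed_items if start <= item["t"] < end]
        features.append(extract(in_block))
    return features  # merged in order -> first (or second) structured information

def weighted_block_score(block_scores, weights):
    """Weighted sum of per-block sub-evaluation parameters (cf. step S300')."""
    return sum(w * s for w, s in zip(weights, block_scores))
```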
在步骤S200’,结合课件操作数据从所述音频数据中提取所述第二结构化信息。其中,第二结构化信息包括不同课件操作区间的语音识别信息。在一个可选实现方式中,如图7所示,步骤S200’包括如下步骤:
在步骤S210,根据课件操作数据将时间轴划分为多个课件操作区块。
在步骤S220,根据划分获得的课件操作区块从所述音频数据的对应部分中提取对应的第二相关信息。其中,所述第二相关信息包括所述音频数据的对应部分中的语音识别信息。第二相关信息的提取方式,与上一实施例中提取语音识别信息的方式相同。
在步骤S230,根据各课件操作区块的所述第二相关信息获取所述第二结构化信息。
具体地,通过将各课件操作区块的第二相关信息按顺序合并为一个数组或一维向量,就可以获得本实施例的第二结构化信息。也就是说,在本实施例中,第二结构化信息为第二相关信息组成的向量。
在步骤S300’,根据所述第一结构化信息和所述第二结构化信息获取第一评价参数。
具体地,根据第一结构化信息、第二结构化信息和分类评价模型获取第一评价参数。对应地,在本步骤中,可以按照本发明第一实施例相同的方式根据第一结构化信息和第二结构化信息对每个课件操作区间对应的视频数据片段的第一相关信息和音频数据片段的第二相关信息获取第一子评价参数,并根据不同课件操作区间的预定权重对第一子评价参数进行加权求和获取第一评价参数。
也可以将第一结构化信息和第二结构化信息整体输入到分类评价模型中,根据分类评价模型的输出直接获取第一评价参数。所述分类评价模型可以是无监督模型也可以是有监督模型。
由此,本实施例通过基于课件操作数据来对视频数据和音频数据进行划分,由此,可以有效地对教学内容相同的部分进行对标分析,提高评估的准确性。
Fig. 8 is a flowchart of the data processing method of the third embodiment of the present invention. The method of this embodiment is applicable to understanding and evaluating the learning effect of online teaching. As shown in Fig. 8, the data processing method of this embodiment includes the following steps:
In step S1000, first structured information is extracted from video data, where the first structured information includes face information in the video data.
The video data is the video data of the learner during online learning recorded by the server 3. Specifically, the video data can be selected according to the evaluation period, and the selection is quite flexible: it may be the video data of one online teaching session, a collection of video data of multiple online teaching sessions corresponding to one teaching unit, or a segment of video data corresponding to one part of an online teaching session.
In step S2000, second structured information is extracted from the audio data corresponding to the video data, where the second structured information includes speech recognition information in the audio data.
In step S3000, a first evaluation parameter is obtained according to the first structured information and the second structured information, where the first evaluation parameter is used to characterize classification information of the current performance information relative to the historical performance information of the same learner.
Specifically, the learner's current performance information is obtained according to the first structured information and the second structured information, and the first evaluation parameter is obtained according to the current performance information.
In the technical solution of this embodiment of the present invention, the first structured information and the second structured information are extracted from the recorded video data and the corresponding audio data, so that the learner's performance information is obtained from the two dimensions of image and speech, and the first evaluation parameter is obtained by longitudinally comparing the extracted performance information with the historical performance information of the same learner. In this way, the learning quality reflected in massive online teaching data can be evaluated quickly, objectively and accurately.
Preferably, the method of this embodiment may further include step S4000 of obtaining a second evaluation parameter according to the current performance information, where the second evaluation parameter is used to characterize classification information of the current performance information relative to the performance information of different learners.
Thus, the classification of the learner's current performance information among all classroom performances can be further obtained through horizontal comparison, providing more data support for objectively evaluating the learner's learning effect.
For step S1000, the video data can be regarded as a time series of images. Face image information in an image can be obtained by performing recognition processing on every frame or on certain key frames. Further, the face information in the video data can be obtained from the face image information of the different images arranged along the time axis. In this step, the face image information can be obtained in the same manner as in the first embodiment of the present invention.
After the face position information and facial expression classification information of each image are obtained, the time series of these two pieces of information corresponding to the video data can be obtained. Based on this time series, the corresponding performance information can be obtained by statistics or other means for further processing and evaluation.
With the first structured information including face information, the learner's visual performance can be evaluated.
It should be understood that step S1000 and step S2000 may be performed simultaneously or sequentially; when performed sequentially, their order is not limited.
For step S2000, speech-based communication is an important means of online teaching. During online teaching, all speech of the conversation between the instructor and the learner is recorded as an audio file with different audio tracks, in which the audio data captured by the instructor-side terminal and that captured by the learner-side terminal are stored in separate tracks. Therefore, the learner's audio data can be analyzed and evaluated on its own. In this embodiment, the learner's performance in the speech dimension is evaluated by extracting the second structured information from the audio data, where the second structured information includes speech recognition information obtained by performing speech recognition on the audio data. Speech recognition is a technique for processing audio data containing speech to obtain information related to the speech content. In this embodiment, the speech recognition information obtained through speech recognition may be speech duration information, text information corresponding to the speech, dialogue count information, or pause duration information of the learner's speech when the speaking party changes. The text information reflects the specific content explained by the instructor during teaching and can serve as a basis for subsequent evaluation. The speech duration information refers to the length of time for which speech is detected in the audio data. Since the instructor may not be speaking continuously during teaching, the speech duration information and the dialogue count information can, to a certain extent, reflect how actively the learner communicates. The pause duration of the learner's speech when the speaking party changes reflects how quickly the learner responds when the instructor asks a question or asks the learner to repeat, which also reflects the learner's classroom performance.
For step S3000, the learner's current performance information is further obtained by combining the first structured information characterizing visual features and the second structured information characterizing speech features. The current performance information is feature information suitable for classification, which characterizes the learner's performance in the video data and audio data currently being analyzed and evaluated.
Specifically, relevant feature information can be extracted from the first structured information and the second structured information by statistical means and merged to obtain the performance information. For example, the performance information may include at least one of the count of facial expressions of predetermined categories and the count of predetermined face poses obtained from the first structured information. The performance information may also include at least one of the dialogue count information, the learner's speech duration information, and the ratio of the learner's speech duration to the instructor's speech duration obtained from the second structured information. The performance information may also include the feature vector of the text information in the second structured information, as well as a vector of the pause durations of the learner's speech at each change of speaking party, or the total of those pause durations.
The above types of information can be merged into one vector as the performance information, each element of which is one corresponding type of information.
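A minimal sketch of assembling such a performance vector is shown below; the particular feature set, the key names and their ordering are assumptions made for illustration.

```python
import numpy as np

def performance_vector(face_stats, speech_stats, text_vector):
    """Merge visual, speech and text features into one performance-information vector."""
    ratio = speech_stats["learner_speech_s"] / max(speech_stats["instructor_speech_s"], 1e-6)
    base = [
        face_stats["positive_expression_count"],   # predetermined expression category
        face_stats["attentive_pose_count"],        # predetermined face pose
        speech_stats["dialogue_turns"],
        speech_stats["learner_speech_s"],
        ratio,
        sum(speech_stats["pause_durations_s"]),    # pauses at speaker changes
    ]
    return np.concatenate([np.array(base, dtype=float),
                           np.asarray(text_vector, dtype=float)])
```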
The first evaluation parameter reflects the comparison between the learner's current performance and historical performance, i.e., classification information of the current performance information relative to the historical performance information of the same learner. The historical performance information can be obtained by analyzing the learner's historical video data and the corresponding historical audio data, and is a set of vectors in the same format as the current performance information.
In an optional implementation, unsupervised cluster analysis may be performed on the set composed of the current performance information and the historical performance information, and the difference information between the current performance information and the historical performance information is obtained as the classification information. Unsupervised clustering may use, for example, K-means clustering, kernel K-means clustering or spectral clustering.
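As one possible concrete form of this clustering, the sketch below applies scikit-learn's K-means to the historical and current performance vectors of one learner and uses the distance of the current vector to its assigned cluster center as the difference measure; the two-cluster setting, the example values and the distance-based score are illustrative assumptions.

```python
import numpy as np
from sklearn.cluster import KMeans

# Rows: historical performance vectors of the same learner; `current`: this lesson.
history = np.array([[12, 3, 40, 620, 0.45, 8.0],
                    [10, 4, 35, 580, 0.40, 9.5],
                    [14, 5, 44, 700, 0.50, 7.0]])
current = np.array([[4, 1, 15, 300, 0.20, 25.0]])

data = np.vstack([history, current])
km = KMeans(n_clusters=2, n_init=10, random_state=0).fit(data)

label = km.predict(current)[0]
# Distance to the assigned cluster center as a difference (classification) measure.
difference = float(np.linalg.norm(current - km.cluster_centers_[label]))
```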
In another optional implementation, the historical performance information and the corresponding first evaluation parameters may be used as samples to train a classification model, whose input is the vector serving as the performance information and whose output is the first evaluation parameter. The first evaluation parameters in the samples may be manually labeled, or partly computed by a previous classification model and partly manually labeled. By continually adding new sample data to revise the classification model, the objectivity and accuracy of its evaluation can be continually improved. The current performance information can thus be input into the classification model to obtain the corresponding first evaluation parameter. For this embodiment, the classification model can be built with various existing modeling approaches such as SVM (Support Vector Machines), linear regression, logistic regression, naive Bayes, linear discriminant analysis, decision trees, or K-NN (K-nearest neighbor analysis).
In this embodiment, the first evaluation parameter may be a single evaluation score, or a vector composed of evaluation scores in multiple different dimensions, for example a vector of evaluation scores for learning attitude, initiative, extension and other aspects.
The above implementation provides an unsupervised classification evaluation model for classification. In fact, other unsupervised classification approaches can also be used to obtain the evaluation parameters. For example, unsupervised clustering may be performed separately on the first structured information and the second structured information extracted from all video data and audio data, and the evaluation parameters computed based on the clustering results. Unsupervised clustering may use, for example, K-means clustering, kernel K-means clustering or spectral clustering.
For step S4000, the current performance information is compared horizontally with the performance information of different learners to obtain the second evaluation parameter of the learner being evaluated. The second evaluation parameter can present a comparison of classroom performance between the evaluated learner and other learners taking the same online course. The performance information of different learners can be extracted from the video data and audio data of one or more other learners.
In an optional implementation, unsupervised cluster analysis may be performed on the set composed of the current performance information and the performance information of different learners, and the difference information between the current performance information and the other learners' performance information is obtained as the classification information. Unsupervised clustering may use, for example, K-means clustering, kernel K-means clustering or spectral clustering.
In another optional implementation, the performance information of different learners and the corresponding second evaluation parameters may be used as samples to train a classification model, whose input is the vector serving as the performance information and whose output is the second evaluation parameter. The current performance information can thus be input into the classification model to obtain the corresponding second evaluation parameter. For this embodiment, the classification model can be built with various existing modeling approaches such as SVM (Support Vector Machines), linear regression, logistic regression, naive Bayes, linear discriminant analysis, decision trees, or K-NN (K-nearest neighbor analysis).
Unified learning effect evaluation information can further be obtained using the first evaluation parameter and the second evaluation parameter.
In an optional implementation, the learning effect evaluation information may be obtained from the first evaluation parameter and the second evaluation parameter alone. Specifically, the learning effect evaluation information may be obtained by directly taking a weighted sum of the first and second evaluation parameters, or by taking a weighted sum of their elements.
In other optional implementations, the first evaluation parameter and the second evaluation parameter may be combined with other parameters related to the learning effect to obtain the learning effect evaluation information.
For example, the learning effect evaluation information may be obtained from the first evaluation parameter, the second evaluation parameter and assessment result information for the corresponding curriculum standard, where the assessment result information may be knowledge-point test information obtained through computer-based testing. Similar to the optional implementation described above, when the first and second evaluation parameters are numerical values, the learning effect evaluation information can be obtained by weighted summation of the above quantities; when they are vectors, it can be obtained by weighted summation of their elements.
For another example, the learning effect evaluation information may be obtained from the first evaluation parameter, the second evaluation parameter and manual evaluation information, where the manual evaluation information is the instructor's after-class manual scoring of the learner's classroom performance in different dimensions. In the online teaching system of this embodiment, the scoring can be performed through the first client 1 operated by the instructor. Thus, manual evaluation and machine evaluation can be combined to obtain a more comprehensive evaluation.
For another example, the learning effect evaluation information may be obtained from the first evaluation parameter, the second evaluation parameter and learner attribute information, where the learner attribute information may include the learner's personality classification and prior learning level. Such information can be used to influence or adjust the weights of the first and second evaluation parameters or of certain elements thereof. For example, when the learner's personality is classified as introverted, the speech duration in the performance information may be relatively short even though the classroom performance is still of a good level. In this example, the second evaluation parameter can be adjusted by introducing the learner attribute information, so that the learning effect is evaluated more accurately in light of the learner's inherent personality.
For another example, the learning effect evaluation information may be obtained from the first evaluation parameter, the second evaluation parameter, the learner attribute information, the assessment result information for the corresponding curriculum standard, and the manual evaluation information, thereby maximizing the accuracy and objectivity of the learning effect evaluation information.
Optionally, the first evaluation parameter, the second evaluation parameter, the learner attribute information, the assessment result information for the corresponding curriculum standard, the manual evaluation information and the corresponding learning effect evaluation information may also be presented to data analysts through an output device, to help teaching organizers give the learner appropriate learning advice.
Fig. 9 is a schematic diagram of an electronic device of the fourth embodiment of the present invention. The electronic device shown in Fig. 9 is a general-purpose data processing apparatus with a general computer hardware structure, which includes at least a processor 91 and a memory 92 connected through a bus 93. The memory 92 is adapted to store instructions or programs executable by the processor 91. The processor 91 may be an independent microprocessor or a set of multiple microprocessors. Thus, by executing the commands stored in the memory 92, the processor 91 executes the method flows of the embodiments of the present invention described above, so as to process data and control other apparatuses. The bus 93 connects the above components together, and also connects them to a display controller 94, a display apparatus and an input/output (I/O) apparatus 95. The input/output (I/O) apparatus 95 may be a mouse, a keyboard, a modem, a network interface, a touch input apparatus, a motion-sensing input apparatus, a printer, or other apparatuses known in the art. Typically, the input/output (I/O) apparatus 95 is connected to the system through an input/output (I/O) controller 96.
The memory 92 may store software components, such as an operating system, communication modules, interaction modules and application programs. Each of the modules and applications described above corresponds to a set of executable program instructions that accomplish one or more functions and the methods described in the embodiments of the invention.
The flowcharts and/or block diagrams of the methods, devices (systems) and computer program products according to the embodiments of the present invention described above illustrate various aspects of the present invention. It should be understood that each block of the flowcharts and/or block diagrams, and combinations of blocks in the flowchart legends and/or block diagrams, can be implemented by computer program instructions. These computer program instructions may be provided to a processor of a general-purpose computer, a special-purpose computer or other programmable data processing device to produce a machine, so that the instructions (executed via the processor of the computer or other programmable data processing device) create means for implementing the functions/actions specified in the flowchart and/or block diagram block or blocks.
Meanwhile, as those skilled in the art will appreciate, various aspects of the embodiments of the present invention may be implemented as a system, method or computer program product. Therefore, various aspects of the embodiments of the present invention may take the form of an entirely hardware implementation, an entirely software implementation (including firmware, resident software, microcode, etc.), or an implementation combining software and hardware aspects, which may generally be referred to herein as a "circuit", "module" or "system". Furthermore, aspects of the present invention may take the form of a computer program product implemented in one or more computer-readable media having computer-readable program code implemented thereon.
Any combination of one or more computer-readable media may be used. The computer-readable medium may be a computer-readable signal medium or a computer-readable storage medium. The computer-readable storage medium may be, for example (but not limited to), an electronic, magnetic, optical, electromagnetic, infrared or semiconductor system, device or apparatus, or any suitable combination of the foregoing. More specific examples (a non-exhaustive list) of the computer-readable storage medium would include: an electrical connection with one or more wires, a portable computer diskette, a hard disk, a random access memory (RAM), a read-only memory (ROM), an erasable programmable read-only memory (EPROM or flash memory), an optical fiber, a portable compact disc read-only memory (CD-ROM), an optical storage device, a magnetic storage device, or any suitable combination of the foregoing. In the context of the embodiments of the present invention, a computer-readable storage medium may be any tangible medium that can contain or store a program for use by or in connection with an instruction execution system, device or apparatus.
A computer-readable signal medium may include a propagated data signal with computer-readable program code implemented therein, for example in baseband or as part of a carrier wave. Such a propagated signal may take any of a variety of forms, including but not limited to electromagnetic, optical, or any suitable combination thereof. The computer-readable signal medium may be any computer-readable medium that is not a computer-readable storage medium and that can communicate, propagate or transmit a program for use by or in connection with an instruction execution system, device or apparatus.
Computer program code for carrying out operations for aspects of the present invention may be written in any combination of one or more programming languages, including object-oriented programming languages such as Java, Smalltalk, C++, PHP and Python, and conventional procedural programming languages such as the "C" programming language or similar programming languages. The program code may execute entirely on the user's computer as a stand-alone software package, partly on the user's computer, partly on the user's computer and partly on a remote computer, or entirely on a remote computer or server. In the latter case, the remote computer may be connected to the user's computer through any type of network, including a local area network (LAN) or a wide area network (WAN), or the connection may be made to an external computer (for example, through the Internet using an Internet service provider).
The above are merely preferred embodiments of the present invention and are not intended to limit the present invention. For those skilled in the art, the present invention may have various modifications and variations. Any modification, equivalent replacement, improvement and the like made within the spirit and principles of the present invention shall be included in the protection scope of the present invention.

Claims (20)

  1. A data processing method, characterized in that the method comprises:
    extracting first structured information from video data, the video data being a learner video recorded during online teaching, and the first structured information comprising face information and/or action information in the video data;
    extracting second structured information from audio data corresponding to the video data, the second structured information comprising speech recognition information in the audio data; and
    obtaining a first evaluation parameter according to the first structured information and the second structured information.
  2. The data processing method according to claim 1, characterized in that the obtaining a first evaluation parameter according to the first structured information and the second structured information comprises:
    obtaining the first evaluation parameter for the video data and the audio data according to the first structured information, the second structured information and a classification evaluation model.
  3. The data processing method according to claim 1, characterized in that the face information comprises at least one of face position information, information indicating that a face is detected, and facial expression classification information.
  4. The data processing method according to claim 1, characterized in that the action information comprises hand trajectory information.
  5. The data processing method according to claim 1, characterized in that the speech recognition information comprises at least one of speech duration information, text information corresponding to the speech information, dialogue count information, and pause duration information of the learner's speech when the speaking party changes.
  6. The data processing method according to claim 1, characterized in that the extracting first structured information from video data comprises:
    extracting the first structured information from the video data in combination with courseware operation data;
    wherein the courseware operation data comprises operation records of courseware.
  7. The data processing method according to claim 6, characterized in that the extracting the first structured information from the video data in combination with courseware operation data comprises:
    dividing a time axis into a plurality of courseware operation blocks according to the courseware operation data;
    extracting corresponding first related information from a corresponding part of the video data according to the courseware operation blocks obtained by the division, wherein the first related information comprises face information and/or action information in the corresponding part of the video data; and
    obtaining the first structured information according to the first related information of each courseware operation block.
  8. The data processing method according to claim 1, characterized in that the extracting second structured information from the audio data corresponding to the video data comprises:
    extracting the second structured information from the audio data in combination with courseware operation data;
    wherein the courseware operation data comprises operation records of courseware.
  9. The data processing method according to claim 8, characterized in that the extracting the second structured information from the audio data in combination with courseware operation data comprises:
    dividing a time axis into a plurality of courseware operation blocks according to the courseware operation data;
    extracting corresponding second related information from a corresponding part of the audio data according to the courseware operation blocks obtained by the division, wherein the second related information comprises speech recognition information in the corresponding part of the audio data; and
    obtaining the second structured information according to the second related information of each courseware operation block.
  10. The data processing method according to claim 2, characterized in that the obtaining the first evaluation parameter for the video data and the audio data according to the first structured information, the second structured information and a classification evaluation model comprises:
    comparing the first structured information with first average state information of the classification evaluation model to obtain a first comparison parameter, wherein the first average state information is obtained according to first structured information corresponding to historical video data;
    comparing the second structured information with second average state information of the classification evaluation model to obtain a second comparison parameter, wherein the second average state information is obtained according to second structured information corresponding to historical audio data; and
    obtaining the first evaluation parameter by weighted summation of the first comparison parameter and the second comparison parameter.
  11. The data processing method according to claim 2, characterized in that the classification evaluation model is trained from labeled first structured information samples and labeled second structured information samples, the classification evaluation model taking the first structured information and the second structured information as input parameters and the first evaluation parameter as an output parameter; wherein the first structured information samples comprise first structured information corresponding to historical video data, and the second structured information samples comprise second structured information corresponding to historical audio data.
  12. The data processing method according to claim 1, characterized in that the first evaluation parameter is used to characterize classification information of current performance information relative to historical performance information of the same learner;
    the obtaining a first evaluation parameter according to the first structured information and the second structured information comprises:
    obtaining the learner's current performance information according to the first structured information and the second structured information; and
    obtaining the first evaluation parameter according to the current performance information.
  13. The data processing method according to claim 12, characterized in that the method further comprises:
    obtaining a second evaluation parameter according to the current performance information, wherein the second evaluation parameter is used to characterize classification information of the current performance information relative to performance information of different learners.
  14. The data processing method according to claim 12, characterized in that the current performance information and the historical performance information comprise at least one of count information of facial expressions of predetermined categories and count information of predetermined face poses obtained according to the first structured information.
  15. The data processing method according to claim 12, characterized in that the current performance information and the historical performance information comprise at least one of dialogue count information, speech duration information of the learner, ratio information of the learner's speech duration to the instructor's speech duration, a feature vector of text information, and pause duration information of the learner's speech when the speaking party changes, obtained according to the second structured information.
  16. The data processing method according to claim 13, characterized in that the method further comprises:
    obtaining learning effect evaluation information according to the first evaluation parameter and the second evaluation parameter.
  17. The data processing method according to claim 16, characterized in that the obtaining learning effect evaluation information according to the first evaluation parameter and the second evaluation parameter comprises:
    obtaining the learning effect evaluation information according to the first evaluation parameter, the second evaluation parameter and assessment result information for a corresponding curriculum standard; or
    obtaining the learning effect evaluation information according to the first evaluation parameter, the second evaluation parameter and manual evaluation information; or
    obtaining the learning effect evaluation information according to the first evaluation parameter, the second evaluation parameter and learner attribute information; or
    obtaining the learning effect evaluation information according to the first evaluation parameter, the second evaluation parameter, learner attribute information, assessment result information for a corresponding curriculum standard and manual evaluation information.
  18. The data processing method according to claim 1, characterized in that the video data is video data of one online teaching session; or
    the video data is a segment of video data corresponding to one part of one online teaching session; or
    the video data is a collection of video data of a plurality of online teaching sessions corresponding to one teaching unit.
  19. A computer-readable storage medium on which computer program instructions are stored, characterized in that the computer program instructions, when executed by a processor, implement the method according to any one of claims 1-18.
  20. An electronic device, comprising a memory and a processor, characterized in that the memory is configured to store one or more computer program instructions, wherein the one or more computer program instructions are executed by the processor to implement the method according to any one of claims 1-18.
PCT/CN2019/083368 2018-07-03 2019-04-19 Data processing method, storage medium and electronic device WO2020007097A1 (zh)

Applications Claiming Priority (4)

Application Number Priority Date Filing Date Title
CN201810718955.9 2018-07-03
CN201810718955.9A CN108898115B (zh) 2018-07-03 2018-07-03 Data processing method, storage medium and electronic device
CN201810759328.XA CN109063587B (zh) 2018-07-11 2018-07-11 Data processing method, storage medium and electronic device
CN201810759328.X 2018-07-11

Publications (1)

Publication Number Publication Date
WO2020007097A1 true WO2020007097A1 (zh) 2020-01-09

Family

ID=69060750

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/CN2019/083368 WO2020007097A1 (zh) 2018-07-03 2019-04-19 Data processing method, storage medium and electronic device

Country Status (1)

Country Link
WO (1) WO2020007097A1 (zh)

Patent Citations (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN101350062A (zh) * 2008-08-05 2009-01-21 浙江大学 Fast video-based face detection method
CN102103694A (zh) * 2009-12-21 2011-06-22 展讯通信(上海)有限公司 Video-based real-time face detection method and device
CN107197384A (zh) * 2017-05-27 2017-09-22 北京光年无限科技有限公司 Multi-modal interaction method and system for a virtual robot applied to a live video streaming platform
CN108898115A (zh) * 2018-07-03 2018-11-27 北京大米科技有限公司 Data processing method, storage medium and electronic device
CN109063587A (zh) * 2018-07-11 2018-12-21 北京大米科技有限公司 Data processing method, storage medium and electronic device

Cited By (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN111831704A (zh) * 2020-05-21 2020-10-27 北京嘀嘀无限科技发展有限公司 Method and apparatus for determining abnormal data, storage medium and electronic device
CN111831704B (zh) * 2020-05-21 2023-12-08 北京嘀嘀无限科技发展有限公司 Method and apparatus for determining abnormal data, storage medium and electronic device
CN112560663A (zh) * 2020-12-11 2021-03-26 南京谦萃智能科技服务有限公司 Teaching video marking method, related device and readable storage medium
CN114297488A (zh) * 2021-12-28 2022-04-08 海信集团控股股份有限公司 Method and apparatus for recommending competition questions
CN116452072A (zh) * 2023-06-19 2023-07-18 华南师范大学 Teaching evaluation method, system, device and readable storage medium
CN116452072B (zh) * 2023-06-19 2023-08-29 华南师范大学 Teaching evaluation method, system, device and readable storage medium

Legal Events

Date Code Title Description
121 Ep: the epo has been informed by wipo that ep was designated in this application
    Ref document number: 19829958
    Country of ref document: EP
    Kind code of ref document: A1
NENP Non-entry into the national phase
    Ref country code: DE
122 Ep: pct application non-entry in european phase
    Ref document number: 19829958
    Country of ref document: EP
    Kind code of ref document: A1