WO2022012605A1 - User churn prediction system based on a pre-trained deep neural network model - Google Patents

User churn prediction system based on a pre-trained deep neural network model

Info

Publication number
WO2022012605A1
Authority
WO
WIPO (PCT)
Prior art keywords
module
corpus
sub
user
classroom
Prior art date
Application number
PCT/CN2021/106382
Other languages
English (en)
French (fr)
Inventor
王鑫 (Wang Xin)
许昭慧 (Xu Zhaohui)
Original Assignee
上海松鼠课堂人工智能科技有限公司 (Shanghai Squirrel Classroom Artificial Intelligence Technology Co., Ltd.)
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by 上海松鼠课堂人工智能科技有限公司 (Shanghai Squirrel Classroom Artificial Intelligence Technology Co., Ltd.)
Publication of WO2022012605A1 (Chinese-language publication)

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06Q INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES; SYSTEMS OR METHODS SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES, NOT OTHERWISE PROVIDED FOR
    • G06Q10/00 Administration; Management
    • G06Q10/04 Forecasting or optimisation specially adapted for administrative or management purposes, e.g. linear programming or "cutting stock problem"
    • G06Q50/00 Information and communication technology [ICT] specially adapted for implementation of business processes of specific business sectors, e.g. utilities or tourism
    • G06Q50/10 Services
    • G06Q50/20 Education
    • G06Q50/205 Education administration or guidance

Landscapes

  • Business, Economics & Management (AREA)
  • Engineering & Computer Science (AREA)
  • Strategic Management (AREA)
  • Economics (AREA)
  • Human Resources & Organizations (AREA)
  • Tourism & Hospitality (AREA)
  • Marketing (AREA)
  • General Business, Economics & Management (AREA)
  • Theoretical Computer Science (AREA)
  • Educational Technology (AREA)
  • Educational Administration (AREA)
  • General Physics & Mathematics (AREA)
  • Physics & Mathematics (AREA)
  • Primary Health Care (AREA)
  • Health & Medical Sciences (AREA)
  • General Health & Medical Sciences (AREA)
  • Development Economics (AREA)
  • Game Theory and Decision Science (AREA)
  • Entrepreneurship & Innovation (AREA)
  • Operations Research (AREA)
  • Quality & Reliability (AREA)
  • Management, Administration, Business Operations System, And Electronic Commerce (AREA)
  • Electrically Operated Instructional Devices (AREA)

Abstract

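Disclosed herein is a user churn prediction system based on a pre-trained deep neural network model. A course teaching platform module provides a teacher and a student with work terminal platforms for a predetermined educational course. While the work terminal platforms are running, a corpus collection module collects the voice dialogues, text dialogues and teaching videos of the teacher and the student in the classroom. A churn state module labels the classroom corpus of churned users as churned and the remaining classroom corpus as not churned. A prediction algorithm module computes a user churn prediction result through a trained deep neural network model, based on the classroom corpus collected by the corpus collection module and the classroom corpus labeled by the churn state module. A real-time monitoring module monitors the course teaching platform module in real time and presents the user churn prediction result computed by the prediction algorithm module to the teacher. The system thereby allows educational tutoring strategies to be adjusted.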

Description

User Churn Prediction System Based on a Pre-trained Deep Neural Network Model
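This application claims priority to Chinese patent application No. 202010689193.1, filed with the Chinese Patent Office on July 16, 2020, the entire contents of which are incorporated herein by reference.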
Technical Field
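This application relates to the field of artificial intelligence education technology. For example, it relates to a user churn prediction system based on a pre-trained deep neural network model.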
Background
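A user churn prediction model is usually built in two steps. First, features are extracted from users' basic information and behavioral information. Second, a training algorithm is applied to the features and churn labels of a small training set of users. Such a model infers a list of users likely to churn from the behavioral data of students' interactions with the learning system. It therefore ignores the most important part of an online classroom, the teaching-and-learning exchange between teacher and students, and it cannot provide real-time guidance or adjustment for an online classroom. Analyzing teacher-student communication in online classrooms is an important source and basis for understanding teacher-student interaction and for making teaching or service interventions. Classroom discourse analysis techniques usually perform statistical analysis on the text of historical teacher and student posts in online forums, and identify problems by characterizing the language of teachers and students. These techniques cannot predict churn from the spoken exchanges between students and teachers (human teachers or virtual artificial intelligence (AI) teachers) in a real-time online classroom.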
Summary
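This application provides a user churn prediction system based on a pre-trained deep neural network model, comprising: a course teaching platform module, a corpus collection module, a churn state module, a prediction algorithm module, a real-time monitoring module and an offline query module, wherein:
the course teaching platform module is configured to provide a teacher and a student, respectively, with work terminal platforms for a predetermined educational course;
the corpus collection module is configured to collect, while the work terminal platforms are running, the voice dialogues, text dialogues and teaching videos of the teacher and the student in the classroom;
the churn state module is configured to label the classroom corpus collected by the corpus collection module, labeling the classroom corpus of churned users as churned and labeling the rest of the classroom corpus collected by the corpus collection module as not churned;
the prediction algorithm module is configured to compute a user churn prediction result through a trained deep neural network model, based on the classroom corpus collected by the corpus collection module and the classroom corpus labeled by the churn state module;
the real-time monitoring module is configured to monitor the course teaching platform module in real time and present the user churn prediction result computed by the prediction algorithm module to the teacher;
the offline query module is configured to provide the obtained user churn prediction result to a querying party according to a query condition.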
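Brief Description of the Drawings
FIG. 1 is a schematic structural diagram of a user churn prediction system based on a pre-trained deep neural network model according to an embodiment of this application;
FIG. 2 is a schematic diagram showing how a user churn prediction result is displayed according to an embodiment of this application.
Detailed Description
The technical solutions in the embodiments of this application are described below with reference to the accompanying drawings.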
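This application provides the following technical solution: a user churn prediction system based on a pre-trained deep neural network model, comprising a course teaching platform module, a corpus collection module, a churn state module, a prediction algorithm module, a real-time monitoring module and an offline query module. The course teaching platform module is configured to provide a teacher and a student, respectively, with work terminal platforms for a predetermined educational course. The corpus collection module is configured to collect, while the work terminal platforms are running, the voice dialogues, text dialogues and teaching videos of the teacher and the student in the classroom. The churn state module is configured to label the classroom corpus collected by the corpus collection module, labeling the classroom corpus of churned users as churned and the rest of the collected classroom corpus as not churned. The prediction algorithm module is configured to compute a user churn prediction result through a trained deep neural network model, based on the classroom corpus collected by the corpus collection module and the classroom corpus labeled by the churn state module. The real-time monitoring module is configured to monitor the course teaching platform module in real time and present the user churn prediction result computed by the prediction algorithm module to the teacher. The offline query module is configured to provide the obtained user churn prediction result to a querying party according to a query condition.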
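The corpus collection module includes three submodules: a text processing submodule, a speech-to-text processing submodule, and a teaching-video audio-extraction-to-text processing submodule.

The text processing submodule is configured to collect and store the text dialogues between the teacher and the student on the course teaching platform module. It then associates the stored text dialogues with student information, class information and timestamps.

The speech-to-text processing submodule is configured to receive the voice dialogues between the teacher and the student on the course teaching platform module, as collected by the real-time monitoring module. It stores the voice dialogues and converts them from speech to text through speech recognition. It then stores the converted dialogues and associates them with student information, class information and timestamps.

The teaching-video audio-extraction-to-text processing submodule is triggered when the student clicks to play a teaching video on the course teaching platform module. It extracts the audio of the portion of the video the student listened to, as delimited by the timestamps at which playback started and ended. It converts that audio from speech to text through speech recognition, stores the resulting transcript, and associates it with student information, class information and timestamps.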
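The video submodule above transcribes only what the student actually played, delimited by the playback start and end timestamps. The following is a minimal sketch of that bookkeeping. It assumes playback is logged as (start, end) offsets in seconds into the video.

Two parts of the sketch are our own assumptions rather than details from the description:

* Merging overlapping plays, so that a replayed section is transcribed only once.
* The hypothetical `CorpusRecord` schema, standing in for the association with student information, class information and a timestamp.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass(frozen=True)
class CorpusRecord:
    """Hypothetical schema: one transcribed utterance plus its associated metadata."""
    student_id: str   # student information
    class_id: str     # class information
    timestamp: float  # start time of the utterance or video segment, in seconds
    source: str       # "text", "speech" or "video"
    text: str


def watched_segments(play_events: List[Tuple[float, float]]) -> List[Tuple[float, float]]:
    """Merge (start, end) playback offsets into disjoint segments of the video.

    Zero-length or inverted events are ignored. Overlapping or touching segments
    are merged (an assumption) so each second of audio is transcribed only once.
    """
    merged: List[Tuple[float, float]] = []
    for start, end in sorted(e for e in play_events if e[1] > e[0]):
        if merged and start <= merged[-1][1]:
            # Overlaps or touches the previous segment: extend it.
            merged[-1] = (merged[-1][0], max(merged[-1][1], end))
        else:
            merged.append((start, end))
    return merged


def video_records(student_id: str, class_id: str,
                  transcripts: List[Tuple[float, float, str]]) -> List[CorpusRecord]:
    """Hypothetical helper: wrap (start, end, transcript) tuples as corpus records."""
    return [CorpusRecord(student_id, class_id, start, "video", text)
            for start, _end, text in transcripts]
```

For example, `watched_segments([(30, 60), (0, 10), (50, 90)])` returns `[(0, 10), (30, 90)]`. Only those two segments would then be sent to speech recognition.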
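The churn state module includes a churn state update submodule and a churn state labeling submodule.

The churn state update submodule is configured to receive a list of churned users sent by a sales system and to locate the courses those users have taken.

The churn state labeling submodule works from the courses located by the update submodule. It extracts the classroom corpus for those courses from the corpus collection module and labels it as churned. It labels all other classroom corpus collected by the corpus collection module as not churned.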
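The two labeling steps can be sketched as follows. The sketch makes two assumptions:

* The collected corpus is available as hypothetical (student_id, course_id, sentence) tuples.
* "The classroom corpus corresponding to the courses the churned user has taken" means that user's own corpus in those courses. In a group class, one could instead label every participant's corpus; the description leaves this open.

```python
from collections import defaultdict
from typing import Dict, Iterable, List, Set, Tuple

# Hypothetical input shape: (student_id, course_id, sentence)
Record = Tuple[str, str, str]


def locate_courses(records: Iterable[Record], churned: Set[str]) -> Dict[str, Set[str]]:
    """Churn state update step: find the courses each churned user has taken."""
    courses: Dict[str, Set[str]] = defaultdict(set)
    for student_id, course_id, _ in records:
        if student_id in churned:
            courses[student_id].add(course_id)
    return dict(courses)


def label_corpus(records: List[Record], churned: Set[str]) -> List[Tuple[str, str, str, int]]:
    """Churn state labeling step: append 1 (churned) or 0 (not churned) to each record."""
    courses = locate_courses(records, churned)
    return [(s, c, text, 1 if c in courses.get(s, set()) else 0)
            for s, c, text in records]
```

The `churned` set plays the role of the list received from the sales system. Any other source of churned users would work the same way.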
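In an embodiment, the definition of user churn is not limited to training schools; it may also include dropping out in the school system. The sales system may be understood as any data source that holds a list of churned users.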
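The prediction algorithm module includes a parameter information submodule and a corpus churn prediction submodule.

The parameter information submodule is configured to store the parameters of the trained deep neural network model, obtained by training the pre-trained deep neural network model. The parameters are updated from time to time as the pre-trained model is iteratively retrained.

The corpus churn prediction submodule is configured to classify the classroom corpus collected by the corpus collection module. It applies the trained deep neural network model, using the parameters held by the parameter information submodule, to determine whether the teacher-student exchange in the current educational course is in a user churn state.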
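The stored parameters are replaced from time to time as training iterates, so the prediction submodule should read whichever version is current at the moment it classifies. Below is a minimal sketch, assuming an in-process store and a caller-supplied scoring function.

`ParameterStore`, `publish`, `latest` and `predict_churn` are hypothetical names. A production system would more likely keep versioned checkpoints on disk or in a model registry.

```python
import threading
from typing import Callable, Dict, Optional, Tuple


class ParameterStore:
    """Hypothetical parameter information submodule that keeps every published version."""

    def __init__(self) -> None:
        self._lock = threading.Lock()
        self._versions: Dict[int, dict] = {}

    def publish(self, params: dict) -> int:
        """Store a newly trained parameter set and return its version number."""
        with self._lock:
            version = max(self._versions, default=0) + 1
            self._versions[version] = params
            return version

    def latest(self) -> Optional[Tuple[int, dict]]:
        """Return (version, params) for the newest version, or None if empty."""
        with self._lock:
            if not self._versions:
                return None
            version = max(self._versions)
            return version, self._versions[version]


def predict_churn(store: ParameterStore, sentence: str,
                  score: Callable[[dict, str], float]) -> Tuple[int, float]:
    """Hypothetical corpus churn prediction step: score with the current parameters."""
    current = store.latest()
    if current is None:
        raise RuntimeError("no trained parameters have been published yet")
    version, params = current
    return version, score(params, sentence)
```

Returning the version number with each score lets a prediction be traced back to the parameter set that produced it.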
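The real-time monitoring module includes a teacher-student exchange listening submodule, a prediction result display submodule and a statistics submodule.

The listening submodule is configured to start working once the teacher and the student in the predetermined educational course begin communicating. It collects the corpus sentence by sentence and sends it to the corpus collection module.

The prediction result display submodule is configured to receive the user churn prediction result sent by the prediction algorithm module and show it to the teacher. The result represents the likelihood that a given sentence in the collected classroom corpus will cause user churn.

The statistics submodule is configured to compute statistics on the user churn prediction results. From these statistics, the teacher can obtain information that helps adjust the tutoring strategy.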
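Collecting the corpus "sentence by sentence" requires deciding where a sentence ends. One simple option is to split typed or transcribed text after sentence-final punctuation. This rule is our assumption, not something the description specifies.

The ASCII full stop is deliberately left out so that decimals such as 3.5 are not split. Unpunctuated speech transcripts would need a pause-based or model-based segmenter instead.

```python
import re
from typing import List

# Assumed segmentation rule: split right after 。！？ or ASCII ! and ?
_SENTENCE_END = re.compile(r"(?<=[。！？!?])")


def split_sentences(text: str) -> List[str]:
    """Return the non-empty sentences of `text`, each keeping its final punctuation."""
    return [s.strip() for s in _SENTENCE_END.split(text) if s.strip()]
```

For example, `split_sentences("你好。今天学什么？好的!")` yields three sentences. Each one would then be sent to the corpus collection module as a separate unit.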
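The offline query module includes a query submodule, a prediction result display submodule and a statistics submodule.

The query submodule is configured to receive the query condition entered by the querying party.

The prediction result display submodule is configured to receive, from the prediction algorithm module, the user churn prediction result that matches the query condition. It then displays the classroom screen recording for that condition together with the churn prediction information for the teacher-student exchange corpus.

The statistics submodule is configured to compute statistics on the user churn prediction results, so that the querying party can obtain the information it needs.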
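The prediction algorithm module is further configured to pre-train a language model on an unsupervised corpus. Pre-training uses the self-attention mechanism of Bidirectional Encoder Representations from Transformers (BERT) as a context encoder: the teacher-student corpus is encoded into semantic vector representations, and the language model is pre-trained on those representations. The pre-trained language model is then fine-tuned on a labeled corpus that is smaller than the unsupervised corpus. A network layer for the classification task is added on top of the pre-trained language model to determine whether a piece of teacher-student corpus is likely to cause user churn.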
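The self-attention mechanism that BERT uses as its context encoder can be illustrated with a single head of scaled dot-product attention: softmax(QK^T / sqrt(d_k)) V, where Q = X W_Q, K = X W_K and V = X W_V.

The pure-Python sketch below is for illustration only. BERT-Base actually stacks 12 layers of 12-head attention with learned projections, residual connections and layer normalization, none of which are shown here.

```python
import math
from typing import List

Matrix = List[List[float]]


def matmul(a: Matrix, b: Matrix) -> Matrix:
    """Plain matrix product of a (n x m) and b (m x p)."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]


def softmax(row: List[float]) -> List[float]:
    m = max(row)  # subtract the max for numerical stability
    exps = [math.exp(v - m) for v in row]
    total = sum(exps)
    return [e / total for e in exps]


def self_attention(x: Matrix, w_q: Matrix, w_k: Matrix, w_v: Matrix) -> Matrix:
    """Single-head scaled dot-product self-attention over token vectors x.

    Illustration only: one head, no residual connection or layer normalization.
    """
    q, k, v = matmul(x, w_q), matmul(x, w_k), matmul(x, w_v)
    d_k = len(k[0])
    k_t = [list(col) for col in zip(*k)]
    scores = [[s / math.sqrt(d_k) for s in row] for row in matmul(q, k_t)]
    weights = [softmax(row) for row in scores]  # each row sums to 1
    return matmul(weights, v)  # each output token is a weighted mix of all tokens
```

Each output row mixes information from every token in the sentence. This is what lets the encoder represent a word in the context of both its left and right neighbours.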
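In this application, the pre-trained language model may also be Embeddings from Language Models (ELMo), Generative Pre-Training (GPT), Enhanced Representation through kNowledge IntEgration (ERNIE) or a similar model; BERT is only an example.
BERT is a language model proposed by Google in October 2018; its full name is Bidirectional Encoder Representations from Transformers. BERT pre-trains deep bidirectional representations by jointly conditioning on both left and right context in all layers. It also strengthens its grasp of long-range semantics by packing long sentences together as input. BERT can be fine-tuned for a wide range of tasks by adding only one extra output layer, without task-specific changes to the model architecture.
BERT is applied in the following steps: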
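Step S1: A computer device builds the corpus on top of the pre-trained model BERT-Base, Chinese released by Google. The Chinese pre-trained model was trained on a Wikipedia corpus. Once loaded, it can directly output trained character vectors or sentence vectors. This application uses the pre-trained model to obtain sentence vectors, which serve as input to the subsequent network model.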
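For BERT-Base, Chinese, Google's tokenizer splits Chinese text into single characters. Every input is wrapped as [CLS] sentence [SEP], and the final-layer vector at the [CLS] position is commonly used as the sentence vector. The sketch below builds that input layout in pure Python.

It is a simplification in three ways:

* It treats every character as its own token, whereas the real WordPiece tokenizer groups Latin letters and digits into subwords.
* It does not map tokens to vocabulary ids.
* The default `max_len` of 128 is an arbitrary assumption; BERT-Base accepts at most 512 positions.

```python
from typing import List, Tuple


def build_bert_input(sentence: str, max_len: int = 128) -> Tuple[List[str], List[int], List[int]]:
    """Return (tokens, attention_mask, segment_ids) for a single-sentence input.

    Simplification: each non-space character is one token, as BERT does for
    Chinese characters. The sentence is truncated so [CLS] and [SEP] still fit.
    """
    if max_len < 2:
        raise ValueError("max_len must leave room for [CLS] and [SEP]")
    chars = [ch for ch in sentence if not ch.isspace()][: max_len - 2]
    tokens = ["[CLS]"] + chars + ["[SEP]"]
    attention_mask = [1] * len(tokens)  # 1 = real token
    padding = max_len - len(tokens)
    tokens += ["[PAD]"] * padding
    attention_mask += [0] * padding     # 0 = padding, ignored by attention
    segment_ids = [0] * max_len         # single-sentence task: all segment A
    return tokens, attention_mask, segment_ids
```

In a real pipeline, the tokens would be converted to ids with the model's vocabulary file before being fed to BERT.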
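Step S2: The network's initial weights are taken from the BERT model. The network is then trained further by back-propagation on the domain-task dataset, namely the labeled corpus from the churn state module, which continually adjusts the weights of the original model.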
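Step S2 continues back-propagation through the whole network. Running BERT itself is out of scope for a short example. The sketch below shows only the part every fine-tuning step shares: the gradient of softmax cross-entropy with respect to the logits, p - y, pushed back into a classification layer by stochastic gradient descent.

The fixed feature vectors stand in for [CLS] outputs, and the learning rate and epoch count are arbitrary assumptions. In real fine-tuning, the same error signal would also flow into BERT's own weights.

```python
import math
from typing import List, Sequence, Tuple


def softmax(logits: Sequence[float]) -> List[float]:
    m = max(logits)
    exps = [math.exp(z - m) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]


def logits_of(w: List[List[float]], b: List[float], x: Sequence[float]) -> List[float]:
    return [sum(wi * xi for wi, xi in zip(row, x)) + bk for row, bk in zip(w, b)]


def train_head(features: List[Sequence[float]], labels: List[int], num_classes: int = 2,
               lr: float = 0.5, epochs: int = 200) -> Tuple[List[List[float]], List[float]]:
    """Stand-in for fine-tuning: fit only a softmax layer by SGD on cross-entropy."""
    dim = len(features[0])
    w = [[0.0] * dim for _ in range(num_classes)]
    b = [0.0] * num_classes
    for _ in range(epochs):
        for x, y in zip(features, labels):
            p = softmax(logits_of(w, b, x))
            for k in range(num_classes):
                grad = p[k] - (1.0 if k == y else 0.0)  # dLoss/dlogit_k = p_k - y_k
                for i in range(dim):
                    w[k][i] -= lr * grad * x[i]
                b[k] -= lr * grad
    return w, b


def predict(w: List[List[float]], b: List[float], x: Sequence[float]) -> int:
    """Return the index of the highest-scoring class."""
    scores = logits_of(w, b, x)
    return max(range(len(scores)), key=scores.__getitem__)
```

Starting from zero weights keeps the example deterministic. In practice the classification layer is randomly initialized, and the pre-trained BERT weights supply the starting point for the rest of the network.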
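Step S3: This application is a single-sentence classification task. A network layer for the classification task is added on top of the language model. It may be a softmax network, a decision tree, a support vector machine (SVM), or any other model capable of binary classification; this application does not limit the choice.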
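Step S4: Adding a softmax layer on top of the language model means taking the output representation of the first token (that is, the output representation of the [CLS] symbol) and feeding it into a softmax layer to obtain the classification output.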
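Step S4 amounts to a linear layer followed by softmax, applied to the [CLS] vector. In the sketch below, the [CLS] vector, weights and bias are plain lists. For BERT-Base the vector would have 768 dimensions, and the weights would come from fine-tuning. Treating index 1 as the "churn" class is an assumption made for illustration.

```python
import math
from typing import List, Sequence


def softmax(logits: Sequence[float]) -> List[float]:
    m = max(logits)  # shift by the max so exp() cannot overflow
    exps = [math.exp(z - m) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]


def classify_cls(cls_vector: Sequence[float], weights: Sequence[Sequence[float]],
                 bias: Sequence[float]) -> List[float]:
    """Return [P(not churn), P(churn)] from the [CLS] output representation.

    Assumption: index 1 is the churn class. `weights` has one row per class,
    each row as long as `cls_vector`.
    """
    if any(len(row) != len(cls_vector) for row in weights):
        raise ValueError("each weight row must match the [CLS] vector length")
    logits = [sum(w * x for w, x in zip(row, cls_vector)) + b
              for row, b in zip(weights, bias)]
    return softmax(logits)
```

For example, `classify_cls([1.0, 2.0], [[0.0, 0.0], [1.0, 0.0]], [0.0, 0.0])` produces logits [0, 1]. The churn probability is therefore 1 / (1 + e^-1), about 0.731.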
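The user churn prediction system of this application works as follows:

* The course teaching platform module provides the teacher and the student with their respective work terminal platforms for a predetermined educational course.
* The corpus collection module collects the voice dialogues, text dialogues and teaching videos of the teacher and the student in the classroom while the platforms are running.
* The churn state module labels the classroom corpus of churned users as churned and the rest as not churned.
* The prediction algorithm module computes a user churn prediction result with the trained deep neural network model, based on the collected classroom corpus and the labeled classroom corpus.
* The real-time monitoring module monitors the course teaching platform module in real time and presents the computed user churn prediction result to the teacher.
* The offline query module can provide the obtained user churn prediction results to a querying party according to query conditions.

The system thus monitors both the teacher's and the student's work terminal platforms, and computes user churn prediction results from the classroom teacher-student exchange corpus using the trained pre-trained deep neural network model. It can issue a reminder whenever user churn is possible, which helps keep teacher-student interaction in online education productive and reduces user churn.
As shown in FIG. 1, in an embodiment, the course teaching platform module provides the teacher and the student with their respective work terminal platforms for a predetermined educational course, and the student obtains instruction by clicking to play a teaching video. When the student does so, the teaching-video audio-extraction-to-text processing submodule extracts the audio of the segment the student listened to, using the playback start and end timestamps. It converts that segment from speech to text through speech recognition, stores the text, and associates it with student information, class information and timestamps.
In this application, "user churn" means that the user stops studying before the expected course is completed, or that the user's learning is interrupted for a long period during the expected course. Different scenarios and fields use different terms for "user churn". In the training industry, a user who applies to withdraw from a purchased course or for a refund before the course ends counts as "user churn". In the school system, dropping out or suspending studies also counts.
The course teaching platform may be understood as any platform on which teachers and students can communicate.
The teaching-video audio-extraction-to-text processing submodule can also be used to record teacher-student dialogue in smart classrooms or live classes. In other words, any scenario in which teacher-student audio can be collected can use this submodule's speech-to-text technique.
The student and the teacher can communicate and discuss in text through the course teaching platform module. The teacher may be a human teacher, or a virtual AI teacher capable of human-computer interaction with the student. The text processing submodule of the corpus collection module collects and stores the text dialogues between the teacher and the student on the course teaching platform module, and associates them with student information, class information and timestamps.
The student and the teacher can also communicate and discuss by voice through the course teaching platform module; again, the teacher may be a human teacher or a virtual AI teacher. The speech-to-text processing submodule collects and stores the voice dialogues between the teacher and the student on the course teaching platform module and converts them from speech to text through speech recognition. It then stores the text and associates it with student information, class information and timestamps.
The corpus collection module obtains the spoken dialogue, text and teaching videos of the teacher and the student in the classroom. The churn state module then labels the classroom corpus of churned users as churned and the rest as not churned. Churn labels can be defined differently according to business metrics, and the same user's churn status may differ from one subject or course to another.
The classroom corpus may be understood as teacher-student exchange corpus. It can be collected in classroom teaching, service provision, or tutoring scenarios.
Next, the prediction algorithm module runs the model on the corpus from the corpus collection module and returns a user churn prediction result. Finally, the real-time monitoring module monitors the course teaching platform module in real time and presents the prediction algorithm module's user churn prediction result to the teacher.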
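The real-time monitoring module includes a teacher-student exchange listening submodule, a prediction result display submodule and a statistics submodule.

When teacher and student start communicating in class, the listening submodule sends the corpus to the corpus collection module sentence by sentence.

The prediction result display submodule receives the user churn prediction result returned by the prediction algorithm module. It uses a simple, clear visualization to show the teacher which sentences may cause user churn.

The statistics submodule computes statistics on the user churn prediction results to give the teacher information for adjusting the tutoring strategy. Examples include the total number of churn sentences, the total number of dialogue turns, the number and proportion of utterances by teacher and student, and the high-frequency words in the churn corpus.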
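The statistics listed above can be computed directly from per-sentence predictions. Below is a minimal sketch, resting on three assumptions:

* Each utterance arrives as a (speaker, tokens, predicted_churn) tuple, with the sentence already word-segmented. Chinese text would need a segmenter first.
* "High-frequency words" means the most common tokens among sentences predicted to cause churn.
* `classroom_stats` and its return keys are hypothetical names.

```python
from collections import Counter
from typing import Dict, List, Tuple

# Hypothetical input shape: (speaker, tokens, predicted_churn)
Utterance = Tuple[str, List[str], bool]


def classroom_stats(utterances: List[Utterance], top_k: int = 3) -> Dict[str, object]:
    """Hypothetical summary of one session's churn predictions."""
    total = len(utterances)
    churn = [u for u in utterances if u[2]]
    by_speaker = Counter(speaker for speaker, _, _ in utterances)
    word_counts = Counter(tok for _, tokens, _ in churn for tok in tokens)
    return {
        "total_dialogues": total,
        "churn_sentences": len(churn),
        "speaker_counts": dict(by_speaker),
        "speaker_share": {s: n / total for s, n in by_speaker.items()} if total else {},
        "churn_top_words": [w for w, _ in word_counts.most_common(top_k)],
    }
```

These values could feed the statistics area of FIG. 2, for instance to show the teacher which phrases recur in sentences flagged as likely to cause churn.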
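FIG. 2 is a schematic diagram showing how a user churn prediction result is displayed according to an embodiment of this application. In an embodiment, a colored circle next to each teacher-student message box indicates the churn prediction state. For example, a sentence predicted to cause churn can be marked red, a sentence predicted not to cause churn can be marked green, and other cases can be marked gray.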
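The color coding in FIG. 2 can be expressed as a small mapping. The description does not say what the "other cases" are. This sketch assumes they cover sentences not yet scored and probabilities that fall inside an uncertain band. The 0.6 and 0.4 cut-offs are invented for illustration.

```python
from typing import Optional


def indicator_color(p_churn: Optional[float], churn_at: float = 0.6,
                    safe_at: float = 0.4) -> str:
    """Map a sentence's churn probability to the circle color shown beside it.

    Hypothetical thresholds: >= churn_at is red (churn), <= safe_at is green
    (not churn), and anything else, including no prediction yet, is gray.
    """
    if not 0.0 <= safe_at < churn_at <= 1.0:
        raise ValueError("thresholds must satisfy 0 <= safe_at < churn_at <= 1")
    if p_churn is None:
        return "gray"
    if p_churn >= churn_at:
        return "red"
    if p_churn <= safe_at:
        return "green"
    return "gray"
```

Keeping the thresholds as parameters lets an operator widen or narrow the gray band without touching the model.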
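In an embodiment, statistics can be shown in the statistics area of FIG. 2. One example is the high-frequency words in the teacher's churn-associated language, which lets the teacher adjust the tutoring language used with the student in real time.
When the teacher is a virtual AI teacher, the user churn prediction result returned by the prediction algorithm module is sent to the virtual AI teacher's conversation strategy processing unit. The virtual AI teacher can then adjust its language toward the student according to the corresponding conversation strategy to avoid user churn.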
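The offline query module provides the obtained user churn prediction results to query personnel according to query conditions. It includes a query submodule, a prediction result display submodule and a statistics submodule.

The query submodule lets the querying party enter query conditions.

The prediction result display submodule receives the user churn prediction results returned by the prediction algorithm module and provides the querying party with a list of churned users. The querying party can also view related information such as the classroom screen recordings and the churn predictions for the teacher-student exchange corpus of the users on that list.

The statistics submodule computes statistics on the user churn prediction results so that the querying party can obtain the information it needs. These data can also be merged with other databases (for example, the students' demographic characteristics or process data from the learning system) for in-depth analysis with one or more statistical models.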
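An offline query over stored predictions can be sketched as a filter followed by a de-duplicated list of students. The record fields and query conditions below (course, time window, churn only) are hypothetical. The description does not enumerate which conditions the query submodule accepts.

```python
from dataclasses import dataclass
from typing import Iterable, List, Optional


@dataclass(frozen=True)
class Prediction:
    """Hypothetical stored prediction for one sentence."""
    student_id: str
    course_id: str
    timestamp: float   # seconds since epoch
    sentence: str
    churn: bool        # predicted label for this sentence


@dataclass(frozen=True)
class Query:
    """Hypothetical query conditions; None means "no constraint"."""
    course_id: Optional[str] = None
    start: Optional[float] = None   # inclusive
    end: Optional[float] = None     # exclusive
    churn_only: bool = True


def run_query(preds: Iterable[Prediction], q: Query) -> List[Prediction]:
    """Return the predictions that satisfy every condition set in `q`."""
    return [p for p in preds
            if (q.course_id is None or p.course_id == q.course_id)
            and (q.start is None or p.timestamp >= q.start)
            and (q.end is None or p.timestamp < q.end)
            and (not q.churn_only or p.churn)]


def churned_students(matches: Iterable[Prediction]) -> List[str]:
    """Student list for the display submodule, in first-seen order without repeats."""
    return list(dict.fromkeys(p.student_id for p in matches))
```

The matching `Prediction` records carry the course and timestamp needed to locate the corresponding classroom screen recording.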

Claims (9)

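  1. A user churn prediction system based on a pre-trained deep neural network model, comprising: a course teaching platform module, a corpus collection module, a churn state module, a prediction algorithm module, a real-time monitoring module and an offline query module, wherein
    the course teaching platform module is configured to provide a teacher and a student, respectively, with work terminal platforms for a predetermined educational course;
    the corpus collection module is configured to collect, while the work terminal platforms are running, the voice dialogues, text dialogues and teaching videos of the teacher and the student in the classroom;
    the churn state module is configured to label the classroom corpus collected by the corpus collection module, labeling the classroom corpus of churned users as churned and labeling the classroom corpus collected by the corpus collection module, other than the classroom corpus of the churned users, as not churned;
    the prediction algorithm module is configured to compute a user churn prediction result through a trained deep neural network model, based on the classroom corpus collected by the corpus collection module and the classroom corpus labeled by the churn state module;
    the real-time monitoring module is configured to monitor the course teaching platform module in real time and present the user churn prediction result computed by the prediction algorithm module to the teacher;
    the offline query module is configured to provide the obtained user churn prediction result to a querying party according to a query condition.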
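  2. The user churn prediction system according to claim 1, wherein the corpus collection module comprises a text processing sub-module, a speech-to-text processing sub-module, and a teaching-video audio-extraction and speech-to-text processing sub-module, wherein
    the text processing sub-module is configured to collect and store the text conversations between the teacher user and the student user on the course teaching platform module, and to associate the stored text conversations with student information, class information, and timestamps;
    the speech-to-text processing sub-module is configured to receive the voice conversations between the teacher user and the student user on the course teaching platform module as collected by the real-time monitoring module, store the voice conversations, convert the stored voice conversations from speech to text through speech recognition, store the converted conversations, and associate the converted conversations with student information, class information, and timestamps;
    the teaching-video audio-extraction and speech-to-text processing sub-module is configured to, when the student user clicks to play a teaching video in the course teaching platform module: extract the audio of the teaching video listened to by the student user according to the timestamps at which playback of the teaching video starts and ends; convert the extracted audio from speech to text through speech recognition; store the converted (text-format) teaching video content; and associate the converted content with student information, class information, and timestamps.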
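  3. The user churn prediction system according to claim 1, wherein the churn status module comprises a churn status updating sub-module and a churn status labelling sub-module, wherein
    the churn status updating sub-module is configured to receive a list of churned users sent by a sales system and to locate the courses that the churned users have attended;
    the churn status labelling sub-module is configured to use the courses located by the churn status updating sub-module to: extract the classroom corpus corresponding to those courses from the corpus collection module; label that classroom corpus as churned; and label the remaining classroom corpus collected by the corpus collection module, other than that corresponding to the courses attended by the churned users, as not churned.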
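  4. The user churn prediction system according to claim 1, wherein the prediction algorithm module comprises a parameter information sub-module and a corpus churn prediction sub-module, wherein
    the parameter information sub-module is configured to store the parameters of the trained deep neural network model obtained by training the pre-trained deep neural network model, wherein the parameters are updated from time to time as the pre-trained deep neural network model is iteratively trained;
    the corpus churn prediction sub-module is configured to classify the classroom corpus collected by the corpus collection module through the trained deep neural network model, using the parameters stored by the parameter information sub-module, so as to determine whether the teacher-student communication in the current educational course exhibits a user churn state.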
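  5. The user churn prediction system according to claim 2, wherein the real-time monitoring module comprises a teacher-student communication listening sub-module, a prediction result display sub-module, and a statistics sub-module, wherein
    the teacher-student communication listening sub-module is configured to, when the teacher user and the student user in the predetermined educational class start communicating, collect the corpus one sentence at a time and send the collected corpus to the corpus collection module;
    the prediction result display sub-module is configured to receive the user churn prediction result sent by the prediction algorithm module and display it to the teacher user, wherein the user churn prediction result represents the likelihood that a sentence in the classroom corpus collected by the corpus collection module causes user churn;
    the statistics sub-module is configured to compute statistics from the user churn prediction results computed by the prediction algorithm module, so that the teacher user can obtain from the statistics information that helps the teacher user adjust the tutoring strategy.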
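  6. The user churn prediction system according to claim 1, wherein the offline query module comprises a query sub-module, a prediction result display sub-module, and a statistics sub-module, wherein
    the query sub-module is configured to receive the query conditions entered by the querying user;
    the prediction result display sub-module is configured to receive, according to the query conditions, the user churn prediction results corresponding to the query conditions sent by the prediction algorithm module, and to display the classroom screen recordings and the churn prediction information of the teacher-student communication corpus corresponding to the query conditions;
    the statistics sub-module is configured to compute statistics from the user churn prediction results, so that the querying user can obtain from the statistics the relevant information the querying user needs.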
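  7. The user churn prediction system according to claim 1, wherein the prediction algorithm module is further configured to pre-train a language model using an unsupervised corpus, wherein the pre-training of the language model comprises using the self-attention mechanism of Bidirectional Encoder Representations from Transformers (BERT) as a context encoder, encoding the teacher-student corpus to obtain semantic vector representations of the teacher-student corpus, and pre-training the language model according to the semantic vector representations; the pre-trained language model is then fine-tuned with a labelled corpus smaller than the unsupervised corpus, and a network layer for the classification task is added on top of the pre-trained language model to determine whether the teacher-student corpus may cause user churn.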
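  8. The user churn prediction system according to claim 7, wherein BERT, short for Bidirectional Encoder Representations from Transformers, is a new language model proposed by Google in October 2018. BERT pre-trains deep bidirectional representations by jointly conditioning on both left and right context in all layers, and strengthens its understanding of long-range semantics by assembling long sentences as input. BERT can be fine-tuned for many kinds of tasks by adding only one extra output layer, without task-specific changes to the model architecture.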
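  9. The user churn prediction system according to claim 6 or 7, wherein the BERT is implemented by the following steps:
    a computer device builds a corpus based on the pre-trained model BERT-Base, Chinese provided by Google, wherein the pre-trained model is trained on a Wikipedia corpus; once loaded, the pre-trained model can directly output trained character vectors or sentence vectors, and the pre-trained model is used to obtain sentence vectors that serve as the input to the subsequent network model;
    the pre-trained model is used to initialize the weights of the network, and the weights of the pre-trained model are then adjusted through back-propagation training on the network using a dataset for the domain task, wherein the dataset is the classroom corpus labelled by the churn status module;
    a network layer for a single-sentence classification task is added on top of the weight-adjusted pre-trained model, wherein this network comprises a softmax network, a decision tree, a support vector machine (SVM), or another model for binary classification; adding a softmax layer on top of the weight-adjusted pre-trained model means taking the output representation of the first token and feeding it to a softmax layer to obtain the classification result as output.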
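Claims 7 and 8 rely on BERT's self-attention as the context encoder. The core operation, single-head scaled dot-product self-attention without a mask, is sketched below in plain Python. Because no position is masked, every token attends to tokens on both its left and its right, which is the bidirectional conditioning that claim 8 refers to.

This is a simplified illustration. Real BERT layers add multiple heads, learned biases, residual connections, layer normalization, and feed-forward sublayers. The projection matrices here would also be learned rather than supplied by hand.

```python
import math
from typing import List

Matrix = List[List[float]]


def matmul(a: Matrix, b: Matrix) -> Matrix:
    """Plain-Python matrix product a @ b."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]


def softmax(row: List[float]) -> List[float]:
    """Numerically stable softmax over one row of scores."""
    m = max(row)
    exps = [math.exp(v - m) for v in row]
    total = sum(exps)
    return [e / total for e in exps]


def self_attention(x: Matrix, wq: Matrix, wk: Matrix, wv: Matrix) -> Matrix:
    """softmax(Q K^T / sqrt(d_k)) V with Q = x wq, K = x wk, V = x wv.

    x is n_tokens x d_model; wq, wk, wv are d_model x d_k. No mask is applied,
    so each output row mixes information from every position, left and right.
    """
    q, k, v = matmul(x, wq), matmul(x, wk), matmul(x, wv)
    d_k = len(wq[0])
    k_t = [list(col) for col in zip(*k)]  # transpose K
    scores = matmul(q, k_t)               # scores[i][j] = q_i . k_j
    weights = [softmax([s / math.sqrt(d_k) for s in row]) for row in scores]
    return matmul(weights, v)


# Usage: two one-hot "tokens" with identity projections.
I2 = [[1.0, 0.0], [0.0, 1.0]]
print(self_attention([[1.0, 0.0], [0.0, 1.0]], I2, I2, I2))
```

Each output row is a convex combination of the value vectors, weighted most heavily toward the tokens whose keys best match that row's query.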
PCT/CN2021/106382 2020-07-16 2021-07-15 User churn prediction system based on pre-trained deep neural network model WO2022012605A1 (zh)

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
CN202010689193.1A CN111898810B (zh) 2020-07-16 2020-07-16 User churn prediction system based on teacher-student communication
CN202010689193.1 2020-07-16

Publications (1)

Publication Number Publication Date
WO2022012605A1 (zh) 2022-01-20

Family

ID=73189370

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/CN2021/106382 WO2022012605A1 (zh) 2020-07-16 2021-07-15 User churn prediction system based on pre-trained deep neural network model

Country Status (2)

Country Link
CN (1) CN111898810B (zh)
WO (1) WO2022012605A1 (zh)

Cited By (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
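CN116074289A (zh) * 2022-11-28 2023-05-05 Information and Communication Company of State Grid Shandong Electric Power Company SIP telephone system and method for power grid dispatching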

Families Citing this family (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
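CN111898810B (zh) * 2020-07-16 2021-06-01 Shanghai Squirrel Classroom Artificial Intelligence Technology Co., Ltd. User churn prediction system based on teacher-student communication
CN112862546B (zh) * 2021-04-25 2021-08-13 Ping An Technology (Shenzhen) Co., Ltd. User churn prediction method and apparatus, computer device, and storage medium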

Citations (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
WO2010130733A1 (en) * 2009-05-12 2010-11-18 International Business Machines Corporation Method and system for improving the quality of teaching through analysis using a virtual teaching device
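CN110263326A (zh) * 2019-05-21 2019-09-20 Ping An Technology (Shenzhen) Co., Ltd. User behavior prediction method, prediction apparatus, storage medium, and terminal device
CN110378812A (zh) * 2019-05-20 2019-10-25 Beijing Normal University Adaptive online education system and method
CN110991381A (zh) * 2019-12-12 2020-04-10 Shandong University Real-time classroom student state analysis and reminder system and method based on intelligent behavior and speech recognition
CN111898810A (zh) * 2020-07-16 2020-11-06 Shanghai Squirrel Classroom Artificial Intelligence Technology Co., Ltd. User churn prediction system based on teacher-student communication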

Family Cites Families (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
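CN101620692A (zh) * 2008-06-30 2010-01-06 Shanghai Quancheng Communication Technology Co., Ltd. Customer churn analysis method for mobile communication services
CN107958433A (zh) * 2017-12-11 2018-04-24 Jilin University Artificial-intelligence-based human-computer interaction method and system for online education
CN109472729A (zh) * 2018-11-09 2019-03-15 Talkweb Information System Co., Ltd. Big data technology platform for online education
CN110059716B (zh) * 2019-03-12 2023-06-02 Northwest University Construction of a CNN-LSTM-SVM network model and MOOC dropout prediction method
CN110162787A (zh) * 2019-05-05 2019-08-23 Xi'an Jiaotong University Category prediction method and apparatus based on topic information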

Non-Patent Citations (1)

* Cited by examiner, † Cited by third party
Title
YIN WEN: "Dropout Study Based on Learners’ Behavior in Massive Open Online Courses", CHINA MASTER’S THESES FULL-TEXT DATABASE, 15 January 2019 (2019-01-15), XP055888009 *

Also Published As

Publication number Publication date
CN111898810A (zh) 2020-11-06
CN111898810B (zh) 2021-06-01

Legal Events

Date Code Title Description
121 Ep: the epo has been informed by wipo that ep was designated in this application

Ref document number: 21841326

Country of ref document: EP

Kind code of ref document: A1

NENP Non-entry into the national phase

Ref country code: DE

122 Ep: pct application non-entry in european phase

Ref document number: 21841326

Country of ref document: EP

Kind code of ref document: A1