WO2023036283A1 - Online classroom interaction method and online classroom system - Google Patents

Online classroom interaction method and online classroom system

Info

Publication number
WO2023036283A1
Authority
WO
WIPO (PCT)
Prior art keywords
sequence
wfst
interaction
path
weight
Prior art date
Application number
PCT/CN2022/118052
Other languages
English (en)
French (fr)
Inventor
雷延强
Original Assignee
广州视源电子科技股份有限公司
广州视源人工智能创新研究院有限公司
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by 广州视源电子科技股份有限公司 and 广州视源人工智能创新研究院有限公司
Publication of WO2023036283A1

Classifications

    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06QINFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES; SYSTEMS OR METHODS SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES, NOT OTHERWISE PROVIDED FOR
    • G06Q50/00Systems or methods specially adapted for specific business sectors, e.g. utilities or tourism
    • G06Q50/10Services
    • G06Q50/20Education
    • GPHYSICS
    • G09EDUCATION; CRYPTOGRAPHY; DISPLAY; ADVERTISING; SEALS
    • G09BEDUCATIONAL OR DEMONSTRATION APPLIANCES; APPLIANCES FOR TEACHING, OR COMMUNICATING WITH, THE BLIND, DEAF OR MUTE; MODELS; PLANETARIA; GLOBES; MAPS; DIAGRAMS
    • G09B5/00Electrically-operated educational appliances
    • G09B5/08Electrically-operated educational appliances providing for individual presentation of information to a plurality of student stations
    • G09B5/14Electrically-operated educational appliances providing for individual presentation of information to a plurality of student stations with provision for individual teacher-student communication

Definitions

  • the present application relates to the technical field of data processing, in particular to an online classroom interaction method and an online classroom system.
  • online education through video teaching has become increasingly widespread.
  • compared with offline education, online education is not limited by time or place; as long as a network connection is available it can meet users' learning needs, making learning more flexible and free.
  • the present application provides an online classroom interaction method and an online classroom system to solve the problems of cumbersome interaction and low interaction efficiency caused by manual triggering of classroom interaction in related technologies.
  • the embodiment of the present application provides a method for online classroom interaction, the method includes:
  • the target recognition network includes a customized weighted finite-state transducer (WFST) generated according to the interaction keywords, student identifications and interaction content input by the teacher account;
  • Interaction information is generated according to the voice recognition result, and the interaction information is sent to one or more peer accounts participating in the online class by the teacher account.
  • the target recognition network also includes an acoustic model
  • performing speech recognition on the speech signal using the target recognition network, and obtaining the word sequence corresponding to the optimal path determined based on the customized WFST as the speech recognition result of the speech signal, includes:
  • the path with the largest total weight is used as the optimal path, and the word sequence corresponding to the optimal path is used as the speech recognition result of the speech signal.
  • the custom WFST is constructed in the following manner:
  • determining, according to the interaction keyword sequence, the student identification sequence, the interaction content sequence and the set weight rules, the weight of each path of interaction among the interaction keyword sequence, the student identification sequence and the interaction content sequence, so as to construct the language WFST;
  • the dictionary WFST and the language WFST are combined into a custom WFST.
  • the method also includes:
  • the generating interaction information according to the speech recognition result includes:
  • the speech recognition result is converted into interactive information by using the target content template.
  • the interaction keywords include comment keywords
  • the interaction content includes comment content
  • the embodiment of the present application also provides an online classroom system, the system includes:
  • a voice signal acquisition module configured to acquire the voice signal of the teacher account
  • the recognition network determination module is used to determine the target recognition network corresponding to the teacher account, and the target recognition network includes a customized weighted finite-state transducer (WFST) generated according to the interaction keywords, student identifications and interaction content input by the teacher account;
  • a speech recognition module configured to use the target recognition network to perform speech recognition on the speech signal, and obtain a word sequence corresponding to the optimal path determined based on the customized WFST as a speech recognition result of the speech signal;
  • An interactive information generating module configured to generate interactive information according to the speech recognition result
  • An interaction information sending module configured to use the teacher account to send the interaction information to one or more peer accounts participating in the online classroom.
  • the target recognition network also includes an acoustic model
  • the speech recognition module includes:
  • a feature extraction submodule is used to extract an acoustic feature sequence from the speech signal
  • An acoustic model processing submodule configured to input the acoustic feature sequence into the acoustic model, and obtain the first weight value of each path from the acoustic feature to the phoneme output by the acoustic model;
  • a custom WFST processing submodule configured to input the phonemes output from each path from the acoustic feature to the phoneme into the custom WFST, and obtain the second weight of each path from the phoneme to the word sequence output by the custom WFST;
  • the optimal path determination submodule is used to calculate the total weight of the first weight and the second weight of each path; the path with the largest total weight is taken as the optimal path, and the word sequence corresponding to the optimal path as the speech recognition result of the speech signal.
  • the embodiment of the present application also provides an electronic device, including a memory, a processor, and a computer program stored in the memory and executable on the processor, where the above-mentioned method is implemented when the processor executes the program.
  • the embodiment of the present application further provides a computer-readable storage medium, on which a computer program is stored, and the above-mentioned method is implemented when the program is executed by a processor.
  • the teacher user can interact with the classroom by inputting a voice signal.
  • after the online classroom system obtains the voice signal, it can obtain the target recognition network corresponding to the teacher user.
  • the target recognition network may include a customized WFST generated according to the interaction keywords, student identifications and interaction content input by the teacher account. The target recognition network is then used to perform speech recognition on the voice signal, and the word sequence corresponding to the optimal path determined based on the customized WFST is obtained as the speech recognition result of the current voice signal. Interaction information can then be generated according to the speech recognition result and sent to one or more peer accounts participating in the online class. This avoids manual input of interaction content through the teacher account and improves the interaction efficiency of the online classroom.
  • FIG. 1 is a flow chart of an embodiment of an online classroom interaction method provided in Embodiment 1 of the present application;
  • FIG. 2 is a schematic diagram of a customized WFST network provided in Embodiment 1 of the present application;
  • FIG. 3 is a flow chart of an embodiment of an online classroom interaction method provided in Embodiment 2 of the present application.
  • FIG. 4 is a structural block diagram of an embodiment of an online classroom system provided in Embodiment 3 of the present application.
  • FIG. 5 is a schematic structural diagram of an electronic device provided in Embodiment 4 of the present application.
  • Fig. 1 is a flow chart of an embodiment of an online classroom interaction method provided by Embodiment 1 of the present application.
  • This embodiment can be applied to an online classroom system.
  • the online classroom system can include a client and a server.
  • this embodiment may be performed by the client or by the server, which is not limited in this embodiment.
  • the following embodiments are described by taking the completion on the server side as an example, which may specifically include the following steps:
  • Step 110 acquiring the voice signal of the teacher's account.
  • the server may receive the voice signal sent from the client of the teacher account.
  • the voice sound wave input by the teacher user can be picked up by the sound collection device (such as a microphone) of the terminal where the client is located, and then a voice signal is generated.
  • the terminal may include, but not limited to, a smart phone, a personal computer, a tablet computer, a smart watch, a service robot, and the like.
  • the terminal and server may be communicatively connected via one or more networks, which may be wired or wireless networks, such as the Internet, cellular networks, satellite networks, local area networks and/or the like.
  • the voice signal may be an interactive instruction that the teacher user needs to input in the online classroom. According to different interactive functions in the online classroom system, the voice signal can have different functions.
  • the interactive function may include a class comment function, and the voice signal is a voice related to class comment entered by the teacher account.
  • Step 120 determine the target recognition network corresponding to the teacher account, and the target recognition network includes a customized weighted finite state transition machine WFST generated according to the interaction keyword, student ID and interaction content input by the teacher account.
  • the teacher user can customize the decoding network they need in advance, and the online classroom system can associate each teacher account with its customized decoding network and store them in a designated database.
  • the teacher account can be searched in the designated database to obtain the pre-customized target recognition network of the teacher account.
  • the target recognition network can include a weighted finite-state transducer (Weighted Finite State Transducer, WFST). A WFST maps an input symbol sequence (or string) to an output string; in addition to input and output symbols, it also weights state transitions, where the weight values (also called weights) may encode probabilities, durations, or any other quantity accumulated along a path, so as to compute the overall weight of mapping the input string to the output string.
  • in speech recognition, a WFST typically represents the various possible path choices, and their corresponding probabilities (or weights), for producing recognition results from an input speech signal.
  • the WFST may include a customized WFST generated according to the interaction keywords input by the teacher account, the student identification and the interaction content.
  • the interaction keywords, student IDs, and interaction content may be determined according to actual interaction requirements, which is not limited in this embodiment.
  • the interaction keywords can include comment keywords, such as "comment", "reward", "deduct" and "praise";
  • the interaction content can include comment content, such as "spoke actively", "listened attentively", "1 little red flower" and "1 point";
  • a student identification can include one or a combination of a student name, a student number, and the like.
  • the customized WFST related to class comments, generated according to the comment keywords, student names and comment content, can be as shown in Figure 2.
  • BG represents the background pronunciation model in the customized WFST, which is used to realize and construct the customized WFST.
  • each path may also include a corresponding weight.
  • the weight may be a preset weight or a weight generated according to a preset weight rule.
  • the default decoding network can be used as the target recognition network.
  • Step 130 using the target recognition network to perform speech recognition on the speech signal, and obtain a word sequence corresponding to the optimal path output by the customized WFST as a speech recognition result of the speech signal.
  • the target recognition network can be used to decode the speech signal, so as to obtain the speech recognition result output by the target recognition network.
  • the weight value of each path output by the WFST can be obtained, the path with the largest weight value is regarded as the optimal path, and the word sequence corresponding to the optimal path is regarded as the speech recognition result.
  • the target recognition network may also include an acoustic model
  • step 130 may further include the following steps:
  • Step 130-1 extracting an acoustic feature sequence from the speech signal.
  • one of the acoustic feature extraction methods includes: dividing the speech signal into multiple speech signal frames, enhancing each frame through processing such as noise elimination and channel-distortion removal, then converting each frame from the time domain to the frequency domain, and extracting appropriate acoustic features from the transformed frames.
  • Acoustic features can be represented as sequences of acoustic features in various combinations.
  • Step 130-2 input the acoustic feature sequence into the acoustic model, and obtain the first weight value of each path from the acoustic feature to the phoneme output by the acoustic model.
  • the acoustic model is a pre-built general acoustic model, and this embodiment does not limit the construction method of the acoustic model.
  • the acoustic model can be a hidden Markov model (HMM).
  • the HMM is a probabilistic model of time series: a hidden Markov chain generates an unobservable random sequence of states, and each state in turn generates an observation, yielding a random observation sequence.
  • the parameters of the HMM include the set of all possible states and the set of all possible observations.
  • HMM is determined by initial probability distribution, state transition probability distribution and observation probability distribution. The initial probability distribution and the state transition probability distribution determine the state sequence, and the observation probability distribution determines the observation sequence.
  • given the model parameters and an observation sequence, the probability of observing that sequence under the model is computed by the forward-backward algorithm; given an observation sequence, the model parameters are estimated by the expectation-maximization algorithm so as to maximize the probability of the observation sequence under the model; and given the model and an observation sequence, the optimal state sequence is estimated by the Viterbi algorithm.
  • the acoustic model can also be an acoustic WFST.
  • the acoustic WFST is constructed by taking phonemes (which here may also be pinyin phonemes such as initials and finals) as states and acoustic features as observations.
  • the HMM describes the process by which phonemes generate acoustic features: the forward-backward algorithm computes the observation probability of the acoustic features given a phoneme state under the HMM; given the acoustic features, the HMM parameters are estimated by the expectation-maximization algorithm and the observation probabilities, so that under these parameters the probability of the acoustic features being observed from the phoneme states is maximized; and using the model parameters, the Viterbi algorithm estimates a phoneme together with the probability of producing the given observation (acoustic features) conditioned on that phoneme, i.e., the first weight.
  • Step 130-3 Input the phonemes output by the paths from the acoustic features to the phonemes into the customized WFST, and obtain the second weights of the paths from the phonemes to the word sequences output by the customized WFST.
  • the custom WFST is a decoding network from phonemes to word sequences.
  • the phonemes output from each path from the acoustic feature to the phoneme can be used as the input of the customized WFST; the customized WFST then outputs, according to the phonemes of each path, the second weight of each path from phoneme to word sequence, and each path also outputs its word sequence.
  • Step 130-4 calculating the total weight of the first weight and the second weight of each path.
  • each path obtained in the decoding process from acoustic features to word sequence may include a path segment from the acoustic features to phonemes and a path segment from the phonemes to the word sequence; summing the first weight of the former segment and the second weight of the latter segment yields the total weight of each path.
  • Step 130-5 taking the path with the largest total weight as the optimal path, and using the word sequence corresponding to the optimal path as the speech recognition result of the speech signal.
  • the path with the largest total weight can be taken as the optimal path of the current decoding result, and the word sequence is then extracted from the optimal path as the speech recognition result of the current speech signal.
  • a time-synchronous Viterbi beam search algorithm can be used for searching; the Viterbi beam search is a breadth-optimized frame-synchronous algorithm whose core is a nested loop: each time the search advances by one frame, the Viterbi algorithm is run separately for each node at the corresponding level.
  • the basic steps of the Viterbi beam search algorithm are as follows: initialize the path set A with a start path at the start node of the decoding network at time t=0; at time t, extend every path in A one frame forward to all reachable states, run the Viterbi algorithm, keep the best predecessor score, and rescore the paths with the dictionary WFST and the language WFST; prune the paths scoring below the configured threshold (beam width) and add the surviving paths to A as the path set for time t+1; repeat until all speech frames are processed, then backtrack the highest-scoring path in A.
  • Step 140 generating interaction information according to the speech recognition result, and sending the interaction information to one or more peer accounts participating in the online classroom through the teacher account.
  • interaction information can be generated from the obtained word sequence. For example, in the class comment scenario, assuming the generated word sequence is {reward, Zhang San, spoke actively}, the interaction information generated from the word sequence can be "Reward Zhang San for speaking actively".
  • the step of generating interactive content in step 140 may further include the following steps:
  • the online classroom system can provide a variety of interactive content templates for users to choose from.
  • the user can select the desired target content template from the template display list, and the system then generates interaction information from the speech recognition result according to the selected target content template.
  • assuming the selected target content template is "comment keyword + student name + comment content" and the generated word sequence is {reward, Zhang San, spoke actively}, where "reward" is the comment keyword and "spoke actively" is the comment content, the interaction information generated according to the target content template is "Reward Zhang San for speaking actively".
  • the teacher user can interact with the classroom by inputting a voice signal.
  • after the online classroom system obtains the voice signal, it can obtain the target recognition network corresponding to the teacher user.
  • the target recognition network may include a customized WFST generated according to the interaction keywords, student identifications and interaction content input by the teacher account. The target recognition network is then used to perform speech recognition on the voice signal, and the word sequence corresponding to the optimal path determined based on the customized WFST is obtained as the speech recognition result of the current voice signal. Interaction information can then be generated according to the speech recognition result and sent to one or more peer accounts participating in the online class. This avoids manual input of interaction content through the teacher account and improves the interaction efficiency of the online classroom.
  • Fig. 3 is a flow chart of an embodiment of an online classroom interaction method provided in Embodiment 2 of the present application. This embodiment is described on the basis of Embodiment 1, and may include the following steps:
  • Step 210 acquiring the interaction keyword sequence, student identification sequence and interaction content sequence input by the teacher account.
  • the online classroom system can provide a customized page for the user to customize the decoding network.
  • the customized page can include a customized classroom review page.
  • the user can input interaction keywords (multiple interaction keywords form an interaction keyword sequence), student identifications, and interaction content according to actual needs.
  • Step 220 perform phoneme annotation on the interaction keyword sequence, the student identification sequence and the interaction content sequence respectively, so as to construct a dictionary WFST.
  • the role of the dictionary WFST is to convert phonemes into words.
  • the phonemes of each word in the interaction keyword sequence, student identification sequence and interaction content sequence input by the teacher user can be obtained first; one method of obtaining the phonemes is to perform phoneme annotation on each word in the interaction keyword sequence, the student identification sequence and the interaction content sequence.
  • phonemes and words are numbered, and disambiguation symbols are introduced to solve problems such as homonyms.
  • disambiguation symbols are symbols #1, #2, #3, etc. inserted at the end of phoneme sequences in the dictionary.
  • when a phoneme sequence is a prefix of another phoneme sequence in the dictionary, or occurs in more than one word, one of these symbols needs to be appended after it to ensure the determinism of the WFST.
  • the dictionary generated by the above process expresses the word-phoneme mapping relationship in the form of WFST.
  • the dictionary WFST receives a phoneme sequence and outputs words; each path from phonemes to a word in the dictionary WFST has the same weight, or no weight.
  • Step 230 determining, according to the interaction keyword sequence, the student identification sequence, the interaction content sequence and the set weight rules, the weight of each path of interaction among the interaction keyword sequence, the student identification sequence and the interaction content sequence, so as to construct the language WFST.
  • a language WFST may be constructed in a general manner, which is not limited in this embodiment.
  • a language WFST may include an N-gram language model, which utilizes a Markov model and assumes that the probability of a word appearing is only related to the N words that appear before it.
  • the 1-gram language model indicates that the occurrence of a word is only related to itself
  • the 2-gram indicates that the occurrence of a word is only related to the previous word
  • the 3-gram indicates that the occurrence of a word is only related to the first two words, and so on.
  • the weight values of various interaction paths can be determined according to the interaction keyword sequence, student identification sequence, interaction relationship of each word in the interaction content sequence, and the set weight rules, and then converted into language WFST.
  • when constructing the language model, maximum likelihood estimation is used for probability estimation: the probability of each N-gram word sequence is computed from its count in the corpus, and the word sequences and their probabilities can be represented as state transitions.
  • Step 240 combining the dictionary WFST and the language WFST into a customized WFST.
  • this embodiment may also include the following steps:
  • detect whether the teacher account updates one or more of the interaction keyword sequence, the student identification sequence, and the interaction content sequence; if so, update the dictionary WFST and the language WFST with the updated one or more of these sequences.
  • the teacher user can also update one or more of the previously input interaction keyword sequence, student identification sequence, and interaction content sequence, for example by modifying, adding or deleting interaction keywords, student identifications or interaction content.
  • the system can capture the teacher user's modification operations, obtain the modified content, and then update the dictionary WFST and the language WFST according to the modified content.
  • Step 250 acquiring the voice signal of the teacher account.
  • Step 260 extracting an acoustic feature sequence from the speech signal.
  • Step 270 input the acoustic feature sequence into a pre-built acoustic model, and obtain the first weight value of each path from the acoustic feature to the phoneme output by the acoustic model.
  • Step 280 Input the phonemes output from each path from the acoustic feature to the phoneme into the customized WFST, and obtain the second weight value of each path from the phoneme to the word sequence output by the customized WFST.
  • Step 290 calculating the total weight of the first weight and the second weight of each path.
  • Step 2110 taking the path with the largest total weight as the optimal path, and using the word sequence corresponding to the optimal path as the speech recognition result of the speech signal.
  • Step 2120 generate interactive content according to the speech recognition result, and send the interactive content through the teacher account to one or more peer accounts participating in the online class.
  • the online classroom system can provide a network customization page for teacher users to input interactive keyword sequences, student identification sequences, and interactive content sequences.
  • according to the interaction keyword sequence, student identification sequence and interaction content sequence, a custom WFST associated with the teacher user can be built, and a recognition network containing the custom WFST constructed, so that when the teacher user inputs a speech signal, decoding it with the custom WFST yields a more accurate decoding result.
  • FIG. 4 is a structural block diagram of an embodiment of an online classroom system provided in Embodiment 3 of the present application, which may include the following modules:
  • a voice signal acquisition module 310 configured to acquire the voice signal of the teacher's account
  • the recognition network determination module 320 is used to determine the target recognition network corresponding to the teacher account, and the target recognition network includes a customized weighted finite-state transducer (WFST) generated according to the interaction keywords, student identifications and interaction content input by the teacher account;
  • Speech recognition module 330 for using the target recognition network to perform speech recognition on the speech signal, and obtain the word sequence corresponding to the optimal path determined based on the customized WFST as the speech recognition result of the speech signal;
  • An interactive information generating module 340 configured to generate interactive information according to the speech recognition result
  • the interaction information sending module 350 is configured to use the teacher account to send the interaction information to one or more peer accounts participating in the online class.
  • the target recognition network further includes an acoustic model
  • the speech recognition module 330 may include the following submodules:
  • a feature extraction submodule is used to extract an acoustic feature sequence from the speech signal
  • An acoustic model processing submodule configured to input the acoustic feature sequence into the acoustic model, and obtain the first weight value of each path from the acoustic feature to the phoneme output by the acoustic model;
  • a custom WFST processing submodule configured to input the phonemes output from each path from the acoustic feature to the phoneme into the custom WFST, and obtain the second weight of each path from the phoneme to the word sequence output by the custom WFST;
  • the optimal path determination submodule is used to calculate the total weight of the first weight and the second weight of each path; the path with the largest total weight is taken as the optimal path, and the word sequence corresponding to the optimal path as the speech recognition result of the speech signal.
  • the system further includes a custom WFST construction module, specifically configured to: acquire the interaction keyword sequence, student identification sequence and interaction content sequence input by the teacher account, and perform phoneme annotation on each of them respectively to construct a dictionary WFST;
  • determine, according to the interaction keyword sequence, the student identification sequence, the interaction content sequence and the set weight rules, the weight of each path of interaction among the interaction keyword sequence, the student identification sequence and the interaction content sequence, so as to construct the language WFST;
  • combine the dictionary WFST and the language WFST into a custom WFST.
  • the system may also include the following modules:
  • An update judgment module configured to detect whether the teacher account updates the interaction keyword sequence, the student identification sequence and/or the interaction content sequence
  • the WFST updating module is configured to update the dictionary WFST and the language WFST by using the updated interaction keyword sequence, the student identification sequence and/or the interaction content sequence.
  • the interaction information generating module 340 is specifically configured to:
  • the speech recognition result is converted into interactive information by using the target content template.
  • the interaction keywords include comment keywords
  • the interaction content includes comment content
  • the above-mentioned online classroom system provided by the embodiment of the present application can execute the online classroom interaction method provided in the first or second embodiment of the present application, and has corresponding functional modules and beneficial effects for executing the method.
  • FIG. 5 is a schematic structural diagram of an electronic device provided in Embodiment 4 of the present application.
  • the electronic device includes a processor 410, a memory 420, an input device 430, and an output device 440;
  • the number of processors 410 in the electronic device may be one or more, with one processor 410 taken as an example in FIG. 5; the processor 410, memory 420, input device 430 and output device 440 may be connected by a bus or in other ways, with connection by a bus taken as an example in FIG. 5.
  • the memory 420 can be used to store software programs, computer-executable programs and modules, such as program instructions/modules corresponding to the methods in the embodiments of the present application.
  • the processor 410 executes various functional applications and data processing of the electronic device by running software programs, instructions, and modules stored in the memory 420 , that is, implements the methods of the first to second embodiments above.
  • the memory 420 may mainly include a program storage area and a data storage area, wherein the program storage area may store an operating system and an application program required by at least one function; the data storage area may store data created according to the use of the terminal, and the like.
  • the memory 420 may include a high-speed random access memory, and may also include a non-volatile memory, such as at least one magnetic disk storage device, flash memory device, or other non-volatile solid-state storage devices.
  • the memory 420 may further include memory located remotely relative to the processor 410, and these remote memories may be connected to the electronic device through a network. Examples of the aforementioned networks include, but are not limited to, the Internet, intranets, local area networks, mobile communication networks, and combinations thereof.
  • the input device 430 can be used to receive input numbers or character information, and generate key signal input related to user settings and function control of the electronic device.
  • the output device 440 may include a display device such as a display screen.
  • Embodiment 5 of the present application further provides a storage medium containing computer-executable instructions, and the computer-executable instructions are used to execute the method in any one of Embodiment 1 to Embodiment 2 when executed by a processor of the server.
  • the present application can be implemented by software and necessary general hardware, or by hardware.
  • the technical solution of the present application can be embodied in essence in the form of a software product; the computer software product can be stored in a computer-readable storage medium, such as a computer floppy disk, read-only memory (ROM), random access memory (RAM), flash memory (FLASH), hard disk or optical disc, and includes several instructions to cause a computer device (which may be a personal computer, a server, or a network device) to execute the methods described in the embodiments of the present application.

Abstract

An online classroom interaction method and an online classroom system, the method including: acquiring a voice signal of a teacher account (110, 250); determining a target recognition network corresponding to the teacher account, the target recognition network including a customized weighted finite-state transducer (WFST) generated according to interaction keywords, student identifications and interaction content input by the teacher account (120); performing speech recognition on the voice signal using the target recognition network, and obtaining a word sequence corresponding to the optimal path determined based on the customized WFST as the speech recognition result of the voice signal (130); and generating interaction information according to the speech recognition result and sending, via the teacher account, the interaction information to one or more peer accounts participating in the online class (140, 2120), thereby avoiding manual input of interaction content through the teacher account and improving the interaction efficiency of the online classroom.

Description

Online classroom interaction method and online classroom system
This application claims priority to Chinese Patent Application No. 202111062087.1, filed with the Chinese Patent Office on September 10, 2021, the entire contents of which are incorporated herein by reference.
Technical Field
The present application relates to the technical field of data processing, and in particular to an online classroom interaction method and an online classroom system.
Background
With the development of network technology, online education through video teaching has become increasingly widespread. Compared with offline education, online education is not limited by time or place: as long as a network connection is available, it can meet users' learning needs, making learning more flexible and free.
In online classes for young children, teachers often praise students, for example: "Student XXX spoke actively", "Reward student XXX with 1 little red flower", and so on. Such comments usually follow fixed sentence patterns. To enter them into the system, the teacher generally operates manually: opening the comment application, selecting student XXX, then selecting the comment, and so on, which is a rather cumbersome process.
Summary
The present application provides an online classroom interaction method and an online classroom system, to solve the problems in the related art of cumbersome interaction and low interaction efficiency caused by manually triggered classroom interaction.
In a first aspect, an embodiment of the present application provides an online classroom interaction method, the method including:
acquiring a voice signal of a teacher account;
determining a target recognition network corresponding to the teacher account, the target recognition network including a customized weighted finite-state transducer (WFST) generated according to interaction keywords, student identifications and interaction content input by the teacher account;
performing speech recognition on the voice signal using the target recognition network, and obtaining a word sequence corresponding to an optimal path determined based on the customized WFST as a speech recognition result of the voice signal;
generating interaction information according to the speech recognition result, and sending, via the teacher account, the interaction information to one or more peer accounts participating in the online class.
Optionally, the target recognition network further includes an acoustic model;
performing speech recognition on the voice signal using the target recognition network, and obtaining the word sequence corresponding to the optimal path determined based on the customized WFST as the speech recognition result of the voice signal, includes:
extracting an acoustic feature sequence from the voice signal;
inputting the acoustic feature sequence into the acoustic model, and obtaining first weights, output by the acoustic model, of the paths from acoustic features to phonemes;
inputting the phonemes output on the paths from acoustic features to phonemes into the customized WFST, and obtaining second weights, output by the customized WFST, of the paths from phonemes to word sequences;
computing, for each path, the total weight of its first weight and second weight;
taking the path with the largest total weight as the optimal path, and taking the word sequence corresponding to the optimal path as the speech recognition result of the voice signal.
Optionally, the customized WFST is constructed as follows:
acquiring an interaction keyword sequence, a student identification sequence and an interaction content sequence input by the teacher account;
performing phoneme annotation on the interaction keyword sequence, the student identification sequence and the interaction content sequence respectively, so as to construct a dictionary WFST;
determining, according to the interaction keyword sequence, the student identification sequence, the interaction content sequence and set weight rules, the weight of each path of interaction among the interaction keyword sequence, the student identification sequence and the interaction content sequence, so as to construct a language WFST;
combining the dictionary WFST and the language WFST into the customized WFST.
Optionally, the method further includes:
detecting whether the teacher account updates one or more of the interaction keyword sequence, the student identification sequence and the interaction content sequence;
if so, updating the dictionary WFST and the language WFST with the updated one or more of the interaction keyword sequence, the student identification sequence and the interaction content sequence.
Optionally, generating the interaction information according to the speech recognition result includes:
determining a target content template selected by the teacher account;
converting the speech recognition result into the interaction information using the target content template.
Optionally, the interaction keywords include comment keywords, and the interaction content includes comment content.
In a second aspect, an embodiment of the present application further provides an online classroom system, the system including:
a voice signal acquisition module, configured to acquire a voice signal of a teacher account;
a recognition network determination module, configured to determine a target recognition network corresponding to the teacher account, the target recognition network including a customized weighted finite-state transducer (WFST) generated according to interaction keywords, student identifications and interaction content input by the teacher account;
a speech recognition module, configured to perform speech recognition on the voice signal using the target recognition network, and obtain a word sequence corresponding to an optimal path determined based on the customized WFST as a speech recognition result of the voice signal;
an interaction information generation module, configured to generate interaction information according to the speech recognition result;
an interaction information sending module, configured to send, via the teacher account, the interaction information to one or more peer accounts participating in the online class.
Optionally, the target recognition network further includes an acoustic model; the speech recognition module includes:
a feature extraction submodule, configured to extract an acoustic feature sequence from the voice signal;
an acoustic model processing submodule, configured to input the acoustic feature sequence into the acoustic model and obtain first weights, output by the acoustic model, of the paths from acoustic features to phonemes;
a customized WFST processing submodule, configured to input the phonemes output on the paths from acoustic features to phonemes into the customized WFST, and obtain second weights, output by the customized WFST, of the paths from phonemes to word sequences;
an optimal path determination submodule, configured to compute, for each path, the total weight of its first weight and second weight, take the path with the largest total weight as the optimal path, and take the word sequence corresponding to the optimal path as the speech recognition result of the voice signal.
In a third aspect, an embodiment of the present application further provides an electronic device, including a memory, a processor, and a computer program stored in the memory and executable on the processor, where the processor implements the above method when executing the program.
In a fourth aspect, an embodiment of the present application further provides a computer-readable storage medium storing a computer program, where the program, when executed by a processor, implements the above method.
The present application has the following beneficial effects:
In this embodiment, in the interaction scenario of an online class, a teacher user can carry out classroom interaction by inputting a voice signal. After the online classroom system obtains the voice signal, it can obtain the target recognition network corresponding to the teacher user; the target recognition network may include a customized WFST generated according to the interaction keywords, student identifications and interaction content input by the teacher account. The target recognition network is then used to perform speech recognition on the voice signal, and the word sequence corresponding to the optimal path determined based on the customized WFST is obtained as the speech recognition result of the current voice signal. Interaction information can then be generated according to the speech recognition result and sent to one or more peer accounts participating in the online class. This avoids manual input of interaction content through the teacher account and improves the interaction efficiency of the online classroom.
Brief Description of the Drawings
FIG. 1 is a flowchart of an embodiment of an online classroom interaction method provided in Embodiment 1 of the present application;
FIG. 2 is a schematic diagram of a customized WFST network provided in Embodiment 1 of the present application;
FIG. 3 is a flowchart of an embodiment of an online classroom interaction method provided in Embodiment 2 of the present application;
FIG. 4 is a structural block diagram of an embodiment of an online classroom system provided in Embodiment 3 of the present application;
FIG. 5 is a schematic structural diagram of an electronic device provided in Embodiment 4 of the present application.
Detailed Description
The present application is described in further detail below with reference to the drawings and embodiments. It should be understood that the specific embodiments described here are only intended to explain the present application, not to limit it. It should also be noted that, for ease of description, the drawings show only the parts related to the present application rather than the entire structure.
Embodiment 1
FIG. 1 is a flowchart of an embodiment of an online classroom interaction method provided in Embodiment 1 of the present application. This embodiment can be applied to an online classroom system, which may include a client and a server; the embodiment may be carried out by the client or by the server, which is not limited here. The following embodiments are described taking execution on the server side as an example, and may specifically include the following steps:
Step 110: acquire a voice signal of a teacher account.
In this step, the server may receive a voice signal sent by the client of the teacher account. On the client side, the sound collection device (such as a microphone) of the terminal where the client is located can pick up the voice sound waves input by the teacher user and generate a voice signal. Exemplarily, the terminal may include, but is not limited to, a smartphone, a personal computer, a tablet computer, a smart watch, a service robot, and the like. The terminal and the server may be communicatively connected via one or more networks, which may be wired or wireless networks such as the Internet, cellular networks, satellite networks, local area networks and/or the like.
The voice signal may be an interaction instruction that the teacher user needs to input in the online class. Depending on the interaction functions in the online classroom system, the voice signal can serve different purposes. In one possible application scenario, the interaction function may include a class comment function, in which case the voice signal is speech related to class comments input through the teacher account.
Step 120: determine a target recognition network corresponding to the teacher account, the target recognition network including a customized weighted finite-state transducer (WFST) generated according to the interaction keywords, student identifications and interaction content input by the teacher account.
In this step, a teacher user can customize the decoding network they need in advance, and the online classroom system can store each teacher account in association with its customized decoding network in a designated database. After the voice signal of a teacher account is obtained in step 110, the teacher account can be looked up in the designated database to obtain the target recognition network pre-customized for that account.
The target recognition network may include a weighted finite-state transducer (Weighted Finite State Transducer, WFST). A WFST maps an input symbol sequence (or string) to an output string; in addition to input and output symbols, it also weights state transitions, where the weight values (also called weights) may encode probabilities, durations, or any other quantity accumulated along a path, so as to compute the overall weight of mapping the input string to the output string. In speech recognition, a WFST typically represents the various possible path choices, and their corresponding probabilities (or weights), for producing recognition results from an input voice signal.
In this embodiment, the WFST may include a customized WFST generated according to the interaction keywords, student identifications and interaction content input by the teacher account. The interaction keywords, student identifications and interaction content can be determined according to actual interaction needs, which is not limited in this embodiment. For example, in a class comment scenario, the interaction keywords may include comment keywords such as "comment", "reward", "deduct" and "praise"; the interaction content may include comment content such as "spoke actively", "listened attentively", "1 little red flower" and "1 point"; and a student identification may include one or a combination of a student name, a student number, and the like. The customized WFST related to class comments, generated from the comment keywords, student names and comment content, can be as shown in FIG. 2, where BG denotes the background pronunciation model in the customized WFST, used to realize and construct the customized WFST. In the customized WFST, each path may also carry a corresponding weight; in implementation, this weight may be preset manually or generated according to preset weight rules.
On the other hand, if the teacher account is not found in the designated database, a default decoding network can be used as the target recognition network.
Step 130: perform speech recognition on the voice signal using the target recognition network, and obtain the word sequence corresponding to the optimal path output by the customized WFST as the speech recognition result of the voice signal.
In this step, after obtaining the target recognition network that corresponds to the current teacher account and is built with the customized WFST, the network can be used to decode the voice signal, thereby obtaining the speech recognition result it outputs. In implementation, the weights of the paths output by the WFST can be obtained, the path with the largest weight taken as the best path, and the word sequence corresponding to that path taken as the speech recognition result.
In one embodiment, the target recognition network may further include an acoustic model, and step 130 may further include the following steps:
Step 130-1: extract an acoustic feature sequence from the voice signal.
There are many ways to extract acoustic features, and this embodiment does not specifically limit them. For example, one method includes: dividing the voice signal into multiple frames, enhancing each frame through processing such as noise elimination and channel-distortion removal, converting each frame from the time domain to the frequency domain, and extracting suitable acoustic features from the transformed frames. The acoustic features can be represented as acoustic feature sequences in various combinations.
Step 130-2: input the acoustic feature sequence into the acoustic model, and obtain the first weights, output by the acoustic model, of the paths from acoustic features to phonemes.
In one embodiment, the acoustic model is a pre-built general-purpose acoustic model, and this embodiment does not limit how it is constructed. For example, the acoustic model may be a hidden Markov model (HMM). An HMM is a probabilistic model of time series: a hidden Markov chain generates an unobservable random sequence of states, and each state in turn generates an observation, yielding a random observation sequence. The parameters of an HMM include the set of all possible states and the set of all possible observations. An HMM is determined by an initial probability distribution, a state transition probability distribution and an observation probability distribution; the initial and transition distributions determine the state sequence, and the observation distribution determines the observation sequence. Given the model parameters and an observation sequence, the probability of observing that sequence under the model is computed by the forward-backward algorithm; given an observation sequence, the model parameters are estimated by the expectation-maximization algorithm so as to maximize the probability of the observation sequence under the model; and given the model and an observation sequence, the optimal state sequence is estimated by the Viterbi algorithm.
In another embodiment, the acoustic model may also be an acoustic WFST. The acoustic WFST is constructed by taking phonemes (which here may also be pinyin phonemes such as initials and finals) as states and acoustic features as observations, with an HMM describing the process by which phonemes generate acoustic features: the forward-backward algorithm computes the observation probability of the acoustic features given a phoneme state under the HMM; given the acoustic features, the HMM parameters are estimated by the expectation-maximization algorithm and the observation probabilities, so that under these parameters the probability of the acoustic features being observed from the phoneme states is maximized; and using the model parameters, the Viterbi algorithm estimates a phoneme together with the probability of producing the given observation (acoustic features) conditioned on that phoneme, i.e., the first weight.
Step 130-3: input the phonemes output on the paths from acoustic features to phonemes into the customized WFST, and obtain the second weights, output by the customized WFST, of the paths from phonemes to word sequences.
In this step, the customized WFST is a decoding network from phonemes to word sequences. The phonemes output on each path from acoustic features to phonemes in step 130-2 can be used as input to the customized WFST, which, according to the phonemes of each path, outputs the second weight of each path from phonemes to word sequence; each path also outputs its word sequence.
Step 130-4: compute, for each path, the total weight of its first weight and second weight.
In this step, each path obtained in the decoding process from acoustic features to word sequence may include a path segment from acoustic features to phonemes and a path segment from phonemes to the word sequence; summing the first weight of the former segment and the second weight of the latter segment yields the total weight of each path.
Step 130-5: take the path with the largest total weight as the optimal path, and take the word sequence corresponding to the optimal path as the speech recognition result of the voice signal.
In this step, once the total weights of the paths from acoustic features to word sequences have been obtained, the path with the largest total weight can be taken as the optimal path of the current decoding result, and the word sequence extracted from it taken as the speech recognition result of the current voice signal.
In one implementation, during the decoding with the above acoustic model, dictionary WFST and language WFST, a time-synchronous Viterbi beam search algorithm can be used. The Viterbi beam search is a breadth-optimized frame-synchronous algorithm whose core is a nested loop: each time the search advances by one frame, the Viterbi algorithm is run separately for each node at the corresponding level. The basic steps of the Viterbi beam search are as follows:
1. Initialize the search paths: add a start path, set to the start node of the decoding network, to the current path set A, and set the current time t = 0;
2. At time t, extend every path in the acoustic model's path set A one frame forward to all reachable states and run the Viterbi algorithm; compare the scores of the extended paths' predecessors and keep the best score; then rescore the paths using the dictionary WFST and the language WFST;
3. Using the configured threshold (beam width), prune the paths whose scores are impossible or below the threshold, keep the paths scoring above it, and add them to A to obtain the WFST path set at time t + 1;
4. Repeat steps 2-3 until all speech frames have been processed, then backtrack the highest-scoring path in set A.
Step 140: generate interaction information according to the speech recognition result, and send, via the teacher account, the interaction information to one or more peer accounts participating in the online class.
In this step, after the voice signal input through the teacher account has been decoded into a word sequence, interaction information can be generated from the obtained word sequence. For example, in a class comment scenario, assuming the generated word sequence is {reward, Zhang San, spoke actively}, the interaction information generated from this word sequence can be "Reward Zhang San for speaking actively".
In one embodiment, the step of generating interaction content in step 140 may further include the following steps:
determining a target content template selected by the teacher account; and converting the speech recognition result into interaction information using the target content template.
Specifically, the online classroom system can provide a variety of interaction content templates for users to choose from. The user can select the desired target content template from a template display list, and the system then turns the speech recognition result into interaction information according to the selected template.
For example, in a class comment scenario, assuming the selected target content template is "comment keyword + student name + comment content" and the generated word sequence is {reward, Zhang San, spoke actively}, where "reward" is the comment keyword and "spoke actively" is the comment content, the interaction information generated according to the target content template is "Reward Zhang San for speaking actively".
In this embodiment, in the interaction scenario of an online class, a teacher user can carry out classroom interaction by inputting a voice signal. After the online classroom system obtains the voice signal, it can obtain the target recognition network corresponding to the teacher user; the target recognition network may include a customized WFST generated according to the interaction keywords, student identifications and interaction content input by the teacher account. The target recognition network is then used to perform speech recognition on the voice signal, and the word sequence corresponding to the optimal path determined based on the customized WFST is obtained as the speech recognition result of the current voice signal. Interaction information can then be generated according to the speech recognition result and sent to one or more peer accounts participating in the online class. This avoids manual input of interaction content through the teacher account and improves the interaction efficiency of the online classroom.
Embodiment 2
FIG. 3 is a flowchart of an embodiment of an online classroom interaction method provided in Embodiment 2 of the present application. This embodiment builds on Embodiment 1 and may include the following steps:
Step 210: acquire an interaction keyword sequence, a student identification sequence and an interaction content sequence input by the teacher account.
In this step, the online classroom system can provide a customization page for users to customize the decoding network. For example, the customization page may include a class comment customization page, on which the user can input interaction keywords (multiple interaction keywords form an interaction keyword sequence), student identifications and interaction content according to actual needs.
Step 220: perform phoneme annotation on the interaction keyword sequence, the student identification sequence and the interaction content sequence respectively, so as to construct a dictionary WFST.
The role of the dictionary WFST is to convert phonemes into words.
When generating the dictionary WFST, the phonemes of each word in the interaction keyword sequence, student identification sequence and interaction content sequence input by the teacher user can first be obtained; one way to obtain the phonemes is to perform phoneme annotation on each word in these three sequences. The phonemes and words are then numbered, and disambiguation symbols are introduced to resolve problems such as homophones; for example, the disambiguation symbols are symbols #1, #2, #3 and so on inserted at the end of phoneme sequences in the dictionary. When a phoneme sequence is a prefix of another phoneme sequence in the dictionary, or appears in more than one word, one of these symbols must be appended to it to ensure the determinism of the WFST. The dictionary generated by this process expresses the word-phoneme mapping in WFST form. The dictionary WFST receives a phoneme sequence and outputs words; every phoneme-to-word path in the dictionary WFST has the same weight, or no weight.
Step 230: determine, according to the interaction keyword sequence, the student identification sequence, the interaction content sequence and the set weight rules, the weight of each path of interaction among the interaction keyword sequence, the student identification sequence and the interaction content sequence, so as to construct the language WFST.
In implementation, the language WFST can be constructed in a generic way, which this embodiment does not limit. Exemplarily, the language WFST may include an N-gram language model, which uses a Markov assumption that the probability of a word occurring depends only on the N words preceding it. For instance, a 1-gram language model assumes a word's occurrence depends only on the word itself, a 2-gram assumes it depends only on the previous word, a 3-gram only on the previous two words, and so on. For example, the weights of the various interaction paths can be determined from the interaction relationships among the words of the interaction keyword sequence, student identification sequence and interaction content sequence, together with the set weight rules, and then converted into the language WFST.
When constructing the language model, maximum likelihood estimation is used for probability estimation: the probability of each N-gram word sequence is computed from its count in the corpus, and the word sequences and their probabilities can be represented as state transitions.
Step 240: combine the dictionary WFST and the language WFST into the customized WFST.
In one embodiment, this embodiment may further include the following steps:
detecting whether the teacher account updates one or more of the interaction keyword sequence, the student identification sequence and the interaction content sequence; if so, updating the dictionary WFST and the language WFST with the updated one or more of the interaction keyword sequence, the student identification sequence and the interaction content sequence.
In this embodiment, the teacher user can also update one or more of the previously input interaction keyword sequence, student identification sequence and interaction content sequence, for example by modifying, adding or deleting interaction keywords, student identifications or interaction content. The system can capture the teacher user's modification operations, obtain the modified content, and then update the dictionary WFST and the language WFST according to the modified content.
Step 250: acquire the voice signal of the teacher account.
Step 260: extract an acoustic feature sequence from the voice signal.
Step 270: input the acoustic feature sequence into a pre-built acoustic model, and obtain the first weights, output by the acoustic model, of the paths from acoustic features to phonemes.
Step 280: input the phonemes output on the paths from acoustic features to phonemes into the customized WFST, and obtain the second weights, output by the customized WFST, of the paths from phonemes to word sequences.
Step 290: compute, for each path, the total weight of its first weight and second weight.
Step 2110: take the path with the largest total weight as the optimal path, and take the word sequence corresponding to the optimal path as the speech recognition result of the voice signal.
Step 2120: generate interaction content according to the speech recognition result, and send, via the teacher account, the interaction content to one or more peer accounts participating in the online class.
In this embodiment, the online classroom system can provide a network customization page on which teacher users input an interaction keyword sequence, a student identification sequence and an interaction content sequence. From these sequences, a customized WFST associated with the teacher user can be built, and a recognition network containing this customized WFST constructed, so that when the teacher user later inputs a voice signal, decoding it with the customized WFST yields a more accurate decoding result.
Embodiment 3
FIG. 4 is a structural block diagram of an embodiment of an online classroom system provided in Embodiment 3 of the present application, which may include the following modules:
a voice signal acquisition module 310, configured to acquire a voice signal of a teacher account;
a recognition network determination module 320, configured to determine a target recognition network corresponding to the teacher account, the target recognition network including a customized weighted finite-state transducer (WFST) generated according to the interaction keywords, student identifications and interaction content input by the teacher account;
a speech recognition module 330, configured to perform speech recognition on the voice signal using the target recognition network, and obtain a word sequence corresponding to the optimal path determined based on the customized WFST as the speech recognition result of the voice signal;
an interaction information generation module 340, configured to generate interaction information according to the speech recognition result;
an interaction information sending module 350, configured to send, via the teacher account, the interaction information to one or more peer accounts participating in the online class.
In one embodiment, the target recognition network further includes an acoustic model;
the speech recognition module 330 may include the following submodules:
a feature extraction submodule, configured to extract an acoustic feature sequence from the voice signal;
an acoustic model processing submodule, configured to input the acoustic feature sequence into the acoustic model and obtain the first weights, output by the acoustic model, of the paths from acoustic features to phonemes;
a customized WFST processing submodule, configured to input the phonemes output on the paths from acoustic features to phonemes into the customized WFST, and obtain the second weights, output by the customized WFST, of the paths from phonemes to word sequences;
an optimal path determination submodule, configured to compute, for each path, the total weight of its first weight and second weight, take the path with the largest total weight as the optimal path, and take the word sequence corresponding to the optimal path as the speech recognition result of the voice signal.
In one embodiment, the system further includes a customized WFST construction module, specifically configured to:
acquire the interaction keyword sequence, student identification sequence and interaction content sequence input by the teacher account;
perform phoneme annotation on the interaction keyword sequence, the student identification sequence and the interaction content sequence respectively, so as to construct a dictionary WFST;
determine, according to the interaction keyword sequence, the student identification sequence, the interaction content sequence and the set weight rules, the weight of each path of interaction among the interaction keyword sequence, the student identification sequence and the interaction content sequence, so as to construct the language WFST;
combine the dictionary WFST and the language WFST into the customized WFST.
In one embodiment, the system may further include the following modules:
an update judgment module, configured to detect whether the teacher account updates the interaction keyword sequence, the student identification sequence and/or the interaction content sequence;
a WFST update module, configured to update the dictionary WFST and the language WFST with the updated interaction keyword sequence, student identification sequence and/or interaction content sequence.
In one embodiment, the interaction information generation module 340 is specifically configured to:
determine a target content template selected by the teacher account;
convert the speech recognition result into interaction information using the target content template.
In one embodiment, the interaction keywords include comment keywords, and the interaction content includes comment content.
It should be noted that the online classroom system provided in this embodiment of the present application can execute the online classroom interaction method provided in Embodiment 1 or Embodiment 2 of the present application, and has the functional modules and beneficial effects corresponding to executing the method.
Embodiment 4
FIG. 5 is a schematic structural diagram of an electronic device provided in Embodiment 4 of the present application. As shown in FIG. 5, the electronic device includes a processor 410, a memory 420, an input device 430 and an output device 440; the number of processors 410 in the electronic device may be one or more, with one processor 410 taken as an example in FIG. 5; the processor 410, memory 420, input device 430 and output device 440 in the electronic device may be connected by a bus or in other ways, with connection by a bus taken as an example in FIG. 5.
As a computer-readable storage medium, the memory 420 can be used to store software programs, computer-executable programs and modules, such as the program instructions/modules corresponding to the methods in the embodiments of the present application. By running the software programs, instructions and modules stored in the memory 420, the processor 410 executes the various functional applications and data processing of the electronic device, i.e., implements the methods of Embodiments 1 and 2 above.
The memory 420 may mainly include a program storage area and a data storage area, where the program storage area may store an operating system and an application program required by at least one function, and the data storage area may store data created according to the use of the terminal, and the like. In addition, the memory 420 may include high-speed random access memory, and may also include non-volatile memory, such as at least one magnetic disk storage device, flash memory device, or other non-volatile solid-state storage device. In some examples, the memory 420 may further include memory located remotely from the processor 410, and such remote memories may be connected to the electronic device through a network. Examples of such networks include, but are not limited to, the Internet, intranets, local area networks, mobile communication networks, and combinations thereof.
The input device 430 can be used to receive input numeric or character information and to generate key signal inputs related to user settings and function control of the electronic device. The output device 440 may include a display device such as a display screen.
Embodiment 5
Embodiment 5 of the present application further provides a storage medium containing computer-executable instructions, where the computer-executable instructions, when executed by a processor of a server, are used to execute the method of any one of Embodiments 1 and 2.
From the above description of the embodiments, the present application can be implemented by software together with the necessary general-purpose hardware, or by hardware. In essence, the technical solution of the present application can be embodied in the form of a software product; the computer software product can be stored in a computer-readable storage medium, such as a computer floppy disk, read-only memory (ROM), random access memory (RAM), flash memory (FLASH), hard disk or optical disc, and includes several instructions to cause a computer device (which may be a personal computer, a server, or a network device) to execute the methods described in the embodiments of the present application.
It is worth noting that, in the above system embodiment, the included units and modules are divided only according to functional logic, but the division is not limited to the above, as long as the corresponding functions can be realized; in addition, the specific names of the functional units are only for ease of mutual distinction and are not intended to limit the protection scope of the present application.

Claims (10)

  1. An online classroom interaction method, including:
    acquiring a voice signal of a teacher account;
    determining a target recognition network corresponding to the teacher account, the target recognition network including a customized weighted finite-state transducer (WFST) generated according to interaction keywords, student identifications and interaction content input by the teacher account;
    performing speech recognition on the voice signal using the target recognition network, and obtaining a word sequence corresponding to an optimal path determined based on the customized WFST as a speech recognition result of the voice signal;
    generating interaction information according to the speech recognition result, and sending, via the teacher account, the interaction information to one or more peer accounts participating in the online class.
  2. The method according to claim 1, wherein the target recognition network further includes an acoustic model; and performing speech recognition on the voice signal using the target recognition network, and obtaining the word sequence corresponding to the optimal path determined based on the customized WFST as the speech recognition result of the voice signal, includes:
    extracting an acoustic feature sequence from the voice signal;
    inputting the acoustic feature sequence into the acoustic model, and obtaining first weights, output by the acoustic model, of the paths from acoustic features to phonemes;
    inputting the phonemes output on the paths from acoustic features to phonemes into the customized WFST, and obtaining second weights, output by the customized WFST, of the paths from phonemes to word sequences;
    computing, for each path, the total weight of its first weight and second weight;
    taking the path with the largest total weight as the optimal path, and taking the word sequence corresponding to the optimal path as the speech recognition result of the voice signal.
  3. The method according to claim 2, wherein the customized WFST is constructed as follows:
    acquiring an interaction keyword sequence, a student identification sequence and an interaction content sequence input by the teacher account;
    performing phoneme annotation on the interaction keyword sequence, the student identification sequence and the interaction content sequence respectively, so as to construct a dictionary WFST;
    determining, according to the interaction keyword sequence, the student identification sequence, the interaction content sequence and set weight rules, the weight of each path of interaction among the interaction keyword sequence, the student identification sequence and the interaction content sequence, so as to construct a language WFST;
    combining the dictionary WFST and the language WFST into the customized WFST.
  4. The method according to claim 3, further including:
    detecting whether the teacher account updates one or more of the interaction keyword sequence, the student identification sequence and the interaction content sequence;
    if so, updating the dictionary WFST and the language WFST with the updated one or more of the interaction keyword sequence, the student identification sequence and the interaction content sequence.
  5. The method according to any one of claims 1-4, wherein generating the interaction information according to the speech recognition result includes:
    determining a target content template selected by the teacher account;
    converting the speech recognition result into the interaction information using the target content template.
  6. The method according to claim 1, wherein the interaction keywords include comment keywords, and the interaction content includes comment content.
  7. An online classroom system, including:
    a voice signal acquisition module, configured to acquire a voice signal of a teacher account;
    a recognition network determination module, configured to determine a target recognition network corresponding to the teacher account, the target recognition network including a customized weighted finite-state transducer (WFST) generated according to interaction keywords, student identifications and interaction content input by the teacher account;
    a speech recognition module, configured to perform speech recognition on the voice signal using the target recognition network, and obtain a word sequence corresponding to an optimal path determined based on the customized WFST as a speech recognition result of the voice signal;
    an interaction information generation module, configured to generate interaction information according to the speech recognition result;
    an interaction information sending module, configured to send, via the teacher account, the interaction information to one or more peer accounts participating in the online class.
  8. The system according to claim 7, wherein the target recognition network further includes an acoustic model; and the speech recognition module includes:
    a feature extraction submodule, configured to extract an acoustic feature sequence from the voice signal;
    an acoustic model processing submodule, configured to input the acoustic feature sequence into the acoustic model and obtain first weights, output by the acoustic model, of the paths from acoustic features to phonemes;
    a customized WFST processing submodule, configured to input the phonemes output on the paths from acoustic features to phonemes into the customized WFST, and obtain second weights, output by the customized WFST, of the paths from phonemes to word sequences;
    an optimal path determination submodule, configured to compute, for each path, the total weight of its first weight and second weight, take the path with the largest total weight as the optimal path, and take the word sequence corresponding to the optimal path as the speech recognition result of the voice signal.
  9. An electronic device, including a memory, a processor, and a computer program stored in the memory and executable on the processor, wherein the processor implements the method according to any one of claims 1-6 when executing the program.
  10. A computer-readable storage medium storing a computer program, wherein the program, when executed by a processor, implements the method according to any one of claims 1-6.
PCT/CN2022/118052 2021-09-10 2022-09-09 Online classroom interaction method and online classroom system WO2023036283A1 (zh)

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
CN202111062087.1 2021-09-10
CN202111062087.1A CN115798277A (zh) 2021-09-10 2021-09-10 Online classroom interaction method and online classroom system

Publications (1)

Publication Number Publication Date
WO2023036283A1 true WO2023036283A1 (zh) 2023-03-16

Family

ID=85417119

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/CN2022/118052 WO2023036283A1 (zh) 2021-09-10 2022-09-09 Online classroom interaction method and online classroom system

Country Status (2)

Country Link
CN (1) CN115798277A (zh)
WO (1) WO2023036283A1 (zh)

Patent Citations (8)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN101136199A (zh) * 2006-08-30 2008-03-05 国际商业机器公司 Voice data processing method and device
JP2011164336A (ja) * 2010-02-09 2011-08-25 Nippon Telegr & Teleph Corp <Ntt> Speech recognition device, weight vector learning device, speech recognition method, weight vector learning method, and program
US20180068653A1 (en) * 2016-09-08 2018-03-08 Intel IP Corporation Method and system of automatic speech recognition using posterior confidence scores
CN106875936A (zh) * 2017-04-18 2017-06-20 广州视源电子科技股份有限公司 Speech recognition method and device
CN108899013A (zh) * 2018-06-27 2018-11-27 广州视源电子科技股份有限公司 Voice search method and device, and speech recognition system
CN109697682A (zh) * 2019-01-21 2019-04-30 武汉迈辽网络科技有限公司 Online education system based on a mobile intelligent terminal
CN109949189A (zh) * 2019-03-13 2019-06-28 上海复岸网络信息科技有限公司 Method and device for evaluating online teaching interaction effect
CN112927682A (zh) * 2021-04-16 2021-06-08 西安交通大学 Speech recognition method and system based on a deep neural network acoustic model

Also Published As

Publication number Publication date
CN115798277A (zh) 2023-03-14

Similar Documents

Publication Publication Date Title
US8532994B2 (en) Speech recognition using a personal vocabulary and language model
CN109346064B (zh) 用于端到端语音识别模型的训练方法及系统
US11797772B2 (en) Word lattice augmentation for automatic speech recognition
KR20220035222A (ko) 음성 인식 오류 정정 방법, 관련 디바이스들, 및 판독 가능 저장 매체
JP2018081298A (ja) 自然語処理方法及び装置と自然語処理モデルを学習する方法及び装置
JP7170920B2 (ja) トリガードアテンションを用いたエンドツーエンド音声認識のためのシステムおよび方法
US20210193121A1 (en) Speech recognition method, apparatus, and device, and storage medium
KR20170022445A (ko) 통합 모델 기반의 음성 인식 장치 및 방법
US11093110B1 (en) Messaging feedback mechanism
CN111090727B (zh) 语言转换处理方法、装置及方言语音交互系统
JP2016057986A (ja) 音声翻訳装置、方法およびプログラム
Kadyan et al. Refinement of HMM model parameters for punjabi automatic speech recognition (PASR) system
JP2000029496A (ja) 連続音声認識において句読点を自動的に生成する装置および方法
KR102199246B1 (ko) 신뢰도 측점 점수를 고려한 음향 모델 학습 방법 및 장치
KR102167157B1 (ko) 발음 변이를 적용시킨 음성 인식 방법
US20200320976A1 (en) Information processing apparatus, information processing method, and program
JP2012018201A (ja) テキスト補正方法及び認識方法
JP6300394B2 (ja) 誤り修正モデル学習装置、及びプログラム
JP2016024325A (ja) 言語モデル生成装置、およびそのプログラム、ならびに音声認識装置
JPWO2018043137A1 (ja) 情報処理装置及び情報処理方法
WO2023036283A1 (zh) Online classroom interaction method and online classroom system
JP2021503104A (ja) 自動音声認識装置及び方法
CN113436629A (zh) 语音控制方法、装置、电子设备及存储介质
US10929601B1 (en) Question answering for a multi-modal system
KR101971696B1 (ko) 음향모델 생성 장치 및 방법

Legal Events

Date Code Title Description
121 Ep: the epo has been informed by wipo that ep was designated in this application

Ref document number: 22866744

Country of ref document: EP

Kind code of ref document: A1