WO2021008128A1 - Learning method based on intelligent voice recognition, terminal, and storage medium - Google Patents

Learning method based on intelligent voice recognition, terminal, and storage medium

Info

Publication number
WO2021008128A1
WO2021008128A1 · PCT/CN2020/073079 · CN2020073079W
Authority
WO
WIPO (PCT)
Prior art keywords
learning
terminal
video
user
online
Prior art date
Application number
PCT/CN2020/073079
Other languages
English (en)
French (fr)
Inventor
岳顺
Original Assignee
深圳创维-Rgb电子有限公司
Priority date
Filing date
Publication date
Application filed by 深圳创维-Rgb电子有限公司
Publication of WO2021008128A1 publication Critical patent/WO2021008128A1/zh

Classifications

    • G — PHYSICS
    • G09 — EDUCATION; CRYPTOGRAPHY; DISPLAY; ADVERTISING; SEALS
    • G09B — EDUCATIONAL OR DEMONSTRATION APPLIANCES; APPLIANCES FOR TEACHING, OR COMMUNICATING WITH, THE BLIND, DEAF OR MUTE; MODELS; PLANETARIA; GLOBES; MAPS; DIAGRAMS
    • G09B5/00 — Electrically-operated educational appliances
    • G09B5/06 — Electrically-operated educational appliances with both visual and audible presentation of the material to be studied
    • G09B5/065 — Combinations of audio and video presentations, e.g. videotapes, videodiscs, television systems
    • G — PHYSICS
    • G10 — MUSICAL INSTRUMENTS; ACOUSTICS
    • G10L — SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
    • G10L15/00 — Speech recognition
    • G10L15/22 — Procedures used during a speech recognition process, e.g. man-machine dialogue
    • H — ELECTRICITY
    • H04 — ELECTRIC COMMUNICATION TECHNIQUE
    • H04N — PICTORIAL COMMUNICATION, e.g. TELEVISION
    • H04N21/00 — Selective content distribution, e.g. interactive television or video on demand [VOD]
    • H04N21/40 — Client devices specifically adapted for the reception of or interaction with content, e.g. set-top-box [STB]; Operations thereof
    • H04N21/43 — Processing of content or additional data, e.g. demultiplexing additional data from a digital video stream; Elementary client operations, e.g. monitoring of home network or synchronising decoder's clock; Client middleware
    • H04N21/431 — Generation of visual interfaces for content selection or interaction; Content or additional data rendering
    • H04N21/4312 — Generation of visual interfaces for content selection or interaction; Content or additional data rendering involving specific graphical features, e.g. screen layout, special fonts or colors, blinking icons, highlights or animations
    • H — ELECTRICITY
    • H04 — ELECTRIC COMMUNICATION TECHNIQUE
    • H04N — PICTORIAL COMMUNICATION, e.g. TELEVISION
    • H04N21/00 — Selective content distribution, e.g. interactive television or video on demand [VOD]
    • H04N21/40 — Client devices specifically adapted for the reception of or interaction with content, e.g. set-top-box [STB]; Operations thereof
    • H04N21/43 — Processing of content or additional data, e.g. demultiplexing additional data from a digital video stream; Elementary client operations, e.g. monitoring of home network or synchronising decoder's clock; Client middleware
    • H04N21/439 — Processing of audio elementary streams
    • H — ELECTRICITY
    • H04 — ELECTRIC COMMUNICATION TECHNIQUE
    • H04N — PICTORIAL COMMUNICATION, e.g. TELEVISION
    • H04N21/00 — Selective content distribution, e.g. interactive television or video on demand [VOD]
    • H04N21/40 — Client devices specifically adapted for the reception of or interaction with content, e.g. set-top-box [STB]; Operations thereof
    • H04N21/47 — End-user applications
    • H04N21/472 — End-user interface for requesting content, additional data or services; End-user interface for interacting with content, e.g. for content reservation or setting reminders, for requesting event notification, for manipulating displayed content
    • H04N21/47202 — End-user interface for requesting content, additional data or services; End-user interface for interacting with content, e.g. for content reservation or setting reminders, for requesting event notification, for manipulating displayed content for requesting content on demand, e.g. video on demand
    • H — ELECTRICITY
    • H04 — ELECTRIC COMMUNICATION TECHNIQUE
    • H04N — PICTORIAL COMMUNICATION, e.g. TELEVISION
    • H04N21/00 — Selective content distribution, e.g. interactive television or video on demand [VOD]
    • H04N21/40 — Client devices specifically adapted for the reception of or interaction with content, e.g. set-top-box [STB]; Operations thereof
    • H04N21/47 — End-user applications
    • H04N21/472 — End-user interface for requesting content, additional data or services; End-user interface for interacting with content, e.g. for content reservation or setting reminders, for requesting event notification, for manipulating displayed content
    • H04N21/47217 — End-user interface for requesting content, additional data or services; End-user interface for interacting with content, e.g. for content reservation or setting reminders, for requesting event notification, for manipulating displayed content for controlling playback functions for recorded or on-demand content, e.g. using progress bars, mode or play-point indicators or bookmarks
    • H — ELECTRICITY
    • H04 — ELECTRIC COMMUNICATION TECHNIQUE
    • H04N — PICTORIAL COMMUNICATION, e.g. TELEVISION
    • H04N21/00 — Selective content distribution, e.g. interactive television or video on demand [VOD]
    • H04N21/40 — Client devices specifically adapted for the reception of or interaction with content, e.g. set-top-box [STB]; Operations thereof
    • H04N21/47 — End-user applications
    • H04N21/478 — Supplemental services, e.g. displaying phone caller identification, shopping application
    • H — ELECTRICITY
    • H04 — ELECTRIC COMMUNICATION TECHNIQUE
    • H04N — PICTORIAL COMMUNICATION, e.g. TELEVISION
    • H04N21/00 — Selective content distribution, e.g. interactive television or video on demand [VOD]
    • H04N21/40 — Client devices specifically adapted for the reception of or interaction with content, e.g. set-top-box [STB]; Operations thereof
    • H04N21/47 — End-user applications
    • H04N21/482 — End-user interface for program selection
    • H04N21/4825 — End-user interface for program selection using a list of items to be played back in a given order, e.g. playlists
    • G — PHYSICS
    • G10 — MUSICAL INSTRUMENTS; ACOUSTICS
    • G10L — SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
    • G10L15/00 — Speech recognition
    • G10L15/22 — Procedures used during a speech recognition process, e.g. man-machine dialogue
    • G10L2015/223 — Execution procedure of a spoken command

Definitions

  • The present disclosure relates to the field of terminal applications, for example to a learning method, terminal and storage medium based on intelligent voice recognition.
  • With the development of hardware and software, intelligent speech recognition technology has also developed rapidly. Although intelligent speech recognition can use algorithms to accurately convert speech information into text information and understand the user's intentions, for users who need to learn, it cannot obtain the user's learning progress, and therefore cannot find the learning resources the user needs at the current stage or improve the user's learning ability.
  • The present disclosure provides a learning method, terminal and storage medium based on intelligent voice recognition.
  • The learning scenes of online videos are extracted through intelligent voice recognition and converted into learning data, so that users can flexibly control the playback of knowledge points when watching learning videos, improving their learning ability.
  • The present disclosure provides a learning method based on intelligent voice recognition, which includes the following steps:
  • the terminal obtains the online learning video configured by the server, and displays the online learning video in the form of a list on the display screen of the terminal;
  • when the terminal receives a play instruction input by the user, it plays the online learning video according to the play instruction and enters a learning mode;
  • when the terminal enters the learning mode, it receives a voice instruction input by the user through intelligent voice recognition, and switches the online learning video to a corresponding learning scene according to the voice instruction.
  • The learning scene includes a word learning scene, a dialogue learning scene, and a word-dialogue cross learning scene.
  • In an embodiment, before the terminal obtains the online learning video configured by the server, the method further includes:
  • configuring the online learning video on the server.
  • In an embodiment, the terminal obtaining the online learning video configured by the server and displaying the online learning video in the form of a list on the display screen of the terminal includes the following steps: the terminal sends a request to the server to obtain the online learning video;
  • the terminal receives the list of online learning videos sent by the server, and displays the list on the display screen of the terminal.
  • In an embodiment, playing the online learning video according to the play instruction and entering the learning mode includes the following steps: when the terminal receives the play instruction, judging whether the play instruction is an instruction to play the online learning video; when the play instruction is an instruction to play the online learning video, sending a request to download the online learning video to the server; receiving and playing the online learning video downloaded from the server, and prompting the user on the display screen of the terminal whether to enter the learning mode;
  • when the user chooses to enter the learning mode, the terminal turns on the intelligent voice recognition function.
  • In an embodiment, before judging whether the play instruction is an instruction to play the online learning video, the method further includes:
  • dividing the online videos on the display screen of the terminal into an online learning video area and a non-learning video area.
  • In an embodiment, prompting the user on the display screen of the terminal whether to enter the learning mode includes:
  • prompting the user in the form of a dialog box whether to enter the learning mode.
  • In an embodiment, when the play instruction is an instruction to play the online learning video, after the request to download the online learning video is sent to the server,
  • the method further includes the following step:
  • prompting the user on the display screen of the terminal with the learning rules and learning methods when the download of the online learning video is completed.
  • In an embodiment, receiving the voice instruction input by the user through intelligent voice recognition and switching the online learning video to the corresponding learning scene according to the voice instruction includes the following steps: when the terminal enters the learning mode, receiving the voice instruction input by the user through intelligent voice recognition; and switching the online learning video to the corresponding learning scene according to the voice instruction, and playing the corresponding learning content in the online learning video.
  • In an embodiment, switching the online learning video to the corresponding learning scene according to the voice instruction and playing the corresponding learning content in the online learning video includes the following steps: when the terminal plays the learning content, judging whether the user inputs voice information within a preset time; when the user inputs voice information within the preset time, jumping to the time point corresponding to the voice information; and playing the learning segment of the learning content corresponding to the voice information according to the time point.
  • In an embodiment, receiving the voice instruction input by the user through intelligent voice recognition and switching the online learning video to the corresponding learning scene according to the voice instruction further includes:
  • receiving, through intelligent voice recognition, a voice content learning segment input by the user.
  • In an embodiment, after the online learning video is switched to the corresponding learning scene according to the voice instruction, the method further includes the following steps: when the terminal finishes playing, uploading the voice information input by the user in the learning scene to the server; and receiving the score and error correction sent by the server, and presenting corresponding learning suggestions to the user on the display screen of the terminal according to the voice information.
  • The present disclosure also provides a terminal, which includes a processor and a memory connected to the processor; the memory stores a learning program based on intelligent voice recognition, and the learning program, when executed by the processor, is used to implement the learning method based on intelligent voice recognition.
  • In an embodiment, the terminal has a voice collection function and an intelligent voice recognition function.
  • The present disclosure also provides a storage medium, wherein the storage medium stores a learning program based on intelligent voice recognition, and the learning program, when executed by a processor, is used to implement the learning method based on intelligent voice recognition.
  • The present disclosure provides a learning method, terminal and storage medium based on intelligent voice recognition.
  • The online learning video configured by the server is obtained and displayed on the display screen of the terminal in the form of a list, so that the user can select the required learning video from the list; when the user selects the corresponding online learning video, the user is prompted whether to enter the learning mode; when the user enters the learning mode, the corresponding learning scene in the online learning video is entered according to the voice command input by the user, so that the user can learn the corresponding learning content in that scene;
  • in addition, after the user finishes learning, corresponding learning suggestions are given to the user according to the user's voice information during the learning process, so that the user can correct errors in subsequent learning.
  • The present disclosure extracts the learning scenes of online videos through intelligent voice recognition and converts the learning scenes into learning data, so that the user can flexibly control the playback of knowledge points when watching learning videos, improving the user's learning ability.
  • Fig. 1 is a flowchart of a preferred embodiment of the learning method based on intelligent voice recognition in the present disclosure.
  • Fig. 2 is a functional block diagram of the terminal and the server in the present disclosure.
  • Fig. 3 is a sequence diagram of the interaction during use by the user in the present disclosure.
  • Fig. 4 is a processing flowchart of the terminal in the present disclosure (part 1).
  • Fig. 5 is a processing flowchart of the terminal in the present disclosure (part 2).
  • Fig. 6 is a sequence diagram of the terminal and the server creating learning content in the present disclosure.
  • Fig. 7 is a functional block diagram of the terminal in the present disclosure.
  • The learning method based on intelligent voice recognition according to a preferred embodiment of the present disclosure is shown in Fig. 1, which is a flowchart of that preferred embodiment.
  • The learning method based on intelligent voice recognition includes the following steps:
  • Step S100: the terminal obtains the online learning video configured by the server, and displays the online learning video in the form of a list on the display screen of the terminal.
  • In this embodiment, two devices are mainly used to implement the learning method based on intelligent voice recognition.
  • One is a terminal device (that is, a playback device), which can obtain and play online videos through the network; it also needs a voice collection capability (far-field or near-field voice) and intelligent voice recognition, so that it can recognize the user's voice commands and the audio data of the video content being played, and make a preliminary judgment on whether the played video is learnable.
  • The other device is a back-end server, which needs to have voice recognition capability: it can identify and analyze the audio and video uploaded by the terminal device, process them to obtain corresponding learning data, and save the data as a database so that the terminal can obtain the learning data when it enters the learning mode. Moreover, the server also needs to be able to process and analyze the voice information input by the user, that is, to analyze the voice information, produce a score, and give corrective suggestions and methods; after the score and suggestions are obtained, the user's scoring results are saved, and a comprehensive evaluation and learning suggestions are given within a usage cycle.
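  • As an illustration of the server-side analysis just described, the sketch below shows one way a back end might score a recognised utterance against a target text and return a suggestion. It is only a hedged sketch: a real service would score acoustics and pronunciation, and the function name, threshold and string-similarity measure are assumptions, not part of the disclosure.

```python
# Illustrative sketch of the server-side scoring step: compare the text
# recognised from the user's utterance with the target text and return a
# score plus a correction hint. The string-similarity measure is only a
# placeholder for real pronunciation scoring.
from difflib import SequenceMatcher

def score_utterance(recognised_text: str, target_text: str) -> dict:
    ratio = SequenceMatcher(None, recognised_text.lower(), target_text.lower()).ratio()
    score = round(100 * ratio, 1)
    if score >= 90:
        suggestion = "Good pronunciation, keep going."
    else:
        suggestion = f"Try again and read slowly: '{target_text}'."
    return {"score": score, "suggestion": suggestion}

print(score_utterance("how are you todey", "how are you today"))
```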
  • In this embodiment, the online learning video needs to be configured on the server in advance.
  • When configuring the online learning video, the server analyzes the audio content of the online videos played by each user during daily use to determine whether an online video is an online learning video; if it is, the learning segments in the online video are saved as learning data, so that the learning mode can be entered when the user learns again. After the server has configured the online learning video, the online learning video is saved in a corresponding list so that users can find the learning content they need from the list.
  • When the user uses the terminal device, the terminal obtains the online learning video configured by the server and displays it in the form of a list on the display screen for the user to view, so that the user can find the required learning content from the list. Specifically, when obtaining the online learning video configured by the server, the terminal sends a request for obtaining the online learning video to the server; when the server receives the request, it sends the pre-configured list to the terminal so that the terminal can display the list on the display screen.
  • That is, step S100 specifically includes the following steps:
  • Step S110: the terminal sends a request for obtaining the online learning video to the server;
  • Step S120: the terminal receives the list of online learning videos sent by the server, and displays the list on the display screen of the terminal.
  • By obtaining the list of online learning videos configured by the server and displaying it on the display screen, the user can look up the corresponding learning content from the list during the learning process.
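  • As a concrete illustration of steps S110 and S120, the sketch below shows how a playback client might request the configured list from a back-end server and render it as a simple text list. The endpoint path, JSON field names and use of the `requests` library are assumptions made for illustration; the disclosure does not specify a protocol.

```python
# Minimal sketch of steps S110-S120: fetch the server-configured list of
# online learning videos and show it as a plain text list.
import requests

SERVER_URL = "http://example.com/api"  # hypothetical server address

def fetch_learning_video_list():
    """Step S110: send the request; step S120: receive the list."""
    resp = requests.get(f"{SERVER_URL}/learning-videos", timeout=5)
    resp.raise_for_status()
    return resp.json()  # e.g. [{"id": 1, "title": "Basic words A"}, ...]

def display_video_list(videos):
    """Render the list on the terminal's display (here: plain text)."""
    for idx, video in enumerate(videos, start=1):
        print(f"{idx}. {video['title']}")

if __name__ == "__main__":
    display_video_list(fetch_learning_video_list())
```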
  • Step S200: when the terminal receives the play instruction input by the user, it plays the online learning video according to the play instruction and enters the learning mode.
  • In this embodiment, when the user clicks an online video on the terminal device, the terminal device judges, based on the user's click operation, whether the video clicked by the user is an online learning video. Specifically, the display screen of the terminal is divided into an online learning video area and a non-learning video area; when the user operates in the online learning video area, it can be determined that the play instruction input by the user is an instruction to play an online learning video.
  • When it is determined that the play instruction input by the user is an instruction to play the online learning video, the terminal downloads the corresponding learning video and the learning data used in the learning mode (i.e. the learning segments) from the server; when the download is completed, the terminal prompts the user with the learning rules and learning methods. When the terminal plays the downloaded learning video, it prompts the user in the form of a dialog box whether to enter the learning mode; when the user chooses to enter the learning mode, the terminal turns on the intelligent voice recognition function so that the user can carry out voice dialogue learning in the learning mode.
  • That is, step S200 specifically includes the following steps:
  • Step S210: when the terminal receives the play instruction, judge whether the play instruction is an instruction to play the online learning video;
  • Step S220: when the play instruction is an instruction to play the online learning video, send a request to download the online learning video to the server;
  • Step S230: when the download of the online learning video is completed, prompt the user on the display screen of the terminal with the learning rules and learning methods;
  • Step S240: receive and play the online learning video downloaded from the server, and prompt the user on the display screen of the terminal whether to enter the learning mode;
  • Step S250: when the user chooses to enter the learning mode, the terminal turns on the intelligent voice recognition function.
  • By judging whether the video clicked by the user is an online learning video, the terminal can prompt the user whether to enter the learning mode according to the user's click operation, and then enter the corresponding learning mode according to the user's selection.
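  • A minimal sketch of steps S210 to S250 is given below, assuming a simple playback-device object; the class, method names and the rule text shown to the user are illustrative stand-ins, since the disclosure does not prescribe an API.

```python
# Sketch of steps S210-S250: decide whether a clicked video is an online
# learning video, download its learning data, prompt the user, and enable
# intelligent voice recognition if the learning mode is chosen.
from dataclasses import dataclass

@dataclass
class PlayInstruction:
    video_id: int
    area: str  # "learning" if clicked inside the online learning video area

class PlaybackDevice:
    def __init__(self):
        self.voice_recognition_enabled = False
        self.learning_data = None

    # Stubs standing in for real device / network behaviour.
    def play_normally(self, video_id): print(f"playing video {video_id} normally")
    def download_learning_data(self, video_id): return {"video_id": video_id, "segments": []}
    def show_message(self, text): print(text)
    def prompt_yes_no(self, question): print(question); return True

    def handle_play_instruction(self, instruction: PlayInstruction) -> None:
        if instruction.area != "learning":                                       # step S210
            self.play_normally(instruction.video_id)
            return
        self.learning_data = self.download_learning_data(instruction.video_id)   # step S220
        self.show_message("Learning rules: say a keyword, a dialogue line, "
                          "or a command such as 'again', 'next', 'end'.")         # step S230
        if self.prompt_yes_no("Enter learning mode?"):                            # step S240
            self.voice_recognition_enabled = True                                 # step S250

PlaybackDevice().handle_play_instruction(PlayInstruction(video_id=7, area="learning"))
```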
  • Step S300: when the terminal enters the learning mode, it receives a voice command input by the user through intelligent voice recognition, and switches the online learning video to the corresponding learning scene according to the voice command.
  • In this embodiment, there are mainly three learning scenes: the first is a word learning scene based on words, in which the terminal plays a word and the user follows and learns the word;
  • the second is a language dialogue learning scene, in which the terminal and the user learn in the form of a dialogue;
  • the third is a cross learning scene of words and dialogues, which combines the word learning scene and the dialogue learning scene.
  • When the user enters the corresponding learning scene, the playback device supports two kinds of voice input: one is voice commands, such as "next word", "restart" and "end"; the other is voice content learning segments, such as a certain word or a certain dialogue. When such voice input is recognized, the playback jumps to the corresponding content and the corresponding learning segment is played.
  • In this embodiment, the voice instruction input by the user is received through intelligent voice recognition, the online learning video is then switched to the corresponding learning scene according to the voice instruction, and the corresponding learning content in the online learning video is played. Specifically, when the terminal plays the learning content, it judges whether the user inputs voice information within a preset time (for example, 10 seconds); when the user inputs voice information within the preset time, the terminal jumps the online learning video being played to the time point corresponding to that voice information. For example, while an online learning video is playing, if the user says "next word", the terminal jumps the currently playing online learning video to the time point of the next word, and then plays, from that time point, the learning segment of the learning content corresponding to the voice information.
  • In addition, in this embodiment, when the terminal finishes playing, it uploads the voice information input by the user during the learning process to the server; then, according to the score and error correction sent by the server, it displays the score and error correction on the display screen, together with corresponding learning suggestions.
  • That is, step S300 specifically includes the following steps:
  • Step S310: when the terminal enters the learning mode, receive the voice command input by the user through intelligent voice recognition;
  • Step S320: switch the online learning video to the corresponding learning scene according to the voice instruction, and play the corresponding learning content in the online learning video;
  • Step S330: when the terminal finishes playing, upload the voice information input by the user in the learning scene to the server;
  • Step S340: receive the score and error correction sent by the server, and present corresponding learning suggestions to the user on the display screen of the terminal according to the voice information.
  • Among the above steps, step S320 specifically includes the following steps:
  • Step S321: switch the online learning video to the corresponding learning scene according to the voice instruction, and play the corresponding learning content in the online learning video;
  • Step S322: when the terminal plays the learning content, judge whether the user inputs voice information within a preset time;
  • Step S323: when the user inputs voice information within the preset time, jump to the time point corresponding to the voice information;
  • Step S324: play the learning segment of the learning content corresponding to the voice information according to the time point.
  • The learning scenes of online videos are extracted through intelligent voice recognition and can be converted into learning data, so that when watching learning videos the user can interact by voice, flexibly control the playback of knowledge points, and deepen the impression left by the learning.
  • This embodiment is further described below with reference to Figs. 3 to 6. In this embodiment, the audio content of the online videos played by the user is analyzed to determine whether a video is an online learning video; if so, the learning segments of the video are analyzed and saved as learning data, so that the user can perform voice control when in the learning mode.
  • When the user enters the learning mode and plays the video, the playback device first prompts the user with the learning content, so that the user can understand the learning content in advance and learn the corresponding segments in a targeted manner. When the user speaks a voice command related to the learning content, the playback device jumps to the corresponding learning segment and plays it, pausing when the playback is complete; when the user speaks a voice control command, the playback device likewise jumps to the corresponding learning segment and plays it.
  • While the user is giving voice input, the playback device uploads the user's voice input to the server, which analyzes the voice, scores it and corrects errors; in this way, effective suggestions can be given for the user's voice input, and the user can gradually improve his or her learning ability and knowledge. The back-end server also produces a learning curve according to the usage periods and usage cycles, to ensure a stable or gradually improving learning ability during the learning process.
  • Specifically, as shown in Fig. 3, when the user uses the playback device, the playback device obtains the data of the learning content area configured by the server (this data consists of the learning data produced after each user uses online videos, or is edited manually), and displays the data for the user to browse; when the user clicks a corresponding learning source, the playback device obtains the corresponding learning data from the server (the learning data is an ordered list of learning segments, mainly including the type [keyword type / language segment type], the playback tag [keyword / voice segment], the playback time point, the segment duration, etc.).
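  • One possible in-memory shape for such an ordered list of learning segments is sketched below; the field and class names are assumptions, since the disclosure only lists the kinds of information carried (type, playback tag, playback time point, segment duration).

```python
# Illustrative data structure for the learning data: an ordered list of
# learning segments with a type, a playback tag, a time point and a duration.
from dataclasses import dataclass
from enum import Enum

class SegmentType(Enum):
    KEYWORD = "keyword"            # single word / keyword segment
    LANGUAGE_SEGMENT = "language"  # dialogue / language segment

@dataclass
class LearningSegment:
    segment_type: SegmentType   # keyword type or language segment type
    playback_tag: str           # keyword text or the voice fragment to match
    start_seconds: float        # playback time point in the video
    duration_seconds: float     # segment duration

learning_data = [
    LearningSegment(SegmentType.KEYWORD, "apple", 12.0, 3.5),
    LearningSegment(SegmentType.KEYWORD, "banana", 18.0, 3.0),
    LearningSegment(SegmentType.LANGUAGE_SEGMENT, "How are you today?", 45.0, 8.0),
]
```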
  • After the acquisition is completed, the playback device prompts the user "whether to enter the learning mode".
  • When the user chooses to enter the learning mode, the playback device gives the learning rules. The learning rules mainly include the content of the learning segments and the rules for learning voice recognition; for example, the voice content rules let the user say a keyword [a word to learn / a keyword to reply with according to the voice scene] or a voice fragment of a voice scene, and the voice command rules let the user say voice commands that control the learning, such as "again", "next", "start over", "end", etc.
  • When the playback device enters the learning mode, if the user gives no voice input within 10 s, the playback device enters normal playback. During normal playback, if a voice command input by the user is received and the voice command is detected to be a keyword, the playback jumps to the learning segment corresponding to that keyword and is matched to the time point corresponding to the keyword for playback; when the playback is completed, it pauses. If a voice command is received again during playback, matching is performed again; if the matching fails, a prompt is given. If the received voice command is a language segment, the playback jumps to the time point corresponding to that language segment, and the context is recorded in this learning scene for scenario-based learning.
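  • The jump-and-pause behaviour described above can be sketched as follows; the segment layout mirrors the sketch shown earlier, and the player object with `listen`, `seek`, `play_for`, `show_prompt` and `enter_normal_playback` methods is an illustrative stand-in for a real video-player API.

```python
# Sketch of the learning-mode control loop: keyword input jumps to the matching
# segment's time point and pauses at its end, language segments also record
# context, unmatched input produces a prompt, and 10 s of silence falls back to
# normal playback.
NO_INPUT_TIMEOUT_S = 10

segments = [
    {"type": "keyword", "tag": "apple", "start": 12.0, "duration": 3.5},
    {"type": "language", "tag": "how are you today", "start": 45.0, "duration": 8.0},
]

def find_segment(utterance: str):
    """Match a recognised utterance against the segment tags."""
    text = utterance.lower().strip()
    for seg in segments:
        if seg["tag"] in text or text in seg["tag"]:
            return seg
    return None

def handle_voice_input(player, utterance: str, context: list) -> None:
    seg = find_segment(utterance)
    if seg is None:
        player.show_prompt("No matching learning segment, please try again.")
        return
    player.seek(seg["start"])             # jump to the matched time point
    player.play_for(seg["duration"])      # pause when the segment ends
    if seg["type"] == "language":
        context.append(seg["tag"])        # record dialogue context for the scene

def learning_loop(player) -> None:
    context = []
    while True:
        utterance = player.listen(timeout_s=NO_INPUT_TIMEOUT_S)
        if utterance is None:             # no voice input within 10 s
            player.enter_normal_playback()
            return
        if utterance.strip().lower() == "end":
            return
        handle_voice_input(player, utterance, context)
```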
  • During the learning process, whenever the user inputs voice, the playback device uploads the user's voice to the server for pronunciation scoring and correction, and records the current voice and score. Over multiple uses, the highest value, the lowest value and the average are computed, and multiple voice segments are recorded to prompt and correct the user. When the user finishes, the playback device shows the user the overall score and pronunciation suggestions for the current source.
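  • The score bookkeeping described above might look like the following sketch, in which the server-produced score is simply a stored number and the statistics (highest, lowest, average) are computed across repeated uses; the class and field names are assumptions.

```python
# Sketch of tracking scores across a learning session / multiple uses.
from statistics import mean

class ScoreTracker:
    def __init__(self):
        self.records = []  # list of (utterance, score) pairs

    def add(self, utterance: str, score: float) -> None:
        self.records.append((utterance, score))

    def summary(self) -> dict:
        scores = [s for _, s in self.records]
        if not scores:
            return {"count": 0}
        return {
            "count": len(scores),
            "highest": max(scores),
            "lowest": min(scores),
            "average": round(mean(scores), 1),
        }

tracker = ScoreTracker()
tracker.add("apple", 86.0)
tracker.add("how are you today", 74.5)
print(tracker.summary())  # e.g. {'count': 2, 'highest': 86.0, 'lowest': 74.5, 'average': 80.2}
```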
  • As shown in Fig. 6, Fig. 6 describes the sequence diagram of creating the learning data. When the user plays an online source, the playback device decodes the audio of the online video and recognizes the content in the online video; if the following conditions are met, the online video is judged to be a source that supports the learning mode: (1) a source based mainly on words of a certain type, with regular time intervals between them; (2) a source based mainly on scene dialogues of a certain type.
  • When the playback device recognizes a video source that supports the learning mode, it prompts the user whether to enter the learning mode. If the user chooses to enter the learning mode, the decoded audio of the online video is sent to the server, and the server analyzes the entire audio data and generates the corresponding learning segment information according to the above conditions. After the entire audio data has been analyzed, the server organizes the content of all the learning segments and generates the learning rules, for example by listing all the words or arranging the corresponding learning scenes; finally, the server generates the corresponding language content instruction set according to the learning rules.
  • After the server finishes processing, the processing result is returned to the playback device and displayed on it (the learning rules and the language content instruction set are displayed). When the playback device plays an online learning video and the user wants to enter the learning mode, the server sends the learning data to the playback device, and then, following the usage sequence diagram in Fig. 3, voice interaction takes place among the user, the playback device and the server. To confirm the validity of the learning data, the server back end supports manual editing and adjustment, as well as manual entry, which makes it convenient to create learning areas.
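  • The server-side creation of learning data in Fig. 6 can be illustrated with the sketch below: given speech-recognition results (word, start time) for the decoded audio, it checks a rough "regular interval" condition and, if satisfied, builds learning segments plus a simple rule set. The recognition step itself, the threshold and all names are assumptions used only for illustration.

```python
# Sketch of the server-side flow: decide whether a source supports the learning
# mode and, if so, generate learning segments and learning rules from the
# recognised words and their time points.
def supports_learning_mode(word_times, max_jitter_s=2.0):
    """True if the gaps between recognised words are roughly regular."""
    starts = [t for _, t in word_times]
    gaps = [b - a for a, b in zip(starts, starts[1:])]
    if len(gaps) < 2:
        return False
    return max(gaps) - min(gaps) <= max_jitter_s

def build_learning_data(word_times, segment_duration_s=3.0):
    """Turn (word, start_time) pairs into segments plus simple learning rules."""
    segments = [
        {"type": "keyword", "tag": word, "start": start, "duration": segment_duration_s}
        for word, start in word_times
    ]
    rules = {
        "words": [word for word, _ in word_times],           # e.g. list all words
        "commands": ["again", "next", "start over", "end"],  # voice command set
    }
    return {"segments": segments, "rules": rules}

recognised = [("apple", 10.0), ("banana", 16.0), ("cherry", 22.0)]
if supports_learning_mode(recognised):
    print(build_learning_data(recognised))
```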
  • As shown in Fig. 4 and Fig. 5, the playback device performs the following steps during processing:
  • Step S11: obtain and display the online learning videos configured by the server;
  • Step S12: judge whether the online video selected by the user is an online learning video; if it is, execute step S13; if not, return to step S11;
  • Step S13: play the video and prompt whether to enter the learning mode;
  • Step S14: judge whether the user chooses to enter the learning mode; if yes, execute step S15; if not, return to step S13;
  • Step S15: download the learning content rules and the learning content list;
  • Step S16: when the download is completed, prompt the learning rules and learning methods;
  • Step S17: judge whether the user gives voice input within 10 s; if yes, execute step S18; if not, execute step S21;
  • Step S18: receive the voice learning content input;
  • Step S19: switch to the time point of the corresponding learning content, play it, and upload the voice to the server;
  • Step S20: the learning on demand is completed;
  • Step S21: enter normal playback;
  • Step S22: receive the learning content input;
  • Step S23: receive a voice operation instruction;
  • Step S24: switch to the corresponding learning content and play it;
  • Step S25: playback is completed;
  • Step S26: give the score and suggestions.
  • The learning method based on intelligent voice recognition in this embodiment not only breaks away from monotonous learning by watching videos, but also brings the fun of interactive learning and can improve learning efficiency; replacing passive watching with voice interaction also improves spoken expression, so that learning truly takes place in the learning scene. At the same time, the server back end collects the user's voice input, scores it against standard pronunciation, and gives learning suggestions and learning-track analysis, which makes it easier to improve learning efficiency and correct errors.
  • Fig. 7 is a functional block diagram of the terminal in a preferred embodiment of the present disclosure.
  • The embodiment of the present disclosure provides a terminal.
  • The terminal of the embodiment of the present disclosure may be a mobile terminal (such as a mobile phone or a tablet computer) or a smart terminal (such as a smart TV or another smart device).
  • The terminal of this embodiment includes a processor 10 and a memory 20 connected to the processor 10.
  • The memory 20 stores a learning program based on intelligent voice recognition; when the learning program based on intelligent voice recognition is executed by the processor 10, it is used to implement the learning method based on intelligent voice recognition described above; the details are as described above.
  • The embodiment of the present disclosure also provides a storage medium, wherein the storage medium stores a learning program based on intelligent voice recognition; when the learning program based on intelligent voice recognition is executed by a processor, it is used to implement the learning method based on intelligent voice recognition described above; the details are as described above.
  • In summary, the present disclosure provides a learning method, terminal and storage medium based on intelligent voice recognition. The online learning video configured by the server is obtained and displayed on the display screen of the terminal in the form of a list, so that the user can select the required learning video from the list.
  • When the user selects the corresponding online learning video, the user is prompted whether to enter the learning mode; when the learning mode is entered, the corresponding learning scene in the online learning video is entered according to the voice command input by the user, so that the user can learn the corresponding learning content in the learning scene. In addition, after the user finishes learning, corresponding learning suggestions are given to the user according to the user's voice information during the learning process, so that the user can correct errors in the subsequent learning process.
  • The present disclosure uses intelligent voice recognition to extract learning scenes from online videos and converts the learning scenes into learning data, so that users can flexibly control the playback of knowledge points when watching learning videos, improving their learning ability.
  • All or part of the processes in the above method embodiments can be implemented by a computer program instructing relevant hardware (such as a processor or a controller); the program may be stored in a computer-readable storage medium and, when executed, may include the processes of the foregoing method embodiments.
  • The storage medium mentioned may be a memory, a magnetic disk, an optical disk, and the like.

Landscapes

  • Engineering & Computer Science (AREA)
  • Multimedia (AREA)
  • Signal Processing (AREA)
  • Human Computer Interaction (AREA)
  • Physics & Mathematics (AREA)
  • Databases & Information Systems (AREA)
  • Computational Linguistics (AREA)
  • Health & Medical Sciences (AREA)
  • Audiology, Speech & Language Pathology (AREA)
  • Acoustics & Sound (AREA)
  • Business, Economics & Management (AREA)
  • Educational Administration (AREA)
  • Educational Technology (AREA)
  • General Physics & Mathematics (AREA)
  • Theoretical Computer Science (AREA)
  • Electrically Operated Instructional Devices (AREA)

Abstract

The present disclosure discloses a learning method, terminal and storage medium based on intelligent voice recognition. The present disclosure extracts the learning scenes of online videos through intelligent voice recognition and converts the learning scenes into learning data, so that when watching learning videos the user can flexibly control the playback of knowledge points, improving the user's learning ability.

Description

Learning method, terminal and storage medium based on intelligent voice recognition

Technical Field
The present disclosure relates to the field of terminal applications, for example to a learning method, terminal and storage medium based on intelligent voice recognition.
Background
With the development of Internet technology, consumers can more and more easily use the Internet to obtain a wide variety of online video resources and use them for watching and learning. For learning-oriented video resources, users mostly watch the video content repeatedly in order to acquire the knowledge in the video; there is therefore a lack of interaction with the video content, and the understanding of the learning content cannot be strengthened.
With the development of hardware and software, intelligent voice recognition technology has also developed rapidly. Although intelligent voice recognition can use algorithms to accurately convert voice information into text information and understand the user's intentions, for users who need to learn, it cannot obtain the user's learning progress, and therefore cannot find the learning resources the user needs at the current stage or improve the user's learning ability.
Therefore, the prior art still needs to be improved and developed.
Summary
The technical problem to be solved by the present disclosure is that, in view of the defects of the prior art, the present disclosure provides a learning method, terminal and storage medium based on intelligent voice recognition, which extract the learning scenes of online videos through intelligent voice recognition and convert the learning scenes into learning data, so that users can flexibly control the playback of knowledge points when watching learning videos, improving their learning ability.
The technical solution adopted by the present disclosure to solve the technical problem is as follows:
The present disclosure provides a learning method based on intelligent voice recognition, wherein the learning method based on intelligent voice recognition includes the following steps:
a terminal obtains an online learning video configured by a server, and displays the online learning video in the form of a list on a display screen of the terminal;
when the terminal receives a play instruction input by a user, the terminal plays the online learning video according to the play instruction and enters a learning mode;
when the terminal enters the learning mode, the terminal receives a voice instruction input by the user through intelligent voice recognition, and switches the online learning video to a corresponding learning scene according to the voice instruction.
Further, the learning scene includes a word learning scene, a dialogue learning scene and a word-dialogue cross learning scene.
Further, before the terminal obtains the online learning video configured by the server and displays the online learning video in the form of a list on the display screen of the terminal, the method further includes:
configuring the online learning video on the server.
Further, the terminal obtaining the online learning video configured by the server and displaying the online learning video in the form of a list on the display screen of the terminal specifically includes the following steps:
the terminal sends a request for obtaining the online learning video to the server;
the terminal receives a list of the online learning videos sent by the server, and displays the list on the display screen of the terminal.
Further, when the terminal receives the play instruction input by the user, playing the online learning video according to the play instruction and entering the learning mode specifically includes the following steps:
when the terminal receives the play instruction, judging whether the play instruction is an instruction to play the online learning video;
when the play instruction is an instruction to play the online learning video, sending a request to download the online learning video to the server;
receiving and playing the online learning video downloaded from the server, and prompting the user on the display screen of the terminal whether to enter the learning mode;
when the user chooses to enter the learning mode, the terminal turns on an intelligent voice recognition function.
Further, before judging, when the terminal receives the play instruction, whether the play instruction is an instruction to play the online learning video, the method further includes:
dividing the online videos on the display screen of the terminal into an online learning video area and a non-learning video area.
Further, prompting the user on the display screen of the terminal whether to enter the learning mode specifically includes:
prompting the user in the form of a dialog box whether to enter the learning mode.
Further, after sending, when the play instruction is an instruction to play the online learning video, the request to download the online learning video to the server, the method further includes the following step:
when the download of the online learning video is completed, prompting the user on the display screen of the terminal with learning rules and learning methods.
Further, when the terminal enters the learning mode, receiving the voice instruction input by the user through intelligent voice recognition and switching the online learning video to the corresponding learning scene according to the voice instruction specifically includes the following steps:
when the terminal enters the learning mode, receiving the voice instruction input by the user through intelligent voice recognition;
switching the online learning video to the corresponding learning scene according to the voice instruction, and playing the corresponding learning content in the online learning video.
Further, switching the online learning video to the corresponding learning scene according to the voice instruction and playing the corresponding learning content in the online learning video specifically includes the following steps:
switching the online learning video to the corresponding learning scene according to the voice instruction, and playing the corresponding learning content in the online learning video;
when the terminal plays the learning content, judging whether the user inputs voice information within a preset time;
when the user inputs voice information within the preset time, jumping to a time point corresponding to the voice information;
playing a learning segment of the learning content corresponding to the voice information according to the time point.
Further, when the terminal enters the learning mode, receiving the voice instruction input by the user through intelligent voice recognition and switching the online learning video to the corresponding learning scene according to the voice instruction further includes:
when the terminal enters the learning mode, receiving, through intelligent voice recognition, a voice content learning segment input by the user.
Further, after receiving, when the terminal enters the learning mode, the voice instruction input by the user through intelligent voice recognition and switching the online learning video to the corresponding learning scene according to the voice instruction, the method further includes the following steps:
when the terminal finishes playing, uploading the voice information input by the user in the learning scene to the server;
receiving a score and error correction sent by the server, and presenting corresponding learning suggestions to the user on the display screen of the terminal according to the voice information.
The present disclosure also provides a terminal, which includes a processor and a memory connected to the processor, wherein the memory stores a learning program based on intelligent voice recognition, and the learning program based on intelligent voice recognition, when executed by the processor, is used to implement the learning method based on intelligent voice recognition.
Further, the terminal has a voice collection function and an intelligent voice recognition function.
The present disclosure also provides a storage medium, wherein the storage medium stores a learning program based on intelligent voice recognition, and the learning program based on intelligent voice recognition, when executed by a processor, is used to implement the learning method based on intelligent voice recognition.
The present disclosure provides a learning method, terminal and storage medium based on intelligent voice recognition. The online learning video configured by the server is obtained and displayed on the display screen of the terminal in the form of a list, so that the user can select the required learning video from the list; when the user selects the corresponding online learning video, the user is prompted whether to enter the learning mode; when the learning mode is entered, the corresponding learning scene in the online learning video is entered according to the voice command input by the user, so that the user can learn the corresponding learning content in the learning scene; in addition, after the user finishes learning, corresponding learning suggestions are given to the user according to the user's voice information during the learning process, so that the user can correct errors in the subsequent learning process. The present disclosure extracts the learning scenes of online videos through intelligent voice recognition and converts the learning scenes into learning data, so that users can flexibly control the playback of knowledge points when watching learning videos, improving their learning ability.
Brief Description of the Drawings
Fig. 1 is a flowchart of a preferred embodiment of the learning method based on intelligent voice recognition in the present disclosure.
Fig. 2 is a functional block diagram of the terminal and the server in the present disclosure.
Fig. 3 is a sequence diagram of the interaction during use by the user in the present disclosure.
Fig. 4 is a processing flowchart of the terminal in the present disclosure (part 1).
Fig. 5 is a processing flowchart of the terminal in the present disclosure (part 2).
Fig. 6 is a sequence diagram of the terminal and the server creating learning content in the present disclosure.
Fig. 7 is a functional block diagram of the terminal in the present disclosure.
Detailed Description
In order to make the objects, technical solutions and advantages of the present disclosure clearer and more definite, the present disclosure is further described in detail below with reference to the accompanying drawings and embodiments. It should be understood that the specific embodiments described here are only used to explain the present disclosure and are not intended to limit the present disclosure.
Embodiment 1
The learning method based on intelligent voice recognition according to a preferred embodiment of the present disclosure is shown in Fig. 1, which is a flowchart of a preferred embodiment of the learning method based on intelligent voice recognition of the present disclosure.
The learning method based on intelligent voice recognition includes the following steps:
Step S100: the terminal obtains the online learning video configured by the server, and displays the online learning video in the form of a list on the display screen of the terminal.
As shown in Fig. 2, in this embodiment, two devices are mainly used to implement the learning method based on intelligent voice recognition. One is a terminal device (i.e. a playback device), which can obtain and play online videos through the network; it also needs a voice collection capability, that is, it can use far-field voice or near-field voice, and it needs to support intelligent voice recognition so that it can recognize the user's voice commands and the audio data of the video content being played, and make a preliminary judgment on whether the played video is learnable.
The other device is a back-end server, which needs to have voice recognition capability: it can identify and analyze the audio and video uploaded by the terminal device, process them to obtain corresponding learning data, and save the data as a database so that the terminal can obtain the learning data when it enters the learning mode. Moreover, the server also needs to be able to process and analyze the voice information input by the user, that is, to analyze the voice information input by the user, obtain a score, and give corrective suggestions and methods; after the score and suggestions are obtained, the user's scoring results are saved, and a comprehensive evaluation and learning suggestions are given within a usage cycle.
In this embodiment, the online learning video needs to be configured on the server in advance. When configuring the online learning video, the server analyzes the audio content of the online videos played by each user during daily use to determine whether an online video is an online learning video; if it is, the learning segments in the online video are saved as learning data so that the learning mode can be entered when the user learns again. After the server has configured the online learning video, the online learning video is saved in a corresponding list so that users can find the learning content they need from the list.
When the user uses the terminal device, the terminal obtains the online learning video configured by the server and displays the online learning video in the form of a list on the display screen for the user to view, so that the user can find the required learning content from the list. Specifically, when obtaining the online learning video configured by the server, the terminal sends a request for obtaining the online learning video to the server; when the server receives the request, it sends the pre-configured list to the terminal so that the terminal can display the list on the display screen.
That is, step S100 specifically includes the following steps:
Step S110: the terminal sends a request for obtaining the online learning video to the server;
Step S120: the terminal receives the list of online learning videos sent by the server, and displays the list on the display screen of the terminal.
By obtaining the list of online learning videos configured by the server and displaying it on the display screen, the user can look up the corresponding learning content from the list in a targeted manner during the learning process.
Step S200: when the terminal receives the play instruction input by the user, the terminal plays the online learning video according to the play instruction and enters the learning mode.
In this embodiment, when the user clicks an online video on the terminal device, the terminal device judges, based on the user's click operation, whether the video clicked by the user is an online learning video. Specifically, the display screen of the terminal is divided into an online learning video area and a non-learning video area; when the user operates in the online learning video area, it can be determined that the play instruction input by the user is an instruction to play an online learning video.
When it is determined that the play instruction input by the user is an instruction to play the online learning video, the terminal downloads the corresponding learning video and the learning data used in the learning mode (i.e. the learning segments) from the server; when the download is completed, the terminal prompts the user with the learning rules and learning methods. When the terminal plays the downloaded learning video, it prompts the user in the form of a dialog box whether to enter the learning mode; when the user chooses to enter the learning mode, the terminal turns on the intelligent voice recognition function so that the user can carry out voice dialogue learning in the learning mode.
That is, step S200 specifically includes the following steps:
Step S210: when the terminal receives the play instruction, judging whether the play instruction is an instruction to play the online learning video;
Step S220: when the play instruction is an instruction to play the online learning video, sending a request to download the online learning video to the server;
Step S230: when the download of the online learning video is completed, prompting the user on the display screen of the terminal with the learning rules and learning methods;
Step S240: receiving and playing the online learning video downloaded from the server, and prompting the user on the display screen of the terminal whether to enter the learning mode;
Step S250: when the user chooses to enter the learning mode, the terminal turns on the intelligent voice recognition function.
By judging whether the video clicked by the user is an online learning video, the terminal can prompt the user whether to enter the learning mode according to the user's click operation, and then enter the corresponding learning mode according to the user's selection.
Step S300: when the terminal enters the learning mode, the terminal receives the voice instruction input by the user through intelligent voice recognition, and switches the online learning video to the corresponding learning scene according to the voice instruction.
In this embodiment, there are mainly three learning scenes: the first is a word learning scene based on words, in which the terminal plays a word and the user follows and learns the word; the second is a language dialogue learning scene, in which the terminal and the user learn in the form of a dialogue; the third is a cross learning scene of words and dialogues, which combines the word learning scene and the dialogue learning scene.
When the user enters the corresponding learning scene, the playback device supports two kinds of voice input: one is voice commands, such as "next word", "restart" and "end"; the other is voice content learning segments, such as a certain word or a certain dialogue. When such voice input is recognized, the playback jumps to the corresponding content and the corresponding learning segment is played.
In this embodiment, the voice instruction input by the user is received through intelligent voice recognition, the online learning video is then switched to the corresponding learning scene according to the voice instruction, and the corresponding learning content in the online learning video is played. Specifically, when the terminal plays the learning content, it judges whether the user inputs voice information within a preset time (for example, 10 seconds); when the user inputs voice information within the preset time, the terminal jumps the online learning video being played to the time point corresponding to that voice information. For example, while an online learning video is playing, if the user inputs the voice information "next word", the terminal jumps the currently playing online learning video to the time point of the next word, and then plays, from that time point, the learning segment of the learning content corresponding to the voice information.
In addition, in this embodiment, when the terminal finishes playing, the terminal uploads the voice information input by the user during the learning process to the server; then, according to the score and error correction sent by the server, the score and error correction are displayed on the display screen, together with corresponding learning suggestions.
That is, step S300 specifically includes the following steps:
Step S310: when the terminal enters the learning mode, receiving the voice instruction input by the user through intelligent voice recognition;
Step S320: switching the online learning video to the corresponding learning scene according to the voice instruction, and playing the corresponding learning content in the online learning video;
Step S330: when the terminal finishes playing, uploading the voice information input by the user in the learning scene to the server;
Step S340: receiving the score and error correction sent by the server, and presenting corresponding learning suggestions to the user on the display screen of the terminal according to the voice information.
Among the above steps, step S320 specifically includes the following steps:
Step S321: switching the online learning video to the corresponding learning scene according to the voice instruction, and playing the corresponding learning content in the online learning video;
Step S322: when the terminal plays the learning content, judging whether the user inputs voice information within a preset time;
Step S323: when the user inputs voice information within the preset time, jumping to the time point corresponding to the voice information;
Step S324: playing the learning segment of the learning content corresponding to the voice information according to the time point.
The learning scenes of online videos are extracted through intelligent voice recognition and can be converted into learning data, so that when watching learning videos the user can interact by voice, flexibly control the playback of knowledge points, and deepen the impression left by the learning.
This embodiment is further described below with reference to Figs. 3 to 6.
In this embodiment, the audio content of the online videos played by the user is analyzed to determine whether the type of a video is an online learning video; if so, the learning segments of the video are analyzed and saved as learning data, so that the user can perform voice control when in the learning mode.
When the user enters the learning mode and plays the video, the playback device first prompts the user with the learning content, so that the user can understand the learning content in advance and learn the corresponding segments in a targeted manner. When the user speaks a voice command related to the learning content, the playback device jumps to the corresponding learning segment and plays it, pausing when the playback is complete; when the user speaks a voice control command, the playback device likewise jumps to the corresponding learning segment and plays it.
While the user is giving voice input, the playback device uploads the user's voice input to the server, which analyzes the voice, scores it and corrects errors; in this way, effective suggestions can be given for the user's voice input, and the user can gradually improve his or her learning ability and knowledge. The back-end server also produces a learning curve according to the usage periods and usage cycles, to ensure a stable or gradually improving learning ability during the learning process.
Specifically, as shown in Fig. 3, when the user uses the playback device, the playback device obtains the data of the learning content area configured by the server (this data consists of the learning data produced after each user uses online videos, or is edited manually), and displays the data for the user to view; when the user clicks a corresponding learning source, the playback device obtains the corresponding learning data from the server (the learning data is an ordered list of learning segments, mainly including the type [keyword type / language segment type], the playback tag [keyword / voice segment], the playback time point, the segment duration, etc.).
After the acquisition is completed, the playback device prompts the user "whether to enter the learning mode"; when the user chooses to enter the learning mode, the playback device gives the learning rules. The learning rules mainly include the content of the learning segments and the rules for learning voice recognition; for example, the voice content rules let the user say a keyword [a word to learn / a keyword to reply with according to the voice scene] or a voice fragment of a voice scene, and the voice command rules let the user say voice commands that control the learning, such as "again", "next", "start over", "end", etc.
When the playback device enters the learning mode, if the user gives no voice input within 10 s, the playback device enters normal playback. During normal playback, if a voice command input by the user is received and the voice command is detected to be a keyword, the playback jumps to the learning segment corresponding to that keyword and is matched to the time point corresponding to the keyword for playback; when the playback is completed, it pauses. If a voice command is received again during playback, matching is performed again; if the matching fails, a prompt is given. If the received voice command is a language segment, the playback jumps to the time point corresponding to that language segment, and the context is recorded in this learning scene for scenario-based learning.
During the learning process, whenever the user inputs voice, the playback device uploads the user's voice to the server for pronunciation scoring and correction, and records the current voice and score. Over multiple uses, the highest value, the lowest value and the average are computed, and multiple voice segments are recorded to prompt and correct the user. When the user finishes, the playback device shows the user the overall score and pronunciation suggestions for the current source.
As shown in Fig. 6, Fig. 6 describes the sequence diagram of creating the learning data. When the user plays an online source, the playback device decodes the audio of the online video and recognizes the content in the online video; if the following conditions are met, the online video is judged to be a source that supports the learning mode:
1. a source based mainly on words of a certain type, with regular time intervals between them;
2. a source based mainly on scene dialogues of a certain type.
When the playback device recognizes a video source that supports the learning mode, it prompts the user whether to enter the learning mode. If the user chooses to enter the learning mode, the decoded audio of the online video is sent to the server, and the server analyzes the entire audio data; the server generates the corresponding learning segment information according to the above conditions. After the entire audio data has been analyzed, the server organizes the content of all the learning segments and generates the learning rules, for example by listing all the words or arranging the corresponding learning scenes; finally, the server generates the corresponding language content instruction set according to the learning rules.
After the server finishes processing, the processing result is returned to the playback device and displayed on the playback device (the learning rules and the language content instruction set are displayed). When the playback device plays an online learning video and the user wants to enter the learning mode, the server sends the learning data to the playback device, and then, following the usage sequence diagram in Fig. 3, voice interaction takes place among the user, the playback device and the server. To confirm the validity of the learning data, the server back end supports manual editing and adjustment, as well as manual entry, which makes it convenient to create learning areas.
As shown in Fig. 4 and Fig. 5, the playback device performs the following steps during processing:
Step S11: obtaining and displaying the online learning videos configured by the server;
Step S12: judging whether the online video selected by the user is an online learning video; if it is, executing step S13; if not, returning to step S11;
Step S13: playing the video and prompting whether to enter the learning mode;
Step S14: judging whether the user chooses to enter the learning mode; if yes, executing step S15; if not, returning to step S13;
Step S15: downloading the learning content rules and the learning content list;
Step S16: when the download is completed, prompting the learning rules and learning methods;
Step S17: judging whether the user gives voice input within 10 s; if yes, executing step S18; if not, executing step S21;
Step S18: receiving the voice learning content input;
Step S19: switching to the time point of the corresponding learning content, playing it, and uploading the voice to the server;
Step S20: the learning on demand is completed;
Step S21: entering normal playback;
Step S22: receiving the learning content input;
Step S23: receiving a voice operation instruction;
Step S24: switching to the corresponding learning content for playback;
Step S25: playback is completed;
Step S26: giving the score and suggestions.
The learning method based on intelligent voice recognition in this embodiment not only breaks away from monotonous learning by watching videos, but also brings the fun of interactive learning and can improve learning efficiency; replacing passive watching with voice interaction also improves spoken expression, so that learning truly takes place in the learning scene. At the same time, the server back end collects the user's voice input, scores it against standard pronunciation, and gives learning suggestions and learning-track analysis, which makes it easier to improve learning efficiency and correct errors.
Embodiment 2
Referring to Fig. 7, Fig. 7 is a functional block diagram of the terminal in a preferred embodiment of the present disclosure.
As shown in Fig. 7, an embodiment of the present disclosure provides a terminal. The terminal of the embodiment of the present disclosure may be a mobile terminal (such as a mobile phone or a tablet computer) or a smart terminal (such as a smart TV or another smart device). The terminal of this embodiment includes a processor 10 and a memory 20 connected to the processor 10.
The memory 20 stores a learning program based on intelligent voice recognition; when the learning program based on intelligent voice recognition is executed by the processor 10, it is used to implement the learning method based on intelligent voice recognition described above, the details of which are as described above.
Embodiment 3
An embodiment of the present disclosure provides a storage medium, wherein the storage medium stores a learning program based on intelligent voice recognition; when the learning program based on intelligent voice recognition is executed by a processor, it is used to implement the learning method based on intelligent voice recognition described above, the details of which are as described above.
In summary, the present disclosure provides a learning method, terminal and storage medium based on intelligent voice recognition. The online learning video configured by the server is obtained and displayed on the display screen of the terminal in the form of a list, so that the user can select the required learning video from the list; when the user selects the corresponding online learning video, the user is prompted whether to enter the learning mode; when the learning mode is entered, the corresponding learning scene in the online learning video is entered according to the voice command input by the user, so that the user can learn the corresponding learning content in the learning scene; in addition, after the user finishes learning, corresponding learning suggestions are given to the user according to the user's voice information during the learning process, so that the user can correct errors in the subsequent learning process. The present disclosure extracts the learning scenes of online videos through intelligent voice recognition and converts the learning scenes into learning data, so that users can flexibly control the playback of knowledge points when watching learning videos, improving their learning ability.
Of course, those of ordinary skill in the art can understand that all or part of the processes in the methods of the above embodiments can be implemented by a computer program instructing relevant hardware (such as a processor or a controller); the program may be stored in a computer-readable storage medium and, when executed, may include the processes of the above method embodiments. The storage medium mentioned may be a memory, a magnetic disk, an optical disk, and the like.
It should be understood that the application of the present disclosure is not limited to the above examples; improvements or modifications can be made by those of ordinary skill in the art in light of the above description, and all such improvements and modifications shall fall within the protection scope of the appended claims of the present disclosure.

Claims (15)

  1. A learning method based on intelligent voice recognition, wherein the learning method based on intelligent voice recognition comprises the following steps:
    a terminal obtains an online learning video configured by a server, and displays the online learning video in the form of a list on a display screen of the terminal;
    when the terminal receives a play instruction input by a user, the terminal plays the online learning video according to the play instruction and enters a learning mode;
    when the terminal enters the learning mode, the terminal receives a voice instruction input by the user through intelligent voice recognition, and switches the online learning video to a corresponding learning scene according to the voice instruction.
  2. The learning method based on intelligent voice recognition according to claim 1, wherein the learning scene comprises a word learning scene, a dialogue learning scene and a word-dialogue cross learning scene.
  3. The learning method based on intelligent voice recognition according to claim 1, wherein, before the terminal obtains the online learning video configured by the server and displays the online learning video in the form of a list on the display screen of the terminal, the method further comprises:
    configuring the online learning video on the server.
  4. The learning method based on intelligent voice recognition according to claim 1, wherein the terminal obtaining the online learning video configured by the server and displaying the online learning video in the form of a list on the display screen of the terminal specifically comprises the following steps:
    the terminal sends a request for obtaining the online learning video to the server;
    the terminal receives a list of the online learning videos sent by the server, and displays the list on the display screen of the terminal.
  5. The learning method based on intelligent voice recognition according to claim 1, wherein, when the terminal receives the play instruction input by the user, playing the online learning video according to the play instruction and entering the learning mode specifically comprises the following steps:
    when the terminal receives the play instruction, judging whether the play instruction is an instruction to play the online learning video;
    when the play instruction is an instruction to play the online learning video, sending a request to download the online learning video to the server;
    receiving and playing the online learning video downloaded from the server, and prompting the user on the display screen of the terminal whether to enter the learning mode;
    when the user chooses to enter the learning mode, the terminal turns on an intelligent voice recognition function.
  6. The learning method based on intelligent voice recognition according to claim 5, wherein, before judging, when the terminal receives the play instruction, whether the play instruction is an instruction to play the online learning video, the method further comprises:
    dividing the online videos on the display screen of the terminal into an online learning video area and a non-learning video area.
  7. The learning method based on intelligent voice recognition according to claim 5, wherein prompting the user on the display screen of the terminal whether to enter the learning mode specifically comprises:
    prompting the user in the form of a dialog box whether to enter the learning mode.
  8. The learning method based on intelligent voice recognition according to claim 5, wherein, after sending, when the play instruction is an instruction to play the online learning video, the request to download the online learning video to the server, the method further comprises the following step:
    when the download of the online learning video is completed, prompting the user on the display screen of the terminal with learning rules and learning methods.
  9. The learning method based on intelligent voice recognition according to claim 1, wherein, when the terminal enters the learning mode, receiving the voice instruction input by the user through intelligent voice recognition and switching the online learning video to the corresponding learning scene according to the voice instruction specifically comprises the following steps:
    when the terminal enters the learning mode, receiving the voice instruction input by the user through intelligent voice recognition;
    switching the online learning video to the corresponding learning scene according to the voice instruction, and playing the corresponding learning content in the online learning video.
  10. The learning method based on intelligent voice recognition according to claim 9, wherein switching the online learning video to the corresponding learning scene according to the voice instruction and playing the corresponding learning content in the online learning video specifically comprises the following steps:
    switching the online learning video to the corresponding learning scene according to the voice instruction, and playing the corresponding learning content in the online learning video;
    when the terminal plays the learning content, judging whether the user inputs voice information within a preset time;
    when the user inputs voice information within the preset time, jumping to a time point corresponding to the voice information;
    playing a learning segment of the learning content corresponding to the voice information according to the time point.
  11. The learning method based on intelligent voice recognition according to claim 9, wherein, when the terminal enters the learning mode, receiving the voice instruction input by the user through intelligent voice recognition and switching the online learning video to the corresponding learning scene according to the voice instruction further comprises:
    when the terminal enters the learning mode, receiving, through intelligent voice recognition, a voice content learning segment input by the user.
  12. The learning method based on intelligent voice recognition according to claim 1, wherein, after receiving, when the terminal enters the learning mode, the voice instruction input by the user through intelligent voice recognition and switching the online learning video to the corresponding learning scene according to the voice instruction, the method further comprises the following steps:
    when the terminal finishes playing, uploading the voice information input by the user in the learning scene to the server;
    receiving a score and error correction sent by the server, and presenting corresponding learning suggestions to the user on the display screen of the terminal according to the voice information.
  13. A terminal, comprising a processor and a memory connected to the processor, wherein the memory stores a learning program based on intelligent voice recognition, and the learning program based on intelligent voice recognition, when executed by the processor, is used to implement the learning method based on intelligent voice recognition according to any one of claims 1 to 12.
  14. The terminal according to claim 13, wherein the terminal has a voice collection function and an intelligent voice recognition function.
  15. A storage medium, wherein the storage medium stores a learning program based on intelligent voice recognition, and the learning program based on intelligent voice recognition, when executed by a processor, is used to implement the learning method based on intelligent voice recognition according to any one of claims 1 to 12.
PCT/CN2020/073079 2019-07-15 2020-01-20 一种基于智能语音识别的学习方法、终端及存储介质 WO2021008128A1 (zh)

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
CN201910636555.8 2019-07-15
CN201910636555.8A CN110430465B (zh) 2019-07-15 2019-07-15 一种基于智能语音识别的学习方法、终端及存储介质

Publications (1)

Publication Number Publication Date
WO2021008128A1 true WO2021008128A1 (zh) 2021-01-21

Family

ID=68409523

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/CN2020/073079 WO2021008128A1 (zh) 2019-07-15 2020-01-20 一种基于智能语音识别的学习方法、终端及存储介质

Country Status (2)

Country Link
CN (1) CN110430465B (zh)
WO (1) WO2021008128A1 (zh)

Cited By (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN113344318A (zh) * 2021-04-16 2021-09-03 华蔚集团(广东)有限公司 一种基于校本课程的在线测评系统

Families Citing this family (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN110430465B (zh) * 2019-07-15 2021-06-01 深圳创维-Rgb电子有限公司 一种基于智能语音识别的学习方法、终端及存储介质
CN111028828A (zh) * 2019-12-20 2020-04-17 京东方科技集团股份有限公司 一种基于画屏的语音交互方法、画屏及存储介质
WO2021155812A1 (zh) * 2020-02-07 2021-08-12 海信视像科技股份有限公司 接收装置、服务器以及语音信息处理系统
CN114520003B (zh) * 2022-02-28 2024-06-28 安徽淘云科技股份有限公司 语音交互方法、装置、电子设备和存储介质

Citations (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
WO2011162508A2 (ko) * 2010-06-25 2011-12-29 Seo Dong-Hyuck 비교 영상을 이용한 발음 학습 방법 및 장치
CN106227335A (zh) * 2016-07-14 2016-12-14 广东小天才科技有限公司 预习讲义与视频课程的交互学习方法及应用学习客户端
CN106293347A (zh) * 2016-08-16 2017-01-04 广东小天才科技有限公司 一种人机交互的学习方法及装置、用户终端
CN107135418A (zh) * 2017-06-14 2017-09-05 北京易世纪教育科技有限公司 一种视频播放的控制方法及装置
CN108766071A (zh) * 2018-04-28 2018-11-06 北京猎户星空科技有限公司 一种内容推送与播放的方法、装置、存储介质及相关设备
CN110430465A (zh) * 2019-07-15 2019-11-08 深圳创维-Rgb电子有限公司 一种基于智能语音识别的学习方法、终端及存储介质

Family Cites Families (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN105681920B (zh) * 2015-12-30 2017-03-15 深圳市鹰硕音频科技有限公司 一种具有语音识别功能的网络教学方法及系统
CN105872828A (zh) * 2016-03-30 2016-08-17 乐视控股(北京)有限公司 一种电视交互学习的方法及装置
US20210174693A1 (en) * 2017-11-23 2021-06-10 Bites Learning Ltd. An interface for training content over a network of mobile devices
CN109191349A (zh) * 2018-11-02 2019-01-11 北京唯佳未来教育科技有限公司 一种英文学习内容的展示方法及系统

Patent Citations (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
WO2011162508A2 (ko) * 2010-06-25 2011-12-29 Seo Dong-Hyuck 비교 영상을 이용한 발음 학습 방법 및 장치
CN106227335A (zh) * 2016-07-14 2016-12-14 广东小天才科技有限公司 预习讲义与视频课程的交互学习方法及应用学习客户端
CN106293347A (zh) * 2016-08-16 2017-01-04 广东小天才科技有限公司 一种人机交互的学习方法及装置、用户终端
CN107135418A (zh) * 2017-06-14 2017-09-05 北京易世纪教育科技有限公司 一种视频播放的控制方法及装置
CN108766071A (zh) * 2018-04-28 2018-11-06 北京猎户星空科技有限公司 一种内容推送与播放的方法、装置、存储介质及相关设备
CN110430465A (zh) * 2019-07-15 2019-11-08 深圳创维-Rgb电子有限公司 一种基于智能语音识别的学习方法、终端及存储介质

Cited By (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN113344318A (zh) * 2021-04-16 2021-09-03 华蔚集团(广东)有限公司 一种基于校本课程的在线测评系统

Also Published As

Publication number Publication date
CN110430465B (zh) 2021-06-01
CN110430465A (zh) 2019-11-08

Similar Documents

Publication Publication Date Title
WO2021008128A1 (zh) 一种基于智能语音识别的学习方法、终端及存储介质
US10950228B1 (en) Interactive voice controlled entertainment
US11853536B2 (en) Intelligent automated assistant in a media environment
US20210125604A1 (en) Systems and methods for determining whether to trigger a voice capable device based on speaking cadence
US10692504B2 (en) User profiling for voice input processing
CN109643549B (zh) 基于说话者识别的语音识别方法和装置
US9405741B1 (en) Controlling offensive content in output
US20190333515A1 (en) Display apparatus, method for controlling the display apparatus, server and method for controlling the server
US9548053B1 (en) Audible command filtering
CN102568478B (zh) 一种基于语音识别的视频播放控制方法和系统
US20140006022A1 (en) Display apparatus, method for controlling display apparatus, and interactive system
AU2016320681A1 (en) Intelligent automated assistant for media search and playback
CN107403011B (zh) 虚拟现实环境语言学习实现方法和自动录音控制方法
CN108882101B (zh) 一种智能音箱的播放控制方法、装置、设备及存储介质
CN106796496A (zh) 显示设备及其操作方法
US11651775B2 (en) Word correction using automatic speech recognition (ASR) incremental response
JP7153681B2 (ja) 音声対話方法及び装置
CN104932862A (zh) 基于语音识别的多角色交互方法
Wittenburg et al. The prospects for unrestricted speech input for TV content search
CN109859773A (zh) 一种声音的录制方法、装置、存储介质及电子设备
CN115866339A (zh) 电视节目推荐方法、装置、智能设备及可读存储介质
CN112380871A (zh) 语义识别方法、设备及介质
JP2017182275A (ja) 情報処理装置、情報処理方法、及びプログラム
CN109903594A (zh) 口语练习辅助方法、装置、设备及存储介质
WO2023093846A1 (zh) 一种电子装置的控制方法和系统,及存储介质

Legal Events

Date Code Title Description
121 Ep: the epo has been informed by wipo that ep was designated in this application

Ref document number: 20841482

Country of ref document: EP

Kind code of ref document: A1

NENP Non-entry into the national phase

Ref country code: DE

122 Ep: pct application non-entry in european phase

Ref document number: 20841482

Country of ref document: EP

Kind code of ref document: A1

32PN Ep: public notification in the ep bulletin as address of the adressee cannot be established

Free format text: NOTING OF LOSS OF RIGHTS PURSUANT TO RULE 112(1) EPC (EPO FORM 1205A DATED 23/06/2022)