CN110808038A - Mandarin assessment method, device, equipment and storage medium - Google Patents

Publication number
CN110808038A
Authority
CN
China
Prior art keywords
robot
mandarin
voice
conversation
user
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN201911095274.2A
Other languages
Chinese (zh)
Inventor
柳青
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Tencent Technology Shenzhen Co Ltd
Original Assignee
Tencent Technology Shenzhen Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Tencent Technology Shenzhen Co Ltd
Priority to CN201911095274.2A
Publication of CN110808038A

Classifications

    • G: PHYSICS
    • G10: MUSICAL INSTRUMENTS; ACOUSTICS
    • G10L: SPEECH ANALYSIS OR SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING; SPEECH OR AUDIO CODING OR DECODING
    • G10L15/00: Speech recognition
    • G10L15/08: Speech classification or search
    • G10L15/10: Speech classification or search using distance or distortion measures between unknown speech and reference templates
    • G: PHYSICS
    • G10: MUSICAL INSTRUMENTS; ACOUSTICS
    • G10L: SPEECH ANALYSIS OR SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING; SPEECH OR AUDIO CODING OR DECODING
    • G10L15/00: Speech recognition
    • G10L15/22: Procedures used during a speech recognition process, e.g. man-machine dialogue
    • G: PHYSICS
    • G10: MUSICAL INSTRUMENTS; ACOUSTICS
    • G10L: SPEECH ANALYSIS OR SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING; SPEECH OR AUDIO CODING OR DECODING
    • G10L15/00: Speech recognition
    • G10L15/28: Constructional details of speech recognition systems
    • G10L15/30: Distributed recognition, e.g. in client-server systems, for mobile phones or network applications
    • G: PHYSICS
    • G10: MUSICAL INSTRUMENTS; ACOUSTICS
    • G10L: SPEECH ANALYSIS OR SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING; SPEECH OR AUDIO CODING OR DECODING
    • G10L25/00: Speech or voice analysis techniques not restricted to a single one of groups G10L15/00 - G10L21/00
    • G10L25/48: Speech or voice analysis techniques not restricted to a single one of groups G10L15/00 - G10L21/00 specially adapted for particular use
    • G10L25/51: Speech or voice analysis techniques not restricted to a single one of groups G10L15/00 - G10L21/00 specially adapted for particular use for comparison or discrimination
    • G10L25/60: Speech or voice analysis techniques not restricted to a single one of groups G10L15/00 - G10L21/00 specially adapted for particular use for comparison or discrimination for measuring the quality of voice signals
    • G: PHYSICS
    • G10: MUSICAL INSTRUMENTS; ACOUSTICS
    • G10L: SPEECH ANALYSIS OR SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING; SPEECH OR AUDIO CODING OR DECODING
    • G10L15/00: Speech recognition
    • G10L15/22: Procedures used during a speech recognition process, e.g. man-machine dialogue
    • G10L2015/225: Feedback of the input speech
    • G: PHYSICS
    • G10: MUSICAL INSTRUMENTS; ACOUSTICS
    • G10L: SPEECH ANALYSIS OR SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING; SPEECH OR AUDIO CODING OR DECODING
    • G10L15/00: Speech recognition
    • G10L15/22: Procedures used during a speech recognition process, e.g. man-machine dialogue
    • G10L2015/226: Procedures used during a speech recognition process, e.g. man-machine dialogue using non-speech characteristics

Abstract

The application discloses a Mandarin assessment method, apparatus, device, and storage medium. In the method, a client displays a Mandarin test interface that shows a simulated conversation robot; acquires robot conversation voice to be output by the conversation robot; simulates, in the Mandarin test interface, an animation of the conversation robot outputting the robot conversation voice, and plays the robot conversation voice; obtains user conversation voice input by a user in response to the robot conversation voice; sends the user conversation voice to a server; and displays, on the Mandarin test interface, the Mandarin scoring result of the user conversation voice returned by the server. The scheme of the application helps improve the accuracy of Mandarin assessment.

Description

Mandarin assessment method, device, equipment and storage medium
Technical Field
The present application relates to the field of speech recognition and processing technologies, and in particular to a Mandarin assessment method, apparatus, device, and storage medium.
Background
In current Mandarin testing software, a text is generally displayed in an interface; after the user reads the displayed text aloud and the reading is recorded, a Mandarin score is evaluated from the recorded voice.
However, reading Mandarin aloud from displayed text places a high demand on the user's literacy. If the user does not recognize some characters or words in the text, normal reading is disrupted, so that the Mandarin assessment cannot be completed or the final result does not match the user's actual Mandarin level.
Disclosure of Invention
In view of this, the present application provides a Mandarin assessment method, apparatus, device, and storage medium, so as to reduce the complexity of Mandarin assessment for the user and help improve assessment accuracy.
In order to achieve the above object, in one aspect, the present application provides a Mandarin assessment method applied to a client, including:
displaying a mandarin test interface, wherein the mandarin test interface displays a simulated conversation robot;
acquiring robot conversation voice to be output by the conversation robot, wherein the robot conversation voice is a voice signal for guiding a user to perform voice interaction;
simulating animation of the conversation robot outputting the robot conversation voice in the Mandarin testing interface, and playing the robot conversation voice;
obtaining user conversation voice input by a user in response to the robot conversation voice;
sending the user conversation voice to a server;
and displaying the mandarin grading result of the user conversation voice returned by the server on the mandarin test interface.
In another aspect, the present application further provides a Mandarin assessment method applied to a server, including:
obtaining a Mandarin assessment request sent by a client, wherein the Mandarin assessment request carries user conversation voice input by a user of the client in a Mandarin test interface, and the Mandarin test interface displays a simulated conversation robot;
analyzing the user conversation voice to obtain a mandarin grading result of the user conversation voice;
determining robot conversation voice to be output by the conversation robot;
and sending the mandarin grading result and the robot conversation voice to the client so that the client can display the mandarin grading result on the mandarin test interface and simulate the animation of the conversation robot outputting the robot conversation voice on the mandarin test interface.
In another aspect, the present application further provides a Mandarin assessment apparatus applied to a client, including:
the interface display unit is used for displaying a mandarin test interface, and the mandarin test interface displays the simulated conversation robot;
the robot conversation voice obtaining unit is used for obtaining robot conversation voice to be output by the conversation robot, and the robot conversation voice is a voice signal used for guiding a user to perform voice interaction;
the dialogue scene simulation unit is used for simulating the animation of the dialogue robot outputting the robot conversation voice in the Mandarin test interface and playing the robot conversation voice;
a user voice obtaining unit for obtaining a user conversation voice input by a user for the robot conversation voice;
the user voice sending unit is used for sending the user conversation voice to a server;
and the scoring result display unit is used for displaying the Mandarin scoring result of the user conversation voice returned by the server on the Mandarin testing interface.
In another aspect, the present application further provides a Mandarin assessment apparatus applied to a server, including:
a request acquisition unit for obtaining a Mandarin assessment request sent by a client, wherein the Mandarin assessment request carries user conversation voice input by a user of the client in a Mandarin test interface, and the Mandarin test interface displays a simulated conversation robot;
the voice analyzing unit is used for analyzing the user conversation voice to obtain a mandarin grading result of the user conversation voice;
a machine voice determination unit for determining a robot conversation voice to be output by the conversation robot;
and the data sending unit is used for sending the mandarin scoring result and the robot conversation voice to the client so that the client can display the mandarin scoring result on the mandarin test interface and simulate the animation of the conversation robot outputting the robot conversation voice on the mandarin test interface.
In yet another aspect, the present application further provides a terminal, including:
a processor and a memory;
the processor is used for calling and executing the program stored in the memory;
the memory is used for storing the program, and the program is used at least for:
displaying a mandarin test interface, wherein the mandarin test interface displays a simulated conversation robot;
acquiring robot conversation voice to be output by the conversation robot, wherein the robot conversation voice is a voice signal for guiding a user to perform voice interaction;
simulating animation of the conversation robot outputting the robot conversation voice in the Mandarin testing interface, and playing the robot conversation voice;
obtaining user conversation voice input by a user in response to the robot conversation voice;
sending the user conversation voice to a server;
and displaying the mandarin grading result of the user conversation voice returned by the server on the mandarin test interface.
In yet another aspect, the present application further provides a server, including:
a processor and a memory;
the processor is used for calling and executing the program stored in the memory;
the memory is used for storing the program, and the program is used at least for:
obtaining a Mandarin assessment request sent by a client, wherein the Mandarin assessment request carries user conversation voice input by a user of the client in a Mandarin test interface, and the Mandarin test interface displays a simulated conversation robot;
analyzing the user conversation voice to obtain a mandarin grading result of the user conversation voice;
determining robot conversation voice to be output by the conversation robot;
and sending the mandarin grading result and the robot conversation voice to the client so that the client can display the mandarin grading result on the mandarin test interface and simulate the animation of the conversation robot outputting the robot conversation voice on the mandarin test interface.
In yet another aspect, the present application further provides a storage medium storing computer-executable instructions which, when loaded and executed by a processor, implement the Mandarin assessment method according to any one of the above.
According to the above technical solution, a simulated conversation robot is displayed in the Mandarin test interface and outputs robot conversation voice that guides the user to interact by voice. The user can therefore input the user conversation voice used for the Mandarin test in response to the robot conversation voice, which avoids the complexity of a test in which the user can only read aloud from a text. Moreover, because the test is carried out as a voice interaction between the conversation robot and the user, the voice signal obtained reflects the user's actual spoken Mandarin more faithfully, so the user's actual Mandarin level can be evaluated more realistically and accurately from the user conversation voice.
Drawings
In order to illustrate the technical solutions in the embodiments of the present application more clearly, the drawings needed in the description of the embodiments are briefly introduced below. The drawings in the following description are merely embodiments of the present application, and those skilled in the art can obtain other drawings from the provided drawings without creative effort.
FIG. 1 shows a schematic diagram of the composition of a scenario to which the solution of the present application is applicable;
FIG. 2 illustrates a flow diagram of one implementation of the Mandarin assessment method of the present application on the client side;
FIG. 3 illustrates a schematic diagram of a Mandarin testing interface in the present application;
FIG. 4 is a flow chart illustrating another implementation of the Mandarin assessment method of the present application on the server side;
FIG. 5 illustrates a flow interaction diagram of a Mandarin assessment method of the present application;
FIG. 6 is a schematic diagram illustrating an application scenario in which the Mandarin assessment method of the present application is applied;
FIG. 7 illustrates another flow interaction diagram of the Mandarin assessment method of the present application;
FIG. 8 is a schematic diagram of a component structure of a Mandarin assessment apparatus according to the present application;
FIG. 9 is a schematic diagram of another alternative embodiment of a Mandarin assessment apparatus according to the present application;
FIG. 10 is a schematic diagram illustrating the structure of a terminal according to the present application.
Detailed Description
The scheme of the present application is suitable for testing a user's Mandarin level, so that the tested level matches the user's actual Mandarin level more closely and the test result is more realistic and accurate.
The scheme provided by the embodiment of the application relates to the technologies of artificial intelligence robot simulation, dialogue interaction and the like, and also relates to the technologies of voice recognition, processing and the like.
Among them, Artificial Intelligence (AI) is a theory, method, technique and application system that simulates, extends and expands human Intelligence using a digital computer or a machine controlled by a digital computer, senses the environment, acquires knowledge and uses the knowledge to obtain the best result. In other words, artificial intelligence is a comprehensive technique of computer science that attempts to understand the essence of intelligence and produce a new intelligent machine that can react in a manner similar to human intelligence. Artificial intelligence is the research of the design principle and the realization method of various intelligent machines, so that the machines have the functions of perception, reasoning and decision making.
Artificial intelligence is a comprehensive discipline that covers a wide range of fields, including both hardware-level and software-level technologies. The artificial intelligence infrastructure generally includes technologies such as sensors, dedicated artificial intelligence chips, cloud computing, distributed storage, big data processing, operation/interaction systems, and mechatronics. Artificial intelligence software technologies mainly include computer vision, speech processing, natural language processing, and machine learning/deep learning.
The key technologies of speech technology are automatic speech recognition (ASR), speech synthesis (text-to-speech, TTS), and voiceprint recognition. Enabling computers to listen, see, speak, and feel is the development direction of future human-computer interaction, and voice is regarded as one of the most promising modes of human-computer interaction.
For ease of understanding, a scenario in which the scheme of the present application is applicable will be described.
FIG. 1 is a schematic structural diagram of a system to which the Mandarin assessment method of the present application is applied.
As shown in fig. 1, the system may include a client 101 and at least one server 102, wherein the client and the server may be connected via a network.
The client may be a mobile phone, a tablet computer, a notebook computer, or the like on which a Mandarin assessment application is installed. The Mandarin assessment application may be a standalone assessment application or an entertainment function module in a game application. The server may be a server that provides services for the Mandarin assessment application. Accordingly, the client can establish a connection with the server through the Mandarin assessment application.
In the embodiment of the application, the client can obtain the user conversation voice to be evaluated of the user through voice interaction with the user, and send the user conversation voice to the server so as to analyze the mandarin level of the user conversation voice through the server.
The mandarin chinese evaluation method of the present application is described below with reference to a flowchart.
For ease of understanding, the method is first described from the client side. FIG. 2 shows a schematic flow chart of an embodiment of the Mandarin assessment method of the present application. The method of this embodiment may be applied to a client and may include:
S201, displaying a Mandarin test interface.
The Mandarin test interface displays a simulated conversation robot. The conversation robot may be a virtual robot simulated based on artificial intelligence.
In the embodiment of the application, the mandarin chinese testing interface is an interface presented by the client and used for realizing voice interaction between the user and the simulated conversation robot.
Optionally, the client can implement various Mandarin test scenarios in the present application. For example, the client can provide a Mandarin practice test or simulate a Mandarin examination. Correspondingly, the Mandarin test interface may be a Mandarin practice test interface, through which the user chats with the conversation robot and the user's Mandarin level is evaluated from the voice the user inputs during the chat. The Mandarin test interface may also be a simulated Mandarin examination interface, through which the voice interaction required of the user during a Mandarin examination is simulated.
S202, acquiring the robot conversation voice to be output by the conversation robot.
The robot conversation voice is a voice signal used for guiding the user to perform voice interaction. For example, the robot conversation voice may be a spoken question or inquiry that leads the user into a voice chat, such as "What are your hobbies?". As another example, the robot conversation voice may guide the user to speak the Mandarin content to be tested: if the text the user needs to say in the Mandarin test is "quick", the robot conversation voice may be "Which word describes an action that is particularly fast?".
The robot conversation voice may be a conversation voice generated by a client, for example, the client acquires a voice corpus from a preset conversation corpus, and uses the acquired voice corpus as a robot conversation voice to be output. The conversation corpus comprises a plurality of preset voices serving as corpora. The client can randomly obtain the voice corpora from the conversation corpus each time, and also can select the voice corpora which can be used as the reply voice of the conversation voice of the user from the conversation corpus by combining the conversation voice of the user input last time.
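The two client-side selection strategies described above (random selection, or selection of a reply to the user's previous input) can be sketched as follows. This is a minimal illustration only: the corpus entries, the `keywords` field, and the assumption that a text transcript of the previous user conversation voice is available are all hypothetical and are not specified by the application.

```python
import random

# Hypothetical conversation corpus; each entry stands in for a preset voice clip.
CORPUS = [
    {"id": "hobby", "text": "What are your hobbies?", "keywords": []},
    {"id": "sport_followup", "text": "How often do you play?",
     "keywords": ["basketball", "football"]},
    {"id": "food_followup", "text": "Can you cook it yourself?",
     "keywords": ["noodles", "dumplings"]},
]


def pick_robot_utterance(last_user_text=None, rng=random):
    """Pick the next corpus entry for the conversation robot.

    If a transcript of the previous user input is given (an assumption),
    prefer an entry whose keywords appear in it; otherwise pick at random.
    """
    if last_user_text:
        lowered = last_user_text.lower()
        for entry in CORPUS:
            if any(keyword in lowered for keyword in entry["keywords"]):
                return entry
    return rng.choice(CORPUS)
```

Keyword matching is used here only because it is easy to follow; any reply-selection technique could take its place.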
Optionally, in consideration of the limited data processing performance of the client, the application may further obtain robot conversation voice sent by the server for the conversation robot to output. For example, the client requests the conversation voice from the server so that the server determines the robot conversation voice that needs to be output.
The robot conversation voice may be a voice signal to be output by the conversation robot determined by the server according to the user conversation voice input by the user of the client for the last time. For the convenience of distinction, the user conversation voice input by the user last time is referred to as historical conversation voice.
The robot conversation voice can also be the current topic prompt voice to be output, which is inquired by the server from the topic prompt voice library. For example, the server may maintain a topic prompt voice library including topic prompt voices corresponding to a plurality of topics, where the topics may be topics in an examination question library simulating a mandarin test or text topics capable of implementing a dialog.
S203, simulating, in the Mandarin test interface, the animation of the conversation robot outputting the robot conversation voice, and playing the robot conversation voice.
In order to give the user the feel of a real conversation, while the audio output unit of the client plays the robot conversation voice, the Mandarin test interface may simulate an animation of the conversation robot outputting that voice, so that the user experiences something close to chatting face to face with the robot.
Simulating the conversation robot outputting the robot conversation voice may consist of simulating the mouth shapes that correspond to the robot conversation voice, so as to present the animation effect of the conversation robot speaking.
In addition, to prevent the user from missing part of the robot conversation voice, for example through inattention, the conversation text corresponding to the robot conversation voice may be displayed in the Mandarin test interface while the conversation robot outputs the voice. FIG. 3 shows a schematic diagram of a Mandarin test interface. In FIG. 3, the conversation robot 301 in the Mandarin test interface may present a dynamic effect of outputting speech, and the text 302 of the robot conversation voice output by the conversation robot is also displayed in the Mandarin test interface.
It can be understood that, in order to further improve the user's sense of conversing with the conversation robot in a real scene, the robot pose feature associated with the robot conversation voice can also be obtained. The robot pose feature describes the pose that the conversation robot needs to assume. For example, the robot pose feature may include facial expression features of the conversation robot and actions of the conversation robot. The facial expression features may be smiling, playful, angry, and the like; the actions of the conversation robot may include waving a hand, turning, jumping, moving part of an upper or lower limb, and the like.
The robot pose characteristics may be generated by the client, for example, robot pose characteristics corresponding to different conversation voices may be preset in the client, and accordingly, robot pose characteristics suitable for the conversation voices of the robot may be queried from a plurality of preset robot pose characteristics. Certainly, the client may also preset some artificial intelligence algorithms, and construct a suitable robot pose feature according to the robot conversation voice in combination with the artificial intelligence algorithms, but the complexity is relatively high.
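The preset-lookup approach described above can be sketched as a simple table keyed by the conversation voice. The `RobotPose` type, the voice identifiers, and the fallback pose are hypothetical names introduced for illustration; the application does not specify how poses are stored.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class RobotPose:
    """Pose the conversation robot should assume (hypothetical structure)."""
    expression: str  # e.g. "smile"
    action: str      # e.g. "wave_hand"


# Fallback when no pose is preset for a conversation voice (assumption).
DEFAULT_POSE = RobotPose(expression="neutral", action="none")

# Hypothetical preset table: conversation voice id -> pose feature.
PRESET_POSES = {
    "greeting": RobotPose(expression="smile", action="wave_hand"),
    "question": RobotPose(expression="smile", action="turn"),
    "praise": RobotPose(expression="smile", action="jump"),
}


def pose_for(voice_id: str) -> RobotPose:
    """Query the preset pose features for the given robot conversation voice."""
    return PRESET_POSES.get(voice_id, DEFAULT_POSE)
```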
Considering that the client has limited performance and may be unable to generate robot pose features efficiently and accurately, or may be unsuited to storing a large number of them, the robot pose features obtained by the client may instead be generated by the server according to the robot conversation voice. The server may generate the robot pose features in various ways, which are briefly described later from the server side and are not limited herein.
Correspondingly, after the client obtains the relevant robot posture characteristics, the client can simulate the animation of the conversation robot outputting the robot conversation voice in the mandarin test interface according to the robot posture characteristics. For example, according to the relevant data of the robot pose characteristics, the animation effect of the corresponding expression and action of the dialogue robot is loaded in the Mandarin test interface.
S204, obtaining the user conversation voice input by the user in response to the robot conversation voice.
For ease of distinction, the voice input by the user is referred to as user conversation voice. The user conversation voice is the voice content that the user gives in response to the received robot conversation voice, that is, the user's spoken reply to the robot conversation voice.
For example, after the simulated conversation robot outputs the robot conversation voice, the client may start a sound collection unit, such as a microphone, and collect the user conversation voice input by the user.
Optionally, so that the user can input the user conversation voice when ready, the client may obtain the user conversation voice only after the simulated conversation robot has output the robot conversation voice and a voice input instruction from the user has been detected.
The voice input instruction may be generated by pressing a key. For example, a voice input control key may be provided in the Mandarin test interface, and the voice signal input by the user is collected while the voice input control key is detected to be pressed. As shown in FIG. 3, a microphone identifier 303 is displayed in the Mandarin test interface. If the user wishes to input voice, the user keeps touching the microphone identifier 303 and the client collects the voice signal input by the user. Accordingly, when the user lifts the finger so that the microphone identifier is no longer pressed, collection of the user's voice signal ends.
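The press-and-hold behaviour of the microphone identifier can be sketched as a small state holder. The class and callback names are hypothetical, and audio is represented as raw byte chunks for simplicity; a real client would wire these callbacks to its touch and audio-capture APIs.

```python
class HoldToTalkRecorder:
    """Collects audio only while the microphone identifier is pressed."""

    def __init__(self):
        self.recording = False
        self._chunks = []

    def on_press(self):
        # Finger touches the microphone identifier: start a fresh recording.
        self.recording = True
        self._chunks = []

    def on_audio_chunk(self, chunk: bytes):
        # Audio arriving while the identifier is not pressed is ignored.
        if self.recording:
            self._chunks.append(chunk)

    def on_release(self) -> bytes:
        # Finger lifted: stop collecting and return the user conversation voice.
        self.recording = False
        return b"".join(self._chunks)
```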
S205, sending the user conversation voice to a server.
The client may send the user conversation voice directly; after receiving it, the server analyzes the user conversation voice to determine the user's Mandarin level.
In one possible implementation, the client may send a Mandarin assessment request carrying the user conversation voice to the server. Accordingly, in response to the Mandarin assessment request, the server may analyze the user conversation voice and determine the corresponding Mandarin level.
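One possible shape for a Mandarin assessment request that carries the user conversation voice is sketched below. The application does not define a wire format; the JSON envelope, the field names (`type`, `session_id`, `audio_b64`), and the base64 encoding are assumptions chosen for illustration.

```python
import base64
import json


def build_assessment_request(session_id: str, audio: bytes) -> str:
    """Client side: wrap the user conversation voice in a request message."""
    return json.dumps({
        "type": "mandarin_assessment",  # hypothetical message type
        "session_id": session_id,       # hypothetical session identifier
        "audio_b64": base64.b64encode(audio).decode("ascii"),
    })


def parse_assessment_request(raw: str) -> bytes:
    """Server side: extract the user conversation voice from the request."""
    message = json.loads(raw)
    if message.get("type") != "mandarin_assessment":
        raise ValueError("not a Mandarin assessment request")
    return base64.b64decode(message["audio_b64"])
```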
S206, displaying, on the Mandarin test interface, the Mandarin scoring result of the user conversation voice returned by the server.
The Mandarin scoring result returned by the server may be a Mandarin proficiency grade. For example, Mandarin proficiency is divided into three levels (first, second, and third), each of which is divided into two grades, and the Mandarin scoring result may then be a specific grade. The Mandarin scoring result may also be a numeric score that characterizes the Mandarin level; for example, with a maximum of 100 points, the result may be a score between 0 and 100.
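The two forms of result described above (a grade within three levels, or a score from 0 to 100) can be related by a threshold table, as in the following sketch. The cut-off values and the "Grade A/B" labels are illustrative assumptions and are not stated in the application.

```python
# Illustrative cut-offs (assumption): minimum score for each grade, highest first.
GRADE_BANDS = [
    (97.0, "Level 1, Grade A"),
    (92.0, "Level 1, Grade B"),
    (87.0, "Level 2, Grade A"),
    (80.0, "Level 2, Grade B"),
    (70.0, "Level 3, Grade A"),
    (60.0, "Level 3, Grade B"),
]


def grade_for(score: float) -> str:
    """Map a 0-100 Mandarin score to one of six grades, or 'Below Level 3'."""
    if not 0 <= score <= 100:
        raise ValueError("score must be between 0 and 100")
    for floor, label in GRADE_BANDS:
        if score >= floor:
            return label
    return "Below Level 3"
```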
The server may be provided with a mandarin chinese scoring rule, such as a scoring rule for mandarin chinese level, or a scoring rule for a specific score, etc. Accordingly, after obtaining the user conversation voice, the server can give a corresponding scoring result according to a corresponding scoring rule.
The method for displaying the mandarin chinese scoring result in the mandarin chinese test interface may be various, for example, a display frame may be popped up in the mandarin chinese test interface, and the mandarin chinese scoring result may be displayed in the display frame. In order to reduce interference of displaying the mandarin chinese scoring result with a conversation between the user and the conversation robot, the mandarin chinese scoring result may be displayed in a designated area of a mandarin chinese test interface. For example, the mandarin chinese scoring results are displayed in the lower right corner of the interface, upon which the user's voice conversation interaction with the conversation robot can continue.
It can be understood that, whether the user uses the scheme of the present application for Mandarin practice or for a simulated test, multiple rounds of voice interaction between the user and the conversation robot may be involved. Therefore, after the client sends the user conversation voice, if the server is required to provide the robot conversation voice to be output next by the conversation robot, the server may determine that next robot conversation voice while determining the Mandarin scoring result for the user conversation voice.
Correspondingly, the server may send the robot conversation voice to be output to the client while feeding back the mandarin chinese scoring result to the client, in which case the client may return to continue to perform step S202 as above, so that the conversation robot and the user may continue to perform voice interaction.
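The multi-round interaction, in which each server response carries both a scoring result and the next robot conversation voice and the client then returns to step S202, can be sketched as a loop. The `server` callable and `fake_server` are stand-ins introduced for illustration; playback, animation, and network transport are omitted.

```python
from typing import Callable, Iterable, List, Tuple

# A server is modelled as: user voice -> (score, next robot voice). Hypothetical.
Server = Callable[[bytes], Tuple[float, bytes]]


def run_session(first_robot_voice: bytes,
                user_replies: Iterable[bytes],
                server: Server) -> Tuple[List[float], List[bytes]]:
    """Run rounds of S202-S206 and return (scores shown, robot voices played)."""
    played: List[bytes] = []
    scores: List[float] = []
    robot_voice = first_robot_voice
    for reply in user_replies:
        played.append(robot_voice)          # S202/S203: obtain and play robot voice
        score, robot_voice = server(reply)  # S204/S205: capture reply, send to server
        scores.append(score)                # S206: display the scoring result
    return scores, played


def fake_server(user_voice: bytes) -> Tuple[float, bytes]:
    """Placeholder server: the 'score' is the clip length, not a real assessment."""
    return float(len(user_voice)), b"next:" + user_voice
```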
Optionally, an end identifier for triggering to end the session may be further provided in the mandarin chinese testing interface, and when it is detected that the end identifier is touched, the client may send a session end instruction to the server, where the session end instruction is used to notify the server of the end of the session. In this case, the server may not need to determine the next robot session voice.
Further, in order to enable the user to know the comprehensive mandarin level of the user session voice input in multiple rounds of voice interaction in the session, when the server receives a session ending instruction, the server may determine a comprehensive mandarin scoring result corresponding to at least one user session voice input in the session, and send the comprehensive mandarin scoring result to the client. Accordingly, the client may present the composite mandarin scoring result.
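The comprehensive scoring step can be sketched as follows. This is a minimal illustration only: the source does not state how the per-utterance scores are combined, so the plain average, the function name `composite_score`, and the 0 to 100 score range are all assumptions.

```python
from statistics import mean
from typing import List, Optional


def composite_score(turn_scores: List[float]) -> Optional[float]:
    """Combine per-utterance Mandarin scores into one session score.

    Assumption: a plain average; the source does not specify the formula.
    """
    if not turn_scores:
        # The session ended before any user conversation voice was scored.
        return None
    return round(mean(turn_scores), 1)


if __name__ == "__main__":
    # Hypothetical scores for three rounds of voice interaction.
    print(composite_score([82.0, 90.5, 77.5]))
```

A weighted average (for example, weighting longer utterances more heavily) would fit the same interface.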
Therefore, the simulated dialogue robot is displayed in the mandarin test interface, the robot conversation voice can be output through the dialogue robot, and the robot conversation voice can guide the user to carry out voice interaction, so that the user can input the user conversation voice used as the mandarin test according to the robot conversation voice, and the complexity of the mandarin test caused by reading only according to the text is avoided. Moreover, the testing process is carried out in a voice interaction mode between the conversation robot and the user, so that a voice signal which can reflect the actual mandarin speech condition of the user more truly can be obtained, and the actual mandarin speech level of the user can be evaluated more truly and accurately according to the user conversation voice input by the user.
The mandarin chinese evaluation method of the present application is described below from the server side. As shown in fig. 4, which shows a schematic flow chart of another embodiment of the mandarin chinese evaluating method according to the present application, the method of the present embodiment is applied to a server side, and the method of the present embodiment may include:
S401, obtaining a Mandarin assessment request sent by a client.
The mandarin test request carries user conversation voice input by a user of the client in a mandarin test interface, and the mandarin test interface displays the simulated conversation robot.
Of course, the client may also send the user session voice directly after obtaining the user session voice, without carrying the user session voice in the form of a mandarin chinese assessment request.
The process of the client obtaining the user session voice may refer to the related introduction of the client side, and is not described herein again.
S402, analyzing the user conversation voice to obtain a Mandarin scoring result of the user conversation voice.
The server may determine scoring information such as the Mandarin level of a segment of voice in various ways, and the present application does not limit the manner. For ease of understanding, several cases are described below as examples:
for example, in a possible case, when the text content of the voice that the user needs to input is known, the standard voice corresponding to the text may be compared with the conversation voice of the user to obtain the voice difference between the conversation voice of the user and the standard voice, and the mandarin chinese scoring result is determined according to the voice difference. For example, in a scenario of simulating a mandarin chinese test, a user conversation voice input by a user is actually a conversation voice matching a text in a test question, in which case a standard voice corresponding to the text in the test question is known, so that a mandarin chinese scoring result can be obtained by comparing the standard voice with the user conversation voice actually input by the user.
In yet another possible scenario, the server recognizes a text characterized by the user session speech using a speech recognition method, determines standard Mandarin speech using the text, then determines a degree of difference between the user session speech and the standard Mandarin speech, and obtains a Mandarin score result based on the degree of difference. Of course, the server may also combine a speech recognition algorithm or a trained mandarin speech scoring model to obtain a mandarin speech scoring result of the user conversation speech accurately.
The above are merely two example implementations of determining the Mandarin scoring result corresponding to the user conversation voice. In practical applications, there are various ways of determining the Mandarin level reflected in a given voice, and the present application does not limit the way in which the Mandarin scoring result is determined.
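As a rough sketch of the "compare against a standard voice" idea, the example below measures the difference between two feature sequences with dynamic time warping (DTW) and maps it to a score. DTW, the one-dimensional features, the `scale` factor, and the 0 to 100 range are assumptions chosen for illustration; the source only says that a voice difference is computed and does not name a method.

```python
from typing import Sequence


def dtw_distance(a: Sequence[float], b: Sequence[float]) -> float:
    """Dynamic time warping distance between two 1-D feature sequences."""
    inf = float("inf")
    n, m = len(a), len(b)
    # cost[i][j] = minimal accumulated difference aligning a[:i] with b[:j].
    cost = [[inf] * (m + 1) for _ in range(n + 1)]
    cost[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            d = abs(a[i - 1] - b[j - 1])
            cost[i][j] = d + min(cost[i - 1][j], cost[i][j - 1], cost[i - 1][j - 1])
    return cost[n][m]


def mandarin_score(user_feats: Sequence[float],
                   standard_feats: Sequence[float],
                   scale: float = 10.0) -> float:
    """Map the voice difference to a score; a larger difference gives a lower score."""
    if not user_feats or not standard_feats:
        return 0.0  # nothing to compare
    length = max(len(user_feats), len(standard_feats))
    diff = dtw_distance(user_feats, standard_feats) / length
    return max(0.0, 100.0 - scale * diff)


if __name__ == "__main__":
    # Hypothetical pitch-like features for the standard and the user voice.
    print(mandarin_score([1.0, 2.0, 3.0], [1.0, 2.0, 2.0, 3.0]))
```

In a real system the features would be multi-dimensional acoustic vectors, and a trained scoring model could replace the linear mapping.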
And S403, determining the robot conversation voice to be output by the conversation robot.
For example, the server may randomly determine an interactable voice as the robot conversation voice. For example, one question-form sentence is randomly selected as the robot conversation voice from a preset voice library that includes a plurality of question-form voices.
It can be understood that randomly selecting the voice may cause each round of voice interaction to head in a different direction, so that the rounds are unrelated to one another. This affects the fluency of the voice interaction and does not help sustain the user's interest in it. Therefore, the robot conversation voice to be output can be determined by considering the historical user conversation voice last input by the user and/or the Mandarin test mode in which the client is operating.
For example, as an alternative, before the server obtains the Mandarin test request, the server may further obtain a test mode indication sent by the client, where the test mode indication indicates the Mandarin test mode in which the client is currently operating. The Mandarin test modes include a Mandarin practice mode and a simulated Mandarin examination mode.
Accordingly, in the case that the client is in the mandarin chinese exercise mode, the server may determine the reply voice of the user conversation voice matched from the interactive voice library as the robot conversation voice to be output by the dialogue robot. Wherein, the interactive voice library comprises a plurality of voices.
For example, the server may parse the content of the user conversation voice through a voice recognition technology, an artificial intelligence algorithm, and the like, and determine the conversation voice matching the content from the interactive voice library according to the parsed content. The conversation voice matched from the interactive voice library is the reply voice with which the robot responds after the user inputs the user conversation voice. For example, if the user conversation voice is "The weather is very good today", the matched reply voice may be "Good weather is perfect for an outing. Where would you like to go?".
Similarly, when the client is in the simulated Mandarin examination mode, the server can query the question prompt voice currently to be output from the question prompt voice library associated with the Mandarin test question library. The Mandarin test question library can include a plurality of test questions for testing Mandarin, each test question can be associated with a question prompt voice, and the question prompt voice library includes a plurality of question prompt voices. The question prompt voice is used to give a voice prompt about the content to be tested by the test question, so that the user can understand the test question and learn its specific content or meaning.
In the simulated Mandarin examination mode, the server may determine the test question to be tested each time in sequence according to the question output progress. For example, after receiving the user conversation voice fed back by the client, the server may determine the next test question to be tested according to the order of the test questions, and obtain the question prompt voice associated with that test question.
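The mode-dependent choice of the next robot conversation voice might be sketched as below. Everything here is hypothetical: the mode names, the tiny in-memory libraries, and the keyword matching that stands in for voice recognition and content matching. Text strings stand in for voice data.

```python
import random
from typing import Optional

# Hypothetical libraries; text stands in for stored voice clips.
REPLY_LIBRARY = {
    "weather": "Good weather is perfect for an outing. Where would you like to go?",
}
QUESTION_PROMPTS = [
    "Please read the following words aloud.",
    "Please talk about your hometown.",
]
FALLBACK_QUESTIONS = ["What did you do today?", "What is your favorite food?"]


def next_robot_utterance(mode: str,
                         recognized_text: str = "",
                         question_index: int = 0) -> Optional[str]:
    """Pick the next robot conversation voice for the given test mode."""
    if mode == "exam":
        # Simulated examination: follow the fixed order of the test questions.
        if question_index < len(QUESTION_PROMPTS):
            return QUESTION_PROMPTS[question_index]
        return None  # no questions left
    # Practice mode: reply to what the user just said when a match exists.
    lowered = recognized_text.lower()
    for keyword, reply in REPLY_LIBRARY.items():
        if keyword in lowered:
            return reply
    # No match: fall back to a randomly chosen question-form sentence.
    return random.choice(FALLBACK_QUESTIONS)


if __name__ == "__main__":
    print(next_robot_utterance("practice", "The weather is very good today"))
    print(next_robot_utterance("exam", question_index=1))
```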
S404, sending the Mandarin scoring result and the robot conversation voice to the client, so that the client displays the Mandarin scoring result on the Mandarin test interface and simulates, on the Mandarin test interface, an animation of the conversation robot outputting the robot conversation voice.
It can be understood that after the server sends the robot conversation voice to the client, the client may continue to simulate the conversation robot to output the robot conversation voice to guide the user to continue replying to the voice to obtain the user conversation voice, so as to repeat the process continuously to implement multiple voice interactions with the user.
In practical applications, before the server receives the user conversation voice sent by the client, when the server determines that the client is presenting the Mandarin test interface, or the server is notified that the client needs to start the Mandarin test, the server can also determine the robot conversation voice to be output and send it to the client, so that the client simulates an animation of the conversation robot outputting the robot conversation voice and plays the robot conversation voice to guide the user to input voice.
For example, when the client detects that the user initiates a mandarin test, the client may send the mandarin test start instruction to the server, and the server may determine a robot conversation voice and send the same to the client in response to the mandarin test start instruction.
Optionally, in order to improve the reality of a scene in which the conversation robot interacts with the user in the client, after the server determines the conversation voice of the robot, the server may further generate the robot posture feature according to the conversation voice of the robot.
For example, the server may query, according to the emotion type expressed by the robot conversation voice, robot pose features matching the emotion type from a preset robot pose feature library. For example, if the robot conversation voice is "you are really doing so", the robot pose feature may be pose data that presents "a happy facial expression, with the mouth covered by the hands". For another example, the server may input the robot conversation voice into a preset pose feature generation model and obtain suitable robot pose features output by the model; or the server may use a pose feature generation algorithm to construct robot pose features suitable for the current robot conversation voice, and the like.
Correspondingly, when the server sends the robot dialogue voice to the client, the server can also send the robot gesture feature to the client, so that the client can display the more vivid external gesture feature of the dialogue robot.
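The library lookup described above could look roughly like this. The emotion labels, the keyword-based `classify_emotion` placeholder, and the pose field names are all hypothetical; the source only says that pose features matching the emotion type are queried from a preset library.

```python
from typing import Dict

# Hypothetical preset robot pose feature library, keyed by emotion type.
POSE_LIBRARY: Dict[str, Dict[str, str]] = {
    "happy": {"expression": "smile", "gesture": "cover_mouth_with_hands"},
    "neutral": {"expression": "neutral", "gesture": "idle"},
}


def classify_emotion(robot_text: str) -> str:
    """Placeholder emotion classifier (assumption: simple keyword match)."""
    happy_words = ("great", "wonderful", "well done")
    if any(word in robot_text.lower() for word in happy_words):
        return "happy"
    return "neutral"


def build_robot_payload(robot_text: str) -> Dict[str, object]:
    """Bundle the robot conversation voice with matching pose features."""
    emotion = classify_emotion(robot_text)
    pose = POSE_LIBRARY.get(emotion, POSE_LIBRARY["neutral"])
    return {"robot_voice": robot_text, "pose": pose}


if __name__ == "__main__":
    print(build_robot_payload("Well done, that was great!"))
```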
Therefore, in this embodiment, the simulated conversation robot is displayed in the Mandarin test interface of the client, the server generates the robot conversation voice for the conversation robot, and the robot conversation voice can guide the user to perform voice interaction, so that the user can input the user conversation voice for the Mandarin test according to the robot conversation voice, and complexity of the Mandarin test caused by reading only according to a text is avoided. Moreover, the testing process is carried out through voice interaction between the conversation robot and the user, so that a voice signal that more truly reflects the actual Mandarin level of the user is obtained, and the server can evaluate the actual Mandarin level of the user more truly and accurately according to the user conversation voice input by the user.
In the embodiment of the present application, in order to improve the flexibility of the user in testing the mandarin level, the client of the present application has two mandarin test modes, one is a mandarin practice mode, and the other is a simulated mandarin examination mode. The user may choose to enter different mandarin chinese test modes as desired.
For example, after the client is started, the client may first present a main interface, where a first test mode option and a second test mode option are presented in the main interface, the first test mode option is an option for triggering entry into a mandarin practice mode, and the second test mode option is an option for triggering entry into a simulated mandarin test mode.
Correspondingly, if it is detected that the user clicks or touches the first test mode option, the client responds to the first test mode option selected by the user and displays a Mandarin practice test interface.
And if the user clicks or touches a second test mode option in the main interface, the client responds to the second test mode option selected by the user and displays the simulated Mandarin test interface.
It can be understood that the operations performed by the client and the server may also be different in different mandarin test modes, and for convenience of understanding, the mandarin evaluation method of the present application is described below with respect to the two mandarin test modes, respectively.
The following first introduces a mandarin evaluation method in the mandarin practice mode. As shown in fig. 5, which shows a flow interaction diagram of a mandarin chinese evaluation method according to the present application, the method of this embodiment may include:
S501, the client displays the main interface.
The main interface shows a first test mode option for triggering entry into the Mandarin practice mode and a second test mode option for triggering entry into the simulated Mandarin test mode.
S502, the client displays a Mandarin Chinese exercise test interface in response to the first test mode option selected by the user.
S503, the client sends a test mode indication to the server.
The test mode indication is to indicate that the client is in a Mandarin Chinese exercise mode.
The execution sequence of steps S503 and S502 is not limited to that shown in fig. 5. In practical applications, step S503 may be executed first, and then step S502 is executed, for example, when the client displays data required for the mandarin chinese exercise test interface or some image contents in the interface need to be provided by the server, step S503 may be executed first, so that the server returns data required for displaying the mandarin chinese exercise test interface to the client, and then step S502 is executed. Of course, steps S502 and S503 may be performed simultaneously.
In addition, the step S503 may also be an optional step, for example, the step S503 may not be executed separately, and the test mode indication is fed back to the server when the subsequent client generates the user session voice that needs to be analyzed by the server, which is also applicable to the embodiment.
S504, the client side obtains the voice corpus from the preset conversation corpus, determines the robot conversation voice to be output by the conversation robot according to the voice corpus, and calls preset robot posture characteristics.
It can be understood that, since the client has not yet started the voice interaction between the user and the conversation robot, there is currently no historical user conversation voice last input by the user. In this case, the client may select a voice corpus from a preset conversation corpus as the voice with which the conversation robot opens the conversation.
The conversational corpus includes a plurality of speech corpora.
For example, the client may randomly select speech corpora from a conversational corpus.
Optionally, the conversational corpus may include at least one speech corpus corresponding to multiple topics. For example, topics may include: weather, occupation, interests, etc. Accordingly, the client can select a target topic to be interacted from multiple topics, and then select a voice corpus matched with the target topic from the conversation corpus.
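A topic-based selection of the opening voice corpus might be sketched as follows. The corpus contents, the function name `pick_opening`, and the use of random choice for both the topic and the sentence are assumptions for illustration; text stands in for voice data.

```python
import random
from typing import Dict, List, Optional, Tuple

# Hypothetical conversation corpus: topic -> candidate opening sentences.
CONVERSATION_CORPUS: Dict[str, List[str]] = {
    "weather": ["What is the weather like today?"],
    "occupation": ["What kind of work do you do?"],
    "interests": ["What do you like to do in your free time?"],
}


def pick_opening(corpus: Dict[str, List[str]],
                 topic: Optional[str] = None) -> Tuple[str, str]:
    """Return (target topic, opening sentence) for the conversation robot."""
    # Use the requested topic when it exists, otherwise pick one at random.
    target = topic if topic in corpus else random.choice(sorted(corpus))
    return target, random.choice(corpus[target])


if __name__ == "__main__":
    print(pick_opening(CONVERSATION_CORPUS, "weather"))
    print(pick_opening(CONVERSATION_CORPUS))
```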
The client can preset some basic pose features of the conversation robot. In this case, the client can directly obtain the preset robot pose features, which improves the loading efficiency of the Mandarin practice test interface and keeps the user from waiting too long.
And S505, the client simulates the dialogue robot to output the animation of the robot conversation voice in the Mandarin practice test interface according to the acquired robot posture characteristics, and plays the robot conversation voice.
In addition to the expressions and actions of the conversation robot, the animation displayed in the Mandarin practice test interface may also show the text of the robot conversation voice output by the conversation robot, as shown by the text 302 in fig. 3. Explanatory illustrations associated with the robot conversation voice may also be shown to help the user understand the robot conversation voice. Other related illustrations that help the user understand, or that prompt the user to communicate, may be shown as well, which is not limited herein.
It should be noted that the above steps S504 and S505 may be executed while loading the mandarin chinese exercise test interface, so that after the mandarin chinese exercise test interface is displayed, an animation of the dialog robot outputting the conversation voice may be quickly presented, so as to improve the dialog effect of the user with the dialog robot.
It can be understood that this embodiment is described by taking as an example the case where, after the Mandarin practice test interface is presented, the first robot conversation voice and robot pose to be output are determined by the client. In practical applications, while or after the client presents the Mandarin practice test interface, the server may instead feed back the robot conversation voice and the robot pose to the client once it confirms that the client needs to present the Mandarin practice test interface, so that the conversation robot outputs the robot conversation voice and the related pose in the Mandarin practice test interface.
S506, the client acquires the user conversation voice input by the user aiming at the robot conversation voice.
As shown in fig. 3, after detecting that the user clicks the icon for inputting the voice, the client may collect the voice input by the user, and use the collected voice as the user conversation voice. Of course, other ways of obtaining the user conversation voice are also applicable to the present embodiment.
Optionally, in order to avoid that the analysis accuracy is affected by the fact that the data size of the user conversation voice is too large due to too long time for the user to input the voice, the client may further set the maximum input duration for the user to input the voice, and if the maximum input duration is exceeded, the client may end the collection of the user conversation voice.
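The maximum input duration can be sketched as a simple cap on collected audio. This is only an illustration: the byte chunks stand in for microphone frames, and the 30-second limit and one-second chunk size are assumed values, not figures stated by the source.

```python
from typing import Iterable, Tuple


def collect_user_voice(chunks: Iterable[bytes],
                       chunk_seconds: float = 1.0,
                       max_seconds: float = 30.0) -> Tuple[bytes, float]:
    """Collect audio chunks until the input ends or the maximum duration is hit."""
    collected = []
    elapsed = 0.0
    for chunk in chunks:
        if elapsed + chunk_seconds > max_seconds:
            # Maximum input duration exceeded: stop collecting.
            break
        collected.append(chunk)
        elapsed += chunk_seconds
    return b"".join(collected), elapsed


if __name__ == "__main__":
    # Simulate a user who keeps talking for 40 seconds.
    audio, seconds = collect_user_voice([b"\x00"] * 40)
    print(len(audio), seconds)
```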
S507, the client sends a Mandarin assessment request to the server.
The mandarin chinese evaluation request carries the user session speech input by the user.
S508, the server analyzes the user conversation voice to obtain a Mandarin scoring result of the user conversation voice.
S509, the server parses the user conversation voice, and generates a conversation reply voice for the conversation robot to reply to the user conversation voice.
For example, the conversation reply voice may be a reply voice that the server determines from the interactive voice library as a reply to the historical user conversation voice.
In this embodiment, the robot conversation voice generated by the server is a voice signal for replying the user conversation voice, so that the conversation robot can make a corresponding reply according to the voice input by the user.
And S510, the server determines the robot posture characteristics suitable for the conversation robot according to the generated robot conversation voice.
The steps S508 to S510 can refer to the related description of the previous embodiment, and are not described herein again.
And S511, the server sends the robot conversation voice, the robot posture characteristic and the Mandarin scoring result to the client.
S512, the client displays the Mandarin scoring result in the Mandarin practice test interface, takes the conversation reply voice as the robot conversation voice to be output by the conversation robot, and returns to step S505 until an input practice ending instruction is detected.
It is understood that after the client obtains the user conversation voice input by the user, the robot conversation voice output by the conversation robot of the client is determined by the server. That is, in the case that there is a historical user conversation voice, the client determines the conversation reply voice returned by the server for the historical user conversation voice as the robot conversation voice to be output by the conversation robot.
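The loop formed by steps S505 to S512 can be sketched from the client's point of view as below. The `fake_server` function, the payload keys, and the use of a finite input list to represent the practice ending instruction are all assumptions; text stands in for voice data.

```python
from typing import Callable, Dict, List, Tuple


def fake_server(user_voice: str) -> Dict[str, object]:
    """Stand-in for the server: returns a score, a reply, and a pose (all made up)."""
    return {"score": 80.0, "robot_voice": f"You said: {user_voice}", "pose": "neutral"}


def run_practice_session(opening: str,
                         user_inputs: List[str],
                         server: Callable[[str], Dict[str, object]]
                         ) -> List[Tuple[str, str, float]]:
    """Repeat: robot speaks, user answers, server scores and supplies the next reply."""
    robot_voice = opening  # the first robot voice is chosen by the client (S504)
    transcript = []
    for user_voice in user_inputs:  # the list ending models the ending instruction
        reply = server(user_voice)  # S507 to S511
        transcript.append((robot_voice, user_voice, float(reply["score"])))
        robot_voice = str(reply["robot_voice"])  # S512: reply becomes next robot voice
    return transcript


if __name__ == "__main__":
    for turn in run_practice_session("What is the weather like today?",
                                     ["It will rain", "Yes, I will"], fake_server):
        print(turn)
```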
In order to enable a clearer understanding of the embodiment of fig. 5, an application scenario is described below.
Fig. 6 is a schematic diagram illustrating an application scenario of the mandarin chinese evaluation method of the present application.
As shown in fig. 6, the client presents a main interface 610 including a dialog robot 611 displayed therein, a "chatting mode" icon 612 for selecting to enter a mandarin practice mode, and a "test mode" icon 613 for the user to select to enter a simulated mandarin test mode.
If the user wishes to practice Mandarin and learn their own Mandarin level by chatting with the conversation robot, the user may click the "chatting mode" icon 612 in the main interface 610. In this case, the client selects a topic and presents a Mandarin practice test interface 620 related to the topic.
As can be seen from fig. 6, the topic selected in the Mandarin practice test interface 620 of the client is a topic related to weather, and the client acquires a start sentence related to the topic and a voice corresponding to the start sentence. The client then simulates a dynamic effect of the conversation robot 621 outputting the start sentence "What is the weather like today?" and plays the corresponding voice. At the same time, the text 622 corresponding to the start sentence is displayed in the practice test interface.
A user voice input icon 623 is also arranged in the Mandarin practice test interface 620. When the user touches and presses the user voice input icon (or after the conversation robot completes its voice output), the client can collect the user conversation voice input by the user. To limit the duration of the user's voice input, the remaining voice input duration is shown below the user voice input icon 623 after voice collection starts, for example "Countdown: 29 seconds".
After the client acquires the user session voice of the user, the client may send the user session voice to the server 630, as shown in step S631 in fig. 6.
Correspondingly, the server 630 parses the user conversation voice, determines a Mandarin score, and determines the robot conversation voice to be output next by the client, together with robot pose features such as robot actions and robot expressions. For example, if the user conversation voice input by the user is "The forecast says it will rain today", the server may determine that the robot conversation voice to be output is "Remember to bring an umbrella when it rains".
Then, the server transmits the mandarin chinese score, the robot conversation voice to be output, and the robot pose feature to the client, as shown in step S632 of fig. 6.
After the client obtains the data returned by the server, the client may update the Mandarin practice test interface, as shown in the updated Mandarin practice test interface 640 in fig. 6. In addition to updating the mouth shape, actions, and expression of the conversation robot 641 according to the robot conversation voice and the robot pose features returned from the server, the Mandarin practice test interface 640 displays the Mandarin score 642 returned by the server for the user's input "The forecast says it will rain today". The interface 640 also displays the text 643 of the voice output by the conversation robot, such as "Remember to bring an umbrella when it rains".
To allow the user to continue chatting with the conversation robot, the Mandarin score 642 may be replaced in the interface 640 by the user voice input icon (as shown in interface 620, not shown in the interface 640) after the conversation robot completes the voice output of "Remember to bring an umbrella when it rains". Alternatively, the user voice input icon may be displayed at the same time as the Mandarin score on the interface 640, so that the foregoing operations can be repeated and the user can continue chatting with the conversation robot.
Further, after the user has chatted with the conversation robot for a period of time, if the user wishes to end the Mandarin chat exercise, the icon for ending the conversation may be clicked, as shown by "end conversation" icon 644 in interface 640. In this case, the client may request the server for a comprehensive mandarin score for the mandarin exercise, as shown in step S651 of fig. 6.
The server may combine the Mandarin scores of the user conversation voices input by the user in each round of the current conversation, determine the comprehensive Mandarin score for this conversation, and feed it back to the client, as shown in step S652 in fig. 6.
Accordingly, the client may present a test interface 660 in which the comprehensive mandarin scores of the user's multiple rounds of conversation with the conversation robot may be displayed, such as the content displayed in the test result display box 661 in the interface 660.
A mandarin evaluation method under the simulated mandarin test mode is described below. As shown in fig. 7, which shows another flow interaction diagram of a mandarin chinese evaluation method according to the present application, the method of this embodiment may include:
S701, the client displays the main interface.
The main interface shows a first test mode option for triggering entry into the Mandarin practice mode and a second test mode option for triggering entry into the simulated Mandarin test mode.
S702, the client responds to the second test mode option selected by the user, and loads and displays the Mandarin simulation test interface.
S703, the client sends the examination question request to the server.
The test question request is used for requesting the server to distribute test questions of Mandarin simulation tests and question prompt voices related to the test questions.
The test question request may indicate that the client is in a simulated Mandarin test mode, and thus, the test question request corresponds to a test mode indication.
Similar to the embodiment of fig. 5, the execution sequence of steps S703 and S702 is not limited to that shown in fig. 7, and the two steps may be performed simultaneously or in the reverse order.
S704, the server selects a set of simulation test questions from the simulation test question library, and determines the test questions to be output according to the sequence of the test questions in the simulation test questions.
S705, the server obtains question prompt voices to be output from a question prompt voice library associated with the simulated test question library according to test questions to be output, and constructs robot posture characteristics related to the question prompt voices.
Wherein, the question prompt voice to be output is used for carrying out voice prompt on the test question to be output.
The process of constructing the robot posture feature related to the topic prompt voice is similar to the process of determining the robot posture feature related to the robot conversation voice, and is not repeated herein.
It can be understood that this embodiment is one implementation of querying the question prompt voice currently to be output from the question prompt voice library associated with the Mandarin test question library. That is, in this embodiment, the server selects a set of simulated test questions from the simulated test question library, and then takes the test questions in the selected set, in sequence, as the test questions to be output. In practical applications, the simulated test question library may instead directly include a plurality of test questions; the server can select a plurality of test questions to be tested, and then determine the test questions to be output in sequence according to the order of those test questions.
S706, the server sends the test question to be output, the question prompt voice and the robot posture characteristic to the client.
And S707, the client simulates a dialog robot to output animation of the question prompt voice and display the test question on a Mandarin simulation test interface according to the acquired robot posture characteristics, and plays the question prompt voice.
It can be understood that, in this embodiment, the client outputs the question prompt voice through the conversation robot and, at the same time, displays the content of the test question on the Mandarin simulation test interface.
For example, after the server queries the current question prompting voice to be output from the question prompting voice library associated with the Mandarin test question library, the server directly sends the current question prompting voice to the client, and the client prompts the test question currently tested by the user through outputting the question prompting voice.
S708, the client obtains the user conversation voice input by the user.
This step can be referred to the related description of the previous embodiment, and is not described herein again.
S709, the client sends a request message to the server.
The request message carries a Mandarin test request and a test question request, and the Mandarin test request carries the user conversation voice.
Of course, the client may also only send the mandarin test request, and in this case, the mandarin test request may also trigger the server to determine the next test question to be output and the question prompt voice.
S710, the server responds to the Mandarin test request, analyzes the user conversation voice, and obtains a Mandarin scoring result of the user conversation voice.
The step S710 can refer to the related description of the previous embodiment, and is not described herein again.
S711, the server responds to the examination question request, determines the test question to be output according to the sequence of the test questions in the simulated test question, obtains question prompt voices related to the test question to be output, and constructs robot posture characteristics related to the question prompt voices.
And S712, the server sends the mandarin grading result, the test questions to be output, the question prompt voice and the robot posture characteristics to the client.
S713, the client displays the Mandarin scoring result and returns to step S707, until the user ends the Mandarin simulation test or the Mandarin simulation test is completed.
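The server-side bookkeeping behind steps S704 and S710 to S712 might be sketched as a small session object that hands out test questions in order and returns each score together with the next question. The class name `ExamSession`, its methods, and the response keys are hypothetical; scoring itself is passed in as a number so the sketch stays self-contained.

```python
from typing import Dict, List, Optional


class ExamSession:
    """Tracks question order and scores for one simulated Mandarin test."""

    def __init__(self, questions: List[str]) -> None:
        self._questions = list(questions)  # the selected set of test questions
        self._next_index = 0
        self.scores: List[float] = []

    def next_question(self) -> Optional[str]:
        """Return the next test question in order, or None when finished."""
        if self._next_index >= len(self._questions):
            return None
        question = self._questions[self._next_index]
        self._next_index += 1
        return question

    def handle_answer(self, score: float) -> Dict[str, object]:
        """Record the score for the answered question and attach the next one."""
        self.scores.append(score)
        return {"score": score, "next_question": self.next_question()}


if __name__ == "__main__":
    session = ExamSession(["Read these words aloud.", "Describe your hometown."])
    print(session.next_question())      # first question (S704 to S706)
    print(session.handle_answer(85.0))  # score plus second question (S710 to S712)
    print(session.handle_answer(78.0))  # score, no question left
```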
In another aspect, the present application further provides a Mandarin evaluation apparatus corresponding to the client side of the Mandarin evaluation method of the present application. Fig. 8 shows a schematic structural diagram of an embodiment of the Mandarin evaluation apparatus; the apparatus of this embodiment is applied to a client and includes:
an interface display unit 801, configured to display a Mandarin test interface, where the Mandarin test interface displays a simulated conversation robot;
a robot voice obtaining unit 802, configured to obtain a robot conversation voice to be output by the conversation robot, where the robot conversation voice is a voice signal used for guiding a user to perform voice interaction;
a dialogue scene simulation unit 803, configured to simulate, in the Mandarin test interface, an animation of the conversation robot outputting the robot conversation voice, and play the robot conversation voice;
a user voice obtaining unit 804, configured to obtain the user conversation voice input by the user in response to the robot conversation voice;
a user voice sending unit 805, configured to send the user conversation voice to a server;
a scoring result display unit 806, configured to display, on the Mandarin test interface, the Mandarin scoring result of the user conversation voice returned by the server.
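One way to picture how units 801 to 806 cooperate in a single round is the sketch below. The class `MandarinTestClient`, its injected callables, and the text log that stands in for the on-screen interface are all hypothetical names introduced for illustration; the application does not prescribe this structure.

```python
from typing import Callable, List


class MandarinTestClient:
    """Hypothetical wiring of units 801-806; every callable is an injected stand-in."""

    def __init__(
        self,
        fetch_robot_voice: Callable[[], str],
        record_user_voice: Callable[[str], bytes],
        send_to_server: Callable[[bytes], float],
    ) -> None:
        self.fetch_robot_voice = fetch_robot_voice  # role of unit 802
        self.record_user_voice = record_user_voice  # role of unit 804
        self.send_to_server = send_to_server        # role of unit 805; returns the score
        self.log: List[str] = []                    # stands in for the on-screen interface

    def run_round(self) -> float:
        self.log.append("show test interface with robot")    # role of unit 801
        robot_voice = self.fetch_robot_voice()
        self.log.append(f"animate and play: {robot_voice}")  # role of unit 803
        user_voice = self.record_user_voice(robot_voice)
        score = self.send_to_server(user_voice)
        self.log.append(f"show score: {score}")              # role of unit 806
        return score
```

Injecting the callables keeps the sketch independent of any particular audio, network, or rendering library.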
In one possible implementation, the robot voice obtaining unit includes:
a robot conversation voice receiving unit, configured to obtain the robot conversation voice sent by the server for output by the conversation robot, where the robot conversation voice is either a voice signal to be output by the conversation robot that the server determines according to the historical user conversation voice most recently input by the user, or the question prompt voice currently to be output that the server queries from a question prompt voice library.
In yet another possible implementation, the dialogue scene simulation unit includes:
a posture obtaining unit, configured to obtain robot posture features associated with the robot conversation voice, where the robot posture features are either preset in the client and associated with the robot conversation voice, or generated by the server according to the robot conversation voice;
an animation simulation unit, configured to simulate, in the Mandarin test interface and according to the robot posture features, the animation of the conversation robot outputting the robot conversation voice;
a voice playing unit, configured to play the robot conversation voice while the animation simulation unit simulates the animation of the conversation robot outputting the robot conversation voice.
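The requirement that the voice plays while the animation runs can be illustrated with two concurrent workers. This is a sketch only: the function `play_with_animation`, the use of threads, and the string events that stand in for rendering and audio output are assumptions, not details from the application.

```python
import threading
from typing import List


def play_with_animation(posture_features: List[str], voice: bytes) -> List[str]:
    """Run animation and voice playback concurrently; returns an event log."""
    events: List[str] = []
    lock = threading.Lock()
    start = threading.Barrier(2)  # both workers proceed only once both are ready

    def animate() -> None:
        start.wait()
        for feature in posture_features:
            with lock:
                events.append(f"frame:{feature}")  # stand-in for rendering one pose

    def play() -> None:
        start.wait()
        with lock:
            events.append(f"audio:{len(voice)} bytes")  # stand-in for audio output

    workers = [threading.Thread(target=animate), threading.Thread(target=play)]
    for worker in workers:
        worker.start()
    for worker in workers:
        worker.join()
    return events
```

The barrier releases both workers together, so neither the animation nor the playback gets a head start; the relative order of audio and frame events in the log is not fixed.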
On the basis of the above embodiment, in a possible implementation, the robot voice obtaining unit further includes:
a robot voice generating unit, configured to, when no historical user conversation voice currently exists, obtain a voice corpus from a preset conversation corpus and determine the voice corpus as the robot conversation voice to be output by the conversation robot;
the robot conversation voice receiving unit is specifically configured to, when the historical user conversation voice exists, determine the conversation reply voice returned by the server for the historical user conversation voice as the robot conversation voice to be output by the conversation robot, where the conversation reply voice is a voice determined by the server from an interactive voice library as the reply to the historical user conversation voice.
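The two cases above (no history, so use the preset corpus; history exists, so use the server's reply) amount to a simple branch. In the sketch below, the function name `choose_robot_voice`, the callable `request_server_reply`, and the choice of the first corpus entry as the opening line are assumptions made for illustration; the application does not say which corpus entry is selected.

```python
from typing import Callable, Optional, Sequence


def choose_robot_voice(
    history_voice: Optional[bytes],
    preset_corpus: Sequence[bytes],
    request_server_reply: Callable[[bytes], bytes],
) -> bytes:
    """Pick the robot conversation voice for the next turn."""
    if history_voice is None:
        # No earlier user speech: open the conversation from the local corpus.
        # Taking the first entry is an assumption of this sketch.
        return preset_corpus[0]
    # Earlier user speech exists: use the reply the server returns for it.
    return request_server_reply(history_voice)
```

For example, `choose_robot_voice(None, [b"hi"], reply_fn)` returns `b"hi"` without contacting the server.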
Optionally, the apparatus further includes:
a main interface display unit, configured to display a main interface before the interface display unit displays the Mandarin test interface, where the main interface displays a first test mode option and a second test mode option;
the interface display unit is specifically configured to display a Mandarin practice test interface in response to the user selecting the first test mode option.
In another possible implementation, the robot conversation voice receiving unit is specifically configured to obtain the robot conversation voice returned by the server, where the robot conversation voice is the question prompt voice currently to be output that the server queries from a question prompt voice library associated with a Mandarin test library.
Optionally, the apparatus may further include:
a main interface display unit, configured to display a main interface before the interface display unit displays the Mandarin test interface, where the main interface displays a first test mode option and a second test mode option, the first test mode option is an option for triggering entry into a Mandarin practice mode, and the second test mode option is an option for triggering entry into a simulated Mandarin test mode;
the interface display unit is specifically configured to display a Mandarin simulated test interface in response to the user selecting the second test mode option.
Further, the apparatus may further include:
an examination question request unit, configured to send an examination question request to the server while the interface display unit displays the Mandarin simulated test interface, where the examination question request is used to request the server to allocate a test question of the Mandarin simulated test and the question prompt voice associated with the test question;
the robot conversation voice receiving unit is specifically configured to obtain the test question returned by the server and the robot conversation voice associated with the test question;
correspondingly, the apparatus further includes:
a question display unit, configured to display the test question in the Mandarin test interface while the dialogue scene simulation unit or the animation simulation unit simulates, in the Mandarin test interface, the animation of the conversation robot outputting the robot conversation voice.
In another aspect, the present application further provides another Mandarin evaluation apparatus, corresponding to the server-side operations of the Mandarin evaluation method of the present application.
Fig. 9 shows a schematic structural diagram of another Mandarin evaluation apparatus according to the present application; the apparatus is applied to a server and includes:
a request obtaining unit 901, configured to obtain a Mandarin assessment request sent by a client, where the Mandarin assessment request carries the user conversation voice input by a user of the client in a Mandarin test interface, and the Mandarin test interface displays a simulated conversation robot;
a voice parsing unit 902, configured to parse the user conversation voice to obtain a Mandarin scoring result of the user conversation voice;
a machine voice determination unit 903, configured to determine a robot conversation voice to be output by the conversation robot;
a data sending unit 904, configured to send the Mandarin scoring result and the robot conversation voice to the client, so that the client displays the Mandarin scoring result on the Mandarin test interface and simulates, on the Mandarin test interface, an animation of the conversation robot outputting the robot conversation voice.
Optionally, the apparatus may further include:
a mode receiving unit, configured to obtain a test mode indication sent by the client before the request obtaining unit obtains the Mandarin assessment request sent by the client, where the test mode indication indicates the Mandarin test mode in which the client is currently located, and the Mandarin test mode includes a Mandarin practice mode and a simulated Mandarin test mode;
the machine voice determination unit includes:
a first machine voice determining unit, configured to, when the client is in the Mandarin practice mode, match a reply voice for the user conversation voice from an interactive voice library and determine the reply voice as the robot conversation voice to be output by the conversation robot;
a second machine voice determining unit, configured to, when the client is in the simulated Mandarin test mode, query the question prompt voice currently to be output from a question prompt voice library associated with the Mandarin test library.
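The division of work between the first and second machine voice determining units can be sketched as a dispatch on the test mode. Everything named here is hypothetical: `MandarinMode`, `determine_robot_voice`, the dictionary lookup used as the matching rule, the fallback line, and the use of text strings in place of audio are assumptions for illustration, since the application does not specify how matching is done.

```python
from enum import Enum
from typing import Dict, List


class MandarinMode(Enum):
    PRACTICE = "practice"
    SIMULATED_TEST = "simulated_test"


def determine_robot_voice(
    mode: MandarinMode,
    user_utterance: str,
    interactive_library: Dict[str, str],
    prompt_library: List[str],
    question_index: int,
) -> str:
    """Return the robot's next line; strings stand in for voice signals."""
    if mode is MandarinMode.PRACTICE:
        # Practice mode: look up a reply matched to what the user said.
        # Exact-key lookup and the fallback line are assumptions of this sketch.
        return interactive_library.get(user_utterance, "Could you say that again?")
    # Simulated test mode: return the prompt for the question currently due.
    return prompt_library[question_index]
```

A mode indication received before the first assessment request would select which branch is used for the rest of the session.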
Optionally, the apparatus may further include a posture construction unit, configured to determine robot posture features of the conversation robot.
In another aspect, the present application further provides a terminal. Fig. 10 shows a schematic structural diagram of a terminal to which the Mandarin evaluation method of the embodiments of the present application is applicable. In Fig. 10, the terminal 1000 may include a processor 1001 and a memory 1002.
Optionally, the terminal may further include: a communication interface 1003, an input unit 1004, a display 1005, and a communication bus 1006. The processor 1001, the memory 1002, the communication interface 1003, the input unit 1004, and the display 1005 communicate with each other via the communication bus 1006.
In the embodiments of the present application, the processor 1001 may be a central processing unit, an application-specific integrated circuit, or another programmable logic device.
The processor may call a program stored in the memory 1002; in particular, the processor may perform the operations performed on the terminal side in Figs. 2 and 5 to 7.
The memory 1002 is used for storing one or more programs, which may include program code comprising computer operation instructions. In this embodiment, the memory stores at least programs for implementing the following functions:
displaying a Mandarin test interface, where the Mandarin test interface displays a simulated conversation robot;
acquiring robot conversation voice to be output by the conversation robot, where the robot conversation voice is a voice signal for guiding a user to perform voice interaction;
simulating, in the Mandarin test interface, an animation of the conversation robot outputting the robot conversation voice, and playing the robot conversation voice;
obtaining the user conversation voice input by the user in response to the robot conversation voice;
sending the user conversation voice to a server;
displaying, on the Mandarin test interface, the Mandarin scoring result of the user conversation voice returned by the server.
In one possible implementation, the memory 1002 may include a program storage area and a data storage area, where the program storage area may store an operating system and the like, and the data storage area may store data created during use of the terminal.
The communication interface 1003 may be an interface of a communication module, such as an interface of a GSM module.
The terminal may further include an input unit 1004, which may include a touch sensing unit, a keyboard, and the like.
The display 1005 includes a display panel, such as a touch display panel.
Of course, the terminal structure shown in Fig. 10 does not limit the terminal in the embodiments of the present application; in practical applications, the terminal may include more or fewer components than those shown in Fig. 10, or may combine some components.
In yet another aspect, the present application further provides a server, which may comprise a memory and a processor.
The processor is configured to call and execute the program stored in the memory;
the memory is configured to store the program, the program being used at least for:
obtaining a Mandarin assessment request sent by a client, where the Mandarin assessment request carries the user conversation voice input by a user of the client in a Mandarin test interface, and the Mandarin test interface displays a simulated conversation robot;
analyzing the user conversation voice to obtain a Mandarin scoring result of the user conversation voice;
determining robot conversation voice to be output by the conversation robot;
sending the Mandarin scoring result and the robot conversation voice to the client, so that the client displays the Mandarin scoring result on the Mandarin test interface and simulates, on the Mandarin test interface, an animation of the conversation robot outputting the robot conversation voice.
It will be appreciated that the server may also include a communication interface, a display, an input unit, and the like; its specific structure may be similar to that shown in Fig. 10 and is not described again here.
In another aspect, the present application further provides a storage medium storing computer-executable instructions that, when loaded and executed by a processor, implement the Mandarin evaluation method of any one of the above embodiments.
It should be noted that the embodiments in this specification are described in a progressive manner: each embodiment focuses on its differences from the other embodiments, and for the same or similar parts the embodiments may be referred to one another. Because the apparatus embodiments are basically similar to the method embodiments, they are described briefly; for relevant details, refer to the corresponding description of the method embodiments.
Finally, it should also be noted that, herein, relational terms such as first and second, and the like may be used solely to distinguish one entity or action from another entity or action without necessarily requiring or implying any actual such relationship or order between such entities or actions.
The previous description of the disclosed embodiments is provided to enable any person skilled in the art to make or use the present invention. Various modifications to these embodiments will be readily apparent to those skilled in the art, and the generic principles defined herein may be applied to other embodiments without departing from the spirit or scope of the invention. Thus, the present invention is not intended to be limited to the embodiments shown herein but is to be accorded the widest scope consistent with the principles and novel features disclosed herein.
The foregoing is only a preferred embodiment of the present invention. It should be noted that those skilled in the art can make various modifications and improvements without departing from the principle of the present invention, and these modifications and improvements shall also fall within the protection scope of the present invention.

Claims (15)

1. A Mandarin evaluation method, applied to a client, comprising:
displaying a Mandarin test interface, wherein the Mandarin test interface displays a simulated conversation robot;
acquiring robot conversation voice to be output by the conversation robot, wherein the robot conversation voice is a voice signal for guiding a user to perform voice interaction;
simulating, in the Mandarin test interface, an animation of the conversation robot outputting the robot conversation voice, and playing the robot conversation voice;
obtaining user conversation voice input by the user in response to the robot conversation voice;
sending the user conversation voice to a server;
and displaying, on the Mandarin test interface, the Mandarin scoring result of the user conversation voice returned by the server.
2. The method of claim 1, wherein the acquiring robot conversation voice to be output by the conversation robot comprises:
acquiring robot conversation voice sent by the server for output by the conversation robot, wherein the robot conversation voice is a voice signal to be output by the conversation robot that is determined by the server according to the historical user conversation voice most recently input by the user, or is the question prompt voice currently to be output that is queried by the server from a question prompt voice library.
3. The method of claim 2, wherein the simulating, in the Mandarin test interface, an animation of the conversation robot outputting the robot conversation voice comprises:
acquiring robot posture features associated with the robot conversation voice, wherein the robot posture features are preset in the client and associated with the robot conversation voice, or are generated by the server according to the robot conversation voice;
and simulating, according to the robot posture features, the animation of the conversation robot outputting the robot conversation voice in the Mandarin test interface.
4. The method according to claim 2 or 3, wherein the acquiring robot conversation voice to be output by the conversation robot further comprises:
when no historical user conversation voice currently exists, acquiring a voice corpus from a preset conversation corpus, and determining the voice corpus as the robot conversation voice to be output by the conversation robot;
and the acquiring robot conversation voice sent by the server for output by the conversation robot comprises:
when the historical user conversation voice exists, determining the conversation reply voice returned by the server for the historical user conversation voice as the robot conversation voice to be output by the conversation robot, wherein the conversation reply voice is determined by the server from an interactive voice library as the reply to the historical user conversation voice.
5. The method of claim 4, further comprising, before the displaying a Mandarin test interface:
displaying a main interface, wherein a first test mode option and a second test mode option are displayed in the main interface, the first test mode option is an option for triggering entry into a Mandarin practice mode, and the second test mode option is an option for triggering entry into a simulated Mandarin test mode;
wherein the displaying a Mandarin test interface comprises:
displaying a Mandarin practice test interface in response to the first test mode option being selected by the user.
6. The method of claim 2 or 3, wherein the acquiring robot conversation voice sent by the server for output by the conversation robot comprises:
acquiring robot conversation voice returned by the server, wherein the robot conversation voice is the question prompt voice currently to be output that is queried by the server from a question prompt voice library associated with a Mandarin test library.
7. The method of claim 6, further comprising, before the displaying a Mandarin test interface:
displaying a main interface, wherein a first test mode option and a second test mode option are displayed in the main interface, the first test mode option is an option for triggering entry into a Mandarin practice mode, and the second test mode option is an option for triggering entry into a simulated Mandarin test mode;
wherein the displaying a Mandarin test interface comprises:
displaying a Mandarin simulated test interface in response to the second test mode option being selected by the user.
8. The method of claim 7, further comprising, while displaying the Mandarin simulated test interface:
sending an examination question request to the server, wherein the examination question request is used for requesting the server to allocate a test question of the Mandarin simulated test and the question prompt voice associated with the test question;
wherein the acquiring robot conversation voice returned by the server comprises:
acquiring the test question returned by the server and the robot conversation voice associated with the test question;
and the method further comprises, while simulating in the Mandarin test interface the animation of the conversation robot outputting the robot conversation voice:
displaying the test question in the Mandarin test interface.
9. A Mandarin evaluation method, applied to a server, comprising:
obtaining a Mandarin assessment request sent by a client, wherein the Mandarin assessment request carries user conversation voice input by a user of the client in a Mandarin test interface, and the Mandarin test interface displays a simulated conversation robot;
analyzing the user conversation voice to obtain a Mandarin scoring result of the user conversation voice;
determining robot conversation voice to be output by the conversation robot;
and sending the Mandarin scoring result and the robot conversation voice to the client, so that the client displays the Mandarin scoring result on the Mandarin test interface and simulates, on the Mandarin test interface, an animation of the conversation robot outputting the robot conversation voice.
10. The method according to claim 9, further comprising, before the obtaining a Mandarin assessment request sent by a client:
obtaining a test mode indication sent by the client, wherein the test mode indication indicates the Mandarin test mode in which the client is currently located, and the Mandarin test mode comprises a Mandarin practice mode and a simulated Mandarin test mode;
wherein the determining robot conversation voice to be output by the conversation robot comprises:
when the client is in the Mandarin practice mode, determining a reply voice for the user conversation voice, matched from an interactive voice library, as the robot conversation voice to be output by the conversation robot;
and when the client is in the simulated Mandarin test mode, querying the question prompt voice currently to be output from a question prompt voice library associated with a Mandarin test library.
11. A Mandarin evaluation apparatus, applied to a client, comprising:
an interface display unit, configured to display a Mandarin test interface, wherein the Mandarin test interface displays a simulated conversation robot;
a robot conversation voice obtaining unit, configured to obtain robot conversation voice to be output by the conversation robot, wherein the robot conversation voice is a voice signal for guiding a user to perform voice interaction;
a dialogue scene simulation unit, configured to simulate, in the Mandarin test interface, an animation of the conversation robot outputting the robot conversation voice, and to play the robot conversation voice;
a user voice obtaining unit, configured to obtain user conversation voice input by the user in response to the robot conversation voice;
a user voice sending unit, configured to send the user conversation voice to a server;
and a scoring result display unit, configured to display, on the Mandarin test interface, the Mandarin scoring result of the user conversation voice returned by the server.
12. A Mandarin evaluation apparatus, comprising:
a request obtaining unit, configured to obtain a Mandarin assessment request sent by a client, wherein the Mandarin assessment request carries user conversation voice input by a user of the client in a Mandarin test interface, and the Mandarin test interface displays a simulated conversation robot;
a voice parsing unit, configured to analyze the user conversation voice to obtain a Mandarin scoring result of the user conversation voice;
a machine voice determination unit, configured to determine robot conversation voice to be output by the conversation robot;
and a data sending unit, configured to send the Mandarin scoring result and the robot conversation voice to the client, so that the client displays the Mandarin scoring result on the Mandarin test interface and simulates, on the Mandarin test interface, an animation of the conversation robot outputting the robot conversation voice.
13. A terminal, comprising a processor and a memory, wherein:
the processor is configured to call and execute the program stored in the memory;
the memory is configured to store the program, the program being used at least for:
displaying a Mandarin test interface, wherein the Mandarin test interface displays a simulated conversation robot;
acquiring robot conversation voice to be output by the conversation robot, wherein the robot conversation voice is a voice signal for guiding a user to perform voice interaction;
simulating, in the Mandarin test interface, an animation of the conversation robot outputting the robot conversation voice, and playing the robot conversation voice;
obtaining user conversation voice input by the user in response to the robot conversation voice;
sending the user conversation voice to a server;
and displaying, on the Mandarin test interface, the Mandarin scoring result of the user conversation voice returned by the server.
14. A server, comprising a processor and a memory, wherein:
the processor is configured to call and execute the program stored in the memory;
the memory is configured to store the program, the program being used at least for:
obtaining a Mandarin assessment request sent by a client, wherein the Mandarin assessment request carries user conversation voice input by a user of the client in a Mandarin test interface, and the Mandarin test interface displays a simulated conversation robot;
analyzing the user conversation voice to obtain a Mandarin scoring result of the user conversation voice;
determining robot conversation voice to be output by the conversation robot;
and sending the Mandarin scoring result and the robot conversation voice to the client, so that the client displays the Mandarin scoring result on the Mandarin test interface and simulates, on the Mandarin test interface, an animation of the conversation robot outputting the robot conversation voice.
15. A storage medium storing computer-executable instructions that, when loaded and executed by a processor, implement the Mandarin evaluation method according to any one of claims 1 to 8 or claims 9 to 10.
CN201911095274.2A 2019-11-11 2019-11-11 Mandarin assessment method, device, equipment and storage medium Pending CN110808038A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201911095274.2A CN110808038A (en) 2019-11-11 2019-11-11 Mandarin assessment method, device, equipment and storage medium

Publications (1)

Publication Number Publication Date
CN110808038A true CN110808038A (en) 2020-02-18

Family

ID=69501974

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201911095274.2A Pending CN110808038A (en) 2019-11-11 2019-11-11 Mandarin assessment method, device, equipment and storage medium

Country Status (1)

Country Link
CN (1) CN110808038A (en)

Citations (11)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN102568475A (en) * 2011-12-31 2012-07-11 安徽科大讯飞信息科技股份有限公司 System and method for assessing proficiency in Putonghua
CN103258340A (en) * 2013-04-17 2013-08-21 中国科学技术大学 Pronunciation method of three-dimensional visual Chinese mandarin pronunciation dictionary with pronunciation being rich in emotion expression ability
CN105741831A (en) * 2016-01-27 2016-07-06 广东外语外贸大学 Spoken language evaluation method based on grammatical analysis and spoken language evaluation system
CN205656788U (en) * 2015-12-08 2016-10-19 山东理工职业学院 Mandarin simulation tests learning system
CN106777018A (en) * 2016-12-08 2017-05-31 竹间智能科技(上海)有限公司 To the optimization method and device of read statement in a kind of intelligent chat robots
CN107294837A (en) * 2017-05-22 2017-10-24 北京光年无限科技有限公司 Engaged in the dialogue interactive method and system using virtual robot
CN109712627A (en) * 2019-03-07 2019-05-03 深圳欧博思智能科技有限公司 It is a kind of using speech trigger virtual actor's facial expression and the voice system of mouth shape cartoon
CN109785698A (en) * 2017-11-13 2019-05-21 上海流利说信息技术有限公司 Method, apparatus, electronic equipment and medium for spoken language proficiency evaluation and test
CN109817244A (en) * 2019-02-26 2019-05-28 腾讯科技(深圳)有限公司 Oral evaluation method, apparatus, equipment and storage medium
US20190166071A1 (en) * 2017-11-27 2019-05-30 Electronics And Telecommunications Research Institute Chatbot system and service method thereof
CN110085226A (en) * 2019-04-25 2019-08-02 广州智伴人工智能科技有限公司 A kind of voice interactive method based on robot

Cited By (7)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN111968621A (en) * 2020-08-10 2020-11-20 腾讯科技(深圳)有限公司 Audio testing method and device and computer readable storage medium
CN111968621B (en) * 2020-08-10 2022-08-30 腾讯科技(深圳)有限公司 Audio testing method and device and computer readable storage medium
CN112037763A (en) * 2020-08-27 2020-12-04 腾讯科技(深圳)有限公司 Service testing method and device based on artificial intelligence
CN112037763B (en) * 2020-08-27 2023-10-13 腾讯科技(深圳)有限公司 Service testing method and device based on artificial intelligence
CN112365752A (en) * 2020-12-03 2021-02-12 安徽信息工程学院 Parent-child interaction type early education system
CN113691595A (en) * 2021-08-12 2021-11-23 深圳追一科技有限公司 Interactive interface generation method and device, computer equipment and storage medium
CN114339303A (en) * 2021-12-31 2022-04-12 北京有竹居网络技术有限公司 Interactive evaluation method and device, computer equipment and storage medium

Similar Documents

Publication Publication Date Title
CN110808038A (en) Mandarin assessment method, device, equipment and storage medium
CN110288077B (en) Method and related device for synthesizing speaking expression based on artificial intelligence
US11397888B2 (en) Virtual agent with a dialogue management system and method of training a dialogue management system
CN110807388B (en) Interaction method, interaction device, terminal equipment and storage medium
CN107818798B (en) Customer service quality evaluation method, device, equipment and storage medium
CN107040452B (en) Information processing method and device and computer readable storage medium
CN112262430A (en) Automatically determining language for speech recognition of a spoken utterance received via an automated assistant interface
CN111538456A (en) Human-computer interaction method, device, terminal and storage medium based on virtual image
CN111899576A (en) Control method and device for pronunciation test application, storage medium and electronic equipment
KR20220130000A (en) AI avatar-based interaction service method and apparatus
CN110955818A (en) Searching method, searching device, terminal equipment and storage medium
KR20200059112A (en) System for Providing User-Robot Interaction and Computer Program Therefor
CN116821290A (en) Multitasking dialogue-oriented large language model training method and interaction method
CN111639218A (en) Interactive method for spoken language training and terminal equipment
CN114064943A (en) Conference management method, conference management device, storage medium and electronic equipment
KR101567154B1 (en) Method for processing dialogue based on multiple user and apparatus for performing the same
CN115565518B (en) Method for processing player dubbing in interactive game and related device
CN114330285B (en) Corpus processing method and device, electronic equipment and computer readable storage medium
CN111324710B (en) Online investigation method and device based on virtual person and terminal equipment
CN112820265B (en) Speech synthesis model training method and related device
CN113987142A (en) Voice intelligent interaction method, device, equipment and storage medium with virtual doll
CN110717020B (en) Voice question-answering method, device, computer equipment and storage medium
CN114519347A (en) Method and device for generating conversation content for language and vocabulary learning training
CN109359177B (en) Multi-mode interaction method and system for story telling robot
CN111523343B (en) Reading interaction method, device, equipment, server and storage medium

Legal Events

Date Code Title Description
PB01 Publication
REG Reference to a national code (Ref country code: HK; Ref legal event code: DE; Ref document number: 40021622; Country of ref document: HK)

SE01 Entry into force of request for substantive examination