CN110085226A - A robot-based voice interaction method - Google Patents

A robot-based voice interaction method

Info

Publication number
CN110085226A
CN110085226A
Authority
CN
China
Prior art keywords
voice
robot
user
intonation
value
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Granted
Application number
CN201910337225.9A
Other languages
Chinese (zh)
Other versions
CN110085226B (en)
Inventor
王健 (Wang Jian)
苏战 (Su Zhan)
刘卫平 (Liu Weiping)
王浩宇 (Wang Haoyu)
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Guangzhou Zhi Co Artificial Intelligence Technology Co Ltd
Original Assignee
Guangzhou Zhi Co Artificial Intelligence Technology Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Guangzhou Zhi Co Artificial Intelligence Technology Co Ltd filed Critical Guangzhou Zhi Co Artificial Intelligence Technology Co Ltd
Priority to CN201910337225.9A priority Critical patent/CN110085226B/en
Publication of CN110085226A publication Critical patent/CN110085226A/en
Application granted granted Critical
Publication of CN110085226B publication Critical patent/CN110085226B/en
Active legal-status Critical Current
Anticipated expiration legal-status Critical

Classifications

    • G PHYSICS
    • G10 MUSICAL INSTRUMENTS; ACOUSTICS
    • G10L SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
    • G10L15/00 Speech recognition
    • G10L15/22 Procedures used during a speech recognition process, e.g. man-machine dialogue
    • G10L2015/223 Execution procedure of a spoken command
    • G10L2015/225 Feedback of the input speech
    • G10L25/00 Speech or voice analysis techniques not restricted to a single one of groups G10L15/00 - G10L21/00
    • G10L25/48 Speech or voice analysis techniques specially adapted for particular use
    • G10L25/51 Speech or voice analysis techniques specially adapted for comparison or discrimination
    • G10L25/60 Speech or voice analysis techniques specially adapted for measuring the quality of voice signals

Landscapes

  • Engineering & Computer Science (AREA)
  • Computational Linguistics (AREA)
  • Health & Medical Sciences (AREA)
  • Audiology, Speech & Language Pathology (AREA)
  • Human Computer Interaction (AREA)
  • Physics & Mathematics (AREA)
  • Acoustics & Sound (AREA)
  • Multimedia (AREA)
  • Signal Processing (AREA)
  • Quality & Reliability (AREA)
  • Manipulator (AREA)

Abstract

The present invention provides a robot-based voice interaction method, comprising: a robot receives a speech trigger instruction input by a user and starts up according to the speech trigger instruction; enters a corresponding operating mode according to an operating-mode selection instruction input by the user; and operates according to the operating procedure of the corresponding operating mode.

Description

A robot-based voice interaction method
Technical field
The present invention relates to the technical field of voice interaction, and in particular to a robot-based voice interaction method.
Background art
With the rapid development of information technology, especially the Internet, data and information reach ever deeper into daily life. Robots are gradually taking on important roles in our everyday lives, and people's requirements for robots grow ever higher: robots need more functions in order to further improve the quality of human life.
With the development of society, electronic devices for children's education have become increasingly intelligent, and robots have become an important part of them. A robot can help children develop their intelligence and practice language; by posing questions to children, it improves their ability to organize language and thereby promotes their all-round growth.
Summary of the invention
The present invention provides a robot-based voice interaction method, so as to provide a technique that can carry out voice interaction with a user in a variety of modes.
To solve the above technical problem, the invention proposes a robot-based voice interaction method, comprising:
a robot receives a speech trigger instruction input by a user and starts up according to the speech trigger instruction;
the robot enters a corresponding operating mode according to an operating-mode selection instruction input by the user;
the robot operates according to the operating procedure of the corresponding operating mode.
Preferably, the operating mode includes a read-along mode;
the read-along mode includes:
the robot actively plays a first voice;
after playing the first voice, the robot waits for the user's response and captures a second voice uttered by the user;
the robot analyzes the second voice and obtains parameter features of the second voice;
the robot compares the parameter features of the second voice with the corresponding parameter features of the first voice, and evaluates the second voice according to the comparison result.
Preferably, the robot comparing the parameter features of the second voice with the corresponding parameter features of the first voice comprises:
the first voice is cut into j segments; the values of n parameter indices are extracted for each of the j segments, and a judgment is then made using formula (1):
average = S / n
(1)
where b_ti is the value of the i-th parameter index of the t-th segment after the first voice is cut, a_i is the value of the i-th parameter index of the second voice, average is the final score, n is the number of parameter indices, and S and S_t are intermediate parameters, with t = 1, 2, 3, ..., j and i = 1, 2, 3, ..., n. Finally, it is judged whether average is greater than 0.9; if so, the parameter features of the second voice are determined to have passed the comparison with the corresponding parameter features of the first voice;
in this case, evaluating the second voice according to the comparison result includes issuing to the user an evaluation that the read-along effect is good.
Preferably, the operating mode includes a dialogue mode;
speech recognition results and their corresponding first identifiers are stored in the robot; a third voice and a second identifier corresponding to the third voice are also stored in the robot; the robot includes a voice library; the first identifiers and second identifiers have corresponding relationships;
the dialogue mode includes:
step A1: the robot analyzes the collected second voice uttered by the user and obtains a speech recognition result of the second voice;
step A2: it is judged whether the voice library contains the second identifier corresponding to the first identifier that corresponds to the speech recognition result of the second voice:
step A3: if the robot can find in the voice library the second identifier corresponding to the first identifier of the speech recognition result of the second voice, the robot plays the third voice corresponding to that second identifier;
if the robot does not find in the voice library the second identifier corresponding to the first identifier of the speech recognition result of the second voice, the robot performs a preset operation.
Preferably, the robot performing the preset operation comprises: prompting the user to utter the second voice again;
or
the robot performing the preset operation comprises:
a speech database is stored in the robot; the speech database contains k indices for each of p different voices, which form a matrix Y
where b_is is the value of the s-th index of the i-th voice in the speech database, with i = 1, 2, 3, ..., p and s = 1, 2, 3, ..., k; the same k indices are extracted from the second voice uttered by the user to form a vector X; the vector X is added as the first row of the matrix Y, giving a new matrix Z with p + 1 rows and k columns
where the i-th column of Z is Z_i = (z_1i, z_2i, z_3i, ..., z_(p+1)i)'; the values of every column are standardized with the following formula:
zz_is = z_is / max(Z_s)
where max(Z_s) is the maximum value of the s-th column of the matrix Z, z_is is the value in the i-th row and s-th column of Z, and zz_is is the standardized value. After all the columns of Z are standardized, a new matrix ZZ is formed; the correlation of each of rows 2 through the last row of ZZ with the data of the first row is then judged, obtained with the following formula:
where p_i is the degree of relevance between the second voice uttered by the user and the i-th voice in the speech database, ZZ_1 is the first row of the new matrix ZZ formed after processing by zz_is = z_is / max(Z_s), ZZ_(i+1) is the (i+1)-th row of ZZ, zz_1t is the value of the t-th index of the first row of ZZ, and E(X) is the expected value of X, with i = 1, 2, 3, ..., p and s = 1, 2, 3, ..., k. Finally the largest of the p_i is selected, the voice in the speech database corresponding to that maximum is retrieved, the speech recognition result corresponding to the retrieved voice is obtained and taken as the speech recognition result of the second voice, and steps A2-A3 are executed again.
Preferably, the operating mode includes a questioning mode; the questioning mode includes:
the robot collects audio and image information within its preset distance range, analyzes the audio and image information, and obtains a questioning topic;
the robot queries the voice library, according to the questioning topic, for a fourth voice corresponding to the questioning topic;
if the robot finds the fourth voice, the robot plays the fourth voice;
if the robot does not find the fourth voice, the robot analyzes the user's questioning tendency from the second voice, queries the voice library for a fifth voice corresponding to the questioning tendency, and plays the fifth voice;
if the robot does not find the fifth voice, the robot randomly plays a sixth voice pre-stored in the voice library;
the user's response to the fourth voice, the fifth voice, or the sixth voice is defined as a seventh voice;
the robot analyzes the collected seventh voice and evaluates the seventh voice according to the analysis result.
Preferably, the robot comparing the parameter features of the second voice with the corresponding parameter features of the first voice includes a process of comparing the pronunciation of the second voice with the pronunciation of the first voice, comprising:
the robot obtains pronunciation features of the second voice;
the robot obtains, from the pronunciation features of the first voice and the second voice, target information corresponding to the pronunciation features of the second voice;
the robot searches the voice library for a target voice corresponding to the target information and obtains the pronunciation-problem set corresponding to the target voice;
the robot identifies the pronunciation problems of the second voice according to the pronunciation-problem set;
the robot scores each of the pronunciation problems in the pronunciation-problem set and outputs the pronunciation problems of the second voice by voice;
and/or
the robot comparing the parameter features of the second voice with the corresponding parameter features of the first voice, and evaluating the second voice according to the comparison result, comprises:
the pronunciation problems include nasal-final problems, flat-versus-retroflex tongue problems, and tone problems, wherein
the robot, by analyzing the second voice, obtains a nasal-final score, a flat/retroflex tongue score, and a tone score of the second voice;
if one or more of the nasal-final score, the flat/retroflex tongue score, and the tone score is lower than the preset standard score, the robot prompts the user to utter the second voice again;
if the nasal-final score, the flat/retroflex tongue score, and the tone score are all higher than the preset standard score, the robot outputs that the pronunciation grade of the second voice is outstanding.
Preferably, the robot comparing the parameter features of the second voice with the corresponding parameter features of the first voice, and evaluating the second voice according to the comparison result, includes a process of comparing the intonation of the second voice with the intonation of the first voice, comprising:
the robot obtains intonation features of the second voice;
the robot obtains an intonation similarity from the intonation features of the first voice and the second voice;
an intonation similarity between 100% and 90% is defined as high;
an intonation similarity between 90% and 70% is defined as intermediate;
an intonation similarity below 70% is defined as low;
the robot judges from the intonation similarity whether the intonation grade of the second voice is high, intermediate, or low;
when the intonation of the second voice is judged to be low, the robot prompts the user to utter the second voice again;
and/or
the robot comparing the parameter features of the second voice with the corresponding parameter features of the first voice, and evaluating the second voice according to the comparison result, includes a process of comparing the speech rate of the second voice with the speech rate of the first voice, comprising:
the robot detects change points in the second voice and divides the second voice into multiple voice segments according to the change points;
the robot extracts an energy envelope from the multiple voice segments, determines the syllable number by finding the local maximum points of the energy envelope, and thereby determines the speech rate of the second voice;
the speech rate of the first voice is the standard speech rate;
if the speech rate of the second voice is within a preset range of the standard speech rate, the robot determines that the speech-rate grade of the second voice is standard and outputs the judgment result;
if the speech rate of the second voice is not within the preset range of the standard speech rate, the robot prompts the user to utter the second voice again.
Preferably, the robot further includes a display screen, which can display the pronunciation problems, pronunciation grade, intonation grade, or speech-rate grade of the second voice, or a prompt asking the user to utter the second voice again.
Preferably, the robot further includes a miniature camera device for collecting image information within the preset range of the robot and analyzing the image information to determine a first questioning topic;
the robot further includes an audio collection device for collecting audio within the preset range of the robot and analyzing the audio to determine a second questioning topic;
the robot determines the questioning topic from the first questioning topic, the second questioning topic, or a combination of the first and second questioning topics.
Other features and advantages of the present invention will be set forth in the following description and will in part become apparent from the description or be understood through practice of the invention. The objectives and other advantages of the invention can be realized and obtained through the structures particularly pointed out in the written description, claims, and accompanying drawings.
The technical solution of the present invention is described in further detail below through the drawings and embodiments.
Brief description of the drawings
The accompanying drawings provide a further understanding of the present invention, constitute a part of the specification, and together with the embodiments serve to explain the invention; they are not to be construed as limiting the invention. In the drawings:
Fig. 1 is a block diagram of the voice interaction method in an embodiment of the present invention.
Detailed description of the embodiments
Preferred embodiments of the present invention are described below with reference to the accompanying drawings; it should be understood that the preferred embodiments described here serve only to illustrate and explain the invention and are not intended to limit it.
An embodiment of the present invention proposes a robot-based voice interaction method for a robot. As shown in Fig. 1, the method comprises steps 101-103:
Step 101: the robot receives a speech trigger instruction input by the user and starts up according to the speech trigger instruction.
Step 102: the robot enters a corresponding operating mode according to an operating-mode selection instruction input by the user.
The operating mode may include any of a read-along mode, a dialogue mode, and a questioning mode.
Step 103: the robot operates according to the operating procedure of the corresponding operating mode.
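Steps 101-103 can be sketched as a simple dispatch: a trigger instruction wakes the robot, a mode-selection instruction picks one of the three operating modes, and the corresponding workflow runs. All names and strings below are illustrative assumptions, not from the patent.

```python
# Hypothetical sketch of steps 101-103. The wake phrase, mode names, and
# workflow descriptions are invented for illustration.

def run_robot(trigger: str, mode_choice: str) -> str:
    if trigger != "wake up":            # step 101: speech trigger instruction
        return "sleeping"
    workflows = {                       # step 102: operating-mode selection
        "read-along": lambda: "playing first voice, awaiting user's second voice",
        "dialogue": lambda: "awaiting user's question",
        "questioning": lambda: "gathering audio and images to pick a topic",
    }
    workflow = workflows.get(mode_choice)
    return workflow() if workflow else "unknown mode"   # step 103: run workflow

print(run_robot("wake up", "dialogue"))  # awaiting user's question
```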
In one embodiment, the read-along mode includes:
the robot actively plays a first voice;
after playing the first voice, the robot waits for the user's response and captures a second voice uttered by the user;
the robot analyzes the second voice and obtains parameter features of the second voice; the parameter features include pronunciation, intonation, speech rate, and the like;
the robot compares the parameter features of the second voice with the corresponding parameter features of the first voice and evaluates the second voice according to the comparison result.
Beneficial effects of the above technical solution: by comparing the user's voice with the robot's voice and making a specific evaluation of the voice the user utters, this embodiment lets the user know their current pronunciation problems and continuously improves the user's pronunciation level.
Here, the robot comparing the parameter features of the second voice with the corresponding parameter features of the first voice comprises:
the first voice is cut into j segments; the values of n parameter indices are extracted for each of the j segments, and a judgment is then made using formula (1):
average = S / n
(1)
where b_ti is the value of the i-th parameter index of the t-th segment after the first voice is cut, a_i is the value of the i-th parameter index of the second voice, average is the final score, n is the number of parameter indices, and S and S_t are intermediate parameters, with t = 1, 2, 3, ..., j and i = 1, 2, 3, ..., n. Finally, it is judged whether average is greater than 0.9; if so, the parameter features of the second voice are determined to have passed the comparison with the corresponding parameter features of the first voice;
in this case, evaluating the second voice according to the comparison result comprises issuing to the user an evaluation that the read-along effect is good.
Through the above calculation process, the final evaluation of the user's second voice can be made more accurate.
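The patent's expressions for the intermediate parameters S and S_t did not survive in this text, so the sketch below substitutes one plausible choice: each per-index score compares the second voice's value a_i with the mean of that index over the j segments of the first voice, and the final score is average = S / n with S the sum of those per-index scores. Only the names b_ti, a_i, S, n, average, and the 0.9 threshold come from the patent; the ratio similarity is an assumption.

```python
# Hedged sketch of formula (1): average = S / n, judged against 0.9.
# The per-index score s_i is an assumed ratio similarity, since the
# patent's own expression for S is not reproduced in the text.

def evaluate_second_voice(b, a, threshold=0.9):
    """b: j x n matrix of segment parameter-index values of the first voice.
    a: length-n list of parameter-index values of the second voice."""
    n = len(a)
    S = 0.0
    for i in range(n):
        mean_bi = sum(row[i] for row in b) / len(b)    # mean of index i over segments
        s_i = min(a[i], mean_bi) / max(a[i], mean_bi)  # assumed similarity in (0, 1]
        S += s_i
    average = S / n                                    # formula (1)
    return average, average > threshold                # True -> "read-along effect good"

score, passed = evaluate_second_voice([[1.0, 2.0], [1.0, 2.0]], [0.95, 2.0])
```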
In one embodiment, speech recognition results and their corresponding first identifiers are stored in the robot; a third voice and a second identifier corresponding to the third voice are also stored in the robot's voice library; the third voice is a preset feedback voice for a speech recognition result; the first identifiers and second identifiers have corresponding relationships;
the dialogue mode includes:
step A1: the robot analyzes the collected second voice uttered by the user and obtains a speech recognition result of the second voice;
step A2: it is judged whether the voice library contains the second identifier corresponding to the first identifier that corresponds to the speech recognition result of the second voice:
step A3: if the robot can find in the voice library the second identifier corresponding to the first identifier of the speech recognition result of the second voice, the robot plays the third voice corresponding to that second identifier;
if the robot does not find in the voice library the second identifier corresponding to the first identifier of the speech recognition result of the second voice, the robot performs a preset operation.
Beneficial effects of the above technical solution: this embodiment enables the robot to respond to the user's questions.
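Steps A1-A3 amount to a two-level identifier lookup. The sketch below shows the shape of that lookup; the mapping contents are invented for illustration, and returning `None` stands for falling back to the preset operation.

```python
# Minimal sketch of the dialogue-mode lookup: recognition result -> first
# identifier -> second identifier -> third (feedback) voice. Data values
# are illustrative assumptions only.

first_ids = {"what is your name": "F1"}        # recognition result -> first identifier
id_map = {"F1": "S1"}                          # first identifier -> second identifier
voice_library = {"S1": "My name is Robot."}    # second identifier -> third voice

def reply(recognition_result):
    first = first_ids.get(recognition_result)  # step A2: look up identifiers
    second = id_map.get(first)
    if second in voice_library:                # step A3: play the third voice
        return voice_library[second]
    return None                                # fall back to the preset operation
```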
Here, the robot performing the preset operation comprises: prompting the user to utter the second voice again.
Or
the robot performing the preset operation comprises:
a speech database is stored in the robot; the speech database contains k indices for each of p different voices, which form a matrix Y
where b_is is the value of the s-th index of the i-th voice in the speech database, with i = 1, 2, 3, ..., p and s = 1, 2, 3, ..., k; the same k indices are extracted from the second voice uttered by the user, the k indices including voice duration, amplitude, frequency, and the like, to form a vector X; the vector X is added as the first row of the matrix Y, giving a new matrix Z with p + 1 rows and k columns
where the i-th column of Z is Z_i = (z_1i, z_2i, z_3i, ..., z_(p+1)i)'; the values of every column are standardized with the following formula:
zz_is = z_is / max(Z_s)
where max(Z_s) is the maximum value of the s-th column of the matrix Z, z_is is the value in the i-th row and s-th column of Z, and zz_is is the standardized value. After all the columns of Z are standardized, a new matrix ZZ is formed; the correlation of each of rows 2 through the last row of ZZ with the data of the first row is then judged, obtained with the following formula:
where p_i is the degree of relevance between the second voice uttered by the user and the i-th voice in the speech database, ZZ_1 is the first row of the new matrix ZZ formed after processing by zz_is = z_is / max(Z_s), ZZ_(i+1) is the (i+1)-th row of ZZ, zz_1t is the value of the t-th index of the first row of ZZ, and E(X) is the expected value of X, with i = 1, 2, 3, ..., p and s = 1, 2, 3, ..., k. Finally the largest of the p_i is selected, the voice in the speech database corresponding to that maximum is retrieved, the speech recognition result corresponding to the retrieved voice is obtained and taken as the speech recognition result of the second voice, and steps A2-A3 are executed again.
Through the above calculation process, when the robot has no stored feedback voice corresponding to the speech recognition result of the second voice uttered by the user, the algorithm can find the stored speech recognition result closest to the user's second voice, so that the robot plays the feedback voice corresponding to that recognition result. This avoids being unable to give the user any feedback, and the feedback accuracy is also high.
Moreover, with this technique the second voice can be accurately matched against the voices in the speech database. Rejection of voice segments is carried out before matching, eliminating interference from possible background noise and other people's voices and making the result more reliable; many indices are used in matching, which raises the matching accuracy; and the simple machine algorithm makes matching fast. At the same time, because every column of data is standardized, the subsequent association analysis is not distorted by indices whose values happen to be very large or very small, nor affected by the different units of measure of the various indices.
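The correlation formula for p_i is not reproduced in this text; the sketch below keeps the column standardization zz_is = z_is / max(Z_s) exactly as stated and then substitutes an ordinary Pearson correlation between the new first row (the user's vector X) and each stored voice's row, which is one plausible reading of "relevance". The index values are illustrative.

```python
# Hedged sketch of the matching step: build Z from X and Y, standardize
# each column by its maximum (as in the text), then pick the stored voice
# whose row correlates best with row 1. Pearson correlation is an assumed
# stand-in for the patent's p_i formula.

def best_match(X, Y):
    """X: k index values of the second voice. Y: p x k index matrix."""
    Z = [list(X)] + [list(row) for row in Y]    # X becomes the first row of Z
    k = len(X)
    for s in range(k):                          # column-wise standardization
        col_max = max(row[s] for row in Z)
        for row in Z:
            row[s] /= col_max

    def pearson(u, v):
        mu, mv = sum(u) / k, sum(v) / k
        cov = sum((a - mu) * (b - mv) for a, b in zip(u, v))
        su = sum((a - mu) ** 2 for a in u) ** 0.5
        sv = sum((b - mv) ** 2 for b in v) ** 0.5
        return cov / (su * sv) if su and sv else 0.0

    scores = [pearson(Z[0], Z[i]) for i in range(1, len(Z))]
    return scores.index(max(scores))            # index of the closest stored voice
```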
In one embodiment, the operating mode may also include a questioning mode, which includes:
the robot collects audio and image information within its preset range and analyzes them to obtain a questioning topic;
the robot queries the voice library for a fourth voice corresponding to the questioning topic;
if the robot finds the fourth voice, the robot plays the fourth voice;
if the robot does not find the fourth voice, the robot analyzes the user's questioning tendency from the second voice the user uttered, queries the voice library for a fifth voice corresponding to the questioning tendency, and plays the fifth voice;
if the robot does not find the fifth voice, the robot randomly plays a sixth voice pre-stored in the voice library;
the user's response to the fourth voice, the fifth voice, or the sixth voice is defined as a seventh voice;
the robot analyzes the collected seventh voice and evaluates it according to the analysis result.
Working principle of the above technical solution: in this embodiment, when the user chooses to enter the questioning mode, the robot collects and analyzes audio and image information within its preset range and obtains the topic on which it will actively pose a question. The robot queries the voice library for a fourth voice corresponding to the questioning topic; if found, the robot plays the fourth voice. If not, the robot derives the user's questioning tendency from the second voice the user uttered earlier and queries the voice library for a fifth voice corresponding to that tendency; if found, the robot plays the fifth voice. If the fifth voice is not found either, the robot randomly plays a sixth voice pre-stored in the voice library. Finally, the robot analyzes the user's response to its question (the seventh voice) and evaluates it according to the analysis result.
Beneficial effects of the above technical solution: this embodiment enables the robot to actively interact with the user and to evaluate the user's responses, prompting the user to think.
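The questioning-mode workflow is a three-stage fallback: a question matched to the detected topic (fourth voice), then one matched to the user's questioning tendency (fifth voice), then a random pre-stored question (sixth voice). The sketch below shows the chain; library contents and how the tendency is obtained are illustrative assumptions.

```python
# Hedged sketch of the fourth -> fifth -> sixth voice fallback chain.
# The library contents are invented for illustration.
import random

topic_questions = {"animals": "What sound does a cat make?"}      # fourth voices
tendency_questions = {"counting": "Can you count to ten?"}        # fifth voices
fallback_questions = ["What did you do today?",                   # sixth voices
                      "What is your favourite color?"]

def ask(topic, tendency, rng=random):
    if topic in topic_questions:                 # fourth voice found
        return topic_questions[topic]
    if tendency in tendency_questions:           # fifth voice found
        return tendency_questions[tendency]
    return rng.choice(fallback_questions)        # random sixth voice
```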
In one embodiment, the process of comparing the pronunciation of the second voice with that of the first voice includes:
the robot obtains pronunciation features of the second voice;
the robot obtains, from the pronunciation features of the first voice and the second voice, target information corresponding to the pronunciation features of the second voice;
the robot searches the voice library for a target voice corresponding to the target information and obtains the pronunciation-problem set corresponding to the target voice;
the robot identifies the pronunciation problems of the second voice according to the pronunciation-problem set;
the robot scores each of the pronunciation problems in the set and outputs the pronunciation problems of the second voice by voice.
Working principle of the above technical solution: in this embodiment, the robot obtains from the pronunciation features of the first and second voices the target information corresponding to the pronunciation features of the second voice; from the target voice corresponding to that target information it obtains the corresponding pronunciation-problem set; the robot then identifies the pronunciation problems of the second voice according to that set, scores the various pronunciation problems, and outputs to the user the pronunciation problems of the second voice the user uttered.
Beneficial effects of the above technical solution: this embodiment can quickly and efficiently identify the pronunciation problems of the second voice uttered by the user, helping to correct the user's pronunciation.
In one embodiment, the pronunciation problems include nasal-final problems, flat-versus-retroflex tongue problems, and tone problems, wherein
the robot, by analyzing the second voice, obtains a nasal-final score, a flat/retroflex tongue score, and a tone score of the second voice;
if one or more of the nasal-final score, the flat/retroflex tongue score, and the tone score is lower than the preset standard score, the robot prompts the user to utter the second voice again;
if the nasal-final score, the flat/retroflex tongue score, and the tone score are all higher than the preset standard score, the robot outputs that the pronunciation grade of the second voice is outstanding.
Working principle of the above technical solution: in this embodiment, for the pronunciation problems of the second voice, the robot scores the second voice separately from the angles of nasal finals, flat versus retroflex tongue, and tone. When all three scores exceed the preset standard score, the robot evaluates the pronunciation grade of the second voice as outstanding; otherwise the robot prompts the user to utter the second voice again.
Beneficial effects of the above technical solution: this embodiment can analyze the pronunciation problems of the user's second voice from many aspects, helping the user understand the specific problems in their pronunciation.
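The three-way check above reduces to a minimum over the three scores against the preset standard. In this sketch the 60-point standard and the feedback strings are assumptions; only the three score categories and the pass/retry logic come from the text.

```python
# Hedged sketch of the pronunciation check: nasal-final, flat/retroflex
# tongue, and tone scores must all reach the preset standard, otherwise
# the user is asked to repeat the second voice. The 60-point standard
# is a hypothetical value.

STANDARD = 60  # assumed preset standard score

def pronunciation_feedback(nasal, retroflex, tone, standard=STANDARD):
    if min(nasal, retroflex, tone) < standard:
        return "please say it again"
    return "pronunciation grade: outstanding"
```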
In one embodiment, comparing the intonation of the second voice with the intonation of the first voice includes:
The robot obtains the intonation features of the second voice;
The robot obtains an intonation similarity from the intonation features of the first voice and the second voice;
An intonation similarity between 90% and 100% is defined as high;
An intonation similarity between 70% and 90% is defined as medium;
An intonation similarity below 70% is defined as low;
The robot judges the intonation grade of the second voice to be high, medium, or low according to the intonation similarity;
When the intonation of the second voice is judged to be low, the robot prompts the user to utter the second voice again.
The working principle of the above technical solution: in this embodiment, several intonation-similarity bands are predefined in the robot. The intonation similarity obtained for the second voice is compared against these predefined bands to judge the intonation grade of the second voice; when the robot judges the intonation grade of the second voice to be low, it prompts the user to utter the second voice again.
Beneficial effects of the above technical solution: in this embodiment, taking the first voice as the reference, the intonation similarity of the second voice is calculated from the degree to which its intonation deviates from the intonation of the first voice, and the predefined similarity bands allow the intonation grade of the second voice to be judged accurately.
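The similarity bands defined above map directly to a grading function. A minimal sketch, assuming similarity is expressed in [0, 1]; the intonation-feature extraction is not specified in the text, so a toy pitch-contour similarity is included purely for illustration:

```python
def intonation_grade(similarity: float) -> str:
    """Map an intonation similarity in [0, 1] to the grades defined above."""
    if similarity >= 0.90:
        return "high"
    if similarity >= 0.70:
        return "medium"
    return "low"  # a low grade triggers a re-prompt of the second voice

def contour_similarity(f0_ref, f0_usr):
    """Toy similarity: 1 minus the mean relative deviation of two pitch
    contours (an assumption; the patent does not define the feature)."""
    diffs = [abs(a - b) / max(a, b) for a, b in zip(f0_ref, f0_usr)]
    return 1.0 - sum(diffs) / len(diffs)

sim = contour_similarity([220, 230, 210], [222, 228, 215])
print(intonation_grade(sim))  # nearly identical contours grade as "high"
```

A boundary value such as exactly 90% falls in the higher band here; the patent text leaves the boundary assignment ambiguous ("between 100% and 90%", "between 90% and 70%").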
In one embodiment, comparing the speech rate of the second voice with the speech rate of the first voice includes:
The robot detects change points in the second voice and divides the second voice into multiple voice segments according to the change points;
The robot extracts an energy envelope from the voice segments, determines the number of syllables by finding the local maxima of the energy envelope, and thereby determines the speech rate of the second voice;
The speech rate of the first voice serves as the standard rate;
If the speech rate of the second voice lies within the preset range around the standard rate, the robot judges the speech-rate grade of the second voice to be standard and outputs the judgment result;
If the speech rate of the second voice lies outside the preset range around the standard rate, the robot prompts the user to utter the second voice again.
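The syllable-counting step above can be sketched as peak-picking on the energy envelope. The envelope values, the ±20% tolerance, and the syllables-per-second unit are all assumptions; the text only says the rate must fall in a preset range around the standard rate:

```python
def local_maxima(envelope):
    """Indices of strict local maxima of a short-time energy envelope."""
    return [i for i in range(1, len(envelope) - 1)
            if envelope[i] > envelope[i - 1] and envelope[i] > envelope[i + 1]]

def speech_rate(envelope, duration_s):
    """Syllables per second, counting one syllable per envelope peak."""
    return len(local_maxima(envelope)) / duration_s

def rate_grade(rate, standard_rate, tolerance=0.2):
    """'standard' if within an assumed ±20% of the standard rate."""
    if abs(rate - standard_rate) <= tolerance * standard_rate:
        return "standard"
    return "please utter the second voice again"

env = [0.1, 0.8, 0.2, 0.9, 0.1, 0.7, 0.2]  # toy envelope with 3 peaks
print(speech_rate(env, duration_s=1.0))     # 3.0 syllables per second
print(rate_grade(3.0, standard_rate=3.2))   # within tolerance -> standard
```

A real implementation would smooth the envelope before peak-picking to avoid counting noise-level fluctuations as syllables; the sketch omits that step.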
In one embodiment, the robot further includes a display screen that can show the pronunciation problems, pronunciation grade, intonation grade, and speech-rate grade of the second voice, or the prompt asking the user to utter the second voice again.
In one embodiment, the robot further includes a miniature camera device for collecting image information within a preset range of the robot and analyzing that image information to determine a first question topic;
The robot further includes an audio collection device for collecting audio within the preset range of the robot and analyzing that audio to determine a second question topic.
In one embodiment, the robot determines the question topic from the first question topic, from the second question topic, or from a combination of the first and second question topics.
In one embodiment, triggering the robot's voice switch by voice includes:
The robot receives a voice instruction issued by the user;
The voice instruction is preprocessed to obtain a preprocessed voice instruction;
The preprocessed voice instruction is analyzed to obtain a first voice feature of the preprocessed voice instruction;
The first voice feature is matched against the voice bank in the robot to judge whether the voice bank contains a second voice feature that successfully matches the first voice feature;
If so, the voice bank is queried for the first control instruction corresponding to the second voice feature, and the first control instruction is sent to the voice switch;
If not, a first identification request is sent to a server on the network side, the server being the region voice-bank server corresponding to the first voice feature, and the first identification request containing the first voice feature;
A first response returned by the server is received; the first response contains a second control instruction obtained after the server successfully matches the first voice feature against its local voice bank, and the second control instruction is sent to the voice switch;
When the first voice feature fails to match the server's local voice bank, a second identification request containing the preprocessed voice instruction is sent to the server;
A second response returned by the server is received; the second response contains a third control instruction corresponding to the speech-recognition result produced after the server forwards the preprocessed voice instruction to a human-translation end for human translation, and the third control instruction is sent to the voice switch.
The working principle of the above technical solution: this embodiment mainly provides the process by which the user starts the robot by uttering speech that triggers the voice switch. After the robot preprocesses the received user voice instruction, it obtains a preprocessed voice instruction and judges whether the voice bank contains a second voice feature that successfully matches the first voice feature obtained by analyzing the preprocessed voice instruction, wherein:
If the robot judges that the second voice feature exists in the voice bank, it goes on to query the first control instruction matched to that second voice feature, and the first control instruction triggers the voice switch;
If the robot judges that the second voice feature does not exist in the voice bank, the robot sends a first identification request to the server over the network. Specifically:
If the first identification request matches the server's local voice bank successfully, the robot receives the server's response (the first response) to the first identification request; the first response contains the second control instruction corresponding to the second voice feature, and that second control instruction triggers the voice switch;
If the first identification request fails to match the server's local voice bank, the robot sends the server a second identification request containing the preprocessed voice instruction. The server forwards the preprocessed voice instruction to a human-translation end, then returns to the robot the third control instruction corresponding to the speech-recognition result after human translation, and that third control instruction triggers the voice switch.
Beneficial effects of the above technical solution: when the voice bank contains no second voice feature corresponding to the first voice feature (the feature obtained by analyzing the voice instruction issued by the user), this solution matches the first voice feature again against the server's local voice bank; when matching also fails in the server's local voice bank, the preprocessed voice instruction is sent directly to the human-translation end. The robot thus obtains more accurate speech recognition and triggers the voice switch in time.
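The three-stage fallback above reduces to a chain of lookups. In this sketch the voice banks are plain dictionaries and the human-translation end is a callback; all names and values are illustrative, not part of the patent:

```python
def resolve_control_instruction(feature, local_bank, server_bank, human_translate):
    """Resolve a voice feature to a control instruction for the voice switch.

    Stage 1: the robot's own voice bank (first control instruction).
    Stage 2: the region voice-bank server (second control instruction).
    Stage 3: human translation of the preprocessed instruction
             (third control instruction).
    """
    if feature in local_bank:
        return local_bank[feature]
    if feature in server_bank:
        return server_bank[feature]
    return human_translate(feature)

local = {"ni hao": "CTRL_ON"}
server = {"qi dong": "CTRL_ON_2"}
print(resolve_control_instruction("qi dong", local, server,
                                  lambda f: "CTRL_HUMAN"))  # found at stage 2
```

In the real system stages 2 and 3 are network round-trips (the first and second identification requests), not in-process lookups; the dictionary form only shows the control flow.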
Obviously, those skilled in the art can make various changes and modifications to the invention without departing from its spirit and scope. If these modifications and variations fall within the scope of the claims of the invention and their technical equivalents, the invention is also intended to include them.

Claims (10)

1. A voice interaction method based on a robot, characterized by comprising:
The robot receives a voice trigger instruction input by the user and starts according to the voice trigger instruction;
The robot enters the corresponding operating mode according to an operating-mode selection instruction input by the user;
The robot operates according to the operating procedure of the corresponding operating mode.
2. The method according to claim 1, characterized in that the operating mode includes a follow-reading mode;
The follow-reading mode includes:
The robot actively plays a first voice;
After playing the first voice, the robot waits for the user's response and collects a second voice uttered by the user;
The robot analyzes the second voice and obtains parameter features of the second voice;
The robot compares the parameter features of the second voice with the corresponding parameter features of the first voice, and evaluates the second voice according to the comparison result.
3. The method according to claim 2, characterized in that the robot comparing the parameter features of the second voice with the corresponding parameter features of the first voice comprises:
The first voice is cut into j segments; after the n parameter-index values of each of the j segments are extracted, the judgment is made using formula (1):
average = S/n
(1)
where b_ti is the value of the i-th parameter index of the t-th segment after the first voice is cut, a_i is the value of the i-th parameter index of the second voice, average is the final score, n is the number of parameter indices, S and S_t are intermediate parameters, t = 1, 2, 3, ..., j, and i = 1, 2, 3, ..., n. Finally it is judged whether average is greater than 0.9; if so, the parameter features of the second voice are determined to pass the comparison with the corresponding parameter features of the first voice;
At this point, evaluating the second voice according to the comparison result includes: issuing to the user an evaluation that the follow-reading effect is good.
4. The method according to claim 1, characterized in that the operating mode includes a dialogue mode;
Each speech-recognition result and its corresponding first identifier are stored in the robot; third voices and the second identifiers corresponding to the third voices are also stored in the robot; the robot includes a voice bank; there is a correspondence between the first identifiers and the second identifiers;
The dialogue mode includes:
Step A1: the robot analyzes a collected second voice uttered by the user and obtains the speech-recognition result of the second voice;
Step A2: it is judged whether the voice bank contains a second identifier corresponding to the first identifier that corresponds to the speech-recognition result of the second voice:
Step A3: if the robot can query, in the voice bank, the second identifier corresponding to the first identifier of the speech-recognition result of the second voice, the robot plays the third voice corresponding to that second identifier;
If the robot does not find, in the voice bank, the second identifier corresponding to the first identifier of the speech-recognition result of the second voice, the robot executes a preset operation.
5. The method according to claim 4, characterized in that:
The robot executing the preset operation comprises: prompting the user to utter the second voice again;
or
The robot executing the preset operation comprises:
A voice database is stored in the robot; the voice database contains k indices for each of p different voices, forming a matrix Y,
where b_is is the value of the s-th index of the i-th voice in the voice database, i = 1, 2, 3, ..., p and s = 1, 2, 3, ..., k. The same k indices are extracted from the second voice uttered by the user to form a vector X; the vector X is added as the first row of the matrix Y, giving a new matrix Z of p+1 rows and k columns,
where the value of the i-th column of Z is Z_i = (z_1i, z_2i, z_3i, ..., z_(p+1)i)'. Each column is standardized with the following formula:
zz_is = z_is / max(Z_s)
where max(Z_s) is the maximum of the s-th column of matrix Z, z_is is the value in the i-th row and s-th column of matrix Z, and zz_is is the standardized value. After all columns of Z are standardized, a new matrix ZZ is formed, and the correlation of each of rows 2 through p+1 of ZZ with the first row is then judged, obtained using the following formula:
where p_i is the degree of correlation between the second voice uttered by the user and the i-th voice in the voice database, ZZ_1 is the first row of the new matrix ZZ formed after all columns of Z are processed by zz_is = z_is / max(Z_s), ZZ_(i+1) is the (i+1)-th row of the matrix ZZ, zz_1t is the value of the t-th index of the first row of the new matrix ZZ, E(X) is the expected value of X, i = 1, 2, 3, ..., p, and s = 1, 2, 3, ..., k. Finally the largest value among the p_i is selected, the voice in the voice database corresponding to that maximum value is extracted, the speech-recognition result corresponding to the extracted voice is obtained and taken as the speech-recognition result of the second voice, and steps A2-A3 are executed again.
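The selection procedure of claim 5 can be sketched as follows: stack the user's index vector on the database matrix, divide each column by its maximum, then pick the stored voice whose normalized row correlates best with the first row. The exact correlation formula is not given in the text, so an ordinary Pearson correlation is assumed here; the data values are made up for illustration:

```python
def column_max_normalize(rows):
    """Divide every column by its maximum: zz_is = z_is / max(Z_s)."""
    maxes = [max(col) for col in zip(*rows)]
    return [[v / m for v, m in zip(row, maxes)] for row in rows]

def pearson(a, b):
    """Plain Pearson correlation (an assumed form of the claim's formula)."""
    n = len(a)
    ma, mb = sum(a) / n, sum(b) / n
    cov = sum((x - ma) * (y - mb) for x, y in zip(a, b)) / n
    sa = (sum((x - ma) ** 2 for x in a) / n) ** 0.5
    sb = (sum((y - mb) ** 2 for y in b) / n) ** 0.5
    return cov / (sa * sb)

def best_match(x, y_rows):
    """Index of the database voice whose row correlates best with x."""
    zz = column_max_normalize([x] + y_rows)   # x becomes row 1 of Z
    scores = [pearson(zz[0], row) for row in zz[1:]]
    return scores.index(max(scores))

X = [3.0, 1.0, 2.0]                      # k = 3 indices of the second voice
Y = [[6.1, 2.0, 4.2], [1.0, 5.0, 0.5]]   # p = 2 stored voices
print(best_match(X, Y))                  # the first stored voice matches best
```

The index of the winning row then leads to the stored voice and its speech-recognition result, after which steps A2-A3 are re-run as the claim describes.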
6. The method according to claim 1, characterized in that the operating mode includes a questioning mode; the questioning mode includes:
The robot collects audio and image information within its preset distance range and analyzes the audio and image information to obtain a question topic;
The robot queries the voice bank, according to the question topic, for a fourth voice corresponding to the question topic;
If the robot finds the fourth voice, the robot plays the fourth voice;
If the robot does not find the fourth voice, the robot analyzes the user's questioning tendency from the second voice, queries the voice bank for a fifth voice corresponding to that questioning tendency, and plays the fifth voice;
If the robot does not find the fifth voice either, the robot randomly plays a sixth voice pre-stored in the voice bank;
The user's response to the fourth, fifth, or sixth voice is defined as a seventh voice;
The robot analyzes the collected seventh voice and evaluates the seventh voice according to the analysis result.
7. The method according to claim 2, characterized in that the robot comparing the parameter features of the second voice with the corresponding parameter features of the first voice includes a process of comparing the pronunciation of the second voice with the pronunciation of the first voice, comprising:
The robot obtains the pronunciation features of the second voice;
The robot obtains, from the pronunciation features of the first voice and the second voice, the target information corresponding to the pronunciation features of the second voice;
The robot searches the voice bank for the target voice corresponding to the target information and obtains the pronunciation-problem set corresponding to the target voice;
The robot identifies the pronunciation problems of the second voice according to the pronunciation-problem set;
The robot scores each of the pronunciation problems in the set separately and outputs the pronunciation problems of the second voice by voice;
and/or
The robot comparing the parameter features of the second voice with the corresponding parameter features of the first voice, and evaluating the second voice according to the comparison result, comprises:
The pronunciation problems include front/back nasal-sound problems, flat/retroflex tongue problems, and tone problems, wherein:
By analyzing the second voice, the robot separately obtains a front/back nasal-sound score, a flat/retroflex tongue score, and a tone score for the second voice;
If one or more of the front/back nasal-sound score, the flat/retroflex tongue score, and the tone score is below the preset standard score, the robot prompts the user to utter the second voice again;
If the front/back nasal-sound score, the flat/retroflex tongue score, and the tone score are all above the preset standard score, the robot outputs that the pronunciation grade of the second voice is excellent.
8. The method according to claim 2, characterized in that the robot comparing the parameter features of the second voice with the corresponding parameter features of the first voice, and evaluating the second voice according to the comparison result, includes a process of comparing the intonation of the second voice with the intonation of the first voice, comprising:
The robot obtains the intonation features of the second voice;
The robot obtains an intonation similarity from the intonation features of the first voice and the second voice;
An intonation similarity between 90% and 100% is defined as high;
An intonation similarity between 70% and 90% is defined as medium;
An intonation similarity below 70% is defined as low;
The robot judges the intonation grade of the second voice to be high, medium, or low according to the intonation similarity;
When the intonation of the second voice is judged to be low, the robot prompts the user to utter the second voice again;
and/or
The robot comparing the parameter features of the second voice with the corresponding parameter features of the first voice, and evaluating the second voice according to the comparison result, includes a process of comparing the speech rate of the second voice with the speech rate of the first voice, comprising:
The robot detects change points in the second voice and divides the second voice into multiple voice segments according to the change points;
The robot extracts an energy envelope from the voice segments, determines the number of syllables by finding the local maxima of the energy envelope, and thereby determines the speech rate of the second voice;
The speech rate of the first voice serves as the standard rate;
If the speech rate of the second voice lies within the preset range around the standard rate, the robot judges the speech-rate grade of the second voice to be standard and outputs the judgment result;
If the speech rate of the second voice lies outside the preset range around the standard rate, the robot prompts the user to utter the second voice again.
9. The method according to any one of claims 1-7, characterized in that:
The robot further includes a display screen that can show the pronunciation problems, pronunciation grade, intonation grade, and speech-rate grade of the second voice, or the prompt asking the user to utter the second voice again.
10. The method according to claim 6, characterized in that:
The robot further includes a miniature camera device for collecting image information within the preset range of the robot and analyzing the image information to determine a first question topic;
The robot further includes an audio collection device for collecting audio within the preset range of the robot and analyzing the audio to determine a second question topic;
The robot determines the question topic from the first question topic, from the second question topic, or from a combination of the first and second question topics.
CN201910337225.9A 2019-04-25 2019-04-25 Voice interaction method based on robot Active CN110085226B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201910337225.9A CN110085226B (en) 2019-04-25 2019-04-25 Voice interaction method based on robot

Publications (2)

Publication Number Publication Date
CN110085226A true CN110085226A (en) 2019-08-02
CN110085226B CN110085226B (en) 2021-05-11

Family

ID=67416642

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201910337225.9A Active CN110085226B (en) 2019-04-25 2019-04-25 Voice interaction method based on robot

Country Status (1)

Country Link
CN (1) CN110085226B (en)

Cited By (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN110808038A (en) * 2019-11-11 2020-02-18 腾讯科技(深圳)有限公司 Mandarin assessment method, device, equipment and storage medium
CN110808038B (en) * 2019-11-11 2024-05-31 腾讯科技(深圳)有限公司 Mandarin evaluating method, device, equipment and storage medium

Citations (8)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN101145283A (en) * 2006-09-12 2008-03-19 董明 Embedded type language teaching machine with pronunciation quality evaluation
CN104992705A (en) * 2015-05-20 2015-10-21 普强信息技术(北京)有限公司 English oral automatic grading method and system
US20160307464A1 (en) * 2009-09-04 2016-10-20 BrainPOP ESL LLC System and method for providing scalable educational content
CN106128457A (en) * 2016-08-29 2016-11-16 昆山邦泰汽车零部件制造有限公司 A kind of control method talking with robot
CN106297801A (en) * 2016-08-16 2017-01-04 北京云知声信息技术有限公司 Method of speech processing and device
CN107564521A (en) * 2017-09-10 2018-01-09 绵阳西真科技有限公司 A kind of guest-meeting robot voice interface method and system
CN206998936U (en) * 2017-05-16 2018-02-13 贵州侃奇智能技术有限公司 A kind of intelligent sound robot
CN108335543A (en) * 2018-03-20 2018-07-27 河南职业技术学院 A kind of English dialogue training learning system

Also Published As

Publication number Publication date
CN110085226B (en) 2021-05-11

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
CB02 Change of applicant information

Address after: 510000 unit on the fourth floor, 1st, 2nd, 3rd floor, west side, 1383-5, Guangzhou Avenue South, Haizhu District, Guangzhou City, Guangdong Province (office only)

Applicant after: GUANGZHOU ZIB ARTIFICIAL INTELLIGENCE TECHNOLOGY Co.,Ltd.

Address before: Room a, unit 1902, 374-2, Beijing Road, Yuexiu District, Guangzhou, Guangdong 510000

Applicant before: GUANGZHOU ZIB ARTIFICIAL INTELLIGENCE TECHNOLOGY Co.,Ltd.

GR01 Patent grant