CN110265019B - Voice recognition method and voice robot system - Google Patents

Voice recognition method and voice robot system

Info

Publication number
CN110265019B
Authority
CN
China
Prior art keywords
module
fields
field
information
pinyin
Prior art date
Legal status
Active
Application number
CN201910595687.0A
Other languages
Chinese (zh)
Other versions
CN110265019A (en)
Inventor
贺君
Current Assignee
ICSOC (Beijing) Communication Technology Co., Ltd.
Original Assignee
Zhongtong Zhixin (Wuhan) Technology Research and Development Co., Ltd.
Priority date
Filing date
Publication date
Application filed by Zhongtong Zhixin (Wuhan) Technology Research and Development Co., Ltd.
Priority: CN201910595687.0A
Publication of CN110265019A
Application granted
Publication of CN110265019B

Classifications

    • G: PHYSICS
    • G10: MUSICAL INSTRUMENTS; ACOUSTICS
    • G10L: SPEECH ANALYSIS OR SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING; SPEECH OR AUDIO CODING OR DECODING
    • G10L 15/00: Speech recognition
    • G10L 15/22: Procedures used during a speech recognition process, e.g. man-machine dialogue
    • G10L 15/26: Speech to text systems

Abstract

The invention provides a voice recognition method and a voice robot system, wherein the method mainly comprises the following steps: S1: collecting voice information of a user; S2: converting the collected information into character information; S3: judging whether the converted character information passes a first grammar check; if so, forming an instruction by taking the converted character information as identification information and performing step S5, otherwise performing step S4; S4: correcting the converted character information, forming an instruction by taking the corrected character information as identification information, and performing step S5; S5: making feedback according to the identification information. In the voice recognition method provided by the invention, the collected information is converted into characters and a grammar check is carried out, and the character information that does not pass the grammar check is then corrected, so that the accuracy of voice recognition is significantly improved.

Description

Voice recognition method and voice robot system
Technical Field
The invention relates to the technical field of voice robots, in particular to a voice recognition method and a voice robot system.
Background
Although existing voice recognition technology is becoming increasingly mature, in some fields the accuracy of voice recognition is still very low and cannot satisfy users' requirements. Moreover, because users have different accents, the same expression can sound very different, which adds further difficulty to voice recognition.
Disclosure of Invention
In order to solve the technical problem, the invention provides a voice recognition method and a voice robot system.
The specific technical scheme of the invention is as follows:
a method for speech recognition, the method mainly includes the following steps:
s1: collecting voice information of a user;
s2: converting the collected information into character information;
s3: judging whether the converted character information passes the first grammar check, if so, taking the converted character information as identification information, and performing step S5, otherwise, performing step S4;
s4: correcting the converted character information, using the corrected character information as identification information, and performing step S5;
s5: feedback is made based on the identification information.
The invention also provides a voice robot system which mainly comprises an information acquisition module, a conversion module, a first grammar checking module, a character information correction module and a feedback module;
the information acquisition module: the voice information acquisition module is used for acquiring voice information;
a conversion module: the system is used for converting the collected information into character information;
the first syntax checking module: the system is used for judging whether the converted character information passes the first grammar check, if so, the converted character information is used as identification information, and an instruction is sent to the feedback module; if not, sending an instruction to the character information correction module;
the character information correction module: the system is used for correcting the converted character information, taking the corrected character information as identification information and sending an instruction to the feedback module;
a feedback module: for executing instructions.
In the voice recognition method provided by the invention, the collected information is converted into characters and a grammar check is carried out, and the character information that does not pass the grammar check is then corrected, so that the accuracy of voice recognition is significantly improved.
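For readers who prefer pseudocode, the S1 to S5 control flow can be sketched roughly as follows (an illustrative Python sketch only, not the claimed implementation; the speech-to-text conversion, grammar check, correction and feedback steps are assumed to be supplied by the caller):

from typing import Callable

def recognize_and_respond(
    audio: bytes,
    speech_to_text: Callable[[bytes], str],
    passes_first_grammar_check: Callable[[str], bool],
    correct_text: Callable[[str], str],
    give_feedback: Callable[[str], str],
) -> str:
    """Rough sketch of steps S1-S5: convert, check, correct if needed, feed back."""
    text = speech_to_text(audio)                   # S1 + S2: collected speech -> character information
    if passes_first_grammar_check(text):           # S3: first grammar check
        identification_info = text
    else:
        identification_info = correct_text(text)   # S4: semantic/grammar correction
    return give_feedback(identification_info)      # S5: make feedback from the identification information

The embodiments below mainly refine the correction step S4.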
Drawings
FIG. 1 is a flowchart of a speech recognition method according to embodiment 1;
FIG. 2 is a flowchart of a speech recognition method according to embodiment 1;
FIG. 3 is a flowchart of a speech recognition method according to embodiment 1;
FIG. 4 is a flowchart of a speech recognition method according to embodiment 2;
FIG. 5 is a flowchart of a speech recognition method according to embodiment 2;
FIG. 6 is a flowchart of a method of speech recognition according to embodiment 2;
FIG. 7 is a block diagram showing the structure of the speech robot according to embodiment 5;
FIG. 8 is a block diagram showing the structure of a text message correction module according to embodiment 5;
FIG. 9 is a block diagram showing the structure of the semantic/syntactic correcting module according to embodiment 5;
FIG. 10 is a block diagram showing the structure of the speech robot according to embodiment 6;
FIG. 11 is a block diagram showing the structure of the semantic/syntactic correcting module according to embodiment 6;
FIG. 12 is a block diagram showing the structure of the Pinyin transform module according to embodiment 6;
FIG. 13 is a binary search tree according to embodiment 4.
Detailed Description
The invention is further described with reference to the following figures and examples, which are provided for the purpose of illustrating the general inventive concept and are not intended to limit the scope of the invention.
Example 1
The invention provides a speech recognition method, as shown in fig. 1, the method comprises the following steps:
s1: collecting voice information of a user;
s2: converting the collected information into character information;
s3: judging whether the converted character information passes the first grammar check, if so, taking the converted character information as identification information, and performing step S5, otherwise, performing step S4;
s4: correcting the converted character information, using the corrected character information as identification information, and performing step S5;
s5: making feedback according to the identification information;
as shown in fig. 2, step S4 further includes the following steps:
s41: dividing the character information into a plurality of fields and judging whether the semantics of each field are clear; if so, setting the field as a semantic clear field and performing step S42, otherwise setting the field as a semantic confusion field and performing step S43;
s42: judging whether the current sorting of the semantic clear fields passes the second grammar check, if so, setting the fields as identification information, and performing step S5, if not, setting the fields as grammar confusion fields, and performing step S43;
s43: performing semantic correction and/or grammar correction on the semantic confusion field or the grammar confusion field, and performing step S41;
as shown in fig. 3, step S43 semantically corrects the semantic confusion field by:
s431: respectively constructing a mandarin language database and a dialect language database;
s432: converting the semantic confusion field into pinyin, searching homophone fields in a mandarin language database, if the homophone fields are found, performing step S433, and if the homophone fields are not found, performing step S434;
s433: all homophonic fields are substituted for the original field and are brought into the converted character information, the number X of homophonic fields which pass the second grammar check is judged, if X is 0, the step S434 is carried out, if X is 1, the homophonic fields conforming to the grammar are substituted for the original field to be used as new character information, and if X is more than 1, the step S435 is carried out;
s434: searching homophonic fields in a dialect corpus, if found, performing step S433, and if not, performing identification through pinyin transformation;
s435: judging the association degree of a plurality of homophonic fields and an application scene, and replacing the original field with the homophonic field with the maximum association degree with the application scene as new character information;
step S43 is to perform syntax rectification on the syntax confusion field by:
s436: the fields are reordered to pass the third syntax check.
For example, a microphone collects the user's voice and the converted text is "how much money one jin? your cross (shizi)". The judgment result of step S3 is that the grammar check is not passed ("cross" is wrong), so step S41 is performed and the converted text is divided into the following fields according to the grammar: "how much money", "one jin", "your" and "cross". The judgment result is that "how much money" and "one jin" are semantic clear fields and "cross" is a semantic confusion field, so step S43 is performed to semantically correct "cross". Step S432 converts the semantic confusion field "cross" into the pinyin "shizi", and homophonic fields such as "real", "lion", "persimmon", "teacher", "stone", "literacy", "formula" and "louse" are found in the mandarin corpus constructed in S431. Step S433 substitutes each homophonic field for the original field "cross" in the converted text and finds that the homophonic fields passing the grammar check are "real", "lion", "persimmon", "teacher", "stone", "formula" and "louse"; since the number X of homophonic fields passing the second grammar check is larger than 1, step S435 is performed and the homophonic field with the highest degree of association with the application scene replaces the original field. If the current application scene is an online fruit shop, "persimmon" replaces "cross" as the identification information, and step S42 is performed again: it is judged whether the current ordering of the fields "how much money", "one jin", "your" and "persimmon" conforms to the grammar; if not, step S436 reorders the fields until they pass the third grammar check, namely "how much money for one jin of your persimmons", and step S42 is repeated; once the ordering conforms to the grammar, "how much money for one jin of your persimmons" is set as the identification information and step S5 is performed to make feedback, including but not limited to "the price of persimmons is 3.5 yuan/jin".
The mandarin corpus and the dialect corpus each store at least the following information: pinyin and the character information corresponding to the pinyin. The grammar checking method is prior art, and the field splitting method, i.e. word segmentation, is also prior art and can be realized with a word segmenter. The speech recognition method provided by this embodiment uses grammar checks to screen the text converted from speech, corrects text content that does not pass the grammar check, clarifies semantically fuzzy fields and can then adjust the grammar on that basis; this two-layer adjustment significantly improves the accuracy of speech recognition.
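To make the S431 to S435 flow concrete, a minimal Python sketch is given below (illustrative only, not the patented implementation; the corpora here are toy dictionaries keyed by pinyin, and sentence_with, passes_second_grammar_check and scene_relevance are assumed callbacks supplied by the surrounding system):

from typing import Callable, Dict, List, Optional

# S431 (toy version): corpora map a pinyin string to its homophonic candidate fields.
MANDARIN_CORPUS: Dict[str, List[str]] = {"shizi": ["persimmon", "lion", "teacher", "stone", "cross"]}
DIALECT_CORPUS: Dict[str, List[str]] = {}

def correct_semantic_confusion_field(
    field_pinyin: str,
    sentence_with: Callable[[str], str],            # substitutes a candidate into the converted text
    passes_second_grammar_check: Callable[[str], bool],
    scene_relevance: Callable[[str], float],        # association degree with the application scene
) -> Optional[str]:
    """S432-S435: look up homophones, keep those passing the grammar check,
    and pick the one most associated with the current application scene."""
    candidates = MANDARIN_CORPUS.get(field_pinyin) or DIALECT_CORPUS.get(field_pinyin) or []
    surviving = [c for c in candidates if passes_second_grammar_check(sentence_with(c))]
    if not surviving:
        return None                                 # X == 0: fall through to pinyin transformation (S434)
    if len(surviving) == 1:
        return surviving[0]                         # X == 1: unique grammatical homophone
    return max(surviving, key=scene_relevance)      # X > 1: S435, highest scene association wins

In the persimmon example above, scene_relevance would rank "persimmon" highest among the homophones of "shizi" for a fruit-shop scene.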
Example 2
The method for speech recognition of the present embodiment differs from embodiment 1 in that, as shown in fig. 4, the method further includes steps S6 and S7;
s6: calculating the waiting time after feedback and comparing it with a waiting time threshold; when the waiting time is greater than or equal to the threshold, carrying out a satisfaction survey on the user, the survey content being whether the execution result of the instruction is consistent with the voice conversation; when the feedback from the terminal is that they are consistent, performing step S7, and when the feedback is that they are inconsistent, performing no processing;
s7: constructing a learning storage library, wherein the learning storage library is used for correcting the converted character information;
as shown in fig. 5, step S430 is further included between step S431 and step S432;
s430: and converting the semantic confusion field or the grammar confusion field into pinyin, searching homophone fields in a learning storage library, if the homophone fields are found, performing step S433, and if the homophone fields are not found, performing step S432.
As shown in fig. 6, the pinyin transformation in step S434 specifically includes:
s4341: replacing part of contents in the pinyin, wherein the replacing contents comprise one or more of "n" and "l", "h" and "f", "z" and "zh", "c" and "ch", "s" and "sh", "eng" and "en", "an" and "ang", "in" and "ong", "un" and "ing";
s4342: replacing the original pinyin content with the transformed pinyin content and searching for homophone fields in the mandarin corpus and/or the dialect corpus and/or the learning repository; if a homophone field is found, performing step S433, and if not, using the character information converted in step S2 as the identification information.
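A minimal sketch of the substitution in steps S4341 and S4342 could look like the following (Python, illustrative only; a real implementation would substitute per syllable on initials and finals rather than on the raw string, which this simplified version does not do):

from typing import List, Set, Tuple

# Confusable initial/final pairs listed in step S4341.
CONFUSABLE_PAIRS: List[Tuple[str, str]] = [
    ("n", "l"), ("h", "f"), ("z", "zh"), ("c", "ch"), ("s", "sh"),
    ("eng", "en"), ("an", "ang"), ("in", "ong"), ("un", "ing"),
]

def pinyin_variants(pinyin: str) -> Set[str]:
    """Generate candidate pinyin strings by swapping each confusable pair in both directions."""
    variants = {pinyin}
    for a, b in CONFUSABLE_PAIRS:
        for source in list(variants):
            variants.add(source.replace(a, b))
            variants.add(source.replace(b, a))
    return variants - {pinyin}

# S4342: each variant is then looked up in the mandarin corpus, dialect corpus or
# learning repository; e.g. the variants of "suiguotan" include "shuiguotang".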
For example, the voice information of a user speaking with a dialect accent is collected and converted into text; the judgment result of step S3 is that the grammar check is not passed, so step S41 is performed and the converted text is divided into fields according to the grammar. The judgment result is that "zege", "suiguotan" and "haoci" among these fields are semantic confusion fields, so step S43 is performed: the semantic confusion fields are converted into the pinyin "zege", "suiguotan" and "haoci", and no homophonic field that passes the grammar check is found in the mandarin corpus or the dialect corpus. The pinyin transformation of step S434 is therefore used to semantically correct the semantic confusion fields: the transformed pinyin is "zhege", "shuiguotang" and "haochi", and the homophonic fields found after the replacement are "this", "fruit candy" and "tasty". Step S433 substitutes these homophonic fields into the converted text and judges that the result passes the grammar check, so "is this fruit candy tasty" is used as the identification information; step S5 is then performed to make specific feedback according to the identification information, the feedback content including but not limited to "it is tasty".
In step S6, the waiting time after feedback is calculated; when the waiting time is greater than or equal to the waiting time threshold (e.g. 1 min), a satisfaction survey is carried out for the user, the survey content being whether the execution result of the instruction is consistent with the voice conversation. When the information fed back by the user through the terminal is that they are consistent, the user's voice information, the pinyin corresponding to the voice and the corresponding identification information are updated into the learning repository; when such voice information is collected again, homophonic fields are first searched in the learning repository and the identification information is then found by the method of step S433.
By providing the learning repository, this embodiment gives the speech recognition method learning capability. As the learning repository is continuously enriched, homophonic fields can be searched directly in the learning repository once the user's voice information is collected, so character information matching the same accent can be found quickly without the grammar and semantic correction processes and without searching the mandarin and dialect corpora for homophonic fields, which markedly improves the speed of speech recognition.
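As a rough illustration, the learning repository of steps S6, S7 and S430 can be thought of as a per-user cache from pinyin to previously confirmed identification information (a simplified Python sketch under that assumption; the real repository may of course store richer records):

from typing import Dict, Optional

class LearningRepository:
    """Simplified sketch: caches pinyin already resolved for this user (S7),
    consulted before the mandarin and dialect corpora (S430)."""

    def __init__(self) -> None:
        self._store: Dict[str, str] = {}

    def lookup(self, pinyin: str) -> Optional[str]:
        # S430: if a homophonic field is cached, skip corpus search and correction.
        return self._store.get(pinyin)

    def confirm(self, pinyin: str, identification_info: str) -> None:
        # S6/S7: only called when the satisfaction survey reports that the executed
        # instruction was consistent with the voice conversation.
        self._store[pinyin] = identification_info

For the dialect example above, repo = LearningRepository() followed by repo.confirm("suiguotan", "fruit candy") would let the same accented field be resolved directly the next time it is heard.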
Example 3
The speech recognition method of this embodiment differs from embodiment 1 in that step S2, after converting the collected information into text information, further includes step S10: dividing the character information into a plurality of fields and converting each field into second pinyin information; performing fuzzy matching between the professional lexicons and each piece of second pinyin information, and counting the matches of the n professional lexicons with the second pinyin information; calculating how many professional lexicons have the highest matching rate; when the matching rate of all the professional lexicons is 0, performing step S3; when the number of professional lexicons with the highest matching rate is 1, replacing the original field, as identification information, with the field in that lexicon corresponding to the pinyin information identified this time; when the number of professional lexicons with the highest matching rate is more than 1, randomly selecting one of those lexicons and taking its field corresponding to the pinyin information identified this time as the identification information; and then performing step S5;
Step S5, after the feedback is made according to the identification information, further includes step S20: judging the number of feedbacks K1; when K1 equals the feedback threshold Kn, calculating the number of successful matches P for each of the n professional word banks and sorting them as Pn1, Pn2 ... Pnm; from the (K1+n)-th recognition onwards, matching the characters converted from the voice information preferentially in the order of the professional word banks corresponding to Pn1, Pn2 ... Pnm, replacing the original field with the first field that matches as identification information, and performing step S5; if the characters do not match any of the n professional word banks, performing step S3.
The construction of the professional lexicons is prior art and includes, but is not limited to, a computer lexicon, a medical lexicon, a pharmaceutical lexicon and a mechanical lexicon. By providing professional lexicons, the invention can recognize professional terms and improve recognition accuracy. For example, when the collected voice information is wrongly converted into text along the lines of "the cold meaning of the deep-well net", the character information is converted into the second pinyin information "shenjing wangluo de hanyi" and matched against the third pinyin information corresponding to each professional lexicon: "shenjing" matches "nerve" in the medical lexicon, "nerve" in the computer lexicon and "deep well" in the mining lexicon, with no matching field in the other lexicons, while "wangluo" matches "network" in the computer lexicon and has no matching field in the other lexicons. Because only one professional lexicon (the computer lexicon) has the highest matching rate, "nerve" and "network" are used as the substitute fields for the wrongly recognized fields. Meanwhile, by sorting the numbers of successful matches P of the n professional word banks, the professional word banks that best match the user's speech can be screened out, which further improves the recognition speed.
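Step S10's fuzzy matching against the professional lexicons can be sketched as follows (Python, illustrative only; the lexicon contents, the use of difflib.SequenceMatcher and the 0.8 threshold are assumptions made for this example and are not taken from the patent):

from difflib import SequenceMatcher
from typing import Dict, List, Tuple

# Toy professional lexicons mapping pinyin to the preferred professional term.
LEXICONS: Dict[str, Dict[str, str]] = {
    "computer": {"shenjing": "nerve", "wangluo": "network"},
    "medical":  {"shenjing": "nerve"},
    "mining":   {"shenjing": "deep well"},
}

def best_matching_lexicon(field_pinyins: List[str], threshold: float = 0.8) -> Tuple[str, int]:
    """Count fuzzy pinyin matches per professional lexicon and return the best one (step S10)."""
    scores: Dict[str, int] = {}
    for name, lexicon in LEXICONS.items():
        hits = 0
        for p in field_pinyins:
            if any(SequenceMatcher(None, p, entry).ratio() >= threshold for entry in lexicon):
                hits += 1
        scores[name] = hits
    best = max(scores, key=scores.get)  # ties would need the random choice described above
    return best, scores[best]

# best_matching_lexicon(["shenjing", "wangluo", "de", "hanyi"]) -> ("computer", 2)

With these toy entries, the pinyin fields of "shenjing wangluo de hanyi" select the computer lexicon, mirroring the example above.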
Example 4
This embodiment differs from embodiment 1 in that, after the collected information is converted into text information in step S2, the method further includes step S10: dividing the character information into a plurality of fields and converting each field into second pinyin information; performing fuzzy matching between the professional lexicons and each piece of second pinyin information, and counting the matches of the n professional lexicons with the second pinyin information; calculating how many professional lexicons have the highest matching rate; when the matching rate of all the professional lexicons is 0, performing step S3; when the number of professional lexicons with the highest matching rate is 1, replacing the original field with the field in that lexicon corresponding to the pinyin identification information; when the number of professional lexicons with the highest matching rate is more than 1, randomly selecting one of those lexicons, using its field corresponding to the pinyin identification information to replace the original field as identification information, extracting the association sub-libraries from all the professional lexicons with the highest matching rate and placing them in a temporary lexicon; and then performing step S5;
Step S5, after the feedback is made according to the identification information, further includes step S20: judging the number of feedbacks K2; when K2 equals the feedback threshold Kn, matching the characters converted from the voice information preferentially against the temporary lexicon; when the number of successful matches is 0, performing step S3; when the number of successful matches is 1, replacing the original field with the matched field as identification information and performing step S5; and when the number of successful matches is more than 1, randomly selecting one field to replace the original field as identification information and performing step S5.
The construction of the professional lexicons is prior art and includes, but is not limited to, a computer lexicon, a medical lexicon, a pharmaceutical lexicon and a mechanical lexicon. The "association sub-library" of this embodiment, all of which (for the professional lexicons with the highest matching rate) is placed in the temporary lexicon, is defined as follows: the association sub-library consists of the keywords located upstream and downstream of the matched field in the database. For example, in the binary search tree data structure of fig. 13, if the matched field is 8, then all of the upstream and downstream keywords 1, 3, 12 and 13 of 8 are placed in the temporary professional lexicon; other forms of database are of course also applicable. By providing the professional lexicons, after several rounds of robot dialogue the keywords related to the conversation can be collected to form the temporary lexicon, and the matching rate between the temporary lexicon and the user's voice information is significantly higher than that of the other professional lexicons; therefore, after several rounds of dialogue the pinyin information converted from the voice information is matched preferentially against the temporary lexicon, which markedly improves the matching efficiency.
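The temporary lexicon construction can be illustrated with a small sketch (Python; the tree layout below is an assumed reconstruction of Fig. 13, with 8 at the root, 3 and 12 as its children and 1 and 13 as leaves, and the association sub-library is taken here to be all keywords upstream and downstream of the matched node):

from dataclasses import dataclass, field
from typing import List, Optional

@dataclass
class Node:
    key: int
    children: List["Node"] = field(default_factory=list)
    parent: Optional["Node"] = None

    def add(self, child: "Node") -> "Node":
        child.parent = self
        self.children.append(child)
        return child

def association_sublibrary(node: Node) -> List[int]:
    """Keywords upstream (ancestors) and downstream (descendants) of the matched field."""
    keys: List[int] = []
    ancestor = node.parent
    while ancestor is not None:          # upstream keywords
        keys.append(ancestor.key)
        ancestor = ancestor.parent
    stack = list(node.children)
    while stack:                         # downstream keywords
        child = stack.pop()
        keys.append(child.key)
        stack.extend(child.children)
    return sorted(keys)

# Assumed reconstruction of the Fig. 13 tree:
root = Node(8)
n3, n12 = root.add(Node(3)), root.add(Node(12))
n3.add(Node(1))
n12.add(Node(13))
# association_sublibrary(root) -> [1, 3, 12, 13], matching the example for field 8.

Any database form that exposes an upstream/downstream relation between keywords would serve equally well here, as the embodiment notes.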
Example 5
The embodiment provides a voice robot system, as shown in fig. 7, the voice robot system includes an information acquisition module 1, a conversion module 2, a first grammar checking module 3, a text information correction module 4, and a feedback module 5;
the information acquisition module 1: the voice information acquisition module is used for acquiring voice information;
the conversion module 2: the system is used for converting the collected information into character information;
the first syntax checking module 3: the system is used for judging whether the converted character information passes the first grammar check; if so, the converted character information is used as identification information and an instruction is sent to the feedback module 5; if not, an instruction is sent to the character information correction module 4;
the character information correction module 4: the system is used for correcting the converted character information, taking the corrected character information as identification information and sending an instruction to the feedback module 5;
the feedback module 5: for executing instructions;
as shown in fig. 8, the text information correction module 4 further includes a semantic clearness determination module 41, a second syntax checking module 42, and a semantic/syntax correction module 43;
the semantic clarity judging module 41: the system is used for dividing the text information into a plurality of fields, judging whether the semanteme of each field is clear or not, if so, setting the field as a semanteme clear field and sending an instruction to the second grammar checking module 42, and if not, setting the field as a semanteme chaotic field and sending an instruction to the semanteme/grammar correcting module 43;
the second syntax checking module 42: the semantic/grammar correction module 43 is used for judging whether the current sorting of the semantic clear fields passes the second grammar check, if so, setting the fields as identification information and sending instructions to the feedback module 5, and if not, setting the fields as grammar confusion fields and sending instructions to the semantic/grammar correction module 43;
semantic/grammar rectification module 43: the semantic and/or syntactic correcting module is used for performing semantic correction and/or syntactic correction on the semantic chaotic field or the syntactic chaotic field and sending an instruction to the semantic clear judging module 41;
as shown in fig. 9, the semantic/grammar correcting module 43 further includes a corpus constructing module 431, a pinyin converting module 432, a first homophonic field judging module 433, a second homophonic field judging module 434, a scene association degree judging module 435, a grammar correcting module 436 and a pinyin transforming module 437;
corpus construction module 431: the method is used for constructing a mandarin language database and a dialect language database;
pinyin conversion module 432: the system is used for converting the semantic confusion field into pinyin, searching homophone fields in a mandarin language database, if the homophone fields are found, sending an instruction to the first homophone field judging module 433, and if the homophone fields are not found, sending an instruction to the second homophone field judging module 434;
the first homophonic field judging module 433: the system is used for bringing all homophonic fields into the converted character information by replacing original fields, judging the number X of homophonic fields which pass the second grammar check, if X is 0, sending an instruction to a second homophonic field judgment module 434, if X is 1, replacing the original fields with homophonic fields conforming to the grammar to be used as new character information, and if X is more than 1, sending an instruction to a scene association degree judgment module 435;
the second homophonic field determining module 434: searching homophonic fields in a dialect corpus, if found, sending an instruction to a first homophonic field judgment module 433, and if not found, sending an instruction to a pinyin transformation module 437;
scene association degree judging module 435: judging the association degree of a plurality of homophonic fields and an application scene, and replacing the original field with the homophonic field with the maximum association degree with the application scene as new character information;
grammar rectification module 436: reordering the fields until the fields pass through the third syntax check;
the pinyin transformation module 437: and identifying through pinyin transformation.
By providing the above modules, the voice robot of this embodiment screens the text converted from speech with grammar checks, corrects text content that does not pass the check, clarifies semantically fuzzy fields and can then adjust the grammar on that basis; this two-layer adjustment significantly improves the accuracy of voice recognition.
Example 6
The present embodiment provides a voice robot system, which is different from embodiment 5 in that, as shown in fig. 10, the voice robot system further includes a satisfaction survey module 6 and a learning repository construction module 7;
satisfaction survey module 6: the system is used for calculating the waiting time after feedback, comparing the waiting time with the waiting time threshold, carrying out satisfaction investigation on a user when the waiting time is larger than or equal to the waiting time threshold, investigating whether the execution result of the instruction is consistent with the voice conversation or not, sending the instruction to the learning repository construction module 7 when the feedback of the terminal is consistent, and not processing when the feedback of the terminal is inconsistent;
learning repository construction module 7: the system is used for constructing a learning storage library which is used for correcting the converted character information;
as shown in fig. 11, the semantic/grammar correcting module 43 further includes a third homophonic field judging module 438;
third homophonic field determination module 438: the module is used for converting the semantic confusion field or the grammar confusion field into pinyin, searching homophone fields in a learning storage library, if the homophone fields are found, sending an instruction to the first homophone field judging module 433, and if the homophone fields are not found, sending an instruction to the pinyin conversion module 432.
As shown in fig. 12, the pinyin transformation module 437 further includes a pinyin content replacement module 4371 and a fourth homophone field determination module 4372;
pinyin content replacement module 4371: the method is used for replacing part of contents in pinyin, and the replacement contents comprise one or more of "n" and "l", "h" and "f", "z" and "zh", "c" and "ch", "s" and "sh", "eng" and "en", "an" and "ang", "in" and "ong", "un" and "ing";
fourth homophonic field determination module 4372: the system is used for replacing the original pinyin content with the transformed pinyin content and searching for homophone fields in the mandarin corpus and/or the dialect corpus and/or the learning repository; if a homophone field is found, an instruction is sent to the first homophonic field judging module 433, and if not, the character information converted by the conversion module 2 is used as the identification information.
By providing the learning repository, the voice robot system of this embodiment has learning capability. As the learning repository is continuously enriched, homophonic fields can be searched directly in the learning repository once the user's voice information is collected, so character information matching the same accent can be found quickly without the grammar and semantic correction processes and without searching the mandarin and dialect corpora for homophonic fields, which markedly improves the speed of voice recognition.

Claims (8)

1. A method for speech recognition, the method mainly comprising the steps of:
s1: collecting voice information of a user;
s2: converting the collected information into character information;
s3: judging whether the converted character information passes the first grammar check, if so, taking the converted character information as identification information, and performing step S5, otherwise, performing step S4;
s4: correcting the converted character information, using the corrected character information as identification information, and performing step S5;
s5: making feedback according to the identification information;
step S4 further includes the steps of:
s41: dividing the character information into a plurality of fields and judging whether the semantics of each field are clear; if so, setting the field as a semantic clear field and performing step S42, otherwise setting the field as a semantic confusion field and performing step S43;
s42: judging whether the current sorting of the semantic clear fields passes the second grammar check, if so, setting the fields as identification information, and performing step S5, if not, setting the fields as grammar confusion fields, and performing step S43;
s43: performing semantic correction and/or grammar correction on the semantic confusion field or the grammar confusion field, and performing step S41;
wherein the step S43 semantically corrects the semantic confusion field by:
s431: respectively constructing a mandarin language database and a dialect language database;
s432: converting the semantic confusion field into pinyin, searching homophone fields in the mandarin corpus, if the homophone fields are found, performing step S433, and if the homophone fields are not found, performing step S434;
s433: all homophonic fields are substituted for the original field and are brought into the converted character information, the number X of homophonic fields which pass the second grammar check is judged, if X is 0, the step S434 is carried out, if X is 1, the homophonic fields conforming to the grammar are substituted for the original field to be used as new character information, and if X is more than 1, the step S435 is carried out;
s434: searching homophonic fields in the dialect corpus, if found, performing step S433, and if not, performing identification through pinyin transformation;
s435: judging the association degree of a plurality of homophonic fields and an application scene, and replacing the original field with the homophonic field with the maximum association degree with the application scene as new character information;
step S43 is to perform syntax rectification on the syntax confusion field by:
s436: the fields are reordered to pass the third syntax check.
2. The method of speech recognition according to claim 1, wherein the method further comprises steps S6 and S7;
s6: calculating the waiting time after feedback and comparing it with a waiting time threshold; when the waiting time is greater than or equal to the threshold, carrying out a satisfaction survey on the user, the survey content being whether the execution result of the instruction is consistent with the voice conversation; when the feedback from the terminal is that they are consistent, performing step S7, and when the feedback is that they are inconsistent, performing no processing;
s7: constructing a learning storage library, wherein the learning storage library is used for correcting the converted character information;
step S430 is also included between step S431 and step S432;
s430: and converting the semantic confusion field or the grammar confusion field into pinyin, searching homophone fields in a learning storage library, if the homophone fields are found, performing step S433, and if the homophone fields are not found, performing step S432.
3. The method of speech recognition according to claim 2, wherein the pinyin transformation of step S434 specifically includes:
s4341: replacing part of contents in the pinyin, wherein the replacing contents comprise one or more of "n" and "l", "h" and "f", "z" and "zh", "c" and "ch", "s" and "sh", "eng" and "en", "an" and "ang", "in" and "ong", "un" and "ing";
s4342: and (3) replacing the original pinyin content with the replaced pinyin content, searching homophone fields in the mandarin corpus and/or the dialect corpus and/or the learning repository, if the homophone fields are found, performing step (S433), and if the homophone fields are not found, using the character information converted in step (S2) as identification information.
4. The method of speech recognition according to claim 1, wherein step S2, after converting the collected information into text information, further comprises step S10: dividing the character information into a plurality of fields and converting each field into second pinyin information; performing fuzzy matching between the professional lexicons and each piece of second pinyin information, and counting the matches of the n professional lexicons with the second pinyin information; calculating how many professional lexicons have the highest matching rate; when the matching rate of all the professional lexicons is 0, performing step S3; when the number of professional lexicons with the highest matching rate is 1, replacing the original field, as identification information, with the field in that lexicon corresponding to the pinyin information identified this time; when the number of professional lexicons with the highest matching rate is more than 1, randomly selecting one of those lexicons and taking its field corresponding to the pinyin information identified this time as the identification information; and performing step S5;
step S5 includes step S20 after the feedback is made according to the identification information: judging the feedback times K, calculating the times P of successful matching of the n professional word banks when the K is equal to the feedback time threshold Kn, and sequencing the times P as follows: pn1 and Pn2 … Pnm, when identifying at the K + n time, matching the characters converted from the voice information preferentially in the order of the professional word database corresponding to Pn1 and Pn2 … Pnm, replacing the original field with the first matched field as the identification information, and performing step S5, if the matching is not achieved, performing step S3.
5. A voice robot system is characterized by mainly comprising an information acquisition module (1), a conversion module (2), a first grammar checking module (3), a character information correction module (4) and a feedback module (5);
the information acquisition module (1): the voice information acquisition module is used for acquiring voice information;
the conversion module (2): the system is used for converting the collected information into character information;
the first syntax checking module (3): the feedback module is used for judging whether the converted character information passes the first grammar check, if so, the converted character information is used as identification information, and an instruction is sent to the feedback module (5); if the character information does not pass the correction module, sending an instruction to the character information correction module (4);
the character information correction module (4): the system is used for correcting the converted character information, taking the corrected character information as identification information and sending an instruction to the feedback module (5);
the feedback module (5): for making feedback based on the identification information;
the character information correction module (4) further comprises a semantic clear judgment module (41), a second grammar checking module (42) and a semantic/grammar correction module (43);
the semantic clarity judging module (41): the system is used for dividing the text information into a plurality of fields, judging whether the semanteme of each field is clear or not, if so, setting the field as a semanteme clear field, and sending an instruction to the second grammar checking module (42), and if not, setting the field as a semanteme chaotic field, and sending an instruction to the semanteme/grammar correcting module (43);
the second syntax checking module (42): the system is used for judging whether the current sorting of the semantic clear fields passes the second grammar check, if so, the fields are set as identification information and an instruction is sent to the feedback module (5), and if not, the fields are set as grammar confusion fields and an instruction is sent to the semantic/grammar correction module (43);
the semantic/syntactic correcting module (43): the semantic confusion module is used for performing semantic correction and/or syntax correction on the semantic confusion field or the syntax confusion field and sending an instruction to the semantic clearness judgment module (41).
6. The voice robot system of claim 5, wherein the semantic/grammar correcting module (43) further comprises a corpus constructing module (431), a pinyin converting module (432), a first homophonic field judging module (433), a second homophonic field judging module (434), a scene association judging module (435), a grammar correcting module (436), and a pinyin transforming module (437);
the corpus construction module (431): the method is used for constructing a mandarin language database and a dialect language database;
the pinyin conversion module (432): the system is used for converting semantic confusion fields into pinyin, searching homophone fields in the mandarin corpus, if the homophone fields are found, sending an instruction to the first homophone field judgment module (433), and if the homophone fields are not found, sending an instruction to the second homophone field judgment module (434);
the first homophonic field judgment module (433): the system is used for bringing all homophonic fields into converted character information by replacing original fields, judging the number X of the homophonic fields checked through the second grammar, if X is 0, sending an instruction to a second homophonic field judgment module (434), if X is 1, replacing the original fields with the homophonic fields conforming to the grammar to be used as new character information, and if X is more than 1, sending an instruction to a scene relevance judgment module (435);
the second homophonic field determination module (434): searching homophone fields in the dialect corpus, if the homophone fields are found, sending an instruction to the first homophone field judgment module (433), and if the homophone fields are not found, sending an instruction to the pinyin transformation module (437);
the scene association degree judging module (435): judging the association degree of a plurality of homophonic fields and an application scene, and replacing the original field with the homophonic field with the maximum association degree with the application scene as new character information;
the grammar rectification module (436): reordering the fields until the fields pass through the third syntax check;
the pinyin transformation module (437): and identifying through pinyin transformation.
7. The voice robot system according to claim 6, further comprising a satisfaction survey module (6) and a learning repository construction module (7);
the satisfaction survey module (6): the system is used for calculating the waiting time after feedback, comparing the waiting time with the waiting time threshold, carrying out satisfaction investigation on a user when the waiting time is larger than or equal to the waiting time threshold, investigating whether the execution result of the instruction is consistent with the voice conversation or not, sending the instruction to the learning repository construction module (7) when the feedback of the terminal is consistent, and not processing when the feedback of the terminal is inconsistent;
the learning repository construction module (7): the system is used for constructing a learning storage library, and the learning storage library is used for correcting the converted character information;
said semantic/syntactic correcting module (43) further comprises a third homophonic field judging module (438);
the third homophonic field determination module (438): the method is used for converting the semantic confusion field or the grammar confusion field into pinyin, searching homophone fields in a learning storage library, if the homophone fields are found, sending an instruction to the first homophone field judging module (433), and if the homophone fields are not found, sending an instruction to the pinyin conversion module (432).
8. The voice robot system of claim 7, wherein the pinyin transformation module (437) further includes a pinyin content replacement module (4371) and a fourth homophonic field judgment module (4372);
the pinyin content replacement module (4371): the pinyin is used for replacing part of contents in the pinyin, and the replacing contents comprise one or more of "n" and "l", "h" and "f", "z" and "zh", "c" and "ch", "s" and "sh", "eng" and "en", "an" and "ang", "in" and "ong", "un" and "ing";
said fourth homophonic field judging module (4372): used for replacing the original pinyin content with the transformed pinyin content and searching for homophone fields in the mandarin corpus and/or the dialect corpus and/or the learning repository; if a homophone field is found, an instruction is sent to the first homophonic field judgment module (433), and if not, the character information converted by the conversion module (2) is used as the identification information.
CN201910595687.0A 2019-07-03 2019-07-03 Voice recognition method and voice robot system Active CN110265019B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201910595687.0A CN110265019B (en) 2019-07-03 2019-07-03 Voice recognition method and voice robot system

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201910595687.0A CN110265019B (en) 2019-07-03 2019-07-03 Voice recognition method and voice robot system

Publications (2)

Publication Number Publication Date
CN110265019A CN110265019A (en) 2019-09-20
CN110265019B true CN110265019B (en) 2021-04-06

Family

ID=67924079

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201910595687.0A Active CN110265019B (en) 2019-07-03 2019-07-03 Voice recognition method and voice robot system

Country Status (1)

Country Link
CN (1) CN110265019B (en)

Families Citing this family (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN111063353B (en) * 2019-12-31 2022-11-11 思必驰科技股份有限公司 Client processing method allowing user-defined voice interactive content and user terminal
CN111354343B (en) * 2020-03-09 2024-03-05 北京声智科技有限公司 Voice wake-up model generation method and device and electronic equipment
CN111627447A (en) * 2020-06-01 2020-09-04 上海适享文化传播有限公司 Cloud sharing intelligent voice Ai speaking system

Citations (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN105869642A (en) * 2016-03-25 2016-08-17 海信集团有限公司 Voice text error correction method and device
CN109740447A (en) * 2018-12-14 2019-05-10 深圳壹账通智能科技有限公司 Communication means, equipment and readable storage medium storing program for executing based on artificial intelligence

Family Cites Families (11)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN103000052A (en) * 2011-09-16 2013-03-27 上海先先信息科技有限公司 Man-machine interactive spoken dialogue system and realizing method thereof
KR101364774B1 (en) * 2012-12-07 2014-02-20 포항공과대학교 산학협력단 Method for correction error of speech recognition and apparatus
GB2533370A (en) * 2014-12-18 2016-06-22 Ibm Orthographic error correction using phonetic transcription
KR102396983B1 (en) * 2015-01-02 2022-05-12 삼성전자주식회사 Method for correcting grammar and apparatus thereof
KR102413693B1 (en) * 2015-07-23 2022-06-27 삼성전자주식회사 Speech recognition apparatus and method, Model generation apparatus and method for Speech recognition apparatus
CN105302795B (en) * 2015-11-11 2018-03-20 河海大学 Chinese text check system and method based on the fuzzy pronunciation of Chinese and speech recognition
CN106782547B (en) * 2015-11-23 2020-08-07 芋头科技(杭州)有限公司 Robot semantic recognition system based on voice recognition
CN106205622A (en) * 2016-06-29 2016-12-07 联想(北京)有限公司 Information processing method and electronic equipment
CN107451121A (en) * 2017-08-03 2017-12-08 京东方科技集团股份有限公司 A kind of audio recognition method and its device
CN107729321A (en) * 2017-10-23 2018-02-23 上海百芝龙网络科技有限公司 A kind of method for correcting error of voice identification result
CN109101545A (en) * 2018-06-29 2018-12-28 北京百度网讯科技有限公司 Natural language processing method, apparatus, equipment and medium based on human-computer interaction

Patent Citations (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN105869642A (en) * 2016-03-25 2016-08-17 海信集团有限公司 Voice text error correction method and device
CN109740447A (en) * 2018-12-14 2019-05-10 深圳壹账通智能科技有限公司 Communication means, equipment and readable storage medium storing program for executing based on artificial intelligence

Also Published As

Publication number Publication date
CN110265019A (en) 2019-09-20


Legal Events

PB01: Publication
SE01: Entry into force of request for substantive examination
GR01: Patent grant
TR01: Transfer of patent right
Effective date of registration: 20230403
Address after: No. 9, Liangli Third Street, East District, Economic Development Zone, Tongzhou District, Beijing 101149-2059
Patentee after: ICSOC (Beijing) Communication Technology Co., Ltd.
Address before: Room R1201, 12/F, Building A, Guanggu New Development International Center, 473 Guanshan Avenue, Donghu New Technology Development Zone, Wuhan, Hubei 430040
Patentee before: Zhongtong Zhixin (Wuhan) Technology Research and Development Co., Ltd.