CN106649278B - Method and system for extending the corpus of a spoken dialogue system - Google Patents

Publication number
CN106649278B
Authority
CN
China
Prior art keywords
corpus
parsing
sentence
user
semantic
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN201611255063.7A
Other languages
Chinese (zh)
Other versions
CN106649278A (en)
Inventor
周进华
崔计平
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Samsung Electronics China R&D Center
Samsung Electronics Co Ltd
Original Assignee
Samsung Electronics China R&D Center
Samsung Electronics Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Samsung Electronics China R&D Center and Samsung Electronics Co Ltd
Priority to CN201611255063.7A
Publication of CN106649278A
Application granted
Publication of CN106649278B
Current legal status: Active
Anticipated expiration

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F40/00 Handling natural language data
    • G06F40/20 Natural language analysis
    • G06F40/279 Recognition of textual entities
    • G06F40/289 Phrasal analysis, e.g. finite state techniques or chunking
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/30 Information retrieval; Database structures therefor; File system structures therefor of unstructured textual data
    • G06F16/36 Creation of semantic tools, e.g. ontology or thesauri

Abstract

This application discloses a method for extending the corpus of a spoken dialogue system, comprising: performing secondary semantic parsing on a sentence that cannot be parsed, to obtain candidate parsing results; if the user selects a candidate from the candidate parsing results, generating a mapping rule between the sentence and the selected candidate, adding the generated rule to a user-specific preprocessing rule base, and storing the selected candidate sentence and its semantic information in the secondary corpus dedicated to that user; and, when parsing a user sentence, using the user-specific preprocessing rule base and the rules generated from the secondary corpus to assist the main corpus in semantic parsing. A corresponding system is also disclosed. The application can improve the robustness of the corpus, reduce the cost of maintaining it, give the spoken dialogue system an error-correction capability, and improve its usability.

Description

Method and system for extending the corpus of a spoken dialogue system
Technical field
This application relates to corpus extension techniques for spoken dialogue systems, and in particular to a method and system for extending the corpus of a spoken dialogue system.
Background
A spoken dialogue system is a computer system that can converse with people by voice. It is one kind of dialogue system; compared with an ordinary text dialogue system, it mainly adds speech recognition and speech synthesis modules.
A dialogue system (dialog system) mainly consists of an input recognizer/decoder, a natural language understanding unit, a dialog manager, task managers, a natural language generation unit, and an output renderer.
The core of a spoken dialogue system is the natural language understanding unit, which usually contains a large text corpus and comprises three main modules: proper noun recognition, part-of-speech tagging, and a semantic parser.
After the text produced by speech recognition enters the natural language understanding unit, it passes through proper noun recognition and part-of-speech tagging and is then analyzed by the semantic parser to obtain semantic information. The dialog manager then determines the semantics, and finally the natural language generation unit responds or the task manager arranges the corresponding task. For Chinese speech recognition, the semantic parser must also perform some processing before semantic analysis, including Chinese word segmentation.
A spoken dialogue system therefore needs a text corpus. Semantic parsing methods are divided into rule-based methods and statistical methods; the present invention is mainly concerned with rule-based semantic parsing. The corpus of a rule-based method mainly comprises a proper noun dictionary, a common dictionary, a word segmentation library, and, most importantly, a rule base.
Limited by current technology, rule-based semantic parsers usually target systems with a narrow domain and a limited vocabulary, on topics such as travel inquiries, ticket booking, and database retrieval. Once the topic is fixed, the corresponding dictionary and rule base can be built specifically for it.
A well-known spoken language parser such as Phoenix from Carnegie Mellon University (CMU) builds a separate rule model for each domain. All semantic networks and the dictionary are defined in the rule model, and each node of a semantic network is either a word or another sub-network. The parser performs partial parsing on the input sentence, matches it against the network paths that correspond to the fixed semantics of the domain, obtains one or more parsing results, and finally selects the best one. In a voice ticket-booking system, for example, a dictionary of all words related to ticket booking is built, along with semantic networks for all possible booking actions. Once a network path that fully or partially matches the user sentence is found, the corresponding booking action is essentially known; the keywords in the matching parts of the sentence are extracted to obtain the parameters the booking action requires, and the system can respond.
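As a rough illustration of this kind of rule-based matching, the sketch below models a single ticket-booking frame as a regular expression with named slots. It is a deliberately simplified stand-in and not Phoenix's actual grammar format; the pattern, the slot names, and the `parse_booking` function are assumptions made for illustration only.

```python
import re

# Hypothetical rule for one "book_ticket" frame; the slot names are illustrative.
BOOK_TICKET = re.compile(
    r"(?:book|buy|reserve)\s+(?:a\s+)?(?P<kind>train|plane)\s+ticket"
    r"(?:\s+to\s+(?P<destination>\w+))?"
)

def parse_booking(sentence):
    """Return the matched frame and its filled slots, or None if the rule does not match."""
    # search() lets the rule match part of the sentence, ignoring surrounding words.
    match = BOOK_TICKET.search(sentence.lower())
    if match is None:
        return None
    slots = {name: value for name, value in match.groupdict().items() if value is not None}
    return {"frame": "book_ticket", "slots": slots}
```

Because the rule matches anywhere in the sentence, words outside the rule (such as "please") do not prevent a match, which loosely mirrors the partial parsing described above.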
When a user's sentence cannot be parsed, or the parsing result does not meet requirements, most existing spoken dialogue systems return a reply such as "I cannot understand what you mean" or another reply with a similar meaning. Meanwhile, user logs are collected in the background, and maintenance personnel analyze them and extend the corpus.
In practice, the biggest problem for the natural language understanding unit is that, because language is complex, the system cannot cover every grammar, so user sentences that are misrecognized or cannot be understood semantically still occur frequently. Dialects, regional speech habits, and individual user habits in particular make building the corpus difficult, and even a very large corpus often cannot parse every user utterance correctly. Therefore, even if the speech recognition system recognizes what the user said, the semantic parser may fail to find a rule for the sentence once it enters the natural language understanding unit, which lowers the overall recognition rate. Developers consequently need to spend considerable time updating and maintaining the corpus to keep the recognition rate high.
Summary of the invention
This application provides a method and system for extending the corpus of a spoken dialogue system, so as to improve the recognition rate of the spoken dialogue system.
The method for extending the corpus of a spoken dialogue system comprises:
performing secondary semantic parsing on a sentence that cannot be parsed, to obtain candidate parsing results;
if the user selects a candidate from the candidate parsing results, generating a mapping rule between the sentence and the selected candidate, adding the generated rule to the user-specific preprocessing rule base, and storing the selected candidate sentence and its semantic information in the secondary corpus dedicated to that user;
when parsing a user sentence, using the user-specific preprocessing rule base and the rules generated from the secondary corpus to assist the main corpus in semantic parsing.
Preferably, using the user-specific preprocessing rule base and the rules generated from the secondary corpus to assist the main corpus in semantic parsing comprises:
if the success rate of semantic parsing based on the main corpus is greater than a set first threshold, first performing semantic parsing based on the main corpus; if parsing succeeds, responding to the user according to the parsing result; if parsing fails, performing semantic parsing based on the secondary corpus;
if the success rate of semantic parsing based on the main corpus is less than a set second threshold, first performing semantic parsing based on the secondary corpus; if parsing based on the secondary corpus fails, performing semantic parsing based on the main corpus.
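The preferred ordering can be sketched as a small function. This is a minimal sketch under stated assumptions: the function name `choose_parse_order` is hypothetical, the default thresholds of 0.6 and 0.5 are borrowed from the example values given in the embodiments, and the behaviour for success rates that fall between the two thresholds is not specified in the text, so the fallback used here is an assumption.

```python
def choose_parse_order(main_success_rate, first_threshold=0.6, second_threshold=0.5):
    """Decide which corpus to try first, given the main corpus's parse success rate."""
    if main_success_rate > first_threshold:
        return ["main", "secondary"]
    if main_success_rate < second_threshold:
        return ["secondary", "main"]
    # Assumption: the text does not cover rates between the two thresholds;
    # this sketch defaults to trying the main corpus first.
    return ["main", "secondary"]
```

Keeping the two thresholds separate leaves room for a band in which the system does not switch order, which can avoid flipping back and forth when the success rate hovers around a single cut-off.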
The system for extending the corpus of a spoken dialogue system comprises a semantic parsing module, a front-end interaction module, and a secondary corpus management module, wherein:
the semantic parsing module is configured, when parsing a user sentence, to use the user-specific preprocessing rule base and the rules generated from the secondary corpus to assist the main corpus in semantic parsing, and, when parsing fails, to perform secondary semantic parsing, generate candidate parsing results, and pass them to the front-end interaction module;
the front-end interaction module is configured to feed the candidate parsing results generated by the semantic parsing module back to the user for interaction, and includes the interactive interface and the interaction process;
the secondary corpus management module is configured to create the user-specific secondary corpus and preprocessing rule base on top of the main corpus, write the candidate sentence and corresponding semantic information received from the front-end interaction module to the secondary corpus, write the mapping rule from the user sentence to the candidate sentence to the preprocessing rule base, and generate semantic parsing rules from the sentences in the secondary corpus.
Preferably, the semantic parsing module performs semantic parsing as follows:
if the success rate of semantic parsing based on the main corpus is greater than a set first threshold, it first performs semantic parsing based on the main corpus; if parsing succeeds, it responds to the user according to the parsing result; if parsing fails, it performs semantic parsing based on the secondary corpus;
if the success rate of semantic parsing based on the main corpus is less than a set second threshold, it first performs semantic parsing based on the secondary corpus; if parsing based on the secondary corpus fails, it performs semantic parsing based on the main corpus.
As the above technical solution shows, the application extends the corpus by having users add to a secondary corpus. Specifically, the invention provides the following benefits:
1. It addresses the corpus extension problem directly and improves the robustness of the corpus.
The rules in a general corpus can parse most sentences. However, the corpus needs to be extended because of new sentences and new words, unintelligible words or sentences produced by speech recognition errors, dialects, and individual language habits. This method extends the corpus through user selection, achieving dynamic extension for new words and sentences, dialects, recognition errors, and individual user habits, and thereby improving the robustness of the corpus.
2. It reduces the cost of maintaining the corpus.
A secondary corpus is added, with each user having exactly one secondary corpus, and the main corpus need not be changed. The secondary corpus assists the main corpus in parsing, and the corpus is extended through secondary semantic parsing and interaction with the user rather than by maintenance personnel in the traditional way, which lowers maintenance cost.
3. It provides error correction for the spoken dialogue system and improves its usability.
The invention presents sentences that cannot be parsed to the user, so the user can see why what they said could not be parsed. On the one hand, the user's sentence, or a sentence modified from it, can be added to the corpus; on the other hand, the user can discard the result and enter the input again, which improves the usability of the spoken dialogue system.
Brief description of the drawings
Fig. 1 is a system module diagram of the present invention for extending the corpus of a spoken dialogue system;
Fig. 2 is a flow chart of the method of the present invention for extending the corpus of a spoken dialogue system;
Fig. 3 is a flow chart of semantic parsing that combines the secondary corpus and the main corpus according to the present invention;
Fig. 4 is a work flow diagram of a spoken dialogue system with the corpus extension function of the present invention.
Detailed description of the embodiments
To make the objects, technical solutions, and advantages of the application clearer, the application is described in further detail below with reference to the drawings and embodiments.
To address the problems of the prior art, the invention discloses a method for extending a corpus outside the original corpus of a spoken language parser. The method collects user sentences that cannot be parsed, parses them again to obtain candidate results that may be correct, and lets the user select the candidate sentence that matches the intended meaning. The selection is stored in a dedicated secondary corpus created for that user (the corpus that the invention adds outside the original corpus). This extends the system's main corpus (that is, the original corpus), makes semantic parsing more robust, and improves the accuracy and coverage of the corpus. When the semantic parsing module fails to parse a sentence, the system collects the user sentence and performs secondary semantic parsing on it: on top of the existing system, it lowers the acceptance criteria and thresholds to filter out a certain number of possibly correct candidate results, which are fed back to the user, for example by displaying them on the front-end page. The front end adds a feedback page that shows the list of candidates, and the user selects the one that matches their intent. If no candidate matches, the user can abandon this input, or modify the best candidate's sentence or semantics so that it matches. If a candidate is selected, a mapping rule is generated between the sentence and the selected candidate, and the rule is added to the user-specific preprocessing rule base; at the same time, the selected candidate sentence and its semantic information are inserted into the secondary corpus. The user-specific preprocessing rule base and secondary corpus do not affect semantic parsing for other users.
When the user logs in for the first time, the user-specific secondary corpus and preprocessing rule base are created. The secondary corpus management module generates semantic rules from the sentences and semantic information that the user has added to the secondary corpus. The secondary corpus is dynamic: it grows as the user adds sentences, and its role is to assist the main corpus in semantic parsing. Before a sentence is parsed against the secondary corpus, it is first matched against the rule antecedents in the preprocessing rule base; if a match is found, the sentence is replaced with that rule's consequent and then parsed.
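The antecedent/consequent rewriting step can be sketched as follows. This is an illustrative sketch only: the function name `apply_preprocessing` and the representation of rules as ordered `(antecedent, consequent)` pairs are assumptions, and exact string equality is used as the matching criterion because the text does not define one.

```python
def apply_preprocessing(sentence, rules):
    """Rewrite the sentence with the consequent of the first rule whose antecedent matches.

    `rules` is an ordered list of (antecedent, consequent) pairs. Exact string
    equality is an assumption; the source does not say how matching is done.
    """
    for antecedent, consequent in rules:
        if sentence == antecedent:
            return consequent
    # No antecedent matched: parse the sentence unchanged.
    return sentence
```

For example, with the rule `("train ticket to our place", "train ticket to Nanjing")` in the rule base, the dialect sentence is rewritten before parsing, while any other sentence passes through untouched.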
The specific embodiments of the method and system of the invention are described below with reference to the drawings.
Fig. 1 is a system module diagram of the present invention for extending the corpus of a spoken dialogue system. The system comprises the following modules:
Semantic parsing module: when parsing a user sentence, uses the user-specific preprocessing rule base and the rules generated from the secondary corpus to assist the main corpus in semantic parsing; when parsing fails, performs secondary semantic parsing, generates candidate parsing results, and passes them to the front-end interaction module;
Front-end interaction module: feeds the candidate parsing results generated by the semantic parsing module back to the user for interaction, and includes the interactive interface and the interaction process;
Secondary corpus management module: creates the user-specific secondary corpus and preprocessing rule base on top of the main corpus, writes the candidate sentence and corresponding semantic information received from the front-end interaction module to the secondary corpus, and writes the mapping rule from the user sentence to the candidate sentence to the preprocessing rule base. It is also responsible for generating semantic parsing rules from the sentences in the secondary corpus.
The three modules are described in detail below.
1. Semantic parsing module
The user first needs to log in to the spoken dialogue system. The user speaks into the voice receiving device of the client, the client passes the user's voice data to the server, and after the server's speech recognition module recognizes the user sentence, the sentence enters the semantic parsing module.
After proper noun recognition, normalization, Chinese word segmentation, and similar processing of the user sentence, the semantic parsing module parses it.
If the user has logged in and a secondary corpus exists, the semantic parsing module calls on the secondary corpus and the preprocessing rule base to assist the main corpus in semantic parsing.
The method of using the secondary corpus to assist the main corpus in semantic parsing is as follows:
If the success rate of semantic parsing based on the main corpus is relatively high, for example greater than or equal to 60%, semantic parsing is first performed based on the main corpus. If it succeeds, the user is answered according to the parsing result; if it fails, semantic parsing is performed based on the secondary corpus.
Before a sentence is parsed based on the secondary corpus, the sentence to be parsed is matched in turn against the rule antecedents in the preprocessing rule base. If one matches, the sentence is replaced with the rule's consequent and then parsed based on the secondary corpus; if no antecedent matches, the sentence is parsed directly based on the secondary corpus.
If parsing based on the secondary corpus succeeds, the user is answered according to the parsing result; if it fails, secondary parsing is performed on the sentence, that is, parsing under looser constraints.
If the success rate of semantic parsing based on the main corpus is not high, for example lower than 50%, semantic parsing is first performed based on the secondary corpus. If that fails, semantic parsing is performed based on the main corpus. If parsing based on the secondary corpus succeeds directly, the user is answered according to the parsing result and no parsing based on the main corpus is needed. If parsing fails on both the secondary corpus and the main corpus, secondary parsing is performed on the user sentence, that is, parsing under looser constraints. If parsing succeeds, processing passes to the other subsystems; if parsing fails, the user sentence is kept in memory rather than released, and secondary semantic parsing is performed.
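The cascade described above can be sketched as a single function that tries the corpora in a given order and reports when both fail. All names here are hypothetical: `order` is the sequence in which the corpora are tried (for example `["main", "secondary"]` when the main corpus's success rate is high), `parsers` maps each corpus name to a callable that returns semantics or `None`, and exact string matching of rule antecedents is an assumption.

```python
def parse_with_fallback(sentence, order, parsers, rules):
    """Try each corpus in `order`; return (semantics, corpus_name) or (None, None).

    parsers: dict mapping "main" / "secondary" to a callable returning semantics or None.
    rules: ordered (antecedent, consequent) pairs from the user's preprocessing rule base.
    """
    for name in order:
        text = sentence
        if name == "secondary":
            # Before parsing against the secondary corpus, rewrite the sentence
            # with the consequent of the first matching rule, if there is one.
            text = next((cons for ante, cons in rules if ante == sentence), sentence)
        semantics = parsers[name](text)
        if semantics is not None:
            return semantics, name
    # Both corpora failed: the caller should keep the sentence and run
    # secondary (relaxed) semantic parsing on it.
    return None, None
```

Returning the name of the corpus that succeeded is a design choice of this sketch; it makes it easy to track the main corpus's success rate, which the ordering decision depends on.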
Secondary semantic parsing uses the existing parsing method but relaxes the conditions for a successful parse and lowers the threshold, yielding one or more possible candidate results. Take the sentence "train ticket to our place", and suppose the system can recognize the speech but cannot parse the dialect word for "our". Under normal conditions the first parse of this sentence returns unknown semantics. Secondary semantic parsing uses looser constraints, for example allowing partial matches, and finds the relatively best solution for the sentence, "train ticket to that place", which matches the semantics "find train ticket"; this becomes a candidate result. Because the search target is unclear, however, handling these semantics would list the train tickets for all railway stations as the response. Finally, the semantic parsing module on the server passes the candidate parsing results to the front-end interaction module. If the user is from Nanjing, the user selects "train ticket to Nanjing". The secondary corpus management module then inserts the candidate sentence "train ticket to Nanjing" and its semantic information, for example "find train ticket", into the secondary corpus, and inserts the mapping rule "train ticket to our place -> train ticket to Nanjing" into the preprocessing rule base.
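One way to picture "the same parsing method with a lower threshold" is a scorer that is run twice with different cut-offs. The sketch below is an assumption-laden toy, not the patent's parser: the `RULES` table, the token-coverage score, and the threshold values are all invented for illustration, and the unknown dialect word is represented by the placeholder token `yonder`.

```python
# Hypothetical rule table: semantics -> words that the rule can account for.
RULES = {
    "find_train_ticket": {"train", "ticket", "to"},
    "find_flight": {"flight", "ticket", "to"},
}

def coverage(tokens, vocabulary):
    """Fraction of the sentence's tokens that the rule's vocabulary accounts for."""
    return sum(tok in vocabulary for tok in tokens) / len(tokens)

def parse(sentence, threshold):
    """Return (semantics, score) pairs with score >= threshold, best first."""
    tokens = sentence.lower().split()
    if not tokens:
        return []
    scored = [(sem, coverage(tokens, vocab)) for sem, vocab in RULES.items()]
    hits = [item for item in scored if item[1] >= threshold]
    return sorted(hits, key=lambda item: item[1], reverse=True)

# First pass: strict, every token must be covered, so the unknown word blocks a parse.
strict = parse("train ticket to yonder", threshold=1.0)
# Second pass: relaxed, partial matches are allowed and become candidates.
candidates = parse("train ticket to yonder", threshold=0.7)
```

Here the strict pass returns no result, while the relaxed pass returns `find_train_ticket` as a candidate to show to the user; lowering the threshold further would admit weaker candidates as well.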
2. Front-end interaction module
The one or more candidate parsing results obtained by secondary semantic parsing are sent to the front-end interaction module, which presents them to the user in some form, such as a list, for selection. The user can also see the text recognized from their own utterance, together with the candidate sentences and their semantic information. If the user chooses to abandon, the parsing result is not saved. If no candidate matches the user's intent, the user can modify the sentence and/or the semantic information to match it. If the user selects one of the semantics, such as "find train ticket" in the example above, the sentence "train ticket to our place" is associated with the semantics "find train ticket". The candidate sentence "train ticket to Nanjing" and its semantic information are then uploaded to the secondary corpus management module on the server for subsequent processing.
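The three possible user reactions (select, modify, abandon) can be sketched as one function that turns the reaction into the data sent on to the secondary corpus management module. The function name `handle_feedback`, its parameters, and the shape of the returned dictionary are assumptions made for illustration; the example strings are an English rendering of the train-ticket example.

```python
def handle_feedback(user_sentence, candidates, choice, edited=None):
    """Turn the user's reaction to the candidate list into corpus updates.

    candidates: list of (candidate_sentence, semantics) pairs shown to the user.
    choice: index of the selected candidate, or None if the user abandons this input.
    edited: optional (sentence, semantics) pair that replaces the selected candidate.
    Returns None on abandon, otherwise a dict with the mapping rule and corpus entry.
    """
    if choice is None:
        return None  # Nothing is saved when the user gives up.
    sentence, semantics = edited if edited is not None else candidates[choice]
    return {
        "mapping_rule": (user_sentence, sentence),  # goes to the preprocessing rule base
        "corpus_entry": (sentence, semantics),      # goes to the secondary corpus
    }
```

Selecting the candidate `("train ticket to Nanjing", "find train ticket")` for the spoken sentence "train ticket to our place" would yield both the mapping rule and the corpus entry in one step.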
3. Secondary corpus management module
After the user logs in to the spoken dialogue system for the first time, the server allocates space to create the user-specific secondary corpus and preprocessing rule base. The secondary corpus stores the sentences and semantic information that the user adds on the interaction page, and the preprocessing rule base stores the mapping rules from user sentences to the candidate sentences the user selected. The module is also responsible for generating semantic parsing rules from the sentences in the secondary corpus; the specific method for generating semantic rules, and the form the rules take, depend on the semantic parsing method used.
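A per-user store of this kind might look like the sketch below. The class names `UserStore` and `SecondaryCorpusManager`, the in-memory dictionary, and the tuple layouts are all assumptions; a real server would presumably persist this data, and rule generation from the stored sentences is left out because it depends on the parsing method in use.

```python
from dataclasses import dataclass, field

@dataclass
class UserStore:
    """Data kept for one user; never shared with other users."""
    secondary_corpus: list = field(default_factory=list)     # (sentence, semantics)
    preprocessing_rules: list = field(default_factory=list)  # (antecedent, consequent)

class SecondaryCorpusManager:
    def __init__(self):
        self._stores = {}  # user_id -> UserStore

    def store_for(self, user_id):
        """Create the user's store on first login, then reuse it."""
        return self._stores.setdefault(user_id, UserStore())

    def add(self, user_id, user_sentence, candidate_sentence, semantics):
        """Record the user's selection in both the corpus and the rule base."""
        store = self.store_for(user_id)
        store.secondary_corpus.append((candidate_sentence, semantics))
        store.preprocessing_rules.append((user_sentence, candidate_sentence))
```

Keying every lookup by `user_id` is what keeps one user's additions from affecting how another user's sentences are parsed.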
Fig. 4 is a work flow diagram of a spoken dialogue system with the corpus extension function of the present invention. The specific steps are as follows:
Step 401, the user interaction module receives the user's speech and passes it to the speech recognition module; it presents the candidates output by the secondary parsing module to the user for inspection and selection; if no candidate matches the user's intent, it allows the user to modify any candidate so that it does.
Step 402, the speech receiving module receives the user's utterance and sends the voice data to the speech recognition module.
Step 403, the speech recognition module recognizes the user's utterance as text and passes it to the natural language understanding unit for processing, that is, step 404.
Step 404, the semantic parsing module uses the secondary corpus to assist the main corpus in semantic parsing and obtains the user's intent. Specifically: if the success rate of semantic parsing based on the main corpus is relatively high, for example greater than or equal to 60%, semantic parsing is first performed based on the main corpus; if it succeeds, the user is answered according to the parsing result; if it fails, semantic parsing is performed based on the secondary corpus. Before a sentence is parsed based on the secondary corpus, the sentence to be parsed is matched in turn against the rule antecedents in the preprocessing rule base; if one matches, the sentence is replaced with the rule's consequent and then parsed based on the secondary corpus; if no antecedent matches, the sentence is parsed directly based on the secondary corpus. If parsing based on the secondary corpus succeeds, the user is answered according to the parsing result; if it fails, secondary parsing is performed on the sentence, that is, parsing under looser constraints. If the success rate of semantic parsing based on the main corpus is not high, for example lower than 50%, semantic parsing is first performed based on the secondary corpus; if that fails, semantic parsing is performed based on the main corpus; if parsing based on the secondary corpus succeeds directly, the user is answered according to the parsing result and no parsing based on the main corpus is needed; if parsing fails on both the secondary corpus and the main corpus, secondary parsing is performed on the user sentence, that is, parsing under looser constraints.
Step 405, the main corpus, secondary corpus, and preprocessing rule base used by semantic parsing. Before the semantics of a user sentence are parsed based on the secondary corpus, the sentence to be parsed is matched in turn against the rule antecedents in the preprocessing rule base; if one matches, the sentence is replaced with the rule's consequent and then parsed; if none matches, the sentence is parsed directly.
Step 406, the secondary parsing module applies looser constraints to perform secondary parsing on sentences that the semantic parsing module failed to parse in step 404, and outputs a certain number of candidates for the user to select and confirm whether they match the intent. It need not be a separate module and can be merged with the semantic parsing module of step 404, since the two can use the same parsing method, with only looser constraints for secondary parsing. If, however, the secondary parsing module and the semantic parsing module of step 404 use different methods, it must be a separate module.
Step 407, determine whether semantic parsing in step 404 succeeded; if so, the flow goes to the response module of step 408; otherwise, it goes to step 406 for secondary semantic parsing.
Step 408, the response module gives the user an appropriate response based on the semantic information obtained, such as operating a device or returning query information; if no valid semantic information was obtained, it responds with a message such as "could not understand the user".
Step 409, determine whether the candidates shown to the user and their semantics are correct; if so, the selected candidate is used to update the secondary corpus; otherwise, the user is asked whether to modify a candidate so that it matches the intent.
Step 410, ask the user whether to modify a candidate. The user may also decline, in which case the system responds directly with a "could not understand the user" message.
Step 411, the secondary corpus management module creates the user-specific secondary corpus and preprocessing rule base. The secondary corpus stores the sentences and semantic information that the user adds on the interaction page, and the preprocessing rule base stores the mapping rules from user sentences to the candidate sentences the user selected. The module is also responsible for generating semantic parsing rules from the sentences in the secondary corpus; the specific method for generating semantic rules depends on the semantic parsing method used.
Fig. 2 is a flow chart of the method of the present invention for extending the corpus of a spoken dialogue system. The specific steps are as follows:
Step 201: collect the sentence that semantic parsing has just failed to parse.
Step 202: perform secondary parsing on the sentence, relaxing the parsing conditions and lowering the output threshold, to obtain one or more candidate parsing results.
Step 203: after secondary parsing, present a certain number of candidate parsing results and their corresponding semantics to the user for selection.
Step 204: determine whether any candidate matches the user's intent. If so, that candidate is selected to update the secondary corpus and the preprocessing rule base, and step 205 is executed; otherwise, step 207 is executed.
Step 205: according to the user's selection, generate the mapping rule from the user sentence to the selected candidate sentence and append it to the preprocessing rule base; and add the selected candidate sentence and its semantic information to the current user's secondary corpus.
Step 206: according to the semantics obtained, generate a user response or a system operation, or respond with a message such as "could not understand the user".
Step 207: end this spoken dialogue.
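Steps 201 to 207 can be read as one function applied to a sentence that normal parsing has already failed on. The sketch below assumes that secondary parsing and the user dialogue are supplied as callables; the names `expand_corpus`, `relaxed_parse`, and `ask_user` are hypothetical, and plain Python lists stand in for the secondary corpus and the preprocessing rule base.

```python
def expand_corpus(failed_sentence, relaxed_parse, ask_user, secondary_corpus, rules):
    """Run steps 201-207 for one sentence that normal parsing could not handle.

    relaxed_parse: callable returning a list of (candidate_sentence, semantics) pairs.
    ask_user: callable taking that list and returning the chosen pair, or None.
    Returns the semantics to act on, or None if the user accepted no candidate.
    """
    candidates = relaxed_parse(failed_sentence)             # step 202
    chosen = ask_user(candidates) if candidates else None   # steps 203-204
    if chosen is None:
        return None                                         # step 207
    candidate_sentence, semantics = chosen
    rules.append((failed_sentence, candidate_sentence))     # step 205: mapping rule
    secondary_corpus.append((candidate_sentence, semantics))
    return semantics                                        # step 206: caller responds
```

Passing the parser and the user interaction in as callables keeps the sketch independent of any particular parsing method or front end, both of which the text leaves open.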
Fig. 3 is a flowchart of the present invention in which the secondary corpus and the main corpus are combined to perform semantic parsing. The specific steps are as follows:
Step 301: The semantic meaning analysis module obtains the text of the user's sentence through Chinese word segmentation and named entity recognition.
Step 302: The semantic meaning analysis module parses the user's sentence based on the current user's secondary corpus and preprocessing rule library together with the main corpus.
The secondary corpus assists the main corpus in semantic parsing as follows:
If the success rate of semantic parsing based on the main corpus is relatively high, for example greater than or equal to 60%, semantic parsing is first performed based on the main corpus. If it succeeds, the user is answered according to the parsing result; if it fails, parsing is then performed based on the secondary corpus.
Before a sentence is parsed based on the secondary corpus, the sentence to be parsed is first matched, in order, against the antecedents of the rules in the preprocessing rule library. If a rule matches, the sentence to be parsed is replaced with that rule's consequent, and semantic parsing is then performed based on the secondary corpus; if no rule antecedent matches, semantic parsing is performed directly based on the secondary corpus. If parsing based on the secondary corpus succeeds, the user is answered according to the parsing result; if it fails, secondary parsing is performed on the sentence to be parsed, that is, the sentence is parsed under more relaxed constraints.
If the success rate of semantic parsing based on the main corpus is not high, for example lower than 50%, semantic parsing is first performed based on the secondary corpus. If parsing based on the secondary corpus fails, semantic parsing is then performed based on the main corpus; if parsing based on the secondary corpus succeeds directly, the user is answered according to the parsing result and no further parsing based on the main corpus is needed. If parsing fails based on both the secondary corpus and the main corpus, secondary parsing is performed on the user's sentence, that is, the sentence is parsed under more relaxed constraints.
Step 303: Determine whether parsing succeeded. If so, execute step 304; otherwise, execute step 305.
Step 304: According to the obtained semantics, generate a user response or a system operation, or respond to the user with a message such as "unable to understand the user".
Step 305: Enter the process of extending the spoken dialogue system corpus shown in Fig. 2.
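The combined parsing of step 302 can be illustrated with the following sketch. It makes several assumptions: the two parsers are passed in as plain callables that return `None` on failure, a rule antecedent matches only on exact string equality, and a success rate between the two example thresholds (50% to 60%) is handled by trying the main corpus first, a case the description does not specify. The names `apply_rules` and `combined_parse` are hypothetical.

```python
from typing import Callable, Dict, List, Optional, Tuple

Semantics = Dict[str, str]
Parser = Callable[[str], Optional[Semantics]]  # returns None when parsing fails
Rule = Tuple[str, str]                         # (antecedent, consequent)


def apply_rules(sentence: str, rules: List[Rule]) -> str:
    """Match rule antecedents in order; replace the sentence with the consequent."""
    for antecedent, consequent in rules:
        if sentence == antecedent:  # exact matching is an assumption of this sketch
            return consequent
    return sentence                 # no antecedent matched: parse the sentence as is


def combined_parse(sentence: str,
                   parse_main: Parser,
                   parse_secondary: Parser,
                   rules: List[Rule],
                   main_success_rate: float,
                   first_threshold: float = 0.6,
                   second_threshold: float = 0.5) -> Optional[Semantics]:
    """Try the main and secondary corpora in an order chosen by the success rate."""

    def secondary(s: str) -> Optional[Semantics]:
        # The preprocessing rules are applied only before secondary-corpus parsing.
        return parse_secondary(apply_rules(s, rules))

    if main_success_rate >= first_threshold:
        order = [parse_main, secondary]
    elif main_success_rate < second_threshold:
        order = [secondary, parse_main]
    else:
        order = [parse_main, secondary]  # unspecified in the text; assumed here

    for parser in order:
        result = parser(sentence)
        if result is not None:
            return result   # step 304: respond according to this result
    return None             # step 305: hand over to the extension flow of Fig. 2
```

With dictionary-backed parsers, for example `main.get` and `secondary.get`, a sentence that the user previously mapped to a known candidate is rewritten by its rule and then resolved from the secondary corpus without any change to the main corpus.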
The present invention extends the corpus by having the user add to a secondary corpus. Specifically, the present invention can produce the following beneficial effects:
1. The corpus extension problem is solved in a targeted manner, improving the robustness of the corpus.
The rules in a general corpus can parse most sentences. However, the corpus still needs to be extended because of new sentences and new words, unintelligible words or sentences produced by speech recognition errors, dialects, individual language habits, and similar causes. This method extends the corpus through user selection, achieving dynamic extension for new words and sentences, dialects, recognition errors, and the habits of individual users, and thereby improving the robustness of the corpus.
2. The cost of maintaining the corpus is reduced.
A secondary corpus is added, with each user corresponding to exactly one secondary corpus, so the main corpus does not need to be changed. The secondary corpus assists the main corpus in completing parsing, and the corpus is extended through secondary semantic parsing and interaction with the user rather than by maintenance personnel in the traditional manner, which reduces maintenance cost.
3. An error correction capability is provided for the spoken dialogue system, enhancing its usability.
The present invention presents the sentence that could not be parsed to the user, so that the user can see why the sentence they spoke could not be parsed. On the one hand, the sentence the user spoke, or a sentence modified on its basis, can be added to the corpus; on the other hand, the user can discard this result and re-enter the input, which enhances the usability of the spoken dialogue system.
The foregoing is merely a preferred embodiment of the present application and is not intended to limit the present application. Any modification, equivalent substitution, improvement, and the like made within the spirit and principles of the present application shall fall within the scope of protection of the present application.

Claims (10)

1. A method for extending the corpus of a spoken dialogue system, characterized by comprising:
performing secondary semantic parsing on a sentence that cannot be parsed to obtain candidate parsing results, the sentence that cannot be parsed being a sentence for which parsing, in combination with the user-specific preprocessing rule library, has failed based on both the secondary corpus and the main corpus;
if the user selects a candidate from the candidate parsing results, generating a mapping rule between the sentence and the candidate selected by the user, adding the generated rule to the user-specific preprocessing rule library, and storing the candidate sentence selected by the user and its corresponding semantic information in the secondary corpus exclusive to that user;
when parsing a user's sentence, performing semantic parsing in which, in combination with the user-specific preprocessing rule library, rules generated from the secondary corpus assist the main corpus.
2. The method according to claim 1, characterized in that, when parsing a user's sentence, performing semantic parsing in which, in combination with the user-specific preprocessing rule library, rules generated from the secondary corpus assist the main corpus comprises:
if the success rate of semantic parsing based on the main corpus is greater than a set first threshold, first performing semantic parsing based on the main corpus; if it succeeds, responding to the user according to the parsing result; if it fails, then performing parsing based on the secondary corpus;
if the success rate of semantic parsing based on the main corpus is less than a set second threshold, first performing semantic parsing based on the secondary corpus; if parsing based on the secondary corpus fails, then performing semantic parsing based on the main corpus.
3. The method according to claim 2, characterized in that:
if parsing based on the secondary corpus succeeds, the user is answered according to the parsing result;
if parsing fails based on both the secondary corpus and the main corpus, secondary parsing is performed on the sentence.
4. The method according to claim 2 or 3, characterized in that:
before a sentence is parsed based on the secondary corpus, the sentence to be parsed is first matched, in order, against the antecedents of the rules in the preprocessing rule library; if a rule matches, the sentence to be parsed is replaced with that rule's consequent, and semantic parsing is then performed based on the secondary corpus; if no rule antecedent matches, semantic parsing is performed directly based on the secondary corpus.
5. The method according to claim 3, characterized in that:
performing secondary parsing on the sentence means: parsing the sentence under constraints more relaxed than those of the previous parse.
6. A system for extending the corpus of a spoken dialogue system, characterized by comprising: a semantic meaning analysis module, a front-end interaction module and a secondary corpus database management module, wherein:
the semantic meaning analysis module is configured, when parsing a user's sentence, to perform semantic parsing in which, in combination with the user-specific preprocessing rule library, rules generated from the secondary corpus assist the main corpus; and, when its parsing fails, to perform secondary semantic parsing, generate candidate parsing results, and pass them to the front-end interaction module;
the front-end interaction module is configured to feed the candidate parsing results generated by the semantic meaning analysis module back to the user and to interact with the user, and comprises an interactive interface and interaction processing;
the secondary corpus database management module is configured to create, on the basis of the main corpus, a user-specific secondary corpus and preprocessing rule library; to write the candidate sentence and corresponding semantic information received from the front-end interaction module into the secondary corpus; to write the mapping rule from the user's sentence to the candidate sentence into the preprocessing rule library; and to generate semantic parsing rules based on the sentences in the secondary corpus.
7. The system according to claim 6, characterized in that the semantic meaning analysis module performs semantic parsing in the following manner:
if the success rate of semantic parsing based on the main corpus is greater than a set first threshold, first performing semantic parsing based on the main corpus; if it succeeds, responding to the user according to the parsing result; if it fails, then performing parsing based on the secondary corpus;
if the success rate of semantic parsing based on the main corpus is less than a set second threshold, first performing semantic parsing based on the secondary corpus; if parsing based on the secondary corpus fails, then performing semantic parsing based on the main corpus.
8. The system according to claim 7, characterized in that:
if parsing based on the secondary corpus succeeds, the semantic meaning analysis module responds to the user according to the parsing result;
if parsing fails based on both the secondary corpus and the main corpus, the semantic meaning analysis module performs secondary parsing on the sentence.
9. The system according to claim 7 or 8, characterized in that:
before parsing a sentence based on the secondary corpus, the semantic meaning analysis module first matches the sentence to be parsed, in order, against the antecedents of the rules in the preprocessing rule library; if a rule matches, the sentence to be parsed is replaced with that rule's consequent, and semantic parsing is then performed based on the secondary corpus; if no rule antecedent matches, semantic parsing is performed directly based on the secondary corpus.
10. The system according to claim 8, characterized in that:
performing secondary parsing on the sentence means: parsing the sentence under constraints more relaxed than those of the previous parse.
CN201611255063.7A 2016-12-30 2016-12-30 Extend the method and system of spoken dialogue system corpus Active CN106649278B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201611255063.7A CN106649278B (en) 2016-12-30 2016-12-30 Extend the method and system of spoken dialogue system corpus

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201611255063.7A CN106649278B (en) 2016-12-30 2016-12-30 Extend the method and system of spoken dialogue system corpus

Publications (2)

Publication Number Publication Date
CN106649278A CN106649278A (en) 2017-05-10
CN106649278B true CN106649278B (en) 2019-11-15

Family

ID=58837348

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201611255063.7A Active CN106649278B (en) 2016-12-30 2016-12-30 Extend the method and system of spoken dialogue system corpus

Country Status (1)

Country Link
CN (1) CN106649278B (en)

Families Citing this family (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN107240398B (en) * 2017-07-04 2020-11-17 科大讯飞股份有限公司 Intelligent voice interaction method and device
CN109753976B (en) * 2017-11-01 2021-03-19 中国电信股份有限公司 Corpus labeling device and method
CN110032740A (en) * 2019-04-20 2019-07-19 卢劲松 It customizes individual character semanteme and learns application method
CN110942765B (en) * 2019-11-11 2022-05-27 珠海格力电器股份有限公司 Method, device, server and storage medium for constructing corpus

Citations (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US4994967A (en) * 1988-01-12 1991-02-19 Hitachi, Ltd. Information retrieval system with means for analyzing undefined words in a natural language inquiry
US7412440B2 (en) * 2003-12-05 2008-08-12 International Business Machines Corporation Information search system, information search supporting system, and method and program for information search
CN102663016A (en) * 2012-03-21 2012-09-12 上海汉翔信息技术有限公司 System and method for implementing input information extension on input candidate box on electronic device
CN103246641A (en) * 2013-05-16 2013-08-14 李营 Text semantic information analyzing system and method
CN105786793A (en) * 2015-12-23 2016-07-20 百度在线网络技术(北京)有限公司 Method and device for analyzing semanteme of spoken language text information

Also Published As

Publication number Publication date
CN106649278A (en) 2017-05-10

Similar Documents

Publication Publication Date Title
JP7346609B2 (en) Systems and methods for performing semantic exploration using natural language understanding (NLU) frameworks
US8874443B2 (en) System and method for generating natural language phrases from user utterances in dialog systems
CN107665706B (en) Rapid voice interaction method and system
JP6675463B2 (en) Bidirectional stochastic rewriting and selection of natural language
CN106649278B (en) Extend the method and system of spoken dialogue system corpus
KR101768509B1 (en) On-line voice translation method and device
WO2018034118A1 (en) Dialog system and computer program therefor
US6963831B1 (en) Including statistical NLU models within a statistical parser
CN109642843A (en) Paraphrase is used when receiving language in automating assistant
JP2005084681A (en) Method and system for semantic language modeling and reliability measurement
JP2009193448A (en) Dialog system, method, and program
KR20070102267A (en) Dialog management system, and method of managing dialog using example-based dialog modeling technique
JP2015219583A (en) Topic determination device, utterance device, method, and program
DE602004004310T2 (en) System with combined statistical and rule-based grammar model for speech recognition and understanding
TW201701270A (en) A language interaction method
CN109766556B (en) Corpus restoration method and device
CN104485106B (en) Audio recognition method, speech recognition system and speech recognition apparatus
CN114218375A (en) Dialogue guiding method, device, equipment and medium based on atlas
Kohonen et al. Phonetic typewriter for Finnish and Japanese
US20120096028A1 (en) Information retrieving apparatus, information retrieving method, information retrieving program, and recording medium on which information retrieving program is recorded
Hakkani-Tür et al. Bootstrapping domain detection using query click logs for new domains
Tsiakoulis et al. Dialogue context sensitive HMM-based speech synthesis
JP5158022B2 (en) Dialog processing device, dialog processing method, and dialog processing program
KR102358485B1 (en) Dialogue system by automatic domain classfication
CN109800430B (en) Semantic understanding method and system

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant