CN106649278A - Method and system for extending spoken language dialogue system corpora - Google Patents

Info

Publication number
CN106649278A
Authority
CN
China
Prior art keywords
corpus
parsing
user
sentence
semantic
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Granted
Application number
CN201611255063.7A
Other languages
Chinese (zh)
Other versions
CN106649278B (en)
Inventor
周进华
崔计平
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Samsung Electronics China R&D Center
Samsung Electronics Co Ltd
Original Assignee
Samsung Electronics China R&D Center
Samsung Electronics Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Samsung Electronics China R&D Center, Samsung Electronics Co Ltd filed Critical Samsung Electronics China R&D Center
Priority to CN201611255063.7A priority Critical patent/CN106649278B/en
Publication of CN106649278A publication Critical patent/CN106649278A/en
Application granted granted Critical
Publication of CN106649278B publication Critical patent/CN106649278B/en
Active legal-status Critical Current
Anticipated expiration legal-status Critical

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F40/00 Handling natural language data
    • G06F40/20 Natural language analysis
    • G06F40/279 Recognition of textual entities
    • G06F40/289 Phrasal analysis, e.g. finite state techniques or chunking
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/30 Information retrieval; Database structures therefor; File system structures therefor of unstructured textual data
    • G06F16/36 Creation of semantic tools, e.g. ontology or thesauri

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Computational Linguistics (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Audiology, Speech & Language Pathology (AREA)
  • General Health & Medical Sciences (AREA)
  • Health & Medical Sciences (AREA)
  • Artificial Intelligence (AREA)
  • Data Mining & Analysis (AREA)
  • Databases & Information Systems (AREA)
  • Machine Translation (AREA)

Abstract

The invention discloses a method for extending spoken language dialogue system corpora. The method includes: performing secondary semantic parsing on a sentence that cannot be parsed, to obtain candidate parse results; if the user selects a candidate from the candidate parse results, forming a mapping rule between the sentence and the selected candidate, adding the generated rule to a user-specific preprocessing rule base, and storing the selected candidate sentence and its corresponding semantic information in a secondary (auxiliary) corpus dedicated to that user; and, when a user sentence is parsed, performing semantic parsing using the user-specific preprocessing rule base and rules generated from the secondary corpus to assist the main corpus. The invention further discloses a corresponding system. The method and system improve the robustness of the corpora, reduce the cost of maintaining them, provide a correction function for the spoken dialogue system, and enhance its usability.

Description

Method and system for extending spoken dialogue system corpora
Technical field
The present application relates to techniques for extending corpora in spoken dialogue systems, and in particular to a method and system for extending a spoken dialogue system corpus.
Background art
A spoken dialogue system is a computer system that can converse with people by voice. It is one kind of dialogue system; compared with a general text dialogue system, it mainly adds speech recognition and speech synthesis modules.
A dialogue system (dialog system) mainly consists of an input recognizer/decoder, a natural language understanding unit, a dialog manager, a task manager, a natural language generation unit (natural language generator), and an output renderer.
The core of a spoken dialogue system is the natural language understanding unit, which usually contains a huge text corpus and comprises three main modules: proper noun recognition, part-of-speech tagging, and a semantic parser.
The text produced by speech recognition enters the natural language understanding unit, passes through proper noun recognition and part-of-speech tagging, and is then analyzed by the semantic parser to obtain semantic information. That information is handed to the dialog manager, which determines the meaning; finally, the natural language generation unit produces a reply, or the task manager arranges the corresponding task. For Chinese speech recognition, some preprocessing is also needed before semantic parsing, including Chinese word segmentation.
A spoken dialogue system therefore needs a text corpus. Semantic parsing methods are divided into rule-based methods and statistical methods; the present invention is mainly concerned with rule-based semantic parsing. The corpus of a rule-based method mainly includes a proper noun dictionary, a common dictionary, a word segmentation lexicon, and, most importantly, a rule base.
Limited by current technology, rule-based semantic parsers usually target narrow-domain systems with a limited vocabulary, on topics such as travel inquiries, ticket booking, and database retrieval. Once the topic is established, the corresponding dictionaries and rule base can be built specifically for it.
Well-known spoken language parsers, such as Phoenix from Carnegie Mellon University (CMU), build a separate rule model for a given domain. All semantic networks and the dictionary are built within the rule model, and each node of a semantic network is a word or another sub-network. The parser then performs partial parsing of the input sentence, matching it against the network paths that correspond to fixed meanings in that domain, obtains one or more parse results, and finally takes the best one. For example, a voice ticket-booking system builds a dictionary of all words related to booking and builds semantic networks for all possible booking actions. Once a network path that fully or partially matches the user sentence is found, the corresponding booking action is known; the keywords in the matching part of the sentence are extracted to obtain the parameters the booking action needs, and the system can respond.
When a user sentence cannot be parsed, or the parse result does not meet requirements, most existing spoken dialogue systems return a reply such as "I cannot understand what you mean" or another semantically related reply. Meanwhile, user logs are collected in the background, and maintenance staff analyze them and extend the corpus.
In practice, the biggest problem of the natural language understanding unit is that, because language is complex, the system cannot cover all grammar, and user sentences that are misrecognized or cannot be understood still occur frequently. In particular, dialects, regional differences in language conventions, and individual user habits make it difficult to build the corpus of a spoken dialogue system; even a very large corpus often cannot correctly parse everything users say. Therefore, even if the speech recognition system recognizes the sentence the user spoke, the semantic parser may find no rule to parse it once it enters the natural language understanding unit, so the overall recognition rate is low. Developers therefore need to spend considerable time updating and maintaining the corpus to guarantee a high recognition rate.
Summary of the invention
The present application provides a method and system for extending a spoken dialogue system corpus, so as to improve the recognition rate of the spoken dialogue system.
The method for extending a spoken dialogue system corpus includes:
performing secondary semantic parsing on a sentence that cannot be parsed, to obtain candidate parse results;
if the user selects a candidate from the candidate parse results, generating a mapping rule between the sentence and the candidate selected by the user, adding the generated rule to the user-specific preprocessing rule base, and storing the candidate sentence selected by the user and its corresponding semantic information in the secondary corpus dedicated to that user;
when parsing a user sentence, performing semantic parsing by combining the user-specific preprocessing rule base and rules generated from the secondary corpus to assist the main corpus.
Preferably, performing semantic parsing in this way when parsing a user sentence includes:
if the success rate of semantic parsing based on the main corpus is greater than a set first threshold, first performing semantic parsing based on the main corpus; if it succeeds, responding to the user according to the parse result; if it fails, then performing semantic parsing based on the secondary corpus;
if the success rate of semantic parsing based on the main corpus is less than a set second threshold, first performing semantic parsing based on the secondary corpus; if parsing based on the secondary corpus fails, then performing semantic parsing based on the main corpus.
The system for extending a spoken dialogue system corpus includes a semantic parsing module, a front-end interaction module, and a secondary corpus management module, wherein:
the semantic parsing module, when parsing a user sentence, performs semantic parsing by combining the user-specific preprocessing rule base and rules generated from the secondary corpus to assist the main corpus; when that parsing fails, it performs secondary semantic parsing, generates candidate parse results, and passes them to the front-end interaction module;
the front-end interaction module feeds the candidate parse results generated by the semantic parsing module back to the user and handles the interaction; it comprises an interaction interface and interaction processing;
the secondary corpus management module creates the user-specific secondary corpus and preprocessing rule base on the basis of the main corpus, writes the candidate sentence received from the front-end interaction module and its corresponding semantic information into the secondary corpus, writes the mapping rule from the user sentence to the candidate sentence into the preprocessing rule base, and generates semantic parsing rules from the sentences in the secondary corpus.
Preferably, the semantic parsing module performs semantic parsing as follows:
if the success rate of semantic parsing based on the main corpus is greater than a set first threshold, it first performs semantic parsing based on the main corpus; if this succeeds, it responds to the user according to the parse result; if it fails, it then performs semantic parsing based on the secondary corpus;
if the success rate of semantic parsing based on the main corpus is less than a set second threshold, it first performs semantic parsing based on the secondary corpus; if parsing based on the secondary corpus fails, it then performs semantic parsing based on the main corpus.
As can be seen from the above technical solution, the application extends the corpus by having the user add to a secondary corpus. Specifically, the invention can produce the following beneficial effects:
1. It solves the corpus extension problem in a targeted way and improves the robustness of the corpus.
Rules in a general corpus can parse most sentences. However, the corpus needs to be extended because of new sentences and new words, unintelligible words or sentences produced by speech recognition errors, dialects, personal language habits, and similar causes. For new words and sentences, dialects, recognition errors, individual user habits, and the like, the method extends the corpus through user selection, achieving dynamic extension and improving the robustness of the corpus.
2. It reduces the cost of maintaining the corpus.
A secondary corpus is added, with one secondary corpus per user, and the main corpus is not changed. The secondary corpus assists the main corpus in completing parsing, and the corpus is extended through secondary semantic parsing and user interaction rather than by relying on maintenance staff as is traditional, which reduces maintenance cost.
3. It provides error correction for the spoken dialogue system and enhances its usability.
The invention presents the sentence that cannot be parsed to the user, so the user can learn why the sentence they spoke could not be parsed. On the one hand, the sentence spoken by the user, or a sentence modified on that basis, can be added to the corpus; on the other hand, the result can be discarded and the input entered again, which enhances the usability of the spoken dialogue system.
Description of the drawings
Fig. 1 is a module diagram of the system of the invention for extending a spoken dialogue system corpus;
Fig. 2 is a flow chart of the method of the invention for extending a spoken dialogue system corpus;
Fig. 3 is a flow chart of semantic parsing in which the secondary corpus and the main corpus of the invention are combined;
Fig. 4 is a workflow diagram of a spoken dialogue system with the corpus extension function of the invention.
Detailed description of the embodiments
To make the purpose, technical solution, and advantages of the present application clearer, the application is described in further detail below with reference to the drawings and embodiments.
To address the problems of the prior art, the invention discloses a method for extending a corpus outside the original corpus of a spoken language parser. The method collects user sentences that cannot be parsed and parses them again to obtain possibly correct candidate results. The user selects the candidate sentence that matches the intent of the utterance, and it is stored in a secondary corpus created specifically for that user (that is, the corpus that the invention adds outside the original corpus). This extends the system's main corpus (that is, the original corpus), strengthens the robustness of semantic parsing, and improves the correctness and coverage of the corpus. When the semantic parsing module fails to parse a sentence, the system collects the user sentence and performs secondary semantic parsing on it: building on the existing system, it lowers the criteria and the thresholds to filter out a certain number of possibly correct candidate results, and feeds them back to the user, for example by displaying them on a front-end page. The front end adds a feedback page that presents the list of candidate results, and the user selects the candidate that matches their intent. If no candidate matches the user's intent, the user can discard this input, or modify the best candidate sentence or its semantics so that it matches the intent. If a candidate is selected, a mapping rule is generated between the sentence and the selected candidate, and the generated rule is added to the user-specific preprocessing rule base; at the same time, the selected candidate sentence and its corresponding semantic information are inserted into the secondary corpus. The user-specific preprocessing rule base and secondary corpus do not affect semantic parsing for other users.
When a user logs in for the first time, the user-specific secondary corpus and preprocessing rule base are created. The secondary corpus management module generates semantic rules from the sentences and semantic information that the user has added to the secondary corpus. The secondary corpus is dynamic: it grows as the user adds sentences, and its role is to assist the main corpus in semantic parsing. Before a sentence is semantically parsed based on the secondary corpus, the sentence to be parsed is first matched against the rule antecedents in the preprocessing rule base; if a match succeeds, the sentence to be parsed is replaced with the rule consequent, and semantic parsing is then performed.
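The antecedent-to-consequent rewrite described above can be sketched as follows. This is only an illustration: the function name `apply_preprocessing_rules`, the representation of a rule as a pair of strings, and the use of exact string equality for "matching" are all assumptions, since the description does not specify the matching method. The example sentences are illustrative English strings.

```python
from typing import List, Tuple

# Each rule is (antecedent, consequent): user sentence -> candidate sentence.
Rule = Tuple[str, str]


def apply_preprocessing_rules(sentence: str, rules: List[Rule]) -> str:
    """Try each rule antecedent in turn; on a match, return the consequent."""
    for antecedent, consequent in rules:
        # Exact equality is an assumption; the source only says "match".
        if sentence == antecedent:
            return consequent
    # No antecedent matched: the sentence is parsed as it is.
    return sentence


# Hypothetical user-specific rule base with a single mapping rule.
user_rules = [("train tickets to our place", "train tickets to Nanjing")]
rewritten = apply_preprocessing_rules("train tickets to our place", user_rules)
untouched = apply_preprocessing_rules("train tickets to Beijing", user_rules)
```

Because the rule base belongs to one user, the same input from a different user would pass through unchanged.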
Specific embodiments of the method and system of the invention are described in detail below with reference to the drawings.
Fig. 1 is a module diagram of the system of the invention for extending a spoken dialogue system corpus. The system includes the following modules:
Semantic parsing module: when parsing a user sentence, it performs semantic parsing by combining the user-specific preprocessing rule base and rules generated from the secondary corpus to assist the main corpus; when that parsing fails, it performs secondary semantic parsing, generates candidate parse results, and passes them to the front-end interaction module;
Front-end interaction module: it feeds the candidate parse results generated by the semantic parsing module back to the user and handles the interaction; it comprises an interaction interface and interaction processing;
Secondary corpus management module: it creates the user-specific secondary corpus and preprocessing rule base on the basis of the main corpus, writes the candidate sentence received from the front-end interaction module and its corresponding semantic information into the secondary corpus, and writes the mapping rule from the user sentence to the candidate sentence into the preprocessing rule base. It is also responsible for generating semantic parsing rules from the sentences in the secondary corpus.
The three modules are described in detail below.
1. Semantic parsing module
The user first needs to log in to the spoken dialogue system. The user speaks into the voice receiving device of the client; the client passes the user's speech data to the server; and after the server's speech recognition module recognizes the user sentence, the sentence enters the semantic parsing module.
After proper noun recognition, normalization, Chinese word segmentation, and similar processing have been applied to the user sentence, the semantic parsing module parses it.
If the user has logged in and a secondary corpus exists, the semantic parsing module calls on the secondary corpus and the preprocessing rule base to assist the main corpus in semantic parsing.
The method of using the secondary corpus to assist the main corpus in semantic parsing is as follows:
If the success rate of semantic parsing based on the main corpus is high, for example greater than or equal to 60%, semantic parsing is first performed based on the main corpus. If it succeeds, the user is answered according to the parse result; if it fails, semantic parsing is then performed based on the secondary corpus;
Before a sentence is parsed based on the secondary corpus, the sentence to be parsed is first matched in turn against the rule antecedents in the preprocessing rule base. If one matches, the sentence to be parsed is replaced with the rule consequent, and semantic parsing is then performed based on the secondary corpus; if every rule antecedent fails to match, semantic parsing is performed based on the secondary corpus directly;
If parsing the sentence based on the secondary corpus succeeds, the user is answered according to the parse result; if it fails, secondary parsing is performed on the sentence to be parsed, that is, it is parsed under looser constraints.
If the success rate of semantic parsing based on the main corpus is low, for example less than 50%, semantic parsing is first performed based on the secondary corpus. If parsing based on the secondary corpus fails, semantic parsing is then performed based on the main corpus; if parsing based on the secondary corpus succeeds directly, the user is answered according to the parse result, and no further semantic parsing based on the main corpus is needed. If parsing fails with both the secondary corpus and the main corpus, secondary parsing is performed on the user sentence, that is, it is parsed under looser constraints. If parsing succeeds, processing continues in the other parts of the system; if parsing fails, the user sentence is not released from memory, and secondary semantic parsing is performed.
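The ordering logic above can be sketched in a few lines. This is a minimal illustration under stated assumptions: `parse_with_corpora`, the parser callables, and the way the success rate is supplied are all hypothetical, and the 60% and 50% figures are only the examples the description gives. The description does not say what happens when the success rate falls between the two thresholds; trying the main corpus first in that case is an assumption made here.

```python
from typing import Callable, List, Optional

# A parser returns semantic information, or None when parsing fails.
Parser = Callable[[str], Optional[str]]


def parse_with_corpora(sentence: str, main_parse: Parser,
                       secondary_parse: Parser, main_success_rate: float,
                       first_threshold: float = 0.60,
                       second_threshold: float = 0.50) -> Optional[str]:
    """Pick the parsing order from the main corpus's success rate."""
    if main_success_rate < second_threshold:
        order = [secondary_parse, main_parse]
    else:
        # Covers rate >= first_threshold; the in-between band is an assumption.
        order = [main_parse, secondary_parse]
    for parser in order:
        result = parser(sentence)
        if result is not None:
            return result  # first success answers the user
    return None  # both failed: the caller moves on to secondary parsing


calls: List[str] = []


def main_parse(sentence: str) -> Optional[str]:
    calls.append("main")
    return "find train tickets" if "train" in sentence else None


def secondary_parse(sentence: str) -> Optional[str]:
    calls.append("secondary")
    return "find flights" if "plane" in sentence else None


result = parse_with_corpora("plane tickets to Nanjing", main_parse,
                            secondary_parse, main_success_rate=0.40)
```

With a success rate of 0.40 the secondary corpus is consulted first, so the main-corpus parser is never called for this sentence.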
Secondary semantic parsing uses the existing parsing method but relaxes the conditions for a successful parse and lowers the threshold, yielding one or more possible candidate results. Take the sentence "train tickets to our place", and suppose the system can recognize the speech but cannot parse the dialect word for "our". In general, the first parse finds the meaning of the sentence unclear. Secondary semantic parsing uses looser constraints, for example allowing partial matches, and finds the relatively best solution "train tickets to that place". The semantics "find train tickets" can now be matched and is used as a candidate result. Because the search target is not explicit, however, processing this semantics lists train tickets for all railway stations as the response. Finally, the semantic parsing module on the server passes the candidate parse results to the front-end interaction module. If the user is from Nanjing, the user selects "train tickets to Nanjing". The secondary corpus management module inserts the candidate sentence "train tickets to Nanjing" and its semantic information, such as "find train tickets", into the secondary corpus, and inserts the mapping rule "train tickets to our place -> train tickets to Nanjing" into the preprocessing rule base.
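The idea of reusing the same parser with a lower threshold can be illustrated with a toy scorer. Everything here is hypothetical: the `RULES` table, the coverage score, and the threshold values are stand-ins chosen for the sketch, not the parsing method of the invention, which the description leaves open.

```python
from typing import Dict, List, Tuple

# Hypothetical rule table: semantic label -> keywords the rule expects.
RULES: Dict[str, List[str]] = {
    "find train tickets": ["to", "train", "tickets"],
    "book hotel": ["book", "hotel"],
}


def coverage(tokens: List[str], keywords: List[str]) -> float:
    """Fraction of the input tokens that the rule's keywords account for."""
    if not tokens:
        return 0.0
    return sum(token in keywords for token in tokens) / len(tokens)


def parse(tokens: List[str], threshold: float) -> List[Tuple[str, float]]:
    """Return (semantics, score) pairs at or above threshold, best first."""
    scored = [(label, coverage(tokens, kws)) for label, kws in RULES.items()]
    kept = [pair for pair in scored if pair[1] >= threshold]
    return sorted(kept, key=lambda pair: pair[1], reverse=True)


# "<dialect>" stands in for a word the rules cannot account for.
tokens = ["to", "<dialect>", "train", "tickets"]
first_pass = parse(tokens, threshold=1.0)   # strict: every token must match
second_pass = parse(tokens, threshold=0.5)  # relaxed: partial match allowed
```

The strict pass returns nothing because one token is unexplained, while the relaxed pass keeps the partially matching rule as a candidate to show the user.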
2. Front-end interaction module
The one or more candidate parse results produced by secondary semantic parsing are sent to the front-end interaction module, which feeds them back to the user in some form, such as a list, for selection. The user can also see the text result for their utterance, that is, the candidate sentences and the corresponding semantic information. If the user chooses to abandon the input, this parse result is not saved. If no candidate matches the user's intent, the user can modify the user sentence and/or the semantic information so that it matches the intent. If the user selects one of the semantics, for example "find train tickets" in the example above, then "train tickets to our place" is associated with the semantics "find train tickets". The candidate sentence "train tickets to Nanjing" and its semantic information are uploaded to the secondary corpus management module on the server for subsequent processing.
3. Secondary corpus management module
After the user logs in to the spoken dialogue system for the first time, the server allocates space to create the user-specific secondary corpus and preprocessing rule base. The secondary corpus stores the sentences and semantic information that the user adds on the interaction page, and the preprocessing rule base stores the mapping rules from user sentences to the candidate sentences the user selected. The module is also responsible for generating semantic parsing rules from the sentences in the secondary corpus; the specific method of generating semantic rules, and the form the semantic rules take, depend on the semantic parsing method used.
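A per-user store of this kind might look like the sketch below. The class and method names (`UserStore`, `SecondaryCorpusManager`, `on_login`, `record_selection`) are hypothetical, and the in-memory dictionaries stand in for whatever server-side storage a real system would use. Rule generation from the secondary corpus is omitted because it depends on the parsing method.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Tuple


@dataclass
class UserStore:
    # (candidate sentence, semantic information) pairs added by the user.
    secondary_corpus: List[Tuple[str, str]] = field(default_factory=list)
    # Mapping rules: user sentence -> candidate sentence the user selected.
    preprocessing_rules: Dict[str, str] = field(default_factory=dict)


class SecondaryCorpusManager:
    def __init__(self) -> None:
        self._stores: Dict[str, UserStore] = {}

    def on_login(self, user_id: str) -> UserStore:
        """Create the user's store on first login; reuse it afterwards."""
        if user_id not in self._stores:
            self._stores[user_id] = UserStore()
        return self._stores[user_id]

    def record_selection(self, user_id: str, user_sentence: str,
                         candidate_sentence: str, semantics: str) -> None:
        """Write the mapping rule and the corpus entry for one selection."""
        store = self.on_login(user_id)
        store.preprocessing_rules[user_sentence] = candidate_sentence
        store.secondary_corpus.append((candidate_sentence, semantics))


manager = SecondaryCorpusManager()
manager.record_selection("user_a", "train tickets to our place",
                         "train tickets to Nanjing", "find train tickets")
other = manager.on_login("user_b")
```

Keeping one store per user identifier is what prevents one user's additions from affecting parsing for anyone else.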
Fig. 4 is a workflow diagram of a spoken dialogue system with the corpus extension function of the invention. The specific steps are as follows:
Step 401: the user interaction module receives the user's voice and passes it to the speech recognition module; it presents the candidates output by the secondary parsing module to the user for review and selection; if no candidate matches the user's intent, it allows the user to modify any candidate so that it does.
Step 402: the speech receiving module receives the user's speech and sends the speech data to the speech recognition module.
Step 403: the speech recognition module recognizes the user's speech as text and passes it to the natural language understanding unit for processing, that is, to step 404.
Step 404: the semantic parsing module uses the secondary corpus to assist the main corpus in semantic parsing and obtains the user's intent. The specific method is as follows. If the success rate of semantic parsing based on the main corpus is high, for example greater than or equal to 60%, semantic parsing is first performed based on the main corpus; if it succeeds, the user is answered according to the parse result; if it fails, semantic parsing is then performed based on the secondary corpus. Before a sentence is parsed based on the secondary corpus, the sentence to be parsed is first matched in turn against the rule antecedents in the preprocessing rule base; if one matches, the sentence to be parsed is replaced with the rule consequent, and semantic parsing is then performed based on the secondary corpus; if every rule antecedent fails to match, semantic parsing is performed based on the secondary corpus directly. If parsing the sentence based on the secondary corpus succeeds, the user is answered according to the parse result; if it fails, secondary parsing is performed on the sentence to be parsed, that is, it is parsed under looser constraints. If the success rate of semantic parsing based on the main corpus is low, for example less than 50%, semantic parsing is first performed based on the secondary corpus; if that fails, semantic parsing is then performed based on the main corpus; if parsing based on the secondary corpus succeeds directly, the user is answered according to the parse result, and no further semantic parsing based on the main corpus is needed; if parsing fails with both the secondary corpus and the main corpus, secondary parsing is performed on the user sentence, that is, it is parsed under looser constraints.
Step 405: the main corpus, secondary corpus, and preprocessing rule base on which semantic parsing relies. Before the semantics of a user sentence are parsed based on the secondary corpus, the sentence to be parsed is first matched in turn against the rule antecedents in the preprocessing rule base; if one matches, the sentence to be parsed is replaced with the rule consequent, and semantic parsing is then performed; if no match is found, semantic parsing is performed directly on the sentence to be parsed.
Step 406: the secondary parsing module uses looser constraints to perform secondary parsing on the sentence that the semantic parsing module failed to parse in step 404, and outputs a certain number of candidates for the user to choose from and confirm whether they match the intent. It need not be a separate module and can be merged with the semantic parsing module of step 404, because the two can use the same parsing method, with the secondary parsing module merely applying looser constraints. Of course, if the secondary parsing module and the semantic parsing module of step 404 use different methods, it needs to be a separate module.
Step 407: determine whether semantic parsing in step 404 succeeded. If it succeeded, the flow goes to the response module of step 408; otherwise, it goes to step 406 for secondary semantic parsing.
Step 408: the response module gives the user an appropriate response according to the semantic information obtained, such as operating a device or returning query information; if no valid semantic information is obtained, it responds to the user with a message such as "cannot understand the user".
Step 409: determine whether the candidates shown to the user and their semantics are correct. If so, the selected candidate is used to update the secondary corpus; otherwise, the user is asked whether a candidate needs to be modified so that it matches the user's intent.
Step 410: ask the user whether to modify a candidate. The user can also decline to modify, in which case the system responds directly with the "cannot understand the user" message.
Step 411: the secondary corpus management module creates the user-specific secondary corpus and preprocessing rule base. The secondary corpus stores the sentences and semantic information that the user adds on the interaction page, and the preprocessing rule base stores the mapping rules from user sentences to the candidate sentences the user selected. The module is also responsible for generating semantic parsing rules from the sentences in the secondary corpus; the specific method of generating semantic rules depends on the semantic parsing method used.
Fig. 2 is a flowchart of the method of the present invention for extending a spoken dialogue system corpus. The specific steps are as follows:
Step 201: Collect the current sentence that has gone through semantic parsing and failed to parse.
Step 202: Perform secondary parsing on the sentence, relaxing the parsing conditions and lowering the output threshold, to obtain one or more candidate parsing results.
Step 203: After secondary parsing, present a certain number of candidate parsing results and their corresponding semantics to the user for selection.
Step 204: Determine whether any candidate matches the user's intent. If there is such a candidate, select it to update the secondary corpus and the preprocessing rule base and execute step 205; otherwise, execute step 207.
Step 205: According to the user's selection, generate a mapping rule from the user's sentence to the candidate sentence selected by the user and append it to the preprocessing rule base; and add the candidate sentence selected by the user, together with its corresponding semantic information, to the current user's secondary corpus.
Step 206: According to the semantics obtained, generate a user response or a system operation, or respond to the user with a message such as "unable to understand the user".
Step 207: End this spoken dialogue.
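Steps 202 to 205 can be sketched as follows. This is an illustration under stated assumptions, not the patented implementation: candidates are assumed to arrive already scored, the relaxed threshold of 0.3 and the limit of three displayed candidates are arbitrary, and the names `secondary_parse`, `extend_corpus` and `choose` are hypothetical.

```python
from typing import Callable, Dict, List, Optional, Tuple

# (candidate sentence, semantic information, score); the scoring itself is assumed.
Candidate = Tuple[str, dict, float]


def secondary_parse(scored: List[Candidate], threshold: float, limit: int = 3) -> List[Candidate]:
    """Step 202: keep the best candidates that pass the lowered output threshold."""
    kept = [c for c in scored if c[2] >= threshold]
    kept.sort(key=lambda c: c[2], reverse=True)
    return kept[:limit]


def extend_corpus(
    user_sentence: str,
    scored: List[Candidate],
    choose: Callable[[List[Candidate]], Optional[Candidate]],
    rules: Dict[str, str],
    secondary_corpus: Dict[str, dict],
    relaxed_threshold: float = 0.3,  # assumed value, lower than the normal threshold
) -> Optional[dict]:
    shown = secondary_parse(scored, relaxed_threshold)  # steps 202-203
    picked = choose(shown)                              # step 204: the user's choice
    if picked is None:
        return None                                     # no suitable candidate: step 207
    sentence, semantics, _ = picked
    rules[user_sentence] = sentence                     # step 205: mapping rule
    secondary_corpus[sentence] = semantics              # step 205: secondary corpus entry
    return semantics                                    # input to step 206


rules: Dict[str, str] = {}
corpus: Dict[str, dict] = {}
scored = [
    ("pay some music", {"intent": "pay"}, 0.20),
    ("play some music", {"intent": "play_music"}, 0.45),
    ("play sum", {"intent": "unknown"}, 0.35),
]
# Stand-in for the interaction page: the user picks the top candidate.
result = extend_corpus("play sum music", scored, lambda shown: shown[0] if shown else None, rules, corpus)
```

The `choose` callback stands in for the front-end interaction; returning `None` from it models a user who finds no candidate acceptable.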
Fig. 3 is a flowchart of semantic parsing in which the secondary corpus of the present invention is combined with the main corpus. The specific steps are as follows:
Step 301: The semantic parsing module obtains the text of the user's sentence after Chinese word segmentation and named entity recognition.
Step 302: The semantic parsing module parses the user's sentence based on the current user's secondary corpus, the preprocessing rule base and the main corpus.
The secondary corpus assists the main corpus in semantic parsing as follows:
If the success rate of semantic parsing based on the main corpus is high, for example greater than or equal to 60%, semantic parsing is first performed based on the main corpus. If it succeeds, the user is answered according to the parsing result; if it fails, parsing is then performed based on the secondary corpus.
Before a sentence is parsed based on the secondary corpus, the sentence to be parsed is first matched in turn against the antecedents of the rules in the preprocessing rule base. If a rule matches, the sentence to be parsed is replaced with that rule's consequent, and semantic parsing is then performed based on the secondary corpus; if none of the rule antecedents match, semantic parsing is performed directly based on the secondary corpus. If parsing based on the secondary corpus succeeds, the user is answered according to the parsing result; if it fails, secondary parsing is performed on the sentence to be parsed, that is, it is parsed under looser constraint conditions.
If the success rate of semantic parsing based on the main corpus is not high, for example less than 50%, semantic parsing is first performed based on the secondary corpus. If parsing based on the secondary corpus fails, semantic parsing is then performed based on the main corpus. If parsing based on the secondary corpus succeeds directly, the user is answered according to the parsing result and no further semantic parsing based on the main corpus is needed. If parsing fails on both the secondary corpus and the main corpus, secondary parsing is performed on the user's sentence, that is, it is parsed under looser constraint conditions.
Step 303: Determine whether parsing succeeded. If it succeeded, execute step 304; otherwise, execute step 305.
Step 304: According to the semantics obtained, generate a user response or a system operation, or respond to the user with a message such as "unable to understand the user".
Step 305: Execute the flow of Fig. 2 for extending the spoken dialogue system corpus.
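The ordering logic of step 302 can be sketched as below. The sketch makes several assumptions that the text does not state: rule antecedents are matched by exact string equality, parsers are modelled as functions that return `None` on failure, and a success rate between the two example figures (50% to 60%) is treated as main-corpus-first. The names `apply_rules` and `parse_sentence` are hypothetical.

```python
from typing import Callable, Dict, Optional

# A parser returns semantic information, or None when parsing fails.
Parser = Callable[[str], Optional[dict]]


def apply_rules(sentence: str, rules: Dict[str, str]) -> str:
    """Match rule antecedents in turn; on a match, return the consequent."""
    for antecedent, consequent in rules.items():
        if antecedent == sentence:  # exact match is an assumption of this sketch
            return consequent
    return sentence  # no rule matched: parse the sentence unchanged


def parse_sentence(
    sentence: str,
    main_parse: Parser,
    secondary_parse: Parser,
    rules: Dict[str, str],
    main_success_rate: float,
    secondary_first_below: float = 0.5,  # the "less than 50%" example in the text
) -> Optional[dict]:
    def via_main() -> Optional[dict]:
        return main_parse(sentence)

    def via_secondary() -> Optional[dict]:
        # Preprocessing rules are applied only before secondary-corpus parsing.
        return secondary_parse(apply_rules(sentence, rules))

    if main_success_rate < secondary_first_below:
        order = (via_secondary, via_main)
    else:
        order = (via_main, via_secondary)
    for attempt in order:
        result = attempt()
        if result is not None:
            return result  # step 304
    return None  # both failed: the caller continues with step 305


# Toy corpora represented as dictionaries, for illustration only.
main = {"turn on the light": {"intent": "light_on"}}.get
secondary = {"play some music": {"intent": "play_music"}}.get
rules = {"play sum music": "play some music"}
answer = parse_sentence("play sum music", main, secondary, rules, main_success_rate=0.7)
```

Here the main corpus is tried first, fails, and the preprocessing rule rewrites the sentence so that the secondary corpus can parse it.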
The present invention extends the corpus by having the user add a secondary corpus. Specifically, the present invention can produce the following beneficial effects:
1. The corpus extension problem is solved in a targeted way, improving the robustness of the corpus.
The rules in a general corpus can parse most sentences. However, the corpus needs to be extended because of newly coined sentences and words, unintelligible words or sentences produced by speech recognition errors, dialects, individual language habits and similar causes. By relying on the user's selection, the method extends the corpus to cover new words and sentences, dialects, recognition errors, the habits of individual users and so on, achieving dynamic extension and improving the robustness of the corpus.
2. The cost of maintaining the corpus is reduced.
A secondary corpus is added, and each user corresponds to exactly one secondary corpus, so the main corpus does not need to be changed. The secondary corpus assists the main corpus in completing parsing, and the corpus is extended through secondary semantic parsing and interaction with the user rather than through the traditional reliance on maintenance staff, which reduces maintenance cost.
3. An error-correction capability is provided for the spoken dialogue system, enhancing its usability.
The present invention presents sentences that cannot be parsed to the user, so the user can see why what they said could not be parsed. On the one hand, the sentence the user said, or a sentence modified on its basis, can be added to the corpus; on the other hand, the user can discard the result and re-enter the input, which enhances the usability of the spoken dialogue system.
The foregoing is only the preferred embodiment of the present application and is not intended to limit the present application. Any modification, equivalent substitution or improvement made within the spirit and principles of the present application shall fall within the scope of protection of the present application.

Claims (10)

1. A method for extending a spoken dialogue system corpus, characterized by comprising:
performing secondary semantic parsing on a sentence that failed to parse, to obtain candidate parsing results;
if the user selects a candidate from the candidate parsing results, generating a mapping rule between the sentence and the candidate selected by the user, adding the generated rule to a user-specific preprocessing rule base, and storing the candidate sentence selected by the user and its corresponding semantic information in the secondary corpus dedicated to that user;
when parsing a user sentence, using the user-specific preprocessing rule base and the rules generated from the secondary corpus to assist the main corpus in semantic parsing.
2. The method according to claim 1, characterized in that said using, when parsing a user sentence, the user-specific preprocessing rule base and the rules generated from the secondary corpus to assist the main corpus in semantic parsing comprises:
if the success rate of semantic parsing based on the main corpus is greater than a set first threshold, first performing semantic parsing based on the main corpus; if it succeeds, responding to the user according to the parsing result; if parsing fails, performing parsing based on the secondary corpus;
if the success rate of semantic parsing based on the main corpus is less than a set second threshold, first performing semantic parsing based on the secondary corpus; if parsing based on the secondary corpus fails, performing semantic parsing based on the main corpus.
3. The method according to claim 2, characterized in that:
if parsing based on the secondary corpus succeeds, the user is answered according to the parsing result;
if parsing fails on both the secondary corpus and the main corpus, secondary parsing is performed on the sentence.
4. The method according to claim 2 or 3, characterized in that:
before a sentence is parsed based on the secondary corpus, the sentence to be parsed is first matched in turn against the antecedents of the rules in the preprocessing rule base; if a rule matches, the sentence to be parsed is replaced with that rule's consequent, and semantic parsing is then performed based on the secondary corpus; if none of the rule antecedents match, semantic parsing is performed directly based on the secondary corpus.
5. The method according to any one of claims 1 to 3, characterized in that:
said performing secondary parsing on the sentence is: parsing the sentence under constraint conditions looser than those of the previous parsing.
6. A system for extending a spoken dialogue system corpus, characterized by comprising a semantic parsing module, a front-end interaction module and a secondary corpus management module, wherein:
the semantic parsing module is configured, when parsing a user sentence, to use the user-specific preprocessing rule base and the rules generated from the secondary corpus to assist the main corpus in semantic parsing, and, when parsing by the semantic parsing module fails, to perform secondary semantic parsing, generate candidate parsing results and pass them to the front-end interaction module;
the front-end interaction module is configured to feed the candidate parsing results generated by the semantic parsing module back to the user and interact with the user, and comprises an interaction interface and interaction processing;
the secondary corpus management module is configured to create the user-specific secondary corpus and preprocessing rule base on the basis of the main corpus, to write the candidate sentence and corresponding semantic information received from the front-end interaction module into the secondary corpus, to write the mapping rule from the user's sentence to the candidate sentence into the preprocessing rule base, and to generate semantic parsing rules from the sentences in the secondary corpus.
7. The system according to claim 6, characterized in that the semantic parsing module performs semantic parsing in the following manner:
if the success rate of semantic parsing based on the main corpus is greater than a set first threshold, first performing semantic parsing based on the main corpus; if it succeeds, responding to the user according to the parsing result; if parsing fails, performing parsing based on the secondary corpus;
if the success rate of semantic parsing based on the main corpus is less than a set second threshold, first performing semantic parsing based on the secondary corpus; if parsing based on the secondary corpus fails, performing semantic parsing based on the main corpus.
8. The system according to claim 7, characterized in that:
if parsing based on the secondary corpus succeeds, the semantic parsing module responds to the user according to the parsing result;
if parsing fails on both the secondary corpus and the main corpus, the semantic parsing module performs secondary parsing on the sentence.
9. The system according to claim 7 or 8, characterized in that:
before parsing a sentence based on the secondary corpus, the semantic parsing module first matches the sentence to be parsed in turn against the antecedents of the rules in the preprocessing rule base; if a rule matches, the sentence to be parsed is replaced with that rule's consequent, and semantic parsing is then performed based on the secondary corpus; if none of the rule antecedents match, semantic parsing is performed directly based on the secondary corpus.
10. The method according to any one of claims 6 to 8, characterized in that:
said performing secondary parsing on the sentence is: parsing the sentence under constraint conditions looser than those of the previous parsing.
CN201611255063.7A 2016-12-30 2016-12-30 Method and system for extending spoken language dialogue system corpora Active CN106649278B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201611255063.7A CN106649278B (en) 2016-12-30 2016-12-30 Extend the method and system of spoken dialogue system corpus

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201611255063.7A CN106649278B (en) 2016-12-30 2016-12-30 Extend the method and system of spoken dialogue system corpus

Publications (2)

Publication Number Publication Date
CN106649278A true CN106649278A (en) 2017-05-10
CN106649278B CN106649278B (en) 2019-11-15

Family

ID=58837348

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201611255063.7A Active CN106649278B (en) 2016-12-30 2016-12-30 Extend the method and system of spoken dialogue system corpus

Country Status (1)

Country Link
CN (1) CN106649278B (en)

Citations (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US4994967A (en) * 1988-01-12 1991-02-19 Hitachi, Ltd. Information retrieval system with means for analyzing undefined words in a natural language inquiry
US7412440B2 (en) * 2003-12-05 2008-08-12 International Business Machines Corporation Information search system, information search supporting system, and method and program for information search
CN102663016A (en) * 2012-03-21 2012-09-12 上海汉翔信息技术有限公司 System and method for implementing input information extension on input candidate box on electronic device
CN103246641A (en) * 2013-05-16 2013-08-14 李营 Text semantic information analyzing system and method
CN105786793A (en) * 2015-12-23 2016-07-20 百度在线网络技术(北京)有限公司 Method and device for analyzing semanteme of spoken language text information

Cited By (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN107240398A (en) * 2017-07-04 2017-10-10 科大讯飞股份有限公司 Intelligent sound exchange method and device
CN107240398B (en) * 2017-07-04 2020-11-17 科大讯飞股份有限公司 Intelligent voice interaction method and device
CN109753976A (en) * 2017-11-01 2019-05-14 中国电信股份有限公司 Corpus labeling device and method
CN109753976B (en) * 2017-11-01 2021-03-19 中国电信股份有限公司 Corpus labeling device and method
CN110032740A (en) * 2019-04-20 2019-07-19 卢劲松 It customizes individual character semanteme and learns application method
CN110942765A (en) * 2019-11-11 2020-03-31 珠海格力电器股份有限公司 Method, device, server and storage medium for constructing corpus

Also Published As

Publication number Publication date
CN106649278B (en) 2019-11-15

Similar Documents

Publication Publication Date Title
JP6675463B2 (en) Bidirectional stochastic rewriting and selection of natural language
KR101768509B1 (en) On-line voice translation method and device
Oh et al. Stochastic language generation for spoken dialogue systems
CN107665706B (en) Rapid voice interaction method and system
US6983239B1 (en) Method and apparatus for embedding grammars in a natural language understanding (NLU) statistical parser
US8874443B2 (en) System and method for generating natural language phrases from user utterances in dialog systems
WO2018034118A1 (en) Dialog system and computer program therefor
DE60123952T2 (en) GENERATION OF A UNIFORM TASK DEPENDENT LANGUAGE MODEL THROUGH INFORMATION DISCUSSION PROCESS
US6963831B1 (en) Including statistical NLU models within a statistical parser
US20050086047A1 (en) Syntax analysis method and apparatus
CN106649278B (en) Method and system for extending spoken language dialogue system corpora
JP2009193448A (en) Dialog system, method, and program
JP2000353161A (en) Method and device for controlling style in generation of natural language
CN103325370A (en) Voice identification method and voice identification system
DE602004004310T2 (en) System with combined statistical and rule-based grammar model for speech recognition and understanding
US8296319B2 (en) Information retrieving apparatus, information retrieving method, information retrieving program, and recording medium on which information retrieving program is recorded
WO2016068690A1 (en) Method and system for automated semantic parsing from natural language text
CN111324712A (en) Dialogue reply method and server
KR101409298B1 (en) Method of re-preparing lexico-semantic-pattern for korean syntax recognizer
CN111310457B (en) Word mismatching recognition method and device, electronic equipment and storage medium
CN106021286A (en) Method for language understanding based on language structure
DE60119643T2 (en) Homophone choice in speech recognition
Yeh et al. Ontology‐based speech act identification in a bilingual dialog system using partial pattern trees
JP3022511B1 (en) Language processing device and semantic determination device
CN111552785A (en) Method and device for updating database of human-computer interaction system, computer equipment and medium

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant