WO2008128423A1 - An intelligent dialog system and a method for realization thereof - Google Patents

An intelligent dialog system and a method for realization thereof

Info

Publication number: WO2008128423A1
Application number: PCT/CN2008/000764
Authority: WO
Grant status: Application
Prior art keywords: corpus, text, mapping, speech, set
Other languages: French (fr), Chinese (zh)
Inventors: Yangsheng Xu, Chong Guo Li, Jingyu Yan, Jun Cheng, Xinyu Wu
Original Assignee: Shenzhen Institute Of Advanced Technology
Priority date: 2007-04-19 (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date: 2008-04-15

Classifications

    • G — PHYSICS
    • G10 — MUSICAL INSTRUMENTS; ACOUSTICS
    • G10L — SPEECH ANALYSIS OR SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING; SPEECH OR AUDIO CODING OR DECODING
    • G10L15/00 — Speech recognition
    • G10L15/08 — Speech classification or search
    • G10L15/18 — Speech classification or search using natural language modelling
    • G10L15/1822 — Parsing for meaning understanding

Abstract

An intelligent dialog system includes a text understanding and answering module (2) which is used to obtain an output text based on an input text. The module includes a word segmentation unit (4), a mapping corpus (7), a mapping unit (5), a dialog corpus (8) and a searching unit (6). The word segmentation unit (4) is used to tag parts of speech for the input text and obtain a word set with part-of-speech tags. The mapping corpus (7) is used to set and store a mapping relationship between keywords and concept sentences. The mapping unit (5) is used to search the mapping corpus (7) based on the word set and map it to a concept sentence. The dialog corpus (8) is used to set and store a mapping relationship between concept sentences and output texts. The searching unit (6) is used to search the dialog corpus (8) based on the concept sentence and obtain the output text.

Description

An intelligent chat system and a method for implementing the same

Technical Field

The present invention relates to the field of human-machine voice interaction, and in particular to an intelligent chat system that uses natural language as its medium and a method for implementing it, for use in home service robots, entertainment robots and other voice-conversation applications.

Background Art

With the arrival of an aging society and the accelerating pace of life, face-to-face communication has declined and people increasingly communicate by phone, mail and the Internet. As a result, some people feel lonely or bored, have difficulty finding someone suitable to chat with, and have no outlet for expressing their feelings. They may therefore want a way to talk about their feelings, relieve loneliness, or obtain specific kinds of help.

Moreover, in the fast-paced, high-stress environment of modern society, people want to be understood, to relieve their own pressure and to talk things through, so there is demand for an intelligent entity that can communicate in natural language and can listen, understand and answer. Elderly people in particular, in order to prevent dementia or memory loss, have a strong need for a device capable of verbal communication and voice reminders. For certain users, interacting in natural language is the natural way to obtain the information they want.

For intelligent home service robots, people want to use natural language to operate and control some of the robot's functions, so that humans and robots work together harmoniously and the robot serves people better. A voice chat system therefore has real human and social significance. Many simple voice-dialogue toys already exist on the market; they mainly rely on speech recognition chips and waveform matching, with input sentences and voice answers mapped to each other in advance. Such products support only a limited number of dialogues, cannot understand dynamically added conversation, and cannot truly achieve natural interaction with people.

Another kind of intelligent chat entity exists in instant messaging: its main technique is to construct a virtual agent attached to the Internet through MSN, QQ and other chat tools, and to answer questions and chat through information retrieval and database search. Such entities use written text as the medium of communication and depend entirely on the Internet or a communications network; they do not communicate with people in spoken natural language, lack the experience and fun of real spoken dialogue with a machine, and do not meet the various social needs described above.

The prior art further includes voice chat comprising automatic speech recognition, spoken-text understanding and speech synthesis steps, for which recognition accuracy and synthesis quality are already fairly good. Spoken-text understanding is generally attempted through semantic analysis of the recognized text, which may employ semantic frames or an ontology-like representation. That is, semantic analysis derives a formal representation of the meaning of a sentence from the notional words of the input sentence and the syntactic structure of the sentence; semantic frames serve as the carrier of the semantics, and some systems use an ontology to represent or organize the frames. However, the main difficulty lies in how to map semantics into semantic frames: because semantic frames are empirical, it is hard to establish a unified standard for them, and their number is massive, which makes building the set of semantic frames difficult.

Thus the prior art has defects and needs to be improved.

SUMMARY

The purpose of the present invention is to provide an intelligent chat system and a method for implementing it, for use in home service robots, entertainment robots and other voice-conversation applications.

The technical solution of the present invention is as follows:

An intelligent chat system comprises a text understanding and answering module for obtaining an output text from an input text. The text understanding and answering module comprises a word segmentation unit, an XML-based mapping corpus, a mapping unit, an XML-based dialogue corpus and a search unit. The word segmentation unit performs part-of-speech tagging on the input text to obtain a word set with part-of-speech tags. The mapping corpus establishes and stores mapping relationships between keywords and concept sentences. The mapping unit searches the mapping corpus according to the word set and maps it to a concept sentence. The dialogue corpus establishes and stores mapping relationships between concept sentences and output texts. The search unit searches the dialogue corpus according to the concept sentence to obtain the output text.

The intelligent chat system may further comprise a speech recognition module for converting an input speech into the input text.

The intelligent chat system may further comprise a speech synthesis module for converting the output text into an output speech.

In the intelligent chat system, the mapping corpus and the dialogue corpus may be arranged in the same corpus. The intelligent chat system may further comprise a preprocessing unit for processing the word set produced by the word segmentation unit (replacing information in the word set, adding a dialogue flag or setting a dialogue flag) to obtain the word set used by the mapping unit.

The intelligent chat system may further comprise a post-processing unit that performs the following processing on the output text from the search unit: adding or storing history information, setting the topic of conversation, and adding information obtained by search, to obtain the output text delivered to the speech synthesis module.

A method for implementing an intelligent chat system, the chat system including a text understanding and answering module for obtaining an output text from an input text, comprises the steps of: A1, establishing an XML-based mapping corpus and dialogue corpus, the mapping corpus establishing and storing mapping relationships between keywords and concept sentences, and the dialogue corpus establishing and storing mapping relationships between concept sentences and output texts; A2, performing part-of-speech tagging on the input text to obtain a word set with part-of-speech tags; A3, performing matching calculation between the word set and the keyword sets of the mapping corpus to obtain a concept sentence; A4, searching the dialogue corpus according to the concept sentence to generate the output text.

The method may further comprise, before step A2, a step of converting an input speech into the input text, and may further comprise a step A5 of converting the output text into an output speech.

The method may further comprise, after step A4, a post-processing step for increasing the accuracy of the answer: adding or storing history information, setting the topic of conversation, and searching for information.

The method may further comprise, before step A3, the steps of: B1, determining whether the input text contains any of the following cases: a demonstrative pronoun occurs, the topic of conversation changes, or knowledge needs to be added; if so, performing the corresponding preprocessing of replacing information in the word set, adding a dialogue flag or setting a dialogue flag; otherwise proceeding to step A3; B2, determining whether the preprocessing is completed; if a success flag is returned, proceeding to step A4, otherwise returning a failure flag and proceeding to step A3.

In the method, the mapping corpus and the dialogue corpus may be arranged in the same corpus. Step A1 may further comprise setting part-of-speech weight values for the mapping corpus, wherein the weight values are obtained by one round or two rounds of orthogonal optimization.

The method may further comprise a step A6 in which the user evaluates the output speech and the text understanding and answering module adjusts the weight values according to the evaluation.

The method may further comprise a step of storing personal information for the user, with the weight values stored in the user's personal information; when the user logs in, the corresponding weight values are read to adjust the mapping corpus.

With the above technical solution, the present invention establishes corpora together with part-of-speech weight optimization and learning, uses semantic mapping and classification for understanding, and establishes mappings from semantics to answers. The system can therefore communicate with people in natural language with higher accuracy, provide verbal communication and voice reminders, and realize true spoken dialogue between human and machine, giving users a genuine and enjoyable spoken-language experience.

BRIEF DESCRIPTION OF THE DRAWINGS

Figure 1 is a diagram of the general framework of the chat system of the present invention;

Figure 2 is a flowchart of spoken-text understanding and answering according to the present invention;

Figure 3 is a schematic view of the spoken-text understanding and answering module of the present invention;
Figure 4 is a schematic view of the description format of the mapping corpus of the present invention;

Figure 5 is a schematic view of the dialogue corpus format in which concept sentences are answered directly according to the present invention;

Figure 6 is a schematic view of the dialogue corpus answer format that uses history information according to the present invention;

Figure 7 is a schematic view of the format of the default answer library of the dialogue corpus of the present invention;

Figure 8 is a flowchart of the method of the present invention;

Figure 9 is a schematic view of the part-of-speech weight optimization method of the present invention;

Figure 10 is a flowchart of online learning of part-of-speech weights according to the present invention.

DETAILED DESCRIPTION

The object of the present invention is to construct an intelligent chat system, or robot, based on text interaction and optionally also voice interaction, in order to meet people's needs. Preferred embodiments of the present invention are described in detail below.

The present invention provides a voice chat system. To achieve natural-language interaction, the framework of the present invention basically comprises three modules. An automatic speech recognition module (speech to text: Automatic Speech Recognition, ASR, also called STT) obtains the text corresponding to the user's natural speech; that is, the speech recognition module converts the input speech into the input text. A spoken-text understanding and answering module (text to text, TTT) obtains the output text from the input text: the intelligent chat system understands the spoken text and generates the answer text, a process that uses the corpora and the historical chat records of the system. A speech synthesis module (text to speech, TTS) converts the output text into output speech, through which the system answers by voice and interacts with the user. If natural-language speech interaction is not required and only text interaction is considered, the system may include only the text understanding and answering module.

The automatic speech recognition module and the speech synthesis module can be off-the-shelf modules available on the market, including the corresponding software modules on an embedded platform; the main requirements are high recognition accuracy and the best possible synthesis quality.

For the text understanding and answering module, the method of this invention is to use semantic mapping and classification and at the same time to establish a mapping between semantics and answers. Compared with traditional methods it is simple, but it must cope with a huge space of semantic categories. A spoken audio signal is converted into the corresponding text by the automatic speech recognition module; the spoken-text understanding and answering module processes the input text and, given the dialogue corpus and the context, produces the answer text; finally the speech synthesis module converts the obtained answer text into a sound signal to interact with the user. A simpler process is also possible: the spoken-text understanding and answering module processes the input text and, given the dialogue corpus and the context, produces the answer text, without any sound input or output. As shown in Figure 1, the voice chat system can take the user's voice as input, for example through a microphone. The speech signal is sent to the speech recognition module 1, which converts the speech into text and passes it to the spoken-text understanding and answering module 2; this module carries out the whole process of Figure 2 using the corresponding databases and returns the corresponding answer text; the answer text is sent to the speech synthesis module 3, which converts the text into speech, so that the user can hear the feedback through a loudspeaker. The present invention is not only for voice chat: it may also be applied to various information systems, automatic guide systems, language-learning systems and other automatic description systems, and may be used in any application requiring information output; it can reduce labour costs and at the same time improve the accuracy of the information and the management of information.
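The three-module flow described above can be summarised in a short sketch, written in Python purely for illustration. The functions recognize, synthesize and answer_text are hypothetical placeholders for the off-the-shelf ASR and TTS modules and for the text understanding and answering module detailed below; they are not part of the patent.

```python
# Minimal sketch of the three-module pipeline of Figure 1 (ASR -> TTT -> TTS).
# recognize() and synthesize() are placeholders for commercial modules.

def recognize(audio: bytes) -> str:
    """Speech recognition module 1: audio in, recognized text out (placeholder)."""
    raise NotImplementedError("plug in an off-the-shelf ASR module here")

def synthesize(text: str) -> bytes:
    """Speech synthesis module 3: answer text in, audio out (placeholder)."""
    raise NotImplementedError("plug in an off-the-shelf TTS module here")

def answer_text(input_text: str) -> str:
    """Spoken-text understanding and answering module 2 (see Figure 2)."""
    # tagging -> mapping corpus -> concept sentence -> dialogue corpus -> answer
    return "placeholder answer"

def chat_turn(audio_in: bytes) -> bytes:
    """One full voice interaction: ASR -> text understanding -> TTS."""
    text_in = recognize(audio_in)
    text_out = answer_text(text_in)
    return synthesize(text_out)
```

In a text-only configuration, only answer_text would be used, matching the simpler process described above.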

In the spoken-text understanding and answering part of the intelligent chat system of the present invention, the Chinese input is given part-of-speech tags to obtain a keyword set; this set is then mapped onto a concept sentence through the spoken-text understanding mapping corpus; and the answer is given on the basis of the concept sentence, the dialogue corpus, history information and network databases or other information. As shown in Figure 3, in the spoken-text understanding and answering module 2 the main process is as follows: the input text is processed by the word segmentation and part-of-speech tagging unit 4 to obtain a word set with part-of-speech tags; the mapping unit, i.e. mapping module 5, searches the mapping corpus 7 according to the word set and maps it to a concept sentence; then the search unit, i.e. search module 6, searches the dialogue corpus 8 according to the concept sentence and obtains the output text. Two databases are involved. The mapping corpus 7, i.e. database 7, records the mapping from keyword sets to concept sentences; its detailed description format is shown in Figure 4, which defines 14 kinds of Chinese parts of speech, with each group of keywords corresponding to one concept sentence. The dialogue corpus 8, i.e. database 8, mainly records the answers to concept sentences. Figure 5 describes the format for answering a concept sentence directly, without environmental or historical information; Figure 6 describes the format for answering, and recording, on the basis of historical information, environmental information and the current concept sentence; Figure 7 is the default answer library, from which the program outputs a specified answer text when needed. For example, when the user says "What is your name", the speech recognition module, under good conditions, produces the text "What is your name"; part-of-speech tagging then gives the segmentation and tagging result "you (pronoun) of (particle) name (noun) is (verb) what (pronoun)". The mapping process then scores the candidate concept sentences of the corpus against this tagged word set and obtains the three concept sentences with the highest scores, for example, ranked from high to low, "What is your name", "What is your name called" and "Do you know the name"; the highest-scoring one clearly expresses the intended meaning and is taken as the mapped concept sentence. According to this concept sentence, the dialogue corpus is searched and the answer can be given. For some sentences, such as "I like it", the system needs to know the context; by matching information from the preceding dialogue, for example "What movie do you like?", it can know how to answer.
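A minimal sketch of the mapping step follows, assuming an in-memory mapping corpus, illustrative part-of-speech weights and an illustrative threshold; the real corpus is stored in XML as described later, and the weights and threshold are obtained by the optimization and test-set procedures described below.

```python
# Sketch of mapping module 5: score each candidate concept sentence against the
# tagged word set using per-part-of-speech weights. Corpus layout, weight values
# and the threshold below are illustrative assumptions only.

POS_WEIGHTS = {"noun": 3.0, "verb": 3.0, "pronoun": 2.0, "adjective": 2.0,
               "numeral": 1.0, "adverb": 1.0, "particle": 0.5}   # assumed values

# Each mapping-corpus entry: concept sentence -> set of (keyword, part of speech).
MAPPING_CORPUS = {
    "What is your name": {("name", "noun"), ("you", "pronoun"), ("what", "pronoun")},
    "How old are you":   {("old", "adjective"), ("you", "pronoun"), ("how", "adverb")},
}

def map_to_concept(tagged_words, corpus=MAPPING_CORPUS, threshold=2.5):
    """Return (concept sentence, score), or (None, score) if no candidate clears the threshold."""
    words = set(tagged_words)                       # [(word, pos), ...] from tagging
    scored = []
    for concept, keywords in corpus.items():
        score = sum(POS_WEIGHTS.get(pos, 0.0) for (w, pos) in keywords if (w, pos) in words)
        scored.append((score, concept))
    scored.sort(reverse=True)                       # keep the best candidates
    best_score, best_concept = scored[0] if scored else (0.0, None)
    if best_score < threshold:                      # fall back to the default answer library
        return None, best_score
    return best_concept, best_score

# Example: "you (pronoun) of (particle) name (noun) is (verb) what (pronoun)"
print(map_to_concept([("you", "pronoun"), ("of", "particle"), ("name", "noun"),
                      ("is", "verb"), ("what", "pronoun")]))
```

When the returned concept is None (an empty or low-scoring set), the system would turn to the default answer library of Figure 7, as described below.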

The intelligent chat system, or the text understanding and answering module, may further comprise a preprocessing unit that processes the word set from the word segmentation unit (replacing information in the word set, adding a dialogue flag or setting a dialogue flag) to obtain the word set used by the mapping unit.

The intelligent chat system, or the text understanding and answering module, may further comprise a post-processing unit that performs the following processing on the output text from the search unit: adding or storing history information, setting the topic of conversation, and adding information obtained by search, to produce the output text delivered to the speech synthesis module.

Using the preprocessing unit and the post-processing unit can increase the accuracy of the information and make it easier for the user to understand, so that the user can give and receive information with higher accuracy and better comprehension.

On this basis, the present invention also provides a method for implementing the intelligent chat system, shown in Figure 8. The chat system includes a text understanding and answering module for obtaining an output text from an input text, and the method comprises the following steps:

A1. Establish the XML-based mapping corpus and dialogue corpus: the mapping corpus establishes and stores mapping relationships between keywords and concept sentences, and the dialogue corpus establishes and stores mapping relationships between concept sentences and output texts. Step A1 may further comprise setting part-of-speech weight values for the mapping corpus, where the weight values can be obtained by one round or two rounds of orthogonal optimization; the orthogonal optimization methods are described in detail later.

A2. Perform part-of-speech tagging on the input text to obtain a word set with part-of-speech tags; the tags are used in the subsequent matching calculation. Before step A2 the method may further comprise converting an input speech into the input text, i.e. converting collected external voice information into text. If natural-language speech interaction is not considered and only text interaction is required, this speech-to-text step can be omitted.

A3. Perform matching calculation between the word set and the keyword sets of the mapping corpus to obtain the concept sentence. Before step A3 the method may further comprise the steps: B1, determine whether the input text falls into one of the following cases: a demonstrative pronoun occurs, the topic of conversation changes, or knowledge needs to be added; if so, perform the corresponding preprocessing (replacing information in the word set, adding a dialogue flag or setting a dialogue flag), otherwise go to step A3; B2, determine whether the preprocessing is completed; if a success flag is returned, go to step A4, otherwise return a failure flag and go to step A3. Replacing information in the word set means that the user's current input text contains a demonstrative pronoun that needs to be replaced. For example, if the user asks "Is the city beautiful?", the chat history or the information stored in the database can be queried; if the history records the city Shenzhen, the pronoun is replaced to give "Is Shenzhen beautiful?" for subsequent processing. The dialogue flag indicates whether the main topic of conversation has changed: when a new topic appears, the topic of conversation must be modified. For example, when the user starts by talking about the weather but suddenly switches to cars, the topic of conversation must be changed, and the dialogue flag is added or set so that the history information is invalidated or updated. Adding a dialogue flag and setting a dialogue flag are similar concepts: the dialogue flag needs to be added when a topic appears for the first time, and needs to be set when the topic changes. A sketch of this preprocessing is given after this step.
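The sketch below illustrates the preprocessing of steps B1 and B2. The dialogue-history structure, the demonstrative-pronoun list and the topic detector are assumptions made for illustration; the patent only specifies the three trigger cases and the replace/add/set-flag actions.

```python
# Sketch of steps B1/B2: demonstrative-pronoun replacement and topic-change flagging.

DEMONSTRATIVES = {"it", "this", "that", "there"}          # illustrative set

def preprocess(tagged_words, history):
    """Return (tagged_words, flag), where flag is 'success' or 'failure' (step B2)."""
    changed = False
    out = []
    for word, pos in tagged_words:
        if word in DEMONSTRATIVES and history.get("last_entity"):
            out.append((history["last_entity"], "noun"))   # replace word-set information
            changed = True
        else:
            out.append((word, pos))
    new_topic = detect_topic(out)                          # hypothetical topic detector
    if new_topic and new_topic != history.get("current_topic"):
        history["current_topic"] = new_topic               # add/set the dialogue flag:
        history["topic_changed"] = True                    # history becomes stale
        changed = True
    return out, ("success" if changed else "failure")

def detect_topic(tagged_words):
    """Placeholder: look up nouns in a small topic lexicon (assumed)."""
    topics = {"weather": "weather", "car": "cars", "stock": "stocks"}
    for word, pos in tagged_words:
        if pos == "noun" and word in topics:
            return topics[word]
    return None
```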

A4. Search the dialogue corpus according to the concept sentence and generate the output text. After step A4 the method may further comprise post-processing steps: adding or storing history information, setting the topic of conversation, and searching for information. The history information contains the sentences the user has already spoken as well as other important information, such as the speaker's name, age and hobbies. The topic of conversation is the subject currently being discussed, such as weather, stocks, news, culture or sports, and it is a useful hint for the robot when searching for answer information. Information search means that, according to the topic of conversation, the user's needs can be met by searching a database or the network; for example, when the weather is discussed, given the time and place from the user, the weather of the corresponding city or region, or the expected changes in the weather and other relevant information, can be obtained, and from these search results an answer meeting the user's needs can be given. The post-processing step described above can be used to increase the accuracy of the answer, so that the output text is more accurate.
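A corresponding sketch of the post-processing after step A4 follows; the weather lookup is a hypothetical stand-in for the database or network search described above, and the history keys are assumptions.

```python
# Sketch of post-processing: store history, use the conversation topic, and merge
# searched information into the answer.

def search_information(topic, history):
    """Hypothetical information search keyed on the conversation topic."""
    if topic == "weather":
        place = history.get("place", "Shenzhen")
        return f"The weather in {place} is expected to be sunny."   # stand-in result
    return ""

def post_process(answer_text, user_text, history):
    """Return the final output text and update the dialogue history."""
    history.setdefault("utterances", []).append(user_text)   # add/store history info
    topic = history.get("current_topic")                     # set during preprocessing
    extra = search_information(topic, history)               # information search
    final = f"{answer_text} {extra}".strip()
    history["last_answer"] = final
    return final
```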

After step A4 the method may further comprise a step A5 of converting the output text into an output speech. If natural-language speech interaction is not considered and only text interaction is required, this text-to-speech step can be omitted.

After step A4 the method may further comprise a step A6 in which the user evaluates the output speech and the text understanding and answering module adjusts the weight values according to the evaluation. In this case a personal information file can also be created for each user, i.e. the method further comprises a step of storing the user's personal information, with the weight values stored in that personal information; when the user logs in, the corresponding weight values are read to adjust the mapping corpus. The evaluation is subjective: for an answer of the system the user may give, for example, three grades of evaluation such as good, acceptable and bad, or any other grading; the present invention places no additional restriction on this. The evaluation may also be given by voice after the system produces its answer, and the system then adjusts the part-of-speech weight values of the mapping corpus according to the result.

The present invention also provides a spoken-language understanding method. Because users and environments differ in how quiet they are, because the speech recognition software used has its own characteristics, and because spoken language itself contains repetitions, omissions, pauses and wrong words, and the same meaning can be expressed in many different ways, the output of automatic speech recognition is uncertain and varied; it is therefore difficult to parse the semantics of an utterance with conventional rule-based natural-language understanding methods. In fact, when people chat in a noisy environment they sometimes cannot hear every word the other party says, but if they can catch a few key words and rely partly on the context, they can still recover the intended meaning. Therefore, in the present invention, keywords are mapped to concept sentences to obtain the semantics of the speaker, and each concept sentence is directly represented by a corresponding natural sentence.

Figure 2 is a flowchart of spoken-text understanding and answering.

First, the word segmentation module 9 produces a part-of-speech-tagged word set; Chinese word segmentation has already been studied extensively and reaches high accuracy, so it is not described further here. At the same time, based on the chat history, preprocessing is needed when the input sentence contains a demonstrative pronoun, when the topic of conversation changes, or when common-sense knowledge needs to be added. Preprocessing module 10 replaces the necessary information, or adds or sets the dialogue flag bit, and the system can return a flag directly to indicate the result of the preprocessing. If the returned flag indicates successful preprocessing, processing goes directly to the post-processing module 14 to give the final output text; if preprocessing was not performed or did not succeed, processing enters the match-and-rank module 11. According to the corpus shown in Figure 4, the part-of-speech-tagged input set is matched against the part-of-speech attribute sets describing the candidates in the corpus; different parts of speech have different weights, and each candidate concept sentence in the corpus receives a score. For example, in the sentence "What is your name", the word "name" best expresses the semantics of the sentence and the other words are relatively less important, so during matching the most important parts of speech should be matched with the highest weight; this directly affects the accuracy with which the utterance is matched to a concept sentence.

The match-and-rank module finally forms a set of the three highest-scoring concept sentences. Because of the influence of the environment and the inherent shortcomings of speech recognition, the recognized text may not be a complete sentence or may even be chaotic characters; in such cases the segmentation result is poor and the scores of the concept sentences obtained by mapping are all zero. The chat system is then considered not to have heard the speaker, and the concept-sentence set is set to empty.

If the set is empty, processing goes directly to the default corpus shown in Figure 7. If the set is not empty, the highest-scoring concept sentence is compared with a first threshold at block 12: when the score is less than the threshold, processing goes directly to the default corpus of Figure 7; when the score is not less than the threshold, the mapping is considered successful and the corresponding sentence is taken as the concept sentence. The first threshold is determined by selecting a typical test set of 100 sentences, scoring the matches on this test set, and choosing the threshold that gives the highest score on the test results; that threshold is the first threshold used here.

After the concept sentence is obtained, the search module 13 attempts to produce an answer text according to the corpus shown in Figure 6 and some history information. This is a search process that takes the current concept sentence and the previous system sentence as inputs; because the two inputs are not necessarily satisfied at the same time, the search result may be empty. If an answer text is found, the search is considered successful and the answer goes directly into the post-processing module 14 for output; if the search result is empty, the search is considered to have failed, the corpus of Figure 5 is used to answer, and processing similarly enters the post-processing module 14 for final output. The post-processing module 14 performs the corresponding processing on the output sentence: it includes or stores history information, sets the state of the topic of conversation, and searches or queries relevant information, ultimately producing the answer text, which is returned to the speech synthesis module. The final answer text is thus generated jointly from the concept sentence, the answer, the search results and the history information.

The present invention further provides a storage structure and description method for the dialogue corpora. To describe the mapping relationship from keywords to concept sentences, and to describe and store the corresponding output sentences according to the concept sentences and the context, a storage structure based on XML (eXtensible Markup Language) is designed to describe these non-structured data, XML documents are used to describe the corpora, and the data are stored in a relational database. The mapping corpus, the history information and the dialogue corpus are all described and stored in XML, and the attributes required for the nodes describing the corpora are defined. The database stores the part-of-speech sets, the concept sentences, the answer sentences, the history information, and so on. This makes the corpora easy to organize and manage and allows their content to be modified dynamically. The corpora can be modified manually by various means and data can be added by hand; additions and modifications can also be made directly to the corpora through voice interaction, and specific data can be stored automatically.
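As an illustration of such an XML description, the sketch below parses one hypothetical mapping-corpus entry with the Python standard library. The element and attribute names are assumptions made for illustration, since the patent does not fix a concrete schema.

```python
# Sketch: one assumed XML layout for a mapping-corpus entry, parsed with ElementTree.

import xml.etree.ElementTree as ET

ENTRY = """
<mapping>
  <concept text="What is your name">
    <keyword pos="noun" weight="3">name</keyword>
    <keyword pos="pronoun" weight="2">you</keyword>
    <keyword pos="pronoun" weight="2">what</keyword>
  </concept>
</mapping>
"""

def load_mapping(xml_text):
    """Build {concept sentence: [(keyword, pos, weight), ...]} from the XML corpus."""
    corpus = {}
    root = ET.fromstring(xml_text)
    for concept in root.findall("concept"):
        keys = [(k.text, k.get("pos"), float(k.get("weight", "1")))
                for k in concept.findall("keyword")]
        corpus[concept.get("text")] = keys
    return corpus

print(load_mapping(ENTRY))
```

In practice the parsed entries could be written into relational-database tables, matching the storage arrangement described above.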

The present invention also provides a process and method for learning by voice. The chat system can accumulate knowledge through natural interaction with its interlocutors, determining through mutual inquiry whether it is allowed to acquire the knowledge given by the user, while at the same time giving corresponding natural-language feedback.

The present invention also provides the use of chat context and history information. During interaction with a person, the system automatically stores some of the information in the context record; important information and conversations are stored, the appropriate information is added in the course of the dialogue, and the answer sentences are organized dynamically on the basis of this information.

The present invention also provides part-of-speech weight optimization and online learning. When keywords are mapped to concept sentences, keywords of different parts of speech have different weights. An optimization method is used to obtain the optimal weight value for each part of speech, and online learning is used to modify the weight values dynamically. When keywords are mapped to the corresponding concept sentence, each keyword must be weighted according to its part of speech; keywords of different parts of speech carry different weight in the semantics of a sentence, and usually the nouns and verbs of a sentence have higher weight and are most significant for understanding its semantics. However, natural language has many parts of speech, and the weight of each part of speech is not a fixed, known value. Therefore a part-of-speech weight optimization and online learning method is proposed so that the mapping from keywords to concept sentences is as accurate as possible.

Figure 9 shows the method of determining the part-of-speech weights by orthogonal weight optimization. Because Chinese has many parts of speech and the importance of each part of speech for semantic expression is not known exactly, the weight of each part of speech must be obtained by optimization. From general linguistic knowledge and common sense, fourteen relatively important word classes are chosen, such as verbs, nouns, pronouns, numerals, adjectives, adverbs, idioms, time words, auxiliaries, modal particles, personal names, distinguishing words, directional words and locative words. First, using the required linguistic knowledge and experience, the fourteen parts of speech are divided into two groups, for example nouns, verbs, pronouns, personal names, adjectives, time words and locative words as the first group of seven, and modal particles, directional words, distinguishing words, auxiliaries, idioms, adverbs and numerals as the second group of seven; the weights of each group are then obtained by orthogonal experiments. In the test on the first group, the seven relatively more important parts of speech are taken as the factors, each with three levels, for example 3, 2 and 1, and the standard orthogonal table L18(3^7) is selected; the weights of the other seven parts of speech are set to 0. When the test set is created, every word is tagged with its part of speech, and the test set is constructed so that each part of speech appears with roughly its natural probability of occurrence. In each test, every sentence of the test set is manually given a reasonable score against the matched concept sentence, and these scores form the result of that test; eighteen rounds of tests are run in this way. The first group of tests yields a currently optimal set of weight values. In the second group of experiments, the seven parts of speech of the first group are given the weight values obtained from the first group of tests, and the weights of the remaining seven parts of speech are again optimized orthogonally, for example with levels 2, 1 and 0, using the same standard L18(3^7) orthogonal table. With the same test and scoring criteria as the first group, the remaining seven parts of speech are optimized. Finally the two groups of part-of-speech weights are combined to obtain the weight values of the fourteen parts of speech available to the system.
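The sketch below shows the shape of such an orthogonal search: each row of a design matrix assigns one candidate level to each part of speech of the current group, every row is scored on the hand-marked test set, and the best row is kept. The three-row design matrix and the scoring stub are placeholders only; in the patent the rows come from the standard L18 orthogonal table and the scores come from manual marking over the eighteen test rounds.

```python
# Sketch of orthogonal weight optimization for one group of seven parts of speech.

GROUP_1 = ["noun", "verb", "pronoun", "name", "adjective", "time word", "locative word"]
LEVELS = [3, 2, 1]                 # candidate weight levels for this group

PLACEHOLDER_DESIGN = [             # each row: a level index per factor (stand-in for L18)
    [0, 0, 0, 0, 0, 0, 0],
    [0, 1, 2, 0, 1, 2, 1],
    [2, 1, 0, 2, 1, 0, 2],
]

def rate_match(sentence, weights):
    """Placeholder for the manual score of the concept sentence mapped with `weights`."""
    return 0.0

def score_weights(weights, test_set):
    """Total test-set score for one row of the design (one weight assignment)."""
    return sum(rate_match(sentence, weights) for sentence in test_set)

def optimize_group(design, test_set):
    """Pick the weight assignment (row) with the highest total score."""
    best_weights, best_score = None, float("-inf")
    for row in design:
        weights = {pos: LEVELS[level] for pos, level in zip(GROUP_1, row)}
        score = score_weights(weights, test_set)
        if score > best_score:
            best_weights, best_score = weights, score
    return best_weights
```

The same routine would then be run on the second group with the first group's weights held fixed, and the two results combined.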

Figure 10 shows the online learning of the weights of the various parts of speech. When the user enters the part-of-speech training mode, training takes place by voice: a voice input given by the user first enters the test mapping module 15 (this mapping module is the mapping module 5 shown in Figure 2), the mapping result is returned to the user in voice form, and at decision block 16 the user gives evaluation feedback; according to this feedback the weight adjustment module 17 adjusts the weights arithmetically, the adjusted weights are fed back into the mapping module 15, and the result is evaluated again under the adjusted weights, until the matching finally reaches a degree that satisfies the user. For example, when the user says "What is your specialty", the system may ask after processing "Did you mean 'What is your specialty'?" or "Did you mean 'What do you do'?"; the user will obviously answer "yes" or "no", and according to the answer the system adjusts the part-of-speech weights so that it answers correctly as far as possible.
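A sketch of this feedback loop is given below. The additive update rule, the step size and the bounded number of rounds are assumptions; the patent states only that the weight adjustment module 17 adjusts the weights arithmetically until the user is satisfied with the mapping.

```python
# Sketch of the online learning loop of Figure 10: map the training utterance, read
# the user's yes/no confirmation, and nudge the part-of-speech weights.

def map_with_weights(tagged_words, weights, corpus):
    """Same scoring as mapping module 5, but with the weights currently under training."""
    words = set(tagged_words)
    best = max(corpus.items(),
               key=lambda item: sum(weights.get(pos, 0.0)
                                    for (w, pos) in item[1] if (w, pos) in words))
    concept, keywords = best
    score = sum(weights.get(pos, 0.0) for (w, pos) in keywords if (w, pos) in words)
    return concept, score

def online_weight_training(tagged_words, expected_concept, weights, corpus, step=0.2):
    """Adjust `weights` in place until mapping yields the concept the user confirms."""
    words = set(tagged_words)
    for _ in range(20):                                    # bounded number of rounds
        concept, _score = map_with_weights(tagged_words, weights, corpus)
        if concept == expected_concept:                    # user answers "yes"
            return weights
        # user answers "no": favour parts of speech that support the intended concept
        for w, pos in corpus[expected_concept]:
            if (w, pos) in words:
                weights[pos] = weights.get(pos, 1.0) + step        # module 17 raises weight
        for w, pos in corpus[concept]:
            if (w, pos) in words:
                weights[pos] = max(0.0, weights.get(pos, 1.0) - step)
    return weights
```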

The present invention further provides a method of driving behaviour with natural language. Commands are issued and behaviour is driven by natural speech: from the speech to the concept-sentence set, and from the concept sentence to the final answer and feedback, a specific format and an action-driving script are used, so that the system can be driven, or commands issued, by naturally spoken language. For behaviour driving, the system no longer relies on predefined phrases or simple imperative sentences; instead it can give the correct response to naturally expressed commands, and it can recognize and respond by voice to implement reminder functions for the user. This behaviour-driving approach better matches people's daily habits, and a new user can drive the system in natural language without much learning.

The present invention further provides an embedded implementation of the voice chat system. Such a voice chat framework can be implemented in many ways, for example by using a speech recognition chip together with a stored mapping corpus to complete recognition, or by using an embedded system with a processor to implement ordinary speech recognition, speech synthesis and language understanding. The embedded implementation is one of these: automatic speech recognition, semantic understanding and speech synthesis are completed on a specific embedded operating system, with the various software components integrated, and the implementation differs from platform to platform. This approach fully preserves the inherent characteristics of the voice chat system while being portable, low-power, compact and inexpensive.

The present invention further provides a method for querying and answering information by natural voice. Queries and feedback both use natural voice, and answers are given in a manner consistent with human language. This matches the way people naturally communicate to obtain the information they need, with queries answered and confirmed interactively. The data can come from an existing database or from the Internet.

It should be understood that those of ordinary skill in the art can make modifications or variations in accordance with the above description, and all such modifications and variations shall fall within the scope of the appended claims.

Claims

1. An intelligent chat system, characterized by comprising a text understanding and answering module for obtaining an output text from an input text; the text understanding and answering module comprises a word segmentation unit, an XML-based mapping corpus, a mapping unit, an XML-based dialogue corpus and a search unit;
the word segmentation unit is configured to perform part-of-speech tagging on the input text to obtain a word set with part-of-speech tags; the mapping corpus is configured to establish and store mapping relationships between keywords and concept sentences;
the mapping unit is configured to search the mapping corpus according to the word set and to map the word set to a concept sentence; the dialogue corpus is configured to establish and store mapping relationships between concept sentences and output texts;
and the search unit is configured to search the dialogue corpus according to the concept sentence and to obtain the output text.
2. The intelligent chat system according to claim 1, characterized in that it further comprises a speech recognition module for converting an input speech into the input text.
3. The intelligent chat system according to claim 1, characterized in that it further comprises a speech synthesis module for converting the output text into an output speech.
4. The intelligent chat system according to claim 1, characterized in that the mapping corpus and the dialogue corpus are arranged in the same corpus.
5. The intelligent chat system according to claim 1, characterized by further comprising a preprocessing unit for processing the word set from the word segmentation unit by replacing information in the word set, adding a dialogue flag or setting a dialogue flag, to obtain the word set used by the mapping unit.
6. The intelligent chat system according to claim 1, characterized by further comprising a post-processing unit for performing the following processing on the output text from the search unit: adding or storing history information, setting a topic of conversation, and adding information obtained by search, to obtain the output text delivered to the speech synthesis module.
7. A method for implementing an intelligent chat system, the chat system including a text understanding and answering module for obtaining an output text from an input text, comprising the steps of:
A1, establishing an XML-based mapping corpus and dialogue corpus, the mapping corpus establishing and storing mapping relationships between keywords and concept sentences, and the dialogue corpus establishing and storing mapping relationships between concept sentences and output texts; A2, performing part-of-speech tagging on the input text to obtain a word set with part-of-speech tags;
A3, performing matching calculation between the word set and the keyword sets of the mapping corpus to obtain a concept sentence;
A4, searching the dialogue corpus according to the concept sentence to generate the output text.
8. The method according to claim 7, characterized in that, prior to step A2, it further comprises the step of converting an input speech into the input text.
9. The method according to claim 7, characterized in that it further comprises a step A5 of converting the output text into an output speech.
10. The method according to claim 7, characterized in that, after step A4, it further comprises a post-processing step for increasing the accuracy of the answer: adding or storing history information, setting the topic of conversation, and adding searched information.
11. The method according to claim 7, characterized in that, prior to step A3, it further comprises the steps of:
B1, determining whether the input text contains any of the following cases: a demonstrative pronoun occurs, the topic of conversation changes, or knowledge needs to be added; if so, performing the corresponding preprocessing step of replacing information in the word set, adding a dialogue flag or setting a dialogue flag; otherwise proceeding to step A3;
B2, determining whether the preprocessing is completed; if a success flag is returned, proceeding to step A4; otherwise returning a failure flag and proceeding to step A3.
12. The method according to claim 7, characterized in that the mapping corpus and the dialogue corpus are arranged in the same corpus.
13. The method according to claim 7, characterized in that step A1 further comprises setting part-of-speech weight values for the mapping corpus, wherein the weight values are obtained by one round or two rounds of orthogonal optimization.
14. The method according to claim 13, characterized in that it further comprises a step A6 in which the user evaluates the output speech and the text understanding and answering module adjusts the weight values according to the evaluation.
15. The method according to claim 14, characterized in that it further comprises a step of storing personal information for the user, with the weight values stored in the user's personal information; when the user logs in, the corresponding weight values are read to adjust the mapping corpus.
PCT/CN2008/000764 2007-04-19 2008-04-15 An intelligent dialog system and a method for realization thereof WO2008128423A1 (en)

Priority Applications (2)

Application Number Priority Date Filing Date Title
CN 200710074112 CN101075435B (en) 2007-04-19 2007-04-19 Intelligent chatting system and its realizing method
CN200710074112.1 2007-04-19

Publications (1)

Publication Number Publication Date
WO2008128423A1 (en)

Family

ID=38976431

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/CN2008/000764 WO2008128423A1 (en) 2007-04-19 2008-04-15 An intelligent dialog system and a method for realization thereof

Country Status (2)

Country Link
CN (1) CN101075435B (en)
WO (1) WO2008128423A1 (en)

Families Citing this family (31)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN101075435B (en) * 2007-04-19 2011-05-18 深圳先进技术研究院 Intelligent chatting system and its realizing method
JP5897240B2 (en) * 2008-08-20 2016-03-30 株式会社ユニバーサルエンターテインメント Customer support system, as well as the conversation server
US8374859B2 (en) 2008-08-20 2013-02-12 Universal Entertainment Corporation Automatic answering device, automatic answering system, conversation scenario editing device, conversation server, and automatic answering method
CN101551998B (en) 2009-05-12 2011-07-27 上海锦芯电子科技有限公司 A group of voice interaction devices and method of voice interaction with human
CN101610164B (en) 2009-07-03 2011-09-21 腾讯科技(北京)有限公司 Implementation method, device and system of multi-person conversation
CN101794304B (en) * 2010-02-10 2016-05-25 深圳先进技术研究院 Industry information service system and method
CN102737631A (en) * 2011-04-15 2012-10-17 富泰华工业(深圳)有限公司 Electronic device and method for interactive speech recognition
CN102194005B (en) * 2011-05-26 2014-01-15 卢玉敏 Chat robot system and automatic chat method
US8930189B2 (en) * 2011-10-28 2015-01-06 Microsoft Corporation Distributed user input to text generated by a speech to text transcription service
CN103150981A (en) * 2013-01-02 2013-06-12 曲东阳 Self-service voice tour-guiding system and triggering method thereof
CN103198155B (en) * 2013-04-27 2017-09-22 北京光年无限科技有限公司 An interactive system and method based on intelligent mobile terminal Q
CN103279528A (en) * 2013-05-31 2013-09-04 俞志晨 Question-answering system and question-answering method based on man-machine integration
CN104281609A (en) * 2013-07-08 2015-01-14 腾讯科技(深圳)有限公司 Voice input instruction matching rule configuration method and device
WO2015058386A1 (en) * 2013-10-24 2015-04-30 Bayerische Motoren Werke Aktiengesellschaft System and method for text-to-speech performance evaluation
CN103593054B (en) * 2013-11-25 2018-04-20 北京光年无限科技有限公司 One binding emotion recognition system and the output Q
CN104754110A (en) * 2013-12-31 2015-07-01 广州华久信息科技有限公司 Machine voice conversation based emotion release method mobile phone
CN104123939A (en) * 2014-06-06 2014-10-29 国家电网公司 Substation inspection robot based voice interaction control method
CN105404617B (en) * 2014-09-15 2018-12-14 华为技术有限公司 A method for controlling remote desktop, control systems and controlled end
CN104392720A (en) * 2014-12-01 2015-03-04 江西洪都航空工业集团有限责任公司 Voice interaction method of intelligent service robot
CN104615646A (en) * 2014-12-25 2015-05-13 上海科阅信息技术有限公司 Intelligent chatting robot system
CN104898589A (en) * 2015-03-26 2015-09-09 天脉聚源(北京)传媒科技有限公司 Intelligent response method and device for intelligent housekeeper robot
WO2016173326A1 (en) * 2015-04-30 2016-11-03 北京贝虎机器人技术有限公司 Subject based interaction system and method
CN105094315B (en) * 2015-06-25 2018-03-06 百度在线网络技术(北京)有限公司 Intelligent human-computer-based methods and apparatus AI chatting
CN106326208A (en) * 2015-06-30 2017-01-11 芋头科技(杭州)有限公司 System and method for training robot via voice
CN105206284A (en) * 2015-09-11 2015-12-30 清华大学 Virtual chatting method and system relieving psychological pressure of adolescents
CN105376140A (en) * 2015-09-25 2016-03-02 云活科技有限公司 A voice message prompt method and device
CN105573710A (en) * 2015-12-18 2016-05-11 合肥寰景信息技术有限公司 Voice service method for network community
CN105912712A (en) * 2016-04-29 2016-08-31 华南师范大学 Big data-based robot conversation control method and system
CN106294321A (en) * 2016-08-04 2017-01-04 北京智能管家科技有限公司 Conversation mining method and device for special field
CN106228983B (en) * 2016-08-23 2018-08-24 北京谛听机器人科技有限公司 Scene processing method and system types of human-computer interaction in natural language
CN106412263A (en) * 2016-09-19 2017-02-15 合肥视尔信息科技有限公司 Human-computer interaction voice system

Citations (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN1516112A (en) * 1995-03-01 2004-07-28 精工爱普生株式会社 Speech recognition conversation device
JP2005025602A (en) * 2003-07-04 2005-01-27 Matsushita Electric Ind Co Ltd Text and language generation device and its selection means
US20050256717A1 (en) * 2004-05-11 2005-11-17 Fujitsu Limited Dialog system, dialog system execution method, and computer memory product
US20060173686A1 (en) * 2005-02-01 2006-08-03 Samsung Electronics Co., Ltd. Apparatus, method, and medium for generating grammar network for use in speech recognition and dialogue speech recognition
JP2006208905A (en) * 2005-01-31 2006-08-10 Nissan Motor Co Ltd Voice dialog device and voice dialog method
CN101075435A (en) * 2007-04-19 2007-11-21 深圳先进技术研究院 Intelligent chatting system and its realizing method

Family Cites Families (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US6499013B1 (en) 1998-09-09 2002-12-24 One Voice Technologies, Inc. Interactive user interface using speech recognition and natural language processing


Also Published As

Publication number Publication date Type
CN101075435B (en) 2011-05-18 grant
CN101075435A (en) 2007-11-21 application


Legal Events

Date Code Title Description
121 Ep: the epo has been informed by wipo that ep was designated in this application

Ref document number: 08733962

Country of ref document: EP

Kind code of ref document: A1

NENP Non-entry into the national phase in:

Ref country code: DE

122 Ep: pct application non-entry in european phase

Ref document number: 08733962

Country of ref document: EP

Kind code of ref document: A1