CN103021403A - Voice recognition based selecting method and mobile terminal device and information system thereof - Google Patents


Info

Publication number
CN103021403A
CN103021403A · CN2012105930794A · CN201210593079A
Authority
CN
China
Prior art keywords
data
voice
natural language
user
processing unit
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN2012105930794A
Other languages
Chinese (zh)
Inventor
张国峰
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Via Technologies Inc
Original Assignee
Via Technologies Inc
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Via Technologies Inc filed Critical Via Technologies Inc
Priority to CN2012105930794A priority Critical patent/CN103021403A/en
Publication of CN103021403A publication Critical patent/CN103021403A/en
Priority to CN201710007339.8A priority patent/CN106847278A/en
Priority to CN2013101828630A priority patent/CN103280218A/en
Priority to TW102121404A priority patent/TWI511124B/en
Pending legal-status Critical Current

Classifications

    • G - PHYSICS
    • G10 - MUSICAL INSTRUMENTS; ACOUSTICS
    • G10L - SPEECH ANALYSIS OR SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING; SPEECH OR AUDIO CODING OR DECODING
    • G10L 15/00 - Speech recognition
    • G10L 15/22 - Procedures used during a speech recognition process, e.g. man-machine dialogue
    • G10L 15/08 - Speech classification or search
    • G10L 15/18 - Speech classification or search using natural language modelling
    • G10L 15/183 - Speech classification or search using natural language modelling using context dependencies, e.g. language models
    • G10L 15/19 - Grammatical context, e.g. disambiguation of the recognition hypotheses based on word sequence rules
    • G10L 15/28 - Constructional details of speech recognition systems
    • G10L 15/34 - Adaptation of a single recogniser for parallel processing, e.g. by use of multiple processors or cloud computing
    • G10L 2015/081 - Search algorithms, e.g. Baum-Welch or Viterbi

Abstract

The invention discloses a selection method based on voice recognition, and a mobile terminal device and an information system using the same. The selection method includes: receiving a first input voice; performing voice recognition and natural language processing on the first input voice to generate a corresponding first semantic analysis; selecting a corresponding portion from a plurality of data according to the first semantic analysis; when the number of selected data items is 1, performing a corresponding operation according to the type of the selected data; when the number of selected data items is larger than 1, displaying a data list of the selected data and receiving a second input voice; performing voice recognition and natural language processing on the second input voice to generate a corresponding second semantic analysis; and selecting a corresponding portion from the data in the data list according to the second semantic analysis.

Description

Selection method based on speech recognition, and mobile terminal device and information system thereof
Technical field
The invention relates to a selection method, a mobile terminal device and an information system, and more particularly to a selection method based on speech recognition and to a mobile terminal device and information system using it.
Background technology
In computer natural language understanding (Natural Language Understanding), the intention or information in a user's input sentence is usually captured by means of specific grammars. Hence, if the database stores enough data about users' input sentences, reasonable judgments can be made.
In one known practice, a built-in fixed word list is used to capture the user's input sentence. The fixed word list contains the particular terms used for specific intentions or information, and the user must express an intention with exactly those terms for the system to identify it correctly. Forcing the user to memorize every particular term in a fixed word list is, however, quite unfriendly. For example, a prior-art embodiment using a fixed word list requires a user who wants to ask about the weather to say: "How is the weather in Shanghai (or Beijing) tomorrow (or the day after tomorrow)?" If the user instead asks in a more natural, colloquial way, such as "How is Shanghai tomorrow?", the word "weather" does not appear in the sentence, so the prior art may understand it as "there is a place called Tomorrow in Shanghai", obviously failing to catch the user's real intention. Moreover, the sentences users employ are varied and keep changing, and users sometimes even input erroneous sentences, in which case the input must be captured by fuzzy matching. A fixed word list that offers only rigid input rules therefore performs even worse.
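As an illustration only (not taken from the patent), the contrast between a rigid fixed-template match and a looser keyword capture can be sketched as follows; the template, city and time vocabularies are hypothetical:

```python
import re

# Rigid approach: the query must match a fixed template exactly.
FIXED_TEMPLATES = [
    re.compile(r"^How is the weather in (?P<city>\w+) "
               r"(?P<time>tomorrow|the day after tomorrow)\?$")
]

def fixed_word_list_parse(sentence):
    for pattern in FIXED_TEMPLATES:
        m = pattern.match(sentence)
        if m:
            return {"intent": "queryweather", **m.groupdict()}
    return None  # any colloquial variation falls through

def keyword_parse(sentence):
    # Looser approach: recognize known city/time keywords anywhere in the
    # sentence and infer the weather intent from their co-occurrence.
    cities = {"Shanghai", "Beijing"}
    times = {"tomorrow", "today"}
    found_city = next((c for c in cities if c in sentence), None)
    found_time = next((t for t in times if t in sentence), None)
    if found_city and found_time:
        return {"intent": "queryweather", "city": found_city, "time": found_time}
    return None

print(fixed_word_list_parse("How is Shanghai tomorrow?"))  # None: template misses it
print(keyword_parse("How is Shanghai tomorrow?"))          # still recovers the intent
```

The fixed template answers only the exact phrasing it was written for, while the keyword version still recovers the intent from the colloquial form, which is the gap the background section describes.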
In addition, when natural language understanding is used to handle multiple types of user intention, some different intentions share an identical syntactic structure. For example, if the user's input sentence is "I want to watch The Romance of the Three Kingdoms", the user may want to watch the film of The Romance of the Three Kingdoms or to read the book, so in this case two possible intentions are usually matched and the user is asked to choose. In many situations, however, offering unnecessary possible intentions for the user to choose from is redundant and inefficient. For example, when the user's input sentence is "I want to watch Million Star", it is quite unnecessary to match the intention of reading a book or a painting called Million Star, because Million Star is a TV program.
Moreover, the results obtained by full-text search are generally unstructured data, in which the information is scattered and unrelated. For instance, after keywords are entered into a search engine such as Google or Baidu, the web search results returned are unstructured: the user must read the results one by one to find the useful information among them. Such a practice not only wastes the user's time but may also miss the wanted information, so its practicality is very limited.
Summary of the invention
The invention provides a selection method based on speech recognition, and a mobile terminal device and information system using it, which can improve the convenience of the user's operation.
The invention proposes a selection method based on speech recognition, comprising: receiving a first input voice; performing speech recognition on the first input voice to generate a first word string; performing natural language processing on the first word string to generate a first semantic analysis corresponding to the first input voice; selecting a corresponding portion from a plurality of data according to the first semantic analysis; when the number of selected data items is 1, performing a corresponding operation according to the type of the selected data; when the number of selected data items is larger than 1, displaying a data list of the selected data and receiving a second input voice; performing speech recognition on the second input voice to generate a second word string; performing natural language processing on the second word string to generate a second semantic analysis corresponding to the second input voice; and selecting a corresponding portion from the data in the data list according to the second semantic analysis.
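The claimed flow can be sketched in a few lines under toy assumptions: speech recognition is elided (both "voices" arrive as word strings), and "semantic analysis" is reduced to substring matching that stands in for real NLP. The function name, record fields and return tuples are all hypothetical:

```python
def select_by_voice(first_text, records, second_text=None):
    """Toy sketch of the claimed selection flow."""
    selected = [r for r in records if first_text in r["title"]]
    if len(selected) == 1:
        return ("operate", selected[0])       # single hit: act on it by type
    if len(selected) > 1:
        # The device would display this list and wait for a second utterance.
        if second_text is None:
            return ("show_list", selected)
        narrowed = [r for r in selected
                    if second_text in r["title"] or second_text == r["type"]]
        return ("narrowed", narrowed)
    return ("none", None)

records = [
    {"title": "Let the Bullets Fly", "type": "film"},
    {"title": "Let the Bullets Fly", "type": "book"},
]
print(select_by_voice("Let the Bullets Fly", records, second_text="film"))
# -> ('narrowed', [{'title': 'Let the Bullets Fly', 'type': 'film'}])
```

The two-stage structure mirrors the claim: a single match triggers an operation directly, while multiple matches produce a list that the second utterance narrows down.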
The invention proposes a mobile terminal device comprising a voice receiving unit, a display unit, a storage unit and a data processing unit. The voice receiving unit receives the first input voice and the second input voice. The display unit displays the data list. The storage unit stores a plurality of data. The data processing unit is coupled to the voice receiving unit, the display unit and the storage unit. The data processing unit performs speech recognition on the first input voice to generate the first word string, performs natural language processing on the first word string to generate the first semantic analysis corresponding to the first input voice, and selects the corresponding portion from the data according to the first semantic analysis. When the number of selected data items is 1, the data processing unit performs the corresponding operation according to the type of the selected data. When the number of selected data items is larger than 1, the data processing unit controls the display unit to display the data list of the selected data. The data processing unit then performs speech recognition on the second input voice to generate the second word string, performs natural language processing on the second word string to generate the second semantic analysis corresponding to the second input voice, and selects the corresponding portion from the data in the data list according to the second semantic analysis.
The invention proposes an information system comprising a server and a mobile terminal device. The server stores a plurality of data and has a speech recognition function. The mobile terminal device comprises a voice receiving unit, a display unit and a data processing unit. The voice receiving unit receives the first input voice and the second input voice. The display unit displays the data list. The data processing unit is coupled to the voice receiving unit, the display unit and the server. The data processing unit has the server perform speech recognition on the first input voice to generate the first word string and natural language processing on the first word string to generate the first semantic analysis corresponding to the first input voice, and the server selects the corresponding portion from the data according to the first semantic analysis and sends it to the data processing unit. When the number of selected data items is 1, the data processing unit performs the corresponding operation according to the type of the selected data. When the number of selected data items is larger than 1, the data processing unit controls the display unit to display the data list of the selected data; the data processing unit then has the server perform speech recognition on the second input voice to generate the second word string and natural language processing on the second word string to generate the second semantic analysis corresponding to the second input voice, and the server selects the corresponding portion from the data in the data list according to the second semantic analysis and sends it to the data processing unit.
Based on the above, the selection method, mobile terminal device and information system based on speech recognition of embodiments of the invention perform speech recognition and natural language processing on the first input voice and the second input voice to determine the semantic analyses corresponding to them, and select among the data according to those semantic analyses. In this way the convenience of the user's operation can be improved.
To make the above features and advantages of the invention more comprehensible, embodiments are described in detail below together with the accompanying drawings.
Description of drawings
Fig. 1 is a block diagram of a natural language understanding system according to an embodiment of the invention.
Fig. 2 is a schematic diagram of the analysis results produced by the natural language processing device for various user request messages according to an embodiment of the invention.
Fig. 3A is a schematic diagram of a plurality of records with a specific data structure stored in the structured database according to an embodiment of the invention.
Fig. 3B is a schematic diagram of a plurality of records with a specific data structure stored in the structured database according to another embodiment of the invention.
Fig. 3C is a schematic diagram of the guide data stored in the guide data storage form according to another embodiment of the invention.
Fig. 4A is a flowchart of a search method according to an embodiment of the invention.
Fig. 4B is a flowchart of the working process of the natural language understanding system according to another embodiment of the invention.
Fig. 5A is a block diagram of a natural language dialogue system according to an embodiment of the invention.
Fig. 5B is a block diagram of a natural language understanding module according to an embodiment of the invention.
Fig. 5C is a block diagram of a natural language dialogue system according to another embodiment of the invention.
Fig. 6 is a flowchart of a method for correcting a voice answer according to an embodiment of the invention.
Fig. 7A is a block diagram of a natural language dialogue system according to an embodiment of the invention.
Fig. 7B is a block diagram of a natural language dialogue system according to another embodiment of the invention.
Fig. 8 is a flowchart of a natural language dialogue method according to an embodiment of the invention.
Fig. 9 is a system schematic diagram of a mobile terminal device according to an embodiment of the invention.
Fig. 10 is a system schematic diagram of an information system according to an embodiment of the invention.
Fig. 11 is a flowchart of a selection method based on speech recognition according to an embodiment of the invention.
Fig. 12 is a block diagram of a voice control system according to an embodiment of the invention.
Fig. 13 is a block diagram of a voice control system according to an embodiment of the invention.
Fig. 14 is a flowchart of a voice control method according to an embodiment of the invention.
[Description of main element labels]
100: natural language understanding system; 102, 505, 705: request message
104: analysis result; 106: possible-intent grammar data
108: keyword; 110: response result
112: intent data; 114: determined intent grammar data
116: analysis result output module; 200: search system
220: structured database; 240: search engine
260: search interface unit; 280: guide data storage form
300: natural language processing device; 302: record
304: title field; 306: content field
308: sub-field; 310: guide field
312: numeric field; 314: source field
316: temperature field; 400: knowledge-assisted understanding module
S410 ~ S450: steps of the search method according to an embodiment of the invention
S510 ~ S570: steps of the working process of the natural language understanding system according to an embodiment of the invention
500, 500', 700, 700': natural language dialogue system
501, 701: voice input; 503, 703: analysis result
507, 707: voice answer; 509, 709: feature semantics
510, 710: voice sampling module; 511, 711: candidate list
520, 520', 720, 720': natural language understanding module
522, 722: speech recognition module; 524, 724: natural language processing module
526, 726: speech synthesis module
S602, S604, S606, S608, S610, S612: steps of the method for correcting a voice answer
702: speech synthesis processing module; 730: feature database
740: natural language database
S802 ~ S890: steps of the natural language dialogue method according to an embodiment of the invention
900, 1010: mobile terminal device; 910, 1011: voice receiving unit
920, 1013: data processing unit; 930, 1015: display unit
940: storage unit; 1000: information system
1020: server; SP1: first voice
SP2: second voice
S1100 ~ S1190: steps of the selection method based on speech recognition according to an embodiment of the invention
1200, 1300: voice control system; 1210: auxiliary actuating apparatus
1212, 1222: wireless transmission module; 1214: trigger module
1216: wireless charging battery; 12162: battery unit
12164: wireless charging module; 1220, 1320: mobile terminal device
1221: voice system; 1224: voice sampling module
1226: speech synthesis module; 1227: voice output interface
1228: communication module; 1230: (cloud) server
1232: speech understanding module; 12322: speech recognition module
12324: speech processing module
Embodiment
Because the known embodiments using a fixed word list can only provide rigid input rules, their judgment of the user's varied input sentences is insufficient, so they often misjudge the user's intention, fail to find the required information, or output unnecessary information to the user. In addition, known search engines can only return scattered, weakly related search results, so the user must take time to inspect them one by one to filter out the needed information, which both wastes time and may miss that information. For these problems of the known techniques, the invention proposes a search method and system for structured data: specific fields are provided in the structured data to store different types of data elements, so that when the user retrieves data with a natural-speech input message, the user's intention can be judged quickly and correctly, and the needed information, or more precise messages to choose from, can then be provided to the user.
Fig. 1 is a block diagram of a natural language understanding system according to an embodiment of the invention. As shown in Fig. 1, the natural language understanding system 100 comprises a search system 200, a natural language processing device 300 and a knowledge-assisted understanding module 400; the knowledge-assisted understanding module 400 is coupled to the natural language processing device 300 and the search system 200. The search system 200 further comprises a structured database 220, a search engine 240 and a search interface unit 260, wherein the search engine 240 is coupled to the structured database 220 and the search interface unit 260. In the present embodiment the search system 200 includes the search interface unit 260, but this is not intended to limit the invention; in some embodiments there may be no search interface unit 260, and the search engine 240 performs the full-text search on the structured database 220 in another way.
When the user sends a request message 102 to the natural language understanding system 100, the natural language processing device 300 analyzes the request message 102 and sends the resulting possible-intent grammar data 106, which comprises a keyword 108 and intent data 112, to the knowledge-assisted understanding module 400. The knowledge-assisted understanding module 400 then takes the keyword 108 out of the possible-intent grammar data 106 and sends it to the search system 200, while storing the intent data 112 inside the knowledge-assisted understanding module 400. After the search engine 240 in the search system 200 performs a full-text search on the structured database 220 according to the keyword 108, it returns the response result 110 of the full-text search to the knowledge-assisted understanding module 400. The knowledge-assisted understanding module 400 then compares the stored intent data 112 against the response result 110 and sends the determined intent grammar data 114 thus obtained to the analysis result output module 116, and the analysis result output module 116 in turn sends the analysis result 104 to the server according to the determined intent grammar data 114; after the data required by the user are looked up, they are given to the user.
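A minimal sketch of this pipeline, under toy assumptions: the in-memory list below stands in for structured database 220, a list comprehension plays search engine 240, and candidate intents arrive as (domain, keyword) pairs in place of the grammar data 106. All names and the example records are hypothetical:

```python
STRUCTURED_DB = [
    {"title": "Let the Bullets Fly", "domain": "film"},
    {"title": "The Romance of the Three Kingdoms", "domain": "book"},
    {"title": "The Romance of the Three Kingdoms", "domain": "film"},
]

def full_text_search(keyword):
    # Stand-in for search engine 240's full-text search over database 220.
    return [r for r in STRUCTURED_DB if keyword in r["title"]]

def determine_intent(candidates):
    """Stand-in for knowledge-assisted understanding module 400: keep only
    the candidate intents whose keyword actually occurs in the matching
    domain of the structured database."""
    confirmed = []
    for domain, keyword in candidates:        # possible-intent data
        hits = full_text_search(keyword)      # response result
        if any(h["domain"] == domain for h in hits):
            confirmed.append((domain, keyword))
    return confirmed                          # determined intent data

# "I want to watch Let the Bullets Fly": book or film?
print(determine_intent([("book", "Let the Bullets Fly"),
                        ("film", "Let the Bullets Fly")]))
# -> [('film', 'Let the Bullets Fly')]  (only the film exists in this toy database)
```

The point is the filtering step: ambiguous candidates survive only if the database confirms the keyword in that domain, which is exactly the role the patent assigns to the response result 110.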
The analysis result output module 116 described above may be combined with other modules as circumstances require. For example, in one embodiment it may be merged into the knowledge-assisted understanding module 400, or in another embodiment it may be separated from the natural language understanding system 100 and located in the server (which comprises the natural language understanding system), in which case the server directly receives and processes the determined intent grammar data 114. In addition, the knowledge-assisted understanding module 400 may store the intent data 112 in a storage device inside the module, in the natural language understanding system 100, in the server, or in any storage accessible to the knowledge-assisted understanding module 400; the invention is not limited in this respect. Moreover, the search system 200, the natural language processing device 300 and the knowledge-assisted understanding module 400 comprised by the natural language understanding system 100 may be constructed from hardware, software, firmware or any combination thereof, and the invention does not limit this either.
The aforementioned natural language understanding system 100 may be located in a cloud server, in a server on a local area network, or even in a personal computer, a mobile computing device (such as a notebook computer) or a mobile communication device (such as a mobile phone). The members of the natural language understanding system 100 or the search system 200 need not be located in the same machine either; depending on actual needs, they may be dispersed among different devices or systems and linked by various communication protocols. For example, the natural language processing device 300 and the knowledge-assisted understanding module 400 may be configured in the same smartphone while the search system 200 is configured in a cloud server; or the search interface unit 260, the natural language processing device 300 and the knowledge-assisted understanding module 400 may be configured in the same notebook computer while the search engine 240 and the structured database 220 are configured in another server on the local area network. In addition, when the natural language understanding system 100 is located entirely in a server (whether a cloud server or a LAN server), the search system 200, the natural language processing device 300 and the knowledge-assisted understanding module 400 may be deployed in different hosts, with the server's main system coordinating the transmission of messages and data among them. Of course, depending on actual demand, two of these components, or all of them, may also be integrated in one host; the invention does not limit this part of the configuration.
In an embodiment of the invention, the user can send the request message to the natural language processing device 300 in various manners, such as by spoken voice input or by text description. For instance, if the natural language understanding system 100 is located in a cloud or LAN server (not shown), the user can first input the request message 102 through a mobile device (for example a mobile phone, a PDA, a tablet computer or a similar device), the telecommunication operator then sends the request message 102 to the natural language understanding system 100 in the server, and the natural language processing device 300 analyzes the request message 102; finally, after confirming the user's intention, the server has the analysis result output module 116 process the corresponding analysis result 104 and passes the information the user asked for back to the user's mobile device. For instance, the request message 102 may be a question the user wishes the natural language understanding system 100 to answer (for example "How is the weather in Shanghai tomorrow?"), and when the natural language understanding system 100 determines that the user's intention is to ask about tomorrow's weather in Shanghai, it gives the queried weather data to the user through the analysis result output module 116 as the output result 104. In addition, if the user's instruction to the natural language understanding system 100 is "I want to watch Let the Bullets Fly" or "I want to listen to the date of passing by together", then because "Let the Bullets Fly" or "the date of passing by together" may belong to different domains, the natural language processing device 300 parses the user's request message 102 into one or more possible-intent grammar data 106, each comprising a keyword 108 and intent data 112; the search engine 240 in the search system 200 then performs a full-text search on the structured database 220, after which the user's intention is confirmed.
Furthermore, when the user's request message 102 is "How is the weather in Shanghai tomorrow?", the natural language processing device 300 can, after analysis, produce a possible-intent grammar datum 106:
"<queryweather>, <city>=Shanghai, <time>=tomorrow".
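Such a grammar datum can be represented as an intent name plus slot values. The following sketch, with hypothetical class and function names and toy vocabularies, shows one way to fill it from the weather request:

```python
from dataclasses import dataclass, field

@dataclass
class IntentGrammar:
    """One possible-intent grammar datum, e.g.
    <queryweather>, <city>=Shanghai, <time>=tomorrow"""
    intent: str                                 # intent data, e.g. "queryweather"
    slots: dict = field(default_factory=dict)   # keywords, e.g. city and time

def parse_weather_request(sentence):
    # Toy slot filler standing in for the natural language processing device.
    cities, times = {"Shanghai", "Beijing"}, {"tomorrow", "today"}
    slots = {}
    for word in sentence.replace("?", "").split():
        if word in cities:
            slots["city"] = word
        if word in times:
            slots["time"] = word
    return IntentGrammar("queryweather", slots) if slots else None

g = parse_weather_request("How is the weather in Shanghai tomorrow?")
print(g.intent, g.slots)  # queryweather {'city': 'Shanghai', 'time': 'tomorrow'}
```

A real implementation would of course use grammars rather than word lookup; the sketch only fixes the shape of the data that later disambiguation steps consume.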
In one embodiment, if the natural language understanding system 100 considers the user's intention to be quite definite, the user's intention (that is, asking about tomorrow's weather in Shanghai) can be output directly as the analysis result 104 to the server through the analysis result output module 116, and the server can send the weather of the day specified by the user back to the user after looking it up. As another example, when the user's request message 102 is "I want to watch The Romance of the Three Kingdoms", the natural language processing device 300 can, after analysis, produce three possible-intent grammar data 106:
"<readbook>, <bookname>=The Romance of the Three Kingdoms";
"<watchTV>, <TVname>=The Romance of the Three Kingdoms"; and
"<watchfilm>, <filmname>=The Romance of the Three Kingdoms".
This is because the keyword 108 in the possible-intent grammar data 106 (that is, "The Romance of the Three Kingdoms") may belong to different domains, namely the three domains of books (<readbook>), TV series (<watchTV>) and films (<watchfilm>), so the request message 102 can be parsed into multiple possible-intent grammar data 106 and must be further analyzed by the knowledge-assisted understanding module 400 to confirm the user's intention. For yet another example, if the user inputs "I want to watch Let the Bullets Fly", then because "Let the Bullets Fly" may be either a film title or a book title, at least the following two possible-intent grammar data 106 may occur:
"<readbook>, <bookname>=Let the Bullets Fly"; and
"<watchfilm>, <filmname>=Let the Bullets Fly";
which belong to the two domains of books and films respectively. The above possible-intent grammar data 106 must subsequently be analyzed further by the knowledge-assisted understanding module 400, which derives from them the determined intent grammar data 114 expressing the clear intention of the user's request message. When the knowledge-assisted understanding module 400 analyzes the possible-intent grammar data 106, it can send the keyword 108 (such as "The Romance of the Three Kingdoms" or "Let the Bullets Fly" in the examples above) to the search system 200 through the search interface unit 260. The structured database 220 in the search system 200 stores a plurality of records with a specific data structure, and the search engine 240 can perform a full-text search on the structured database 220 using the keyword 108 received through the search interface unit 260 and return the response result obtained to the knowledge-assisted understanding module 400, from which the knowledge-assisted understanding module 400 can then derive the determined intent grammar data 114. The details of performing a full-text search on the structured database 220 to determine the intent grammar data 114 will be described more fully later with reference to Fig. 3A, Fig. 3B and the related paragraphs.
In the concept of the invention, the natural language understanding system 100 can first capture the keyword 108 in the user's request message 102 and distinguish the domain attribute of the keyword 108 by the full-text search result from the structured database 220. For example, with the input "I want to watch The Romance of the Three Kingdoms" above, possible-intent grammar data 106 belonging respectively to the three domains of books, TV series and films are produced, and the user's definite intention is then confirmed by further analysis. The user can therefore express an intention or information quite easily in a colloquial way, without having to memorize particular terms, such as those of the fixed word list in the known practice.
Fig. 2 is a schematic diagram of the analysis results produced by the natural language processing device 300 for various user request messages according to an embodiment of the invention.
As shown in Fig. 2, when the user's request message 102 is "How is the weather in Shanghai tomorrow?", the natural language processing device 300 can, after analysis, produce the possible-intent grammar datum 106:
"<queryweather>, <city>=Shanghai, <time>=tomorrow"
where the intent data 112 is "<queryweather>" and the keywords 108 are "Shanghai" and "tomorrow". Since only one intent grammar datum 106 (the weather query <queryweather>) is obtained after the analysis by the natural language processing device 300, in one embodiment the knowledge-assisted understanding module 400 can directly take out the keywords 108 "Shanghai" and "tomorrow" and send them to the server as the analysis result 104 to query the weather information (for example an overview of tomorrow's weather in Shanghai, including meteorological conditions, temperature, and so on), without performing a full-text search on the structured database 220 to judge the user's intention. Of course, in one embodiment a full-text search on the structured database 220 may still be performed for a more precise judgment of the user's intention; those skilled in the art can vary this according to actual demand.
In addition, when the user's request message 102 is "I want to watch Let the Bullets Fly", two possible intent grammar data 106 can be produced:
"<readbook>, <bookname>=Let the Bullets Fly"; and
"<watchfilm>, <filmname>=Let the Bullets Fly";
The two corresponding intent data 112 are "<readbook>" and "<watchfilm>", and the two keywords 108 are the identical "Let the Bullets Fly", indicating that the user's intention may be either to read the book "Let the Bullets Fly" or to watch the film "Let the Bullets Fly". To further confirm the user's intention, the knowledge-assisted understanding module 400 transmits the keyword 108 "Let the Bullets Fly" to the retrieval interface unit 260, and the search engine 240 then uses this keyword 108 "Let the Bullets Fly" to perform a full-text search on the structured database 220, so as to confirm whether "Let the Bullets Fly" is a book title or a film title, thereby confirming the user's intention.
Moreover, when the user's request message 102 is "I want to listen to The Days of Passing By Together", two possible intent grammar data 106 can be produced:
"<playmusic>, <singer>=Passing By Together, <songname>=The Days"; and "<playmusic>, <songname>=The Days of Passing By Together"
The two corresponding intent data 112 are the identical "<playmusic>", while the two corresponding keyword groups 108 are "Passing By Together" with "The Days", and "The Days of Passing By Together", indicating respectively that the user's intention may be to listen to the song "The Days" sung by the singer "Passing By Together", or to listen to the song "The Days of Passing By Together". At this point the knowledge-assisted understanding module 400 can transmit the first keyword group 108, "Passing By Together" and "The Days", together with the second keyword group, "The Days of Passing By Together", to the retrieval interface unit 260, so as to confirm whether there is a song "The Days" sung by a singer "Passing By Together" (the user intention implied by the first keyword group), or whether there is a song "The Days of Passing By Together" (the user intention implied by the second keyword group), thereby confirming the user's intention. The present invention is, however, not limited to the forms and names of the possible intent grammar data and intent data represented here.
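A minimal sketch of how the two ambiguous parses above might be represented and turned into keyword groups 108 for the retrieval interface unit 260 (the data structures and function names are illustrative assumptions, not the patent's code):

```python
# Hypothetical representation of the two possible intent grammar data 106
# for "I want to listen to The Days of Passing By Together".
candidates = [
    {"intent": "playmusic",
     "slots": {"singer": "Passing By Together", "songname": "The Days"}},
    {"intent": "playmusic",
     "slots": {"songname": "The Days of Passing By Together"}},
]

def keyword_groups(candidates):
    """One keyword group 108 per candidate parse, in slot order."""
    return [list(c["slots"].values()) for c in candidates]
```

Each group would then be submitted to the full-text search; whichever group produces a matching result identifies the candidate parse that reflects the user's intention.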
Fig. 3A is a schematic diagram of a plurality of records with a specific data structure stored in the structured database 220 according to an embodiment of the invention.
Generally speaking, in some known full-text search practices the search results obtained are unstructured data (for example, the results of a search through Google or Baidu). Because the items of information in such results are scattered and unrelated, the user must inspect them one by one, which limits their practicality. In the concept of the present invention, by contrast, retrieval efficiency and correctness can be effectively improved through the structured database. The value data contained in each record of the structured database disclosed by the present invention are associated with one another, and these value data jointly express the intention directed from the user's request message toward that record. Therefore, when the search engine performs a full-text search on the structured database and a value datum of a record is matched, the guide datum corresponding to that value datum can be output to confirm the intention of the request message. The implementation details of this part will be further described through the following examples.
In an embodiment of the present invention, each record 302 stored in the structured database 220 comprises a title field 304 and a content field 306. The title field 304 contains a plurality of subfields 308, and each subfield comprises a guide field 310 and a value field 312. The guide fields 310 of the records 302 store guide data, and the value fields 312 of the records 302 store value data. Taking record 1 shown in Fig. 3A as an illustration, the subfields 308 in the title field 304 of record 1 respectively store:
"singerguid: Liu Dehua",
"songnameguid: The Days of Passing By Together"; and "songtypeguid: Hong Kong/Taiwan, Cantonese, pop";
That is, the guide fields 310 of the subfields 308 respectively store the guide data "singerguid", "songnameguid" and "songtypeguid", while the value fields 312 of the corresponding subfields 308 respectively store the value data "Liu Dehua", "The Days of Passing By Together" and "Hong Kong/Taiwan, Cantonese, pop". The guide datum "singerguid" indicates that the field kind of the value datum "Liu Dehua" is a singer name; the guide datum "songnameguid" indicates that the field kind of the value datum "The Days of Passing By Together" is a song name; and the guide datum "songtypeguid" indicates that the field kind of the value datum "Hong Kong/Taiwan, Cantonese, pop" is a song type. In practice, each guide datum here can be represented by a different specific string of digits or characters, and the present invention is not limited in this respect. The content field 306 of record 1 stores the lyrics of the song "The Days of Passing By Together", or stores other data (for example the composer/lyricist, and so on); however, the actual data in the content field 306 of each record is not the emphasis of the present invention, and it is therefore only schematically depicted in Fig. 3A.
In the aforesaid embodiment each record comprises the title field 304 and the content field 306, and each subfield 308 in the title field 304 comprises the guide field 310 and the value field 312, but this is not intended to limit the present invention: in some embodiments the content field 306 may be absent, and in some embodiments the guide field 310 may be absent as well.
In addition, in an embodiment of the present invention, a first special character is stored between the data of adjacent subfields 308 to separate the data of the subfields 308, and a second special character is stored between the data of the guide field 310 and the value field 312 to separate the guide field from the value field. For instance, as shown in Fig. 3A, the second special character ":" is used as the separator between "singerguid" and "Liu Dehua", between "songnameguid" and "The Days of Passing By Together", and between "songtypeguid" and "Hong Kong/Taiwan, Cantonese, pop", while the first special character "|" is used as the separator between the subfields 308 of record 1. The present invention is, however, not limited to using ":" or "|" as the separating special characters.
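Under the separator scheme just described, a title field 304 can be serialized as one string and split back into (guide datum, value datum) pairs. The sketch below assumes the ":" and "|" characters of the example; the helper name is an illustrative assumption:

```python
FIRST_SEP = "|"   # first special character: separates subfields 308
SECOND_SEP = ":"  # second special character: separates guide field 310
                  # from value field 312

def parse_title_field(title):
    """Split a serialized title field 304 into (guide, value) pairs."""
    pairs = []
    for sub in title.split(FIRST_SEP):
        guide, value = sub.split(SECOND_SEP, 1)
        pairs.append((guide, value))
    return pairs

# Record 1 of Fig. 3A, serialized with the two special characters.
record1_title = ("singerguid:Liu Dehua"
                 "|songnameguid:The Days of Passing By Together"
                 "|songtypeguid:Hong Kong/Taiwan, Cantonese, pop")
```

Splitting on the second special character only once per subfield leaves any further ":" characters inside the value data untouched, which is one reason a reserved separator character must not occur in the guide data.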
On the other hand, in an embodiment of the present invention, each subfield 308 in the title field 304 can have a fixed bit length; for example, the fixed length of each subfield 308 can be 32 characters, and the fixed length of the guide field 310 therein can be 7 or 8 bits (sufficient to index at most 128 or 256 different kinds of guide data). In addition, the bit lengths required by the first special character and the second special character can also be fixed, so after the bits occupied by the guide field 310, the first special character and the second special character are deducted from the fixed length of the subfield 308, all of the remaining bits can be used by the value field 312 to store value data. Moreover, because the length of the subfield 308 is fixed, and the content stored in the subfield 308 is ordered, as shown in Fig. 3A, as the guide field 310 (a pointer to the guide data), the second special character, the value data of the value field 312, and the first special character, and because, as mentioned above, the bit quantities of these four items are also fixed, the implementation can skip the bits of the guide field 310 (for example skip the first 7 or 8 bits), then skip the bits of the second special character (for example 1 character, that is, 8 bits), and then deduct the bits occupied by the first special character (for example the last character, 8 bits); in this way the value data of the value field 312 can be obtained directly (for example, the value datum "Liu Dehua" can be taken directly out of the first subfield 308 of record 1), after which the required field-kind judgment can be carried out. Thus, after the comparison of the currently extracted value datum is complete (whether or not the match succeeds), the value datum of the next subfield 308 can be taken out in the above-described manner (for example, the value datum "The Days of Passing By Together" can be taken directly out of the second subfield 308 of record 1) and compared in the same way. The comparison can start from record 1, and after all the value data of record 1 have been compared, the value datum of the first subfield 308 in the title field 304 of record 2 (for example "Feng Xiaogang") is taken out and compared. The comparison procedure continues in this way until the value data of all records have been compared.
It should be noted that the bit lengths of the subfield 308 and the guide field 310 described above, as well as the bit lengths used by the first and second special characters, can be changed according to the practical application; the present invention is not limited in this respect. The aforementioned manner of extracting value data for comparison is one kind of embodiment, but is not intended to limit the present invention; another embodiment can proceed by way of full-text search. In addition, the above-described skipping of the guide field 310 and of the second and first special characters can be achieved using a bit shift (for example a division); this part can be implemented in hardware, in software, or in a combination of both, and those skilled in the art can make changes according to actual demand. In another embodiment of the present invention, each subfield 308 in the title field 304 can have a fixed bit length and the guide field 310 in the subfield 308 can have another fixed bit length, while the title field 304 contains neither the first special character nor the second special character; because the lengths of each subfield 308 and each guide field 310 are fixed, the guide data or value data in each subfield 308 can be taken out directly by skipping a particular number of bits or by using a bit shift (for example a division).
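The fixed-width layout means the value field 312 can be reached by arithmetic alone, without scanning for separators. The sketch below assumes byte-aligned widths for readability (a 1-byte guide pointer, 1-byte separators, 32-byte subfields); as the text notes, the real bit counts may differ:

```python
SUBFIELD_BYTES = 32  # assumed fixed width of one subfield 308
GUIDE_BYTES = 1      # assumed 8-bit guide pointer (up to 256 guide kinds)
SEP_BYTES = 1        # one byte each for the second (":") and first ("|")
                     # special characters

def make_subfield(guide_ptr, value):
    """Pack one subfield: guide pointer, ':', padded value, '|'."""
    v = value.encode()
    pad = b"\x00" * (SUBFIELD_BYTES - GUIDE_BYTES - 2 * SEP_BYTES - len(v))
    return bytes([guide_ptr]) + b":" + v + pad + b"|"

def value_data(title_bytes, n):
    """Jump straight to the value field 312 of the n-th subfield
    by offset arithmetic, skipping guide pointer and separators."""
    start = n * SUBFIELD_BYTES + GUIDE_BYTES + SEP_BYTES
    end = (n + 1) * SUBFIELD_BYTES - SEP_BYTES
    return title_bytes[start:end].rstrip(b"\x00").decode()

# Two subfields of a record resembling record 7 of Fig. 3A.
title = make_subfield(1, "Xiao Jingteng") + make_subfield(2, "Betrayal")
```

Because every offset is a constant expression in `n`, the same extraction could be done in hardware with shifts rather than multiplications, as the embodiment suggests.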
It should be noted that, because the subfields 308 have a fixed bit length as mentioned above, a counter can be used in the natural language understanding system 100 (or in a server comprising the natural language understanding system 100) to record which subfield 308 of which record is currently being compared. In addition, the order of the record being compared can be stored with another counter. For instance, suppose a first counter records the order of the record currently being compared and a second counter records the order of the subfield currently being compared; if the comparison currently in progress is at the third subfield 308 of record 2 of Fig. 3A (that is, comparing "filenameguid: Huayi Brothers"), the value stored in the first counter will be 2 (indicating that record 2 is currently being compared) and the value stored in the second counter will be 3 (indicating that the third subfield 308 is currently being compared). Moreover, in the above-mentioned manner in which the guide field 310 stores its guide datum with only 7 or 8 bits, the aim is to devote most of the bits of the subfield 308 to storing value data; those 7 or 8 bits then serve as a pointer to the actual guide datum, which is read out accordingly from the guide data storage form 280 of the search system 200. Thus, in practical operation, besides directly taking out value data for comparison, when a matching result is produced the guide datum can be taken out directly according to the values of the above two counters and given to the knowledge-assisted understanding module 400 as the response result 110. For instance, when the second subfield 308 of record 6 (that is, "songnameguid: Betrayal") is successfully matched, the values of the present first counter and second counter are known to be 6 and 2 respectively, so these two values can be used to consult the guide data storage form 280 shown in Fig. 3C, from which subfield 2 of record 6 yields the guide datum "songnameguid". In one embodiment, after the bit length of the subfield 308 has been fixed, all of the bits of the subfield 308 can be used to store value data, so the guide field, the first special character and the second special character can be removed entirely; as long as the search engine 240 knows that crossing every fixed number of bits marks another subfield 308 and increments the second counter by one (and, of course, increments the value of the first counter by one whenever retrieval moves on to the next record), still more bits can be provided for storing value data.
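The two counters and the guide data storage form 280 can be sketched as a toy model (the table contents mirror the figure's examples loosely; the names and pointer values are assumptions):

```python
# Hypothetical guide data storage form 280: a small pointer stored in the
# guide field 310 maps to the human-readable guide datum.
GUIDE_TABLE_280 = {1: "singerguid", 2: "songnameguid", 3: "songtypeguid"}

def match_keyword(records, keyword):
    """Scan value data record by record. The loop indices play the roles
    of the first counter (record order) and the second counter (subfield
    order); on a match, the guide pointer is translated through form 280."""
    for rec_no, record in enumerate(records, start=1):       # first counter
        for sub_no, (ptr, value) in enumerate(record, start=1):  # second counter
            if value == keyword:
                return rec_no, sub_no, GUIDE_TABLE_280[ptr]
    return None

records = [
    [(1, "Liu Dehua"), (2, "The Days of Passing By Together")],  # record 1
    [(1, "Feng Xiaogang")],                                      # record 2 (abridged)
    [], [], [],                                                  # records 3-5 elided
    [(1, "Xiao Jingteng"), (2, "Betrayal")],                     # record 6 (abridged)
]
```

When "songnameguid: Betrayal" of record 6 matches, the counter pair (6, 2) is exactly what is needed to look up the guide datum without having stored it inline.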
A further concrete example illustrates the process by which, when the comparison produces a matching result, the matched record 110 is passed back to the knowledge-assisted understanding module 400 for further processing. Corresponding to the data structure of the record 302 described above, in an embodiment of the present invention, when the user's request message 102 is "I want to watch Let the Bullets Fly", two possible intent grammar data 106 can be produced:
"<readbook>, <bookname>=Let the Bullets Fly"; and
"<watchfilm>, <filmname>=Let the Bullets Fly";
The search engine 240 then uses the keyword 108 "Let the Bullets Fly", received through the retrieval interface unit 260, to perform a full-text search on the title fields 304 of the records stored in the structured database 220 of Fig. 3A. In the full-text search, record 5 is found to store the value datum "Let the Bullets Fly" in its title field 304, so a matching result is produced. Next, the search system 200 returns the guide datum "filmnameguid" in the title field 304 of record 5, corresponding to the keyword 108 "Let the Bullets Fly", as the matched record 110 to the knowledge-assisted understanding module 400. Because the title field of record 5 contains the guide datum "filmnameguid" corresponding to the value datum "Let the Bullets Fly", the knowledge-assisted understanding module 400, by comparing the guide datum "filmnameguid" of record 5 with the previously stored intent data 112 "<watchfilm>" and "<readbook>" of the above-mentioned possible intent grammar data 106, can determine that the definite intent grammar data 114 of this request message is "<watchfilm>, <filmname>=Let the Bullets Fly" (because both contain "film"). In other words, the datum "Let the Bullets Fly" described in the user's request message 102 this time is a film name, and the intention of the user's request message 102 is to watch the film "Let the Bullets Fly", not to read a book.
Another concrete example is given for further description. When the user's request message 102 is "I want to listen to The Days of Passing By Together", two possible intent grammar data 106 can be produced:
"<playmusic>, <singer>=Passing By Together, <songname>=The Days"; and
"<playmusic>, <songname>=The Days of Passing By Together";
The search engine 240 then uses the two keyword groups 108 received through the retrieval interface unit 260:
"Passing By Together" and "The Days"; and
"The Days of Passing By Together"
to perform a full-text search on the title fields 304 of the records stored in the structured database 220 of Fig. 3A. In the full-text search, no matching result corresponding to the first keyword group 108, "Passing By Together" and "The Days", is found in the title field 304 of any record, but record 1, corresponding to the second keyword group 108 "The Days of Passing By Together", is found; the search system 200 therefore returns the guide datum "songnameguid" in the title field 304 of record 1, corresponding to the second keyword group 108, as the matched record 110 to the knowledge-assisted understanding module 400. Next, after receiving the guide datum "songnameguid" corresponding to the value datum "The Days of Passing By Together", the knowledge-assisted understanding module 400 compares it with the intent data 112 of the possible intent grammar data 106 (that is, "<playmusic>, <singer>=Passing By Together, <songname>=The Days" and "<playmusic>, <songname>=The Days of Passing By Together"; the slots <singer>, <songname>, and so on), and thereby finds that the user's request message 102 contains no datum describing a singer name, but does contain a datum describing a song name, "The Days of Passing By Together" (because only <songname> compares successfully). Through the above comparison, the knowledge-assisted understanding module 400 can thus determine that the definite intent grammar data 114 of this request message 102 is "<playmusic>, <songname>=The Days of Passing By Together", and that the intention of the user's request message 102 is to listen to the song "The Days of Passing By Together".
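The final comparison step, mapping the returned guide datum back onto the candidate intent grammar data 106 to obtain the definite intent grammar data 114, might look like this (the slot and guide names are those of the examples above; the mapping table itself is an assumption):

```python
# Assumed correspondence between guide data and intent-grammar slots.
GUIDE_TO_SLOT = {"singerguid": "singer", "songnameguid": "songname",
                 "booknameguid": "bookname", "filmnameguid": "filmname"}

def definite_intent(candidates, matched_guide):
    """Pick the candidate intent grammar whose slots contain the slot
    named by the matched guide datum (the matched record 110)."""
    slot = GUIDE_TO_SLOT[matched_guide]
    for cand in candidates:
        if slot in cand["slots"]:
            return cand
    return None

candidates = [
    {"intent": "readbook", "slots": {"bookname": "Let the Bullets Fly"}},
    {"intent": "watchfilm", "slots": {"filmname": "Let the Bullets Fly"}},
]
chosen = definite_intent(candidates, "filmnameguid")
```

For the "Let the Bullets Fly" example, the guide datum "filmnameguid" selects the <watchfilm> candidate, exactly as described in the text.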
In another embodiment of the present invention, the retrieved matched record 110 can be a full matched record that matches the keyword 108 completely, or a partial matched record that matches the keyword 108 partially. For instance, if the user's request message 102 is "I want to listen to Xiao Jingteng's Betrayal", then similarly, after analysis, the natural language processing device 300 produces two possible intent grammar data 106:
"<playmusic>, <singer>=Xiao Jingteng, <songname>=Betrayal"; and "<playmusic>, <songname>=Xiao Jingteng's Betrayal";
and transmits two keyword groups 108:
"Xiao Jingteng" and "Betrayal"; and
"Xiao Jingteng's Betrayal";
to the retrieval interface unit 260. The search engine 240 then uses the keywords 108 received through the retrieval interface unit 260 to perform a full-text search on the title fields 304 of the records 302 stored in the structured database 220 of Fig. 3A. In the full-text search, the second keyword group 108 "Xiao Jingteng's Betrayal" matches no record, but the first keyword group 108, "Xiao Jingteng" and "Betrayal", finds matching results at record 6 and record 7. Since the first keyword group 108, "Xiao Jingteng" and "Betrayal", matches only some of the value data of record 6 (the other value data, "Yang Zongwei" and "Cao Ge", are not matched), record 6 is a partial matched record (note that record 5 above, corresponding to the request message 102 "I want to watch Let the Bullets Fly", and record 1, corresponding to the request message "I want to listen to The Days of Passing By Together", are likewise partial matched records); whereas the keywords "Xiao Jingteng" and "Betrayal" fully match the value data of record 7 (both keywords of the group match successfully), so record 7 is a full matched record. In an embodiment of the present invention, when the retrieval interface unit 260 outputs a plurality of matched records 110 to the knowledge-assisted understanding module 400, it can sequentially output the full matched records (those whose value data are all matched) and the partial matched records (those in which only some value data are matched), wherein the priority of a full matched record is greater than the priority of a partial matched record. Therefore, when the retrieval interface unit 260 outputs the matched records 110 of record 6 and record 7, the output priority of record 7 can be greater than that of record 6, because all the value data of record 7, "Xiao Jingteng" and "Betrayal", produce matching results, while record 6 also contains "Yang Zongwei" and "Cao Ge", which produce none. That is to say, the more closely a record stored in the structured database 220 matches the keywords 108 of the request message 102, the more readily it is output first, for the user to consult or to select the corresponding definite intent grammar data 114. In another embodiment, the matched record 110 corresponding to the record with the highest output priority is output directly for use as the definite intent grammar data 114. The foregoing is not intended to limit the present invention; for example, another embodiment may adopt an output-on-hit retrieval mode (for instance, taking "I want to listen to Xiao Jingteng's Betrayal" as the request message 102, as soon as record 6 produces a matching result during retrieval, the guide datum corresponding to record 6 is output as the matched record 110), without priority ordering, so as to accelerate retrieval. In another embodiment, the processing corresponding to the record with the highest priority can be carried out directly and provided to the user; for example, when the highest-priority option is playing the film of The Romance of the Three Kingdoms, the film can be played directly for the user, and when the highest-priority option is the song Betrayal performed by Xiao Jingteng, that song can be played directly for the user. It should be noted that these examples are for illustration only and do not limit the present invention.
In yet another embodiment of the present invention, if the user's request message 102 is "I want to listen to Liu Dehua's Betrayal", then one of its possible intent grammar data 106 is:
"<playmusic>, <singer>=Liu Dehua, <songname>=Betrayal";
If the retrieval interface unit 260 inputs the keywords 108 "Liu Dehua" and "Betrayal" together into the search engine 240, no matching result can be found in the database of Fig. 3A. In another embodiment of the present invention, the retrieval interface unit 260 can instead input the keywords 108 "Liu Dehua" and "Betrayal" into the search engine 240 separately, and thereby establish that "Liu Dehua" is a singer name (guide datum singerguid) and that "Betrayal" is a song name (guide datum songnameguid, the singer possibly being Xiao Jingteng, or the chorus of Xiao Jingteng, Yang Zongwei and Cao Ge). At this point the natural language understanding system 100 can further prompt the user: "Is the song Betrayal the one sung by Xiao Jingteng?" (according to the matching result of record 7), or "Is it the one sung in chorus by Xiao Jingteng, Yang Zongwei and Cao Ge?" (according to the matching result of record 6).
In yet another embodiment of the present invention, the records stored in the structured database 220 can further include a source field 314 and a heat field 316. The database shown in Fig. 3B comprises, in addition to the fields of Fig. 3A, the source field 314 and the heat field 316. The source field 314 of each record stores a source value indicating which structured database (only the structured database 220 appears in this figure, but in fact there can be many different structured databases), or which user or server, the record comes from, so that the natural language understanding system 100 can, according to the preferences the user has revealed in previous request messages 102, retrieve the structured database of a particular source. The heat field 316 of each record 302 stores a search heat value or popularity value of that record 302 (for example the number of times, or the probability, that the record is matched within a particular period by a single user, a particular user group, or all users; for instance, whenever a full-text search with the keywords of a request message 102 produces a match on the record, the heat value of that record is incremented), as a reference for the knowledge-assisted understanding module 400 in judging the user's intention. In detail, when the user's request message 102 is "I want to watch The Romance of the Three Kingdoms", the natural language processing device 300 can, after analysis, produce a plurality of possible intent grammar data 106:
"<readbook>, <bookname>=The Romance of the Three Kingdoms";
"<watchTV>, <TVname>=The Romance of the Three Kingdoms"; and
"<watchfilm>, <filmname>=The Romance of the Three Kingdoms".
If the natural language understanding system 100 finds, from the history of the user's request messages 102 (for example by using the heat fields 316 to store the number of times each record 302 has been clicked by this user), that most of the user's requests are for watching films, then the natural language understanding system 100 can direct its retrieval at the structured database that stores film records (the source value in the source field 314 then recording the code of the structured database that stores film records), and can thereby preferentially judge "<watchfilm>, <filmname>=The Romance of the Three Kingdoms" to be the definite intent grammar data 114. For instance, in one embodiment, each time a record 302 is matched, one can be added to its heat field 316 as the user's history; then, when a full-text search is performed according to the keyword 108 "The Romance of the Three Kingdoms", the record 302 with the highest value in its heat field 316 can be selected from all the matching results for use in judging the user's intention. In one embodiment, if the natural language understanding system 100 judges from the retrieval results for the keyword 108 "The Romance of the Three Kingdoms" that the search heat value stored in the heat field 316 of the corresponding television-program record of "The Romance of the Three Kingdoms" is the highest, it can preferentially judge "<watchTV>, <TVname>=The Romance of the Three Kingdoms" to be the definite intent grammar data 114. In addition, the above-mentioned manner of altering the values stored in the heat fields 316 can be changed by the computer system where the natural language understanding system 100 resides, and the present invention is not limited in this respect. Furthermore, the values of the heat fields 316 can also decay over time, to express that the user's interest in a certain record 302 has gradually declined; the present invention is not limited in this part either.
To give another example, in another embodiment, because the user may be following the television series of The Romance of the Three Kingdoms over a certain period, and because a television series may be so long that it cannot be finished in a short time, the user may click it repeatedly within a short time (supposing the value in the heat field 316 is incremented by one on every match), causing a certain record 302 to be matched repeatedly; all of this can be learned by analyzing the data of the heat field 316. Moreover, in another embodiment, a telecommunications operator can also use the heat field 316 to express how frequently the data provided by a certain source are taken up, with the coding of the data supplier stored in the source field 314. For instance, if the probability of being clicked is highest for a certain supplier of "The Romance of the Three Kingdoms television series", then when a certain user inputs the request message 102 "I want to watch The Romance of the Three Kingdoms", a full-text search of the database of Fig. 3B will find three matching results: reading the book of The Romance of the Three Kingdoms (record 8), watching the television series of The Romance of the Three Kingdoms (record 9), and watching the film of The Romance of the Three Kingdoms (record 10). However, because the data in the heat fields 316 show which of these is currently the most popular option (the heat field values of records 8, 9 and 10 being 2, 5 and 8 respectively), the guide datum provided by record 10, the record with the highest heat value, is output first to the knowledge-assisted understanding module 400 as the matched record 110, as the override option for judging the user's intention. In one embodiment, the data of the source field 314 can at the same time be shown to the user, so that the user can judge whether the program he wants to watch is provided by a certain supplier. It should be noted that the data stored in the source field 314 described above, and the manner of altering them, can also be changed by the computer system where the natural language understanding system 100 resides, and the present invention is not limited in this respect.
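A toy sketch of heat-based selection over the three matching results (the record ids and heat values come from the example above; the field names and structure are assumptions):

```python
def hottest(matches):
    """Select the matching record whose heat field 316 value is highest."""
    return max(matches, key=lambda m: m["heat"])

# The three matches of the example, carrying the heat field 316 and an
# assumed source field 314 value.
matches = [
    {"record": 8,  "intent": "readbook",  "heat": 2, "source": "book_db"},
    {"record": 9,  "intent": "watchTV",   "heat": 5, "source": "tv_db"},
    {"record": 10, "intent": "watchfilm", "heat": 8, "source": "film_db"},
]
best = hottest(matches)
```

A time decay, as mentioned in the text, could be layered on by multiplying each stored heat value by a factor below one at regular intervals before this selection runs.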
Clearly, the value data contained in each record of the structured database disclosed by the present invention are associated with one another (for example, the value data "Liu Dehua", "The Days of Passing By Together" and "Hong Kong/Taiwan, Cantonese, pop" in record 1 all describe features of record 1), and these value data jointly express the intention directed from the user's request message toward that record (for example, when "The Days of Passing By Together" produces a matching result, the user's intention may be data access to record 1). Therefore, when the search engine performs a full-text search on the structured database and a value datum of a record is matched, the guide datum corresponding to that value datum can be output (for example "songnameguid" is output as the response result 110), whereupon the intention of the request message is confirmed (for example by comparison in the knowledge-assisted understanding module).
Based on the content disclosed or taught by the above example embodiments, Fig. 4A is a flowchart of the search method according to an embodiment of the invention. Referring to Fig. 4A, the search method of this embodiment of the invention comprises the following steps:
Provide a structured database, the structured database storing a plurality of records (step S410);
Receive at least one keyword (step S420);
Perform, by means of the keyword, a full-text search on the title fields of the plurality of records (step S430). For instance, the keyword 108 is input into the retrieval interface unit 260 so that the search engine 240 performs a full-text search on the title fields 304 of the plurality of records 302 stored in the structured database 220; the retrieval can be carried out in the manner performed on Fig. 3A or Fig. 3B, or in any manner that does not depart from its spirit;
Judge whether the full-text search has a matching result (step S440). For instance, the search engine 240 judges whether the full-text search corresponding to the keyword 108 has a matching result; and
If there is a matching result, sequentially output the full matched records and the partial matched records (step S450). For instance, if a record in the structured database 220 matches the keyword 108, the retrieval interface unit 260 sequentially outputs the guide data of the full matched records and of the partial matched records that match the keyword 108 (obtainable through the guide data storage form 280 of Fig. 3C), which are sent as the matched records 110 to the knowledge-assisted understanding module 400,
wherein the priority of a full matched record is greater than the priority of a partial matched record.
The aforesaid process steps are not intended to limit the present invention, and some steps can be omitted or performed elsewhere. For example, in another embodiment of the present invention, step S440 can be executed by a matching judgment module (not illustrated) located outside the search system 200; or, in another embodiment of the present invention, step S450 above can be omitted, and its action of sequentially outputting the full matched records and the partial matched records can be executed by a matching-result output module (not illustrated) located outside the search system 200.
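Steps S410 through S450 can be strung together as one function (a sketch under assumed names; the module boundaries of the real search system 200 and retrieval interface unit 260 are not modeled):

```python
def search_method(structured_db, keywords):
    """Sketch of Fig. 4A: S410 database provided, S420 keywords received,
    S430 full-text search over title fields, S440 match judgment,
    S450 full matched records output before partial matched records."""
    full, partial = [], []
    for rec_id, values in structured_db.items():       # S430: full-text search
        matched = [v for v in values if v in keywords]
        if not matched:
            continue                                   # this record: no match
        (full if len(matched) == len(values) else partial).append(rec_id)
    if not full and not partial:
        return None                                    # S440: no matching result
    return full + partial                              # S450: priority order

# S410/S420: a two-record database and one keyword, from the earlier examples.
db = {
    5: ["Let the Bullets Fly"],
    1: ["Liu Dehua", "The Days of Passing By Together"],
}
```

The `None` return models the no-match branch of step S440; a caller corresponding to the knowledge-assisted understanding module 400 would fall back to other disambiguation strategies in that case.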
Based on the content disclosed or taught by the above example embodiments, Fig. 4B is a flowchart of the operation of the natural language understanding system 100 according to another embodiment of the present invention. Referring to Fig. 4B, the operation of the natural language understanding system 100 of this embodiment comprises the following steps:
Receive request information (step S510). For instance, the user's request information 102, which carries voice content or text content, is sent to the natural language understanding system 100;
Provide a structured database that stores a plurality of records (step S520);
Parse the request information into syntax (step S530). For instance, the natural language processor 300 analyzes the user's request information 102 and converts it into corresponding possible-intention syntax data 106;
Identify the possible attributes of the keyword (step S540). For instance, the knowledge-assisted understanding module 400 identifies the possible attributes of at least one keyword 108 in the possible-intention syntax data 106; for example, the keyword 108 "The Romance of the Three Kingdoms" may refer to a book, a film, or a TV program;
Perform a full-text search on the title fields 304 of the records using the keyword 108 (step S550). For instance, the keyword 108 is input to the retrieval interface unit 260 so that the search engine 240 performs a full-text search on the title fields 304 of the records stored in the structured database 220;
Determine whether the full-text search yields a matching result (step S560). For instance, the search engine 240 determines whether the full-text search corresponding to the keyword 108 yields a matching result;
If there is a matching result, sequentially output the guide data corresponding to the fully matched records and the partially matched records as matched records 110 (step S570). For instance, if a record in the structured database 220 matches the keyword 108, the retrieval interface unit 260 sequentially outputs the guide data corresponding to the fully matched records and the partially matched records that match the keyword 108 as matched records 110,
wherein the fully matched records have higher priority than the partially matched records; and
Sequentially output the corresponding definite-intention syntax data (step S580). For instance, the knowledge-assisted understanding module 400 uses the sequentially output fully matched records and partially matched records to output the corresponding definite-intention syntax data 114.
The above process steps are not intended to limit the present invention; some steps may be omitted or rearranged.
In summary, the present invention extracts the keywords contained in the user's request information and performs a full-text search on the title fields of the structured records in the structured database. If a matching result is produced, the domain to which the keyword belongs can be determined, thereby determining the intention the user expresses in the request information.
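The idea summarized above can be sketched minimally as follows; the record layout and the "domain" field name are assumptions for illustration only, standing in for the guide data of the matched records:

```python
def infer_possible_domains(keyword, matched_records):
    # The domain fields of the records matching the keyword reveal which
    # domains the keyword may belong to, which narrows the user's intention.
    domains = []
    for record in matched_records:
        if record["domain"] not in domains:
            domains.append(record["domain"])
    return domains

# Hypothetical matched records for the keyword below.
matched = [
    {"title": "The Romance of the Three Kingdoms", "domain": "book"},
    {"title": "The Romance of the Three Kingdoms", "domain": "tv"},
]
domains = infer_possible_domains("The Romance of the Three Kingdoms", matched)
```

If only one domain survives, the intention is determined directly; otherwise the ambiguity is resolved in later steps.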
The following explains how the structured database described above can be applied to speech recognition, and in particular to an application in a natural language dialogue system in which an erroneous voice response is corrected according to the user's voice input, and other possible answers are then found and offered to the user in turn.
Current mobile communication devices can provide a natural language dialogue function, allowing the user to communicate with the device by voice. However, in present speech dialogue systems, when the user's voice input is ambiguous, the same spoken sentence may express several different intentions or purposes, so the system may easily output a voice response that does not match the voice input. In many dialogue scenarios, the user therefore has difficulty obtaining a voice response that matches his or her intention. For this reason, the present invention proposes a method of correcting a voice response and a natural language dialogue system, in which the natural language dialogue system corrects an erroneous voice response according to the user's voice input, and further finds other possible answers and offers them to the user in turn. To make the content of the present invention clearer, embodiments are given below as examples by which the present invention can indeed be implemented.
Fig. 5A is a block diagram of the natural language dialogue system according to an embodiment of the invention. Referring to Fig. 5A, the natural language dialogue system 500 comprises a voice sampling module 510, a natural language understanding module 520, and a speech synthesis database 530. In one embodiment, the voice sampling module 510 receives a voice input 501 (for example, the user's voice) and parses it to produce a parsing result 503. The natural language understanding module 520 parses the parsing result 503 to obtain the request information 505 therein and, after finding an answer that matches the request information 505, performs a corresponding speech query on the speech synthesis database 530 according to this answer and outputs the queried speech to the user as the voice response 507 corresponding to the voice input 501. If the voice response 507 produced by the natural language understanding module 520 does not match the request information 505 in the voice input 501 (for example, the user inputs another voice input indicating as much), the natural language understanding module 520 corrects the original answer and outputs another voice response 507 to the user.
The members of the natural language dialogue system 500 may be configured in the same machine. For example, the voice sampling module 510 and the natural language understanding module 520 may be disposed in the same electronic device. The electronic device may be a mobile communication device such as a cell phone or smart phone, a personal digital assistant (PDA), a palmtop computer (Pocket PC), a tablet PC, a notebook computer, a personal computer, or any other electronic device that has a communication function or in which communication software is installed; its scope is not limited here. In addition, the electronic device may run the Android operating system, a Microsoft operating system, a Linux operating system, and so on, without being limited thereto. Of course, the members of the natural language dialogue system 500 need not be arranged in the same machine; they may be dispersed across different devices or systems and linked by various communication protocols. For example, the natural language understanding module 520 may be disposed in a cloud server or in a server of a local area network. Moreover, the members of the natural language understanding module 520 may themselves be dispersed across different machines; for example, each member of the natural language understanding module 520 may be located in the same machine as the voice sampling module 510 or in a different one.
In the present embodiment, the voice sampling module 510 receives the voice input and may be a sound-receiving device such as a microphone, and the voice input 501 may be the voice from the user.
The natural language understanding module 520 receives the voice input 501 transmitted from the voice sampling module 510 and parses it to produce the parsing result 503. Moreover, the natural language understanding module 520 can produce, according to one or more feature semantics 509 in the parsing result 503 (such as the keyword 108 mentioned in Fig. 1A), at least one candidate list 511 containing at least one candidate answer, find from these candidate answers the answer that matches the feature semantics 509, and then output the voice response 507 to the user. After the voice response 507 is output to the user, he or she may find that the answer does not meet the demand, or may need to input again to make a further selection (for example, when the voice response 507 presents several options and asks the user to choose among them), so the user may input another voice. Accordingly, if the user inputs another voice, the natural language understanding module 520 judges from this second voice input 501 whether the previously output voice response 507 was correct; if not, the natural language understanding module 520 finds another candidate answer in the candidate list 511, produces a new voice response 507 accordingly, and provides it to the user. This part is described further with reference to Fig. 5B.
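The candidate-list behaviour just described can be sketched as follows; the class name, the ordering of candidates, and the fallback phrase are assumptions made for this illustration only:

```python
class DialogueState:
    # Keep an ordered candidate list, report the best candidate first, and
    # fall back to the next candidate when the user rejects a response.
    def __init__(self, candidates):
        self.candidates = list(candidates)  # already ordered by priority
        self.index = 0

    def current_answer(self):
        return self.candidates[self.index]

    def reject(self):
        # The user indicated the previous voice response was wrong.
        self.index += 1
        if self.index < len(self.candidates):
            return self.candidates[self.index]
        return "no matching data"  # candidate list exhausted

state = DialogueState(["book", "tv series", "film"])
first = state.current_answer()
second = state.reject()
```

Each rejection advances to the next candidate until the list is exhausted, at which point a "no matching data" style response is produced, matching the fallback behaviour described later in this embodiment.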
In addition, the natural language understanding module 520 of the present embodiment may be implemented by a hardware circuit composed of one or several logic gates. Alternatively, in another embodiment of the present invention, the natural language understanding module 520 may be implemented by computer program code. For instance, the natural language understanding module 520 may be implemented as an application program, an operating system, or a driver written as code segments in a programming language; these code segments are stored in a storage unit and executed by a processing unit. To help those skilled in the art further understand the natural language understanding module 520 of the present embodiment, a concrete example is described below. However, this example is illustrative only and does not limit the present invention; hardware, software, firmware, or any mix of these three may be used to implement the present invention.
Fig. 5B is a block diagram of the natural language understanding module 520 according to an embodiment of the invention. Referring to Fig. 5B, the natural language understanding module 520 of the present embodiment may comprise a speech recognition module 522, a natural language processing module 524, and a speech synthesis module 526. The speech recognition module 522 receives the parsing result 503, obtained by parsing the voice input 501, transmitted from the voice sampling module 510, and converts it into one or more feature semantics 509 (such as the keyword 108 or phrases of Fig. 1A). The natural language processing module 524 then parses these feature semantics 509 (for example, by having the searching system 200 of Fig. 1A perform a full-text search on the structured database 220) to obtain at least one candidate list 511, produces the definite-intention syntax data 114 after comparing the response 110 with the intention data 112, and finally selects from the candidate list 511 an answer that matches the voice input 501 (for example, selecting a fully matched record in the parsing result 104 sent by the parsing-result output module 116) as the reported answer. Since this reported answer is obtained by the natural language understanding module 520 through internal analysis, it must be converted into speech before it can be output to the user. The speech synthesis module 526 therefore queries the speech synthesis database 530 according to the reported answer; the speech synthesis database 530 records, for example, text and its corresponding voice information, so that the speech synthesis module 526 can find the voice corresponding to the reported answer and synthesize the first voice response 507. Afterwards, the speech synthesis module 526 outputs the synthesized voice through a voice output interface (not illustrated), which is, for example, an output device such as a speaker or an earphone, to present the voice to the user.
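The speech synthesis lookup just described can be sketched as below. Modelling the speech synthesis database 530 as a plain mapping from text fragments to pre-recorded audio clips is an assumption; the embodiment only states that it stores text and its corresponding voice information:

```python
def synthesize(answer_text, synthesis_db):
    # Look up each word of the reported answer in the synthesis database
    # and collect the corresponding audio clips; unknown words fall back
    # to a placeholder rather than failing.
    clips = []
    for word in answer_text.split():
        clips.append(synthesis_db.get(word, "<silence>"))
    return clips

# Hypothetical database entries standing in for the stored voice data.
db = {"playing": "playing.wav", "now": "now.wav"}
audio = synthesize("playing now", db)
```

A real synthesis module would concatenate or stream the audio through the voice output interface; the list of clip names here only illustrates the lookup step.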
For instance, if the user inputs the voice input 501 "I want to see the Romance of the Three Kingdoms", the speech recognition module 522 receives the parsing result 503, obtained by parsing the voice input 501, transmitted from the voice sampling module 510, and converts it into the feature semantics 509 containing, for example, the keyword 108 "The Romance of the Three Kingdoms". The natural language processing module 524 then parses this feature semantics 509 "The Romance of the Three Kingdoms", for example by having the searching system 200 of Fig. 1A perform a full-text search on the structured database 220, produces the definite-intention syntax data 114 after comparing the response 110 with the intention data 112, and finally, from the parsing result 104 sent by the parsing-result output module 116, generates candidate answers containing three intention options for "The Romance of the Three Kingdoms" and integrates them into one candidate list 511 (that is, the three options "read the book", "watch the TV series", and "watch the film"). From these three intention options in the candidate list 511, the answer with the highest value in the popularity field (for example, selecting the record 10 of Fig. 1A) is then chosen as the reported answer. In one embodiment, the action corresponding to the highest popularity may be carried out directly, for example directly playing the song "Betrayal" sung by Xiao Jingteng for the user; the present invention is not limited thereto.
In addition, the natural language processing module 524 can also parse a subsequently received second voice input 501 (fed into the voice sampling module 510 along the same path as the previous voice input 501) to judge whether the previous reported answer was correct. This voice input is the user's response to the voice response 507 previously provided to the user, and it carries information about whether the user considers the previous voice response 507 correct. If it indicates that the user considers the reported answer (that is, what was previously conveyed to the user through the voice response 507) incorrect, the natural language processing module 524 selects another answer from the candidate list 511 and, according to the selected result, produces a second voice response 507 through the speech synthesis module 526 (likewise played to the user in the same manner as the previous voice response 507). Then, the speech synthesis module 526 outputs the synthesized second voice response 507 to the user through the voice output interface.
Continuing the previous example in which the user inputs "I want to see the Romance of the Three Kingdoms": if what the user wants to see is the TV series of the Romance of the Three Kingdoms, then the previously output option of record 10 of Fig. 1A (watching the film of "The Romance of the Three Kingdoms") is not what the user wants, so the user may input "I want to watch the Romance of the Three Kingdoms TV series" or "I don't want to watch the Romance of the Three Kingdoms film", etc., as the second voice input 501. After the second voice input 501 is parsed and its request information 505 (or feature semantics 509) is obtained, the system outputs the second voice response 507 "I will now play the Romance of the Three Kingdoms TV series for you" (if the user wants to watch the TV series), or outputs the second voice response 507 "Which option do you want?" (if the user only negated the present option) together with the other options of the candidate list 511 for the user to choose from. In addition, in one embodiment, the previously output voice response 507 may be a certain option of the candidate list 511, output to ask the user whether it is the information he or she wants; at this point the user inputs request information 505 that "affirms" or "negates" this option, for example messages such as "I asked for the movie" or "this is not what I want". After the second voice input 501 is parsed and its request information 505 is obtained, the system outputs the second voice response 507 "I will now play the Romance of the Three Kingdoms film for you" (if the user wants to watch the film), or outputs the second voice response 507 "Which option do you want?" (if the user only negated the present option) together with the other options of the candidate list 511 for the user to choose from. Moreover, in another embodiment, when the candidate list 511 is displayed to the user (for example, ordered by priority: full matches, then partial matches), the second voice input 501 that the user inputs may contain a "selection" message. For example, when the three options "read the Romance of the Three Kingdoms book", "watch the Romance of the Three Kingdoms TV series", and "watch the Romance of the Three Kingdoms film" are shown for the user to choose from, the user may input a second voice input such as "I want to see the film" or "I want the third option". After the request information 505 of the second voice input 501 is analyzed and the user's intention is found (for example, choosing to watch the film), the system outputs the second voice response 507 "I will now play the Romance of the Three Kingdoms film for you" (if the user wants to watch the film) and then plays the film for the user, or outputs the second voice response 507 "What you want is to read the Romance of the Three Kingdoms book" (if reading was selected) together with the action of displaying the e-book of the Romance of the Three Kingdoms to the user.
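The three kinds of second voice input just discussed (an explicit selection, a plain negation, and an affirmation) can be sketched as below. The naive phrase matching, the ordinal vocabulary, and the option names are all assumptions made for this illustration only:

```python
def interpret_second_input(text, options):
    # Classify a second voice input as a selection ("the third option" or
    # naming an option), a negation, or an affirmation.
    text = text.lower()
    ordinals = {"first": 0, "second": 1, "third": 2}
    for word, idx in ordinals.items():
        if word in text and idx < len(options):
            return ("select", options[idx])
    for option in options:
        if option in text:
            if "not" in text or "don't" in text:
                return ("negate", option)
            return ("select", option)
    if "not" in text or "no" in text:
        return ("negate", None)
    return ("affirm", None)

options = ["book", "tv series", "film"]
```

A real system would of course resolve these cases from the parsed feature semantics 509 rather than from raw strings; the sketch only shows the three-way branching.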
In the present embodiment, the speech recognition module 522, the natural language processing module 524, and the speech synthesis module 526 in the natural language understanding module 520 may be configured in the same machine as the voice sampling module 510. In other embodiments, the speech recognition module 522, the natural language processing module 524, and the speech synthesis module 526 may also be dispersed in different machines (for example, computer systems, servers, or similar devices/systems). For example, in the natural language understanding module 520' shown in Fig. 5C, the speech synthesis module 526 may be configured in the same machine 502 as the voice sampling module 510, while the speech recognition module 522 and the natural language processing module 524 are configured in another machine.
The method of correcting the voice response 507 is described below in conjunction with the natural language dialogue system 500 of Fig. 5A. Fig. 6 is a flowchart of the method of correcting the voice response 507 according to an embodiment of the invention. In the method of correcting the voice response 507 of the present embodiment, when the user considers that the currently played voice response 507 does not match the request information 505 he or she previously input, the user can input another voice input 501 into the voice sampling module 510; when the natural language understanding module 520 subsequently learns through analysis that the voice response 507 previously played to the user does not match the user's intention, the natural language understanding module 520 outputs another voice response 507, thereby correcting the original voice response 507. For convenience of description, only the natural language dialogue system 500 of Fig. 5A is taken as an example here, but the method of correcting the voice response 507 of the present embodiment is also applicable to the natural language dialogue system 500' of Fig. 5C.
Referring to Fig. 5A and Fig. 6 together, in step S602 the voice sampling module 510 receives the first voice input (likewise fed into the voice sampling module 510 along the path of the voice input 501). The first voice input 501 is, for example, the voice from the user, and may carry the user's request information 505. Specifically, the first voice input 501 from the user may be an interrogative sentence, an imperative sentence, or other request information 505, for example "I want to see the Romance of the Three Kingdoms", "I want to listen to the song Lustily Water", or "What is the temperature today?".
In step S604, the natural language understanding module 520 parses at least one feature semantics 509 included in the first voice input 501 and obtains the candidate list 511, where the candidate list 511 has one or more candidate answers. Specifically, the natural language understanding module 520 parses the first voice input 501 and obtains one or more feature semantics 509 of the first voice input 501. Here, a feature semantics 509 is, for example, a keyword or request information obtained after the natural language understanding module 520 parses the first voice input 501. For instance, when the user's first voice input 501 is "I want to see the Romance of the Three Kingdoms", the feature semantics 509 obtained after analysis by the natural language understanding module 520 are, for example, "The Romance of the Three Kingdoms" and "see". As another example, when the user's first voice input 501 is "I want to listen to the song Lustily Water", the feature semantics 509 obtained after analysis are, for example, "Lustily Water", "listen", and "song".
Next, the natural language understanding module 520 queries a search database (such as a search engine) according to the above feature semantics 509 and obtains at least one search result, each of which serves as a candidate answer in the candidate list 511. The manner of selecting candidate answers may be as described for Fig. 1A and is not repeated here. A feature semantics 509 (for example, the keyword 108 of Fig. 1A) may belong to different knowledge domains or attributes (for example, the film class, book class, music class, or game class), and within the same knowledge domain or attribute it may be further divided into several categories (for example, different authors of the same film or book title, different singers of the same song title, or different editions of the same game title). Therefore, for one feature semantics 509, the natural language understanding module 520 may find one or more search results related to that feature semantics 509 in the search database, where each search result may contain various kinds of guide information related to that feature semantics 509 (for example, a full-text search of the structured database 220 of Figs. 3A and 3B with "Xiao Jingteng" and "Betrayal" as keywords 108 would yield two groups of matching results). The guide information is, for example, the other keywords in a search result besides the feature semantics 509. From another viewpoint, when the first voice input 501 that the user inputs has several feature semantics 509 (for example, more than one keyword 108 can be parsed out), the user's request information 505 is clearer, so that the natural language understanding module 520 can find search results closer to the request information 505.
For instance, when the feature semantics 509 is "The Romance of the Three Kingdoms", the search results found by the natural language understanding module 520 are, for example, data of "... The Romance of the Three Kingdoms ... novel", data of "... The Romance of the Three Kingdoms ... TV series", and data of "... The Romance of the Three Kingdoms ... Luo Guanzhong ... novel", where "TV series", "Luo Guanzhong", and "novel" are the cited guide information. As another example, when the feature semantics 509 are "Lustily Water" and "music", the search results found are, for example, data of "... Lustily Water ... music ... Liu Dehua", data of "... Lustily Water ... music ... Li Yijun", and data of "... Lustily Water ... music ... lyrics", where "Liu Dehua", "Li Yijun", and "lyrics" are the cited guide information. In other words, each search result may contain the feature semantics 509 and the guide information related to it; the natural language understanding module 520 converts the data contained in the search results into candidate answers according to the search results found, and records the candidate answers in the candidate list 511 for use in subsequent steps.
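The conversion of search results into candidate answers with attached guide information can be sketched as follows; representing a search result as a tuple of keywords, with every keyword other than the feature semantics treated as guide information, is an assumption of this sketch:

```python
def build_candidate_list(feature_semantics, search_results):
    # Each search result contains the feature semantics plus extra
    # keywords (the guide information); turn each result into a candidate
    # answer recorded in the candidate list.
    candidates = []
    for result in search_results:
        guide = [w for w in result if w != feature_semantics]
        candidates.append({"answer": result, "guide": guide})
    return candidates

# Hypothetical search results for the feature semantics below.
results = [
    ("The Romance of the Three Kingdoms", "novel"),
    ("The Romance of the Three Kingdoms", "TV series"),
]
candidates = build_candidate_list("The Romance of the Three Kingdoms", results)
```

The guide information kept on each candidate is what the later correctness check (step S608) compares against the feature semantics of the second voice input.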
In step S606, the natural language understanding module 520 selects at least one candidate answer in the candidate list 511 as the reported answer and, according to the reported answer, outputs the corresponding first voice response 507. In the present embodiment, the natural language understanding module 520 arranges the candidate answers in the candidate list by priority, selects the reported answer from the candidate list according to this priority, and outputs the first voice response 507 accordingly.
For instance, when the feature semantics 509 is "The Romance of the Three Kingdoms", suppose the natural language understanding module 520 finds many records of data about "... The Romance of the Three Kingdoms ... books", fewer records about "... The Romance of the Three Kingdoms ... music", and the fewest records about "... The Romance of the Three Kingdoms ... TV series". The natural language understanding module 520 may then take "the book of the Romance of the Three Kingdoms" as the first-priority candidate answer, "the music of the Romance of the Three Kingdoms" as the second-priority candidate answer, and "the TV series of the Romance of the Three Kingdoms" as the third-priority candidate answer. Other details were mentioned above and are not repeated here.
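The prioritisation in this example can be sketched by counting how many search records mention each guide keyword; the tuple layout of the records is an assumption carried over from the earlier sketch:

```python
from collections import Counter

def rank_candidates(search_results):
    # Order the candidate answers by how many search records mention each
    # guide keyword, so the most frequent category gets first priority.
    counts = Counter(guide for (_, guide) in search_results)
    return [guide for guide, _ in counts.most_common()]

# Hypothetical record counts matching the example: books > music > TV.
results = [
    ("Romance of the Three Kingdoms", "books"),
    ("Romance of the Three Kingdoms", "books"),
    ("Romance of the Three Kingdoms", "books"),
    ("Romance of the Three Kingdoms", "music"),
    ("Romance of the Three Kingdoms", "music"),
    ("Romance of the Three Kingdoms", "TV series"),
]
ranking = rank_candidates(results)
```

With these counts, "books" becomes the first-priority candidate answer and "TV series" the last, matching the ordering described in the text.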
Then, in step S608, the voice sampling module 510 receives the second voice input 501, and the natural language understanding module 520 parses this second voice input 501 and judges whether the previously selected reported answer is correct. Here, the second voice input 501 is parsed to extract the feature semantics 509 it contains (output by the speech recognition module 522 to the natural language processing module 524 along the path labeled 509), where this feature semantics 509 is, for example, a keyword the user further provides (for example, a time, an intention, a knowledge domain, or an attribute). When the feature semantics 509 in the second voice input 501 does not match the relevant guide information in the reported answer, the natural language understanding module 520 judges that the previously selected reported answer is incorrect. The manner of judging whether the request information 505 of the second voice input 501 contains an "affirmation" or "negation" of the first voice response 507 was described above and is not repeated here.
Furthermore, the second voice input 501 parsed by the natural language understanding module 520 may or may not contain explicit feature semantics 509. For instance, the voice sampling module 510 may receive from the user "I don't mean the book of the Romance of the Three Kingdoms" (case A), "I don't mean the book of the Romance of the Three Kingdoms, I mean the TV series of the Romance of the Three Kingdoms" (case B), or "I mean the TV series of the Romance of the Three Kingdoms" (case C), etc. The feature semantics 509 in case A above are, for example, "no", "The Romance of the Three Kingdoms", "book"; in case B, for example, "no", "The Romance of the Three Kingdoms", "book", "yes", "The Romance of the Three Kingdoms", "TV series"; and in case C, for example, "yes", "The Romance of the Three Kingdoms", "TV series". For convenience of description only cases A, B, and C are enumerated as examples, but the present embodiment is not limited thereto.
Then, the natural language understanding module 520 judges, according to the feature semantics 509 contained in the above second voice input 501, whether the relevant guide information in the reported answer is correct. That is to say, if the reported answer is "the book of the Romance of the Three Kingdoms" and the above feature semantics 509 are "The Romance of the Three Kingdoms" and "TV series", the natural language understanding module 520 judges that the relevant guide information in the reported answer (namely "book") does not match the feature semantics 509 from the user's second voice input 501 (namely "TV series"), and thereby judges that the reported answer is incorrect. Similarly, if the reported answer is "the book of the Romance of the Three Kingdoms" and the above feature semantics 509 are "no", "The Romance of the Three Kingdoms", "book", the natural language understanding module 520 also judges that the reported answer is incorrect.
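The two ways of judging the reported answer incorrect (an explicit negation, or a different domain keyword) can be sketched as below; the token conventions ("no"/"yes" markers, a fixed domain vocabulary) are assumptions made for this illustration only:

```python
DOMAIN_KEYWORDS = ("book", "TV series", "film")  # assumed domain vocabulary

def answer_is_correct(reported_guide, feature_semantics):
    # Case 1: an explicit negation of the reported guide keyword
    # ("no ... book") marks the reported answer incorrect.
    if "no" in feature_semantics and reported_guide in feature_semantics:
        return False
    # Case 2: a different domain keyword in the user's input also
    # contradicts the reported answer.
    for word in feature_semantics:
        if word in DOMAIN_KEYWORDS and word != reported_guide:
            return False
    return True
```

Applied to the text's examples: the reported guide "book" is judged incorrect both against the semantics ("The Romance of the Three Kingdoms", "TV series") and against ("no", "The Romance of the Three Kingdoms", "book"), but correct against an affirmation of "book".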
After the natural language understanding module 520 parses the second voice input 501, if it judges that the previously output first voice response 507 was correct, then, as shown in step S610, the natural language understanding module 520 makes a response corresponding to the second voice input 501. For instance, suppose the second voice input 501 from the user is "Yes, the book of the Romance of the Three Kingdoms"; the natural language understanding module 520 may then output the second voice response 507 "Opening the book of the Romance of the Three Kingdoms for you". Alternatively, the natural language understanding module 520 may, while playing the second voice response 507, directly load the book content of the Romance of the Three Kingdoms through a processing unit (not illustrated).
However, after the natural language understanding module 520 parses the second voice input 501, if it judges that the previously output first voice response 507 was incorrect, then, as shown in step S612, the natural language understanding module 520 selects another one of the candidate answers in the candidate list 511 and outputs the second voice response 507 according to the selected result. Here, if the second voice input 501 provided by the user contains no explicit feature semantics 509 (such as the second voice input 501 of case A above), the natural language understanding module 520 selects another candidate answer from the candidate list 511. Alternatively, if the second voice input 501 provided by the user contains explicit feature semantics 509 (such as the second voice input 501 of cases B and C above), the natural language understanding module 520 can directly select another candidate answer from the candidate list 511 according to the feature semantics 509 provided by the user.
On the other hand, if the second voice input 501 provided by the user contains explicit feature semantics 509 (such as the second voice input of cases B and C above) but the natural language understanding module 520 finds in the candidate list 511 no candidate answer whose guide information matches this feature semantics 509, the natural language understanding module 520 may output a third voice response 507, such as "no matching data found" or "I don't know".
To enable those skilled in the art to further understand the method of correcting a voice response and the natural language dialogue system of the present embodiment, another embodiment is described in detail below.
First, suppose the first voice input 501 received by the voice sampling module 510 is "I want to watch Romance of the Three Kingdoms" (step S602). The natural language understanding module 520 then parses out the feature semantics 509 "watch" and "Romance of the Three Kingdoms", and obtains a candidate list 511 with a plurality of candidate answers, each of which has related guide information (step S604), as shown in Table 1.
Table 1
(The contents of Table 1 are reproduced as images in the source; according to the surrounding description, the table lists candidate answers a–e for the feature semantics "Romance of the Three Kingdoms", each with its related guide information, e.g. candidate answer a for the book and candidate answer b for the TV series.)
Next, the natural language understanding module 520 selects a reply answer from the candidate list 511. Suppose the natural language understanding module 520 sequentially chooses candidate answer a in the candidate list as the reply answer; the natural language understanding module 520 then outputs, for example, "Do you want to play the book Romance of the Three Kingdoms?", i.e. the first voice response 507 (step S606).
At this point, if the second voice input 501 received by the voice sampling module 510 is "Yes" (step S608), the natural language understanding module 520 determines that the above reply answer is correct; the natural language understanding module 520 may then output another voice response 507, "Please wait a moment", and load the contents of the book Romance of the Three Kingdoms through the processing unit (step S610).
However, if the second voice input 501 received by the voice sampling module 510 is "I don't mean the book Romance of the Three Kingdoms" (step S608), the natural language understanding module 520 determines that the above reply answer is incorrect, and again selects another reply answer from candidate answers b–e of the candidate list 511, for example candidate answer b, "Do you want to play the TV series Romance of the Three Kingdoms?". If the user further answers "Not the TV series", the natural language understanding module 520 selects one of candidate answers c–e as the reply. In addition, if all candidate answers a–e in the candidate list 511 have been returned by the natural language understanding module 520 and rejected by the user, and none of candidate answers a–e matches the user's voice input 501, the natural language understanding module 520 outputs the voice response 507 "No data found" (step S612).
In one embodiment, in step S608 above, if the second voice input 501 received by the voice sampling module 510 from the user is "I don't mean the TV series Romance of the Three Kingdoms", the natural language understanding module 520 may select another candidate answer, for example candidate answer c, and output the corresponding voice response 507. In the same manner, the natural language understanding module 520 may keep choosing other candidate answers as the reply answer according to the voice inputs 501 received by the voice sampling module 510, until none of candidate answers a–e matches the user's voice input 501.
In another embodiment, in step S608 above, if the second voice input 501 received by the voice sampling module 510 from the user is "I mean the comic Romance of the Three Kingdoms", then, since the candidate list 511 contains no candidate answer about comics, the natural language understanding module 520 may directly output the voice response 507 "No data found".
Based on the above, the natural language understanding module 520 outputs the corresponding voice response 507 according to the first voice input 501 from the user. When the voice response 507 output by the natural language understanding module 520 does not match the request information 505 of the user's first voice input 501, the natural language understanding module 520 can revise the originally output voice response 507 and, following the second voice input 501 subsequently provided by the user, further output a voice response 507 that matches the user's request information 505. Thus, if the user is dissatisfied with the answer provided by the natural language understanding module 520, the natural language understanding module 520 can automatically revise it and return a new voice response 507 to the user, thereby improving the convenience of the dialogue between the user and the natural language dialogue system 500.
It is worth mentioning that, in steps S606 and S612 of Fig. 6, the natural language understanding module 520 may also sort the candidate answers in the candidate list according to different methods of assessing priority, select the reply answer from the candidate list 511 according to this priority, and then output the voice response 507 corresponding to the reply answer.
For instance, the natural language understanding module 520 may sort the priority of the candidate answers in the candidate list 511 according to the usage habits of the general public, where the answers most often used by the public are ranked first. For example, when the feature semantics 509 is "Romance of the Three Kingdoms", suppose the candidate answers found by the natural language understanding module 520 are the TV series, the book and the music of Romance of the Three Kingdoms. If, when mentioning "Romance of the Three Kingdoms", most people refer to the book, fewer people refer to the TV series, and still fewer refer to the music (for example, when the values stored in the popularity field 316 of Fig. 3C reflect the matching situation of all users, this can be judged from the values of the popularity field), the natural language understanding module 520 can rank the candidate answers in the order "book", "TV series", "music". That is, the natural language understanding module 520 preferentially selects "the book Romance of the Three Kingdoms" as the reply answer, and outputs the voice response 507 according to this reply answer.
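The popularity-based ordering just described amounts to a sort keyed on a per-answer count. A hedged Python sketch, in which the mapping `popularity` stands in for the values of the popularity field 316 (the function name and data shapes are illustrative assumptions):

```python
def rank_by_popularity(candidates, popularity):
    """Order candidate answers by a popularity count, highest first.

    popularity plays the role of the popularity field described above:
    a mapping from guide information (e.g. "book") to how often all
    users matched that kind of answer.
    """
    return sorted(candidates, key=lambda c: popularity.get(c, 0), reverse=True)
```

With counts of 120/45/7 for book/TV series/music, the sketch yields the order book, TV series, music, matching the example in the text.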
In addition, the natural language understanding module 520 may follow the user's own habits to determine the priority of the candidate answers. Specifically, the natural language understanding module 520 may record the voice inputs 501 previously received from the user in a user session database, which is stored, for example, in a storage device. The user session database can record response information such as the feature semantics 509 obtained when the natural language understanding module 520 parsed the user's voice inputs 501, and the reply records produced by the natural language understanding module 520. In addition, when the values stored in the popularity field 316 of Fig. 3C are related to the user's habits (for example the number of matches), the user's usage habits or priority can also be judged from the values of the popularity field. Therefore, when selecting the reply answer, the natural language understanding module 520 can, according to the response information recorded in the user session database, rank the candidate answers whose guide information matches that response information first as the reply answer, so as to output a voice response 507 that matches the user's voice input.
For instance, suppose that when the user converses with the natural language understanding module 520, "the book Romance of the Three Kingdoms" is often mentioned, "the TV series Romance of the Three Kingdoms" less often, and "the music of Romance of the Three Kingdoms" still less often (for example, the user session database records 20 entries about "the book Romance of the Three Kingdoms", 8 entries about "the TV series Romance of the Three Kingdoms", and 1 entry about "the music of Romance of the Three Kingdoms"); then the priority of the candidate answers in the candidate list will be, in order, "the book Romance of the Three Kingdoms", "the TV series Romance of the Three Kingdoms" and "the music of Romance of the Three Kingdoms". That is, when the feature semantics is "Romance of the Three Kingdoms", the natural language understanding module 520 selects "the book Romance of the Three Kingdoms" as the reply answer, and outputs the voice response 507 according to this reply answer.
It is worth mentioning that the natural language understanding module 520 may also follow the user's preferences to determine the priority of the candidate answers. Specifically, the user session database may also record keywords the user has expressed, for example "like", "idol", "hate" or "dislike". Therefore, the natural language understanding module 520 can sort the candidate answers in the candidate list 511 according to how often such keywords have been recorded. For instance, a candidate answer whose guide information is more often associated with "like" is selected preferentially. Conversely, a candidate answer whose guide information is more often associated with "hate" is selected later.
For instance, suppose that when the user converses with the natural language understanding module 520, "I hate watching the TV series Romance of the Three Kingdoms" is often mentioned, "I hate listening to the music of Romance of the Three Kingdoms" less often, and "I hate the book Romance of the Three Kingdoms" still less often (for example, the user session database records 20 entries about "I hate watching the TV series Romance of the Three Kingdoms", 8 entries about "I hate listening to the music of Romance of the Three Kingdoms", and 1 entry about "I hate the book Romance of the Three Kingdoms"); then the priority of the candidate answers in the candidate list 511 is, in order, "the book Romance of the Three Kingdoms", "the music of Romance of the Three Kingdoms" and "the TV series Romance of the Three Kingdoms". That is, when the feature semantics 509 is "Romance of the Three Kingdoms", the natural language understanding module 520 selects the book "Romance of the Three Kingdoms" as the reply answer, and outputs the voice response 507 according to this reply answer. In one embodiment, a "dislike field" (not shown) may be added beside the popularity field 316 of Fig. 3B to record the degree of the user's dislike. In another embodiment, when "dislike" information from the user about a certain record is parsed, one (or another value) may be subtracted directly from the popularity field of the corresponding record, so that the user's preferences can be recorded without adding a field. Various embodiments of recording user preferences are all applicable to the embodiments of the invention, and the invention is not limited thereto.
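The field-free variant above, where dislike simply decrements the same popularity count, can be sketched as follows. This is an illustrative assumption of how such counters might be kept; the function names and the step size of 1 are hypothetical:

```python
def update_preference(records, item, disliked, step=1):
    """When the user's input expresses dislike for an item, subtract from
    that item's count instead of keeping a separate 'dislike field';
    a liked mention adds to the count."""
    if disliked:
        records[item] = records.get(item, 0) - step
    else:
        records[item] = records.get(item, 0) + step
    return records

def rank_by_preference(candidates, records):
    # Fewer dislikes (i.e. a higher remaining count) means earlier priority.
    return sorted(candidates, key=lambda c: records.get(c, 0), reverse=True)
```

With 20 dislikes of the TV series, 8 of the music and 1 of the book, the ranking comes out book, music, TV series, as in the example.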
On the other hand, the natural language understanding module 520 may also follow a voice input 501 entered earlier by the user to determine the priority of the at least one candidate answer. That is, suppose there is a voice input that was received by the voice sampling module 510 earlier in time than the first voice input 501 (i.e. a fourth voice input); the natural language understanding module 520 can then parse the keyword 108 (or feature semantics 509) in the fourth voice input, preferentially choose from the candidate list 511 a candidate answer with guide information matching this keyword 108 as the reply answer, and output the voice response 507 according to this reply answer.
For instance, suppose the natural language understanding module 520 first receives the voice input 501 "I want to watch a TV series", and several seconds later receives the voice input 501 "Please play Romance of the Three Kingdoms for me". At this point, the natural language understanding module 520 can recognize the keyword 108 "TV series" in the earlier voice input 501; therefore, the natural language understanding module 520 can choose from the candidate list 511 the candidate answer whose related guide information concerns "TV series", take this candidate answer as the reply answer, and accordingly output the voice response 507 to the user.
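The earlier-input preference above reduces to scanning the candidate list for a guide-information match against keywords taken from the preceding dialogue. A minimal sketch under assumed data shapes (candidates as dicts with a `guide_info` set, context as a set of keywords):

```python
def pick_with_context(candidates, context_keywords):
    """Prefer the candidate whose guide information matches a keyword
    parsed from an earlier voice input (the 'fourth voice input' above);
    fall back to the first candidate when nothing matches."""
    for c in candidates:
        if c["guide_info"] & context_keywords:
            return c
    return candidates[0] if candidates else None
```

So with the earlier input contributing the keyword "TV series", the TV-series candidate is chosen even though the later request was ambiguous.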
Based on the above, the natural language understanding module 520 can, according to the voice input 501 from the user, consider information such as the usage habits of the general public, the user's preferences, the user's habits or the user's preceding dialogue as appropriate, and output to the user a voice response 507 that better matches the request information 505 of the voice input. The natural language understanding module 520 can rank the candidate answers in the candidate list 511 according to different sorting methods, for example the usage habits of the public, user preferences, user habits or the user's preceding dialogue. Thereby, when the user's voice input is rather ambiguous, the natural language understanding module 520 can consider the public's usage habits, the user's preferences, the user's habits or the user's preceding dialogue to judge the intention of the user's voice input 501 (for example the attribute or knowledge domain of the feature semantics 509 in the voice input 501). In other words, if a candidate answer is close to the intention the user has expressed or the public usually means, the natural language understanding module 520 gives priority to that candidate answer as the reply answer. Thus, the voice response 507 output by the natural language dialogue system 500 better matches the user's request information.
In summary, in the method of correcting a voice response and the natural language dialogue system of the present embodiment, the natural language dialogue system outputs a corresponding voice response 507 according to the first voice input 501 from the user. When the voice response 507 output by the natural language dialogue system does not match the request information 505 or feature semantics 509 of the user's first voice input 501, the natural language dialogue system can revise the originally output voice response 507 and, following the second voice input 501 subsequently provided by the user, further select a second voice response 507 that matches the user's request information 505. In addition, the natural language dialogue system can also preferentially select a more suitable reply answer according to the usage habits of the general public, the user's preferences, the user's habits or the user's preceding dialogue, and accordingly output the voice response 507 to the user. Thus, if the user is dissatisfied with the answer provided by the natural language dialogue system, the natural language dialogue system can automatically make a revision according to the request information 505 the user utters each time, and return a new voice response 507 to the user, thereby improving the convenience of the dialogue between the user and the natural language dialogue system.
Next, with the architecture and members of the natural language understanding system 100 and the structured database 220, an explanation is given of an example of providing candidate answers as replies according to the user's dialogue scenario and context, the user's usage habits, the public's usage habits, and the user's preferences.
Fig. 7A is a block diagram of a natural language dialogue system illustrated according to an embodiment of the invention. Referring to Fig. 7A, the natural language dialogue system 700 comprises a voice sampling module 710, a natural language understanding module 720, a property database 730 and a natural language database 740. In fact, the voice sampling module 710 of Fig. 7A is identical to the voice sampling module 510 of Fig. 5A, and the natural language understanding module 720 is identical to the natural language understanding module 520, so the functions they perform are the same. In the present embodiment, the voice sampling module 710 receives a voice input 701 (that is, a voice input 501, for example the user's voice), and the natural language understanding module 720 can parse the request information 705 (that is, request information 505) in the voice input and output a corresponding voice response 707 (that is, a voice response 507). The members of the aforementioned natural language dialogue system 700 may be configured in the same machine, and the invention is not limited thereto.
The natural language understanding module 720 receives the analysis result 703 obtained from parsing the voice input, transmitted from the voice sampling module 710, so as to parse the content of the voice input 701. Moreover, the natural language understanding module 720 can produce a candidate list 711 comprising at least one candidate answer according to one or more feature semantics 709 in the voice input (such as keywords 108 etc.), and then find among these candidate answers the answer that matches the feature semantics 709, to output the voice response 707. In addition, the natural language understanding module 720 of the present embodiment may be implemented by a hardware circuit formed of one or several logic gates, or implemented with computer program code; the above is illustrative only and not a limitation.
Fig. 7B is a block diagram of a natural language dialogue system 700' illustrated according to another embodiment of the invention. The natural language understanding module 720' of Fig. 7B may comprise a speech recognition module 722 and a natural language processing module 724, and the voice sampling module 710 may be integrated with a speech synthesis module 726 in a speech synthesis processing module 702. The speech recognition module 722 receives the analysis result 703 obtained from parsing the voice input 701, transmitted from the voice sampling module 710, and converts it into one or more feature semantics 709 (such as keywords or phrases). The natural language processing module 724 then parses these feature semantics 709, obtains at least one candidate list 711, and selects from the candidate list 711 an answer that matches the voice input 701 as the reply answer. Since this reply answer is an answer obtained by the natural language understanding module 720 through internal analysis, it must also be converted into voice output before it can be output to the user; therefore, the speech synthesis module 726 can query a speech synthesis database 730 according to the reply answer, where this speech synthesis database 730 records, for example, text and its corresponding voice information, so that the speech synthesis module 726 can find the voice corresponding to the reply answer and thereby synthesize the first voice response 707. Afterwards, the speech synthesis module 726 can output the synthesized voice through a voice output interface (not illustrated), where the voice output interface is, for example, a device such as a speaker or an earphone, so as to output the voice to the user.
In the present embodiment, the speech recognition module 722, the natural language processing module 724 and the speech synthesis module 726 in the aforementioned natural language understanding module 720 may be configured in the same machine as the voice sampling module 710. In other embodiments, the speech recognition module 722, the natural language processing module 724 and the speech synthesis module 726 may also be dispersed in different machines (for example computer systems, servers or similar devices/systems). For example, in the natural language understanding module 720' shown in Fig. 7C, the speech synthesis module 726 may be configured in the same machine 702 as the voice sampling module 710, while the speech recognition module 722 and the natural language processing module 724 may be configured in another machine.
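The module chain of Fig. 7B — speech recognition 722, natural language processing 724, speech synthesis 726 — can be sketched as a simple pipeline. Every function name and data shape here is a hypothetical stand-in; each stage is passed in as a callable, which reflects the text's allowance that the stages may live on different machines:

```python
def dialogue_pipeline(voice_input, recognize, parse, select, synthesize):
    """Sketch of the Fig. 7B chain: recognition -> parsing/selection ->
    synthesis. The callables stand in for modules 722, 724 and 726."""
    semantics = recognize(voice_input)   # module 722: input -> feature semantics
    reply = select(parse(semantics))     # module 724: semantics -> candidate list -> reply answer
    return synthesize(reply)             # module 726: reply answer -> voice response
```

A toy wiring shows the flow: recognition yields feature semantics, parsing yields candidates, selection picks one, and synthesis wraps it as the outgoing voice response.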
The natural language dialogue method is illustrated below in conjunction with the natural language dialogue system 700 of Fig. 7A above. Fig. 8 is a flow diagram of a natural language dialogue method illustrated according to an embodiment of the invention. For convenience of description, only the natural language dialogue system 700 of Fig. 7A is taken as an example here, but the natural language dialogue method of the present embodiment is also applicable to the natural language dialogue system 700' of Fig. 7C above. In comparison, Figs. 5/6 deal with following the user's voice input and automatically revising the output information, whereas Figs. 7A/7B/8 deal with recording the user's preference characteristics in the property database and accordingly selecting a candidate answer from the candidate list for the user. In fact, the embodiments of Figs. 5/6 and Figs. 7A/7B/8 may be used alternatively or together, and the invention is not limited thereto.
Referring simultaneously to Fig. 7A and Fig. 8, in step S810 the voice sampling module 710 receives a first voice input 701. The first voice input 701 is, for example, the user's voice, and the first voice input 701 may also carry the user's request information 705. In particular, the first voice input 701 from the user may be an interrogative sentence, an imperative sentence or other request information 705, for example "I want to watch Romance of the Three Kingdoms", "I want to listen to the song 'lustily water'" or "What is the temperature today?".
In step S820, the natural language understanding module 720 parses at least one feature semantics 709 included in the first voice input 701, and then obtains a candidate list 711, where the candidate list 711 has one or more candidate answers. Specifically, the natural language understanding module 720 can parse the first voice input 701 and obtain one or more feature semantics 709 of the first voice input 701. Here, the feature semantics 709 are, for example, keywords or request information obtained after the natural language understanding module 720 parses the first voice input 701. For instance, when the user's first voice input 701 is "I want to watch Romance of the Three Kingdoms", the feature semantics 709 obtained by the natural language understanding module 720 after analysis are, for example, "Romance of the Three Kingdoms" and "watch". For another example, when the user's first voice input 701 is "I want to listen to the song 'lustily water'", the feature semantics 709 obtained by the natural language understanding module 720 after analysis are, for example, "lustily water", "listen" and "song".
Next, the natural language understanding module 720 can query a search database (such as a search engine) according to the above feature semantics 709 and obtain at least one search result, which accordingly serves as each candidate answer in the candidate list 711. Since the feature semantics 709 may belong to different knowledge domains or attributes (for example films, books, music or games), and the same domain or attribute may be further divided into a plurality of classes (for example different authors of the same film or book title, different singers of the same song title, or different editions of the same game name), for one feature semantics 709 the natural language understanding module 720 can query the natural language search database 740 for one or many search results related to this feature semantics 709, where each search result may comprise all kinds of guide information related to this feature semantics 709. The guide information is, for example, keywords in the search result other than the feature semantics 709. From another viewpoint, when the first voice input entered by the user has a plurality of feature semantics 709, the user's request information 705 is clearer, so that the natural language understanding module 720 can query search results closer to the request information 705.
For instance, when the feature semantics 709 is "Romance of the Three Kingdoms", the search results found by the natural language understanding module 720 are, for example, data about "...Romance of the Three Kingdoms...TV series...", data about "...Romance of the Three Kingdoms...novel...", and data about "...Romance of the Three Kingdoms...Luo Guanzhong...novel...", where "TV series", "Luo Guanzhong" and "novel" are the cited guide information. For another example, when the feature semantics 709 are "lustily water" and "music", the search results found by the natural language understanding module 720 are, for example, data about "...lustily water...music...Liu Dehua...", data about "...lustily water...music...Li Yijun...", and data about "...lustily water...music...lyrics...", where "Liu Dehua", "Li Yijun" and "lyrics" are the cited guide information. In other words, each search result may comprise the feature semantics 709 and guide information related to the feature semantics 709, and the natural language understanding module 720 can, according to the search results queried, convert the data included in the search results into candidate answers and record the candidate answers in the candidate list 711 for use in subsequent steps.
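The conversion of search results into candidate answers described above can be sketched as follows. The result format is an assumption for illustration: each result carries a keyword set, and anything other than the queried feature semantics is treated as guide information, per the definition in the text:

```python
def build_candidate_list(search_results, feature_semantics):
    """Turn search results into candidate answers: every keyword in a
    result other than the queried feature semantics is treated as
    guide information for that candidate."""
    candidates = []
    for result in search_results:
        guide_info = {kw for kw in result["keywords"] if kw not in feature_semantics}
        candidates.append({"text": result["text"], "guide_info": guide_info})
    return candidates
```

For the "Romance of the Three Kingdoms" example, the extracted guide information would be sets such as {"TV series"} and {"Luo Guanzhong", "novel"}.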
In step S830, the natural language understanding module 720 selects, according to the property database 730, one of the at least one candidate answer in the candidate list 711 as the reply answer. In the present embodiment, the natural language understanding module 720 can arrange the candidate answers in the candidate list 711 by priority and select the reply answer in the candidate list 711 according to this priority. And in step S840, the first voice response 707 is output according to the reply answer.
For instance, when the feature semantics 709 is "Romance of the Three Kingdoms", suppose the natural language understanding module 720 finds many entries of data about "...Romance of the Three Kingdoms...book...", somewhat fewer about "...Romance of the Three Kingdoms...music...", and the fewest about "...Romance of the Three Kingdoms...TV series..."; the natural language understanding module 720 can then take "the book Romance of the Three Kingdoms" as the first-priority candidate answer, "the music of Romance of the Three Kingdoms" as the second-priority candidate answer, and "the TV series Romance of the Three Kingdoms" as the third-priority candidate answer.
To enable those skilled in the art to further understand the natural language dialogue method and the natural language dialogue system of the present embodiment, another embodiment is described in detail below.
First, suppose the first voice input 701 received by the voice sampling module 710 is "I want to watch Romance of the Three Kingdoms" (step S810). The natural language understanding module 720 then parses out the feature semantics 709 "watch" and "Romance of the Three Kingdoms", and obtains a candidate list 711 with a plurality of candidate answers, each of which has related guide information (step S820), as shown in Table 1 described above.
Next, the natural language understanding module 720 selects a reply answer from the candidate list 711. Suppose the natural language understanding module 720 sequentially chooses candidate answer a in the candidate list 711 (please refer to Table 1) as the reply answer; the natural language understanding module 720 then outputs, for example, "Do you want to play the book Romance of the Three Kingdoms?", i.e. the first voice response 707 (steps S830–S840).
It is worth mentioning that the natural language understanding module 720 may also sort the candidate answers in the candidate list 711 according to different methods of assessing priority, select the reply answer in the candidate list 711 according to this priority, and then output the voice response 707 corresponding to the reply answer. Moreover, the natural language understanding module 720 can judge the user's preferences from a plurality of session records with the user, and use these preferences to determine the priority of the candidate answers. In one embodiment, the popularity field 316 of Fig. 3B may be used to record the user's preferences; the recording method of the popularity field 316 has been presented above and is not repeated here. Of course, the user's preferences may also use the property database 730, an embodiment of which is detailed in the following passage. As for the way the property database 730 records the user's preferences, a keyword (for example "Romance of the Three Kingdoms") may be used as the basis, matched with the user's preference (for example, positive terms such as "like" and negative terms such as "hate" are divided into two columns), and the preference quantities are then counted (for example, tallying the quantities of positive terms and negative terms); thus, when the user's preference is queried, the preference-related fields can be queried directly (for example how many positive terms and how many negative terms there are respectively), from which the user's preference is then judged.
And before the first voice input is received in step S810, a plurality of voice inputs are received in step S802, i.e. the previous dialogue history record; according to the previous plurality of voice inputs 701, the user's preference attributes are captured (step S804).
Based on the conversation content of the plurality of voice inputs 701, suppose that when the user converses with the natural language understanding module 720, "I hate watching the TV series Romance of the Three Kingdoms" is often mentioned, "I hate listening to the music of Romance of the Three Kingdoms" less often, and "I hate the book Romance of the Three Kingdoms" still less often (for example, the property database 730 records 20 entries about "I hate watching the TV series Romance of the Three Kingdoms" (that is, the quantity of negative terms for "Romance of the Three Kingdoms" plus "TV series" is 20), 8 entries about "I hate listening to the music of Romance of the Three Kingdoms" (that is, the quantity of negative terms for "Romance of the Three Kingdoms" plus "music" is 8), and 1 entry about "I hate the book Romance of the Three Kingdoms" (that is, the quantity of negative terms for "Romance of the Three Kingdoms" plus "book" is 1)); then the priority of the candidate answers in the candidate list 711 is, in order, "the book Romance of the Three Kingdoms", "the music of Romance of the Three Kingdoms", and "the TV series Romance of the Three Kingdoms". That is, when the feature semantics 709 is "Romance of the Three Kingdoms", the natural language understanding module 720 selects the book "Romance of the Three Kingdoms" as the reply answer, and outputs the voice response 707 according to this reply answer.
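The keyword-plus-two-column scheme described for the property database 730 can be sketched in a few lines. The class name, method names and scoring rule (positive count minus negative count) are illustrative assumptions, not the patent's disclosed schema:

```python
from collections import defaultdict

class PropertyDatabase:
    """Sketch of the preference store: for each (keyword, guide info)
    pair, positive terms ("like") and negative terms ("hate") are
    tallied in two columns."""

    def __init__(self):
        self.counts = defaultdict(lambda: {"positive": 0, "negative": 0})

    def record(self, keyword, guide_info, term_is_negative):
        column = "negative" if term_is_negative else "positive"
        self.counts[(keyword, guide_info)][column] += 1

    def rank(self, keyword, guide_infos):
        # A higher (positive - negative) score ranks earlier.
        def score(g):
            c = self.counts[(keyword, g)]
            return c["positive"] - c["negative"]
        return sorted(guide_infos, key=score, reverse=True)
```

Replaying the 20/8/1 dislike counts from the example reproduces the priority order book, music, TV series.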
It is worth mentioning that the natural language understanding module 720 can also use the user's preferences to determine the priority of the candidate answers. Specifically, the feature database 730 can also record keywords the user has expressed, for example "like" and "idol" (positive terms), or "hate" and "dislike" (negative terms). Therefore, the natural language understanding module 720 can sort the candidate answers in the candidate list 711 according to the number of times these keywords have been recorded (that is, by comparing whether positive terms or negative terms are cited more often). For instance, if the related information of a candidate answer is associated with "like" more often (that is, positive terms are cited more often), that candidate answer is selected preferentially. Conversely, if the related information of a candidate answer is associated with "hate" more often (that is, negative terms are cited more often), that candidate answer is selected later.
In addition, the natural language understanding module 720 can sort the priority of the candidate answers in the candidate list 711 according to general users' usage habits, in which answers frequently used by general users are prioritized (for example, recorded by the popularity field 316 of Fig. 3C). For example, when the feature semantics 709 is "The Romance of the Three Kingdoms", suppose the candidate answers found by the natural language understanding module 720 are the TV play, the books, and the music of The Romance of the Three Kingdoms. If most people mean the books of "The Romance of the Three Kingdoms" when mentioning "The Romance of the Three Kingdoms", fewer people mean the TV play, and still fewer mean the music, then the natural language understanding module 720 prioritizes the candidate answers in the order "books", "TV play", "music". That is to say, the natural language understanding module 720 preferentially selects "the books of The Romance of the Three Kingdoms" as the reply answer, and outputs the voice response 707 according to this reply answer. As for the way "answers frequently used by general users" are prioritized, the popularity field 316 of Fig. 3C can be used to keep records, and the recording method has been disclosed in the paragraphs relating to Fig. 3C above, so it is not repeated here.
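Popularity-based ranking is the mirror image of the dislike-count rule: sort descending by how often general users chose each answer. The counts below are invented for illustration; only the sorting principle comes from the text.

```python
# Hypothetical popularity tallies (e.g. kept in a popularity field):
# answers most often chosen by users in general come first.
popularity = {"books": 120, "TV play": 45, "music": 3}

def rank_by_popularity(candidates, pop):
    # Descending order: the most popular candidate gets top priority.
    return sorted(candidates, key=lambda c: pop.get(c, 0), reverse=True)

print(rank_by_popularity(["music", "books", "TV play"], popularity))
# ['books', 'TV play', 'music']
```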
In addition, the natural language understanding module 720 can also use the user's habits to determine the priority of the candidate answers. Specifically, the natural language understanding module 720 can record the voice inputs 701 previously received from the user in the feature database 730; the feature database 730 can record response information such as the feature semantics 709 obtained when the natural language understanding module 720 parsed the user's voice inputs 701, and the reply records generated by the natural language understanding module 720. Therefore, when selecting the reply answer, the natural language understanding module 720 can, according to the response information recorded in the feature database 730, rank the candidate answers whose related information matches the response information higher in priority, so as to output a voice response that matches the user's voice input. As for the way "the user's habits determine the priority of the candidate answers", the popularity field 316 of Fig. 3C can likewise be used to keep records, and the recording method has been disclosed in the paragraphs relating to Fig. 3C above, so it is not repeated here.
For instance, suppose that when the user converses with the natural language understanding module 720, the user often mentions "the books of The Romance of the Three Kingdoms", less often mentions "the TV play of The Romance of the Three Kingdoms", and still less often mentions "the music of The Romance of the Three Kingdoms" (for example, the feature database 730 records 20 entries about "the books of The Romance of the Three Kingdoms", 8 entries about "the TV play of The Romance of the Three Kingdoms", and 1 entry about "the music of The Romance of the Three Kingdoms"). The priority of the candidate answers in the candidate list 711 is then, in order, "the books of The Romance of the Three Kingdoms", "the TV play of The Romance of the Three Kingdoms", and "the music of The Romance of the Three Kingdoms". That is to say, when the feature semantics 709 is "The Romance of the Three Kingdoms", the natural language understanding module 720 selects "the books of The Romance of the Three Kingdoms" as the reply answer, and outputs the voice response 707 according to this reply answer.
In summary, the natural language understanding module 720 stores the above user preference attributes, the user's habits, and general users' usage habits in the feature database 730 (step S806). That is to say, in steps S802, S804, and S806, the user preference attributes are learned from the user's previous dialogue history, and the collected user preference attributes are added to the feature database 730; in addition, the user's habits and general users' usage habits are also stored in the feature database 730, so that the natural language understanding module 720 can use the abundant information in the feature database 730 to provide the user with more accurate replies.
After step S806, the first voice input is received in step S810, and the feature semantics 709 of the first voice input is parsed in step S820 to obtain the candidate list 711. Then, the natural language understanding module 720 determines the priority of the at least one candidate answer according to the user preference attributes, the user's habits, or general users' usage habits (step S880). Next, the reply answer is selected from the candidate list 711 according to the priority (step S890). Afterwards, the first voice response 707 is output according to the reply answer (step S840).
On the other hand, the natural language understanding module 720 can also use a voice input 701 entered earlier to determine the priority of the at least one candidate answer. That is to say, suppose a voice input 701 (i.e., a second voice input) was received by the voice sampling module 710 earlier in time than the first voice input 701; the natural language understanding module 720 can then parse the keywords in the second voice input 701, preferentially choose from the candidate list 711 the candidate answer whose related information matches those keywords as the reply answer, and output the voice response 707 according to this reply answer.
For instance, suppose the natural language understanding module 720 first receives the voice input 701 "I want to watch a TV play", and several seconds later receives the voice input 701 "please play The Romance of the Three Kingdoms for me". At this time, the natural language understanding module 720 recognizes the keyword "TV play" in the earlier voice input 701. Therefore, the natural language understanding module 720 can choose from the candidate list 711 the candidate answer whose related information concerns "TV play", take this candidate answer as the reply answer, and accordingly output the voice response 707 to the user.
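Using keywords from an earlier utterance to narrow the candidate list can be sketched as a simple filter. The data layout and function name are assumptions for illustration; the fallback to the full list when nothing matches is also an assumption.

```python
def filter_by_context(candidates, context_keywords):
    """Prefer candidates whose related information mentions a keyword
    from an earlier utterance (e.g. "TV play"); if none match, keep
    the full candidate list (assumed fallback)."""
    hits = [c for c in candidates
            if any(k in c["info"] for k in context_keywords)]
    return hits or candidates

candidates = [
    {"name": "Three Kingdoms (book)", "info": "book"},
    {"name": "Three Kingdoms (TV play)", "info": "TV play"},
]
print(filter_by_context(candidates, ["TV play"])[0]["name"])
# Three Kingdoms (TV play)
```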
Based on the above, the natural language understanding module 720 can, according to the voice input from the user, take into account general users' usage habits, the user's preferences, the user's habits, or the user's preceding and following dialogue as appropriate, and output to the user the voice response 707 that better matches the request information 705 of the voice input 701. The natural language understanding module 720 can prioritize the candidate answers in the candidate list 711 according to different sorting methods, for example general users' usage habits, the user's preferences, the user's habits, or the user's preceding and following dialogue. In this way, when the user's voice input 701 is ambiguous, the natural language understanding module 720 can consider general users' usage habits, the user's preferences, the user's habits, or the user's preceding and following dialogue as appropriate to judge the intention expressed in the user's voice input 701 (for example, the attribute or knowledge domain of the feature semantics 709 in the voice input). In other words, if a candidate answer is close to an intention the user once expressed or that most users indicate, the natural language understanding module 720 can give priority to that candidate answer as the reply answer. Thus, the voice response 707 output by the natural language dialogue system 700 can match the user's request information 705.
In summary, the invention provides a natural language dialogue method and a system thereof, in which the natural language dialogue system can output a corresponding voice response according to a first voice input from the user. The natural language dialogue system of the invention can also preferentially select a more suitable reply answer according to general users' usage habits, the user's preferences, the user's habits, or the user's preceding and following dialogue, and output a voice response to the user accordingly, thereby improving the convenience of the dialogue between the user and the natural language dialogue system.
Next, the architecture and components of the natural language understanding system 100 and the structured database 220 are used to explain how the number of candidate answers obtained by analyzing the request information of the user's voice input determines whether to operate directly according to the data type or to ask the user for further instruction, followed by an example of operating directly according to the data type when only one candidate answer remains.
Fig. 9 is a system schematic diagram of a mobile terminal apparatus according to an embodiment of the invention. Referring to Fig. 9, in the present embodiment the mobile terminal apparatus 900 comprises a voice receiving unit 910, a data processing unit 920, a display unit 930, and a storage unit 940. The data processing unit 920 is coupled to the voice receiving unit 910, the display unit 930, and the storage unit 940. The voice receiving unit 910 receives a first input voice SP1 and a second input voice SP2 and transmits them to the data processing unit 920. The first input voice SP1 and the second input voice SP2 can be the voice inputs 501 and 701, or voice inputs that include request information. The display unit 930 is controlled by the data processing unit 920 to display a data list. The storage unit 940 stores a plurality of data, and these data are the structured data described above, which is not repeated here. In addition, the storage unit 940 can be any type of memory in a server or computer system, for example dynamic random access memory (DRAM), static random access memory (SRAM), flash memory, read-only memory (ROM), and so on; the invention is not limited thereto, and those skilled in the art can make a selection according to actual requirements.
In the present embodiment, the data processing unit 920 can perform speech recognition on the first input voice SP1 to generate a first word string that includes the request information 102, 505, or 705 (for example, the possible intention syntax data 106), then perform natural language processing on the first word string to generate first semantic information corresponding to the first input voice SP1 (for example, the keywords 108 or the feature semantics 507/509), and select the corresponding part from the data in the storage unit 940 according to the first semantic information corresponding to the first input voice SP1 (for example, the search engine 240 performs a full-text search on the structured database 220 according to the keywords 108, and outputs the response result 110 or the candidate list 511/711). When the quantity of the selected data is 1, the data processing unit 920 can perform a corresponding operation according to the type of the selected data; when the quantity of the selected data is greater than 1, the data processing unit 920 controls the display unit 930 to display a data list according to the selected data (for example, displaying the candidate list 511/711). In the case where the data list is displayed for the user to make a further choice, the data processing unit 920 can receive the second input voice SP2, perform speech recognition on it to generate a second word string, then perform natural language processing on the second word string to generate second semantic information corresponding to the second input voice SP2, and select the corresponding part from the data in the data list according to the second semantic information corresponding to the second input voice SP2. The first semantic information and the second semantic information can each be composed of a plurality of keywords 108. The manner in which the second input voice SP2 is analyzed to generate the second word string and the second semantic information can follow the manner of Fig. 5A and Fig. 7A, and is therefore not repeated here.
Similarly, when the quantity of the selected data is 1, the data processing unit 920 can perform a corresponding operation according to the type of the selected data; when the quantity of the selected data is greater than 1, the data processing unit 920 can again control the display unit 930 to display a data list according to the selected data. Then the corresponding part is selected again according to the next second input voice SP2, and a corresponding operation is performed again according to the quantity of the selected data; this can be deduced by analogy from the above description and is not repeated here.
Furthermore, the data processing unit 920 can compare a plurality of related fields of each data item (for example, the numeric data of the subfields 308 in the title block 304 of the record 302) with the first semantic information corresponding to the first input voice SP1 (for example, comparing with the keywords 108). When these related fields of a data item at least partially match the first semantic information of the first input voice SP1 (for example, a partial match), the data item is regarded as data corresponding to the first input voice SP1 (for example, the matching result of Fig. 3A/3B is generated). If the type of the data is a music file, the related fields can include the song title, singer, album name, publication time, play order, and so on; if the type of the data is a video file, the related fields can include the film title, publication time, and staff (including the performers); if the type of the data is a web page file, the related fields can include the website name, web page type, and corresponding user account; if the type of the data is a picture file, the related fields can include the picture name and picture information; if the type of the data is a contact (business card) file, the related fields can include the contact's name, the contact's telephone number, and the contact's address. The above data types are for illustration and can be decided according to the practical application; the embodiments of the invention are not limited thereto.
Then, the data processing unit 920 can judge whether the second semantic information corresponding to the second input voice SP2 includes a command vocabulary indicating an order (for example "I want the third option" or "I choose the third one"). When the second semantic information corresponding to the second input voice SP2 includes a command vocabulary indicating an order, the data processing unit 920 selects the data located at the corresponding position in the data list according to the command vocabulary. When the second semantic information corresponding to the second input voice SP2 does not include a command vocabulary indicating an order, the data processing unit 920 compares the related fields of each data item in the data list with the semantic information corresponding to the second input voice SP2 to determine a plurality of degrees of correspondence between these data and the second input voice SP2, and can determine whether the data in the data list correspond to the second input voice SP2 according to these degrees of correspondence. In an embodiment of the invention, the data processing unit 920 can determine, according to the degrees of correspondence, which one of the data in the data list corresponds to the second input voice SP2, so as to simplify the selection process; the data processing unit 920 selects the data item with the highest degree of correspondence as corresponding to the second input voice SP2.
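The two-stage choice described above — first check for an ordinal command word, otherwise score each entry's related fields against the semantic information — can be sketched as follows. The ordinal table, field layout, and helper names are illustrative assumptions.

```python
# Assumed mapping of English ordinal command words to list positions.
ORDINALS = {"first": 0, "second": 1, "third": 2, "fourth": 3}

def select_from_list(data_list, semantic_info):
    """Sketch: if the second utterance's keywords contain an ordinal
    command word, pick by position; otherwise pick the entry whose
    related fields overlap the keywords the most."""
    for word, idx in ORDINALS.items():
        if word in semantic_info and idx < len(data_list):
            return data_list[idx]
    # Degree of correspondence: count of matching keywords per entry.
    def score(entry):
        return sum(1 for kw in semantic_info if kw in entry["fields"])
    return max(data_list, key=score)

weather = [
    {"name": "Shanghai weather", "fields": ["Shanghai", "weather"]},
    {"name": "Beijing weather", "fields": ["Beijing", "weather"]},
]
print(select_from_list(weather, ["second"])["name"])            # Beijing weather
print(select_from_list(weather, ["Beijing", "weather"])["name"])  # Beijing weather
```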
For instance, if the first input voice SP1 is "how is the weather today", after speech recognition and natural language processing the first semantic information corresponding to the first input voice SP1 can include "today" and "weather"; therefore the data processing unit 920 can read the data corresponding to today's weather and display the data list of these weather data through the display unit 930. Then, if the second input voice SP2 is "I want to see the third item" or "I choose the third one", after speech recognition and natural language processing the second semantic information corresponding to the second input voice SP2 can include "the third", which can be read as a command vocabulary indicating an order; therefore the data processing unit 920 reads the third data item in the data list and displays the corresponding weather information through the display unit 930. Alternatively, if the second input voice SP2 is "I want to see the weather in Beijing" or "I choose the weather in Beijing", after speech recognition and natural language processing the second semantic information corresponding to the second input voice SP2 can include "Beijing" and "weather", so the data processing unit 920 reads the data in the data list corresponding to Beijing. When the quantity of the selected data is 1, the corresponding weather information is displayed through the display unit 930; when the quantity of the selected data is greater than 1, a further data list is displayed for the user to choose from.
If the first input voice SP1 is "I want to call Lao Zhang", after speech recognition and natural language processing the first semantic information corresponding to the first input voice SP1 can include "phone" and "Zhang"; therefore the data processing unit 920 can read the contact data whose surname is "Zhang" and display the data list of these contact data through the display unit 930. Then, if the second input voice SP2 is "the third Lao Zhang" or "I choose the third one", after speech recognition and natural language processing the second semantic information corresponding to the second input voice SP2 can include "the third", which can be read as a command vocabulary indicating an order; therefore the data processing unit 920 reads the third data item in the data list and dials the call according to the selected data. Alternatively, if the second input voice SP2 is "I choose the one starting with 139", after speech recognition and natural language processing the second semantic information corresponding to the second input voice SP2 can include "139" and "starting"; since "139" here is not read as a command vocabulary indicating an order, the data processing unit 920 reads the contact data in the data list whose telephone number starts with 139. If the second input voice SP2 is "I want the Lao Zhang in Beijing", after speech recognition and natural language processing the second semantic information corresponding to the second input voice SP2 can include "Beijing" and "Zhang", and the data processing unit 920 reads the contact data in the data list whose address is in Beijing. When the quantity of the selected data is 1, the call is dialed according to the selected data; when the quantity of the selected data is greater than 1, a further data list is displayed for the user to choose from.
If the first input voice SP1 is "I want to find a restaurant", after speech recognition and natural language processing the first semantic information of the first input voice SP1 can include "restaurant"; the data processing unit 920 can read the data of all restaurants. Since such an instruction is not very specific, the data list of these restaurant data is displayed to the user through the display unit 930, and the system waits for the user's further instruction. Then, if the user inputs "the third restaurant" or "I choose the third one" through the second input voice SP2, after speech recognition and natural language processing the second semantic information corresponding to the second input voice SP2 can include "the third", which can be read as a command vocabulary indicating an order; therefore the data processing unit 920 reads the third data item in the data list and displays it according to the selected data. Alternatively, if the second input voice SP2 is "I choose the nearest one", after speech recognition and natural language processing the second semantic information corresponding to the second input voice SP2 can include "nearest", so the data processing unit 920 can read the restaurant data in the data list whose address is closest to the user; if the second input voice SP2 is "I want a restaurant in Beijing", after speech recognition and natural language processing the second semantic information corresponding to the second input voice SP2 can include "Beijing" and "restaurant", so the data processing unit 920 reads the restaurant data in the data list whose address is in Beijing. When the quantity of the selected data is 1, the data is displayed according to the selection; when the quantity of the selected data is greater than 1, a further data list is displayed for the user to choose from.
According to the above, the data processing unit 920 performs a corresponding operation according to the type of the selected data. For instance, when the type of the selected data is a music file, the data processing unit 920 plays music according to the selected data; when the type of the selected data is a video file, the data processing unit 920 plays video according to the selected data; when the type of the selected data is a web page file, the data processing unit 920 displays the web page according to the selected data; when the type of the selected data is a picture file, the data processing unit 920 displays the image according to the selected data; when the type of the selected data is a contact file, the data processing unit 920 dials the call according to the selected data.
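The type dispatch just described amounts to a lookup from data type to operation. The action strings below are placeholders standing in for the actual playback, display, and dialing routines.

```python
def perform_operation(data):
    """Sketch of the data-type dispatch: each file type maps to the
    operation the unit performs; actions are placeholder strings."""
    actions = {
        "music": "play music",
        "video": "play video",
        "webpage": "display webpage",
        "picture": "display image",
        "contact": "dial call",
    }
    # Assumed default when the type is unrecognized.
    return actions.get(data["type"], "show data")

print(perform_operation({"type": "contact"}))  # dial call
```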
Fig. 10 is a system schematic diagram of an information system according to an embodiment of the invention. Referring to Fig. 9 and Fig. 10, in the present embodiment the information system 1000 comprises a mobile terminal apparatus 1010 and a server 1020, wherein the server 1020 can be a cloud server, a local area network server, or another similar device, although the embodiments of the invention are not limited thereto. The mobile terminal apparatus 1010 comprises a voice receiving unit 1011, a data processing unit 1013, and a display unit 1015. The data processing unit 1013 is coupled to the voice receiving unit 1011, the display unit 1015, and the server 1020. The mobile terminal apparatus 1010 can be a mobile communication device such as a cell phone, a personal digital assistant (PDA) phone, or a smart phone; the invention is not limited thereto. The function of the voice receiving unit 1011 is similar to that of the voice receiving unit 910, and the function of the display unit 1015 is similar to that of the display unit 930. The server 1020 stores a plurality of data and has a speech recognition function.
In the present embodiment, the data processing unit 1013 can perform speech recognition on the first input voice SP1 through the server 1020 to generate a first word string, then perform natural language processing on the first word string to generate first semantic information corresponding to the first input voice SP1; and the server 1020 can select the corresponding part from the stored data according to the first semantic information corresponding to the first input voice SP1 and transmit it to the data processing unit 1013. When the quantity of the selected data is 1, the data processing unit 1013 can perform a corresponding operation according to the type of the selected data; when the quantity of the selected data is greater than 1, the data processing unit 1013 controls the display unit 1015 to display a data list to the user according to the selected data, and waits for the user's further instruction. After the user inputs a further instruction, the data processing unit 1013 can perform speech recognition on the second input voice SP2 through the server 1020 to generate a second word string, then perform natural language processing on the second word string to generate second semantic information corresponding to the second input voice SP2, and the server 1020 selects the corresponding part from the data in the data list according to the second semantic information corresponding to the second input voice SP2 and transmits it to the data processing unit 1013. Similarly, when the quantity of the selected data is 1, the data processing unit 1013 can perform a corresponding operation according to the type of the selected data; when the quantity of the selected data is greater than 1, the data processing unit 1013 can again control the display unit 1015 to display a data list according to the selected data. Then the server 1020 can again select the corresponding part according to a subsequent second input voice SP2, and the data processing unit 1013 can again perform a corresponding operation according to the quantity of the selected data; this can be deduced by analogy from the above description and is not repeated here.
It should be noted that, in one embodiment, if the quantity of data selected according to the first semantic information corresponding to the first input voice SP1 is 1, the operation corresponding to the data can be performed directly. In addition, in another embodiment, a prompt can first be output to the user to notify the user that the operation corresponding to the selected data is about to be performed. Moreover, in another embodiment, when the quantity of data selected according to the second semantic information corresponding to the second input voice SP2 is 1, the operation corresponding to the data can likewise be performed directly. Of course, in yet another embodiment, a prompt can also first be output to the user to notify the user that the operation corresponding to the selected data is about to be performed; the invention is not limited thereto.
Furthermore, the server 1020 can compare a plurality of related fields of each data item with the first semantic information corresponding to the first input voice SP1. When these related fields of a data item at least partially match the first semantic information corresponding to the first input voice SP1, the data item is regarded as data corresponding to the first input voice SP1. If the quantity of data selected according to the first semantic information corresponding to the first input voice SP1 is greater than 1, the user may again input an instruction through the second input voice SP2. Since the instruction the user inputs through the second input voice SP2 may include an order (a command indicating which of the displayed items to select), may directly select one of the displayed items (for example, directly indicating the content of a certain item), or may require judging the user's intention from the instruction (for example, choosing the nearest restaurant, whereupon the "nearest" restaurant among the displayed items is shown to the user), the server 1020 then judges whether the second semantic information corresponding to the second input voice SP2 includes a command vocabulary indicating an order. When the second semantic information corresponding to the second input voice SP2 includes a command vocabulary indicating an order, the server 1020 selects the data located at the corresponding position in the data list according to the command vocabulary. When the second semantic information corresponding to the second input voice SP2 does not include a command vocabulary indicating an order, the server 1020 compares the related fields of each data item in the data list with the second semantic information corresponding to the second input voice SP2 to determine the degrees of correspondence between these data and the second input voice SP2, and can determine whether the data in the data list correspond to the second input voice SP2 according to these degrees of correspondence. In an embodiment of the invention, the server 1020 can determine, according to these degrees of correspondence, which one of the data in the data list corresponds to the second input voice SP2, so as to simplify the selection process; the server 1020 selects the data item with the highest degree of correspondence as corresponding to the second input voice SP2.
Fig. 11 is a flow chart of the voice-recognition-based selection method according to an embodiment of the invention. Referring to Fig. 11, in the present embodiment a first input voice is received (step S1100), speech recognition is performed on the first input voice to generate a first word string (step S1110), and natural language processing is then performed on the first word string to generate first semantic information corresponding to the first input voice (step S1120). Then, the corresponding part can be selected from a plurality of data according to the first semantic information (step S1130), and it is judged whether the quantity of the selected data is 1 (step S1140). When the quantity of the selected data is 1, that is, the judgment result of step S1140 is "yes", a corresponding operation is performed according to the type of the selected data (step S1150). When the quantity of the selected data is greater than 1, that is, the judgment result of step S1140 is "no", a data list is displayed according to the selected data and a second input voice is received (step S1160), speech recognition is performed on the second input voice to generate a second word string (step S1170), and natural language processing is then performed on the second word string to generate second semantic information corresponding to the second input voice (step S1180). Then, the corresponding part is selected from the data in the data list according to the second semantic information, and the flow returns to step S1140 to judge whether the quantity of the selected data is 1. The order of the above steps is for illustration, and the embodiments of the invention are not limited thereto. The details of the above steps can be found in the embodiments of Fig. 9 and Fig. 10, and are not repeated here.
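The overall flow — recognize, parse, select, and loop until one item remains — can be condensed into a small sketch, with keyword matching standing in for the full speech recognition and natural language processing stages (an assumed simplification; the function name is illustrative).

```python
def voice_select(utterances, items):
    """Sketch of the Fig. 11 loop: each utterance narrows the item
    list by keyword match; stop when one item remains or no further
    utterance is available."""
    queue = list(utterances)
    selected = items
    while len(selected) > 1 and queue:
        keywords = queue.pop(0).split()
        narrowed = [i for i in selected if any(k in i for k in keywords)]
        # Assumed fallback: keep the current list if nothing matches.
        selected = narrowed or selected
    return selected

items = ["Beijing weather", "Shanghai weather", "Beijing traffic"]
print(voice_select(["weather", "Beijing"], items))  # ['Beijing weather']
```

The first utterance leaves two weather entries, so the (simulated) data list is shown and the follow-up utterance narrows it to one, at which point the corresponding operation would be performed.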
To sum up, in the selection method based on speech recognition, the mobile terminal apparatus, and the information system of the embodiments of the invention, speech recognition and natural language processing are performed on the first input voice and the second input voice to determine the semantic information corresponding to each, and the data are then selected according to that semantic information. Thereby, the convenience of the user's operation can be improved.
Next, the architecture and members of the natural language understanding system 100 and the structured database 220 disclosed by the invention are explained with operational examples in which they cooperate with an auxiliary actuating apparatus.
Figure 12 is a block diagram of a speech control system according to an embodiment of the invention. Referring to Figure 12, the speech control system 1200 includes an auxiliary actuating apparatus 1210, a mobile terminal apparatus 1220, and a server 1230. In the present embodiment, the auxiliary actuating apparatus 1210 can start the voice system of the mobile terminal apparatus 1220 by a wireless transmission signal, so that the mobile terminal apparatus 1220 communicates with the server 1230 according to a voice signal.
Specifically, the auxiliary actuating apparatus 1210 includes a first wireless transport module 1212 and a trigger module 1214, wherein the trigger module 1214 is coupled to the first wireless transport module 1212. The first wireless transport module 1212 is, for example, a device supporting a communications protocol such as Wireless Fidelity (Wi-Fi), Worldwide Interoperability for Microwave Access (WiMAX), Bluetooth, ultra-wideband (UWB), or radio-frequency identification (RFID); it can send a wireless transmission signal so as to correspond with another wireless transport module and establish a wireless link. The trigger module 1214 is, for example, a button or a key. In the present embodiment, after the user presses the trigger module 1214 to generate a trigger signal, the first wireless transport module 1212 receives the trigger signal and starts; at this time, the first wireless transport module 1212 sends a wireless transmission signal to the mobile terminal apparatus 1220. In one embodiment, the auxiliary actuating apparatus 1210 may be a Bluetooth earphone.
It should be noted that, although some hands-free earphones/microphones currently also have designs that start certain functions of the mobile terminal apparatus 1220, in another embodiment of the invention the auxiliary actuating apparatus 1210 can be different from such an earphone/microphone. Such an earphone/microphone connects with the mobile terminal apparatus to replace the earphone/microphone on the mobile terminal apparatus 1220 for listening/talking, and its start-up function is an additional design; the auxiliary actuating apparatus 1210 of the present application, however, is "only" for opening the voice system of the mobile terminal apparatus 1220 and has no listening/talking function, so its internal circuit design can be simplified and its cost is lower. In other words, with respect to the above hands-free earphone/microphone, the auxiliary actuating apparatus 1210 is a separate device; that is, the user may possess a hands-free earphone/microphone and the auxiliary actuating apparatus 1210 of the present application at the same time.
In addition, the body of the auxiliary actuating apparatus 1210 can be an article that the user can reach conveniently, such as ornaments like a ring, wrist-watch, earrings, necklace, or glasses, various portable articles carried on the person, or a mounted component, for example a driving accessory disposed on the steering wheel, without being limited to the above. That is to say, the auxiliary actuating apparatus 1210 is a "life-style" device; through the setting of the built-in system, the user can easily touch the trigger module 1214 to open the voice system. For instance, when the body of the auxiliary actuating apparatus 1210 is a ring, the user can easily move a finger to press the trigger module 1214 of the ring and trigger it. On the other hand, when the body of the auxiliary actuating apparatus 1210 is a driving accessory disposed in a vehicle, the user can also easily trigger the trigger module 1214 of the driving accessory while on the road. In addition, compared with the discomfort of wearing an earphone/microphone to listen/talk, the auxiliary actuating apparatus 1210 of the present application can open the voice system in the mobile terminal apparatus 1220 and even further open a sound amplification function (described in detail later), so that the user need not wear an earphone/microphone and can still listen/talk directly through the mobile terminal apparatus 1220. Moreover, for the user, these "life-style" auxiliary actuating apparatuses 1210 are articles originally worn or used, so there is no discomfort in use and no time is needed to adapt. For instance, when the user is cooking in the kitchen and needs to dial the mobile phone placed in the living room, supposing the user wears an auxiliary actuating apparatus 1210 of the invention with a ring, necklace, or wrist-watch body, the user can simply touch the ring, necklace, or wrist-watch to open the voice system and ask a friend about the details of a recipe. Although some current earphones/microphones with a start-up function can also achieve the above purpose, not every cooking session requires calling a friend for advice, so for the user, wearing an earphone/microphone at all times while cooking merely in order to control the mobile terminal apparatus at any moment is quite inconvenient.
In other embodiments, the auxiliary actuating apparatus 1210 may also be configured with a wireless charging battery 1216 to drive the first wireless transport module 1212. Furthermore, the wireless charging battery 1216 includes a battery unit 12162 and a wireless charging module 12164, wherein the wireless charging module 12164 is coupled to the battery unit 12162. Here, the wireless charging module 12164 can receive energy supplied from a wireless power supply (not illustrated) and convert that energy into electric power to charge the battery unit 12162. Thus, the first wireless transport module 1212 of the auxiliary actuating apparatus 1210 can be conveniently charged through the wireless charging battery 1216.
On the other hand, the mobile terminal apparatus 1220 is, for example, a cell phone, a personal digital assistant (PDA) phone, a smart phone, or a pocket PC, tablet PC, or notebook computer with a communication program installed. The mobile terminal apparatus 1220 can be any portable mobile device with a communication function; its scope is not limited here. In addition, the mobile terminal apparatus 1220 can use an Android operating system, a Microsoft operating system, a Linux operating system, and so on, without being limited to the above.
The mobile terminal apparatus 1220 includes a second wireless transport module 1222, which can pair with the first wireless transport module 1212 of the auxiliary actuating apparatus 1210 and adopts a corresponding communications protocol (such as Wi-Fi, WiMAX, Bluetooth, ultra-wideband, or RFID) to establish a wireless link with the first wireless transport module 1212. It should be noted that the "first" wireless transport module 1212 and the "second" wireless transport module 1222 described here merely indicate that the wireless transport modules are disposed in different devices, and are not intended to limit the invention.
In other embodiments, the mobile terminal apparatus 1220 also includes a voice system 1221 coupled to the second wireless transport module 1222, so that after the user triggers the trigger module 1214 of the auxiliary actuating apparatus 1210, the voice system 1221 can be started wirelessly through the first wireless transport module 1212 and the second wireless transport module 1222. In one embodiment, the voice system 1221 may include a phonetic sampling module 1224, a voice synthesis module 1226, and a voice output interface 1227. The phonetic sampling module 1224 receives a voice signal from the user and is, for example, a device that receives sound, such as a microphone. The voice synthesis module 1226 can query a speech synthesis database that records, for example, text and its corresponding speech, so that the voice synthesis module 1226 can find the speech corresponding to a given text message and perform speech synthesis on that text message. Afterwards, the voice synthesis module 1226 can output the synthesized speech through the voice output interface 1227 to be played to the user. The voice output interface 1227 is, for example, a loudspeaker or an earphone.
In addition, the mobile terminal apparatus 1220 may also be configured with a communication module 1228. The communication module 1228 is, for example, an element that can transmit and receive wireless signals, such as a radio-frequency transceiver. Furthermore, the communication module 1228 allows the user to answer or make a call, or use other services provided by a telecommunications operator, through the mobile terminal apparatus 1220. In the present embodiment, the communication module 1228 can receive a response message from the server 1230 through the Internet and establish a call connection between the mobile terminal apparatus 1220 and at least one electronic device according to the response message, wherein the electronic device is, for example, another mobile terminal apparatus (not illustrated).
The server 1230 is, for example, a web server or a cloud server, and it has a speech understanding module 1232. In the present embodiment, the speech understanding module 1232 includes a speech recognition module 12322 and a speech processing module 12324, wherein the speech processing module 12324 is coupled to the speech recognition module 12322. Here, the speech recognition module 12322 can receive the voice signal transmitted from the phonetic sampling module 1224 and convert the voice signal into a plurality of segment semantics (such as keywords or phrases). The speech processing module 12324 can then parse the meanings represented by these segment semantics (such as intention, time, place, and so on), and thereby judge the meaning represented by the voice signal. In addition, the speech processing module 12324 can also produce a corresponding response message according to the parsing result. In the present embodiment, the speech understanding module 1232 can be implemented by a hardware circuit composed of one or several logic gates, or by computer program code. It is worth mentioning that in another embodiment, the speech understanding module 1232 may be configured in the mobile terminal apparatus 1320, as in the speech control system 1300 shown in Figure 13. The operation of the speech understanding module 1232 of the server 1230 may be, for example, that of the natural language understanding system 100 of Figure 1A or the natural language dialogue systems 500/700/700' of Figs. 5A/7A/7B.
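The segment-to-meaning step described above can be illustrated with a toy sketch. This is not the patent's speech processing module: the intent table, slot rules, and all names below are assumptions invented for the example, and a real system would use far richer natural language processing.

```python
# Tiny stand-in rule table mapping trigger words to intents (illustrative only).
INTENTS = {"call": "dial", "phone": "dial", "temperature": "weather_query"}

def understand(segments):
    """Map a list of segment semantics (keywords/phrases) to an intent
    plus slots such as time and target, as module 12324 is described to do."""
    result = {"intent": None, "slots": {}}
    for seg in segments:
        if seg in INTENTS and result["intent"] is None:
            result["intent"] = INTENTS[seg]
        elif seg == "today":
            result["slots"]["time"] = "today"
        else:
            result["slots"].setdefault("target", seg)
    return result

def respond(parsed):
    """Produce a response message according to the parsing result."""
    if parsed["intent"] == "dial":
        return {"action": "confirm_dial", "contact": parsed["slots"].get("target")}
    if parsed["intent"] == "weather_query":
        return {"action": "report_weather", "time": parsed["slots"].get("time")}
    return {"action": "unknown"}

reply = respond(understand(["phone", "Lao Wang"]))
```

With the segments `["phone", "Lao Wang"]` the sketch yields a confirm-dial response for the contact "Lao Wang", matching the imperative-sentence example given later in this description.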
The method by which the above speech control system 1200 performs speech control is described below. Figure 14 is a flowchart of a speech control method according to an embodiment of the invention. Referring to Figure 12 and Figure 14 together, in step 1402 the auxiliary actuating apparatus 1210 sends a wireless transmission signal to the mobile terminal apparatus 1220. In detail, when the first wireless transport module 1212 of the auxiliary actuating apparatus 1210 is triggered by receiving a trigger signal, the auxiliary actuating apparatus 1210 sends a wireless transmission signal to the mobile terminal apparatus 1220. Particularly, when the trigger module 1214 in the auxiliary actuating apparatus 1210 is pressed by the user, the trigger module 1214 is triggered by the trigger signal and makes the first wireless transport module 1212 send a wireless transmission signal to the second wireless transport module 1222 of the mobile terminal apparatus 1220, so that the first wireless transport module 1212 links with the second wireless transport module 1222 through the communications protocol. The auxiliary actuating apparatus 1210 is only used to open the voice system of the mobile terminal apparatus 1220 and has no listening/talking function, so its internal circuit design can be simplified and its cost is lower. In other words, with respect to the hands-free earphone/microphone attached to a general mobile terminal apparatus 1220, the auxiliary actuating apparatus 1210 is a separate device; that is, the user may possess a hands-free earphone/microphone and the auxiliary actuating apparatus 1210 of the present application at the same time.
It is worth mentioning that the body of the auxiliary actuating apparatus 1210 can be an article that the user can reach conveniently, such as various portable articles carried on the person like a ring, wrist-watch, earrings, necklace, or glasses, or a mounted component, for example a driving accessory disposed on the steering wheel, without being limited to the above. That is to say, the auxiliary actuating apparatus 1210 is a "life-style" device; through the setting of the built-in system, the user can easily touch the trigger module 1214 to open the voice system 1221. Therefore, the auxiliary actuating apparatus 1210 of the present application can open the voice system 1221 in the mobile terminal apparatus 1220 and even further open a sound amplification function (described in detail later), so that the user need not wear an earphone/microphone and can still listen/talk directly through the mobile terminal apparatus 1220. In addition, for the user, these "life-style" auxiliary actuating apparatuses 1210 are articles originally worn or used, so there is no discomfort in use.
In addition, the first wireless transport module 1212 and the second wireless transport module 1222 can each be in a sleep mode or an operation mode. The sleep mode means that the wireless transport module is in a closed state, that is, the wireless transport module does not receive/detect wireless transmission signals and cannot link with another wireless transport module. The operation mode means that the wireless transport module is in an open state, that is, the wireless transport module constantly detects wireless transmission signals, or sends wireless transmission signals at any time, and can link with another wireless transport module. Here, when the trigger module 1214 is triggered, if the first wireless transport module 1212 is in the sleep mode, the trigger module 1214 wakes the first wireless transport module 1212 up so that it enters the operation mode, makes the first wireless transport module 1212 send a wireless transmission signal to the second wireless transport module 1222, and allows the first wireless transport module 1212 to link with the second wireless transport module 1222 of the mobile terminal apparatus 1220 through the communications protocol.
On the other hand, to avoid the first wireless transport module 1212 continuing to maintain the operation mode and consuming too much electric power, if the trigger module 1214 is not triggered again within a preset time after the first wireless transport module 1212 enters the operation mode (for example, 5 minutes), the first wireless transport module 1212 enters the sleep mode from the operation mode and stops linking with the second wireless transport module 1222 of the mobile terminal apparatus 1220.
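The sleep/operation behaviour with the preset idle timeout can be sketched as a small state machine. This is an illustrative model only, with time simulated as plain numbers; the class and its member names are assumptions, not the patent's design.

```python
PRESET_TIMEOUT = 5 * 60  # preset time in seconds (5 minutes in the embodiment)

class WirelessModule:
    def __init__(self):
        self.mode = "sleep"      # sleep: radio off, no link possible
        self.linked = False
        self.last_trigger = None

    def trigger(self, now):
        """Trigger module pressed: wake the module (if sleeping) and link."""
        self.mode = "operation"  # operation: radio on, may link with the peer
        self.linked = True
        self.last_trigger = now

    def tick(self, now):
        """Periodic check: fall back to sleep after the preset idle time
        and drop the link with the peer module."""
        if self.mode == "operation" and now - self.last_trigger >= PRESET_TIMEOUT:
            self.mode = "sleep"
            self.linked = False

m = WirelessModule()
m.trigger(now=0)
m.tick(now=299)   # still within the preset time: stays in operation mode
state_before = (m.mode, m.linked)
m.tick(now=300)   # idle for a full 5 minutes: back to sleep, link dropped
state_after = (m.mode, m.linked)
```

A fresh trigger at any point would call `trigger()` again, restarting the idle countdown, which matches the "not triggered again within the preset time" condition above.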
Afterwards, in step 1404, the second wireless transport module 1222 of the mobile terminal apparatus 1220 receives the wireless transmission signal to start the voice system 1221. Then, in step S1406, when the second wireless transport module 1222 detects the wireless transmission signal, the mobile terminal apparatus 1220 starts the voice system 1221, and the phonetic sampling module 1224 of the voice system 1221 begins to receive a voice signal, for example "What is the temperature today?", "Call Lao Wang.", or "Please look up a telephone number."
In step S1408, the phonetic sampling module 1224 sends the voice signal to the speech understanding module 1232 in the server 1230, so that the speech understanding module 1232 parses the voice signal and produces a response message. Furthermore, the speech recognition module 12322 in the speech understanding module 1232 can receive the voice signal from the phonetic sampling module 1224 and divide the voice signal into a plurality of segment semantics, and the speech processing module 12324 can then perform speech understanding on these segment semantics to produce a response message for responding to the voice signal.
In another embodiment of the invention, the mobile terminal apparatus 1220 can further receive the response message produced by the speech processing module 12324, and accordingly output the response message through the voice output interface 1227 or perform the operation assigned by the response message. In step S1410, the voice synthesis module 1226 of the mobile terminal apparatus 1220 receives the response message produced by the speech understanding module 1232 and performs speech synthesis according to the content of the response message (such as vocabulary or phrases) to produce a voice answer. And in step S1412, the voice output interface 1227 receives and outputs this voice answer.
For example, when the user presses the trigger module 1214 in the auxiliary actuating apparatus 1210, the first wireless transport module 1212 sends a wireless transmission signal to the second wireless transport module 1222, so that the mobile terminal apparatus 1220 starts the phonetic sampling module 1224 of the voice system 1221. Here, supposing the voice signal from the user is an inquiry sentence, for example "What is the temperature today?", the phonetic sampling module 1224 receives the voice signal and sends it to the speech understanding module 1232 in the server 1230 to be parsed, and the speech understanding module 1232 sends the response message produced by the parsing back to the mobile terminal apparatus 1220. Supposing the content of the response message produced by the speech understanding module 1232 is "30 °C", the voice synthesis module 1226 can synthesize the message "30 °C" into a voice answer, and the voice output interface 1227 can report this voice answer to the user.
In another embodiment, supposing the voice signal from the user is an imperative sentence, for example "Call Lao Wang.", the speech understanding module 1232 can identify this imperative sentence as "a request to dial Lao Wang". In addition, the speech understanding module 1232 can produce a new response message, for example "Please confirm whether to call Lao Wang", and send this new response message to the mobile terminal apparatus 1220. Here, the voice synthesis module 1226 can synthesize this new response message into a voice answer and report it to the user through the voice output interface 1227. Further, when the user replies with an affirmative answer such as "yes", the phonetic sampling module 1224 similarly receives and transmits this voice signal to the server 1230 for the speech understanding module 1232 to parse. After the speech understanding module 1232 finishes parsing, a dialing command can be recorded in the response message and sent to the mobile terminal apparatus 1220. At this time, the communication module 1228 can look up the telephone number of "Lao Wang" according to the contact information recorded in a call database, and establish a call connection between the mobile terminal apparatus 1220 and another electronic device, that is, dial "Lao Wang".
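The confirm-then-dial exchange above can be sketched as follows. This is a minimal illustration under stated assumptions: the contact book contents, the helper names, and the fictitious telephone number are all invented for the example, and the real system performs these steps with voice on both sides.

```python
# Hypothetical call database on the terminal side (illustrative data only).
CALL_DATABASE = {"Lao Wang": "0912-345-678"}

def handle_command(utterance):
    """Server side: turn an imperative 'call X' into a confirmation
    response message, as the speech understanding module is described to do."""
    if utterance.startswith("call "):
        contact = utterance[5:]
        return {"action": "confirm",
                "prompt": "Please confirm: call " + contact + "?",
                "contact": contact}
    return {"action": "unknown"}

def handle_reply(reply, pending):
    """Server side: an affirmative reply to a pending confirmation yields
    a response message carrying a dialing command."""
    if reply in ("yes", "ok") and pending["action"] == "confirm":
        return {"action": "dial", "contact": pending["contact"]}
    return {"action": "cancel"}

def dial(command):
    """Terminal side: resolve the contact locally and set up the call."""
    number = CALL_DATABASE.get(command["contact"])
    return f"dialing {command['contact']} at {number}" if number else "no number"

pending = handle_command("call Lao Wang")
command = handle_reply("yes", pending)
result = dial(command)
```

Note that in this sketch the number lookup happens on the terminal, which anticipates the variant discussed below in which the complete address book is not uploaded to the cloud server.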
In other embodiments, in addition to the above speech control system 1200, the speech control system 1300 or another similar system can also be used to carry out the above operating method, without being limited to the above embodiments.
To sum up, in the speech control system and method of the present embodiment, the auxiliary actuating apparatus can wirelessly open the voice function of the mobile terminal apparatus. Moreover, the body of the auxiliary actuating apparatus can be a "life-style" article that the user can reach conveniently, such as ornaments like a ring, wrist-watch, earrings, necklace, or glasses, various portable articles carried on the person, or a mounted component, for example a driving accessory disposed on the steering wheel, without being limited to the above. Thus, compared with the discomfort of additionally wearing a hands-free earphone/microphone, opening the voice system in the mobile terminal apparatus 1220 with the auxiliary actuating apparatus 1210 of the present application is more convenient.
It should be noted that the above server 1230 with the speech understanding module may be a web server or a cloud server, and a cloud server may involve the problem of the user's privacy. For example, the user must upload a complete address book to the cloud server before operations related to the address book, such as making calls or sending short messages, can be completed. Even if the cloud server adopts an encrypted connection and uses the data immediately without saving it, the user's concerns are still difficult to eliminate. Accordingly, another speech control method and its corresponding voice interaction system are provided below, in which the mobile terminal apparatus can carry out the voice interaction service with the cloud server without uploading the complete address book. In order to make the content of the invention clearer, embodiments by which the invention can actually be implemented are given below as examples.
Although the invention has been disclosed above with embodiments, they are not intended to limit the invention. Anyone with ordinary knowledge in the technical field may make slight changes and refinements without departing from the spirit and scope of the invention, so the protection scope of the invention shall be defined by the appended claims.

Claims (15)

1. A selection method based on speech recognition, comprising:
Receiving a first input voice;
Performing speech recognition on the first input voice to produce a first word string;
Performing natural language processing on the first word string to produce a first semantic analysis corresponding to the first input voice;
Selecting a corresponding portion of a plurality of data according to the first semantic analysis;
When the quantity of the selected data is 1, performing a corresponding operation according to the type of the selected data;
When the quantity of the selected data is greater than 1, displaying a data list according to the selected data and receiving a second input voice;
Performing speech recognition on the second input voice to produce a second word string;
Performing natural language processing on the second word string to produce a second semantic analysis corresponding to the second input voice; and
Selecting a corresponding portion of the plurality of data in the data list according to the second semantic analysis.
2. The selection method based on speech recognition according to claim 1, wherein the step of selecting a corresponding portion of the plurality of data according to the first semantic analysis comprises:
Comparing a plurality of relevant entries of each of the plurality of data with the first semantic analysis; and
When the plurality of relevant entries of a datum at least partly match the first semantic analysis, regarding that datum as data corresponding to the first input voice.
3. The selection method based on speech recognition according to claim 1, wherein the step of selecting a corresponding portion of the plurality of data in the data list according to the second semantic analysis comprises:
Judging whether the second semantic analysis comprises an order vocabulary indicating an order;
When the second semantic analysis comprises the order vocabulary indicating an order, selecting the data located at the corresponding position in the data list according to the order vocabulary;
When the second semantic analysis does not comprise the order vocabulary indicating an order, comparing the plurality of relevant entries of each of the plurality of data in the data list with the second semantic analysis to determine a plurality of degrees of correspondence between the plurality of data and the second input voice; and
Determining, according to the plurality of degrees of correspondence, one of the plurality of data in the data list that corresponds to the second input voice.
4. The selection method based on speech recognition according to claim 3, wherein the step of determining, according to the plurality of degrees of correspondence, one of the data in the data list that corresponds to the second input voice comprises:
Selecting, from the plurality of data, the datum whose degree of correspondence is the highest as corresponding to the second input voice.
5. The selection method based on speech recognition according to claim 1, wherein the step of performing a corresponding operation according to the type of the selected data comprises:
When the type of the selected data is a music file, playing the selected data as music;
When the type of the selected data is a video file, playing the selected data as video;
When the type of the selected data is a web page file, displaying the selected data;
When the type of the selected data is a picture file, displaying the selected data as an image; and
When the type of the selected data is a business card file, dialing according to the selected data.
6. A mobile terminal apparatus, comprising:
A voice receiving unit, receiving a first input voice and a second input voice;
A display unit, for displaying a data list;
A storage unit, for storing a plurality of data; and
A data processing unit, coupled to the voice receiving unit, the display unit, and the storage unit, wherein the data processing unit performs speech recognition on the first input voice to produce a first word string, performs natural language processing on the first word string to produce a first semantic analysis corresponding to the first input voice, and selects a corresponding portion of the plurality of data according to the first semantic analysis; when the quantity of the selected data is 1, the data processing unit performs a corresponding operation according to the type of the selected data; when the quantity of the selected data is greater than 1, the data processing unit controls the display unit to display the data list according to the selected data, performs speech recognition on the second input voice to produce a second word string, performs natural language processing on the second word string to produce a second semantic analysis corresponding to the second input voice, and selects a corresponding portion of the plurality of data in the data list according to the second semantic analysis.
7. The mobile terminal apparatus according to claim 6, wherein the data processing unit compares a plurality of relevant entries of each of the plurality of data with the first semantic analysis, and when the plurality of relevant entries of a datum at least partly match the first semantic analysis, regards that datum as data corresponding to the first input voice.
8. The mobile terminal apparatus according to claim 6, wherein the data processing unit judges whether the second semantic analysis comprises an order vocabulary indicating an order; when the second semantic analysis comprises the order vocabulary indicating an order, the data processing unit selects the data located at the corresponding position in the data list according to the order vocabulary; when the second semantic analysis does not comprise the order vocabulary indicating an order, the data processing unit compares the plurality of relevant entries of each of the plurality of data in the data list with the second semantic analysis to determine a plurality of degrees of correspondence between the plurality of data and the second input voice, and determines, according to the plurality of degrees of correspondence, one of the plurality of data in the data list that corresponds to the second input voice.
9. The mobile terminal apparatus according to claim 8, wherein the data processing unit selects, from the plurality of data, the datum whose degree of correspondence is the highest as corresponding to the second input voice.
10. The mobile terminal apparatus according to claim 6, wherein when the type of the selected data is a music file, the data processing unit plays music according to the selected data; when the type of the selected data is a video file, the data processing unit plays video according to the selected data; when the type of the selected data is a web page file, the data processing unit displays the selected data; when the type of the selected data is a picture file, the data processing unit displays an image according to the selected data; and when the type of the selected data is a business card file, the data processing unit dials according to the selected data.
11. An information system, comprising:
A server, for storing a plurality of data and having a speech recognition function; and
A mobile terminal apparatus, comprising:
A voice receiving unit, receiving a first input voice and a second input voice;
A display unit, for displaying a data list;
A data processing unit, coupled to the voice receiving unit, the display unit, and the server, wherein the data processing unit performs speech recognition on the first input voice through the server to produce a first word string, natural language processing is performed on the first word string to produce a first semantic analysis corresponding to the first input voice, and the server selects a corresponding portion of the plurality of data according to the first semantic analysis and sends it to the data processing unit; when the quantity of the selected data is 1, the data processing unit performs a corresponding operation according to the type of the selected data; when the quantity of the selected data is greater than 1, the data processing unit controls the display unit to display the data list according to the selected data, the data processing unit performs speech recognition on the second input voice through the server to produce a second word string, natural language processing is performed on the second word string to produce a second semantic analysis corresponding to the second input voice, and the server selects a corresponding portion of the plurality of data in the data list according to the second semantic analysis and sends it to the data processing unit.
12. The information system according to claim 11, wherein the server compares a plurality of relevant entries of each of the plurality of data with the first semantic analysis, and when the relevant entries of a datum at least partially match the first semantic analysis, that datum is regarded as data corresponding to the first input voice.
13. The information system according to claim 11, wherein the server determines whether the second semantic analysis includes an order vocabulary indicating an ordinal position; when the second semantic analysis includes the order vocabulary, the server selects the data located at the corresponding position in the data list according to the order vocabulary; and when the second semantic analysis does not include the order vocabulary, the server compares the relevant entries of each datum in the data list with the second semantic analysis to determine a plurality of degrees of correspondence between the plurality of data and the second input voice, and determines, according to the degrees of correspondence, which one of the data in the data list corresponds to the second input voice.
14. The information system according to claim 13, wherein the server selects, among the plurality of data, the datum with the highest degree of correspondence as the datum corresponding to the second input voice.
15. The information system according to claim 11, wherein when the type of the selected data is a music file, the data processing unit plays music according to the selected data; when the type of the selected data is a video file, the data processing unit plays video according to the selected data; when the type of the selected data is a web page file, the data processing unit displays the web page according to the selected data; when the type of the selected data is a picture file, the data processing unit displays the picture according to the selected data; and when the type of the selected data is a business card file, the data processing unit dials a call according to the selected data.
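The claims above describe a two-stage, voice-driven selection flow: a first utterance is recognized and semantically parsed to filter the stored data; when exactly one record matches, it is acted on directly, and when several match, a list is shown and a second utterance either names a list position (an "order vocabulary" such as "the second one") or is matched again against each record's relevant entries, with the highest degree of correspondence winning. A minimal sketch of that control flow follows; the patent specifies no implementation, so every name here (`ORDINALS`, `partial_match`, `select_round_one`, and so on) and the keyword-overlap scoring are invented for illustration only:

```python
# Illustrative sketch of the selection flow in claims 11-14.
# All identifiers and the matching heuristics are hypothetical.

ORDINALS = {"first": 0, "second": 1, "third": 2}  # "order vocabulary" (claim 13)

def partial_match(record_keywords, semantics):
    # Claim 12: a record matches when any of its relevant entries
    # at least partially corresponds to the semantic analysis.
    return any(kw in semantics for kw in record_keywords)

def select_round_one(semantics, data_store):
    # Stage 1: filter every stored record against the first utterance.
    hits = [r for r in data_store if partial_match(r["keywords"], semantics)]
    if len(hits) == 1:
        return ("act", hits[0])   # single hit: act on it by file type
    return ("list", hits)         # several hits: display a list, await stage 2

def select_round_two(semantics, listed):
    # Stage 2 (claim 13): an ordinal picks by list position; otherwise take
    # the record with the highest degree of correspondence (claim 14).
    for word, index in ORDINALS.items():
        if word in semantics:
            return listed[index]
    def score(record):  # naive correspondence: count of matching keywords
        return sum(kw in semantics for kw in record["keywords"])
    return max(listed, key=score)
```

For example, a first utterance matching two songs returns `("list", …)`, after which either "the second one" (ordinal) or "the rain song" (keyword correspondence) resolves the choice.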
CN2012105930794A 2012-12-31 2012-12-31 Voice recognition based selecting method and mobile terminal device and information system thereof Pending CN103021403A (en)

Priority Applications (4)

Application Number Priority Date Filing Date Title
CN2012105930794A CN103021403A (en) 2012-12-31 2012-12-31 Voice recognition based selecting method and mobile terminal device and information system thereof
CN201710007339.8A CN106847278A (en) 2012-12-31 2013-05-17 System of selection and its mobile terminal apparatus and information system based on speech recognition
CN2013101828630A CN103280218A (en) 2012-12-31 2013-05-17 Voice recognition-based selection method and mobile terminal device and information system thereof
TW102121404A TWI511124B (en) 2012-12-31 2013-06-17 Selection method based on speech recognition and mobile terminal device and information system using the same

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN2012105930794A CN103021403A (en) 2012-12-31 2012-12-31 Voice recognition based selecting method and mobile terminal device and information system thereof

Publications (1)

Publication Number Publication Date
CN103021403A true CN103021403A (en) 2013-04-03

Family

ID=47969935

Family Applications (3)

Application Number Title Priority Date Filing Date
CN2012105930794A Pending CN103021403A (en) 2012-12-31 2012-12-31 Voice recognition based selecting method and mobile terminal device and information system thereof
CN201710007339.8A Pending CN106847278A (en) 2012-12-31 2013-05-17 System of selection and its mobile terminal apparatus and information system based on speech recognition
CN2013101828630A Pending CN103280218A (en) 2012-12-31 2013-05-17 Voice recognition-based selection method and mobile terminal device and information system thereof

Family Applications After (2)

Application Number Title Priority Date Filing Date
CN201710007339.8A Pending CN106847278A (en) 2012-12-31 2013-05-17 System of selection and its mobile terminal apparatus and information system based on speech recognition
CN2013101828630A Pending CN103280218A (en) 2012-12-31 2013-05-17 Voice recognition-based selection method and mobile terminal device and information system thereof

Country Status (2)

Country Link
CN (3) CN103021403A (en)
TW (1) TWI511124B (en)

Cited By (23)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN103677566A (en) * 2013-11-27 2014-03-26 北京百纳威尔科技有限公司 Picture editing method and picture editing device
CN104243666A (en) * 2013-06-13 2014-12-24 腾讯科技(深圳)有限公司 Language processing method and device
CN104424944A (en) * 2013-08-19 2015-03-18 联想(北京)有限公司 Information processing method and electronic device
CN104636323A (en) * 2013-11-07 2015-05-20 腾讯科技(深圳)有限公司 Method and device for processing voice texts
CN104681025A (en) * 2013-11-26 2015-06-03 现代摩比斯株式会社 System for command operation using speech recognition and working method thereof
CN105335498A (en) * 2015-10-23 2016-02-17 广东小天才科技有限公司 Method and system for carrying out information recommendation based on voice information
CN105592067A (en) * 2014-11-07 2016-05-18 三星电子株式会社 Speech signal processing method and speech signal processing apparatus
CN106408200A (en) * 2016-09-28 2017-02-15 孙腾 Mutual assistance management and control system and method
CN106816149A (en) * 2015-12-02 2017-06-09 通用汽车环球科技运作有限责任公司 The priorization content loading of vehicle automatic speech recognition system
CN106952646A (en) * 2017-02-27 2017-07-14 深圳市朗空亿科科技有限公司 A kind of robot interactive method and system based on natural language
CN107452378A (en) * 2017-08-15 2017-12-08 北京百度网讯科技有限公司 Voice interactive method and device based on artificial intelligence
WO2018133307A1 (en) * 2017-01-20 2018-07-26 华为技术有限公司 Method and terminal for implementing voice control
CN108521858A (en) * 2016-12-30 2018-09-11 谷歌有限责任公司 The device identifier relevant operation processing of packet-based data communication
CN109955270A (en) * 2017-12-22 2019-07-02 威刚科技股份有限公司 Sound options select System and method for and the intelligent robot using it
CN110111788A (en) * 2019-05-06 2019-08-09 百度在线网络技术(北京)有限公司 The method and apparatus of interactive voice, terminal, computer-readable medium
CN110581772A (en) * 2019-09-06 2019-12-17 腾讯科技(深圳)有限公司 Instant messaging message interaction method and device and computer readable storage medium
CN110827815A (en) * 2019-11-07 2020-02-21 深圳传音控股股份有限公司 Voice recognition method, terminal, system and computer storage medium
CN110942769A (en) * 2018-09-20 2020-03-31 九阳股份有限公司 Multi-turn dialogue response system based on directed graph
CN110990598A (en) * 2019-11-18 2020-04-10 北京声智科技有限公司 Resource retrieval method and device, electronic equipment and computer-readable storage medium
CN111479196A (en) * 2016-02-22 2020-07-31 搜诺思公司 Voice control for media playback system
CN112002321A (en) * 2020-08-11 2020-11-27 海信电子科技(武汉)有限公司 Display device, server and voice interaction method
CN112562651A (en) * 2020-11-26 2021-03-26 杭州讯酷科技有限公司 Method for generating page based on intelligent recognition of keywords of natural language
US11308936B2 (en) 2014-11-07 2022-04-19 Samsung Electronics Co., Ltd. Speech signal processing method and speech signal processing apparatus

Families Citing this family (74)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US9318108B2 (en) 2010-01-18 2016-04-19 Apple Inc. Intelligent automated assistant
US8977255B2 (en) 2007-04-03 2015-03-10 Apple Inc. Method and system for operating a multi-function portable electronic device using voice-activation
US8676904B2 (en) 2008-10-02 2014-03-18 Apple Inc. Electronic devices with voice command and contextual data processing capabilities
US10706373B2 (en) 2011-06-03 2020-07-07 Apple Inc. Performing actions associated with task items that represent tasks to perform
US10417037B2 (en) 2012-05-15 2019-09-17 Apple Inc. Systems and methods for integrating third party services with a digital assistant
CN103139378A (en) * 2012-12-31 2013-06-05 威盛电子股份有限公司 Mobile terminal device and method for automatically opening voice output port of mobile terminal device
JP2016508007A (en) 2013-02-07 2016-03-10 アップル インコーポレイテッド Voice trigger for digital assistant
US10652394B2 (en) 2013-03-14 2020-05-12 Apple Inc. System and method for processing voicemail
US10748529B1 (en) 2013-03-15 2020-08-18 Apple Inc. Voice activated device for use with a voice-based digital assistant
US10176167B2 (en) 2013-06-09 2019-01-08 Apple Inc. System and method for inferring user intent from speech inputs
US10037758B2 (en) * 2014-03-31 2018-07-31 Mitsubishi Electric Corporation Device and method for understanding user intent
US9966065B2 (en) 2014-05-30 2018-05-08 Apple Inc. Multi-command single utterance input method
US10170123B2 (en) 2014-05-30 2019-01-01 Apple Inc. Intelligent assistant for home automation
US9715875B2 (en) 2014-05-30 2017-07-25 Apple Inc. Reducing the need for manual start/end-pointing and trigger phrases
US9338493B2 (en) 2014-06-30 2016-05-10 Apple Inc. Intelligent automated assistant for TV user interactions
CN104601202B (en) * 2014-12-23 2019-04-23 惠州Tcl移动通信有限公司 Method, terminal and the bluetooth equipment of file search are realized based on Bluetooth technology
US9886953B2 (en) 2015-03-08 2018-02-06 Apple Inc. Virtual assistant activation
US10200824B2 (en) 2015-05-27 2019-02-05 Apple Inc. Systems and methods for proactively identifying and surfacing relevant content on a touch-sensitive device
US20160378747A1 (en) 2015-06-29 2016-12-29 Apple Inc. Virtual assistant for media playback
CN105161098A (en) * 2015-07-31 2015-12-16 北京奇虎科技有限公司 Speech recognition method and speech recognition device for interaction system
US10671428B2 (en) 2015-09-08 2020-06-02 Apple Inc. Distributed personal assistant
US10331312B2 (en) 2015-09-08 2019-06-25 Apple Inc. Intelligent automated assistant in a media environment
US10747498B2 (en) 2015-09-08 2020-08-18 Apple Inc. Zero latency digital assistant
US10018977B2 (en) * 2015-10-05 2018-07-10 Savant Systems, Llc History-based key phrase suggestions for voice control of a home automation system
US10691473B2 (en) 2015-11-06 2020-06-23 Apple Inc. Intelligent automated assistant in a messaging environment
US10956666B2 (en) 2015-11-09 2021-03-23 Apple Inc. Unconventional virtual assistant interactions
US10223066B2 (en) 2015-12-23 2019-03-05 Apple Inc. Proactive assistance based on dialog communication between devices
US10586535B2 (en) 2016-06-10 2020-03-10 Apple Inc. Intelligent digital assistant in a multi-tasking environment
DK179415B1 (en) 2016-06-11 2018-06-14 Apple Inc Intelligent device arbitration and control
DK201670540A1 (en) 2016-06-11 2018-01-08 Apple Inc Application integration with a digital assistant
CN110941500B (en) * 2016-08-29 2023-05-09 创新先进技术有限公司 Interface display method and device
TWI601071B (en) * 2016-09-30 2017-10-01 亞旭電腦股份有限公司 Method, electronic device and computer with non-volatile storage device for inputting voice signal of phone set to smart device
CN108228637B (en) * 2016-12-21 2020-09-04 中国电信股份有限公司 Automatic response method and system for natural language client
DK201770383A1 (en) 2017-05-09 2018-12-14 Apple Inc. User interface for correcting recognition errors
CN110603586B (en) * 2017-05-09 2020-09-22 苹果公司 User interface for correcting recognition errors
US10726832B2 (en) 2017-05-11 2020-07-28 Apple Inc. Maintaining privacy of personal information
DK180048B1 (en) 2017-05-11 2020-02-04 Apple Inc. MAINTAINING THE DATA PROTECTION OF PERSONAL INFORMATION
DK179745B1 (en) 2017-05-12 2019-05-01 Apple Inc. SYNCHRONIZATION AND TASK DELEGATION OF A DIGITAL ASSISTANT
DK179496B1 (en) 2017-05-12 2019-01-15 Apple Inc. USER-SPECIFIC Acoustic Models
DK201770429A1 (en) 2017-05-12 2018-12-14 Apple Inc. Low-latency intelligent automated assistant
US20180336892A1 (en) 2017-05-16 2018-11-22 Apple Inc. Detecting a trigger of a digital assistant
US10303715B2 (en) 2017-05-16 2019-05-28 Apple Inc. Intelligent automated assistant for media exploration
CN110770819B (en) 2017-06-15 2023-05-12 北京嘀嘀无限科技发展有限公司 Speech recognition system and method
TWI678672B (en) * 2018-01-18 2019-12-01 中國信託金融控股股份有限公司 Accounting information query method and accounting system
CN110111793B (en) * 2018-02-01 2023-07-14 腾讯科技(深圳)有限公司 Audio information processing method and device, storage medium and electronic device
US10818288B2 (en) 2018-03-26 2020-10-27 Apple Inc. Natural assistant interaction
US10928918B2 (en) 2018-05-07 2021-02-23 Apple Inc. Raise to speak
US11145294B2 (en) 2018-05-07 2021-10-12 Apple Inc. Intelligent automated assistant for delivering content from user experiences
CN110459211B (en) * 2018-05-07 2023-06-23 阿里巴巴集团控股有限公司 Man-machine conversation method, client, electronic equipment and storage medium
US10892996B2 (en) 2018-06-01 2021-01-12 Apple Inc. Variable latency device coordination
DK180639B1 (en) 2018-06-01 2021-11-04 Apple Inc DISABILITY OF ATTENTION-ATTENTIVE VIRTUAL ASSISTANT
DK179822B1 (en) 2018-06-01 2019-07-12 Apple Inc. Voice interaction at a primary device to access call functionality of a companion device
CN108806685A (en) * 2018-07-02 2018-11-13 英业达科技有限公司 Speech control system and its method
US11462215B2 (en) 2018-09-28 2022-10-04 Apple Inc. Multi-modal inputs for voice commands
CN109712619B (en) * 2018-12-24 2020-12-11 出门问问信息科技有限公司 Method and device for decoupling dialog hypothesis and executing dialog hypothesis and voice interaction system
CN109947911B (en) * 2019-01-14 2023-06-16 达闼机器人股份有限公司 Man-machine interaction method and device, computing equipment and computer storage medium
US11348573B2 (en) 2019-03-18 2022-05-31 Apple Inc. Multimodality in digital assistant systems
CN110136705B (en) * 2019-04-10 2022-06-14 华为技术有限公司 Man-machine interaction method and electronic equipment
US11307752B2 (en) 2019-05-06 2022-04-19 Apple Inc. User configurable task triggers
DK201970509A1 (en) 2019-05-06 2021-01-15 Apple Inc Spoken notifications
TWI751560B (en) * 2019-05-20 2022-01-01 仁寶電腦工業股份有限公司 Speech-to-text device and speech-to-text method
US11140099B2 (en) 2019-05-21 2021-10-05 Apple Inc. Providing message response suggestions
DK201970511A1 (en) 2019-05-31 2021-02-15 Apple Inc Voice identification in digital assistant systems
DK180129B1 (en) 2019-05-31 2020-06-02 Apple Inc. User activity shortcut suggestions
US11468890B2 (en) 2019-06-01 2022-10-11 Apple Inc. Methods and user interfaces for voice-based control of electronic devices
CN110706704A (en) * 2019-10-17 2020-01-17 四川长虹电器股份有限公司 Method, device and computer equipment for generating voice interaction prototype
US11061543B1 (en) 2020-05-11 2021-07-13 Apple Inc. Providing relevant data items based on context
US11183193B1 (en) 2020-05-11 2021-11-23 Apple Inc. Digital assistant hardware abstraction
US11755276B2 (en) 2020-05-12 2023-09-12 Apple Inc. Reducing description length based on confidence
US11490204B2 (en) 2020-07-20 2022-11-01 Apple Inc. Multi-device audio adjustment coordination
US11438683B2 (en) 2020-07-21 2022-09-06 Apple Inc. User identification using headphones
CN112331185B (en) * 2020-11-10 2023-08-11 珠海格力电器股份有限公司 Voice interaction method, system, storage medium and electronic equipment
CN113470649A (en) * 2021-08-18 2021-10-01 三星电子(中国)研发中心 Voice interaction method and device
TWI808038B (en) * 2022-11-14 2023-07-01 犀動智能科技股份有限公司 Media file selection method and service system and computer program product

Family Cites Families (8)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
JP4542974B2 (en) * 2005-09-27 2010-09-15 株式会社東芝 Speech recognition apparatus, speech recognition method, and speech recognition program
JP4878477B2 (en) * 2006-01-18 2012-02-15 富士通株式会社 Information retrieval appropriateness determination processing program and operator skill determination processing program
TWI312945B (en) * 2006-06-07 2009-08-01 Ind Tech Res Inst Method and apparatus for multimedia data management
TW200943277A (en) * 2008-04-07 2009-10-16 Mitac Int Corp Search methods and systems, and machine readable medium thereof
CN101577115A (en) * 2008-05-09 2009-11-11 台达电子工业股份有限公司 Voice input system and voice input method
CN101599062B (en) * 2008-06-06 2011-06-15 佛山市顺德区顺达电脑厂有限公司 Search method and search system
US20090326947A1 (en) * 2008-06-27 2009-12-31 James Arnold System and method for spoken topic or criterion recognition in digital media and contextual advertising
CN102221985A (en) * 2010-04-16 2011-10-19 韦宏伟 Chinese and control command voice recognition input method and device

Cited By (36)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN104243666A (en) * 2013-06-13 2014-12-24 腾讯科技(深圳)有限公司 Language processing method and device
CN104243666B (en) * 2013-06-13 2017-10-31 腾讯科技(深圳)有限公司 language processing method and device
CN104424944B (en) * 2013-08-19 2018-01-23 联想(北京)有限公司 A kind of information processing method and electronic equipment
CN104424944A (en) * 2013-08-19 2015-03-18 联想(北京)有限公司 Information processing method and electronic device
CN104636323A (en) * 2013-11-07 2015-05-20 腾讯科技(深圳)有限公司 Method and device for processing voice texts
CN104636323B (en) * 2013-11-07 2018-04-03 腾讯科技(深圳)有限公司 Handle the method and device of speech text
CN104681025A (en) * 2013-11-26 2015-06-03 现代摩比斯株式会社 System for command operation using speech recognition and working method thereof
CN103677566A (en) * 2013-11-27 2014-03-26 北京百纳威尔科技有限公司 Picture editing method and picture editing device
CN105592067B (en) * 2014-11-07 2020-07-28 三星电子株式会社 Voice signal processing method, terminal and server for realizing same
CN105592067A (en) * 2014-11-07 2016-05-18 三星电子株式会社 Speech signal processing method and speech signal processing apparatus
US10600405B2 (en) 2014-11-07 2020-03-24 Samsung Electronics Co., Ltd. Speech signal processing method and speech signal processing apparatus
US11308936B2 (en) 2014-11-07 2022-04-19 Samsung Electronics Co., Ltd. Speech signal processing method and speech signal processing apparatus
CN105335498A (en) * 2015-10-23 2016-02-17 广东小天才科技有限公司 Method and system for carrying out information recommendation based on voice information
CN106816149A (en) * 2015-12-02 2017-06-09 通用汽车环球科技运作有限责任公司 The priorization content loading of vehicle automatic speech recognition system
CN111479196B (en) * 2016-02-22 2022-03-29 搜诺思公司 Voice control method of media playback system
CN111479196A (en) * 2016-02-22 2020-07-31 搜诺思公司 Voice control for media playback system
CN106408200A (en) * 2016-09-28 2017-02-15 孙腾 Mutual assistance management and control system and method
CN108521858A (en) * 2016-12-30 2018-09-11 谷歌有限责任公司 The device identifier relevant operation processing of packet-based data communication
CN108521858B (en) * 2016-12-30 2021-02-05 谷歌有限责任公司 Device identifier dependent handling of operations for packet-based data communication
WO2018133307A1 (en) * 2017-01-20 2018-07-26 华为技术有限公司 Method and terminal for implementing voice control
US11238860B2 (en) 2017-01-20 2022-02-01 Huawei Technologies Co., Ltd. Method and terminal for implementing speech control
CN106952646A (en) * 2017-02-27 2017-07-14 深圳市朗空亿科科技有限公司 A kind of robot interactive method and system based on natural language
CN107452378A (en) * 2017-08-15 2017-12-08 北京百度网讯科技有限公司 Voice interactive method and device based on artificial intelligence
CN109955270B (en) * 2017-12-22 2022-04-26 威刚科技股份有限公司 Voice option selection system and method and intelligent robot using same
CN109955270A (en) * 2017-12-22 2019-07-02 威刚科技股份有限公司 Sound options select System and method for and the intelligent robot using it
CN110942769A (en) * 2018-09-20 2020-03-31 九阳股份有限公司 Multi-turn dialogue response system based on directed graph
CN110111788A (en) * 2019-05-06 2019-08-09 百度在线网络技术(北京)有限公司 The method and apparatus of interactive voice, terminal, computer-readable medium
CN110111788B (en) * 2019-05-06 2022-02-08 阿波罗智联(北京)科技有限公司 Voice interaction method and device, terminal and computer readable medium
CN110581772B (en) * 2019-09-06 2020-10-13 腾讯科技(深圳)有限公司 Instant messaging message interaction method and device and computer readable storage medium
CN110581772A (en) * 2019-09-06 2019-12-17 腾讯科技(深圳)有限公司 Instant messaging message interaction method and device and computer readable storage medium
CN110827815A (en) * 2019-11-07 2020-02-21 深圳传音控股股份有限公司 Voice recognition method, terminal, system and computer storage medium
CN110827815B (en) * 2019-11-07 2022-07-15 深圳传音控股股份有限公司 Voice recognition method, terminal, system and computer storage medium
CN110990598A (en) * 2019-11-18 2020-04-10 北京声智科技有限公司 Resource retrieval method and device, electronic equipment and computer-readable storage medium
CN112002321A (en) * 2020-08-11 2020-11-27 海信电子科技(武汉)有限公司 Display device, server and voice interaction method
CN112002321B (en) * 2020-08-11 2023-09-19 海信电子科技(武汉)有限公司 Display device, server and voice interaction method
CN112562651A (en) * 2020-11-26 2021-03-26 杭州讯酷科技有限公司 Method for generating page based on intelligent recognition of keywords of natural language

Also Published As

Publication number Publication date
CN103280218A (en) 2013-09-04
CN106847278A (en) 2017-06-13
TW201426736A (en) 2014-07-01
TWI511124B (en) 2015-12-01

Similar Documents

Publication Publication Date Title
CN103021403A (en) Voice recognition based selecting method and mobile terminal device and information system thereof
CN103020047A (en) Method for revising voice response and natural language dialogue system
CN103049567A (en) Retrieval method, retrieval system and natural language understanding system
CN103077165A (en) Natural language dialogue method and system thereof
US20210173834A1 (en) Method and system for capturing and exploiting user intent in a conversational interaction based information retrieval system
CN109829039B (en) Intelligent chat method, intelligent chat device, computer equipment and storage medium
KR102030078B1 (en) Method of and system for inferring user intent in search input in a conversαtional interaction system
US9471666B2 (en) System and method for supporting natural language queries and requests against a user's personal data cloud
WO2017151400A1 (en) Interpreting and resolving conditional natural language queries
US20100100371A1 (en) Method, System, and Apparatus for Message Generation
CN103442130A (en) Voice control method, mobile terminal device and voice control system
CN102215233A (en) Information system client and information publishing and acquisition methods
CN106575293A (en) Orphaned utterance detection system and method
CN107368548A (en) Intelligent government affairs service interaction method and system
WO2017032084A1 (en) Information output method and apparatus
CN111563151B (en) Information acquisition method, session configuration method, device and storage medium
CN109582869B (en) Data processing method and device and data processing device
CN106407287A (en) Multimedia resource pushing method and system
CN114186016A (en) Man-machine conversation method, device, equipment and storage medium
CN113726942A (en) Intelligent telephone answering method, system, medium and electronic terminal
TWI578175B (en) Searching method, searching system and nature language understanding system
CN113076397A (en) Intention recognition method and device, electronic equipment and storage medium
WO2024036616A1 (en) Terminal-based question and answer method and apparatus
US20220301553A1 (en) Electronic device and method for providing on-device artificial intelligence service
CN117194620A (en) Information processing method, device, equipment and storage medium

Legal Events

Date Code Title Description
C06 Publication
PB01 Publication
C10 Entry into substantive examination
SE01 Entry into force of request for substantive examination
C02 Deemed withdrawal of patent application after publication (patent law 2001)
WD01 Invention patent application deemed withdrawn after publication

Application publication date: 20130403