CN101178705A - Free-running speech comprehend method and man-machine interactive intelligent system - Google Patents
- Publication number
- CN101178705A; CN200710195720A; CNA2007101957208A
- Authority
- CN
- China
- Prior art keywords
- notion
- state
- meaning
- words
- resource
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Pending
Landscapes
- Machine Translation (AREA)
Abstract
The invention discloses a natural language understanding method comprising the following steps: after receiving natural language input from a user, matching the conceptual language symbols in the input and associating concepts with those symbols; selecting, by comparison against a preset concept dictionary, the concept that best fits the current context, and judging whether it is ambiguous; if it is ambiguous, deriving the concept by means of a corpus and proceeding to the next step; if it is not, deriving the concept directly by the principle of semantic collocation and proceeding to the next step; obtaining a core concept and peripheral concepts through concept identification, wherein the core semantics of the core concept is defined by the computer operation itself and the peripheral semantics of the peripheral concepts is defined by the content of the computer operation; and obtaining the complete semantics by combining the core semantics with the peripheral semantics. The invention also provides a human-computer interaction intelligent system based on this method. The invention recognizes the natural speech input by the user more accurately and thereby provides the user with more intelligent and more complete services.
Description
Technical field
The present invention relates to human-computer interaction technology, and in particular to a natural language understanding method and a human-computer interaction intelligent system.
Background technology
Enterprise business services characterized by automated voice response and human agents have been offered in China for many years. With the development of the Internet and mobile networks, online business applications have become a new window and medium for enterprises. The customer contact modes of existing business applications, such as traditional manual telephone answering and static one-way enterprise websites, are increasingly unsuited to the business requirements of the Internet era. Business services built on automated voice and human agents now face the following challenges:
As market competition intensifies, enterprises continually launch new products and services. As the department that deals with customers directly, the customer service center must be thoroughly familiar with every product and service the enterprise offers. Enterprises therefore usually give customer service staff special training before a new product is launched, and as new products are introduced faster, the frequency of this training rises, and with it the cost. Because the volume of information keeps growing, it sometimes exceeds what the staff can master, so that when a customer asks a question the agent has to look up the relevant material on the spot or ask colleagues. Such ad hoc information retrieval makes the customer wait longer and significantly raises the cost of each call. In some cases a customer's call is transferred among several customer service representatives and the customer has to repeat the question each time, which inconveniences the customer and causes dissatisfaction with the service.
As the enterprise's market grows, more customers and potential customers seek product and service information through the customer service center or the enterprise website, yet hardware limits in the enterprise's existing call center prevent its representatives from serving more customers at the same time. The enterprise then faces a choice: increase its investment in equipment and personnel (apply for more lines, upgrade the processing capacity of its telecommunication equipment) or lose customers. Meanwhile, the website cannot interact with customers online in real time. This is a backward and inefficient mode of service that cannot support real-time interaction with large numbers of customers.
Because the pace of work has quickened, many users find it inconvenient to contact the enterprise's customer service center during working hours and usually do so after work or on holidays. For the enterprise, keeping customer service open for long hours means higher costs, and having customer service staff work on holidays brings various management problems.
Product marketing and advertising are central tasks of enterprise business. Existing enterprises cannot know accurately who is interested in their products or who is likely to buy them, so advertisements cannot be targeted at the right customers, and intelligent, precise marketing and advertisement pushing are not possible.
The emergence of intelligent network robots:
Intelligent network robots, with Microsoft's as a representative example, have now appeared. Under the latest definition, a network robot is "an information interaction platform that appears as a contact in instant messaging (IM) software and can chat with you, look up information for you, and play games with you like a real person". IM brings people into closer contact and has wide coverage, which makes it the best carrier for network robots. Developing "intelligent network robots" has become a new focus of the intelligent Web era.
Summary of the invention
In view of this, the present invention proposes a natural language understanding method that recognizes the natural speech input by a user more accurately, together with a human-computer interaction intelligent system based on this method that provides the user with more intelligent and more complete services.
To this end, the natural language understanding method provided by the invention comprises:
After receiving natural language input from the user, matching the conceptual language symbols in the natural language and associating concepts with those symbols;
Selecting, by comparison against a preset concept dictionary, the concept that best fits the current context, and judging whether there is ambiguity; if there is, deriving the concept by corpus techniques and proceeding to the next step; if there is not, deriving the concept directly by the principle of semantic collocation and proceeding to the next step;
Identifying the concepts to obtain a core concept and peripheral concepts, wherein the core concept determines the core semantics through the computer operation itself, and the peripheral concepts determine the peripheral semantics through the content of the computer operation;
Obtaining the complete semantics from the core semantics in combination with the peripheral semantics.
The method further provides a concept dictionary, which mainly comprises: the various relations among concepts and among the attributes of concepts, and the correspondence between each concept and its linguistic symbols;
The natural language matching process comprises: using a matching algorithm to match, in the original text, the conceptual language symbols contained in the concept dictionary.
In this method, the relations among concepts comprise: hypernym-hyponym, synonym, antonym, contrary, whole and part, attribute and host, material and product, subject and event, and content and event relations.
In this method, selecting the concept that best fits the current context applies the principle of semantic collocation together with the principle of context consistency.
In this method, concept identification further comprises using a numeric-quantity recognizer to identify the numeric quantities in the text.
In this method, after the complete semantics is obtained, the method further comprises: identifying the operation content according to the established mapping between natural language semantics and computer operations;
Depending on the operation, identifying the operation content from the command text and locating the resource that the content refers to;
Executing the operation, then outputting and presenting the result.
In this method, the mapping between natural language semantics and computer actions is established by binding the core semantics to the computer operation and its operands.
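The binding of core semantics to computer operations can be pictured as a lookup table. The following is only a minimal sketch under stated assumptions, not the patent's implementation: the concept names (`OPEN`, `DELETE`), the handler functions, and the `execute` helper are all hypothetical.

```python
from typing import Callable, Dict, List

# Hypothetical handlers standing in for real computer operations.
def open_resource(target: str) -> str:
    return f"opened {target}"

def delete_resource(target: str) -> str:
    return f"deleted {target}"

# Assumed binding: core semantics (the operation itself) -> handler.
OPERATIONS: Dict[str, Callable[[str], str]] = {
    "OPEN": open_resource,
    "DELETE": delete_resource,
}

def execute(complete_semantics: List[str]) -> str:
    """Run an ordered sequence: core semantics first, then peripheral semantics."""
    core, *peripheral = complete_semantics
    handler = OPERATIONS[core]             # core semantics selects the operation
    return handler(" ".join(peripheral))   # peripheral semantics supplies its content

result = execute(["OPEN", "report.txt"])
```

In this sketch the first element of the sequence plays the role of the core semantics and the remaining elements play the role of the peripheral semantics (the operand).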
The method identifies the operation content by combining two ways of obtaining the semantics: the sentence-pattern method and the prior-knowledge method.
The sentence-pattern method comprises: collecting the various sentence patterns of each operation, organizing them and storing them in a knowledge base, using a rising-space matching finite-state automaton to recognize the sentence pattern, and then identifying the operation content in the command text according to that sentence pattern.
In this method, locating a resource comprises: determining which directory the file is in, and determining what the file name is.
Determining the file directory and the file name is done by traversing the directory tree and matching file names, and comprises: using a finite-state automaton to extract the keywords in a name, with the keyword dictionary fed as the set of pattern strings into a longest-match finite-state automaton; then using a finite-state automaton to realize "extraction on demand": every attribute value of every resource is added to the finite-state automaton as a pattern string, the automaton is built and used to scan the command text, the automaton outputs the resource attribute values that occur in the command string, and the corresponding resources are located through those attribute values.
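Locating a resource through its attribute values might look roughly like the sketch below. It rests on assumptions: the resource table, its attribute names, and the `locate` helper are hypothetical, and a plain substring search stands in for the finite-state automaton scan.

```python
from typing import Dict, List

# Hypothetical resources, each described by attribute values.
RESOURCES: Dict[str, Dict[str, str]] = {
    "/docs/report.txt": {"name": "report", "dir": "docs", "type": "txt"},
    "/docs/budget.xls": {"name": "budget", "dir": "docs", "type": "xls"},
    "/music/song.mp3": {"name": "song", "dir": "music", "type": "mp3"},
}

def locate(command: str) -> List[str]:
    """Return the resources with the most attribute values found in the command."""
    scores: Dict[str, int] = {}
    for path, attrs in RESOURCES.items():
        # Substring search is a stand-in for scanning with the automaton.
        hits = sum(1 for value in attrs.values() if value in command)
        if hits:
            scores[path] = hits
    if not scores:
        return []
    best = max(scores.values())
    return sorted(p for p, h in scores.items() if h == best)

found = locate("open the budget file in docs")
```

Ranking by the number of matched attribute values is one simple design choice; the source does not specify how competing candidates are ordered.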
In this method, a computer operation is represented by an ordered sequence made up of the operation itself and the operation content; the complete semantics is represented by an ordered sequence made up of the core semantics and the peripheral semantics; multiple concepts are structured into the complete semantics, and the computer understands the complete semantics and derives the operation.
To the same end, the invention provides a human-computer interaction intelligent system comprising: a knowledge base server for storing business knowledge data and commonly used system management data;
An artificial intelligence engine for analyzing and processing the question input by the user and retrieving and returning the answer;
A data statistics and analysis unit for performing statistical analysis on the interaction data between users and the artificial intelligence engine.
As can be seen from the above, the natural language understanding method provided by the invention uses technical means such as concept matching, the concept dictionary, and concept identification to recognize both the core semantics and the peripheral semantics of natural language, and thus better understands the intent of the natural language input by the user. The natural language may be in written or spoken form. The method enables the computer to understand the languages people use every day (such as Chinese and English), to grasp the meaning of natural language and of the information people input in it, and then to trigger the relevant computer operations and correctly answer the questions contained in the input.
A customer service human-computer interaction intelligent system helps an enterprise provide efficient, intelligent business services and can bring benefits in the following respects:
1) Reducing further investment by the enterprise in business services while coping calmly with rapid growth in the number of users
According to surveys of customer service center data from banks, telecommunication operators, government bodies, and ordinary enterprises, a large share of the questions a call center receives are repeated questions or simple inquiries, for example about product functions, features, or business procedures. This share of questions could be diverted to other channels; the enterprise website, for example, could meet these users' needs. Enterprises usually try to do so, but according to end-user feedback, users generally find that enterprise websites are structured too intricately and that finding the content they need takes a long time.
An online human-computer interaction system for enterprise business can learn the customer's needs through guided questions and answers and give the user a satisfactory answer in a short time. The user need only chat as if with a customer service representative to achieve this. Moreover, owing to its architecture, the system can serve a large number of users simultaneously.
Therefore, if the enterprise's user base grows rapidly, the enterprise can use the customer service human-computer interaction system to divert some of the users seeking advice, so that it can calmly provide all users with the service they need.
2) Reducing the operating cost of enterprise business
The system can deliver 24-hour online customer service, product marketing, and precise advertisement pushing, so the enterprise saves communication and personnel costs.
3) Reducing unpleasant customer experiences: waiting time and accuracy of answers
Because some inquiries are shifted to network customer service, the enterprise has more resources (lines, personnel) to devote to complex problems. For the enterprise's customers, the queuing time on each call to the customer service center becomes shorter, and calls may even be connected immediately.
For users of network customer service, the system gives a relevant answer immediately, and because its answers are based on the company's standard information, they avoid errors.
4) Customer analysis and self-service capability
The system can record all customer questions in full, and an enterprise or government body can analyze customers' behavior from these records, for example which questions users care about most and which products or services a particular user is interested in. In a traditional customer service center, by contrast, this information is stored as voice recordings or remains the private knowledge of the customer service staff, which makes it hard to reprocess (search and analyze).
By analyzing frequently asked questions, the system can give more accurate answers. Questions that cannot yet be answered can be analyzed by the enterprise so that information is available for similar questions later.
5) Low-cost round-the-clock and holiday service
On holidays and after working hours, users are willing to spend more time resolving problems with products or services. Under the traditional customer service center model, the enterprise or government body must employ more customer service representatives, and the average call time grows longer, so the corresponding cost rises sharply. Once deployed, an intelligent customer service human-computer interaction system can provide full-time service at almost no additional cost.
In short, by deploying an online human-computer interaction system for enterprise business, an enterprise or government body can divert questions away from the traditional business service center, which can then concentrate on complex, high-value customers. Across the two channels, the enterprise can provide users with faster, more accurate, and more standardized service, and at low cost the system gives effective support to the enterprise's rapid growth. For the enterprise or government body, using this leading technology also enhances its image as technologically advanced and innovative.
Description of drawings
Fig. 1 is a schematic flowchart of the natural language understanding method of an embodiment of the invention;
Fig. 2 is a schematic diagram of the "many-to-many" relation between concepts and linguistic symbols in an embodiment of the invention;
Fig. 3 is a schematic diagram of the organization of the concept dictionary in an embodiment of the invention;
Fig. 4 is a schematic diagram of the goto function of the AC machine in an embodiment of the invention;
Fig. 5 is a schematic diagram of a typical finite-state automaton in an embodiment of the invention;
Fig. 6 is a schematic diagram of a typical AC machine matching process in an embodiment of the invention;
Fig. 7 is a schematic diagram of the improved finite-state automaton in an embodiment of the invention;
Fig. 8 is a schematic diagram of the computer's process of understanding "concepts" in an embodiment of the invention;
Fig. 9 is a schematic diagram of the structure of the rising-space matching finite-state automaton in an embodiment of the invention;
Fig. 10 is a schematic diagram of the framework for executing user commands in two passes in the "automatic summarization system" of an embodiment of the invention;
Fig. 11 is a schematic diagram of the analysis and handling of direct and indirect content in an embodiment of the invention;
Fig. 12 is a schematic diagram of a typical file system directory tree in an embodiment of the invention;
Fig. 13 is a schematic diagram of resources and examples of their attributes in an embodiment of the invention;
Fig. 14 is a schematic diagram of the resource location algorithm model in an embodiment of the invention;
Fig. 15 is a schematic diagram of the architecture of the human-computer interaction intelligent system in an embodiment of the invention.
Embodiment
The present invention is described more fully below with reference to the accompanying drawings, in which exemplary embodiments of the invention are illustrated.
For natural language recognition, the invention first builds a concept dictionary containing commonly used concepts and their common linguistic symbol expressions, then uses a matching algorithm to perform formal matching in the user's natural language; where several matches are possible, the choice is determined by the context at that point. For a given written language, if a string does not appear in the concept dictionary but occurs repeatedly in the interactive dialogue, it is taken to be the linguistic symbol expression of some concept, and this form is added to the concept dictionary.
The concept dictionary mainly contains two parts: first, the various relations among concepts and among the attributes of concepts; second, the correspondence between each concept and its linguistic symbols.
Referring to Fig. 1, a preferred embodiment of the natural language understanding method of the invention comprises the following steps:
This process mainly uses a finite-state automaton (abbreviated as the AC machine or finite-state machine) to perform matching at the level of linguistic form. Concepts and linguistic symbols are best represented in the computer as text, stored as character strings; the AC machine can match all the pattern strings in a target string in a single scan, thereby realizing concept matching.
Concept matching:
To understand natural language, a computer must obtain the semantics accurately from the linguistic symbols. Semantics is expressed through concepts, so understanding natural language requires identifying the concepts expressed in the linguistic symbols and then structuring the multiple concepts in the language into the complete semantics.
Because concepts and linguistic symbols are in a many-to-many correspondence, the automatic extraction of concepts is carried out in two steps:
(1) matching the linguistic symbols of concepts in the input text;
(2) determining, from information such as context, the concept expressed by each matched linguistic symbol, which constitutes the identification of the concept.
1, the method for concept coordination:
When native system is handled, at first make up a notion dictionary, wherein comprise notion commonly used and language symbolic formulation commonly used thereof, use matching algorithm in original text (sentence or article), to carry out pro forma coupling then, running into the situation that multiple assembly is arranged, determining according to linguistic context at that time.This process is divided into two basic steps:
1. use the assembly algorithm that the conceptual language symbol assembly from original text that comprises in the notion dictionary is come out;
2. with conceptual language symbol and concept related.
Referring to shown in Figure 2, be the relation of " multi-to-multi " between notion and its linguistic notation, a linguistic notation may be expressed different notions.And in certain linguistic context, this linguistic notation can only be expressed a notion, so which notion linguistic notation embody, and determine according to linguistic context.
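Choosing among several candidate concepts by context could be sketched as follows. This is an illustration under assumptions rather than the patent's procedure: the dictionary contents, the concept names, and the overlap-counting rule in `resolve` are all hypothetical.

```python
from typing import Dict, List, Set

# Hypothetical dictionary: one linguistic symbol may map to several concepts,
# and each concept lists context words that typically accompany it.
CONCEPTS: Dict[str, Dict[str, Set[str]]] = {
    "apple": {
        "FRUIT_APPLE": {"eat", "juice", "tree"},
        "COMPANY_APPLE": {"phone", "laptop", "stock"},
    },
}

def resolve(symbol: str, context: List[str]) -> str:
    """Pick the candidate concept whose context words overlap the context most."""
    candidates = CONCEPTS[symbol]
    words = set(context)
    # Candidates are sorted by name so that ties resolve deterministically.
    return max(sorted(candidates), key=lambda c: len(candidates[c] & words))

concept = resolve("apple", ["buy", "an", "apple", "phone"])
```

Counting shared context words is only one plausible reading of "determined by the context"; the source also mentions corpus techniques for the ambiguous case.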
For an article, if a string does not appear in the concept dictionary but occurs repeatedly in the article, it is taken to be the linguistic symbol expression of some concept, and this form is added to the concept dictionary.
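Promoting recurring unknown strings into the dictionary might be sketched as below. The fixed substring length, the `min_count` threshold, and the `discover_symbols` helper are assumptions made for illustration; the source does not state how "repeatedly" is measured.

```python
from collections import Counter
from typing import List, Set

def discover_symbols(text: str, dictionary: Set[str],
                     length: int = 2, min_count: int = 3) -> List[str]:
    """Return substrings of the given length that recur but are not in the dictionary."""
    counts = Counter(text[i:i + length] for i in range(len(text) - length + 1))
    return sorted(s for s, n in counts.items()
                  if n >= min_count and s not in dictionary)

dictionary = {"ab"}
new_symbols = discover_symbols("abxyabxyabxy", dictionary)
dictionary.update(new_symbols)   # add the newly found forms to the dictionary
```

A raw frequency count like this also admits fragments that straddle word boundaries, so a practical system would presumably filter the candidates further.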
2. Organization of the concept dictionary:
The concept dictionary mainly contains two parts: first, the various relations among concepts and among the attributes of concepts; second, the correspondence between each concept and its linguistic symbols.
The relations among concepts include hypernym-hyponym, synonym, antonym, contrary, whole and part, attribute and host, material and product, subject and event, and content and event relations. For example, "computer" and "CPU" stand in a whole-part relation, and "cloth" and "clothes" stand in a material-product relation.
The relations are mainly of the following 16 kinds: (1) hypernym-hyponym relation, (2) synonym relation, (3) antonym relation, (4) contrary relation, (5) attribute-host relation, (6) part-whole relation, (7) material-product relation, (8) event-role relation, (9) agent/experiencer/relevant-subject-event relation, (10) patient/content/possession-event relation, (11) instrument-event relation, (12) place-event relation, (13) time-event relation, (14) value-attribute relation, (15) entity-value relation, (16) correlation relation.
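One way to hold such relations in memory is as typed triples. The sketch below is an assumption-laden illustration: the `Relation` enum lists only four of the sixteen kinds, and the `FACTS` set and `related` helper are hypothetical names.

```python
from enum import Enum
from typing import List, Set, Tuple

class Relation(Enum):
    HYPONYM = "hypernym-hyponym"
    SYNONYM = "synonym"
    PART_WHOLE = "part-whole"
    MATERIAL_PRODUCT = "material-product"
    # The remaining relation kinds would be listed in the same way.

# Each fact is a (concept, relation, concept) triple.
FACTS: Set[Tuple[str, Relation, str]] = {
    ("CPU", Relation.PART_WHOLE, "computer"),
    ("cloth", Relation.MATERIAL_PRODUCT, "clothes"),
}

def related(concept: str, relation: Relation) -> List[str]:
    """Return the concepts linked to `concept` by `relation`, in either direction."""
    out = [b for a, r, b in FACTS if a == concept and r is relation]
    out += [a for a, r, b in FACTS if b == concept and r is relation]
    return sorted(out)

parts = related("computer", Relation.PART_WHOLE)
```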
In the concept dictionary, concepts and their linguistic symbols are organized as shown in Fig. 3: the left column holds the linguistic symbols, the right column holds the corresponding concepts, and the links between them show the correspondence.
Concepts and their linguistic symbols are in a "many-to-many" relation, so the correspondence from a linguistic form to concepts in Fig. 2 is one-to-many (and conversely, the correspondence from a concept to linguistic forms is also one-to-many). The structure of a linguistic symbol node in this project is therefore as shown in Table 1:
| Word | Number | Index 1 | Index 2 | … | Index n |
Table 1
Where:
Word: the character string of the linguistic form;
Number: the number of concepts with which the linguistic symbol may be associated;
Both "linguistic symbols" and "concepts" are stored in sorted order in the computer;
Fig. 3 shows the correspondence from "linguistic symbols" to "concepts". If the figure is supplemented with the correspondence from "concepts" to "linguistic symbols" (one concept can have several linguistic symbol expressions), it can also serve natural language generation by the computer and make the generated language closer to everyday usage and more natural. This inverse correspondence can be computed directly from the correspondence in Fig. 2 and then organized.
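The node layout of Table 1 and the derivation of the inverse correspondence can be sketched as follows. The `SymbolNode` class, the sample concepts, and the `invert` helper are hypothetical, and reading the Index fields as positions in a concept table is an assumption.

```python
from dataclasses import dataclass
from typing import Dict, List

@dataclass
class SymbolNode:
    word: str            # the character string of the linguistic form
    indexes: List[int]   # assumed: positions of the associated concepts

    @property
    def number(self) -> int:
        """Number of concepts the symbol may be associated with."""
        return len(self.indexes)

# Hypothetical concept table and symbol nodes.
CONCEPTS = ["FRUIT_APPLE", "COMPANY_APPLE", "COMPUTER"]
NODES = [
    SymbolNode("apple", [0, 1]),
    SymbolNode("computer", [2]),
    SymbolNode("PC", [2]),
]

def invert(nodes: List[SymbolNode]) -> Dict[int, List[str]]:
    """Derive the concept -> symbols mapping from the symbol -> concepts mapping."""
    inverse: Dict[int, List[str]] = {}
    for node in nodes:
        for index in node.indexes:
            inverse.setdefault(index, []).append(node.word)
    return inverse

inverse = invert(NODES)
```

Here `inverse[2]` lists every symbol that can express the third concept, which is the kind of lookup a language generator would need.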
3. Improving the finite-state automaton to realize concept matching:
Concepts and linguistic symbols are best represented in the computer as text, stored as character strings. Matching at the level of linguistic form is generally done with the Aho-Corasick finite-state automaton (abbreviated as the AC machine or finite-state machine).
Example 1-3-1: let the set of pattern strings be P = {"中国" (China), "中华" (China), "中华人民共和国" (People's Republic of China), "华人" (Chinese person), "人民" (people), "中华人民" (the Chinese people)}
Finite-state automaton:
A finite-state automaton is a 5-tuple (Q, g, f, s0, T), where Q is a finite set of states, g is the goto (transfer) function, f is the failure function, s0 is the initial state, and T is the set of states that can be reached. Its use has two aspects, construction and matching, and its operation relies on three core functions: the goto function (g), the failure function (f), and the output function (output). Constructing the AC machine means constructing the state set Q and these three core functions.
(1) Constructing the goto function g:
Def 2.2 (path of a state): In the AC machine, if there is an ordered set of transitions leading from state A to state B, this ordered set is called the path from state A to state B; A is called the initial state of the path and B its final state.
The path from state 0 to any state is simply called the path of that state.
The goto function g moves the AC machine from its current state to a new state when a given character is input. It is a binary function of the current state and the input character and can be written as:
g(current state, character) = next current state;
During construction, for each pattern Pi, the characters of Pi are taken one at a time. Starting with state 0 as the current state, the next current state is decided by the character taken: (i) if the current state has a transition labeled with this character, the state it leads to becomes the current state; (ii) otherwise, a new state numbered one greater than the largest existing state number is added to the AC machine, a transition from the current state to the new state is added and labeled with the current character, and the new state becomes the current state. This repeats until the pattern ends. When Pi ends, Pi is assigned to the current state as its output function value, and the current state is reset to state 0.
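The construction just described amounts to building a trie. The following is a minimal sketch; the `build_goto` name and the list-of-dicts representation are choices made here, and the Chinese pattern strings are an assumed reconstruction of Example 1-3-1 inferred from the state numbers given in the text.

```python
from typing import Dict, List, Set, Tuple

def build_goto(patterns: List[str]) -> Tuple[List[Dict[str, int]], Dict[int, Set[str]]]:
    """Build the goto function as a trie; goto[state][char] is the next state."""
    goto: List[Dict[str, int]] = [{}]      # state 0 is the initial state
    output: Dict[int, Set[str]] = {}
    for pattern in patterns:
        state = 0                          # every pattern starts from state 0
        for char in pattern:
            if char not in goto[state]:
                goto.append({})            # new state = largest number + 1
                goto[state][char] = len(goto) - 1
            state = goto[state][char]
        output.setdefault(state, set()).add(pattern)
    return goto, output

# Assumed reconstruction of the pattern set of Example 1-3-1.
PATTERNS = ["中国", "中华", "中华人民共和国", "华人", "人民", "中华人民"]
goto, output = build_goto(PATTERNS)
```

With these patterns the sketch produces thirteen states, numbered 0 to 12, in the order the text's layer description implies.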
The goto function graph of the AC machine built from the pattern set of Example 1-3-1 is shown in Fig. 4.
(2) Constructing the failure function f
During matching, when the character input at the current state s has no labeled transition in the goto function graph (the goto function call fails), the failure function is called and f(s) becomes the current state. For example, if the character input to the AC machine of Fig. 4 at state 3 is not "人" (person), the AC machine calls the failure function of state 3 and sets state 9 as the current state.
Def 2.3 (layer of a state): In the AC machine, the number of transitions contained in the path of a state is called the layer of that state.
Because every state transition needs one transition character, the layer of a state is in fact the number of transition characters on the way from state 0 to that state, and the pattern string formed by joining these characters in transition order is the output function of that state.
In Fig. 4, state 0 is at layer 0; states 1, 9, and 11 are at layer 1; states 2, 3, 10, and 12 are at layer 2; state 4 is at layer 3; and state 5 is at layer 4.
The following is stipulated:
The failure function of the layer-0 state is f(0) = 0, and the failure function of every first-layer state s is f(s) = 0. In Fig. 4, f(0) = 0 and f(1) = f(9) = f(11) = 0;
For a state s outside the first layer whose parent state is r, i.e. g(r, a) = s, the failure function is f(s) = g(f(s*), a), where s* is the nearest state, found by tracing back through the ancestor states of s (<ancestor state> ::= <state> | <state><ancestor state>), for which g(f(s*), a) exists; if no such state exists, f(s) = 0.
The failure function built from the pattern strings of Example 1-3-1 is shown in Table 2:
| s | 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 | 9 | 10 | 11 | 12 |
| f(s) | 0 | 0 | 9 | 10 | 12 | 0 | 0 | 0 | 0 | 11 | 0 | 0 |
Table 2
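The failure values in Table 2 can be reproduced with the usual breadth-first construction. This is a sketch under assumptions: the helper names are invented here, the Chinese pattern strings are an assumed reconstruction of Example 1-3-1, and the loop follows failure links of the parent, which is the standard Aho-Corasick formulation rather than a transcription of the patent's wording.

```python
from collections import deque
from typing import Dict, List

def build_goto(patterns: List[str]) -> List[Dict[str, int]]:
    """Build the goto function as a trie (states numbered in creation order)."""
    goto: List[Dict[str, int]] = [{}]
    for pattern in patterns:
        state = 0
        for char in pattern:
            if char not in goto[state]:
                goto.append({})
                goto[state][char] = len(goto) - 1
            state = goto[state][char]
    return goto

def build_failure(goto: List[Dict[str, int]]) -> List[int]:
    """Compute f(s) layer by layer with a breadth-first traversal."""
    fail = [0] * len(goto)
    queue = deque(goto[0].values())        # first-layer states keep f(s) = 0
    while queue:
        parent = queue.popleft()
        for char, state in goto[parent].items():
            queue.append(state)
            t = fail[parent]
            # Follow failure links until some state has a transition on char.
            while t != 0 and char not in goto[t]:
                t = fail[t]
            fail[state] = goto[t].get(char, 0)
    return fail

# Assumed reconstruction of the pattern set of Example 1-3-1.
PATTERNS = ["中国", "中华", "中华人民共和国", "华人", "人民", "中华人民"]
fail = build_failure(build_goto(PATTERNS))
```

Processing states in breadth-first order guarantees that the failure value of every shallower state is already known when a deeper state needs it.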
(3) Constructing the output function output
At a state s, if the AC machine has successfully matched pattern strings, it should provide the output function output(s) and output them. For the AC machine of Example 1-3-1, at state 5 the machine has matched the two pattern strings "中华人民" (the Chinese people) and "人民" (people), so it should output both at this state.
During construction, in step (1), whenever a pattern ends, the current state s is given the output function output(s) = {current pattern} (e.g. output(2) = {"中国"}, output(3) = {"中华"}, ...).
After the failure function f(s) = s' has been constructed, the output function is revised so that output(s) = output(s) ∪ output(s').
Clearly, scanning with the AC machine is the process of matching the characters of the input target string one after another.
Fig. 5 shows the AC machine constructed from the pattern strings of Example 1-3-1.
In it, solid arrows represent the goto function; the text on each arrow is the input to the AC machine in that state, which causes the state transition;
Dashed arrows represent the failure function; the failure function of every other state leads to state 0;
The text in a box is the output pattern string of the state, i.e. the output function.
Once these three functions have been constructed, the AC machine can start matching pattern strings. Matching proceeds as follows:
Starting from state 0, take the characters of the target text one at a time and enter the next state as directed by the transfer function or the failure function;
When a state has an output function, execute it and output the pattern strings that have been matched;
Fig. 6 shows the AC machine of example 1-3-1 matching the target text T1="Beijing is the capital of the People's Republic of China (PRC)". When the AC machine reaches states 3, 4, 5 and 8 it matches {"China"}, {"Chinese"}, {"the Chinese people", "people"} and {"People's Republic of China (PRC)"} respectively. These pattern strings all occur in the target string T1, and they are all of the pattern strings that occur in T1; the order in which they are output agrees with the order in which they appear in the target text. As Fig. 6 shows, matching with the AC machine involves no backtracking, so it is efficient.
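The construction and scan of the ordinary AC machine can be sketched in Python as follows. This is an illustration under assumptions, not the patent's implementation: the ASCII patterns are hypothetical stand-ins chosen only so that the trie has the same shape and state numbers as Fig. 5 (the letters a to g stand for the seven characters of the longest pattern string), and the names build_ac and scan are invented here.

```python
from collections import deque

def build_ac(patterns):
    """Build the transfer (goto), failure and output tables of an ordinary AC machine."""
    goto = [{}]        # goto[s][ch] -> next state; state 0 is the start state
    output = [set()]   # output[s] -> pattern strings recognised on reaching s
    for p in patterns:                      # (1) transfer function: insert into a trie
        s = 0
        for ch in p:
            if ch not in goto[s]:
                goto.append({})
                output.append(set())
                goto[s][ch] = len(goto) - 1
            s = goto[s][ch]
        output[s].add(p)                    # output(s) = {current pattern}
    fail = [0] * len(goto)                  # (2) failure function, layer by layer
    queue = deque(goto[0].values())         # layer-1 states keep f(s) = 0
    while queue:
        r = queue.popleft()
        for ch, s in goto[r].items():
            queue.append(s)
            k = fail[r]
            while k and ch not in goto[k]:  # trace back until g(f(s*), a) exists
                k = fail[k]
            fail[s] = goto[k].get(ch, 0)
            output[s] |= output[fail[s]]    # (3) output(s) = output(s) ∪ output(s')
    return goto, fail, output

def scan(text, goto, fail, output):
    """Return (start index, pattern) for every occurrence, in order of appearance."""
    s, found = 0, []
    for i, ch in enumerate(text):
        while s and ch not in goto[s]:
            s = fail[s]
        s = goto[s].get(ch, 0)
        for p in sorted(output[s], key=len, reverse=True):
            found.append((i - len(p) + 1, p))
    return found

# Hypothetical stand-in patterns, inserted in an order that yields the Fig. 5 numbering.
PATTERNS = ["ag", "abcdefg", "bc", "cd", "ab", "abcd"]
goto, fail, output = build_ac(PATTERNS)
matches = [p for _, p in scan("..abcdefg..", goto, fail, output)]
```

With these stand-ins, fail[1:] reproduces the f(s) row of Table 2, and the scan reports matches at states 3, 4, 5 and 8 in the same order as described for T1.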
Improving the finite-state automaton:
This embodiment uses the AC machine to match the linguistic forms of concepts. Suppose the target text is T1="Beijing is the capital of the People's Republic of China (PRC)". During matching, the adjacent characters of "People's Republic of China (PRC)" should be matched as a whole, because together they express one concept, namely the country "China"; the words (phrases) contained in it, such as "China", "Chinese" and "people", should not also be treated as separate concepts. This is achieved by having the AC machine output the longest pattern string: when matching the target text, the AC machine outputs the longest pattern string possible, does not output the shorter pattern strings contained in it, and does not output crossing strings either. (For example, "China" and "Chinese" cross in T1 because one character would be used twice; the embodiment of the invention outputs the pattern string the AC machine matches first, "China", and does not output the later pattern string "Chinese".) The ordinary AC machine is modified to achieve this.
The construction and matching process of the improved AC machine are described below in three parts: (1) when to output; (2) what to output; (3) what to do after output. Throughout, the principle of left-to-right matching priority is followed.
(1) When to output
Def 2.4 (leaf state): in the AC machine, a state that has no transfer function is called a leaf state.
In Fig. 5, states 2, 8, 10 and 12 are leaf states.
A leaf state has the following properties:
A. A leaf state has no transfer function, so any character input at this state causes a call to the failure function;
B. A leaf state always has an output function, and one of its outputs is the longest output on its path. This follows from the construction of the AC machine: on the one hand, there must be a pattern string (call it p) that caused the AC machine to construct this leaf state (call it sp); on the other hand, if a longer pattern string (call it pl) existed on this path, the AC machine would have had to continue from state sp to a new state sl during construction, so sp would have at least one transfer function, leading to sl, and would not be a leaf state;
C. From A and B it follows that a call to the failure function at a leaf state should cause the leaf state to output the longest pattern string on the path from state 0 to that state;
Therefore, for a leaf state, any input character causes a call to the failure function, and the pattern string the AC machine has matched is the longest one, so it should be output. In other words, when the failure function of a leaf state is called, the AC machine should output the longest pattern string it has matched.
Many states of the AC machine have output, not only leaf states (for example states 3, 4 and 5 in the AC machine of Fig. 4). When the AC machine moves to such a state, it does not yet know whether the next input character will take it on to a new state (that is, whether the transfer function will produce a state transition instead of the failure function being called). It therefore cannot tell whether the pattern string of the output function at that state is the longest one, and it cannot output immediately. This is where it differs from the ordinary AC machine.
Looking back at the construction of the failure function f, the failure function always takes the current state s to a state f(s) in a lower layer. Moreover, the ordered set of transfer characters Cf(s) on the path of the failure state f(s) is exactly a right-aligned subset (a suffix) of the ordered character set Cs on the path of state s. That is, calling the failure function of a state during matching means that the front part of the characters on the state's path will no longer be processed (they are regarded as handled). Under the principle of left-to-right matching priority, the AC machine should therefore output the longest pattern string it has matched at this moment. Take the AC machine of example 2-2, shown in Fig. 5: when the AC machine reaches state 5 it has matched the longest pattern string "the Chinese people" (which is also the pattern string formed, in order, by the transfer characters on the path of state 5). The failure state of state 5 is state 12; the transfer characters on its path form the pattern string "people", of length 2, which is exactly the right-aligned part of "the Chinese people" (length 4). If the character input at state 5 causes a call to the failure function, state 5 should output the longest matched pattern string, "the Chinese people".
That is, for a non-leaf state, when the failure function is called, the pattern string the AC machine has matched is the longest one under the left-to-right priority principle and should be output.
Combining the leaf and non-leaf cases: when the failure function is called, the pattern string the AC machine has matched is the longest one and should be output; in all other situations the AC machine cannot be sure that the matched pattern string is the longest, so it outputs nothing.
(2) What to output
As discussed above, the AC machine outputs the longest matched pattern string just before it calls the failure function, and outputs nothing before that.
For convenience, let n(s) denote the number of states passed through from state 0 to state s, and let m(s) denote the number of output pattern strings of state s.
To move from state 0 to the current state sc, an ordinary AC machine passes through n(sc) states (their state numbers are not necessarily consecutive; denote them in order s_c1, s_c2, ..., s_cn(sc)). Each state s_ck has m(s_ck) output pattern strings, so by the time it reaches sc the AC machine has accumulated a total of m(s_c1) + m(s_c2) + ... + m(s_cn(sc)) output pattern strings.
When the failure function is called, the task is to pick the longest of these pattern strings and output it.
Clearly, the longest pattern string the AC machine can have matched before the failure function is called at the current state sc belongs to the first state with an output function (call it scp) encountered when searching along the path of sc back towards state 0, and the length of that pattern string equals the length of the path of scp. This is the pattern string the AC machine should output. In the AC machine of example 2-2, suppose matching fails at state 6. The failure function of this state will be called, and before that the longest pattern string that has been matched is output. The analysis is as follows: the path of state 6 is 0-1-3-4-5-6; searching along this path from state 6 towards state 0, the first state with an output function is state 5, which has two output pattern strings, {"the Chinese people", "people"}. The first of these is formed by all the transfer characters on the path from state 0 to state 5 and has the greatest length, 4, so it is the one to output. The outputs of state 5 and state 7 can be worked out in the same way.
Note that the state at which matching fails does not necessarily have an output function (in example 2-2, state 5 has output, but states 6 and 7 do not). One way to obtain the longest pattern string is to trace back along the path of the current state towards state 0 until a state with an output function is found, and output its pattern string. This requires backtracking and reduces the performance of the AC machine;
The system instead uses data binding: an output buffer is designed into the AC machine and bound to the current matching state. While matching, the AC machine keeps the longest pattern string matched so far in this buffer, so that when the failure function is about to be called it only has to output the buffer content. Until the failure function is called, the further the state advances, the longer the matched pattern string becomes; so whenever a state transition reaches a new state that has an output function, the longest output pattern string of the new state is saved to the output buffer. This avoids backtracking at output time. For the AC machine of example 1-3-1: on moving to state 1 there is no output; on moving to state 3 there is the output "China", so the buffer content is "China"; the output of state 4 is inherited through the failure function and does not count; on moving to state 5 there is the output "the Chinese people", so the buffer content is updated to "the Chinese people"; states 6 and 7 have no output, so the buffer content is unchanged. If matching fails at either of these states, the AC machine outputs the buffer content, "the Chinese people". After a failure has moved the AC machine to a new state, the output buffer should be emptied.
As explained above, the failure function of the AC machine always moves to a state with a shorter path, so the output pattern strings of the state the failure function points to are always shorter than those of the current state. When the output function is constructed, there is therefore no need to merge (∪) the output pattern strings of the failure state into the output function of the current state. After the AC machine of Fig. 5 is improved in this way, state 4 has no output pattern string and state 5 has only the output pattern string "the Chinese people". This simplifies the construction of the output function, and the structure of the AC machine also becomes somewhat simpler.
(3) What to do after output
The AC machine outputs the matched pattern before the failure function is called. On the one hand, to avoid overlapping output, the content of a pattern that has been output must not be used as input for the next string; on the other hand, because output is delayed, the characters that have not been output must be kept.
To avoid overlapping output, the AC machine could be forced to move to state 0 after output (instead of to the failure state of the ordinary AC machine). To avoid missing pattern strings, the characters that have been matched but not output (specifically, all characters on the path from the nearest state with an output function to the current state) must then be kept. One method is backtracking: the AC machine goes back over these characters and restarts matching from state 0. This makes the matching process more complicated, discards some of the matching already done, and reduces performance. The other method makes full use of the result of every matching step to improve the efficiency of the AC machine. There are two cases:
I. The failing state has an output function. The AC machine is then forced to move to state 0, so that it does not output overlapping pattern strings; at the same time there are no matched-but-unoutput characters after this state. That is, the failure function of this state is simply defined as state 0, which differs from the construction of the failure function in the ordinary AC machine;
II. The failing state sc has no output function. The AC machine then outputs the output function of scp, the state with an output function nearest to sc on the path of sc, and the characters that have been matched but not output are all the characters on the path from scp to sc. By I, the failure function of scp is state 0. Because none of the states between scp and sc has an output function, the failure function of sc is constructed on the basis that the failure function of scp is state 0: it is the state the AC machine reaches from state 0 when the characters on the path from scp to sc are used as transfer characters; if the failure function is called during this transfer, the transfer starts again from state 0, and this repeats until all the characters have been consumed. This construction is the same as that of the ordinary AC machine.
Take the AC machine of example 1-3-1, shown in Fig. 5, and suppose the failure function is called at state 4. The AC machine outputs the output function of the nearest state with an output function before state 4, that is, "China" at state 3 (note that the output function of state 4 itself is inherited through the failure function and is therefore not counted). Because the characters of "China" have been output, the AC machine cannot move to the original failure state, state 10 (which would reuse the last character of "China"). At the same time, the character set on the path from state 3 to state 4, p34={"people"}, has not been output and therefore cannot be discarded (it must be reused, otherwise some output may be lost). Starting from state 0 and transferring on the characters in p34, the AC machine reaches state 11. This is the state the AC machine should move to when state 4 fails, i.e. the failure state of state 4 in the improved AC machine.
In summary, the embodiment of the invention makes the following improvements to the construction and matching process of the AC machine:
(1) The construction of the transfer function is unchanged.
(2) In the construction of the failure function, every state s that has output is given the failure function f(s)=0; the construction of the failure function for all other states is unchanged.
(3) When the final output function is constructed, the output of the failure state is not merged (∪) into the output list of the current state, so each state has at most one output pattern string, which avoids overlapping pattern string output.
The above are the improvements to the construction of the ordinary AC machine.
(4) Matching process
As described above, data binding is used to remember the output pattern, and output is delayed: the pattern in the output buffer is output only when the failure function is called. The internal structure of the modified AC machine is shown in Fig. 7.
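The four points above can be sketched in Python as follows. This is a literal reading of the summary rather than the patent's own code, and it has only been checked against the behaviour described for Fig. 5, not for arbitrary pattern sets. The ASCII patterns are hypothetical stand-ins with the same trie shape as Fig. 5; the names build_modified_ac and scan_delayed are invented here; refilling the buffer when a failure lands on a state with output, and flushing the buffer at the end of the text, are assumptions the text does not spell out.

```python
from collections import deque

def build_modified_ac(patterns):
    goto, out = [{}], [None]        # out[s]: the single pattern ending at s, or None
    for p in patterns:              # (1) transfer function: unchanged trie construction
        s = 0
        for ch in p:
            if ch not in goto[s]:
                goto.append({})
                out.append(None)
                goto[s][ch] = len(goto) - 1
            s = goto[s][ch]
        out[s] = p                  # (3) no merging: at most one pattern per state
    fail = [0] * len(goto)
    queue = deque(goto[0].values())
    while queue:
        r = queue.popleft()
        for ch, s in goto[r].items():
            queue.append(s)
            if out[s] is not None:  # (2) every state with output keeps f(s) = 0
                continue
            k = fail[r]
            while k and ch not in goto[k]:
                k = fail[k]
            fail[s] = goto[k].get(ch, 0)
    return goto, fail, out

def scan_delayed(text, goto, fail, out):
    """(4) Delayed output: emit the buffered pattern only when the failure function is called."""
    s, buffer, found = 0, None, []
    for ch in text:
        while s and ch not in goto[s]:
            if buffer is not None:
                found.append(buffer)
                buffer = None       # empty the buffer after output
            s = fail[s]
            if out[s] is not None:  # assumption: a failure may land on an output state
                buffer = out[s]
        s = goto[s].get(ch, 0)
        if out[s] is not None:
            buffer = out[s]         # a longer match on the same path replaces the old one
    if buffer is not None:          # assumption: flush at end of text
        found.append(buffer)
    return found

# Hypothetical stand-ins, inserted in an order that yields the Fig. 5 state numbers.
PATTERNS = ["ag", "abcdefg", "bc", "cd", "ab", "abcd"]
goto, fail, out = build_modified_ac(PATTERNS)
```

With these stand-ins, state 4 fails to state 11 instead of state 10, as in the worked example, and scanning the stand-in for T1 yields only the longest pattern string.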
This process mainly implements concept identification, which involves the following:
1. Matching of formally ambiguous text
After improving the AC machine, the embodiment of the invention compiles a concept dictionary containing commonly used concepts and their common expression forms and uses it as the pattern string input to the AC machine, which works on the left-to-right, longest-match-first output principle. However, some sentences are ambiguous in form. In "research life too bitter", for example, the characters of "research life" can be matched either as 1. "postgraduate | life" or as 2. "study | life", and the concepts obtained differ.
Which grouping is correct? If only this sentence is considered, both are; in practice, the correct grouping has to be determined from the context in which the sentence appears.
For a single sentence taken without its language environment, text with ambiguous matches is resolved mainly by corpus techniques:
1. Probability that characters combine into a concept: count the probability with which the adjacent characters combine into a concept in the corpus, and take the concept form with the higher probability;
2. Probability that concepts occur together: count how often the different concepts occur in the same sentence, and take the concept form with the higher frequency.
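The two corpus statistics can be illustrated with a small Python sketch. Everything in it is an assumption made for illustration: the toy corpus is invented, the English glosses stand in for the Chinese concepts, and adding the two counts into one score is just one simple way to combine them; the source does not say how the statistics are weighed.

```python
from collections import Counter
from itertools import combinations

def train(corpus):
    """Count how often each concept occurs and how often two concepts share a sentence."""
    unigram, pair = Counter(), Counter()
    for sentence in corpus:
        unigram.update(sentence)
        pair.update(frozenset(p) for p in combinations(sorted(set(sentence)), 2))
    return unigram, pair

def score(candidate, unigram, pair):
    """Higher is better: concept frequency plus same-sentence co-occurrence frequency."""
    concepts = sorted(set(candidate))
    return (sum(unigram[c] for c in candidate)
            + sum(pair[frozenset(p)] for p in combinations(concepts, 2)))

def choose(candidates, unigram, pair):
    """Pick the candidate grouping with the highest score."""
    return max(candidates, key=lambda c: score(c, unigram, pair))

# Invented toy corpus of already-segmented sentences (illustrative only).
TOY_CORPUS = [
    ["postgraduate", "life", "hard"],
    ["postgraduate", "exam"],
    ["study", "method"],
    ["postgraduate", "life", "busy"],
]
unigram, pair = train(TOY_CORPUS)
best = choose([["postgraduate", "life"], ["study", "life"]], unigram, pair)
```

In this toy corpus the first grouping wins because its concepts are both more frequent and co-occur more often; a real system would need a large corpus and smoothed probabilities.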
2. Confirming the concept
The AC machine matches the linguistic forms of the concepts expressed in the target text; the next task is to associate each linguistic form with the concept it expresses (that is, the semantics of the linguistic form).
When choosing the concept a linguistic form expresses, two basic principles must be followed:
Principle one: the concept expressed by the linguistic form must be semantically consistent with the other concepts in the sentence, i.e. it must satisfy the principle of semantic collocation;
Principle two: if the concept still cannot be determined, use the principle that the concept must be consistent with the topic of the article containing the sentence, i.e. with the context of the sentence.
Principle one determines the concept expressed by most linguistic forms (for example, by semantic collocation the verb "beating" in "getting food" and in "knitting a sweater" can be judged to correspond to the concepts "buying" and "knitting" respectively). For a small number of linguistic forms, principle two is still needed (for example, in "virus has infectiousness", principle one cannot determine whether "virus" is the "software virus" of computing or the "physiological virus" of physiology).
Matching of numeric-quantity concepts:
Numeric quantities such as times, amounts and percentages occur frequently in natural language and carry very specific information. A numeric quantity is still written as text (for example "25 years") and is stored in the computer as a character string (for example, "25 years" is stored as a string occupying 8 bytes). Such a string is not essentially different from an ordinary string; without special processing the computer can hardly treat it as a quantity.
However, a computer's understanding of natural language cannot stop at the level of linguistic form; it must obtain the meaning the form expresses. For a numeric string, the computer must obtain its semantics, i.e. the numeric quantity it denotes. A model and a corresponding algorithm are therefore needed so that, when matching concepts in the target text, the computer can identify the numeric quantities in the text and obtain the semantics of these numeric concepts. An important part of this work is converting the numeric string into a number, which involves two things:
Identifying the position of the numeric string in the text;
Identifying the numeric quantity the numeric string denotes.
This system studies the features of the numeric text that occurs in man-machine dialogue and designs an algorithm for converting the text into numbers. The algorithm is designed to imitate the way a person recognizes and remembers numbers, and achieves good results in terms of time and space complexity.
The following mainly covers, for the man-machine dialogue domain, the features of numeric text and the algorithm for recognizing numbers in text.
1.6 Identification of numeric-quantity concepts and their semantics
1.6.1 Basic features of numeric quantities
Numeric quantities in Chinese text take essentially the following forms:
(1) pure Arabic numerals, such as "35" in "high 35 floor in this building";
(2) Arabic numerals followed by one order marker, such as "3,050,000" in "3,050,000 population";
(3) Chinese numerals (or Arabic numerals) each immediately followed by an order marker, such as "2,500,024,553", or "3,050,000", or the form "3,000 5 hundred 62";
(4) Chinese numerals arranged in sequence, such as "35619" denoting 35619;
(5) some other forms, such as "5 percent", "5%", and the exponential notation common in mathematics.
All of these forms occur frequently in Chinese text, and they essentially follow these rules:
A string of consecutive numeric characters (Arabic or Chinese numeral characters) with no order characters between them; the string may end with one order character that modifies the whole numeric string (such as "hundred million" or "ten thousand"; for convenience of processing, the percent sign "%", the per-mille sign "‰" and the like are also regarded as order characters, here and below). Forms (1), (2) and (4) above are of this kind;
A string in which each numeric character is immediately followed by an order character; the string may end with one order character that modifies the whole numeric string. Form (3) above is of this kind;
Within one numeric quantity the numeric characters are either all Arabic or all Chinese; the two are generally not mixed (for example, the following form rarely appears in text: 3561 9);
A numeric quantity does not begin with an order character (except "ten");
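A minimal Python sketch of converting a located numeric string into a number under the rules above. It is not the patent's algorithm, which is not given in this passage; the function name to_number and the tables are invented here, the Chinese numeral characters are the standard ones, and the sketch ignores percent signs and does not handle an order character directly modifying another (such as ten thousand followed by hundred million).

```python
DIGITS = {str(d): d for d in range(10)}
DIGITS.update({"零": 0, "一": 1, "二": 2, "两": 2, "三": 3, "四": 4,
               "五": 5, "六": 6, "七": 7, "八": 8, "九": 9})
SMALL = {"十": 10, "百": 100, "千": 1000}   # orders applied to the preceding digit
LARGE = {"万": 10**4, "亿": 10**8}           # orders that modify everything before them

def to_number(s):
    """Convert one numeric string (forms (1) to (4) above) to an int."""
    if not s or ((s[0] in SMALL or s[0] in LARGE) and s[0] != "十"):
        raise ValueError("a numeric string may not start with an order character (except 十)")
    total = section = number = 0
    for ch in s:
        if ch in DIGITS:
            number = number * 10 + DIGITS[ch]         # consecutive digits read positionally
        elif ch in SMALL:
            section += (number or 1) * SMALL[ch]      # a bare 十 counts as 1 x 10
            number = 0
        elif ch in LARGE:
            total += (section + number) * LARGE[ch]
            section = number = 0
        else:
            raise ValueError(f"not a numeric character: {ch!r}")
    return total + section + number
```

For instance, to_number("三千五百六十二") and to_number("3千5百62") both give 3562, while to_number("三五六一九") reads the digits positionally.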
1.6.2 Determining the position of numeric text
Determining the position of a numeric string in the text means finding its start and end positions; equivalently, finding its length and its start or end position. Identification generally processes the text character by character from the first character (or scans in reverse, from the last character forward) while recording the index of each character (its offset in the text, i.e. its position). The problem of locating numeric text thus reduces to testing whether the character currently being scanned is a numeric character.
The numeric characters in Chinese text are limited in number: "one", "two", "three", "1", "2", "3" and so on, and the order characters "ten", "hundred", "thousand", "ten thousand", "hundred million" and so on. Another numeric character that is common in both spoken and written language is "two" (as in "235"), which denotes "2". These numeric and order characters form a set, denoted Nset:
Nset={ "one", "two", "three", "1", "2", "3", "ten", "hundred", "thousand", ... } (formula 2-3)
Note that Nset includes both full-width and half-width numeric characters, as well as the percent and per-mille signs and the like.
Determining the numeric string position is then straightforward. The simplest way is to compare the current character with each element of Nset in turn and see whether they are equal, which tests whether it is one of these numeric or order characters. This algorithm is very simple but not efficient, so a HASH algorithm is adopted.
The function ConvertWord tests whether a character is a number-related character. If it is not, it returns (DWORD)-1. If it is a numeric character such as "1" or "three", it returns the value, 0x01 and 0x3 respectively. If it is an order character such as "ten" or "thousand", it returns its order with the most significant bit of the return value set to 1 to mark the difference, giving 0x80000001 and 0x80000003 (the 0x prefix denotes a hexadecimal number).
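The lookup that ConvertWord performs can be sketched in Python with a dict, which is itself a hash table. This is a hypothetical reconstruction from the return values quoted above, not the original listing: the name convert_word, the table contents and the order codes for the two largest order characters (4 and 8, as powers of ten) are assumptions, and the percent and per-mille signs are left out because their codes are not stated.

```python
NOT_NUMERIC = 0xFFFFFFFF   # (DWORD)-1 in the source's terms
ORDER_FLAG = 0x80000000    # most significant bit marks an order character

_TABLE = {}
# Half-width, full-width and Chinese forms of each digit map to the digit value.
for value, chars in enumerate(["0０零", "1１一", "2２二两", "3３三", "4４四",
                               "5５五", "6６六", "7７七", "8８八", "9９九"]):
    for ch in chars:
        _TABLE[ch] = value
# Order characters map to their power of ten with the top bit set.
for ch, order in {"十": 1, "百": 2, "千": 3, "万": 4, "亿": 8}.items():
    _TABLE[ch] = ORDER_FLAG | order

def convert_word(ch):
    """Return the digit value, the flagged order, or NOT_NUMERIC for other characters."""
    return _TABLE.get(ch, NOT_NUMERIC)

def is_order(code):
    """True for order characters; NOT_NUMERIC also has the top bit set, so exclude it."""
    return code != NOT_NUMERIC and bool(code & ORDER_FLAG)
```

A scanner can then call convert_word on each character in turn and treat a run of codes other than NOT_NUMERIC as a candidate numeric string.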
Note that an order character appearing on its own generally does not denote a specific number, or does not carry numeric information at all ("hundreds of millions" in "hundreds of millions spectators" only says that there are very many spectators, not that there are 100 million or several hundred million, still less 100 million times 10,000; "100,000" in "Most Urgent", and "thousand" and "one" in "extremely delicate and dangerous situation", do not denote specific numeric information either). If order characters were simply all converted into numeric strings, the result would be wrong in all of these cases. Simple ways of handling this are: (1) a text string that begins with an order character is not treated as a numeric string, i.e. a numeric string must begin with a numeric character, which avoids many errors; "ten" in Chinese needs special handling, since numbers in the teens occur often in both spoken and written language (for example "15 the moon 16 circles"); (2) numeric characters in idioms and set phrases, such as "thousand" and "one" in the idiom "extremely delicate and dangerous situation" and the numeric characters in the idiom "neat and quick", are not treated as numeric characters; (3) when a text string that may be a numeric string has been scanned, it can be treated as a numeric string if it is followed by a measure word or a countable noun. When people meet an unclear concept in natural language (or do not catch it and must work it out from context), they too decide from the context at that moment. For example, when a sentence begins with "100,000 ...", one does not know whether it denotes a definite number; if it is followed by "urgent", it is not a specific number, but if it is followed by "why individual" ("Hundred Thousand Whys"), it can basically be taken as a specific number. This method still has a very small number of exceptions. The embodiment of the invention follows these principles in addressing the problem.
Identification of number-related quantities such as time and currency amounts
Following the steps above, the position of numeric strings in Chinese text can be identified and the strings converted into specific numbers.
However, a numeric quantity in text is often more than a number and carries other information, for example a time ("thirteen forty-two") or an amount of money ("six kuai five mao"). In processing, such quantities need to be identified as a whole (the two examples above should be identified as 13:42 and 6.5 yuan); converting the numeric string into a number is only the first step.
The rest of the process relies on the positions of the numeric strings and measure words in the sentence, and on the characteristics of Chinese. There are essentially three principles:
In Chinese, a numeric string always appears in front of its measure word and immediately next to it;
A class of quantity has units of different sizes (such as yuan, jiao and fen); in Chinese the larger unit always comes first and the smaller unit after it (as in "six yuan one jiao nine fen");
If one quantity is expressed with several measure words of different sizes, the last measure word may be omitted (as in "six yuan one jiao nine", which omits the final "fen"); this is common in Chinese.
The identification of number-related quantities is illustrated here using amounts of money.
(1) Collecting the units of money
First, the common expression forms of the units of money are collected manually, grouped by unit size (this information can be stored in the system knowledge base as basic common knowledge and simply retrieved when needed, so the manual collection is a one-off task), and the sizes of the units are also collected and stored in the knowledge base. The result here is: A: yuan (元), kuai (块), yuan (圆); B: jiao (角), mao (毛); C: fen (分). A class-A unit is 10 times a class-B unit, and a class-B unit is 10 times a class-C unit.
(2) Computation
The computation follows the principles described above: the start and end positions of a number-related quantity are determined mainly from the positions of the numeric strings and measure words in the text, and the size of the whole quantity is calculated from the unit sizes stored in the knowledge base.
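The three principles and the unit table can be put together in a short Python sketch that computes an amount in fen. It is illustrative only: the name amount_in_fen and the regular expression are invented here, the input is assumed to be a span already located as an amount of money, and numerals are limited to Arabic digits or runs of Chinese digits read positionally.

```python
import re

# Unit sizes expressed in fen: class A = 10 x class B = 100 x class C.
UNIT_IN_FEN = {"元": 100, "块": 100, "圆": 100, "角": 10, "毛": 10, "分": 1}
CN_DIGITS = {ch: i for i, ch in enumerate("零一二三四五六七八九")}
CN_DIGITS["两"] = 2
# A numeric string immediately followed by an optional measure word.
TOKEN = re.compile(r"(\d+|[零一二三四五六七八九两]+)([元块圆角毛分]?)")

def _value(numeral):
    if numeral[0] in CN_DIGITS:   # Chinese digits read positionally, e.g. 二五 -> 25
        return int("".join(str(CN_DIGITS[c]) for c in numeral))
    return int(numeral)

def amount_in_fen(text):
    """Sum number/unit pairs; units must shrink, and the last unit may be omitted."""
    total, prev = 0, None
    for m in TOKEN.finditer(text):
        value, unit = _value(m.group(1)), m.group(2)
        if unit:
            size = UNIT_IN_FEN[unit]
        elif prev is not None and prev > 1:
            size = prev // 10     # omitted final measure word: the next smaller unit
        else:
            raise ValueError("number without a unit")
        if prev is not None and size >= prev:
            raise ValueError("units must go from large to small")
        total += value * size
        prev = size
    return total
```

For example, amount_in_fen("六块五毛") gives 650 fen (6.5 yuan), and amount_in_fen("六元一角九") gives the same 619 fen as the full form with the final measure word.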
In the overall process of semantic understanding, intelligent word segmentation is the first link: it extracts the core words that make up a statement for use by the semantic module. Whether segmentation supplies enough words for the analysis program to process properly, while filtering out redundant information, is an important precondition for the quality and speed of the later semantic analysis. The intelligent word segmentation of You Lika avoids the ambiguous combinations that traditional segmentation techniques produce when splitting text, and so provides good raw material for semantic understanding. During segmentation, synonyms in the knowledge base can also be matched one by one and submitted to the semantic understanding module together with the words, so that a processed sentence carries not only the original sentence pattern but also the concept part of the statement.
Step 9: Establish the mapping between natural-language meaning and computer actions, specifically by binding the core meaning to a computer operation and its operands.
Representation of meaning in the computer:
1. What the meaning expresses is a computer operation
The purpose of human-computer interaction is to have the computer carry out operations in a specific domain, so the meaning a person expresses is necessarily related to a computer operation. The meaning can therefore be represented by the corresponding computer operation.
A single operation executed by the computer comprises the operation itself and the content of the operation, which correspond respectively to the core meaning and the peripheral meaning in language:
(1) the core concept expresses the core meaning, and the core meaning expresses the operation the computer is to execute;
(2) the peripheral concepts express the peripheral meaning, and the peripheral meaning expresses the content on which the computer executes the operation.
Because an utterance expresses a complete meaning, and that meaning is a computer operation, the meaning is represented by the operation itself together with the operation content.
2. A computer operation is represented by an ordered sequence consisting of the operation itself and its operation content
(Operand): the number and types of the content items that the computer needs in order to execute a specific operation are called the operands of that operation. The operands of operation op are denoted PARAMop.
Different operations require different numbers and types of content items, and once the content items are associated with a specific operation, the order among them also matters.
In a software implementation, an operation corresponds to a software module, and executing an operation means calling the software module for that operation. The operands correspond to the parameters the module requires; different parameter values produce different results. The parameters of a software module are ordered: a wrong order will cause the module to fail or to return an incorrect result.
An operation of the computer can therefore be written as:
ACTION={op,param1,param2,...,paramn}
where ACTION denotes a complete operation, op is the operation itself, and paramk is the k-th content item of the operation. The items param1, param2, ..., paramn are ordered.
3. The complete meaning is represented by an ordered sequence consisting of the core meaning and the peripheral meanings
In human-computer dialogue, because what the meaning expresses is a computer operation, the complete meaning of an utterance can be represented, following the representation of a computer operation, by an ordered sequence consisting of the core meaning and the peripheral meanings:
SEMANTICS={Core,p1,p2,...,pn}
where SEMANTICS is the complete meaning of the utterance; Core is the core meaning, which corresponds to the computer operation; and pk is the k-th peripheral meaning, which corresponds to the content of the computer operation. Different core meanings require different numbers of peripheral meanings.
Note: p1, p2, ..., pn are ordered, and a different order yields a different meaning.
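The parallel between SEMANTICS and ACTION can be illustrated with a short Python sketch. The class names, the CORE_TO_OP table and the function to_action are hypothetical names introduced only for illustration; the sketch assumes a one-to-one binding from core meaning to operation and shows that the ordered peripheral meanings become the ordered operation content unchanged.

```python
from typing import NamedTuple, Tuple


class Semantics(NamedTuple):
    """SEMANTICS = {Core, p1, ..., pn}: core meaning plus ordered peripheral meanings."""
    core: str
    peripheral: Tuple[str, ...]


class Action(NamedTuple):
    """ACTION = {op, param1, ..., paramn}: operation plus ordered operation content."""
    op: str
    params: Tuple[str, ...]


# Hypothetical binding from core meaning to computer operation.
CORE_TO_OP = {"open": "op_open", "print": "op_print"}


def to_action(semantics):
    """Map a complete meaning onto the operation it expresses, preserving order."""
    return Action(CORE_TO_OP[semantics.core], semantics.peripheral)
```

Because tuples compare element by element, two meanings with the same items in a different order are unequal, which mirrors the note that a different order yields a different meaning.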
4. Collecting the operands
An operation is not merely a representation of data in the computer; more importantly, the computer must execute it. This involves two questions: (1) which operation the computer is to execute; (2) how the computer is to execute it.
For (1), the operations that a computer application system can execute in a specific domain are limited in number, and are written as:
OPd={op1,op2,...,opNd}
where d denotes a particular domain, OPd is the set of all operations the computer can execute in domain d, opk (1≤k≤Nd) is one specific operation in that domain, and Nd is the number of operations the computer can execute in domain d. At the system design stage the designer can derive these operations by analysing the information of domain d, so that when analysing a user's natural-language command the computer can, according to the formula above, map the natural-language expression to one of the operations it can execute (for example opn).
For (2), building on (1), the computer must identify from the natural language the content required by the corresponding operation (such as opn), that is, identify all the operands of the operation. Only then can the computer execute the operation.
The operands of each operation are limited in number and are stored in the computer as a domain knowledge base. The operands are generally collected manually.
Note that some operation content has a default value: if the user does not provide it, a preset value is used to complete the operation. Take the operation op_open, expressed by the concept "open" in the file-operation domain described earlier. It needs two content items: the first is the name of the file to be opened; the second is the software (that is, the tool) used to open the file. The first is required (otherwise it is not known which file to open), while the second is optional. If the user specifies the software, that software is used. If the user does not, then either the application system can designate a default program for opening files, or, if the application system runs on Windows, it can rely on the Windows file association mechanism, which binds file extensions to programs, and let Windows choose the most suitable program to open the file. For example, if the file extension is doc, Windows knows to start Microsoft Word to open it, Word being the most suitable program for doc files. In the implementation, this can be done by calling the Windows API function ShellExecute(Ex).
The type of operation content depends on the domain information. Commonly used types are: (1) plain text (format_text), (2) ordinary number (format_number), (3) date (format_date), (4) time (format_time), (5) currency (format_money), (6) file (format_file), (7) program (format_program), and so on.
The operands of operation op can therefore be expressed as:
PARAMop={p1,p2,...,pNop}
where pk (1≤k≤Nop) is one content item, specifying the content required and its type, and Nop is the number of content items of operation op.
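One possible way to hold such an operand description in software is sketched below in Python. It is an illustration under stated assumptions rather than the structure used by the invention: OperandSpec, PARAM and fill_operands are hypothetical names, the placeholder "<shell default>" merely stands for "let the platform choose", and recognized values are assumed to be keyed by content type.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class OperandSpec:
    description: str               # what content is needed
    type: str                      # content type, e.g. "format_file"
    default: Optional[str] = None  # None means the content is required


# Ordered operand list of op_open (assumption: values follow the text above).
PARAM = {
    "op_open": (
        OperandSpec("file to be opened", "format_file"),
        OperandSpec("program used to open the file", "format_program",
                    default="<shell default>"),
    ),
}


def fill_operands(op, recognized):
    """Return the ordered operand values, applying defaults where allowed.

    `recognized` maps a content type to the value found in the command text.
    """
    values = []
    for spec in PARAM[op]:
        value = recognized.get(spec.type, spec.default)
        if value is None:
            raise ValueError("missing required operand: " + spec.description)
        values.append(value)
    return tuple(values)
```

Keying the recognized values by type only works when the content items of one operation have distinct types; an operation with two items of the same type would need positional handling instead.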
5. Binding the core meaning to the computer operation and its operands
In the computer, the target text and the concepts it expresses are all stored as character strings. Consequently, after the concepts are extracted from the target text, what the computer obtains is still character strings, and it still does not know the specific semantics they express.
Two simple examples illustrate this:
Example: a few sentences encountered in daily life:
(1) Excuse me, what does one plus one equal?
(2) Please open the file "Natural Language Understanding Teaching Materials".
The two key concepts, "plus" and "open" (underlined in the original), each play the crucial role in their sentence (equivalent to the predicate): (1) the semantics of "plus" is to apply the mathematical operation "+" to the number before it and the number after it and to return the result (here 2); (2) the semantics of "open" is to have the computer start a program that reads a file and interprets the content and format it contains, displaying them on the screen in visual form. Which program to start and which file to open are indicated by the additional information in the sentence; some of this information has a default value and may therefore be omitted. The file to be opened here is "Natural Language Understanding Teaching Materials".
At present, however, most computers understand none of this. Given how a computer stores data, both example (1) and example (2) are stored as character strings, and the computer does not know what they mean or what operation should be carried out. Although the embodiment of the invention has a concept dictionary containing the definitions of common concepts, the computer still stores these concepts and their definitions as character strings. That is, for the computer to understand a concept (call it Concept; it is stored as a string, so understanding it amounts to understanding the semantics of that string), it must understand the concept's definition (call it Define; it too is stored as a string, so understanding it again amounts to understanding the semantics of a string). In the end, the implementation comes down to the computer understanding the semantics of a character string, as shown in Figure 8:
Recognizing the semantics of a core concept requires linking the core concept to its corresponding software module, that is, establishing the mapping between natural-language meaning and computer actions. The operations the computer needs to execute can be learned in advance from the information of the commercial domain, and the corresponding software modules designed accordingly.
A computer operation is usually expressed by a computer symbol. During concept recognition these symbols must be linked to specific concepts; the link is made by software "binding", which in a programming language is implemented with a "structure". Table 3 shows this structure:
Keyword (string representation) | Corresponding operation (op) | Corresponding software module (module) |
Table 3
Table 3: Binding structure of keyword, computer operation and software module.
Below is a concrete example from the file-operation domain.
Example: binding structure of several keywords, operations and corresponding software modules in the file-operation domain:
(1) Definition of the computer operations (four are defined initially, as follows):
op_open = open a file
op_close = close a file
op_edit = edit a file
op_print = print a file
OPSfile={op_open,op_close,op_edit,op_print}
(2) Design of the corresponding software modules, as follows:
module_open,module_close,module_edit,module_print
(3) Binding structure
Table 4: Binding structure of several operations in the file-operation domain
Keyword (string representation) | Corresponding operation (op) | Corresponding software module (module) |
"open" | op_open | module_open |
"close" | op_close | module_close |
"edit" | op_edit | module_edit |
"print" | op_print | module_print |
Table 4
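The binding structure of Table 4 can be pictured as a lookup table from keyword to operation and software module. The Python sketch below is an assumption-laden illustration: the module functions are stand-ins that only report what they were asked to do, and BINDINGS and dispatch are hypothetical names.

```python
# Stand-in software modules: each simply reports the action it received.
def module_open(*params):
    return ("opened",) + params


def module_close(*params):
    return ("closed",) + params


def module_edit(*params):
    return ("edited",) + params


def module_print(*params):
    return ("printed",) + params


# Keyword (string) -> (operation symbol, software module), as in Table 4.
BINDINGS = {
    "open": ("op_open", module_open),
    "close": ("op_close", module_close),
    "edit": ("op_edit", module_edit),
    "print": ("op_print", module_print),
}


def dispatch(keyword, *params):
    """Look up the keyword and call the bound module with the ordered operands."""
    op, module = BINDINGS[keyword]
    return op, module(*params)
```

Keeping the binding in data rather than in branching code means a new operation is added by inserting one row, which matches the table-driven presentation above.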
The operands are then associated with the operations. Because each operation has different operands, the operands must be linked to their corresponding operation; the binding structure is therefore redefined as follows:
Table 5: Binding structure (including operands) of several operations in the file-operation domain
Keyword (string representation) | Corresponding operation (op) | Corresponding operands (PARAMop) | Corresponding software module (module) |
"open" | op_open | (1) the file to be opened, type format_file; (2) the program used to open the file, type format_program (has a default value, selected automatically by the Windows Shell) | module_open |
"close" | op_close | (1) the file to be closed, type format_file | module_close |
"edit" | op_edit | (1) the file to be edited, type format_file; (2) the program used to edit the file, type format_program (has a default value, selected automatically by the Windows Shell) | module_edit |
"print" | op_print | (1) the file to be printed, type format_file | module_print |
Table 5
In Table 5, format_file, format_program and so on are the operation content types described above.
The process of obtaining the meaning by the prior-knowledge method is as follows. After the core concept has been identified in the natural-language command text, the operation corresponding to that core concept can be looked up, together with the operands the operation requires and the corresponding software module. The prior-knowledge algorithm makes full use of the format of the operation content: it builds an AC machine (Aho-Corasick automaton) whose pattern-string set consists of all items of prior knowledge, and then scans the command text with this AC machine. If a pattern string is output, it is the recognized prior knowledge; otherwise the user has not provided that information.
Sentence patterns and meaning
First, the linguistic forms of the commands commonly used in the commercial domain are manually abstracted into sentence patterns. When the computer begins to analyse a user command, it tries to assign the command to one of these sentence patterns and thereby extract the operation content. The common sentence patterns for the operation op_open ("open a file") described earlier include the following:
A. open + XXX (example: open "Digital Earth"), as shown in Table 6:
Keyword (operation, "open") | File feature (XXX, "Digital Earth") |
Table 6
B. 把 + XXX + open, where 把 is the Chinese marker that places the object before the verb (example: 把 "Digital Earth" open), as shown in Table 7:
Marker ("把", "将", etc.) | File feature (XXX, "Digital Earth") | Keyword (operation, "open") |
Table 7
C. with + YYY + open + XXX + this file (example: with Word open the file "Digital Earth"), as shown in Table 8:
Marker ("with", etc.) | Software (YYY, "Word") | Keyword (operation, "open") | File feature (XXX, "Digital Earth") |
Table 8
D. with + YYY + 把 + XXX + open (example: with Word 把 "Digital Earth" open), as shown in Table 9:
Marker ("with", etc.) | Software (YYY, "Word") | Marker ("把", "将", etc.) | File feature (XXX, "Digital Earth") | Keyword (operation, "open") |
Table 9
Here XXX and YYY are the operation content of op_open: XXX is the file to be opened, and YYY is the software used to open it (if the sentence pattern lacks this content, the default value is used).
Example: natural-language command = "Please open the file 'Natural Language Understanding Teaching Materials'";
Sentence-pattern matching finds that it belongs to type A: keyword (operation, "open") + file feature ("Natural Language Understanding Teaching Materials"). The operation content of op_open ("open") can therefore be identified: the file the user wants to open is "Natural Language Understanding Teaching Materials" (the file feature). The other content item (the software used to open the file) does not appear in the command text, so the default value is used, that is, the file is opened with the default software (or with software selected by Windows).
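The four sentence patterns of op_open can be approximated with regular expressions, as in the Python sketch below. This swaps in plain regex matching for the automaton-based matching of the invention, purely for illustration. It assumes the Chinese surface forms 打开 ("open"), 把 or 将 (object marker), 用 ("with") and 这个文件 ("this file"); PATTERNS and match_open are hypothetical names.

```python
import re

# More specific patterns are tried first so that D and C are not mistaken
# for B and A. Group "file" is XXX, group "program" is YYY.
PATTERNS = [
    ("D", re.compile(r"用(?P<program>.+?)(?:把|将)(?P<file>.+?)打开")),
    ("C", re.compile(r"用(?P<program>.+?)打开(?P<file>.+?)(?:这个文件)?$")),
    ("B", re.compile(r"(?:把|将)(?P<file>.+?)打开")),
    ("A", re.compile(r"打开(?P<file>.+?)(?:这个文件)?$")),
]

QUOTES = "“”\"'‘’ "


def match_open(command):
    """Return (pattern name, file feature, program or None), or None if no match."""
    for name, pattern in PATTERNS:
        found = pattern.search(command)
        if found:
            groups = found.groupdict()
            return name, groups["file"].strip(QUOTES), groups.get("program")
    return None
```

A program value of None signals that the default software should be used. File names that themselves contain a marker character would defeat this sketch.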
The algorithm here does not change the internal structure or the matching algorithm of the AC machine. Instead it takes the AC machine as its core and designs a layer of algorithms around it to handle gapped matching specifically. The approach has two parts:
First, the gapped strings (sentence patterns) are input into the finite-state automaton;
Second, after the finite-state automaton produces its output, the output patterns are reassembled into gapped strings.
The algorithm design mainly considers the position of each sub-pattern within the gapped pattern, and uses these positional relations to reorganize the output of the core finite-state automaton and produce the final output.
Suppose a gapped string A consists of 3 substrings (of the form A1...A2...A3). Before the AC machine is built, A1, A2 and A3 are each added to the core AC machine as pattern strings, and the core AC machine is then built. When the target text is matched, if the core AC machine outputs the three pattern strings A1, A2, A3 in that order (they need not be adjacent in position, but the output order is essential), the peripheral software recombines them into the gapped string and outputs it. This structure is shown in Figure 9.
During matching, A1, A2 or A3 may also occur on their own; in that case the gapped AC machine disposes of them directly. Also, for the core AC machine, if a pattern string p has been added n times (whether p is an ordinary string or a substring of a gapped string), then whenever p occurs in the target text it is output n times at the same offset. This is one of the basic principles that allow a gapped-matching finite-state automaton of this structure to work correctly.
Following this design, the two key algorithms of SW-AC are: (1) on input, split each gapped pattern string into ordinary pattern strings and add them to the core AC machine; (2) on output, combine the ordinary pattern strings that were output into gapped pattern strings wherever possible, taking the output order of the pattern strings into account.
In this way, a gapped-matching finite-state automaton can recognize the sentence pattern, and once the sentence pattern is recognized the meaning of the sentence can be obtained by the method described above.
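The two-layer design (an unchanged core automaton plus a peripheral recombination step) can be sketched in Python as follows. The core is a compact textbook Aho-Corasick automaton; the peripheral function match_gapped is a simplified stand-in for the SW-AC output algorithm, which the text does not specify in detail. All class and function names are hypothetical.

```python
from collections import deque


class AhoCorasick:
    """Core AC machine; a pattern added n times is reported n times per hit."""

    def __init__(self, patterns):
        self.goto, self.fail, self.out = [{}], [0], [[]]
        for pattern in patterns:
            state = 0
            for ch in pattern:
                if ch not in self.goto[state]:
                    self.goto.append({})
                    self.fail.append(0)
                    self.out.append([])
                    self.goto[state][ch] = len(self.goto) - 1
                state = self.goto[state][ch]
            self.out[state].append(pattern)
        queue = deque(self.goto[0].values())   # breadth-first failure links
        while queue:
            s = queue.popleft()
            for ch, t in self.goto[s].items():
                queue.append(t)
                f = self.fail[s]
                while f and ch not in self.goto[f]:
                    f = self.fail[f]
                self.fail[t] = self.goto[f].get(ch, 0)
                self.out[t] = self.out[t] + self.out[self.fail[t]]

    def scan(self, text):
        """Yield (start offset, pattern) for every occurrence in text."""
        state = 0
        for i, ch in enumerate(text):
            while state and ch not in self.goto[state]:
                state = self.fail[state]
            state = self.goto[state].get(ch, 0)
            for pattern in self.out[state]:
                yield i - len(pattern) + 1, pattern


def match_gapped(gapped_patterns, text):
    """Peripheral layer: split gapped patterns on input, recombine on output.

    gapped_patterns maps a name to its ordered substrings [A1, A2, ...].
    """
    pieces = [p for parts in gapped_patterns.values() for p in parts]
    hits = sorted(AhoCorasick(pieces).scan(text))
    matched = []
    for name, parts in gapped_patterns.items():
        position = 0
        for part in parts:
            starts = [s for s, p in hits if p == part and s >= position]
            if not starts:
                break
            position = starts[0] + len(part)   # next piece must come later
        else:
            matched.append(name)
    return matched
```

Taking the earliest admissible occurrence of each piece is enough to decide whether the pieces appear in order; a fuller implementation would also return the gap contents (the XXX and YYY of the sentence patterns).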
Obtaining the meaning from prior knowledge:
Besides recognizing natural-language text by sentence patterns, the invention also adopts some new processing methods that are not limited to sentence patterns: the prior knowledge of each operation can be used to design the algorithm.
With sentence-pattern constraints relaxed, users can express commands more naturally and the computer's algorithm can be more flexible. In this case the various kinds of prior knowledge, such as the number and types of the operation content, must be fully exploited.
After the keyword has been identified in the natural-language command text, the operation corresponding to that keyword can be looked up, together with the operands the operation requires and the corresponding software module. This prior knowledge can then be used to design an algorithm that identifies the operands in the command text.
This project mainly uses one piece of prior knowledge to design the algorithm: that the content items of an operation have different types.
An operation has several content items, each with a different type, and their forms of expression in natural language also differ. The items can therefore be identified in the command text one by one according to the required type; for example, text (format_text), date (format_date) and time (format_time) all look different.
Different types of operation content are recognized by different methods, and the design must take the recognition method for each type fully into account. The recognition flow is as follows:
Algorithm: operand recognition based on prior knowledge (differing operation content types)
procedure ParamRecognize
BEGIN
    I := 0
    FOR I := 1 TO OP.Number    // OP.Number is the number of operands of operation OP
    BEGIN
        identify the I-th operation content according to its type
    END
    NEXT I
END
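The loop above can be made concrete with a table of per-type recognizers. The Python sketch below is illustrative only: it covers just two content types, assumes ISO-style dates and 24-hour times, and the names RECOGNIZERS, recognize_date, recognize_time and param_recognize are hypothetical.

```python
import re


def recognize_date(text):
    """Find a date written as YYYY-MM-DD (assumed surface form)."""
    found = re.search(r"\d{4}-\d{2}-\d{2}", text)
    return found.group() if found else None


def recognize_time(text):
    """Find a time written as H:MM or HH:MM (assumed surface form)."""
    found = re.search(r"\d{1,2}:\d{2}", text)
    return found.group() if found else None


# One recognizer per operation content type.
RECOGNIZERS = {"format_date": recognize_date, "format_time": recognize_time}


def param_recognize(operand_types, text):
    """Identify each operand in order, according to its type."""
    return [RECOGNIZERS[content_type](text) for content_type in operand_types]
```

For a hypothetical reminder operation with operand types ["format_date", "format_time"], the call param_recognize(types, text) returns the values in operand order, with None marking content the user did not provide.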
The basic assumption that makes type-based recognition of operation content workable is that the content items of each operation differ in type. That is, for a specific operation OP, the content items p1, p2, ..., pNop in the formula all have different types, or only a few share a type. This assumption is reasonable in many application domains. In the "file operation" domain described earlier, each operation needs 1 or 2 operands, and for every operation the content types are of two kinds: (1) file (format_file) and (2) program (format_program). The algorithm tries to identify content of the format_file and format_program formats in the input text. The format_file content is required, whereas the format_program content has a default value in every operation, so if no format_program content is successfully identified, the default value can be used. To recognize format_file content, the embodiment of the invention uses approximate matching: the command text entered by the user is matched approximately against the file names stored in the computer's local storage, and the file with the smallest distance from the command text is returned (as described below, this computation is carried out by the "instance recognition algorithm").
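The approximate matching of a file feature against local file names might look like the Python sketch below. It substitutes the similarity ratio of the standard library module difflib for the distance measure of the "instance recognition algorithm", which is described elsewhere in the document and not reproduced here; recognize_file is a hypothetical name.

```python
import difflib


def recognize_file(file_feature, local_files):
    """Return the local file name most similar to the feature, or None."""
    # cutoff=0.0 keeps every candidate, so the single best match is returned
    # whenever the list of local files is not empty.
    matches = difflib.get_close_matches(file_feature, local_files, n=1, cutoff=0.0)
    return matches[0] if matches else None
```

In practice a non-zero cutoff would probably be preferable, so that a command naming no existing file yields None and the system can ask the user instead of opening an unrelated file.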
A more typical example is the application of natural language understanding to stock trading, which has operations such as "buy" and "sell". Each needs 3 content items (all of which are prior knowledge of the stock domain): (1) the stock name, (2) the desired transaction price, and (3) the number of shares to trade. The three items have different formats: (1) the stock name (format_text), (2) a monetary price (format_money), and (3) a numeric quantity (format_number). When analysing the input text, the embodiment of the invention tries to identify content in these three formats in the command text: (1) the stock name is a character string; (2) the desired transaction price is a numeric quantity expressing a price (it may be a decimal and carries a unit such as yuan or jiao); (3) the number of shares is an integer, usually in units of "lots" (1 lot is 100 shares). Recognizing content in these three formats is fairly simple, and the algorithm flow is as follows:
Algorithm: operation content recognition in stock trading
procedure ParamRecognizeInStock
BEGIN
    FLOAT price := identify the price using the currency recognition algorithm
    IF (IsInvalid(price))    // the price is invalid
    BEGIN
        MessageBox("Please provide the transaction price.")
        RETURN
    END
    INTEGER count := identify the number of shares to trade using the number recognition algorithm
    IF (IsInvalid(count))
    BEGIN
        MessageBox("Please provide the number of shares to trade.")
        RETURN
    END
    STRING name := identify the stock name using the stock name recognition algorithm
    IF (IsInvalid(name))
    BEGIN
        MessageBox("Please provide the stock name.")
        RETURN
    END
END
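A runnable counterpart of this procedure is sketched below in Python. It is a simplification under several assumptions: the price is written with Arabic digits followed by 元, the quantity is written in lots followed by 手, stock names come from a known list, and the two stock names shown are invented placeholders. Instead of a message box, the sketch raises ValueError with the same prompts.

```python
import re

# Placeholder stock names for illustration only (not real securities).
KNOWN_STOCKS = ["甲股份", "乙科技"]


def recognize_trade(command, known_stocks=KNOWN_STOCKS):
    """Return the stock name, price in yuan and number of shares of a command."""
    price = re.search(r"(\d+(?:\.\d+)?)元", command)    # format_money
    if price is None:
        raise ValueError("Please provide the transaction price.")
    lots = re.search(r"(\d+)手", command)               # format_number
    if lots is None:
        raise ValueError("Please provide the number of shares to trade.")
    name = next((s for s in known_stocks if s in command), None)  # format_text
    if name is None:
        raise ValueError("Please provide the stock name.")
    return {
        "name": name,
        "price": float(price.group(1)),
        "shares": int(lots.group(1)) * 100,             # 1 lot is 100 shares
    }
```

The checks run in the same order as in the procedure (price, quantity, name), so the first missing item determines which prompt the user sees.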
Step 12: A computer operation is represented by an ordered sequence consisting of the operation itself and its operation content, and the complete meaning is represented by an ordered sequence consisting of the core meaning and the peripheral meanings. Multiple concepts are assembled into the complete meaning, and the computer understands the complete meaning and derives the operation.
Global context and local context:
Linguistic context (Context), also called "surrounding text" (some translate it as "context" directly from its English form), refers to the various factors related to communication when people communicate in natural language, that is, the "environment". It is taken here to comprise two aspects: the global environment and the local environment.
The global environment refers to factors closely tied to the communication process as a whole: the purpose of the communication, the domain in which it takes place, its topic, and so on. These factors form the environment of the whole communication process, and therefore influence the whole process as well as many of its details, such as the manner of expression.
The local environment is the language environment of each individual utterance in the communication, including factors such as the preceding text, the following text, the manner of expression, the semantics expressed, and intonation. The local environment is thus part of the content of the communication itself, expressed by the two parties in various ways. For the listener, it helps in understanding the exact semantics of each utterance; for the speaker, it helps in organizing the communication (manner of expression and so on). Listener and speaker here exchange roles. In a dialogue, the local context consists only of the content the two parties have already expressed, that is, the "preceding text"; in communication by written article, the local context of an utterance also includes the content that follows it in the article, that is, the "following text".
The global environment influences the semantics expressed by both parties, their manner of expression, and also the topic of the communication.
The local environment helps the listener understand the exact semantics of each utterance and can eliminate misunderstandings that might arise from, for example, the speaker's idiosyncratic manner of expression, thereby resolving ambiguity and letting the communication proceed smoothly. Because of the effect of context, some linguistic forms may express semantics they do not have in themselves (or that are rarely used in general). This is common in everyday communication. Take "555": in itself it expresses a number (the natural number between 554 and 556). But because a brand of cigarettes is named "555", and the brand is well known, in some contexts "555" refers to that cigarette (as in "555 is very pleasant to smoke"). Furthermore, in Chinese pronunciation "555" is a homophone of the sobbing sound "wu wu wu", and is quicker and easier to type on a keyboard. To suit the fast, terse style of the network era, many people who want to post "wu wu wu" on the network simply write "555" instead (especially on BBS), so "555" has acquired this meaning too, particularly on the network (if it appears there on its own, in most cases it can be taken to mean "wu wu wu"). With wider use, people also use "555" in this sense when sending e-mail, or even when writing letters, and the other party understands it. "911" is a similarly good example.
In fact there is no sharp boundary between the global and local contexts; the embodiment of the invention distinguishes them only for the convenience of computer processing.
4.1 Acquiring the context:
The local context is itself part of the communication, so it must be obtained directly from the content of the communication, and the computer must do this automatically. In an actual dialogue it comprises, for each utterance, that utterance and all the conversation before it; in article-style communication it comprises the content of the whole article. What needs most attention is the semantics of this content and its manner of expression, especially the semantics, which helps in using the context for semantic disambiguation. In an implementation, the results of semantic analysis at every level can serve as context, and the forms of linguistic expression are also part of the context. As much information as possible should be included: for an article, the author, title, abstract, keywords, body text, references, and even the publication date can all be treated as context. For face-to-face (spoken) communication, besides the semantics, the speaker's form of expression, intonation, speaking rate, voice, omissions and other such information are all very important.
Because of its nature, the global context can be processed in advance and stored as a knowledge base (this work can be done automatically by the computer or by manual collection and database building). Domain information, for example, is part of the global context and can be prepared in advance. In a practical example such as an application system that controls automatic summarization by voice, the following information can be collected in advance: the location where the documents to be summarized are stored; the default length of a summary (a percentage, a specific value, or determined from the text content); the possible emphasis settings when summarizing; the common forms of expression of the summarization command; certain idioms, abbreviations and fuzzy concepts (such as "latest" and "recent"); and so on. For some of this information, manual collection and database building is the simpler option.
Some systems are designed for a specific domain and can only work in that domain. In that case an important component of the global environment is the basic information of the domain, such as common expressions, abbreviations, forms of expression, and the semantics that certain words carry in the domain. Collecting this information manually is laborious and never complete. The computer can instead collect it automatically (that is, by learning): a good learning algorithm is designed, and a large number of typical articles from the domain are fed to the computer as examples. The domain information is thereby collected automatically and the learning results accumulate over time, which saves much manual work and reduces omissions [93].
4.2 Representing the context:
The context should be represented in a way that is convenient for the computer to use and, as far as possible, easy for people to read. The most commonly used and most needed context knowledge is handled first.
Here too, the invention handles the context in two parts, the global context and the local context.
1. Representation of the global context
The global context mainly comprises domain knowledge, domain characteristics, the topic of the communication, and so on. The embodiment of the invention represents the various kinds of knowledge in different ways, of which the two most important are:
(1) A "table" stores the semantics of domain-specific terms and related knowledge [100] [129]. In the computer domain, for example, "virus", "net", "object-oriented" and "language" all have specific meanings, whereas in other domains they may mean something else. The representation is shown in Table 10:
Term knowledge in a domain expressed as a "table" (taking the computer application domain as an example)
Term | Semantics |
…… | …… |
Virus | A program that is infectious, destructive, latent and concealed |
…… | …… |
Object-oriented | A method of system design and construction, also commonly used in programming languages |
…… | …… |
Net | The Internet |
…… | …… |
Language | The symbols and rules used for computer programming, such as the C language and BASIC |
…… | …… |
Table 10
The abbreviations, slang and homophonic expressions currently popular on the network can also be organized in "table" form so that they are easy to extend, modify and delete. The left column of the table stores the term (such as "555"), and the right column stores the meaning of that term in the network field (for "555", the sound of sobbing, indicating distress).
For ease of processing, the left column of the table is sorted, so that a binary search can be used at query time, which greatly shortens lookup time. The right column must be reordered together with the left column so that each meaning stays paired with its term.
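The sorted table and binary search described above can be sketched as follows. This is only a minimal illustration: the entries are paraphrased from Table 10, and the names `TERM_TABLE`, `TERMS` and `lookup` are assumptions introduced here, not part of the described system.

```python
from bisect import bisect_left

# Hypothetical term table for the computer field (entries paraphrased from Table 10).
TERM_TABLE = [
    ("virus", "a program that is infectious, destructive, latent and concealed"),
    ("net", "the Internet"),
    ("language", "a symbol system used for computer programming, such as C or BASIC"),
    ("object-oriented", "a method of system design and construction"),
]

# Sort whole rows by term, so the right column is reordered together with the left.
TERM_TABLE.sort(key=lambda row: row[0])
TERMS = [term for term, _ in TERM_TABLE]  # the sorted left column


def lookup(term):
    """Return the field-specific meaning of `term`, or None if it is not in the table."""
    i = bisect_left(TERMS, term)          # binary search on the sorted left column
    if i < len(TERMS) and TERMS[i] == term:
        return TERM_TABLE[i][1]
    return None
```

Sorting rows rather than two separate columns is one simple way to guarantee that each meaning stays attached to its term.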
2. A "rule" represents context knowledge of an inferential nature; such inference is confined to the field in which the exchange takes place. In the example of a tutor and a student discussing a thesis, some global-context knowledge is represented as "rules":
……
……
2, Representation of the local context
The local context mainly covers the topic of the exchange, the semantics of the utterances, their forms of expression, and so on. The present system stores the result of language understanding (that is, the semantics) directly as a data structure and extracts the topic from the exchange. In an actual dialogue system only the preceding context exists, so what is mainly handled is the preceding utterances with their semantics and linguistic forms, together with the operations carried out as a result and their outcomes; these are represented as frames. Figure 10 shows the frame representation of two executed user commands in "realization of the autoabstract system".
4.3 Use of context
The computer can imitate the way people use context. In the system of the embodiment of the invention, context information is mainly used for the following tasks:
1, Concept assembly
During concept assembly, context is used to assemble suitable concept forms, in particular to choose the appropriate one according to the current context information when several are possible. For example, "research life" can be assembled in two basic ways, "postgraduate | life" and "research | life"; as a preliminary approach, the choice is made here by counting how often each form occurs in the original text.
2, Semantic disambiguation
Here the global context, and especially its domain knowledge, is used to the full to determine the semantics of linguistic forms that are ambiguous, choosing the meaning closest to the specific field.
3, Extracting information from context
This applies mainly to dialogue. Chinese often omits sentence constituents, which can generally be recovered from the context information. Take the command execution in "realization of the autoabstract system" described earlier. The user's first command is "please open the file 'natural language understanding teaching materials' and make a 200-word summary", which contains sufficient information about the content of the operation (the executing system takes its default value: the Chinese and English autoabstract system of Shanghai Jiao Tong University). The later command "too short; please increase by 500 words" omits several constituents. The article to be summarized is omitted; the context shows that it is still the article just summarized, namely "natural language understanding teaching materials". The executing system is also omitted; it should again be the autoabstract system just used, namely the "Chinese and English autoabstract system of Shanghai Jiao Tong University".
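Recovering omitted constituents from the preceding context can be illustrated with a small frame structure. The sketch below is an assumption made for illustration only: the class `CommandFrame`, its field names and the function `fill_from_context` are hypothetical and are not taken from Figure 10.

```python
from dataclasses import dataclass, replace
from typing import Optional


@dataclass(frozen=True)
class CommandFrame:
    """Hypothetical frame for one user command (field names are assumptions)."""
    operation: str
    document: Optional[str] = None   # article to be summarized
    system: Optional[str] = None     # executing system
    length: Optional[int] = None     # requested summary length in words


def fill_from_context(current, history, slots=("document", "system")):
    """Fill slots omitted in `current` from earlier frames, newest first."""
    filled = current
    for previous in reversed(history):
        for slot in slots:
            if getattr(filled, slot) is None and getattr(previous, slot) is not None:
                filled = replace(filled, **{slot: getattr(previous, slot)})
    return filled


# The first command states everything; the follow-up omits document and system.
first = CommandFrame("summarize", "natural language understanding teaching materials",
                     "Chinese and English autoabstract system", 200)
follow_up = CommandFrame("lengthen summary")
resolved = fill_from_context(follow_up, [first])
```

Only the slots listed in `slots` are inherited, so a value such as the summary length is not silently copied from the earlier command.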
Step 13~14: Depending on the operation, the content of the operation is identified from the command text and the resource it refers to is located. Resource location involves two things: (1) determining which directory the file is in; (2) determining the file name. It is essentially a process of traversing the directory tree and matching file names, and is implemented algorithmically. In detail, a finite-state automaton extracts the keywords in a title: the keyword dictionary is taken as the pattern-string set and fed into the longest-match finite-state automaton. A finite-state automaton then performs "extraction on demand": every attribute value of every resource is added to the automaton as a pattern string, the automaton is built and used to scan the command text, and it outputs the resource attribute values that occur in the command string. The corresponding resource can then be located from the attribute values.
Resource location when the computer executes an operation:
Different types of operation content require different handling. Content of numeric (format_number), date (format_date), time (format_time) and similar types only needs to be recognized, whereas content of file and program types requires the corresponding file or program to be located in the computer and invoked. In this sense, operation content can be divided into direct operation content (direct content for short) and indirect operation content (indirect content for short). Here, the data involved in an operation are collectively called "resources", and the process of locating indirect content is called "resource location". The processing of direct and indirect content is shown in Figure 11, where a dotted arrow indicates the content transmitted along the corresponding solid arrow.
For convenience, the resource that an item of operation content points to is hereinafter called the resource of that operation content; for example, the file corresponding to a file name is called the resource of that file name.
For indirect content, the location of the referenced resource differs with its type (it may be data in a database, a file in the file system, or a remote resource such as a network resource), and so does the method of locating it (a database resource requires opening the database and retrieving the data; a network resource requires searching the network; and so on). What follows describes the method used here for locating file resources in the local file system, and the design of the resource location algorithm.
5.1 Resource search
Suppose the resource referred to by the indirect content is a file. Resource location then involves two things: (1) determining which directory the file is in; (2) determining the file name.
In a computer, the file system is generally organized into directories, and the directory structure in common use today is a tree: the root node of the tree is the root directory of the file system, every node is a directory, and each directory can hold a number of files. Figure 12 shows a typical file system structure. Locating a file is therefore a process of traversing the directory tree and matching file names, and the traversal can be implemented with a queue.
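A queue-based traversal of the directory tree might look like the sketch below. It is a minimal illustration under the assumption that an exact file-name match is wanted; the function name `find_file` is introduced here and is not taken from the described system.

```python
import os
from collections import deque


def find_file(root, filename):
    """Breadth-first search of the directory tree under `root` for `filename`.

    Returns the full path of the first match, or None if there is no match.
    """
    queue = deque([root])                      # directories still to be visited
    while queue:
        directory = queue.popleft()
        try:
            entries = sorted(os.scandir(directory), key=lambda e: e.name)
        except OSError:
            continue                           # unreadable directory: skip it
        for entry in entries:
            if entry.is_dir(follow_symlinks=False):
                queue.append(entry.path)       # visit this sub-directory later
            elif entry.name == filename:
                return entry.path              # directory and file name are now both known
    return None
```

Because every directory has to be visited until a match turns up, the cost grows with the number of files, which is the slowness discussed in the next section.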
5.2 Resource library
The preceding section described locating the resource of indirect content by resource search, using file resources as the example. That method has two obvious shortcomings:
(1) It is slow
Resource location has to search the local computer or a remote network after the natural language command is issued. If the local computer holds few resources, the response is fast enough; but if it holds very many (for example, many files in the file system), the search inevitably slows down and no longer meets requirements. Searching a remote network involves several stages, such as the network itself, and is slower still. The problem revealed here is that the resources have not been preprocessed;
(2) It can only search on simple resource characteristics
Characteristics that a resource exhibits directly and that are easy to obtain can be searched simply and quickly. Characteristics that are complex and require some processing involve a great deal of computation during the search, which greatly reduces response speed. Take document retrieval in the file system: the file name is an exhibited characteristic, and searching on it is simple. But if the search condition depends on more complex characteristics such as the author, keywords, title, abstract or subject terms of the file, the file content has to be processed, which slows the response.
For (1), because the resources are not classified beforehand, search time is wasted on many irrelevant resources. In a text search, for example, the files sought are not program files or binary executables; confining the search to text documents saves a great deal of time. For resources of the same type, sorting or indexing can further improve processing speed significantly;
For (2), the processing that complex characteristics require mostly involves intelligent processing. Since the level of intelligence of computers is still relatively low, having the computer do this during the search takes a long time and is repeated on every search. If the intelligent processing is completed before the search, much search time is saved. Moreover, the characteristics of a resource, such as a document's author, creation time and keywords, generally stay unchanged. They can therefore be prepared in advance, manually or by computer, and simply looked up during the search. This removes a large amount of work from the search itself, improving response speed while still allowing complex resource characteristics to be handled.
A better solution that addresses both (1) and (2) is to build a library of the resources, called a resource library. On the one hand, only the resource types of interest need to be processed when building the library; other resources are simply not stored (different types of resource can also be placed in different libraries). For the files above, for example, only text documents are entered into the library and other files (such as program files) are left out, which in effect classifies the resources. On the other hand, both the simple and the complex characteristics of a resource can be processed in advance (manually, automatically by computer, or by a combination of the two) and stored in the library, so that they can be looked up directly at search time. Furthermore, the resources in the library can be sorted and indexed on their characteristics, which speeds up resource location.
An example library format is given in Table 11, which shows a library structure for text documents. A library of this structure can be used in application systems for text document processing.
Filename | Document title | Document time | Document author | Document keywords |
… | … | … | … | … |
A.DOC | Realization of computer intelligent summarization based on natural language understanding | 2007.10 | Ma Zhankai et al. | natural language understanding, man-machine dialogue, sentence pattern |
B.DOC | Knowledge-based natural language man-machine interface for slideshows | 2007.5 | Ma Zhankai et al. | natural language understanding, man-machine dialogue, sentence pattern |
C.DOC | Research and implementation of automatic recognition of numerals in Chinese text | 2007.3 | Ma Zhankai et al. | natural language understanding, digit recognition |
… | … | … | … | … |
Table 11
Another benefit of building a library is that the resources the system needs can be classified internally (for text documents, for example, the field of each document can be recorded, such as news, science and technology, or prose), which makes them easier for the system to process.
The extra work that the library approach brings is maintenance of the resource library, mainly adding and deleting entries. Because many characteristics of a resource do not change, the library can be prepared in advance, either manually or by computer.
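The idea of preparing resource characteristics in advance and indexing them can be sketched as follows. The rows are modelled on Table 11, but the field names, the splitting of the keyword column into separate keywords, and the helper names `build_index` and `find` are assumptions made for illustration.

```python
from collections import defaultdict

# Rows modelled on Table 11; field names and keyword splitting are assumptions.
LIBRARY = [
    {"file": "A.DOC", "time": "2007.10", "author": "Ma Zhankai",
     "keywords": ["natural language understanding", "man-machine dialogue", "sentence pattern"]},
    {"file": "B.DOC", "time": "2007.5", "author": "Ma Zhankai",
     "keywords": ["natural language understanding", "man-machine dialogue", "sentence pattern"]},
    {"file": "C.DOC", "time": "2007.3", "author": "Ma Zhankai",
     "keywords": ["natural language understanding", "digit recognition"]},
]


def build_index(library, attributes=("time", "author", "keywords")):
    """Map each (attribute, value) pair to the set of files that carry it."""
    index = defaultdict(set)
    for row in library:
        for attribute in attributes:
            values = row[attribute]
            if isinstance(values, str):
                values = [values]              # single-valued attribute
            for value in values:
                index[(attribute, value)].add(row["file"])
    return index


INDEX = build_index(LIBRARY)                   # built once, before any search


def find(attribute, value):
    """Return the files carrying `value` for `attribute` without scanning the library."""
    return sorted(INDEX.get((attribute, value), ()))
```

Adding or deleting a library entry then only requires updating the affected index entries, which corresponds to the maintenance work mentioned above.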
5.3 Resource location algorithm
The resource location algorithm determines whether a given resource is the one the user is referring to; it is a relatively fine-grained operation.
A resource has attributes of many kinds, and each attribute value is the projection of the resource along that attribute. When people specify a resource in natural language, they always do so by giving attribute values of the resource, and generally only some of them rather than all. Resource location therefore has to identify a resource from a subset of its attribute values.
Resource identification is the process of identifying a resource from its attributes; the algorithm module that implements this function is called the resource locator.
A typical resource is a document, which has attributes such as file name, path, title, author and time, as shown in Figure 13. When referring to a document, people often use attribute values such as its title, author or time; "open the digital earth document", for example, uses the value of the title attribute.
Two basic principles govern resources and their attributes:
(1) Different resources of the same type may have the same value for the same attribute (but two resources cannot agree on all attribute values, otherwise they are the same resource). If the user specifies one attribute value, every resource associated with it may be the resource the user means.
(2) The order of attributes is not significant. When specifying a resource by its attributes, users generally pay no attention to the order of the attributes either.
In daily life, the name of some resources is too long to remember easily (the name is treated here as one characteristic of the resource, the name attribute). People often remember only part of the name and have to use that part to specify the resource. What they remember in such cases is usually the keywords in the name.
Consider an office clerk given the order "give me that document called 'digital' something". The user has forgotten the full name of the document and remembers only that it contains the word "digital", probably near the front of the title, and so refers to the document as "digital something". The clerk goes to the document library to look. 1. He finds the "digital earth" document, which clearly meets the requirement, and pulls it out; 2. he then finds the "digital living" document, which also meets the requirement; 3. he then finds the "strange numeral" document, which is also related to the order, though less so. He therefore hands over all three documents, with 3. placed last.
To refine this: if the name of a resource is regarded as an object (the name object), and the keywords in the name (as noted above, what the user remembers is usually the keywords) are all extracted and each treated as an unnamed attribute value of the name object, then, ignoring the order of keywords within the name (that is, ignoring the order of the unnamed attributes), resource names can also be identified with the resource location algorithm.
Extracting the keywords in a name is fairly simple and can be done with a finite-state automaton. The keyword dictionary is taken as the pattern-string set and fed into the longest-match finite-state automaton described earlier; the automaton is built and used to scan the text of the resource name, and it outputs those keywords of the dictionary that occur in the name, achieving automatic keyword extraction. In other words, keyword extraction (that is, building the attribute values of the name object) can be carried out automatically by computer. The invention does exactly this; to keep the system small, the expression forms of the concepts in the concept dictionary described earlier are combined directly into the keyword dictionary.
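Longest-match keyword extraction from a resource name can be sketched with a character trie. This is a simplified stand-in for the longest-match finite-state automaton referred to above, whose exact construction is not given in this section; the function names and the sample keyword dictionary are assumptions.

```python
def build_trie(keywords):
    """Build a character trie; the key None marks the end of a keyword."""
    root = {}
    for word in keywords:
        node = root
        for ch in word:
            node = node.setdefault(ch, {})
        node[None] = word
    return root


def extract_keywords(text, trie):
    """Scan `text` left to right, emitting the longest keyword found at each position."""
    found, i = [], 0
    while i < len(text):
        node, longest, j = trie, None, i
        while j < len(text) and text[j] in node:
            node = node[text[j]]
            j += 1
            if None in node:
                longest = node[None]       # remember the longest match so far
        if longest:
            found.append(longest)
            i += len(longest)              # continue after the matched keyword
        else:
            i += 1
    return found


# Hypothetical keyword dictionary and resource names.
TRIE = build_trie(["digital", "digital earth", "earth", "life"])
title_keywords = extract_keywords("the digital earth document", TRIE)
other_keywords = extract_keywords("strange digital life", TRIE)
```

Each extracted keyword can then serve as one unnamed attribute value of the name object.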
The design of the resource location algorithm is described below.
The resource location algorithm has to solve two key problems:
First, how to identify which resource attribute values occur in the command text?
Second, since resources may share attribute values, how to determine the resource the user means from the attribute values that occur in the command text?
Identifying the resource attribute values that occur in the command text
One method is to divide the text strings in the command that express resource attributes into attribute classes according to attribute type, and then, within each class, to match that attribute of every resource against the text of that type occurring in the command. For example:
Example: command: "open the 'natural language teaching materials' written by Ma Zhankai"
The analysis first assigns the string "Ma Zhankai" to the author class and the string "natural language teaching materials" to the document-title class. "Ma Zhankai" is then matched against the author column of the document library and "natural language teaching materials" against the document-title column. The matching result might be:
(1) document "natural language teaching materials": both the document title and the author occur in the command text;
(2) document "digit recognition method": the document title does not occur in the command text, but the author does.
Here a finite-state automaton performs "extraction on demand": every attribute value of every resource is added to the automaton as a pattern string, the automaton is built and used to scan the command text, and it outputs the resource attribute values that occur in the command string. The corresponding resource can then be located from the attribute values.
For example, after every attribute value of the document resources has been added to an AC automaton as a pattern string, scanning the input text "open the 'natural language teaching materials' written by Ma Zhankai" makes the AC automaton output:
(1)
Ma Zhankai: author attribute; resources containing it:
1. "natural language teaching materials"
2. "digit recognition method"
(2)
Natural language teaching materials: document-title attribute; resources containing it:
1. "natural language teaching materials"
That is, the command text contains two attribute values of the resource "natural language teaching materials", the author and the title, and also one attribute value of the resource "digit recognition method", the author. This method is fairly simple to implement; it is the one adopted in the present system, and it has given good results.
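The "extraction on demand" scan can be sketched with a small multi-pattern automaton. The sketch assumes that the "AC machine" in the text is an Aho-Corasick automaton; the class below is a generic textbook construction, not the improved automaton of the invention, and the document records and payload layout are hypothetical.

```python
from collections import deque


class AhoCorasick:
    """Minimal Aho-Corasick automaton; each pattern carries an arbitrary payload."""

    def __init__(self, patterns):
        self.goto = [{}]      # goto[state][char] -> next state
        self.fail = [0]       # failure links
        self.out = [[]]       # (pattern, payload) pairs emitted at each state
        for pattern, payload in patterns:
            state = 0
            for ch in pattern:
                if ch not in self.goto[state]:
                    self.goto.append({})
                    self.fail.append(0)
                    self.out.append([])
                    self.goto[state][ch] = len(self.goto) - 1
                state = self.goto[state][ch]
            self.out[state].append((pattern, payload))
        # Breadth-first pass computing failure links and merged outputs.
        queue = deque(self.goto[0].values())
        while queue:
            state = queue.popleft()
            for ch, nxt in self.goto[state].items():
                queue.append(nxt)
                f = self.fail[state]
                while f and ch not in self.goto[f]:
                    f = self.fail[f]
                self.fail[nxt] = self.goto[f].get(ch, 0)
                self.out[nxt] = self.out[nxt] + self.out[self.fail[nxt]]

    def scan(self, text):
        """Return the (pattern, payload) pairs for every pattern occurring in `text`."""
        state, hits = 0, []
        for ch in text:
            while state and ch not in self.goto[state]:
                state = self.fail[state]
            state = self.goto[state].get(ch, 0)
            hits.extend(self.out[state])
        return hits


# Hypothetical document library; each attribute value becomes one pattern string.
DOCUMENTS = [
    {"title": "natural language teaching materials", "author": "Ma Zhankai"},
    {"title": "digit recognition method", "author": "Ma Zhankai"},
]
PATTERNS = [(doc[attr], (attr, doc["title"]))
            for doc in DOCUMENTS for attr in ("title", "author")]
hits = AhoCorasick(PATTERNS).scan(
    "open the 'natural language teaching materials' written by Ma Zhankai")
```

Each hit pairs an attribute value found in the command with the attribute type and the resource it belongs to, which is the information needed to locate candidate resources.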
Step 15: The execution module executes. A weighting mechanism is used to identify the resource the user means from the several attributes of different resources: first the "distance" between the input natural language command and each resource is calculated, and the resource with the smallest "distance" is then chosen as the one the user wants to operate on.
Identifying the resource the user means from the attribute values that occur:
Different resources may have the same value for the same attribute, so one attribute value can be associated with several resources. Because the user's command text may involve several attributes of different resources, the resource the user means has to be identified from these attributes together.
The present invention considers that every resource associated with an attribute value occurring in the user's command text may be the one the user means, only with differing likelihood, and therefore uses a weighting mechanism to decide. First the "distance" between the input natural language command and each resource is calculated, and the resource with the smallest "distance" is then chosen as the one the user wants to operate on. The algorithm model is shown in Figure 14.
In Figure 14, $S$ denotes the natural language command text, $O_i$ the $i$-th resource, and $V_{ij}$ the $j$-th attribute value of the $i$-th resource; $dis(S, O_i)$ denotes the distance between $S$ and $O_i$, and is also written $dis_i$.
Suppose the system has $m$ resources in total and the user operates on the $n$-th resource. Then:
$$dis_n = \min\{dis(S, O_1), dis(S, O_2), \ldots, dis(S, O_k), \ldots, dis(S, O_m)\} = \min\{dis_1, dis_2, \ldots, dis_k, \ldots, dis_m\} \quad \text{(formula 5-3-1)}$$
Because every distance calculation involving a resource is carried out through the attribute values of the resource, the distance $dis_k$ between $S$ and the $k$-th resource in (formula 2-6) can be calculated as follows:
Resources of the same type have the same number and types of attributes, so (formula 2-7) can be refined as follows:
$$dis_k = dis(S, O_k) = dis(S, V_{k1}, V_{k2}, \ldots, V_{kN}) \quad \text{(5-3-3)}$$
Here $N$ is the number of attributes of interest to the user for this resource type. For a resource of a particular type, the embodiment of the invention defines a weight for each of its attributes; the weight of the $i$-th attribute is defined as follows:
In general, if an attribute value does not occur in the command text, its contribution to identifying the resource is taken to be 0, so $\beta_i = 0$; depending on the circumstances, it may of course be defined as any value required.
The next step is to define a sign function according to whether the attribute value $V_{ij}$ occurs in the command text $S$, as follows:
Therefore, to evaluate (formula 2-8), first compute $sign(S, V_{ij})$ and, with the weight $W_j$ of the corresponding attribute, form the product $sign(S, V_{ij}) \times W_j$; these products are then combined by a suitable algorithm to give the final result of (formula 2-8), as follows:
With this result, the resource $n$ at minimum distance can be computed by (formula 2-11), which identifies the resource the user is operating on.
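The weighted minimum-distance selection can be illustrated with the sketch below. Several parts are assumptions, because the corresponding formulas are not reproduced in this text: the weights in `WEIGHTS` are invented, `sign` is taken to be 1 when the attribute value occurs in the command and 0 otherwise, and the products are combined by subtracting the matched weight from the total weight so that more matches give a smaller distance.

```python
# Hypothetical attribute weights W_j for the "document" resource type.
WEIGHTS = {"title": 5, "author": 3, "time": 1}


def sign(command, value):
    """Assumed sign function: 1 if the attribute value occurs in the command, else 0."""
    return 1 if value and value in command else 0


def distance(command, resource, weights=WEIGHTS):
    """Assumed combination rule: total weight minus the weight of matched attributes."""
    matched = sum(sign(command, resource.get(attr, "")) * w
                  for attr, w in weights.items())
    return sum(weights.values()) - matched


def locate(command, resources, weights=WEIGHTS):
    """Return the resource at minimum distance from the command text."""
    return min(resources, key=lambda r: distance(command, r, weights))


# Hypothetical resources sharing the same author.
RESOURCES = [
    {"title": "digit recognition method", "author": "Ma Zhankai", "time": "2007.3"},
    {"title": "natural language teaching materials", "author": "Ma Zhankai", "time": "2007.10"},
]
COMMAND = "open the 'natural language teaching materials' written by Ma Zhankai"
chosen = locate(COMMAND, RESOURCES)
```

Here both documents match on the author, but only one also matches on the title, so it ends up at the smaller distance and is selected.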
The above has mainly introduced the overall approach of this project to the natural language understanding problem, "from the shallow to the deep, solving the problem one field at a time"; at the current stage, the embodiment of the invention has chosen man-machine dialogue in the commercial field as the point of entry for natural language understanding. Next, the concept-based natural language understanding method was described, including the improved finite-state automaton that assembles concepts, the identification of concepts, and the method of obtaining the complete semantics from concepts, and a method of using prior knowledge to aid understanding was proposed. Then the use of context to assist understanding was described: the present system divides context into the global context and the local context, and their acquisition, representation and use were described in turn. Finally, resource location during the execution of an operation, after the semantics has been obtained, was set out: a resource library was proposed and implemented to organize resources, and a resource location algorithm was designed to locate them.
Step 16: The result is output and presented, completing the man-machine interaction.
The architecture of the human-computer interaction intelligent system of the embodiment of the invention, based on the above method, is shown in Figure 15, which depicts the network structure containing the system of the invention: a central intelligent processing service platform 101, an operator subsystem 102, an enterprise-end subsystem 103 and end customers 104. The central intelligent processing service platform 101 is the human-computer interaction intelligent system of the present invention.
The central intelligent processing service platform 101 comprises: a knowledge base server 1011, which stores commercial knowledge data and frequently used management system data; an artificial intelligence engine 1012, which analyses the questions entered by users and retrieves the answers to be returned; and a data statistical analysis unit 1013, which performs statistical analysis of the interaction data between users and the artificial intelligence engine.
The operator subsystem 102 provides the enterprise commercial robot with functions such as intelligent search, targeted advertisement delivery and VoIP value-added services. It comprises an operator business gateway 1021 and an advertisement and value-added service server 1022, and is made up of a website, the business gateway, and the advertisement and value-added service server. On the one hand, its main function is to gather information on each enterprise, such as company information, products and videos, on the business gateway website, to publish new information, and to deliver targeted advertisements. On the other hand, it uses VoIP and value-added services to serve the many visitors: for example, a visitor receives 20 points for quick registration and 170 points for registering with a mobile phone (plus a further 30 points for entering a referral code), and on top of this receives 5 minutes of network telephone call time simply for registering. In addition, short messages can be sent (similar to China Mobile's Fetion).
The enterprise-end subsystem 103 is used for managing the knowledge base, FAQs, log analysis and so on. It comprises a business intelligence robot 1031, an enterprise self-service unit 1032, etc., and is made up of a WEB network engine and the enterprise website server. In practical use, the "business intelligence robot" is embedded in the enterprise website. With its own authorized account, the enterprise can manage and modify online the enterprise commercial knowledge base behind the "business intelligence robot" and query visitor information; for example, it can upload enterprise documents, pictures and commercial videos to the back-end knowledge database.
End customers 104 interact with the business intelligence robot through the enterprise website, the operator business gateway, and so on. An end customer browsing the enterprise website or the operator business gateway website does not need to download any software: clicking the "business intelligence robot" icon is enough to carry out, directly through the web page or instant messaging software (MSN, QQ), a series of interactive intelligent operations with the "business intelligence robot", such as online consultation, product inquiries, ordering, watching product videos, and leaving messages.
The description of the invention is given for the purposes of example and explanation, and is not exhaustive or intended to limit the invention to the disclosed form. Many modifications and variations will be obvious to those of ordinary skill in the art. The embodiments were selected and described in order to better explain the principle and practical application of the invention, and to enable those of ordinary skill in the art to understand the invention and so design various embodiments, with various modifications, suited to particular uses.
Claims (13)
1. A natural language understanding method, characterized by comprising:
after receiving natural language input by a user, assembling conceptual language symbols from the natural language, and associating concepts with the conceptual language symbols;
by comparison with a preset concept dictionary, choosing the concept that best fits the current context, and judging whether there is ambiguity; if so, deriving the concept by corpus techniques and proceeding to the next step; if not, deriving the concept directly by the principle of semantic collocation and proceeding to the next step;
identifying concepts to obtain a core concept and peripheral concepts, the core concept determining the core semantics through the computer operation itself, and the peripheral concepts determining the peripheral semantics through the content of the computer operation;
obtaining the complete semantics from the core semantics in combination with the peripheral semantics.
2. The method according to claim 1, characterized in that a concept dictionary is provided, the concept dictionary mainly comprising: the various relations between concepts and between the attributes of concepts, and the correspondence between concepts and their linguistic symbols;
the natural language assembly process comprises: using an assembly algorithm to assemble from the original text the conceptual language symbols contained in the concept dictionary.
3. The method according to claim 2, characterized in that the relations between concepts comprise: hypernym-hyponym, synonym, antonym, converse, whole and part, attribute and host, material and product, agent and event, and content and event relations.
4. The method according to claim 1, characterized in that choosing the concept that best fits the current context uses the principle of semantic collocation and the principle of contextual consistency.
5. The method according to claim 1 or 4, characterized in that the concept identification further comprises identifying the numerical quantities in the text with a numerical-quantity recognition algorithm.
6. The method according to claim 1, characterized in that, after the complete semantics is obtained, the method further comprises: identifying the content of the operation according to the established mapping between natural language semantics and computer operations;
depending on the operation, identifying the content of the operation from the command text and locating the resource the content refers to;
executing the operation, and outputting and presenting the result.
7. The method according to claim 6, characterized in that the mapping between natural language semantics and computer actions is established by binding the core semantics to the computer operation and the operation object.
8. The method according to claim 6, characterized in that the content of the operation is identified by combining two methods of obtaining the semantics, the sentence pattern method and the prior knowledge method.
9. The system according to claim 8, characterized in that the sentence-pattern method comprises: collecting the various sentence patterns of each operation and storing them, organized, in a knowledge base; recognizing the sentence pattern with a rising space matching finite-state automaton; and then identifying the operation content in the command text according to that sentence pattern.
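As an illustration only (not part of the claim), the sentence-pattern method of claim 9 could be sketched with a small table of patterns per operation. Regular expressions stand in here for the finite-state automaton named in the claim, which is an assumption of this sketch. The patterns, the table `SENTENCE_PATTERNS`, and the function `recognize` are hypothetical.

```python
import re

# Hypothetical knowledge base: operation -> sentence patterns collected for it.
# The named group "content" marks where the operation content appears.
SENTENCE_PATTERNS = {
    "OPEN": [
        r"^(?:please )?open (?P<content>.+)$",
        r"^show me (?P<content>.+)$",
    ],
    "DELETE": [
        r"^(?:please )?delete (?P<content>.+)$",
    ],
}


def recognize(command):
    """Return (operation, operation content) for the first matching pattern, else None."""
    text = command.strip().lower()
    for operation, patterns in SENTENCE_PATTERNS.items():
        for pattern in patterns:
            match = re.match(pattern, text)
            if match:
                return operation, match.group("content")
    return None
```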
10. The system according to claim 9, characterized in that the resource location comprises: determining which directory the file is in; and determining the file name of the file.
11. The method according to claim 10, characterized in that the determining of the file directory and of the file name is implemented by traversing the directory tree and matching file names, comprising: using a finite-state automaton to extract the keywords in the title, wherein the keyword dictionary is supplied as the set of pattern strings to a longest-match finite-state automaton; and then using a finite-state automaton to realize "extraction on demand": every attribute value of every resource is added to the finite-state automaton as a pattern string, the automaton is then built and used to scan the command text, the automaton outputs the resource attribute values that occur in the command string, and the corresponding resource is located by those attribute values.
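As an illustration only (not part of the claim), the "extraction on demand" step of claim 11 could be sketched with a character trie built from resource attribute values and scanned for the longest match at each position. The trie restarted at each position is a simplification swapped in for a full multi-pattern automaton such as Aho-Corasick. The resource records, the `path` field, and the names `build_trie` and `locate` are hypothetical.

```python
_END = ""  # sentinel key marking the end of a pattern; never equals a character

# Hypothetical resources with their attribute values.
RESOURCES = [
    {"path": "/docs/budget_2007.xls", "title": "budget", "year": "2007"},
    {"path": "/docs/minutes.txt", "title": "minutes", "year": "2006"},
]


def build_trie(resources):
    """Add every attribute value (except the path) to the trie as a pattern string."""
    root = {}
    for index, resource in enumerate(resources):
        for name, value in resource.items():
            if name == "path":
                continue
            node = root
            for ch in value.lower():
                node = node.setdefault(ch, {})
            node.setdefault(_END, set()).add(index)
    return root


def locate(command, resources):
    """Scan the command text and map each located resource path to its matched values."""
    trie = build_trie(resources)
    text = command.lower()
    matched = {}
    i = 0
    while i < len(text):
        node, longest_end, hits = trie, None, None
        j = i
        while j < len(text) and text[j] in node:
            node = node[text[j]]
            j += 1
            if _END in node:  # remember the longest pattern ending here
                longest_end, hits = j, node[_END]
        if longest_end is None:
            i += 1
        else:
            for index in hits:
                matched.setdefault(index, []).append(text[i:longest_end])
            i = longest_end
    return {resources[k]["path"]: values for k, values in matched.items()}
```

Here `locate("open the budget for 2007", RESOURCES)` finds both "budget" and "2007" and maps them to the first resource. A real system would still need a rule for choosing among resources that share some attribute values.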
12. The method according to claim 6, characterized in that the computer operation is represented by an ordered sequence consisting of the operation itself and the operation content; the complete semantics is represented by an ordered sequence consisting of the core semantics and the peripheral semantics; multiple concepts are structured into the complete semantics; and the computer understands the complete semantics and derives the operation from it.
13. A human-computer interactive intelligent system, characterized by comprising: a knowledge base server for storing business knowledge data and the data commonly used by the management system;
an artificial intelligence engine for analyzing and processing the question input by the user and retrieving the answer to be returned;
a data statistics and analysis unit for performing statistical analysis on the interaction data between users and the artificial intelligence engine.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CNA2007101957208A CN101178705A (en) | 2007-12-13 | 2007-12-13 | Free-running speech comprehend method and man-machine interactive intelligent system |
Applications Claiming Priority (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CNA2007101957208A CN101178705A (en) | 2007-12-13 | 2007-12-13 | Free-running speech comprehend method and man-machine interactive intelligent system |
Publications (1)
Publication Number | Publication Date |
---|---|
CN101178705A true CN101178705A (en) | 2008-05-14 |
Family
ID=39404963
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CNA2007101957208A Pending CN101178705A (en) | 2007-12-13 | 2007-12-13 | Free-running speech comprehend method and man-machine interactive intelligent system |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN101178705A (en) |
Cited By (76)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN101901599B (en) * | 2009-05-19 | 2013-08-28 | 塔塔咨询服务有限公司 | System and method for rapid prototyping of existing speech recognition solutions in different languages |
CN101901599A (en) * | 2009-05-19 | 2010-12-01 | 塔塔咨询服务有限公司 | The system and method for the quick original shapeization of the existing voice identifying schemes of different language |
US9767710B2 (en) | 2009-12-16 | 2017-09-19 | Postech Academy-Industry Foundation | Apparatus and system for speech intent recognition |
CN102667889A (en) * | 2009-12-16 | 2012-09-12 | 浦项工科大学校产学协力团 | Apparatus and method for foreign language study |
CN102667889B (en) * | 2009-12-16 | 2014-10-22 | 浦项工科大学校产学协力团 | Apparatus and method for foreign language study |
CN101894548B (en) * | 2010-06-23 | 2012-07-04 | 清华大学 | Modeling method and modeling device for language identification |
CN101894548A (en) * | 2010-06-23 | 2010-11-24 | 清华大学 | Modeling method and modeling device for language identification |
CN102641589A (en) * | 2011-02-21 | 2012-08-22 | 科乐美数码娱乐株式会社 | Game system and control method thereof |
CN102641589B (en) * | 2011-02-21 | 2014-09-17 | 科乐美数码娱乐株式会社 | Game system and control method thereof |
CN102184167A (en) * | 2011-05-25 | 2011-09-14 | 安徽科大讯飞信息科技股份有限公司 | Method and device for processing text data |
CN102184167B (en) * | 2011-05-25 | 2013-01-02 | 安徽科大讯飞信息科技股份有限公司 | Method and device for processing text data |
CN102609410B (en) * | 2012-04-12 | 2014-12-17 | 传神联合(北京)信息技术有限公司 | Authority file auxiliary writing system and authority file generating method |
CN102609410A (en) * | 2012-04-12 | 2012-07-25 | 传神联合(北京)信息技术有限公司 | Authority file auxiliary writing system and authority file generating method |
CN102831207B (en) * | 2012-08-06 | 2015-01-28 | 小米科技有限责任公司 | Computer terminal and information interaction method |
CN102831207A (en) * | 2012-08-06 | 2012-12-19 | 北京小米科技有限责任公司 | Computer terminal and information interaction method |
CN102868695B (en) * | 2012-09-18 | 2015-06-17 | 天格科技(杭州)有限公司 | Conversation tree-based intelligent online customer service method and system |
CN102868695A (en) * | 2012-09-18 | 2013-01-09 | 天格科技(杭州)有限公司 | Conversation tree-based intelligent online customer service method and system |
CN102929859A (en) * | 2012-09-27 | 2013-02-13 | 东莞宇龙通信科技有限公司 | Reading assistive method and device |
CN102929859B (en) * | 2012-09-27 | 2015-07-08 | 东莞宇龙通信科技有限公司 | Reading assistive method and device |
CN103577198A (en) * | 2013-11-22 | 2014-02-12 | 中国联合网络通信集团有限公司 | User-oriented Internet of Things service platform and remote control method |
CN104765729B (en) * | 2014-01-02 | 2018-08-31 | 中国人民大学 | A kind of cross-platform microblogging community account matching process |
CN104765729A (en) * | 2014-01-02 | 2015-07-08 | 中国人民大学 | Cross-platform micro-blogging community account matching method |
CN106575504A (en) * | 2014-04-17 | 2017-04-19 | 软银机器人欧洲公司 | Executing software applications on a robot |
CN103955449A (en) * | 2014-04-21 | 2014-07-30 | 安一恒通(北京)科技有限公司 | Target sample positioning method and device |
CN105096942A (en) * | 2014-05-21 | 2015-11-25 | 清华大学 | Semantic analysis method and semantic analysis device |
CN106663426A (en) * | 2014-07-03 | 2017-05-10 | 微软技术许可有限责任公司 | Generating computer responses to social conversational inputs |
CN104240700A (en) * | 2014-08-26 | 2014-12-24 | 智歌科技(北京)有限公司 | Global voice interaction method and system for vehicle-mounted terminal device |
CN105469801B (en) * | 2014-09-11 | 2019-07-12 | 阿里巴巴集团控股有限公司 | A kind of method and device thereof for repairing input voice |
CN105469801A (en) * | 2014-09-11 | 2016-04-06 | 阿里巴巴集团控股有限公司 | Input speech restoring method and device |
CN105491090B (en) * | 2014-09-17 | 2019-01-01 | 阿里巴巴集团控股有限公司 | network data processing method and device |
CN105491090A (en) * | 2014-09-17 | 2016-04-13 | 阿里巴巴集团控股有限公司 | Network data processing method and device |
CN107003999B (en) * | 2014-10-15 | 2020-08-21 | 声钰科技 | System and method for subsequent response to a user's prior natural language input |
CN107003999A (en) * | 2014-10-15 | 2017-08-01 | 声钰科技 | To the system and method for the subsequent response of the first natural language input of user |
CN104462758A (en) * | 2014-11-03 | 2015-03-25 | 百度在线网络技术(北京)有限公司 | Method and device for generating label sequence of observation character strings |
CN104462758B (en) * | 2014-11-03 | 2017-05-24 | 百度在线网络技术(北京)有限公司 | Method and device for generating label sequence of observation character strings |
US10909969B2 (en) | 2015-01-03 | 2021-02-02 | Microsoft Technology Licensing, Llc | Generation of language understanding systems and methods |
CN104598609A (en) * | 2015-01-29 | 2015-05-06 | 百度在线网络技术(北京)有限公司 | Concept processing method and device for vertical field |
CN104598609B (en) * | 2015-01-29 | 2017-12-08 | 百度在线网络技术(北京)有限公司 | A kind of concept treating method and apparatus for vertical field |
CN104679472A (en) * | 2015-02-13 | 2015-06-03 | 百度在线网络技术(北京)有限公司 | Man-machine voice interactive method and device |
CN104808497A (en) * | 2015-02-15 | 2015-07-29 | 联想(北京)有限公司 | Information processing method and first electronic device |
CN106407196A (en) * | 2015-07-29 | 2017-02-15 | 成都诺铱科技有限公司 | Semantic analysis intelligent instruction robot applied to logistics management software |
CN105260178A (en) * | 2015-09-21 | 2016-01-20 | 上海智臻智能网络科技股份有限公司 | Intelligent cloud service application development method and system |
CN105590626A (en) * | 2015-12-29 | 2016-05-18 | 百度在线网络技术(北京)有限公司 | Continuous speech man-machine interaction method and system |
CN105590626B (en) * | 2015-12-29 | 2020-03-03 | 百度在线网络技术(北京)有限公司 | Continuous voice man-machine interaction method and system |
CN108369806A (en) * | 2016-01-22 | 2018-08-03 | 微软技术许可有限责任公司 | Configurable all-purpose language understands model |
CN108369806B (en) * | 2016-01-22 | 2022-07-22 | 微软技术许可有限责任公司 | Configurable generic language understanding model |
CN105912701A (en) * | 2016-04-26 | 2016-08-31 | 南京玛锶腾智能科技有限公司 | File processing method for intelligent robots |
CN106057205B (en) * | 2016-05-06 | 2020-01-14 | 北京云迹科技有限公司 | Automatic voice interaction method for intelligent robot |
CN106057205A (en) * | 2016-05-06 | 2016-10-26 | 北京云迹科技有限公司 | Intelligent robot automatic voice interaction method |
CN107526514B (en) * | 2016-06-21 | 2021-01-26 | 阿里巴巴集团控股有限公司 | Digital information input processing method and device |
CN107526514A (en) * | 2016-06-21 | 2017-12-29 | 阿里巴巴集团控股有限公司 | digital information input processing method and device |
CN108764480B (en) * | 2016-08-23 | 2020-07-07 | 上海智臻智能网络科技股份有限公司 | Information processing system |
CN108764480A (en) * | 2016-08-23 | 2018-11-06 | 上海智臻智能网络科技股份有限公司 | A kind of system of information processing |
CN108241646A (en) * | 2016-12-23 | 2018-07-03 | 阿里巴巴集团控股有限公司 | A kind of searching and matching method and device recommend method and apparatus |
CN108399919A (en) * | 2017-02-06 | 2018-08-14 | 中兴通讯股份有限公司 | A kind of method for recognizing semantics and device |
CN110494841B (en) * | 2017-05-03 | 2021-06-04 | 谷歌有限责任公司 | Contextual language translation |
CN110494841A (en) * | 2017-05-03 | 2019-11-22 | 谷歌有限责任公司 | Context language translation |
CN111539219B (en) * | 2017-05-19 | 2024-04-26 | 吴晨曦 | Method, equipment and system for disambiguation of natural language content titles |
CN111539219A (en) * | 2017-05-19 | 2020-08-14 | 北京蓦然认知科技有限公司 | Method, equipment and system for disambiguating natural language content title |
CN107329986A (en) * | 2017-06-01 | 2017-11-07 | 竹间智能科技(上海)有限公司 | The interactive method and device recognized based on language performance |
CN107195303A (en) * | 2017-06-16 | 2017-09-22 | 北京云知声信息技术有限公司 | Method of speech processing and device |
CN107195303B (en) * | 2017-06-16 | 2021-08-20 | 云知声智能科技股份有限公司 | Voice processing method and device |
CN108573046B (en) * | 2018-04-18 | 2021-06-29 | 什伯(上海)智能技术有限公司 | User instruction processing method and device based on AI system |
CN108573046A (en) * | 2018-04-18 | 2018-09-25 | 什伯(上海)智能技术有限公司 | A kind of user instruction treatment method and device based on AI systems |
CN109388695A (en) * | 2018-09-27 | 2019-02-26 | 深圳前海微众银行股份有限公司 | User's intension recognizing method, equipment and computer readable storage medium |
CN109815322A (en) * | 2018-12-27 | 2019-05-28 | 东软集团股份有限公司 | Method, apparatus, storage medium and the electronic equipment of response |
CN111695131A (en) * | 2020-06-23 | 2020-09-22 | 上海用正医药科技有限公司 | Document management method and system for clinical trial |
CN111695131B (en) * | 2020-06-23 | 2021-04-02 | 上海用正医药科技有限公司 | Document management method and system for clinical trial |
CN111916161B (en) * | 2020-06-23 | 2021-04-16 | 上海用正医药科技有限公司 | Method and device for collecting and converting multiple data sources in clinical test process |
CN111916161A (en) * | 2020-06-23 | 2020-11-10 | 上海用正医药科技有限公司 | Method and device for collecting and converting multiple data sources in clinical test process |
CN113204943A (en) * | 2021-05-05 | 2021-08-03 | 杭州新范式生物医药科技有限公司 | Method for structured representation of semantic meaning and method for recognizing a semantic meaning sequence as a semantic meaning |
CN113204943B (en) * | 2021-05-05 | 2024-07-05 | 杭州新范式生物医药科技有限公司 | Structured representation of a semantic meaning and method for identifying a semantic meaning sequence as a semantic meaning |
CN113420570A (en) * | 2021-07-01 | 2021-09-21 | 沈阳创思佳业科技有限公司 | Method, system and device for improving translation accuracy |
CN113420570B (en) * | 2021-07-01 | 2024-04-30 | 沈阳创思佳业科技有限公司 | Method, system and device for improving translation accuracy |
CN114119213A (en) * | 2021-12-15 | 2022-03-01 | 平安科技(深圳)有限公司 | Risk detection method and device for financing service, computer equipment and storage medium |
CN117892735A (en) * | 2024-03-14 | 2024-04-16 | 中电科大数据研究院有限公司 | Deep learning-based natural language processing method and system |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
CN101178705A (en) | Free-running speech comprehend method and man-machine interactive intelligent system | |
Rebele et al. | YAGO: A multilingual knowledge base from wikipedia, wordnet, and geonames | |
US10496749B2 (en) | Unified semantics-focused language processing and zero base knowledge building system | |
Van Aggelen et al. | The debates of the European Parliament as linked open data | |
CN100458795C (en) | Intelligent word input method and input method system and updating method thereof | |
US8214366B2 (en) | Systems and methods for generating a language database that can be used for natural language communication with a computer | |
CN109493265A (en) | A kind of Policy Interpretation method and Policy Interpretation system based on deep learning | |
US20120036130A1 (en) | Systems, methods, software and interfaces for entity extraction and resolution and tagging | |
CN106776797A (en) | A kind of knowledge Q-A system and its method of work based on ontology inference | |
CA3060498C (en) | Method and system for integrating web-based systems with local document processing applications | |
CN116244344B (en) | Retrieval method and device based on user requirements and electronic equipment | |
CN104391969B (en) | Determine the method and device of user's query statement syntactic structure | |
CN110929007A (en) | Electric power marketing knowledge system platform and application method | |
Karim et al. | A step towards information extraction: Named entity recognition in Bangla using deep learning | |
CN116108175A (en) | Language conversion method and system based on semantic analysis and data construction | |
Becker et al. | COCO-EX: A tool for linking concepts from texts to ConceptNet | |
CN104778232A (en) | Searching result optimizing method and device based on long query | |
Kettler et al. | A template-based markup tool for semantic web content | |
CN116304347A (en) | Git command recommendation method based on crowd-sourced knowledge | |
CN111753540B (en) | Method and system for collecting text data to perform Natural Language Processing (NLP) | |
CN114064036A (en) | Method and device for associating software function module with responsible person based on knowledge graph | |
Shakhovska et al. | The method of automatic summarization from different sources | |
Li et al. | Semantics-Enhanced Online Intellectual Capital Mining Service for Enterprise Customer Centers | |
JP2001325284A (en) | Method and device for extracting information from table structure area and recording medium stored with information extracting program | |
Truskinger et al. | Reconciling folksonomic tagging with taxa for bioacoustic annotations |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
C06 | Publication | ||
PB01 | Publication | ||
C10 | Entry into substantive examination | ||
SE01 | Entry into force of request for substantive examination | ||
C02 | Deemed withdrawal of patent application after publication (patent law 2001) | ||
WD01 | Invention patent application deemed withdrawn after publication |
Open date: 20080514 |