CN106156110B - Text semantic understanding method and system - Google Patents

Text semantic understanding method and system

Info

Publication number
CN106156110B
Authority
CN
China
Prior art keywords
network
sub
text
word string
matching
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN201510159102.2A
Other languages
Chinese (zh)
Other versions
CN106156110A (en)
Inventor
吴维昊
杨溥
潘青华
王影
胡国平
胡郁
刘庆峰
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
iFlytek Co Ltd
Original Assignee
iFlytek Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by iFlytek Co Ltd
Priority to CN201510159102.2A
Publication of CN106156110A
Application granted
Publication of CN106156110B
Legal status: Active
Anticipated expiration

Landscapes

  • Computer And Data Communications (AREA)

Abstract

The invention discloses a text semantic understanding method and system. The method comprises: building in advance a directed-graph grammar network based on a main-network/sub-network mode, wherein the directed-graph grammar network includes one main network and one or more sub-networks, and each path segment of the directed-graph grammar network corresponds to one text character or one sub-network identifier; obtaining text to be parsed; decoding the text based on the directed-graph grammar network to obtain a decoding path; and obtaining the semantics associated with the decoding path as the semantic understanding result. The invention can effectively reduce the complexity of the directed-graph grammar network, improve decoding efficiency, and reduce memory consumption.

Description

Text semantic understanding method and system
Technical field
The present invention relates to the field of natural language processing, and in particular to a text semantic understanding method and system.
Background
As one of the most important directions in artificial intelligence, natural language understanding has long been a research focus in related fields. In recent years in particular, with the rapid development of mobile Internet technology and the steady rise of informatization, the volume of information on networks has grown exponentially, and humanity has entered the era of big data. People are increasingly eager for machines to understand natural language, so that valuable information can be analyzed and extracted efficiently from massive data.
Traditional semantic understanding systems mainly use a grammar to define several sets of input sentences; when the input text falls within these sets, it is understood successfully. In recent years, to meet the demand for mining deeper semantics from text, researchers have proposed text semantic understanding schemes based on grammar rules. Such a scheme first specifies the sentence grammar rules for each specific application environment, so as to describe the natural language input syntax of each application. The grammar rules are then compiled into a directed-graph grammar network that a computer can process. Finally, the received natural language input is matched and parsed against the directed-graph grammar network, and the related semantics are extracted according to the best matching path, achieving deep semantic understanding of the input sentence or phrase.
However, for massive data, a traditional grammar-rule-based semantic understanding system must define thousands upon thousands of grammars, and the directed-graph grammar network built from these rules is very large and complex. In addition, decoding the directed-graph grammar network in a traditional system is a breadth-first search process, so matching and parsing user text against the grammar network is computationally expensive and time-consuming. This greatly reduces the efficiency of semantic understanding as a whole, and decoding consumes substantial hardware resources.
Summary of the invention
Embodiments of the present invention provide a text semantic understanding method and system, to solve the problems of low decoding efficiency and high hardware resource consumption during decoding in the prior art.
To this end, embodiments of the present invention provide the following technical solutions:
A text semantic understanding method, comprising:
building in advance a directed-graph grammar network based on a main-network/sub-network mode, wherein the directed-graph grammar network includes one main network and one or more sub-networks, and each path segment of the directed-graph grammar network corresponds to one text character or one sub-network identifier;
obtaining text to be parsed;
decoding the text based on the directed-graph grammar network to obtain a decoding path;
obtaining the semantics associated with the decoding path as the semantic understanding result.
Preferably, building the directed-graph grammar network based on the main-network/sub-network mode comprises:
establishing sentence grammar rules according to the syntactic characteristics of natural language input under each application;
determining the text types corresponding to the main network and the sub-networks;
compiling the sentence grammar rules according to the text types corresponding to the main network and the sub-networks, to generate a main-network directed-graph grammar network carrying sub-network identifiers and sub-network directed-graph grammar networks.
Preferably, decoding the text based on the directed-graph grammar network to obtain a decoding path comprises:
performing string matching on the text to be parsed, starting from the first node of the main network;
if a sub-network identifier appears in the matching path of the main network, recording main network match information, calling the sub-network corresponding to the sub-network identifier to perform string matching, and obtaining and recording sub-network match information;
after the text to be parsed has been matched completely, obtaining the decoding path according to the obtained main network match information and sub-network match information.
Preferably, decoding the text based on the directed-graph grammar network to obtain a decoding path further comprises:
when calling the sub-network corresponding to the sub-network identifier to perform string matching, judging whether the sub-network is being called for the first time;
if so, performing string matching using the sub-network, and saving the obtained sub-network match information in a sub-network matching result manager;
otherwise, obtaining the historical matching result from the sub-network matching result manager as the sub-network match information.
Preferably, the sub-network match information includes: the sub-network matching path, a sub-network search flag, and the number of characters of the string already matched; the main network match information includes: the main network matching path, the sub-network identifier of the called sub-network, and the number of characters of the string already matched;
judging whether the sub-network is being called for the first time comprises:
if the sub-network search flag indicates that the sub-network has not been searched, determining that the sub-network is being called for the first time;
if the sub-network search flag indicates that the sub-network has been searched, and the numbers of matched characters in the main network match information and the sub-network match information are the same, determining that the sub-network is not being called for the first time.
Preferably, performing string matching using the sub-network comprises:
when performing string matching using the sub-network, applying a fault-tolerance mechanism, the fault-tolerance mechanism including one or more of the following string matching modes: self-jump, consecutive jump, and wrong-character tolerance.
Preferably, the sub-networks have one or more layers.
A text semantic understanding system, comprising:
a network construction module, configured to build in advance a directed-graph grammar network based on a main-network/sub-network mode, wherein the directed-graph grammar network includes one main network and one or more sub-networks, and each path segment of the directed-graph grammar network corresponds to one text character or one sub-network identifier;
a receiving module, configured to obtain text to be parsed;
a decoding module, configured to decode the text based on the directed-graph grammar network to obtain a decoding path;
a result obtaining module, configured to obtain the semantics associated with the decoding path as the semantic understanding result.
Preferably, the network construction module includes:
a rule setting unit, configured to establish sentence grammar rules according to the syntactic characteristics of natural language input under each application;
a text division unit, configured to determine the text types corresponding to the main network and the sub-networks;
a compilation unit, configured to compile the sentence grammar rules according to the text types corresponding to the main network and the sub-networks, to generate a main-network directed-graph grammar network carrying sub-network identifiers and sub-network directed-graph grammar networks.
Preferably, the decoding module includes:
a matching unit, configured to perform string matching on the text to be parsed starting from the first node of the main network and, when a sub-network identifier appears in the matching path of the main network, to record main network match information, call the sub-network corresponding to the sub-network identifier to perform string matching, and obtain and record sub-network match information;
a decoding path acquiring unit, configured to obtain the decoding path, after the matching unit has matched the text to be parsed completely, according to the main network match information and sub-network match information obtained by the matching unit.
Preferably, the decoding module further includes:
a judging unit, configured to judge, when the matching unit calls the sub-network corresponding to the sub-network identifier to perform string matching, whether the sub-network is being called for the first time, and to feed the judgment back to the matching unit;
the matching unit, when the judging unit determines that the sub-network is being called for the first time, performs string matching using the sub-network and saves the obtained sub-network match information in a sub-network matching result manager; when the judging unit determines that the sub-network is not being called for the first time, it obtains the historical matching result from the sub-network matching result manager as the sub-network match information.
Preferably, the sub-network match information includes: the sub-network matching path, a sub-network search flag, and the number of characters of the string already matched; the main network match information includes: the main network matching path, the sub-network identifier of the called sub-network, and the number of characters of the string already matched;
the judging unit is specifically configured to determine that the sub-network is being called for the first time when the sub-network search flag indicates that the sub-network has not been searched, and to determine that the sub-network is not being called for the first time when the sub-network search flag indicates that the sub-network has been searched and the numbers of matched characters in the main network match information and the sub-network match information are the same.
Preferably, when the matching unit performs string matching using the sub-network, it applies a fault-tolerance mechanism, the fault-tolerance mechanism including one or more of the following string matching modes: self-jump, consecutive jump, and wrong-character tolerance.
The sub-networks have one or more layers.
Unlike a traditional single large and complex directed-graph grammar network built from grammar rules, the text semantic understanding method of the embodiments of the present invention divides the directed-graph grammar network into a main network and sub-networks, which greatly reduces the complexity of the directed-graph grammar network and improves decoding efficiency. Moreover, when decoding the text to be parsed that a user inputs, a depth-first search method is used to match and parse the text against the grammar network, which reduces memory consumption.
Further, a saving mechanism is provided for the sub-networks: within the decoding of a single user input text, the match information obtained when a sub-network is first called is saved, and when later decoding calls that sub-network again, the saved matching result is used directly. This reduces the number of sub-network matches and further improves decoding efficiency.
Further, the fault-tolerance mechanism improves the system's tolerance of input errors.
Brief description of the drawings
To describe the technical solutions in the embodiments of the present application or in the prior art more clearly, the drawings needed in the embodiments are briefly introduced below. The drawings described below are merely some embodiments of the present invention; a person of ordinary skill in the art can derive other drawings from them.
Fig. 1 is a flowchart of the text semantic understanding method of an embodiment of the present invention;
Fig. 2 is a first example of a directed-graph grammar network based on the main-network/sub-network mode in an embodiment of the present invention;
Fig. 3 is a flowchart of decoding text based on the main-network/sub-network directed-graph grammar network in an embodiment of the present invention;
Fig. 4 is a second example of a directed-graph grammar network based on the main-network/sub-network mode in an embodiment of the present invention;
Fig. 5 is a schematic structural diagram of the text semantic understanding system of an embodiment of the present invention.
Detailed description of the embodiments
To help those skilled in the art better understand the solutions of the embodiments of the present invention, the embodiments are described in further detail below with reference to the drawings and implementations.
Fig. 1 is a flowchart of the text semantic understanding method of an embodiment of the present invention, which comprises the following steps:
Step 101: build in advance a directed-graph grammar network based on the main-network/sub-network mode.
Unlike a traditional single large and complex directed-graph grammar network built from grammar rules, in the embodiments of the present invention the directed-graph grammar network is divided into a main network and sub-networks; that is, the directed-graph grammar network includes one main network and one or more sub-networks, and each path segment of the main network corresponds to one text character or one sub-network identifier. Moreover, depending on the actual application, sub-networks may be nested, that is, arranged in one or more layers. If there is only one layer of sub-networks, each path segment of a sub-network corresponds to one text character. If there are multiple layers, each path segment of a bottom-layer sub-network corresponds to one text character, and each path segment of a sub-network in any other layer corresponds to one text character or one sub-network identifier.
The process of building the directed-graph grammar network based on the main-network/sub-network mode is as follows:
First, sentence grammar rules are established according to the syntactic characteristics of natural language input under each application. The sentence grammar rules may be preset by the system according to common application requirements, or may be defined by the user, following the grammar conventions of the system, according to actual application requirements, so as to describe the natural language input syntax that may occur under each application.
Then, the text types corresponding to the main network and the sub-networks are determined, so as to divide the network into a main network and sub-networks. Specifically, the sentence grammar rules are analyzed first, and the text types corresponding to the main network and the sub-networks are then determined, which yields the division. The text types corresponding to the sub-networks are mainly text strings that users are likely to enter incorrectly or confuse, generally defined as nouns with a relatively clear meaning, such as singer names, song titles, and TV drama titles. The text types corresponding to the main network are generally text strings whose pattern is relatively fixed and that users are unlikely to enter incorrectly.
After the texts corresponding to the main network and the sub-networks are determined, compilation produces the main-network directed-graph grammar network carrying sub-network identifiers and the sub-network directed-graph grammar networks.
For example, compiling the following sentence grammar rules yields the directed-graph grammar network shown in Fig. 2:
$sub = Wang Fei;
$main = I want to listen to the songs of $sub;
Here, the text corresponding to the main network is "I want to listen to the songs of xxx", whose pattern is relatively fixed; the text corresponding to the sub-network is "Wang Fei", whose text type is a noun with a relatively clear meaning; and sub is the sub-network identifier. Each path segment of the directed-graph grammar network corresponds to one text character or one sub-network identifier.
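The compilation of the two rules above can be sketched as follows. This is a minimal illustration under stated assumptions, not the compiler described in the patent: the function name `compile_rule`, the `(source node, label, target node)` arc format, and the linear node numbering are all hypothetical, and the rule text is an English rendering in which each English character stands in for one text character.

```python
import re

def compile_rule(body):
    """Turn one rule body into directed arcs (source node, label, target node).

    Each label is either a single text character or a sub-network
    identifier written as "$name" (assumed notation).
    """
    labels = re.findall(r"\$\w+|.", body, flags=re.DOTALL)
    return [(i, label, i + 1) for i, label in enumerate(labels)]

# Hypothetical rule table mirroring the $sub / $main example.
rules = {
    "sub": "Wang Fei",
    "main": "I want to listen to the songs of $sub",
}
networks = {name: compile_rule(body) for name, body in rules.items()}
```

In this sketch the main network ends with a single arc labelled `$sub`, while the sub-network contains only character arcs, which matches the one-layer case described above.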
Step 102: obtain the text to be parsed.
Step 103: decode the text based on the directed-graph grammar network to obtain a decoding path.
First, string matching is performed on the text to be parsed, starting from the first node of the main network. If a sub-network identifier appears in the matching path of the main network, the main network match information is recorded, and the sub-network corresponding to the sub-network identifier is called to perform string matching, obtaining and recording sub-network match information. After the text to be parsed has been matched completely, the decoding path is obtained according to the obtained main network match information and sub-network match information.
The specific decoding process is described in detail later.
Step 104: obtain the semantics associated with the decoding path as the semantic understanding result.
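Steps 102 to 104 can be sketched as a depth-first walk that matches characters along the main network and descends into a sub-network whenever it meets a sub-network identifier. The sketch below rests on several assumptions that are not in the source: networks are stored as lists of linear paths, the names `arcs`, `walk`, `match`, and `decode` are hypothetical, and the "associated semantics" are modelled simply as the text captured by each sub-network.

```python
import re

def arcs(rule):
    """Split a rule body into arcs: "$name" identifiers or single characters."""
    return re.findall(r"\$\w+|.", rule, flags=re.DOTALL)

def walk(networks, path, i, text, pos, slots):
    """Depth-first walk of one path; yields (end position, captured slots)."""
    if i == len(path):
        yield pos, slots
        return
    arc = path[i]
    if len(arc) > 1 and arc.startswith("$"):
        name = arc[1:]
        # Sub-network identifier: call the sub-network at the current position.
        for end, inner in match(networks, name, text, pos):
            filled = {**slots, **inner, name: text[pos:end]}
            yield from walk(networks, path, i + 1, text, end, filled)
    elif pos < len(text) and text[pos] == arc:
        # Character arc: consume exactly one matching character.
        yield from walk(networks, path, i + 1, text, pos + 1, slots)

def match(networks, name, text, pos):
    """Try every path of network `name` starting at text[pos]."""
    for path in networks[name]:
        yield from walk(networks, path, 0, text, pos, {})

def decode(networks, text):
    """Return the captured slots of the first complete match, or None."""
    for end, slots in match(networks, "main", text, 0):
        if end == len(text):
            return slots
    return None

networks = {
    "main": [arcs("I want to listen to the songs of $sub")],
    "sub": [arcs("Wang Fei")],
}
result = decode(networks, "I want to listen to the songs of Wang Fei")
```

Because the walk is depth-first, only the current path and its captured slots are held at any time, which is in line with the memory argument made in the summary.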
Fig. 3 is a flowchart of decoding text based on the main-network/sub-network directed-graph grammar network in an embodiment of the present invention, which comprises the following steps:
Step 301: main-network string matching.
For the text to be parsed that the user inputs, string matching is performed starting from the first node of the main network.
Step 302: determine whether a sub-network needs to be called; if so, execute step 303; otherwise, execute step 304.
If a sub-network identifier appears in the path during main-network string matching, it is determined that a sub-network needs to be called; otherwise, it is determined that no sub-network needs to be called.
Step 303: call the sub-network to perform string matching.
As mentioned above, when a sub-network is called, the main network match information needs to be recorded; when matching is performed with the sub-network, the sub-network match information is obtained and recorded. After the text to be parsed has been matched completely, the decoding path can then be obtained according to the obtained main network match information and sub-network match information.
In practical applications, for ease of processing, a "call state manager" and a "sub-network matching result manager" may be provided to store the main network match information and the sub-network match information, respectively. Note that each sub-network has its own "sub-network matching result manager". The "call state manager" may be created when the grammar network is built; a "sub-network matching result manager" may be created when the corresponding sub-network is built, or when the sub-network is called during decoding; the embodiments of the present invention place no limitation on this. Note also that the information stored in the "call state manager" and the "sub-network matching result managers" during matching must be cleared completely after the decoding of one user input text is finished, to avoid affecting the next decoding; alternatively, it may be cleared by initialization before the next decoding starts; the embodiments of the present invention place no limitation on this.
The main network match information includes: the main network matching path and the sub-network identifier of the called sub-network. The sub-network match information includes: the sub-network matching path.
To further improve decoding efficiency, the main network match information may also include the number of characters of the string already matched, and the sub-network match information may also include a sub-network search flag and the number of characters of the string already matched. In this way, when later decoding calls the sub-network again, the matching result already saved for the sub-network can be used directly. Note that the sub-network search flag may be created when the sub-network is built and may be stored separately; it may also be moved into the "sub-network matching result manager" after that manager is created for the corresponding sub-network.
The process of calling a sub-network is described in detail below.
When a sub-network is called, the main network matching path, the sub-network identifier of the called sub-network, and the number of characters matched so far are first stored in the "call state manager". Next, it is judged whether the sub-network is being called for the first time; if so, sub-network string matching is performed and the matching result is saved; otherwise, the saved historical matching result is used.
Whether the sub-network is being called for the first time can be determined from the sub-network search flag and the number of characters already matched. For example, if the sub-network search flag is 0, the call is judged to be a first call. If the flag is 1, it is further checked whether the number of characters matched before calling the current sub-network, as stored in the "call state manager", equals the number of characters matched before calling that sub-network, as stored in the "sub-network matching result manager"; if they are equal, the call is judged not to be a first call.
When a sub-network is called for the first time, after string matching is completed, the sub-network matching path, the sub-network search flag, and the number of characters matched before the sub-network was called are saved in the "sub-network matching result manager". The sub-network search flag indicates whether the sub-network has been searched; its value may be 0 or 1, where 0 indicates not searched and 1 indicates searched, or vice versa.
When a sub-network is not called for the first time, the sub-network string matching path information stored in the "sub-network matching result manager" is used directly.
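The first-call test and the saved-result reuse described above can be sketched with a small per-sub-network record. The class and function names (`SubnetResultManager`, `is_first_call`, `call_subnet`) are hypothetical. The source does not say what happens when the flag is set but the matched-character counts differ; this sketch assumes the match is simply run again and the saved result overwritten.

```python
from dataclasses import dataclass, field

@dataclass
class SubnetResultManager:
    """Per-sub-network store holding the three fields named in the text."""
    searched: bool = False       # sub-network search flag (False = not searched)
    matched_before: int = -1     # characters matched before the sub-network call
    paths: list = field(default_factory=list)  # saved sub-network matching paths

def is_first_call(manager, matched_now):
    """Not a first call only if the flag is set AND the counts agree."""
    return not (manager.searched and manager.matched_before == matched_now)

def call_subnet(manager, matched_now, run_match):
    """Return matching paths, running the real match only when needed."""
    if is_first_call(manager, matched_now):
        manager.paths = run_match()      # caller-supplied string matching
        manager.searched = True
        manager.matched_before = matched_now
    return manager.paths

# Usage: count how often the (stand-in) matcher actually runs.
runs = []
def run_match():
    runs.append(1)
    return [["Infernal Affairs"]]

manager = SubnetResultManager()
first = call_subnet(manager, 7, run_match)    # first call: matcher runs
second = call_subnet(manager, 7, run_match)   # same count: saved result reused
```

As the text requires, a fresh `SubnetResultManager` would be created (or the existing one reset) before decoding the next user input.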
Step 304: continue string matching until the end to obtain the matching path.
It can be seen that, unlike a traditional single large and complex directed-graph grammar network built from grammar rules, the text semantic understanding method of the embodiments of the present invention divides the directed-graph grammar network into a main network and sub-networks, which greatly reduces the complexity of the directed-graph grammar network and improves decoding efficiency. Moreover, when decoding the text to be parsed that a user inputs, a depth-first search method is used to match and parse the text against the grammar network, which reduces memory consumption.
Further, a saving mechanism is provided for the sub-networks: within the decoding of a single user input text, the match information obtained when a sub-network is first called is saved, and when later decoding calls that sub-network again, the saved matching result is used directly, which further improves decoding efficiency.
The process of decoding text based on the main-network/sub-network directed-graph grammar network in an embodiment of the present invention is further illustrated below by example.
Fig. 4 shows an example of a directed-graph grammar network based on the main-network/sub-network mode.
This directed-graph grammar network is mainly used for film search. The main structure of the main network main1 is "I want to see xxx's xxx". The directed-graph grammar network has three sub-networks in total, sub1, sub2, and sub3, where sub1 is a film actor name sub-network, sub2 is a TV drama actor name sub-network, and sub3 is a film title sub-network. An eps in the network denotes an empty arc, which is added automatically during compilation. Empty arcs serve only to separate the logical parts of the sentence grammar formally; when the network is used to parse a natural sentence, empty arcs can be ignored and the two nodes connected by an empty arc can be treated as the same node.
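The statement that the two nodes joined by an empty arc can be treated as one node can be sketched as a merge over the arc list. This is only an illustration of that statement: the arc format `(source, label, target)`, the label `"eps"`, and the function name `collapse_eps` are assumptions, and the sketch does not claim to reproduce how the patented compiler or decoder handles empty arcs.

```python
def collapse_eps(arcs):
    """Merge the endpoints of every "eps" arc and drop the eps arcs.

    arcs: list of (source node, label, target node) triples.
    """
    parent = {}

    def find(node):
        # Union-find lookup with path halving.
        parent.setdefault(node, node)
        while parent[node] != node:
            parent[node] = parent[parent[node]]
            node = parent[node]
        return node

    for src, label, dst in arcs:
        if label == "eps":
            parent[find(dst)] = find(src)   # treat both ends as one node

    return [(find(s), l, find(d)) for s, l, d in arcs if l != "eps"]

# Hypothetical fragment: a character arc, an empty arc, then a sub-network arc.
fragment = [(0, "a", 1), (1, "eps", 2), (2, "$sub1", 3)]
collapsed = collapse_eps(fragment)
```

After collapsing, the sub-network arc starts directly at node 1, so the matcher never has to step over an empty arc.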
For example, suppose the user inputs "I want to see the Infernal Affairs of Liu Dehua". There are two string matching paths in the grammar network, path A and path B, and the matching proceeds as follows (the character counts below refer to the original Chinese text, in which "I want to see" is three characters):
1. Matching path A (sub-network sub3 is called for the first time):
a) Exact matching of the string "I want to see" starts from the main network, and the sub-network identifier sub1 appears in the path;
b) Sub-network sub1 is called. The number of characters of the user input matched so far is 3, and the search flag of this sub-network indicates not searched. The matching result manager for this sub-network is created and the search flag is saved in it; matching of the string "Liu Dehua" then begins, the matching path is saved in the matching result manager of this sub-network, and the search flag of this sub-network is set to searched. Control returns to the main network;
c) The possessive particle ("of") between the actor name and the title is matched, and the sub-network identifier sub3 appears in the path;
d) Sub-network sub3 is called. The number of characters of the user input matched so far is 7, and the search flag of this sub-network indicates not searched. The matching result manager for this sub-network is created and the search flag is saved in it; matching of the string "Infernal Affairs" then begins, the matching path is saved in the matching result manager of this sub-network, and the search flag of this sub-network is set to searched. Control returns to the main network, and the semantic understanding result is returned.
2. Matching path B (sub-network sub3 is not called for the first time):
a) Exact matching of the string "I want to see" starts from the main network, and the sub-network identifier sub2 appears in the path;
b) Sub-network sub2 is called. The number of characters of the user input matched so far is 3, and the search flag of this sub-network indicates not searched. The matching result manager for this sub-network is created, matching of the string "Liu Dehua" begins, the matching path is saved in the matching result manager of this sub-network, and the search flag of this sub-network is set to searched. Control returns to the main network;
c) The possessive particle ("of") between the actor name and the title is matched, and the sub-network identifier sub3 appears in the path;
d) Sub-network sub3 is called. The search flag of sub-network sub3 indicates searched, and the number of characters of the user input matched so far is 7, which equals the number of matched characters stored in the matching result manager when sub-network sub3 was first called. This call therefore needs no string matching: the matching path saved in the matching result manager of sub-network sub3 is used directly, and the semantic understanding result is returned.
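The walkthrough of paths A and B can be reproduced with a small trace. This is a sketch under stated assumptions, not the structure of Fig. 4: the sub-network contents are invented for illustration, literal main-network segments are matched as whole strings rather than one character per arc, the sentence is an English rendering (so the saved offsets are English string positions rather than the counts 3 and 7), and the name `decode_all_paths` is hypothetical.

```python
def decode_all_paths(text):
    """Try both main-network paths and log each sub-network call."""
    subnets = {
        "sub1": ["Liu Dehua"],           # film actor names (illustrative)
        "sub2": ["Liu Dehua"],           # TV drama actor names (illustrative)
        "sub3": ["Infernal Affairs"],    # film titles (illustrative)
    }
    main_paths = [
        ["I want to see ", "$sub1", "'s ", "$sub3"],   # path A
        ["I want to see ", "$sub2", "'s ", "$sub3"],   # path B
    ]
    saved = {}      # name -> (characters matched before the call, end position)
    log = []        # (sub-network name, "matched" or "reused")
    results = []
    for path in main_paths:
        pos, slots = 0, {}
        for arc in path:
            if arc.startswith("$"):
                name = arc[1:]
                if name in saved and saved[name][0] == pos:
                    end = saved[name][1]            # reuse the saved result
                    log.append((name, "reused"))
                else:
                    end = next((pos + len(entry) for entry in subnets[name]
                                if text.startswith(entry, pos)), None)
                    saved[name] = (pos, end)        # first call: save result
                    log.append((name, "matched"))
                if end is None:
                    break
                slots[name] = text[pos:end]
                pos = end
            elif text.startswith(arc, pos):
                pos += len(arc)
            else:
                break
        else:
            if pos == len(text):
                results.append(slots)
    return results, log

results, log = decode_all_paths("I want to see Liu Dehua's Infernal Affairs")
```

The log shows sub3 being matched once on path A and reused on path B, because both paths reach it with the same number of characters already matched.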
In addition, it should be noted that in practical applications a sub-network may also apply a fault-tolerance mechanism during matching, performing network matching and decoding with a breadth-first search method. Users can decide whether to enable the fault-tolerance mechanism according to actual needs.
The fault-tolerance mechanism mainly includes one or more of the following string matching modes: self-jump, consecutive jump, and wrong-character tolerance. The following continues with the grammar network shown in Fig. 4 to illustrate sub-network matching with the fault-tolerance mechanism applied.
When the text to be parsed is "I want to see the Infernal Affairs of Liu Liu Dehua" or "I want to see the Infernal Affairs of Liu Zhang Dehua", and the sub-networks contain no "Liu Liu Dehua" or "Liu Zhang Dehua", the extra input character "Liu" or "Zhang" can be absorbed by a self-jump. When the two resulting string matching paths call sub-network sub3, string matching needs to be performed only once; the other string matching path directly uses the result of the first match.
When the text to be parsed is "I want to see the Infernal Affairs of Liu Hua", and sub-network sub1 or sub2 contains no "Liu Hua" but does contain "Liu Dehua", "Liu Qinghua", and "Liu Yuhua", a consecutive jump can tolerate the error by mapping "Liu Hua" to three string matching paths, "Liu Dehua", "Liu Qinghua", and "Liu Yuhua". When these three matching paths call sub-network sub3, string matching needs to be performed only once; the other two string matching paths directly use the result of the first match.
When text to be resolved is " I wants to see the magnificent Infernal Affairs of Liu ", there is no " Liu in sub-network sub1 or sub-network sub2 China ", and when having " Liu Dehua ", " Liu get Hua ", " Liu Haihua ", different wrongly written characters matchings can be calculated by wrongly written character fault tolerant mechanism The penalty value progress wrongly written character in path is fault-tolerant, such as that " China of Liu " is fault-tolerant at " Liu Dehua ", " Liu get Hua " two kinds of word string coupling paths. Due to " sea " word string with " " word string is close in pronunciation and font, so will not be fault-tolerant at " Liu Haihua ", when described two Kind coupling path is when calling sub-network sub3, it is only necessary to carry out a substring matching, another word string coupling path directly makes With first fit result.
It can be seen that the text semantic understanding method of the embodiment of the present invention improves the fault tolerance of the system through the fault-tolerance mechanism.
Correspondingly, an embodiment of the present invention also provides a text semantic understanding system. Fig. 5 is a structural schematic diagram of the system.
In this embodiment, the system comprises:
A network construction module 501, configured to construct in advance a directed-graph grammar network 500 based on a main-network/sub-network mode, where the directed-graph grammar network 500 includes one main network and one or more sub-networks, and each path segment of the directed-graph grammar network corresponds to one text character or one sub-network identifier;
A receiving module 502, configured to obtain the text to be parsed;
A decoding module 503, configured to decode the text based on the directed-graph grammar network to obtain a decoding path;
A result obtaining module 504, configured to obtain the semantics associated with the decoding path as the semantic understanding result.
The network construction module 501 may specifically construct the directed-graph grammar network according to preset sentence grammar rules. One specific structure of the module includes the following units:
A rule setting unit, configured to establish sentence grammar rules according to the grammatical characteristics of natural-language input in each application;
A text division unit, configured to determine the text types corresponding to the main network and the sub-networks;
A compiling unit, configured to compile the sentence grammar rules according to the text types corresponding to the main network and the sub-networks, generating a main-network directed-graph grammar network that carries sub-network identifiers, together with the sub-network directed-graph grammar networks.
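The compiling step can be pictured with a small sketch. It assumes a hypothetical rule notation in which `<name>` marks a sub-network identifier and every other character becomes one arc; the notation, the function names, and the toy grammar are illustrative and are not the network of Fig. 4.

```python
import re
from typing import Dict, List, Tuple, Union

# An arc label is either one text character or a ("sub", name) reference.
Label = Union[str, Tuple[str, str]]


def compile_rule(rule: str) -> List[Label]:
    """Turn one grammar rule into a sequence of arc labels."""
    labels: List[Label] = []
    for part in re.split(r"(<\w+>)", rule):
        if part.startswith("<") and part.endswith(">"):
            labels.append(("sub", part[1:-1]))  # sub-network identifier arc
        else:
            labels.extend(part)                 # one arc per text character
    return labels


def compile_grammar(rules: Dict[str, List[str]]) -> Dict[str, List[List[Label]]]:
    """Compile every network (main and sub) into lists of arc sequences."""
    return {name: [compile_rule(alt) for alt in alts]
            for name, alts in rules.items()}


# Toy grammar: "I want to watch <sub1>'s <sub3>" in Chinese.
grammar = compile_grammar({
    "main": ["我想看<sub1>的<sub3>"],
    "sub1": ["刘德华", "张学友"],   # assumed: person names
    "sub3": ["无间道"],             # assumed: film titles
})
print(grammar["main"][0])
```

Keeping the name list in `sub1` out of the main network means that adding a name touches only one sub-network, which is the complexity reduction the main-network/sub-network split aims for.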
The decoding module 503 specifically performs string matching on the text to be parsed, starting from the head node of the main network. If a sub-network identifier appears in the matching path of the main network, it records the main-network matching information and calls the sub-network corresponding to the sub-network identifier to perform string matching, obtaining and recording the sub-network matching information. After the text to be parsed has been completely matched, it obtains the decoding path according to the obtained main-network matching information and sub-network matching information. One specific structure of the module includes a matching unit and a decoding path acquiring unit, where:
The matching unit is configured to perform string matching on the text to be parsed, starting from the head node of the main network; and, when a sub-network identifier appears in the matching path of the main network, to record the main-network matching information and call the sub-network corresponding to the sub-network identifier to perform string matching, obtaining and recording the sub-network matching information;
The decoding path acquiring unit is configured to obtain the decoding path, after the matching unit has completely matched the text to be parsed, according to the main-network matching information and sub-network matching information obtained by the matching unit.
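The matching and path-acquisition steps above can be sketched as a depth-first walk. This is a simplified illustration under stated assumptions: each network is a list of alternative arc sequences, an arc label is a character or a `("sub", name)` tuple, the main network is stored under the key `"main"`, and the returned path only records which top-level sub-network matched which span. None of these names come from the patent.

```python
from typing import Dict, Iterator, List, Optional, Tuple, Union

Label = Union[str, Tuple[str, str]]      # character or ("sub", name)
Grammar = Dict[str, List[List[Label]]]   # network name -> arc sequences
Path = List[Tuple[str, str]]             # (sub-network name, matched text)


def walk(grammar: Grammar, labels: List[Label], text: str,
         pos: int, path: Path) -> Iterator[Tuple[int, Path]]:
    """Depth-first walk along `labels`; yields (end position, path)."""
    if not labels:
        yield pos, path
        return
    head, rest = labels[0], labels[1:]
    if isinstance(head, tuple):                  # sub-network identifier arc
        name = head[1]
        for alternative in grammar[name]:        # call the sub-network
            for end, _ in walk(grammar, alternative, text, pos, []):
                matched = path + [(name, text[pos:end])]
                yield from walk(grammar, rest, text, end, matched)
    elif pos < len(text) and text[pos] == head:  # ordinary character arc
        yield from walk(grammar, rest, text, pos + 1, path)


def decode(grammar: Grammar, text: str) -> Optional[Path]:
    """Return the first path that consumes the whole text, else None."""
    for alternative in grammar["main"]:
        for end, path in walk(grammar, alternative, text, 0, []):
            if end == len(text):
                return path
    return None


grammar: Grammar = {
    "main": [["我", "想", "看", ("sub", "sub1"), "的", ("sub", "sub3")]],
    "sub1": [list("刘德华"), list("张学友")],
    "sub3": [list("无间道")],
}
print(decode(grammar, "我想看刘德华的无间道"))
```

Because the walk is a generator, only the arcs on the current path are held at any moment, which reflects why a depth-first strategy keeps memory use low.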
The text semantic understanding system of the embodiment of the present invention divides the directed-graph grammar network into a main network and sub-networks, which effectively reduces the complexity of the directed-graph grammar network and improves decoding efficiency. Moreover, when decoding the text to be parsed that the user inputs, grammar-network matching and parsing is performed on the text using a depth-first search method, which reduces memory consumption.
Further, the decoding module 503 may also include a judging unit, configured to judge, when the matching unit calls the sub-network corresponding to the sub-network identifier to perform string matching, whether the sub-network is being called for the first time, and to feed the judgment result back to the matching unit. For example, the sub-network matching information includes: the sub-network matching path, a sub-network search flag, and the number of characters of the matched string; the main-network matching information includes: the main-network matching path, the sub-network identifier of the called sub-network, and the number of characters of the matched string. The judging unit can then use this information to judge whether a sub-network call is a first call or a non-first call. Specifically, when the sub-network search flag indicates that the sub-network has not been searched, the call is determined to be a first call; when the sub-network search flag indicates that the sub-network has been searched, and the number of characters of the matched string is the same in the main-network matching information and the sub-network matching information, the call is determined to be a non-first call.
Correspondingly, when the judging unit judges that the sub-network is being called for the first time, the matching unit performs string matching using the sub-network and saves the obtained sub-network matching information in a sub-network matching result manager; when the judging unit judges that the call is a non-first call, the matching unit obtains the historical matching result from the sub-network matching result manager as the sub-network matching information.
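The first-call check and result reuse can be sketched as a small cache. This is a simplification under assumptions: results are keyed by sub-network name plus the number of characters already matched, which stands in for the search flag and matched-character-count comparison; `SubnetResultManager`, `toy_matcher`, and `real_calls` are hypothetical names.

```python
from typing import Callable, Dict, List, Tuple

# A matcher takes (sub-network name, text, characters already matched)
# and returns the possible end positions of a sub-network match.
Matcher = Callable[[str, str, int], List[int]]


class SubnetResultManager:
    """Stores sub-network match results for one input text."""

    def __init__(self, matcher: Matcher) -> None:
        self._matcher = matcher
        self._results: Dict[Tuple[str, int], List[int]] = {}
        self.real_calls = 0  # how often the sub-network was really searched

    def match(self, name: str, text: str, matched_count: int) -> List[int]:
        key = (name, matched_count)
        if key not in self._results:      # first call: search and save
            self.real_calls += 1
            self._results[key] = self._matcher(name, text, matched_count)
        return self._results[key]         # non-first call: reuse the result


def toy_matcher(name: str, text: str, start: int) -> List[int]:
    """Stand-in sub-network: exact match of a title at position `start`."""
    titles = {"sub3": ["无间道"]}
    return [start + len(t) for t in titles.get(name, [])
            if text.startswith(t, start)]


manager = SubnetResultManager(toy_matcher)
text = "我想看刘华的无间道"
# Three tolerant paths reach sub3 after the same six matched characters.
ends = [manager.match("sub3", text, 6) for _ in range(3)]
print(ends, manager.real_calls)
```

A new manager would be created for each input text, since the saved results are only valid within the decoding of a single input.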
It can be seen that the text semantic understanding system of the embodiment of the present invention provides a saving mechanism for sub-networks: during the decoding of a single user input text, the matching information of the first call to a sub-network is saved, and when the sub-network is called again later in decoding, the saved matching result is used directly, which further improves decoding efficiency.
It should be noted that, in practical applications, the matching unit may also have a fault-tolerance mechanism when matching with a sub-network, performing network matching and decoding with a breadth-first search method. The fault-tolerance mechanism includes one or more of the following string-matching modes: self-jump, consecutive jump, and wrong-character fault tolerance. For the decoding process of each fault-tolerance mode, refer to the description in the foregoing method embodiment of the present invention; details are not repeated here.
In addition, the system of the present invention may further include a fault-tolerance mechanism setting module, configured to provide the user with a setting function so that the user decides, according to actual needs, whether to enable the fault-tolerance mechanism. That is, if the user enables the fault-tolerance mechanism, string matching with a sub-network uses the fault-tolerance mechanism; otherwise, it uses the exact matching mechanism. Of course, in practical applications, whether to use the fault-tolerance mechanism may also be preset by the system according to the needs of the actual application environment.
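Such a switch can be sketched as a setting that selects the matching function. The names `MatcherSettings`, `exact_match`, `tolerant_match`, and `select_matcher` are hypothetical, and the tolerant matcher here is a deliberately crude stand-in (at most one differing character) rather than the fault-tolerance modes described earlier.

```python
from dataclasses import dataclass
from typing import Callable, List

MatchFn = Callable[[str, List[str]], List[str]]


@dataclass
class MatcherSettings:
    # Set by the user, or preset by the system for the application.
    fault_tolerant: bool = False


def exact_match(text: str, entries: List[str]) -> List[str]:
    """Exact matching: only identical entries are accepted."""
    return [entry for entry in entries if entry == text]


def tolerant_match(text: str, entries: List[str]) -> List[str]:
    """Crude stand-in for tolerance: allow one differing character."""
    return [entry for entry in entries
            if len(entry) == len(text)
            and sum(a != b for a, b in zip(entry, text)) <= 1]


def select_matcher(settings: MatcherSettings) -> MatchFn:
    """Pick the sub-network matching function according to the switch."""
    return tolerant_match if settings.fault_tolerant else exact_match


names = ["刘德华", "张学友"]
print(select_matcher(MatcherSettings())("刘的华", names))
print(select_matcher(MatcherSettings(fault_tolerant=True))("刘的华", names))
```

With the switch off the misspelled name matches nothing; with it on, the single-character difference is tolerated.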
It can be seen that the text semantic understanding system of the embodiment of the present invention further improves the fault tolerance of the system through the fault-tolerance mechanism.
The embodiments in this specification are described in a progressive manner; for identical or similar parts, the embodiments may be referred to one another, and each embodiment focuses on its differences from the others. In particular, the system embodiment is described relatively briefly because it is substantially similar to the method embodiment; for related details, refer to the description of the method embodiment. The system embodiment described above is merely illustrative: the units described as separate components may or may not be physically separate, and the components shown as units may or may not be physical units; that is, they may be located in one place or distributed across multiple network units. Some or all of the modules may be selected according to actual needs to achieve the purpose of the solution of this embodiment. A person of ordinary skill in the art can understand and implement it without creative effort.
The embodiments of the present invention have been described in detail above, and specific examples are used herein to explain the invention; the description of the above embodiments is only intended to help in understanding the method and system of the present invention. Meanwhile, a person of ordinary skill in the art may, following the idea of the present invention, make changes to the specific implementation and the scope of application. In summary, the content of this specification should not be construed as limiting the present invention.

Claims (14)

1. A text semantic understanding method, characterized by comprising:
constructing in advance a directed-graph grammar network based on a main-network/sub-network mode, where the directed-graph grammar network includes one main network and one or more sub-networks, and each path segment of the directed-graph grammar network corresponds to one text character or one sub-network identifier;
obtaining the text to be parsed;
decoding the text based on the directed-graph grammar network to obtain a decoding path;
obtaining the semantics associated with the decoding path as the semantic understanding result.
2. The method according to claim 1, characterized in that constructing the directed-graph grammar network based on the main-network/sub-network mode comprises:
establishing sentence grammar rules according to the grammatical characteristics of natural-language input in each application;
determining the text types corresponding to the main network and the sub-networks;
compiling the sentence grammar rules according to the text types corresponding to the main network and the sub-networks, to generate a main-network directed-graph grammar network carrying sub-network identifiers and the sub-network directed-graph grammar networks.
3. The method according to claim 1, characterized in that decoding the text based on the directed-graph grammar network to obtain a decoding path comprises:
performing string matching on the text to be parsed, starting from the head node of the main network;
if a sub-network identifier appears in the matching path of the main network, recording the main-network matching information, and calling the sub-network corresponding to the sub-network identifier to perform string matching, obtaining and recording the sub-network matching information;
after the text to be parsed has been completely matched, obtaining the decoding path according to the obtained main-network matching information and sub-network matching information.
4. The method according to claim 3, characterized in that decoding the text based on the directed-graph grammar network to obtain a decoding path further comprises:
when calling the sub-network corresponding to the sub-network identifier to perform string matching, judging whether the sub-network is being called for the first time;
if so, performing string matching using the sub-network, and saving the obtained sub-network matching information in a sub-network matching result manager;
otherwise, obtaining the historical matching result from the sub-network matching result manager as the sub-network matching information.
5. The method according to claim 4, characterized in that the sub-network matching information includes: the sub-network matching path, a sub-network search flag, and the number of characters of the matched string; and the main-network matching information includes: the main-network matching path, the sub-network identifier of the called sub-network, and the number of characters of the matched string;
judging whether the sub-network is being called for the first time comprises:
if the sub-network search flag indicates that the sub-network has not been searched, determining that the sub-network is being called for the first time;
if the sub-network search flag indicates that the sub-network has been searched, and the number of characters of the matched string is the same in the main-network matching information and the sub-network matching information, determining that the call is a non-first call.
6. The method according to claim 3, characterized in that performing string matching using the sub-network comprises:
when performing string matching using the sub-network, performing the string matching with a fault-tolerance mechanism, where the fault-tolerance mechanism includes one or more of the following string-matching modes: self-jump, consecutive jump, and wrong-character fault tolerance.
7. The method according to any one of claims 1 to 6, characterized in that the sub-networks have one or more layers.
8. A text semantic understanding system, characterized by comprising:
a network construction module, configured to construct in advance a directed-graph grammar network based on a main-network/sub-network mode, where the directed-graph grammar network includes one main network and one or more sub-networks, and each path segment of the directed-graph grammar network corresponds to one text character or one sub-network identifier;
a receiving module, configured to obtain the text to be parsed;
a decoding module, configured to decode the text based on the directed-graph grammar network to obtain a decoding path;
a result obtaining module, configured to obtain the semantics associated with the decoding path as the semantic understanding result.
9. The system according to claim 8, characterized in that the network construction module comprises:
a rule setting unit, configured to establish sentence grammar rules according to the grammatical characteristics of natural-language input in each application;
a text division unit, configured to determine the text types corresponding to the main network and the sub-networks;
a compiling unit, configured to compile the sentence grammar rules according to the text types corresponding to the main network and the sub-networks, to generate a main-network directed-graph grammar network carrying sub-network identifiers and the sub-network directed-graph grammar networks.
10. The system according to claim 8, characterized in that the decoding module comprises:
a matching unit, configured to perform string matching on the text to be parsed, starting from the head node of the main network; and, when a sub-network identifier appears in the matching path of the main network, to record the main-network matching information and call the sub-network corresponding to the sub-network identifier to perform string matching, obtaining and recording the sub-network matching information;
a decoding path acquiring unit, configured to obtain the decoding path, after the matching unit has completely matched the text to be parsed, according to the main-network matching information and sub-network matching information obtained by the matching unit.
11. The system according to claim 10, characterized in that the decoding module further comprises:
a judging unit, configured to judge, when the matching unit calls the sub-network corresponding to the sub-network identifier to perform string matching, whether the sub-network is being called for the first time, and to feed the judgment result back to the matching unit;
the matching unit, when the judging unit judges that the sub-network is being called for the first time, performs string matching using the sub-network and saves the obtained sub-network matching information in a sub-network matching result manager; and, when the judging unit judges that the call is a non-first call, obtains the historical matching result from the sub-network matching result manager as the sub-network matching information.
12. The system according to claim 11, characterized in that the sub-network matching information includes: the sub-network matching path, a sub-network search flag, and the number of characters of the matched string; and the main-network matching information includes: the main-network matching path, the sub-network identifier of the called sub-network, and the number of characters of the matched string;
the judging unit is specifically configured to determine that the sub-network is being called for the first time when the sub-network search flag indicates that the sub-network has not been searched, and to determine that the call is a non-first call when the sub-network search flag indicates that the sub-network has been searched and the number of characters of the matched string is the same in the main-network matching information and the sub-network matching information.
13. The system according to claim 10, characterized in that, when the matching unit performs string matching using the sub-network, it performs the string matching with a fault-tolerance mechanism, where the fault-tolerance mechanism includes one or more of the following string-matching modes: self-jump, consecutive jump, and wrong-character fault tolerance.
14. The system according to any one of claims 8 to 13, characterized in that the sub-networks have one or more layers.
Application Number Priority Date Filing Date Title
CN201510159102.2A 2015-04-03 2015-04-03 Text semantic understanding method and system

Publications (2)

Publication Number Publication Date
CN106156110A 2016-11-23
CN106156110B 2019-07-30

Family ID: 57338433




Legal Events

Date Code Title Description
C06 Publication
PB01 Publication
C10 Entry into substantive examination
SE01 Entry into force of request for substantive examination
GR01 Patent grant