CN107247613A - Sentence analytic method and sentence resolver - Google Patents
- Publication number
- CN107247613A (application number CN201710276537.4A)
- Authority
- CN
- China
- Prior art keywords
- sentence
- resolved
- morpheme
- syntax tree
- algorithm
- Prior art date
- Legal status
- Pending
Classifications
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F8/00—Arrangements for software engineering
- G06F8/40—Transformation of program code
- G06F8/41—Compilation
- G06F8/42—Syntactic analysis
- G06F8/427—Parsing
Landscapes
- Engineering & Computer Science (AREA)
- General Engineering & Computer Science (AREA)
- Theoretical Computer Science (AREA)
- Software Systems (AREA)
- Physics & Mathematics (AREA)
- General Physics & Mathematics (AREA)
- Machine Translation (AREA)
Abstract
The invention discloses a sentence parsing method and a sentence parsing device. The method includes: obtaining a sentence to be parsed; and parsing the sentence according to the grammar of a Chinese domain-specific language, where both the sentence to be parsed and the Chinese domain-specific language are described in Chinese. The invention solves the technical problem in the related art that English-based domain-specific languages are complex to process and do not match Chinese language habits. It improves the readability of both the sentence to be parsed and the Chinese domain-specific language, and thereby improves the user experience.
Description
Technical field
The present invention relates to the field of domain-specific languages, and in particular to a sentence parsing method and a sentence parsing device.
Background
A domain-specific language (DSL) is a computer language designed for a specific application domain. It uses an agreed syntax to express the intent of domain experts, helping them solve problems in that domain efficiently.
In the related art, a computer language is typically described using Extended Backus-Naur Form (EBNF). Traditional DSL description tools, such as ANTLR (ANother Tool for Language Recognition), can simplify DSL design to some extent. However, the traditional EBNF-based approach to describing computer languages, as well as existing language description and parsing tools (such as ANTLR), still have problems. For example, general DSL description methods require English to be used for the basic lexical elements and keywords. Because complex Chinese word segmentation logic is difficult to handle correctly, even when Chinese keywords are allowed, spaces must still be inserted between words as in English, which does not match Chinese language habits.
Therefore, in the related art, English-based DSLs are complex to process and do not match Chinese language habits.
Summary of the invention
Embodiments of the invention provide a sentence parsing method and a sentence parsing device, so as to at least solve the technical problem in the related art that English-based DSLs are complex to process and do not match Chinese language habits.
According to one aspect of the embodiments of the present invention, a sentence parsing method is provided, including: obtaining a sentence to be parsed; and parsing the sentence according to the grammar of a Chinese DSL, where both the sentence to be parsed and the Chinese DSL are described in Chinese.
Optionally, the grammar is described using dynamically mutable data. The grammar includes symbols that describe the morpheme types of the Chinese DSL and, in addition to the symbols, a dictionary that supplements them.
Optionally, parsing the sentence according to the grammar of the Chinese DSL includes: decomposing the sentence to be parsed into basic morphemes; tagging the parts of speech of the decomposed basic morphemes; and parsing the part-of-speech-tagged basic morphemes into a syntax tree according to the grammar of the Chinese DSL.
Optionally, before the sentence to be parsed is decomposed into basic morphemes, the method further includes: determining, using a predetermined ambiguity evaluation algorithm, whether the sentence is ambiguous; and if it is, avoiding the ambiguity using a predetermined avoidance scheme.
Optionally, decomposing the sentence to be parsed into basic morphemes includes decomposing it according to the longest-match principle, that is, matching the longest possible text at each step.
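The longest-match principle can be illustrated with a minimal sketch of greedy forward maximum matching. For simplicity it assumes the candidates are a plain set of words; in the embodiments, candidates come from symbol patterns and dictionary entries, which this sketch does not model. The function name `longest_match_segment` and the sample lexicon are hypothetical.

```python
def longest_match_segment(text: str, lexicon: set[str]) -> list[str]:
    """Greedy forward longest-match segmentation over a word lexicon.

    At each position, try the longest possible candidate first and shrink it
    until a lexicon entry matches; an unmatched character becomes a token of
    its own so the scan always advances.
    """
    max_len = max((len(word) for word in lexicon if word), default=1)
    tokens: list[str] = []
    i = 0
    while i < len(text):
        for size in range(min(max_len, len(text) - i), 0, -1):
            candidate = text[i:i + size]
            if candidate in lexicon or size == 1:
                tokens.append(candidate)
                i += size
                break
    return tokens


# "到达北京站" = "arrive at Beijing Station"; the longer entry 北京站 wins over 北京.
print(longest_match_segment("到达北京站", {"北京", "北京站", "站", "到达"}))
# prints ['到达', '北京站']
```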
Optionally, parsing the part-of-speech-tagged basic morphemes into a syntax tree according to the grammar of the Chinese DSL includes one of the following: parsing them using a top-down syntax tree parsing algorithm, in which matching proceeds forward, step by step, from a predetermined morpheme position, and whenever a matched morpheme references other symbols in addition to its own symbol, those other symbols are matched in turn; parsing them using a bottom-up syntax tree parsing algorithm, in which parent nodes are built for the basic morphemes produced by decomposing the sentence, then parent nodes of those parent nodes are built in the same way, until a unique root node is produced; or parsing them by combining the top-down and bottom-up syntax tree parsing algorithms.
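A minimal sketch of the bottom-up idea follows. It assumes grammar rules are given as a mapping from a sequence of child symbol codes to a parent symbol code, each with at least two children so that every reduction shortens the sequence. This greedy reduction is only an illustration, not the patent's algorithm; a grammar that needs backtracking would require a more complete parser. `Node`, `bottom_up_parse`, and the symbol codes are hypothetical.

```python
from __future__ import annotations

from dataclasses import dataclass, field
from typing import Optional


@dataclass
class Node:
    symbol: str                      # symbol code, e.g. "DAY"
    text: str                        # text covered by this node
    children: list[Node] = field(default_factory=list)


def bottom_up_parse(leaves: list[Node], rules: dict[tuple[str, ...], str]) -> Optional[Node]:
    """Greedy bottom-up reduction.

    Repeatedly find a run of adjacent nodes whose symbols equal a rule's
    right-hand side and replace it with a new parent node, until a single
    root remains. Returns None when no rule applies and several nodes remain.
    """
    nodes = list(leaves)
    # Longer right-hand sides are tried first so larger constituents win.
    ordered_rules = sorted(rules.items(), key=lambda item: -len(item[0]))
    while len(nodes) > 1:
        for rhs, parent_symbol in ordered_rules:
            n = len(rhs)
            match_at = next(
                (i for i in range(len(nodes) - n + 1)
                 if tuple(node.symbol for node in nodes[i:i + n]) == rhs),
                None,
            )
            if match_at is not None:
                children = nodes[match_at:match_at + n]
                parent = Node(parent_symbol, "".join(c.text for c in children), children)
                nodes[match_at:match_at + n] = [parent]
                break
        else:
            return None  # no rule could reduce the sequence any further
    return nodes[0] if nodes else None


leaves = [Node("DAY", "明天"), Node("TIME", "10时"), Node("ACTION", "发送")]
rules = {("DAY", "TIME"): "DATETIME", ("DATETIME", "ACTION"): "COMMAND"}
root = bottom_up_parse(leaves, rules)
print(root.symbol, root.text)  # prints COMMAND 明天10时发送
```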
Optionally, before the sentence to be parsed is decomposed into basic morphemes, the method further includes: performing inference on the sentence using predetermined ellipsis inference algorithms to restore it to a sentence with complete information, where the predetermined ellipsis inference algorithms include at least one of the following: a context inference algorithm that supplements omitted elements according to preceding basic morphemes; a time inference algorithm that computes the time from basic morphemes that refer to time; and a business object inference algorithm that locates the business object for basic morphemes that do not specify complete information.
Optionally, after the part-of-speech-tagged basic morphemes are parsed into a syntax tree according to the grammar of the Chinese DSL, the method further includes: each leaf node of the syntax tree passes its content to its parent node; the parent node processes the content passed by all of its leaf nodes to obtain its own content; and the above passing and processing operations are repeated up to the root node, whose content is taken as the result value of the syntax tree, the result value being used to call application programming interfaces.
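Passing content from the leaves up to the root amounts to a post-order traversal. The sketch below assumes that each non-leaf symbol has a handler that combines its children's values; the `TreeNode` class, `evaluate`, and the `DATETIME` handler are hypothetical, and the subsequent call to an application programming interface with the result value is not shown.

```python
from __future__ import annotations

from dataclasses import dataclass, field
from datetime import date, datetime, time
from typing import Any, Callable


@dataclass
class TreeNode:
    symbol: str                                   # symbol code of the node
    content: Any = None                           # leaves already carry a value
    children: list[TreeNode] = field(default_factory=list)


def evaluate(node: TreeNode, handlers: dict[str, Callable[[list[Any]], Any]]) -> Any:
    """Post-order evaluation: each child hands its content up to the parent,
    the parent's handler combines those values into its own content, and the
    root's content becomes the result value of the whole tree."""
    if not node.children:
        return node.content
    child_values = [evaluate(child, handlers) for child in node.children]
    node.content = handlers[node.symbol](child_values)
    return node.content


tree = TreeNode("DATETIME", children=[
    TreeNode("DATE", date(2016, 11, 13)),
    TreeNode("TIME", time(10, 30)),
])
handlers = {"DATETIME": lambda values: datetime.combine(values[0], values[1])}
print(evaluate(tree, handlers))  # prints 2016-11-13 10:30:00
```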
According to another aspect of the embodiments of the present invention, a sentence parsing device is further provided, including: an acquisition module configured to obtain a sentence to be parsed; and a parsing module configured to parse the sentence according to the grammar of a Chinese DSL, where both the sentence to be parsed and the Chinese DSL are described in Chinese.
Optionally, the parsing module includes: a word segmentation unit configured to decompose the sentence to be parsed into basic morphemes; a tagging unit configured to tag the parts of speech of the decomposed basic morphemes; and a parsing unit configured to parse the part-of-speech-tagged basic morphemes into a syntax tree according to the grammar of the Chinese DSL.
Optionally, the parsing module further includes: a judging unit configured to determine, using a predetermined ambiguity evaluation algorithm, whether the sentence to be parsed is ambiguous; and an avoidance unit configured to avoid the ambiguity using a predetermined avoidance scheme if the sentence is ambiguous.
Optionally, the word segmentation unit includes a decomposition subunit configured to decompose the sentence to be parsed into basic morphemes according to the longest-match principle, that is, matching the longest possible text at each step.
Optionally, the parsing unit includes one of the following: a first parsing subunit configured to parse the part-of-speech-tagged basic morphemes into a syntax tree using a top-down syntax tree parsing algorithm, in which matching proceeds forward, step by step, from a predetermined morpheme position, and whenever a matched morpheme references other symbols in addition to its own symbol, those other symbols are matched in turn; a second parsing subunit configured to parse them into a syntax tree using a bottom-up syntax tree parsing algorithm, in which parent nodes are built for the basic morphemes produced by decomposing the sentence, then parent nodes of those parent nodes are built in the same way, until a unique root node is produced; and a third parsing subunit configured to parse them into a syntax tree by combining the top-down and bottom-up syntax tree parsing algorithms.
Optionally, the parsing module further includes an inference unit configured to perform inference on the sentence to be parsed using predetermined ellipsis inference algorithms, restoring it to a sentence with complete information, where the predetermined ellipsis inference algorithms include at least one of the following: a context inference algorithm that supplements omitted elements according to preceding basic morphemes; a time inference algorithm that computes the time from basic morphemes that refer to time; and a business object inference algorithm that locates the business object for basic morphemes that do not specify complete information.
Optionally, the parsing module further includes: a transfer unit configured to pass, from each leaf node of the syntax tree, the content of that leaf node to its parent node; a processing unit configured to have the parent node process the content passed by all of its leaf nodes to obtain the parent node's content; and an execution module configured to repeat the above passing and processing operations up to the root node and take the content of the root node as the result value of the syntax tree.
According to another aspect of the embodiments of the present invention, a storage medium is further provided. The storage medium includes a stored program, and when the program runs, the device on which the storage medium resides is controlled to perform the following operations: obtaining a sentence to be parsed; and parsing the sentence according to the grammar of a Chinese DSL, where both the sentence to be parsed and the Chinese DSL are described in Chinese.
According to another aspect of the embodiments of the present invention, a processor is further provided. The processor is configured to run a program, and the following operations are performed when the program runs: obtaining a sentence to be parsed; and parsing the sentence according to the grammar of a Chinese DSL, where both the sentence to be parsed and the Chinese DSL are described in Chinese.
In the embodiments of the present invention, a sentence to be parsed is obtained and then parsed according to the grammar of a Chinese DSL, with both the sentence and the Chinese DSL described in Chinese. Because both are described in Chinese, their readability improves. This solves the technical problem in the related art that English-based DSLs are complex to process and do not match Chinese language habits, and thereby improves the user experience.
Brief description of the drawings
The accompanying drawings described herein are provided for a further understanding of the present invention and constitute a part of this application. The exemplary embodiments of the present invention and their descriptions are used to explain the present invention and do not constitute an improper limitation of it. In the drawings:
Fig. 1 is a flowchart of a sentence parsing method according to an embodiment of the present invention;
Fig. 2 is a logical model diagram of a symbol according to an embodiment of the present invention;
Fig. 3 is a logical model diagram of a dictionary according to an embodiment of the present invention;
Fig. 4 is a logical model diagram of a syntax tree node according to an embodiment of the present invention;
Fig. 5 is an example of a word segmentation result according to an embodiment of the present invention;
Fig. 6 is a flowchart of the forward matching algorithm of top-down syntax tree parsing according to an embodiment of the present invention;
Fig. 7 is a framework diagram of the definition and parsing of a Chinese DSL according to an embodiment of the present invention;
Fig. 8 is a flowchart of the parsing process of a Chinese DSL according to an embodiment of the present invention; and
Fig. 9 is a schematic diagram of a sentence parsing device according to an embodiment of the present invention.
Detailed description of the embodiments
To help those skilled in the art better understand the solution of the present invention, the technical solutions in the embodiments of the present invention are described clearly and completely below with reference to the accompanying drawings. Evidently, the described embodiments are only some rather than all of the embodiments of the present invention. All other embodiments obtained by a person of ordinary skill in the art based on the embodiments of the present invention without creative effort shall fall within the protection scope of the present invention.
It should be noted that the terms "first", "second", and the like in the specification, claims, and drawings of the present invention are used to distinguish similar objects and not to describe a particular order or sequence. It should be understood that data so used are interchangeable where appropriate, so that the embodiments described herein can be implemented in orders other than those illustrated or described herein. In addition, the terms "include" and "have" and any variants thereof are intended to cover non-exclusive inclusion; for example, a process, method, system, product, or device that includes a series of steps or units is not necessarily limited to the steps or units expressly listed, but may include other steps or units not expressly listed or inherent to such a process, method, product, or device.
According to an embodiment of the present invention, a method embodiment of a sentence parsing method is provided. It should be noted that the steps shown in the flowcharts of the drawings may be executed in a computer system, for example as a set of computer-executable instructions, and that although a logical order is shown in the flowcharts, in some cases the steps shown or described may be performed in a different order.
This embodiment provides a sentence parsing method. Fig. 1 is a flowchart of the sentence parsing method according to an embodiment of the present invention. As shown in Fig. 1, the method includes the following steps:
Step S102: obtain a sentence to be parsed.
Step S104: parse the sentence to be parsed according to the grammar of the Chinese DSL, where both the sentence to be parsed and the Chinese DSL are described in Chinese.
Through the above steps, a sentence to be parsed is obtained and then parsed according to the grammar of the Chinese DSL. Because both the sentence and the Chinese DSL are described in Chinese, the problem in the related art that English-based DSLs are complex to process and do not match Chinese language habits is solved: the Chinese-based description matches Chinese language habits, makes the DSL more convenient to use, and thereby improves the user experience.
It should be noted that, in the related art, EBNF can describe only static lexical rules and cannot extend the lexicon for word-element objects that change continuously. As a result, the syntax definitions of general-purpose DSLs (such as the cmd command line under Windows) depart from the habits of natural language, and practitioners in a field need the necessary training before they can master a DSL. For example, if an application software platform dynamically creates a new event named "Event X37" according to business needs, this word should be a morpheme of the "event" class; but because it cannot be predefined in the language description, it cannot be recognized correctly. In the Chinese DSL of the embodiments of the present invention, by contrast, the grammar can be predefined in Chinese.
In addition, to increase flexibility, and unlike tools such as ANTLR that use EBNF expressions, the embodiments of the present invention describe the grammar using dynamically mutable data. When the data in the symbol table and the dictionary change, the parsing logic is affected immediately. With DSL description tools such as ANTLR, the grammar must be processed in advance to generate the corresponding Java parser code, which must then be compiled before it can be deployed in an application business system for final language execution. Consequently, once the business system has been released, maintaining, extending, and updating the DSL is inconvenient. Describing the grammar with dynamically mutable data, as in the embodiments of the present invention, requires no parser code to be generated, compiled, or re-released.
The description of the grammar using dynamically mutable data in the embodiments of the present invention is explained below.
The grammar may include symbols that describe the types of morphemes of the Chinese DSL, where a symbol is a basic type of word element in a grammar system. A symbol in the grammar system is described by the attributes code, name, pattern, terminal flag, and priority. Fig. 2 is a logical model diagram of a symbol according to an embodiment of the present invention; the logical model is shown in Fig. 2. For example, Table 1 shows the correspondence among code, name, pattern, terminal flag, and priority in symbol definitions; a set of example grammar symbols described with this logical model is given in Table 1 below:
Table 1
In the table above, "date" and "time" are two independently defined symbols whose patterns are defined with regular expressions. The pattern of the symbol "date-time" is defined by referencing the "date" and "time" symbols (referenced symbol codes are marked with square brackets). Some symbols cannot be defined by a simple combination of patterns; for example, the "ground station" symbol is defined with the built-in function IsStation(). This function queries the application's business data and determines whether a text denotes a spacecraft tracking, telemetry, and command ground station: given a target text, it returns true if a record of that ground station is found in the business data, indicating that the target text represents a ground station. Function-based checks like this suit lexical elements that change constantly during dynamic computation and cannot be described with simple regular expressions. Functions support not only data queries but also complex logical operations, so they can decide comprehensively whether a target text belongs to a specific lexical type. As a basic convention, terminal symbols can be defined only by pure regular expressions or built-in functions, whereas non-terminal symbols can be defined by combining other symbols (symbol codes or names enclosed in square brackets). "Priority" is a numeric attribute that specifies the order in which matches are attempted; symbols with smaller numbers are tried first. For each symbol, the parsing framework defines a default internal processing function named "sCODE", where "s" is a common prefix and CODE is the code of the corresponding symbol. For example, the function sDAY() is the default internal function for handling the "day reference" symbol; it determines which specific day is referred to in the target text being parsed.
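To make these conventions concrete, the following sketch models symbols as plain data: terminal symbols use a regular expression or a function, non-terminal symbols reference other symbols as `[CODE]`, and candidates are tried in ascending priority order. The `Symbol` class, `expand`, `classify`, the sample patterns, and `is_station` (an in-memory stand-in for IsStation(), which in the embodiments queries business data) are all assumptions rather than the patent's definitions; the actual contents of Table 1 are not reproduced here.

```python
from __future__ import annotations

import re
from dataclasses import dataclass
from typing import Callable, Optional, Union


@dataclass
class Symbol:
    code: str
    name: str
    pattern: Union[str, Callable[[str], bool]]  # regex, reference pattern, or function
    terminal: bool
    priority: int                               # smaller number is tried first


# Assumption: references are written as [CODE] with an upper-case code, so that
# ordinary regex character classes such as [0-9] are not mistaken for references.
REFERENCE = re.compile(r"\[([A-Z][A-Z_]*)\]")


def expand(pattern: str, table: dict[str, Symbol]) -> str:
    """Inline every [CODE] reference with the expanded regex of that symbol."""
    def inline(match: re.Match) -> str:
        referenced = table[match.group(1)]
        if not isinstance(referenced.pattern, str):
            raise ValueError(f"{referenced.code} is function-defined and cannot be inlined")
        return "(?:" + expand(referenced.pattern, table) + ")"
    return REFERENCE.sub(inline, pattern)


def classify(text: str, table: dict[str, Symbol]) -> Optional[str]:
    """Return the code of the first symbol, by ascending priority, matching all of text."""
    for symbol in sorted(table.values(), key=lambda s: s.priority):
        if callable(symbol.pattern):
            if symbol.pattern(text):
                return symbol.code
        elif re.fullmatch(expand(symbol.pattern, table), text):
            return symbol.code
    return None


KNOWN_STATIONS = {"一号站"}  # hypothetical stand-in for the business-data lookup


def is_station(text: str) -> bool:
    return text in KNOWN_STATIONS


SYMBOLS = {
    "DATE": Symbol("DATE", "date", r"\d{4}年\d{1,2}月\d{1,2}日", True, 2),
    "TIME": Symbol("TIME", "time", r"\d{1,2}时\d{1,2}分", True, 2),
    "DATETIME": Symbol("DATETIME", "date-time", "[DATE][TIME]", False, 1),
    "STATION": Symbol("STATION", "ground station", is_station, True, 3),
}

print(classify("2016年11月13日", SYMBOLS))         # prints DATE
print(classify("2016年11月13日10时30分", SYMBOLS))  # prints DATETIME
KNOWN_STATIONS.add("二号站")                        # a data change, no code regeneration
print(classify("二号站", SYMBOLS))                  # prints STATION
```

Because the symbol table and the station set are ordinary data, adding an entry changes the result of the next `classify` call without regenerating or recompiling any code, which is the property the embodiments attribute to describing the grammar with dynamically mutable data.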
Besides the symbols, the grammar also includes a dictionary that supplements them. The dictionary is an additional lexical description mechanism alongside the symbol table: by labeling actual text, it provides more grounds for recognizing and matching symbols and thus supplements the symbol table. The symbol table is usually intended for language developers and system developers, whereas the dictionary is intended for ordinary users. Fig. 3 is a logical model diagram of a dictionary according to an embodiment of the present invention; as shown in Fig. 3, it includes word, symbol, and parameter.
Table 2 is an example dictionary, showing the relationship among word, symbol code, and parameters:
Table 2
Word | Symbol code name | Parameter |
Tomorrow | DAY | Now, 1 |
The day after tomorrow | DAY | Now, 2 |
In the table above, two new words, "tomorrow" and "the day after tomorrow", are defined for the "DAY" (day reference) symbol, together with parameters. These parameters are passed to sDAY(), the default processing function of the DAY symbol, to compute the specific day each word denotes. For "tomorrow", given the two parameters "Now, 1", the result is the current date plus 1 day; "the day after tomorrow" is computed as the current date plus 2 days. In this way, language users can extend the DAY symbol in a lightweight manner without writing regular expressions.
After the sentence to be parsed is obtained, it needs to be parsed. Parsing includes: decomposing the sentence into basic morphemes; tagging the parts of speech of the decomposed basic morphemes; and parsing the part-of-speech-tagged basic morphemes into a syntax tree according to the grammar of the Chinese DSL.
As the most important data structure of the syntax definition and parsing system, the embodiments of the present invention use a logical model centered on the syntax tree node (rather than the syntax subtree). The tree node data structure can describe many kinds of nodes and hold all kinds of data. Fig. 4 is a logical model diagram of a syntax tree node according to an embodiment of the present invention, and its basic attributes are shown in Fig. 4. In this logical model, the attribute "original text" is the input text that produced the node; "symbol code" is the symbol type of the current node, through which the corresponding symbol object and related processing function can be accessed; "content" is a reference to the node's actual data, which may be a simple value, a class instance, or any data structure that provides computation information for the node; "parent node" points to the node one level above the current node; "child nodes" is a variable-length list pointing to each node one level below the current node; and "display text", usually a subset of the content, provides the text shown on a tree node when the "DSL manager" of the embodiments of the present invention draws the syntax tree, which is generally used to test and verify the DSL text entered by the user and the related parsing algorithms.
Connecting different nodes to one another forms a syntax tree. The unique root node of the tree has no parent node, and every other node has exactly one parent node. The lowest-level nodes are leaf nodes, each representing an element of the sentence, and each node represents an operation.
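One possible in-memory form of the syntax tree node, with one field per attribute of the logical model, is sketched below. The field names are English renderings chosen for illustration; `eq=False` and `repr=False` on `parent` are design choices of this sketch that keep comparison and printing from recursing through the parent link.

```python
from __future__ import annotations

from dataclasses import dataclass, field
from typing import Any, Optional


@dataclass(eq=False)  # identity comparison; avoids recursing through parent links
class SyntaxTreeNode:
    original_text: str                 # input text that produced the node
    symbol_code: str                   # symbol type, e.g. "DATE"
    content: Any = None                # value, class instance, or other data
    display_text: str = ""             # label shown when the tree is drawn
    parent: Optional[SyntaxTreeNode] = field(default=None, repr=False)
    children: list[SyntaxTreeNode] = field(default_factory=list)

    def add_child(self, child: SyntaxTreeNode) -> None:
        """Attach child below this node; every non-root node has exactly one parent."""
        if child.parent is not None:
            raise ValueError("node already has a parent")
        child.parent = self
        self.children.append(child)

    @property
    def is_root(self) -> bool:
        return self.parent is None

    @property
    def is_leaf(self) -> bool:
        return not self.children


root = SyntaxTreeNode("明天发送", "COMMAND")
leaf = SyntaxTreeNode("明天", "DAY", display_text="明天")
root.add_child(leaf)
print(root.is_root, leaf.is_leaf, leaf.parent is root)  # prints True True True
```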
In addition, the part-of-speech tagging algorithm runs in step with word segmentation: each morpheme is tagged as soon as it is separated. Tagging must not only label each word element with its corresponding symbol type but also record its original content and extract key parameter values from it. Part-of-speech tagging is performed by the default processing function of the symbol.
For example, the symbol type of "November 13, 2016" is date (DATE); the text is passed to sDATE, the default processing function of the corresponding symbol, whose simplified pseudocode is as follows:
In the above part-of-speech tagging algorithm, the specific year, month, and day values are extracted, and a date-type variable is built from them. After processing by the symbol's default function, a syntax tree node object is created: its symbol is the one identified during word segmentation; its original text and display text are both "November 13, 2016"; its symbol code is DATE; its content is a date-type object whose value is November 13, 2016; and its parent node and child nodes are empty, indicating that the syntax tree has not yet been constructed.
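The pseudocode for sDATE is not reproduced in this text, so the following is only a hypothetical stand-in for the described behavior: extract the year, month, and day, build a date value, and return a node whose parent and children are empty. It assumes the Chinese source text is written as 2016年11月13日, and it returns a dict rather than a node class to stay self-contained; `s_date` and `DATE_PATTERN` are illustrative names.

```python
import re
from datetime import date

# Assumption: the Chinese source text of the example is "2016年11月13日".
DATE_PATTERN = re.compile(r"(\d{4})年(\d{1,2})月(\d{1,2})日")


def s_date(text: str) -> dict:
    """Hypothetical stand-in for sDATE: tag a DATE morpheme and build its node."""
    match = DATE_PATTERN.fullmatch(text)
    if match is None:
        raise ValueError(f"not a DATE morpheme: {text!r}")
    year, month, day = (int(part) for part in match.groups())
    return {
        "original_text": text,
        "display_text": text,
        "symbol_code": "DATE",
        "content": date(year, month, day),  # key parameter values extracted
        "parent": None,                     # tree not built yet
        "children": [],
    }


print(s_date("2016年11月13日")["content"])  # prints 2016-11-13
```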
Before the sentence to be parsed is decomposed into basic morphemes, a predetermined ambiguity evaluation algorithm is also used to determine whether the sentence is ambiguous; if it is, the ambiguity is avoided using a predetermined avoidance scheme.
Traditionally, recognizing and processing Chinese natural language is a problem within computational linguistics. For the description and parsing of Chinese DSLs, the embodiments of the present invention design and implement the framework structure and functions of the parser, specifically using the following key algorithms: an ambiguity evaluation algorithm and an avoidance scheme.
It should be noted that the grammar of Chinese natural language has relatively few parts of speech (12 in total: noun, verb, adjective, numeral, measure word, pronoun, adverb, preposition, conjunction, auxiliary word, interjection, and onomatopoeia), yet they must cover a very large actual vocabulary (according to the Lexicon of Common Words in Contemporary Chinese (draft), there are 56,008 common Chinese words). Syntax tree parsing therefore produces a large amount of ambiguity.
Ambiguity should be avoided as far as possible in a Chinese domain-specific language. In an embodiment of the present invention, a predetermined ambiguity evaluation algorithm determines whether the sentence to be parsed is ambiguous. Specifically, whether an input sentence is ambiguous can be determined as follows:
(1) During word segmentation, starting from the left end of the target sentence and without limiting the maximum match length, search forward for every symbol pattern that matches; each match forms an initial segmentation scheme. For each initial scheme, scan progressively to the right; each time a new symbol is matched, combine it with every corresponding existing scheme. Continue in this way until all of the text has been matched, yielding x segmentation schemes. If x > 1, the input sentence is lexically ambiguous.
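The lexical check in step (1) can be sketched as an exhaustive enumeration of segmentations. This minimal Python sketch assumes that terminal-symbol patterns are plain literal strings (real symbol patterns may be regular expressions); `all_segmentations` and `is_lexically_ambiguous` are hypothetical names.

```python
from functools import lru_cache
from typing import List, Set, Tuple


def all_segmentations(sentence: str, patterns: Set[str]) -> List[Tuple[str, ...]]:
    """Enumerate every way to cover `sentence` with the given patterns."""

    @lru_cache(maxsize=None)
    def from_pos(i: int) -> Tuple[Tuple[str, ...], ...]:
        if i == len(sentence):
            return ((),)                     # one way to finish: nothing left
        results = []
        # No cap on match length: try every possible end position.
        for j in range(i + 1, len(sentence) + 1):
            piece = sentence[i:j]
            if piece in patterns:
                # Combine this match with every scheme for the remainder.
                for rest in from_pos(j):
                    results.append((piece,) + rest)
        return tuple(results)

    return list(from_pos(0))


def is_lexically_ambiguous(sentence: str, patterns: Set[str]) -> bool:
    return len(all_segmentations(sentence, patterns)) > 1   # x > 1
```

With the patterns {"或者", "或", "者"}, the string "或者" has two segmentations, ("或", "者") and ("或者",), so it is reported as lexically ambiguous.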
(2) Apply the top-down syntax-tree parsing algorithm to each segmentation scheme separately, without restricting the match position: higher-level nodes may be built from symbols matched at any position, until a unique root node is constructed. Across all segmentation schemes, y tree structures are formed in total (identical structures are counted once). If y > 1, the input sentence is syntactically ambiguous.
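The syntactic check in step (2) can be sketched by enumerating every tree for each segmentation scheme and counting the distinct structures. The version below is a minimal Python sketch, assuming a toy context-free grammar given as a dictionary of hypothetical rules; `parse_trees` and `count_structures` are illustrative names. It uses memoized recursive enumeration as a stand-in for the top-down algorithm and, like the text, assumes that no symbol references itself without bound (no cycles of single-child rules).

```python
from functools import lru_cache
from typing import Dict, Iterable, List, Sequence, Tuple

# A tree is either a symbol string (leaf) or a tuple (symbol, child, ...).
Grammar = Dict[str, List[Tuple[str, ...]]]


def parse_trees(tokens: Sequence[str], grammar: Grammar, root: str) -> set:
    """Return every distinct tree that derives `root` over `tokens`."""
    tokens = tuple(tokens)

    @lru_cache(maxsize=None)
    def build(sym: str, i: int, j: int) -> tuple:
        out = []
        if j - i == 1 and tokens[i] == sym:      # a matched leaf symbol
            out.append(sym)
        for rhs in grammar.get(sym, []):
            out.extend((sym,) + kids for kids in split(rhs, i, j))
        return tuple(out)

    def split(rhs: Tuple[str, ...], i: int, j: int):
        # Assign consecutive non-empty spans of [i, j) to the rhs symbols.
        if not rhs:
            if i == j:
                yield ()
            return
        for k in range(i + 1, j - len(rhs) + 2):
            for head in build(rhs[0], i, k):
                for tail in split(rhs[1:], k, j):
                    yield (head,) + tail

    return set(build(root, 0, len(tokens)))


def count_structures(schemes: Iterable[Sequence[str]], grammar: Grammar,
                     root: str) -> int:
    """y in the text: distinct trees over all segmentation schemes."""
    trees = set()
    for tokens in schemes:
        trees |= parse_trees(tokens, grammar, root)   # identical trees merge
    return len(trees)
```

For the hypothetical grammar `{"E": [("E", "OP", "E"), ("NUM",)]}`, the symbol sequence NUM OP NUM OP NUM has two trees (left- and right-grouped), so y = 2 and the sentence is syntactically ambiguous.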
When the sentence is judged to be ambiguous, then in addition to applying the maximum-match principle in the segmentation algorithm and using the top-down syntax-tree parsing algorithm, the present invention also avoids ambiguity as far as possible in the definition of the grammar system itself. To this end, the embodiments of the present invention use a predetermined avoidance scheme to avoid the ambiguity of the sentence to be parsed, and propose the following principles:
(1) For the professional domain that the language targets, subdivide morphemes as finely as possible and define a rich set of symbol types, raising the ratio of "number of symbols / number of words".
(2) If the pattern of symbol A contains the pattern of symbol B, A's priority value should be set smaller than B's, i.e., A is tried first when matching. This is the maximum-match-length principle.
(3) If the pattern of symbol A contains interchangeable alternatives, the longer alternative should be placed first so that the longer pattern is matched first. For example, for the OR symbol expressing disjunction, the pattern should be defined as "或者|或" (both meaning "or") rather than "或|或者"; otherwise "或者" would be split into "或", recognized as an OR symbol, and a separate, stray "者". This principle is also consistent with the maximum-match-length principle.
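Principle (3) follows from how ordered alternation behaves in common regular-expression engines. The Python sketch below uses the standard `re` module, which tries alternatives from left to right and keeps the first one that matches; the helper `longest_first` is a hypothetical convenience that sorts literal alternatives by length before compiling them.

```python
import re


def longest_first(alternatives):
    """Compile literal alternatives so that longer ones are tried first."""
    ordered = sorted(alternatives, key=len, reverse=True)
    return re.compile("|".join(re.escape(a) for a in ordered))


# re tries alternatives left to right and keeps the first that matches,
# so the order of the alternatives decides what is matched.
good_or = re.compile("或者|或")
bad_or = re.compile("或|或者")

text = "甲或者乙"
print(good_or.findall(text))                       # ['或者']
print(bad_or.findall(text))                        # ['或']: '者' is left over
print(longest_first(["或", "或者"]).findall(text))  # ['或者']
```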
To parse the sentence better, the embodiments of the present invention preferably decompose the sentence to be parsed into basic morphemes as follows: the sentence is decomposed into basic morphemes using the longest-match principle, where the longest-match principle means matching as long a portion of the sentence as possible.
The segmentation algorithm decomposes a Chinese sentence into basic morphemes and determines the terminal symbol corresponding to each morpheme. This process is also called lexical analysis. In this process only the patterns of terminal symbols take part in forward search and matching, because only terminal symbols can appear directly in a sentence. The simplified pseudocode of the segmentation algorithm is as follows.
Here, the "longest-match principle" is a basic segmentation criterion used by the present invention. At each search position in the sentence, the above algorithm matches as long a string as possible. The Lexer() function searches forward in the current target text for the patterns of Tsymbols (terminal symbols). After attempting to match every terminal symbol, it takes the symbol with the longest match length as the final symbol. If this principle fails to match all of the text in the sentence, a randomized lexical search is attempted to find the most probable segmentation. If neither approach yields a feasible segmentation, the error "input sentence is wrong" is reported, indicating that the sentence does not conform to the currently defined grammar. Fig. 5 is a diagram of an example segmentation result according to an embodiment of the present invention; the segmentation result is shown in Fig. 5.
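Since the Lexer() pseudocode is not reproduced here, the following is a minimal Python sketch of greedy longest-match segmentation. The terminal-symbol table `TSYMBOLS` and its regex patterns are hypothetical examples, and the randomized fallback search described above is deliberately omitted: the sketch simply reports an error where that fallback would run.

```python
import re
from typing import List, Tuple

# Hypothetical terminal symbols: (symbol code, regex pattern).
TSYMBOLS = [
    ("DATE", r"\d{4}年\d{1,2}月\d{1,2}日"),
    ("NUM", r"\d+"),
    ("OR", r"或者|或"),
    ("SUBMIT", r"提交"),
]


class LexError(ValueError):
    pass


def lexer(text: str, tsymbols=TSYMBOLS) -> List[Tuple[str, str]]:
    """Greedy longest-match segmentation into (symbol, text) pairs."""
    compiled = [(code, re.compile(p)) for code, p in tsymbols]
    pos, tokens = 0, []
    while pos < len(text):
        best = None
        for code, rx in compiled:
            m = rx.match(text, pos)
            # Keep the longest non-empty match; earlier entries win ties.
            if m and m.end() > pos and (best is None or m.end() > best[1]):
                best = (code, m.end())
        if best is None:
            # The text falls back to a randomized search here (not shown).
            raise LexError(f"input sentence is wrong at position {pos}")
        code, end = best
        tokens.append((code, text[pos:end]))
        pos = end
    return tokens
```

At position 0 of "2016年11月13日或3", both NUM ("2016") and DATE match, and DATE wins because its match is longer.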
In view of the characteristics of Chinese domain languages, the embodiments of the present invention also provide a hybrid syntax-tree parsing algorithm comprising two sub-algorithms, top-down and bottom-up. Parsing the part-of-speech-tagged basic morphemes into a syntax tree according to the grammar of the Chinese domain-specific language includes the following: the tagged basic morphemes are parsed into a syntax tree using a top-down syntax-tree parsing algorithm, in which matching proceeds forward, step by step, from a predetermined morpheme position; whenever the pattern being matched references symbols other than the one currently being matched, those referenced symbols are matched in turn. In theory, for a grammar system whose lexicon, syntax and symbol system are complete and well-formed (that is, no symbol references itself, directly or indirectly, without bound), a top-down parsing process can always construct the syntax tree of a correct sentence. Fig. 6 is a flowchart of the forward-matching algorithm of top-down syntax-tree parsing according to an embodiment of the present invention. The "forward match()" function in Fig. 6 searches forward, step by step, from a given position among the current morphemes. This is a recursive process: when the pattern being matched references other symbols, the function calls itself to match the referenced symbols.
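The recursive forward match can be sketched as a backtracking recursive-descent parser. In this minimal Python sketch the grammar `GRAMMAR`, its symbol names and the helper names are hypothetical, tokens are the symbol codes produced by tagging, and (as the text requires) the grammar must not let a symbol reference itself without bound, since left-recursive rules would recurse forever.

```python
from typing import Dict, List

# Hypothetical grammar: each non-terminal maps to alternative symbol sequences.
GRAMMAR: Dict[str, List[List[str]]] = {
    "CMD": [["SUBMIT", "DOC", "TO", "PERSON"]],
    "DOC": [["TITLE"], ["PERSON", "TITLE"]],
}


def forward_match(symbol: str, tokens: List[str], pos: int):
    """Yield (tree, next_pos) for every way `symbol` matches at `pos`."""
    if symbol not in GRAMMAR:                    # terminal symbol
        if pos < len(tokens) and tokens[pos] == symbol:
            yield symbol, pos + 1
        return
    for alternative in GRAMMAR[symbol]:
        yield from _match_seq(symbol, alternative, tokens, pos, [])


def _match_seq(parent, seq, tokens, pos, kids):
    if not seq:
        yield (parent, *kids), pos
        return
    # The pattern references another symbol: call forward_match recursively.
    for child, nxt in forward_match(seq[0], tokens, pos):
        yield from _match_seq(parent, seq[1:], tokens, nxt, kids + [child])


def parse_top_down(tokens: List[str], root: str = "CMD"):
    for tree, end in forward_match(root, tokens, 0):
        if end == len(tokens):                   # whole sentence consumed
            return tree
    raise ValueError("input sentence is wrong")
```

For the tokens SUBMIT PERSON TITLE TO PERSON, the first DOC alternative fails at PERSON, so the parser backtracks to the second alternative and returns `("CMD", "SUBMIT", ("DOC", "PERSON", "TITLE"), "TO", "PERSON")`.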
Although the top-down algorithm is effective and complete, it does not always achieve optimal performance. When the symbol system contains a large number of symbols whose patterns reference one another, the search complexity tends toward O(n^d), where n is the number of symbols and d is the maximum reference depth. Therefore, for large domain-specific languages with many symbols, a bottom-up parsing algorithm may be used instead to obtain better performance.
The tagged basic morphemes can also be parsed into a syntax tree using a bottom-up syntax-tree parsing algorithm, which builds parent nodes for the basic morphemes produced by decomposing the sentence, then builds the parents of those parent nodes in the same way, until a unique root node is produced. Alternatively, the top-down and bottom-up syntax-tree parsing algorithms are combined to parse the tagged basic morphemes into a syntax tree.
For a given domain-specific language, the bottom-up syntax-tree parsing algorithm starts from the morphemes produced by segmentation and attempts to build their parent nodes. By applying assertion rules, this algorithm greatly reduces the search space. According to symbol priority (or frequency statistics of historical data), the most common combinations of terminal symbols are split out first, and their parent nodes are built.
As an optional embodiment, bottom-up parsing need not be restricted to the current position: wherever in the target sentence the pattern currently being tried matches, that match can be split out immediately and a higher-level node built. Morphemes of the target sentence that have not been matched continue to be tried until all leaf nodes have been matched, forming a complete second level of nodes (one level above the leaves). The same bottom-up matching is then applied to the second-level nodes, and so on, until a unique root node is formed. If no unique root node can be formed in the end, the input sentence is reported as erroneous.
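A simplified bottom-up reduction can be sketched as follows. This minimal Python sketch assumes a hypothetical priority-ordered rule list `RULES`; rather than completing each level before moving up, it repeatedly reduces the highest-priority pattern wherever it occurs in the sentence. Being greedy and non-backtracking, it can fail on grammars where a full search would succeed.

```python
from typing import List, Tuple

# Hypothetical reduction rules in priority order: (parent, child symbols).
RULES: List[Tuple[str, Tuple[str, ...]]] = [
    ("DOC", ("PERSON", "TITLE")),
    ("TARGET", ("TO", "PERSON")),
    ("CMD", ("SUBMIT", "DOC", "TARGET")),
]


def symbol_of(node):
    return node if isinstance(node, str) else node[0]


def parse_bottom_up(tokens: List[str], root: str = "CMD"):
    nodes = list(tokens)                         # start from the leaf morphemes
    while not (len(nodes) == 1 and symbol_of(nodes[0]) == root):
        for parent, rhs in RULES:                # highest priority first
            n = len(rhs)
            hit = next((i for i in range(len(nodes) - n + 1)
                        if tuple(map(symbol_of, nodes[i:i + n])) == rhs), None)
            if hit is not None:                  # match at any position
                nodes[hit:hit + n] = [(parent, *nodes[hit:hit + n])]
                break
        else:
            raise ValueError("input sentence is wrong")  # no unique root
    return nodes[0]
```

For SUBMIT PERSON TITLE TO PERSON, the sketch first builds DOC, then TARGET, and finally the CMD root.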
Combining the above top-down and bottom-up algorithms yields the hybrid parsing algorithm, which proceeds as follows:
(1) In the input sentence, bottom-up search is first performed for the patterns of the symbols that have high priority and high frequency of occurrence. Each time a match succeeds, a local syntax tree is built immediately, forming a subtree.
(2) Whenever a new subtree has been built, its root node is fed back into bottom-up parsing together with the rest of the sentence, until all high-frequency patterns in the sentence have been processed.
(3) The remainder of the sentence, together with the root nodes of the subtrees already built, is parsed top-down until the syntax tree of the whole sentence is constructed.
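The three steps can be sketched together in Python. In this minimal sketch, the rule set `HIGH_FREQ` (reduced bottom-up first) and the grammar `GRAMMAR` (used top-down for the rest) are hypothetical, and a subtree that has already been built is matched as if it were a terminal of its root symbol.

```python
from typing import Dict, List, Tuple

# Hypothetical high-frequency rules, reduced bottom-up first (steps 1-2).
HIGH_FREQ: List[Tuple[str, Tuple[str, ...]]] = [("DOC", ("PERSON", "TITLE"))]
# Hypothetical remaining grammar, parsed top-down afterwards (step 3).
GRAMMAR: Dict[str, List[List[str]]] = {
    "CMD": [["SUBMIT", "DOC", "TARGET"]],
    "TARGET": [["TO", "PERSON"]],
    "DOC": [["TITLE"]],
}


def sym(node):
    return node if isinstance(node, str) else node[0]


def reduce_high_freq(nodes):
    """Steps (1)-(2): reduce high-frequency patterns until none remain."""
    nodes, changed = list(nodes), True
    while changed:
        changed = False
        for parent, rhs in HIGH_FREQ:
            n = len(rhs)
            for i in range(len(nodes) - n + 1):
                if tuple(sym(x) for x in nodes[i:i + n]) == rhs:
                    nodes[i:i + n] = [(parent, *nodes[i:i + n])]
                    changed = True
                    break
    return nodes


def match(symbol, nodes, pos):
    """Step (3): top-down matching; built subtrees act like terminals."""
    if pos < len(nodes) and sym(nodes[pos]) == symbol:
        yield nodes[pos], pos + 1
    for alt in GRAMMAR.get(symbol, []):
        yield from _seq(symbol, alt, nodes, pos, [])


def _seq(parent, seq, nodes, pos, kids):
    if not seq:
        yield (parent, *kids), pos
        return
    for child, nxt in match(seq[0], nodes, pos):
        yield from _seq(parent, seq[1:], nodes, nxt, kids + [child])


def parse_hybrid(tokens, root="CMD"):
    nodes = reduce_high_freq(tokens)
    for tree, end in match(root, nodes, 0):
        if end == len(nodes):
            return tree
    raise ValueError("input sentence is wrong")
```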
The hybrid syntax-tree parsing algorithm combines the strengths of the bottom-up and top-down algorithms, ensuring parsing efficiency while preserving completeness. The performance gain of the hybrid algorithm depends on the syntactic properties of the language itself: the fewer the kinds of commonly used high-frequency symbols and the larger the share of the words they account for, the easier it is to obtain a performance gain.
Before the sentence to be parsed is decomposed into basic morphemes, a predetermined ellipsis inference algorithm may also be used to infer the omitted content of the sentence and restore it to a sentence with complete information. The predetermined ellipsis inference algorithm includes at least one of the following: a context inference algorithm that supplements an ellipsis from preceding basic morphemes; a time inference algorithm that computes the time referred to by basic morphemes expressing time; and a business object inference algorithm that locates basic morphemes whose information is not fully specified.
Chinese domain-specific languages also allow ellipsis. The embodiments of the present invention use simple inference algorithms to process elliptical sentences and restore them to sentences with complete information. Ellipsis inference is divided into three kinds: context inference, time inference and business object inference.
Context inference: an ellipsis is supplemented from previously mentioned morphemes. For example, in the sentences "Perform task 1 in March 2017. Perform task 2 in April.", "April" is elliptical: it lacks the year and therefore cannot be used in business computation. Searching backward from the ellipsis for a sentence containing a time yields 2017, so "April" is supplemented to "April 2017".
Time inference: morphemes referring to times, such as "tomorrow", "next month", "the 1st" and "March", are resolved by computing from the current time. For example, "tomorrow" means the current date plus one day, and "the 1st", in the absence of contextual information, generally refers to the 1st that is nearest to the current date; other expressions are handled similarly.
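Time inference can be sketched with Python's standard `datetime` module. The text does not specify how "nearest" is measured, so this minimal sketch assumes that the candidates for "the Nth" are day N of the previous, current and next month, that the smallest absolute distance in days wins, and that ties go to the earlier date. All function names are hypothetical.

```python
from datetime import date, timedelta


def _shift_month(year: int, month: int, k: int):
    """Return (year, month) shifted by k months."""
    idx = year * 12 + (month - 1) + k
    return idx // 12, idx % 12 + 1


def resolve_tomorrow(today: date) -> date:
    return today + timedelta(days=1)             # current date + 1 day


def resolve_next_month(today: date):
    return _shift_month(today.year, today.month, 1)


def resolve_day_of_month(day: int, today: date) -> date:
    """'The Nth' without context: the candidate date closest to today."""
    candidates = []
    for k in (-1, 0, 1):
        y, m = _shift_month(today.year, today.month, k)
        try:
            candidates.append(date(y, m, day))
        except ValueError:                       # e.g. 31 February
            continue
    return min(candidates, key=lambda d: (abs((d - today).days), d))
```

On 20 March 2017, "the 1st" resolves to 1 April 2017 (12 days away) rather than 1 March (19 days away).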
Business object inference: morphemes whose information is not fully specified are located by searching the business data of the application platform. The inference method traverses the data tables and their records to find the path of the target morpheme within the business data. For example, in an academic affairs management system, for the sentence "Submit 'Fitness Industry Survey' to Zhang San", retrieving each business data table shows that "Fitness Industry Survey" has a record in the paper table as the title of student Li Si's paper, and that Zhang San is a teacher in the teacher information table. The sentence can therefore be supplemented to "Submit Li Si's paper 'Fitness Industry Survey' to teacher Zhang San". In this way, the necessary modifiers are added to morphemes that lack information, and the ellipsis is inferred.
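Business object inference can be sketched as a full traversal of in-memory tables. In this minimal Python sketch the table layout `TABLES`, the field names, and the rule for phrasing the supplemented modifier are hypothetical stand-ins for a real application platform's business data.

```python
from typing import Dict, List, Tuple

# Hypothetical business tables: table name -> list of records.
TABLES: Dict[str, List[Dict[str, str]]] = {
    "paper": [{"title": "Fitness Industry Survey", "author": "Li Si"}],
    "teacher": [{"name": "Zhang San"}],
    "student": [{"name": "Li Si"}],
}


def locate(morpheme: str, tables=TABLES) -> List[Tuple[str, str, Dict[str, str]]]:
    """Traverse every table and record; return (table, field, record) hits."""
    hits = []
    for table, records in tables.items():
        for record in records:
            for fld, value in record.items():
                if value == morpheme:
                    hits.append((table, fld, record))
    return hits


def qualify(morpheme: str) -> str:
    """Add the modifier implied by the morpheme's unique location."""
    hits = locate(morpheme)
    if len(hits) != 1:
        return morpheme                  # unknown or ambiguous: leave as is
    table, _, rec = hits[0]
    if table == "paper":
        return f"{rec['author']}'s paper '{morpheme}'"
    if table == "teacher":
        return f"teacher {morpheme}"
    return morpheme
```

With these tables, `qualify("Fitness Industry Survey")` returns "Li Si's paper 'Fitness Industry Survey'" and `qualify("Zhang San")` returns "teacher Zhang San".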
After the part-of-speech-tagged basic morphemes have been parsed into a syntax tree according to the grammar of the Chinese domain-specific language, the method further includes the following: each leaf node on the syntax tree passes its content to its parent node; the parent node processes (computes on) the contents passed by all of its child nodes to obtain its own content; and these passing and processing operations are performed in turn up to the root node, whose content is taken as the result value of the syntax tree. The result value is used to invoke an application programming interface (API) that carries out the user's intent.
This process is the execution of the syntax tree, as follows:
1) The content of each leaf node (computed during part-of-speech tagging) is passed to its parent node as a parameter.
2) The parent node receives the contents passed by all of its child nodes as parameters of its default function, executes that function, and updates its own content with the function's return value.
3) The same is repeated up the tree until the root node has been computed; the content of the root node is the result value of the syntax tree.
In general, the computation at each node interacts with the application's business data, and every computation may affect that data, so an appropriate execution order should be chosen. Depending on the nature of the parent node, either the left-first principle (computing nodes from left to right) or the right-first principle can be applied. Because modifiers in modern Chinese (attributives and adverbials) generally precede the words they modify, the left-first principle suits most Chinese domain-specific languages and is therefore the preferred option of the algorithm of the invention.
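The execution steps and the left-first/right-first choice can be sketched as a post-order traversal. In this minimal Python sketch the `Node` class and its `func` field (standing in for each symbol's default function) are hypothetical. For simplicity a single flag selects the order for the whole tree, whereas the text chooses it per parent node. Arguments are always passed in their original left-to-right positions; only the order of computation changes.

```python
import operator
from dataclasses import dataclass, field
from typing import Any, Callable, List, Optional


@dataclass
class Node:
    content: Any = None                          # leaf value from tagging
    func: Optional[Callable[..., Any]] = None    # parent's default function
    children: List["Node"] = field(default_factory=list)


def evaluate(node: Node, left_first: bool = True,
             log: Optional[list] = None) -> Any:
    """Post-order evaluation: children pass their content up as arguments."""
    if node.children:
        args: List[Any] = [None] * len(node.children)
        order = range(len(node.children))
        if not left_first:                       # right-first principle
            order = reversed(order)
        for i in order:
            args[i] = evaluate(node.children[i], left_first, log)
        node.content = node.func(*args)          # update the parent's content
    if log is not None:
        log.append(node.content)                 # records computation order
    return node.content


def sample_tree() -> Node:
    # (2 + 3) * 4 as a toy syntax tree.
    return Node(func=operator.mul, children=[
        Node(func=operator.add, children=[Node(2), Node(3)]),
        Node(4),
    ])
```

Both orders return 20 for the sample tree. The logged computation order is 2, 3, 5, 4, 20 under left-first and 4, 3, 2, 5, 20 under right-first, which matters when each computation has side effects on business data.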
For ease of display, the syntax tree is represented in the data table as text in prefix-expression form. Take a concrete test statement such as "3 minutes after TIN at Xiamen station, orbit 27: experiment arrangement 3-008" (the orbit number is the count of revolutions a spacecraft has made around the Earth; TIN is an aerospace term denoting the moment at which a ground station begins tracking a spacecraft). The software parses the sentence and generates a syntax tree, and the "Chinese Domain-Specific Language Manager" software then draws the resulting syntax tree of the Chinese statement.
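A prefix-expression rendering of a syntax tree can be sketched in a few lines. The exact text format used in the data table is not given, so the `SYMBOL(child, child)` layout and the function name `to_prefix` below are assumptions. The sketch takes nested tuples of the form `(symbol, child, ...)`, with symbol strings as leaves.

```python
def to_prefix(tree) -> str:
    """Render nested (symbol, child, ...) tuples as a prefix expression."""
    if isinstance(tree, str):
        return tree                              # a leaf symbol
    head, *kids = tree
    return f"{head}(" + ", ".join(to_prefix(k) for k in kids) + ")"


tree = ("CMD", "SUBMIT", ("DOC", "PERSON", "TITLE"), ("TARGET", "TO", "PERSON"))
print(to_prefix(tree))  # CMD(SUBMIT, DOC(PERSON, TITLE), TARGET(TO, PERSON))
```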
The embodiments of the present invention combine programming-language compiler (interpreter) technology with natural language processing technology to build a hybrid technical framework for describing and parsing general-purpose Chinese domain-specific languages. The framework allows grammars to be defined flexibly by means such as regular expressions and built-in functions, and can automatically recognize, parse and execute target sentences according to the grammar. Working with the built-in functions of a business application system, it carries out customized operations on business data and allows the business to be extended flexibly on demand. Specifically, Fig. 7 is a framework diagram of the definition and parsing of a Chinese domain-specific language according to an embodiment of the present invention. As shown in Fig. 7, it comprises the technical framework for defining and parsing the Chinese domain-specific language and the application business system. The framework obtains the target sentence, parses it on the basis of the grammar description of the Chinese domain language, and then executes the statement. Through statement execution, the framework is connected to the application business system, where business processing is carried out using business data and business logic.
Fig. 8 is a flowchart of the parsing process of a Chinese domain-specific language according to an embodiment of the present invention. Specifically, Chinese domain-language parsing follows the process shown in Fig. 8: the target sentence passes through preprocessing, word segmentation, part-of-speech tagging and syntax-tree parsing, and a syntax tree is generated. The segmentation step also uses the segmentation training results in the grammar system, and syntax-tree parsing uses the syntactic-parsing training results. In addition, word segmentation and part-of-speech tagging also require a dictionary and a symbol table.
The above embodiments of the present invention fully support the definition and parsing of domain-specific languages in Chinese (and in any other language). With a sound grammar design, they can even automatically recognize and process natural-language text that follows certain regularities, such as financial news and sports news. The hybrid method of describing morphology and syntax based on regular expressions and discriminant functions is more flexible than Extended BNF. A domain-specific language designed with the present invention can have its grammar changed and extended dynamically, with the changes taking effect without generating, compiling or releasing code for a language interpreter.
According to another aspect of the embodiments of the present invention, a sentence parsing device is further provided. Fig. 9 is a schematic diagram of the sentence parsing device according to an embodiment of the present invention; as shown in Fig. 9, the device comprises an acquisition module 91 and a parsing module 93. The device is described below.
Acquisition module 91, configured to obtain a sentence to be parsed.
Parsing module 93, configured to parse the sentence to be parsed according to the grammar of a Chinese domain-specific language, wherein both the sentence to be parsed and the Chinese domain-specific language are described in Chinese.
Optionally, the parsing module comprises: a word segmentation unit, configured to decompose the sentence to be parsed into basic morphemes; a tagging unit, configured to tag the parts of speech of the decomposed basic morphemes; and a parsing unit, configured to parse the part-of-speech-tagged basic morphemes into a syntax tree according to the grammar of the Chinese domain-specific language.
Optionally, the parsing module further comprises: a judging unit, configured to determine, using a predetermined ambiguity evaluation algorithm, whether the sentence to be parsed is ambiguous; and an avoidance unit, configured to avoid, when the sentence is judged to be ambiguous, the ambiguity of the sentence using a predetermined avoidance scheme.
Optionally, the word segmentation unit comprises: a decomposition subunit, configured to decompose the sentence to be parsed into basic morphemes using the longest-match principle, wherein the longest-match principle is to match as long a portion of the sentence as possible.
Optionally, the parsing unit comprises one of the following: a first parsing subunit, configured to parse the part-of-speech-tagged basic morphemes into a syntax tree using a top-down syntax-tree parsing algorithm, wherein the top-down algorithm searches and matches forward in sequence from a predetermined morpheme position and, when a matched morpheme references symbols other than the symbol referenced by the morpheme, matches those other symbols; a second parsing subunit, configured to parse the part-of-speech-tagged basic morphemes into a syntax tree using a bottom-up syntax-tree parsing algorithm, wherein the bottom-up algorithm builds parent nodes for the basic morphemes produced by decomposing the sentence to be parsed and then builds the parents of those parent nodes in the same way until a unique root node is produced; and a third parsing subunit, configured to parse the part-of-speech-tagged basic morphemes into a syntax tree by combining the top-down and bottom-up syntax-tree parsing algorithms.
Optionally, the parsing module further comprises: an inference unit, configured to infer the omitted content of the sentence to be parsed using a predetermined ellipsis inference algorithm and restore it to a sentence with complete information, wherein the predetermined ellipsis inference algorithm includes at least one of the following: a context inference algorithm that supplements an ellipsis from preceding basic morphemes; a time inference algorithm that computes the time referred to by basic morphemes expressing time; and a business object inference algorithm that locates basic morphemes whose information is not fully specified.
Optionally, the parsing module further comprises: a transfer unit, configured so that a leaf node on the syntax tree passes its content to its parent node; a processing unit, configured so that the parent node processes the contents passed by all of the leaf nodes it includes, to obtain the content of the parent node; and an execution module, configured to perform the above passing and processing operations in turn up to the root node, taking the content of the root node as the result value of the syntax tree.
According to another aspect of the embodiments of the present invention, a storage medium is further provided, characterized in that the storage medium comprises a stored program, wherein, when the program runs, the device on which the storage medium resides is controlled to perform the following operations: obtaining a sentence to be parsed; and parsing the sentence to be parsed according to the grammar of a Chinese domain-specific language, wherein both the sentence to be parsed and the Chinese domain-specific language are described in Chinese.
According to another aspect of the embodiments of the present invention, a processor is further provided, characterized in that the processor is configured to run a program, wherein, when the program runs, the following operations are performed: obtaining a sentence to be parsed; and parsing the sentence to be parsed according to the grammar of a Chinese domain-specific language, wherein both the sentence to be parsed and the Chinese domain-specific language are described in Chinese.
The serial numbers of the above embodiments of the present invention are for description only and do not indicate the relative merits of the embodiments.
In the above embodiments of the present invention, each embodiment is described with its own emphasis; for any part not described in detail in one embodiment, reference may be made to the related descriptions of other embodiments.
In the several embodiments provided in this application, it should be understood that the disclosed technical content may be implemented in other ways. The device embodiments described above are merely illustrative. For example, the division into units may be merely a division of logical functions, and other divisions are possible in an actual implementation: multiple units or components may be combined or integrated into another system, or some features may be omitted or not performed. In addition, the mutual coupling, direct coupling or communication connections shown or discussed may be indirect coupling or communication connections through some interfaces, units or modules, and may be electrical or take other forms.
Units described as separate components may or may not be physically separate, and components shown as units may or may not be physical units; that is, they may be located in one place or distributed across multiple units. Some or all of the units may be selected according to actual needs to achieve the purpose of the solution of this embodiment.
In addition, the functional units in the embodiments of the present invention may be integrated into one processing unit, or each unit may exist physically on its own, or two or more units may be integrated into one unit. The integrated unit may be implemented in the form of hardware or in the form of a software functional unit.
If the integrated unit is implemented in the form of a software functional unit and sold or used as an independent product, it may be stored in a computer-readable storage medium. Based on this understanding, the technical solution of the present invention in essence, or the part that contributes to the prior art, or all or part of the technical solution, may be embodied in the form of a software product. The computer software product is stored in a storage medium and includes several instructions that cause a computer device (which may be a personal computer, a server, a network device, or the like) to perform all or some of the steps of the methods described in the embodiments of the present invention. The aforementioned storage media include various media that can store program code, such as a USB flash drive, read-only memory (ROM), random access memory (RAM), a removable hard disk, a magnetic disk or an optical disc.
The above is only a preferred implementation of the present invention. It should be noted that a person of ordinary skill in the art may make several improvements and modifications without departing from the principles of the present invention, and such improvements and modifications shall also be regarded as falling within the scope of protection of the present invention.
Claims (17)
1. A sentence parsing method, characterized by comprising:
obtaining a sentence to be parsed;
parsing the sentence to be parsed according to the grammar of a Chinese domain-specific language, wherein both the sentence to be parsed and the Chinese domain-specific language are described in Chinese.
2. The method according to claim 1, characterized in that
the grammar is described using dynamically modifiable data; the grammar comprises symbols for describing the types of morphemes of the Chinese domain-specific language, and further comprises a dictionary that supplements the symbols.
3. The method according to claim 1, characterized in that parsing the sentence to be parsed according to the grammar of the Chinese domain-specific language comprises:
decomposing the sentence to be parsed into basic morphemes;
tagging the parts of speech of the decomposed basic morphemes;
parsing the basic morphemes tagged with parts of speech into a syntax tree according to the grammar of the Chinese domain-specific language.
4. The method according to claim 3, characterized in that, before the sentence to be parsed is decomposed into the basic morphemes, the method further comprises:
determining, using a predetermined ambiguity evaluation algorithm, whether the sentence to be parsed is ambiguous;
when the judgment result is yes, avoiding the ambiguity of the sentence to be parsed using a predetermined avoidance scheme.
5. The method according to claim 3, characterized in that decomposing the sentence to be parsed into the basic morphemes comprises:
decomposing the sentence to be parsed into the basic morphemes using a longest-match principle, wherein the longest-match principle is to match as long a portion of the sentence as possible.
6. The method according to claim 3, characterized in that parsing the basic morphemes tagged with parts of speech into the syntax tree according to the grammar of the Chinese domain-specific language comprises one of the following:
parsing the basic morphemes tagged with parts of speech into the syntax tree using a top-down syntax-tree parsing algorithm, wherein the top-down syntax-tree parsing algorithm is: searching and matching forward in sequence from a predetermined morpheme position and, when a matched morpheme references symbols other than the symbol referenced by the morpheme, matching the other symbols;
parsing the basic morphemes tagged with parts of speech into the syntax tree using a bottom-up syntax-tree parsing algorithm, wherein the bottom-up syntax-tree parsing algorithm is: building parent nodes for the basic morphemes produced by decomposing the sentence to be parsed, and then building the parent nodes of those parent nodes in the same way until a unique root node is produced;
parsing the basic morphemes tagged with parts of speech into the syntax tree by combining the top-down syntax-tree parsing algorithm and the bottom-up syntax-tree parsing algorithm.
7. The method according to claim 3, characterized in that, before the sentence to be parsed is decomposed into the basic morphemes, the method further comprises:
inferring the omitted content of the sentence to be parsed using a predetermined ellipsis inference algorithm and restoring the sentence to be parsed to a sentence with complete information, wherein the predetermined ellipsis inference algorithm comprises at least one of the following: a context inference algorithm that supplements an ellipsis from preceding basic morphemes; a time inference algorithm that computes the time referred to by basic morphemes expressing time; and a business object inference algorithm that locates basic morphemes whose information is not fully specified.
8. The method according to any one of claims 3 to 7, characterized in that, after the basic morphemes tagged with parts of speech are parsed into the syntax tree according to the grammar of the Chinese domain-specific language, the method further comprises:
a leaf node on the syntax tree passing the content of the leaf node to the parent node of the leaf node;
the parent node processing the contents passed by all of the leaf nodes it includes, to obtain the content of the parent node;
performing the above passing and processing operations in turn up to the root node, and taking the content of the root node as the result value of the syntax tree, wherein the result value is used to invoke an application programming interface.
9. A sentence parsing device, characterized by comprising:
an acquisition module, configured to obtain a sentence to be parsed;
a parsing module, configured to parse the sentence to be parsed according to the grammar of a Chinese domain-specific language, wherein both the sentence to be parsed and the Chinese domain-specific language are described in Chinese.
10. The device according to claim 9, characterized in that the parsing module comprises:
A word segmentation unit, configured to decompose the sentence to be resolved into basic morphemes;
A labeling unit, configured to label the decomposed basic morphemes with parts of speech;
A parsing unit, configured to parse the basic morphemes labeled with parts of speech into a syntax tree according to the grammar of the Chinese domain-specific language.
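Because a domain-specific language has a closed vocabulary, part-of-speech labeling can be as simple as a lexicon lookup. The following Python sketch assumes such a lexicon; the tag names, the entries in `LEXICON`, and the `tag` function are hypothetical, and the patent does not state which tagging method the labeling unit uses.

```python
from typing import Dict, List, Tuple

# Hypothetical domain lexicon: morpheme -> part-of-speech tag
LEXICON: Dict[str, str] = {
    "查询": "VERB",
    "统计": "VERB",
    "昨天": "TIME",
    "订单": "NOUN",
    "总额": "NOUN",
    "的": "PART",
}


def tag(morphemes: List[str], lexicon: Dict[str, str] = LEXICON) -> List[Tuple[str, str]]:
    """Label each basic morpheme with its part of speech; morphemes missing
    from the lexicon are tagged 'UNK' so a later stage can reject them."""
    return [(m, lexicon.get(m, "UNK")) for m in morphemes]


print(tag(["统计", "昨天", "的", "订单", "数量"]))
# [('统计', 'VERB'), ('昨天', 'TIME'), ('的', 'PART'), ('订单', 'NOUN'), ('数量', 'UNK')]
```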
11. The device according to claim 10, characterized in that the parsing module further comprises:
A judging unit, configured to judge, using a predetermined ambiguity evaluation algorithm, whether the sentence to be resolved contains ambiguity;
An avoidance unit, configured to, when the judgment result is yes, avoid the ambiguity present in the sentence to be resolved using a predetermined avoidance method.
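The claim leaves the predetermined ambiguity evaluation algorithm and the avoidance method unspecified. One possible stand-in, shown here as a hedged sketch, treats a sentence as ambiguous when the vocabulary admits more than one segmentation, and avoids the ambiguity by preferring the segmentation with the fewest morphemes. The vocabulary, the `max_len` limit, and all function names are assumptions, not the patent's method.

```python
from functools import lru_cache
from typing import List, Set, Tuple


def all_segmentations(sentence: str, vocab: Set[str], max_len: int = 4) -> List[Tuple[str, ...]]:
    """Enumerate every way of splitting `sentence` into vocabulary words."""

    @lru_cache(maxsize=None)
    def split_from(i: int) -> Tuple[Tuple[str, ...], ...]:
        if i == len(sentence):
            return ((),)  # exactly one way to split the empty suffix
        results = []
        for j in range(i + 1, min(len(sentence), i + max_len) + 1):
            word = sentence[i:j]
            if word in vocab:
                results.extend((word,) + rest for rest in split_from(j))
        return tuple(results)

    return list(split_from(0))


def is_ambiguous(sentence: str, vocab: Set[str]) -> bool:
    """Hypothetical ambiguity evaluation: more than one valid segmentation."""
    return len(all_segmentations(sentence, vocab)) > 1


def avoid_ambiguity(sentence: str, vocab: Set[str]) -> Tuple[str, ...]:
    """Hypothetical avoidance method: keep the segmentation with the
    fewest morphemes (ties go to the first one found)."""
    candidates = all_segmentations(sentence, vocab)
    if not candidates:
        raise ValueError(f"cannot segment {sentence!r} with the given vocabulary")
    return min(candidates, key=len)


vocab = {"查询", "订单", "订", "单"}
print(is_ambiguous("查询订单", vocab))     # True: 查询/订单 and 查询/订/单
print(avoid_ambiguity("查询订单", vocab))  # ('查询', '订单')
```

Enumerating all segmentations is exponential in the worst case; memoizing `split_from` keeps it practical for the short commands a domain-specific language typically accepts.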
12. The device according to claim 10, characterized in that the word segmentation unit comprises:
A decomposition subunit, configured to decompose the sentence to be resolved into the basic morphemes using the longest-match principle, wherein the longest-match principle means matching as long a string as possible.
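The longest-match principle corresponds to the well-known forward maximum matching technique for Chinese word segmentation. Below is a minimal sketch, assuming a hypothetical vocabulary `VOCAB` and a maximum word length of four characters; falling back to a single character when nothing matches is a choice the patent does not specify.

```python
from typing import List, Set

# Hypothetical domain vocabulary (an assumption)
VOCAB: Set[str] = {"查询", "昨天", "订单", "订", "单", "总额"}


def forward_max_match(sentence: str, vocab: Set[str], max_len: int = 4) -> List[str]:
    """Split `sentence` into morphemes, always taking the longest vocabulary
    word that starts at the current position. A character that begins no
    vocabulary word becomes a single-character morpheme."""
    morphemes = []
    i = 0
    while i < len(sentence):
        # Try the longest candidate first, shrinking down to one character.
        for j in range(min(len(sentence), i + max_len), i, -1):
            if sentence[i:j] in vocab or j == i + 1:
                morphemes.append(sentence[i:j])
                i = j
                break
    return morphemes


print(forward_max_match("查询昨天订单总额", VOCAB))  # ['查询', '昨天', '订单', '总额']
```

Note that "订单" wins over "订" followed by "单" because the longer candidate is tried first.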
13. The device according to claim 10, characterized in that the parsing unit comprises one of the following:
A first parsing subunit, configured to parse the basic morphemes labeled with parts of speech into the syntax tree using a top-down syntax tree parsing algorithm, wherein the top-down syntax tree parsing algorithm is: starting from a predetermined morpheme position, searching and matching forward in sequence, and, when a matched morpheme refers to symbols other than the symbol it cites, matching those other symbols as well;
A second parsing subunit, configured to parse the basic morphemes labeled with parts of speech into the syntax tree using a bottom-up syntax tree parsing algorithm, wherein the bottom-up syntax tree parsing algorithm is: building parent nodes for the basic morphemes produced by decomposing the sentence to be resolved, and then building parent nodes of those parent nodes in the same way, until a unique root node is produced;
A third parsing subunit, configured to parse the basic morphemes labeled with parts of speech into the syntax tree using a combination of the top-down and bottom-up syntax tree parsing algorithms.
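The top-down alternative can be illustrated with a recursive-descent parser, in which each grammar symbol is a method and a rule that refers to another symbol descends into that symbol's method. The two-rule grammar, the tag names, and the `TopDownParser` class are hypothetical; this sketch does not cover the bottom-up or combined alternatives.

```python
from typing import Any, List, Optional, Tuple

Token = Tuple[str, str]  # (morpheme, part-of-speech tag)


class ParseError(Exception):
    pass


class TopDownParser:
    """Recursive-descent parser for a tiny hypothetical grammar:
        Command -> VERB Target
        Target  -> [TIME] NOUN {NOUN}
    """

    def __init__(self, tokens: List[Token]):
        self.tokens = tokens
        self.pos = 0

    def peek_tag(self) -> Optional[str]:
        return self.tokens[self.pos][1] if self.pos < len(self.tokens) else None

    def expect(self, tag: str) -> Token:
        if self.peek_tag() != tag:
            raise ParseError(f"expected {tag} at position {self.pos}")
        token = self.tokens[self.pos]
        self.pos += 1
        return token

    def parse(self) -> Tuple[Any, ...]:
        tree = self.command()
        if self.pos != len(self.tokens):
            raise ParseError(f"unexpected token at position {self.pos}")
        return tree

    def command(self) -> Tuple[Any, ...]:
        verb = self.expect("VERB")
        return ("Command", verb, self.target())  # descend into Target

    def target(self) -> Tuple[Any, ...]:
        children = []
        if self.peek_tag() == "TIME":
            children.append(self.expect("TIME"))
        children.append(self.expect("NOUN"))
        while self.peek_tag() == "NOUN":
            children.append(self.expect("NOUN"))
        return ("Target", *children)


tokens = [("查询", "VERB"), ("昨天", "TIME"), ("订单", "NOUN"), ("总额", "NOUN")]
print(TopDownParser(tokens).parse())
```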
14. The device according to claim 10, characterized in that the parsing module further comprises:
An inference unit, configured to perform inference on the sentence to be resolved using a predetermined ellipsis-inference algorithm, so as to restore the sentence to be resolved to a sentence with complete information, wherein the predetermined ellipsis-inference algorithm includes at least one of the following: a context-inference algorithm that supplements omitted content according to the basic morphemes in the preceding context; a time-inference algorithm that calculates a time according to basic morphemes that refer to a time; and a business-object inference algorithm that locates the business object for basic morphemes whose information is not fully specified.
15. The device according to any one of claims 10 to 14, characterized in that the parsing module further comprises:
A transfer unit, configured so that each leaf node on the syntax tree passes its content to the parent node of that leaf node;
A processing unit, configured so that the parent node processes the contents passed by all of the leaf nodes it contains, to obtain the content of the parent node;
An execution module, configured to perform the above passing and processing operations successively until the root node is reached, and to take the content of the root node as the result value of the syntax tree, wherein the result value is used to invoke an application programming interface.
16. A storage medium, characterized in that the storage medium comprises a stored program, wherein, when the program runs, the device on which the storage medium resides is controlled to perform the following operations:
Obtaining a sentence to be resolved;
Parsing the sentence to be resolved according to the grammar of a Chinese domain-specific language, wherein both the sentence to be resolved and the Chinese domain-specific language are described in Chinese.
17. A processor, characterized in that the processor is configured to run a program, wherein the program, when run, performs the following operations:
Obtaining a sentence to be resolved;
Parsing the sentence to be resolved according to the grammar of a Chinese domain-specific language, wherein both the sentence to be resolved and the Chinese domain-specific language are described in Chinese.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201710276537.4A CN107247613A (en) | 2017-04-25 | 2017-04-25 | Sentence analytic method and sentence resolver |
Publications (1)
Publication Number | Publication Date |
---|---|
CN107247613A (en) | 2017-10-13 |
Family ID: 60016573
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN201710276537.4A Pending CN107247613A (en) | 2017-04-25 | 2017-04-25 | Sentence analytic method and sentence resolver |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN107247613A (en) |
Citations (14)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US20110119047A1 (en) * | 2009-11-19 | 2011-05-19 | Tatu Ylonen Oy Ltd | Joint disambiguation of the meaning of a natural language expression |
CN103365834A (en) * | 2012-03-29 | 2013-10-23 | 富泰华工业(深圳)有限公司 | System and method for eliminating language ambiguity |
US20140059417A1 (en) * | 2012-08-23 | 2014-02-27 | International Business Machines Corporation | Logical contingency analysis for domain-specific languages |
CN103902521A (en) * | 2012-12-24 | 2014-07-02 | 高德软件有限公司 | Chinese statement identification method and device |
CN104050151A (en) * | 2014-06-05 | 2014-09-17 | 北京江南天安科技有限公司 | Security incident feature analysis method and system based on predicate deduction |
US20150142443A1 (en) * | 2012-10-31 | 2015-05-21 | SK PLANET CO., LTD. a corporation | Syntax parsing apparatus based on syntax preprocessing and method thereof |
US20160132304A1 (en) * | 2014-11-12 | 2016-05-12 | International Business Machines Corporation | Contraction aware parsing system for domain-specific languages |
CN105701253A (en) * | 2016-03-04 | 2016-06-22 | 南京大学 | Chinese natural language interrogative sentence semantization knowledge base automatic question-answering method |
CN106095398A (en) * | 2016-05-10 | 2016-11-09 | 深圳前海信息技术有限公司 | Big data mining application process based on DSL and device |
CN106202010A (en) * | 2016-07-12 | 2016-12-07 | 重庆兆光科技股份有限公司 | The method and apparatus building Law Text syntax tree based on deep neural network |
CN106227719A (en) * | 2016-07-26 | 2016-12-14 | 北京智能管家科技有限公司 | Chinese word segmentation disambiguation method and system |
CN106250104A (en) * | 2015-06-09 | 2016-12-21 | 阿里巴巴集团控股有限公司 | A kind of remote operating system for server, method and device |
CN106411626A (en) * | 2015-08-03 | 2017-02-15 | 中兴通讯股份有限公司 | Test method and device based on DSL network element simulator |
CN106446163A (en) * | 2016-09-26 | 2017-02-22 | 福建省知识产权信息公共服务中心 | Retrieval method based on advanced assertion decision algorithm and LL recursive descent method |
2017-04-25 CN CN201710276537.4A (CN107247613A): active, pending
Cited By (11)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN109841210A (en) * | 2017-11-27 | 2019-06-04 | 西安中兴新软件有限责任公司 | A kind of Intelligent control implementation method and device, computer readable storage medium |
CN109841210B (en) * | 2017-11-27 | 2024-02-20 | 西安中兴新软件有限责任公司 | Intelligent control implementation method and device and computer readable storage medium |
CN108874917A (en) * | 2018-05-30 | 2018-11-23 | 北京五八信息技术有限公司 | Intension recognizing method, device, equipment and storage medium |
CN109298857A (en) * | 2018-10-09 | 2019-02-01 | 杭州朗和科技有限公司 | Method for building up, medium, device and the calculating equipment of DSL statement model |
CN109558590A (en) * | 2018-11-23 | 2019-04-02 | 中国人民解放军63789部队 | A kind of critical failure device localization method based on spacecraft telemetry parameter participle |
CN109558590B (en) * | 2018-11-23 | 2022-11-15 | 中国人民解放军63789部队 | Method for positioning key fault device based on spacecraft remote measurement parameter word segmentation |
CN111178052A (en) * | 2019-12-20 | 2020-05-19 | 中国建设银行股份有限公司 | Method and device for constructing robot process automation application |
CN112380848A (en) * | 2020-11-19 | 2021-02-19 | 平安科技(深圳)有限公司 | Text generation method, device, equipment and storage medium |
CN112380848B (en) * | 2020-11-19 | 2022-04-26 | 平安科技(深圳)有限公司 | Text generation method, device, equipment and storage medium |
CN112579093A (en) * | 2020-12-11 | 2021-03-30 | 杭州安恒信息技术股份有限公司 | Information pushing method and device and related equipment |
CN118132375A (en) * | 2024-03-01 | 2024-06-04 | 北京开运联合信息技术集团股份有限公司 | Novel intelligent space survey operation and control language monitoring system |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | ||
PB01 | Publication | ||
SE01 | Entry into force of request for substantive examination | ||