CN105989027A - Method and equipment for matching statements - Google Patents

Method and equipment for matching statements

Info

Publication number
CN105989027A
Authority
CN
China
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN201510053230.9A
Other languages
Chinese (zh)
Inventor
吕正东
李航
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Huawei Technologies Co Ltd
Original Assignee
Huawei Technologies Co Ltd
Application filed by Huawei Technologies Co Ltd filed Critical Huawei Technologies Co Ltd
Priority to CN201510053230.9A priority Critical patent/CN105989027A/en
Publication of CN105989027A publication Critical patent/CN105989027A/en
Pending legal-status Critical Current

Landscapes

  • Machine Translation (AREA)

Abstract

An embodiment of the invention provides a method and equipment for matching statements. The method includes: acquiring a statement to be matched and a candidate statement; performing syntactic analysis on the statement to be matched and on the candidate statement respectively, to determine a first syntactic analysis tree corresponding to the statement to be matched and a second syntactic analysis tree corresponding to the candidate statement; determining a matching vector according to a subtree-pair database, the first syntactic analysis tree, and the second syntactic analysis tree; and determining the matching degree between the statement to be matched and the candidate statement according to the matching vector. The matching vector expresses the matching relationship between the statement to be matched and the candidate statement. This technical scheme can improve the accuracy of semantic matching.

Description

Method and equipment for matching statements
Technical field
Embodiments of the present invention relate to the field of natural language processing, and more particularly, to a method and equipment for matching statements.
Background
Semantic matching has become an important direction of development in the field of natural language processing. Existing semantic matching techniques often fail to capture the locality, non-linearity, and hierarchy of the semantics within a sentence and of the semantic matching process, and therefore cannot complete complex semantic matching tasks well. A common feature of the prior art is that the text to be matched is classified according to predefined classification conditions, and a reply is then given according to content preset by the user. In addition, because the input statement is uncertain (for example, vocabulary outside the expected range, misspelled words, or abbreviated expressions), a large number of rules must be written before such a system reaches a practical level. Writing rules requires a great deal of manpower, and rules written by different people may differ.
Summary of the invention
Embodiments of the present invention provide a method and equipment for matching statements, which can improve the accuracy of semantic matching.
In a first aspect, an embodiment of the present invention provides a method for matching statements. The method includes: obtaining a statement to be matched and a candidate statement; performing syntactic analysis on the statement to be matched and on the candidate statement respectively, to determine a first syntactic analysis tree corresponding to the statement to be matched and a second syntactic analysis tree corresponding to the candidate statement; determining, according to a subtree-pair database, the first syntactic analysis tree, and the second syntactic analysis tree, a matching vector that represents the matching relationship between the statement to be matched and the candidate statement; and determining, according to the matching vector, the matching degree between the statement to be matched and the candidate statement.
With reference to the first aspect, in a first possible implementation of the first aspect, determining the matching vector according to the subtree-pair database, the first syntactic analysis tree, and the second syntactic analysis tree includes: determining a matching subtree pair set according to the first syntactic analysis tree and the second syntactic analysis tree, where the matching subtree pair set includes at least one matching subtree pair, and each matching subtree pair includes one subtree belonging to the first syntactic analysis tree and one subtree belonging to the second syntactic analysis tree; and determining the matching vector according to the subtree-pair database and the matching subtree pair set.
With reference to the first possible implementation of the first aspect, in a second possible implementation of the first aspect, determining the matching vector according to the subtree-pair database and the matching subtree pair set includes: judging whether the subtree-pair database includes one or more matching subtree pairs of the matching subtree pair set; if so, determining the position, in the subtree-pair database, of each matching subtree pair that the subtree-pair database includes; and determining the matching vector according to the subtree-pair database and those positions.
With reference to the first aspect or any of the foregoing possible implementations of the first aspect, in a third possible implementation of the first aspect, the subtree-pair database includes M subtree pairs, and the M subtree pairs are obtained as follows, using a large-scale graph mining algorithm: performing syntactic analysis on N groups of aligned corpora to determine N groups of syntactic analysis trees corresponding to the N groups of aligned corpora, where the i-th group of aligned corpora includes two texts that can form a dialogue, the i-th group of syntactic analysis trees corresponding to the i-th group of aligned corpora includes two syntactic analysis trees, the two texts in the i-th group of aligned corpora correspond one-to-one to the two syntactic analysis trees in the i-th group of syntactic analysis trees, N is a positive integer, and i is a positive integer less than N; and determining the M subtree pairs according to the N groups of syntactic analysis trees, where a first subtree pair of the M subtree pairs includes a first subtree and a second subtree, the first subtree and the second subtree belong respectively to the first syntactic analysis tree and the second syntactic analysis tree of the same group of syntactic analysis trees, and M is a positive integer greater than N.
With reference to the first aspect or any of the foregoing possible implementations of the first aspect, in a fourth possible implementation of the first aspect, determining the matching degree between the statement to be matched and the candidate statement according to the matching vector includes: determining, according to the matching vector and by means of a neural network model, the matching degree between the statement to be matched and the candidate statement.
With reference to the fourth possible implementation of the first aspect, in a fifth possible implementation of the first aspect, the neural network model is a multilayer perceptron (MLP) model.
In a second aspect, an embodiment of the present invention provides equipment for matching statements. The equipment includes: an acquiring unit, configured to obtain a statement to be matched and a candidate statement; a first determining unit, configured to perform syntactic analysis on the statement to be matched and on the candidate statement respectively, to determine a first syntactic analysis tree corresponding to the statement to be matched and a second syntactic analysis tree corresponding to the candidate statement; a second determining unit, configured to determine, according to a subtree-pair database, the first syntactic analysis tree, and the second syntactic analysis tree, a matching vector that represents the matching relationship between the statement to be matched and the candidate statement; and a third determining unit, configured to determine, according to the matching vector, the matching degree between the statement to be matched and the candidate statement.
With reference to the second aspect, in a first possible implementation of the second aspect, the second determining unit is specifically configured to: determine a matching subtree pair set according to the first syntactic analysis tree and the second syntactic analysis tree, where the matching subtree pair set includes at least one matching subtree pair, and each matching subtree pair includes one subtree belonging to the first syntactic analysis tree and one subtree belonging to the second syntactic analysis tree; and determine the matching vector according to the subtree-pair database and the matching subtree pair set.
With reference to the first possible implementation of the second aspect, in a second possible implementation of the second aspect, the second determining unit is specifically configured to: judge whether the subtree-pair database includes one or more matching subtree pairs of the matching subtree pair set; if so, determine the position, in the subtree-pair database, of each matching subtree pair that the subtree-pair database includes; and determine the matching vector according to the subtree-pair database and those positions.
With reference to the second aspect or any of the foregoing possible implementations of the second aspect, in a third possible implementation of the second aspect, the third determining unit is specifically configured to determine, according to the matching vector and by means of a neural network model, the matching degree between the statement to be matched and the candidate statement.
With reference to the third possible implementation of the second aspect, in a fourth possible implementation of the second aspect, the neural network model is a multilayer perceptron (MLP) model.
With the foregoing technical scheme, statements can be matched accurately without writing a large number of rules in advance. The technical scheme therefore reduces the manpower consumed by writing rules, and avoids the situation in which rules written differently by different people lead to different matched statements.
Brief description of the drawings
To describe the technical schemes of the embodiments of the present invention more clearly, the accompanying drawings required by the embodiments are briefly introduced below. The drawings described below are merely some embodiments of the present invention; a person of ordinary skill in the art can obtain other drawings from them without creative effort.
Fig. 1 is a schematic flowchart of a method for matching statements according to an embodiment of the present invention.
Fig. 2 is a schematic diagram of a first syntactic analysis tree according to an embodiment of the present invention.
Fig. 3 is a schematic diagram of a second syntactic analysis tree according to an embodiment of the present invention.
Fig. 4 is a structural block diagram of equipment for matching statements according to an embodiment of the present invention.
Fig. 5 is a structural block diagram of another equipment for matching statements according to an embodiment of the present invention.
Detailed description
The technical schemes in the embodiments of the present invention are described clearly and completely below with reference to the accompanying drawings. The described embodiments are some, rather than all, of the embodiments of the present invention. All other embodiments obtained by a person of ordinary skill in the art based on the embodiments of the present invention without creative effort shall fall within the protection scope of the present invention.
Fig. 1 is a schematic flowchart of a method for matching statements according to an embodiment of the present invention.
101. Obtain a statement to be matched and a candidate statement.
102. Perform syntactic analysis on the statement to be matched and on the candidate statement respectively, to determine a first syntactic analysis tree corresponding to the statement to be matched and a second syntactic analysis tree corresponding to the candidate statement.
103. Determine, according to a subtree-pair database, the first syntactic analysis tree, and the second syntactic analysis tree, a matching vector that represents the matching relationship between the statement to be matched and the candidate statement.
104. Determine, according to the matching vector, the matching degree between the statement to be matched and the candidate statement.
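Steps 101 to 104 can be read as a small pipeline. The following is a minimal sketch of that flow, not the patented implementation: the parser, the subtree-pair enumerator, and the scoring function are passed in as callables because the text does not fix them, and all names (`matching_degree`, `parse`, `enumerate_pairs`, `pair_positions`, `score`) are hypothetical.

```python
from typing import Callable, Dict, Hashable, Iterable, List, Tuple

SubtreePair = Tuple[Hashable, Hashable]


def matching_degree(
    statement: str,
    candidate: str,
    parse: Callable[[str], object],
    enumerate_pairs: Callable[[object, object], Iterable[SubtreePair]],
    pair_positions: Dict[SubtreePair, int],
    score: Callable[[List[int]], float],
) -> float:
    """Sketch of steps 102-104 for one (statement, candidate) pair (step 101 is the input)."""
    # Step 102: one syntactic analysis tree per statement.
    first_tree = parse(statement)
    second_tree = parse(candidate)
    # Step 103: keep only the matching subtree pairs found in the database and
    # record their positions; this is the sparse form of the matching vector.
    hits = sorted(
        {
            pair_positions[pair]
            for pair in enumerate_pairs(first_tree, second_tree)
            if pair in pair_positions
        }
    )
    # Step 104: map the matching vector to a matching degree.
    return score(hits)
```

Passing the components in as arguments keeps the sketch independent of any particular parser or model; a real system would substitute its own.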
The method shown in Fig. 1 directly determines the matching degree between the statement to be matched and the candidate statement, so that a suitable matching statement can be selected for the statement to be matched according to the matching degree. Statement matching therefore does not rely on preset rules, which avoids the situation in which rules written differently by different people lead to different matched statements. It also reduces the manpower consumed by writing rules.
The candidate statement may be obtained from a candidate text database. The candidate text database consists of a large number of texts, each of which is a statement that expresses a complete meaning. The candidate statement may be obtained by any conventional text matching method, which the present invention does not limit. For example, a text that shares a term with the statement to be matched may be selected from the candidate text database as the candidate text. As another example, a text that has a term corresponding to a term of the statement to be matched, according to a preset rule, may be selected from the candidate text database as the candidate text.
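The first retrieval option above (texts sharing a term with the statement to be matched) can be sketched as follows. Whitespace tokenization and lowercasing are simplifying assumptions made for this illustration, and the function name `retrieve_candidates` is hypothetical; the text leaves the retrieval method open.

```python
from typing import Iterable, List


def retrieve_candidates(statement: str, candidate_texts: Iterable[str]) -> List[str]:
    """Return the texts that share at least one term with the statement to be matched."""
    # Assumption: a "term" is a lowercased whitespace-separated token.
    statement_terms = set(statement.lower().split())
    return [
        text
        for text in candidate_texts
        if statement_terms & set(text.lower().split())
    ]
```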
Optionally, determining the matching vector according to the subtree-pair database, the first syntactic analysis tree, and the second syntactic analysis tree includes: determining a matching subtree pair set according to the first syntactic analysis tree and the second syntactic analysis tree, where the matching subtree pair set includes at least one matching subtree pair, and each matching subtree pair includes one subtree belonging to the first syntactic analysis tree and one subtree belonging to the second syntactic analysis tree; and determining the matching vector according to the subtree-pair database and the matching subtree pair set.
Determining the matching vector according to the subtree-pair database and the matching subtree pair set includes: judging whether the subtree-pair database includes one or more matching subtree pairs of the matching subtree pair set; if so, determining the position, in the subtree-pair database, of each matching subtree pair that the subtree-pair database includes; and determining the matching vector according to the subtree-pair database and those positions. In other words, the matching vector indicates the positions, in the subtree-pair database, of the subtree pairs that are identical to matching subtree pairs. Typically, the subtree-pair database includes a very large number of subtree pairs (for example, millions to tens of millions), whereas the matching subtree pair set contains few matching subtree pairs (for example, possibly only a few dozen). The number of matching subtree pairs that hit the subtree-pair database is smaller still. The matching vector is therefore a very sparse vector.
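Because only a handful of positions are ever set, the matching vector can be held as a sorted list of hit positions rather than as a dense array. The sketch below assumes the database is available as a dictionary from subtree pair to position; the name `matching_vector_positions` and the string representation of subtrees are illustrative assumptions.

```python
from typing import Dict, Hashable, Iterable, List, Tuple

SubtreePair = Tuple[Hashable, Hashable]


def matching_vector_positions(
    matching_pairs: Iterable[SubtreePair],
    pair_positions: Dict[SubtreePair, int],
) -> List[int]:
    """Positions of the database entries hit by the matching subtree pairs.

    Pairs absent from the database are ignored; duplicates collapse to one hit.
    """
    return sorted(
        {pair_positions[pair] for pair in matching_pairs if pair in pair_positions}
    )
```

With millions of database entries and a few dozen matching subtree pairs, this list stays short regardless of the database size.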
The subtree-pair database includes M subtree pairs, and the M subtree pairs are obtained as follows, using a large-scale graph mining algorithm: performing syntactic analysis on N groups of aligned corpora to determine N groups of syntactic analysis trees corresponding to the N groups of aligned corpora, where the i-th group of aligned corpora includes two texts that can form a dialogue, the i-th group of syntactic analysis trees corresponding to the i-th group of aligned corpora includes two syntactic analysis trees, the two texts in the i-th group of aligned corpora correspond one-to-one to the two syntactic analysis trees in the i-th group of syntactic analysis trees, N is a positive integer, and i is a positive integer less than N; and determining the M subtree pairs according to the N groups of syntactic analysis trees, where a first subtree pair of the M subtree pairs includes a first subtree and a second subtree, the first subtree and the second subtree belong respectively to the first syntactic analysis tree and the second syntactic analysis tree of the same group of syntactic analysis trees, and M is a positive integer greater than N. The first subtree pair is any one of the M subtree pairs. It should be understood that the subtrees in the subtree-pair database are effective subtrees. An effective subtree is a subtree composed of the parts of a syntactic analysis tree that carry information capable of distinguishing the matching degree. Such information may be morphemes with practical meaning, for example nouns, verbs, and adjectives.
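The text does not specify the large-scale graph mining algorithm. As a stand-in, the sketch below simply counts how often a subtree of the first tree co-occurs with a subtree of the second tree across the N aligned groups and keeps the pairs that reach a minimum support. The frequency threshold, the representation of each tree as a collection of already-extracted subtree labels, and the names `mine_subtree_pairs` and `min_support` are all assumptions made for illustration.

```python
from collections import Counter
from itertools import product
from typing import Dict, Iterable, Sequence, Tuple

SubtreePair = Tuple[str, str]


def mine_subtree_pairs(
    aligned_groups: Iterable[Tuple[Sequence[str], Sequence[str]]],
    min_support: int = 2,
) -> Dict[SubtreePair, int]:
    """Map each frequent subtree pair to its position in the database.

    Each group holds the subtrees of the two trees parsed from one aligned
    (dialogue) text pair. A pair is kept if it occurs in at least
    `min_support` groups.
    """
    counts: Counter = Counter()
    for first_subtrees, second_subtrees in aligned_groups:
        # Count each distinct pair once per group.
        for pair in product(set(first_subtrees), set(second_subtrees)):
            counts[pair] += 1
    frequent = sorted(pair for pair, count in counts.items() if count >= min_support)
    return {pair: position for position, pair in enumerate(frequent)}
```

A real miner would also restrict itself to effective subtrees, for example by discarding subtrees without nouns, verbs, or adjectives, before counting.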
The subtree-pair database may be stored as a table of subtree pairs, or as subtree pairs together with an index (that is, each subtree pair has a corresponding index); the present invention does not limit this. Whichever way the M subtree pairs are stored, each subtree pair is assigned a sequence number. That is, a required subtree pair can be looked up by its sequence number, and the position of the subtree pair in the subtree-pair database is determined by its sequence number.
It can be understood that the subtree-pair database is built in advance. The same subtree-pair database is used when determining the matching degree between one statement to be matched and different candidate statements.
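One way to realize the "subtree pair plus sequence number" storage is a list for lookup by sequence number combined with a dictionary for lookup by pair. This is a minimal in-memory sketch under that assumption; the class name `SubtreePairDatabase` and the choice of 1-based sequence numbers are illustrative and not taken from the source.

```python
from typing import Dict, Hashable, Iterable, List, Optional, Tuple

SubtreePair = Tuple[Hashable, Hashable]


class SubtreePairDatabase:
    """In-memory subtree-pair store with 1-based sequence numbers."""

    def __init__(self, pairs: Iterable[SubtreePair]) -> None:
        # Drop duplicates while preserving first-seen order.
        self._pairs: List[SubtreePair] = list(dict.fromkeys(pairs))
        self._positions: Dict[SubtreePair, int] = {
            pair: number for number, pair in enumerate(self._pairs, start=1)
        }

    def __len__(self) -> int:
        return len(self._pairs)

    def position(self, pair: SubtreePair) -> Optional[int]:
        """Sequence number of `pair`, or None if the pair is not stored."""
        return self._positions.get(pair)

    def pair_at(self, number: int) -> SubtreePair:
        """Subtree pair stored under the given 1-based sequence number."""
        if not 1 <= number <= len(self._pairs):
            raise IndexError(f"sequence number out of range: {number}")
        return self._pairs[number - 1]
```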
Further, determining the matching degree between the statement to be matched and the candidate statement according to the matching vector includes: determining, according to the matching vector and by means of a neural network model, the matching degree between the statement to be matched and the candidate statement. If the matching vector is very sparse (for example, only a few dozen of its 10,000,000 positions are 1 and all remaining positions are 0), the neural network model can determine the matching degree quickly.
Further, structure learning and parameter learning may be performed according to the matching vector to determine an optimized neural network model. For example, a random method may be used to determine the sparse connection pattern of the first layer of the neural network model.
Optionally, as an embodiment, the neural network model is a multilayer perceptron (MLP) model.
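The speed benefit of a sparse input can be seen in a forward pass: with a 0/1 matching vector, the first layer only needs to add up the weight rows of the active positions. The following sketch shows a one-hidden-layer perceptron in plain Python. The ReLU hidden activation, the sigmoid output, the weight layout, and the name `mlp_matching_degree` are assumptions for illustration; the text states only that an MLP may be used.

```python
import math
from typing import List, Sequence


def mlp_matching_degree(
    active_positions: Sequence[int],
    first_layer: Sequence[Sequence[float]],  # first_layer[i] = weights from input i to each hidden unit
    hidden_bias: Sequence[float],
    output_weights: Sequence[float],
    output_bias: float,
) -> float:
    """Forward pass of a one-hidden-layer MLP over a sparse 0/1 input (0-based positions)."""
    hidden: List[float] = list(hidden_bias)
    # Only the rows of active inputs contribute, so the cost of this layer is
    # proportional to the number of hits, not to the vector dimension.
    for position in active_positions:
        for unit, weight in enumerate(first_layer[position]):
            hidden[unit] += weight
    hidden = [max(0.0, value) for value in hidden]  # ReLU (assumed)
    logit = output_bias + sum(w * h for w, h in zip(output_weights, hidden))
    return 1.0 / (1.0 + math.exp(-logit))  # sigmoid (assumed), giving a degree in (0, 1)
```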
To help a person skilled in the art better understand the present invention, the present invention is further described below with reference to a specific embodiment. It should be noted that this specific embodiment is intended only to aid understanding and does not limit the present invention.
Assume that the statement to be matched is "Barcelona defeats Real Madrid" and that the candidate statement is "Barcelona has the championship in hand". Syntactic analysis of the statement to be matched yields the first syntactic analysis tree shown in Fig. 2. Syntactic analysis of the candidate statement yields the second syntactic analysis tree shown in Fig. 3.
According to the first syntactic analysis tree and the second syntactic analysis tree, the following matching subtree pairs can be determined. In each pair of brackets, the item before the comma is a subtree of the first syntactic analysis tree (of the statement to be matched), and the item after the comma is a subtree of the second syntactic analysis tree (of the candidate statement):
Matching subtree pair 1: (Barcelona, Barcelona)
Matching subtree pair 2: (Barcelona defeats, Barcelona championship)
Matching subtree pair 3: (defeats Real Madrid, Barcelona).
It can be understood that matching subtree pairs 1 to 3 are only some of the matching subtree pairs that can be determined from the first syntactic analysis tree and the second syntactic analysis tree. A person skilled in the art can determine more matching subtree pairs from the two trees; they are not enumerated here.
Assume that, of the matching subtree pairs, the subtree-pair database includes only matching subtree pairs 1 to 3, and that they are the 1st, 3rd, and 5th subtree pairs in the subtree-pair database. It can then be determined that the 1st, 3rd, and 5th subtree pairs in the subtree-pair database are hit by matching subtree pairs. In the matching vector, the positions corresponding to the hit subtree pairs are set to 1 and all remaining positions are set to 0. For example, if the subtree-pair database includes 1000 subtree pairs in total, the matching vector is a 1000-dimensional vector in which positions 1, 3, and 5 are 1 and all remaining positions are 0.
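The worked example can be reproduced directly. The sketch below builds the dense 1000-dimensional vector from the 1-based hit positions 1, 3, and 5; the function name `build_matching_vector` is hypothetical.

```python
from typing import Iterable, List


def build_matching_vector(hit_positions: Iterable[int], dimension: int) -> List[int]:
    """Dense 0/1 matching vector from 1-based positions of hit subtree pairs."""
    vector = [0] * dimension
    for position in hit_positions:
        if not 1 <= position <= dimension:
            raise ValueError(f"position out of range: {position}")
        vector[position - 1] = 1  # convert 1-based position to 0-based index
    return vector


# Database of 1000 subtree pairs; pairs 1, 3 and 5 are hit.
matching_vector = build_matching_vector([1, 3, 5], 1000)
```

Only three of the 1000 entries are non-zero here, which illustrates the sparsity described in the text.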
The matching vector is input into the neural network model to determine the matching degree between the statement to be matched and the candidate statement. A person skilled in the art will understand that "Barcelona has the championship in hand" may be only one of several candidate statements; there may be other candidate statements that can be used to match the statement to be matched, "Barcelona defeats Real Madrid". The matching degree between each candidate statement and the statement to be matched can be calculated by the foregoing method, and the candidate statement with the highest matching degree is selected as the matching statement for the statement to be matched.
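Selecting the final matching statement is then an argmax over the candidates. In this sketch the matching-degree function is passed in as a callable, since computing it depends on the parser, database, and model; the name `select_matching_statement` is hypothetical.

```python
from typing import Callable, Sequence


def select_matching_statement(
    candidates: Sequence[str],
    degree: Callable[[str], float],
) -> str:
    """Return the candidate statement with the highest matching degree.

    On ties, the earliest candidate in the sequence is returned.
    """
    if not candidates:
        raise ValueError("at least one candidate statement is required")
    return max(candidates, key=degree)
```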
Fig. 4 is a structural block diagram of equipment for matching statements according to an embodiment of the present invention. The equipment shown in Fig. 4 can perform each step of the method shown in Fig. 1. As shown in Fig. 4, equipment 400 includes an acquiring unit 401, a first determining unit 402, a second determining unit 403, and a third determining unit 404.
The acquiring unit 401 is configured to obtain a statement to be matched and a candidate statement.
The first determining unit 402 is configured to perform syntactic analysis on the statement to be matched and on the candidate statement respectively, to determine a first syntactic analysis tree corresponding to the statement to be matched and a second syntactic analysis tree corresponding to the candidate statement.
The second determining unit 403 is configured to determine, according to a subtree-pair database, the first syntactic analysis tree, and the second syntactic analysis tree, a matching vector that represents the matching relationship between the statement to be matched and the candidate statement.
The third determining unit 404 is configured to determine, according to the matching vector, the matching degree between the statement to be matched and the candidate statement.
The equipment 400 shown in Fig. 4 directly determines the matching degree between the statement to be matched and the candidate statement, so that a suitable matching statement can be selected for the statement to be matched according to the matching degree. Statement matching therefore does not rely on preset rules, which avoids the situation in which rules written differently by different people lead to different matched statements. It also reduces the manpower consumed by writing rules.
Optionally, the second determining unit 403 is specifically configured to: determine a matching subtree pair set according to the first syntactic analysis tree and the second syntactic analysis tree, where the matching subtree pair set includes at least one matching subtree pair, and each matching subtree pair includes one subtree belonging to the first syntactic analysis tree and one subtree belonging to the second syntactic analysis tree; and determine the matching vector according to the subtree-pair database and the matching subtree pair set.
The second determining unit 403 is specifically configured to: judge whether the subtree-pair database includes one or more matching subtree pairs of the matching subtree pair set; if so, determine the position, in the subtree-pair database, of each matching subtree pair that the subtree-pair database includes; and determine the matching vector according to the subtree-pair database and those positions. In other words, the matching vector indicates the positions, in the subtree-pair database, of the subtree pairs that are identical to matching subtree pairs. Typically, the subtree-pair database includes a very large number of subtree pairs (for example, millions to tens of millions), whereas the matching subtree pair set contains few matching subtree pairs (for example, possibly only a few dozen). The number of matching subtree pairs that hit the subtree-pair database is smaller still. The matching vector is therefore a very sparse vector.
The subtree-pair database includes M subtree pairs, and the M subtree pairs are obtained as follows, using a large-scale graph mining algorithm: performing syntactic analysis on N groups of aligned corpora to determine N groups of syntactic analysis trees corresponding to the N groups of aligned corpora, where the i-th group of aligned corpora includes two texts that can form a dialogue, the i-th group of syntactic analysis trees corresponding to the i-th group of aligned corpora includes two syntactic analysis trees, the two texts in the i-th group of aligned corpora correspond one-to-one to the two syntactic analysis trees in the i-th group of syntactic analysis trees, N is a positive integer, and i is a positive integer less than N; and determining the M subtree pairs according to the N groups of syntactic analysis trees, where a first subtree pair of the M subtree pairs includes a first subtree and a second subtree, the first subtree and the second subtree belong respectively to the first syntactic analysis tree and the second syntactic analysis tree of the same group of syntactic analysis trees, and M is a positive integer greater than N. The first subtree pair is any one of the M subtree pairs.
The third determining unit 404 is specifically configured to determine, according to the matching vector and by means of a neural network model, the matching degree between the statement to be matched and the candidate statement.
Optionally, as an embodiment, the neural network model is an MLP model.
Fig. 5 is a structural block diagram of another equipment for matching statements according to an embodiment of the present invention. The equipment shown in Fig. 5 can perform each step of the method shown in Fig. 1. The equipment 500 shown in Fig. 5 includes a processor 501 and a memory 502.
The components of the equipment 500 are coupled together by a bus system 503, which includes a power bus, a control bus, and a status signal bus in addition to a data bus. For clarity of description, however, the various buses are all labeled as the bus system 503 in Fig. 5.
The methods disclosed in the foregoing embodiments of the present invention may be applied to, or implemented by, the processor 501. The processor 501 may be an integrated circuit chip with signal processing capability. During implementation, each step of the foregoing method may be completed by an integrated logic circuit of hardware in the processor 501 or by instructions in software form. The processor 501 may be a general-purpose processor, a digital signal processor (DSP), an application-specific integrated circuit (ASIC), a field-programmable gate array (FPGA) or another programmable logic device, a discrete gate or transistor logic device, or a discrete hardware component, and can implement or execute the methods, steps, and logic block diagrams disclosed in the embodiments of the present invention. The general-purpose processor may be a microprocessor, or the processor may be any conventional processor. The steps of the method disclosed in the embodiments of the present invention may be executed directly by a hardware decoding processor, or by a combination of hardware and software modules in a decoding processor. The software module may reside in a storage medium that is mature in the art, such as a random access memory (RAM), a flash memory, a read-only memory (ROM), a programmable read-only memory, an electrically erasable programmable memory, or a register. The storage medium is located in the memory 502; the processor 501 reads the instructions in the memory 502 and completes the steps of the foregoing method in combination with its hardware.
The processor 501 is configured to obtain a statement to be matched and a candidate statement.
The processor 501 is further configured to perform syntactic analysis on the statement to be matched and the candidate statement respectively, and to determine a first syntactic analysis tree corresponding to the statement to be matched and a second syntactic analysis tree corresponding to the candidate statement.
The processor 501 is further configured to determine, according to a subtree pair database, the first syntactic analysis tree, and the second syntactic analysis tree, a matching vector used to represent the matching relationship between the statement to be matched and the candidate statement.
The processor 501 is further configured to determine, according to the matching vector, the matching degree between the statement to be matched and the candidate statement.
The equipment 500 shown in Fig. 5 can directly determine the matching degree between the statement to be matched and the candidate statement, so that a suitable statement can be selected to match the statement to be matched according to the matching degree. Statement matching therefore does not need to rely on preset rules, which avoids different matching results caused by differences between rules written by different people. At the same time, the manpower consumed in writing rules can be reduced.
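As a minimal sketch of how a candidate might be selected once a matching degree is available, the following Python example picks the candidate with the highest score. The function names `select_best_candidate` and `toy_matching_degree` are hypothetical, and the word-overlap scorer is only a stand-in used to make the example runnable; it is not the matching degree computed by the equipment 500.

```python
from typing import Callable, Sequence


def select_best_candidate(
    query: str,
    candidates: Sequence[str],
    matching_degree: Callable[[str, str], float],
) -> str:
    """Return the candidate statement with the highest matching degree."""
    if not candidates:
        raise ValueError("at least one candidate statement is required")
    return max(candidates, key=lambda candidate: matching_degree(query, candidate))


def toy_matching_degree(query: str, candidate: str) -> float:
    """Hypothetical stand-in scorer: number of shared lowercase words."""
    shared = set(query.lower().split()) & set(candidate.lower().split())
    return float(len(shared))


best = select_best_candidate(
    "how is the weather today",
    ["the weather is fine", "I like noodles"],
    toy_matching_degree,
)
print(best)
```

Any scorer with the same signature could be passed in place of the stand-in, which keeps the selection step independent of how the matching degree is computed.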
Optionally, the processor 501 is specifically configured to: determine a matched subtree pair set according to the first syntactic analysis tree and the second syntactic analysis tree, where the matched subtree pair set includes at least one matched subtree pair, and each matched subtree pair includes one subtree belonging to the first syntactic analysis tree and one subtree belonging to the second syntactic analysis tree; and determine the matching vector according to the subtree pair database and the matched subtree pair set.
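The construction of the matched subtree pair set can be illustrated with a short Python sketch. It rests on two assumptions that are not stated in this description: a syntactic analysis tree is represented as a nested tuple `(label, child, child, ...)`, and a "subtree" is taken to be the complete subtree rooted at a node. The names `subtrees` and `matched_subtree_pairs` are hypothetical.

```python
from itertools import product


def subtrees(tree):
    """Yield the complete subtree rooted at every node, in pre-order.

    Assumption: a tree is a nested tuple (label, child, child, ...).
    """
    yield tree
    for child in tree[1:]:
        yield from subtrees(child)


def matched_subtree_pairs(first_tree, second_tree):
    """Pair every subtree of the first tree with every subtree of the second."""
    return set(product(subtrees(first_tree), subtrees(second_tree)))


# Toy trees for "I like noodles" and "me too" (hypothetical parses).
first = ("like", ("I",), ("noodles",))
second = ("too", ("me",))
pairs = matched_subtree_pairs(first, second)
print(len(pairs))
```

Under these assumptions the set grows with the product of the node counts of the two trees, so it stays small for ordinary sentences.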
The processor 501 is specifically configured to: judge whether the subtree pair database includes one or more matched subtree pairs in the matched subtree pair set; if so, determine the positions, in the subtree pair database, of the matched subtree pairs included in the subtree pair database; and determine the matching vector according to the subtree pair database and those positions. In other words, the matching vector represents the positions, in the subtree pair database, of the subtree pairs that are identical to matched subtree pairs. Generally, the subtree pair database may include a large number of subtree pairs (for example, millions to tens of millions of subtree pairs), whereas the number of matched subtree pairs in the matched subtree pair set is small (for example, possibly only dozens of matched subtree pairs), and the matched subtree pairs that hit the subtree pair database are fewer still. Therefore, the matching vector is a very sparse vector.
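The lookup described above can be sketched as follows. This is only an illustration under stated assumptions: subtrees are represented by hypothetical string identifiers, the database is an ordered list of subtree pairs, and the sparse matching vector is stored as the sorted list of hit positions. The helper names `build_index`, `matching_vector`, and `to_dense` are not from this description.

```python
def build_index(database):
    """Map each subtree pair to its position in the subtree pair database."""
    return {pair: position for position, pair in enumerate(database)}


def matching_vector(index, matched_pairs):
    """Return the sorted positions of matched pairs found in the database.

    This is a sparse encoding: only non-zero positions are stored.
    """
    return sorted({index[pair] for pair in matched_pairs if pair in index})


def to_dense(hits, size):
    """Expand the sparse encoding into a 0/1 vector of the given size."""
    dense = [0] * size
    for position in hits:
        dense[position] = 1
    return dense


# Toy database of four subtree pairs (string identifiers are placeholders).
database = [("a", "x"), ("b", "y"), ("c", "z"), ("d", "w")]
index = build_index(database)
hits = matching_vector(index, {("b", "y"), ("d", "w"), ("q", "r")})
print(hits, to_dense(hits, len(database)))
```

Storing only the hit positions is a natural choice here, because the database may hold millions of pairs while only a handful are hit for any one statement pair.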
The subtree pair database includes M subtree pairs, and the M subtree pairs are obtained in the following manner: using a large-scale graph mining algorithm, syntactic analysis is performed on N groups of aligned corpora to determine N groups of syntactic analysis trees corresponding to the N groups of aligned corpora, where the i-th group of aligned corpora in the N groups of aligned corpora includes two texts that can form a dialogue, the i-th group of syntactic analysis trees corresponding to the i-th group of aligned corpora includes two syntactic analysis trees, and the two texts in the i-th group of aligned corpora are in one-to-one correspondence with the two syntactic analysis trees in the i-th group of syntactic analysis trees, where N is a positive integer and i is a positive integer less than N; and the M subtree pairs are determined according to the N groups of syntactic analysis trees, where a first subtree pair among the M subtree pairs includes a first subtree and a second subtree, the first subtree and the second subtree belong respectively to the first syntactic analysis tree and the second syntactic analysis tree in the same group of syntactic analysis trees, and M is a positive integer greater than N. The first subtree pair is any one of the M subtree pairs.
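The large-scale graph mining algorithm is not specified in this passage. As a rough illustration only, the sketch below substitutes a simple frequency count: it enumerates subtree pairs within each group of trees and keeps the pairs that occur in at least `min_support` groups. The nested-tuple tree format, the `min_support` threshold, and the names `subtrees` and `mine_subtree_pairs` are all assumptions made for this example.

```python
from collections import Counter
from itertools import product


def subtrees(tree):
    """Yield the complete subtree rooted at every node of a nested-tuple tree."""
    yield tree
    for child in tree[1:]:
        yield from subtrees(child)


def mine_subtree_pairs(tree_groups, min_support=2):
    """Keep subtree pairs that co-occur in at least min_support tree groups.

    Stand-in for the mining step: plain frequency counting, not the
    large-scale graph mining algorithm referred to in the description.
    """
    counts = Counter()
    for first_tree, second_tree in tree_groups:
        # Count each distinct pair once per group of aligned trees.
        counts.update(set(product(subtrees(first_tree), subtrees(second_tree))))
    frequent = [pair for pair, count in counts.items() if count >= min_support]
    return sorted(frequent, key=repr)  # deterministic database order


# Two toy groups of aligned trees (hypothetical parses of dialogue pairs).
groups = [
    (("weather", ("how",)), ("sunny",)),
    (("today", ("how",)), ("sunny",)),
]
database = mine_subtree_pairs(groups, min_support=2)
print(database)
```

In this toy input, only the pair formed by the `how` subtree and the `sunny` subtree appears in both groups, so it is the only pair retained at `min_support=2`.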
The processor 501 is specifically configured to determine, according to the matching vector and by using a neural network model, the matching degree between the statement to be matched and the candidate statement.
Optionally, as an embodiment, the neural network model is a multilayer perceptron (MLP) model.
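A minimal sketch of how an MLP could map a sparse binary matching vector to a matching degree is given below. The architecture (one hidden layer with ReLU activation and a sigmoid output), the weight values, and the function name `mlp_matching_degree` are assumptions chosen for illustration; this passage does not specify the layer sizes, activations, or training procedure.

```python
import math


def mlp_matching_degree(hits, w_hidden, b_hidden, w_out, b_out):
    """Forward pass of a one-hidden-layer MLP over a sparse binary input.

    hits: positions of the non-zero entries of the matching vector.
    w_hidden[j][i]: weight from input position i to hidden unit j.
    Because the input is 0/1 and sparse, only the weights at the hit
    positions need to be summed.
    """
    hidden = []
    for weights, bias in zip(w_hidden, b_hidden):
        pre_activation = bias + sum(weights[i] for i in hits)
        hidden.append(max(0.0, pre_activation))  # ReLU
    score = b_out + sum(w * h for w, h in zip(w_out, hidden))
    return 1.0 / (1.0 + math.exp(-score))  # sigmoid, value in (0, 1)


# Hypothetical weights for a 3-dimensional matching vector and 2 hidden units.
w_hidden = [[1.0, -1.0, 0.5], [0.0, 2.0, -0.5]]
b_hidden = [0.0, 0.0]
w_out = [1.0, 1.0]
b_out = 0.0

degree = mlp_matching_degree([0, 2], w_hidden, b_hidden, w_out, b_out)
print(degree)
```

Summing only over hit positions keeps the cost of the first layer proportional to the number of hits rather than to the size of the subtree pair database.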
A person of ordinary skill in the art may appreciate that the units and algorithm steps of the examples described in the embodiments disclosed herein can be implemented by electronic hardware, or by a combination of computer software and electronic hardware. Whether these functions are performed by hardware or software depends on the particular application and design constraints of the technical solution. A skilled person may use different methods to implement the described functions for each particular application, but such implementation should not be considered beyond the scope of the present invention.
A person skilled in the art can clearly understand that, for convenience and brevity of description, for the specific working processes of the system, device, and units described above, reference may be made to the corresponding processes in the foregoing method embodiments, and details are not repeated here.
In the several embodiments provided in this application, it should be understood that the disclosed system, device, and method may be implemented in other ways. For example, the device embodiments described above are merely illustrative. For example, the division of the units is merely a division of logical functions, and there may be other division manners in actual implementation; for example, multiple units or components may be combined or integrated into another system, or some features may be ignored or not performed. In addition, the mutual coupling, direct coupling, or communication connection shown or discussed may be indirect coupling or communication connection through some interfaces, devices, or units, and may be electrical, mechanical, or in other forms.
The units described as separate components may or may not be physically separate, and the components shown as units may or may not be physical units; that is, they may be located in one place or distributed over multiple network units. Some or all of the units may be selected according to actual needs to achieve the purpose of the solution of this embodiment.
In addition, the functional units in the embodiments of the present invention may be integrated into one processing unit, or each unit may exist physically alone, or two or more units may be integrated into one unit.
If the functions are implemented in the form of software functional units and sold or used as independent products, they may be stored in a computer-readable storage medium. Based on such an understanding, the technical solution of the present invention essentially, or the part contributing to the prior art, or a part of the technical solution, may be embodied in the form of a software product. The computer software product is stored in a storage medium and includes several instructions for causing a computer device (which may be a personal computer, a server, a network device, or the like) or a processor to perform all or some of the steps of the methods described in the embodiments of the present invention. The foregoing storage medium includes various media that can store program code, such as a USB flash drive, a removable hard disk, a read-only memory (ROM, Read-Only Memory), a random access memory (RAM, Random Access Memory), a magnetic disk, or an optical disc.
The above are merely specific embodiments of the present invention, but the protection scope of the present invention is not limited thereto. Any change or replacement that a person skilled in the art can readily conceive of within the technical scope disclosed by the present invention shall fall within the protection scope of the present invention. Therefore, the protection scope of the present invention shall be subject to the protection scope of the claims.

Claims (11)

1. A method for matching statements, characterized in that the method comprises:
obtaining a statement to be matched and a candidate statement;
performing syntactic analysis on the statement to be matched and the candidate statement respectively, and determining a first syntactic analysis tree corresponding to the statement to be matched and a second syntactic analysis tree corresponding to the candidate statement;
determining, according to a subtree pair database, the first syntactic analysis tree, and the second syntactic analysis tree, a matching vector used to represent the matching relationship between the statement to be matched and the candidate statement; and
determining, according to the matching vector, the matching degree between the statement to be matched and the candidate statement.
2. The method of claim 1, characterized in that the determining, according to a subtree pair database, the first syntactic analysis tree, and the second syntactic analysis tree, a matching vector used to represent the matching relationship between the statement to be matched and the candidate statement comprises:
determining a matched subtree pair set according to the first syntactic analysis tree and the second syntactic analysis tree, wherein the matched subtree pair set includes at least one matched subtree pair, and each matched subtree pair includes one subtree belonging to the first syntactic analysis tree and one subtree belonging to the second syntactic analysis tree; and
determining the matching vector according to the subtree pair database and the matched subtree pair set.
3. The method of claim 2, characterized in that the determining the matching vector according to the subtree pair database and the matched subtree pair set comprises:
judging whether the subtree pair database includes one or more matched subtree pairs in the matched subtree pair set; and if so,
determining the positions, in the subtree pair database, of the matched subtree pairs included in the subtree pair database; and
determining the matching vector according to the subtree pair database and the positions, in the subtree pair database, of the matched subtree pairs included in the subtree pair database.
4. The method of any one of claims 1 to 3, characterized in that the subtree pair database includes M subtree pairs, and the M subtree pairs are obtained in the following manner:
using a large-scale graph mining algorithm, performing syntactic analysis on N groups of aligned corpora, and determining N groups of syntactic analysis trees corresponding to the N groups of aligned corpora, wherein the i-th group of aligned corpora in the N groups of aligned corpora includes two texts that can form a dialogue, the i-th group of syntactic analysis trees corresponding to the i-th group of aligned corpora includes two syntactic analysis trees, and the two texts in the i-th group of aligned corpora are in one-to-one correspondence with the two syntactic analysis trees in the i-th group of syntactic analysis trees, wherein N is a positive integer and i is a positive integer less than N; and
determining the M subtree pairs according to the N groups of syntactic analysis trees, wherein a first subtree pair among the M subtree pairs includes a first subtree and a second subtree, the first subtree and the second subtree belong respectively to the first syntactic analysis tree and the second syntactic analysis tree in the same group of syntactic analysis trees, and M is a positive integer greater than N.
5. The method of any one of claims 1 to 4, characterized in that the determining, according to the matching vector, the matching degree between the statement to be matched and the candidate statement comprises:
determining, according to the matching vector and by using a neural network model, the matching degree between the statement to be matched and the candidate statement.
6. The method of claim 5, characterized in that the neural network model is a multilayer perceptron (MLP) model.
7. Equipment for matching statements, characterized in that the equipment comprises:
an acquiring unit, configured to obtain a statement to be matched and a candidate statement;
a first determining unit, configured to perform syntactic analysis on the statement to be matched and the candidate statement respectively, and to determine a first syntactic analysis tree corresponding to the statement to be matched and a second syntactic analysis tree corresponding to the candidate statement;
a second determining unit, configured to determine, according to a subtree pair database, the first syntactic analysis tree, and the second syntactic analysis tree, a matching vector used to represent the matching relationship between the statement to be matched and the candidate statement; and
a third determining unit, configured to determine, according to the matching vector, the matching degree between the statement to be matched and the candidate statement.
8. The equipment of claim 7, characterized in that the second determining unit is specifically configured to: determine a matched subtree pair set according to the first syntactic analysis tree and the second syntactic analysis tree, wherein the matched subtree pair set includes at least one matched subtree pair, and each matched subtree pair includes one subtree belonging to the first syntactic analysis tree and one subtree belonging to the second syntactic analysis tree; and determine the matching vector according to the subtree pair database and the matched subtree pair set.
9. The equipment of claim 8, characterized in that the second determining unit is specifically configured to: judge whether the subtree pair database includes one or more matched subtree pairs in the matched subtree pair set; if so, determine the positions, in the subtree pair database, of the matched subtree pairs included in the subtree pair database; and determine the matching vector according to the subtree pair database and the positions, in the subtree pair database, of the matched subtree pairs included in the subtree pair database.
10. The equipment of any one of claims 7 to 9, characterized in that the third determining unit is specifically configured to determine, according to the matching vector and by using a neural network model, the matching degree between the statement to be matched and the candidate statement.
11. The equipment of claim 10, characterized in that the neural network model is a multilayer perceptron (MLP) model.
CN201510053230.9A 2015-01-30 2015-01-30 Method and equipment for matching statements Pending CN105989027A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201510053230.9A CN105989027A (en) 2015-01-30 2015-01-30 Method and equipment for matching statements

Publications (1)

Publication Number Publication Date
CN105989027A true CN105989027A (en) 2016-10-05

Family

ID=57036766

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201510053230.9A Pending CN105989027A (en) 2015-01-30 2015-01-30 Method and equipment for matching statements

Country Status (1)

Country Link
CN (1) CN105989027A (en)

Patent Citations (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20040162806A1 (en) * 2002-09-13 2004-08-19 Fuji Xerox Co., Ltd. Text sentence comparing apparatus
CN101329666A (en) * 2008-06-18 2008-12-24 Nanjing University Automatic analysis method Chinese syntax based on corpus and tree type structural pattern match
US20120226492A1 (en) * 2011-03-03 2012-09-06 International Business Machines Corporation Information processing apparatus, natural language analysis method, program and recording medium
CN102298642A (en) * 2011-09-15 2011-12-28 Soochow University Method and system for extracting text information

Non-Patent Citations (2)

* Cited by examiner, † Cited by third party
Title
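Li Wei (李伟): "A preliminary study of methods for computing Chinese sentence similarity" (中文语句相似度计算的方法初探), Journal of Lanzhou Polytechnic College (《兰州工业高等专科学校学报》) *
Wang Liju (王利局): "Sentence similarity computation based on semantic analysis tree kernels" (基于语义分析树核的句子相似度计算), China Master's Theses Full-text Database (《中国优秀硕士学位论文全文数据库》) *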

Cited By (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
WO2019080648A1 (en) * 2017-10-26 2019-05-02 Huawei Technologies Co., Ltd. Retelling sentence generation method and apparatus
US11586814B2 (en) 2017-10-26 2023-02-21 Huawei Technologies Co., Ltd. Paraphrase sentence generation method and apparatus
US12056125B2 (en) 2020-01-10 2024-08-06 Huawei Technologies Co., Ltd. Database processing method and apparatus

Legal Events

Date Code Title Description
C06 Publication
PB01 Publication
C10 Entry into substantive examination
SE01 Entry into force of request for substantive examination