A knowledge graph construction system and method
Technical field
The present invention relates to the technical fields of natural language processing and computer information processing, and in particular to a system and method for constructing a knowledge graph.
Background art
A knowledge graph is a form and specification of knowledge organization centered on natural language processing (NLP) that draws on several technologies, including applied mathematics, graphics, and information visualization. Knowledge graphs now have mature applications in many areas of artificial intelligence, such as search engines, chatbots, intelligent healthcare, and smart hardware. Knowledge graphs are divided into domain knowledge graphs and general knowledge graphs; Google introduced the concept of the knowledge graph in 2012. A general knowledge graph emphasizes breadth and rarely has a unified, globally managed ontology layer. Common general knowledge graphs include Freebase, DBpedia, and zhishi.me. A domain knowledge graph is a knowledge base system with a certain depth and completeness, built for a specific field to serve different business scenarios. General and domain knowledge graphs are not mutually exclusive but complementary: combining the breadth of a general knowledge graph with the depth of a domain knowledge graph yields a more complete knowledge graph.
A knowledge graph is an effective way of representing relationships: it links different types of information into a relational network, and semantic understanding and reasoning can then be achieved through relational inference. The basic form of a relationship is the triple <node, relation, node>, which expresses either that two entities stand in a certain relationship or that an entity has a certain attribute. For example: <Zhang San, parent, Li Si>, <Lao Liu, parent, Li Si>, <Zhang San, gender, male>, <Lao Liu, gender, female> => <Zhang San, spouse, Lao Liu>. The four triples state, in order, that Zhang San is a parent of Li Si, that Lao Liu is also a parent of Li Si, that Zhang San has the gender attribute male, and that Lao Liu has the gender attribute female. From these four facts it can be inferred that Zhang San and Lao Liu are spouses.
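The inference in this example can be made concrete with a small sketch. It is not part of the described system. It makes three assumptions: triples are stored as Python tuples, <x, parent, y> means x is a parent of y, and there is one hard-coded hypothetical rule (two people of different genders who share a child are spouses). The names follow the example above, with 老六 rendered as "Lao Liu".

```python
from itertools import combinations
from typing import Dict, Set, Tuple

Triple = Tuple[str, str, str]  # (head, relation, tail)


def infer_spouses(triples: Set[Triple]) -> Set[Triple]:
    """Apply one hypothetical rule: co-parents of different genders are spouses."""
    parents_of: Dict[str, Set[str]] = {}  # child -> set of parents
    gender: Dict[str, str] = {}
    for head, rel, tail in triples:
        if rel == "parent":  # assumed reading: head is a parent of tail
            parents_of.setdefault(tail, set()).add(head)
        elif rel == "gender":
            gender[head] = tail

    derived: Set[Triple] = set()
    for parents in parents_of.values():
        for a, b in combinations(sorted(parents), 2):
            if a in gender and b in gender and gender[a] != gender[b]:
                # "spouse" is symmetric, so both directions are emitted
                derived.add((a, "spouse", b))
                derived.add((b, "spouse", a))
    return derived


facts = {
    ("Zhang San", "parent", "Li Si"),
    ("Lao Liu", "parent", "Li Si"),
    ("Zhang San", "gender", "male"),
    ("Lao Liu", "gender", "female"),
}
print(infer_spouses(facts))
```

A real system would keep such rules in a rule engine or express them as graph queries rather than hard-coding them. The sketch only shows how four stored facts yield a fifth.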
The core step in building a knowledge graph is relation extraction. Existing schemes for building domain knowledge graphs fall mainly into two kinds. The first is top-down: an ontology-based data schema is created first, and relation triples are obtained by mapping high-quality structured data onto the graph. This method is highly reliable but very time-consuming and labor-intensive. It also needs strong domain knowledge as support and generally cannot reach a large data scale. The second is bottom-up: relations are extracted from public datasets by technical means. Public datasets usually contain a small amount of semi-structured data and a large amount of unstructured data. Semi-structured data, such as tables, lists, dictionaries, and infoboxes, is generally handled with wrappers, whose rules are written according to the form in which the data is presented. Relations in unstructured plain text, by contrast, are expressed in many different ways. For example, each of the following four texts expresses that A and B are spouses: 1. A and B got married. 2. A has married B. 3. B married A. 4. A and B are C's father and mother. All four sentences convey a marital relationship; although they share some regular features, they are hard to handle with rules alone. In unstructured text, a relation is often tied to the semantic features of the sentence. Some existing schemes also extract relation triples with rule templates. This approach is accurate and reliable, but its disadvantages are obvious. First, templates must be written by hand, which hinders automation. Second, each template fits only specific sentence patterns. Some schemes add manual rule learning on top of rule-based extraction to generate a new rule set, and then use the new rules to extract unclassified relation patterns. Although this improves the extraction ability of the rules, it cannot be deployed automatically, because the rule-learning stage requires continual manual review, so it is not a good solution. Extracting relations from unstructured plain text to build a knowledge graph has long been a difficult problem.
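The following minimal sketch makes the limitation of rule templates concrete. It uses one hypothetical regular-expression template for the spouse relation, written against the English renderings of the four example sentences above. The template matches only the surface pattern it was written for, so sentences 2 to 4 fall through even though they express the same relation.

```python
import re
from typing import Optional, Tuple

# One hypothetical hand-written template: "<A> and <B> got married."
MARRIED = re.compile(r"^(?P<a>\w+) and (?P<b>\w+) got married\.?$")


def extract_spouse(sentence: str) -> Optional[Tuple[str, str, str]]:
    """Return an <A, spouse, B> triple if the sentence fits the template, else None."""
    m = MARRIED.match(sentence)
    return (m.group("a"), "spouse", m.group("b")) if m else None


examples = [
    "A and B got married.",
    "A has married B.",
    "B married A.",
    "A and B are the father and mother of C.",
]
for s in examples:
    print(f"{s!r} -> {extract_spouse(s)}")
```

Only the first sentence yields a triple. Covering the others would need one more template per phrasing, which is the maintenance burden described above.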
Summary of the invention
To solve the problems in the prior art, embodiments of the present invention provide a knowledge graph construction system and method. The technical solutions are as follows:
In a first aspect, a knowledge graph construction system is provided, including:
A crawler module, configured to crawl text and clean the data;
A basic annotation module, configured to perform basic annotation including a subject completion operation;
A candidate relation extraction module, configured to extract candidate relations, including candidate relation sentences and/or relation entity pairs;
A feature extraction module, configured to perform feature extraction;
A relation classifier training module, configured to train a model on the candidate relation extraction results and the feature extraction results to construct a relation classifier;
A relation audit module, configured to audit the candidate sentence relations obtained by the relation classifier and to adjust the relation classifier according to the audit result.
With reference to the first aspect, in a first possible implementation, the system further includes:
A heuristic rule base, configured to set heuristic rules for relation extraction;
The relation audit module is configured to audit the candidate sentence relations obtained by the relation classifier in combination with the heuristic rules, and to adjust the relation classifier according to the audit result.
With reference to the first possible implementation of the first aspect, in a second possible implementation, the system further includes:
A log analysis module, configured to mine the original logs to obtain the heuristic rules, and/or to mine the results of the relation audit module's audit and update the heuristic rules.
With reference to the first aspect and the first and second possible implementations of the first aspect, in third to fifth possible implementations, the system further includes:
A feature weight update module, configured to update the weights of the relation classifier according to the results of the relation audit module's audit.
With reference to the first aspect, in a sixth possible implementation, the basic annotation module is configured to perform basic annotation including word segmentation, part-of-speech tagging, named entity recognition, syntactic dependency parsing, and the subject completion operation.
With reference to the first aspect, in a seventh possible implementation, the feature extraction module is configured to extract word embedding features based on a neural network language model, lexical-level features based on the co-occurrence sequence between words, and/or syntactic features based on syntactic structure.
With reference to the first aspect and the first, second, sixth, and seventh possible implementations of the first aspect, in eighth to eleventh possible implementations, the subject completion operation includes:
Determining whether the sentence contains a subject;
If it does, determining whether the subject is a pronoun; if so, determining whether the previous sentence contains a subject; if so, determining whether that subject is an entity word; and if so, completing the subject of the sentence with that subject;
If it does not, determining whether the previous sentence contains a subject; if so, determining whether that subject is an entity word; and if so, completing the subject of the sentence with that subject.
With reference to the first aspect and the first, second, sixth, and seventh possible implementations of the first aspect, in twelfth to fifteenth possible implementations, the relation audit module audits candidate relations by means of a voting mechanism and/or manual adjudication.
In a second aspect, a knowledge graph construction method is provided, including:
Crawling text and cleaning the data;
Performing basic annotation including a subject completion operation;
Extracting candidate relations, including candidate relation sentences and/or relation entity pairs;
Performing feature extraction;
Training a model on the candidate relation extraction results and the feature extraction results to construct a relation classifier;
Auditing the candidate sentence relations obtained by the relation classifier, and adjusting the relation classifier according to the audit result.
With reference to the second aspect, in a first possible implementation, the method further includes:
Setting heuristic rules for relation extraction;
Auditing the candidate sentence relations obtained by the relation classifier and adjusting the relation classifier according to the audit result then includes:
Auditing the candidate sentence relations obtained by the relation classifier in combination with the heuristic rules, and adjusting the relation classifier according to the audit result.
With reference to the first possible implementation of the second aspect, in a second possible implementation, the method further includes:
Mining the original logs to obtain the heuristic rules; and/or
Mining the results of the relation audit module's audit and updating the heuristic rules.
With reference to the second aspect and the first and second possible implementations of the second aspect, in third to fifth possible implementations, the method further includes:
Updating the weights of the relation classifier according to the results of the relation audit module's audit.
With reference to the second aspect, in a sixth possible implementation, performing basic annotation including the subject completion operation includes:
Performing basic annotation including word segmentation, part-of-speech tagging, named entity recognition, syntactic dependency parsing, and the subject completion operation.
With reference to the second aspect, in a seventh possible implementation, performing feature extraction includes:
Extracting word embedding features based on a neural network language model, lexical-level features based on the co-occurrence sequence between words, and/or syntactic features based on syntactic structure.
With reference to the second aspect and the first, second, sixth, and seventh possible implementations of the second aspect, in eighth to eleventh possible implementations, the subject completion operation includes:
Determining whether the sentence contains a subject;
If it does, determining whether the subject is a pronoun; if so, determining whether the previous sentence contains a subject; if so, determining whether that subject is an entity word; and if so, completing the subject of the sentence with that subject;
If it does not, determining whether the previous sentence contains a subject; if so, determining whether that subject is an entity word; and if so, completing the subject of the sentence with that subject.
With reference to the second aspect and the first, second, sixth, and seventh possible implementations of the second aspect, in twelfth to fifteenth possible implementations, candidate relations are audited by means of a voting mechanism and/or manual adjudication.
The technical solutions provided by the embodiments of the present invention bring the following beneficial effects. Compared with the prior art, the knowledge graph construction system and method provided by the embodiments of the present invention have the following advantages:
1. A subject completion operation is included in the basic annotation and combined with crawling, the other basic annotations, candidate relation extraction, feature extraction, statistical machine learning training, relation auditing, and other operations. As a result, the system and method have stronger relation extraction ability and make it convenient to deploy, in an automated way, the extraction of relations from unstructured plain text to build a knowledge graph;
2. Annotation that combines a heuristic rule base with statistical machine learning avoids large-scale corpus annotation while still ensuring relatively high precision and recall;
3. Log analysis and weight updating give the system a capability for continuous iterative learning, so its relation extraction ability improves as the data volume grows.
In general, the system and method combine subject completion with statistical machine learning based on a relation classifier, with continuous iterative updating and parameter optimization. This gives stronger relation extraction ability, reduces the cost of manual participation, and improves the efficiency of knowledge graph construction. Because of its stronger relation extraction ability and processing efficiency, the scheme is particularly suitable for building knowledge graphs from unstructured plain text and has good application prospects in fields involving knowledge graphs. It should be noted that the embodiments above focus on providing a practical reference for building company knowledge graphs in the financial domain. In principle, however, the scheme provided by the embodiments of the present invention applies to building a knowledge graph in any domain, and it also offers a new point of reference for building general knowledge graphs.
Description of the drawings
To describe the technical solutions in the embodiments of the present invention more clearly, the drawings required for describing the embodiments are briefly introduced below. The drawings described below show only some embodiments of the present invention; a person of ordinary skill in the art can derive other drawings from them without creative effort.
Fig. 1 is a schematic structural diagram of a knowledge graph construction system provided by an embodiment of the present invention;
Fig. 2 is an example of a dependency structure;
Fig. 3 is an example of the subject completion algorithm;
Fig. 4 is a table of example sentence lexical features;
Fig. 5 is a schematic flowchart of an example of building a knowledge graph through relation extraction according to an embodiment of the present invention;
Fig. 6 is an example of a knowledge graph built by the knowledge graph construction system provided by an embodiment of the present invention;
Fig. 7 is a schematic structural diagram of a knowledge graph construction system provided by an embodiment of the present invention;
Fig. 8 is an example of a heuristic rule set;
Fig. 9 is a schematic diagram of the data processing flow in the system according to an embodiment of the present invention;
Fig. 10 is a flowchart of a knowledge graph construction method provided by an embodiment of the present invention;
Fig. 11 is a flowchart of a knowledge graph construction method provided by an embodiment of the present invention.
Detailed description of the embodiments
To make the objectives, technical solutions, and advantages of the present invention clearer, the technical solutions in the embodiments are described clearly and completely below with reference to the drawings. The described embodiments are only some, not all, of the embodiments of the present invention. All other embodiments obtained by a person of ordinary skill in the art from these embodiments without creative effort fall within the protection scope of the present invention.
The building system and method for knowledge mapping provided in an embodiment of the present invention, by text carry out crawler pretreatment,
Basic mark, candidate relationship extractions, feature extraction, relationship classifier training and relationship, which are audited, constructs knowledge mapping, due to
Subject completion is provided in the mark work of basis to operate, and is realized stronger Relation extraction ability, is then classified with using relationship
The statistical machine learning of device combines, and continuous iteration updates, and Optimal Parameters realize stronger Relation extraction ability, reduce
The cost manually participated in improves the efficiency of building knowledge mapping.Exactly its stronger Relation extraction ability and treatment effeciency, should
Knowledge mapping constructing plan is particularly suitable for handling the knowledge mapping building of non-structured plain text, is being related to knowledge mapping
Field has a good application prospect.
The knowledge graph construction system and method provided by the embodiments of the present invention are further described below with reference to specific embodiments.
Embodiment 1
Fig. 1 is a schematic structural diagram of the knowledge graph construction system provided by an embodiment of the present invention. As shown in Fig. 1, the system includes the following components: a crawler module, a basic annotation module, a candidate relation extraction module, a feature extraction module, a relation classifier training module, and a relation audit module.
The crawler module is configured to crawl text and clean the data. Specifically, the crawler fetches relevant information, and the cleaned text is passed to the basic annotation module.
The basic annotation module is configured to perform basic annotation including the subject completion operation. Specifically, the basic annotation module performs word segmentation (word-seg), part-of-speech tagging (POS), named entity recognition (NER), syntactic dependency parsing (dep-parser), and the subject completion operation.
It should be noted that, besides the steps listed above, the basic annotation performed by the basic annotation module may include any other possible natural language processing (NLP) annotation operation in the prior art; the embodiments of the present invention impose no particular limitation on this.
Illustratively, the basic annotation module first splits the text into sentences at paragraph marks or punctuation. It then processes each sentence in a pipeline, performing word segmentation, part-of-speech tagging, named entity recognition, and dependency parsing in turn.
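A minimal sketch of this split-then-pipeline structure follows. Two things are assumptions made for illustration: the splitting regex (Chinese and ASCII sentence-final punctuation plus line breaks) and the stage interface (each stage takes and returns an annotation dict). In practice the stages would wrap a real segmenter, POS tagger, NER model, and dependency parser. The character-level segmenter below is a hypothetical stand-in.

```python
import re
from typing import Callable, Dict, List

Annotation = Dict[str, object]
Stage = Callable[[Annotation], Annotation]

# Split after sentence-final punctuation (Chinese or ASCII) or at line breaks.
_SENTENCE_BOUNDARY = re.compile(r"(?<=[。！？!?])|\n+")


def split_sentences(text: str) -> List[str]:
    return [s.strip() for s in _SENTENCE_BOUNDARY.split(text) if s.strip()]


def run_pipeline(text: str, stages: List[Stage]) -> List[Annotation]:
    """Run every stage, in order, over each sentence of the text."""
    results = []
    for sentence in split_sentences(text):
        annotation: Annotation = {"sentence": sentence}
        for stage in stages:
            annotation = stage(annotation)
        results.append(annotation)
    return results


# Hypothetical stand-in for a real segmenter: one token per character.
def char_segmenter(a: Annotation) -> Annotation:
    return {**a, "tokens": list(str(a["sentence"]))}


docs = run_pipeline("马云1964年出生。\n他是阿里巴巴集团主要创始人。", [char_segmenter])
print([d["sentence"] for d in docs])
```

Keeping each stage as a function from annotation to annotation lets stages be swapped or reordered without touching the driver loop.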
NER combines dictionaries with a model. The entity recognition model is a CRF trained on data annotated through a crowdsourcing platform, and its output is combined with a domain dictionary to give the final result. Entity words that were split during segmentation are then restored according to the NER result. For example, "Xiaomi Technology" (小米科技) may be segmented into "Xiaomi" and "Technology", but is later reassembled into "Xiaomi Technology" according to the NER result. At this point two lists are obtained: the list of sentence tokens and the dependency structure list of the sentence.
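The restoration step can be sketched as follows. The sketch assumes the NER stage reports entities as token-index spans (start inclusive, end exclusive) over the segmenter's output, and that spans do not overlap. Both the span format and the function name are hypothetical.

```python
from typing import List, Tuple

Span = Tuple[int, int, str]  # hypothetical: (start_token, end_token_exclusive, entity_type)


def merge_entity_tokens(tokens: List[str], spans: List[Span],
                        joiner: str = "") -> Tuple[List[str], List[str]]:
    """Re-join tokens that the segmenter split inside an entity span.

    Returns the merged tokens and a parallel list of labels ("O" for non-entities).
    """
    span_at = {start: (end, label) for start, end, label in spans}
    merged: List[str] = []
    labels: List[str] = []
    i = 0
    while i < len(tokens):
        if i in span_at:
            end, label = span_at[i]
            merged.append(joiner.join(tokens[i:end]))
            labels.append(label)
            i = max(end, i + 1)  # always advance, even for a malformed empty span
        else:
            merged.append(tokens[i])
            labels.append("O")
            i += 1
    return merged, labels


print(merge_entity_tokens(["小米", "科技", "发布", "了", "新", "手机"], [(0, 2, "ORG")]))
```

For Chinese the joiner is the empty string. For space-separated text, `joiner=" "` gives "Xiaomi Technology".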
The dependency structure is a tree rooted at ROOT that shows the dependency relations among the words of a sentence. Fig. 2 is an example of a typical dependency structure, in which ATT denotes an attribute (modifier-head) relation, SBV a subject-verb relation, and VOB a verb-object relation. Dependency analysis can be used to identify the sentence trunk, handle coordination, and so on. In this example, the dependency tree is stored as a list.
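One way to store the dependency tree as a list, and to read the sentence trunk from it, is sketched below. The field layout (1-based token index, head index with 0 for ROOT, relation label) and the HED label on the root arc follow common Chinese dependency-parser conventions, but they are assumptions here, as is the example parse and its POS tags.

```python
from typing import List, NamedTuple, Optional, Tuple


class Arc(NamedTuple):
    idx: int   # 1-based position of the token in the sentence
    word: str
    pos: str
    head: int  # idx of the governing token; 0 means ROOT
    rel: str   # dependency label, e.g. SBV, VOB, ATT, HED


def dependents(arcs: List[Arc], head_idx: int, rel: str) -> List[Arc]:
    return [a for a in arcs if a.head == head_idx and a.rel == rel]


def sentence_trunk(arcs: List[Arc]) -> Tuple[Optional[str], Optional[str], Optional[str]]:
    """Return (subject, predicate, object) of the main clause; missing parts are None."""
    root = next((a for a in arcs if a.head == 0), None)
    if root is None:
        return None, None, None
    subj = dependents(arcs, root.idx, "SBV")
    obj = dependents(arcs, root.idx, "VOB")
    return (subj[0].word if subj else None, root.word, obj[0].word if obj else None)


# Hypothetical parse of "马云 创办 了 阿里巴巴" ("Ma Yun founded Alibaba").
parse = [
    Arc(1, "马云", "nh", 2, "SBV"),
    Arc(2, "创办", "v", 0, "HED"),
    Arc(3, "了", "u", 2, "RAD"),
    Arc(4, "阿里巴巴", "ni", 2, "VOB"),
]
print(sentence_trunk(parse))  # ('马云', '创办', '阿里巴巴')
```

A flat list of arcs is easy to serialize alongside the token list, and parent or child lookups stay simple filters.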
Relation extraction first splits long text into sentences and then extracts candidate entity pairs and their related features. When text is split into sentences, a sentence often lacks a subject or uses a pronoun in place of its subject, even though the sentence itself carries strong relation features. Therefore, after dependency parsing, the subject of the current sentence can be completed based on the dependency information of its context; this is the subject completion operation.
Fig. 3 is an example of the subject completion algorithm and shows the detailed flow of a preferred subject completion algorithm:
First, determine whether the sentence contains a subject.
If it does, determine whether the subject is a pronoun; if so, determine whether the previous sentence contains a subject; if so, determine whether that subject is an entity word; and if so, complete the subject of the sentence with it.
If it does not, determine whether the previous sentence contains a subject; if so, determine whether that subject is an entity word; and if so, complete the subject of the sentence with it.
Subject completion is performed only in the cases above, where the completion conditions are met; in all other cases no completion is performed.
That is, if a sentence lacks a subject or has a pronoun as its subject, the sentence's dependency analysis is combined with the semantic structure of the previous sentence to fill in the subject. Take the text "Ma Yun was born in 1964; he is the main founder of Alibaba Group." The dependency information shows that the subject of the first clause is a named entity (person: Ma Yun). In the second clause, the subject is "he", the predicate is "is", and the object is "the main founder of Alibaba Group", whose modifier contains the entity word "Alibaba Group". The pronoun in the second clause can therefore be replaced with the subject entity word of the first clause, giving "Ma Yun was born in 1964; Ma Yun is the main founder of Alibaba Group."
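A minimal sketch of the completion rule follows. It makes several assumptions:

- Each sentence has already been parsed, so its subject (the SBV dependent of the root, or None) is known.
- Pronouns are identified with a small hypothetical list.
- An entity word is any string in the NER output.

Two further choices are simplifications for illustration: the sketch replaces the first occurrence of the pronoun, and it prepends the antecedent when the subject is missing.

```python
from typing import Optional, Set, TypedDict


class Parsed(TypedDict):
    text: str
    subject: Optional[str]  # SBV dependent of the root, or None if absent


PRONOUNS = {"他", "她", "它", "他们", "她们", "它们", "其"}  # hypothetical list


def complete_subject(cur: Parsed, prev: Optional[Parsed], entities: Set[str]) -> str:
    """Return the sentence text with its subject completed, following the branches of Fig. 3."""
    subj = cur["subject"]
    if subj is not None and subj not in PRONOUNS:
        return cur["text"]  # already has a real subject: no completion
    # Missing or pronoun subject: the previous sentence must supply an entity-word subject.
    if prev is None or prev["subject"] is None or prev["subject"] not in entities:
        return cur["text"]
    antecedent = prev["subject"]
    if subj is None:
        return antecedent + cur["text"]  # simplification: prepend the antecedent
    return cur["text"].replace(subj, antecedent, 1)  # replace the pronoun once


prev: Parsed = {"text": "马云1964年出生，", "subject": "马云"}
cur: Parsed = {"text": "他是阿里巴巴集团主要创始人。", "subject": "他"}
print(complete_subject(cur, prev, {"马云", "阿里巴巴集团"}))  # 马云是阿里巴巴集团主要创始人。
```

A production version would use the dependency positions to place the antecedent rather than string replacement.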
After basic annotation is complete, the processed data is passed to the candidate relation extraction module.
The candidate relation extraction module is configured to extract candidate relations, including candidate relation sentences and/or relation entity pairs. Specifically, it selects candidate sentences that contain relations based on the output of the basic annotation module. The process is roughly as follows. First, check whether the number of entities in the sentence exceeds a threshold. Second, check whether the entity types in the sentence match the entity types required by the relation. A sentence satisfying both conditions is a qualified candidate sentence. When a candidate sentence contains several entities, the Cartesian product of its entities, constrained by the entity types the relation requires, is used to enumerate all candidate relation pairs exhaustively.
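The filtering and Cartesian-product step might look like the sketch below. The relation schema, the entity type names (COMPANY, PERSON), and the entity-list format are hypothetical. The threshold of two entities follows the example given next in the text.

```python
from itertools import product
from typing import Dict, List, Tuple

# Hypothetical relation schema: relation -> (head entity type, tail entity type)
RELATION_SCHEMA: Dict[str, Tuple[str, str]] = {
    "shareholder_of": ("COMPANY", "COMPANY"),
    "founder_of": ("PERSON", "COMPANY"),
}


def candidate_pairs(entities: List[Tuple[str, str]], relation: str,
                    min_entities: int = 2) -> List[Tuple[str, str]]:
    """entities: (mention, type) pairs found in one sentence by NER."""
    if len(entities) < min_entities:
        return []  # condition 1 fails: too few entities
    head_type, tail_type = RELATION_SCHEMA[relation]
    heads = [m for m, t in entities if t == head_type]
    tails = [m for m, t in entities if t == tail_type]
    # Condition 2 holds iff both lists are non-empty; the product enumerates every pair.
    return [(h, t) for h, t in product(heads, tails) if h != t]


ents = [("Ma Yun", "PERSON"), ("Alibaba Group", "COMPANY"), ("Company X", "COMPANY")]
print(candidate_pairs(ents, "founder_of"))
# [('Ma Yun', 'Alibaba Group'), ('Ma Yun', 'Company X')]
print(candidate_pairs(ents, "shareholder_of"))
# [('Alibaba Group', 'Company X'), ('Company X', 'Alibaba Group')]
```

Both directions are kept for same-type relations, because a relation such as shareholding is not symmetric.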
Illustratively, sentences containing two or more entities are selected, and their entity types must satisfy the requirements of the relation currently being extracted. For example, to extract a company-company relation, both entities must be of the company entity type. To extract a company-person relation, a qualifying sentence must contain at least one company entity and one person-name entity.
It should be noted that, besides candidate relation sentences and/or relation entity pairs, the candidate relation data extracted here may include any other possible type of candidate relation extraction in the prior art; the embodiments of the present invention impose no particular limitation on this.
After processing by the candidate relation extraction module, the data is passed to the feature extraction module, which performs feature extraction. Specifically, the feature extraction module extracts three kinds of features: word embedding features based on a neural network language model, lexical-level features based on the co-occurrence sequence between words, and/or syntactic features based on syntactic structure.

A word embedding represents the semantic information of a word, in a distributed manner, as a dense, low-dimensional, real-valued vector. The word embedding feature uses pre-trained word2vec vectors. It exploits the translation invariance of distributed word vectors in the vector space to compute the cosine distance between the embedding vectors of the two entity words.

Fig. 4 is a table of example sentence lexical features and illustrates the lexical-level features.

Syntactic features are sentence structure features based on dependency analysis and part of speech. Examples are the word D1 on which entity word c1 depends, the word D2 on which entity word c2 depends, the part of speech POSD1 of D1, and the part of speech POSD2 of D2. In addition, once the token sequence of the sentence and the dependency information of each word have been obtained, the feature extraction module extracts sentence context features. Examples are the verbs between the two entities, the word preceding the first entity, and the word following the second entity.
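A sketch of three of the feature families described above follows. It computes the cosine distance between two entity embeddings, the dependency-head features D1/D2 with their parts of speech, and the context features. The embeddings are passed in as plain lists; in practice they would come from a pre-trained word2vec model. The 0-based head list with -1 for ROOT, the "v" verb tag, the feature names, and the example parse are assumptions.

```python
import math
from typing import Dict, List, Sequence


def cosine_distance(u: Sequence[float], v: Sequence[float]) -> float:
    """1 - cosine similarity of two embedding vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    if norm == 0:
        raise ValueError("zero vector has no direction")
    return 1.0 - dot / norm


def pair_features(tokens: List[str], pos: List[str], heads: List[int],
                  e1: int, e2: int) -> Dict[str, object]:
    """tokens/pos/heads are parallel lists; heads[i] is the 0-based head index, -1 for ROOT.
    e1 < e2 are the token indices of the two entity mentions."""
    d1, d2 = heads[e1], heads[e2]
    return {
        # syntactic features: the word each entity depends on, and its POS
        "D1": tokens[d1] if d1 >= 0 else "<ROOT>",
        "POSD1": pos[d1] if d1 >= 0 else "<ROOT>",
        "D2": tokens[d2] if d2 >= 0 else "<ROOT>",
        "POSD2": pos[d2] if d2 >= 0 else "<ROOT>",
        # context features
        "verbs_between": [t for t, p in zip(tokens[e1 + 1:e2], pos[e1 + 1:e2]) if p == "v"],
        "word_before_e1": tokens[e1 - 1] if e1 > 0 else "<S>",
        "word_after_e2": tokens[e2 + 1] if e2 + 1 < len(tokens) else "</S>",
    }


# Hypothetical parse of "据悉 马云 创办 了 阿里巴巴集团 。"
toks = ["据悉", "马云", "创办", "了", "阿里巴巴集团", "。"]
tags = ["v", "nh", "v", "u", "ni", "wp"]
heads = [2, 2, -1, 2, 2, 2]
print(pair_features(toks, tags, heads, e1=1, e2=4))
```

The resulting dict can be flattened into "name=value" strings for a discrete classifier such as the Bayes classifier described next.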
Next, the relation classifier training module trains a model on the candidate relation extraction results and the feature extraction results to construct a relation classifier. The relation classifier here is preferably a Bayes classifier. The classifier can be built in either of the following two ways:
Way 1: first collect a small number of entity relation examples, use the crawler to fetch related text in a targeted manner, manually annotate a small number of samples, and pre-train a relation extraction model; then train a model on the candidate relation extraction results and the feature extraction results to construct the relation classifier;
Way 2: directly train a model on the candidate relation extraction results and the feature extraction results to construct the relation classifier.
Illustratively, when the classifier is built in Way 1, a small number of company-company and company-person relation pairs, together with sentences containing these relations, are sorted out manually as training examples. This requires a small amount of manual annotation, but only once, as a preparatory pre-training step rather than an ongoing effort. The small manually annotated dataset is used to initialize the feature values. Building the classifier roughly comprises the following steps:
A) convert the dataset into a frequency table;
B) create a likelihood table giving the probability that the relation holds under different features;
C) compute the score that the relation holds using Bayes' formula.
Note that in this scheme each classifier decides only the positive or negative class of a single relation; multiple relations can be handled by building several classifiers in parallel.
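Steps A) to C) correspond to a naive Bayes classifier over discrete features. The sketch below is one binary classifier for a single relation, as the note above describes. The multinomial event model, add-alpha (Laplace) smoothing, log-space computation, and the toy feature strings are assumptions rather than details given in the source.

```python
import math
from collections import Counter
from typing import Dict, List, Sequence


class BinaryRelationNB:
    """Naive Bayes for one relation: label 1 = relation holds, 0 = it does not."""

    def __init__(self, alpha: float = 1.0):
        self.alpha = alpha  # add-alpha smoothing for unseen feature values

    def fit(self, samples: List[Sequence[str]], labels: List[int]) -> "BinaryRelationNB":
        self.class_count = Counter(labels)
        if self.class_count[0] == 0 or self.class_count[1] == 0:
            raise ValueError("need at least one positive and one negative example")
        # A) frequency table: how often each feature value occurs in each class
        self.freq: Dict[int, Counter] = {0: Counter(), 1: Counter()}
        for feats, y in zip(samples, labels):
            self.freq[y].update(feats)
        self.vocab_size = len(set(self.freq[0]) | set(self.freq[1]))
        return self

    def _log_likelihood(self, feat: str, y: int) -> float:
        # B) likelihood table entry P(feature | class), smoothed
        total = sum(self.freq[y].values())
        return math.log((self.freq[y][feat] + self.alpha) /
                        (total + self.alpha * self.vocab_size))

    def score(self, feats: Sequence[str]) -> float:
        # C) Bayes' formula: P(relation holds | features), normalized over both classes
        n = sum(self.class_count.values())
        log_post = {
            y: math.log(self.class_count[y] / n) + sum(self._log_likelihood(f, y) for f in feats)
            for y in (0, 1)
        }
        m = max(log_post.values())
        weights = {y: math.exp(v - m) for y, v in log_post.items()}
        return weights[1] / (weights[0] + weights[1])


clf = BinaryRelationNB().fit(
    [["verb=found"], ["verb=establish"], ["verb=found", "kw=founder"], ["verb=visit"], ["verb=tour"]],
    [1, 1, 1, 0, 0],
)
print(round(clf.score(["verb=found"]), 3))  # 0.778
print(round(clf.score(["verb=visit"]), 3))  # 0.368
```

Working in log space avoids underflow when many features are multiplied. Subtracting the maximum before exponentiating keeps the normalization numerically stable.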
The candidate sentence relations obtained by the relation classifier are passed to the relation audit module. The module audits them to obtain the results that meet the audit conditions, then adjusts the relation classifier accordingly to optimize it.
The relation classifier optimized through this auditing produces a series of relation results. The relation entity pairs are stored in a relational database as the basic knowledge carrier of the knowledge graph, supporting high-level interface queries as well as knowledge processing and reasoning. At this point the system has completed the construction of the knowledge graph. Illustratively, the extracted relation triples are finally stored in a database to establish a basic data platform: the neo4j graph database is selected, results are written into it automatically using the Cypher graph query language, and an upper-layer query interface is provided.
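Writing a triple into neo4j can be sketched as generating a parameterized Cypher MERGE statement. Node names are passed as query parameters. The label and relationship type are interpolated into the query text, because Cypher does not accept them as parameters, so they are validated first. The default Entity label, the name property, the upper-casing of relation names, and the ASCII-only identifier rule are assumptions. The resulting pair could be executed with a driver, for example `session.run(query, params)` in the official neo4j Python driver.

```python
import re
from typing import Dict, Tuple

_IDENTIFIER = re.compile(r"^[A-Za-z_][A-Za-z0-9_]*$")


def triple_to_cypher(head: str, relation: str, tail: str,
                     head_label: str = "Entity",
                     tail_label: str = "Entity") -> Tuple[str, Dict[str, str]]:
    """Return (query, params) that idempotently store one relation triple."""
    rel_type = relation.upper()
    for ident in (head_label, tail_label, rel_type):
        # labels and relationship types cannot be query parameters, so validate them
        if not _IDENTIFIER.match(ident):
            raise ValueError(f"unsafe Cypher identifier: {ident!r}")
    query = (
        f"MERGE (h:{head_label} {{name: $head}}) "
        f"MERGE (t:{tail_label} {{name: $tail}}) "
        f"MERGE (h)-[:{rel_type}]->(t)"
    )
    return query, {"head": head, "tail": tail}


q, params = triple_to_cypher("Company A", "shareholder_of", "Company B", "Company", "Company")
print(q)
# MERGE (h:Company {name: $head}) MERGE (t:Company {name: $tail}) MERGE (h)-[:SHAREHOLDER_OF]->(t)
```

Using MERGE rather than CREATE means re-running extraction over the same text does not duplicate nodes or edges.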
Fig. 5 is a schematic flowchart of an example of building a knowledge graph through relation extraction according to an embodiment of the present invention. It shows how the knowledge graph construction system derives the final knowledge graph from plain text. Fig. 6 is an example of a knowledge graph built by the knowledge graph construction system provided by an embodiment of the present invention, showing a knowledge graph of shareholder relations.
Embodiment 2
Fig. 7 is a schematic structural diagram of the knowledge graph construction system provided by Embodiment 2 of the present invention. As shown in Fig. 7, the system includes the following components: a crawler module, an NLP (natural language processing) basic annotation module, a candidate relation extraction module, a feature extraction module, a relation classifier training module, a heuristic rule base, a relation audit module, a log analysis module, and a feature weight module.
The crawler module, the NLP basic annotation module, the candidate relation extraction module, the feature extraction module, and the relation classifier training module are the same as the corresponding modules described in Embodiment 1 and are not described again.
The heuristic rule base is configured to set heuristic rules for relation extraction.
Specifically, the heuristic rules can be set manually, for example by compiling a heuristic rule set by hand from domain knowledge. They can also be summarized automatically by mining the original logs, for example by performing sequential pattern mining on all labeled sentences in the logs and producing heuristic rules automatically with a suitable algorithm. Fig. 8 shows an example of a heuristic rule set.
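The sequential mining mentioned above can be sketched with PrefixSpan, the algorithm named in the log-analysis paragraph below. This minimal version assumes each labeled log sentence has been turned into a token sequence with entity mentions replaced by type placeholders such as <PER> and <ORG>; that preprocessing is an assumption for illustration. It also treats each sequence element as a single item. The frequent patterns it finds are only rule candidates; the sketch does not cover how they become rules or any review step.

```python
from collections import defaultdict
from typing import Dict, List, Sequence, Tuple

Pattern = Tuple[str, ...]


def prefixspan(sequences: Sequence[Sequence[str]], min_support: int,
               max_len: int = 5) -> Dict[Pattern, int]:
    """Return every frequent sequential pattern (up to max_len items) with its support."""
    results: Dict[Pattern, int] = {}

    def project(db: List[List[str]], item: str) -> List[List[str]]:
        # For each sequence containing `item`, keep the suffix after its first occurrence.
        return [seq[seq.index(item) + 1:] for seq in db if item in seq]

    def mine(prefix: Pattern, db: List[List[str]]) -> None:
        if len(prefix) >= max_len:
            return
        support: Dict[str, int] = defaultdict(int)
        for seq in db:
            for item in set(seq):  # count each sequence at most once per item
                support[item] += 1
        for item, count in support.items():
            if count >= min_support:
                pattern = prefix + (item,)
                results[pattern] = count
                mine(pattern, project(db, item))

    mine((), [list(s) for s in sequences])
    return results


log_sentences = [
    ["<PER>", "founded", "<ORG>"],
    ["<PER>", "in", "2004", "founded", "<ORG>"],
    ["<PER>", "visited", "<ORG>"],
]
patterns = prefixspan(log_sentences, min_support=2)
print(patterns[("<PER>", "founded", "<ORG>")])  # 2
```

The pattern "<PER> ... founded ... <ORG>" survives with support 2, while "visited" occurs only once and is pruned.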
The relation auditing module makes an audit determination by combining two inputs: the candidate sentence relations obtained by the relation classifier and the heuristic rules of the heuristic rule base. It then adjusts the relation classifier according to the result of the audit, which optimizes the classifier. The audit determination may be carried out as follows:
The heuristic rules and the classifier are applied to the same sentence simultaneously, and an arbitration mechanism determines the final relation from their results. The arbitration may use a voting method, manual adjudication, or a combination of both.
Illustratively, the relation classifier and the heuristic rules each score and vote on the candidate entity pairs in a new, unclassified sentence:
- The classifier casts a positive vote (+1) if its classification score (classify_score) exceeds a certain threshold, and a negative vote (-1) otherwise.
- The rule mechanism casts a positive vote if a rule is satisfied, and a negative vote otherwise.
All votes are then added together. If the combined vote of the two mechanisms is 0, the relation auditing module makes the final decision by audit. In other words, when the heuristic rules and the classifier both give a judgment, the ruling is made by voting. If the conflict cannot be resolved this way, the sentence is flagged in the log analysis module and awaits manual adjudication.
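The voting procedure above can be sketched in Python. The sketch rests on two assumptions that the text does not fix:
- The rule mechanism casts a single vote per candidate: +1 if any rule for the relation fires, -1 otherwise. With the classifier, there are then exactly two voters, and a total of 0 means they disagree.
- The threshold is a free parameter.

The names `arbitrate`, `Verdict` and `rule_hits` are hypothetical.

```python
from enum import Enum
from typing import Sequence


class Verdict(Enum):
    ACCEPT = "accept"  # both voters agree that the relation holds
    REJECT = "reject"  # both voters agree that it does not
    AUDIT = "audit"    # tie: final decision by the relation auditing module / manual ruling


def classifier_vote(classify_score: float, threshold: float) -> int:
    """+1 if classify_score exceeds the threshold, otherwise -1."""
    return 1 if classify_score > threshold else -1


def rule_vote(rule_hits: Sequence[bool]) -> int:
    """+1 if at least one heuristic rule for this relation fires, otherwise -1."""
    return 1 if any(rule_hits) else -1


def arbitrate(classify_score: float, threshold: float, rule_hits: Sequence[bool]) -> Verdict:
    """Add the two votes; a zero total means the voters conflict."""
    total = classifier_vote(classify_score, threshold) + rule_vote(rule_hits)
    if total > 0:
        return Verdict.ACCEPT
    if total < 0:
        return Verdict.REJECT
    return Verdict.AUDIT
```

With two voters the total can only be -2, 0 or +2. `Verdict.AUDIT` therefore corresponds exactly to the conflicting cases, which the text routes to the relation auditing module and, if still unresolved, to manual ruling via the log analysis module. Weighted votes or one vote per rule would be straightforward variations.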
The log analysis module has two mining tasks:
- As mentioned above, it mines the original logs to obtain the heuristic rules of the heuristic rule base.
- It mines the results determined by the audit of the relation auditing module in order to update the heuristic rules.

The log analysis module mainly presents classifier scores and errors visually. Instructive manual rules are sorted out according to common error types, from which heuristic rules are mined to improve accuracy. The log mining process may be carried out in any of the following ways, and the present invention does not particularly limit the method used:
- with the PrefixSpan algorithm;
- automatically in combination with clustering;
- manually.
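PrefixSpan mines frequent sequential patterns by repeatedly extending a prefix and projecting the database onto the suffixes that follow it. The sketch below is a minimal version for sequences whose elements are single tokens. This is assumed here to be sufficient for tokenized labeled sentences in the log. The placeholder tokens used in the example below, such as `<HEAD>` and `<PCT>`, are hypothetical. Turning the mined patterns into heuristic rules is not shown.

```python
from collections import Counter
from typing import Dict, List, Sequence, Tuple

Pattern = Tuple[str, ...]


def prefixspan(sequences: Sequence[Sequence[str]], min_support: int) -> Dict[Pattern, int]:
    """Mine all frequent sequential patterns (one item per element) with PrefixSpan.

    Support is the number of sequences that contain the pattern as a subsequence.
    """
    patterns: Dict[Pattern, int] = {}

    def mine(prefix: Pattern, projected: List[Tuple[int, int]]) -> None:
        # projected holds (sequence index, start position) pairs; each pair stands
        # for the suffix of that sequence that follows the current prefix.
        counts: Counter = Counter()
        for idx, start in projected:
            counts.update(set(sequences[idx][start:]))  # count each sequence at most once
        for item in sorted(counts):
            if counts[item] < min_support:
                continue
            new_prefix = prefix + (item,)
            patterns[new_prefix] = counts[item]
            # Project the database on the new prefix: keep the part of each suffix
            # after the first occurrence of `item`.
            new_projected = []
            for idx, start in projected:
                seq = sequences[idx]
                for pos in range(start, len(seq)):
                    if seq[pos] == item:
                        new_projected.append((idx, pos + 1))
                        break
            mine(new_prefix, new_projected)

    mine((), [(idx, 0) for idx in range(len(sequences))])
    return patterns
```

For example, take three logged sentences, tokenized as `<HEAD> holds <PCT> of <TAIL>`, `<HEAD> holds <PCT> stake in <TAIL>` and `<HEAD> acquired <PCT> of <TAIL>`. With `min_support=3`, the result contains the pattern `("<HEAD>", "<PCT>", "<TAIL>")` with support 3, which is a candidate template for a shareholding rule.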
After the relation audit, the feature weight update module can be triggered at the same time as the log analysis module. The feature weight update module updates the weights of the relation classifier according to the results determined by the audit of the relation auditing module.
Illustratively, the feature weight update module proceeds as follows:
- It recalculates the weights of the existing features from the labeled sentences and inputs them into the relation classifier, which updates the classifier's recognition capability.
- The candidate relation sentences whose relations have been adjudicated effectively supply the features needed to update the classifier's weights, and these updates are fed back to the relation classifier.

This feedback realizes iterative learning and improves the classifier's accuracy. As a result, the whole system learns iteratively and automatically, and its recognition capability grows stronger as the volume of data increases.
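The text does not specify the relation classifier's model or its update rule. The sketch below shows one possible realization. It assumes a binary logistic-regression classifier over sparse string features, updated by one stochastic-gradient step per audited sentence. The label is 1 if the audit confirmed the relation and 0 if it rejected it. The class name, learning rate and feature strings are hypothetical.

```python
import math
from typing import Dict, Iterable


def _sigmoid(z: float) -> float:
    # Numerically stable logistic function (avoids overflow in math.exp).
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    ez = math.exp(z)
    return ez / (1.0 + ez)


class OnlineRelationClassifier:
    """Hypothetical binary classifier for one relation type: sparse logistic regression."""

    def __init__(self, learning_rate: float = 0.1) -> None:
        self.learning_rate = learning_rate
        self.weights: Dict[str, float] = {}
        self.bias = 0.0

    def classify_score(self, features: Iterable[str]) -> float:
        """Probability that the candidate sentence expresses the relation."""
        z = self.bias + sum(self.weights.get(f, 0.0) for f in set(features))
        return _sigmoid(z)

    def update(self, features: Iterable[str], label: int) -> None:
        """One stochastic gradient step on the log loss for an audited example.

        label is 1 if the audit confirmed the relation and 0 if it rejected it.
        """
        feats = set(features)
        error = label - self.classify_score(feats)
        for f in feats:
            self.weights[f] = self.weights.get(f, 0.0) + self.learning_rate * error
        self.bias += self.learning_rate * error
```

Each audited sentence triggers a small update rather than a full retrain, so this kind of online learner fits the iterative feedback loop described above. Periodically retraining in batch on all labeled sentences would be an equally valid reading of the text.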
It should be noted that two processes follow the audit: the update iteration performed by the feature weight update module, and the log analysis mining performed by the log analysis module. These may run simultaneously as described above, or sequentially in either order:
- the feature weight update module first performs the update iteration, and the log analysis module then performs the log analysis mining; or
- the log analysis module first performs the log analysis mining, and the feature weight update module then performs the update iteration.

The embodiments of the present invention do not particularly limit this.
Through the above process, a series of relation results is finally obtained. The relation entities are stored in a relation database as the basic knowledge carrier of the knowledge graph. This database supports queries through upper-level interfaces as well as knowledge processing and reasoning. At this point the system has completed the construction of the knowledge graph.
Illustratively, the extracted relation triples are finally stored in this database to establish a basic data platform:
- a Neo4j graph database is selected;
- the results are written into the database automatically using the Cypher graph query language;
- a query interface supporting the upper layer is established.

Fig. 9 is a schematic diagram of the data processing flow in the system according to an embodiment of the present invention; it shows the data processing flow executed by the above modules. Returning to Fig. 5 and Fig. 6:
- Fig. 5 is a schematic flow diagram of an example of building a knowledge graph by relation extraction according to an embodiment of the present invention. It shows how the knowledge graph construction system obtains the final knowledge graph from plain text.
- Fig. 6 is an example knowledge graph built by the knowledge graph construction system provided by an embodiment of the present invention. It shows company ownership and membership relations.
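The storage step can be sketched as building one parameterized Cypher statement per extracted triple. The sketch relies on several hypothetical choices: the node label `Entity`, the `name` property, and the mapping of relation labels to upper-case relationship types. `MERGE` is used so that an entity appearing in several triples is stored as a single node. Cypher does not accept a relationship type as a query parameter, so the type is validated before it is written into the query text.

```python
import re
from typing import Dict, Tuple

# Relationship types accepted by this sketch: plain ASCII identifiers only.
_REL_TYPE = re.compile(r"[A-Za-z_][A-Za-z0-9_]*")


def triple_to_cypher(head: str, relation: str, tail: str) -> Tuple[str, Dict[str, str]]:
    """Build a parameterized Cypher MERGE statement for one (head, relation, tail) triple."""
    # The relationship type cannot be a $parameter in Cypher, so it is interpolated
    # into the query text; validate it first to prevent query injection.
    rel_type = relation.upper()
    if not _REL_TYPE.fullmatch(rel_type):
        raise ValueError(f"invalid relationship type: {relation!r}")
    query = (
        "MERGE (h:Entity {name: $head}) "
        "MERGE (t:Entity {name: $tail}) "
        f"MERGE (h)-[:{rel_type}]->(t)"
    )
    # Entity names travel as parameters, so quotes or brackets in them are harmless.
    return query, {"head": head, "tail": tail}
```

With the official `neo4j` Python driver, the returned pair could be executed as `session.run(query, params)`. Connection settings depend on the deployment and are omitted. Non-ASCII relation names would need backtick quoting, which this sketch rejects rather than handles.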
It is worth noting that the specific process by which the above modules perform their operations may also be implemented in ways other than those described above. The embodiments of the present invention do not limit the specific manner.
Embodiment 3
Fig. 10 is a flowchart of a knowledge graph construction method provided by an embodiment of the present invention. As shown in Fig. 10, the knowledge graph construction method provided by this embodiment includes the following steps:
301. Crawl text and clean the data.
302. Perform basic labeling work, including a subject completion operation.
303. Extract candidate relations, including candidate relation sentences and/or relation entity pairs.
304. Perform feature extraction.
305. Train a model on the candidate relation extraction results and the feature extraction results to construct a relation classifier.
306. Audit the candidate sentence relations obtained by the relation classifier, and adjust the relation classifier according to the result of the audit.
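Step 301 does not state which cleaning rules are applied. The minimal sketch below assumes that the crawler returns raw HTML pages. It performs these operations:
- strips scripts, styles and tags;
- decodes HTML entities;
- normalizes whitespace;
- splits the text into sentences for the later labeling steps.

The regular expressions and function names are illustrative assumptions.

```python
import html
import re
from typing import List

_SCRIPT_STYLE = re.compile(r"<(script|style)\b.*?</\1\s*>", re.IGNORECASE | re.DOTALL)
_TAG = re.compile(r"<[^>]+>")
_WHITESPACE = re.compile(r"\s+")
# Split after Chinese/English sentence-final marks; "." is deliberately excluded
# because it also ends abbreviations such as "Ltd." in company names.
_SENTENCE_END = re.compile(r"(?<=[。！？!?])\s*")


def clean_html(raw: str) -> str:
    """Strip scripts, styles and tags from a crawled page and normalize whitespace."""
    text = _SCRIPT_STYLE.sub(" ", raw)
    text = _TAG.sub(" ", text)
    text = html.unescape(text)  # e.g. "&amp;" -> "&"
    return _WHITESPACE.sub(" ", text).strip()


def split_sentences(text: str) -> List[str]:
    """Split cleaned text into sentences, dropping empty fragments (Python 3.7+)."""
    return [s.strip() for s in _SENTENCE_END.split(text) if s.strip()]
```

For example, a page containing `<p>甲公司持有乙公司30%的股份。</p><p>A&amp;B Ltd. was founded in 2001!</p>` yields two sentences:
- `甲公司持有乙公司30%的股份。` ("Company A holds 30% of the shares of Company B.")
- `A&B Ltd. was founded in 2001!`

A production crawler would normally use a proper HTML parser rather than regular expressions.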
Embodiment 4
Fig. 11 is a flowchart of a knowledge graph construction method provided by an embodiment of the present invention. As shown in Fig. 11, the knowledge graph construction method provided by this embodiment includes the following steps:
401. Crawl text and clean the data.
402. Perform basic labeling work: word segmentation, part-of-speech tagging, named entity recognition, dependency parsing and the subject completion operation.
Specifically, the subject completion operation is:
- Judge whether the sentence contains a subject.
- If so, judge whether the subject is a pronoun. If it is, judge whether the previous sentence contains a subject. If it does, judge whether that subject is an entity word. If it is, complete the subject of the sentence with that subject.
- If not (the sentence contains no subject), judge whether the previous sentence contains a subject. If it does, judge whether that subject is an entity word. If it is, complete the subject of the sentence with that subject.
403. Extract candidate relations, including candidate relation sentences and/or relation entity pairs.
404. Extract word embedding features based on a neural network language model, lexical features based on the co-occurrence sequence between words, and/or grammatical features based on syntactic structure.
405. Train a model on the candidate relation extraction results and the feature extraction results to construct the relation classifier.
406. Mine the original logs to obtain the heuristic rules.
407. Audit the candidate sentence relations obtained by the relation classifier against the heuristic rules, and adjust the relation classifier according to the result of the audit.
Specifically, the candidate relation audit determination is made by a voting mechanism and/or manual adjudication.
408. Mine the results determined by the audit of the relation auditing module to update the heuristic rules.
409. Update the weights of the relation classifier according to the results determined by the audit of the relation auditing module.
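The subject completion decisions of step 402 can be written as a small function. The sketch uses several hypothetical simplifications:
- a sentence is represented as its tokens plus the index of the subject token, as a dependency parser might supply;
- the pronoun and entity-word sets are given;
- a missing subject is inserted at the start of the sentence.

Sentences are processed in order, so a completed sentence serves as the "previous sentence" for the next one.

```python
from dataclasses import dataclass
from typing import AbstractSet, List, Optional, Sequence, Tuple


@dataclass(frozen=True)
class Sentence:
    tokens: Tuple[str, ...]       # word-segmented sentence
    subject_index: Optional[int]  # position of the subject token; None if no subject

    @property
    def subject(self) -> Optional[str]:
        return None if self.subject_index is None else self.tokens[self.subject_index]


def complete_subject(sent: Sentence, prev: Optional[Sentence],
                     pronouns: AbstractSet[str], entities: AbstractSet[str]) -> Sentence:
    """Apply the subject-completion decisions of step 402 to one sentence."""
    # The sentence has a subject that is not a pronoun: keep it unchanged.
    if sent.subject is not None and sent.subject not in pronouns:
        return sent
    # Pronoun subject or no subject: the previous sentence must have a subject,
    # and that subject must be an entity word.
    if prev is None or prev.subject is None or prev.subject not in entities:
        return sent
    antecedent = prev.subject
    if sent.subject_index is not None:
        # Pronoun subject: replace the pronoun with the antecedent entity.
        tokens = list(sent.tokens)
        tokens[sent.subject_index] = antecedent
        return Sentence(tuple(tokens), sent.subject_index)
    # No subject: insert the antecedent entity at the start of the sentence.
    return Sentence((antecedent,) + sent.tokens, 0)


def complete_document(sentences: Sequence[Sentence], pronouns: AbstractSet[str],
                      entities: AbstractSet[str]) -> List[Sentence]:
    """Process sentences in order, using each completed sentence as the next one's context."""
    completed: List[Sentence] = []
    for sent in sentences:
        prev = completed[-1] if completed else None
        completed.append(complete_subject(sent, prev, pronouns, entities))
    return completed
```

Consider the sequence "Acme acquired Beta. It holds 35% of Gamma. Owns Delta." with pronoun set `{"It"}` and entity set `{"Acme"}`. The pronoun subject of the second sentence and the missing subject of the third are both completed to `Acme`.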
It is worth noting that the process of steps 401-409 may also be implemented in ways other than those described in the above steps. The embodiments of the present invention do not limit the specific manner.
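Steps 403 and 404 can be illustrated together. The sketch below assumes that named entity recognition has marked single-token entity mentions, and that the dependency parse is given as a list of head indices (-1 for the root). It does four things:
- enumerates candidate entity pairs;
- derives lexical features from the words between the two entities;
- derives a syntactic feature from the words on the dependency path linking them;
- averages word vectors as a stand-in for the neural-language-model word embedding features.

The feature string formats, function names and toy embedding table are hypothetical, and no trained embedding model is included.

```python
from collections import deque
from itertools import combinations
from typing import Dict, List, Mapping, Optional, Sequence, Set, Tuple


def candidate_pairs(entity_positions: Sequence[int]) -> List[Tuple[int, int]]:
    """Step 403 (sketch): each unordered pair of entity mentions, ordered left to right."""
    return list(combinations(sorted(entity_positions), 2))


def dependency_path(heads: Sequence[int], src: int, dst: int) -> List[int]:
    """Token indices on the shortest path from src to dst in the dependency tree (BFS).

    heads[i] is the index of token i's head, or -1 for the root; the parse is
    assumed to be a well-formed tree, so a path always exists.
    """
    graph: Dict[int, List[int]] = {i: [] for i in range(len(heads))}
    for child, head in enumerate(heads):
        if head >= 0:
            graph[child].append(head)
            graph[head].append(child)
    parent: Dict[int, Optional[int]] = {src: None}
    queue = deque([src])
    while queue:
        node = queue.popleft()
        if node == dst:
            break
        for nxt in graph[node]:
            if nxt not in parent:
                parent[nxt] = node
                queue.append(nxt)
    path = [dst]
    while parent[path[-1]] is not None:
        path.append(parent[path[-1]])
    return path[::-1]


def pair_features(tokens: Sequence[str], heads: Sequence[int], e1: int, e2: int) -> Set[str]:
    """Step 404 (sketch): lexical and syntactic features for the entity pair (e1 < e2)."""
    between = tokens[e1 + 1:e2]
    feats = {"between=" + "_".join(between)}   # whole word sequence between the entities
    feats.update("bw=" + w for w in between)   # bag of in-between words
    path = dependency_path(heads, e1, e2)
    feats.add("dep_path=" + "_".join(tokens[i] for i in path[1:-1]))  # words on the tree path
    return feats


def mean_embedding(words: Sequence[str], embeddings: Mapping[str, Sequence[float]],
                   dim: int) -> List[float]:
    """Average the vectors of known words; a stand-in for neural-LM embedding features."""
    vectors = [embeddings[w] for w in words if w in embeddings]
    if not vectors:
        return [0.0] * dim
    return [sum(v[k] for v in vectors) / len(vectors) for k in range(dim)]
```

Take the hand-parsed sentence `Acme , a bank , holds shares of Beta` with heads `[5, 0, 3, 0, 0, -1, 5, 6, 7]`. Its dependency-path feature is `dep_path=holds_shares_of`, which skips the appositive `a bank` that the in-between word sequence still contains. This is the usual motivation for adding syntactic features to purely lexical ones.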
All of the above optional technical solutions may be combined in any way to form optional embodiments of the present invention, which are not described one by one here.
It should be noted that, when the knowledge graph construction system provided by the above embodiment triggers the knowledge graph construction service, the division into the functional modules described above is only an example. In practical applications, the above functions may be assigned to different functional modules as needed. That is, the internal structure of the system may be divided into different functional modules to complete all or part of the functions described above. In addition, the knowledge graph construction method and the knowledge graph construction system provided by the above embodiments belong to the same concept. For the specific implementation process, refer to the system embodiments; it is not repeated here.
Compared with the prior art, the knowledge graph construction system and method provided by the embodiments of the present invention have the following beneficial effects:
1. A subject completion operation is provided in the basic labeling work and combined with other operations: crawling, other basic labeling, candidate relation extraction, feature extraction, statistical machine learning training and relation auditing. As a result, the knowledge graph construction system and method have stronger relation extraction capability, and building a knowledge graph by extracting relations from unstructured plain text can be deployed conveniently and automatically.
2. The labeling approach combines a heuristic rule base with statistical machine learning. This avoids large-scale corpus annotation while still ensuring relatively high precision and recall.
3. Log analysis and weight updating give the system a continuous iterative learning capability, so its relation extraction improves as the volume of data increases.
In general, the knowledge graph construction system and method provided by the embodiments of the present invention combine subject completion with statistical machine learning using a relation classifier, and continuously update iteratively and optimize parameters. This achieves stronger relation extraction capability, reduces the cost of manual participation and improves the efficiency of building knowledge graphs. Because of this relation extraction capability and processing efficiency, the knowledge graph construction scheme is particularly suitable for building knowledge graphs from unstructured plain text, and it has good application prospects in fields involving knowledge graphs. It should be noted that the above embodiments focus on building company knowledge graphs in the financial field as a practical reference. In principle, however, the scheme provided by the embodiments of the present invention is applicable to building a knowledge graph for any domain, and it also offers a new reference for building general knowledge graphs.
Those of ordinary skill in the art will appreciate that all or part of the steps of the above embodiments may be implemented by hardware, or by a program instructing the relevant hardware. The program may be stored in a computer-readable storage medium, such as a read-only memory, a magnetic disk or an optical disc.
Those skilled in the art should understand that the embodiments of the present application may be provided as a method, a system or a computer program product. Therefore, the embodiments of the present application may take the form of an entirely hardware embodiment, an entirely software embodiment, or an embodiment combining software and hardware aspects. Moreover, the embodiments of the present application may take the form of a computer program product implemented on one or more computer-usable storage media containing computer-usable program code (including but not limited to magnetic disk storage, CD-ROM and optical storage).
The embodiments of the present application are described with reference to flowcharts and/or block diagrams of the method, device (system) and computer program product according to the embodiments of the present application. It should be understood that computer program instructions can implement each flow and/or block in the flowcharts and/or block diagrams, as well as combinations of flows and/or blocks in them. These computer program instructions may be provided to a processor of a general-purpose computer, special-purpose computer, embedded processor or other programmable data processing device to produce a machine. The instructions executed by that processor then produce an apparatus for implementing the functions specified in one or more flows of the flowcharts and/or one or more blocks of the block diagrams.
These computer program instructions may also be stored in a computer-readable memory that can direct a computer or other programmable data processing device to operate in a particular manner. The instructions stored in the computer-readable memory then produce an article of manufacture including an instruction apparatus, which implements the functions specified in one or more flows of the flowcharts and/or one or more blocks of the block diagrams.
These computer program instructions may also be loaded onto a computer or other programmable data processing device, so that a series of operational steps is performed on it to produce computer-implemented processing. The instructions executed on the computer or other programmable device then provide steps for implementing the functions specified in one or more flows of the flowcharts and/or one or more blocks of the block diagrams.
Although preferred embodiments of the present application have been described, those skilled in the art who learn the basic inventive concept may make additional changes and modifications to these embodiments. Therefore, the appended claims are intended to be construed as including the preferred embodiments and all changes and modifications falling within the scope of the embodiments of the present application.
Obviously, those skilled in the art can make various changes and modifications to the present invention without departing from its spirit and scope. If these modifications and variations fall within the scope of the claims of the present invention and their equivalent technologies, the present invention is also intended to include them.
The foregoing are merely preferred embodiments of the present invention and are not intended to limit it. Any modification, equivalent replacement, improvement or the like made within the spirit and principles of the present invention shall fall within the protection scope of the present invention.