CN109992673A - Knowledge graph generation method, apparatus, device, and readable storage medium - Google Patents
- Publication number
- CN109992673A (application CN201910292766.4A)
- Authority
- CN
- China
- Prior art keywords
- vector
- document
- entity
- triple
- word
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Pending
Landscapes
- Information Retrieval, Db Structures And Fs Structures Therefor (AREA)
Abstract
The invention discloses a knowledge graph generation method. The method comprises: obtaining a triple entity structure vector and the description document of each entity in the triple entity structure vector; performing word segmentation and statistical processing on the description documents to obtain a document-word distribution matrix, and using the document-word distribution matrix to obtain an entity description feature vector; adding the entity description feature vector to the triple entity structure vector to obtain an initial triple structure vector; and screening the initial triple structure vector to obtain a target triple vector, and generating a knowledge graph from the target triple vector. With this method, computer technology is used to analyze the description documents of entities and obtain the relationships between entities, so that more, and more accurate, triples can be obtained and a large-scale, accurate knowledge graph can be constructed. The invention also discloses a knowledge graph generation apparatus, device, and readable storage medium, which have corresponding technical effects.
Description
Technical field
The present invention relates to the technical field of knowledge base applications, and in particular to a knowledge graph generation method, apparatus, device, and readable storage medium.
Background
A knowledge base is a knowledge system in which knowledge is structured; it is an important technology for advancing artificial intelligence and supporting intelligent information applications (such as precise recommendation and intelligent search). The main purpose of building a knowledge base is to obtain structured information from the massive amount of information on the internet, so that related applications such as knowledge reasoning can be carried out. Knowledge representation is the basis of knowledge acquisition and application, so research on knowledge representation is particularly important.
At present, knowledge is expressed as triples of the form "entity-relation-entity", and a large number of triples constitute a knowledge graph, that is, a knowledge network. However, most existing knowledge bases are oriented to a specific domain and have limited coverage; their entity-relation data is sparse and incomplete, and the accuracy of inferred entity semantics or relations is unsatisfactory. As a result, the knowledge graphs created are small in scale and incomplete.
In summary, how to effectively solve problems such as the creation of knowledge graphs is a technical problem that those skilled in the art urgently need to solve.
Summary of the invention
The object of the present invention is to provide a knowledge graph generation method, apparatus, device, and readable storage medium that combine the description documents of entities to obtain more accurate triple structures, and that create a large-scale and complete knowledge graph from these more accurate triple structures.
To solve the above technical problem, the invention provides the following technical solution:
A knowledge graph generation method, comprising:
obtaining a triple entity structure vector and the description document of each entity in the triple entity structure vector;
performing word segmentation and statistical processing on the description documents to obtain a document-word distribution matrix, and using the document-word distribution matrix to obtain an entity description feature vector;
adding the entity description feature vector to the triple entity structure vector to obtain an initial triple structure vector;
screening the initial triple structure vector to obtain a target triple vector, and generating a knowledge graph from the target triple vector.
Preferably, performing word segmentation and statistical processing on the description documents to obtain the document-word distribution matrix comprises:
modeling the words in the description documents and vectorizing the words in the documents to obtain a document vector matrix;
clustering and merging words according to word vector similarity and counting word frequencies to obtain the document-word distribution matrix.
Preferably, modeling the words in the description documents and vectorizing the words in the documents to obtain the document vector matrix comprises:
performing word segmentation on the description documents to obtain a document word set;
inputting the document word set into a word vectorization model to obtain the document vector matrix.
Preferably, using the document-word distribution matrix to obtain the entity description feature vector comprises:
decomposing the document-word distribution matrix to obtain a document-topic distribution matrix and a topic-word distribution matrix;
combining the document-topic distribution matrix and the topic-word distribution matrix to determine the keywords most strongly associated with each entity description, obtaining a keyword vector matrix;
converting the keyword vector matrix into the multi-domain knowledge graph triple information space to obtain the entity description feature vector.
Preferably, decomposing the document-word distribution matrix to obtain the document-topic distribution matrix and the topic-word distribution matrix comprises:
inputting the document-word distribution matrix into a document topic generation model for document-word matrix modeling, obtaining the document-topic distribution matrix and the topic-word distribution matrix.
Preferably, converting the keyword vector matrix into the multi-domain knowledge graph triple information space to obtain the entity description feature vector comprises:
inputting the keyword matrix into a neural network for mapping to obtain the entity description feature vector.
Preferably, screening the initial triple structure vector to obtain the target triple vector comprises:
screening the initial triple structure vector with a preset reliability evaluation function to obtain the target triple vector.
A knowledge graph generation apparatus, comprising:
a description information acquisition module, configured to obtain a triple entity structure vector and the description document of each entity in the triple entity structure vector;
an entity description feature vector acquisition module, configured to perform word segmentation and statistical processing on the description documents to obtain a document-word distribution matrix, and to use the document-word distribution matrix to obtain an entity description feature vector;
a vector fusion module, configured to add the entity description feature vector to the triple entity structure vector to obtain an initial triple structure vector;
a knowledge graph generation module, configured to screen the initial triple structure vector to obtain a target triple vector, and to generate a knowledge graph from the target triple vector.
A knowledge graph generation device, comprising:
a memory, configured to store a computer program;
a processor, configured to implement the steps of the above knowledge graph generation method when executing the computer program.
A readable storage medium storing a computer program which, when executed by a processor, implements the steps of the above knowledge graph generation method.
With the method provided by the embodiments of the present invention, a triple entity structure vector and the description document of each entity in the triple entity structure vector are obtained; word segmentation and statistical processing is performed on the description documents to obtain a document-word distribution matrix, which is used to obtain an entity description feature vector; the entity description feature vector is added to the triple entity structure vector to obtain an initial triple structure vector; and the initial triple structure vector is screened to obtain a target triple vector, from which the knowledge graph is generated.
The description document of an entity often describes that entity in terms of another entity; that is, statistical analysis of an entity's description document can yield the relationship between one entity and another. On this basis, once the triple entity structure vector has been obtained, the description document of each entity in it can be obtained. Word segmentation and statistical processing of the description documents then yields the document-word distribution matrix, from which the entity description feature vector is obtained. Adding the entity description feature vector to the triple entity structure vector gives the initial triple structure vector, and screening the initial triple structure vector gives the target triple vector used to generate the knowledge graph. The method uses computer technology to analyze the description documents of entities and obtain the relationships between entities, so more, and more accurate, triples can be obtained and a large-scale, accurate knowledge graph can be built; this knowledge graph can in turn improve the performance of knowledge-graph-based knowledge recommendation, question answering, and retrieval applications.
Correspondingly, the embodiments of the present invention also provide a knowledge graph generation apparatus, device, and readable storage medium corresponding to the above knowledge graph generation method. They have the technical effects described above, which are not repeated here.
Brief description of the drawings
To explain the technical solutions of the embodiments of the present invention or of the prior art more clearly, the drawings needed in the description of the embodiments or the prior art are briefly introduced below. The drawings described below are only some embodiments of the present invention; those of ordinary skill in the art can obtain other drawings from them without creative effort.
Fig. 1 is an implementation flowchart of a knowledge graph generation method in an embodiment of the present invention;
Fig. 2 is a schematic diagram of the document-topic distribution matrix and the topic-word distribution matrix in an embodiment of the present invention;
Fig. 3 is a structural schematic diagram of a knowledge graph generation apparatus in an embodiment of the present invention;
Fig. 4 is a structural schematic diagram of a knowledge graph generation device in an embodiment of the present invention;
Fig. 5 is a detailed structural schematic diagram of a knowledge graph generation device in an embodiment of the present invention.
Detailed description
When constructing a knowledge graph, a large number of triples must first be obtained, and the knowledge graph is then generated from them. At present, there are three modes of obtaining triples:
Supervised mode: domain experts hand-write rules/patterns. For example, "A works for B" describes an employment relationship; such rules/patterns are then applied to input sentences to mine specific triples.
Semi-supervised mode: seed instances are provided manually, such as "(John, HuaWei), (Alice, Apple)". These are handed to the machine, which learns the pattern contained in such seed instances ("A works for B"), uses the pattern to mine new instances that match it, and then adds these new instances to the seed instances. This is a bootstrapping process. Human interaction can be introduced into it: for example, the patterns learned by the machine can be screened manually, and newly learned triple instances can be labeled as positive or negative examples.
Unsupervised mode: a verb in a sentence that satisfies certain syntactic rules is taken as the relation, and the nouns to the left and right of the verb are taken as entities.
Of these, the supervised mode requires domain experts to hand-write rules/patterns, which takes a long time when creating a large-scale knowledge graph; the semi-supervised mode also requires manually provided seed instances, and enough of them to obtain triples of more relation categories; the unsupervised mode takes the verb directly as the relation, so the accuracy of the triples obtained is low.
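The rule-based mode described above can be illustrated with a minimal sketch. The regular expression, the relation label `works_for`, and the function name `extract_triples` are illustrative assumptions, not part of the patent; the point is only that each hand-written pattern yields one relation type, which is why this mode scales poorly.

```python
import re

# Hypothetical hand-written pattern for the "A works for B" rule.
# Capitalised tokens stand in for entity mentions (an assumption).
PATTERN = re.compile(r"(?P<head>[A-Z]\w*) works for (?P<tail>[A-Z]\w*)")

def extract_triples(sentences):
    """Return (head, relation, tail) triples matched by the pattern."""
    triples = []
    for sentence in sentences:
        for match in PATTERN.finditer(sentence):
            triples.append((match.group("head"), "works_for", match.group("tail")))
    return triples

triples = extract_triples([
    "John works for HuaWei.",
    "Alice works for Apple.",
    "It rained all day.",  # no match, so no triple is produced
])
```

Every additional relation category needs another pattern of this kind, written and checked by hand.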
To solve the above problems, the invention proposes a knowledge graph generation method. The description information of an entity often involves another entity and the relationship between them; by mining the description documents of entities in depth, a large number of accurate and valid triples can be determined, producing a knowledge graph that is larger in scale and more accurate in its relation descriptions. For example, the description document of the entity "traditional Chinese painting" may read: traditional Chinese painting generally refers to scroll paintings that are painted on silk, rice paper, or silk fabric and then mounted; it is the traditional form of painting in China, painted on silk or paper with a brush dipped in water, ink, and color; its tools and materials include brushes, ink, Chinese painting pigments, rice paper, and silk; its subjects can be divided into figures, landscapes, birds and flowers, and so on; and its techniques can be divided into representational and freehand. For the entity "rice paper" (Xuan paper), the description document may read: rice paper is a classic traditional Chinese paper for calligraphy and painting, and one of the traditional paper-making crafts of the Han people; rice paper "began in the Tang dynasty and originated in Jing County", and because Jing County was under the jurisdiction of Xuanzhou Prefecture in the Tang dynasty, the paper took its name from the place and has a history of more than 1,500 years; because it is easy to preserve, does not become brittle over time, and does not fade, it is known as "the paper that lasts a thousand years". A person reading the description documents of these two entities can see that there is a carrier relation between traditional Chinese painting and rice paper, that is, "traditional Chinese painting - carrier - rice paper". The knowledge graph creation method provided by the embodiments of the present invention takes into account that entity description documents contain descriptions of the relationships between entities, and uses computer technology to analyze them, obtaining a machine-readable knowledge graph composed of triples. The method obtains the relationships between entities from their description documents, so more, and more accurate, triples can be obtained and a large-scale, accurate knowledge graph can be built; this knowledge graph can in turn improve the performance of knowledge-graph-based knowledge recommendation, question answering, and retrieval applications.
The embodiments of the present invention also provide a knowledge graph generation apparatus, device, and readable storage medium corresponding to the above knowledge graph generation method, which have the technical effects described above.
To enable those skilled in the art to better understand the solution of the present invention, the invention is described in further detail below with reference to the drawings and specific embodiments. The described embodiments are only some of the embodiments of the present invention, not all of them. All other embodiments obtained by those of ordinary skill in the art from the embodiments of the present invention without creative effort fall within the protection scope of the present invention.
Embodiment one:
Referring to Fig. 1, Fig. 1 is a flowchart of a knowledge graph generation method in an embodiment of the present invention. The method includes the following steps:
S101: obtain a triple entity structure vector and the description document of each entity in the triple entity structure vector.
The triple entity structure vector can be obtained directly from an existing knowledge graph; it can of course also be obtained during entity extraction. To make it easier to determine the relationships between entities, the description document of each entity in the triple entity structure vector can be obtained. The number of triple entity structure vectors obtained may be one or more; correspondingly, an entity in the triple entity structure may correspond to one description document or to several.
S102: perform word segmentation and statistical processing on the description documents to obtain a document-word distribution matrix, and use the document-word distribution matrix to obtain an entity description feature vector.
Preferably, to save storage space and improve search efficiency, the description documents can also be preprocessed, for example by removing stop words and punctuation marks. Stop words mainly include English characters, digits, mathematical symbols, punctuation marks, and very high-frequency single Chinese characters. Stop words fall roughly into two classes: words that are used very widely, and words without substantive meaning, including modal particles, adverbs, prepositions, and conjunctions. The latter usually have no definite meaning on their own and play a role only within a complete sentence, such as the common Chinese function word "是" ("is").
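As a rough sketch of this preprocessing step, the snippet below strips punctuation and digits and drops stop words. The stop-word list and the whitespace tokenization are assumptions made for an English example; Chinese description documents would first need a word segmenter, which is not shown here.

```python
import string

# Illustrative stop-word list (an assumption); real systems load a far larger one.
STOP_WORDS = {"the", "is", "of", "a", "and", "on"}

def preprocess(text):
    """Lowercase the text, strip punctuation and digits, and drop stop words."""
    drop = string.punctuation + string.digits
    cleaned = text.translate(str.maketrans("", "", drop))
    return [tok for tok in cleaned.lower().split() if tok not in STOP_WORDS]

tokens = preprocess("Rice paper is the classic painting paper of China, since 618!")
```

The resulting token list is what the later vectorization and word-frequency steps would consume.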
The process of obtaining the entity description feature vector comprises:
Step 1: model the words in the description documents and vectorize the words in the documents to obtain a document vector matrix;
Step 2: cluster and merge words according to word vector similarity and count word frequencies to obtain the document-word distribution matrix;
Step 3: decompose the document-word distribution matrix to obtain a document-topic distribution matrix and a topic-word distribution matrix;
Step 4: combine the document-topic distribution matrix and the topic-word distribution matrix to determine the keywords most strongly associated with each entity description, obtaining a keyword vector matrix;
Step 5: convert the keyword vector matrix into the multi-domain knowledge graph triple information space to obtain the entity description feature vector.
For ease of description, the five steps above are explained together below.
The document vector matrix can be obtained as follows: perform word segmentation on the description documents to obtain a document word set, then input the document word set into a word vectorization model to obtain the document vector matrix.
When modeling the words in the description documents, the words can first be vectorized and the document-word distribution matrix derived from them; the document-word distribution matrix is then input into a document topic generation model for document-word matrix modeling, which yields the document-topic distribution matrix and the topic-keyword distribution matrix. A word vectorization model such as word2vec can be used to model the words of the description documents. word2vec is an effective predictive model with two versions: the Continuous Bag-of-Words model (CBOW) and the Skip-Gram model. Specifically, word segmentation is performed on the description documents to obtain a document word set, and the document word set is input into the word vectorization model to obtain a document word vector set. Taking word2vec as an example, the document word set is used as input to train the word2vec model, and the word vector output for each word is added to the document word vector set. Once the document word vector set has been obtained, identical or similar words are merged by unsupervised clustering, and the word frequencies after merging are counted to generate the document-word distribution matrix. The clustering and merging can be based on vector similarity, for example on the similarity distance between vectors. The document-word distribution matrix is then input into a document topic generation model, such as an LDA model (Latent Dirichlet Allocation, a document topic generation model) or an NMF model, to obtain the document-topic distribution matrix and the topic-keyword distribution matrix. NMF (non-negative matrix factorization) is a matrix decomposition method.
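The cluster-merge and word-frequency step can be sketched as follows. The greedy single-pass clustering, the cosine threshold of 0.9, and the toy two-dimensional vectors are all assumptions chosen for illustration; the patent does not specify a clustering algorithm, and in practice the vectors would come from a trained word vectorization model.

```python
import math
from collections import Counter

def cosine(u, v):
    """Cosine similarity between two equal-length, non-zero vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    return dot / (math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v)))

def cluster_words(vectors, threshold=0.9):
    """Greedy clustering: a word joins the first cluster whose
    representative vector is at least `threshold` similar to it."""
    reps = []      # (representative word, its vector), in discovery order
    mapping = {}   # word -> representative word of its cluster
    for word, vec in vectors.items():
        for rep, rep_vec in reps:
            if cosine(vec, rep_vec) >= threshold:
                mapping[word] = rep
                break
        else:
            reps.append((word, vec))
            mapping[word] = word
    return mapping, [rep for rep, _ in reps]

def doc_word_matrix(docs, mapping, vocab):
    """Count merged-word frequencies per document (rows) over vocab (columns)."""
    rows = []
    for doc in docs:
        counts = Counter(mapping[w] for w in doc)
        rows.append([counts[w] for w in vocab])
    return rows

# Toy word vectors standing in for word2vec output (assumed values).
vectors = {"paper": [1.0, 0.0], "papers": [0.99, 0.05], "ink": [0.0, 1.0]}
docs = [["paper", "papers", "ink"], ["ink", "ink"]]
mapping, vocab = cluster_words(vectors)
matrix = doc_word_matrix(docs, mapping, vocab)
```

Here "papers" is merged into "paper", so the first document counts that merged word twice; the resulting m-by-n count matrix is the kind of input a topic model would take.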
The modeling process is described in detail below, taking the LDA model as an example.
The probability of a word in a document is calculated as P(w|d) = P(w|t) * P(t|d), where w is a word in the documents (n words in total), t is a topic (k topics in total), and d is an entity description document (m documents in total); P(w|d) is the probability of the word in the description document, P(w|t) is the probability of the word under topic t, and P(t|d) is the probability of topic t in the document. Referring to Fig. 2, Fig. 2 is a schematic diagram of the document-topic distribution matrix and the topic-word distribution matrix in an embodiment of the present invention. The matrix with m rows and n columns is the document-word distribution matrix. The probability distributions of words under all topics are combined into a topic-word probability matrix with k rows and n columns, that is, the topic-word distribution matrix; the topic probabilities of all documents are combined into a document-topic probability matrix with m rows and k columns, that is, the document-topic distribution matrix.
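The relationship between the three matrices can be checked with a small numerical sketch. The probability values are made up for illustration; the only thing taken from the text is the shape relation (m-by-k times k-by-n gives m-by-n), with the product P(w|t) * P(t|d) summed over the k topics.

```python
def matmul(a, b):
    """Plain-Python matrix product of a (m x k) and b (k x n)."""
    return [
        [sum(a[i][t] * b[t][j] for t in range(len(b))) for j in range(len(b[0]))]
        for i in range(len(a))
    ]

# Illustrative values: m = 2 documents, k = 2 topics, n = 3 words.
doc_topic = [[0.5, 0.5],          # P(t|d) for document 0
             [1.0, 0.0]]          # P(t|d) for document 1
topic_word = [[0.5, 0.5, 0.0],    # P(w|t) for topic 0
              [0.0, 0.5, 0.5]]    # P(w|t) for topic 1

# Each entry is sum over t of P(t|d) * P(w|t), i.e. P(w|d).
doc_word = matmul(doc_topic, topic_word)
```

Because every row of both factors sums to 1, every row of `doc_word` also sums to 1, as a word distribution per document should.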
The LDA model is trained with a loss function $L_{loss}$ defined over the following quantities: $v_{i,j}$, the frequency with which word $W_j$ of the document word vector set occurs in the entity description of entity $H_i$; the entity description topic vector; and the corresponding word topic distribution vector. When $L_{loss}$ is at its minimum, the LDA model performs best, and it outputs the document-topic distribution matrix and the topic-word distribution matrix.
Then, by searching the document-topic distribution matrix and the topic-word distribution matrix, the keywords most strongly associated with each entity description are determined (the number of keywords can be denoted e), giving the keyword vector matrix. The keyword vector matrix is then converted into the multi-domain knowledge graph triple information space to obtain the entity description feature vector, that is, the vector representation of the relation in the triple structure. Specifically, the mapping can be performed by a deep learning method: the keyword matrix is input into a neural network and mapped to obtain the entity description feature vector.
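One possible reading of the keyword selection step is sketched below. The text does not define how the "association" is scored, so this sketch assumes it is the word probability obtained by combining one row of the document-topic matrix with the topic-word matrix; the function name `top_keywords` and all values are hypothetical. The subsequent neural-network mapping is not shown.

```python
def top_keywords(doc_topic_row, topic_word, vocab, e=2):
    """Pick the e words with the highest combined probability for one document.

    Assumption: association = sum over topics of P(t|d) * P(w|t).
    """
    scores = [
        sum(p_t * topic_word[t][j] for t, p_t in enumerate(doc_topic_row))
        for j in range(len(vocab))
    ]
    ranked = sorted(range(len(vocab)), key=lambda j: scores[j], reverse=True)
    return [vocab[j] for j in ranked[:e]]

vocab = ["brush", "ink", "paper"]
topic_word = [[0.6, 0.3, 0.1],   # P(w|t) for topic 0 (illustrative)
              [0.1, 0.2, 0.7]]   # P(w|t) for topic 1 (illustrative)
keywords = top_keywords([0.9, 0.1], topic_word, vocab, e=2)
```

The word vectors of the selected keywords would then be stacked to form the keyword vector matrix for that entity description.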
After the entity description feature vector has been obtained, step S103 can be executed.
S103: add the entity description feature vector to the triple entity structure vector to obtain an initial triple structure vector.
The triple structure is "entity-relation-entity". Once the triple entity structure vector and the entity description feature vector corresponding to it have been obtained, the two can be added to obtain the initial triple structure vector, that is, the initial triple structure.
Specifically, a convolutional neural network is used to fuse the entity description feature vector into the triple entity structure vector, giving the initial triple structure vector. Vector fusion follows a fusion formula in which $h$ and $t$ are the triple entity structure vectors and $\tilde{h}$ and $\tilde{t}$ are the entity description feature vectors output by the neural network. When adjusting the parameters of the convolutional neural network, stochastic gradient descent is preferably used, in order to reduce the complexity of the system and the algorithm.
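The addition step can be sketched as plain element-wise vector addition. This is an assumption: the fusion formula itself is not available in this text, so the sketch only illustrates the simplest reading of "adding" a description feature vector to a structure vector of the same dimension. The function name `fuse` and the values are hypothetical.

```python
def fuse(structure_vec, description_vec):
    """Element-wise sum of a structure vector and a description feature vector.

    Assumes both vectors live in the same space and have the same length.
    """
    if len(structure_vec) != len(description_vec):
        raise ValueError("vectors must have the same dimension")
    return [s + d for s, d in zip(structure_vec, description_vec)]

h = [0.5, 1.0, -1.0]        # structure vector of the head entity (illustrative)
h_desc = [0.25, -0.5, 1.0]  # description feature vector (illustrative)
h_fused = fuse(h, h_desc)
```

The same operation would be applied to the tail entity vector and its description feature vector.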
After the initial triple structure vector has been obtained, step S104 can be executed.
S104: screen the initial triple structure vector to obtain a target triple vector, and generate a knowledge graph from the target triple vector.
Screening the initial triple structure vector means deleting the triple vectors in it that are unreliable or wrong. Specifically, the fused structure is screened with a preset reliability evaluation function to obtain the target triple vector. The preset reliability evaluation function can be:
$F_{whole} = \sum_{(h,r,t) \in T} \sum_{(h',r,t') \in T'} \max\big(f(h,r,t) - f(h',r,t') + \alpha,\ 0\big) + L_{loss}$
where $f(h,r,t)$ corresponds to a correct triple vector and $f(h',r,t')$ to a wrong triple vector; $T'$ is the set of erroneous triple samples, $T' = \{(h',r,t) \mid h' \in H\} \cup \{(h,r,t') \mid t' \in T\}$; and $\alpha$ is a hyperparameter that is always greater than 0. The correct triple vectors can be taken directly as the target triple vectors, and the knowledge graph is then generated from them. For how to generate a knowledge graph from the target triple vectors, refer to the composition of a knowledge graph; the details are not repeated here.
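The margin term of the evaluation function can be sketched as follows. The definition of f is not available in this text, so the sketch assumes a translation-style distance, f(h, r, t) = L1 norm of (h + r - t), where a lower value means a more plausible triple; that choice, the function names, and the vectors are all assumptions. The $L_{loss}$ term is passed in as a plain number.

```python
def score(h, r, t):
    """Assumed scoring function: L1 distance of h + r from t (lower is better)."""
    return sum(abs(a + b - c) for a, b, c in zip(h, r, t))

def margin_loss(positives, negatives, alpha=1.0, l_loss=0.0):
    """Sum of max(f(pos) - f(neg) + alpha, 0) over all pairs, plus l_loss."""
    total = l_loss
    for pos in positives:
        for neg in negatives:
            total += max(score(*pos) - score(*neg) + alpha, 0.0)
    return total

h, r = [1.0, 0.0], [0.0, 1.0]
positives = [(h, r, [1.0, 1.0])]   # h + r equals t, so the score is 0
negatives = [
    (h, r, [3.0, 1.0]),            # score 2.0: already beyond the margin
    (h, r, [1.0, 1.5]),            # score 0.5: inside the margin, penalised
]
loss = margin_loss(positives, negatives, alpha=1.0)
```

Only the second corrupted triple contributes to the loss, because its score is within the margin alpha of the correct triple's score.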
In addition, if the triple entity structure vector obtained in step S101 came from an existing knowledge graph, the existing knowledge graph can now be expanded and corrected using the target triples: wrong entity relations are replaced, or new relations between entities are added, which extends the knowledge graph.
Once a larger knowledge graph with more accurate relations has been obtained, it can be applied in areas such as search and recommendation and in question answering systems (such as customer service robots and personal assistants, which are in essence extensions of search and recommendation). In semantic search, knowledge-graph-based search differs from conventional search. Conventional search finds the set of web pages matching the keywords, ranks the pages in that set with an algorithm such as PageRank, and then shows them to the user. Knowledge-graph-based search traverses the knowledge in the existing graph knowledge base and returns the knowledge found to the user; if the path is correct, usually only one or a few items of knowledge are returned, so the result is quite precise. In a question answering system, the system likewise first uses the knowledge graph to perform semantic and syntactic analysis on the question the user asks in natural language, converts it into a structured query statement, and then looks up the answer in the knowledge graph.
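A minimal sketch of knowledge-graph lookup is given below, assuming the graph is held in memory as a nested dictionary keyed by head entity and relation. The storage layout, the function names, and the second triple are illustrative assumptions; parsing a natural-language question into a (head, relation) query is not shown.

```python
from collections import defaultdict

def build_graph(triples):
    """Index (head, relation, tail) triples as graph[head][relation] -> [tails]."""
    graph = defaultdict(dict)
    for head, relation, tail in triples:
        graph[head].setdefault(relation, []).append(tail)
    return graph

def query(graph, head, relation):
    """Return all tails linked to `head` by `relation` (empty list if none)."""
    return graph.get(head, {}).get(relation, [])

graph = build_graph([
    ("traditional Chinese painting", "carrier", "rice paper"),
    ("rice paper", "origin", "Jing County"),  # illustrative triple
])
answer = query(graph, "traditional Chinese painting", "carrier")
```

A structured query of this kind returns only the matching knowledge rather than a ranked list of pages, which is the contrast with conventional search drawn above.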
With the method provided by this embodiment of the present invention, a triple entity structure vector and the description document of each entity in the triple entity structure vector are obtained; word segmentation and statistical processing are performed on the description documents to obtain a document-word distribution matrix, and the document-word distribution matrix is used to obtain entity description feature vectors; the entity description feature vectors are added to the triple entity structure vector to obtain an initial triple structure vector; the initial triple structure vector is screened to obtain target triple vectors, and a knowledge graph is generated from the target triple vectors.
The description document of an entity often describes that entity in terms of another entity. In other words, statistical analysis of an entity's description document can reveal the relation between that entity and another entity. On this basis, when the triple entity structure vector is obtained, the description document of each entity in the triple entity structure vector can be obtained as well. Word segmentation and statistical processing are then performed on the description documents to obtain the document-word distribution matrix, from which the entity description feature vectors are derived. The entity description feature vectors are added to the triple entity structure vector to obtain the initial triple structure vector. Screening the initial triple structure vector yields the target triple vectors used to generate the knowledge graph, and the knowledge graph is generated from those target triple vectors. The method uses computer technology to analyze the description documents of entities and obtain the relations between entities, so that more triples, and more accurate ones, can be obtained. A large-scale, accurate knowledge graph can therefore be built, which in turn can improve the performance of knowledge-graph-based applications such as knowledge recommendation, question answering systems, and retrieval.
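The addition step can be sketched as follows. This is a minimal illustration under stated assumptions, not the patent's implementation: it assumes that each entity's description feature vector has the same dimension as its structure vector, that fusion is element-wise addition per entity, and that the relation vector is left unchanged. All names and numbers are hypothetical.

```python
def add_vectors(a, b):
    """Element-wise addition of two vectors of equal dimension."""
    if len(a) != len(b):
        raise ValueError("vectors must have the same dimension")
    return [x + y for x, y in zip(a, b)]


def fuse_triple(triple, structure_vectors, description_vectors):
    """Add each entity's description vector to its structure vector (assumed fusion rule)."""
    head, relation, tail = triple
    fused_head = add_vectors(structure_vectors[head], description_vectors[head])
    fused_tail = add_vectors(structure_vectors[tail], description_vectors[tail])
    # Assumption: the relation vector carries no description and is passed through.
    return fused_head, structure_vectors[relation], fused_tail


# Hypothetical 3-dimensional embeddings.
structure_vectors = {
    "entity_a": [0.5, 0.25, 0.0],
    "entity_b": [1.0, 0.5, 0.25],
    "related_to": [0.5, 0.25, 0.25],
}
description_vectors = {
    "entity_a": [0.25, 0.25, 1.0],
    "entity_b": [0.0, 0.5, 0.25],
}

initial = fuse_triple(("entity_a", "related_to", "entity_b"),
                      structure_vectors, description_vectors)
print(initial)
```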
Embodiment two:
Corresponding to the above method embodiment, an embodiment of the present invention further provides a knowledge graph generation apparatus. The knowledge graph generation apparatus described below and the knowledge graph generation method described above may be cross-referenced.
Referring to Fig. 3, the apparatus comprises the following modules:
A description information acquisition module 101, configured to obtain a triple entity structure vector and the description document of each entity in the triple entity structure vector;
An entity description feature vector acquisition module 102, configured to perform word segmentation and statistical processing on the description documents to obtain a document-word distribution matrix, and to use the document-word distribution matrix to obtain entity description feature vectors;
A vector fusion module 103, configured to add the entity description feature vectors to the triple entity structure vector to obtain an initial triple structure vector;
A knowledge graph generation module 104, configured to screen the initial triple structure vector to obtain target triple vectors, and to generate a knowledge graph from the target triple vectors.
With the apparatus provided by this embodiment of the present invention, a triple entity structure vector and the description document of each entity in the triple entity structure vector are obtained; word segmentation and statistical processing are performed on the description documents to obtain a document-word distribution matrix, and the document-word distribution matrix is used to obtain entity description feature vectors; the entity description feature vectors are added to the triple entity structure vector to obtain an initial triple structure vector; the initial triple structure vector is screened to obtain target triple vectors, and a knowledge graph is generated from the target triple vectors.
In a specific embodiment of the present invention, the entity description feature vector acquisition module 102 comprises:
A document-word distribution matrix acquisition unit, configured to model the words in the description documents and vectorize the words in the documents to obtain a document vector matrix; and to perform cluster merging according to word vector similarity and count word frequencies to obtain the document-word distribution matrix;
An entity description feature vector acquisition unit, configured to decompose the document-word distribution matrix to obtain a document-topic distribution matrix and a topic-word distribution matrix; to combine the document-topic distribution matrix and the topic-word distribution matrix to determine, for each entity description, the keyword with the highest degree of association, obtaining a keyword vector matrix; and to convert the keyword vector matrix into the multi-domain knowledge graph triple information space to obtain the entity description feature vectors.
In a specific embodiment of the present invention, the document-word distribution matrix acquisition unit is specifically configured to perform word segmentation on the description documents to obtain a document word set, and to input the document word set into a word vector model to obtain the document vector matrix.
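The work of the document-word distribution matrix acquisition unit (segmentation, word vectorization, cluster merging by word vector similarity, and word frequency counting) can be sketched roughly as below. Several parts are assumptions made for illustration: whitespace splitting stands in for a real word segmenter, the `WORD_VECTORS` table stands in for a trained word vector model, and the greedy merge with a cosine threshold of 0.9 is one possible clustering rule, not necessarily the one the patent intends.

```python
import math
from collections import Counter

# Hypothetical toy word vectors; a real system would query a trained word vector model.
WORD_VECTORS = {
    "rice": [1.0, 0.0],
    "paddy": [0.9, 0.1],
    "river": [0.0, 1.0],
}


def cosine(a, b):
    """Cosine similarity between two vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def segment(text):
    """Stand-in segmenter: lowercases and splits on whitespace."""
    return text.lower().split()


def cluster_words(words, threshold):
    """Greedily merge each word into the first representative that is similar enough."""
    representatives, mapping = [], {}
    for word in words:
        for rep in representatives:
            if cosine(WORD_VECTORS[word], WORD_VECTORS[rep]) >= threshold:
                mapping[word] = rep
                break
        else:
            representatives.append(word)
            mapping[word] = word
    return representatives, mapping


def document_word_matrix(documents, threshold=0.9):
    """Return the merged vocabulary and the per-document frequency rows."""
    tokenized = [[w for w in segment(d) if w in WORD_VECTORS] for d in documents]
    vocabulary = sorted({w for doc in tokenized for w in doc})
    representatives, mapping = cluster_words(vocabulary, threshold)
    matrix = []
    for doc in tokenized:
        counts = Counter(mapping[w] for w in doc)
        matrix.append([counts[rep] for rep in representatives])
    return representatives, matrix


columns, matrix = document_word_matrix(["rice paddy rice", "river paddy"])
print(columns, matrix)
```

Merging near-synonyms before counting keeps the matrix smaller and concentrates frequency mass on one column per concept, which is presumably why the clustering step precedes the word frequency statistics.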
In a specific embodiment of the present invention, the entity description feature vector acquisition unit is specifically configured to input the document-word distribution matrix into a document topic generation model for document-word matrix modeling, obtaining the document-topic distribution matrix and the topic-word distribution matrix.
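Once the document-topic and topic-word distribution matrices are available, one plausible way to combine them is to multiply them and pick, for each description document, the word with the highest resulting score. The sketch below assumes exactly that rule; the matrices, the vocabulary, and the function names are hypothetical, and the topic model that would produce the two matrices is not shown.

```python
def matmul(a, b):
    """Multiply matrix a (D x K) by matrix b (K x V) using plain lists."""
    return [
        [sum(a[i][k] * b[k][j] for k in range(len(b))) for j in range(len(b[0]))]
        for i in range(len(a))
    ]


def top_keywords(doc_topic, topic_word, vocabulary):
    """For each document, return the word with the highest document-word association."""
    scores = matmul(doc_topic, topic_word)
    keywords = []
    for row in scores:
        best_index = max(range(len(row)), key=row.__getitem__)
        keywords.append(vocabulary[best_index])
    return keywords


# Hypothetical output of a topic model: 2 documents, 2 topics, 4 words.
vocabulary = ["rice", "grain", "river", "water"]
doc_topic = [
    [1.0, 0.0],
    [0.25, 0.75],
]
topic_word = [
    [0.5, 0.25, 0.25, 0.0],
    [0.0, 0.0, 0.25, 0.75],
]

keywords = top_keywords(doc_topic, topic_word, vocabulary)
print(keywords)
```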
In a specific embodiment of the present invention, the entity description feature vector acquisition unit is specifically configured to input the keyword vector matrix into a neural network for mapping, obtaining the entity description feature vectors.
In a specific embodiment of the present invention, the knowledge graph generation module 104 is specifically configured to screen the initial triple structure vector using a preset reliability evaluation function, obtaining the target triple vectors.
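This passage does not define the preset reliability evaluation function. As a stand-in, the sketch below uses a translation-style distance, the Euclidean norm of head + relation - tail, and keeps a candidate triple only when that distance is within a threshold. Both the distance and the threshold value are assumptions chosen for illustration, as are the example vectors and names.

```python
import math


def distance(head, relation, tail):
    """Assumed reliability measure: Euclidean norm of head + relation - tail."""
    return math.sqrt(sum((h + r - t) ** 2 for h, r, t in zip(head, relation, tail)))


def screen(candidates, max_distance=0.5):
    """Keep the triples whose vectors satisfy the assumed reliability criterion."""
    return [
        triple
        for triple, (head, relation, tail) in candidates.items()
        if distance(head, relation, tail) <= max_distance
    ]


# Hypothetical initial triple structure vectors keyed by (head, relation, tail).
candidates = {
    ("entity_a", "related_to", "entity_b"): ([1.0, 0.0], [0.0, 1.0], [1.0, 1.0]),
    ("entity_a", "related_to", "entity_c"): ([1.0, 0.0], [0.0, 1.0], [4.0, 5.0]),
}

target_triples = screen(candidates)
print(target_triples)
```

Any scoring function that maps a triple's vectors to a confidence value could be substituted for `distance` without changing the screening loop.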
Embodiment three:
Corresponding to the above method embodiment, an embodiment of the present invention further provides a knowledge graph generation device. The knowledge graph generation device described below and the knowledge graph generation method described above may be cross-referenced.
Referring to Fig. 4, the knowledge graph generation device comprises:
A memory D1, configured to store a computer program;
A processor D2, configured to implement the steps of the knowledge graph generation method of the above method embodiment when executing the computer program.
Specifically, refer to Fig. 5, which is a schematic diagram of a specific structure of the knowledge graph generation device provided in this embodiment. The knowledge graph generation device may vary considerably depending on configuration or performance, and may include one or more processors (central processing units, CPU) 322, a memory 332, and one or more storage media 330 (for example, one or more mass storage devices) that store application programs 342 or data 344. The memory 332 and the storage medium 330 may provide transient or persistent storage. The program stored in the storage medium 330 may include one or more modules (not marked in the figure), and each module may include a series of instruction operations on the data processing device. Further, the central processing unit 322 may be configured to communicate with the storage medium 330 and to execute, on the knowledge graph generation device 301, the series of instruction operations in the storage medium 330.
The knowledge graph generation device 301 may also include one or more power supplies 326, one or more wired or wireless network interfaces 350, one or more input/output interfaces 358, and/or one or more operating systems 341, for example Windows Server™, Mac OS X™, Unix™, Linux™, or FreeBSD™.
The steps of the knowledge graph generation method described above can be implemented by the structure of the knowledge graph generation device.
Embodiment four:
Corresponding to the above method embodiment, an embodiment of the present invention further provides a readable storage medium. The readable storage medium described below and the knowledge graph generation method described above may be cross-referenced.
A readable storage medium stores a computer program, and when the computer program is executed by a processor, the steps of the knowledge graph generation method of the above method embodiment are implemented.
The readable storage medium may specifically be a USB flash drive, a removable hard disk, a read-only memory (Read-Only Memory, ROM), a random access memory (Random Access Memory, RAM), a magnetic disk, an optical disc, or any other readable storage medium capable of storing program code.
Those skilled in the art will further appreciate that the units and algorithm steps of the examples described in connection with the embodiments disclosed herein can be implemented in electronic hardware, computer software, or a combination of the two. To clearly illustrate the interchangeability of hardware and software, the composition and steps of each example have been described above in general terms according to function. Whether these functions are implemented in hardware or in software depends on the specific application and the design constraints of the technical solution. Those skilled in the art may use different methods to implement the described functions for each specific application, but such implementations should not be considered beyond the scope of the present invention.
Claims (10)
1. A knowledge graph generation method, characterized by comprising:
Obtaining a triple entity structure vector and the description document of each entity in the triple entity structure vector;
Performing word segmentation and statistical processing on the description documents to obtain a document-word distribution matrix, and using the document-word distribution matrix to obtain entity description feature vectors;
Adding the entity description feature vectors to the triple entity structure vector to obtain an initial triple structure vector;
Screening the initial triple structure vector to obtain target triple vectors, and generating a knowledge graph from the target triple vectors.
2. The knowledge graph generation method according to claim 1, characterized in that performing word segmentation and statistical processing on the description documents to obtain a document-word distribution matrix comprises:
Modeling the words in the description documents and vectorizing the words in the documents to obtain a document vector matrix;
Performing cluster merging according to word vector similarity and counting word frequencies to obtain the document-word distribution matrix.
3. The knowledge graph generation method according to claim 2, characterized in that modeling the words in the description documents and vectorizing the words in the documents to obtain a document vector matrix comprises:
Performing word segmentation on the description documents to obtain a document word set;
Inputting the document word set into a word vector model to obtain the document vector matrix.
4. The knowledge graph generation method according to claim 1, characterized in that using the document-word distribution matrix to obtain entity description feature vectors comprises:
Decomposing the document-word distribution matrix to obtain a document-topic distribution matrix and a topic-word distribution matrix;
Combining the document-topic distribution matrix and the topic-word distribution matrix to determine, for each entity description, the keyword with the highest degree of association, obtaining a keyword vector matrix;
Converting the keyword vector matrix into the multi-domain knowledge graph triple information space to obtain the entity description feature vectors.
5. The knowledge graph generation method according to claim 4, characterized in that decomposing the document-word distribution matrix to obtain a document-topic distribution matrix and a topic-word distribution matrix comprises:
Inputting the document-word distribution matrix into a document topic generation model for document-word matrix modeling, obtaining the document-topic distribution matrix and the topic-word distribution matrix.
6. The knowledge graph generation method according to claim 4, characterized in that converting the keyword vector matrix into the multi-domain knowledge graph triple information space to obtain the entity description feature vectors comprises:
Inputting the keyword vector matrix into a neural network for mapping, obtaining the entity description feature vectors.
7. The knowledge graph generation method according to any one of claims 1 to 6, characterized in that screening the initial triple structure vector to obtain target triple vectors comprises:
Screening the initial triple structure vector using a preset reliability evaluation function to obtain the target triple vectors.
8. A knowledge graph generation apparatus, characterized by comprising:
A description information acquisition module, configured to obtain a triple entity structure vector and the description document of each entity in the triple entity structure vector;
An entity description feature vector acquisition module, configured to perform word segmentation and statistical processing on the description documents to obtain a document-word distribution matrix, and to use the document-word distribution matrix to obtain entity description feature vectors;
A vector fusion module, configured to add the entity description feature vectors to the triple entity structure vector to obtain an initial triple structure vector;
A knowledge graph generation module, configured to screen the initial triple structure vector to obtain target triple vectors, and to generate a knowledge graph from the target triple vectors.
9. A knowledge graph generation device, characterized by comprising:
A memory, configured to store a computer program;
A processor, configured to implement the steps of the knowledge graph generation method according to any one of claims 1 to 7 when executing the computer program.
10. A readable storage medium, characterized in that a computer program is stored on the readable storage medium, and when the computer program is executed by a processor, the steps of the knowledge graph generation method according to any one of claims 1 to 7 are implemented.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201910292766.4A CN109992673A (en) | 2019-04-10 | 2019-04-10 | A kind of knowledge mapping generation method, device, equipment and readable storage medium storing program for executing |
Applications Claiming Priority (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201910292766.4A CN109992673A (en) | 2019-04-10 | 2019-04-10 | A kind of knowledge mapping generation method, device, equipment and readable storage medium storing program for executing |
Publications (1)
Publication Number | Publication Date |
---|---|
CN109992673A true CN109992673A (en) | 2019-07-09 |
Family
ID=67133594
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN201910292766.4A Pending CN109992673A (en) | 2019-04-10 | 2019-04-10 | A kind of knowledge mapping generation method, device, equipment and readable storage medium storing program for executing |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN109992673A (en) |
Cited By (12)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN110674307A (en) * | 2019-08-21 | 2020-01-10 | 北京邮电大学 | Knowledge deduction method and system for knowledge center network |
CN110825822A (en) * | 2019-09-30 | 2020-02-21 | 深圳云天励飞技术有限公司 | Personnel relationship query method and device, electronic equipment and storage medium |
CN111026875A (en) * | 2019-11-26 | 2020-04-17 | 中国人民大学 | Knowledge graph complementing method based on entity description and relation path |
CN111159431A (en) * | 2019-12-30 | 2020-05-15 | 深圳Tcl新技术有限公司 | Knowledge graph-based information visualization method, device, equipment and storage medium |
CN111325033A (en) * | 2020-03-20 | 2020-06-23 | 中国建设银行股份有限公司 | Entity identification method, entity identification device, electronic equipment and computer readable storage medium |
CN111353106A (en) * | 2020-02-26 | 2020-06-30 | 贝壳技术有限公司 | Recommendation method and device, electronic equipment and storage medium |
CN111472754A (en) * | 2019-12-23 | 2020-07-31 | 北京国双科技有限公司 | Fault processing method and device for oil pumping well, storage medium and electronic equipment |
CN113487143A (en) * | 2021-06-15 | 2021-10-08 | 中国农业大学 | Fish shoal feeding decision method and device, electronic equipment and storage medium |
CN113569050A (en) * | 2021-09-24 | 2021-10-29 | 湖南大学 | Method and device for automatically constructing government affair field knowledge map based on deep learning |
WO2022205833A1 (en) * | 2021-03-29 | 2022-10-06 | 网络通信与安全紫金山实验室 | Method and system for constructing and analyzing knowledge graph of wireless network protocol, and device and medium |
WO2022222226A1 (en) * | 2021-04-19 | 2022-10-27 | 平安科技(深圳)有限公司 | Structured-information-based relation alignment method and apparatus, and device and medium |
CN116091120A (en) * | 2023-04-11 | 2023-05-09 | 北京智蚁杨帆科技有限公司 | Full stack type electricity price consulting and managing system based on knowledge graph technology |
Citations (3)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN106126734A (en) * | 2016-07-04 | 2016-11-16 | 北京奇艺世纪科技有限公司 | The sorting technique of document and device |
CN109299284A (en) * | 2018-08-31 | 2019-02-01 | 中国地质大学(武汉) | A kind of knowledge mapping expression learning method based on structural information and text description |
CN109522416A (en) * | 2018-10-19 | 2019-03-26 | 广东工业大学 | A kind of construction method of Financial Risk Control knowledge mapping |
-
2019
- 2019-04-10 CN CN201910292766.4A patent/CN109992673A/en active Pending
Cited By (16)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN110674307A (en) * | 2019-08-21 | 2020-01-10 | 北京邮电大学 | Knowledge deduction method and system for knowledge center network |
CN110825822B (en) * | 2019-09-30 | 2022-11-22 | 深圳云天励飞技术有限公司 | Personnel relationship query method and device, electronic equipment and storage medium |
CN110825822A (en) * | 2019-09-30 | 2020-02-21 | 深圳云天励飞技术有限公司 | Personnel relationship query method and device, electronic equipment and storage medium |
CN111026875A (en) * | 2019-11-26 | 2020-04-17 | 中国人民大学 | Knowledge graph complementing method based on entity description and relation path |
CN111472754A (en) * | 2019-12-23 | 2020-07-31 | 北京国双科技有限公司 | Fault processing method and device for oil pumping well, storage medium and electronic equipment |
CN111159431A (en) * | 2019-12-30 | 2020-05-15 | 深圳Tcl新技术有限公司 | Knowledge graph-based information visualization method, device, equipment and storage medium |
CN111353106A (en) * | 2020-02-26 | 2020-06-30 | 贝壳技术有限公司 | Recommendation method and device, electronic equipment and storage medium |
CN111325033A (en) * | 2020-03-20 | 2020-06-23 | 中国建设银行股份有限公司 | Entity identification method, entity identification device, electronic equipment and computer readable storage medium |
CN111325033B (en) * | 2020-03-20 | 2023-07-11 | 中国建设银行股份有限公司 | Entity identification method, entity identification device, electronic equipment and computer readable storage medium |
WO2022205833A1 (en) * | 2021-03-29 | 2022-10-06 | 网络通信与安全紫金山实验室 | Method and system for constructing and analyzing knowledge graph of wireless network protocol, and device and medium |
WO2022222226A1 (en) * | 2021-04-19 | 2022-10-27 | 平安科技(深圳)有限公司 | Structured-information-based relation alignment method and apparatus, and device and medium |
CN113487143A (en) * | 2021-06-15 | 2021-10-08 | 中国农业大学 | Fish shoal feeding decision method and device, electronic equipment and storage medium |
CN113569050A (en) * | 2021-09-24 | 2021-10-29 | 湖南大学 | Method and device for automatically constructing government affair field knowledge map based on deep learning |
CN113569050B (en) * | 2021-09-24 | 2021-12-07 | 湖南大学 | Method and device for automatically constructing government affair field knowledge map based on deep learning |
CN116091120A (en) * | 2023-04-11 | 2023-05-09 | 北京智蚁杨帆科技有限公司 | Full stack type electricity price consulting and managing system based on knowledge graph technology |
CN116091120B (en) * | 2023-04-11 | 2023-06-23 | 北京智蚁杨帆科技有限公司 | Full stack type electricity price consulting and managing system based on knowledge graph technology |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
CN109992673A (en) | A kind of knowledge mapping generation method, device, equipment and readable storage medium storing program for executing | |
CN111078836B (en) | Machine reading understanding method, system and device based on external knowledge enhancement | |
CN111125365B (en) | Address data labeling method and device, electronic equipment and storage medium | |
Do et al. | Multiview deep learning for predicting twitter users' location | |
Lin et al. | Multi-modal contrastive representation learning for entity alignment | |
CN106940726B (en) | Creative automatic generation method and terminal based on knowledge network | |
CN112417289B (en) | Information intelligent recommendation method based on deep clustering | |
CN107436942A (en) | Word embedding grammar, system, terminal device and storage medium based on social media | |
CN114238653B (en) | Method for constructing programming education knowledge graph, completing and intelligently asking and answering | |
CN110046981A (en) | A kind of credit estimation method, device and storage medium | |
CN112115971B (en) | Method and system for carrying out student portrait based on heterogeneous academic network | |
CN111930936A (en) | Method and system for excavating platform message text | |
CN110929532A (en) | Data processing method, device, equipment and storage medium | |
CN113901224A (en) | Knowledge distillation-based secret-related text recognition model training method, system and device | |
CN116756347B (en) | Semantic information retrieval method based on big data | |
Kovács et al. | Conceptualization with incremental bron-kerbosch algorithm in big data architecture | |
Xiao et al. | Web services clustering based on HDP and SOM neural network | |
CN110083828A (en) | A kind of Text Clustering Method and device | |
CN111782964B (en) | Recommendation method of community posts | |
KR102454261B1 (en) | Collaborative partner recommendation system and method based on user information | |
CN113535945B (en) | Text category recognition method, device, equipment and computer readable storage medium | |
CN117651950A (en) | Interpreted natural language artifact recombination with context awareness | |
CN114661616A (en) | Target code generation method and device | |
CN113536772A (en) | Text processing method, device, equipment and storage medium | |
CN113987126A (en) | Retrieval method and device based on knowledge graph |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | ||
SE01 | Entry into force of request for substantive examination | ||
RJ01 | Rejection of invention patent application after publication | ||
Application publication date: 20190709 |