CN109992673A - Knowledge graph generation method, apparatus, device, and readable storage medium


Info

Publication number
CN109992673A
Authority
CN
China
Prior art keywords
vector
document
entity
triple
word
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN201910292766.4A
Other languages
Chinese (zh)
Inventor
程良伦
邓健峰
张凡龙
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Guangdong University of Technology
Original Assignee
Guangdong University of Technology
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Guangdong University of Technology filed Critical Guangdong University of Technology
Priority to CN201910292766.4A priority Critical patent/CN109992673A/en
Publication of CN109992673A publication Critical patent/CN109992673A/en
Pending legal-status Critical Current


Landscapes

  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)

Abstract

The invention discloses a knowledge graph generation method comprising: obtaining triple entity structure vectors and the description document of each entity in the triple entity structure vectors; performing word segmentation and statistical processing on the description documents to obtain a document-word distribution matrix, and using the document-word distribution matrix to obtain entity description feature vectors; adding the entity description feature vectors to the triple entity structure vectors to obtain initial triple structure vectors; and screening the initial triple structure vectors to obtain target triple vectors, and generating a knowledge graph from the target triple vectors. By analyzing the description documents of entities with computer technology, the method obtains the relationships between entities and can therefore obtain more, and more accurate, triples, from which a large-scale and accurate knowledge graph can be constructed. The invention also discloses a knowledge graph generation apparatus, device, and readable storage medium, which have corresponding technical effects.

Description

Knowledge graph generation method, apparatus, device, and readable storage medium
Technical Field
The present invention relates to the technical field of knowledge base applications, and in particular to a knowledge graph generation method, apparatus, device, and readable storage medium.
Background
A knowledge base is a knowledge system in which knowledge is structured, and it is an important technology for advancing artificial intelligence and supporting intelligent information applications (such as precise recommendation and intelligent search). The main purpose of building a knowledge base is to obtain structured information from massive Internet information so that related applications such as knowledge reasoning can be carried out. Knowledge representation is the basis of knowledge acquisition and application, so research on knowledge representation is particularly important.
At present, knowledge is expressed as triples of the form "entity-relation-entity", and a large number of triples constitute a knowledge graph, that is, a knowledge network. However, most existing knowledge bases are oriented to specific domains, have limited coverage, contain sparse and incomplete entity relation data, and infer entity semantics or relations with unsatisfactory accuracy, so the knowledge graphs created from them are small in scale and incomplete.
In summary, how to effectively solve problems such as the creation of knowledge graphs is a technical problem that those skilled in the art urgently need to solve.
Summary of the invention
The object of the present invention is to provide a knowledge graph generation method, apparatus, device, and readable storage medium that use the description documents of entities to obtain more accurate triple structures, and that create a large-scale and complete knowledge graph from those more accurate triple structures.
To solve the above technical problem, the invention provides the following technical solutions:
A knowledge graph generation method, comprising:
obtaining triple entity structure vectors and the description document of each entity in the triple entity structure vectors;
performing word segmentation and statistical processing on the description documents to obtain a document-word distribution matrix, and using the document-word distribution matrix to obtain entity description feature vectors;
adding the entity description feature vectors to the triple entity structure vectors to obtain initial triple structure vectors;
screening the initial triple structure vectors to obtain target triple vectors, and generating a knowledge graph from the target triple vectors.
Preferably, performing word segmentation and statistical processing on the description documents to obtain a document-word distribution matrix comprises:
modeling the words in the description documents and vectorizing the words in the documents to obtain a document vector matrix;
clustering and merging words according to word vector similarity and counting word frequencies to obtain the document-word distribution matrix.
Preferably, modeling the words in the description documents and vectorizing the words in the documents to obtain a document vector matrix comprises:
performing word segmentation on the description documents to obtain a document word set;
inputting the document word set into a word vectorization model to obtain the document vector matrix.
Preferably, using the document-word distribution matrix to obtain entity description feature vectors comprises:
decomposing the document-word distribution matrix to obtain a document-topic distribution matrix and a topic-word distribution matrix;
combining the document-topic distribution matrix and the topic-word distribution matrix to determine, for each entity description, the keywords with the highest degree of association, and obtaining a keyword vector matrix;
converting the keyword vector matrix into the multi-domain knowledge graph triple information space to obtain the entity description feature vectors.
Preferably, decomposing the document-word distribution matrix to obtain a document-topic distribution matrix and a topic-word distribution matrix comprises:
inputting the document-word distribution matrix into a document topic generation model for document-word matrix modeling to obtain the document-topic distribution matrix and the topic-word distribution matrix.
Preferably, converting the keyword vector matrix into the multi-domain knowledge graph triple information space to obtain the entity description feature vectors comprises:
inputting the keyword matrix into a neural network for mapping to obtain the entity description feature vectors.
Preferably, screening the initial triple structure vectors to obtain target triple vectors comprises:
screening the initial triple structure vectors with a preset reliability evaluation function to obtain the target triple vectors.
A knowledge graph generation apparatus, comprising:
a description information obtaining module, configured to obtain triple entity structure vectors and the description document of each entity in the triple entity structure vectors;
an entity description feature vector obtaining module, configured to perform word segmentation and statistical processing on the description documents to obtain a document-word distribution matrix, and to use the document-word distribution matrix to obtain entity description feature vectors;
a vector fusion module, configured to add the entity description feature vectors to the triple entity structure vectors to obtain initial triple structure vectors;
a knowledge graph generation module, configured to screen the initial triple structure vectors to obtain target triple vectors, and to generate a knowledge graph from the target triple vectors.
A knowledge graph generation device, comprising:
a memory, configured to store a computer program;
a processor, configured to implement the steps of the above knowledge graph generation method when executing the computer program.
A readable storage medium on which a computer program is stored, the computer program implementing the steps of the above knowledge graph generation method when executed by a processor.
With the method provided by the embodiments of the invention, triple entity structure vectors and the description document of each entity in the triple entity structure vectors are obtained; word segmentation and statistical processing are performed on the description documents to obtain a document-word distribution matrix, which is used to obtain entity description feature vectors; the entity description feature vectors are added to the triple entity structure vectors to obtain initial triple structure vectors; and the initial triple structure vectors are screened to obtain target triple vectors, from which a knowledge graph is generated.
The description document of an entity often describes the relationship between that entity and another entity; in other words, statistical analysis of the description documents of entities can reveal the relationship between one entity and another. On this basis, when the triple entity structure vectors are obtained, the description document of each entity in them can also be obtained. Word segmentation and statistical processing of the description documents then yields a document-word distribution matrix, from which entity description feature vectors are obtained. Adding the entity description feature vectors to the triple entity structure vectors gives the initial triple structure vectors, and screening these gives the target triple vectors used to generate the knowledge graph. Because the method uses computer technology to analyze the description documents of entities and obtain the relationships between entities, it can obtain more, and more accurate, triples and can therefore construct a large-scale and accurate knowledge graph, which in turn can improve the performance of knowledge-graph-based knowledge recommendation, question answering, and retrieval applications.
Correspondingly, the embodiments of the invention also provide a knowledge graph generation apparatus, device, and readable storage medium corresponding to the above knowledge graph generation method, which have the above technical effects and are not described again here.
Brief Description of the Drawings
To explain the embodiments of the invention or the technical solutions in the prior art more clearly, the drawings needed in the description of the embodiments or the prior art are briefly introduced below. Obviously, the drawings described below show only some embodiments of the invention, and those of ordinary skill in the art can obtain other drawings from them without creative effort.
Fig. 1 is a flowchart of an implementation of a knowledge graph generation method in an embodiment of the invention;
Fig. 2 is a schematic diagram of the document-topic distribution matrix and the topic-word distribution matrix in an embodiment of the invention;
Fig. 3 is a structural schematic diagram of a knowledge graph generation apparatus in an embodiment of the invention;
Fig. 4 is a structural schematic diagram of a knowledge graph generation device in an embodiment of the invention;
Fig. 5 is a detailed structural schematic diagram of a knowledge graph generation device in an embodiment of the invention.
Detailed Description
Constructing a knowledge graph requires obtaining a large number of triples, from which the knowledge graph is then generated. At present, triples are obtained in the following three modes:
Supervised mode: experts in the relevant knowledge domain write rules/patterns by hand. For example, "A work for B" describes an employment relation; such rules/patterns are then applied to input sentences to mine specific triples.
Semi-supervised mode: seed instances are provided manually, for example "(John, HuaWei), (Alice, Apple)". These are given to the machine, which learns the pattern contained in the seed instances, "A work for B". The pattern is then used to mine new instances that match it, and the new instances are added to the seed instances. This is a bootstrapping process, into which human interaction may be introduced: for example, the patterns learned by the machine can be screened manually, and newly learned triple instances can be labeled as positive or negative examples.
Unsupervised mode: a verb in the sentence that satisfies certain syntactic rules is taken as the relation, and the nouns to the left and right of the verb are taken as the entities.
Among these, the supervised mode requires experts in the relevant knowledge domain to write rules/patterns by hand, which takes a long time when creating a large-scale knowledge graph; the semi-supervised mode also requires seed instances to be provided manually, and the number of seed instances must be large enough to obtain triples of more relation categories; the unsupervised mode directly takes the corresponding verb as the relation, so the accuracy of the triples obtained is low.
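The hand-written rule mode described above can be illustrated with a minimal sketch. It assumes English input sentences and a single regular-expression pattern for "A work for B"; the relation label `employed_by` and the function name `extract_triples` are hypothetical and do not come from the source.

```python
import re

# Hypothetical hand-written pattern for the employment relation "A work for B".
# The relation label "employed_by" is illustrative only.
PATTERN = re.compile(r"(?P<head>[A-Z]\w*) works? for (?P<tail>[A-Z]\w*)")


def extract_triples(sentences):
    """Return (head, relation, tail) triples matched by the hand-written pattern."""
    triples = []
    for sentence in sentences:
        for match in PATTERN.finditer(sentence):
            triples.append((match.group("head"), "employed_by", match.group("tail")))
    return triples


sentences = ["John works for HuaWei.", "Alice works for Apple.", "Bob likes tea."]
print(extract_triples(sentences))
```

A sentence that does not match the pattern yields no triple, which is why this mode needs one hand-written rule per relation and scales poorly to large graphs.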
To solve the above problems, the invention proposes a knowledge graph generation method. The description information of an entity often involves another entity and the relationship between the two, so mining the description documents of entities in depth can identify a large number of accurate and valid triples and produce a knowledge graph that is larger in scale and more accurate in its relation descriptions. For example, the description document of the entity "traditional Chinese painting" may read: traditional Chinese painting generally refers to scroll paintings that are painted on silk, rice paper (Xuan paper), or other fabric and then mounted; traditional Chinese painting is China's traditional form of painting, made by painting on silk or paper with a brush dipped in water, ink, and color; its tools and materials include the brush, ink, Chinese painting pigments, rice paper, and silk; its subjects can be divided into figures, landscapes, flowers and birds, and so on; and its techniques can be divided into figurative and freehand. The description document of the entity "rice paper" may read: rice paper is a classic traditional Chinese painting and calligraphy paper and one of the traditional papermaking crafts of the Han people; rice paper "originated in the Tang dynasty and is produced in Jing County"; because Jing County was under the jurisdiction of Xuanzhou prefecture in the Tang dynasty, the paper was named after the place, and it has a history of more than 1500 years; because rice paper is easy to preserve, does not become brittle with age, and does not fade, it has the reputation of "paper that lasts a thousand years". By reading the description documents of these two entities, manually or otherwise, it can be found that there is a carrier relation between traditional Chinese painting and rice paper, that is, "traditional Chinese painting - carrier - rice paper". The knowledge graph creation method provided by the embodiments of the invention takes into account that entity description documents contain descriptions of the relationships between entities, and analyzes them with computer technology to obtain a machine-recognizable knowledge graph composed of triples. Because the method obtains the relationships between entities from their description documents, it can obtain more, and more accurate, triples and can therefore construct a large-scale and accurate knowledge graph, which in turn can improve the performance of knowledge-graph-based knowledge recommendation, question answering, and retrieval applications.
The embodiments of the invention also provide a knowledge graph generation apparatus, device, and readable storage medium corresponding to the above knowledge graph generation method, which have the above technical effects.
To enable those skilled in the art to better understand the solution of the invention, the invention is described in further detail below with reference to the drawings and specific embodiments. Obviously, the described embodiments are only some of the embodiments of the invention, not all of them. All other embodiments obtained by those of ordinary skill in the art from the embodiments of the invention without creative effort fall within the protection scope of the invention.
Embodiment 1:
Referring to Fig. 1, Fig. 1 is a flowchart of a knowledge graph generation method in an embodiment of the invention. The method comprises the following steps:
S101: obtain triple entity structure vectors and the description document of each entity in the triple entity structure vectors.
Triple entity structure vectors can be obtained directly from an existing knowledge graph, or they can be obtained when entities are extracted. To make it easier to determine the relationships between entities, the description document of each entity in the triple entity structure vectors is obtained. One or more triple entity structure vectors may be obtained; correspondingly, an entity in a triple entity structure may correspond to one description document or to multiple description documents.
S102: perform word segmentation and statistical processing on the description documents to obtain a document-word distribution matrix, and use the document-word distribution matrix to obtain entity description feature vectors.
Preferably, to save storage space and improve search efficiency, the description documents may also be preprocessed, for example by removing stop words and punctuation marks. Stop words mainly include English characters, digits, mathematical symbols, punctuation marks, and single Chinese characters that are used with extremely high frequency. Stop words fall roughly into two classes: words that are used very widely, and words without substantive meaning, including modal particles, adverbs, prepositions, and conjunctions. The latter usually have no definite meaning of their own and only play a role when placed in a complete sentence, such as the common "是" ("is").
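The preprocessing described above can be sketched as follows. This is a minimal illustration, assuming already-segmented English tokens and a tiny hand-picked stop-word list; the names `STOP_WORDS` and `preprocess` are hypothetical, and a real system would use a much larger list suited to the language of the description documents.

```python
import string

# Illustrative stop-word list (an assumption, not taken from the source).
STOP_WORDS = {"the", "is", "a", "of", "and", "on"}


def preprocess(tokens):
    """Drop punctuation, digit-only tokens, and stop words from a token list."""
    kept = []
    for token in tokens:
        word = token.strip(string.punctuation).lower()
        if not word or word.isdigit() or word in STOP_WORDS:
            continue
        kept.append(word)
    return kept


tokens = "Rice paper is the classic painting paper of China , with 1500 years of history .".split()
print(preprocess(tokens))
```

Removing these tokens before vectorization shrinks the vocabulary, which is the storage and search-efficiency benefit the text refers to.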
The process of obtaining the entity description feature vectors comprises:
Step 1: model the words in the description documents and vectorize the words in the documents to obtain a document vector matrix;
Step 2: cluster and merge words according to word vector similarity and count word frequencies to obtain the document-word distribution matrix;
Step 3: decompose the document-word distribution matrix to obtain a document-topic distribution matrix and a topic-word distribution matrix;
Step 4: combine the document-topic distribution matrix and the topic-word distribution matrix to determine, for each entity description, the keywords with the highest degree of association, and obtain a keyword vector matrix;
Step 5: convert the keyword vector matrix into the multi-domain knowledge graph triple information space to obtain the entity description feature vectors.
For ease of description, the above five steps are explained together below.
Obtaining the document vector matrix may specifically be: performing word segmentation on the description documents to obtain a document word set; and inputting the document word set into a word vectorization model to obtain the document vector matrix.
When the words in the description documents are modeled, the words can be vectorized and a document-word distribution matrix obtained; the document-word distribution matrix is then input into a document topic generation model for document-word matrix modeling, giving the document-topic distribution matrix and the topic-keyword distribution matrix. A word vectorization model such as word2vec can be used to model the words of the description documents. Word2vec is an effective predictive model with two variants: the Continuous Bag-of-Words model (CBOW) and the Skip-Gram model. Specifically, word segmentation is performed on the description documents to obtain a document word set, and the document word set is input into the word vectorization model to obtain a document word vector set. Taking word2vec as an example, the document word set is used as input to train the word2vec model, and the word vector output for each word is added to the document word vector set. After the document word vector set is obtained, identical or similar words are merged by unsupervised clustering, the word frequencies after merging are counted, and the document-word distribution matrix is generated. The clustering and merging can be based on vector similarity, for example on the similarity distance between vectors. The document-word distribution matrix is input into a document topic generation model, such as the LDA model (Latent Dirichlet Allocation, a document topic generation model) or the NMF model, for modeling, giving the document-topic distribution matrix and the topic-keyword distribution matrix. NMF (non-negative matrix factorization) is a matrix decomposition method.
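The clustering, merging, and word-frequency counting described above can be sketched as follows. This is a minimal illustration under stated assumptions: the word vectors are small hand-written toy vectors standing in for the output of a trained word vectorization model, the clustering is a simple greedy threshold on cosine similarity, and all names (`cosine`, `cluster_words`, `document_word_matrix`, the threshold 0.95) are hypothetical rather than taken from the source.

```python
import math


def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def cluster_words(vectors, threshold=0.95):
    """Greedy clustering: map each word to the first representative it is similar to."""
    representatives = []  # list of (representative word, vector)
    assignment = {}
    for word, vec in vectors.items():
        for rep_word, rep_vec in representatives:
            if cosine(vec, rep_vec) >= threshold:
                assignment[word] = rep_word
                break
        else:
            # No similar representative found: the word starts a new cluster.
            representatives.append((word, vec))
            assignment[word] = word
    return assignment


def document_word_matrix(docs, assignment):
    """Count merged-word frequencies per document (rows: documents, columns: words)."""
    vocab = sorted(set(assignment.values()))
    index = {w: j for j, w in enumerate(vocab)}
    matrix = [[0] * len(vocab) for _ in docs]
    for i, doc in enumerate(docs):
        for word in doc:
            matrix[i][index[assignment[word]]] += 1
    return vocab, matrix


# Toy vectors standing in for trained word vectors (assumed values).
vectors = {"paper": [1.0, 0.0], "papers": [0.99, 0.05], "ink": [0.0, 1.0], "silk": [0.6, 0.8]}
docs = [["paper", "ink", "papers"], ["silk", "paper"]]
assignment = cluster_words(vectors)
vocab, matrix = document_word_matrix(docs, assignment)
print(vocab, matrix)
```

Here "papers" is merged into "paper" because their vectors are nearly parallel, so the first document counts "paper" twice.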
The modeling process is described in detail below, taking the LDA model as an example.
The probability of a word in a document is calculated using $P(w \mid d) = P(w \mid t) \cdot P(t \mid d)$, where $w$ is a word in the documents, of which there are $n$ in total; $t$ is a topic, of which there are $k$ in total; and $d$ is an entity description document, of which there are $m$ in total. $P(w \mid d)$ is the probability of the word in the description document, $P(w \mid t)$ is the probability of the word under topic $t$, and $P(t \mid d)$ is the probability of topic $t$ in the document. Referring to Fig. 2, Fig. 2 is a schematic diagram of the document-topic distribution matrix and the topic-word distribution matrix in an embodiment of the invention. The matrix of $m$ rows and $n$ columns is the document-word distribution matrix; the probability distributions of words under all topics are combined into a topic-word probability matrix of $k$ rows and $n$ columns, that is, the topic-word distribution matrix; and the topic probabilities of all documents are combined into a document-topic probability matrix of $m$ rows and $k$ columns, that is, the document-topic distribution matrix.
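The relationship between the three matrices can be checked with a small numeric sketch. It assumes that the product above is summed over all k topics, which is the usual reading for a topic mixture, so that the m-by-n document-word matrix is the product of the m-by-k document-topic matrix and the k-by-n topic-word matrix; the toy probabilities and the helper `matmul` are illustrative only.

```python
def matmul(a, b):
    """Multiply an (m x k) matrix by a (k x n) matrix given as nested lists."""
    return [
        [sum(a[i][t] * b[t][j] for t in range(len(b))) for j in range(len(b[0]))]
        for i in range(len(a))
    ]


# Toy values: m = 2 documents, k = 2 topics, n = 3 words (assumed numbers).
doc_topic = [[0.5, 0.5], [1.0, 0.0]]             # P(t | d), each row sums to 1
topic_word = [[0.5, 0.5, 0.0], [0.0, 0.5, 0.5]]  # P(w | t), each row sums to 1

doc_word = matmul(doc_topic, topic_word)         # P(w | d), shape m x n
print(doc_word)
```

Each row of the result is again a probability distribution over the n words, which is what a topic model tries to match against the observed document-word distribution matrix.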
LDA model loss function: [formula missing in source], where $v_{i,j}$ is the frequency with which word $w_j$ of the document word vector set occurs in the description of entity $H_i$, [symbol missing in source] is the entity description topic vector, and [symbol missing in source] is the topic distribution vector of the corresponding word. When $L_{loss}$ is minimal, the performance of the LDA model is optimal, and the document-topic distribution matrix and the topic-word distribution matrix are output.
Then, by looking up the document-topic distribution matrix and the topic-word distribution matrix, the keywords with the highest degree of association are determined for each entity description (the number of keywords can be denoted $e$), and the keyword vector matrix is obtained. The keyword vector matrix is then converted into the multi-domain knowledge graph triple information space to obtain the entity description feature vectors, that is, the vector representation of the relation in the triple structure. Specifically, the mapping can be performed by a deep learning method: the keyword matrix is input into a neural network for mapping, and the entity description feature vectors are obtained.
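The keyword lookup described above can be sketched as follows. The source does not define the degree of association, so this sketch assumes it is the word probability obtained by combining the document's topic probabilities with the topic-word probabilities; the function name `top_keywords`, the toy matrices, and the default e = 2 are hypothetical. The neural network mapping step is not shown.

```python
def top_keywords(doc_topic, topic_word, vocab, e=2):
    """For each document, return the e words with the highest combined probability."""
    keywords = []
    for topic_probs in doc_topic:
        # Assumed association score: sum over topics of P(t | d) * P(w | t).
        scores = [
            sum(p * topic_word[t][j] for t, p in enumerate(topic_probs))
            for j in range(len(vocab))
        ]
        ranked = sorted(range(len(vocab)), key=lambda j: scores[j], reverse=True)
        keywords.append([vocab[j] for j in ranked[:e]])
    return keywords


vocab = ["ink", "paper", "silk"]
doc_topic = [[0.75, 0.25], [0.0, 1.0]]             # one row per entity description
topic_word = [[0.5, 0.5, 0.0], [0.0, 0.25, 0.75]]  # one row per topic
print(top_keywords(doc_topic, topic_word, vocab))
```

The vectors of the selected keywords would then form the keyword vector matrix that is passed to the mapping network.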
After the entity description feature vectors are obtained, step S103 can be performed.
S103: add the entity description feature vectors to the triple entity structure vectors to obtain initial triple structure vectors.
The triple structure is specifically "entity-relation-entity". After the triple entity structure vectors and the entity description feature vectors corresponding to them are obtained, the triple entity structure vectors and the entity description feature vectors can be added to obtain the initial triple structure vectors, that is, the initial triple structures.
Specifically, a convolutional neural network is used to fuse the entity description feature vectors into the triple entity structure vectors, giving the initial triple structure vectors. The vector fusion can follow the fusion formula [formula missing in source], where $h$, $t$ are the triple entity structure vectors and $\tilde{h}$, $\tilde{t}$ are the entity description feature vectors output by the neural network. When the parameters of the convolutional neural network are adjusted, stochastic gradient descent is preferably used in order to reduce the complexity of the system and the algorithm.
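The addition step can be illustrated with a minimal sketch. Because the fusion formula is not available in this text, the sketch assumes plain element-wise addition of a structure vector and a description feature vector of the same dimension; it does not reproduce the convolutional network, and the name `fuse` and the toy values are hypothetical.

```python
def fuse(structure_vec, description_vec):
    """Element-wise addition of a structure vector and a description feature vector."""
    if len(structure_vec) != len(description_vec):
        raise ValueError("vectors must have the same dimension")
    return [s + d for s, d in zip(structure_vec, description_vec)]


h = [0.5, 1.0, -0.25]      # head entity structure vector (toy values)
h_desc = [0.25, 0.0, 0.25]  # head entity description feature vector (toy values)
print(fuse(h, h_desc))
```

The dimension check reflects why the keyword vectors are first mapped into the triple information space: the two vectors can only be added once they live in the same space.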
After the initial triple structure vectors are obtained, step S104 can be performed.
S104: screen the initial triple structure vectors to obtain target triple vectors, and generate a knowledge graph from the target triple vectors.
Screening the initial triple structure vectors means deleting the triple vectors that are unreliable or wrong. Specifically, the fused structures are screened with a preset reliability evaluation function to obtain the target triple vectors. The preset reliability evaluation function may specifically be:
$$F_{whole} = \sum_{(h, r, t) \in T} \sum_{(h', r, t') \in T'} \max\bigl(f(h, r, t) - f(h', r, t') + \alpha,\ 0\bigr) + L_{loss}$$
where [definition of $f$ missing in source]; $f(h, r, t)$ corresponds to a correct triple vector and $f(h', r, t')$ to a wrong triple vector; $T'$ is the set of wrong triple samples, $T' = \{(h', r, t) \mid h' \in H\} \cup \{(h, r, t') \mid t' \in T\}$; and $\alpha$ is a hyperparameter that is always greater than 0. The correct triple vectors can be taken directly as the target triple vectors, and the knowledge graph is then generated from the target triple vectors. For how to generate a knowledge graph from the target triple vectors, reference may be made to the composition structure of knowledge graphs, which is not repeated here.
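The margin term of the evaluation function can be sketched numerically. The definition of f is not available in this text, so the sketch assumes a translation-style score, the L1 norm of h + r - t, where a lower score means a more plausible triple; this choice, the names `score` and `margin_loss`, and the toy vectors are assumptions, not the source's definition.

```python
def score(h, r, t):
    """Assumed score: L1 norm of h + r - t (lower means more plausible)."""
    return sum(abs(hi + ri - ti) for hi, ri, ti in zip(h, r, t))


def margin_loss(positives, negatives, alpha=1.0, l_loss=0.0):
    """Sum max(f(pos) - f(neg) + alpha, 0) over all pairs, plus the L_loss term."""
    total = l_loss
    for pos in positives:
        for neg in negatives:
            total += max(score(*pos) - score(*neg) + alpha, 0.0)
    return total


h, r = [1.0, 0.0], [0.0, 1.0]
positives = [(h, r, [1.0, 1.0])]                      # h + r equals t, score 0
negatives = [(h, r, [3.0, 1.0]), (h, r, [1.5, 1.0])]  # corrupted tails, scores 2 and 0.5
print(margin_loss(positives, negatives))
```

A corrupted triple only contributes to the loss when its score is within the margin alpha of the correct triple's score, so training pushes wrong triples at least alpha away from correct ones.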
In addition, if the triple entity structure vectors in step S101 were obtained from an existing knowledge graph, the existing knowledge graph can now be expanded and corrected on the basis of the target triples. Specifically, wrong entity relations are replaced, or relationships between new entities are added, that is, the knowledge graph is extended.
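Expanding and correcting an existing graph can be sketched as follows. For simplicity the sketch assumes that each ordered entity pair carries exactly one relation, so that a target triple either overwrites a wrong relation or adds a new pair; this restriction, the function name `update_graph`, and the example triples are hypothetical.

```python
def update_graph(existing, targets):
    """Replace relations for entity pairs found in targets and add new pairs."""
    graph = {(h, t): r for h, r, t in existing}
    for h, r, t in targets:
        graph[(h, t)] = r  # overwrites a wrong relation or adds a new entity pair
    return sorted((h, r, t) for (h, t), r in graph.items())


existing = [
    ("traditional Chinese painting", "invented_by", "rice paper"),  # wrong relation
    ("rice paper", "origin", "Jing County"),
]
targets = [
    ("traditional Chinese painting", "carrier", "rice paper"),  # correction
    ("traditional Chinese painting", "tool", "brush"),          # extension
]
print(update_graph(existing, targets))
```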
Once a larger knowledge graph with more accurate relations has been obtained, it can be applied in fields such as search and recommendation and question answering systems (such as customer service robots and personal assistants, which are in essence extensions of search and recommendation). In semantic search, search based on a knowledge graph differs from conventional search. Conventional search finds the set of web pages that match the keywords, ranks the pages in the set with algorithms such as PageRank, and then shows them to the user. Search based on a knowledge graph traverses the knowledge in the existing graph knowledge base and returns the knowledge found to the user; if the path is correct, usually only one or a few pieces of knowledge are found, which is quite precise. In a question answering system, the system likewise first uses the knowledge graph to perform semantic and syntactic analysis of the question the user asks in natural language, converts it into a structured query statement, and then queries the knowledge graph for the answer.
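The lookup that a knowledge-graph-based search performs once a question has been turned into a structured query can be sketched as follows. The sketch assumes the query has already been reduced to a (head entity, relation) pair; the analysis of the natural-language question is not shown, and the names `build_index` and `query` and the example triples are hypothetical.

```python
from collections import defaultdict


def build_index(triples):
    """Index triples by (head, relation) so tails can be looked up directly."""
    index = defaultdict(list)
    for h, r, t in triples:
        index[(h, r)].append(t)
    return index


def query(index, head, relation):
    """Return all tail entities for the structured query (head, relation, ?)."""
    return index.get((head, relation), [])


triples = [
    ("traditional Chinese painting", "carrier", "rice paper"),
    ("traditional Chinese painting", "carrier", "silk"),
    ("rice paper", "origin", "Jing County"),
]
index = build_index(triples)
print(query(index, "traditional Chinese painting", "carrier"))
```

Unlike keyword search, which returns a ranked set of pages, the lookup returns only the few entities that actually stand in the requested relation.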
With the method provided by the embodiments of the invention, triple entity structure vectors and the description document of each entity in the triple entity structure vectors are obtained; word segmentation and statistical processing are performed on the description documents to obtain a document-word distribution matrix, which is used to obtain entity description feature vectors; the entity description feature vectors are added to the triple entity structure vectors to obtain initial triple structure vectors; and the initial triple structure vectors are screened to obtain target triple vectors, from which a knowledge graph is generated.
The description document of an entity often describes the relationship between that entity and another entity; in other words, statistical analysis of the description documents of entities can reveal the relationship between one entity and another. On this basis, when the triple entity structure vectors are obtained, the description document of each entity in them can also be obtained. Word segmentation and statistical processing of the description documents then yields a document-word distribution matrix, from which entity description feature vectors are obtained. Adding the entity description feature vectors to the triple entity structure vectors gives the initial triple structure vectors, and screening these gives the target triple vectors used to generate the knowledge graph. Because the method uses computer technology to analyze the description documents of entities and obtain the relationships between entities, it can obtain more, and more accurate, triples and can therefore construct a large-scale and accurate knowledge graph, which in turn can improve the performance of knowledge-graph-based knowledge recommendation, question answering, and retrieval applications.
Embodiment two:
Corresponding to the method embodiment above, an embodiment of the invention also provides a knowledge graph generation apparatus. The knowledge graph generation apparatus described below and the knowledge graph generation method described above may be read with reference to each other.
Referring to Fig. 3, the apparatus comprises the following modules:
A description information acquisition module 101, configured to obtain a triple entity structure vector and the description document of each entity in the triple entity structure vector;
An entity description feature vector acquisition module 102, configured to perform word segmentation and statistical processing on the description documents to obtain a document-word distribution matrix, and to use the document-word distribution matrix to obtain entity description feature vectors;
A vector fusion module 103, configured to add the entity description feature vectors to the triple entity structure vector to obtain initial triple structure vectors;
A knowledge graph generation module 104, configured to screen the initial triple structure vectors to obtain target triple vectors, and to generate the knowledge graph from the target triple vectors.
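The way the four modules chain together can be sketched as follows. This is only an outline under stated assumptions: each module is modeled as a plain callable, and the class name `KnowledgeGraphGenerator` and its field names are hypothetical, not taken from the patent.

```python
from dataclasses import dataclass
from typing import Any, Callable, Tuple


@dataclass
class KnowledgeGraphGenerator:
    """Wires four callables in the order of modules 101 to 104."""
    acquire: Callable[[], Tuple[Any, Any]]   # module 101: structure vectors + description documents
    describe: Callable[[Any], Any]           # module 102: documents -> description feature vectors
    fuse: Callable[[Any, Any], Any]          # module 103: addition of the two representations
    build: Callable[[Any], Any]              # module 104: screening + graph generation

    def run(self) -> Any:
        structure_vectors, documents = self.acquire()
        description_vectors = self.describe(documents)
        initial_vectors = self.fuse(description_vectors, structure_vectors)
        return self.build(initial_vectors)


# Toy stand-ins for the real modules, used only to show the data flow.
generator = KnowledgeGraphGenerator(
    acquire=lambda: ([1, 2], ["doc a", "doc b"]),
    describe=lambda docs: [len(d) for d in docs],
    fuse=lambda desc, struct: [a + b for a, b in zip(desc, struct)],
    build=lambda vectors: [v for v in vectors if v > 6],
)
result = generator.run()
```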
With the apparatus provided by this embodiment of the invention, a triple entity structure vector and the description document of each entity in the triple entity structure vector are obtained; word segmentation and statistical processing are performed on the description documents to obtain a document-word distribution matrix, and the document-word distribution matrix is used to obtain entity description feature vectors; the entity description feature vectors are added to the triple entity structure vector to obtain initial triple structure vectors; and the initial triple structure vectors are screened to obtain target triple vectors, from which the knowledge graph is generated.
The description document of an entity often describes that entity in terms of another entity, so statistical analysis of an entity's description document can reveal the relationship between that entity and another entity. On this basis, once the triple entity structure vector has been obtained, the description document of each entity in it can be obtained. Word segmentation and statistical processing of the description documents then yield a document-word distribution matrix, from which entity description feature vectors are obtained. Adding the entity description feature vectors to the triple entity structure vector gives the initial triple structure vectors, and screening these gives the target triple vectors used to generate the knowledge graph. Because the description documents are analyzed with computer technology to derive the relationships between entities, more, and more accurate, triples can be obtained. A large-scale, accurate knowledge graph can therefore be built, which in turn can improve the performance of knowledge-graph-based knowledge recommendation, question answering, and retrieval applications.
In a specific embodiment of the invention, the entity description feature vector acquisition module 102 comprises:
A document-word distribution matrix acquisition unit, configured to model the words in the description documents and vectorize the words in the documents to obtain a message vector matrix; and to cluster and merge words according to word-vector similarity and count word frequencies to obtain the document-word distribution matrix;
An entity description feature vector acquisition unit, configured to decompose the document-word distribution matrix to obtain a document-topic distribution matrix and a topic-word distribution matrix; to combine the document-topic distribution matrix and the topic-word distribution matrix to determine, for each entity description, the keyword with the greatest degree of association, thereby obtaining a keyword vector matrix; and to convert the keyword vector matrix into the multi-domain knowledge graph triple information space to obtain the entity description feature vectors.
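The keyword selection performed by the second unit can be sketched as below. The text does not say how the two distribution matrices are combined, so this sketch assumes one plausible rule: the association of word w with document d is the sum over topics of (document-topic weight) times (topic-word weight), and the word with the highest score is taken as the keyword. The function name `top_keywords` and the toy matrices are hypothetical.

```python
from typing import List


def top_keywords(
    doc_topic: List[List[float]],   # rows: documents, columns: topics
    topic_word: List[List[float]],  # rows: topics, columns: vocabulary entries
    vocabulary: List[str],
) -> List[str]:
    """Return, for each document, the word with the largest combined score."""
    keywords = []
    for topic_weights in doc_topic:
        scores = [
            sum(weight * topic_word[k][w] for k, weight in enumerate(topic_weights))
            for w in range(len(vocabulary))
        ]
        best = max(range(len(vocabulary)), key=scores.__getitem__)
        keywords.append(vocabulary[best])
    return keywords


keywords = top_keywords(
    doc_topic=[[0.9, 0.1], [0.2, 0.8]],
    topic_word=[[0.7, 0.2, 0.1], [0.1, 0.1, 0.8]],
    vocabulary=["engine", "wheel", "fruit"],
)
```

The word vectors of the selected keywords would then be stacked to form the keyword vector matrix.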
In a specific embodiment of the invention, the document-word distribution matrix acquisition unit is specifically configured to perform word segmentation on the description documents to obtain a document word set, and to feed the document word set into a word vector model to obtain a document vector matrix.
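Once the documents have been segmented and each word has a vector, the clustering and word-frequency counting described for this unit can be sketched as follows. The greedy clustering rule, the cosine-similarity threshold of 0.9, and all function names are assumptions made for illustration; the text does not specify the clustering algorithm.

```python
import math
from typing import Dict, List


def cosine(a: List[float], b: List[float]) -> float:
    """Cosine similarity; returns 0.0 if either vector is all zeros."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def cluster_words(vectors: Dict[str, List[float]], threshold: float = 0.9) -> Dict[str, int]:
    """Greedily merge each word into the first cluster whose representative is similar enough."""
    representatives: List[List[float]] = []
    assignment: Dict[str, int] = {}
    for word, vec in vectors.items():
        for idx, rep in enumerate(representatives):
            if cosine(vec, rep) >= threshold:
                assignment[word] = idx
                break
        else:  # no similar cluster found: start a new one
            assignment[word] = len(representatives)
            representatives.append(vec)
    return assignment


def document_word_matrix(docs: List[List[str]], assignment: Dict[str, int]) -> List[List[int]]:
    """Count, per document, how often each merged word cluster occurs."""
    n_clusters = max(assignment.values()) + 1 if assignment else 0
    matrix = []
    for tokens in docs:
        row = [0] * n_clusters
        for token in tokens:
            if token in assignment:
                row[assignment[token]] += 1
        matrix.append(row)
    return matrix


word_vectors = {"car": [1.0, 0.0], "automobile": [0.99, 0.05], "apple": [0.0, 1.0]}
clusters = cluster_words(word_vectors)
matrix = document_word_matrix([["car", "automobile", "apple"], ["apple", "apple"]], clusters)
```

Merging near-synonyms before counting keeps the matrix smaller and avoids splitting one concept's frequency across several columns.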
In a specific embodiment of the invention, the entity description feature vector acquisition unit is specifically configured to input the document-word distribution matrix into a document topic generation model for document-word matrix modeling, obtaining the document-topic distribution matrix and the topic-word distribution matrix.
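To make the shape of this decomposition concrete, the sketch below factorizes a small document-word matrix V into a document-topic matrix W and a topic-word matrix H with V ≈ W·H. It uses non-negative matrix factorization with multiplicative updates as a stand-in; the patent names only a "document topic generation model" and does not say which model is used, so this is a swapped-in technique chosen for brevity, not the method of the patent. All function names are hypothetical.

```python
import random
from typing import List, Tuple

Matrix = List[List[float]]


def matmul(a: Matrix, b: Matrix) -> Matrix:
    cols = list(zip(*b))
    return [[sum(x * y for x, y in zip(row, col)) for col in cols] for row in a]


def transpose(m: Matrix) -> Matrix:
    return [list(col) for col in zip(*m)]


def frobenius_error(v: Matrix, w: Matrix, h: Matrix) -> float:
    """Squared reconstruction error between V and W·H."""
    wh = matmul(w, h)
    return sum((v[i][j] - wh[i][j]) ** 2 for i in range(len(v)) for j in range(len(v[0])))


def factorize(v: Matrix, n_topics: int, n_iter: int = 200, seed: int = 0,
              eps: float = 1e-9) -> Tuple[Matrix, Matrix]:
    """Multiplicative-update NMF: returns (document-topic W, topic-word H)."""
    rng = random.Random(seed)
    n_docs, n_words = len(v), len(v[0])
    w = [[rng.random() + 0.1 for _ in range(n_topics)] for _ in range(n_docs)]
    h = [[rng.random() + 0.1 for _ in range(n_words)] for _ in range(n_topics)]
    for _ in range(n_iter):
        wt = transpose(w)
        num, den = matmul(wt, v), matmul(matmul(wt, w), h)
        h = [[h[k][j] * num[k][j] / (den[k][j] + eps) for j in range(n_words)]
             for k in range(n_topics)]
        ht = transpose(h)
        num, den = matmul(v, ht), matmul(w, matmul(h, ht))
        w = [[w[i][k] * num[i][k] / (den[i][k] + eps) for k in range(n_topics)]
             for i in range(n_docs)]
    return w, h


counts = [[2, 1, 0], [3, 1, 0], [0, 0, 4], [0, 1, 5]]  # 4 documents x 3 words
doc_topic, topic_word = factorize(counts, n_topics=2)
```

A probabilistic topic model would additionally normalize the rows into distributions; the factorization here only illustrates the two output matrices and their dimensions.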
In a specific embodiment of the invention, the entity description feature vector acquisition unit is specifically configured to input the keyword vector matrix into a neural network for mapping, obtaining the entity description feature vectors.
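The mapping can be pictured as a learned projection from the keyword vector space into the triple information space. The sketch below assumes the simplest possible network, a single fully connected layer with a tanh activation; the network architecture, the weights, and the names `dense_tanh` and `map_keywords` are hypothetical, since the text does not describe the network.

```python
import math
from typing import List


def dense_tanh(x: List[float], weights: List[List[float]], bias: List[float]) -> List[float]:
    """One dense layer: weights has shape (output_dim, input_dim)."""
    return [
        math.tanh(sum(w_j * x_j for w_j, x_j in zip(row, x)) + b)
        for row, b in zip(weights, bias)
    ]


def map_keywords(keyword_matrix: List[List[float]], weights: List[List[float]],
                 bias: List[float]) -> List[List[float]]:
    """Map each keyword vector (one row per entity description) into the target space."""
    return [dense_tanh(row, weights, bias) for row in keyword_matrix]


# Illustrative weights only; in practice they would be learned during training.
features = map_keywords(
    [[1.0, 0.0], [0.0, 0.0]],
    weights=[[0.0, 0.0], [1.0, 0.0]],
    bias=[0.0, 0.0],
)
```

The output dimension must match the dimension of the triple entity structure vectors so that the two can be added in the fusion step.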
In a specific embodiment of the invention, the knowledge graph generation module 104 is specifically configured to screen the initial triple structure vectors using a preset reliability evaluation function to obtain the target triple vectors.
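The text does not define the preset reliability evaluation function. As one hedged illustration, the sketch below assumes a translation-style criterion in which a triple (head, relation, tail) is considered reliable when head + relation lies close to tail, and keeps only triples whose distance is within a chosen threshold. The scoring rule, the threshold, and the function names are assumptions.

```python
import math
from typing import List, Tuple

Vector = List[float]
Triple = Tuple[Vector, Vector, Vector]  # assumed layout: (head, relation, tail)


def translation_distance(triple: Triple) -> float:
    """Euclidean norm of head + relation - tail; smaller means more plausible."""
    head, relation, tail = triple
    return math.sqrt(sum((h + r - t) ** 2 for h, r, t in zip(head, relation, tail)))


def screen(triples: List[Triple], max_distance: float) -> List[Triple]:
    """Keep the triples whose distance does not exceed the threshold."""
    return [t for t in triples if translation_distance(t) <= max_distance]


plausible = ([1.0, 0.0], [0.0, 1.0], [1.0, 1.0])    # head + relation equals tail
implausible = ([1.0, 0.0], [0.0, 1.0], [5.0, 5.0])  # far from head + relation
targets = screen([plausible, implausible], max_distance=1.0)
```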
Embodiment three:
Corresponding to the method embodiment above, an embodiment of the invention also provides a knowledge graph generation device. The knowledge graph generation device described below and the knowledge graph generation method described above may be read with reference to each other.
Referring to Fig. 4, the knowledge graph generation device includes:
A memory D1, configured to store a computer program;
A processor D2, configured to implement the steps of the knowledge graph generation method of the method embodiment above when executing the computer program.
Specifically, refer to Fig. 5, which is a schematic diagram of a specific structure of the knowledge graph generation device provided by this embodiment. The knowledge graph generation device may vary considerably with configuration or performance, and may include one or more central processing units (CPUs) 322 (for example, one or more processors), a memory 332, and one or more storage media 330 (for example, one or more mass storage devices) storing application programs 342 or data 344. The memory 332 and the storage medium 330 may provide transient or persistent storage. The program stored in the storage medium 330 may include one or more modules (not shown in the figure), and each module may include a series of instruction operations on the data processing device. Further, the central processing unit 322 may be configured to communicate with the storage medium 330 and to execute, on the knowledge graph generation device 301, the series of instruction operations in the storage medium 330.
The knowledge graph generation device 301 may also include one or more power supplies 326, one or more wired or wireless network interfaces 350, one or more input/output interfaces 358, and/or one or more operating systems 341, for example Windows Server™, Mac OS X™, Unix™, Linux™, or FreeBSD™.
The steps of the knowledge graph generation method described above can be implemented by this structure of the knowledge graph generation device.
Embodiment four:
Corresponding to the method embodiment above, an embodiment of the invention also provides a readable storage medium. The readable storage medium described below and the knowledge graph generation method described above may be read with reference to each other.
A readable storage medium stores a computer program which, when executed by a processor, implements the steps of the knowledge graph generation method of the method embodiment above.
The readable storage medium may specifically be a USB flash drive, a removable hard disk, a read-only memory (ROM), a random access memory (RAM), a magnetic disk, an optical disc, or any other readable storage medium capable of storing program code.
Those skilled in the art will further appreciate that the units and algorithm steps of the examples described in connection with the embodiments disclosed herein can be implemented in electronic hardware, computer software, or a combination of the two. To illustrate clearly the interchangeability of hardware and software, the composition and steps of each example have been described above in general terms of their function. Whether these functions are implemented in hardware or in software depends on the specific application and the design constraints of the technical solution. Skilled persons may implement the described functions in different ways for each specific application, but such implementations should not be considered to go beyond the scope of the invention.

Claims (10)

1. A knowledge graph generation method, characterized by comprising:
Obtaining a triple entity structure vector and the description document of each entity in the triple entity structure vector;
Performing word segmentation and statistical processing on the description documents to obtain a document-word distribution matrix, and using the document-word distribution matrix to obtain entity description feature vectors;
Adding the entity description feature vectors to the triple entity structure vector to obtain initial triple structure vectors;
Screening the initial triple structure vectors to obtain target triple vectors, and generating a knowledge graph from the target triple vectors.
2. The knowledge graph generation method according to claim 1, characterized in that performing word segmentation and statistical processing on the description documents to obtain a document-word distribution matrix comprises:
Modeling the words in the description documents and vectorizing the words in the documents to obtain a message vector matrix;
Clustering and merging words according to word-vector similarity and counting word frequencies to obtain the document-word distribution matrix.
3. The knowledge graph generation method according to claim 2, characterized in that modeling the words in the description documents and vectorizing the words in the documents to obtain a message vector matrix comprises:
Performing word segmentation on the description documents to obtain a document word set;
Feeding the document word set into a word vector model to obtain the message vector matrix.
4. The knowledge graph generation method according to claim 1, characterized in that using the document-word distribution matrix to obtain entity description feature vectors comprises:
Decomposing the document-word distribution matrix to obtain a document-topic distribution matrix and a topic-word distribution matrix;
Combining the document-topic distribution matrix and the topic-word distribution matrix to determine, for each entity description, the keyword with the greatest degree of association, thereby obtaining a keyword vector matrix;
Converting the keyword vector matrix into the multi-domain knowledge graph triple information space to obtain the entity description feature vectors.
5. The knowledge graph generation method according to claim 4, characterized in that decomposing the document-word distribution matrix to obtain a document-topic distribution matrix and a topic-word distribution matrix comprises:
Inputting the document-word distribution matrix into a document topic generation model for document-word matrix modeling, to obtain the document-topic distribution matrix and the topic-word distribution matrix.
6. The knowledge graph generation method according to claim 4, characterized in that converting the keyword vector matrix into the multi-domain knowledge graph triple information space to obtain the entity description feature vectors comprises:
Inputting the keyword vector matrix into a neural network for mapping, to obtain the entity description feature vectors.
7. The knowledge graph generation method according to any one of claims 1 to 6, characterized in that screening the initial triple structure vectors to obtain target triple vectors comprises:
Screening the initial triple structure vectors using a preset reliability evaluation function to obtain the target triple vectors.
8. A knowledge graph generation apparatus, characterized by comprising:
A description information acquisition module, configured to obtain a triple entity structure vector and the description document of each entity in the triple entity structure vector;
An entity description feature vector acquisition module, configured to perform word segmentation and statistical processing on the description documents to obtain a document-word distribution matrix, and to use the document-word distribution matrix to obtain entity description feature vectors;
A vector fusion module, configured to add the entity description feature vectors to the triple entity structure vector to obtain initial triple structure vectors;
A knowledge graph generation module, configured to screen the initial triple structure vectors to obtain target triple vectors, and to generate a knowledge graph from the target triple vectors.
9. A knowledge graph generation device, characterized by comprising:
A memory, configured to store a computer program;
A processor, configured to implement the steps of the knowledge graph generation method according to any one of claims 1 to 7 when executing the computer program.
10. A readable storage medium, characterized in that a computer program is stored on the readable storage medium, and the computer program, when executed by a processor, implements the steps of the knowledge graph generation method according to any one of claims 1 to 7.
Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201910292766.4A CN109992673A (en) 2019-04-10 2019-04-10 Knowledge graph generation method, apparatus, device and readable storage medium

Publications (1)

Publication Number Publication Date
CN109992673A true CN109992673A (en) 2019-07-09

Family

ID=67133594

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201910292766.4A Pending CN109992673A (en) 2019-04-10 2019-04-10 Knowledge graph generation method, apparatus, device and readable storage medium

Country Status (1)

Country Link
CN (1) CN109992673A (en)

Citations (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN106126734A (en) * 2016-07-04 2016-11-16 北京奇艺世纪科技有限公司 The sorting technique of document and device
CN109299284A (en) * 2018-08-31 2019-02-01 中国地质大学(武汉) A kind of knowledge mapping expression learning method based on structural information and text description
CN109522416A (en) * 2018-10-19 2019-03-26 广东工业大学 A kind of construction method of Financial Risk Control knowledge mapping

Cited By (16)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN110674307A (en) * 2019-08-21 2020-01-10 北京邮电大学 Knowledge deduction method and system for knowledge center network
CN110825822B (en) * 2019-09-30 2022-11-22 深圳云天励飞技术有限公司 Personnel relationship query method and device, electronic equipment and storage medium
CN110825822A (en) * 2019-09-30 2020-02-21 深圳云天励飞技术有限公司 Personnel relationship query method and device, electronic equipment and storage medium
CN111026875A (en) * 2019-11-26 2020-04-17 中国人民大学 Knowledge graph complementing method based on entity description and relation path
CN111472754A (en) * 2019-12-23 2020-07-31 北京国双科技有限公司 Fault processing method and device for oil pumping well, storage medium and electronic equipment
CN111159431A (en) * 2019-12-30 2020-05-15 深圳Tcl新技术有限公司 Knowledge graph-based information visualization method, device, equipment and storage medium
CN111353106A (en) * 2020-02-26 2020-06-30 贝壳技术有限公司 Recommendation method and device, electronic equipment and storage medium
CN111325033A (en) * 2020-03-20 2020-06-23 中国建设银行股份有限公司 Entity identification method, entity identification device, electronic equipment and computer readable storage medium
CN111325033B (en) * 2020-03-20 2023-07-11 中国建设银行股份有限公司 Entity identification method, entity identification device, electronic equipment and computer readable storage medium
WO2022205833A1 (en) * 2021-03-29 2022-10-06 网络通信与安全紫金山实验室 Method and system for constructing and analyzing knowledge graph of wireless network protocol, and device and medium
WO2022222226A1 (en) * 2021-04-19 2022-10-27 平安科技(深圳)有限公司 Structured-information-based relation alignment method and apparatus, and device and medium
CN113487143A (en) * 2021-06-15 2021-10-08 中国农业大学 Fish shoal feeding decision method and device, electronic equipment and storage medium
CN113569050A (en) * 2021-09-24 2021-10-29 湖南大学 Method and device for automatically constructing government affair field knowledge map based on deep learning
CN113569050B (en) * 2021-09-24 2021-12-07 湖南大学 Method and device for automatically constructing government affair field knowledge map based on deep learning
CN116091120A (en) * 2023-04-11 2023-05-09 北京智蚁杨帆科技有限公司 Full stack type electricity price consulting and managing system based on knowledge graph technology
CN116091120B (en) * 2023-04-11 2023-06-23 北京智蚁杨帆科技有限公司 Full stack type electricity price consulting and managing system based on knowledge graph technology

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
RJ01 Rejection of invention patent application after publication

Application publication date: 20190709