CN107665252A - Method and device for creating a knowledge graph - Google Patents

Publication number
CN107665252A
Authority
CN
China
Prior art keywords
entity
instance
attribute
association relation
sets
Prior art date
Application number
CN201710890548.1A
Other languages
Chinese (zh)
Other versions
CN107665252B (en)
Inventor
毛瑞彬
朱菁
张俊
王仁勇
邓永翠
赵洪杰
Original Assignee
深圳证券信息有限公司
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by 深圳证券信息有限公司
Priority to CN201710890548.1A
Publication of CN107665252A
Application granted
Publication of CN107665252B

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING; COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/30 Information retrieval; Database structures therefor; File system structures therefor of unstructured textual data
    • G06F16/36 Creation of semantic tools, e.g. ontology or thesauri
    • G06F16/367 Ontology
    • G PHYSICS
    • G06 COMPUTING; CALCULATING; COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/30 Information retrieval; Database structures therefor; File system structures therefor of unstructured textual data
    • G06F16/31 Indexing; Data structures therefor; Storage structures
    • G06F16/313 Selection or weighting of terms for indexing
    • G PHYSICS
    • G06 COMPUTING; CALCULATING; COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F40/00 Handling natural language data
    • G06F40/20 Natural language analysis
    • G06F40/279 Recognition of textual entities
    • G06F40/289 Phrasal analysis, e.g. finite state techniques or chunking
    • G06F40/295 Named entity recognition
    • G PHYSICS
    • G06 COMPUTING; CALCULATING; COUNTING
    • G06N COMPUTER SYSTEMS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 Computer systems based on biological models
    • G06N3/02 Computer systems based on biological models using neural network models
    • G06N3/04 Architectures, e.g. interconnection topology
    • G06N3/0454 Architectures, e.g. interconnection topology using a combination of multiple neural nets

Abstract

A method and device for creating a knowledge graph. The method is applied to a data analysis device and includes: obtaining a data source that includes multiple entities; performing semantic analysis and cluster analysis on the data source and extracting an entity set and an attribute set from it, where the attribute set includes the entity attributes of each entity in the entity set; obtaining the association relations between each entity in the entity set and its attributes; and creating and outputting a knowledge graph according to the entity set, the attribute set, and the association relations between entities and attributes, where the knowledge graph includes the entities, the entity attributes, the association relations between entities and attributes, and the association relations between entities. With this scheme, a knowledge graph can be created accurately, and the relations between entities and attributes, as well as the association relations between entities, can be presented intuitively.

Description

Method and device for creating a knowledge graph

Technical field

This application relates to the field of big data processing technology, and in particular to a method and device for creating a knowledge graph.

Background

A knowledge graph is a visual map of a knowledge domain: a series of different graphs that display the development process and structural relationships of knowledge. It can be used to present knowledge resources and their carriers, and to mine, analyze, construct, and display the association relations among knowledge. Knowledge graphs can be used for intelligent question answering in intelligent robots and have a wide range of applications.

In current mechanisms, however, when a knowledge graph is built, all entities obtained from the data source are analyzed, and association relations are then established among all entities and entity attributes. Although a knowledge graph built this way covers a wide scope, it cannot present important structural relations to the user intuitively. As a result, when managing the knowledge graph, effective information cannot be identified quickly, the reference value is limited, and the user must spend a long time on analysis, so key structural information cannot be presented in a targeted manner.

Summary of the invention

This application provides a method and device for creating a knowledge graph, which can solve the problem in the prior art that the constructed knowledge graph is poorly targeted.

A first aspect of this application provides a method for creating a knowledge graph. The method is applied to a data analysis device and includes:

obtaining a data source, the data source including multiple entities;

performing semantic analysis and cluster analysis on the data source, and extracting an entity set and an attribute set from the data source, the attribute set including the entity attributes of each entity in the entity set;

obtaining the association relations between each entity in the entity set and its attributes;

creating and outputting a knowledge graph according to the entity set, the attribute set, and the association relations between entities and attributes, the knowledge graph including the entities, the entity attributes, the association relations between entities and attributes, and the association relations between entities.

In some possible designs, the method further includes:

vectorizing each entity in the entity set separately to obtain training vectors.

In some possible designs, vectorizing each entity in the entity set separately to obtain training vectors includes:

performing named entity recognition on each entity in the entity set using a multilayer neural network to obtain the entity context of each entity;

extracting the association relations between entities from the obtained entity contexts;

obtaining the training vectors according to the entity context of each entity and the association relations between entities.

In some possible designs, after performing named entity recognition on each entity in the entity set using the multilayer neural network to obtain the entity context of each entity, and before extracting the association relations between entities from the obtained entity contexts, the method further includes:

maximizing the obtained entity context of each entity using the maximum log-likelihood method.

In some possible designs, after extracting the association relations between entities from the obtained entity contexts, and before obtaining the training vectors according to the entity context of each entity and the association relations between entities, the method further includes:

maximizing the obtained association relations between entities using the maximum log-likelihood method.

In some possible designs, extracting the association relations between entities from the obtained entity contexts includes:

labeling association relations for each entity in the entity set according to the attribute set, the entity set, and a recurrent neural network model, where each labeled association relation includes the position of the word within the entity, the association relation type, and the association relation position;

calculating the weight of each relation type using an association relation embedding method;

filtering candidate association relations out of the labeled association relations according to the minimum distance principle and the association relation type;

classifying the filtered candidate association relations according to the keywords of the association relation types, to obtain the association relations between the entities.
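
The filtering and classification steps above can be sketched as follows. This is only a minimal illustration under stated assumptions: the neural labeling step is replaced by pre-labeled token positions, and the names Mention, RelationCue, RELATION_KEYWORDS, nearest_pair, and extract_relations are hypothetical and do not come from the application.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Mention:
    entity: str    # entity name
    position: int  # token index of the entity in the sentence


@dataclass(frozen=True)
class RelationCue:
    keyword: str   # word that signals a relation
    position: int  # token index of that word


# Hypothetical keyword table: relation keyword -> relation type.
RELATION_KEYWORDS = {
    "invested": "investment",
    "appointed": "employment",
    "sued": "litigation",
}


def nearest_pair(cue, mentions):
    """Minimum distance principle: pick the two distinct entities closest to the cue."""
    ranked = sorted(mentions, key=lambda m: abs(m.position - cue.position))
    distinct = []
    for mention in ranked:
        if mention.entity not in {d.entity for d in distinct}:
            distinct.append(mention)
        if len(distinct) == 2:
            break
    if len(distinct) < 2:
        return None
    # Order the pair by position so the earlier entity is the subject.
    return tuple(sorted(distinct, key=lambda m: m.position))


def extract_relations(mentions, cues):
    """Keep cues with a known keyword and attach the nearest entity pair."""
    triples = []
    for cue in cues:
        relation_type = RELATION_KEYWORDS.get(cue.keyword)
        if relation_type is None:
            continue  # not a known relation type, so not a candidate
        pair = nearest_pair(cue, mentions)
        if pair is not None:
            triples.append((pair[0].entity, relation_type, pair[1].entity))
    return triples
```

In this sketch a distant third entity in the same sentence is ignored because two other entities are closer to the relation keyword.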

In some possible designs, after extracting the entity set and the attribute set from the data source, and before obtaining the association relations between each entity in the entity set and its attributes, the method further includes:

calculating the weight of each entity in the entity set according to the entity attributes of the entity;

sorting the attributes of each entity in the entity set according to the weight of the entity.
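
As a rough illustration of the weighting and sorting steps, the sketch below scores each entity from its attributes and then orders entities and their attributes. The weighting rule (a fixed per-attribute importance multiplied by the number of values) and the names ATTRIBUTE_WEIGHTS, entity_weight, and rank are assumptions made for the example; the application does not specify a formula.

```python
# Hypothetical importance of each attribute type; unknown types get DEFAULT_WEIGHT.
ATTRIBUTE_WEIGHTS = {"shareholder": 3.0, "announcement": 2.0, "news": 1.0}
DEFAULT_WEIGHT = 0.5


def entity_weight(attributes):
    """Weight of an entity = sum over attributes of importance * number of values."""
    return sum(
        ATTRIBUTE_WEIGHTS.get(name, DEFAULT_WEIGHT) * len(values)
        for name, values in attributes.items()
    )


def rank(entities):
    """Return (entity, weight, sorted attribute names), heaviest entity first."""
    weights = {name: entity_weight(attrs) for name, attrs in entities.items()}
    ordered = sorted(entities, key=lambda name: weights[name], reverse=True)
    return [
        (
            name,
            weights[name],
            sorted(
                entities[name],
                key=lambda attr: ATTRIBUTE_WEIGHTS.get(attr, DEFAULT_WEIGHT),
                reverse=True,
            ),
        )
        for name in ordered
    ]
```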

In some possible designs, the method further includes:

calculating the similarity between entities through entity attribute embedding, or performing at least one of merging, deduplication, and differentiation on entities in the knowledge graph whose entity types are the same or similar.

In some possible designs, the data source includes a first data table and a second data table, and the multiple entities include at least one first entity and at least one second entity. The first entity belongs to the first data table and the second entity belongs to the second data table. The knowledge graph includes at least two connected graphs, and a descendant relationship and/or a parent-child relationship exists between the at least two connected graphs.

In some possible designs, performing at least one of merging, deduplication, and differentiation on entities in the knowledge graph whose entity types are the same or similar includes:

if the similarity between the first entity and the second entity is higher than a preset similarity, and it is determined that the first entity and the second entity both belong to at least one same connected graph, merging the first entity and the second entity, or deleting the first entity or the second entity from the knowledge graph;

if the similarity between the first entity and the second entity is higher than the preset similarity, and it is determined that the first entity and the second entity do not belong to any same connected graph, distinguishing the first entity from the second entity in the knowledge graph.

In some possible designs, performing at least one of merging, deduplication, and differentiation on entities in the knowledge graph whose entity types are the same or similar includes:

if the similarity between the first entity and the second entity is higher than the preset similarity, and it is determined that the first entity and the second entity both belong to at least one same connected graph, determining a first entity set directly associated with the first entity and a second entity set directly associated with the second entity;

when it is determined that the intersection of the first entity set and the second entity set contains at least two entities, merging the first entity and the second entity, or deleting the first entity or the second entity from the knowledge graph.
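
The merge, deduplication, and differentiation rules above can be sketched as a small decision function over an undirected graph. The adjacency-dictionary representation, the function names component_of and resolve, the default threshold 0.8, and the "undecided" and "keep" outcomes for cases the text does not cover are all assumptions of this example.

```python
from collections import deque


def component_of(graph, start):
    """Return all nodes in the connected graph that contains `start` (BFS)."""
    seen = {start}
    queue = deque([start])
    while queue:
        node = queue.popleft()
        for neighbor in graph.get(node, ()):
            if neighbor not in seen:
                seen.add(neighbor)
                queue.append(neighbor)
    return seen


def resolve(graph, first, second, similarity, threshold=0.8, min_shared=2):
    """Decide how to treat two similar entities taken from different data tables."""
    if similarity <= threshold:
        return "keep"          # not similar enough: leave both entities untouched
    if second not in component_of(graph, first):
        return "distinguish"   # similar, but never in the same connected graph
    # Entities directly associated with each candidate.
    shared = (set(graph.get(first, ())) & set(graph.get(second, ()))) - {first, second}
    if len(shared) >= min_shared:
        return "merge"         # at least `min_shared` common direct neighbors
    return "undecided"         # case not specified in the text (assumption)
```

For example, two employee records with the same name that are both directly linked to the same enterprise and the same colleague would be merged, while a third record with the same name in an unrelated connected graph would be kept distinct.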

In some possible designs, the knowledge graph is based on a time dimension, and the connected graph within each time window on the time dimension is a snapshot of the association relations between entities and of the entity attributes within that time window.
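
One way to picture the time-window snapshots is to group timestamped facts by window. The sketch below assumes yearly windows and facts expressed as (date, source, relation, target) tuples; both choices, and the function name snapshots_by_year, are illustrative assumptions.

```python
from collections import defaultdict
from datetime import date


def snapshots_by_year(facts):
    """Group (date, source, relation, target) facts into one snapshot per year."""
    windows = defaultdict(set)
    for when, source, relation, target in facts:
        windows[when.year].add((source, relation, target))
    return dict(windows)


facts = [
    (date(2016, 3, 1), "A Corp", "invests_in", "B Ltd"),
    (date(2016, 9, 1), "Li Na", "works_at", "A Corp"),
    (date(2017, 1, 5), "A Corp", "competes_with", "C Inc"),
]
snapshots = snapshots_by_year(facts)
```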

In some possible designs, the knowledge graph further satisfies at least one of the following:

in the knowledge graph, entities that have association relations are displayed with a gradient from strong to weak according to the strength of the association relation;

specific entities in the knowledge graph are highlighted and marked with a risk assessment value, a specific entity being an entity whose risk assessment value is higher than a preset risk assessment value;

when an entity in the knowledge graph is updated, the updated entity is displayed distinctively;

a time axis is added for entity attributes that are updated over time, and the time of each update is shown on the time axis;

for the entity attributes of the same entity, the attributes are colored from dark to light in descending order of attribute weight.
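
The dark-to-light coloring in the last item could, for instance, map each attribute weight to a gray level. The linear mapping, the lightest level of 200, and the function name shade below are assumptions chosen only to illustrate the idea.

```python
def shade(weight, lowest, highest, lightest=200):
    """Map a weight to a hex gray: the highest weight is darkest, the lowest lightest."""
    if highest > lowest:
        ratio = (weight - lowest) / (highest - lowest)
    else:
        ratio = 1.0  # all weights equal: use the darkest shade
    level = round(lightest * (1 - ratio))
    return f"#{level:02x}{level:02x}{level:02x}"
```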

In some possible designs, performing semantic analysis and cluster analysis on the corpus set (that is, the data source) and extracting the entity set and the attribute set from the corpus set includes:

performing word segmentation and semantic annotation on the corpora in the corpus set to obtain the entity set and the attribute set;

labeling the association relation types between the entities in the entity set;

adjusting the entity set and the attribute set respectively based on a conditional random field model, and predicting each entity in the entity set and each attribute in the attribute set respectively, to obtain the association relation types between entities and the mapping between entities and attributes.

A second aspect of this application provides a device for creating a knowledge graph, which has the function of implementing the method for creating a knowledge graph provided in the first aspect. The function may be implemented by hardware, or by hardware executing corresponding software. The hardware or software includes one or more modules corresponding to the above function, and the modules may be software and/or hardware.

In one possible design, the device for creating a knowledge graph includes:

a transceiver module, configured to obtain a data source, the data source including multiple entities;

a processing module, configured to: perform semantic analysis and cluster analysis on the data source obtained by the transceiver module, and extract an entity set and an attribute set from the data source, the attribute set including the entity attributes of each entity in the entity set; obtain the association relations between each entity in the entity set and its attributes; and create and output a knowledge graph according to the entity set, the attribute set, and the association relations between entities and attributes, the knowledge graph including the entities, the entity attributes, the association relations between entities and attributes, and the association relations between entities.

In some possible designs, the processing module is further configured to:

vectorize each entity in the entity set separately to obtain training vectors.

In some possible designs, the processing module is further configured to:

perform named entity recognition on each entity in the entity set using a multilayer neural network to obtain the entity context of each entity;

extract the association relations between entities from the obtained entity contexts;

obtain the training vectors according to the entity context of each entity and the association relations between entities.

In some possible designs, after performing named entity recognition on each entity in the entity set using the multilayer neural network to obtain the entity context of each entity, and before extracting the association relations between entities from the obtained entity contexts, the processing module is further configured to:

maximize the obtained entity context of each entity using the maximum log-likelihood method.

In some possible designs, after extracting the association relations between entities from the obtained entity contexts, and before obtaining the training vectors according to the entity context of each entity and the association relations between entities, the processing module is further configured to:

maximize the obtained association relations between entities using the maximum log-likelihood method.

In some possible designs, the processing module is specifically configured to:

label association relations for each entity in the entity set according to the attribute set, the entity set, and a recurrent neural network model, where each labeled association relation includes the position of the word within the entity, the association relation type, and the association relation position;

calculate the weight of each relation type using an association relation embedding method;

filter candidate association relations out of the labeled association relations according to the minimum distance principle and the association relation type;

classify the filtered candidate association relations according to the keywords of the association relation types, to obtain the association relations between the entities.

In some possible designs, after extracting the entity set and the attribute set from the data source, and before obtaining the association relations between each entity in the entity set and its attributes, the processing module is further configured to:

calculate the weight of each entity in the entity set according to the entity attributes of the entity;

sort the attributes of each entity in the entity set according to the weight of the entity.

In some possible designs, the processing module is further configured to:

calculate the similarity between entities through entity attribute embedding, or perform at least one of merging, deduplication, and differentiation on entities in the knowledge graph whose entity types are the same or similar.

In some possible designs, the data source includes a first data table and a second data table, and the multiple entities include at least one first entity and at least one second entity. The first entity belongs to the first data table and the second entity belongs to the second data table. The knowledge graph includes at least two connected graphs, and a descendant relationship and/or a parent-child relationship exists between the at least two connected graphs.

In some possible designs, the processing module is specifically configured to:

if the similarity between the first entity and the second entity is higher than a preset similarity, and it is determined that the first entity and the second entity both belong to at least one same connected graph, merge the first entity and the second entity, or delete the first entity or the second entity from the knowledge graph;

if the similarity between the first entity and the second entity is higher than the preset similarity, and it is determined that the first entity and the second entity do not belong to any same connected graph, distinguish the first entity from the second entity in the knowledge graph.

In some possible designs, the processing module is specifically configured to:

if the similarity between the first entity and the second entity is higher than the preset similarity, and it is determined that the first entity and the second entity both belong to at least one same connected graph, determine a first entity set directly associated with the first entity and a second entity set directly associated with the second entity;

when it is determined that the intersection of the first entity set and the second entity set contains at least two entities, merge the first entity and the second entity, or delete the first entity or the second entity from the knowledge graph.

In some possible designs, the knowledge graph is based on a time dimension, and the connected graph within each time window on the time dimension is a snapshot of the association relations between entities and of the entity attributes within that time window.

In some possible designs, the knowledge graph further satisfies at least one of the following:

in the knowledge graph, entities that have association relations are displayed with a gradient from strong to weak according to the strength of the association relation;

specific entities in the knowledge graph are highlighted and marked with a risk assessment value, a specific entity being an entity whose risk assessment value is higher than a preset risk assessment value;

when an entity in the knowledge graph is updated, the updated entity is displayed distinctively;

a time axis is added for entity attributes that are updated over time, and the time of each update is shown on the time axis;

for the entity attributes of the same entity, the attributes are colored from dark to light in descending order of attribute weight.

In some possible designs, the processing module is specifically configured to:

perform word segmentation and semantic annotation on the corpora in the corpus set to obtain the entity set and the attribute set;

label the association relation types between the entities in the entity set;

adjust the entity set and the attribute set respectively based on a conditional random field model, and predict each entity in the entity set and each attribute in the attribute set respectively, to obtain the association relation types between entities and the mapping between entities and attributes.

Another aspect of this application provides a device for creating a knowledge graph, which includes at least one connected processor, memory, transmitter, and receiver, where the memory is used to store program code and the processor is used to call the program code in the memory to perform the methods described in the above aspects.

Another aspect of this application provides a computer storage medium including instructions that, when run on a computer, cause the computer to perform the methods described in the above aspects.

Compared with the prior art, in the scheme provided by this application, semantic analysis and cluster analysis are performed on the obtained data source, and an entity set and an attribute set are extracted from it. The weight of each entity in the entity set is calculated according to the entity's attributes; the attributes of each entity are sorted according to the entity's weight; the association relations between each entity and its attributes are then created; and a knowledge graph is created and output according to the entity set, the attribute set, and the association relations between entities and attributes. The knowledge graph includes the entities, the entity attributes, the association relations between entities and attributes, and the association relations between entities. With this scheme, a knowledge graph can be created accurately, and the relations between entities and attributes, as well as the association relations between entities, can be presented intuitively. The scheme also facilitates personalized, intuitive display and unified management.

Brief description of the drawings

Fig. 1-1 is a schematic flowchart of creating a knowledge graph in an embodiment of the present invention;

Fig. 1-2 is a schematic flowchart of constructing a first function in an embodiment of the present invention;

Fig. 1-3 is a schematic structural diagram of entities and the association relations between entities in an embodiment of the present invention;

Fig. 1-4 is a schematic flowchart of constructing a second function in an embodiment of the present invention;

Fig. 2 is a schematic flowchart of creating an enterprise knowledge graph in an embodiment of the present invention;

Fig. 3 is a schematic diagram of merging employees with the same name in an embodiment of the present invention;

Fig. 4 is a schematic structural diagram of an enterprise knowledge graph in an embodiment of the present invention;

Fig. 5 is a schematic structural diagram of a device for creating a knowledge graph in an embodiment of the present invention;

Fig. 6 is a schematic structural diagram of a physical device for performing knowledge graph creation in an embodiment of the present invention.

Detailed description

The terms "first", "second", and the like in the description, the claims, and the above drawings of this application are used to distinguish similar objects and are not used to describe a specific order or sequence. It should be understood that data so used may be interchanged where appropriate, so that the embodiments described here can be implemented in an order other than that illustrated or described here. In addition, the terms "include" and "have" and any variants of them are intended to cover non-exclusive inclusion. For example, a process, method, system, product, or device that contains a series of steps or modules is not necessarily limited to the steps or modules expressly listed, but may include other steps or modules that are not expressly listed or that are inherent to the process, method, product, or device. The division of modules in this application is only a logical division; in practical applications there may be other ways of dividing them. For example, multiple modules may be combined or integrated into another system, or some features may be ignored or not performed. In addition, the mutual couplings, direct couplings, or communication connections shown or discussed may be implemented through some interfaces, and the indirect couplings or communication connections between modules may be electrical or of other similar forms; none of this is limited in this application. Furthermore, the modules or submodules described as separate components may or may not be physically separate, may or may not be physical modules, or may be distributed across multiple circuit modules; some or all of them may be selected according to actual needs to achieve the purpose of the scheme of this application.

This application provides a method and device for creating a knowledge graph, for use in the field of big data technology. Details are described below.

Referring to Fig. 1-1, a method for creating a knowledge graph provided by this application is illustrated below. The method is applied to a data analysis device. The data analysis device in this application may be a server or a terminal device, or an application installed on a server or terminal device; this application does not limit this. The method mainly includes:

101. Obtain a data source.

The data source includes multiple entities. The data source may be text data such as news, posts, and popular articles, in the form of data tables or other forms; this application does not limit this. The data source may be referred to as a corpus set. It may be obtained by web crawling of Internet news, announcements, legal documents, industrial and commercial registration websites, enterprise official websites, personal homepages, Baidu Baike, and the like. The data source may also be obtained from any device that can collect and send data, such as a terminal device, which may be a smartphone, tablet computer, laptop computer, desktop computer, or crawler server.

102. Perform semantic analysis and cluster analysis on the data source, and extract an entity set and an attribute set from the data source.

The attribute set includes the entity attributes of each entity in the entity set.

For example, when creating an enterprise knowledge graph, entities may be employee names and enterprise names, and entity attributes are then enterprise attributes and employee attributes.

The entity attributes of an employee may include position, gender, education, awards, employee level, résumé, patents, and news events.

The entity attributes of an enterprise may include announcements, news, legal documents, intellectual property, products, qualifications, official website, recruitment, administrative penalties, research teams and events, stock code, shareholder information, investments, and senior executives.

Semantic analysis refers to semantic checking and processing based on the grammatical categories identified by a syntax analyzer, in order to obtain the substantive meaning of the text.

Cluster analysis refers to the analysis process of grouping a set of physical or abstract objects into multiple classes composed of similar objects.

103. Obtain the association relations between each entity in the entity set and its attributes.

Optionally, in some embodiments, the following steps may also be included after step 102 and before step 103:

Calculate the weight of each entity in the entity set according to the entity attributes of the entity.

The weight of an entity refers to the importance of the entity in the whole knowledge graph to be created. It can be user-defined, or it can be calculated through association relation embedding.

Then, according to the weight of the entity, sort the attributes of each entity in the entity set.

104. Create and output a knowledge graph according to the entity set, the attribute set, and the association relations between entities and attributes.

The knowledge graph may include entities, entity attributes, the association relations between entities and attributes, and the association relations between entities.

A knowledge graph represents prior knowledge: it provides a detailed structured summary for the entities contained in a user's query or in the returned answer. It mainly includes concepts, concept hierarchies, attributes, attribute value types, relations, the domain concept set of a relation (Domain), and the range concept set of a relation (Range).

The knowledge graph can cover most common-sense knowledge by collecting structured data from encyclopedia websites and various vertical websites, and it enriches entity descriptions by extracting attribute-value pairs of related entities from various semi-structured data (such as HTML tables). In addition, new entities or new entity attributes are discovered from search logs (query logs), continually extending the coverage of the knowledge graph. Compared with common-sense knowledge, the knowledge data extracted by data mining is larger in scale, better reflects the query needs of current users, and allows the newest entities or facts to be discovered in time.
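
Extracting attribute-value pairs from an HTML table, as mentioned above, can be sketched with Python's standard html.parser module. The two-cell row layout (attribute name, value), the class name InfoboxParser, and the function attribute_value_pairs are assumptions of this example; real pages would need more robust handling.

```python
from html.parser import HTMLParser


class InfoboxParser(HTMLParser):
    """Collect the text of each table row as a list of cell strings."""

    def __init__(self):
        super().__init__()
        self.rows = []
        self._row = None
        self._in_cell = False

    def handle_starttag(self, tag, attrs):
        if tag == "tr":
            self._row = []
        elif tag in ("td", "th") and self._row is not None:
            self._in_cell = True
            self._row.append("")

    def handle_endtag(self, tag):
        if tag in ("td", "th"):
            self._in_cell = False
        elif tag == "tr" and self._row is not None:
            self.rows.append(self._row)
            self._row = None

    def handle_data(self, data):
        if self._in_cell and self._row:
            self._row[-1] += data


def attribute_value_pairs(html):
    """Return {attribute: value} for every row that has exactly two cells."""
    parser = InfoboxParser()
    parser.feed(html)
    parser.close()
    return {row[0].strip(): row[1].strip() for row in parser.rows if len(row) == 2}
```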

When establishing the knowledge graph, the redundancy of the Internet is also used: in subsequent mining, the confidence of extracted knowledge is assessed by voting or other aggregation algorithms, and the knowledge is added to the preset knowledge graph after manual review. The various candidate entities (concepts) and their attribute associations needed to build the knowledge graph are extracted from various types of data sources, forming isolated extraction graphs (Extraction Graphs).
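
A simple form of the voting mentioned above is a majority vote over the values reported by different sources, with the share of votes used as the confidence. This is only one possible aggregation; the function name vote and the (source, value) input format are assumptions, and the input must be non-empty.

```python
from collections import Counter


def vote(candidates):
    """candidates: non-empty list of (source, value). Return (winning value, confidence)."""
    counts = Counter(value for _, value in candidates)
    value, votes = counts.most_common(1)[0]
    return value, votes / len(candidates)


# Three sources report the registered city of the same enterprise.
winner, confidence = vote([
    ("news site", "Shenzhen"),
    ("official website", "Shenzhen"),
    ("forum post", "Shanghai"),
])
```

A value whose confidence falls below some chosen level could then be routed to manual review instead of being added directly.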

For example, when the knowledge graph is an enterprise knowledge graph, its creation flow is as shown in the flow diagram of Fig. 2. The enterprise knowledge graph may include association relations between enterprises, association relations between employees, attributes of enterprises, attributes of employees, and association relations between enterprises and employees. Employees and enterprises are mapped to nodes in the enterprise knowledge graph.

The association relations between enterprises include outbound investment, shareholder, customer, and competitor relations. The association relations between an enterprise and an employee include employment, shareholder, and legal representative. The association relations between employees include colleague, classmate, relative, and rival relations. The attributes of an enterprise include announcements, news, legal documents, intellectual property, products, qualifications, official website, recruitment, administrative penalties, research teams, and events. The attributes of an employee include education, experience, patents, news, and events.

It can be seen that the knowledge graph in the embodiments of the present application can be used to present the results of big-data analysis and can also be used in a search engine. For example, after the server later receives a user's search request, it performs semantic analysis on the request based on the obtained knowledge graph and then returns an answer to the user. To a certain extent, this yields more accurate answers and faster responses. In addition, search based on the knowledge graph can reduce the computational load of the search engine and improve its performance.

Compared with existing mechanisms, in the embodiments of the present application the data analysis device performs semantic analysis and cluster analysis on the acquired data source, extracts an entity set and an attribute set from the data source, and calculates the weight value of each entity in the entity set according to the entity attributes of the entity. It sorts the entity attributes of each entity in the entity set according to the weight values, then creates the association relations between each entity in the entity set and the attributes, and creates and outputs a knowledge graph according to the entity set, the attribute set, and the association relations between entities and attributes. The knowledge graph includes entities, entity attributes, association relations between entities and attributes, and association relations between entities. With this scheme, a knowledge graph can be created accurately, and the relations between entities and attributes, as well as the association relations between entities, can be presented intuitively. The scheme also facilitates personalized visual display and unified management.

Optionally, in some embodiments of the invention, after step 103 and before step 104, the method further includes:

Vectorizing each entity in the entity set to obtain training vectors.

Specifically, the following operations may be included:

(a) Performing named entity recognition on each entity in the entity set using a multilayer neural network to obtain the entity context of each entity.

Here, the entity context is the set of words surrounding an entity; for example, the context of entity e can be denoted Context(e).
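By way of illustration only, the notion of Context(e) as the words surrounding an entity can be sketched as a fixed-size window over a tokenized sentence. The function name `entity_context` and the `window` parameter are assumptions introduced for this sketch; the application itself obtains the context through a multilayer neural network and does not specify a window size.

```python
def entity_context(tokens, entity_index, window=2):
    """Return the words surrounding tokens[entity_index], i.e. Context(e).

    `window` (number of words kept on each side) is an illustrative
    parameter, not a value given in the application.
    """
    start = max(0, entity_index - window)
    left = tokens[start:entity_index]
    right = tokens[entity_index + 1:entity_index + 1 + window]
    return left + right


tokens = ["Facebook", "acquired", "the", "startup", "Source3"]
context = entity_context(tokens, entity_index=4, window=2)
```

In this sketch the context of "Source3" consists of the two words to its left, since the sentence ends at the entity.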

In some embodiments, the maximum log-likelihood method may also be applied to the obtained entity context of each entity. For example, a first function may be constructed by the maximum log-likelihood method and then maximized. The first function is as follows:

[formula of the first function not reproduced here]

where C is the training corpus, θ is the set of parameters to be determined, and F(e, Context(e), θ) denotes the first function. The first function can be constructed by a multilayer neural network, for example as in the schematic diagram shown in Fig. 1-2.

In this step, maximizing the first function finally yields the entity context corresponding to each entity.

(b) Extracting the association relations between entities from the obtained entity context of each entity.

Specifically, the entity contexts obtained in step (a) are used as the input of an inter-entity association relation extraction system, which finally outputs a representation set of the association relations between entities. For example, Fig. 1-3 is a structural diagram of the entities and the association relations between them. In Fig. 1-3, $fe_1$ to $fe_l$, $e_0$, and $be_1$ to $be_s$ are entities, and $w_1$ to $w_l$ and $v_1$ to $v_s$ denote association relations; for example, $w_1$ denotes the association relation between $fe_1$ and $e_0$.

For an entity $e_0$, $\{fe_i;\ i = 1, \ldots, l\}$ are the front entities (front NEs) connected to $e_0$ in a relation, and $\{be_j;\ j = 1, \ldots, s\}$ are the back entities (back NEs) connected to $e_0$ in a relation. For example, for the corpus sentence "Facebook acquired Source3, a content-copyright start-up", in the "acquire" relation, "Facebook" is the front entity of the named entity "Source3", and "Source3" is the back entity of the named entity "Facebook". $\{\omega_i;\ i = 1, \ldots, l\}$ and $\{\upsilon_j;\ j = 1, \ldots, s\}$ are relation weights (relation weights), determined according to the priority level of financial activities.
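By way of illustration only, the front-entity and back-entity structure described above can be sketched as two adjacency maps built from relation triples. The weight table `RELATION_WEIGHTS`, its values, and the default weight are hypothetical; the application only states that the weights follow the priority level of financial activities.

```python
from collections import defaultdict

# Hypothetical priority-based relation weights; the values are illustrative only.
RELATION_WEIGHTS = {"acquire": 1.0, "invest": 0.8}


def front_back_entities(triples):
    """Map each entity to its weighted front and back entities.

    For a triple (head, relation, tail), `head` is a front entity of `tail`
    and `tail` is a back entity of `head`.
    """
    front = defaultdict(list)
    back = defaultdict(list)
    for head, relation, tail in triples:
        weight = RELATION_WEIGHTS.get(relation, 0.5)  # assumed default weight
        front[tail].append((head, weight))
        back[head].append((tail, weight))
    return front, back


front, back = front_back_entities([("Facebook", "acquire", "Source3")])
```

Here `front["Source3"]` holds "Facebook" and `back["Facebook"]` holds "Source3", matching the example sentence.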

(c) Obtaining the training vectors according to the entity context of each entity and the association relations between entities.

In some embodiments, the maximum log-likelihood method may likewise be applied to the association relations between entities obtained above.

Specifically, based on the embodiments shown in Fig. 1-2 and Fig. 1-3, a second function may be constructed by the maximum log-likelihood method and then maximized. The second function is as follows:

where C is the training corpus, λ is the set of parameters to be determined, and G(e, front(e), back(e), λ) denotes the second function. Likewise, G(e, front(e), back(e), λ) can be constructed by a multilayer neural network, for example as in the schematic diagram shown in Fig. 1-4.

In step (c), training finally yields, from the front and back relations of each entity, the association relations between entities (which can be denoted $\mathrm{EntityRep}_{e;\,relation}$). Finally, the obtained entity contexts and the association relations between entities are combined using the Kronecker product to form the final training vector of each entity. The training vector can be expressed as follows:

Through vectorization, an entity is converted into a distributed representation. With a distributed representation, associated or similar entities are close to each other in distance, which gives the text computable properties and supports subsequent knowledge extraction and association relation reasoning on the knowledge graph.
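By way of illustration only, combining two representations with the Kronecker product can be sketched as follows for plain vectors. The two input vectors and their order in the product are assumptions made for this sketch; the application does not reproduce the exact expression.

```python
def kronecker(u, v):
    """Kronecker product of two vectors given as lists of floats."""
    return [a * b for a in u for b in v]


context_vec = [1.0, 2.0]         # assumed entity-context representation
relation_vec = [0.5, 0.0, -1.0]  # assumed entity-relation representation
training_vec = kronecker(context_vec, relation_vec)
```

The result has length len(u) * len(v), so every component of one representation is paired with every component of the other.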

Optionally, in some embodiments of the invention, extracting the association relations between entities from the obtained entity context of each entity includes:

(a) Taking the attribute set and the entity set as input and, based on a recurrent neural network model, labeling association relations for each entity in the entity set. A labeled association relation includes the position of the word in the entity, the association relation type, and the association relation position, where the association relation position is the position of the association relation in the knowledge graph.

(b) Calculating the weight value of each relation type using the association-relation embedding method.

(c) Screening candidate association relations out of the labeled association relations according to the nearest-distance principle and the association relation type.

(d) Classifying the screened candidate association relations according to the keywords of the association relation types to obtain the association relations between entities.

It can be seen that extracting the association relations between entities from text data makes it possible to identify the latent semantic relations between entities. In some embodiments, an association relation between entities can be expressed as an entity relation triple (entity 1, association relation type or association relation indication information, entity 2). Specifically, the present application can convert the extraction of association relations between entities into a sequence tagging task (Sequence Tagging Task), which can be performed with a neural network model.

For example, the attribute set and the entity set can be used as input, and the training corpus can then be sequence-tagged based on the neural network model. After the training corpus has been sequence-tagged, the final output falls into two kinds:

One kind of output is a label unrelated to the association relations to be extracted, which can be denoted 'O'.

The other kind of output is a label related to the association relations to be extracted, which is denoted by a relation label other than 'O'. A relation label indicates the position of the word in the entity, the association relation type, and the association relation position.

Finally, the output layer shown in Fig. 1-4 computes the normalized entity label probabilities based on the label prediction vector $T_t$:

$y_t = W_y T_t + b_y$,

where $W_y$ is the weight matrix, $N_t$ is the total number of labels, $F(N_t, y_t)$ denotes the normalization function, and $T_t$ denotes the vectorized label. Because the neural network model can learn long-range dependencies, the decoding process can model the interactions between labels.
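By way of illustration only, the output-layer computation can be sketched as a linear map followed by a normalization over the $N_t$ labels. The application names a normalization function $F(N_t, y_t)$ without giving its form; softmax is used here as an assumption, and the matrix and vector values are made up for the sketch.

```python
import math


def linear_scores(weights, bias, tag_vector):
    """Compute y_t = W_y T_t + b_y for one position."""
    return [
        sum(w * x for w, x in zip(row, tag_vector)) + b
        for row, b in zip(weights, bias)
    ]


def softmax(scores):
    """Normalize scores into probabilities that sum to 1 (assumed form of F)."""
    peak = max(scores)  # subtract the maximum for numerical stability
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


W_y = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]  # N_t = 3 labels, 2-dimensional T_t
b_y = [0.0, 0.0, 0.0]
probs = softmax(linear_scores(W_y, b_y, [0.2, 0.4]))
```

The label with the largest score receives the largest probability, and the probabilities over all labels sum to 1.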

A third function is constructed using the log-likelihood function. The third function is as follows:

where $L_j$ is the length of sentence $x_j$, $|D|$ is the size of the training set, $condition_{t,j}$ is a predicate, Θ is the coefficient space, and α is the bias weight: the larger its value, the greater the influence of the corresponding relation labels on the model. I(O) is a switching function that distinguishes the loss of the 'O' label from the loss of the relation labels. The function is as follows:

Combining the above method with the relation type and the nearest-distance principle, candidate relation triples can be preliminarily extracted. The candidate entity relation triples are then screened by relation classification keywords, finally yielding classified entity relation triples.
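By way of illustration only, decoding tagged words into candidate triples with the nearest-distance principle can be sketched as follows. The tag format "B-ACQ-1" (word position B/I/E/S, relation type, role 1 or 2), the relation type name ACQ, and the company names "AlphaCo" and "Beta Labs" are hypothetical. The keyword-based classification step is omitted.

```python
def decode_mentions(tokens, tags):
    """Group tagged words into (start_index, text, relation_type, role) mentions.

    Tags are assumed to look like "B-ACQ-1"; 'O' marks unrelated words.
    """
    mentions, current = [], None
    for i, (token, tag) in enumerate(zip(tokens, tags)):
        if tag == "O":
            current = None
            continue
        position, rel_type, role = tag.split("-")
        if position in ("B", "S") or current is None:
            current = [i, token, rel_type, role]
            mentions.append(current)
        else:  # I or E continues the current entity
            current[1] += " " + token
    return [tuple(m) for m in mentions]


def extract_triples(mentions):
    """Pair each role-1 mention with the nearest role-2 mention of the same type."""
    triples = []
    for start1, text1, type1, role1 in mentions:
        if role1 != "1":
            continue
        candidates = [m for m in mentions if m[2] == type1 and m[3] == "2"]
        if candidates:
            nearest = min(candidates, key=lambda m: abs(m[0] - start1))
            triples.append((text1, type1, nearest[1]))
    return triples


tokens = ["Facebook", "acquired", "Source3", ",", "while",
          "AlphaCo", "acquired", "Beta", "Labs"]
tags = ["S-ACQ-1", "O", "S-ACQ-2", "O", "O",
        "S-ACQ-1", "O", "B-ACQ-2", "E-ACQ-2"]
triples = extract_triples(decode_mentions(tokens, tags))
```

Each first-role entity is paired with the closest second-role entity of the same relation type, which is one simple reading of the nearest-distance principle.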

It should be noted that the formulas corresponding to the first function, the second function, and the third function provided in the present application are only examples. The formulas may be transformed, and the present application does not limit the manner of transformation or the specific form of the formulas.
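By way of illustration only, one possible form of a bias-weighted log-likelihood over tagged positions is sketched below. The exact third function is not reproduced in this text, so the form used here is an assumption: I(O) is taken to be 1 for the 'O' label and 0 otherwise, and α scales the contribution of relation labels.

```python
import math


def biased_log_likelihood(gold_probs, gold_tags, alpha):
    """Sum log-probabilities of gold tags, weighting non-'O' tags by alpha.

    gold_probs[i] is the model probability of the gold tag at position i.
    """
    total = 0.0
    for prob, tag in zip(gold_probs, gold_tags):
        is_other = 1.0 if tag == "O" else 0.0  # assumed form of I(O)
        total += math.log(prob) * (is_other + alpha * (1.0 - is_other))
    return total


objective = biased_log_likelihood([0.5, 0.5], ["O", "B-ACQ-1"], alpha=2.0)
```

With α greater than 1, an error on a relation label changes the objective more than an equal error on an 'O' label, which is consistent with the stated role of the bias weight.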

Optionally, in some embodiments of the invention, because data sources are acquired from many origins, the obtained entity set may contain entities with identical or similar names. To simplify the knowledge graph and improve its accuracy, identical or similar entities can be merged, or entities that share a name but are in fact different entities can be distinguished. In the embodiments of the present application, the method further includes:

Performing at least one of merging, deduplicating, and distinguishing on entities of identical or similar entity type in the knowledge graph.

In some embodiments, the data source may include a first data table and a second data table, and the multiple entities include at least one first entity and at least one second entity. The first entity belongs to the first data table, and the second entity belongs to the second data table. The knowledge graph includes at least two connected graphs, between which there is a descendant relation and/or a parent-child relation.

Specifically, because the entities, association relations, and other data in the finally constructed knowledge graph may come from different data sources, entities in the knowledge graph may be duplicated, so deduplication is required before the graph data is used. Common synonyms are merged first, and the connectivity of the knowledge graph itself is then used for merging: if the distance between same-name entities in the entity network is less than N, they are merged into one entity, and all of their relations are merged as well. The value of N may be adjusted according to how rare the entity name is: the rarer the name, the larger the value of N, and otherwise the smaller. The present application does not limit the specific value.
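By way of illustration only, the rule "merge same-name entities whose distance in the entity network is less than N" can be sketched with a breadth-first hop count. The graph contents, the node identifiers "Zhang Long#1" and "Zhang Long#2", and the use of hop count as the distance measure are assumptions made for this sketch.

```python
from collections import deque


def hop_distance(graph, source, target):
    """Breadth-first hop count between two nodes, or None if unreachable."""
    seen = {source: 0}
    queue = deque([source])
    while queue:
        node = queue.popleft()
        if node == target:
            return seen[node]
        for neighbor in graph.get(node, ()):
            if neighbor not in seen:
                seen[neighbor] = seen[node] + 1
                queue.append(neighbor)
    return None


def should_merge(graph, node_a, node_b, max_hops):
    """Merge two same-name nodes when they are fewer than max_hops apart."""
    distance = hop_distance(graph, node_a, node_b)
    return distance is not None and distance < max_hops


# Hypothetical graph: two nodes named "Zhang Long" linked through companies A and B.
graph = {
    "Zhang Long#1": ["Company A"],
    "Company A": ["Zhang Long#1", "Company B"],
    "Company B": ["Company A", "Zhang Long#2"],
    "Zhang Long#2": ["Company B"],
}
```

For example, `should_merge(graph, "Zhang Long#1", "Zhang Long#2", max_hops=4)` is true because the two nodes are three hops apart, while a stricter threshold of 3 keeps them separate.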

Merging, deduplicating, and distinguishing entities of identical or similar entity type in the knowledge graph are illustrated below, where the similarity is calculated by entity attribute embedding:

1. If the similarity between the first entity and the second entity is higher than a preset similarity, and it is determined that the first entity and the second entity both belong to at least one connected graph, the first entity and the second entity are merged, or the first entity or the second entity is deleted from the knowledge graph.

Taking the enterprise knowledge graph as an example, as shown in part a of Fig. 3, the enterprise knowledge graph contains two persons named Zhang Long, each with its own employee attributes. For example, one Zhang Long works at company A and also at company B, where company B is a subsidiary of company A, and the other Zhang Long works at company B. The two can then be identified by means of the maximal connected graph: the maximal connected graph is first extracted from the enterprise knowledge graph, and persons with the same name within the connected graph are then merged. As shown in part b of Fig. 3, the two "Zhang Long" are finally regarded as the same person. If they are not in the same connected graph, they are regarded as different persons.

2. If the similarity between the first entity and the second entity is higher than the preset similarity, and it is determined that the first entity and the second entity do not both belong to any one connected graph, the first entity and the second entity are distinguished in the knowledge graph.

3. If the similarity between the first entity and the second entity is higher than the preset similarity, and it is determined that the first entity and the second entity both belong to at least one connected graph, a first entity set directly associated with the first entity and a second entity set directly associated with the second entity are determined. The first entity set and the second entity set are then compared. When it is determined that the intersection of the first entity set and the second entity set contains at least two entities, the first entity and the second entity are merged, or the first entity or the second entity is deleted from the knowledge graph.
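By way of illustration only, rules 1 to 3 above can be sketched for two entities that are already known to exceed the preset similarity. The graph contents and node identifiers are hypothetical. The `require_shared` parameter is an assumption that switches between rule 1 (value 0) and rule 3 (value 2); returning "distinguish" when rule 3 finds too few shared neighbors is also an assumption, since the application does not state that case.

```python
def connected_components(graph):
    """Return a dict mapping each node to a component id (iterative DFS)."""
    component = {}
    for start in graph:
        if start in component:
            continue
        stack = [start]
        component[start] = start
        while stack:
            node = stack.pop()
            for neighbor in graph.get(node, ()):
                if neighbor not in component:
                    component[neighbor] = start
                    stack.append(neighbor)
    return component


def resolve(graph, first, second, require_shared=0):
    """Return "merge" or "distinguish" for two highly similar entities."""
    component = connected_components(graph)
    if component[first] != component[second]:
        return "distinguish"  # rule 2: not in the same connected graph
    shared = set(graph[first]) & set(graph[second])
    return "merge" if len(shared) >= require_shared else "distinguish"


graph = {
    "Zhang Long#1": ["Company A", "Company B"],
    "Zhang Long#2": ["Company B"],
    "Company A": ["Zhang Long#1", "Company B"],
    "Company B": ["Zhang Long#1", "Zhang Long#2", "Company A"],
    "Zhang Long#3": ["Company C"],
    "Company C": ["Zhang Long#3"],
}
```

In this hypothetical graph the first two "Zhang Long" nodes share a connected graph and are merged under rule 1, while the third lies in a separate connected graph and is kept distinct.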

Optionally, in some embodiments of the invention, the knowledge graph is based on a time dimension, and the connected graph in each time window on the time dimension is a snapshot of the association relations between entities and of the entity attributes within that time window.

Optionally, in some embodiments of the invention, the knowledge graph further satisfies at least one of the following:

In the knowledge graph, entities that have an association relation are displayed with a gradient from strong to weak according to the strength of the association relation;

Specific entities in the knowledge graph are highlighted, where a specific entity is marked with a risk assessment value and is an entity whose risk assessment value is higher than a preset risk assessment value;

When an entity in the knowledge graph is updated, the updated entity is displayed distinctly;

A time axis is added for entity attributes that are updated over time, and the update times are shown on the time axis;

For the entity attributes of the same entity, coloring runs from dark to light in descending order of entity attribute weight value.

For example, when the knowledge graph in the embodiments of the present application is applied as an enterprise knowledge graph, the enterprise knowledge graph can intuitively present each entity and the attributes of each class associated with it. To ease management and to quickly identify information such as key entities and changed entities, the enterprise knowledge graph can also be processed as follows:

(1) Rank the different attributes by importance; the importance can be customized for different users.

(2) Attributes can be displayed in order of importance and colored from dark to light with colors of the same color family.

(3) When an entity or an association relation between entities in the knowledge graph changes, the entity or association relation whose state has changed can be displayed in a color different from the usual one.

(4) Line thickness can be used to show how close the association relation between entities is.

(5) The risk assessment of an enterprise is obtained by evaluating major events such as the positive or negative sentiment of news, stock price fluctuations, financial changes, and mergers and acquisitions, and each risk level is marked with a corresponding color.

(6) Listed companies (on different boards) and private companies are displayed with different shapes or different colors.

After the processing of (1) to (6) above has been applied to the enterprise knowledge graph, an enterprise knowledge graph as shown in Fig. 4 can finally be obtained.

Optionally, in some embodiments of the invention, performing semantic analysis and cluster analysis on the corpus set and extracting the entity set and the attribute set from the corpus set includes:

Performing word segmentation and semantic annotation on the corpora in the corpus set to obtain the entity set and the attribute set.

Labeling the association relation types between entities in the entity set.

Based on a sequence tagging model, adjusting the entity set and the attribute set respectively, and predicting each entity in the entity set and each attribute in the attribute set respectively, to obtain the association relation types between entities and the mapping between entities and attributes.

Optionally, in some embodiments of the invention, the present application also provides an optimized hierarchical layout method for the knowledge graph. The main method is as follows:

With a specified entity a as the center, the entities in the knowledge graph are referred to as nodes, and entity a is referred to as the central node. The neighboring entities of entity a are distributed on concentric rings centered on entity a: the more hops a node is from entity a, the farther out its ring, and nodes of the same level lie on the same ring. To show more nodes per unit of screen area, the nodes on a ring of the same level are further distributed in a multilayer spiral, which prevents an overly large ring radius from inflating the size of the whole graph. At the same time, to minimize the span of the field of view when adjacent nodes are inspected, adjacent nodes and all child nodes of the same vertex are placed together as far as possible. The main steps are as follows:

(a) Calculate the node distance from entity a of each node in the network to be visualized.

(b) According to configuration such as node distance, node size, node spacing, and interlayer spacing, determine the candidate positions of all nodes in the knowledge graph. At this point every position is an empty slot that holds no node.

(c) Place the central node on the center slot.

(d) Take all neighbor nodes of the central node and place each neighbor node on the empty slot nearest to the central node.

(e) Repeat the operations described in (a) to (d) until all nodes have been placed, that is, until all empty slots are occupied.
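By way of illustration only, a simplified version of steps (a) to (e) can be sketched as follows: hop distances from the central node are computed by breadth-first search, and the nodes of each level are spread evenly over a ring whose radius grows with the hop count. The `ring_gap` parameter and the even angular spacing are assumptions. The multilayer spiral within a ring and the slot-based placement are omitted.

```python
import math
from collections import deque


def ring_layout(graph, center, ring_gap=100.0):
    """Place nodes on concentric rings by hop distance from `center`.

    Returns {node: (x, y)}. `ring_gap` is an assumed spacing between rings.
    """
    hops = {center: 0}
    queue = deque([center])
    while queue:
        node = queue.popleft()
        for neighbor in graph.get(node, ()):
            if neighbor not in hops:
                hops[neighbor] = hops[node] + 1
                queue.append(neighbor)

    rings = {}
    for node, hop in hops.items():  # BFS discovery order keeps siblings adjacent
        rings.setdefault(hop, []).append(node)

    positions = {}
    for hop, nodes in rings.items():
        for i, node in enumerate(nodes):
            angle = 2 * math.pi * i / len(nodes)
            radius = hop * ring_gap
            positions[node] = (radius * math.cos(angle), radius * math.sin(angle))
    return positions


graph = {"a": ["b", "c"], "b": ["a", "d"], "c": ["a"], "d": ["b"]}
positions = ring_layout(graph, "a")
```

Because children of the same parent are discovered consecutively, they end up next to each other on their ring, which loosely reflects the goal of keeping child nodes of the same vertex together.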

Optionally, in some embodiments of the invention, the association relation between any nodes in the knowledge graph can also be visualized.

Because practical business sometimes requires examining the association relation between two entity nodes, a shortest-path algorithm is first used to calculate all shortest paths between the two given nodes. The two nodes to be examined are then placed at the left and right ends of the knowledge graph, the line x connecting the two endpoint nodes is drawn, and N vertical lines are drawn that divide line x equally.
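By way of illustration only, enumerating all shortest paths between two nodes of an unweighted graph can be sketched with a breadth-first search that records every predecessor lying on a shortest path. The function name and the example graph are assumptions; the application does not name a specific shortest-path algorithm.

```python
from collections import deque


def all_shortest_paths(graph, source, target):
    """Return every shortest path from source to target in an unweighted graph."""
    dist = {source: 0}
    parents = {source: []}
    queue = deque([source])
    while queue:
        node = queue.popleft()
        for neighbor in graph.get(node, ()):
            if neighbor not in dist:
                dist[neighbor] = dist[node] + 1
                parents[neighbor] = [node]
                queue.append(neighbor)
            elif dist[neighbor] == dist[node] + 1:
                parents[neighbor].append(node)  # another equally short route
    if target not in dist:
        return []

    def build(node):
        if node == source:
            return [[source]]
        return [path + [node] for parent in parents[node] for path in build(parent)]

    return build(target)


graph = {"s": ["a", "b"], "a": ["s", "t"], "b": ["s", "t"], "t": ["a", "b"]}
paths = all_shortest_paths(graph, "s", "t")
```

The index of a node within a path is its hop count from the left endpoint, which can be used to choose the vertical line on which the node is placed.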

The other nodes on the shortest paths are placed at random on the corresponding vertical lines according to their hop count from the two endpoint nodes, and the positions of the intermediate nodes are then adjusted using a constrained force-directed layout algorithm. During adjustment the level of a node is not changed; only its position within the level is adjusted. The specific operations for adjusting the positions of the intermediate nodes are as follows:

(1) Place each node in its corresponding level according to its distance from the two endpoint nodes. When a level has multiple nodes, their positions within the level are chosen at random.

(2) Treat each edge connecting two nodes as a spring. The force between each pair of nodes can be calculated according to Hooke's law, and all forces acting on a node are combined to obtain the net force on that node.

(3) Convert the net force on each node into a corresponding displacement, that is, move the node. The movement is constrained by level and position: a node can only move up and down within its level, and only among several predetermined fixed positions, so that the spacing between nodes stays uniform and the interface looks tidier.

(4) After the nodes have moved a certain distance, recalculate the force on each node, and repeat the operation on each node until the sum of the forces over the whole graph converges to a stable value.
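By way of illustration only, steps (2) to (4) can be sketched as a spring relaxation in which each node keeps its horizontal level and moves only vertically. The zero rest length, the `stiffness` value, the stopping tolerance, and the `fixed` set for the two endpoint nodes are assumptions. Snapping nodes to predetermined fixed positions is omitted.

```python
import math


def relax_levels(x, y, edges, fixed, rest_length=0.0, stiffness=0.1,
                 steps=500, tol=1e-9):
    """Adjust only the y coordinate of free nodes using spring (Hooke) forces.

    x: {node: fixed horizontal level}, y: {node: initial vertical position}.
    Nodes in `fixed` (for example the two endpoint nodes) never move.
    """
    y = dict(y)
    for _ in range(steps):
        force = {node: 0.0 for node in y}
        for a, b in edges:
            dx, dy = x[b] - x[a], y[b] - y[a]
            length = math.hypot(dx, dy) or 1e-9
            pull = stiffness * (length - rest_length)  # Hooke's law
            force[a] += pull * dy / length  # keep the vertical component only
            force[b] -= pull * dy / length
        moved = 0.0
        for node in y:
            if node not in fixed:
                y[node] += force[node]
                moved += abs(force[node])
        if moved < tol:  # total force has settled
            break
    return y


x = {"s": 0.0, "m": 1.0, "t": 2.0}
y0 = {"s": 0.0, "m": 3.0, "t": 0.0}
result = relax_levels(x, y0, [("s", "m"), ("m", "t")], fixed={"s", "t"})
```

In this example the intermediate node starts above the line joining the endpoints and is pulled down onto it, while its level (its x coordinate) never changes.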

Optionally, in some embodiments of the invention, the present application can visualize the change of the graph between two time points: with time as the dimension, the dynamic change of the association relations between entities across two time points can be shown. Specifically, two time points are first selected on the interface of the knowledge graph, for example "time point one" and "time point two", where "time point one" is an earlier time in a day and "time point two" is a later time in the same day.

Accordingly, the network graphs corresponding to "time point one" and "time point two" can be obtained, and the node information and edge information of the two network graphs are then merged, so that the merged network contains the node and edge information of both "time point one" and "time point two". When the knowledge graph is displayed, the nodes and association relations newly added at "time point two" can be marked with a highlight color, and the edges and nodes that, compared with "time point one", no longer exist at "time point two" are drawn with dashed lines.
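By way of illustration only, merging the network graphs of two time points and labeling what changed can be sketched with set operations. The snapshot structure (a dict with "nodes" and "edges" sets) and the label names are assumptions made for this sketch.

```python
def merge_snapshots(first, second):
    """Merge two graph snapshots and label what changed between them.

    Each snapshot is a dict with "nodes" and "edges" sets.
    """
    merged = {}
    for kind in ("nodes", "edges"):
        labels = {}
        for item in first[kind] | second[kind]:
            if item not in first[kind]:
                labels[item] = "added"      # e.g. drawn in a highlight color
            elif item not in second[kind]:
                labels[item] = "removed"    # e.g. drawn with dashed lines
            else:
                labels[item] = "unchanged"
        merged[kind] = labels
    return merged


time_point_one = {"nodes": {"A", "B"}, "edges": {("A", "B")}}
time_point_two = {"nodes": {"A", "C"}, "edges": {("A", "C")}}
merged = merge_snapshots(time_point_one, time_point_two)
```

A renderer could then map "added" to the highlight color and "removed" to dashed lines when drawing the merged network.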

The method of creating a knowledge graph in the present application has been described above. The device that performs the method is described below; it has the functions of implementing the method of creating a knowledge graph provided in the above method embodiments. These functions can be implemented by hardware, or by hardware executing corresponding software. The hardware or software includes one or more modules corresponding to the above functions, and a module may be software and/or hardware. The data analysis device in the present application may be a server or a terminal device, or may be an application installed on a server or a terminal device; the present application does not limit this.

When the device is a terminal device, the terminal device may be a device that provides voice and/or data connectivity to a user, a handheld device with a wireless connection function, or another processing device connected to a wireless modem. A wireless terminal can communicate with one or more core networks via a radio access network (Radio Access Network, RAN). A wireless terminal may be a mobile terminal, such as a mobile phone (also called a "cellular" phone) or a computer with a mobile terminal, for example a portable, pocket-sized, handheld, computer-built-in, or vehicle-mounted mobile device that exchanges voice and/or data with the radio access network. Examples include a personal communication service (Personal Communication Service, PCS) phone, a cordless phone, a Session Initiation Protocol (SIP) phone, a wireless local loop (Wireless Local Loop, WLL) station, and a personal digital assistant (Personal Digital Assistant, PDA). A wireless terminal may also be referred to as a system, a subscriber unit (Subscriber Unit), a subscriber station (Subscriber Station), a mobile station (Mobile Station), a mobile (Mobile), a remote station (Remote Station), an access point (Access Point), a remote terminal (Remote Terminal), an access terminal (Access Terminal), a user terminal (User Terminal), a terminal device, a user agent (User Agent), a user device (User Device), or user equipment (User Equipment).

As shown in Fig. 5, the device 50 for creating a knowledge graph includes:

A transceiver module 501, configured to acquire a data source, the data source including multiple entities;

A processing module 502, configured to perform semantic analysis and cluster analysis on the data source acquired by the transceiver module 501 and extract an entity set and an attribute set from the data source, the attribute set including the entity attributes of each entity in the entity set; calculate the weight value of each entity in the entity set according to the entity attributes of the entity; sort the entity attributes of each entity in the entity set according to the weight values of the entities;

create the association relations between each entity in the entity set and the attributes; and create and output a knowledge graph according to the entity set, the attribute set, and the association relations between entities and attributes, the knowledge graph including entities, entity attributes, association relations between entities and attributes, and association relations between entities.

In the embodiments of the present application, the processing module 502 performs semantic analysis and cluster analysis on the data source acquired by the transceiver module 501, extracts an entity set and an attribute set from the data source, and calculates the weight value of each entity in the entity set according to the entity attributes of the entity. It sorts the entity attributes of each entity in the entity set according to the weight values, then creates the association relations between each entity in the entity set and the attributes, and creates and outputs a knowledge graph according to the entity set, the attribute set, and the association relations between entities and attributes. The knowledge graph includes entities, entity attributes, association relations between entities and attributes, and association relations between entities. With this scheme, a knowledge graph can be created accurately, and the relations between entities and attributes, as well as the association relations between entities, can be presented intuitively. The knowledge graph also facilitates personalized visual display and unified management.

Optionally, the processing module 502 is further configured to:

Perform at least one of merging, deduplicating, and distinguishing on entities of identical or similar entity type in the knowledge graph.

Optionally, the data source includes a first data table and a second data table, and the multiple entities include at least one first entity and at least one second entity. The first entity belongs to the first data table, and the second entity belongs to the second data table. The knowledge graph includes at least two connected graphs, between which there is a descendant relation and/or a parent-child relation.

Optionally, the processing module 502 is specifically configured to:

If the similarity between the first entity and the second entity is higher than a preset similarity, and it is determined that the first entity and the second entity both belong to at least one connected graph, merge the first entity and the second entity, or delete the first entity or the second entity from the knowledge graph.

If the similarity between the first entity and the second entity is higher than the preset similarity, and it is determined that the first entity and the second entity do not both belong to any one connected graph, distinguish the first entity and the second entity in the knowledge graph.

Optionally, the processing module 502 is specifically configured to:

If the similarity between the first entity and the second entity is higher than the preset similarity, and it is determined that the first entity and the second entity both belong to at least one connected graph, determine a first entity set directly associated with the first entity and a second entity set directly associated with the second entity;

When it is determined that the intersection of the first entity set and the second entity set contains at least two entities, merge the first entity and the second entity, or delete the first entity or the second entity from the knowledge graph.

Optionally, the knowledge mapping is based on time dimension, UNICOM's figure in each time window on time dimension It is the incidence relation of inter-entity in the time window, and the snapshot of entity attribute.

Optionally, the knowledge graph further satisfies at least one of the following:

In the knowledge graph, entities that have an association relation are displayed with a gradient from strong to weak according to the strength of the association relation;

Specific entities in the knowledge graph are highlighted and labeled with a risk assessment value, a specific entity being an entity whose risk assessment value is higher than a preset risk assessment value;

When an entity in the knowledge graph is updated, the updated entity is displayed distinctively;

A time axis is added for entity attributes that are updated over time, and the time of each change is shown on the time axis;

For the entity attributes of the same entity, the attributes are colored from dark to light in descending order of their weight values.
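Three of the display rules above can be sketched as small helper functions. The sketch assumes relation strengths and risk values normalized to the range 0 to 1; the opacity mapping, the 0.7 risk threshold and all function names are hypothetical choices made for illustration.

```python
def edge_opacity(strength, min_opacity=0.2):
    """Map a relation strength in [0, 1] to an opacity; stronger is more opaque."""
    s = min(max(strength, 0.0), 1.0)
    return min_opacity + (1.0 - min_opacity) * s


def highlight_entities(risk_by_entity, preset_risk=0.7):
    """Return the entities whose risk assessment value exceeds the preset value."""
    return {e: r for e, r in risk_by_entity.items() if r > preset_risk}


def attribute_color_order(weight_by_attribute):
    """List attributes from highest to lowest weight (darkest color first)."""
    return sorted(weight_by_attribute, key=weight_by_attribute.get, reverse=True)
```

A renderer could then draw each edge with `edge_opacity`, label the entities returned by `highlight_entities` with their risk values, and assign progressively lighter shades along the list returned by `attribute_color_order`.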

Optionally, the processing module 502 is specifically configured to:

Perform word segmentation and semantic annotation on the corpus in the corpus set to obtain the entity set and the attribute set;

Annotate the association relation types between the entities in the entity set;

Based on a sequence labeling model, adjust the entity set and the attribute set respectively, and make predictions for each entity in the entity set and each attribute in the attribute set respectively, to obtain the association relation types between entities and the mapping between entities and attributes.
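The step from annotated tokens to an entity set and an attribute set can be sketched as below. The sketch assumes that the sequence labeling model emits BIO-style tags with the labels ENT and ATTR; the tag scheme, the label names and the function names are assumptions and are not specified by the application.

```python
def decode_bio(tokens, tags):
    """Collect (label, text) spans from a BIO-tagged token sequence."""
    spans, label, buffer = [], None, []
    for token, tag in zip(tokens, tags):
        if tag.startswith("B-"):
            if buffer:
                spans.append((label, " ".join(buffer)))
            label, buffer = tag[2:], [token]
        elif tag.startswith("I-") and label == tag[2:]:
            buffer.append(token)
        else:
            # "O" tags and inconsistent "I-" tags close the current span.
            if buffer:
                spans.append((label, " ".join(buffer)))
            label, buffer = None, []
    if buffer:
        spans.append((label, " ".join(buffer)))
    return spans


def split_sets(spans):
    """Split decoded spans into an entity set and an attribute set."""
    entities = {text for label, text in spans if label == "ENT"}
    attributes = {text for label, text in spans if label == "ATTR"}
    return entities, attributes
```

In this sketch the tagger itself is left out; any model that produces one tag per token could feed `decode_bio`.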

It should be noted that, in the embodiment corresponding to Fig. 5 of this application, the physical device corresponding to the transceiver module may be a transceiver, and the physical device corresponding to the processing module may be a processor. Each device shown in Fig. 5 may have the structure shown in Fig. 6. When one of the devices has the structure shown in Fig. 6, the processor, transmitter and receiver in Fig. 6 implement functions that are the same as or similar to those of the processing module, sending module and receiving module provided in the foregoing device embodiment for that device, and the memory in Fig. 6 stores the program code that the processor needs to call when executing the above method of creating a knowledge graph.

In the above embodiments, the description of each embodiment has its own emphasis. For parts that are not described in detail in one embodiment, reference may be made to the related descriptions of other embodiments.

Those skilled in the art will clearly understand that, for convenience and brevity of description, the specific working processes of the systems, devices and modules described above may refer to the corresponding processes in the foregoing method embodiments, and are not repeated here.

In the several embodiments provided in this application, it should be understood that the disclosed systems, devices and methods may be implemented in other ways. For example, the device embodiments described above are merely illustrative. For example, the division into modules is only a division by logical function, and other divisions are possible in actual implementation: multiple modules or components may be combined or integrated into another system, or some features may be ignored or not executed. In addition, the mutual coupling, direct coupling or communication connection shown or discussed may be indirect coupling or communication connection through some interfaces, devices or modules, and may be electrical, mechanical or in other forms.

The modules described as separate components may or may not be physically separate, and the components shown as modules may or may not be physical modules; that is, they may be located in one place or distributed over multiple network modules. Some or all of the modules may be selected according to actual needs to achieve the purpose of the solution of this embodiment.

In addition, the functional modules in the embodiments of this application may be integrated into one processing module, each module may exist physically on its own, or two or more modules may be integrated into one module. The integrated module may be implemented in the form of hardware or in the form of a software functional module. If the integrated module is implemented in the form of a software functional module and is sold or used as an independent product, it may be stored in a computer-readable storage medium.

The above embodiments may be implemented wholly or partly by software, hardware, firmware or any combination thereof. When implemented in software, they may be implemented wholly or partly in the form of a computer program product.

The computer program product includes one or more computer instructions. When the computer program instructions are loaded and executed on a computer, the processes or functions described in the embodiments of the present invention are produced wholly or partly. The computer may be a general-purpose computer, a special-purpose computer, a computer network or another programmable device. The computer instructions may be stored in a computer-readable storage medium, or transmitted from one computer-readable storage medium to another. For example, the computer instructions may be transmitted from one website, computer, server or data center to another website, computer, server or data center by wired means (for example coaxial cable, optical fiber or digital subscriber line (DSL)) or wireless means (for example infrared, radio or microwave). The computer-readable storage medium may be any usable medium that a computer can access, or a data storage device such as a server or data center that integrates one or more usable media. The usable medium may be a magnetic medium (for example a floppy disk, hard disk or magnetic tape), an optical medium (for example a DVD), or a semiconductor medium (for example a solid state disk (SSD)).

The technical solutions provided in this application have been described in detail above. Specific examples are used herein to explain the principles and implementations of this application, and the description of the above embodiments is only intended to help in understanding the method of this application and its core idea. Meanwhile, for those of ordinary skill in the art, the specific implementations and the scope of application may change according to the idea of this application. In summary, the content of this specification should not be construed as limiting this application.

Claims (15)

1. A method of creating a knowledge graph, the method being applied to a data analysis device, characterized in that the method comprises:
Obtaining a data source, the data source including multiple entities;
Performing semantic analysis and cluster analysis on the data source, and extracting an entity set and an attribute set from the data source, the attribute set including the entity attributes of each entity in the entity set;
Obtaining the association relations between each entity in the entity set and the attributes;
Creating and outputting a knowledge graph according to the entity set, the attribute set and the association relations between entities and attributes, the knowledge graph including the entities, the entity attributes, the association relations between entities and attributes, and the association relations between entities.
2. The method according to claim 1, characterized in that the method further comprises:
Vectorizing each entity in the entity set respectively to obtain training vectors.
3. The method according to claim 2, characterized in that vectorizing each entity in the entity set respectively to obtain training vectors comprises:
Performing named entity recognition on each entity in the entity set using a multilayer neural network to obtain the entity context of each entity;
Extracting the association relations between the entities from the obtained entity context of each entity;
Obtaining the training vectors according to the entity context of each entity and the association relations between the entities.
4. The method according to claim 3, characterized in that, after performing named entity recognition on each entity in the entity set using the multilayer neural network to obtain the entity context of each entity, and before extracting the association relations between the entities from the obtained entity context of each entity, the method further comprises:
Performing maximization processing on the obtained entity context of each entity respectively using the maximum log-likelihood method.
5. The method according to claim 3, characterized in that, after extracting the association relations between the entities from the obtained entity context of each entity, and before obtaining the training vectors according to the entity context of each entity and the association relations between the entities, the method further comprises:
Performing maximization processing on the obtained association relations between the entities respectively using the maximum log-likelihood method.
6. The method according to any one of claims 1-5, characterized in that extracting the association relations between the entities from the obtained entity context of each entity comprises:
Annotating association relations for each entity in the entity set respectively according to the attribute set, the entity set and a temporal recurrent neural network model, the annotated association relations including the position of a word within an entity, the association relation type and the association relation position;
Calculating the weight values of the relation types using an association relation embedding method;
Filtering candidate association relations out of the annotated association relations according to the minimum distance principle and the association relation types;
Classifying the filtered candidate association relations according to the keywords of the association relation types to obtain the association relations between the entities.
7. The method according to any one of claims 1-6, characterized in that the method further comprises:
Calculating the similarity between entities by entity attribute embedding, and performing at least one of merging, de-duplication and distinguishing on entities of the same or similar entity type in the knowledge graph.
8. The method according to claim 7, characterized in that the data source includes a first data table and a second data table, the multiple entities include at least one first entity and at least one second entity, the first entity belongs to the first data table, the second entity belongs to the second data table, the knowledge graph includes at least two connected graphs, and a descendant relationship and/or a parent-child relationship exists between the at least two connected graphs.
9. The method according to claim 8, characterized in that performing at least one of merging, de-duplication and distinguishing on entities of the same or similar entity type in the knowledge graph comprises:
If the similarity between the first entity and the second entity is higher than a preset similarity, and it is determined that the first entity and the second entity belong to at least one connected graph, merging the first entity and the second entity, or deleting the first entity or the second entity from the knowledge graph;
If the similarity between the first entity and the second entity is higher than the preset similarity, and it is determined that the first entity and the second entity do not belong to any one connected graph, distinguishing the first entity and the second entity in the knowledge graph.
10. The method according to claim 8, characterized in that performing at least one of merging, de-duplication and distinguishing on entities of the same or similar entity type in the knowledge graph comprises:
If the similarity between the first entity and the second entity is higher than a preset similarity, and it is determined that the first entity and the second entity belong to at least one connected graph, determining a first entity set directly associated with the first entity, and a second entity set directly associated with the second entity;
When it is determined that the intersection of the first entity set and the second entity set contains at least two entities, merging the first entity and the second entity, or deleting the first entity or the second entity from the knowledge graph.
11. The method according to any one of claims 1-10, characterized in that the knowledge graph is based on a time dimension, and the connected graph in each time window on the time dimension is a snapshot of the association relations between entities, and of the entity attributes, within that time window.
12. The method according to claim 11, characterized in that the knowledge graph further satisfies at least one of the following:
In the knowledge graph, entities that have an association relation are displayed with a gradient from strong to weak according to the strength of the association relation;
Specific entities in the knowledge graph are highlighted and labeled with a risk assessment value, a specific entity being an entity whose risk assessment value is higher than a preset risk assessment value;
When an entity in the knowledge graph is updated, the updated entity is displayed distinctively;
A time axis is added for entity attributes that are updated over time, and the time of each change is shown on the time axis;
For the entity attributes of the same entity, the attributes are colored from dark to light in descending order of their weight values.
13. The method according to any one of claims 1-10, characterized in that performing semantic analysis and cluster analysis on the corpus set and extracting the entity set and the attribute set from the corpus set comprises:
Performing word segmentation and semantic annotation on the corpus in the corpus set to obtain the entity set and the attribute set;
Annotating the association relation types between the entities in the entity set;
Based on a conditional random field model, adjusting the entity set and the attribute set respectively, and making predictions for each entity in the entity set and each attribute in the attribute set respectively, to obtain the association relation types between entities and the mapping between entities and attributes.
14. A device for creating a knowledge graph, characterized in that the device comprises:
At least one processor, a memory, a receiver and a transmitter;
Wherein the memory is configured to store program code, and the processor is configured to call the program code stored in the memory to perform the method according to any one of claims 1-13.
15. A computer storage medium, characterized in that it includes instructions which, when run on a computer, cause the computer to perform the method according to any one of claims 1-13.
CN201710890548.1A 2017-09-27 2017-09-27 Method and device for creating knowledge graph CN107665252B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201710890548.1A CN107665252B (en) 2017-09-27 2017-09-27 Method and device for creating knowledge graph

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201710890548.1A CN107665252B (en) 2017-09-27 2017-09-27 Method and device for creating knowledge graph

Publications (2)

Publication Number Publication Date
CN107665252A true CN107665252A (en) 2018-02-06
CN107665252B CN107665252B (en) 2020-08-25

Family

ID=61098564

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201710890548.1A CN107665252B (en) 2017-09-27 2017-09-27 Method and device for creating knowledge graph

Country Status (1)

Country Link
CN (1) CN107665252B (en)

Also Published As

Publication number Publication date
CN107665252B (en) 2020-08-25

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant