CN110188207A - Knowledge graph construction method and device, readable storage medium, electronic equipment - Google Patents
Knowledge graph construction method and device, readable storage medium, electronic equipment
- Publication number
- CN110188207A (application CN201910408077.5A)
- Authority
- CN
- China
- Prior art keywords
- entity
- attribute
- link
- information
- data
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Granted
Classifications
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F16/00—Information retrieval; Database structures therefor; File system structures therefor
- G06F16/30—Information retrieval; Database structures therefor; File system structures therefor of unstructured textual data
- G06F16/36—Creation of semantic tools, e.g. ontology or thesauri
- G06F16/367—Ontology
Landscapes
- Engineering & Computer Science (AREA)
- Theoretical Computer Science (AREA)
- Life Sciences & Earth Sciences (AREA)
- Animal Behavior & Ethology (AREA)
- Computational Linguistics (AREA)
- Data Mining & Analysis (AREA)
- Databases & Information Systems (AREA)
- Physics & Mathematics (AREA)
- General Engineering & Computer Science (AREA)
- General Physics & Mathematics (AREA)
- Information Retrieval, Db Structures And Fs Structures Therefor (AREA)
Abstract
Disclosed are a knowledge graph construction method and device, a readable storage medium, and electronic equipment. The embodiments of the present application use a partially labeled data set, namely the structured and semi-structured data of an encyclopedia, and perform linking by exploiting the characteristics of that data itself, which improves the accuracy and efficiency of the linking process.
Description
Technical field
The present invention relates to the field of computer technology, and in particular to a knowledge graph construction method and device, a readable storage medium, and electronic equipment.
Background art
The construction of a knowledge graph generally comprises information extraction, knowledge fusion, and knowledge processing. Information extraction includes entity extraction, relation extraction, and attribute extraction; knowledge fusion includes entity linking and knowledge merging; knowledge processing includes ontology construction, knowledge reasoning, quality evaluation, and knowledge updating. The main tools currently available for knowledge graph fusion include Falcon-AO, Dedupe, Limes, and Silk.
In traditional linking methods, the data used for information extraction mainly takes three forms: structured data, semi-structured data, and unstructured data. Structured data has a relatively uniform structure and can be converted into triples for use, but its scale is small and it is generally used only in specific domains. Unstructured data must go through entity extraction, relation extraction, and attribute extraction by means of statistical learning or machine learning before it can be converted into triples, but the accuracy of these methods is currently low and cannot meet commercial needs. Semi-structured data balances data scale and accuracy: it can be turned into structured data through preprocessing and normalization and then converted into triples. In addition, in the linking process of knowledge fusion, the procedures of traditional linking methods are cumbersome and error-prone, which leads to low accuracy and low efficiency.
Summary of the invention
In view of this, embodiments of the present invention provide a knowledge graph construction method and device, a readable storage medium, and electronic equipment, with the aim of improving the accuracy and efficiency of the linking process.
In a first aspect, an embodiment of the present invention discloses a knowledge graph construction method, where the knowledge graph is used to represent the entity information of entities and the relationships between different entities, the method comprising:
creating an information schema from crawled data information, the information schema including concepts and the attributes corresponding to each concept;
performing entity classification on the data information according to the information schema, and determining the entity information of each entity and an entity identifier used to identify the entity, where the entity information includes entity attributes and attribute values, the entity attributes include data attributes and object attributes, and the attribute value of an object attribute points to another entity;
unifying the entity attribute names of different entities according to the attribute names in the information schema;
determining an identified link or a non-identified link for each entity;
comparing entities with the same link prefix, merging the entity attributes of non-identified links into the entity attributes of identified links, and merging the identical entity attributes of each entity, where the link prefix is the part of a link that does not include the identifier;
comparing entities with the same name and, in response to the number of matched entity attributes being greater than a threshold, merging the entity information of the different entities into the same entity and merging the identical entity attributes of each entity;
outputting the entity information of the entities and the relationships between different entities.
Further, determining an identified link or a non-identified link for each entity includes:
splitting the object attributes of the entity so that each object attribute corresponds to one attribute value;
determining an identified link or a non-identified link for each entity and for each object attribute value of the entity.
Further, determining an identified link or a non-identified link for each entity also includes:
in response to no identified link corresponding to the entity existing in the data information, determining a non-identified link.
Further, determining an identified link or a non-identified link for each entity also includes:
in response to an identified link corresponding to the entity existing in the data information, selecting one identified link.
Further, the method also includes:
in response to the data type of a data attribute value being string, adding a language tag to the data attribute value;
in response to the data type of a data attribute value being numeric, unifying the unit of the data attribute value.
Further, merging the entity attributes of non-identified links into the entity attributes of identified links includes:
determining the mapping between target entity identifiers and the entity identifiers to be merged, and broadcasting it through the Spark framework to all hosts that store the data information;
screening the locally stored entity identifiers on each host and performing the merge.
Further, merging the entity information of different entities into the same entity includes:
determining the mapping between target entity identifiers and the entity identifiers to be merged, and broadcasting it through the Spark framework to all hosts that store the data information;
screening the locally stored entity identifiers on each host and performing the merge.
In a second aspect, an embodiment of the present invention discloses a knowledge graph construction device, where the knowledge graph is used to represent the entity information of entities and the relationships between different entities, the device comprising:
a schema construction module, for creating an information schema from crawled data information, the information schema including concepts and the attributes corresponding to each concept;
an entity classification module, for performing entity classification on the data information according to the information schema, and determining the entity information of each entity and an entity identifier used to identify the entity, where the entity information includes entity attributes and attribute values, the entity attributes include data attributes and object attributes, and the attribute value of an object attribute points to another entity;
a standardization module, for unifying the entity attribute names of different entities according to the attribute names in the information schema;
a link determination module, for determining an identified link or a non-identified link for each entity;
a first entity merging module, for comparing entities with the same name, merging the entity attributes of non-identified links into the entity attributes of identified links, and merging the identical entity attributes of each entity;
a second entity merging module, for comparing entities with the same name and, in response to the number of matched entity attributes being greater than a threshold, merging the entity information of the different entities into the same entity and merging the identical entity attributes of each entity;
an information transmission module, for outputting the entity information of the entities and the relationships between different entities.
In a third aspect, an embodiment of the present invention discloses a computer-readable storage medium storing computer program instructions, where the computer program instructions, when executed by a processor, implement the method of any one of the above.
In a fourth aspect, an embodiment of the present invention discloses electronic equipment including a memory and a processor, where the memory is used to store one or more computer program instructions, and the one or more computer program instructions are executed by the processor to implement the method of any one of the above.
The embodiments of the present invention use a partially labeled data set, namely the structured and semi-structured data of an encyclopedia, and perform linking by exploiting the characteristics of that data itself, which improves the accuracy of the linking process.
Description of the drawings
The above and other objects, features, and advantages of the present invention will become clearer from the following description of embodiments of the invention with reference to the drawings, in which:
Fig. 1 is a flowchart of the knowledge graph construction method of an embodiment of the present invention;
Fig. 2 is a schematic diagram of two pieces of entity information in an optional implementation of the embodiment of the present invention;
Fig. 3 is a flowchart of determining an identified link or a non-identified link for an entity in an optional implementation of the embodiment of the present invention;
Fig. 4 is a schematic diagram of determining an identified link or a non-identified link for an entity in an optional implementation of the embodiment of the present invention;
Fig. 5 is a schematic diagram of the knowledge graph construction device of an embodiment of the present invention;
Fig. 6 is a schematic diagram of the electronic equipment of an embodiment of the present invention.
Detailed description of the embodiments
The present invention is described below on the basis of embodiments, but it is not limited to these embodiments. The following detailed description sets out certain specific details. A person skilled in the art can fully understand the present invention even without these details. To avoid obscuring the essence of the invention, well-known methods, procedures, and processes are not described in detail.
In addition, a person of ordinary skill in the art should understand that the drawings provided here are for purposes of illustration and are not necessarily drawn to scale.
Unless the context clearly requires otherwise, words such as "include" and "comprise" throughout the description and claims should be construed in an inclusive sense rather than an exclusive or exhaustive sense, that is, in the sense of "including but not limited to".
In the description of the present invention, it should be understood that the terms "first", "second", and so on are used for descriptive purposes only and are not to be understood as indicating or implying relative importance. In addition, in the description of the present invention, unless otherwise indicated, "multiple" means two or more.
Fig. 1 is a flowchart of a knowledge graph construction method according to an embodiment of the present invention. As shown in Fig. 1, the knowledge graph construction method includes:
Step S100: create an information schema from the crawled data information.
Specifically, the information schema includes concepts and the attributes corresponding to each concept. The source of the data information may be, for example, encyclopedia web pages, whose content may include, for example, the URL, title, synonyms, summary, body text, Infobox, entry tags, and web page links. The concepts may further include hypernym-hyponym relationships between concepts. For example, when the superordinate concept is person, the subordinate concepts may include singer, actor, scholar, professor, entrepreneur, and so on. The attributes may further include an attribute value range and an attribute unit. The attribute value range is used to judge whether an attribute value is reasonable: for example, when the attribute is a person's height, an attribute value of 30 m can be judged unreasonable. The attribute unit is set according to the attribute: for example, when the attribute is height, the attribute unit is cm or m; when the attribute is weight, the attribute unit is kg or g.
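The range check described above can be sketched as follows. This is a minimal illustration only: the `ATTRIBUTE_RULES` table, its bounds, and the function name `is_reasonable` are assumptions made for the example and do not come from the patent.

```python
# Hypothetical schema fragment: attribute name -> (unit, lowest value, highest value).
# The bounds are illustrative assumptions, not values taken from the source.
ATTRIBUTE_RULES = {
    "height": ("m", 0.3, 3.0),
    "weight": ("kg", 0.5, 700.0),
}


def is_reasonable(attribute, value):
    """Return True if value lies inside the range configured for attribute."""
    if attribute not in ATTRIBUTE_RULES:
        return True  # no rule configured, so there is nothing to check
    _unit, low, high = ATTRIBUTE_RULES[attribute]
    return low <= value <= high
```

With these assumed bounds, a height of 1.73 m passes the check and a height of 30 m is rejected, which mirrors the example in the text.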
When creating the information schema, the data information is first analyzed: the frequency with which each attribute occurs in the data information is counted, and the attributes are sorted by frequency into an attribute list. The attributes in the attribute list are then screened according to a set ratio. For example, when the set ratio is 70% and the attribute list is sorted by attribute frequency in descending order, the first 70% of the attributes in the list are taken. This selects the important attributes and prevents the inefficiency caused by having too many attributes in the information schema.
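A minimal sketch of this frequency-based screening is given below. It assumes each crawled record is a dictionary keyed by attribute name; the function name `select_attributes` and the choice to round the cut-off up are assumptions of the example.

```python
import math
from collections import Counter


def select_attributes(records, ratio=0.7):
    """Keep the most frequent attribute names, up to the given ratio of the list."""
    # Count in how many records each attribute name occurs.
    counts = Counter(name for record in records for name in record)
    # most_common() returns the names sorted by descending frequency.
    ranked = [name for name, _count in counts.most_common()]
    keep = math.ceil(len(ranked) * ratio)  # rounding up is an assumption
    return ranked[:keep]
```

For example, with four distinct attribute names and `ratio=0.5`, the two most frequent names are kept.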
Step S200: perform entity classification on the data information, and determine the entity information of each entity and the identifier used to identify the entity.
Specifically, entity classification is performed on the data information according to the concepts in the information schema and the attributes corresponding to those concepts, and the entity information of each entity and the identifier used to identify the entity are determined. For example, according to the concepts "product" and "mountain range" in the information schema, "Himalaya" is divided into a "product" entity and a "mountain range" entity. The entity information includes entity attributes and attribute values; the entity attributes include data attributes and object attributes, and the attribute value of an object attribute points to another entity.
During entity classification, a classification model is first obtained through preprocessing. Specifically, the data information is parsed and feature information is extracted from it, which may be, for example, entity attribute names, entry tags, and the summary. A classification model is trained on the feature information; the training may be carried out, for example, with a neural network such as TextCNN or TextRNN. Data containing the feature information is then input into the classification model to obtain entities and the entity information corresponding to each entity. The entity attributes of an entity are the attributes set in the information schema.
Step S300: unify the entity attribute names of different entities according to the attribute names in the information schema.
Specifically, differences between entity attribute names in the data information may prevent identical entities from being merged. For example, for the same entity, the entity attributes "birthplace" and "born in", "height" and "tall", and "occupation" and "work" differ literally although they mean the same thing. If entity attribute names are compared directly, entity attributes with the same meaning cannot be matched, and two entities that should be merged are not merged. The entity attribute names of different entities therefore need to be unified according to the attribute names set in the information schema, so that identical entities can be merged.
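One simple way to unify names is a lookup table from each variant to the schema name, sketched below. The alias table `CANONICAL_NAMES` and the function `unify_names` are hypothetical, and the English attribute names stand in for the original Chinese ones.

```python
# Hypothetical alias table: variant attribute name -> name set in the schema.
CANONICAL_NAMES = {
    "born in": "birthplace",
    "tall": "height",
    "occupation": "work",
}


def unify_names(entity):
    """Return a copy of the entity whose attribute names follow the schema."""
    # Names without an alias entry are kept unchanged.
    return {CANONICAL_NAMES.get(name, name): value for name, value in entity.items()}
```

After this step, two entities that use different wording for the same attribute can be compared key by key. The sketch does not handle the case where two variants of one attribute occur in the same entity.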
Step S400: determine an identified link or a non-identified link for each entity.
Specifically, in the entity information of an entity, the attribute value of an object attribute points to another entity, so determining an identified link or a non-identified link for each entity includes determining an identified link or a non-identified link for each object attribute value of the entity. An identified link is a link that includes the identifier used to identify the entity, and is selected from the crawled data information. A non-identified link is a link that does not include the identifier used to identify the entity.
Further, while an identified link or a non-identified link is being determined for each entity, a language tag is added to any data attribute value whose data type is string, and the unit of any data attribute value whose data type is numeric is unified. For example, if the attribute value is "Celebrity A", the language tag @zh is added; if the attribute value is "173cm" and the standard unit for the attribute is m, "1.73m" is output.
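The two normalization rules can be sketched as follows. The sketch assumes that numeric values arrive as strings such as "173cm", handles only the length units cm and m, and formats the result to two decimals; the names `TO_METRES` and `normalize_value` are hypothetical.

```python
import re

# Conversion factors to the standard unit (metres); only lengths are covered here.
TO_METRES = {"cm": 0.01, "m": 1.0}


def normalize_value(value, lang="zh"):
    """Convert a length to metres, or tag a plain string with its language."""
    match = re.fullmatch(r"\s*(\d+(?:\.\d+)?)\s*(cm|m)\s*", value)
    if match:
        number, unit = match.groups()
        metres = float(number) * TO_METRES[unit]
        return f"{metres:.2f}m"
    # Anything that is not a recognised quantity is treated as a string value.
    return f"{value}@{lang}"
```

A fuller implementation would take the target unit from the information schema rather than hard-coding metres.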
Step S500: compare entities with the same link prefix, merge the entity attributes of non-identified links into the entity attributes of identified links, and merge the identical entity attributes of each entity.
Specifically, the links of entities include identified links and non-identified links, that is, entities with the same name include entities with identified links and entities with non-identified links. For example, the entities named Zhang San include the entity with the link "/item/Zhang San/1234", the entity with the link "/item/Zhang San/6789", and the entity with the link "/item/Zhang San/". The link prefix is the part of a link that does not include the identifier, for example the "/item/Zhang San/" part of the identified link "/item/Zhang San/1234", or the whole of the non-identified link "/item/Zhang San/". When merging entity attributes, the entity attributes of the entity whose link is a non-identified link are first split and distinguished, and are then merged into the respective entities whose links are identified links. Finally, the identical entity attributes within each entity are merged. For example, when the entity with the link "/item/Zhang San/1234" includes the entity attribute "age: 23" and the entity attributes merged in from the entity with the non-identified link also include "age: 23", the two "age" entity attributes are merged into one.
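The prefix comparison can be sketched as below. The text does not say how the attributes of a non-identified entity are distributed when several identified entities share its prefix, so this sketch makes a simplifying assumption: it merges only when exactly one identified entity has that prefix and otherwise leaves the non-identified entity untouched. The function names and the data layout (link mapped to attribute name mapped to a set of values) are hypothetical.

```python
from collections import defaultdict


def split_link(link):
    """Split a link into (prefix, identifier); identifier is None if absent."""
    head, _, tail = link.rpartition("/")
    return head + "/", (tail or None)


def merge_by_prefix(entities):
    """entities: dict mapping link -> dict mapping attribute name -> set of values."""
    groups = defaultdict(list)
    for link in entities:
        groups[split_link(link)[0]].append(link)

    merged = {}
    for links in groups.values():
        identified = [l for l in links if split_link(l)[1] is not None]
        unidentified = [l for l in links if split_link(l)[1] is None]
        for link in identified:
            merged[link] = {k: set(v) for k, v in entities[link].items()}
        for link in unidentified:
            if len(identified) == 1:
                # Assumption: an unambiguous prefix means the same entity.
                target = merged[identified[0]]
                for name, values in entities[link].items():
                    # Using sets merges identical attribute values automatically.
                    target.setdefault(name, set()).update(values)
            else:
                merged[link] = {k: set(v) for k, v in entities[link].items()}
    return merged
```

Storing values in sets is what collapses a repeated "age: 23" into a single attribute value after the merge.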
Further, when the merging of the entity attributes of non-identified links into the entity attributes of identified links is carried out across different hosts, the mapping between target entity identifiers and the entity identifiers to be merged is determined and broadcast through the Spark framework to all hosts that store the data information. Further, the target entity identifier is the entity identifier of the entity whose link is the non-identified link, and the entity identifier to be merged is the entity identifier of the entity whose link is the identified link. Each host screens its locally stored entity identifiers and performs the merge. That is, the mapping between the identifiers of the entities to be merged is first collected at the driver, and the broadcast mechanism is then used to broadcast the mapping. After receiving the mapping, each host screens its local entity identifiers and merges only those that appear in the mapping. Because the mapping stores only entity identifiers, without attributes or other information, it is much smaller and can be broadcast far more efficiently.
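The broadcast-and-filter idea can be illustrated without a cluster. The sketch below is plain Python that simulates hosts as a list of dictionaries; it does not use Spark itself. In PySpark the small identifier mapping would typically be shared with `SparkContext.broadcast` and read through `.value` inside the per-partition function. The function names and data layout are hypothetical.

```python
def rekey_partition(local_entities, id_mapping):
    """Run on one host: rewrite local identifiers that appear in the mapping."""
    # id_mapping: identifier to be merged -> target identifier (small, broadcast).
    return [
        (id_mapping.get(entity_id, entity_id), attributes)
        for entity_id, attributes in local_entities.items()
    ]


def merge_all(partitions, id_mapping):
    """Combine the re-keyed records of all hosts by target identifier."""
    merged = {}
    for partition in partitions:
        for target_id, attributes in rekey_partition(partition, id_mapping):
            merged.setdefault(target_id, {}).update(attributes)
    return merged
```

Only the identifier mapping travels to every host; the attribute data stays where it is stored until the final combination step, which is the reason the text gives for the small broadcast size.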
Step S600: compare entities with the same name and, in response to the number of matched entity attributes being greater than a threshold, merge the entity information of the different entities into the same entity and merge the identical entity attributes of each entity.
Specifically, all entities are grouped by entity name, where the name is understood broadly and may be, for example, any of several name-related attributes such as nickname and title. The entities in a group are then compared in order: starting from the second entity, each entity is compared with the previous one, and each entity attribute that matches is counted as +1 while each one that differs is counted as 0. In response to the number of matched entity attributes being greater than the threshold, the two entities are considered the same entity and their entity information is merged into one entity. For example, the threshold may be set so that two entities match if the matched entity attributes exceed half of the entity attributes the two entities have in common. Specifically, of two entities named Singer A, if one has performed song A at a solo concert and the other has recorded 100 songs, one of which is song A, the "works" entity attribute is still judged to match.
Further, when judging whether entity attributes match, a data attribute is judged to match if the attribute values are equal. An object attribute is judged by the name of the entity its attribute value points to. For example, if the spouses of two entities named Zhou Jielun are both named Kun Ling, then regardless of whether the two Kun Lings are the same person, the two Zhou Jielun entities are judged to be the same person.
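The matching rule can be sketched as follows. The sketch assumes each entity is a dictionary from attribute name to a list of values (for object attributes, the names of the entities pointed to), treats any overlap in values as a match, and uses half of the shared attributes as the threshold; the function names are hypothetical.

```python
def attribute_matches(values_a, values_b):
    """An attribute matches if the two entities share at least one value."""
    return bool(set(values_a) & set(values_b))


def is_same_entity(a, b, threshold=0.5):
    """Decide whether two same-name entities should be merged."""
    shared = set(a) & set(b)  # attributes present in both entities
    if not shared:
        return False
    matched = sum(1 for name in shared if attribute_matches(a[name], b[name]))
    return matched > threshold * len(shared)
```

Under this rule, two entities that share a spouse name and one work but differ in height still match, because two of their three shared attributes agree.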
Further, when the merging of the entity information of different entities into the same entity is carried out across different hosts, the mapping between target entity identifiers and the entity identifiers to be merged is determined and broadcast through the Spark framework to all hosts that store the data information. Each host screens its locally stored entity identifiers and performs the merge. That is, the mapping between the identifiers of the entities to be merged and the identifiers of the target entities is first collected at the driver, and the broadcast mechanism is then used to broadcast the mapping. After receiving the mapping, each host screens the identifiers of its locally stored entities and merges only those that appear in the mapping. Because the mapping stores only entity identifiers, without attributes or other information, it is much smaller and can be broadcast far more efficiently.
Step S700: output the entity information of the entities and the relationships between different entities.
Specifically, the merged entities and entity information are output, that is, the triples formed by entity identifier, entity attribute, and attribute value. The attribute value of an object attribute of an entity points to another entity, which connects different entities. Therefore, outputting the triples formed by entity, entity attribute, and attribute value also outputs the relationships between the entity and other entities.
The method links directly using the data regularities in the encyclopedia data source, which improves the accuracy of the linking process. In addition, the knowledge fusion process is based on the Spark framework and is executed with a distributed architecture, so it can run in parallel on multiple hosts of ordinary configuration, which reduces the cost of purchasing high-performance hosts and improves execution efficiency.
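The output step amounts to flattening each entity into triples, as in the sketch below. It assumes every attribute maps to a list of values and that object attribute values are already links to other entities; the function name `to_triples` is hypothetical.

```python
def to_triples(entity_id, attributes):
    """Flatten one entity into (entity identifier, attribute, value) triples."""
    return [
        (entity_id, name, value)
        for name, values in attributes.items()
        for value in values
    ]
```

A triple whose value is a link to another entity is what expresses a relationship between two entities in the resulting graph.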
Fig. 2 is a schematic diagram of two pieces of entity information in an optional implementation of the embodiment of the present invention. As shown in Fig. 2, the "name" of both entities is "Celebrity A", but the entity with the entity identifier "123" also has the attributes {"birthplace": "Taiwan"; "height": "173cm"; "blood type": "O"; "occupation": "singer"}, while the entity with the entity identifier "124" has the attributes {"born in": "Taiwan"; "tall": "173cm"; "blood type": "O"; "work": "singer"}. The entity attribute "blood type": "O" is identical in both. For "birthplace" and "born in", "height" and "tall", and "occupation" and "work", the attribute values are the same but the entity attribute names differ. These names differ only literally and mean the same thing, but if entity attribute names are compared directly, entity attributes with the same meaning cannot be matched, and two entities that should be merged are not merged. The entity attribute names of different entities therefore need to be unified according to the attribute names set in the information schema, so that identical entities can be merged. For example, when the attribute names set in the schema are "birthplace", "height", "blood type", and "work", then after the entity attribute names are unified, the entity with the entity identifier "123" has the attributes {"birthplace": "Taiwan"; "height": "173cm"; "blood type": "O"; "work": "singer"}, and the entity with the entity identifier "124" likewise has the attributes {"birthplace": "Taiwan"; "height": "173cm"; "blood type": "O"; "work": "singer"}. By comparing the entity attribute names and the corresponding attribute values, it can be judged that the entities with the entity identifiers "123" and "124" are the same entity and can be merged directly.
Fig. 3 is a flowchart of determining an identified link or a non-identified link for an entity in an optional implementation of the embodiment of the present invention. As shown in Fig. 3, the method of determining an identified link or a non-identified link for an entity includes:
Step S410: split the object attributes of the entity so that each object attribute corresponds to one attribute value.
Specifically, the attribute values of the object attributes of the entity are split. For example, if the works of Celebrity A are "Work A" and "Work B", they are split into {"works: Work A", "works: Work B"}.
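The splitting step can be sketched as follows. The sketch assumes multi-valued object attributes arrive as one comma-separated string and that the caller knows which attribute names are object attributes; both the function name and the separator are assumptions.

```python
def split_object_attributes(attributes, object_attributes, separator=","):
    """Return (name, value) pairs with one value per object attribute pair."""
    pairs = []
    for name, value in attributes.items():
        if name in object_attributes:
            # One pair per value, so each pair can later receive its own link.
            pairs.extend((name, part.strip()) for part in value.split(separator))
        else:
            pairs.append((name, value))  # data attributes are left as they are
    return pairs
```

Data attributes pass through unchanged, while each value of an object attribute becomes its own pair.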
Step S420: determine a non-identified link.
Specifically, in step S420, in response to no identified link corresponding to the entity existing in the data information, a non-identified link is determined. Further, in response to the Infobox or another position in the data information containing a link that corresponds to the entity but contains no identifier, that link is determined as the non-identified link of the entity; in response to no link corresponding to the entity existing in the data information, a virtual link corresponding to the entity is created, and the virtual link does not include an identifier.
Step S430: select an identified link.
Specifically, in step S430, in response to an identified link corresponding to the entity existing in the data information, one identified link is selected. Further, in response to the Infobox or another position in the data information containing links corresponding to the entity, a link that contains an identifier is chosen from those links and determined as the identified link of the entity.
The method determines a corresponding link for every entity, which allows entities to be linked directly using the data regularities in the encyclopedia data source, with high accuracy.
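The choice between the two kinds of link can be sketched as below. The sketch assumes the link shape used in the examples, where a link carries an identifier exactly when something follows its last slash, and it simply takes the first suitable candidate; the function name `choose_link` and the `base` parameter are hypothetical.

```python
def choose_link(entity_name, candidate_links, base="/item/"):
    """Pick a link with an identifier if one exists, else one without."""
    # Assumption: a link has an identifier when text follows its last slash.
    identified = [link for link in candidate_links if not link.endswith("/")]
    if identified:
        return identified[0]          # step S430: select an identified link
    if candidate_links:
        return candidate_links[0]     # step S420: reuse the link without identifier
    return f"{base}{entity_name}/"    # step S420: create a virtual link
```

When several identified links are available, the text does not specify how one is selected, so taking the first is only a placeholder.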
It should be understood that those skilled in the art may also implement the above preprocessing steps using other existing algorithms.
Fig. 4 is a schematic diagram of determining an identified link or a non-identified link for an entity in an optional implementation of the embodiment of the present invention. As shown in Fig. 4, the initial triples include entity attributes and entity attribute values. The entity attributes include {"name": "Celebrity A"; "height": "173cm"; "wife": "Celebrity B"; "works": "Work A, Work B, Work C"}, where "name" and "height" are data attributes and "wife" and "works" are object attributes, each attribute value of which points to an entity.
The attribute values of the object attributes in the initial triples are split. The entity attributes in the resulting split triples include {"name": "Celebrity A"; "height": "173cm"; "wife": "Celebrity B"; "works": "Work A"; "works": "Work B"; "works": "Work C"}, so that each object attribute corresponds to only one object attribute value.
The split triples are then processed, and an identified link or a non-identified link is determined for each entity, that is, for the attribute value of each object attribute. At the same time, a language tag is added to any data attribute value whose data type is string, and the unit of any data attribute value whose data type is numeric is unified. As shown in the processed triples in Fig. 4, the attribute value of the data attribute "name" is "Celebrity A", whose data type is string, so the language tag @zh is added to the data attribute value. The attribute value of the data attribute "height" is "173cm", whose data type is numeric, so the unit of the data attribute value is unified according to the set attribute value unit; for example, when the set unit is m, the data attribute value "173cm" is converted to "1.73m". "wife" and "works" are object attributes, so a link is determined for the attribute value of each of them. For example, the identified link "http://baike/item/Celebrity B/1234" is selected for "Celebrity B", the identified link "http://baike/item/Work A/567" is selected for "Work A", the identified link "http://baike/item/Work B/557" is selected for "Work B", and the non-identified link "http://baike/item/Work C/" is determined for "Work C".
Fig. 5 is the schematic diagram of the knowledge mapping construction device of the embodiment of the present invention, as shown in figure 5, the knowledge mapping structure
Building device includes mode construction module 51, entity classification module 52, standardized module 53, link determining module 54, first instance
Merging module 55, second instance merging module 56 and information transmission modular 57.
Specifically, the pattern construction module 51 is used to create an information pattern according to the crawled data information, the information pattern including concepts and the attributes corresponding to the concepts. The entity classification module 52 is used to perform entity classification on the data information according to the information pattern and to determine the entity information of each entity and an entity identifier for identifying the entity; the entity information includes entity attributes and attribute values, the entity attributes include data attributes and object attributes, and the attribute value of an object attribute points to another entity. The standardization module 53 is used to unify the entity attribute names of different entities according to the attribute names in the information pattern. The link determination module 54 is used to determine an identifying link or a non-identifying link for each entity. The first entity merging module 55 is used to compare entities with the same name, merge the entity attributes of the non-identifying link into the entity attributes of the identifying link, and merge the identical entity attributes of each entity. The second entity merging module 56 is used to compare entities with the same name and, in response to the matched entity attributes exceeding a threshold, merge the entity information of the different entities into the same entity and merge the identical entity attributes of each entity. The information output module 57 is used to output the entity information of the entities and the relationships between different entities.
The device performs linking directly from the regularities of the data in the encyclopedia data source, which improves the accuracy and efficiency of the linking process.
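The two merging steps can be sketched as follows. This is an illustration under stated assumptions rather than the patent's implementation: the link prefix is taken to be the link with a trailing numeric identifier removed, a non-identifying link is merged only when exactly one identifying link shares its prefix, and "matched entity attributes exceeding a threshold" is read as a count of identical attribute/value pairs. All function names are hypothetical.

```python
from collections import defaultdict


def link_prefix(link):
    """Strip a trailing numeric identifier: '.../name/1234' -> '.../name/'."""
    head, _, tail = link.rpartition("/")
    return head + "/" if tail.isdigit() else link


def merge_by_prefix(entities):
    """entities maps link -> attribute dict. Merge each non-identifying link
    into the single identifying link that shares its prefix, if there is one."""
    by_prefix = defaultdict(list)
    for link in entities:
        if link_prefix(link) != link:  # the link carries an identifier
            by_prefix[link_prefix(link)].append(link)
    merged = {link: dict(attrs) for link, attrs in entities.items()}
    for link, attrs in entities.items():
        targets = by_prefix.get(link, [])
        if link_prefix(link) == link and len(targets) == 1:
            # On conflict, keep the values of the identified entity.
            merged[targets[0]] = {**attrs, **merged[targets[0]]}
            del merged[link]
    return merged


def should_merge(a, b, threshold=2):
    """Same name, and more than `threshold` identical attribute/value pairs."""
    if a.get("name") != b.get("name"):
        return False
    shared = sum(1 for k in a if k != "name" and k in b and a[k] == b[k])
    return shared > threshold


entities = {
    "http://baike/item/works A/567": {"year": "2001"},
    "http://baike/item/works A/": {"genre": "film", "year": "1999"},
}
print(merge_by_prefix(entities))
```

Merging only when a prefix resolves to a single identified entity avoids attaching attributes to the wrong homonym; ambiguous cases are left for the name-and-threshold comparison.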
Fig. 6 is a schematic diagram of the electronic device of an embodiment of the present invention. In this embodiment, the electronic device includes a server, a terminal, and the like. As shown in Fig. 6, the electronic device includes: at least one processor 62; a memory 61 communicatively connected to the at least one processor; and a communication component 63 communicatively connected to the storage medium, the communication component 63 sending and receiving data under the control of the processor 62. The memory 61 stores instructions executable by the at least one processor 62, and the instructions are executed by the at least one processor 62 to implement the knowledge graph construction method of the above embodiments.
Specifically, the memory 61, as a non-volatile computer-readable storage medium, can be used to store non-volatile software programs, non-volatile computer-executable programs, and modules. The processor 62 runs the non-volatile software programs, instructions, and modules stored in the memory 61 to execute the various functional applications and data processing of the device, that is, to implement the above knowledge graph construction method.
The memory 61 may include a program storage area and a data storage area. The program storage area can store an operating system and the application program required by at least one function; the data storage area can store an option list and the like. In addition, the memory 61 may include high-speed random access memory and may also include non-volatile memory, for example at least one magnetic disk storage device, flash memory device, or other non-volatile solid-state storage device. In some embodiments, the memory 61 optionally includes memory located remotely from the processor 62, and such remote memory can be connected to an external device through a network. Examples of such networks include, but are not limited to, the Internet, intranets, local area networks, mobile communication networks, and combinations thereof.
One or more modules are stored in the memory 61 and, when executed by the one or more processors 62, perform the knowledge graph construction method of any of the above method embodiments.
The above product can execute the method provided by the embodiments of the present application and has the functional modules and beneficial effects corresponding to that method. For technical details not described in detail in this embodiment, refer to the method provided by the embodiments of the present application.
The present invention also relates to a computer-readable storage medium for storing a computer-readable program, the computer-readable program being used by a computer to execute all or part of the above method embodiments.
That is, those skilled in the art will understand that all or part of the steps of the methods in the above embodiments can be completed by a program instructing the relevant hardware. The program is stored in a storage medium and includes a number of instructions that cause a device (which may be a single-chip microcomputer, a chip, etc.) or a processor to execute all or part of the steps of the methods described in the embodiments of the present application. The aforementioned storage medium includes any medium that can store program code, such as a USB flash drive, a removable hard disk, read-only memory (ROM), random access memory (RAM), a magnetic disk, or an optical disc.
The above description is only a preferred embodiment of the present invention and is not intended to limit the invention. For those skilled in the art, the invention may have various modifications and changes. Any modification, equivalent replacement, improvement, or the like made within the spirit and principles of the present invention shall be included in the protection scope of the present invention.
Claims (10)
1. A knowledge graph construction method, the knowledge graph representing the entity information of entities and the relationships between different entities, characterized in that the method comprises:
creating an information pattern according to crawled data information, the information pattern comprising concepts and attributes corresponding to the concepts;
performing entity classification on the data information according to the information pattern, and determining the entity information of each entity and an entity identifier for identifying the entity, the entity information comprising entity attributes and attribute values, the entity attributes comprising data attributes and object attributes, the attribute value of an object attribute pointing to another entity;
unifying the entity attribute names of different entities according to the attribute names in the information pattern;
determining an identifying link or a non-identifying link for each entity;
comparing entities with the same link prefix, merging the entity attributes of the non-identifying link into the entity attributes of the identifying link, and merging the identical entity attributes of each entity, the link prefix being the part of a link that does not include the identifier;
comparing entities with the same name and, in response to the matched entity attributes exceeding a threshold, merging the entity information of the different entities into the same entity and merging the identical entity attributes of each entity;
outputting the entity information of the entities and the relationships between different entities.
2. The method according to claim 1, characterized in that determining an identifying link or a non-identifying link for each entity comprises:
splitting the object attributes of the entity so that each object attribute corresponds to one attribute value;
determining an identifying link or a non-identifying link for each entity and for each object attribute value of the entity.
3. The method according to claim 2, characterized in that determining an identifying link or a non-identifying link for each entity further comprises:
in response to no identifier corresponding to the entity existing in the data information, determining a non-identifying link.
4. The method according to claim 2, characterized in that determining an identifying link or a non-identifying link for each entity further comprises:
in response to an identifier corresponding to the entity existing in the data information, selecting an identifying link.
5. The method according to claim 1, characterized in that the method further comprises:
in response to the data type of the data attribute value being a string type, adding a language tag to the data attribute value;
in response to the data type of the data attribute value being a numeric type, unifying the unit of the data attribute value.
6. The method according to claim 1, characterized in that merging the entity attributes of the non-identifying link into the entity attributes of the identifying link comprises:
determining the mapping between the target entity identifier and the entity identifiers to be merged, and broadcasting the mapping through the Spark framework to all hosts used to store the data information;
filtering the locally stored entity identifiers on each host and performing the merge.
7. The method according to claim 1, characterized in that merging the entity information of the different entities into the same entity comprises:
determining the mapping between the target entity identifier and the entity identifiers to be merged, and broadcasting the mapping through the Spark framework to all hosts used to store the data information;
filtering the locally stored entity identifiers on each host and performing the merge.
8. A knowledge graph construction device, the knowledge graph representing the entity information of entities and the relationships between different entities, characterized in that the device comprises:
a pattern construction module, for constructing an information pattern according to crawled data information, the information pattern comprising concepts and attributes corresponding to the concepts;
an entity classification module, for performing entity classification on the data information according to the information pattern and determining the entity information of each entity and an entity identifier for identifying the entity, the entity information comprising entity attributes and attribute values, the entity attributes comprising data attributes and object attributes, the attribute value of an object attribute pointing to another entity;
a standardization module, for unifying the entity attribute names of different entities according to the attribute names in the information pattern;
a link determination module, for determining an identifying link or a non-identifying link for each entity;
a first entity merging module, for comparing entities with the same name, merging the entity attributes of the non-identifying link into the entity attributes of the identifying link, and merging the identical entity attributes of each entity;
a second entity merging module, for comparing entities with the same name and, in response to the matched entity attributes exceeding a threshold, merging the entity information of the different entities into the same entity and merging the identical entity attributes of each entity;
an information output module, for outputting the entity information of the entities and the relationships between different entities.
9. A computer-readable storage medium storing computer program instructions, characterized in that the computer program instructions, when executed by a processor, implement the method according to any one of claims 1-7.
10. An electronic device, comprising a memory and a processor, characterized in that the memory is used to store one or more computer program instructions, wherein the one or more computer program instructions are executed by the processor to implement the method according to any one of claims 1-7.
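The broadcast-and-filter merge recited in claims 6 and 7 can be illustrated with a plain-Python simulation. This is only a sketch under assumptions: each "host" is modeled as a dictionary of locally stored entities, the broadcast mapping as an ordinary dictionary from each entity identifier to be merged to its target identifier, and the final `combine` step is an added assumption for entities whose target lives on another host. In PySpark, a read-only mapping of this kind would typically be shared with `SparkContext.broadcast`; the claims do not specify the exact calls.

```python
def merge_on_host(local_entities, id_mapping):
    """Apply the broadcast mapping to the entities stored on one host.

    local_entities: dict of entity id -> attribute dict held locally.
    id_mapping: dict of entity id to be merged -> target entity id.
    """
    result = {}
    for entity_id, attrs in local_entities.items():
        # Ids without a mapping entry keep their own identifier.
        target = id_mapping.get(entity_id, entity_id)
        result.setdefault(target, {}).update(attrs)
    return result


def combine(partials):
    """Combine per-host results so each target id appears once (assumed step)."""
    combined = {}
    for partial in partials:
        for entity_id, attrs in partial.items():
            combined.setdefault(entity_id, {}).update(attrs)
    return combined


# Hypothetical data spread over two hosts.
hosts = [
    {"e1": {"name": "x"}, "e2": {"height": "1.73m"}},
    {"e3": {"wife": "y"}},
]
id_mapping = {"e2": "e1", "e3": "e1"}
print(combine(merge_on_host(h, id_mapping) for h in hosts))
```

Because every host receives the whole mapping, each one can relabel its local entities without first looking up where the target entity is stored.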
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201910408077.5A CN110188207B (en) | 2019-05-15 | 2019-05-15 | Knowledge graph construction method and device, readable storage medium and electronic equipment |
Applications Claiming Priority (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201910408077.5A CN110188207B (en) | 2019-05-15 | 2019-05-15 | Knowledge graph construction method and device, readable storage medium and electronic equipment |
Publications (2)
Publication Number | Publication Date |
---|---|
CN110188207A (en) | 2019-08-30 |
CN110188207B (en) | 2021-06-04 |
Family
ID=67716582
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN201910408077.5A Active CN110188207B (en) | 2019-05-15 | 2019-05-15 | Knowledge graph construction method and device, readable storage medium and electronic equipment |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN110188207B (en) |
Citations (4)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN106776711A (en) * | 2016-11-14 | 2017-05-31 | 浙江大学 | Chinese medical knowledge graph construction method based on deep learning |
US20190012405A1 (en) * | 2017-07-10 | 2019-01-10 | International Business Machines Corporation | Unsupervised generation of knowledge learning graphs |
CN109446343A (en) * | 2018-11-05 | 2019-03-08 | 上海德拓信息技术股份有限公司 | Method for constructing a public safety knowledge graph |
CN109597855A (en) * | 2018-11-29 | 2019-04-09 | 北京邮电大学 | Big-data-driven domain knowledge graph construction method and system |
Cited By (4)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN110390099A (en) * | 2019-06-28 | 2019-10-29 | 河海大学 | Object relation extraction system and method based on template library |
CN110390099B (en) * | 2019-06-28 | 2023-01-31 | 河海大学 | Object relation extraction system and method based on template library |
CN112507035A (en) * | 2020-11-25 | 2021-03-16 | 国网电力科学研究院武汉南瑞有限责任公司 | Power transmission line multi-source heterogeneous data unified standardized processing system and method |
CN112765283A (en) * | 2021-01-19 | 2021-05-07 | 上海明略人工智能(集团)有限公司 | Entity link relation management method and device, electronic equipment and storage medium |
Also Published As
Publication number | Publication date |
---|---|
CN110188207B (en) | 2021-06-04 |
Legal Events
Code | Title | Description |
---|---|---|
PB01 | Publication | |
SE01 | Entry into force of request for substantive examination | |
TA01 | Transfer of patent application right | Effective date of registration: 20210412. Address after: 210038 8th floor, building D11, Hongfeng science and Technology Park, Nanjing Economic and Technological Development Zone, Jiangsu Province. Applicant after: New Technology Co.,Ltd. Address before: 100190 1001, 10th floor, office building a, 19 Zhongguancun Street, Haidian District, Beijing. Applicant before: Mobvoi Information Technology Co.,Ltd. |
GR01 | Patent grant | |