CN110188207A - Knowledge graph construction method and device, readable storage medium, and electronic device - Google Patents


Info

Publication number
CN110188207A
Authority
CN
China
Prior art keywords
entity
attribute
link
information
data
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Granted
Application number
CN201910408077.5A
Other languages
Chinese (zh)
Other versions
CN110188207B (en)
Inventor
徐丰硕
林凤绿
王倪东
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Mobvoi Innovation Technology Co Ltd
Original Assignee
Chumen Wenwen Information Technology Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Chumen Wenwen Information Technology Co Ltd
Priority to CN201910408077.5A
Publication of CN110188207A
Application granted
Publication of CN110188207B
Legal status: Active
Anticipated expiration


Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/30 Information retrieval; Database structures therefor; File system structures therefor of unstructured textual data
    • G06F16/36 Creation of semantic tools, e.g. ontology or thesauri
    • G06F16/367 Ontology

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Animal Behavior & Ethology (AREA)
  • Computational Linguistics (AREA)
  • Data Mining & Analysis (AREA)
  • Databases & Information Systems (AREA)
  • Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)

Abstract

Disclosed are a knowledge graph construction method and device, a readable storage medium, and an electronic device. The embodiments of the present application use a partially labeled data set, namely the structured and semi-structured data of an encyclopedia, and perform linking by exploiting the data's own characteristics, improving the accuracy and efficiency of the linking process.

Description

Knowledge graph construction method and device, readable storage medium, and electronic device
Technical field
The present invention relates to the field of computer technology, and in particular to a knowledge graph construction method and device, a readable storage medium, and an electronic device.
Background art
The construction of a knowledge graph generally comprises information extraction, knowledge fusion, and knowledge processing. Information extraction includes entity extraction, relation extraction, and attribute extraction; knowledge fusion includes entity linking and knowledge merging; knowledge processing includes ontology construction, knowledge reasoning, quality evaluation, and knowledge updating. The main tools currently available for knowledge graph fusion include Falcon-AO, Dedupe, Limes, and Silk.
In traditional linking methods, the data used for information extraction mainly consists of structured, semi-structured, and unstructured data. Structured data has a simple structure and can be converted directly into triples, but its scale is small and it is generally used only in specific domains. Unstructured data must first be converted into triples through entity extraction, relation extraction, and attribute extraction using statistical or machine learning methods, but the accuracy of these methods is currently low and does not meet commercial requirements. Semi-structured data balances scale and accuracy: after preprocessing and normalization it can be converted into structured data and then into triples. Meanwhile, in the linking stage of knowledge fusion, conventional linking procedures are cumbersome and error-prone, resulting in low accuracy and efficiency.
Summary of the invention
In view of this, embodiments of the present invention provide a knowledge graph construction method and device, a readable storage medium, and an electronic device, intended to improve the accuracy and efficiency of the linking process.
In a first aspect, an embodiment of the invention discloses a knowledge graph construction method, the knowledge graph being used to represent the entity information of entities and the relationships between different entities. The method comprises:
creating an information pattern (i.e., a schema) from crawled data, the information pattern comprising concepts and the attributes corresponding to each concept;
classifying the data into entities according to the information pattern, and determining, for each entity, its entity information and an entity identifier that identifies it, the entity information comprising entity attributes and attribute values, the entity attributes comprising data attributes and object attributes, and the attribute value of an object attribute pointing to another entity;
unifying the entity attribute names of different entities according to the attribute names in the information pattern;
determining an identified link or a non-identified link for each entity;
comparing entities with the same link prefix, merging the entity attributes of the entity with a non-identified link into the entity attributes of the entity with an identified link, and merging identical entity attributes of each entity, the link prefix being the part of a link that does not include the identifier;
comparing entities with the same name and, in response to the number of matching entity attributes exceeding a threshold, merging the entity information of the different entities into one entity and merging identical entity attributes of each entity;
outputting the entity information of the entities and the relationships between different entities.
Further, determining an identified link or a non-identified link for each entity comprises:
splitting the object attributes of the entity so that each object attribute corresponds to a single attribute value;
determining an identified link or a non-identified link for each entity and for each object attribute value of the entity.
Further, determining an identified link or a non-identified link for each entity also comprises:
in response to the data containing no identified link corresponding to the entity, determining a non-identified link.
Further, determining an identified link or a non-identified link for each entity also comprises:
in response to the data containing an identified link corresponding to the entity, selecting one identified link.
Further, the method also comprises:
in response to the data type of a data attribute value being a string, adding a language tag to the data attribute value;
in response to the data type of a data attribute value being numeric, unifying the unit of the data attribute value.
Further, merging the entity attributes of the entity with a non-identified link into the entity attributes of the entity with an identified link comprises:
determining the mapping between target entity identifiers and entity identifiers to be merged, and broadcasting it through the Spark framework to all hosts storing the data;
filtering the locally stored entity identifiers on each host and performing the merge.
Further, merging the entity information of different entities into one entity comprises:
determining the mapping between target entity identifiers and entity identifiers to be merged, and broadcasting it through the Spark framework to all hosts storing the data;
filtering the locally stored entity identifiers on each host and performing the merge.
In a second aspect, an embodiment of the invention discloses a knowledge graph construction device, the knowledge graph being used to represent the entity information of entities and the relationships between different entities. The device comprises:
a pattern construction module, configured to create an information pattern from crawled data, the information pattern comprising concepts and the attributes corresponding to each concept;
an entity classification module, configured to classify the data into entities according to the information pattern and determine, for each entity, its entity information and an entity identifier that identifies it, the entity information comprising entity attributes and attribute values, the entity attributes comprising data attributes and object attributes, and the attribute value of an object attribute pointing to another entity;
a normalization module, configured to unify the entity attribute names of different entities according to the attribute names in the information pattern;
a link determination module, configured to determine an identified link or a non-identified link for each entity;
a first entity merging module, configured to compare entities with the same name, merge the entity attributes of the entity with a non-identified link into the entity attributes of the entity with an identified link, and merge identical entity attributes of each entity;
a second entity merging module, configured to compare entities with the same name and, in response to the number of matching entity attributes exceeding a threshold, merge the entity information of the different entities into one entity and merge identical entity attributes of each entity;
an information output module, configured to output the entity information of the entities and the relationships between different entities.
In a third aspect, an embodiment of the invention discloses a computer-readable storage medium storing computer program instructions which, when executed by a processor, implement the method of any of the above.
In a fourth aspect, an embodiment of the invention discloses an electronic device comprising a memory and a processor, the memory storing one or more computer program instructions that are executed by the processor to implement the method of any of the above.
The embodiments of the present invention use a partially labeled data set, namely the structured and semi-structured data of an encyclopedia, and perform linking by exploiting the data's own characteristics, improving the accuracy of the linking process.
Brief description of the drawings
The above and other objects, features, and advantages of the present invention will become more apparent from the following description of embodiments with reference to the accompanying drawings, in which:
Fig. 1 is a flowchart of the knowledge graph construction method according to an embodiment of the present invention;
Fig. 2 is a schematic diagram of the entity information of two entities in an optional implementation of the embodiment;
Fig. 3 is a flowchart of determining an identified link or a non-identified link for an entity in an optional implementation of the embodiment;
Fig. 4 is a schematic diagram of determining an identified link or a non-identified link for an entity in an optional implementation of the embodiment;
Fig. 5 is a schematic diagram of the knowledge graph construction device according to an embodiment of the present invention;
Fig. 6 is a schematic diagram of the electronic device according to an embodiment of the present invention.
Detailed description of the embodiments
The present invention is described below on the basis of embodiments, but it is not limited to these embodiments. Certain specific details are described at length in the following detailed description; a person skilled in the art can fully understand the present invention even without these details. To avoid obscuring the essence of the invention, well-known methods, procedures, and flows are not described in detail.
In addition, a person of ordinary skill in the art should understand that the drawings provided herein are for illustrative purposes and are not necessarily drawn to scale.
Unless the context clearly requires otherwise, the words "include", "comprise", and similar words throughout the specification and claims are to be construed in an inclusive sense rather than an exclusive or exhaustive sense; that is, in the sense of "including but not limited to".
In the description of the present invention, it should be understood that the terms "first", "second", and so on are used for descriptive purposes only and must not be construed as indicating or implying relative importance. In addition, unless otherwise stated, "a plurality of" means two or more.
Fig. 1 is a flowchart of a knowledge graph construction method according to an embodiment of the present invention. As shown in Fig. 1, the method comprises:
Step S100: create an information pattern from the crawled data.
Specifically, the information pattern comprises concepts and the attributes corresponding to each concept. The data source may be, for example, encyclopedia web pages, whose content may include the URL, title, synonyms, summary, body text, Infobox, entry tags, web links, and so on. The concepts may further include hypernym-hyponym relations between concepts: for example, when the hypernym is person, the hyponyms may include singer, actor, scholar, professor, entrepreneur, and so on. An attribute may further carry a value range and a unit. The value range is used to judge whether an attribute value is reasonable: for example, if the attribute is a person's height and the value is 30 m, the value can be judged unreasonable. The unit is set per attribute: for example, the unit of height is cm or m, and the unit of weight is kg or g.
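A minimal sketch of such an information pattern in Python follows. The class names `AttributeSpec` and `Concept`, the function `is_reasonable`, and the plausible height range of 0.3 m to 2.8 m are hypothetical illustrations, not part of the patent; values are assumed to be already converted to the attribute's unit before the range check.

```python
from dataclasses import dataclass, field
from typing import Dict, Optional, Tuple


@dataclass
class AttributeSpec:
    name: str
    unit: Optional[str] = None                         # e.g. "m" for height
    value_range: Optional[Tuple[float, float]] = None  # plausible (min, max), in `unit`


@dataclass
class Concept:
    name: str
    parent: Optional[str] = None                       # hypernym, e.g. "person" for "singer"
    attributes: Dict[str, AttributeSpec] = field(default_factory=dict)


def is_reasonable(spec: AttributeSpec, value: float) -> bool:
    """Return True if a value (already expressed in spec.unit) lies inside the declared range."""
    if spec.value_range is None:
        return True
    low, high = spec.value_range
    return low <= value <= high


# Hypothetical pattern: "singer" is a hyponym of "person"; the height range is an assumption.
person = Concept("person", attributes={"height": AttributeSpec("height", "m", (0.3, 2.8))})
singer = Concept("singer", parent="person")

print(is_reasonable(person.attributes["height"], 1.73))  # True
print(is_reasonable(person.attributes["height"], 30.0))  # False: a 30 m person is implausible
```

Keeping the unit next to the range means the check only makes sense after unit normalization, which matches the order in which the text introduces the two properties.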
When creating the information pattern, the data is first analyzed: the frequency of occurrence of each attribute is counted and the attributes are sorted by frequency. The attributes in the resulting attribute list are then filtered by a preset ratio. For example, if the ratio is 70% and the list is sorted in descending order of frequency, the top 70% of attributes are kept. This selects the important attributes and avoids the inefficiency caused by having too many attributes in the information pattern.
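The frequency-based filtering can be sketched as follows. The function name `select_schema_attributes`, the record format (one dictionary of attribute names to values per crawled page), and the choice to round the kept count down are assumptions for illustration.

```python
from collections import Counter
from typing import Dict, Iterable, List


def select_schema_attributes(records: Iterable[Dict[str, str]], ratio: float = 0.7) -> List[str]:
    """Keep the most frequent `ratio` share of attribute names across all records."""
    counts = Counter(name for record in records for name in record)
    # most_common() sorts by descending frequency; ties keep first-seen order.
    ranked = [name for name, _ in counts.most_common()]
    keep = int(len(ranked) * ratio)  # round down
    return ranked[:keep]


pages = [
    {"name": "Famous Person A", "height": "173cm", "birthplace": "Taiwan"},
    {"name": "Famous Person B", "height": "165cm"},
    {"name": "Famous Person C", "blood type": "O"},
]
print(select_schema_attributes(pages))  # ['name', 'height']
```

Rounding down keeps the pattern compact; rounding up would instead retain more of the rarer attributes.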
Step S200: classify the data into entities, and determine the entity information of each entity and the identifier that identifies it.
Specifically, the data is classified into entities according to the concepts in the information pattern and their corresponding attributes, and the entity information and identifier of each entity are determined. For example, according to the concepts "product" and "mountain range" in the information pattern, "Himalaya" is divided into a "product" entity and a "mountain range" entity. The entity information comprises entity attributes and attribute values; the entity attributes comprise data attributes and object attributes, and the attribute value of an object attribute points to another entity.
During entity classification, a classification model is first obtained through preprocessing. Specifically, the data is parsed and feature information is extracted from it, for example entity attribute names, entry tags, and summaries. A classification model is trained on this feature information, for example a neural network such as TextCNN or TextRNN. Data containing the feature information is then fed into the classification model to obtain entities and the entity information corresponding to each entity. The entity attributes of each entity are the attributes set in the information pattern.
Step S300: unify the entity attribute names of different entities according to the attribute names in the information pattern.
Specifically, differing entity attribute names in the data may prevent identical entities from being merged. For example, for the same entity, the attribute pairs "birthplace" and "born in", "height" and "high", and "occupation" and "work" differ literally but mean the same thing. If the attribute names are compared directly, equivalent attributes cannot be matched, so two entities that should be merged are not. The entity attribute names of different entities therefore need to be unified according to the attribute names set in the information pattern, so that identical entities can be merged.
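Name unification can be sketched as a lookup from variant names to the names set in the information pattern, using the attribute pairs just listed and the values of the Fig. 2 example. The alias table `CANONICAL_NAMES` and the function `normalize_attribute_names` are hypothetical; in practice the table would be derived from the information pattern.

```python
from typing import Dict

# Hypothetical alias table: variant name -> name set in the information pattern.
CANONICAL_NAMES: Dict[str, str] = {
    "born in": "birthplace",
    "high": "height",
    "occupation": "work",
}


def normalize_attribute_names(entity: Dict[str, str]) -> Dict[str, str]:
    """Rename attributes to their canonical names; unknown names are kept as-is."""
    return {CANONICAL_NAMES.get(name, name): value for name, value in entity.items()}


entity_123 = {"birthplace": "Taiwan", "height": "173cm", "blood type": "O", "occupation": "singer"}
entity_124 = {"born in": "Taiwan", "high": "173cm", "blood type": "O", "work": "singer"}
print(normalize_attribute_names(entity_123) == normalize_attribute_names(entity_124))  # True
```

If one entity carries two variants of the same attribute, the dictionary comprehension keeps only the last one; a real implementation would need a rule for that case.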
Step S400: determine an identified link or a non-identified link for each entity.
Specifically, the attribute value of an object attribute in the entity information points to another entity, so determining an identified link or a non-identified link for each entity includes determining one for each object attribute value of the entity. An identified link is a link that contains the identifier of the entity and is selected from the crawled data. A non-identified link is a link that does not contain the identifier of the entity.
Further, while an identified or non-identified link is determined for each entity, a language tag is added to each data attribute value whose data type is a string, and the unit of each data attribute value whose data type is numeric is unified. For example, if the attribute value is "Famous Person A", the language tag @zh is added; if the attribute value is "173cm" and the normalized unit is m, the output is "1.73m".
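A minimal sketch of this value normalization follows. The function `normalize_data_value`, the regular expression used to recognize numeric values, the length-only conversion table, and the RDF-style `"..."@zh` literal format are assumptions; the text specifies only that strings receive a language tag and numeric values a unified unit.

```python
import re

# Hypothetical conversion factors for length units, expressed in metres.
LENGTH_IN_METRES = {"m": 1.0, "cm": 0.01, "mm": 0.001}
NUMERIC = re.compile(r"^\s*(\d+(?:\.\d+)?)\s*([a-z]*)\s*$")


def normalize_data_value(value: str, target_unit: str = "m", lang: str = "zh") -> str:
    match = NUMERIC.match(value)
    if match is None:
        # String value: attach a language tag, RDF-literal style.
        return f'"{value}"@{lang}'
    number, unit = float(match.group(1)), match.group(2)
    if unit in LENGTH_IN_METRES:
        # Numeric value with a known length unit: convert to the target unit.
        converted = number * LENGTH_IN_METRES[unit] / LENGTH_IN_METRES[target_unit]
        return f"{converted:g}{target_unit}"
    return value  # numeric, but no recognized unit: leave unchanged


print(normalize_data_value("173cm"))            # 1.73m
print(normalize_data_value("Famous Person A"))  # "Famous Person A"@zh
```

The `:g` format drops floating-point noise such as a trailing `...0002`, so "173cm" prints as "1.73m".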
Step S500: compare entities with the same link prefix, merge the entity attributes of the entity with a non-identified link into the entity attributes of the entity with an identified link, and merge identical entity attributes of each entity.
Specifically, the links of entities include identified links and non-identified links; that is, entities with the same name include entities with identified links and entities with non-identified links. For example, the entities named Zhang San may include an entity with the link "/item/Zhang San/1234", an entity with the link "/item/Zhang San/6789", and an entity with the link "/item/Zhang San/". The link prefix is the part of a link that does not include the identifier: for example, "/item/Zhang San/" in the identified link "/item/Zhang San/1234", and the whole of the non-identified link "/item/Zhang San/". During merging, the entity attributes of the entity with a non-identified link are first split, and the split attributes are merged into the respective entities with different identified links. Finally, identical entity attributes within each entity are merged: for example, if the entity with the link "/item/Zhang San/1234" has the attribute "age: 23" and the attributes merged in from the non-identified entity also include "age: 23", the two "age" attributes are merged into one.
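The prefix comparison and attribute merge can be sketched as below. Two points are assumptions not fixed by the text: that an identified link ends in a numeric identifier segment (as in "/item/Zhang San/1234"), and that the split attributes of a non-identified entity are copied into every identified entity sharing its prefix. Attributes are modelled as `(name, value)` pairs in a set, so identical attributes such as "age: 23" collapse on union. The names `link_prefix` and `merge_by_prefix` are hypothetical.

```python
import re
from collections import defaultdict
from typing import Dict, List, Set, Tuple

Attributes = Set[Tuple[str, str]]  # {(attribute name, value), ...}

# Assumption: the identifier is a trailing run of digits in the link.
TRAILING_ID = re.compile(r"\d+/?$")


def link_prefix(link: str) -> str:
    """Drop the identifier; a non-identified link is its own prefix."""
    return TRAILING_ID.sub("", link)


def merge_by_prefix(entities: Dict[str, Attributes]) -> Dict[str, Attributes]:
    groups: Dict[str, List[str]] = defaultdict(list)
    for link in entities:
        groups[link_prefix(link)].append(link)

    merged: Dict[str, Attributes] = {}
    for prefix, links in groups.items():
        identified = [link for link in links if link != prefix]
        if not identified:
            # Only a non-identified entity exists: keep it unchanged.
            merged[prefix] = set(entities[prefix])
            continue
        orphan = entities.get(prefix, set())
        for link in identified:
            # Set union merges identical (name, value) attributes into one.
            merged[link] = entities[link] | orphan
    return merged


entities = {
    "/item/Zhang San/1234": {("age", "23")},
    "/item/Zhang San/": {("age", "23"), ("height", "1.73m")},
    "/item/Li Si/": {("age", "30")},
}
print(sorted(merge_by_prefix(entities)["/item/Zhang San/1234"]))
# [('age', '23'), ('height', '1.73m')]
```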
Further, when the merging of non-identified-link entity attributes into identified-link entity attributes runs across different hosts, the mapping between target entity identifiers and entity identifiers to be merged is determined and broadcast through the Spark framework to all hosts storing the data. Here the target entity identifier is the identifier of the entity with the non-identified link, and the entity identifier to be merged is the identifier of the entity with the identified link. Each host then filters its locally stored entity identifiers and performs the merge. That is, the mapping of the identifiers of entities to be merged is first collected on the driver and then broadcast; on receiving it, each host filters its local entity identifiers and merges only those that appear in the mapping. Because the mapping stores only entity identifiers and no attributes or other information, its size is greatly reduced and broadcasting is much more efficient.
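The broadcast-and-filter pattern can be simulated in plain Python as below, with each inner list standing in for the triples stored on one host. The identifiers are hypothetical. With PySpark, one would plausibly broadcast the map with `SparkContext.broadcast`, apply the per-host function with `RDD.mapPartitions` (reading the map through the broadcast variable's `.value`), and finish with `RDD.distinct`; that wiring is an assumption, not something the text specifies.

```python
from typing import Dict, Iterable, Iterator, List, Tuple

Triple = Tuple[str, str, str]  # (entity identifier, attribute name, value)


def rewrite_partition(partition: Iterable[Triple], merge_map: Dict[str, str]) -> Iterator[Triple]:
    """Runs on each host: rewrite only local identifiers that occur in the broadcast map."""
    for entity_id, attribute, value in partition:
        yield merge_map.get(entity_id, entity_id), attribute, value


# Driver side: the map holds identifiers only, so it is cheap to broadcast.
merge_map = {"E0": "E1234"}  # non-identified entity E0 -> identified entity E1234

# Two hypothetical hosts, each storing part of the triples.
hosts: List[List[Triple]] = [
    [("E1234", "age", "23"), ("E9", "age", "40")],
    [("E0", "age", "23"), ("E0", "height", "1.73m")],
]

rewritten = [triple for host in hosts for triple in rewrite_partition(host, merge_map)]
merged = sorted(set(rewritten))  # identical attributes collapse, like RDD.distinct()
print(merged)
```

The broadcast carries only identifiers, which is the size saving the text describes; collapsing identical attributes that sit on different hosts still needs a deduplication step such as `distinct`.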
Step S600: compare entities with the same name and, in response to the number of matching entity attributes exceeding a threshold, merge the entity information of the different entities into one entity, and merge identical entity attributes of each entity.
Specifically, all entities are grouped by name, where name is understood broadly and may include various name-related attributes such as nicknames and titles. Within a group, the entities are compared in order: starting from the second entity, each entity is compared with the previous one, scoring 1 for each matching entity attribute and 0 otherwise. If the number of matching entity attributes exceeds a threshold, the two entities are considered the same entity and their entity information is merged into one entity. For example, the threshold may be set so that two entities match when the matching attributes exceed half of the intersection of their entity attributes. As a concrete case, of two entities named Singer A, one has sung song A at a solo concert and the other has recorded 100 songs, one of which is song A; the two are then judged to match on the entity attribute "works".
Further, when judging whether entity attributes match, a data attribute matches if the attribute values are equal. An object attribute is judged by the name of the entity that the attribute value points to. For example, if two Jay Chou entities both have a spouse named Kun Ling, the two Jay Chou entities are judged to be the same person, regardless of whether the two Kun Ling entities are the same person.
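The matching rule of step S600 can be sketched as follows. Each entity is a mapping from attribute name to a set of values, with object attribute values given as the names of the entities they point to, so the Jay Chou example reduces to comparing the string "Kun Ling". Treating any shared value as a match (as in the "works" example), requiring more than half of the shared attribute names to match, and comparing each entity with the already-merged previous one are assumptions; the function names are hypothetical.

```python
from typing import Dict, List, Set, Tuple

Entity = Dict[str, Set[str]]  # attribute name -> values (object values as target entity names)


def match_counts(a: Entity, b: Entity) -> Tuple[int, int]:
    """Return (matching attributes, shared attribute names)."""
    shared = a.keys() & b.keys()
    matched = sum(1 for name in shared if a[name] & b[name])  # any common value scores 1
    return matched, len(shared)


def is_same_entity(a: Entity, b: Entity) -> bool:
    matched, shared = match_counts(a, b)
    return shared > 0 and matched > shared / 2


def merge_name_group(group: List[Entity]) -> List[Entity]:
    """Entities sharing a name, each compared in order with the previous (merged) entity."""
    merged: List[Entity] = []
    for entity in group:
        if merged and is_same_entity(merged[-1], entity):
            for name, values in entity.items():
                merged[-1].setdefault(name, set()).update(values)
        else:
            merged.append({name: set(values) for name, values in entity.items()})
    return merged


jay_1 = {"spouse": {"Kun Ling"}, "works": {"Song A"}}
jay_2 = {"spouse": {"Kun Ling"}, "works": {"Song A", "Song B"}, "birthplace": {"Taiwan"}}
print(len(merge_name_group([jay_1, jay_2])))  # 1
```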
Further, when the merging of the entity information of different entities into one entity runs across different hosts, the mapping between target entity identifiers and entity identifiers to be merged is determined and broadcast through the Spark framework to all hosts storing the data, and each host filters its locally stored entity identifiers and performs the merge. That is, the mapping between the identifiers of the entities to be merged and the identifier of the target entity is first collected on the driver and then broadcast; on receiving it, each host filters its locally stored entities and merges only those whose identifiers appear in the mapping. Because the mapping stores only entity identifiers and no attributes or other information, its size is greatly reduced and broadcasting is much more efficient.
Step S700: the relationship between the entity information and different entities of entity is exported.
Specifically, output is through merging treated entity and entity information, i.e., the described entity identifier, entity attribute and category Property value constitute triple.The attribute value of the object properties of entity is directed toward another entity, and different entities is made to generate connection.Cause This also exports the entity and different entities while exporting the triple that entity, entity attribute and attribute value are constituted together Between relationship.
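Exporting triples and the relationships they encode can be sketched as below. The function `export_graph`, the identifiers, and the choice to list relationships as the subset of triples whose attribute is an object attribute are illustrative assumptions.

```python
from typing import Dict, List, Set, Tuple

Triple = Tuple[str, str, str]


def export_graph(entities: Dict[str, Dict[str, List[str]]],
                 object_attributes: Set[str]) -> Tuple[List[Triple], List[Triple]]:
    triples: List[Triple] = []
    relations: List[Triple] = []
    for entity_id, attributes in entities.items():
        for name, values in attributes.items():
            for value in values:
                triples.append((entity_id, name, value))
                if name in object_attributes:
                    # The value is another entity's identifier, so this triple is an edge.
                    relations.append((entity_id, name, value))
    return triples, relations


graph = {
    "E1": {"name": ['"Famous Person A"@zh'], "height": ["1.73m"], "wife": ["E2"]},
    "E2": {"name": ['"Famous Person B"@zh']},
}
triples, relations = export_graph(graph, {"wife", "works"})
print(relations)  # [('E1', 'wife', 'E2')]
```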
The method performs linking directly using the data regularities of the encyclopedia data source, improving the accuracy of the linking process. In addition, knowledge fusion is based on the Spark framework and runs on a distributed architecture, so it can execute concurrently on multiple commodity hosts, reducing the cost of purchasing high-performance hosts and improving execution efficiency.
Fig. 2 is a schematic diagram of the entity information of two entities in an optional implementation of the embodiment. As shown in Fig. 2, both entities have the "name" "Famous Person A". The entity with identifier "123" also has the attributes {"birthplace": "Taiwan"; "height": "173cm"; "blood type": "O"; "occupation": "singer"}, and the entity with identifier "124" has the attributes {"born in": "Taiwan"; "high": "173cm"; "blood type": "O"; "work": "singer"}. The attribute "blood type": "O" is identical in both, while the pairs "birthplace" and "born in", "height" and "high", and "occupation" and "work" have the same values but different attribute names. These names differ only literally and mean the same thing, but if attribute names are compared directly, equivalent attributes cannot be matched, and two entities that should be merged are not. The entity attribute names of different entities therefore need to be unified according to the attribute names set in the information pattern. For example, if the names set are "birthplace", "height", "blood type", and "work", then after unification the entity with identifier "123" has the attributes {"birthplace": "Taiwan"; "height": "173cm"; "blood type": "O"; "work": "singer"}, and the entity with identifier "124" has exactly the same attributes. Comparing the attribute names and their values shows that the entities with identifiers "123" and "124" are the same entity and can be merged directly.
Fig. 3 is a flowchart of determining an identified link or a non-identified link for an entity in an optional implementation of the embodiment. As shown in Fig. 3, the method comprises:
Step S410: split the object attributes of the entity so that each object attribute corresponds to a single attribute value.
Specifically, the attribute values of the entity's object attributes are split. For example, if the works of Famous Person A are "Work A" and "Work B", they are split into {"works: Work A", "works: Work B"}.
Step S420: determine a non-identified link.
Specifically, step S420 determines a non-identified link in response to the data containing no identified link corresponding to the entity. Further, if the Infobox or another part of the data contains a link corresponding to the entity that does not include an identifier, that link is determined as the entity's non-identified link; if the data contains no link corresponding to the entity at all, a virtual link corresponding to the entity is created, and the virtual link contains no identifier.
Step S430: select an identified link.
Specifically, step S430 selects an identified link in response to the data containing an identified link corresponding to the entity. Further, if the Infobox or another part of the data contains links corresponding to the entity, a link containing an identifier is chosen from among them and determined as the entity's identified link.
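Steps S420 and S430 can be sketched as a single selection function. The pattern used to recognize an identified link (a trailing numeric segment), the rule of taking the first candidate, and the virtual-link format built on the "http://baike/item/" prefix from the Fig. 4 example are assumptions; `choose_link` is a hypothetical name, and the candidates are assumed to be the links already associated with the entity in its Infobox or elsewhere.

```python
import re
from typing import List

IDENTIFIED = re.compile(r"/\d+/?$")  # assumption: identified links end in a numeric ID


def choose_link(name: str, candidates: List[str], base: str = "http://baike/item/") -> str:
    identified = [link for link in candidates if IDENTIFIED.search(link)]
    if identified:
        return identified[0]     # S430: select one identified link
    if candidates:
        return candidates[0]     # S420: existing link without an identifier
    return f"{base}{name}/"      # S420: virtual link, also without an identifier


print(choose_link("Work A", ["http://baike/item/Work A/567"]))  # http://baike/item/Work A/567
print(choose_link("Work C", []))                                # http://baike/item/Work C/
```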
This method determines a corresponding link for each entity, which allows entities to be linked directly using the data regularities of the encyclopedia data source, with high accuracy.
It should be understood that those skilled in the art may also implement the above preprocessing steps with other existing algorithms.
Fig. 4 is a schematic diagram of determining an identified link or a non-identified link for an entity in an optional implementation of the embodiment. As shown in Fig. 4, the initial triples comprise entity attributes and entity attribute values. The entity attributes include {"name": "Famous Person A"; "height": "173cm"; "wife": "Famous Person B"; "works": "Work A, Work B, Work C"}, where "name" and "height" are data attributes and "wife" and "works" are object attributes, each of whose attribute values points to an entity.
The attribute values of the object attributes in the initial triples are split, giving split triples whose entity attributes are {"name": "Famous Person A"; "height": "173cm"; "wife": "Famous Person B"; "works": "Work A"; "works": "Work B"; "works": "Work C"}, so that each object attribute corresponds to only one object attribute value.
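The split of step S410 can be sketched as below, using the Fig. 4 attributes. The comma separator and the function name `split_object_values` are assumptions; the source data may use a different delimiter between works.

```python
from typing import Dict, List, Set, Tuple


def split_object_values(entity: Dict[str, str], object_attributes: Set[str],
                        sep: str = ",") -> List[Tuple[str, str]]:
    pairs: List[Tuple[str, str]] = []
    for name, value in entity.items():
        if name in object_attributes:
            # One (name, value) pair per referenced entity.
            pairs.extend((name, part.strip()) for part in value.split(sep) if part.strip())
        else:
            pairs.append((name, value))  # data attributes pass through unchanged
    return pairs


famous_a = {"name": "Famous Person A", "height": "173cm",
            "wife": "Famous Person B", "works": "Work A, Work B, Work C"}
print(split_object_values(famous_a, {"wife", "works"})[-3:])
# [('works', 'Work A'), ('works', 'Work B'), ('works', 'Work C')]
```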
The split triples are then processed to determine an identified link or a non-identified link for each entity, that is, for the attribute value of each object attribute. At the same time, a language tag is added to data attribute values of string type, and the units of data attribute values of numeric type are unified. As shown in the processed triples in Fig. 4, the data attribute "name" has the value "Famous Person A", which is a string, so the language tag @zh is added. The data attribute "height" has the value "173cm", which is numeric, so its unit is unified according to the configured attribute unit: if the configured unit is m, the value "173cm" is converted to "1.73m". "wife" and "works" are object attributes, so a link is determined for the value of each: the identified link "http://baike/item/Famous Person B/1234" is selected for "Famous Person B", the identified link "http://baike/item/Work A/567" for "Work A", and the identified link "http://baike/item/Work B/557" for "Work B", while the non-identified link "http://baike/item/Work C/" is determined for "Work C".
Fig. 5 is a schematic diagram of the knowledge graph construction device of the embodiment of the present invention. As shown in Fig. 5, the knowledge graph construction device includes a schema construction module 51, an entity classification module 52, a standardization module 53, a link determination module 54, a first entity merging module 55, a second entity merging module 56 and an information output module 57.
Specifically, the schema construction module 51 creates an information schema from the crawled data information; the information schema includes concepts and the attributes corresponding to each concept. The entity classification module 52 performs entity classification on the data information according to the information schema and determines the entity information of each entity and an entity identifier for identifying the entity. The entity information includes entity attributes and attribute values, the entity attributes include data attributes and object properties, and the attribute value of an object property points to another entity. The standardization module 53 unifies the entity attribute names of different entities according to the attribute names in the information schema. The link determination module 54 determines an identifier link or a non-identifier link for each entity. The first entity merging module 55 compares entities with the same name, merges the entity attributes of the non-identifier link into the entity attributes of the identifier link, and merges the identical entity attributes of each entity. The second entity merging module 56 compares entities with the same name and, in response to the matched entity attributes exceeding a threshold, merges the entity information of the different entities into the same entity and merges the identical entity attributes of each entity. The information output module 57 outputs the entity information of the entities and the relationships between different entities.
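The two merging stages can be sketched as follows. This is an assumption-laden illustration, not the patented implementation. Entities are plain dicts with hypothetical keys `name`, `link`, `is_id` and `attrs` (attribute name to set of values). The first stage groups entities by link prefix, the grouping key that claim 1 defines as the part of the link without the identifier; with links shaped like those in the Fig. 4 example, the prefix contains the entity name. For the second stage, "matched entity attributes" is interpreted, as an assumption, as the number of shared (attribute, value) pairs.

```python
def link_prefix(link: str, is_identifier_link: bool) -> str:
    """Link without its identifier, e.g. '.../item/Work A/567' -> '.../item/Work A/'."""
    if not is_identifier_link:
        return link  # a non-identifier link already has no identifier
    trimmed = link.rstrip("/")
    return trimmed[: trimmed.rfind("/") + 1]


def merge_attrs(target: dict, source: dict) -> None:
    """Merge source attributes into target in place; identical values collapse in the sets."""
    for attr, values in source.items():
        target.setdefault(attr, set()).update(values)


def merge_by_link_prefix(entities):
    """Stage 1: fold each non-identifier-link entity into the identifier-link
    entity with the same link prefix, if one exists (mutates that entity)."""
    targets = {}
    for e in entities:
        if e["is_id"]:
            targets.setdefault(link_prefix(e["link"], True), e)
    kept = []
    for e in entities:
        target = None if e["is_id"] else targets.get(link_prefix(e["link"], False))
        if target is None:
            kept.append(e)
        else:
            merge_attrs(target["attrs"], e["attrs"])
    return kept


def count_matches(a: dict, b: dict) -> int:
    """Number of shared (attribute, value) pairs (assumed matching criterion)."""
    return sum(len(a[k] & b[k]) for k in a.keys() & b.keys())


def merge_by_name(entities, threshold: int):
    """Stage 2: merge same-name entities whose match count exceeds `threshold`."""
    merged = []
    for e in entities:
        for m in merged:
            if m["name"] == e["name"] and count_matches(m["attrs"], e["attrs"]) > threshold:
                merge_attrs(m["attrs"], e["attrs"])
                break
        else:
            merged.append(e)
    return merged


# Illustrative, made-up attribute values.
entities = [
    {"name": "Work A", "link": "http://baike/item/Work A/567", "is_id": True,
     "attrs": {"year": {"2010"}}},
    {"name": "Work A", "link": "http://baike/item/Work A/", "is_id": False,
     "attrs": {"year": {"2010"}, "genre": {"drama"}}},
]
stage1 = merge_by_link_prefix(entities)
print(len(stage1), stage1[0]["attrs"])  # 1 {'year': {'2010'}, 'genre': {'drama'}}
```

Storing attribute values as sets is one simple way to realize "merging the identical entity attributes": duplicate values from different sources collapse automatically.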
The device performs linking directly using the data rules of the encyclopedia data source, which improves the accuracy and efficiency of the linking process.
Fig. 6 is a schematic diagram of the electronic device of the embodiment of the present invention. As shown in Fig. 6, in this embodiment the electronic device may be a server, a terminal, or the like. As shown, the electronic device includes: at least one processor 62; a memory 61 communicatively connected to the at least one processor; and a communication component 63 communicatively connected to the storage medium, the communication component 63 sending and receiving data under the control of the processor 62. The memory 61 stores instructions executable by the at least one processor 62, and the instructions are executed by the at least one processor 62 to implement the knowledge graph construction method of the above embodiments.
Specifically, the memory 61, as a non-volatile computer-readable storage medium, can be used to store non-volatile software programs, non-volatile computer-executable programs and modules. By running the non-volatile software programs, instructions and modules stored in the memory 61, the processor 62 executes the various functional applications and data processing of the device, that is, implements the above knowledge graph construction method.
The memory 61 may include a program storage area and a data storage area, where the program storage area may store an operating system and an application program required by at least one function, and the data storage area may store an option list and the like. In addition, the memory 61 may include high-speed random access memory and may also include non-volatile memory, for example at least one magnetic disk storage device, a flash memory device, or another non-volatile solid-state storage device. In some embodiments, the memory 61 optionally includes memory located remotely from the processor, and such remote memory may be connected to an external device over a network. Examples of such networks include, but are not limited to, the Internet, intranets, local area networks, mobile communication networks, and combinations thereof.
One or more modules are stored in the memory 61 and, when executed by the one or more processors 62, perform the knowledge graph construction method of any of the above method embodiments.
The above product can execute the method provided by the embodiments of the present application and has the functional modules and beneficial effects corresponding to that method. For technical details not described in this embodiment, refer to the method provided by the embodiments of the present application.
The present invention further relates to a computer-readable storage medium for storing a computer-readable program, the computer-readable program causing a computer to perform some or all of the above method embodiments.
That is, those skilled in the art will understand that all or some of the steps of the methods in the above embodiments may be implemented by a program instructing relevant hardware. The program is stored in a storage medium and includes instructions that cause a device (which may be a single-chip microcomputer, a chip, etc.) or a processor to perform all or some of the steps of the methods of the embodiments of the present application. The aforementioned storage media include various media capable of storing program code, such as a USB flash drive, a removable hard disk, read-only memory (ROM), random access memory (RAM), a magnetic disk or an optical disc.
The above description is only a preferred embodiment of the present invention and is not intended to limit the invention; for those skilled in the art, the invention may have various modifications and variations. Any modification, equivalent replacement, improvement, etc. made within the spirit and principles of the present invention shall fall within the protection scope of the present invention.

Claims (10)

1. A knowledge graph construction method, the knowledge graph representing the entity information of entities and the relationships between different entities, characterized in that the method comprises:
creating an information schema according to crawled data information, the information schema comprising concepts and attributes corresponding to the concepts;
performing entity classification on the data information according to the information schema, and determining the entity information of each entity and an entity identifier for identifying the entity, the entity information comprising entity attributes and attribute values, the entity attributes comprising data attributes and object properties, and the attribute value of an object property pointing to another entity;
unifying the entity attribute names of different entities according to the names of the attributes in the information schema;
determining an identifier link or a non-identifier link for each entity;
comparing entities having the same link prefix, merging the entity attributes of the non-identifier link into the entity attributes of the identifier link, and merging the identical entity attributes of each entity, the link prefix being the part of a link that does not include the identifier;
comparing entities having the same name and, in response to the matched entity attributes exceeding a threshold, merging the entity information of the different entities into the same entity and merging the identical entity attributes of each entity;
outputting the entity information of the entities and the relationships between different entities.
2. The method according to claim 1, characterized in that determining an identifier link or a non-identifier link for each entity comprises:
splitting the object properties of an entity so that each object property corresponds to one attribute value;
determining an identifier link or a non-identifier link for each entity and for each object attribute value of the entity.
3. The method according to claim 2, characterized in that determining an identifier link or a non-identifier link for each entity further comprises:
in response to no identifier link corresponding to the entity existing in the data information, determining a non-identifier link.
4. The method according to claim 2, characterized in that determining an identifier link or a non-identifier link for each entity further comprises:
in response to an identifier link corresponding to the entity existing in the data information, selecting an identifier link.
5. The method according to claim 1, characterized in that the method further comprises:
in response to the data type of a data attribute value being a string type, adding a language tag to the data attribute value;
in response to the data type of the data attribute value being a numeric type, unifying the unit of the data attribute value.
6. The method according to claim 1, characterized in that merging the entity attributes of the non-identifier link into the entity attributes of the identifier link comprises:
determining the mapping relationship between a target entity identifier and the entity identifiers to be merged, and broadcasting it through the Spark framework to all hosts storing the data information;
screening the locally stored entity identifiers on each host and performing the merge.
7. The method according to claim 1, characterized in that merging the entity information of different entities into the same entity comprises:
determining the mapping relationship between a target entity identifier and the entity identifiers to be merged, and broadcasting it through the Spark framework to all hosts storing the data information;
screening the locally stored entity identifiers on each host and performing the merge.
8. A knowledge graph construction device, the knowledge graph representing the entity information of entities and the relationships between different entities, characterized in that the device comprises:
a schema construction module, configured to construct an information schema according to crawled data information, the information schema comprising concepts and attributes corresponding to the concepts;
an entity classification module, configured to perform entity classification on the data information according to the information schema and determine the entity information of each entity and an entity identifier for identifying the entity, the entity information comprising entity attributes and attribute values, the entity attributes comprising data attributes and object properties, and the attribute value of an object property pointing to another entity;
a standardization module, configured to unify the entity attribute names of different entities according to the names of the attributes in the information schema;
a link determination module, configured to determine an identifier link or a non-identifier link for each entity;
a first entity merging module, configured to compare entities having the same name, merge the entity attributes of the non-identifier link into the entity attributes of the identifier link, and merge the identical entity attributes of each entity;
a second entity merging module, configured to compare entities having the same name and, in response to the matched entity attributes exceeding a threshold, merge the entity information of the different entities into the same entity and merge the identical entity attributes of each entity;
an information output module, configured to output the entity information of the entities and the relationships between different entities.
9. A computer-readable storage medium, characterized in that it stores computer program instructions which, when executed by a processor, implement the method according to any one of claims 1-7.
10. An electronic device, comprising a memory and a processor, characterized in that the memory is configured to store one or more computer program instructions, wherein the one or more computer program instructions are executed by the processor to implement the method according to any one of claims 1-7.
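Claims 6 and 7 describe distributing the merge with Spark: the mapping from the entity identifiers to be merged onto a target entity identifier is broadcast to every host storing the data, and each host screens and merges only the identifiers it stores locally. The sketch below simulates this in plain Python, with each host modelled as a dict; in PySpark the mapping would typically be shared with `SparkContext.broadcast(...)` and read through the broadcast variable's `.value`. Combining one target's records across hosts would still need a reduce step, which this sketch leaves out. The function name and data layout are hypothetical.

```python
def broadcast_merge(hosts, mapping):
    """Simulated broadcast merge (plain Python stand-in for Spark).

    hosts: one dict per host, mapping locally stored entity id -> attrs,
           where attrs maps attribute name -> set of values.
    mapping: entity id to be merged -> target entity id; every host sees the
             same copy, as it would after a Spark broadcast.
    Returns, per host, the locally merged records keyed by target id.
    """
    results = []
    for local in hosts:  # in Spark, each host would do this in parallel
        merged = {}
        for eid, attrs in local.items():
            target = mapping.get(eid, eid)  # screen local ids against the map
            bucket = merged.setdefault(target, {})
            for attr, values in attrs.items():
                bucket.setdefault(attr, set()).update(values)
        results.append(merged)
    return results


hosts = [
    {"e1": {"title": {"Celebrity A"}}, "e2": {"height": {"1.73m"}}},
    {"e3": {"wife": {"Celebrity B"}}},
]
print(broadcast_merge(hosts, {"e2": "e1", "e3": "e1"}))
# [{'e1': {'title': {'Celebrity A'}, 'height': {'1.73m'}}}, {'e1': {'wife': {'Celebrity B'}}}]
```

Broadcasting only the small identifier mapping lets each host rewrite its own records without shuffling the full entity data across the cluster.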
CN201910408077.5A 2019-05-15 2019-05-15 Knowledge graph construction method and device, readable storage medium and electronic equipment Active CN110188207B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201910408077.5A CN110188207B (en) 2019-05-15 2019-05-15 Knowledge graph construction method and device, readable storage medium and electronic equipment

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201910408077.5A CN110188207B (en) 2019-05-15 2019-05-15 Knowledge graph construction method and device, readable storage medium and electronic equipment

Publications (2)

Publication Number Publication Date
CN110188207A true CN110188207A (en) 2019-08-30
CN110188207B CN110188207B (en) 2021-06-04

Family

ID=67716582

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201910408077.5A Active CN110188207B (en) 2019-05-15 2019-05-15 Knowledge graph construction method and device, readable storage medium and electronic equipment

Country Status (1)

Country Link
CN (1) CN110188207B (en)

Citations (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN106776711A * 2016-11-14 2017-05-31 浙江大学 Chinese medical knowledge graph construction method based on deep learning
US20190012405A1 * 2017-07-10 2019-01-10 International Business Machines Corporation Unsupervised generation of knowledge learning graphs
CN109446343A * 2018-11-05 2019-03-08 上海德拓信息技术股份有限公司 Method for constructing a public safety knowledge graph
CN109597855A * 2018-11-29 2019-04-09 北京邮电大学 Big-data-driven domain knowledge graph construction method and system

Cited By (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN110390099A * 2019-06-28 2019-10-29 河海大学 Object relationship extraction system and extraction method based on template library
CN110390099B (en) * 2019-06-28 2023-01-31 河海大学 Object relation extraction system and method based on template library
CN112507035A (en) * 2020-11-25 2021-03-16 国网电力科学研究院武汉南瑞有限责任公司 Power transmission line multi-source heterogeneous data unified standardized processing system and method
CN112765283A (en) * 2021-01-19 2021-05-07 上海明略人工智能(集团)有限公司 Entity link relation management method and device, electronic equipment and storage medium

Also Published As

Publication number Publication date
CN110188207B (en) 2021-06-04

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
TA01 Transfer of patent application right

Effective date of registration: 20210412

Address after: 210038 8th floor, building D11, Hongfeng science and Technology Park, Nanjing Economic and Technological Development Zone, Jiangsu Province

Applicant after: New Technology Co.,Ltd.

Address before: 100190 1001, 10th floor, office building a, 19 Zhongguancun Street, Haidian District, Beijing

Applicant before: Mobvoi Information Technology Co.,Ltd.

GR01 Patent grant