CN109446341A - The construction method and device of knowledge mapping - Google Patents


Info

Publication number
CN109446341A
Authority
CN
China
Prior art keywords
data
entity
entity sets
knowledge mapping
industry
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN201811236863.3A
Other languages
Chinese (zh)
Inventor
孙喜民
罗鹏
张宾
周晶
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
State Grid Agel Ecommerce Ltd
State Grid Corp of China SGCC
Original Assignee
State Grid Agel Ecommerce Ltd
State Grid Corp of China SGCC
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by State Grid Agel Ecommerce Ltd, State Grid Corp of China SGCC filed Critical State Grid Agel Ecommerce Ltd
Priority to CN201811236863.3A priority Critical patent/CN109446341A/en
Publication of CN109446341A publication Critical patent/CN109446341A/en
Pending legal-status Critical Current

Landscapes

  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)

Abstract

This application discloses a method and device for constructing a knowledge graph, relating to the field of knowledge graphs, for solving the problem that multiple kinds of data cannot be integrated. The method comprises: obtaining industry data; obtaining an entity set in the industry data and the relationships among multiple entities in the entity set; and constructing the knowledge graph from the entity set and the relationships among its entities according to a preset data schema. The embodiments of the application are applied to integrating multiple kinds of industry data.

Description

The construction method and device of knowledge mapping
Technical field
The present invention relates to the field of knowledge graphs, and in particular to a method and device for constructing a knowledge graph.
Background art
An e-commerce platform is a platform on which buyers and sellers trade goods online, so the core of an e-commerce knowledge graph is the commodity. Many roles take part in the overall business activity, including brand owners, platform operators, consumers, state institutions, and logistics providers. E-commerce data sources are numerous and varied, covering multi-dimensional data such as the operating platform, customer service consultations, and commodity data. The degree of structuring differs widely between data sources and between roles, the associations among the entities involved are complex and diverse, and entity relationships are scattered across separate systems. Transactions also readily generate large volumes of unstructured text. Existing techniques struggle to integrate these data in a unified way.
Summary of the invention
Embodiments of the application provide a method and device for constructing a knowledge graph, to solve the problem that multiple kinds of data cannot be integrated.
To achieve the above objective, embodiments of the application adopt the following technical solutions:
In a first aspect, a method for constructing a knowledge graph is provided, the method comprising:
obtaining industry data; obtaining an entity set in the industry data and the relationships among multiple entities in the entity set; and
constructing the knowledge graph from the entity set and the relationships among the multiple entities in the entity set according to a preset data schema.
In a second aspect, a device for constructing a knowledge graph is provided, the device comprising:
an acquiring unit, configured to obtain industry data,
the acquiring unit being further configured to obtain an entity set in the industry data and the relationships among multiple entities in the entity set; and
a construction unit, configured to construct the knowledge graph from the entity set and the relationships among the multiple entities in the entity set according to a preset data schema.
In a third aspect, a computer-readable storage medium storing one or more programs is provided. The one or more programs comprise instructions which, when executed by a computer, cause the computer to perform the method of the first aspect.
In a fourth aspect, a computer program product comprising instructions is provided. When the instructions are run on a computer, the computer performs the knowledge graph construction method of the first aspect.
In a fifth aspect, a device for constructing a knowledge graph is provided, comprising a processor and a memory. The memory stores a program, and the processor calls the program stored in the memory to perform the knowledge graph construction method of the first aspect.
The method and device provided by embodiments of the application obtain from industry data an entity set and the relationships among multiple entities in it, then fuse the obtained entity set and entity relationships according to a preset data schema to form the knowledge graph. This resolves the isolation and dispersion of data of all kinds and dimensions and links the data together effectively.
Brief description of the drawings
Fig. 1 is a schematic diagram of the framework of the knowledge graph constructed in embodiments of the application;
Fig. 2 is a first flow diagram of the knowledge graph construction method provided by embodiments of the application;
Fig. 3 is a second flow diagram of the knowledge graph construction method provided by embodiments of the application;
Fig. 4 is a third flow diagram of the knowledge graph construction method provided by embodiments of the application;
Fig. 5 is a fourth flow diagram of the knowledge graph construction method provided by embodiments of the application;
Fig. 6 is a structural diagram of the knowledge graph construction device provided by embodiments of the application.
Detailed description
In the method and device for constructing a knowledge graph provided by embodiments of the application, the knowledge graph may be an e-commerce knowledge graph. Industry data is obtained, an entity set and the relationships among multiple entities in it are obtained from the industry data, and the knowledge graph is constructed from them according to a preset data schema.
Fig. 1 shows the overall framework of the knowledge graph constructed in embodiments of the application. Referring to Fig. 1, the framework comprises a raw data layer 110, an internet information acquisition and cleaning layer 120, a knowledge extraction layer 130, a knowledge fusion layer 140, and a knowledge storage layer 150.
The raw data layer 110 comprises internal data and external data. Internal data may include structured data such as relational data; external data may include unstructured data such as the content of video websites.
The internet information acquisition and cleaning layer 120 comprises acquisition crawlers, an acquisition task scheduling system, and parsers. The acquisition crawlers may include industry-specific crawlers and general crawlers; the parsers may include industry-specific parsers.
The knowledge extraction layer 130 comprises database-to-resource-description-framework mapping (database to resource description framework, D2R) and industry-specific extraction. D2R mapping comprises configuration file mapping, update configuration, and a task scheduling system. Configuration file mapping covers table to concept, record to entity, column name to attribute, record data to attribute value, and table association to relationship. The task scheduling system covers initial import, batch update, and incremental update. Industry-specific extraction may comprise wrapper (Wrapper) plug-ins and a task scheduling system that covers update detection and periodic update. When the data in the raw data layer 110 is internal data, knowledge is extracted through D2R mapping. When it is external data, it is first processed by the internet information acquisition and cleaning layer 120, and knowledge is then extracted through industry-specific extraction.
The knowledge fusion layer 140 comprises a data layer, a schema layer, conflict resolution, and knowledge graph update. The data layer comprises entity alignment, entity type alignment, and entity attribute alignment; the schema layer comprises hypernym-hyponym relationship generation and concept attribute generation; knowledge graph update comprises schema layer update and data layer update; conflict resolution comprises automatic conflict detection and automatic conflict resolution. The data schema of the data layer can be defined by human experts and specifies the entities and relationships in the knowledge graph and the attributes of each entity and relationship. When knowledge extracted from multiple data sources is fused into the knowledge graph, several types of data conflict must be resolved, for example one phrase corresponding to multiple entities, inconsistent entity attribute names, missing entity attributes, inconsistent entity attribute values, and one-to-many mappings of entity attribute values.
The knowledge storage layer 150 comprises graph data storage and distributed document indexing.
The data imported into the knowledge graph in embodiments of the application is specifically structured data. Import involves preprocessing of the structured data, entity alignment, attribute alignment, and attribute selection, after which the structured data that meets the conditions is imported into the knowledge graph. Subsequent incremental iteration and similar means keep the knowledge graph continuously updated and extended.
The construction of the knowledge graph is described in detail below through specific embodiments.
Embodiment 1
An embodiment of the application provides a method for constructing a knowledge graph. Referring to Fig. 2, the method may include S101-S103:
S101: Obtain industry data.
Taking the e-commerce industry as an example, an e-commerce knowledge graph is built mainly from data inside the industry, such as consumption data, vertical-domain data related to the e-commerce platform, and commodity data. These data are generated while the platform is operated or promoted and are closely tied to the business, so they usually have the following advantages:
Wide industry coverage and considerable industry depth: the data sources are all strongly related to the e-commerce platform and closely tied to the industry. They cover the basic data of the e-commerce industry, and data produced during e-commerce operation supplements them, which gives the data industry depth;
High reliability: the internal structured data of the industry supports the enterprise's own business, so it is highly reliable. Enterprise data is stored in relational databases, and a modest conversion of the relational data yields structured triple data;
Strong structure: most internal structured data is stored in relational databases, and open industry data is generally published after editing by web editors of good quality, so it is well structured.
Therefore, when constructing an e-commerce knowledge graph, the industry's internal structured data and open industry knowledge bases can be given priority.
Optionally, the data schema of the knowledge graph can be defined before the industry data is obtained. The data schema is the core of the knowledge graph. It can be defined by human experts using a top-down approach. Once the schema is defined, the entities, entity relationships, and entity attributes found in the industry data from the various data sources can be filled in at the data level. Having human experts define the schema improves the completeness and accuracy of the knowledge graph data.
Optionally, referring to Fig. 3, S101 may include S201-S204:
S201: Acquire target web pages according to a seed vocabulary, and classify the target web pages by website.
The seed vocabulary consists of industry-specific terms, and the target web pages include web documents and the external links of encyclopedia pages.
Specifically, seed words that represent the industry can be submitted to the search interfaces of a search engine and an online encyclopedia. For the web documents returned by the search engine, the top-ranked results are added directly to the target web page list. For the pages returned by the encyclopedia, the corresponding article page is opened first, the ordinary external links and the reference links are located in the article page, and both kinds of link are added to the target web page list.
S202: For the target web pages of each website, collect data within the site up to a preset depth to obtain the content of the website.
The target web pages are grouped by website, and in-site acquisition is performed on the resulting pages. The maximum acquisition depth can be set to 3 layers: starting from the home page, a depth-first strategy collects 3 layers in total. For a typical industry data website, a depth of 3 layers is enough to traverse the structure of the whole site.
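The depth-limited, depth-first acquisition described above can be sketched as follows. This is a minimal illustration, not the patent's implementation: the function name `crawl_site` and the in-memory link table `SITE` are hypothetical and stand in for real page fetching and link extraction.

```python
from typing import Dict, List, Set


def crawl_site(links: Dict[str, List[str]], home: str, max_depth: int = 3) -> List[str]:
    """Depth-first traversal of one site's pages, at most max_depth layers deep.

    The home page counts as layer 1, so max_depth=3 visits the home page,
    the pages it links to, and the pages those link to.
    """
    visited: Set[str] = set()
    order: List[str] = []

    def visit(page: str, depth: int) -> None:
        if depth > max_depth or page in visited:
            return
        visited.add(page)
        order.append(page)
        for nxt in links.get(page, []):
            visit(nxt, depth + 1)

    visit(home, 1)
    return order


# Hypothetical link table standing in for real HTTP fetching.
SITE = {
    "/": ["/a", "/b"],
    "/a": ["/a/1"],
    "/a/1": ["/a/1/x"],  # layer 4, beyond the limit
    "/b": ["/"],
}
pages = crawl_site(SITE, "/")
```

One simplification to be aware of: a page first reached at the deepest layer is marked as visited and is not expanded again if it is later found at a shallower layer.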
S203: If the frequency of occurrence of the corresponding seed words in the content of a website exceeds a threshold, take that website as a corresponding syndicated data source.
The content of each website is analyzed, and the content of every page collected from it is extracted and saved. If the frequency of industry keywords in a website's content exceeds the threshold, the website is related to the industry and can serve as a corresponding syndicated data source.
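The threshold test on keyword frequency can be sketched as below. The source does not say how frequency is measured or what the threshold is, so the definition used here (seed-word occurrences per 1000 characters) and the default threshold of 5.0 are assumptions chosen only for illustration, as are the function names.

```python
from typing import Iterable, List


def keyword_frequency(pages: Iterable[str], seed_words: List[str]) -> float:
    """Seed-word occurrences per 1000 characters of collected page text (assumed metric)."""
    text = " ".join(pages).lower()
    if not text:
        return 0.0
    hits = sum(text.count(word.lower()) for word in seed_words)
    return 1000.0 * hits / len(text)


def is_industry_source(pages: Iterable[str], seed_words: List[str],
                       threshold: float = 5.0) -> bool:
    """A site qualifies when its seed-word frequency exceeds the threshold."""
    return keyword_frequency(pages, seed_words) > threshold


SEEDS = ["smart meter", "transformer"]  # hypothetical seed vocabulary
relevant = is_industry_source(
    ["smart meter prices and smart meter reviews", "transformer catalog"], SEEDS)
irrelevant = is_industry_source(["cooking recipes for the weekend"], SEEDS)
```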
S204: Obtain industry data from the syndicated data sources.
Once the syndicated data sources are determined, industry data can be obtained from them with tools such as search engines.
S102: Obtain the entity set in the industry data and the relationships among multiple entities in the entity set.
The key technical points of knowledge graph construction are entity extraction and entity relationship extraction. Extracting the entities and entity relationships from the data and integrating them yields tidier data that is easier to manage and apply.
Optionally, referring to Fig. 4, S102 may include:
S301: If the industry data is structured data, convert the knowledge in the industry data into the entity set and the relationships among multiple entities in the entity set according to the relational database to resource description framework mapping language (relation database to resource description framework mapping language, D2RML) specification and a mapping configuration file.
Mapping knowledge from structured data requires determining the basic structure of the structured data, including the meaning of each table and the associations between tables, and determining the structure of the knowledge graph. D2RML is then used to associate the tables of the structured data with the concepts or entities of the knowledge graph. Once the mapping configuration file is defined, knowledge can be converted from the source database according to the configuration. The process may be as follows: the knowledge conversion engine connects to the target database specified in the configuration file and reads the data in the corresponding tables; the tables and column data of the relational database are mapped to concepts, entities, relationships between entities, and entity attributes; and the knowledge obtained by the mapping is stored in the knowledge graph.
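The table-to-concept, record-to-entity, and column-to-attribute mapping can be illustrated with a small sketch. The dictionary `MAPPING` below is a hypothetical stand-in for a mapping configuration file and is not D2RML syntax; the table names, column names, and relation names are likewise invented for the example.

```python
from typing import Dict, List, Tuple

Triple = Tuple[str, str, str]

# Hypothetical mapping profile: table -> concept, column -> attribute,
# foreign-key column -> relation to entities of another table.
MAPPING = {
    "product": {
        "concept": "Product",
        "key": "id",
        "attributes": {"name": "hasName", "price": "hasPrice"},
        "relations": {"brand_id": ("producedBy", "brand")},
    },
    "brand": {
        "concept": "Brand",
        "key": "id",
        "attributes": {"name": "hasName"},
        "relations": {},
    },
}


def rows_to_triples(table: str, rows: List[Dict[str, object]]) -> List[Triple]:
    """Turn each record of a table into entity, attribute, and relation triples."""
    cfg = MAPPING[table]
    triples: List[Triple] = []
    for row in rows:
        key_value = row[cfg["key"]]
        subject = f"{table}/{key_value}"              # record -> entity
        triples.append((subject, "type", cfg["concept"]))  # table -> concept
        for col, attr in cfg["attributes"].items():   # column -> attribute
            if row.get(col) is not None:
                triples.append((subject, attr, str(row[col])))
        for col, (rel, target) in cfg["relations"].items():  # association -> relation
            if row.get(col) is not None:
                triples.append((subject, rel, f"{target}/{row[col]}"))
    return triples


triples = rows_to_triples(
    "product", [{"id": 7, "name": "Smart meter", "price": 199, "brand_id": 3}])
```

In a real pipeline the rows would come from the database named in the configuration file rather than from an inline list.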
Optionally, referring to Fig. 4, S102 may include:
S401: If the industry data is unstructured data, extract the entity set in the industry data according to a conditional random field (conditional random field, CRF) model, and extract the relationships among multiple entities in the entity set according to the support vector machine and k-nearest neighbor classification method (support vector machine-k-nearest neighbor, SVM-KNN).
Specifically, a CRF model is a conditional probability distribution model of one set of output random variables given another set of input random variables. Its parameterized form can be stated as follows: for an observation sequence x = (x_1, x_2, ..., x_n) and a state sequence y = (y_1, y_2, ..., y_n), if P(y | x) is a linear-chain conditional random field, then given that the random variable X takes the value x, the conditional probability that the random variable Y takes the value y has the form:
$$P(y \mid x) = \frac{1}{Z(x)} \exp\Big( \sum_{i,k} \lambda_k f_k(y_{i-1}, y_i, x, i) + \sum_{i,l} \mu_l h_l(y_i, x, i) \Big), \qquad Z(x) = \sum_{y} \exp\Big( \sum_{i,k} \lambda_k f_k(y_{i-1}, y_i, x, i) + \sum_{i,l} \mu_l h_l(y_i, x, i) \Big)$$
where f_k and h_l are feature functions, λ_k and μ_l are the corresponding weights, and Z(x) is the normalization factor.
Named entity recognition is a sequence labeling process: a sentence is treated as an observation sequence, each character or word in it is treated as a symbol, and a state is assigned to each symbol. The parameters λ_k and μ_l are estimated by maximization on the training set, giving the conditional probability that satisfies the condition.
For an input sequence, the most probable output label sequence, that is, the optimal state sequence, is:
$$y^{*} = \arg\max_{y} P(y \mid x)$$
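To make the conditional probability and the choice of the optimal state sequence concrete, the sketch below evaluates a linear-chain CRF on a three-token input by enumerating every label sequence. The weights in `TRANS` and `STATE` are hand-set, hypothetical values rather than trained parameters, and each feature function is simplified to an indicator on a label pair or a label-token pair.

```python
import itertools
import math
from typing import Dict, Sequence, Tuple

LABELS = ["B", "I", "O"]


def score(y: Sequence[str], x: Sequence[str],
          trans_w: Dict[Tuple[str, str], float],
          state_w: Dict[Tuple[str, str], float]) -> float:
    """Unnormalized log-score: weighted state features plus weighted transition features."""
    total = 0.0
    for i, label in enumerate(y):
        total += state_w.get((label, x[i]), 0.0)          # state feature weight
        if i > 0:
            total += trans_w.get((y[i - 1], label), 0.0)  # transition feature weight
    return total


def crf_distribution(x, trans_w, state_w):
    """P(y | x) for every label sequence y, normalized by Z(x)."""
    seqs = list(itertools.product(LABELS, repeat=len(x)))
    exp_scores = [math.exp(score(y, x, trans_w, state_w)) for y in seqs]
    z = sum(exp_scores)  # normalization factor Z(x)
    return {y: s / z for y, s in zip(seqs, exp_scores)}


# Hypothetical hand-set weights, for illustration only.
TRANS = {("B", "I"): 2.0, ("O", "I"): -4.0}
STATE = {("B", "state"): 1.5, ("I", "grid"): 1.0, ("O", "sells"): 2.0}

x = ["state", "grid", "sells"]
dist = crf_distribution(x, TRANS, STATE)
best = max(dist, key=dist.get)  # optimal state sequence by exhaustive search
```

Enumeration grows exponentially with sequence length, so practical decoders use dynamic programming (the Viterbi algorithm) to find the same maximizing sequence.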
Extracting entities from unstructured data with the CRF model may proceed as follows. When the corpus is a training corpus, it is preprocessed and features are selected to obtain a data feature set, which is used to train the CRF and loaded into the CRF model to obtain the entity set. When the corpus is a test corpus, it is preprocessed and fed to the CRF model to obtain the entity set. Here the corpus is the underlying database for matching. The selected features may include, but are not limited to, language features, contextual features, and entity boundary features.
Note that language features reflect the basic information of characters and are a fundamental kind of feature. Because text data is arbitrary and free-form, word segmentation of the text can introduce segmentation errors that ultimately prevent entities from being recognized, whereas character granularity carries more information, such as the internal structure of entities, and can improve recognition. The embodiments of the application therefore use character granularity for the language features, as shown in Table 1 below.
Table 1: Character-granularity language features
Label    Feature          Description
1        Character(-2)    Character two positions before the current character
2        Character(-1)    Previous character
3        Character(0)     Current character
4        Character(1)     Next character
5        Character(2)     Character two positions after the current character
Contextual features capture the interdependence between observations within the entity's word window, and describe well both the dependencies inside an entity and the correlation between entities and non-entities;
Entity boundary features are an important basis for determining the position of an entity's boundaries, and determining the boundary of a named entity is critical to named entity recognition. The embodiments of the application use BIO encoding to describe the boundary features of the observation sequence and to encode the entity type, where B denotes the beginning of an entity, I denotes the remainder of an entity, and O denotes a non-entity element.
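A short sketch of BIO encoding at character granularity follows. The helper `bio_encode`, the sample text, and the entity types `BRAND` and `MODEL` are hypothetical and chosen only to show how boundary and type are combined into one tag.

```python
from typing import List, Tuple


def bio_encode(chars: List[str], spans: List[Tuple[int, int, str]]) -> List[str]:
    """Label each character with B-TYPE, I-TYPE, or O.

    spans holds (start, end, entity_type) with end exclusive.
    """
    tags = ["O"] * len(chars)
    for start, end, etype in spans:
        tags[start] = f"B-{etype}"      # first character of the entity
        for i in range(start + 1, end):
            tags[i] = f"I-{etype}"      # remaining characters of the entity
    return tags


chars = list("Acme X1 phone")
tags = bio_encode(chars, [(0, 4, "BRAND"), (5, 7, "MODEL")])
```

Here the four characters of "Acme" receive `B-BRAND`, `I-BRAND`, `I-BRAND`, `I-BRAND`, the two characters of "X1" receive `B-MODEL`, `I-MODEL`, and every other character is tagged `O`.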
The CRF modeling tool requires a user-defined template file (Template File) to train on the training corpus. The feature templates are shown in Table 2 below.
Table 2: Basic feature templates
Feature template                    Description
U00:%x[-2,0]                        Character two positions before the current character
U01:%x[-1,0]                        Previous character
U02:%x[0,0]                         Current character
U03:%x[1,0]                         Next character
U04:%x[2,0]                         Character two positions after the current character
U05:%x[-1,0]/%x[0,0]                Combination of the previous and current characters
U06:%x[0,0]/%x[1,0]                 Combination of the current and next characters
U07:%x[-1,0]/%x[0,0]/%x[1,0]        Combination of the previous, current, and next characters
U08:%x[-2,0]/%x[-1,0]/%x[0,0]       Combination of the current character and the two characters before it
U09:%x[0,0]/%x[1,0]/%x[2,0]         Combination of the current character and the two characters after it
Each row is one feature template, which determines one token (Token) in the training data. The basic format of a template is %x[row,col], where row is the row offset relative to the current token and col is the absolute column number.
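How a template of the form %x[row,col] turns into a concrete feature can be sketched as follows. The function `expand` and the padding marker `_PAD_` are hypothetical; CRF toolkits that use this template style (CRF++ is one) have their own conventions for positions outside the sequence.

```python
import re
from typing import List

MACRO = re.compile(r"%x\[(-?\d+),(\d+)\]")


def expand(template: str, rows: List[List[str]], pos: int) -> str:
    """Expand one template line at token position pos.

    Each %x[row,col] is replaced by column col of the token at pos + row.
    Positions outside the sequence yield a padding marker.
    """
    def repl(match):
        r = pos + int(match.group(1))   # row is relative to the current token
        c = int(match.group(2))         # col is an absolute column index
        if 0 <= r < len(rows):
            return rows[r][c]
        return "_PAD_"                  # hypothetical padding marker
    return MACRO.sub(repl, template)


# One row per character: column 0 is the character, column 1 its BIO tag.
ROWS = [["A", "B"], ["c", "I"], ["m", "I"], ["e", "I"]]
TEMPLATES = ["U00:%x[-2,0]", "U01:%x[-1,0]", "U02:%x[0,0]", "U05:%x[-1,0]/%x[0,0]"]
features = [expand(t, ROWS, 1) for t in TEMPLATES]
```

At position 1 (the character "c"), the four templates expand to `U00:_PAD_`, `U01:A`, `U02:c`, and `U05:A/c`.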
Specifically, the SVM-KNN classification method combines the SVM classification method with the KNN classification method. It performs well with low algorithmic complexity, and has been applied with good results to text-processing research such as text classification and proper noun extraction. The embodiments of the application can therefore use SVM-KNN classification to extract entity relationships from industry data.
Optionally, referring to Fig. 5, extracting the entity relationships of the industry data with the SVM-KNN classification method may include S501-S504:
S501: Preprocess the corpus and form feature vectors to obtain the samples to be classified.
The corpus may include a training corpus and a test corpus. Corpus preprocessing may include part-of-speech tagging, stemming, syntactic analysis, predicate extraction, and semantic role labeling. The features used in the embodiments of the application are: entity and context features, sentence verb root features, entity distance features, entity extension features, semantic role features, and features of the words between entities. Context features may include the entity and the words before and after it, together with word stems and parts of speech. Semantic role features may include predicate (predicate) features, semantic role pair (semantic role pair) features, and semantic role pair-predicate features.
S502: Process the samples to be classified with the SVM classification model.
The SVM classifier has a well-developed theoretical framework, is general and robust, is simple to compute, and also offers strong noise resistance and high classification accuracy.
S503: If a sample to be classified lies in the definite region, classify it directly to obtain the entity relationship.
For samples in the definite region, the SVM classifier outputs the result directly, giving the entity relationship.
S504: If a sample to be classified does not lie in the definite region, perform a second classification with the KNN classifier.
A sample that does not lie in the definite region lies in the fuzzy region; the KNN classifier performs a second classification on it to obtain the entity relationship.
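The two-stage decision in S502-S504 can be sketched as below. The source does not define the boundary between the definite and fuzzy regions, so this sketch assumes a sample is "definite" when the absolute value of a linear SVM decision score is at least a margin of 1.0. The weights, training samples, and the binary labels `related` and `unrelated` are hypothetical, and a real relation extractor would typically be multi-class.

```python
import math
from collections import Counter
from typing import List, Sequence, Tuple

Sample = Tuple[Sequence[float], str]


def svm_score(x: Sequence[float], weights: Sequence[float], bias: float) -> float:
    """Decision value of an already-trained linear SVM."""
    return sum(w * v for w, v in zip(weights, x)) + bias


def knn_label(x: Sequence[float], train: List[Sample], k: int = 3) -> str:
    """Majority label among the k training samples nearest to x."""
    nearest = sorted(train, key=lambda s: math.dist(x, s[0]))[:k]
    return Counter(label for _, label in nearest).most_common(1)[0][0]


def svm_knn_classify(x, weights, bias, train, margin=1.0, k=3,
                     pos="related", neg="unrelated") -> str:
    score = svm_score(x, weights, bias)
    if abs(score) >= margin:        # definite region: accept the SVM output
        return pos if score > 0 else neg
    return knn_label(x, train, k)   # fuzzy region: second pass with KNN


# Hypothetical trained weights and labeled samples.
W, B = [1.0, -1.0], 0.0
TRAIN = [([2.0, 0.0], "related"), ([1.5, 0.5], "related"), ([0.6, 0.4], "related"),
         ([0.0, 2.0], "unrelated"), ([0.5, 1.5], "unrelated")]

far_positive = svm_knn_classify([3.0, 0.5], W, B, TRAIN)
far_negative = svm_knn_classify([0.2, 2.0], W, B, TRAIN)
near_boundary = svm_knn_classify([0.9, 1.0], W, B, TRAIN)
```

The third sample has a slightly negative SVM score, so the sign alone would label it `unrelated`, but two of its three nearest neighbors are `related`, which is the label the second pass returns.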
S103: Construct the knowledge graph from the entity set and the relationships among multiple entities in the entity set according to the preset data schema.
After the entity set and the relationships among its entities are extracted from the industry data, the entity-relationship-entity triples can be stored as RDF data in the knowledge graph database.
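As one way to picture triples stored as RDF data, the sketch below serializes entity-relationship-entity triples as N-Triples lines. The namespace `http://example.org/kg/` and the function name are hypothetical, and the sketch assumes that entity and relation names are already safe to place inside an IRI, since it performs no escaping.

```python
from typing import Iterable, Tuple

BASE = "http://example.org/kg/"  # hypothetical namespace


def to_ntriples(triples: Iterable[Tuple[str, str, str]]) -> str:
    """Serialize (head, relation, tail) triples as N-Triples, one statement per line."""
    lines = []
    for head, relation, tail in triples:
        lines.append(f"<{BASE}{head}> <{BASE}{relation}> <{BASE}{tail}> .")
    return "\n".join(lines)


doc = to_ntriples([
    ("product/7", "producedBy", "brand/3"),
    ("brand/3", "locatedIn", "city/12"),
])
```

The resulting text can be loaded by an RDF store or graph database that accepts N-Triples input.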
Optionally, the embodiments of the application can also formulate the mapping rules through a visual specification configuration tool.
With the knowledge graph construction method provided by the embodiments of the application, building a domain knowledge graph resolves the isolation and dispersion of the internal data of all kinds and dimensions on the industry platform and links the data together effectively, so that relationships in the data between different departments and roles can be discovered and mined. This makes the fullest use of data resources, and the completed knowledge graph can also provide a solid foundation for subsequent recommendation, search, intelligent customer service, and similar applications.
Embodiment 2
An embodiment of the application provides a device for constructing a knowledge graph. Referring to Fig. 6, the device may include:
an acquiring unit 501, configured to obtain industry data.
The acquiring unit 501 may be further configured to obtain the entity set in the industry data and the relationships among multiple entities in the entity set.
A construction unit 502, configured to construct the knowledge graph from the entity set and the relationships among multiple entities in the entity set according to the preset data schema.
Optionally, the acquiring unit 501 may be specifically configured to: if the industry data is structured data, convert the knowledge in the industry data into the entity set and the relationships among multiple entities in the entity set according to the D2RML specification and a mapping configuration file.
Optionally, the acquiring unit 501 may be specifically configured to: if the industry data is unstructured data, extract the entity set in the industry data according to the CRF model, and extract the relationships among multiple entities in the entity set according to the SVM-KNN classification method.
Optionally, the acquiring unit 501 may be specifically configured to:
acquire target web pages according to a seed vocabulary and classify the target web pages by website, where the seed vocabulary consists of industry-specific terms and the target web pages include web documents and the external links of encyclopedia pages; collect data from the target web pages of each website up to a preset depth to obtain the content of the website; if the frequency of occurrence of the corresponding seed words in the content of a website exceeds a threshold, take that website as a corresponding syndicated data source; and obtain industry data from the syndicated data sources.
An embodiment of the invention provides a computer-readable storage medium storing one or more programs. The one or more programs comprise instructions which, when executed by a computer, cause the computer to perform the knowledge graph construction method described with reference to Figs. 2-5.
An embodiment of the invention provides a computer program product comprising instructions. When the instructions are run on a computer, the computer performs the knowledge graph construction method described with reference to Figs. 2-5.
An embodiment of the invention provides a device for constructing a knowledge graph, comprising a processor and a memory. The memory stores a program, and the processor calls the program stored in the memory to perform the knowledge graph construction method described with reference to Figs. 2-5.
Because the knowledge graph construction device, computer-readable storage medium, and computer program product in the embodiments of the invention can be applied to the above method, the technical effects they achieve are as described in the method embodiments above and are not repeated here.
It should be noted that above-mentioned each unit can be the processor individually set up, also can integrate controller certain It is realized in one processor, in addition it is also possible to be stored in the form of program code in the memory of controller, by controller Some processor calls and executes the function of the above each unit.Processor described here can be a central processing unit (Central Processing Unit, CPU) or specific integrated circuit (Application Specific Integrated Circuit, ASIC), or be arranged to implement one or more integrated circuits of the embodiment of the present application.
It should be understood that magnitude of the sequence numbers of the above procedures are not meant to execute suitable in the various embodiments of the application Sequence it is successive, the execution of each process sequence should be determined by its function and internal logic, the implementation without coping with the embodiment of the present application Process constitutes any restriction.
Those of ordinary skill in the art may be aware that list described in conjunction with the examples disclosed in the embodiments of the present disclosure Member and algorithm steps can be realized with the combination of electronic hardware or computer software and electronic hardware.These functions are actually It is implemented in hardware or software, the specific application and design constraint depending on technical solution.Professional technician Each specific application can be used different methods to achieve the described function, but this realization is it is not considered that exceed Scope of the present application.
It is apparent to those skilled in the art that for convenience and simplicity of description, the system of foregoing description, The specific work process of device and unit, can refer to corresponding processes in the foregoing method embodiment, and details are not described herein.
In several embodiments provided herein, it should be understood that disclosed system, apparatus and method, it can be with It realizes by another way.For example, apparatus embodiments described above are merely indicative, for example, the unit It divides, only a kind of logical function partition, there may be another division manner in actual implementation, such as multiple units or components It can be combined or can be integrated into another system, or some features can be ignored or not executed.Another point, it is shown or The mutual coupling, direct-coupling or communication connection discussed can be through some interfaces, the indirect coupling of equipment or unit It closes or communicates to connect, can be electrical property, mechanical or other forms.
The unit as illustrated by the separation member may or may not be physically separated, aobvious as unit The component shown may or may not be physical unit, it can and it is in one place, or may be distributed over multiple In network unit.It can select some or all of unit therein according to the actual needs to realize the mesh of this embodiment scheme 's.
It, can also be in addition, each functional unit in each embodiment of the application can integrate in one processing unit It is that each unit physically exists alone, can also be integrated in one unit with two or more units.
In the above-described embodiments, can come wholly or partly by software, hardware, firmware or any combination thereof real It is existing.When being realized using software program, can entirely or partly realize in the form of a computer program product.The computer Program product includes one or more computer instructions.On computers load and execute computer program instructions when, all or It partly generates according to process or function described in the embodiment of the present application.The computer can be general purpose computer, dedicated meter Calculation machine, computer network or other programmable devices.The computer instruction can store in computer readable storage medium In, or from a computer readable storage medium to the transmission of another computer readable storage medium, for example, the computer Instruction can pass through wired (such as coaxial cable, optical fiber, number from a web-site, computer, server or data center Word user line (Digital Subscriber Line, DSL)) or wireless (such as infrared, wireless, microwave etc.) mode to another A web-site, computer, server or data center are transmitted.The computer readable storage medium can be computer Any usable medium that can be accessed either includes the numbers such as one or more server, data centers that medium can be used to integrate According to storage equipment.The usable medium can be magnetic medium (for example, floppy disk, hard disk, tape), optical medium (for example, DVD), Or semiconductor medium (such as solid state hard disk (Solid State Disk, SSD)) etc..
The above are merely specific embodiments of the present application, but the protection scope of the present application is not limited thereto. Any change or replacement that a person skilled in the art could readily conceive within the technical scope disclosed in the present application shall fall within the protection scope of the present application. Therefore, the protection scope of the present application shall be subject to the protection scope of the claims.

Claims (11)

1. A method for constructing a knowledge graph, characterized by comprising:
obtaining industry data;
obtaining an entity set in the industry data and the relationships among multiple entities in the entity set;
constructing the knowledge graph from the entity set and the relationships among the multiple entities in the entity set according to a preset data schema.
2. The method for constructing a knowledge graph according to claim 1, characterized in that obtaining the entity set in the industry data and the relationships among multiple entities in the entity set comprises:
if the industry data is structured data, converting the knowledge in the industry data into the entity set and the relationships among the multiple entities in the entity set according to the relational-database-to-resource-description-framework mapping language (D2RML) specification and a mapping configuration file.
3. The method for constructing a knowledge graph according to claim 1, characterized in that obtaining the entity set in the industry data and the relationships among multiple entities in the entity set comprises:
if the industry data is unstructured data, extracting the entity set from the industry data using a conditional random field (CRF) model, and extracting the relationships among the multiple entities in the entity set from the industry data using a support vector machine and k-nearest-neighbor (SVM-KNN) classification method.
4. The method for constructing a knowledge graph according to claim 1, characterized in that obtaining the industry data comprises:
acquiring target web pages according to a seed vocabulary, and classifying the target web pages by website, wherein the seed vocabulary is an industry-specific vocabulary, and the target web pages include external links of web documents and encyclopedia web pages;
performing data collection on the target web pages corresponding to each website according to a preset depth value to obtain the content of the website;
if the occurrence frequency of the corresponding seed vocabulary in the content of the website exceeds a threshold, taking the website as a corresponding syndicated data source;
obtaining the industry data from the syndicated data source.
5. A device for constructing a knowledge graph, characterized by comprising:
an acquiring unit, configured to obtain industry data;
the acquiring unit being further configured to obtain an entity set in the industry data and the relationships among multiple entities in the entity set;
a construction unit, configured to construct the knowledge graph from the entity set and the relationships among the multiple entities in the entity set according to a preset data schema.
6. The device for constructing a knowledge graph according to claim 5, characterized in that the acquiring unit is specifically configured to:
if the industry data is structured data, convert the knowledge in the industry data into the entity set and the relationships among the multiple entities in the entity set according to the relational-database-to-resource-description-framework mapping language (D2RML) specification and a mapping configuration file.
7. The device for constructing a knowledge graph according to claim 5, characterized in that the acquiring unit is specifically configured to:
if the industry data is unstructured data, extract the entity set from the industry data using a CRF model, and extract the relationships among the multiple entities in the entity set from the industry data using an SVM-KNN classification method.
8. The device for constructing a knowledge graph according to claim 5, characterized in that the acquiring unit is specifically configured to:
acquire target web pages according to a seed vocabulary, and classify the target web pages by website, wherein the seed vocabulary is an industry-specific vocabulary, and the target web pages include external links of web documents and encyclopedia web pages;
perform data collection on the target web pages corresponding to each website according to a preset depth value to obtain the content of the website;
if the occurrence frequency of the corresponding seed vocabulary in the content of the website exceeds a threshold, take the website as a corresponding syndicated data source;
obtain the industry data from the syndicated data source.
9. A computer-readable storage medium storing one or more programs, characterized in that the one or more programs comprise instructions which, when executed by a computer, cause the computer to perform the method for constructing a knowledge graph according to any one of claims 1-4.
10. A computer program product comprising instructions, characterized in that, when the instructions are run on a computer, they cause the computer to perform the method for constructing a knowledge graph according to any one of claims 1-4.
11. A device for constructing a knowledge graph, characterized by comprising a processor and a memory, wherein the memory is configured to store a program, and the processor calls the program stored in the memory to perform the method for constructing a knowledge graph according to any one of claims 1-4.
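
As a non-authoritative illustration of the data-source selection recited in claims 4 and 8, the following Python sketch keeps a website as a data source only when seed terms occur often enough in its collected content. The function names, the sample sites, and the reading of "occurrence frequency" as a raw count of case-insensitive substring matches are assumptions made for this example; the application does not specify them. Crawling to the preset depth is assumed to have happened already, so the input is simply the collected page text grouped by website.

```python
def count_seed_occurrences(text: str, seed_terms: set[str]) -> int:
    """Count case-insensitive substring occurrences of all seed terms in text.

    Assumption: "occurrence frequency" is read as a raw occurrence count.
    """
    lowered = text.lower()
    return sum(lowered.count(term.lower()) for term in seed_terms)


def select_data_sources(
    pages_by_site: dict[str, list[str]],
    seed_terms: set[str],
    threshold: int,
) -> list[str]:
    """Return the websites whose collected content exceeds the threshold."""
    selected = []
    for site, pages in pages_by_site.items():
        # Treat all pages collected from one website as that website's content.
        content = " ".join(pages)
        if count_seed_occurrences(content, seed_terms) > threshold:
            selected.append(site)
    return selected


# Hypothetical sample data: page text already collected and grouped by website.
pages_by_site = {
    "grid.example": [
        "The transformer feeds the substation.",
        "Substation and transformer maintenance.",
    ],
    "news.example": [
        "Local sports results.",
        "A transformer toy review.",
    ],
}
seed_terms = {"transformer", "substation"}

print(select_data_sources(pages_by_site, seed_terms, threshold=2))
# prints ['grid.example']
```

Substring counting is used here because it needs no tokenizer and also works on unsegmented text such as Chinese; a real system might instead normalize by content length or count only whole-word matches.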
CN201811236863.3A 2018-10-23 2018-10-23 The construction method and device of knowledge mapping Pending CN109446341A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201811236863.3A CN109446341A (en) 2018-10-23 2018-10-23 The construction method and device of knowledge mapping

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201811236863.3A CN109446341A (en) 2018-10-23 2018-10-23 The construction method and device of knowledge mapping

Publications (1)

Publication Number Publication Date
CN109446341A true CN109446341A (en) 2019-03-08

Family

ID=65547730

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201811236863.3A Pending CN109446341A (en) 2018-10-23 2018-10-23 The construction method and device of knowledge mapping

Country Status (1)

Country Link
CN (1) CN109446341A (en)

Cited By (29)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN109960810A (en) * 2019-03-28 2019-07-02 科大讯飞(苏州)科技有限公司 Entity alignment method and device
CN110245241A (en) * 2019-06-18 2019-09-17 卓尔智联(武汉)研究院有限公司 Plastics knowledge mapping construction device, method and computer readable storage medium
CN110275919A (en) * 2019-06-18 2019-09-24 合肥工业大学 Data integrating method and device
CN110298036A (en) * 2019-06-06 2019-10-01 昆明理工大学 Online medical text symptom identification method based on part-of-speech incremental iteration
CN110489560A (en) * 2019-06-19 2019-11-22 民生科技有限责任公司 Method and device for generating small and micro enterprise portraits based on knowledge graph technology
CN110489395A (en) * 2019-07-27 2019-11-22 西南电子技术研究所(中国电子科技集团公司第十研究所) Method for automatically acquiring knowledge of multi-source heterogeneous data
CN110597969A (en) * 2019-08-12 2019-12-20 中国农业大学 Agricultural knowledge intelligent question and answer method and system and electronic equipment
CN110750647A (en) * 2019-10-17 2020-02-04 北京华宇信息技术有限公司 Construction method of ELP model of multi-source heterogeneous information data
CN110750650A (en) * 2019-09-30 2020-02-04 中盈优创资讯科技有限公司 Construction method and device of enterprise knowledge graph
CN110781249A (en) * 2019-10-16 2020-02-11 华电国际电力股份有限公司技术服务分公司 Knowledge graph-based multi-source data fusion method and device for thermal power plant
CN110795567A (en) * 2019-09-29 2020-02-14 北京远舢智能科技有限公司 Knowledge graph platform
CN110990586A (en) * 2019-12-02 2020-04-10 浪潮软件股份有限公司 Method and device for acquiring map data
CN111061883A (en) * 2019-10-25 2020-04-24 珠海格力电器股份有限公司 Method, device and equipment for updating knowledge graph and storage medium
CN111104525A (en) * 2019-12-31 2020-05-05 西安理工大学 Construction method of building design specification knowledge graph based on graph database
CN111241299A (en) * 2020-01-09 2020-06-05 重庆理工大学 Knowledge graph automatic construction method for legal consultation and retrieval system thereof
CN111324609A (en) * 2020-02-17 2020-06-23 腾讯云计算(北京)有限责任公司 Knowledge graph construction method and device, electronic equipment and storage medium
CN111563170A (en) * 2020-04-30 2020-08-21 北京明略软件系统有限公司 Knowledge graph generation method and device, computer storage medium and terminal
WO2020232943A1 (en) * 2019-05-23 2020-11-26 广州市香港科大霍英东研究院 Knowledge graph construction method for event prediction and event prediction method
CN112214611A (en) * 2020-09-24 2021-01-12 远光软件股份有限公司 Construction system and method of enterprise knowledge graph
CN112463984A (en) * 2020-12-04 2021-03-09 北京明略软件系统有限公司 Database mode expansion method, device, equipment and computer readable medium
CN112487212A (en) * 2020-12-18 2021-03-12 清华大学 Method and device for constructing domain knowledge graph
CN112527924A (en) * 2020-12-18 2021-03-19 清华大学 Dynamically updated knowledge graph expansion method and device
CN112765363A (en) * 2021-01-19 2021-05-07 昆明理工大学 Demand map construction method for scientific and technological service demand
CN113505245A (en) * 2021-09-10 2021-10-15 深圳平安综合金融服务有限公司 Knowledge graph generation method, computer readable storage medium and computer device
CN113722509A (en) * 2021-09-07 2021-11-30 中国人民解放军32801部队 Knowledge graph data fusion method based on entity attribute similarity
CN113783876A (en) * 2021-09-13 2021-12-10 国网电子商务有限公司 Network security situation perception method based on graph neural network and related equipment
WO2022051996A1 (en) * 2020-09-10 2022-03-17 西门子(中国)有限公司 Method and apparatus for constructing knowledge graph
WO2023040530A1 (en) * 2021-09-18 2023-03-23 华为技术有限公司 Webpage content traceability method, knowledge graph construction method and related device
CN116955639A (en) * 2023-04-24 2023-10-27 浙商期货有限公司 Method and device for constructing future industry chain knowledge graph and computer equipment

Citations (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20110035210A1 (en) * 2009-08-10 2011-02-10 Benjamin Rosenfeld Conditional random fields (crf)-based relation extraction system
CN104268197A (en) * 2013-09-22 2015-01-07 中科嘉速(北京)并行软件有限公司 Industry comment data fine grain sentiment analysis method
CN106355628A (en) * 2015-07-16 2017-01-25 中国石油化工股份有限公司 Image-text knowledge point marking method and device and image-text mark correcting method and system
CN108446368A (en) * 2018-03-15 2018-08-24 湖南工业大学 A kind of construction method and equipment of Packaging Industry big data knowledge mapping

Non-Patent Citations (2)

* Cited by examiner, † Cited by third party
Title
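Liu Shaoyu et al.: "Entity relation extraction method based on multi-class SVM_KNN", Journal of Data Acquisition and Processing *
Hu Fanghuai: "Research on Chinese knowledge graph construction methods based on multiple data sources", China Doctoral Dissertations Full-text Database, Information Science and Technology *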

Cited By (38)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN109960810A (en) * 2019-03-28 2019-07-02 科大讯飞(苏州)科技有限公司 Entity alignment method and device
WO2020232943A1 (en) * 2019-05-23 2020-11-26 广州市香港科大霍英东研究院 Knowledge graph construction method for event prediction and event prediction method
CN110298036A (en) * 2019-06-06 2019-10-01 昆明理工大学 Online medical text symptom identification method based on part-of-speech incremental iteration
CN110298036B (en) * 2019-06-06 2022-07-22 昆明理工大学 Online medical text symptom identification method based on part-of-speech incremental iteration
CN110275919A (en) * 2019-06-18 2019-09-24 合肥工业大学 Data integrating method and device
CN110245241A (en) * 2019-06-18 2019-09-17 卓尔智联(武汉)研究院有限公司 Plastics knowledge mapping construction device, method and computer readable storage medium
CN110489560A (en) * 2019-06-19 2019-11-22 民生科技有限责任公司 Method and device for generating small and micro enterprise portraits based on knowledge graph technology
CN110489395A (en) * 2019-07-27 2019-11-22 西南电子技术研究所(中国电子科技集团公司第十研究所) Method for automatically acquiring knowledge of multi-source heterogeneous data
CN110489395B (en) * 2019-07-27 2022-07-29 西南电子技术研究所(中国电子科技集团公司第十研究所) Method for automatically acquiring knowledge of multi-source heterogeneous data
CN110597969A (en) * 2019-08-12 2019-12-20 中国农业大学 Agricultural knowledge intelligent question and answer method and system and electronic equipment
CN110597969B (en) * 2019-08-12 2022-05-24 中国农业大学 Agricultural knowledge intelligent question and answer method and system and electronic equipment
CN110795567A (en) * 2019-09-29 2020-02-14 北京远舢智能科技有限公司 Knowledge graph platform
CN110750650A (en) * 2019-09-30 2020-02-04 中盈优创资讯科技有限公司 Construction method and device of enterprise knowledge graph
CN110781249A (en) * 2019-10-16 2020-02-11 华电国际电力股份有限公司技术服务分公司 Knowledge graph-based multi-source data fusion method and device for thermal power plant
CN110750647B (en) * 2019-10-17 2020-07-31 北京华宇信息技术有限公司 Method for constructing ELP model of multi-source heterogeneous information data
CN110750647A (en) * 2019-10-17 2020-02-04 北京华宇信息技术有限公司 Construction method of ELP model of multi-source heterogeneous information data
CN111061883B (en) * 2019-10-25 2023-12-08 珠海格力电器股份有限公司 Method, device, equipment and storage medium for updating knowledge graph
CN111061883A (en) * 2019-10-25 2020-04-24 珠海格力电器股份有限公司 Method, device and equipment for updating knowledge graph and storage medium
CN110990586A (en) * 2019-12-02 2020-04-10 浪潮软件股份有限公司 Method and device for acquiring map data
CN111104525A (en) * 2019-12-31 2020-05-05 西安理工大学 Construction method of building design specification knowledge graph based on graph database
CN111104525B (en) * 2019-12-31 2022-03-25 西安理工大学 Construction method of building design specification knowledge graph based on graph database
CN111241299A (en) * 2020-01-09 2020-06-05 重庆理工大学 Knowledge graph automatic construction method for legal consultation and retrieval system thereof
CN111324609A (en) * 2020-02-17 2020-06-23 腾讯云计算(北京)有限责任公司 Knowledge graph construction method and device, electronic equipment and storage medium
CN111563170A (en) * 2020-04-30 2020-08-21 北京明略软件系统有限公司 Knowledge graph generation method and device, computer storage medium and terminal
WO2022051996A1 (en) * 2020-09-10 2022-03-17 西门子(中国)有限公司 Method and apparatus for constructing knowledge graph
CN112214611A (en) * 2020-09-24 2021-01-12 远光软件股份有限公司 Construction system and method of enterprise knowledge graph
CN112214611B (en) * 2020-09-24 2023-10-31 远光软件股份有限公司 Enterprise knowledge graph construction system and method
CN112463984A (en) * 2020-12-04 2021-03-09 北京明略软件系统有限公司 Database mode expansion method, device, equipment and computer readable medium
CN112463984B (en) * 2020-12-04 2024-02-27 北京明略软件系统有限公司 Database schema extension method, device, equipment and computer readable medium
CN112527924A (en) * 2020-12-18 2021-03-19 清华大学 Dynamically updated knowledge graph expansion method and device
CN112487212A (en) * 2020-12-18 2021-03-12 清华大学 Method and device for constructing domain knowledge graph
CN112765363A (en) * 2021-01-19 2021-05-07 昆明理工大学 Demand map construction method for scientific and technological service demand
CN113722509A (en) * 2021-09-07 2021-11-30 中国人民解放军32801部队 Knowledge graph data fusion method based on entity attribute similarity
CN113505245A (en) * 2021-09-10 2021-10-15 深圳平安综合金融服务有限公司 Knowledge graph generation method, computer readable storage medium and computer device
CN113783876A (en) * 2021-09-13 2021-12-10 国网电子商务有限公司 Network security situation perception method based on graph neural network and related equipment
CN113783876B (en) * 2021-09-13 2023-10-03 国网数字科技控股有限公司 Network security situation awareness method based on graph neural network and related equipment
WO2023040530A1 (en) * 2021-09-18 2023-03-23 华为技术有限公司 Webpage content traceability method, knowledge graph construction method and related device
CN116955639A (en) * 2023-04-24 2023-10-27 浙商期货有限公司 Method and device for constructing future industry chain knowledge graph and computer equipment

Similar Documents

Publication Publication Date Title
CN109446341A (en) The construction method and device of knowledge mapping
US11790006B2 (en) Natural language question answering systems
US11442932B2 (en) Mapping natural language to queries using a query grammar
CN110633409B (en) Automobile news event extraction method integrating rules and deep learning
CN106447066A (en) Big data feature extraction method and device
CN105843897A (en) Vertical domain-oriented intelligent question and answer system
US20150006528A1 (en) Hierarchical data structure of documents
JP7486250B2 (en) Domain-specific language interpreter and interactive visual interface for rapid screening
CN108874783A (en) Power information O&M knowledge model construction method
CN112434024B (en) Relational database-oriented data dictionary generation method, device, equipment and medium
Rajput et al. BNOSA: A Bayesian network and ontology based semantic annotation framework
CN112925901B (en) Evaluation resource recommendation method for assisting online questionnaire evaluation and application thereof
Holzinger et al. Using ontologies for extracting product features from web pages
US20220129635A1 (en) Semantic model instantiation method, system and apparatus
US20230325384A1 (en) Interactive assistance for executing natural language queries to data sets
CN117312989A (en) Context-aware column semantic recognition method and system based on GCN and RoBERTa
US20210271637A1 (en) Creating descriptors for business analytics applications
JP2023517518A (en) Vector embedding model for relational tables with null or equivalent values
CN114429384B (en) Intelligent product recommendation method and system based on e-commerce platform
CN115982322A (en) Water conservancy industry design field knowledge graph retrieval method and retrieval system
CN113515630B (en) Triplet generation and verification method and device, electronic equipment and storage medium
CN113379432B (en) Sales system customer matching method based on machine learning
CN114911940A (en) Text emotion recognition method and device, electronic equipment and storage medium
CN113344674A (en) Product recommendation method, device, equipment and storage medium based on user purchasing power
CN110930189A (en) Personalized marketing method based on user behaviors

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
RJ01 Rejection of invention patent application after publication

Application publication date: 20190308