CN109446341A - Method and device for constructing a knowledge graph - Google Patents
- Publication number: CN109446341A
- Application number: CN201811236863.3A
- Authority: CN (China)
- Prior art keywords: data, entity, entity sets, knowledge mapping, industry
- Legal status: Pending (the legal status is an assumption, not a legal conclusion)
Landscapes
- Information Retrieval, Db Structures And Fs Structures Therefor (AREA)
Abstract
This application discloses a knowledge graph construction method and device, which relate to the field of knowledge graphs and address the problem that multiple kinds of data cannot be integrated. The method includes: obtaining industry data; obtaining an entity set in the industry data and the relationships among multiple entities in the entity set; and constructing the knowledge graph from the entity set and those relationships according to a preset data schema. The embodiments of the present application are applied to integrating multiple kinds of industry data.
Description
Technical field
The present invention relates to the field of knowledge graphs, and in particular to a method and device for constructing a knowledge graph.
Background
An e-commerce platform is a platform on which buyers and sellers trade goods online, so the core of an e-commerce knowledge graph is the product. The business activity as a whole involves many participating roles, such as brand owners, platform operators, consumers, government agencies, and logistics providers. E-commerce data sources are numerous and complex, covering multi-dimensional data such as platform operations, customer-service inquiries, and product data, and the degree of structuring varies greatly between sources. The associations among the entities involved are complex and varied, entity relationships are scattered across separate systems, and the data of each role is structured to a different degree; in addition, large amounts of unstructured text are generated during transactions. Existing technology has difficulty integrating these data in a unified way.
Summary of the invention
The embodiments of the present application provide a method and device for constructing a knowledge graph, to solve the problem that multiple kinds of data cannot be integrated.
To achieve the above objective, the embodiments of the present application adopt the following technical solutions.
In a first aspect, a method for constructing a knowledge graph is provided. The method includes:
obtaining industry data; obtaining an entity set in the industry data and the relationships among multiple entities in the entity set; and
constructing the knowledge graph from the entity set and the relationships among the multiple entities in the entity set according to a preset data schema.
In a second aspect, a device for constructing a knowledge graph is provided. The device includes:
an acquiring unit, configured to obtain industry data,
the acquiring unit being further configured to obtain an entity set in the industry data and the relationships among multiple entities in the entity set; and
a construction unit, configured to construct the knowledge graph from the entity set and the relationships among the multiple entities in the entity set according to a preset data schema.
In a third aspect, a computer-readable storage medium storing one or more programs is provided. The one or more programs include instructions that, when executed by a computer, cause the computer to perform the method according to the first aspect.
In a fourth aspect, a computer program product containing instructions is provided. When the instructions are run on a computer, they cause the computer to perform the knowledge graph construction method according to the first aspect.
In a fifth aspect, a device for constructing a knowledge graph is provided, including a processor and a memory. The memory is configured to store a program, and the processor calls the program stored in the memory to perform the knowledge graph construction method according to the first aspect.
With the knowledge graph construction method and device provided by the embodiments of the present application, an entity set and the relationships among multiple entities in the entity set are obtained from industry data, and the obtained entity set and relationships are then fused according to a preset data schema to form a knowledge graph. This resolves the situation in which data of various kinds and dimensions are isolated and scattered, and effectively links the data together.
Description of the drawings
Fig. 1 is a schematic diagram of the framework of the knowledge graph constructed according to an embodiment of the present application;
Fig. 2 is a first flowchart of the knowledge graph construction method according to an embodiment of the present application;
Fig. 3 is a second flowchart of the knowledge graph construction method according to an embodiment of the present application;
Fig. 4 is a third flowchart of the knowledge graph construction method according to an embodiment of the present application;
Fig. 5 is a fourth flowchart of the knowledge graph construction method according to an embodiment of the present application;
Fig. 6 is a schematic structural diagram of the knowledge graph construction device according to an embodiment of the present application.
Detailed description of embodiments
In the knowledge graph construction method and device provided by the embodiments of the present application, the knowledge graph may be an e-commerce knowledge graph. Industry data is obtained, the entity set in the industry data and the relationships among multiple entities in the entity set are obtained, and the knowledge graph is constructed from the entity set and those relationships according to a preset data schema.
Fig. 1 is the overall framework diagram of the knowledge graph constructed according to an embodiment of the present application. Referring to Fig. 1, the overall framework includes an original data layer 110, an internet information acquisition and cleaning layer 120, a knowledge extraction layer 130, a knowledge fusion layer 140, and a knowledge storage layer 150.
The original data layer 110 includes internal data and external data. Internal data may include structured data such as relational data, and external data may include unstructured data such as the content of video websites.
The internet information acquisition and cleaning layer 120 includes acquisition crawlers, an acquisition task scheduling system, and parsers. The acquisition crawlers may include industry-specific crawlers and general crawlers, and the parsers may include industry-specific parsers.
The knowledge extraction layer 130 includes database to resource description framework (D2R) mapping and industry-specific extraction. D2R mapping includes configuration-file mapping, update configuration, and a task scheduling system. Configuration-file mapping includes: table to concept, record to entity, column name to attribute, record data to attribute value, and table association to relationship. The task scheduling system includes initial import, batch update, and incremental update. Industry-specific extraction may include a Wrapper plug-in and a task scheduling system, the latter covering update detection and periodic updates. When the data in the original data layer 110 is internal data, knowledge is extracted through D2R mapping; when it is external data, it is first processed by the internet information acquisition and cleaning layer 120 and then extracted through industry-specific extraction.
The knowledge fusion layer 140 includes a data layer, a schema layer, conflict resolution, and knowledge graph updating. The data layer includes entity alignment, entity type alignment, and entity attribute alignment; the schema layer includes hypernym-hyponym relation generation and concept attribute generation; knowledge graph updating includes schema layer updates and data layer updates; and conflict resolution includes automatic conflict detection and automatic conflict resolution. The data schema of the data layer can be defined by human experts and specifies the entities and relationships in the knowledge graph and the attributes of each entity and relationship. When knowledge extracted from multiple data sources is fused and integrated into the knowledge graph, several types of data conflict must be resolved, for example: one phrase corresponding to multiple entities, inconsistent entity attribute names, missing entity attributes, inconsistent entity attribute values, and one-to-many mappings of entity attribute values.
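Several of these conflict cases can be illustrated with a small sketch. It assumes the records have already been linked to the same entity (entity alignment done) and uses a hypothetical, hand-written alias table for attribute-name alignment. Missing attributes are filled from whichever source provides them, and attributes whose values disagree are flagged; the actual resolution policy is left open.

```python
from collections import defaultdict

# Hypothetical alias table: aligns differing attribute names to one canonical name.
ATTRIBUTE_ALIASES = {"brand_name": "brand", "sale_price": "price"}

def fuse(records):
    """Fuse attribute dicts that describe the same (already aligned) entity.

    records: list of (source, {attribute: value}) pairs; values must be hashable.
    Returns (merged, conflicts): merged holds attributes on which all sources agree
    (or that only one source provides); conflicts maps attribute -> set of values.
    """
    values = defaultdict(set)
    for _source, attrs in records:
        for name, value in attrs.items():
            values[ATTRIBUTE_ALIASES.get(name, name)].add(value)  # attribute alignment
    merged = {a: next(iter(v)) for a, v in values.items() if len(v) == 1}
    conflicts = {a: v for a, v in values.items() if len(v) > 1}    # conflict detection
    return merged, conflicts

merged, conflicts = fuse([
    ("crm", {"brand_name": "Acme", "price": 99}),
    ("web", {"brand": "Acme", "sale_price": 109, "color": "red"}),
])
print(merged)     # {'brand': 'Acme', 'color': 'red'}
print(conflicts)  # {'price': {99, 109}}
```

Keeping conflicting values as a set, rather than picking one, leaves room for any of the resolution strategies (source priority, recency, voting) without committing to one.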
The knowledge storage layer 150 includes graph data storage and a distributed document index.
The data imported into the knowledge graph in the embodiments of the present application is specifically structured data. The import process involves preprocessing, entity alignment, attribute alignment, and attribute selection on the structured data, after which the structured data that satisfies the conditions is imported into the knowledge graph. Subsequent incremental iterations and similar means keep the knowledge graph continuously updated and extended.
The construction process of the knowledge graph is described in detail below with reference to specific embodiments.
Embodiment 1
An embodiment of the present application provides a method for constructing a knowledge graph. Referring to Fig. 2, the method may include S101 to S103:
S101: Obtain industry data.
Taking the construction of a knowledge graph for the e-commerce industry as an example, an e-commerce knowledge graph is built mainly from data inside the industry, such as consumption data, vertical-domain data related to the e-commerce platform, and product data. Because these data are generated during the operation or promotion of the e-commerce platform and are closely tied to the business, they usually have the following advantages:
Broad coverage and considerable depth: the data sources are all strongly related to the e-commerce platform and closely tied to the industry, covering the basic data involved in the e-commerce industry, and data from e-commerce operations serves as a supplement, so the data also has industry depth;
High reliability: internal structured industry data is used to support the enterprise's own business, so its reliability is very high. Business data is stored in relational databases, and structured triple data can be obtained by applying only a limited amount of conversion to the relational data, so reliability is good;
Strong structure: most internal structured data is stored in relational databases, and open industry knowledge bases are mostly published after editing by high-quality web editors, so they are well structured.
Therefore, when building an e-commerce knowledge graph, priority can be given to internal structured industry data and open industry knowledge bases.
Optionally, before the industry data is obtained, the data schema of the knowledge graph may be defined first. The data schema is the core part of the knowledge graph; it can be defined by human experts using a top-down construction approach. After the data schema is defined, the entities, entity relationships, entity attributes, and so on in the industry data obtained from various data sources can be filled in at the data level. Having human experts define the data schema improves the completeness and accuracy of the knowledge graph data.
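Defining the schema first and then filling in the data level could look like the following sketch. The expert-defined schema lists entity types, relation signatures (head type, tail type), and allowed attributes per type, and extracted facts that do not fit are dropped. All type, relation, and attribute names are hypothetical.

```python
from dataclasses import dataclass, field

@dataclass
class Schema:
    """Expert-defined data schema (all type and relation names are hypothetical)."""
    entity_types: set
    relations: dict                                  # relation -> (head type, tail type)
    attributes: dict = field(default_factory=dict)   # entity type -> allowed attribute names

def fill(schema, facts):
    """Keep the extracted facts that fit the schema (top-down, data-level filling).

    Each fact is (head, head_type, predicate, obj, obj_type); obj_type is None for
    attribute facts and an entity type for relation facts.
    """
    kept = []
    for head, head_type, predicate, obj, obj_type in facts:
        if head_type not in schema.entity_types:
            continue
        if obj_type is None:
            ok = predicate in schema.attributes.get(head_type, set())
        else:
            ok = schema.relations.get(predicate) == (head_type, obj_type)
        if ok:
            kept.append((head, predicate, obj))
    return kept

schema = Schema(
    entity_types={"Product", "Brand", "Store"},
    relations={"hasBrand": ("Product", "Brand"), "soldBy": ("Product", "Store")},
    attributes={"Product": {"name", "price"}, "Brand": {"name"}},
)
facts = [
    ("p1", "Product", "hasBrand", "b1", "Brand"),   # fits the schema
    ("p1", "Product", "price", 2999, None),         # allowed attribute
    ("p1", "Product", "color", "red", None),        # attribute not in the schema
    ("b1", "Brand", "soldBy", "s1", "Store"),       # wrong head type for soldBy
]
print(fill(schema, facts))  # [('p1', 'hasBrand', 'b1'), ('p1', 'price', 2999)]
```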
Optionally, referring to Fig. 3, S101 may include S201 to S204:
S201: Acquire target webpages according to seed words, and classify the target webpages by website.
Here, the seed words are industry-specific vocabulary, and the target webpages include web documents and the external links of encyclopedia pages.
Specifically, a set of seed words representative of the industry can be submitted to the search interfaces of search engines and online encyclopedias. For web documents returned by a search engine, the top-ranked results are added directly to the target webpage list. For pages returned by an encyclopedia, the corresponding article page is opened first, the common external links and the reference links in the article page are then found, and both types of links are added to the target webpage list.
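S201 can be sketched as two pure functions over results that have already been fetched: one merges the top-ranked search hits with the external links and references of encyclopedia articles, and one groups the resulting URLs by website. The input shapes (a ranked URL list, and dicts with hypothetical `external_links` and `references` keys) and the `top_n` cutoff are assumptions; the source does not fix how many top results are kept.

```python
from collections import defaultdict
from urllib.parse import urlsplit

def build_target_list(search_results, encyclopedia_articles, top_n=10):
    """Assemble the target webpage list for one seed word.

    search_results: ranked URLs returned by a search engine.
    encyclopedia_articles: article pages as dicts with the (hypothetical) keys
    "external_links" and "references".
    """
    targets = list(search_results[:top_n])          # keep only top-ranked results
    for article in encyclopedia_articles:
        targets.extend(article.get("external_links", []))
        targets.extend(article.get("references", []))
    return list(dict.fromkeys(targets))             # drop duplicates, keep order

def group_by_site(urls):
    """Classify target webpages by website (host name)."""
    sites = defaultdict(list)
    for url in urls:
        sites[urlsplit(url).netloc].append(url)
    return dict(sites)

targets = build_target_list(
    ["https://a.com/x", "https://b.com/y", "https://c.com/z"],
    [{"external_links": ["https://a.com/x", "https://d.org/1"],
      "references": ["https://d.org/2"]}],
    top_n=2,
)
print(group_by_site(targets))
```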
S202: Collect the content of each website by crawling the target webpages corresponding to that website up to a preset depth.
The target webpages are grouped by website, and in-site crawling is performed from the obtained webpages. The maximum crawl depth may be set to 3 levels: starting from the home page, a depth-first strategy is used to crawl 3 levels in total. The structure of an industry data website can usually be traversed completely within a depth of 3 levels.
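The depth-limited, depth-first in-site crawl of S202 can be sketched as follows. Assumptions: the home page counts as level 1 (so `max_levels=3` means the home page plus two levels of links), only links on the same host are followed, and `fetch_links` is a caller-supplied, hypothetical function that returns the links on a page, so no particular HTTP client is assumed. A page first reached deep in the tree is expanded again if it is later reached at a shallower level, so nothing within the depth limit is missed.

```python
from urllib.parse import urlsplit

def crawl_site(home, fetch_links, max_levels=3):
    """Depth-first in-site crawl from `home`, returning pages in first-visit order."""
    site = urlsplit(home).netloc
    best_level = {}            # shallowest level at which each page has been expanded
    order = []
    stack = [(home, 1)]        # the home page is level 1
    while stack:
        url, level = stack.pop()
        if url in best_level and best_level[url] <= level:
            continue           # already expanded at this level or a shallower one
        if url not in best_level:
            order.append(url)
        best_level[url] = level
        if level >= max_levels:
            continue           # depth limit reached: collect but do not follow links
        for link in reversed(fetch_links(url)):   # reversed so the first link is explored first
            if urlsplit(link).netloc == site:      # stay inside the website
                stack.append((link, level + 1))
    return order

# A toy link graph standing in for real pages.
LINKS = {
    "http://s.com/": ["http://s.com/a", "http://s.com/b", "http://other.com/"],
    "http://s.com/a": ["http://s.com/a1"],
    "http://s.com/a1": ["http://s.com/a2"],
}
print(crawl_site("http://s.com/", lambda u: LINKS.get(u, [])))
```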
S203: If the occurrence frequency of the corresponding seed words in a website's content exceeds a threshold, take the website as a corresponding relevant data source.
The content of each website is analyzed: the content of the webpages collected from each website is extracted and saved. If the frequency of industry keywords in a website's content exceeds the threshold, the website is related to the industry and can be used as a corresponding relevant data source.
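The threshold test in S203 might look like this. The source does not say whether the frequency is a raw count or a ratio; this sketch assumes a raw count of substring matches over all collected pages, so a seed word such as "phone" also matches inside "smartphone".

```python
def seed_word_hits(pages, seed_words):
    """Count occurrences of seed words across a site's collected page texts."""
    return sum(page.count(word) for page in pages for word in seed_words)

def is_relevant_site(pages, seed_words, threshold):
    """S203: the site becomes a data source if seed words occur more than `threshold` times."""
    return seed_word_hits(pages, seed_words) > threshold

pages = ["phone case and phone charger", "shipping logistics"]
print(seed_word_hits(pages, ["phone", "logistics"]))        # 3
print(is_relevant_site(pages, ["phone", "logistics"], 2))   # True
```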
S204: Obtain industry data from the relevant data sources.
After the relevant data sources are determined, industry data can be obtained from them using tools such as search engines.
S102: Obtain the entity set in the industry data and the relationships among multiple entities in the entity set.
The key technical points of knowledge graph construction are entity extraction and entity relation extraction: extracting the entities and entity relationships from the data and integrating them yields tidier data that is easier to manage and apply.
Optionally, referring to Fig. 4, S102 may include:
S301: If the industry data is structured data, convert the knowledge in the industry data into the entity set and the relationships among multiple entities in the entity set according to the relational database to resource description framework mapping language (D2RML) specification and a mapping configuration file.
When mapping knowledge from structured data, the basic structure of the structured data must be determined, including the meaning of each table and the associations between tables, and the structure of the knowledge graph must be determined as well; the D2RML language is then used to associate the tables in the structured data with concepts or entities in the knowledge graph. After the mapping configuration file is defined, knowledge can be converted from the source database according to the configuration. The specific process may be as follows: the knowledge conversion engine connects to the target database specified in the configuration file and reads the data in the corresponding tables; tables and columns in the relational database are mapped to concepts and entities, relationships between entities, and entity attributes, respectively; and the knowledge obtained by the mapping is then stored in the knowledge graph.
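The conversion engine described in S301 can be sketched with Python's built-in `sqlite3` module. The dict `CONFIG` is a hypothetical stand-in for a D2RML mapping file and does not follow real D2RML or W3C R2RML syntax: each configured table becomes a concept, each record an entity, mapped columns become attributes, and foreign-key columns become relationships to entities of the referenced table.

```python
import sqlite3

# Hypothetical mapping configuration, standing in for a D2RML mapping file.
CONFIG = {
    "brand": {"concept": "Brand", "key": "id", "attributes": {"name": "name"}},
    "product": {
        "concept": "Product", "key": "id",
        "attributes": {"title": "name", "price": "price"},
        "foreign_keys": {"brand_id": ("hasBrand", "brand")},
    },
}

def convert(conn, config):
    """Read every configured table and emit (subject, predicate, object) triples."""
    triples = []
    for table, spec in config.items():
        concept = spec["concept"]
        # Table names come from the trusted config, never from user input.
        cursor = conn.execute(f"SELECT * FROM {table}")
        columns = [d[0] for d in cursor.description]
        for values in cursor:
            row = dict(zip(columns, values))
            subject = f"{concept}/{row[spec['key']]}"              # record -> entity
            triples.append((subject, "rdf:type", concept))         # table -> concept
            for column, attribute in spec.get("attributes", {}).items():
                if row[column] is not None:                        # cell -> attribute value
                    triples.append((subject, attribute, row[column]))
            for column, (relation, target) in spec.get("foreign_keys", {}).items():
                if row[column] is not None:                        # table association -> relationship
                    target_concept = config[target]["concept"]
                    triples.append((subject, relation, f"{target_concept}/{row[column]}"))
    return triples

def demo_connection():
    """In-memory database with one brand and one product, for illustration."""
    conn = sqlite3.connect(":memory:")
    conn.executescript("""
        CREATE TABLE brand (id INTEGER PRIMARY KEY, name TEXT);
        CREATE TABLE product (id INTEGER PRIMARY KEY, title TEXT, price INTEGER,
                              brand_id INTEGER REFERENCES brand(id));
        INSERT INTO brand VALUES (1, 'Acme');
        INSERT INTO product VALUES (10, 'Phone X', 2999, 1);
    """)
    return conn

if __name__ == "__main__":
    for triple in convert(demo_connection(), CONFIG):
        print(triple)
```

Driving the conversion from configuration rather than code means a new table only needs a new config entry, which fits the update and incremental-import tasks described for D2R mapping.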
Optionally, referring to Fig. 4, S102 may include:
S401: If the industry data is unstructured data, extract the entity set in the industry data using a conditional random field (CRF) model, and extract the relationships among multiple entities in the entity set using the support vector machine and k-nearest neighbor (SVM-KNN) classification method.
Specifically, a CRF model is a conditional probability distribution model of one set of output random variables given another set of input random variables. The parameterized form of a linear-chain CRF may be as follows: for an observation sequence $x=(x_1,x_2,\dots,x_n)$ and a state sequence $y=(y_1,y_2,\dots,y_n)$, if $P(y\mid x)$ is a linear-chain conditional random field, then, given that the random variable $X$ takes the value $x$, the conditional probability that the random variable $Y$ takes the value $y$ has the form:

$$P(y\mid x)=\frac{1}{Z(x)}\exp\Big(\sum_{i,k}\lambda_k f_k(y_{i-1},y_i,x,i)+\sum_{i,l}\mu_l h_l(y_i,x,i)\Big)$$

$$Z(x)=\sum_{y}\exp\Big(\sum_{i,k}\lambda_k f_k(y_{i-1},y_i,x,i)+\sum_{i,l}\mu_l h_l(y_i,x,i)\Big)$$

where $f_k$ and $h_l$ are feature functions, $\lambda_k$ and $\mu_l$ are the corresponding weights, and $Z(x)$ is the normalization factor.
Named entity recognition is a sequence labeling process: a sentence is treated as an observation sequence, each character or word in the sentence is treated as a symbol, and a state is assigned to each symbol. The parameters $\lambda_k$ and $\mu_l$ are estimated on a training set by maximizing the conditional likelihood, which yields the conditional probability model.
For an input sequence, the most probable output label sequence, i.e., the optimal state sequence, is:

$$y^{*}=\arg\max_{y}P(y\mid x)$$
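Finding $y^{*}$ is usually done with the Viterbi algorithm. This sketch assumes the weighted feature sums have already been evaluated: `emissions[i][y]` stands for $\sum_l \mu_l h_l(y, x, i)$ and `transitions[a][b]` for $\sum_k \lambda_k f_k(a, b, x, i)$, taken here as position-independent. $Z(x)$ does not depend on $y$, so it can be ignored when taking the arg max. The numbers are made up for illustration.

```python
def viterbi(emissions, transitions, labels):
    """Return the label sequence maximizing sum_i emissions[i][y_i] + sum_i transitions[y_(i-1)][y_i]."""
    score = {y: emissions[0][y] for y in labels}   # best score of a path ending in y
    backpointers = []
    for i in range(1, len(emissions)):
        new_score, pointers = {}, {}
        for y in labels:
            prev = max(labels, key=lambda p: score[p] + transitions[p][y])
            new_score[y] = score[prev] + transitions[prev][y] + emissions[i][y]
            pointers[y] = prev
        score = new_score
        backpointers.append(pointers)
    path = [max(labels, key=lambda y: score[y])]
    for pointers in reversed(backpointers):        # walk back from the last position
        path.append(pointers[path[-1]])
    return path[::-1]

LABELS = ["B", "I", "O"]
TRANSITIONS = {                                     # O -> I is heavily penalized
    "B": {"B": -1.0, "I": 2.0, "O": 0.0},
    "I": {"B": -1.0, "I": 1.0, "O": 0.5},
    "O": {"B": 0.5, "I": -5.0, "O": 0.5},
}
EMISSIONS = [
    {"B": 1.0, "I": 0.5, "O": 0.8},
    {"B": 0.2, "I": 1.0, "O": 0.9},
    {"B": 0.1, "I": 0.3, "O": 1.0},
]
print(viterbi(EMISSIONS, TRANSITIONS, LABELS))      # ['B', 'I', 'O']
```

Dynamic programming keeps the cost at O(n·|labels|²) instead of enumerating all |labels|ⁿ sequences.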
Extracting entities from unstructured data with the CRF model may proceed as follows. When the corpus is a training corpus, it undergoes preprocessing and feature selection to obtain a data feature set, which is used to train the CRF model. When the corpus is a test corpus, it is preprocessed and then input into the trained CRF model to obtain the entity set. Here, the corpus is the basic matching database, and the selected features may include, but are not limited to, language features, contextual features, and entity boundary features.
It should be noted that language features reflect the basic information of a character and are a basic type of feature. Because text data is random and free-form, word segmentation may produce errors that ultimately prevent entities from being recognized, whereas character granularity retains more information, such as the internal structure of entities, and can improve recognition. Therefore, the embodiments of the present application use character-granularity language features, as shown in Table 1 below.
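Table 1 Character-granularity language features
| No. | Feature | Description |
|---|---|---|
| 1 | Character(-2) | The character two positions before the current character |
| 2 | Character(-1) | The character immediately before the current character |
| 3 | Character(0) | The current character |
| 4 | Character(1) | The character immediately after the current character |
| 5 | Character(2) | The character two positions after the current character |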
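The five features in Table 1 amount to a sliding window of two characters on each side of the current one. A minimal sketch, with a hypothetical `<PAD>` marker for positions outside the sentence:

```python
def char_features(sentence, i, window=2, pad="<PAD>"):
    """Character(-window) .. Character(window) features for position i (Table 1)."""
    feats = {}
    for offset in range(-window, window + 1):
        j = i + offset
        feats[f"Character({offset})"] = sentence[j] if 0 <= j < len(sentence) else pad
    return feats

print(char_features("华为手机", 0))
# {'Character(-2)': '<PAD>', 'Character(-1)': '<PAD>', 'Character(0)': '华',
#  'Character(1)': '为', 'Character(2)': '手'}
```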
Contextual features describe the dependencies between observations within the window length of entity words; they characterize well both the dependencies inside an entity and the correlations between entities and non-entities;
Entity boundary features provide important evidence for locating character boundaries, and determining the boundaries of named entities is crucial to named entity recognition. The embodiments of the present application use the BIO encoding scheme to describe the boundary features of the observation sequence and to encode entity types, where B denotes the beginning of an entity, I denotes the remainder of an entity, and O denotes a non-entity sequence.
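BIO encoding with entity types can be sketched as two small conversions between labeled spans and per-character tags. The type names (`BRAND`, `PRODUCT`) are illustrative. A stray `I-` tag that does not continue an entity is treated leniently as the start of a new one, which is one possible convention rather than the only one.

```python
def spans_to_bio(length, spans):
    """Encode (start, end, type) entity spans, end exclusive, as BIO tags."""
    tags = ["O"] * length                      # O: not part of any entity
    for start, end, etype in spans:
        tags[start] = f"B-{etype}"              # B: beginning of an entity
        for k in range(start + 1, end):
            tags[k] = f"I-{etype}"              # I: remainder of the entity
    return tags

def bio_to_spans(tags):
    """Decode BIO tags back into (start, end, type) spans."""
    spans, start, etype = [], None, None
    for k, tag in enumerate(list(tags) + ["O"]):        # trailing "O" closes a final entity
        if tag.startswith("I-") and start is not None and tag[2:] == etype:
            continue                                    # entity continues
        if start is not None:
            spans.append((start, k, etype))             # entity ended before position k
            start, etype = None, None
        if tag != "O":                                  # "B-X", or a stray "I-X" starting a new entity
            start, etype = k, tag[2:]
    return spans

sentence = "华为手机很好用"
tags = spans_to_bio(len(sentence), [(0, 2, "BRAND"), (2, 4, "PRODUCT")])
print(list(zip(sentence, tags)))
print(bio_to_spans(tags))   # [(0, 2, 'BRAND'), (2, 4, 'PRODUCT')]
```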
The CRF modeling tool requires the user to write a template file for training on the training corpus. The feature templates are shown in Table 2 below.
Table 2 Basic feature templates
| Feature ID | Feature description |
|---|---|
| U00:%x[-2,0] | The token two positions before the current token |
| U01:%x[-1,0] | The token immediately before the current token |
| U02:%x[0,0] | The current token |
| U03:%x[1,0] | The token immediately after the current token |
| U04:%x[2,0] | The token two positions after the current token |
| U05:%x[-1,0]/%x[0,0] | Combination of the previous token and the current token |
| U06:%x[0,0]/%x[1,0] | Combination of the current token and the next token |
| U07:%x[-1,0]/%x[0,0]/%x[1,0] | Combination of the previous, current, and next tokens |
| U08:%x[-2,0]/%x[-1,0]/%x[0,0] | Combination of the two preceding tokens and the current token |
| U09:%x[0,0]/%x[1,0]/%x[2,0] | Combination of the current token and the two following tokens |
Each row represents one feature template and identifies a token in the training data. The basic format of a feature template is %x[row,col], where row is the line number relative to the current token and col is the absolute column number.
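How a template line such as `U05:%x[-1,0]/%x[0,0]` turns into a concrete feature string can be shown with a small expander. This is not the CRF tool itself, only a sketch of the %x[row,col] substitution described above. For positions outside the sentence it emits markers of the form `_B-1` and `_B+1`, which, to our understanding, mirrors the convention used by CRF++.

```python
import re

MACRO = re.compile(r"%x\[(-?\d+),(\d+)\]")

def expand(template, rows, i):
    """Expand every %x[row,col] in one template line for the token at position i.

    rows: the token table, one list of columns per token; row offsets are relative
    to i and column indices are absolute.
    """
    def substitute(match):
        j = i + int(match.group(1))
        col = int(match.group(2))
        if j < 0:
            return f"_B{j}"                     # before the sentence: _B-1, _B-2, ...
        if j >= len(rows):
            return f"_B+{j - len(rows) + 1}"    # after the sentence: _B+1, _B+2, ...
        return rows[j][col]
    return MACRO.sub(substitute, template)

rows = [["华"], ["为"], ["手"], ["机"]]
print(expand("U05:%x[-1,0]/%x[0,0]", rows, 0))           # U05:_B-1/华
print(expand("U09:%x[0,0]/%x[1,0]/%x[2,0]", rows, 3))    # U09:机/_B+1/_B+2
```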
Specifically, the SVM-KNN classification method combines SVM classification with KNN classification. It performs well with low algorithmic complexity and has been applied with good results to text-processing research such as text classification and proper noun extraction. Therefore, the embodiments of the present application may use the SVM-KNN classification method to extract entity relationships from the industry data.
Optionally, referring to Fig. 5, extracting the entity relationships of the industry data with the SVM-KNN classification method may include S501 to S504:
S501: Preprocess the corpus and form feature vectors to obtain the samples to be classified.
The corpus may include a training corpus and a test corpus, and corpus preprocessing may include part-of-speech tagging, stemming, syntactic parsing, predicate extraction, semantic role labeling, and so on. The features used in the embodiments of the present application are: entity and context features, sentence verb-root features, entity distance features, entity extension features, semantic role features, and features of the words between the entities. The context features may include the entities and the words before and after them, together with the stems and parts of speech of those words; the semantic role features may include predicate features, semantic role pair features, and semantic role pair-predicate features.
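A few of the S501 features (the entities themselves, the words between them, the entity distance, and a one-word context on either side) can be sketched on a tokenized sentence. Verb-root, extension, and semantic-role features need a parser and a semantic role labeler and are left out; the feature names are hypothetical.

```python
def relation_features(tokens, e1, e2, window=1):
    """Sparse features for the entity pair (e1, e2) in a tokenized sentence.

    e1, e2: (start, end) token spans with end exclusive; e1 must precede e2.
    """
    feats = {
        "e1": " ".join(tokens[e1[0]:e1[1]]),
        "e2": " ".join(tokens[e2[0]:e2[1]]),
        "distance": e2[0] - e1[1],                      # tokens between the two entities
    }
    for word in tokens[e1[1]:e2[0]]:                    # words between the entities
        feats[f"between={word}"] = 1
    for word in tokens[max(0, e1[0] - window):e1[0]]:   # left context of e1
        feats[f"before_e1={word}"] = 1
    for word in tokens[e2[1]:e2[1] + window]:           # right context of e2
        feats[f"after_e2={word}"] = 1
    return feats

tokens = ["Huawei", "released", "the", "Mate", "phone", "today"]
print(relation_features(tokens, (0, 1), (3, 5)))
```

A dictionary of sparse features like this can then be turned into the numeric vector the SVM expects, for example with a feature-hashing or one-hot vectorizer.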
S502: Process the samples to be classified with the SVM classification model.
The SVM classifier has a complete theoretical framework, good generality and robustness, and simple computation, together with strong noise resistance and high classification accuracy.
S503: If a sample lies in the determined region, classify it directly to obtain the entity relationship.
For samples in the determined region, the output of the SVM classifier can be used directly to obtain the entity relationship.
S504: If a sample does not lie in the determined region, perform secondary classification with the KNN classifier.
A sample outside the determined region lies in the fuzzy region; the KNN classifier performs a secondary classification on it to obtain the entity relationship.
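The decision logic of S502 to S504 can be sketched for a binary relation label. Assumptions: a pre-trained linear SVM given as weights `w` and bias `b`; the determined region is taken to be |w·x + b| ≥ `margin`; and the KNN step votes over a labeled reference set (in the SVM-KNN literature these are often the support vectors). Multi-class relation types would need, for example, a one-vs-rest arrangement.

```python
import math

def svm_knn(x, w, b, references, margin=1.0, k=3):
    """Classify x as +1/-1 and report which stage decided.

    w, b: weights and bias of a (hypothetical) pre-trained linear SVM.
    references: (vector, label) pairs used by the KNN stage, labels +1 or -1.
    """
    score = sum(wi * xi for wi, xi in zip(w, x)) + b
    if abs(score) >= margin:                        # determined region: trust the SVM
        return (1 if score > 0 else -1), "svm"
    # Fuzzy region: secondary classification by majority vote of the k nearest references.
    nearest = sorted(references, key=lambda r: math.dist(x, r[0]))[:k]
    vote = sum(label for _, label in nearest)
    return (1 if vote > 0 else -1), "knn"          # use an odd k to avoid ties

w, b = [1.0, 0.0], 0.0
refs = [([0.1, 0.1], -1), ([0.3, -0.1], -1), ([0.25, 0.0], -1), ([5.0, 5.0], 1)]
print(svm_knn([2.0, 0.0], w, b, refs))   # (1, 'svm')
print(svm_knn([0.2, 0.0], w, b, refs))   # (-1, 'knn'): neighbours overrule the weak SVM score
```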
S103: Construct the knowledge graph from the entity set and the relationships among multiple entities in the entity set according to the preset data schema.
After the entity set and the relationships among its entities are extracted from the industry data, the entity-relation-entity triples can be stored as RDF data in the knowledge graph database.
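Storing entity-relation-entity triples as RDF can be as simple as writing N-Triples, a line-based format that triple stores and libraries such as rdflib can load. This sketch assumes a hypothetical base IRI, turns known entities into IRIs, writes every other object as a plain string literal (no datatypes), and escapes quotes, backslashes, and line breaks in literals.

```python
from urllib.parse import quote

BASE = "http://example.org/kg/"   # hypothetical namespace

def iri(name):
    """Build an IRI under BASE, percent-encoding characters not allowed in IRIs."""
    return f"<{BASE}{quote(str(name), safe='/')}>"

def literal(value):
    """Plain string literal with N-Triples escaping."""
    text = str(value).replace("\\", "\\\\").replace('"', '\\"')
    text = text.replace("\n", "\\n").replace("\r", "\\r")
    return f'"{text}"'

def to_ntriples(triples, entities):
    """Serialize (subject, predicate, object) triples; objects in `entities` become IRIs."""
    lines = []
    for s, p, o in triples:
        obj = iri(o) if o in entities else literal(o)
        lines.append(f"{iri(s)} {iri(p)} {obj} .")
    return "\n".join(lines) + "\n"

triples = [("Product/1", "hasBrand", "Brand/7"), ("Product/1", "name", 'Phone "X"')]
print(to_ntriples(triples, {"Product/1", "Brand/7"}), end="")
```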
Optionally, the embodiments of the present application can also define the mapping rules through a visual specification configuration tool.
With the knowledge graph construction method provided by the embodiments of the present application, building a domain knowledge graph enables the industry platform to resolve the situation in which all kinds of multi-dimensional internal data are isolated and scattered, effectively linking the data so that data relationships between different roles in different departments can be discovered and data mining can be performed. To exploit data resources to the greatest extent, the completed knowledge graph can also provide a solid foundation for subsequent recommendation, search, intelligent customer service, and similar applications.
Embodiment 2
An embodiment of the present application provides a device for constructing a knowledge graph. Referring to Fig. 6, the device may include:
An acquiring unit 501, configured to obtain industry data.
The acquiring unit 501 may be further configured to obtain the entity set in the industry data and the relationships among multiple entities in the entity set.
A construction unit 502, configured to construct the knowledge graph from the entity set and the relationships among multiple entities in the entity set according to the preset data schema.
Optionally, the acquiring unit 501 may be specifically configured to: if the industry data is structured data, convert the knowledge in the industry data into the entity set and the relationships among multiple entities in the entity set according to the D2RML specification and the mapping configuration file.
Optionally, the acquiring unit 501 may be specifically configured to: if the industry data is unstructured data, extract the entity set in the industry data using a CRF model, and extract the relationships among multiple entities in the entity set using the SVM-KNN classification method.
Optionally, the acquiring unit 501 may be specifically configured to:
acquire target webpages according to seed words and classify the target webpages by website, where the seed words are industry-specific vocabulary and the target webpages include web documents and the external links of encyclopedia pages; crawl the target webpages corresponding to each website up to a preset depth to collect the website's content; if the occurrence frequency of the corresponding seed words in a website's content exceeds a threshold, take the website as a corresponding relevant data source; and obtain industry data from the relevant data sources.
An embodiment of the present invention provides a computer-readable storage medium storing one or more programs. The one or more programs include instructions that, when executed by a computer, cause the computer to perform the knowledge graph construction method described with reference to Fig. 2 to Fig. 5.
An embodiment of the present invention provides a computer program product containing instructions. When the instructions are run on a computer, they cause the computer to perform the knowledge graph construction method described with reference to Fig. 2 to Fig. 5.
An embodiment of the present invention provides a device for constructing a knowledge graph, including a processor and a memory. The memory is configured to store a program, and the processor calls the program stored in the memory to perform the knowledge graph construction method described with reference to Fig. 2 to Fig. 5.
Because the knowledge graph construction device, computer-readable storage medium, and computer program product in the embodiments of the present invention can be applied to the above method, reference may be made to the above method embodiments for the technical effects they achieve; details are not repeated here.
It should be noted that each of the above units may be a separately provided processor, or may be integrated into a processor of the controller; alternatively, it may be stored as program code in the memory of the controller and invoked by a processor of the controller to perform the functions of the above units. The processor described here may be a central processing unit (CPU), an application-specific integrated circuit (ASIC), or one or more integrated circuits configured to implement the embodiments of the present application.
It should be understood that, in the various embodiments of the present application, the sequence numbers of the above processes do not imply an order of execution. The execution order of each process should be determined by its function and internal logic, and should not constitute any limitation on the implementation of the embodiments of the present application.
Those of ordinary skill in the art may be aware that the units and algorithm steps described in conjunction with the examples disclosed in the embodiments herein can be implemented by electronic hardware or by a combination of computer software and electronic hardware. Whether these functions are performed in hardware or in software depends on the specific application and the design constraints of the technical solution. A skilled person may use different methods to implement the described functions for each specific application, but such implementations should not be considered to go beyond the scope of the present application.
It will be clear to those skilled in the art that, for convenience and brevity of description, the specific working processes of the systems, devices, and units described above can be found in the corresponding processes of the foregoing method embodiments; details are not repeated here.
In the several embodiments provided in the present application, it should be understood that the disclosed systems, devices, and methods may be implemented in other ways. For example, the device embodiments described above are merely illustrative: the division into units is only a division by logical function, and other divisions are possible in an actual implementation. Multiple units or components may be combined or integrated into another system, or some features may be omitted or not executed. In addition, the mutual couplings, direct couplings, or communication connections shown or discussed may be indirect couplings or communication connections through some interfaces, devices, or units, and may be electrical, mechanical, or in other forms.
Units described as separate components may or may not be physically separate, and components shown as units may or may not be physical units; that is, they may be located in one place or distributed across multiple network units. Some or all of the units may be selected according to actual needs to achieve the objectives of the solutions of these embodiments.
In addition, the functional units in the embodiments of the present application may be integrated into one processing unit, each unit may exist physically on its own, or two or more units may be integrated into one unit.
The above embodiments may be implemented wholly or partly by software, hardware, firmware, or any combination thereof. When implemented by a software program, they may be implemented wholly or partly in the form of a computer program product. The computer program product includes one or more computer instructions. When the computer program instructions are loaded and executed on a computer, the processes or functions described in the embodiments of the present application are produced wholly or partly. The computer may be a general-purpose computer, a special-purpose computer, a computer network, or another programmable device. The computer instructions may be stored in a computer-readable storage medium or transmitted from one computer-readable storage medium to another; for example, they may be transmitted from one website, computer, server, or data center to another website, computer, server, or data center by wired means (such as coaxial cable, optical fiber, or digital subscriber line (Digital Subscriber Line, DSL)) or wireless means (such as infrared, radio, or microwave). The computer-readable storage medium may be any usable medium accessible by a computer, or a data storage device, such as a server or data center, that integrates one or more usable media. The usable medium may be a magnetic medium (for example, a floppy disk, hard disk, or magnetic tape), an optical medium (for example, a DVD), or a semiconductor medium (for example, a solid state disk (Solid State Disk, SSD)).
The above are only specific embodiments of the present application, but the protection scope of the present application is not limited thereto. Any change or replacement that a person skilled in the art can readily conceive of within the technical scope disclosed in the present application shall fall within the protection scope of the present application. Therefore, the protection scope of the present application shall be subject to the protection scope of the claims.
Claims (11)
1. A knowledge graph construction method, characterized by comprising:
obtaining industry data;
obtaining an entity set in the industry data and relationships among multiple entities in the entity set;
constructing the knowledge graph from the entity set and the relationships among the multiple entities in the entity set according to a preset data schema.
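Claim 1 leaves the "preset data schema" unspecified. As a minimal sketch, assuming the schema is just a set of allowed entity types plus a (head type, tail type) signature for each relation, the final step could admit extracted entities and relations into a triple-based graph as below. `Schema`, `KnowledgeGraph`, `build_graph`, and all example names are hypothetical illustrations, not taken from the patent.

```python
from dataclasses import dataclass, field


@dataclass
class Schema:
    """Hypothetical preset data schema: allowed entity types and, for each
    relation name, the (head type, tail type) pair it may connect."""
    entity_types: set
    relations: dict


@dataclass
class KnowledgeGraph:
    entities: dict = field(default_factory=dict)  # entity name -> entity type
    triples: set = field(default_factory=set)     # (head, relation, tail)


def build_graph(entities, relations, schema):
    """Add (name, type) entities and (head, relation, tail) triples that
    conform to the schema; return the graph plus the rejected items."""
    graph, rejected = KnowledgeGraph(), []
    for name, entity_type in entities:
        if entity_type in schema.entity_types:
            graph.entities[name] = entity_type
        else:
            rejected.append((name, entity_type))
    for head, relation, tail in relations:
        # Unknown relations give None, which never equals a type pair.
        signature = schema.relations.get(relation)
        if signature == (graph.entities.get(head), graph.entities.get(tail)):
            graph.triples.add((head, relation, tail))
        else:
            rejected.append((head, relation, tail))
    return graph, rejected
```

For example, with entity types `Company` and `Product` and a single relation `produces: (Company, Product)`, the triple `("AcmeCorp", "produces", "Widget")` is kept, while the reversed triple is rejected because its head is a `Product`.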
2. The knowledge graph construction method according to claim 1, characterized in that obtaining the entity set in the industry data and the relationships among multiple entities in the entity set comprises:
if the industry data is structured data, converting the knowledge in the industry data into the entity set and the relationships among multiple entities in the entity set according to the relational-database-to-Resource-Description-Framework mapping language (D2RML) specification and a mapping configuration file.
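Claim 2 relies on the D2RML specification and a mapping configuration file, neither of which is reproduced in this document. As a rough, hypothetical stand-in, the sketch below uses a plain Python dict as the mapping configuration and Python's built-in `sqlite3` module as the relational source. It is not D2RML (or W3C R2RML) syntax; it only illustrates turning table rows into entity and relation triples. `MAPPING`, `rows_to_triples`, the `ex:` prefix, and the table layout are all assumptions.

```python
import sqlite3

# Hypothetical mapping configuration (a stand-in, not D2RML/R2RML syntax):
# each table gives a subject IRI template, an entity class, columns that
# become attributes, and foreign-key columns that become relations.
MAPPING = {
    "company": {
        "subject": "ex:company/{id}",
        "class": "ex:Company",
        "attributes": {"name": "ex:name"},
        "relations": {},
    },
    "product": {
        "subject": "ex:product/{id}",
        "class": "ex:Product",
        "attributes": {"name": "ex:name"},
        # column -> (predicate, IRI template of the referenced row)
        "relations": {"maker_id": ("ex:madeBy", "ex:company/{}")},
    },
}


def rows_to_triples(conn, mapping):
    """Convert every row of every mapped table into (subject, predicate, object) triples."""
    triples = []
    for table, spec in mapping.items():
        # Table names come from the trusted mapping, never from user input.
        cursor = conn.execute(f"SELECT * FROM {table}")
        columns = [description[0] for description in cursor.description]
        for values in cursor:
            row = dict(zip(columns, values))
            subject = spec["subject"].format(**row)
            triples.append((subject, "rdf:type", spec["class"]))
            for column, predicate in spec["attributes"].items():
                if row[column] is not None:
                    triples.append((subject, predicate, row[column]))
            for column, (predicate, target) in spec["relations"].items():
                if row[column] is not None:
                    triples.append((subject, predicate, target.format(row[column])))
    return triples
```

In this sketch, each row becomes an entity, ordinary columns become attributes, and foreign-key columns become relations between entities. That split mirrors the "entity set plus relationships" output named in claim 2.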
3. The knowledge graph construction method according to claim 1, characterized in that obtaining the entity set in the industry data and the relationships among multiple entities in the entity set comprises:
if the industry data is unstructured data, extracting the entity set from the industry data according to a conditional random field (CRF) model, and extracting the relationships among multiple entities in the entity set in the industry data according to a support vector machine/k-nearest neighbor (SVM-KNN) classification method.
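Claim 3 names a CRF model for entity extraction but gives neither its features nor its training procedure. The sketch below shows only the decoding step of a linear-chain CRF: a Viterbi search for the BIO tag sequence with the highest summed emission and transition scores, followed by collecting the tagged entity spans. The scores in `toy_emission` and `TRANSITION` are hypothetical hand-set numbers standing in for learned weights, and the example tokens are invented.

```python
def viterbi(tokens, labels, emission, transition):
    """Return the label sequence maximising the summed emission and
    transition scores (log-potentials) of a linear-chain CRF."""
    if not tokens:
        return []
    # best[i][y]: best score of any label path for tokens[:i + 1] ending in y
    best = [{y: emission(tokens[0], y) for y in labels}]
    back = [{}]  # back[i][y]: label at position i - 1 on that best path
    for i in range(1, len(tokens)):
        best.append({})
        back.append({})
        for y in labels:
            prev, score = max(
                ((p, best[i - 1][p] + transition.get((p, y), 0.0)) for p in labels),
                key=lambda pair: pair[1],
            )
            best[i][y] = score + emission(tokens[i], y)
            back[i][y] = prev
    # Follow back-pointers from the best final label.
    path = [max(labels, key=lambda y: best[-1][y])]
    for i in range(len(tokens) - 1, 0, -1):
        path.append(back[i][path[-1]])
    return path[::-1]


def bio_spans(tokens, tags):
    """Join B/I-tagged token runs into entity strings (a stray I acts like O)."""
    entities, current = [], []
    for token, tag in zip(tokens, tags):
        if tag == "I" and current:
            current.append(token)
            continue
        if current:
            entities.append(" ".join(current))
        current = [token] if tag == "B" else []
    if current:
        entities.append(" ".join(current))
    return entities


# Hypothetical toy scores standing in for a trained model's weights.
LABELS = ["B", "I", "O"]
START_WORDS, INSIDE_WORDS = {"Acme", "rebar"}, {"Steel"}
TRANSITION = {("O", "I"): -5.0, ("B", "I"): 1.0, ("I", "I"): 0.5}


def toy_emission(token, label):
    if token in START_WORDS:
        return {"B": 2.0, "I": 0.5, "O": 0.0}[label]
    if token in INSIDE_WORDS:
        return {"B": 0.5, "I": 2.0, "O": 0.0}[label]
    return {"B": 0.0, "I": 0.0, "O": 1.0}[label]
```

With these toy scores, the tokens `["Acme", "Steel", "produces", "rebar"]` decode to `B I O B`, which yields the entities `"Acme Steel"` and `"rebar"`. The strongly negative `O -> I` transition is what discourages ill-formed tag sequences.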
4. The knowledge graph construction method according to claim 1, characterized in that obtaining industry data comprises:
obtaining target webpages according to seed vocabulary, and classifying the target webpages by website, wherein the seed vocabulary is industry-specific vocabulary, and the target webpages include web documents and external links of encyclopedia webpages;
crawling, according to a preset depth value, the target webpages corresponding to each website to collect the content of that website;
if the occurrence frequency of the corresponding seed vocabulary in the content of a website exceeds a threshold, taking that website as a corresponding syndicated data source;
obtaining the industry data from the syndicated data source.
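The source-selection steps of claim 4 can be sketched as follows. The sketch rests on several assumptions that the claim does not state: pages are fetched through a caller-supplied `fetch(url)` function (a dict lookup in the test rather than real HTTP), the depth-limited crawl stays within each website, and "occurrence frequency" is taken as a raw count of seed-word tokens. The function names, URLs, and words are hypothetical.

```python
from collections import defaultdict, deque
from urllib.parse import urlparse


def group_by_site(urls):
    """Classify target webpages by website (host name)."""
    sites = defaultdict(list)
    for url in urls:
        sites[urlparse(url).netloc].append(url)
    return sites


def crawl_site(start_urls, fetch, max_depth):
    """Breadth-first crawl from one site's target pages, following only
    same-site links and at most max_depth hops; returns the page texts.
    fetch(url) returns (text, links) or None if the page is unavailable."""
    site = urlparse(start_urls[0]).netloc
    seen = set(start_urls)
    queue = deque((url, 0) for url in start_urls)
    texts = []
    while queue:
        url, depth = queue.popleft()
        page = fetch(url)
        if page is None:
            continue
        text, links = page
        texts.append(text)
        if depth < max_depth:
            for link in links:
                if urlparse(link).netloc == site and link not in seen:
                    seen.add(link)
                    queue.append((link, depth + 1))
    return texts


def select_sources(target_urls, fetch, seed_words, max_depth, threshold):
    """Keep each website whose crawled content contains more than
    `threshold` occurrences of (lower-case) seed words."""
    sources = []
    for site, urls in group_by_site(target_urls).items():
        words = " ".join(crawl_site(urls, fetch, max_depth)).lower().split()
        hits = sum(1 for word in words if word in seed_words)
        if hits > threshold:
            sources.append(site)
    return sources
```

The depth limit bounds how much of each site is sampled before the threshold test. A larger depth gathers more evidence per site but costs more fetches.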
5. A knowledge graph construction device, characterized by comprising:
an acquiring unit, configured to obtain industry data;
the acquiring unit being further configured to obtain an entity set in the industry data and relationships among multiple entities in the entity set;
a construction unit, configured to construct the knowledge graph from the entity set and the relationships among the multiple entities in the entity set according to a preset data schema.
6. The knowledge graph construction device according to claim 5, characterized in that the acquiring unit is specifically configured to:
if the industry data is structured data, convert the knowledge in the industry data into the entity set and the relationships among multiple entities in the entity set according to the relational-database-to-Resource-Description-Framework mapping language (D2RML) specification and a mapping configuration file.
7. The knowledge graph construction device according to claim 5, characterized in that the acquiring unit is specifically configured to:
if the industry data is unstructured data, extract the entity set from the industry data according to a CRF model, and extract the relationships among multiple entities in the entity set in the industry data according to an SVM-KNN classification method.
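Claims 3 and 7 name an SVM-KNN classification method for relation extraction without detailing it. The sketch below shows one common form of the SVM-KNN decision rule under stated assumptions: a binary linear SVM whose weights `w`, `b` and support vectors are already available (training is not shown), a feature vector `x` describing a candidate entity pair, and Euclidean distance in the input space. Some SVM-KNN variants measure distance in the kernel feature space instead, and multi-class relation types would need, for example, one-vs-rest combination. `svm_knn_predict`, `eps`, and `k` are hypothetical names.

```python
import math
from collections import Counter


def svm_knn_predict(x, w, b, support_vectors, eps=0.5, k=3):
    """Classify feature vector x (labels +1 / -1).

    Far from the SVM hyperplane (|w.x + b| >= eps) the linear SVM decides;
    near it, the k nearest support vectors (Euclidean distance) vote."""
    score = sum(wi * xi for wi, xi in zip(w, x)) + b
    if abs(score) >= eps:
        return 1 if score > 0 else -1
    nearest = sorted(support_vectors, key=lambda sv: math.dist(x, sv[0]))[:k]
    return Counter(label for _, label in nearest).most_common(1)[0][0]
```

For instance, with the placeholder values `w = [1.0, 0.0]`, `b = 0.0`, two negative support vectors near `(-0.5, 5.0)`, and two positive ones near `(0.5, 0.0)`, the point `(0.1, 5.0)` lies within `eps` of the hyperplane. The neighbour vote therefore assigns it `-1`, even though its raw SVM score is slightly positive.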
8. The knowledge graph construction device according to claim 5, characterized in that the acquiring unit is specifically configured to:
obtain target webpages according to seed vocabulary, and classify the target webpages by website, wherein the seed vocabulary is industry-specific vocabulary, and the target webpages include web documents and external links of encyclopedia webpages;
crawl, according to a preset depth value, the target webpages corresponding to each website to collect the content of that website;
if the occurrence frequency of the corresponding seed vocabulary in the content of a website exceeds a threshold, take that website as a corresponding syndicated data source;
obtain the industry data from the syndicated data source.
9. A computer-readable storage medium storing one or more programs, characterized in that the one or more programs comprise instructions which, when executed by a computer, cause the computer to perform the knowledge graph construction method according to any one of claims 1-4.
10. A computer program product containing instructions, characterized in that, when the instructions are run on a computer, the computer is caused to perform the knowledge graph construction method according to any one of claims 1-4.
11. A knowledge graph construction device, characterized by comprising a processor and a memory, wherein the memory is configured to store a program, and the processor calls the program stored in the memory to perform the knowledge graph construction method according to any one of claims 1-4.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201811236863.3A CN109446341A (en) | 2018-10-23 | 2018-10-23 | The construction method and device of knowledge mapping |
Applications Claiming Priority (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201811236863.3A CN109446341A (en) | 2018-10-23 | 2018-10-23 | The construction method and device of knowledge mapping |
Publications (1)
Publication Number | Publication Date |
---|---|
CN109446341A | 2019-03-08 |
Family
ID=65547730
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN201811236863.3A Pending CN109446341A (en) | 2018-10-23 | 2018-10-23 | The construction method and device of knowledge mapping |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN109446341A (en) |
Cited By (29)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN109960810A (en) * | 2019-03-28 | 2019-07-02 | 科大讯飞(苏州)科技有限公司 | A kind of entity alignment schemes and device |
CN110245241A (en) * | 2019-06-18 | 2019-09-17 | 卓尔智联(武汉)研究院有限公司 | Plastics knowledge mapping construction device, method and computer readable storage medium |
CN110275919A (en) * | 2019-06-18 | 2019-09-24 | 合肥工业大学 | Data integrating method and device |
CN110298036A (en) * | 2019-06-06 | 2019-10-01 | 昆明理工大学 | A kind of online medical text symptom identification method based on part of speech increment iterative |
CN110489560A (en) * | 2019-06-19 | 2019-11-22 | 民生科技有限责任公司 | The little Wei enterprise portrait generation method and device of knowledge based graphical spectrum technology |
CN110489395A (en) * | 2019-07-27 | 2019-11-22 | 西南电子技术研究所(中国电子科技集团公司第十研究所) | Automatically the method for multi-source heterogeneous data knowledge is obtained |
CN110597969A (en) * | 2019-08-12 | 2019-12-20 | 中国农业大学 | Agricultural knowledge intelligent question and answer method and system and electronic equipment |
CN110750647A (en) * | 2019-10-17 | 2020-02-04 | 北京华宇信息技术有限公司 | Construction method of ELP model of multi-source heterogeneous information data |
CN110750650A (en) * | 2019-09-30 | 2020-02-04 | 中盈优创资讯科技有限公司 | Construction method and device of enterprise knowledge graph |
CN110781249A (en) * | 2019-10-16 | 2020-02-11 | 华电国际电力股份有限公司技术服务分公司 | Knowledge graph-based multi-source data fusion method and device for thermal power plant |
CN110795567A (en) * | 2019-09-29 | 2020-02-14 | 北京远舢智能科技有限公司 | Knowledge graph platform |
CN110990586A (en) * | 2019-12-02 | 2020-04-10 | 浪潮软件股份有限公司 | Method and device for acquiring map data |
CN111061883A (en) * | 2019-10-25 | 2020-04-24 | 珠海格力电器股份有限公司 | Method, device and equipment for updating knowledge graph and storage medium |
CN111104525A (en) * | 2019-12-31 | 2020-05-05 | 西安理工大学 | Construction method of building design specification knowledge graph based on graph database |
CN111241299A (en) * | 2020-01-09 | 2020-06-05 | 重庆理工大学 | Knowledge graph automatic construction method for legal consultation and retrieval system thereof |
CN111324609A (en) * | 2020-02-17 | 2020-06-23 | 腾讯云计算(北京)有限责任公司 | Knowledge graph construction method and device, electronic equipment and storage medium |
CN111563170A (en) * | 2020-04-30 | 2020-08-21 | 北京明略软件系统有限公司 | Knowledge graph generation method and device, computer storage medium and terminal |
WO2020232943A1 (en) * | 2019-05-23 | 2020-11-26 | 广州市香港科大霍英东研究院 | Knowledge graph construction method for event prediction and event prediction method |
CN112214611A (en) * | 2020-09-24 | 2021-01-12 | 远光软件股份有限公司 | Construction system and method of enterprise knowledge graph |
CN112463984A (en) * | 2020-12-04 | 2021-03-09 | 北京明略软件系统有限公司 | Database mode expansion method, device, equipment and computer readable medium |
CN112487212A (en) * | 2020-12-18 | 2021-03-12 | 清华大学 | Method and device for constructing domain knowledge graph |
CN112527924A (en) * | 2020-12-18 | 2021-03-19 | 清华大学 | Dynamically updated knowledge graph expansion method and device |
CN112765363A (en) * | 2021-01-19 | 2021-05-07 | 昆明理工大学 | Demand map construction method for scientific and technological service demand |
CN113505245A (en) * | 2021-09-10 | 2021-10-15 | 深圳平安综合金融服务有限公司 | Knowledge graph generation method, computer readable storage medium and computer device |
CN113722509A (en) * | 2021-09-07 | 2021-11-30 | 中国人民解放军32801部队 | Knowledge graph data fusion method based on entity attribute similarity |
CN113783876A (en) * | 2021-09-13 | 2021-12-10 | 国网电子商务有限公司 | Network security situation perception method based on graph neural network and related equipment |
WO2022051996A1 (en) * | 2020-09-10 | 2022-03-17 | 西门子(中国)有限公司 | Method and apparatus for constructing knowledge graph |
WO2023040530A1 (en) * | 2021-09-18 | 2023-03-23 | 华为技术有限公司 | Webpage content traceability method, knowledge graph construction method and related device |
CN116955639A (en) * | 2023-04-24 | 2023-10-27 | 浙商期货有限公司 | Method and device for constructing future industry chain knowledge graph and computer equipment |
Citations (4)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US20110035210A1 (en) * | 2009-08-10 | 2011-02-10 | Benjamin Rosenfeld | Conditional random fields (crf)-based relation extraction system |
CN104268197A (en) * | 2013-09-22 | 2015-01-07 | 中科嘉速(北京)并行软件有限公司 | Industry comment data fine grain sentiment analysis method |
CN106355628A (en) * | 2015-07-16 | 2017-01-25 | 中国石油化工股份有限公司 | Image-text knowledge point marking method and device and image-text mark correcting method and system |
CN108446368A (en) * | 2018-03-15 | 2018-08-24 | 湖南工业大学 | A kind of construction method and equipment of Packaging Industry big data knowledge mapping |
- 2018-10-23 CN CN201811236863.3A (CN109446341A) active, pending
Non-Patent Citations (2)
Title |
---|
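Liu Shaoyu (刘绍毓) et al.: "Entity relation extraction method based on multi-class SVM_KNN" (基于多分类SVM_KNN的实体关系抽取方法), Journal of Data Acquisition and Processing (数据采集与处理) * |
Hu Fanghuai (胡芳槐): "Research on Chinese knowledge graph construction methods based on multiple data sources" (基于多种数据源的中文知识图谱构建方法研究), China Doctoral Dissertations Full-text Database, Information Science and Technology (中国博士学位论文全文数据库 信息科技辑) * |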
Cited By (38)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN109960810A (en) * | 2019-03-28 | 2019-07-02 | 科大讯飞(苏州)科技有限公司 | A kind of entity alignment schemes and device |
WO2020232943A1 (en) * | 2019-05-23 | 2020-11-26 | 广州市香港科大霍英东研究院 | Knowledge graph construction method for event prediction and event prediction method |
CN110298036A (en) * | 2019-06-06 | 2019-10-01 | 昆明理工大学 | A kind of online medical text symptom identification method based on part of speech increment iterative |
CN110298036B (en) * | 2019-06-06 | 2022-07-22 | 昆明理工大学 | Online medical text symptom identification method based on part-of-speech incremental iteration |
CN110275919A (en) * | 2019-06-18 | 2019-09-24 | 合肥工业大学 | Data integrating method and device |
CN110245241A (en) * | 2019-06-18 | 2019-09-17 | 卓尔智联(武汉)研究院有限公司 | Plastics knowledge mapping construction device, method and computer readable storage medium |
CN110489560A (en) * | 2019-06-19 | 2019-11-22 | 民生科技有限责任公司 | The little Wei enterprise portrait generation method and device of knowledge based graphical spectrum technology |
CN110489395A (en) * | 2019-07-27 | 2019-11-22 | 西南电子技术研究所(中国电子科技集团公司第十研究所) | Automatically the method for multi-source heterogeneous data knowledge is obtained |
CN110489395B (en) * | 2019-07-27 | 2022-07-29 | 西南电子技术研究所(中国电子科技集团公司第十研究所) | Method for automatically acquiring knowledge of multi-source heterogeneous data |
CN110597969A (en) * | 2019-08-12 | 2019-12-20 | 中国农业大学 | Agricultural knowledge intelligent question and answer method and system and electronic equipment |
CN110597969B (en) * | 2019-08-12 | 2022-05-24 | 中国农业大学 | Agricultural knowledge intelligent question and answer method and system and electronic equipment |
CN110795567A (en) * | 2019-09-29 | 2020-02-14 | 北京远舢智能科技有限公司 | Knowledge graph platform |
CN110750650A (en) * | 2019-09-30 | 2020-02-04 | 中盈优创资讯科技有限公司 | Construction method and device of enterprise knowledge graph |
CN110781249A (en) * | 2019-10-16 | 2020-02-11 | 华电国际电力股份有限公司技术服务分公司 | Knowledge graph-based multi-source data fusion method and device for thermal power plant |
CN110750647B (en) * | 2019-10-17 | 2020-07-31 | 北京华宇信息技术有限公司 | Method for constructing ELP model of multi-source heterogeneous information data |
CN110750647A (en) * | 2019-10-17 | 2020-02-04 | 北京华宇信息技术有限公司 | Construction method of ELP model of multi-source heterogeneous information data |
CN111061883B (en) * | 2019-10-25 | 2023-12-08 | 珠海格力电器股份有限公司 | Method, device, equipment and storage medium for updating knowledge graph |
CN111061883A (en) * | 2019-10-25 | 2020-04-24 | 珠海格力电器股份有限公司 | Method, device and equipment for updating knowledge graph and storage medium |
CN110990586A (en) * | 2019-12-02 | 2020-04-10 | 浪潮软件股份有限公司 | Method and device for acquiring map data |
CN111104525A (en) * | 2019-12-31 | 2020-05-05 | 西安理工大学 | Construction method of building design specification knowledge graph based on graph database |
CN111104525B (en) * | 2019-12-31 | 2022-03-25 | 西安理工大学 | Construction method of building design specification knowledge graph based on graph database |
CN111241299A (en) * | 2020-01-09 | 2020-06-05 | 重庆理工大学 | Knowledge graph automatic construction method for legal consultation and retrieval system thereof |
CN111324609A (en) * | 2020-02-17 | 2020-06-23 | 腾讯云计算(北京)有限责任公司 | Knowledge graph construction method and device, electronic equipment and storage medium |
CN111563170A (en) * | 2020-04-30 | 2020-08-21 | 北京明略软件系统有限公司 | Knowledge graph generation method and device, computer storage medium and terminal |
WO2022051996A1 (en) * | 2020-09-10 | 2022-03-17 | 西门子(中国)有限公司 | Method and apparatus for constructing knowledge graph |
CN112214611A (en) * | 2020-09-24 | 2021-01-12 | 远光软件股份有限公司 | Construction system and method of enterprise knowledge graph |
CN112214611B (en) * | 2020-09-24 | 2023-10-31 | 远光软件股份有限公司 | Enterprise knowledge graph construction system and method |
CN112463984A (en) * | 2020-12-04 | 2021-03-09 | 北京明略软件系统有限公司 | Database mode expansion method, device, equipment and computer readable medium |
CN112463984B (en) * | 2020-12-04 | 2024-02-27 | 北京明略软件系统有限公司 | Database schema extension method, device, equipment and computer readable medium |
CN112527924A (en) * | 2020-12-18 | 2021-03-19 | 清华大学 | Dynamically updated knowledge graph expansion method and device |
CN112487212A (en) * | 2020-12-18 | 2021-03-12 | 清华大学 | Method and device for constructing domain knowledge graph |
CN112765363A (en) * | 2021-01-19 | 2021-05-07 | 昆明理工大学 | Demand map construction method for scientific and technological service demand |
CN113722509A (en) * | 2021-09-07 | 2021-11-30 | 中国人民解放军32801部队 | Knowledge graph data fusion method based on entity attribute similarity |
CN113505245A (en) * | 2021-09-10 | 2021-10-15 | 深圳平安综合金融服务有限公司 | Knowledge graph generation method, computer readable storage medium and computer device |
CN113783876A (en) * | 2021-09-13 | 2021-12-10 | 国网电子商务有限公司 | Network security situation perception method based on graph neural network and related equipment |
CN113783876B (en) * | 2021-09-13 | 2023-10-03 | 国网数字科技控股有限公司 | Network security situation awareness method based on graph neural network and related equipment |
WO2023040530A1 (en) * | 2021-09-18 | 2023-03-23 | 华为技术有限公司 | Webpage content traceability method, knowledge graph construction method and related device |
CN116955639A (en) * | 2023-04-24 | 2023-10-27 | 浙商期货有限公司 | Method and device for constructing future industry chain knowledge graph and computer equipment |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
CN109446341A (en) | The construction method and device of knowledge mapping | |
US11790006B2 (en) | Natural language question answering systems | |
US11442932B2 (en) | Mapping natural language to queries using a query grammar | |
CN110633409B (en) | Automobile news event extraction method integrating rules and deep learning | |
CN106447066A (en) | Big data feature extraction method and device | |
CN105843897A (en) | Vertical domain-oriented intelligent question and answer system | |
US20150006528A1 (en) | Hierarchical data structure of documents | |
JP7486250B2 (en) | Domain-specific language interpreter and interactive visual interface for rapid screening | |
CN108874783A (en) | Power information O&M knowledge model construction method | |
CN112434024B (en) | Relational database-oriented data dictionary generation method, device, equipment and medium | |
Rajput et al. | BNOSA: A Bayesian network and ontology based semantic annotation framework | |
CN112925901B (en) | Evaluation resource recommendation method for assisting online questionnaire evaluation and application thereof | |
Holzinger et al. | Using ontologies for extracting product features from web pages | |
US20220129635A1 (en) | Semantic model instantiation method, system and apparatus | |
US20230325384A1 (en) | Interactive assistance for executing natural language queries to data sets | |
CN117312989A (en) | Context-aware column semantic recognition method and system based on GCN and RoBERTa | |
US20210271637A1 (en) | Creating descriptors for business analytics applications | |
JP2023517518A (en) | Vector embedding model for relational tables with null or equivalent values | |
CN114429384B (en) | Intelligent product recommendation method and system based on e-commerce platform | |
CN115982322A (en) | Water conservancy industry design field knowledge graph retrieval method and retrieval system | |
CN113515630B (en) | Triplet generation and verification method and device, electronic equipment and storage medium | |
CN113379432B (en) | Sales system customer matching method based on machine learning | |
CN114911940A (en) | Text emotion recognition method and device, electronic equipment and storage medium | |
CN113344674A (en) | Product recommendation method, device, equipment and storage medium based on user purchasing power | |
CN110930189A (en) | Personalized marketing method based on user behaviors |
Legal Events
Code | Title | Description |
---|---|---|
PB01 | Publication | |
SE01 | Entry into force of request for substantive examination | |
RJ01 | Rejection of invention patent application after publication | Application publication date: 20190308 |