CN109597855A - Domain knowledge map construction method and system based on big data driving - Google Patents
- Publication number
- CN109597855A CN201811447248.7A
- Authority
- CN
- China
- Prior art keywords
- data
- entity
- information
- knowledge
- map construction
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Pending
Landscapes
- Information Retrieval, Db Structures And Fs Structures Therefor (AREA)
Abstract
The invention discloses a big-data-driven method and system for constructing a domain knowledge graph. The method comprises the following steps: crawling data sources on the network to obtain first data information; performing information extraction on the data sources to extract association information between entities; performing knowledge fusion on the association information between entities and building a relational database; and converting the relational database into a graph database model to construct the knowledge graph. The method provides a rigorous and rich data schema, supports a variety of complex analysis applications and decision support, achieves high accuracy, and offers practical guidance and industrial relevance for the actual implementation of knowledge graphs.
Description
Technical field
The present invention relates to the technical field of information processing, and in particular to a big-data-driven domain knowledge graph construction method and system.
Background technique
A domain knowledge graph is a semantic network built from the semantic relations between entities that are extracted from specific resources in a specific domain. The knowledge system it contains usually has a strong domain focus and a high degree of specialization. However, existing patents on domain knowledge graph construction, both domestic and foreign, each emphasize one aspect of the construction pipeline in isolation. They mainly concern key natural language processing techniques in knowledge graphs, including entity recognition, relation recognition, entity linking, knowledge fusion and knowledge computation, as well as issues such as data representation, storage formats, and knowledge acquisition methods and models. Another problem is that knowledge is made up of data, and building a knowledge graph requires the support of a big data platform, yet existing work rarely addresses the big data processing flow of the construction process and therefore offers little guidance for actual implementation.
As a new knowledge representation method, the knowledge graph belongs to the scope of the Semantic Web. Its goal is to describe the various entities and concepts that exist in the real world and the association relations between them. According to coverage, knowledge graphs can be divided into general knowledge graphs and domain knowledge graphs. Most knowledge graphs published so far are general knowledge graphs, which emphasize breadth, are mainly applied to services such as search, and do not have very high accuracy requirements.
For example: (1) the related art discloses a method for constructing a knowledge graph for a vertical domain, which includes extracting the vocabulary of classes from online encyclopedias, merging the hypernym-hyponym relations between classes, defining the data attributes and relation attributes of the domain, and finally completing the learning of the entity layer; (2) the related art discloses a knowledge graph construction method based on dependency relations between knowledge points, which builds a knowledge point database composed of meta-knowledge points by acquiring meta-knowledge points; selects, according to the teaching content, a meta-knowledge point that characterizes an item of knowledge, and combines it with the basic knowledge points on which it depends; determines the path length of each meta-knowledge point relative to the first meta-knowledge point in the combination of basic knowledge points; and constructs the knowledge graph according to the dependency levels and path lengths; (3) the related art discloses a Chinese tourism-domain knowledge graph construction method and system, which integrates a hybrid entity attribute knowledge expansion method based on lexical fields, supervised learning and pattern matching with an entity attribute knowledge expansion algorithm based on search engine question answering, in order to accomplish the tourism-domain knowledge graph construction task.
Summary of the invention
The present invention aims to solve, at least to some extent, one of the technical problems in the related art.
To this end, one object of the present invention is to provide a big-data-driven domain knowledge graph construction method. The method provides a rigorous and rich data schema, supports a variety of complex analysis applications and decision support, achieves high accuracy, and offers practical guidance and industrial relevance for the actual implementation of knowledge graphs.
Another object of the present invention is to propose a big-data-driven domain knowledge graph construction system.
To achieve the above objects, one aspect of the present invention proposes a big-data-driven domain knowledge graph construction method, comprising the following steps: crawling data sources on the network to obtain first data information; performing information extraction on the data sources to extract association information between entities; performing knowledge fusion on the association information between the entities and building a relational database; and converting the relational database into a graph database model to construct the knowledge graph.
The big-data-driven domain knowledge graph construction method of the embodiments of the present invention addresses every stage of the knowledge graph construction pipeline and provides practical technical guidance for building domain knowledge graphs, so that the resulting domain knowledge graph has high accuracy and a rich, rigorous data schema and can support complex analysis and decision making. The construction process offers practical guidance and industrial relevance, and is of considerable significance for real-world production and daily life.
In addition, the big-data-driven domain knowledge graph construction method according to the above embodiment of the present invention may further have the following additional technical features:
Further, in one embodiment of the present invention, the data sources include structured data, semi-structured data and unstructured data.
Further, in one embodiment of the present invention, performing information extraction on the data sources comprises: extracting structured information on entities, relations and entity attributes from the semi-structured and unstructured data of the data sources, to obtain the association information.
Further, in one embodiment of the present invention, performing knowledge fusion on the association information between the entities comprises: extracting information features from the association information between the entities, so as to eliminate concept ambiguity and remove redundant and erroneous concepts; and performing entity linking on the information features to obtain relational data.
Further, in one embodiment of the present invention, performing entity linking on the information features comprises: linking the information features to the corresponding correct entity objects in a knowledge base.
Further, in one embodiment of the present invention, performing knowledge fusion on the association information between the entities and building the relational database further comprises: extracting entity mentions; detecting, according to the entity mentions, whether same-named entities denote different meanings and whether other named entities denote the same meaning, so as to perform entity disambiguation and coreference resolution; and, after the corresponding entity object in the knowledge base has been confirmed, linking the entity mention to that entity object.
Further, in one embodiment of the present invention, the method further comprises: after a preset duration, crawling the data sources again to obtain second data information; judging, according to the second data information, whether the first data information has changed; and, if the first data information has changed, obtaining the changed data and converting the changed data into the graph database model so as to merge it into the knowledge graph.
To achieve the above objects, another aspect of the present invention proposes a big-data-driven domain knowledge graph construction system, comprising: an acquisition module for crawling data sources on the network and obtaining first data information; a processing module for performing information extraction on the data sources to extract association information between entities; a storage module for performing knowledge fusion on the association information between the entities and building a relational database; and a construction module for converting the relational database into a graph database model to construct the knowledge graph.
The big-data-driven domain knowledge graph construction system of the embodiments of the present invention addresses every stage of the knowledge graph construction pipeline and provides practical technical guidance for building domain knowledge graphs, so that the resulting domain knowledge graph has high accuracy and a rich, rigorous data schema and can support complex analysis and decision making. The construction process offers practical guidance and industrial relevance, and is of considerable significance for real-world production and daily life.
In addition, the big-data-driven domain knowledge graph construction system according to the above embodiment of the present invention may further have the following additional technical features:
Further, in one embodiment of the present invention, the data sources include structured data, semi-structured data and unstructured data.
Further, in one embodiment of the present invention, the system further comprises: an update module for, after a preset duration, crawling the data sources again to obtain second data information, judging according to the second data information whether the first data information has changed, and, if the data has changed, converting the changed data into the graph database model so as to merge it into the knowledge graph.
Additional aspects and advantages of the present invention will be set forth in part in the following description, and in part will become apparent from that description or be learned through practice of the invention.
Brief description of the drawings
The above and/or additional aspects and advantages of the present invention will become apparent and readily understood from the following description of the embodiments taken in conjunction with the accompanying drawings, in which:
Fig. 1 is a framework diagram of the formal definition of a knowledge graph according to one embodiment of the present invention;
Fig. 2 is a flowchart of the big-data-driven domain knowledge graph construction method according to one embodiment of the present invention;
Fig. 3 is a diagram of the conversion from a relational data schema to a graph database schema according to one embodiment of the present invention;
Fig. 4 is a framework diagram of big-data-driven domain knowledge graph construction according to one embodiment of the present invention;
Fig. 5 is a flowchart of retrieving "Facial Recognition" in Wikipedia according to a specific embodiment of the present invention;
Fig. 6 is a data update flowchart according to another specific embodiment of the present invention;
Fig. 7 is a schematic structural diagram of the big-data-driven domain knowledge graph construction system according to one embodiment of the present invention.
Detailed description of the embodiments
Embodiments of the present invention are described in detail below. Examples of the embodiments are shown in the accompanying drawings, in which the same or similar reference numerals throughout denote the same or similar elements or elements having the same or similar functions. The embodiments described below with reference to the drawings are exemplary; they are intended to explain the present invention and should not be construed as limiting it.
First, a knowledge graph is formally defined as follows. Logically, a knowledge graph can be divided into two levels: a data layer and a schema layer. In the data layer, knowledge is stored in a graph database in units of facts, with "entity-relation-entity" or "entity-attribute-attribute value" triples as the basic way of expressing a fact. All the facts stored in the graph database form a huge entity-relation network, which constitutes the knowledge graph. The schema layer sits above the data layer and is the core of the knowledge graph. It stores refined knowledge and is usually managed with an ontology library, whose support for axioms, rules and constraints is used to regulate entities, relations, and the connections between objects such as entity types and attributes.
Accordingly, the embodiments of the present invention define a knowledge graph as follows: a knowledge graph $G$ consists of a schema graph $G_s$, a data graph $G_d$, and the relation $R$ between the two, which can be expressed as formula (1).
$G = \langle G_s, G_d, R \rangle$ (1)
$G_s = \langle N_s, E_s \rangle$ (2)
$G_d = \langle N_d, E_d \rangle$ (3)
As shown in Fig. 1, the schema graph $G_s$ consists of $N_s$ and $E_s$ and can be expressed as formula (2), where $N_s$ denotes the set of class nodes and $E_s$ denotes the set of attribute edges. A class (node) in the schema graph $G_s$ is a concept in the knowledge graph, and an attribute (edge) corresponds to a semantic relation between concepts. The data graph $G_d$ consists of $N_d$ and $E_d$ and can be expressed as formula (3), where $N_d$ denotes the set of instance nodes, each of which is an entity that exists in the real world, and $E_d$ denotes the set of instance relations (edges); an edge connecting two nodes represents one triple fact.
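The definition above can be illustrated with a small data structure. The following is a minimal sketch, not the patent's implementation: the class names `Edge`, `Graph` and `KnowledgeGraph` are hypothetical, and R is assumed here to be a mapping from each instance node to its class node, which the source does not spell out.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class Edge:
    source: str   # name of the source node
    label: str    # attribute (schema graph) or relation (data graph)
    target: str   # name of the target node


@dataclass
class Graph:
    nodes: set = field(default_factory=set)   # N_s or N_d
    edges: set = field(default_factory=set)   # E_s or E_d


@dataclass
class KnowledgeGraph:
    schema: Graph                                     # G_s
    data: Graph                                       # G_d
    instance_of: dict = field(default_factory=dict)   # R (assumed: instance -> class)

    def add_fact(self, subj, relation, obj):
        """Add one triple fact to G_d after checking it against G_s via R."""
        s_cls, o_cls = self.instance_of[subj], self.instance_of[obj]
        if Edge(s_cls, relation, o_cls) not in self.schema.edges:
            raise ValueError(f"schema has no edge {s_cls}-{relation}-{o_cls}")
        self.data.nodes.update({subj, obj})
        self.data.edges.add(Edge(subj, relation, obj))


# Illustrative schema and instances (invented for this example).
schema = Graph({"Company", "Person"}, {Edge("Person", "works_for", "Company")})
kg = KnowledgeGraph(schema, Graph(), {"Alice": "Person", "Acme": "Company"})
kg.add_fact("Alice", "works_for", "Acme")
```

Checking each fact against the schema graph is one way to obtain the "rigorous data schema" the text describes; a fact whose classes are not connected by a matching attribute edge is rejected.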
The big-data-driven domain knowledge graph construction method and system proposed according to embodiments of the present invention are described below with reference to the drawings. The method is described first.
Fig. 2 is a flowchart of the big-data-driven domain knowledge graph construction method according to one embodiment of the present invention.
As shown in Fig. 2, the big-data-driven domain knowledge graph construction method comprises the following steps:
In step S101, data sources on the network are crawled to obtain first data information.
The data sources include structured data, semi-structured data and unstructured data.
Specifically, structured data includes the large amount of linked open data and the domain knowledge stored in relational databases. Semi-structured data includes the infoboxes provided by encyclopedia websites such as Wikipedia, Hudong Baike and Baidu Baike, as well as the many tables and lists contained in vertical websites in different domains. Unstructured data refers to the large amount of plain text content in network data; it has the widest knowledge coverage but is also the hardest to extract from. It usually has to be preprocessed with natural language processing (NLP) techniques, including word segmentation, part-of-speech tagging, named entity recognition and syntactic analysis, after which knowledge is obtained through techniques such as statistical analysis and machine learning. Most of the data sources used to construct a knowledge graph come from Internet resources and need to be obtained by crawlers.
In step S102, information extraction is performed on the data sources to extract association information between entities.
Further, performing information extraction on the data sources comprises: extracting structured information on entities, relations and entity attributes from the semi-structured and unstructured data of the data sources, to obtain the association information.
Specifically, information extraction is the first step of knowledge graph construction. It is a technique for automatically extracting structured information, such as entities, relations and entity attributes, from semi-structured and unstructured data.
Entity extraction, i.e. named entity recognition, refers to automatically identifying named entities in a text data set. The quality of entity extraction greatly affects the efficiency and quality of subsequent knowledge acquisition, so it is the most basic and critical part of information extraction.
Relation extraction: after entity extraction, what a text corpus yields is a series of discrete named entities. To obtain semantic information, the association relations between entities must also be extracted from the relevant corpus; only by connecting entities (concepts) through relations can a networked knowledge structure be formed.
The goal of entity attribute extraction is to collect the attribute information of a specific entity from different information sources. For a public figure, for example, information such as nickname, birthday, nationality and educational background can be obtained from public information on the network. Attribute extraction techniques can gather this information from multiple data sources and so produce a complete description of the entity's attributes.
It should be noted that entity extraction and relation extraction are mainly implemented with machine learning models, whereas attribute extraction mainly relies on semi-structured data on the network such as infoboxes.
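As an illustration of attribute extraction from infobox-like semi-structured data, the following minimal sketch turns the header/value rows of an HTML table into "entity-attribute-attribute value" triples using only the Python standard library. The class `InfoboxParser`, the function `infobox_to_triples` and the sample markup are assumptions made for this example; real encyclopedia pages have more varied markup.

```python
from html.parser import HTMLParser


class InfoboxParser(HTMLParser):
    """Collect (attribute, value) pairs from the <th>/<td> cells of a table."""

    def __init__(self):
        super().__init__()
        self.pairs = []     # extracted (attribute, value) pairs
        self._cell = None   # tag of the cell currently being read
        self._text = []     # text fragments of the current cell
        self._key = None    # most recent <th> text still awaiting its <td>

    def handle_starttag(self, tag, attrs):
        if tag in ("th", "td"):
            self._cell, self._text = tag, []

    def handle_data(self, data):
        if self._cell:
            self._text.append(data)

    def handle_endtag(self, tag):
        if tag != self._cell:
            return
        text = " ".join("".join(self._text).split())   # normalize whitespace
        if tag == "th":
            self._key = text
        elif self._key:
            self.pairs.append((self._key, text))
            self._key = None
        self._cell = None


def infobox_to_triples(entity, html):
    """Return (entity, attribute, value) triples for one infobox."""
    parser = InfoboxParser()
    parser.feed(html)
    return [(entity, attr, value) for attr, value in parser.pairs]


# Invented sample markup, for illustration only.
SAMPLE = """
<table class="infobox">
  <tr><th>Founded</th><td>1999</td></tr>
  <tr><th>Headquarters</th><td>Example  City</td></tr>
</table>
"""
triples = infobox_to_triples("Example Corp", SAMPLE)
```

Entity and relation extraction from plain text would instead call a trained model, as the text notes, and is not covered by this sketch.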
In step S103, knowledge fusion is performed on the association information between entities, and a relational database is built.
Further, in one embodiment of the present invention, performing knowledge fusion on the association information between entities comprises: extracting information features from the association information between entities, so as to eliminate concept ambiguity and remove redundant and erroneous concepts; and performing entity linking on the information features to obtain relational data.
In addition, step S103 further comprises: extracting entity mentions; detecting, according to the entity mentions, whether same-named entities denote different meanings and whether other named entities denote the same meaning, so as to perform entity disambiguation and coreference resolution; and, after the corresponding entity object in the knowledge base has been confirmed, linking the entity mention to that entity object.
It should be noted that information extraction achieves the goal of obtaining entities, relations and entity attribute information from unstructured and semi-structured data. However, the results may contain a large amount of redundant and erroneous information, and the relations between data items are flat, lacking hierarchy and logic, so the results must be cleaned and integrated. Knowledge fusion eliminates concept ambiguity and removes redundant and erroneous concepts, thereby ensuring the quality of the knowledge. Knowledge fusion includes entity linking and knowledge merging.
Entity linking refers to the operation of linking an entity object extracted from text to the corresponding correct entity object in the knowledge base. The general flow is as follows: an entity mention is obtained from the text by entity extraction; entity disambiguation and coreference resolution are then performed, judging whether a same-named entity in the knowledge base denotes a different meaning and whether other named entities in the knowledge base denote the same meaning; finally, once the corresponding correct entity object in the knowledge base has been confirmed, the entity mention is linked to that entity. The most important task in the entity linking step is therefore to build an accurate and rich synonym dictionary.
Knowledge merging: when constructing a knowledge graph, knowledge input can also be obtained from third-party knowledge base products or existing structured data. For example, linked open data projects regularly publish the semantic knowledge data they have accumulated and curated.
In step S104, the relational database is converted into a graph database model to construct the knowledge graph.
Here, performing entity linking on the information features comprises: linking the information features to the corresponding correct entity objects in the knowledge base.
Specifically, as shown in Fig. 3, the relational data schema obtained from the preceding processing (including entities, entity relations, entity attributes and entity attribute values) is converted to a graph database schema. The conversion from a relational database to a graph database schema generally follows these principles:
(1) Each node label is represented by the name of an entity table, i.e. the table name is used as the node label name. For example, if a data table is named "enterprise", a node type with the label "enterprise" is created.
(2) Each row in an entity table corresponds to one node. Each row of a relational table fully describes one entity and its attribute values, and also determines the globally unique identifier of the node.
(3) The columns of a relational table become node properties. Within a data row, every field other than the unique identifier supplements and describes the node, and is therefore used as a node property.
(4) Tables that describe association relations between entities are converted into relationships, and the columns of these tables become relationship properties. The structural relation between relational tables in which a primary key is referenced by a foreign key is, in the graph database, a relationship between nodes, so the columns of such a table are converted into properties of the relationship.
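The four conversion principles above can be sketched with the standard-library `sqlite3` module and plain dictionaries standing in for a graph database. The helper names `table_to_nodes` and `table_to_edges`, the sample tables and the in-memory node and edge representation are assumptions made for illustration; loading the result into an actual graph database is outside the scope of this sketch.

```python
import sqlite3


def table_to_nodes(conn, table, key):
    """Rules (1)-(3): table name -> label, row -> node, other columns -> properties."""
    # The table name is interpolated directly, so it must come from trusted code.
    cur = conn.execute(f"SELECT * FROM {table}")
    cols = [d[0] for d in cur.description]
    nodes = {}
    for row in cur:
        record = dict(zip(cols, row))
        node_id = (table, record.pop(key))   # globally unique identifier
        nodes[node_id] = {"label": table, "props": record}
    return nodes


def table_to_edges(conn, table, src, dst):
    """Rule (4): association table -> relationships, other columns -> properties.

    src and dst are (foreign_key_column, referenced_table) pairs.
    """
    cur = conn.execute(f"SELECT * FROM {table}")
    cols = [d[0] for d in cur.description]
    edges = []
    for row in cur:
        record = dict(zip(cols, row))
        start = (src[1], record.pop(src[0]))
        end = (dst[1], record.pop(dst[0]))
        edges.append({"type": table, "start": start, "end": end, "props": record})
    return edges


# Invented sample schema and rows, for illustration only.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE enterprise (id INTEGER PRIMARY KEY, name TEXT, city TEXT);
CREATE TABLE person (id INTEGER PRIMARY KEY, name TEXT);
CREATE TABLE works_for (person_id INTEGER REFERENCES person(id),
                        enterprise_id INTEGER REFERENCES enterprise(id),
                        since INTEGER);
INSERT INTO enterprise VALUES (1, 'Acme', 'Beijing');
INSERT INTO person VALUES (7, 'Li Lei');
INSERT INTO works_for VALUES (7, 1, 2015);
""")
nodes = {**table_to_nodes(conn, "enterprise", "id"),
         **table_to_nodes(conn, "person", "id")}
edges = table_to_edges(conn, "works_for",
                       ("person_id", "person"), ("enterprise_id", "enterprise"))
```

Pairing the table name with the primary key gives each node an identifier that stays unique across tables, which is one simple way to satisfy rule (2).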
Further, the domain knowledge graph construction method of the embodiments of the present invention further comprises: after a preset duration, crawling the data sources again to obtain second data information; judging, according to the second data information, whether the first data information has changed; and, if the first data information has changed, obtaining the changed data, converting it into the graph database model and merging it into the knowledge graph.
It should be noted that the amount of information and knowledge possessed by humankind is a monotonically increasing function of time, so the content of a knowledge graph must also keep pace with the times, and its construction is a process of continuous iterative updating. Logically, updating a knowledge base includes updating the concept layer and updating the data layer. Updating the concept layer means that when newly added data yields new concepts, the new concepts need to be added to the concept layer of the knowledge base automatically (after manual participation and review). Updating the data layer mainly means adding or updating entities, relations and attribute values; it requires considering factors such as the reliability of the data sources and the consistency of the data (whether contradictions or redundancy exist). Compared with concept-layer updates, data-layer updates are usually completed automatically: the data can be stored only after it has passed through the three processes of information extraction, knowledge fusion and graph construction.
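The update step can be sketched as a comparison of two crawl snapshots. In this minimal illustration each snapshot is assumed to have already been reduced to a set of (subject, predicate, object) triples; the function names `diff_triples` and `apply_changes` and the sample facts are hypothetical, and the checks on source reliability and consistency mentioned above are omitted.

```python
def diff_triples(first, second):
    """Compare two crawls, each a collection of (subject, predicate, object) triples."""
    first, second = set(first), set(second)
    return {"added": second - first, "removed": first - second}


def apply_changes(graph, changes):
    """Merge the changed data into the graph (represented here as a set of triples)."""
    return (set(graph) - changes["removed"]) | changes["added"]


# Invented sample facts, for illustration only.
first_crawl = {("Acme", "ceo", "Li Lei"), ("Acme", "city", "Beijing")}
second_crawl = {("Acme", "ceo", "Han Meimei"), ("Acme", "city", "Beijing")}

changes = diff_triples(first_crawl, second_crawl)
graph = apply_changes(first_crawl, changes)
```

Only the changed triples need to pass through extraction, fusion and graph construction again, which keeps each update proportional to the size of the change rather than to the size of the graph.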
The embodiments of the present invention are described in detail below with reference to the big data technologies required in the process of constructing a knowledge graph.
As shown in Fig. 4, the required big data technologies may include a data acquisition subsystem, a data processing subsystem, a data storage subsystem and a data update subsystem.
(1) Data acquisition subsystem: raw data is collected from industry sources, third-party databases and Web logs and imported into HDFS-format result files. In addition to collecting raw data, this process can also build a synonym dictionary using the synonymous entity expansion method based on encyclopedia website crawlers proposed in this patent, in order to support entity linking during knowledge fusion.
Specifically, the embodiments of the present invention may implement the crawler with the Python Scrapy framework to obtain network data, use Sqoop to package the import as a MapReduce job and submit it to the Hadoop distributed environment, obtain data from the data sources in parallel, and finally generate HDFS-format result files.
Besides collecting the raw data, this process can also build an accurate and rich synonym dictionary through the synonymous entity expansion method based on encyclopedia website crawlers, to support entity linking in knowledge fusion. The specific method is as follows:
On the network, take entity E as the initial search term, and set the search depth to N and the number of iterations to M, meaning that pages from encyclopedia websites such as Wikipedia, Baidu Baike or MBA Think Tank are crawled from the top N search results. The "aliases" or "recommended related keywords" on each page are added to the synonym dictionary and to the search dictionary for crawling in the next round. Supposing E1, E2, E3, ..., En are added, retrieval is then repeated with E1, E2, E3, ..., En as keywords, and stops when the number of iterations is reduced to 0. The final set of E is the entity synonym dictionary.
For example, as shown in Fig. 5, "Facial Recognition" is retrieved in the English Wikipedia, and its recommended related keywords are shown in step 1. Retrieval is then performed with all recommended keywords from step 1; taking "Face Detection" as an example, its recommended related keywords are shown in step 2. By analogy, retrieval with "Computational Photography" as the keyword yields the recommended related keywords shown in step 3.
The implementation process of the embodiment is described with the following pseudocode. entityWordList is the initial entity synonym dictionary and entitySearchWordList is the initial search dictionary. With entity E as the initial search term, the search depth searchDepth is set to N and the number of iterations searchTimes is set to M, and the SearchCommonEntity method is called. The method first determines whether to continue iterating. If so, it traverses entitySearchWordList: each search term is first added to the entity synonym dictionary and then used as the query in a call to the searchSpider function, which obtains all search results for the term. getEncyclopedia is then called to filter the resulting URLs, keeping only the links related to encyclopedia sites. These links are traversed, and getRelatedWords is called to obtain the related words on each page, which are added to the temporary search word list tempSearchWordList if they are not already in it. After the next iteration completes, all search terms have been added to the synonym dictionary and all temporary search terms have been added to the search dictionary; the purpose of this is to avoid crawling the same word repeatedly.
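The procedure described above can be sketched in Python as follows. This is an illustrative reading of the description, not the patent's code: the three crawler functions are passed in as callables so that the loop can be run offline, and the stand-ins `fake_spider`, `only_wiki`, `fake_related` and the `FAKE_WEB` data are invented for the example.

```python
def search_common_entity(seed, search_depth, search_times,
                         search_spider, get_encyclopedia, get_related_words):
    """Iteratively expand a synonym dictionary starting from `seed`.

    The three callables stand in for searchSpider, getEncyclopedia and
    getRelatedWords as named in the text.
    """
    entity_word_list = []    # synonym dictionary, in discovery order
    search_words = [seed]    # terms to crawl in the current round
    seen = set()             # terms already crawled, to avoid repeats
    while search_times > 0 and search_words:
        temp_search_words = []   # terms discovered in this round
        for term in search_words:
            if term in seen:
                continue
            seen.add(term)
            entity_word_list.append(term)
            urls = search_spider(term)[:search_depth]   # top-N search results
            for url in get_encyclopedia(urls):          # encyclopedia pages only
                for word in get_related_words(url):
                    if word not in seen and word not in temp_search_words:
                        temp_search_words.append(word)
        search_words = temp_search_words
        search_times -= 1
    return entity_word_list


# Offline stand-ins for the crawler; the related-word data is invented.
FAKE_WEB = {
    "Facial Recognition": ["Face Detection"],
    "Face Detection": ["Computational Photography", "Facial Recognition"],
    "Computational Photography": [],
}


def fake_spider(term):
    return [f"https://wiki.example/{term}", f"https://shop.example/{term}"]


def only_wiki(urls):
    return [u for u in urls if "wiki.example" in u]


def fake_related(url):
    return FAKE_WEB.get(url.rsplit("/", 1)[-1], [])


synonyms = search_common_entity("Facial Recognition", 10, 3,
                                fake_spider, only_wiki, fake_related)
```

In this sketch, words discovered in the final round are not added to the dictionary because they are never crawled; whether they should be is a design choice the description leaves open.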
(2) Data processing subsystem: in most cases, raw data that has been collected into HDFS has many problems and needs to be preprocessed. In this step, machine learning models can also be invoked on Spark to extract entities, relations and attributes, completing the conversion from unstructured and semi-structured data to structured data.
Data processing in the embodiments of the present invention is mainly based on Hive for data preprocessing and on Spark for the conversion of unstructured and semi-structured data to structured data.
Hive is a data warehouse infrastructure built on Hadoop, a mechanism for storing, querying and analyzing large-scale data in HDFS. It can be used for extract, transform and load (ETL) of massive data. Hive defines a simple SQL-like query language (HQL); statements are parsed and translated into a series of MapReduce jobs that perform the data processing, giving users table query features similar to a conventional RDBMS together with distributed storage and computation.
Apache Spark is a new-generation big data processing platform that became popular after Hadoop. It is a fast, general-purpose computing engine and is now widely used. Owing to improvements in its design, Spark can be nearly 100 times faster than MapReduce for in-memory operation and 10 times faster for disk-based operation, so Spark is better suited to scenarios that require iterative MapReduce-style computation, such as data mining and machine learning. The Spark ecosystem includes components such as Spark Core, Spark SQL, Spark Streaming, MLlib and GraphX, which complement one another to form a powerful one-stop big data processing platform.
(1) Data preprocessing based on Hive
In most cases, raw data that has been collected into HDFS has problems such as missing data fields, erroneous or abnormal values, and inconsistent encodings or names, so the data must be preprocessed to convert the raw input into a form suitable for analysis. Data preprocessing is generally divided into three steps: data selection, unification of data table attributes, and data cleaning.
Data selection: from the user's raw database, select the data table entries that the user specifies as being of interest and relevant to the knowledge discovery task. The amount of data in a database is huge and its coverage is broad, and the data in some tables is simply unrelated. Without a simple screening of the database, useless data would take part in the mining process and waste resources. Selection is generally carried out through human-machine cooperation: data categories are chosen manually at a higher conceptual level, and a program prepared in advance selects the specific tables and columns in the database.
Data Table Properties unification: when tables of data to be excavated, which has been chosen, to be finished, we start to these tables of data
In data excavated before pretreatment.As the preparation before excavation, the difference according to thesaurus to same entity is needed
Name indicates to be cleared up and integrated to carry out unification, obtains that one unified, clearly data indicate.This step is corresponding
Entity link in knowledge mapping building process.
It should be noted that values of the same attribute may use different units of measurement. For example, student scores are usually expressed on a hundred-mark scale, but sometimes a five-grade scale or fuzzy grades such as "excellent, good, pass, poor" are used. A standard can be chosen as needed and a conversion rule specified, so that non-standard representations are converted into the standard representation. All changes need to be recorded for future reference or for use when the data is updated.
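The conversion and the recording of changes can be sketched as follows. The conversion table `FIVE_GRADE_TO_SCORE`, its representative scores, and the helper `to_hundred_mark` are illustrative assumptions; the description does not prescribe a particular conversion rule.

```python
# Assumed conversion table from five-grade marks to hundred-mark scores.
# The representative scores are illustrative, not prescribed by the method.
FIVE_GRADE_TO_SCORE = {"A": 95.0, "B": 85.0, "C": 75.0, "D": 65.0, "E": 50.0}


def to_hundred_mark(value, change_log, record_id):
    """Convert one score to the hundred-mark standard and log any change."""
    if isinstance(value, (int, float)):
        return float(value)  # already in the standard representation
    converted = FIVE_GRADE_TO_SCORE[value]
    change_log.append({"id": record_id, "old": value, "new": converted})
    return converted


change_log = []
raw_scores = {"s1": 88, "s2": "B", "s3": "E"}
standard_scores = {
    sid: to_hundred_mark(v, change_log, sid) for sid, v in raw_scores.items()
}
```

Only converted values are written to `change_log`, so the log can later be used to trace or reverse the conversion when the data is updated.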
Data cleaning: after the two preceding steps are completed, the framework and specification of the mining database have been determined. The data in it is then processed specifically, mainly to solve the following problems: missing values, erroneous data, noisy data, and outliers.
A. Missing values can be handled in the following ways: 1. Ignoring: when multiple attribute values of a tuple are missing, the tuple is usually ignored, that is, deleted from the data table. 2. Filling: when only a few attribute values of a tuple are missing, the missing values are generally filled in. There are several ways to fill in a value: manual filling, a global constant, or the average value of the attribute. An inference tool (such as a decision tree) can also be applied to the data under the attribute to obtain the most probable fill value by analyzing the other values. Missing values under different attributes require different handling. Values derived by an inference tool are generally considered more reliable and more useful in practice.
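The two strategies, ignoring and filling, can be illustrated with the following minimal sketch. It fills with the attribute mean only; the threshold `max_missing`, the use of `None` as the missing-value marker, and the helper name `handle_missing` are assumptions made for the example.

```python
def handle_missing(rows, numeric_field, max_missing=2):
    """Drop tuples with too many missing values, then mean-fill one field.

    A value of None marks a missing attribute. The threshold max_missing
    is an assumed parameter, not one given in the description.
    """
    # 1. Ignore (delete) tuples with more than max_missing missing values.
    kept = [
        dict(r) for r in rows
        if sum(v is None for v in r.values()) <= max_missing
    ]
    # 2. Fill the remaining gaps in numeric_field with the attribute mean.
    known = [r[numeric_field] for r in kept if r[numeric_field] is not None]
    mean = sum(known) / len(known) if known else None
    for r in kept:
        if r[numeric_field] is None:
            r[numeric_field] = mean
    return kept


rows = [
    {"name": "a", "age": 20, "score": 80.0, "city": "X"},
    {"name": "b", "age": None, "score": None, "city": None},  # 3 missing
    {"name": "c", "age": 22, "score": None, "city": "Y"},
    {"name": "d", "age": 21, "score": 90.0, "city": "Z"},
]
cleaned = handle_missing(rows, "score")
```

A decision tree or another inference tool could replace the mean in step 2 without changing the overall structure.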
B. Handling erroneous data. Tuples containing erroneous data must first be identified, and it is then decided whether to correct the data or to ignore the tuple. Usually, when the data dictionary is defined, a basic specification is given for the data. On top of this, things in the real world have their own constraints, and the data in the database directly represents those entities. For example, a student's examination score is a real number between 0 and 100 (other converted representations should also meet this requirement). This is a constraint on the "student score" attribute; if the value of this attribute in some tuple falls outside this range, it is erroneous data. Not all constraints are this simple, of course, but a function can always be found to serve as the constraint function. This function may involve a single attribute or several attributes.
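Constraint functions of both kinds can be sketched as follows. The single-attribute constraint follows the score example above; the multi-attribute constraint `dates_ordered` and the helper `find_violations` are hypothetical and serve only as an illustration.

```python
def score_in_range(row):
    """Single-attribute constraint: an exam score must lie in [0, 100]."""
    return 0 <= row["score"] <= 100


def dates_ordered(row):
    """Multi-attribute constraint (hypothetical): start must not follow end."""
    return row["start_year"] <= row["end_year"]


def find_violations(rows, constraints):
    """Return (row index, constraint name) for every failed constraint."""
    return [
        (i, check.__name__)
        for i, row in enumerate(rows)
        for check in constraints
        if not check(row)
    ]


rows = [
    {"score": 87.5, "start_year": 2015, "end_year": 2019},
    {"score": 120.0, "start_year": 2016, "end_year": 2020},
    {"score": 60.0, "start_year": 2021, "end_year": 2018},
]
violations = find_violations(rows, [score_in_range, dates_ordered])
```

Each reported violation identifies a tuple that must then either be corrected or ignored, as described above.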
C. Handling noisy data, including outliers. Measurements of a variable always contain deviations; these deviations are noise, and a value with a large deviation is an outlier. Techniques for handling such deviations are usually called smoothing techniques. The specific methods are as follows: 1. Binning: the data is divided evenly into several bins, and the values in each bin are converted, for example to the mean, the median, or the boundary values of the bin. After conversion, the range of variation of the values is reduced accordingly. In effect, this is a form of data discretization. 2. Clustering: clustering eliminates noise and can also reveal outliers; cluster analysis has its own dedicated techniques, which are not described here. 3. Regression: linear regression and multiple linear regression analysis can be applied to eliminate noise.
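As an illustration of the binning method, the following sketch performs equal-depth binning followed by smoothing by bin means. The function name `smooth_by_bin_means` and the sample values are assumptions made for the example; smoothing by median or by boundary values would follow the same pattern.

```python
def smooth_by_bin_means(values, n_bins):
    """Equal-depth binning followed by smoothing by bin means.

    The values are sorted, split into n_bins bins of (nearly) equal size,
    and every value is replaced by the mean of its bin. The result is
    returned in sorted order.
    """
    ordered = sorted(values)
    size, extra = divmod(len(ordered), n_bins)
    smoothed, start = [], 0
    for b in range(n_bins):
        # The first `extra` bins take one additional value each.
        end = start + size + (1 if b < extra else 0)
        bin_values = ordered[start:end]
        if bin_values:
            mean = sum(bin_values) / len(bin_values)
            smoothed.extend([mean] * len(bin_values))
        start = end
    return smoothed


prices = [4, 8, 15, 21, 21, 24, 25, 28, 34]
smoothed_prices = smooth_by_bin_means(prices, 3)
```

With three bins, the nine sample values collapse to three distinct means, which shows how binning reduces the range of variation and discretizes the data.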
(2) Converting unstructured and semi-structured data into structured data
Unstructured data refers to the large amount of plain text content in network data. It has the widest knowledge coverage but is also the most difficult to extract from, and it usually needs to be processed with natural language processing techniques. Only after unstructured data has been converted into structured data can the conversion from the relational database to the graph database model be completed and the graph constructed.
In the related art, models and application projects are generally developed independently, and a model is exposed as a service by encapsulating it in a RESTful API. In a big data environment, however, this raises two problems: first, single-machine processing of large data volumes is time-consuming; second, the execution time of an asynchronous task may exceed the maximum HTTP connection duration, which makes it difficult to monitor the data processing status and abnormal conditions during execution.
By contrast, in the embodiment of the present invention, model invocation and computation are based on Spark, in the following two modes. The first is development based on Spark MLlib, the machine learning library that ships with Spark, which contains a large number of classification, regression, clustering, and dimensionality reduction algorithms. For example, to classify with a random forest, the execution engine of the system instantiates a RandomForestClassifier object with the relevant parameters according to the node information of the workflow, calls its fit method to fit the input data, and generates the corresponding Model object; the intermediate data management module then serializes and saves the Model for use by subsequent prediction or validation components. In this way, the quality of each learning algorithm can be guaranteed, and new algorithm components can be added quickly in step with the Spark community. The second is model development in other languages such as Python and R. Taking Python as an example, tasks can be submitted through pyspark; even when executed on a single machine, this can be much faster than the traditional execution mode because Spark loads the data into memory. As for the problem that the execution time of an asynchronous task may exceed the maximum HTTP connection duration, it can be addressed with middleware of the application framework, for example the Koa framework in a Node environment, which can handle asynchronous task processing and exception monitoring.
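The asynchronous pattern that avoids the HTTP connection limit can be sketched as follows. The sketch uses only the Python standard library rather than Koa middleware, and the names `submit_task`, `task_status`, `train_model`, and `broken_job` are hypothetical; it shows the idea of returning a task identifier immediately and querying status and exceptions later, not the implementation of the embodiment.

```python
import uuid
from concurrent.futures import ThreadPoolExecutor

_executor = ThreadPoolExecutor(max_workers=2)
_tasks = {}  # task id -> Future


def submit_task(func, *args):
    """Start a long-running job and return a task id immediately."""
    task_id = uuid.uuid4().hex
    _tasks[task_id] = _executor.submit(func, *args)
    return task_id


def task_status(task_id):
    """Report the state of a task without blocking the caller."""
    future = _tasks[task_id]
    if not future.done():
        return {"state": "running"}
    error = future.exception()
    if error is not None:
        return {"state": "failed", "error": repr(error)}
    return {"state": "succeeded", "result": future.result()}


def train_model(n):
    """Stand-in for a slow model call; here it only sums integers."""
    return sum(range(n))


def broken_job():
    """Stand-in for a job that fails during processing."""
    raise ValueError("bad input")


ok_id = submit_task(train_model, 1000)
bad_id = submit_task(broken_job)
_executor.shutdown(wait=True)  # a real client would poll task_status instead
ok_status = task_status(ok_id)
bad_status = task_status(bad_id)
```

Because the request that submits the job returns at once, no HTTP connection has to stay open for the duration of the computation, and a failed job surfaces as a `failed` status rather than as a dropped connection.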
(3) Data storage subsystem: data storage is mainly responsible for storing the source data and result data produced during acquisition, computation, construction, and updating. The storage format differs across the stages of knowledge graph construction, and the finally generated graph is stored in Neo4j and HBase.
For example, as shown in Table 1, the data can be divided into four levels according to its nature. The first level is the raw data collected by the crawler, which is mainly stored in the HDFS file system. The second level is the triple data obtained after data processing and knowledge fusion, including "entity-relation-entity", "entity-attribute-attribute value", and "relation-attribute-attribute value" triples, which are stored in HBase. The third level is the constructed enterprise graph data, which is stored in the Neo4j graph database and the HBase database. The fourth level is the graph update data, including the update type and the content triples, which is stored in the HBase database. Entity attributes can be divided into static attributes and dynamic attributes. Static attributes are attributes that seldom change, are important, and are highly discriminative, such as the enterprise name, organization code, and stock code. Dynamic attributes are attributes that change frequently and are not essential, such as change records, bidding records, intellectual property, and recruitment information. Static attributes are attached directly to the entity and stored in the Neo4j database; dynamic attributes are mainly stored in the HBase database and referenced through the unique identifier of the entity.
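The division between static and dynamic attributes can be illustrated with the following sketch, which only prepares the two record sets and does not call Neo4j or HBase. The attribute names in `STATIC_ATTRIBUTES`, the identifier `ent-001`, and the helper `split_attributes` are assumptions made for the example.

```python
# Assumed attribute classification, following the examples in the description.
STATIC_ATTRIBUTES = {"enterprise_name", "organization_code", "stock_code"}


def split_attributes(entity_id, attributes):
    """Split one entity's attributes into a graph node and key-value rows.

    Static attributes stay on the node (intended for the graph database);
    dynamic attributes become rows keyed by the entity's unique identifier
    (intended for the column store).
    """
    node = {"id": entity_id}
    dynamic_rows = []
    for name, value in attributes.items():
        if name in STATIC_ATTRIBUTES:
            node[name] = value
        else:
            dynamic_rows.append((entity_id, name, value))
    return node, dynamic_rows


node, dynamic_rows = split_attributes(
    "ent-001",
    {
        "enterprise_name": "Example Co.",
        "stock_code": "000000",
        "recruitment_info": ["data engineer"],
        "bidding_records": [],
    },
)
```

Keeping frequently changing attributes out of the graph node means that routine updates touch only the key-value rows, while the node itself stays small and stable.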
Table 1
(4) Data update subsystem: to realize incremental updating of the knowledge graph, data acquisition, processing, and storage need to be carried out automatically on a schedule.
Specifically, to realize incremental updating of the knowledge graph, a scheduling system is needed to start the crawler jobs and thesaurus crawling, to run Hive-based preprocessing and model invocation after the data has been loaded, and to merge the updated data into the graph. A reliable scheduling system is vital to the stable operation of the whole system. To safeguard job performance, to complete planned jobs efficiently, and to ensure that planned tasks are executed accurately and on time, the cron daemon under Unix and Linux is the preferred choice. This daemon periodically executes the tasks that the user specifies in the crontab file.
It should be noted that, during crawling, the URL of a web page is relatively stable while its content is subject to change; the crawler can quickly determine whether the content has changed by comparing the MD5 digests of the content fetched in two successive crawls.
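The digest comparison can be sketched with the standard `hashlib` module as follows. The URLs, the page contents, and the helper names `content_digest` and `changed_urls` are hypothetical and are used only for illustration.

```python
import hashlib


def content_digest(html):
    """Return the MD5 hex digest of a page's content."""
    return hashlib.md5(html.encode("utf-8")).hexdigest()


def changed_urls(previous_digests, current_pages):
    """Return URLs whose content is new or differs from the previous crawl."""
    return [
        url for url, html in current_pages.items()
        if previous_digests.get(url) != content_digest(html)
    ]


first_crawl = {
    "https://example.com/a": "<p>unchanged</p>",
    "https://example.com/b": "<p>old text</p>",
}
second_crawl = {
    "https://example.com/a": "<p>unchanged</p>",
    "https://example.com/b": "<p>new text</p>",
}
digests = {url: content_digest(html) for url, html in first_crawl.items()}
changed = changed_urls(digests, second_crawl)
```

Only the digest of each page has to be kept between crawls, so the comparison does not require storing or re-reading the previous page content.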
The domain knowledge graph construction method based on big data driving according to the embodiment of the present invention addresses each stage of knowledge graph construction and provides practical technical guidance for building domain knowledge graphs, so as to construct a domain knowledge graph that has high accuracy and a rich, rigorous data schema and that can support complex analysis and decision-making. The construction process has guiding value and industrial significance, and is of considerable practical significance for production and daily life.
The domain knowledge graph construction system based on big data driving according to an embodiment of the present invention is described next with reference to the accompanying drawings.
Fig. 7 is a schematic structural diagram of the domain knowledge graph construction system based on big data driving according to one embodiment of the present invention.
As shown in Fig. 7, the domain knowledge graph construction system 10 based on big data driving includes: an acquisition module 100, a processing module 200, a storage module 300, a construction module 400, and an update module 500.
The acquisition module 100 is configured to crawl a data source in the network and obtain first data information. The processing module 200 is configured to perform data information extraction on the data source to extract association information between entities. The storage module 300 is configured to perform knowledge fusion on the association information between entities and to establish a relational database. The construction module 400 is configured to convert the relational database into a graph database model to construct the knowledge graph. The update module 500 is configured to crawl the data source again after a preset duration to obtain second data information, to judge according to the second data source whether the first data source has changed, and, if the data has changed, to convert the changed data into the graph database model so that it is merged into the knowledge graph.
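The behavior of the update module can be illustrated with the following sketch, which represents the first and second data information as sets of triples. The update type names `add` and `delete`, the sample triples, and the helpers `diff_triples` and `apply_updates` are assumptions made for the example; the embodiment does not specify how the changed data is computed.

```python
def diff_triples(first, second):
    """Compare two crawls' triples and return typed update records.

    Each triple is (subject, predicate, object). The update types "add"
    and "delete" are assumed names for illustration.
    """
    first, second = set(first), set(second)
    updates = [("add", t) for t in sorted(second - first)]
    updates += [("delete", t) for t in sorted(first - second)]
    return updates


def apply_updates(graph, updates):
    """Merge update records into a graph held as a set of triples."""
    merged = set(graph)
    for kind, triple in updates:
        if kind == "add":
            merged.add(triple)
        else:
            merged.discard(triple)
    return merged


first_info = {
    ("Example Co.", "located_in", "City A"),
    ("Example Co.", "legal_representative", "Person X"),
}
second_info = {
    ("Example Co.", "located_in", "City A"),
    ("Example Co.", "legal_representative", "Person Y"),
}
updates = diff_triples(first_info, second_info)
graph = apply_updates(first_info, updates)
```

Only the changed triples are converted and merged, which is what makes the update incremental rather than a full rebuild of the graph.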
It should be noted that the foregoing explanation of the embodiment of the domain knowledge graph construction method based on big data driving also applies to the system, and details are not repeated here.
The domain knowledge graph construction system based on big data driving according to the embodiment of the present invention addresses each stage of knowledge graph construction and provides practical technical guidance for building domain knowledge graphs, so as to construct a domain knowledge graph that has high accuracy and a rich, rigorous data schema and that can support complex analysis and decision-making. The construction process has guiding value and industrial significance, and is of considerable practical significance for production and daily life.
In addition, the terms "first" and "second" are used for descriptive purposes only and are not to be understood as indicating or implying relative importance or implicitly indicating the number of technical features referred to. Thus, a feature defined with "first" or "second" may explicitly or implicitly include at least one such feature. In the description of the present invention, "a plurality of" means at least two, for example two or three, unless otherwise specifically defined.
In the present invention, unless otherwise expressly specified and limited, terms such as "mounted", "connected", "coupled", and "fixed" are to be understood broadly. For example, a connection may be fixed, detachable, or integral; it may be mechanical or electrical; it may be direct or indirect through an intermediary; and it may be an internal communication between two elements or an interaction relationship between two elements, unless otherwise expressly limited. Those of ordinary skill in the art can understand the specific meanings of the above terms in the present invention according to the specific circumstances.
In the present invention, unless otherwise expressly specified and limited, a first feature being "on" or "under" a second feature may mean that the first and second features are in direct contact, or that they are in indirect contact through an intermediary. Moreover, a first feature being "on", "above", or "over" a second feature may mean that the first feature is directly above or obliquely above the second feature, or merely that the first feature is at a higher level than the second feature. A first feature being "under", "below", or "beneath" a second feature may mean that the first feature is directly below or obliquely below the second feature, or merely that the first feature is at a lower level than the second feature.
In the description of this specification, a description referring to the terms "one embodiment", "some embodiments", "an example", "a specific example", or "some examples" means that a specific feature, structure, material, or characteristic described in connection with that embodiment or example is included in at least one embodiment or example of the present invention. In this specification, schematic uses of the above terms do not necessarily refer to the same embodiment or example. Moreover, the specific features, structures, materials, or characteristics described may be combined in a suitable manner in any one or more embodiments or examples. In addition, provided that they do not conflict with each other, those skilled in the art may combine the different embodiments or examples described in this specification and the features of those embodiments or examples.
Although embodiments of the present invention have been shown and described above, it is to be understood that the above embodiments are exemplary and are not to be construed as limiting the present invention; those of ordinary skill in the art may change, modify, replace, and vary the above embodiments within the scope of the present invention.
Claims (10)
1. A domain knowledge graph construction method based on big data driving, characterized by comprising the following steps:
crawling a data source in a network and obtaining first data information;
performing data information extraction on the data source to extract association information between entities;
performing knowledge fusion on the association information between the entities and establishing a relational database; and
converting the relational database into a graph database model to construct a knowledge graph.
2. The domain knowledge graph construction method based on big data driving according to claim 1, characterized in that the data source includes structured data, semi-structured data, and unstructured data.
3. The domain knowledge graph construction method based on big data driving according to claim 1, characterized in that performing information extraction on the data source comprises:
extracting structured information on entities, relations, and entity attributes from the semi-structured and unstructured data of the data source to obtain the association information.
4. The domain knowledge graph construction method based on big data driving according to claim 1, characterized in that performing knowledge fusion on the association information between the entities comprises:
extracting information features from the association information between the entities to eliminate concept ambiguity and remove redundant and erroneous concepts;
performing entity linking on the information features to obtain relational data.
5. The domain knowledge graph construction method based on big data driving according to claim 4, characterized in that performing entity linking on the information features comprises:
linking the information features to the corresponding correct entity objects in a knowledge base.
6. The domain knowledge graph construction method based on big data driving according to claim 5, characterized in that performing knowledge fusion on the association information between the entities and establishing a relational database further comprises:
extracting entity mentions;
detecting, according to the entity mentions, whether entities with the same name denote different meanings and whether differently named entities denote the same meaning, so as to perform entity disambiguation and coreference resolution;
after the corresponding entity object in the knowledge base has been confirmed, linking the entity mention to the entity object.
7. The domain knowledge graph construction method based on big data driving according to claim 1, characterized by further comprising:
after a preset duration, crawling the data source and obtaining second data information;
judging, according to the second data information, whether the first data information has changed;
if the first data information has changed, obtaining the changed data and converting the changed data into the graph database model so as to merge it into the knowledge graph.
8. A domain knowledge graph construction system based on big data driving, characterized by comprising:
an acquisition module, configured to crawl a data source in a network and obtain first data information;
a processing module, configured to perform data information extraction on the data source to extract association information between entities;
a storage module, configured to perform knowledge fusion on the association information between the entities and establish a relational database; and
a construction module, configured to convert the relational database into a graph database model to construct a knowledge graph.
9. The domain knowledge graph construction system based on big data driving according to claim 8, characterized in that the data source includes structured data, semi-structured data, and unstructured data.
10. The domain knowledge graph construction system based on big data driving according to claim 8, characterized by further comprising:
an update module, configured to crawl the data source again after a preset duration to obtain second data information, to judge according to the second data source whether the first data source has changed, and, if the data has changed, to convert the changed data into the graph database model so as to merge it into the knowledge graph.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201811447248.7A CN109597855A (en) | 2018-11-29 | 2018-11-29 | Domain knowledge map construction method and system based on big data driving |
Publications (1)
Publication Number | Publication Date |
---|---|
CN109597855A true CN109597855A (en) | 2019-04-09 |
Family
ID=65959274
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN201811447248.7A Pending CN109597855A (en) | 2018-11-29 | 2018-11-29 | Domain knowledge map construction method and system based on big data driving |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN109597855A (en) |
Citations (11)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN105956052A (en) * | 2016-04-27 | 2016-09-21 | 青岛海尔软件有限公司 | Building method of knowledge map based on vertical field |
CN106776711A (en) * | 2016-11-14 | 2017-05-31 | 浙江大学 | A kind of Chinese medical knowledge mapping construction method based on deep learning |
CN106777331A (en) * | 2017-01-11 | 2017-05-31 | 北京航空航天大学 | Knowledge mapping generation method and device |
CN106897273A (en) * | 2017-04-12 | 2017-06-27 | 福州大学 | A kind of network security dynamic early-warning method of knowledge based collection of illustrative plates |
CN107491555A (en) * | 2017-09-01 | 2017-12-19 | 北京纽伦智能科技有限公司 | Knowledge mapping construction method and system |
CN107766483A (en) * | 2017-10-13 | 2018-03-06 | 华中科技大学 | The interactive answering method and system of a kind of knowledge based collection of illustrative plates |
CN107783973A (en) * | 2016-08-24 | 2018-03-09 | 慧科讯业有限公司 | The methods, devices and systems being monitored based on domain knowledge spectrum data storehouse to the Internet media event |
CN108345647A (en) * | 2018-01-18 | 2018-07-31 | 北京邮电大学 | Domain knowledge map construction system and method based on Web |
CN108460136A (en) * | 2018-03-08 | 2018-08-28 | 国网福建省电力有限公司 | Electric power O&M information knowledge map construction method |
CN108509420A (en) * | 2018-03-29 | 2018-09-07 | 赵维平 | Gu spectrum and ancient culture knowledge mapping natural language processing method |
CN108846000A (en) * | 2018-04-11 | 2018-11-20 | 中国科学院软件研究所 | A kind of common sense semanteme map construction method and device based on supernode and the common sense complementing method based on connection prediction |
Patent Citations (11)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN105956052A (en) * | 2016-04-27 | 2016-09-21 | 青岛海尔软件有限公司 | Building method of knowledge map based on vertical field |
CN107783973A (en) * | 2016-08-24 | 2018-03-09 | 慧科讯业有限公司 | The methods, devices and systems being monitored based on domain knowledge spectrum data storehouse to the Internet media event |
CN106776711A (en) * | 2016-11-14 | 2017-05-31 | 浙江大学 | A kind of Chinese medical knowledge mapping construction method based on deep learning |
CN106777331A (en) * | 2017-01-11 | 2017-05-31 | 北京航空航天大学 | Knowledge mapping generation method and device |
CN106897273A (en) * | 2017-04-12 | 2017-06-27 | 福州大学 | A kind of network security dynamic early-warning method of knowledge based collection of illustrative plates |
CN107491555A (en) * | 2017-09-01 | 2017-12-19 | 北京纽伦智能科技有限公司 | Knowledge mapping construction method and system |
CN107766483A (en) * | 2017-10-13 | 2018-03-06 | 华中科技大学 | Interactive question-answering method and system based on a knowledge graph |
CN108345647A (en) * | 2018-01-18 | 2018-07-31 | 北京邮电大学 | Domain knowledge map construction system and method based on Web |
CN108460136A (en) * | 2018-03-08 | 2018-08-28 | 国网福建省电力有限公司 | Electric power O&M information knowledge map construction method |
CN108509420A (en) * | 2018-03-29 | 2018-09-07 | 赵维平 | Gu spectrum and ancient culture knowledge mapping natural language processing method |
CN108846000A (en) * | 2018-04-11 | 2018-11-20 | 中国科学院软件研究所 | Common-sense semantic graph construction method and device based on supernodes, and common-sense completion method based on link prediction |
Cited By (87)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN110704630A (en) * | 2019-04-15 | 2020-01-17 | 中国石油大学(华东) | Self-optimization mechanism for identified associated graph |
CN110187678A (en) * | 2019-04-19 | 2019-08-30 | 广东省智能制造研究所 | Manufacturing process equipment information storage and digital application system |
CN110147579A (en) * | 2019-04-19 | 2019-08-20 | 浙江省建工集团有限责任公司 | Dynamic collaborative management method for building information models |
CN110188207A (en) * | 2019-05-15 | 2019-08-30 | 出门问问信息科技有限公司 | Knowledge graph construction method and device, readable storage medium, and electronic equipment |
CN110188207B (en) * | 2019-05-15 | 2021-06-04 | 出门问问创新科技有限公司 | Knowledge graph construction method and device, readable storage medium and electronic equipment |
CN110197280A (en) * | 2019-05-20 | 2019-09-03 | 中国银行股份有限公司 | Knowledge graph construction method, device and system |
CN110197280B (en) * | 2019-05-20 | 2021-08-06 | 中国银行股份有限公司 | Knowledge graph construction method, device and system |
CN110232130A (en) * | 2019-05-20 | 2019-09-13 | 平安科技(深圳)有限公司 | Metadata management pedigree generation method, device, computer equipment and storage medium |
CN110232130B (en) * | 2019-05-20 | 2024-02-02 | 平安科技(深圳)有限公司 | Metadata management pedigree generation method, apparatus, computer device and storage medium |
CN110222199A (en) * | 2019-06-20 | 2019-09-10 | 青岛大学 | Character relationship graph construction method based on ontology and an ensemble of multiple artificial neural networks |
CN110377704A (en) * | 2019-07-22 | 2019-10-25 | 北京百度网讯科技有限公司 | Data consistency detection method, device and computer equipment |
CN110377704B (en) * | 2019-07-22 | 2022-04-22 | 北京百度网讯科技有限公司 | Data consistency detection method and device and computer equipment |
CN110543571A (en) * | 2019-08-07 | 2019-12-06 | 北京市天元网络技术股份有限公司 | Knowledge graph construction method and device for water conservancy informatization |
CN110457403A (en) * | 2019-08-12 | 2019-11-15 | 南京星火技术有限公司 | Graph network decision system and method, and knowledge graph construction method |
CN110457403B (en) * | 2019-08-12 | 2022-04-22 | 南京星火技术有限公司 | Graph network decision system and method and knowledge graph construction method |
CN110472107B (en) * | 2019-08-22 | 2024-01-30 | 腾讯科技(深圳)有限公司 | Multi-mode knowledge graph construction method, device, server and storage medium |
CN110472107A (en) * | 2019-08-22 | 2019-11-19 | 腾讯科技(深圳)有限公司 | Multi-modal knowledge mapping construction method, device, server and storage medium |
US11899681B2 (en) * | 2019-09-27 | 2024-02-13 | Boe Technology Group Co., Ltd. | Knowledge graph building method, electronic apparatus and non-transitory computer readable storage medium |
CN110727741A (en) * | 2019-09-29 | 2020-01-24 | 全球能源互联网研究院有限公司 | Knowledge graph construction method and system of power system |
CN112580912A (en) * | 2019-09-30 | 2021-03-30 | 北京国双科技有限公司 | Budget auditing method and device, electronic equipment and storage medium |
CN110750650A (en) * | 2019-09-30 | 2020-02-04 | 中盈优创资讯科技有限公司 | Construction method and device of enterprise knowledge graph |
CN110750651B (en) * | 2019-10-16 | 2023-05-26 | 同方知网数字出版技术股份有限公司 | Knowledge graph construction method based on scientific and technological achievements |
CN110750651A (en) * | 2019-10-16 | 2020-02-04 | 同方知网(北京)技术有限公司 | Knowledge graph construction method and generation device based on scientific and technological achievements |
CN112699245A (en) * | 2019-10-18 | 2021-04-23 | 北京国双科技有限公司 | Construction method and device and application method and device of budget management knowledge graph |
CN110968650A (en) * | 2019-10-30 | 2020-04-07 | 清华大学 | Medical field knowledge graph construction method based on doctor assistance |
CN112860714A (en) * | 2019-11-12 | 2021-05-28 | 斑马智行网络(香港)有限公司 | Knowledge base, database, information updating method and device |
CN110941612B (en) * | 2019-11-19 | 2020-08-11 | 上海交通大学 | Autonomous data lake construction system and method based on associated data |
CN110941612A (en) * | 2019-11-19 | 2020-03-31 | 上海交通大学 | Autonomous data lake construction system and method based on associated data |
CN111090683B (en) * | 2019-11-29 | 2023-12-22 | 上海勘察设计研究院(集团)股份有限公司 | Knowledge graph construction method and generation device thereof in engineering field |
CN110990585B (en) * | 2019-11-29 | 2024-01-30 | 上海勘察设计研究院(集团)股份有限公司 | Multi-source data and time sequence processing method and device for building industry knowledge graph |
CN111090683A (en) * | 2019-11-29 | 2020-05-01 | 上海勘察设计研究院(集团)有限公司 | Engineering field knowledge graph construction method and generation device thereof |
CN110990585A (en) * | 2019-11-29 | 2020-04-10 | 上海勘察设计研究院(集团)有限公司 | Multi-source data and time sequence processing method and device for constructing industry knowledge graph |
CN110929134A (en) * | 2019-12-04 | 2020-03-27 | 深圳市新国都金服技术有限公司 | Investment and financing data management method and device, computer equipment and storage medium |
CN111125265A (en) * | 2019-12-13 | 2020-05-08 | 四川蜀天梦图数据科技有限公司 | Method and device for generating mapping data based on relational database data |
CN111143576A (en) * | 2019-12-18 | 2020-05-12 | 中科院计算技术研究所大数据研究院 | Event-oriented dynamic knowledge graph construction method and device |
CN111625607A (en) * | 2019-12-27 | 2020-09-04 | 北京国双科技有限公司 | Oil-gas knowledge graph construction method and device, electronic equipment and storage medium |
CN111475503A (en) * | 2019-12-27 | 2020-07-31 | 北京国双科技有限公司 | Virtual knowledge graph construction method and device |
CN111078949A (en) * | 2019-12-31 | 2020-04-28 | 北京明略软件系统有限公司 | Product knowledge storage method and device, computer equipment and readable storage medium |
US11922121B2 (en) | 2020-01-21 | 2024-03-05 | Boe Technology Group Co., Ltd. | Method and apparatus for information extraction, electronic device, and storage medium |
CN111431962A (en) * | 2020-02-20 | 2020-07-17 | 北京邮电大学 | Cross-domain resource access Internet of things service discovery method based on context awareness calculation |
CN111341456A (en) * | 2020-02-21 | 2020-06-26 | 中南大学湘雅医院 | Method and device for generating diabetic foot knowledge map and readable storage medium |
CN111341456B (en) * | 2020-02-21 | 2024-02-23 | 中南大学湘雅医院 | Method and device for generating diabetic foot knowledge graph and readable storage medium |
CN111444351B (en) * | 2020-03-24 | 2023-09-12 | 清华苏州环境创新研究院 | Knowledge graph construction method and device in industrial process field |
CN111444351A (en) * | 2020-03-24 | 2020-07-24 | 清华苏州环境创新研究院 | Method and device for constructing knowledge graph in industrial process field |
CN111552820A (en) * | 2020-04-30 | 2020-08-18 | 江河瑞通(北京)技术有限公司 | Water engineering scheduling data processing method and device |
CN111708893A (en) * | 2020-05-15 | 2020-09-25 | 北京邮电大学 | Scientific and technological resource integration method and system based on knowledge graph |
CN111708895B (en) * | 2020-05-28 | 2023-06-20 | 北京赛博云睿智能科技有限公司 | Knowledge graph system construction method and device |
CN111708895A (en) * | 2020-05-28 | 2020-09-25 | 北京赛博云睿智能科技有限公司 | Method and device for constructing knowledge graph system |
CN111737488A (en) * | 2020-06-12 | 2020-10-02 | 南京中孚信息技术有限公司 | Information tracing method and device based on domain entity extraction and correlation analysis |
CN111737488B (en) * | 2020-06-12 | 2021-02-02 | 南京中孚信息技术有限公司 | Information tracing method and device based on domain entity extraction and correlation analysis |
CN111709527A (en) * | 2020-06-15 | 2020-09-25 | 北京优特捷信息技术有限公司 | Operation and maintenance knowledge map library establishing method, device, equipment and storage medium |
US11734626B2 (en) * | 2020-07-06 | 2023-08-22 | International Business Machines Corporation | Cognitive analysis of a project description |
US11520828B2 (en) | 2020-07-24 | 2022-12-06 | International Business Machines Corporation | Methods for representing and storing data in a graph data structure using artificial intelligence |
CN111897969A (en) * | 2020-07-27 | 2020-11-06 | 武汉大学 | Method and system for analyzing correlation between food components and nutritional health based on knowledge graph |
CN111858962A (en) * | 2020-07-27 | 2020-10-30 | 腾讯科技(成都)有限公司 | Data processing method, device and computer readable storage medium |
CN111861250A (en) * | 2020-07-29 | 2020-10-30 | 广东电网有限责任公司电力调度控制中心 | Scheduling decision generation method and device, electronic equipment and storage medium |
CN112231285A (en) * | 2020-10-20 | 2021-01-15 | 北京恒华龙信数据科技有限公司 | Knowledge graph generation method and device based on data resources |
CN112417456A (en) * | 2020-11-16 | 2021-02-26 | 中国电子科技集团公司第三十研究所 | Structured sensitive data reduction detection method based on big data |
CN112580831B (en) * | 2020-11-19 | 2024-03-29 | 国网江苏省电力有限公司信息通信分公司 | Intelligent auxiliary operation and maintenance method and system for power communication network based on knowledge graph |
CN112580831A (en) * | 2020-11-19 | 2021-03-30 | 国网江苏省电力有限公司信息通信分公司 | Intelligent auxiliary operation and maintenance method and system for power communication network based on knowledge graph |
CN112463984A (en) * | 2020-12-04 | 2021-03-09 | 北京明略软件系统有限公司 | Database mode expansion method, device, equipment and computer readable medium |
CN112463984B (en) * | 2020-12-04 | 2024-02-27 | 北京明略软件系统有限公司 | Database schema extension method, device, equipment and computer readable medium |
CN112527997A (en) * | 2020-12-18 | 2021-03-19 | 中国南方电网有限责任公司 | Intelligent question-answering method and system based on power grid field scheduling scene knowledge graph |
CN112527997B (en) * | 2020-12-18 | 2024-01-23 | 中国南方电网有限责任公司 | Intelligent question-answering method and system based on power grid field scheduling scene knowledge graph |
CN112818131B (en) * | 2021-02-01 | 2023-10-03 | 亚信科技(成都)有限公司 | Map construction method, system and storage medium for threat information |
CN112818131A (en) * | 2021-02-01 | 2021-05-18 | 亚信科技(成都)有限公司 | Method, system and storage medium for constructing graph of threat information |
CN112883201B (en) * | 2021-03-23 | 2023-11-21 | 西安电子科技大学昆山创新研究院 | Knowledge graph construction method based on big data of intelligent community |
CN112883201A (en) * | 2021-03-23 | 2021-06-01 | 西安电子科技大学昆山创新研究院 | Knowledge graph construction method based on big data of smart community |
CN113268602A (en) * | 2021-03-29 | 2021-08-17 | 江西融思科技有限公司 | Tissue knowledge graph construction method and device |
CN113094515A (en) * | 2021-04-13 | 2021-07-09 | 国网北京市电力公司 | Knowledge graph entity and link extraction method based on electric power marketing data |
CN113065003B (en) * | 2021-04-22 | 2023-05-26 | 国际关系学院 | Knowledge graph generation method based on multiple indexes |
CN113065003A (en) * | 2021-04-22 | 2021-07-02 | 国际关系学院 | Knowledge graph generation method based on multiple indexes |
CN113434658A (en) * | 2021-08-25 | 2021-09-24 | 西安热工研究院有限公司 | Thermal power generating unit operation question-answer generation method, system, equipment and readable storage medium |
CN113449066B (en) * | 2021-08-31 | 2021-12-07 | 北京泽云瑞弘信息技术有限公司 | Method, processor and storage medium for storing cultural relic data by using knowledge graph |
CN113449066A (en) * | 2021-08-31 | 2021-09-28 | 北京泽云瑞弘信息技术有限公司 | Method, processor and storage medium for storing cultural relic data by using knowledge graph |
CN113742498B (en) * | 2021-09-24 | 2024-04-09 | 国务院国有资产监督管理委员会研究中心 | Knowledge graph construction and updating method |
CN113569060A (en) * | 2021-09-24 | 2021-10-29 | 中国电子技术标准化研究院 | Standard text based knowledge graph disambiguation method, system, device and medium |
CN113742498A (en) * | 2021-09-24 | 2021-12-03 | 国务院国有资产监督管理委员会研究中心 | Method for constructing and updating knowledge graph |
CN113987146A (en) * | 2021-10-22 | 2022-01-28 | 国网江苏省电力有限公司镇江供电分公司 | Novel intelligent question-answering system dedicated to the electric power intranet |
CN113987146B (en) * | 2021-10-22 | 2023-01-31 | 国网江苏省电力有限公司镇江供电分公司 | Dedicated intelligent question-answering system of electric power intranet |
CN114090790A (en) * | 2021-11-22 | 2022-02-25 | 西安交通大学 | Human-computer-friendly data logic fusion power knowledge graph and construction method thereof |
CN114090790B (en) * | 2021-11-22 | 2024-04-16 | 西安交通大学 | Man-machine friendly data logic fusion power knowledge graph and construction method thereof |
CN114443783A (en) * | 2022-04-11 | 2022-05-06 | 浙江大学 | Supply chain data analysis and enhancement processing method and device |
CN114443783B (en) * | 2022-04-11 | 2022-06-24 | 浙江大学 | Supply chain data analysis and enhancement processing method and device |
CN115269745A (en) * | 2022-07-27 | 2022-11-01 | 国网江苏省电力有限公司电力科学研究院 | Relational data-to-graph data mapping method, device and storage medium |
CN115269745B (en) * | 2022-07-27 | 2023-11-14 | 国网江苏省电力有限公司电力科学研究院 | Method, equipment and storage medium for mapping relational data to graph data |
CN116340414A (en) * | 2023-05-31 | 2023-06-27 | 北京华云安信息技术有限公司 | Knowledge graph-based attack surface visual modeling method and device |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
CN109597855A (en) | Domain knowledge map construction method and system based on big data driving | |
CN112199511B (en) | Cross-language multi-source vertical domain knowledge graph construction method | |
WO2021196520A1 (en) | Tax field-oriented knowledge map construction method and system | |
CN108573411B (en) | Mixed recommendation method based on deep emotion analysis and multi-source recommendation view fusion of user comments | |
CN112612902A (en) | Knowledge graph construction method and device for power grid main device | |
CN116628172A (en) | Dialogue method for multi-strategy fusion in government service field based on knowledge graph | |
CN110990590A (en) | Dynamic financial knowledge map construction method based on reinforcement learning and transfer learning | |
CN113806563B (en) | Architect knowledge graph construction method for multi-source heterogeneous building humanistic historical material | |
CN111967761B (en) | Knowledge graph-based monitoring and early warning method and device and electronic equipment | |
Rajbhandari et al. | The AGROVOC concept scheme–a walkthrough | |
CN110888943A (en) | Method and system for auxiliary generation of court referee document based on micro-template | |
CN111930774A (en) | Automatic construction method and system for power knowledge graph ontology | |
US9594755B2 (en) | Electronic document repository system | |
CN113487211A (en) | Nuclear power equipment quality tracing method and system, computer equipment and medium | |
Li et al. | Neural factoid geospatial question answering | |
Antopol’skii et al. | The development of a semantic network of keywords based on definitive relationships | |
CN117473054A (en) | Knowledge graph-based general intelligent question-answering method and device | |
Mountantonakis | Services for Connecting and Integrating Big Numbers of Linked Datasets | |
Yin et al. | A deep natural language processing‐based method for ontology learning of project‐specific properties from building information models | |
Maynard et al. | Change management for metadata evolution | |
Behkamal et al. | Publishing Persian linked data; challenges and lessons learned | |
CN115759253A (en) | Power grid operation and maintenance knowledge map construction method and system | |
Ouaret et al. | AuMixDw: Towards an automated hybrid approach for building XML data warehouses | |
Ivanov et al. | Automatic generation of a large dictionary with concreteness/abstractness ratings based on a small human dictionary | |
CN115270776A (en) | Method, system, device and medium for automatically acquiring concepts in domain knowledge base |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | ||
SE01 | Entry into force of request for substantive examination | ||
RJ01 | Rejection of invention patent application after publication | Application publication date: 20190409 |