CN109597855A - Domain knowledge map construction method and system based on big data driving - Google Patents

Domain knowledge map construction method and system based on big data driving

Info

Publication number
CN109597855A
CN109597855A CN201811447248.7A CN201811447248A CN109597855A CN 109597855 A CN109597855 A CN 109597855A CN 201811447248 A CN201811447248 A CN 201811447248A CN 109597855 A CN109597855 A CN 109597855A
Authority
CN
China
Prior art keywords
data
entity
information
knowledge
map construction
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN201811447248.7A
Other languages
Chinese (zh)
Inventor
鄂海红
宋美娜
王宁
杨卓
王园
周康
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Beijing University of Posts and Telecommunications
Original Assignee
Beijing University of Posts and Telecommunications
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Beijing University of Posts and Telecommunications
Priority to CN201811447248.7A
Publication of CN109597855A
Legal status: Pending

Landscapes

  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)

Abstract

The invention discloses a domain knowledge graph (knowledge map) construction method and system driven by big data. The method comprises the following steps: crawling data sources on the network to obtain first data information; performing information extraction on the data sources to extract association information between entities; performing knowledge fusion on the association information between entities and establishing a relational database; and converting the relational database into a graph database model to construct the knowledge graph. The method provides a rigorous and rich data schema, supports complex analysis applications and decision support, achieves high accuracy, and offers practical guidance and industrial relevance for implementing knowledge graphs.

Description

Domain knowledge map construction method and system based on big data driving
Technical field
The present invention relates to the technical field of information processing, and in particular to a domain knowledge graph (knowledge map) construction method and system driven by big data.
Background art
A domain knowledge graph is a semantic network built from the entities, and the semantic relations between entities, extracted from the specific resources of a specific domain; the knowledge system it contains is usually highly domain-specific and specialized. However, existing patents on domain knowledge graph construction, in China and abroad, each emphasize a single aspect of the construction pipeline in isolation. They mainly concern key techniques for the natural language processing part of a knowledge graph, including entity recognition, relation recognition, entity linking, knowledge fusion and knowledge computation, as well as problems such as data representation, storage format, and knowledge acquisition methods and models. A further problem is that knowledge is made of data, so building a knowledge graph requires the support of a big data platform, yet existing work rarely describes the big data processing flow of the construction process and therefore offers little guidance for actual implementation.
As a new knowledge representation method, the knowledge graph belongs to the Semantic Web field; its goal is to describe the entities and concepts that exist in the real world and the relations between them. By coverage, knowledge graphs are divided into general knowledge graphs and domain knowledge graphs. Most published knowledge graphs are general knowledge graphs, which emphasize breadth, are mainly used in services such as search, and do not demand very high accuracy.
For example: (1) the related art discloses a knowledge graph construction method for a vertical domain, which extracts the class vocabulary of online encyclopedias, merges the hypernym-hyponym relations between classes, defines the domain's data properties and relation properties, and finally completes learning at the entity layer; (2) the related art discloses a knowledge graph construction method based on the connections between knowledge points, which obtains meta-knowledge points and builds a knowledge point database composed of them, selects, according to the teaching content, a meta-knowledge point that characterizes a piece of knowledge and combines it with the basic knowledge points on which it depends, determines the path length of each meta-knowledge point relative to the first meta-knowledge point in the combination of basic knowledge points, and builds the knowledge graph from the dependency levels and path lengths; (3) the related art discloses a Chinese tourism-domain knowledge graph construction method and system, which integrates a hybrid entity attribute knowledge expansion method based on lexical fields, supervised learning and pattern matching with an entity attribute knowledge expansion algorithm based on search-engine question answering to accomplish the tourism-domain knowledge graph construction task.
Summary of the invention
The present invention aims to solve, at least to some extent, one of the technical problems in the related art.
To this end, one object of the present invention is to provide a domain knowledge graph construction method driven by big data. The method provides a rigorous and rich data schema, supports complex analysis applications and decision support, achieves high accuracy, and offers guidance and industrial relevance for implementing knowledge graphs in practice.
Another object of the present invention is to provide a domain knowledge graph construction system driven by big data.
To achieve the above objects, one aspect of the present invention provides a domain knowledge graph construction method driven by big data, comprising the following steps: crawling data sources on the network to obtain first data information; performing information extraction on the data sources to extract association information between entities; performing knowledge fusion on the association information between the entities and establishing a relational database; and converting the relational database into a graph database model to construct the knowledge graph.
The domain knowledge graph construction method driven by big data according to the embodiments of the present invention covers every stage of the knowledge graph construction pipeline and provides practical technical guidance for building domain knowledge graphs, so that the resulting graph is highly accurate, has a rich and rigorous data schema, and can support complex analysis and decision making. The construction process offers guidance and industrial relevance, and is of practical significance for real-world production and daily life.
In addition, the domain knowledge graph construction method driven by big data according to the above embodiments of the present invention may have the following additional technical features:
Further, in one embodiment of the present invention, the data sources include structured data, semi-structured data and unstructured data.
Further, in one embodiment of the present invention, performing information extraction on the data sources comprises: extracting structured information on entities, relations and entity attributes from the semi-structured and unstructured data of the data sources, to obtain the association information.
Further, in one embodiment of the present invention, performing knowledge fusion on the association information between the entities comprises: extracting information features from the association information between the entities, so as to eliminate concept ambiguity and remove redundant and erroneous concepts; and performing entity linking on the information features to obtain relational data.
Further, in one embodiment of the present invention, performing entity linking on the information features comprises: linking the information features to the corresponding correct entity objects in a knowledge base.
Further, in one embodiment of the present invention, performing knowledge fusion on the association information between the entities and establishing the relational database further comprises: extracting entity mentions; detecting, from the entity mentions, whether entities with the same name denote different meanings and whether differently named entities denote the same meaning, so as to perform entity disambiguation and coreference resolution; and, after the corresponding entity object in the knowledge base has been confirmed, linking the entity mention to that entity object.
Further, in one embodiment of the present invention, the method further comprises: after a preset duration, crawling the data sources again to obtain second data information; determining, from the second data information, whether the first data information has changed; and, if the first data information has changed, obtaining the changed data and converting it into the graph database model so that it is merged into the knowledge graph.
To achieve the above objects, another aspect of the present invention provides a domain knowledge graph construction system driven by big data, comprising: an acquisition module, configured to crawl data sources on the network and obtain first data information; a processing module, configured to perform information extraction on the data sources to extract association information between entities; a storage module, configured to perform knowledge fusion on the association information between the entities and establish a relational database; and a construction module, configured to convert the relational database into a graph database model to construct the knowledge graph.
The domain knowledge graph construction system driven by big data according to the embodiments of the present invention covers every stage of the knowledge graph construction pipeline and provides practical technical guidance for building domain knowledge graphs, so that the resulting graph is highly accurate, has a rich and rigorous data schema, and can support complex analysis and decision making. The construction process offers guidance and industrial relevance, and is of practical significance for real-world production and daily life.
In addition, the domain knowledge graph construction system driven by big data according to the above embodiments of the present invention may have the following additional technical features:
Further, in one embodiment of the present invention, the data sources include structured data, semi-structured data and unstructured data.
Further, in one embodiment of the present invention, the system further comprises: an update module, configured to crawl the data sources again after a preset duration to obtain second data information, determine from the second data information whether the first data information has changed, and, if it has, convert the changed data into the graph database model so that it is merged into the knowledge graph.
Additional aspects and advantages of the present invention will be set forth in part in the following description, and in part will become apparent from the description or be learned through practice of the invention.
Brief description of the drawings
The above and/or additional aspects and advantages of the present invention will become apparent and readily understood from the following description of the embodiments taken in conjunction with the accompanying drawings, in which:
Fig. 1 is a framework diagram of the formal definition of a knowledge graph according to one embodiment of the present invention;
Fig. 2 is a flowchart of the domain knowledge graph construction method driven by big data according to one embodiment of the present invention;
Fig. 3 is a diagram of the conversion from the relational data schema to the graph database schema according to one embodiment of the present invention;
Fig. 4 is a framework diagram of domain knowledge graph construction driven by big data according to one embodiment of the present invention;
Fig. 5 is a flowchart of retrieving "Facial Recognition" on Wikipedia according to one specific embodiment of the present invention;
Fig. 6 is a data update flowchart according to another specific embodiment of the present invention;
Fig. 7 is a schematic structural diagram of the domain knowledge graph construction system driven by big data according to one embodiment of the present invention.
Detailed description of the embodiments
Embodiments of the present invention are described in detail below, and examples of the embodiments are shown in the accompanying drawings, in which identical or similar reference numerals throughout denote identical or similar elements or elements with identical or similar functions. The embodiments described below with reference to the drawings are exemplary; they are intended to explain the present invention and should not be construed as limiting it.
First, the formal definition of a knowledge graph is as follows. Logically, a knowledge graph can be divided into two levels: a data layer and a schema layer. In the data layer, knowledge is stored in a graph database in units of facts, with "entity-relation-entity" or "entity-attribute-attribute value" triples as the basic way of expressing a fact; all the facts stored in the graph database together form a huge entity-relationship network, which constitutes the knowledge graph. The schema layer sits on top of the data layer and is the core of the knowledge graph. It stores refined knowledge and is usually managed with an ontology library, whose support for axioms, rules and constraints is used to regulate entities, relations, the types and attributes of entities, and the connections between these objects.
Accordingly, the embodiments of the present invention define a knowledge graph as follows: a knowledge graph G consists of a schema graph G_s, a data graph G_d and the relation R between the two, as expressed in formula (1).
$G = \langle G_s, G_d, R \rangle \quad (1)$
$G_s = \langle N_s, E_s \rangle \quad (2)$
$G_d = \langle N_d, E_d \rangle \quad (3)$
As shown in Fig. 1, the schema graph G_s consists of N_s and E_s, as expressed in formula (2), where N_s denotes the set of class nodes and E_s denotes the set of attribute edges. A class (node) in G_s is a concept in the knowledge graph, and an attribute (edge) corresponds to a semantic relation between concepts. The data graph G_d consists of N_d and E_d, as expressed in formula (3), where N_d denotes the set of instance nodes, that is, the real-world entities in the knowledge graph, and E_d denotes the set of instance relation edges, each of which connects two nodes to represent one triple fact.
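As a minimal illustrative sketch of formulas (1) to (3), and not part of the original disclosure, the definition can be written as two small data structures. The class names, the `instance_of` mapping used to represent R, and the consistency rule are assumptions chosen for illustration; the source only states that R relates the schema graph and the data graph.

```python
from dataclasses import dataclass, field


@dataclass
class Graph:
    """A generic graph: a set of nodes and a set of labelled edges."""
    nodes: set = field(default_factory=set)
    # Each edge is a (source, label, target) triple.
    edges: set = field(default_factory=set)


@dataclass
class KnowledgeGraph:
    """G = <G_s, G_d, R>: schema graph, data graph, and the relation between them."""
    schema: Graph                                     # G_s = <N_s, E_s>
    data: Graph                                       # G_d = <N_d, E_d>
    # R, modelled here (an assumption) as a map from data node to schema class.
    instance_of: dict = field(default_factory=dict)

    def add_fact(self, subj, pred, obj):
        """Add one entity-relation-entity triple to the data graph."""
        self.data.nodes.update({subj, obj})
        self.data.edges.add((subj, pred, obj))

    def is_consistent(self):
        """Check every data edge against the schema graph via R (assumed rule)."""
        for subj, pred, obj in self.data.edges:
            expected = (self.instance_of.get(subj), pred, self.instance_of.get(obj))
            if expected not in self.schema.edges:
                return False
        return True


# Hypothetical example data.
schema = Graph(nodes={"Person", "Company"},
               edges={("Person", "works_for", "Company")})
kg = KnowledgeGraph(schema=schema, data=Graph())
kg.instance_of.update({"Alice": "Person", "Acme": "Company"})
kg.add_fact("Alice", "works_for", "Acme")
```

Keeping the schema and data graphs separate lets the schema layer constrain which triples the data layer may hold, which is the role the text assigns to the ontology library.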
The domain knowledge graph construction method and system driven by big data according to the embodiments of the present invention are described below with reference to the drawings, beginning with the method.
Fig. 2 is a flowchart of the domain knowledge graph construction method driven by big data according to one embodiment of the present invention.
As shown in Fig. 2, the domain knowledge graph construction method driven by big data comprises the following steps:
In step S101, data sources on the network are crawled and first data information is obtained.
The data sources include structured data, semi-structured data and unstructured data.
Specifically, structured data includes the large volume of Linked Open Data and the domain knowledge stored in relational databases. Semi-structured data includes the infoboxes provided by encyclopedia websites such as Wikipedia, Hudong Baike and Baidu Baike, as well as the many tables and lists found on vertical websites in different domains. Unstructured data refers to the large amount of plain text in network data; it has the widest knowledge coverage but is also the hardest to extract from. It usually has to be preprocessed with natural language processing (NLP) techniques, including word segmentation, part-of-speech tagging, named entity recognition and syntactic analysis, after which knowledge is obtained through techniques such as statistical analysis and machine learning. Most of the data sources used to build a knowledge graph come from network resources and have to be obtained by crawlers.
In step S102, information extraction is performed on the data sources to extract the association information between entities.
Further, performing information extraction on the data sources includes: extracting structured information on entities, relations and entity attributes from the semi-structured and unstructured data of the data sources, to obtain the association information.
Specifically, information extraction is the first step in building a knowledge graph. It is a technique for automatically extracting structured information, such as entities, relations and entity attributes, from semi-structured and unstructured data.
Entity extraction, also called named entity recognition, automatically identifies named entities in a text data set. The quality of entity extraction strongly affects the efficiency and quality of subsequent knowledge acquisition, so it is the most basic and critical part of information extraction.
Relation extraction: entity extraction on a text corpus yields a series of discrete named entities. To obtain semantic information, the relations between entities must also be extracted from the relevant corpus; linking entities (concepts) through these relations forms a networked knowledge structure.
The goal of entity attribute extraction is to collect the attribute information of a specific entity from different information sources. For a public figure, for example, information such as nickname, birthday, nationality and education background can be obtained from public information on the network. Attribute extraction collects this information from multiple data sources to give a complete picture of the entity's attributes.
It should be noted that entity extraction and relation extraction are mainly implemented with machine learning models, whereas attribute extraction relies mainly on semi-structured data on the network such as infoboxes.
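To illustrate attribute extraction from infobox-like semi-structured data, the following is a minimal sketch under stated assumptions: the infobox has already been scraped into plain "key: value" lines, and the function name `infobox_to_triples` and the sample entity are hypothetical. Real infoboxes are HTML or wiki markup and need a parser first.

```python
def infobox_to_triples(entity, infobox_text):
    """Turn 'key: value' lines into (entity, attribute, value) triples."""
    triples = []
    for line in infobox_text.splitlines():
        if ":" not in line:
            continue  # skip blank lines and lines that are not key-value pairs
        key, value = line.split(":", 1)
        key, value = key.strip(), value.strip()
        if key and value:
            triples.append((entity, key, value))
    return triples


# Hypothetical sample input; not taken from any real page.
sample = """
Nickname: Ex
Birthday: 1990-01-01
Nationality: Exampleland
"""
triples = infobox_to_triples("Example Person", sample)
```

Each output triple has the "entity-attribute-attribute value" shape that the data layer stores.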
In step S103, knowledge fusion is performed on the association information between entities, and a relational database is established.
Further, in one embodiment of the present invention, performing knowledge fusion on the association information between entities includes: extracting information features from the association information between entities, so as to eliminate concept ambiguity and remove redundant and erroneous concepts; and performing entity linking on the information features to obtain relational data.
In addition, step S103 further includes: extracting entity mentions; detecting, from the entity mentions, whether entities with the same name denote different meanings and whether differently named entities denote the same meaning, so as to perform entity disambiguation and coreference resolution; and, after the corresponding entity object in the knowledge base has been confirmed, linking the entity mention to that entity object.
It should be noted that information extraction achieves the goal of obtaining entities, relations and entity attribute information from unstructured and semi-structured data. However, the results may contain a large amount of redundant and erroneous information, and the relations between the data are flat, lacking hierarchy and logic, so the results must be cleaned and integrated. Knowledge fusion eliminates concept ambiguity and removes redundant and erroneous concepts, thereby ensuring the quality of the knowledge. Knowledge fusion includes entity linking and knowledge merging.
Entity linking is the operation of linking an entity object extracted from text to the corresponding correct entity object in the knowledge base. The general flow is: obtain entity mentions from the text through entity extraction; then perform entity disambiguation and coreference resolution, determining whether same-named entities in the knowledge base denote a different meaning from the mention and whether differently named entities in the knowledge base denote the same meaning; finally, once the corresponding correct entity object in the knowledge base has been confirmed, link the entity mention to that entity. The most important task in the entity linking step is therefore to build an accurate and rich thesaurus.
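The role of the thesaurus in entity linking can be sketched as follows. This is a simplified illustration that covers only alias resolution (mapping different names to one entity); disambiguating same-named entities would need context and is omitted. The function name, the dictionaries and the entity identifiers are all hypothetical.

```python
def link_mention(mention, thesaurus, knowledge_base):
    """Resolve an entity mention to a knowledge-base entity id, or None."""
    key = mention.strip().lower()
    # Coreference step: map an alias to its canonical name if one is known.
    canonical = thesaurus.get(key, key)
    # Linking step: look the canonical name up in the knowledge base.
    return knowledge_base.get(canonical)


# Hypothetical thesaurus (alias -> canonical name) and knowledge base.
thesaurus = {
    "face recognition": "facial recognition",
    "face id": "facial recognition",
}
knowledge_base = {
    "facial recognition": "E001",
    "face detection": "E002",
}
```

For example, `link_mention("Face Recognition", thesaurus, knowledge_base)` resolves the alias and returns the identifier stored for "facial recognition", while an unknown mention returns `None` and could be queued for manual review.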
Knowledge merging: when building a knowledge graph, knowledge can also be taken as input from third-party knowledge base products or from existing structured data. For example, Linked Open Data projects regularly publish the semantic knowledge data they have accumulated and curated.
In step S104, the relational database is converted into a graph database model to construct the knowledge graph.
Performing entity linking on the information features includes: linking the information features to the corresponding correct entity objects in the knowledge base.
Specifically, as shown in Fig. 3, the relational data schema obtained from the preceding processing (including entities and entity relations, entity attributes and entity attribute values) is converted into a graph database schema. The conversion from a relational database to a graph database schema generally follows these principles:
(1) Each node label is given by the table name of an entity table; that is, the table name of the entity table is used as the node label name. For example, if a data table is named "enterprise", a node type with the label "enterprise" is created.
(2) Each row in an entity table corresponds to one node. Each row of a relational table fully describes one entity and its attribute values, and also determines the node's globally unique identifier.
(3) Columns of a relational table become node properties. In a row of data, every field other than the unique identifier supplements and describes the node, and is therefore used as a node property.
(4) Tables that describe associations between entities are converted into relationships, and the columns of these tables become relationship properties. Between relational tables, a foreign key points to a primary key; in a graph database this structure is exactly a relationship between nodes, so the columns of the table are converted into properties of the relationship.
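The four conversion principles above can be sketched in a few lines. This is an illustrative sketch only: tables are represented as lists of dictionaries, the function names and the sample "enterprise" and "invests_in" data are hypothetical, and a real system would write the result to a graph database rather than return Python objects.

```python
def rows_to_nodes(table_name, rows, key):
    """Rules (1)-(3): table name -> label, row -> node, columns -> properties."""
    nodes = {}
    for row in rows:
        node_id = (table_name, row[key])  # label + primary key is globally unique
        props = {k: v for k, v in row.items() if k != key}
        nodes[node_id] = {"label": table_name, "properties": props}
    return nodes


def rows_to_edges(rel_name, rows, src, dst):
    """Rule (4): association table -> relationships with relationship properties."""
    src_table, src_col = src
    dst_table, dst_col = dst
    edges = []
    for row in rows:
        # Columns other than the two foreign keys become relationship properties.
        props = {k: v for k, v in row.items() if k not in (src_col, dst_col)}
        edges.append(((src_table, row[src_col]), rel_name,
                      (dst_table, row[dst_col]), props))
    return edges


# Hypothetical relational data.
enterprises = [{"id": 1, "name": "Acme"}, {"id": 2, "name": "Globex"}]
investments = [{"investor_id": 1, "target_id": 2, "share": 0.3}]

nodes = rows_to_nodes("enterprise", enterprises, key="id")
edges = rows_to_edges("invests_in", investments,
                      src=("enterprise", "investor_id"),
                      dst=("enterprise", "target_id"))
```

Using the pair of table name and primary key as the node identifier keeps identifiers unique across entity tables, which matches rule (2).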
Further, the domain knowledge graph construction method of the embodiments of the present invention further includes: after a preset duration, crawling the data sources again to obtain second data information; determining, from the second data information, whether the first data information has changed; and, if the first data information has changed, obtaining the changed data, converting it into the graph database model and merging it into the knowledge graph.
It should be noted that the amount of information and knowledge possessed by humanity increases monotonically with time, so the content of a knowledge graph must also keep pace with the times, and its construction is a process of continuous iterative updating. Logically, updating a knowledge base comprises updating the concept layer and updating the data layer. Updating the concept layer means that when newly added data yields new concepts, the new concepts are added to the concept layer of the knowledge base (after manual participation and review). Updating the data layer mainly means adding or updating entities, relations and attribute values; it must take into account factors such as the reliability of the data source and the consistency of the data (whether contradictions or redundancy exist). Compared with concept-layer updates, data-layer updates can be completed automatically, and the data is stored only after passing through the three processes of information extraction, knowledge fusion and graph construction.
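The change check between the first and second data information can be sketched as a comparison of two snapshots. This is a minimal illustration, assuming each crawl result has been reduced to a dictionary keyed by a record identifier; the function name and the sample records are hypothetical, and the source does not specify how changes are detected.

```python
def diff_snapshots(first, second):
    """Return records added, changed or removed between two crawls."""
    added = {k: v for k, v in second.items() if k not in first}
    changed = {k: v for k, v in second.items() if k in first and first[k] != v}
    removed = [k for k in first if k not in second]
    return added, changed, removed


# Hypothetical first and second data information.
first_crawl = {"e1": {"name": "Acme", "city": "A"}, "e2": {"name": "Globex"}}
second_crawl = {"e1": {"name": "Acme", "city": "B"}, "e3": {"name": "Initech"}}

added, changed, removed = diff_snapshots(first_crawl, second_crawl)
```

Only the `added` and `changed` records would then pass through information extraction, knowledge fusion and graph construction again, which avoids rebuilding the whole graph on every crawl.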
The embodiments of the present invention are explained in detail below in terms of the big data technologies needed to build a knowledge graph.
As shown in Fig. 4, the required big data technologies may include: a data acquisition subsystem, a data processing subsystem, a data storage subsystem and a data update subsystem.
(1) Data acquisition subsystem: raw data is collected from industrial sustainability data, third-party databases and Web logs and imported into result files in HDFS. In addition to collecting raw data, this process can also build a thesaurus using the synonymous-entity expansion method based on encyclopedia website crawlers proposed in this patent, so as to support entity linking during knowledge fusion.
Specifically, the embodiments of the present invention can implement the crawler with the Python Scrapy framework to obtain network data, use Sqoop to package each import as a MapReduce job that is submitted to the Hadoop distributed environment and fetches data from the data sources in parallel, and finally generate result files in HDFS.
In this process, besides collecting the raw data, an accurate and rich thesaurus can also be built through the synonymous-entity expansion method based on encyclopedia website crawlers, to support entity linking in knowledge fusion. The specific method is as follows:
Using entity E as the initial search term on the network, set the search depth to N and the number of iterations to M, meaning that the top N search results are examined and pages from encyclopedia websites such as Wikipedia, Baidu Baike or MBA Lib are crawled. The "aliases" or "recommended related keywords" on each page are added to the thesaurus and to the search dictionary, to be crawled in the next round. Suppose E1, E2, E3, ..., En are added; the search is then repeated with E1, E2, E3, ..., En as keywords, and stops when the remaining number of iterations reaches 0. The final set of E terms is the entity thesaurus.
For example, as shown in Fig. 5, "Facial Recognition" is searched in the English Wikipedia, and the recommended related keywords are shown in step 1. All recommended keywords from step 1 are then searched; taking "Face Detection" as an example, the recommended related keywords are shown in step 2. Similarly, searching with "Computational Photography" as the keyword gives the recommended related keywords shown in step 3.
The implementation of the embodiment is described in pseudocode terms as follows: entityWordList is the initial entity thesaurus and entitySearchWordList is the initial search dictionary. With entity E as the initial search term, the search depth searchDepth is set to N and the number of iterations searchTimes to M, and the SearchCommonEntity method is called. It first determines whether to continue iterating. If so, it traverses entitySearchWordList: each search term is first added to the entity thesaurus and then used as the query in a call to searchSpider, which returns all search results for the term; getEncyclopedia is then called to filter the returned URLs, keeping only links to encyclopedia sites; these links are traversed and getRelatedWords is called to obtain the related words on each page, which are added to the temporary search word list tempSearchWordList if not already present. After each iteration, all search terms have been added to the thesaurus and all temporary search terms are added to the search dictionary; the purpose is to avoid crawling the same word repeatedly.
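The iteration described above can be sketched in Python as follows. This is an illustrative sketch rather than the patent's own code: the crawler calls are passed in as functions (`search`, `is_encyclopedia` and `related_words` stand in for searchSpider, getEncyclopedia and getRelatedWords), and the stub data below is hypothetical, loosely following the Fig. 5 example, so that the logic runs without network access.

```python
def expand_synonyms(seed, search, is_encyclopedia, related_words, depth, iterations):
    """Iteratively expand a seed entity into an entity thesaurus."""
    thesaurus = []            # plays the role of entityWordList
    search_words = [seed]     # plays the role of entitySearchWordList
    seen = {seed}             # every word ever queued, to avoid re-crawling
    while iterations > 0 and search_words:
        temp_search_words = []  # plays the role of tempSearchWordList
        for term in search_words:
            thesaurus.append(term)
            # Keep the top `depth` results, then only encyclopedia pages.
            urls = [u for u in search(term)[:depth] if is_encyclopedia(u)]
            for url in urls:
                for word in related_words(url):
                    if word not in seen:
                        seen.add(word)
                        temp_search_words.append(word)
        search_words = temp_search_words
        iterations -= 1
    return thesaurus


# Hypothetical stubs standing in for the real crawler.
PAGES = {
    "Facial Recognition": ["Face Detection"],
    "Face Detection": ["Computational Photography", "Facial Recognition"],
    "Computational Photography": [],
}

def fake_search(term):
    return ["wiki://" + term, "shop://" + term]

def fake_is_encyclopedia(url):
    return url.startswith("wiki://")

def fake_related_words(url):
    return PAGES.get(url.split("://", 1)[1], [])


result = expand_synonyms("Facial Recognition", fake_search, fake_is_encyclopedia,
                         fake_related_words, depth=2, iterations=3)
```

Tracking every queued word in `seen` serves the purpose the text gives for the temporary list: the same word is never crawled twice, even when pages recommend each other.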
(2) Data processing subsystem: in most cases the raw data has many problems after it is collected into HDFS and needs to be preprocessed. In this step, machine learning models can also be invoked on Spark to extract entities, relations and attributes, completing the conversion from unstructured (and semi-structured) data to structured data.
Data processing in the embodiments of the present invention is mainly based on Hive for data preprocessing and on Spark for converting unstructured (and semi-structured) data into structured data.
Hive is a data warehouse infrastructure built on Hadoop, providing a mechanism for storing, querying and analyzing large-scale data in HDFS. It can be used for extract, transform and load (ETL) of massive data. Hive defines a simple SQL-like query language (HQL); statements are parsed and converted into a series of MapReduce jobs that perform the data processing, giving users both the table query features of a conventional RDBMS and distributed storage and computation.
Apache Spark is a new-generation big data processing platform that became popular after Hadoop; it is a fast, general-purpose compute engine and is now widely used. Thanks to design improvements, Spark can be nearly 100 times faster than MapReduce for in-memory operation and about 10 times faster for disk-based operation, so Spark is better suited to scenarios that require iterative MapReduce, such as data mining and machine learning. The Spark ecosystem includes components such as Spark Core, Spark SQL, Spark Streaming, MLlib and GraphX, which complement one another to form a powerful one-stop big data processing platform.
(1) Data preprocessing is completed based on Hive
In most cases, raw data that has been collected and loaded into HDFS still has various problems, such as missing fields, erroneous or abnormal values, and inconsistent encodings or names. The data therefore needs to be preprocessed so that the raw input is converted into a form suitable for analysis. Data preprocessing generally consists of three steps: data selection, data table attribute unification, and data cleaning.
Data selection: from the user's raw database, select the table entries that the user designates as of interest and relevant to the knowledge discovery task. The database holds a huge amount of data and covers a wide range, and the data in some tables is entirely unrelated to the task. Without a basic screening step, useless data would enter the mining process and waste resources. Selection is generally done through human-computer cooperation: a person selects the data categories at the higher conceptual level, and a program prepared in advance selects the specific tables and columns in the database.
Data table attribute unification: once the tables to be mined have been selected, preprocessing of their data begins. In preparation for mining, the different names used for the same entity are cleaned up and merged according to a thesaurus, which yields a single, unambiguous data representation. This step corresponds to entity linking in the knowledge graph construction process.
It should be noted that the values of one attribute may be expressed in different units or scales. Student scores, for example, are usually given on a hundred-mark scale, but sometimes on a five-grade scale or with fuzzy labels such as "excellent, good, pass, poor". A standard can be fixed as needed, together with a conversion rule that maps non-standard representations to the standard one. Every change must be recorded for later reference or for use when the data is updated.
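A minimal sketch of such a conversion, assuming the hundred-mark scale is chosen as the standard. The mapping tables, the scale names, and the function `to_hundred_mark` are hypothetical and serve only as an illustration; the source does not specify a particular conversion rule.

```python
# Hypothetical conversion tables: the source does not specify the mapping,
# so these representative scores are illustrative assumptions only.
FIVE_GRADE_TO_HUNDRED = {5: 95.0, 4: 85.0, 3: 75.0, 2: 65.0, 1: 30.0}
LABEL_TO_HUNDRED = {"excellent": 95.0, "good": 85.0, "pass": 65.0, "poor": 30.0}

# Every conversion is recorded so it can be consulted later or replayed
# when the data is updated.
conversion_log = []


def to_hundred_mark(record_id, value, scale):
    """Convert a score expressed in `scale` to the hundred-mark standard."""
    if scale == "hundred":
        return float(value)  # already in the standard representation
    if scale == "five_grade":
        converted = FIVE_GRADE_TO_HUNDRED[int(value)]
    elif scale == "label":
        converted = LABEL_TO_HUNDRED[str(value).strip().lower()]
    else:
        raise ValueError(f"unknown scale: {scale!r}")
    conversion_log.append(
        {"id": record_id, "from": value, "scale": scale, "to": converted}
    )
    return converted
```

Keeping the log separate from the converted values means the original representation can be recovered if the chosen standard is revised later.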
Data cleaning: once the two preceding steps are complete, the framework and specification of the mining database are fixed. The data itself is then processed. The main problems to solve are missing values, erroneous data, noisy data, and outliers.
A. Handling missing values. The following methods can be used: 1. Ignore: when several attribute values of a tuple are missing, the tuple is usually ignored, that is, deleted from the data table. 2. Fill: when only a few attribute values of a tuple are missing, the missing values are generally filled in. There are several ways to fill them: manually, with a global constant, or with the mean of the attribute. An inference tool (a decision tree, for example) can also be applied to the data under the attribute to derive the most probable fill value from the other values. Missing values under different attributes call for different treatments. Values derived with an inference tool are generally considered more reliable and more useful in practice.
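The two strategies can be sketched as follows, assuming numeric attributes and mean filling. The function name `handle_missing` and the threshold `max_missing` are assumptions made for illustration; the source gives no concrete threshold.

```python
from statistics import mean


def handle_missing(rows, max_missing=2):
    """Drop tuples with many missing values, then mean-fill the rest.

    `rows` is a list of dicts with numeric values; None marks a missing value.
    `max_missing` is an assumed threshold, not a value taken from the source.
    """
    # 1. Ignore: discard tuples in which too many attributes are missing.
    kept = [r for r in rows if sum(v is None for v in r.values()) < max_missing]

    # 2. Fill: compute each attribute's mean over the observed values.
    columns = {key for r in kept for key in r}
    means = {}
    for col in columns:
        observed = [r[col] for r in kept if r.get(col) is not None]
        means[col] = mean(observed) if observed else None

    # Replace each remaining None with the mean of its attribute.
    return [
        {col: (means[col] if r[col] is None else r[col]) for col in r}
        for r in kept
    ]
```

Mean filling is only one of the options listed above; a model-based fill would replace the `means` lookup with a prediction from the other attributes.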
B. Handling erroneous data. Tuples containing erroneous data must first be identified; then one decides whether to correct the data or ignore the tuple. A data dictionary usually lays down basic rules for the data. Beyond these rules, things in the real world have constraints of their own, and the data in the database directly represents those real-world entities. For example, an examination score is a real number between 0 and 100 (other, converted representations should also meet this requirement). This is a constraint on the "student score" attribute: if a tuple's value under the attribute falls outside this range, it is erroneous data. Not all constraints are this simple, but a function can always be found to serve as the constraint function. The function may involve a single attribute or several attributes.
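As an illustration, constraint functions can be written as predicates over a tuple and applied together. The score range follows the example in the text; the multi-attribute rule, the pass mark of 60, and all names below are hypothetical.

```python
def score_in_range(row):
    """Single-attribute constraint: a score must lie in [0, 100]."""
    return 0 <= row["score"] <= 100


def passed_matches_score(row):
    """Multi-attribute constraint (hypothetical): `passed` must agree with
    the score, assuming a pass mark of 60."""
    return row["passed"] == (row["score"] >= 60)


CONSTRAINTS = [score_in_range, passed_matches_score]


def split_by_constraints(rows, constraints=CONSTRAINTS):
    """Return (valid_rows, erroneous_rows) according to all constraints."""
    valid, erroneous = [], []
    for row in rows:
        ok = all(check(row) for check in constraints)
        (valid if ok else erroneous).append(row)
    return valid, erroneous
```

The erroneous tuples are returned rather than dropped, so the decision to correct or ignore them stays with the caller.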
C. Handling noisy data. Noisy data includes outliers. Measurements of a variable always deviate somewhat; these deviations are noise, and a large deviation is an outlier. Techniques for handling such deviations are usually called smoothing techniques. Specific methods include: 1. Binning: the data is divided evenly into several bins, and the values in each bin are replaced by the mean, the median, or the boundary values of the bin. After the conversion, the range of variation of the values shrinks accordingly. This is in effect a form of data discretization. 2. Clustering: clustering removes noise and can also reveal outliers. Cluster analysis has its own specialized techniques, which are not detailed here. 3. Regression: linear regression and multiple linear regression analysis can be applied to noise removal.
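A small sketch of binning, assuming equal-frequency bins and smoothing by the bin mean (the median or the bin boundaries could be used instead). The function name `smooth_by_bin_means` is an assumption for illustration.

```python
from statistics import mean


def smooth_by_bin_means(values, n_bins):
    """Equal-frequency binning followed by smoothing with the bin mean.

    Returns the smoothed values in the original input order.
    """
    if n_bins <= 0:
        raise ValueError("n_bins must be positive")
    # Indices of the values in ascending order of value.
    order = sorted(range(len(values)), key=lambda i: values[i])
    smoothed = [0.0] * len(values)
    size, extra = divmod(len(values), n_bins)
    start = 0
    for b in range(n_bins):
        # The first `extra` bins take one more element, so bin sizes
        # differ by at most one.
        end = start + size + (1 if b < extra else 0)
        members = order[start:end]
        if members:
            bin_mean = mean(values[i] for i in members)
            for i in members:
                smoothed[i] = bin_mean
        start = end
    return smoothed
```

For the nine values 4, 8, 15, 21, 21, 24, 25, 28, 34 and three bins, each value is replaced by 9, 22, or 29, which narrows the range of variation within each bin.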
(2) Unstructured and semi-structured data is converted into structured data
Unstructured data refers to the large volume of plain text in network data. It covers the widest range of knowledge but is also the hardest to extract from, and usually has to be handled with natural language processing techniques. Only after unstructured data has been converted into structured data can the conversion from the relational database to the graph database model be carried out and the graph constructed.
In the related art, models and application projects are generally developed independently, and a model is offered as a service by wrapping it in a RESTful API. In a big data environment this raises two problems. First, single-machine processing of large data volumes takes a long time. Second, the execution time of an asynchronous task may exceed the maximum HTTP connection duration, which makes it difficult to monitor the data processing status and abnormal conditions along the way.
In the embodiment of the present invention, by contrast, model invocation and computation are based on Spark, in the following two ways. The first is development based on Spark MLlib, the machine learning library that ships with Spark, which contains a large number of classification, regression, clustering, and dimensionality reduction algorithms. For classification with a random forest, for example, the system's execution engine instantiates a RandomForestClassifier object with the relevant parameters according to the information in the workflow node, calls its fit method on the input data to produce the corresponding Model object, and then serializes and saves the Model through the intermediate data management module for use by subsequent prediction or validation components. This approach ensures the quality of each learning algorithm and keeps pace with the Spark community, so new algorithm components can be added quickly. The second is model development in other languages such as Python or R. Taking Python as an example, tasks can be submitted through pyspark. Even when a task is executed on a single machine, it can run considerably faster than with a traditional execution mode, because Spark loads the data into memory. The problem that an asynchronous task may run longer than the maximum HTTP connection duration can be addressed with middleware of the application framework; for example, the Koa framework in a Node environment can handle asynchronous task processing and exception monitoring.
(3) Data storage subsystem: data storage is mainly responsible for storing the source data and result data produced during acquisition, computation, construction, and updating. The storage format differs across the stages of knowledge graph construction; the graph that is finally generated is stored in Neo4j and HBase.
For example, as shown in Table 1, the data can be divided into four levels according to its nature. The first level is the raw data collected by the crawler, stored mainly in the HDFS file system. The second level is the triple data obtained after data processing and knowledge fusion, including "entity-relationship-entity", "entity-attribute-attribute value", and "relationship-attribute-attribute value" triples; this data is stored in HBase. The third level is the constructed enterprise graph data, stored in the Neo4j graph database and in the HBase database. The fourth level is the graph update data, including the update type and the content triples, stored in the HBase database. Entity attributes can be divided into static attributes and dynamic attributes. Static attributes are those that rarely change and are important and highly discriminative, such as the enterprise name, organization code, and stock code. Dynamic attributes are those that change often and are not essential, such as change records, bidding records, intellectual property, and recruitment information. Static attributes are attached directly to the entity and stored in the Neo4j database; dynamic attributes are stored mainly in the HBase database and are referenced through the unique identifier of the entity.
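The split between static and dynamic attributes can be sketched as below. Plain Python structures stand in for the two stores: the node dictionary for what would be written to Neo4j, and the row tuples for HBase cells keyed by the entity identifier. The attribute names and the function `split_entity` are assumptions for illustration, not the storage schema of the embodiment.

```python
# Attribute names treated as static are an assumption based on the examples
# in the text (enterprise name, organization code, stock code).
STATIC_ATTRIBUTES = {"name", "organization_code", "stock_code"}


def split_entity(entity_id, attributes):
    """Split attributes into a graph-node payload and key-value rows.

    The node payload stands in for the data attached to the entity in the
    graph database; the rows stand in for key-value cells that reference
    the entity through its unique identifier.
    """
    node = {"id": entity_id}
    rows = []
    for key, value in attributes.items():
        if key in STATIC_ATTRIBUTES:
            node[key] = value  # static: attached directly to the entity
        else:
            rows.append((entity_id, key, value))  # dynamic: stored separately
    return node, rows
```

Keeping frequently changing attributes out of the graph node keeps the graph itself small and stable, while the identifier makes the dynamic rows retrievable on demand.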
Table 1
(4) Data update subsystem: to update the knowledge graph incrementally, data acquisition, processing, and storage are run automatically on a timed schedule.
Specifically, to update the knowledge graph incrementally, a scheduling system starts the crawler jobs and the thesaurus crawl; after the data has been stored, it is preprocessed with Hive, the models are invoked, and the updated data is merged into the graph. A reliable scheduling system is vital to the stable operation of the whole system. To guarantee that planned jobs are executed accurately and on time, the cron daemon on Unix and Linux is used. The daemon periodically executes the tasks that users specify in their crontab files.
It should be noted that during crawling the URL of a web page is relatively stable, while its content is subject to change. The crawler can quickly determine whether a page's content has changed by comparing the MD5 digests of the content obtained in two successive crawls.
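The digest comparison can be sketched with Python's standard `hashlib` module. The function names and the dictionary layout (URL mapped to digest or page body) are assumptions for illustration; the source does not describe how digests are stored.

```python
import hashlib


def content_digest(content: str) -> str:
    """Return the MD5 hex digest of a page body."""
    return hashlib.md5(content.encode("utf-8")).hexdigest()


def changed_urls(previous_digests, current_pages):
    """Compare digests from the previous crawl with the current page bodies.

    previous_digests: dict mapping URL -> digest stored after the last crawl.
    current_pages: dict mapping URL -> page body fetched in this crawl.
    Returns the URLs that are new or whose content changed, together with
    the digests to store for the next comparison.
    """
    new_digests = {url: content_digest(body) for url, body in current_pages.items()}
    changed = [
        url
        for url, digest in new_digests.items()
        if previous_digests.get(url) != digest
    ]
    return changed, new_digests
```

Only the pages reported as changed need to pass through extraction and fusion again, which is what makes the update incremental.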
The domain knowledge graph construction method based on big data driving proposed according to the embodiments of the present invention addresses every stage of the knowledge graph construction pipeline and provides practical technical guidance for building domain knowledge graphs, so as to build a domain knowledge graph that is highly accurate, has a rich and rigorous data schema, and can assist complex analysis and decision support. The construction process has guiding value and industrial significance, and is of considerable importance for actual production and daily life.
The domain knowledge graph construction system based on big data driving proposed according to the embodiments of the present invention is described next with reference to the accompanying drawings.
Fig. 7 is a schematic structural diagram of the domain knowledge graph construction system based on big data driving according to one embodiment of the present invention.
As shown in Fig. 7, the domain knowledge graph construction system 10 based on big data driving includes: an acquisition module 100, a processing module 200, a storage module 300, a construction module 400, and an update module 500.
The acquisition module 100 is used to crawl the data source in the network and obtain first data information. The processing module 200 is used to perform information extraction on the data source to extract the association information between entities. The storage module 300 is used to perform knowledge fusion on the association information between entities and to establish a relational database. The construction module 400 is used to convert the relational database into a graph database model to construct the knowledge graph. The update module 500 is used to crawl the data source again after a preset duration to obtain second data information, to judge from the second data source whether the first data source has changed, and, if the data has changed, to convert the changed data into the graph database model so that it is merged into the knowledge graph.
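The conversion from relational rows to a graph model can be illustrated with a small sketch. It assumes that the relational table holds (subject, relation, object) rows and represents the graph with plain dictionaries; the function `rows_to_graph` and the field names are hypothetical and do not reflect the interface of the modules described above.

```python
def rows_to_graph(rows):
    """Convert (subject, relation, object) rows into nodes and edges.

    Each distinct entity becomes one node; each row becomes one directed edge.
    """
    nodes = {}
    edges = []
    for subject, relation, obj in rows:
        for name in (subject, obj):
            # Assign a sequential id the first time an entity is seen.
            nodes.setdefault(name, {"id": len(nodes), "name": name})
        edges.append(
            {
                "source": nodes[subject]["id"],
                "type": relation,
                "target": nodes[obj]["id"],
            }
        )
    return list(nodes.values()), edges
```

The same function can be applied to the changed rows alone when the update module merges new data into an existing graph, provided entity identifiers are resolved against the nodes already stored.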
It should be noted that the foregoing explanation of the embodiment of the domain knowledge graph construction method based on big data driving also applies to the system and is not repeated here.
The domain knowledge graph construction system based on big data driving proposed according to the embodiments of the present invention addresses every stage of the knowledge graph construction pipeline and provides practical technical guidance for building domain knowledge graphs, so as to build a domain knowledge graph that is highly accurate, has a rich and rigorous data schema, and can assist complex analysis and decision support. The construction process has guiding value and industrial significance, and is of considerable importance for actual production and daily life.
In addition, the terms "first" and "second" are used for descriptive purposes only and are not to be understood as indicating or implying relative importance or as implicitly indicating the number of technical features referred to. A feature qualified by "first" or "second" may therefore explicitly or implicitly include at least one such feature. In the description of the present invention, "plurality" means at least two, for example two or three, unless specifically defined otherwise.
In the present invention, unless otherwise expressly specified and limited, the terms "mounted", "connected", "coupled", "fixed", and the like are to be understood broadly. For example, a connection may be fixed, detachable, or integral; it may be mechanical or electrical; it may be direct or indirect through an intermediary; and it may be an internal communication between two elements or an interaction between two elements, unless otherwise expressly limited. A person of ordinary skill in the art can understand the specific meanings of these terms in the present invention according to the specific circumstances.
In the present invention, unless otherwise expressly specified and limited, a first feature being "on" or "under" a second feature may mean that the first and second features are in direct contact, or that they are in indirect contact through an intermediary. Moreover, a first feature being "on", "above", or "over" a second feature may mean that the first feature is directly above or obliquely above the second feature, or merely that the first feature is at a greater horizontal height than the second feature. A first feature being "under", "below", or "beneath" a second feature may mean that the first feature is directly below or obliquely below the second feature, or merely that the first feature is at a lower horizontal height than the second feature.
In the description of this specification, references to "one embodiment", "some embodiments", "an example", "a specific example", or "some examples" mean that a specific feature, structure, material, or characteristic described in connection with that embodiment or example is included in at least one embodiment or example of the present invention. In this specification, such expressions do not necessarily refer to the same embodiment or example. Moreover, the specific features, structures, materials, or characteristics described may be combined in any suitable manner in any one or more embodiments or examples. In addition, provided they do not conflict with one another, those skilled in the art may combine the different embodiments or examples described in this specification and the features of those embodiments or examples.
Although embodiments of the present invention have been shown and described above, it is to be understood that the above embodiments are exemplary and are not to be construed as limiting the present invention; those of ordinary skill in the art may change, modify, replace, and vary the above embodiments within the scope of the present invention.

Claims (10)

1. A domain knowledge graph construction method based on big data driving, characterized by comprising the following steps:
crawling a data source in a network, and obtaining first data information;
performing information extraction on the data source to extract association information between entities;
performing knowledge fusion on the association information between the entities, and establishing a relational database; and
converting the relational database into a graph database model to construct a knowledge graph.
2. The domain knowledge graph construction method based on big data driving according to claim 1, characterized in that the data source includes structured data, semi-structured data, and unstructured data.
3. The domain knowledge graph construction method based on big data driving according to claim 1, characterized in that performing information extraction on the data source comprises:
extracting structured information on entities, relationships, and entity attributes from the semi-structured and unstructured data of the data source to obtain the association information.
4. The domain knowledge graph construction method based on big data driving according to claim 1, characterized in that performing knowledge fusion on the association information between the entities comprises:
extracting information features from the association information between the entities, so as to eliminate conceptual ambiguity and remove redundant and erroneous concepts;
performing entity linking on the information features to obtain relational data.
5. The domain knowledge graph construction method based on big data driving according to claim 4, characterized in that performing entity linking on the information features comprises:
linking the information features to the corresponding correct entity objects in a knowledge base.
6. The domain knowledge graph construction method based on big data driving according to claim 5, characterized in that performing knowledge fusion on the association information between the entities and establishing a relational database further comprises:
extracting entity mentions;
detecting, from the entity mentions, whether entities with the same name denote different meanings and whether entities with other names denote the same meaning, so as to perform entity disambiguation and coreference resolution;
after the corresponding entity object in the knowledge base has been confirmed, linking the entity mention to the entity object.
7. The domain knowledge graph construction method based on big data driving according to claim 1, characterized by further comprising:
after a preset duration, crawling the data source, and obtaining second data information;
judging from the second data information whether the first data information has changed;
if the first data information has changed, obtaining the changed data, and converting the changed data into the graph database model so as to merge it into the knowledge graph.
8. A domain knowledge graph construction system based on big data driving, characterized by comprising:
an acquisition module, configured to crawl a data source in a network and obtain first data information;
a processing module, configured to perform information extraction on the data source to extract association information between entities;
a storage module, configured to perform knowledge fusion on the association information between the entities and establish a relational database; and
a construction module, configured to convert the relational database into a graph database model to construct a knowledge graph.
9. The domain knowledge graph construction system based on big data driving according to claim 8, characterized in that the data source includes structured data, semi-structured data, and unstructured data.
10. The domain knowledge graph construction system based on big data driving according to claim 8, characterized by further comprising:
an update module, configured to crawl the data source again after a preset duration to obtain second data information, judge from the second data source whether the first data source has changed, and, if the data has changed, convert the changed data into the graph database model so as to merge it into the knowledge graph.
CN201811447248.7A 2018-11-29 2018-11-29 Domain knowledge map construction method and system based on big data driving Pending CN109597855A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201811447248.7A CN109597855A (en) 2018-11-29 2018-11-29 Domain knowledge map construction method and system based on big data driving

Publications (1)

Publication Number Publication Date
CN109597855A true CN109597855A (en) 2019-04-09

Family

ID=65959274

Country Status (1)

Country Link
CN (1) CN109597855A (en)

Cited By (61)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN110147579A (en) * 2019-04-19 2019-08-20 浙江省建工集团有限责任公司 A kind of Building Information Model dynamic cooperation management method
CN110188207A (en) * 2019-05-15 2019-08-30 出门问问信息科技有限公司 Knowledge mapping construction method and device, readable storage medium storing program for executing, electronic equipment
CN110187678A (en) * 2019-04-19 2019-08-30 广东省智能制造研究所 A kind of storage of manufacturing industry process equipment information and digitlization application system
CN110197280A (en) * 2019-05-20 2019-09-03 中国银行股份有限公司 A kind of knowledge mapping construction method, apparatus and system
CN110222199A (en) * 2019-06-20 2019-09-10 青岛大学 A kind of character relation map construction method based on ontology and a variety of Artificial neural network ensembles
CN110232130A (en) * 2019-05-20 2019-09-13 平安科技(深圳)有限公司 Metadata management pedigree generation method, device, computer equipment and storage medium
CN110377704A (en) * 2019-07-22 2019-10-25 北京百度网讯科技有限公司 Detection method, device and the computer equipment of data consistency
CN110457403A (en) * 2019-08-12 2019-11-15 南京星火技术有限公司 The construction method of figure network decision system, method and knowledge mapping
CN110472107A (en) * 2019-08-22 2019-11-19 腾讯科技(深圳)有限公司 Multi-modal knowledge mapping construction method, device, server and storage medium
CN110543571A (en) * 2019-08-07 2019-12-06 北京市天元网络技术股份有限公司 knowledge graph construction method and device for water conservancy informatization
CN110704630A (en) * 2019-04-15 2020-01-17 中国石油大学(华东) Self-optimization mechanism for identified associated graph
CN110727741A (en) * 2019-09-29 2020-01-24 全球能源互联网研究院有限公司 Knowledge graph construction method and system of power system
CN110750651A (en) * 2019-10-16 2020-02-04 同方知网(北京)技术有限公司 Knowledge graph construction method and generation device based on scientific and technological achievements
CN110750650A (en) * 2019-09-30 2020-02-04 中盈优创资讯科技有限公司 Construction method and device of enterprise knowledge graph
CN110929134A (en) * 2019-12-04 2020-03-27 深圳市新国都金服技术有限公司 Investment and financing data management method and device, computer equipment and storage medium
CN110941612A (en) * 2019-11-19 2020-03-31 上海交通大学 Autonomous data lake construction system and method based on associated data
CN110968650A (en) * 2019-10-30 2020-04-07 清华大学 Medical field knowledge graph construction method based on doctor assistance
CN110990585A (en) * 2019-11-29 2020-04-10 上海勘察设计研究院(集团)有限公司 Multi-source data and time sequence processing method and device for constructing industry knowledge graph
CN111078949A (en) * 2019-12-31 2020-04-28 北京明略软件系统有限公司 Product knowledge storage method and device, computer equipment and readable storage medium
CN111090683A (en) * 2019-11-29 2020-05-01 上海勘察设计研究院(集团)有限公司 Engineering field knowledge graph construction method and generation device thereof
CN111125265A (en) * 2019-12-13 2020-05-08 四川蜀天梦图数据科技有限公司 Method and device for generating mapping data based on relational database data
CN111143576A (en) * 2019-12-18 2020-05-12 中科院计算技术研究所大数据研究院 Event-oriented dynamic knowledge graph construction method and device
CN111341456A (en) * 2020-02-21 2020-06-26 中南大学湘雅医院 Method and device for generating diabetic foot knowledge map and readable storage medium
CN111431962A (en) * 2020-02-20 2020-07-17 北京邮电大学 Cross-domain resource access Internet of things service discovery method based on context awareness calculation
CN111444351A (en) * 2020-03-24 2020-07-24 清华苏州环境创新研究院 Method and device for constructing knowledge graph in industrial process field
CN111475503A (en) * 2019-12-27 2020-07-31 北京国双科技有限公司 Virtual knowledge graph construction method and device
CN111552820A (en) * 2020-04-30 2020-08-18 江河瑞通(北京)技术有限公司 Water engineering scheduling data processing method and device
CN111625607A (en) * 2019-12-27 2020-09-04 北京国双科技有限公司 Oil-gas knowledge graph construction method and device, electronic equipment and storage medium
CN111708895A (en) * 2020-05-28 2020-09-25 北京赛博云睿智能科技有限公司 Method and device for constructing knowledge graph system
CN111709527A (en) * 2020-06-15 2020-09-25 北京优特捷信息技术有限公司 Operation and maintenance knowledge map library establishing method, device, equipment and storage medium
CN111708893A (en) * 2020-05-15 2020-09-25 北京邮电大学 Scientific and technological resource integration method and system based on knowledge graph
CN111737488A (en) * 2020-06-12 2020-10-02 南京中孚信息技术有限公司 Information tracing method and device based on domain entity extraction and correlation analysis
CN111861250A (en) * 2020-07-29 2020-10-30 广东电网有限责任公司电力调度控制中心 Scheduling decision generation method and device, electronic equipment and storage medium
CN111858962A (en) * 2020-07-27 2020-10-30 腾讯科技(成都)有限公司 Data processing method, device and computer readable storage medium
CN111897969A (en) * 2020-07-27 2020-11-06 武汉大学 Method and system for analyzing correlation between food components and nutritional health based on knowledge graph
CN112231285A (en) * 2020-10-20 2021-01-15 北京恒华龙信数据科技有限公司 Knowledge graph generation method and device based on data resources
CN112417456A (en) * 2020-11-16 2021-02-26 中国电子科技集团公司第三十研究所 Structured sensitive data reduction detection method based on big data
CN112463984A (en) * 2020-12-04 2021-03-09 北京明略软件系统有限公司 Database mode expansion method, device, equipment and computer readable medium
CN112527997A (en) * 2020-12-18 2021-03-19 中国南方电网有限责任公司 Intelligent question-answering method and system based on power grid field scheduling scene knowledge graph
CN112580831A (en) * 2020-11-19 2021-03-30 国网江苏省电力有限公司信息通信分公司 Intelligent auxiliary operation and maintenance method and system for power communication network based on knowledge graph
CN112580912A (en) * 2019-09-30 2021-03-30 北京国双科技有限公司 Budget auditing method and device, electronic equipment and storage medium
CN112699245A (en) * 2019-10-18 2021-04-23 北京国双科技有限公司 Construction method and device and application method and device of budget management knowledge graph
CN112818131A (en) * 2021-02-01 2021-05-18 亚信科技(成都)有限公司 Method, system and storage medium for constructing graph of threat information
CN112860714A (en) * 2019-11-12 2021-05-28 斑马智行网络(香港)有限公司 Knowledge base, database, information updating method and device
CN112883201A (en) * 2021-03-23 2021-06-01 西安电子科技大学昆山创新研究院 Knowledge graph construction method based on big data of smart community
CN113065003A (en) * 2021-04-22 2021-07-02 国际关系学院 Knowledge graph generation method based on multiple indexes
CN113094515A (en) * 2021-04-13 2021-07-09 国网北京市电力公司 Knowledge graph entity and link extraction method based on electric power marketing data
CN113268602A (en) * 2021-03-29 2021-08-17 江西融思科技有限公司 Tissue knowledge graph construction method and device
CN113434658A (en) * 2021-08-25 2021-09-24 西安热工研究院有限公司 Thermal power generating unit operation question-answer generation method, system, equipment and readable storage medium
CN113449066A (en) * 2021-08-31 2021-09-28 北京泽云瑞弘信息技术有限公司 Method, processor and storage medium for storing cultural relic data by using knowledge graph
CN113569060A (en) * 2021-09-24 2021-10-29 中国电子技术标准化研究院 Standard text based knowledge graph disambiguation method, system, device and medium
CN113742498A (en) * 2021-09-24 2021-12-03 国务院国有资产监督管理委员会研究中心 Method for constructing and updating knowledge graph
CN113987146A (en) * 2021-10-22 2022-01-28 国网江苏省电力有限公司镇江供电分公司 Dedicated novel intelligence of electric power intranet system of asking for answering
CN114090790A (en) * 2021-11-22 2022-02-25 西安交通大学 Human-computer-friendly data logic fusion power knowledge graph and construction method thereof
CN114443783A (en) * 2022-04-11 2022-05-06 浙江大学 Supply chain data analysis and enhancement processing method and device
CN115269745A (en) * 2022-07-27 2022-11-01 国网江苏省电力有限公司电力科学研究院 Relational data-to-graph data mapping method, device and storage medium
US11520828B2 (en) 2020-07-24 2022-12-06 International Business Machines Corporation Methods for representing and storing data in a graph data structure using artificial intelligence
CN116340414A (en) * 2023-05-31 2023-06-27 北京华云安信息技术有限公司 Knowledge graph-based attack surface visual modeling method and device
US11734626B2 (en) * 2020-07-06 2023-08-22 International Business Machines Corporation Cognitive analysis of a project description
US11899681B2 (en) * 2019-09-27 2024-02-13 Boe Technology Group Co., Ltd. Knowledge graph building method, electronic apparatus and non-transitory computer readable storage medium
US11922121B2 (en) 2020-01-21 2024-03-05 Boe Technology Group Co., Ltd. Method and apparatus for information extraction, electronic device, and storage medium

Citations (11)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN105956052A (en) * 2016-04-27 2016-09-21 青岛海尔软件有限公司 Building method of knowledge map based on vertical field
CN106776711A (en) * 2016-11-14 2017-05-31 浙江大学 A kind of Chinese medical knowledge mapping construction method based on deep learning
CN106777331A (en) * 2017-01-11 2017-05-31 北京航空航天大学 Knowledge mapping generation method and device
CN106897273A (en) * 2017-04-12 2017-06-27 福州大学 A kind of network security dynamic early-warning method of knowledge based collection of illustrative plates
CN107491555A (en) * 2017-09-01 2017-12-19 北京纽伦智能科技有限公司 Knowledge mapping construction method and system
CN107766483A (en) * 2017-10-13 2018-03-06 华中科技大学 The interactive answering method and system of a kind of knowledge based collection of illustrative plates
CN107783973A (en) * 2016-08-24 2018-03-09 慧科讯业有限公司 The methods, devices and systems being monitored based on domain knowledge spectrum data storehouse to the Internet media event
CN108345647A (en) * 2018-01-18 2018-07-31 北京邮电大学 Domain knowledge map construction system and method based on Web
CN108460136A (en) * 2018-03-08 2018-08-28 国网福建省电力有限公司 Electric power O&M information knowledge map construction method
CN108509420A (en) * 2018-03-29 2018-09-07 赵维平 Gu spectrum and ancient culture knowledge mapping natural language processing method
CN108846000A (en) * 2018-04-11 2018-11-20 中国科学院软件研究所 A kind of common sense semanteme map construction method and device based on supernode and the common sense complementing method based on connection prediction

Patent Citations (11)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN105956052A (en) * 2016-04-27 2016-09-21 青岛海尔软件有限公司 Building method of knowledge map based on vertical field
CN107783973A (en) * 2016-08-24 2018-03-09 慧科讯业有限公司 Methods, devices and systems for monitoring Internet media events based on a domain knowledge graph database
CN106776711A (en) * 2016-11-14 2017-05-31 浙江大学 Chinese medical knowledge graph construction method based on deep learning
CN106777331A (en) * 2017-01-11 2017-05-31 北京航空航天大学 Knowledge graph generation method and device
CN106897273A (en) * 2017-04-12 2017-06-27 福州大学 Knowledge-graph-based network security dynamic early-warning method
CN107491555A (en) * 2017-09-01 2017-12-19 北京纽伦智能科技有限公司 Knowledge graph construction method and system
CN107766483A (en) * 2017-10-13 2018-03-06 华中科技大学 Knowledge-graph-based interactive question-answering method and system
CN108345647A (en) * 2018-01-18 2018-07-31 北京邮电大学 Domain knowledge map construction system and method based on Web
CN108460136A (en) * 2018-03-08 2018-08-28 国网福建省电力有限公司 Electric power O&M information knowledge map construction method
CN108509420A (en) * 2018-03-29 2018-09-07 赵维平 Gu spectrum and ancient culture knowledge mapping natural language processing method
CN108846000A (en) * 2018-04-11 2018-11-20 中国科学院软件研究所 Common-sense semantic graph construction method and device based on supernodes, and common-sense completion method based on link prediction

Cited By (87)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN110704630A (en) * 2019-04-15 2020-01-17 中国石油大学(华东) Self-optimization mechanism for identified associated graph
CN110187678A (en) * 2019-04-19 2019-08-30 广东省智能制造研究所 Manufacturing process equipment information storage and digitalization application system
CN110147579A (en) * 2019-04-19 2019-08-20 浙江省建工集团有限责任公司 Building Information Model dynamic cooperation management method
CN110188207A (en) * 2019-05-15 2019-08-30 出门问问信息科技有限公司 Knowledge graph construction method and device, readable storage medium and electronic equipment
CN110188207B (en) * 2019-05-15 2021-06-04 出门问问创新科技有限公司 Knowledge graph construction method and device, readable storage medium and electronic equipment
CN110197280A (en) * 2019-05-20 2019-09-03 中国银行股份有限公司 Knowledge graph construction method, apparatus and system
CN110197280B (en) * 2019-05-20 2021-08-06 中国银行股份有限公司 Knowledge graph construction method, device and system
CN110232130A (en) * 2019-05-20 2019-09-13 平安科技(深圳)有限公司 Metadata management pedigree generation method, device, computer equipment and storage medium
CN110232130B (en) * 2019-05-20 2024-02-02 平安科技(深圳)有限公司 Metadata management pedigree generation method, apparatus, computer device and storage medium
CN110222199A (en) * 2019-06-20 2019-09-10 青岛大学 Character relation graph construction method based on ontology and ensembles of multiple artificial neural networks
CN110377704A (en) * 2019-07-22 2019-10-25 北京百度网讯科技有限公司 Data consistency detection method, device and computer equipment
CN110377704B (en) * 2019-07-22 2022-04-22 北京百度网讯科技有限公司 Data consistency detection method and device and computer equipment
CN110543571A (en) * 2019-08-07 2019-12-06 北京市天元网络技术股份有限公司 knowledge graph construction method and device for water conservancy informatization
CN110457403A (en) * 2019-08-12 2019-11-15 南京星火技术有限公司 Graph network decision system and method, and knowledge graph construction method
CN110457403B (en) * 2019-08-12 2022-04-22 南京星火技术有限公司 Graph network decision system and method and knowledge graph construction method
CN110472107B (en) * 2019-08-22 2024-01-30 腾讯科技(深圳)有限公司 Multi-mode knowledge graph construction method, device, server and storage medium
CN110472107A (en) * 2019-08-22 2019-11-19 腾讯科技(深圳)有限公司 Multi-modal knowledge graph construction method, device, server and storage medium
US11899681B2 (en) * 2019-09-27 2024-02-13 Boe Technology Group Co., Ltd. Knowledge graph building method, electronic apparatus and non-transitory computer readable storage medium
CN110727741A (en) * 2019-09-29 2020-01-24 全球能源互联网研究院有限公司 Knowledge graph construction method and system of power system
CN112580912A (en) * 2019-09-30 2021-03-30 北京国双科技有限公司 Budget auditing method and device, electronic equipment and storage medium
CN110750650A (en) * 2019-09-30 2020-02-04 中盈优创资讯科技有限公司 Construction method and device of enterprise knowledge graph
CN110750651B (en) * 2019-10-16 2023-05-26 同方知网数字出版技术股份有限公司 Knowledge graph construction method based on scientific and technological achievements
CN110750651A (en) * 2019-10-16 2020-02-04 同方知网(北京)技术有限公司 Knowledge graph construction method and generation device based on scientific and technological achievements
CN112699245A (en) * 2019-10-18 2021-04-23 北京国双科技有限公司 Construction method and device and application method and device of budget management knowledge graph
CN110968650A (en) * 2019-10-30 2020-04-07 清华大学 Medical field knowledge graph construction method based on doctor assistance
CN112860714A (en) * 2019-11-12 2021-05-28 斑马智行网络(香港)有限公司 Knowledge base, database, information updating method and device
CN110941612B (en) * 2019-11-19 2020-08-11 上海交通大学 Autonomous data lake construction system and method based on associated data
CN110941612A (en) * 2019-11-19 2020-03-31 上海交通大学 Autonomous data lake construction system and method based on associated data
CN111090683B (en) * 2019-11-29 2023-12-22 上海勘察设计研究院(集团)股份有限公司 Knowledge graph construction method and generation device thereof in engineering field
CN110990585B (en) * 2019-11-29 2024-01-30 上海勘察设计研究院(集团)股份有限公司 Multi-source data and time sequence processing method and device for building industry knowledge graph
CN111090683A (en) * 2019-11-29 2020-05-01 上海勘察设计研究院(集团)有限公司 Engineering field knowledge graph construction method and generation device thereof
CN110990585A (en) * 2019-11-29 2020-04-10 上海勘察设计研究院(集团)有限公司 Multi-source data and time sequence processing method and device for constructing industry knowledge graph
CN110929134A (en) * 2019-12-04 2020-03-27 深圳市新国都金服技术有限公司 Investment and financing data management method and device, computer equipment and storage medium
CN111125265A (en) * 2019-12-13 2020-05-08 四川蜀天梦图数据科技有限公司 Method and device for generating mapping data based on relational database data
CN111143576A (en) * 2019-12-18 2020-05-12 中科院计算技术研究所大数据研究院 Event-oriented dynamic knowledge graph construction method and device
CN111625607A (en) * 2019-12-27 2020-09-04 北京国双科技有限公司 Oil-gas knowledge graph construction method and device, electronic equipment and storage medium
CN111475503A (en) * 2019-12-27 2020-07-31 北京国双科技有限公司 Virtual knowledge graph construction method and device
CN111078949A (en) * 2019-12-31 2020-04-28 北京明略软件系统有限公司 Product knowledge storage method and device, computer equipment and readable storage medium
US11922121B2 (en) 2020-01-21 2024-03-05 Boe Technology Group Co., Ltd. Method and apparatus for information extraction, electronic device, and storage medium
CN111431962A (en) * 2020-02-20 2020-07-17 北京邮电大学 Cross-domain resource access Internet of things service discovery method based on context awareness calculation
CN111341456A (en) * 2020-02-21 2020-06-26 中南大学湘雅医院 Method and device for generating diabetic foot knowledge map and readable storage medium
CN111341456B (en) * 2020-02-21 2024-02-23 中南大学湘雅医院 Method and device for generating diabetic foot knowledge graph and readable storage medium
CN111444351B (en) * 2020-03-24 2023-09-12 清华苏州环境创新研究院 Knowledge graph construction method and device in industrial process field
CN111444351A (en) * 2020-03-24 2020-07-24 清华苏州环境创新研究院 Method and device for constructing knowledge graph in industrial process field
CN111552820A (en) * 2020-04-30 2020-08-18 江河瑞通(北京)技术有限公司 Water engineering scheduling data processing method and device
CN111708893A (en) * 2020-05-15 2020-09-25 北京邮电大学 Scientific and technological resource integration method and system based on knowledge graph
CN111708895B (en) * 2020-05-28 2023-06-20 北京赛博云睿智能科技有限公司 Knowledge graph system construction method and device
CN111708895A (en) * 2020-05-28 2020-09-25 北京赛博云睿智能科技有限公司 Method and device for constructing knowledge graph system
CN111737488A (en) * 2020-06-12 2020-10-02 南京中孚信息技术有限公司 Information tracing method and device based on domain entity extraction and correlation analysis
CN111737488B (en) * 2020-06-12 2021-02-02 南京中孚信息技术有限公司 Information tracing method and device based on domain entity extraction and correlation analysis
CN111709527A (en) * 2020-06-15 2020-09-25 北京优特捷信息技术有限公司 Operation and maintenance knowledge map library establishing method, device, equipment and storage medium
US11734626B2 (en) * 2020-07-06 2023-08-22 International Business Machines Corporation Cognitive analysis of a project description
US11520828B2 (en) 2020-07-24 2022-12-06 International Business Machines Corporation Methods for representing and storing data in a graph data structure using artificial intelligence
CN111897969A (en) * 2020-07-27 2020-11-06 武汉大学 Method and system for analyzing correlation between food components and nutritional health based on knowledge graph
CN111858962A (en) * 2020-07-27 2020-10-30 腾讯科技(成都)有限公司 Data processing method, device and computer readable storage medium
CN111861250A (en) * 2020-07-29 2020-10-30 广东电网有限责任公司电力调度控制中心 Scheduling decision generation method and device, electronic equipment and storage medium
CN112231285A (en) * 2020-10-20 2021-01-15 北京恒华龙信数据科技有限公司 Knowledge graph generation method and device based on data resources
CN112417456A (en) * 2020-11-16 2021-02-26 中国电子科技集团公司第三十研究所 Structured sensitive data reduction detection method based on big data
CN112580831B (en) * 2020-11-19 2024-03-29 国网江苏省电力有限公司信息通信分公司 Intelligent auxiliary operation and maintenance method and system for power communication network based on knowledge graph
CN112580831A (en) * 2020-11-19 2021-03-30 国网江苏省电力有限公司信息通信分公司 Intelligent auxiliary operation and maintenance method and system for power communication network based on knowledge graph
CN112463984A (en) * 2020-12-04 2021-03-09 北京明略软件系统有限公司 Database mode expansion method, device, equipment and computer readable medium
CN112463984B (en) * 2020-12-04 2024-02-27 北京明略软件系统有限公司 Database schema extension method, device, equipment and computer readable medium
CN112527997A (en) * 2020-12-18 2021-03-19 中国南方电网有限责任公司 Intelligent question-answering method and system based on power grid field scheduling scene knowledge graph
CN112527997B (en) * 2020-12-18 2024-01-23 中国南方电网有限责任公司 Intelligent question-answering method and system based on power grid field scheduling scene knowledge graph
CN112818131B (en) * 2021-02-01 2023-10-03 亚信科技(成都)有限公司 Map construction method, system and storage medium for threat information
CN112818131A (en) * 2021-02-01 2021-05-18 亚信科技(成都)有限公司 Method, system and storage medium for constructing graph of threat information
CN112883201B (en) * 2021-03-23 2023-11-21 西安电子科技大学昆山创新研究院 Knowledge graph construction method based on big data of intelligent community
CN112883201A (en) * 2021-03-23 2021-06-01 西安电子科技大学昆山创新研究院 Knowledge graph construction method based on big data of smart community
CN113268602A (en) * 2021-03-29 2021-08-17 江西融思科技有限公司 Tissue knowledge graph construction method and device
CN113094515A (en) * 2021-04-13 2021-07-09 国网北京市电力公司 Knowledge graph entity and link extraction method based on electric power marketing data
CN113065003B (en) * 2021-04-22 2023-05-26 国际关系学院 Knowledge graph generation method based on multiple indexes
CN113065003A (en) * 2021-04-22 2021-07-02 国际关系学院 Knowledge graph generation method based on multiple indexes
CN113434658A (en) * 2021-08-25 2021-09-24 西安热工研究院有限公司 Thermal power generating unit operation question-answer generation method, system, equipment and readable storage medium
CN113449066B (en) * 2021-08-31 2021-12-07 北京泽云瑞弘信息技术有限公司 Method, processor and storage medium for storing cultural relic data by using knowledge graph
CN113449066A (en) * 2021-08-31 2021-09-28 北京泽云瑞弘信息技术有限公司 Method, processor and storage medium for storing cultural relic data by using knowledge graph
CN113742498B (en) * 2021-09-24 2024-04-09 国务院国有资产监督管理委员会研究中心 Knowledge graph construction and updating method
CN113569060A (en) * 2021-09-24 2021-10-29 中国电子技术标准化研究院 Standard text based knowledge graph disambiguation method, system, device and medium
CN113742498A (en) * 2021-09-24 2021-12-03 国务院国有资产监督管理委员会研究中心 Method for constructing and updating knowledge graph
CN113987146A (en) * 2021-10-22 2022-01-28 国网江苏省电力有限公司镇江供电分公司 Novel dedicated intelligent question-answering system for electric power intranet
CN113987146B (en) * 2021-10-22 2023-01-31 国网江苏省电力有限公司镇江供电分公司 Dedicated intelligent question-answering system of electric power intranet
CN114090790A (en) * 2021-11-22 2022-02-25 西安交通大学 Human-computer-friendly data logic fusion power knowledge graph and construction method thereof
CN114090790B (en) * 2021-11-22 2024-04-16 西安交通大学 Man-machine friendly data logic fusion power knowledge graph and construction method thereof
CN114443783A (en) * 2022-04-11 2022-05-06 浙江大学 Supply chain data analysis and enhancement processing method and device
CN114443783B (en) * 2022-04-11 2022-06-24 浙江大学 Supply chain data analysis and enhancement processing method and device
CN115269745A (en) * 2022-07-27 2022-11-01 国网江苏省电力有限公司电力科学研究院 Relational data-to-graph data mapping method, device and storage medium
CN115269745B (en) * 2022-07-27 2023-11-14 国网江苏省电力有限公司电力科学研究院 Method, equipment and storage medium for mapping relational data to graph data
CN116340414A (en) * 2023-05-31 2023-06-27 北京华云安信息技术有限公司 Knowledge graph-based attack surface visual modeling method and device

Similar Documents

Publication Publication Date Title
CN109597855A (en) Domain knowledge map construction method and system based on big data driving
CN112199511B (en) Cross-language multi-source vertical domain knowledge graph construction method
WO2021196520A1 (en) Tax field-oriented knowledge map construction method and system
CN108573411B (en) Mixed recommendation method based on deep emotion analysis and multi-source recommendation view fusion of user comments
CN112612902A (en) Knowledge graph construction method and device for power grid main device
CN116628172A (en) Dialogue method for multi-strategy fusion in government service field based on knowledge graph
CN110990590A (en) Dynamic financial knowledge map construction method based on reinforcement learning and transfer learning
CN113806563B (en) Architect knowledge graph construction method for multi-source heterogeneous building humanistic historical material
CN111967761B (en) Knowledge graph-based monitoring and early warning method and device and electronic equipment
Rajbhandari et al. The AGROVOC concept scheme–a walkthrough
CN110888943A (en) Method and system for auxiliary generation of court referee document based on micro-template
CN111930774A (en) Automatic construction method and system for power knowledge graph ontology
US9594755B2 (en) Electronic document repository system
CN113487211A (en) Nuclear power equipment quality tracing method and system, computer equipment and medium
Li et al. Neural factoid geospatial question answering
Antopol’skii et al. The development of a semantic network of keywords based on definitive relationships
CN117473054A (en) Knowledge graph-based general intelligent question-answering method and device
Mountantonakis Services for Connecting and Integrating Big Numbers of Linked Datasets
Yin et al. A deep natural language processing‐based method for ontology learning of project‐specific properties from building information models
Maynard et al. Change management for metadata evolution
Behkamal et al. Publishing Persian linked data; challenges and lessons learned
CN115759253A (en) Power grid operation and maintenance knowledge map construction method and system
Ouaret et al. AuMixDw: Towards an automated hybrid approach for building XML data warehouses
Ivanov et al. Automatic generation of a large dictionary with concreteness/abstractness ratings based on a small human dictionary
CN115270776A (en) Method, system, device and medium for automatically acquiring concepts in domain knowledge base

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
RJ01 Rejection of invention patent application after publication (application publication date: 20190409)