CN110377757A - Real-time knowledge graph construction system - Google Patents

Real-time knowledge graph construction system

Info

Publication number
CN110377757A
Authority
CN
China
Prior art keywords
data
knowledge
consumer
entity
kafka
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Granted
Application number
CN201910642692.2A
Other languages
Chinese (zh)
Other versions
CN110377757B (en)
Inventor
杨仪军
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Beijing Sea - Induced Star Map Technology Co Ltd
Original Assignee
Beijing Sea - Induced Star Map Technology Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
2019-07-16
Publication date
2019-10-25
Application filed by Beijing Sea - Induced Star Map Technology Co Ltd
Priority to CN201910642692.2A
Publication of CN110377757A
Application granted
Publication of CN110377757B
Legal status: Active
Anticipated expiration

Classifications

    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06F: ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00: Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/30: Information retrieval; Database structures therefor; File system structures therefor of unstructured textual data
    • G06F16/36: Creation of semantic tools, e.g. ontology or thesauri
    • G06F16/367: Ontology

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Animal Behavior & Ethology (AREA)
  • Computational Linguistics (AREA)
  • Data Mining & Analysis (AREA)
  • Databases & Information Systems (AREA)
  • Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)
  • Management, Administration, Business Operations System, And Electronic Commerce (AREA)

Abstract

The invention discloses a real-time knowledge graph construction system and relates to the technical field of graph analysis platforms. The system comprises an application part, a knowledge storage module, a knowledge consumption module, a knowledge bus, a data source bus and a data source module. The specific work of each part is as follows: A1, the data source module is responsible for collecting data and feeding it into the KAFKA message queue; B1, the data source bus is responsible for transmitting the basic data from which entities/relationships are to be extracted; C1, the knowledge extraction module is responsible for consuming the basic data in KAFKA with Spark Streaming and performing entity extraction and relationship extraction. Through the cooperation of the application part, the knowledge storage module, the data source bus, the data source module and the other modules, the system uses Spark Streaming to consume the basic data in KAFKA for entity and relationship extraction and records failed data in external storage using HBase, which addresses both the loss or duplication of extracted data and the limitation of single-source data access.

Description

Real-time knowledge graph construction system
Technical field
The present invention relates to the technical field of graph analysis platforms, and specifically to a real-time knowledge graph construction system.
Background
Kafka is an open-source stream-processing platform written in Scala and Java. It is a high-throughput distributed publish-subscribe messaging system that can handle all the action stream data generated by consumers on a website. Such actions (page views, searches and other user activity) are a key ingredient of many social features on the modern web. Because of throughput requirements, this data is usually handled through log processing and log aggregation. That is a workable solution for log data and offline analysis systems such as Hadoop, but it is limiting when real-time processing is required. Kafka aims to unify online and offline message processing through Hadoop's parallel loading mechanism, and to provide real-time messages through a cluster.
In the prior art, real-time graph construction is rarely applied in practice; graph relationships are essentially built on a T+1 basis (once a day). Accurate extraction results therefore cannot be obtained in a single pass, which makes such systems ill-suited to businesses that demand high data accuracy. Knowledge extraction also suffers from high latency, extracted data is easily lost or duplicated, few data sources are supported or considered, and the schemes scale poorly, all of which inconveniences users.
Summary of the invention
(1) Technical problem to be solved
In view of the deficiencies of the prior art, the present invention provides a real-time knowledge graph construction system. It addresses the following problems of the prior art: real-time graph construction is rarely applied in practice and graph relationships are essentially built on a T+1 basis (once a day), so accurate extraction results cannot be obtained in a single pass and businesses that demand high data accuracy are poorly served; knowledge extraction has high latency; extracted data is easily lost or duplicated; few data sources are supported or considered; and the schemes scale poorly.
(2) Technical solution
To achieve the above object, the present invention adopts the following technical solution: a real-time knowledge graph construction system comprising an application part, a knowledge storage module, a knowledge consumption module, a knowledge bus, a data source bus and a data source module.
The specific work of each part is as follows:
A1. The data source module is responsible for collecting data and feeding it into the KAFKA message queue;
B1. The data source bus is responsible for transmitting the basic data from which entities/relationships are to be extracted;
C1. The knowledge extraction module is responsible for consuming the basic data in KAFKA with Spark Streaming and performing entity extraction and relationship extraction;
D1. The knowledge bus is responsible for transmitting the entity/relationship data extracted by Spark Streaming;
E1. The knowledge consumption module is responsible for consuming the entity/relationship data in KAFKA and writing it to the corresponding external storage, providing the data for real-time graph queries;
F1. The application part is responsible for graph queries in various real-time query scenarios.
KAFKA cluster 1, consisting of Server1-3, is provided in the data source bus; KAFKA cluster 11, consisting of Server11-13, is provided in the knowledge bus; each Server contains two operating units.
The data sources include business server log files, external REST API requests and external data storage.
The knowledge consumption module includes a GDB consumer, an HBase consumer and an ES consumer, whose roles are as follows:
A2. The HBase consumer program reads entities/relationships from the message queue and writes them to the HBase database, for later viewing of entity and relationship details;
B2. The GDB consumer program reads entities/relationships from the message queue and writes them to the GDB (graph database), for later real-time graph queries;
C2. The ES consumer reads entities from the message queue and writes them to Elasticsearch, where they serve as a secondary index on entities for later real-time queries.
Query scenarios of the application, such as K-layer expansion, shortest path, full path and community discovery, are served by calling the APIs of the corresponding stores (ES, GDB, HBase) to perform graph relationship lookups and detail queries.
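By way of illustration only, the division of labour among the three consumers can be sketched as follows. This is a minimal in-memory sketch, not the patented implementation: plain dictionaries stand in for HBase, the graph database and Elasticsearch, and the message fields (`kind`, `id`, `name`, `type`, `src`, `dst`, `label`) and row-key layout are assumptions made for the example, since the source does not define a message schema.

```python
from collections import defaultdict

# Hypothetical knowledge-bus messages; the field names are assumptions.
KNOWLEDGE_MESSAGES = [
    {"kind": "entity", "id": "p1", "name": "Alice", "type": "Person"},
    {"kind": "entity", "id": "c1", "name": "Acme", "type": "Company"},
    {"kind": "relation", "src": "p1", "dst": "c1", "label": "works_at"},
]


def hbase_consumer(messages):
    """Keep full entity/relationship details, keyed by a row key."""
    table = {}
    for m in messages:
        if m["kind"] == "entity":
            row_key = "e:" + m["id"]
        else:
            row_key = "r:{}:{}:{}".format(m["src"], m["label"], m["dst"])
        table[row_key] = m
    return table


def gdb_consumer(messages):
    """Build an adjacency list used for real-time graph queries."""
    graph = defaultdict(set)
    for m in messages:
        if m["kind"] == "entity":
            graph[m["id"]]  # touch the key so isolated vertices exist
        else:
            graph[m["src"]].add(m["dst"])
    return graph


def es_consumer(messages):
    """Index entities only: lower-cased name -> entity IDs."""
    index = defaultdict(set)
    for m in messages:
        if m["kind"] == "entity":
            index[m["name"].lower()].add(m["id"])
    return index


# Each consumer reads the full message stream independently.
details = hbase_consumer(KNOWLEDGE_MESSAGES)
graph = gdb_consumer(KNOWLEDGE_MESSAGES)
name_index = es_consumer(KNOWLEDGE_MESSAGES)
```

In a Kafka deployment, one way to obtain this fan-out is to give each consumer program its own consumer group, so that every group receives every message on the topic; whether the patented system does so is not stated in the source.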
The specific operating procedure is as follows:
S1. The data source module obtains source data through Flume, the REST API or external storage and sends it to the data source bus; Flume obtains its data from log files;
S2. After receiving the source data, the data source bus generates the corresponding data on the matching Server in KAFKA cluster 1 and sends it to the knowledge extraction module;
S3. The knowledge extraction module extracts the entity/relationship data from the corresponding data and sends it to the knowledge bus; if extraction fails, a failure log is recorded to external storage through HBase;
S4. The knowledge consumption module sends the entity/relationship data to the corresponding consumers, namely the HBase consumer, the GDB consumer and the ES consumer;
S5. Each consumer processes the entity/relationship data into application-related data and sends it to the knowledge storage module for storage, after which it is sent to the application part, as follows:
A3. The GDB consumer sends the application-related data to the graph database for storage, and K-layer expansion is performed through the application part;
B3. The HBase consumer sends the application-related data to HBase for storage, the shortest path is obtained through the REST API, and the shortest path is displayed by the application part;
C3. The ES consumer sends the application-related data to ES, and the full path is displayed by the application part.
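Steps S1 to S5 can be traced end to end with a small simulation. The sketch below is illustrative only and rests on several assumptions: two in-memory queues stand in for KAFKA cluster 1 and KAFKA cluster 11, a list stands in for the HBase failure record, and the toy extractor expects lines of the form `subject|relation|object`, a format invented for this example.

```python
from collections import deque

source_bus = deque()      # stands in for KAFKA cluster 1
knowledge_bus = deque()   # stands in for KAFKA cluster 11
failure_log = []          # stands in for the HBase failure record


def ingest(lines):
    """S1/S2: push raw records onto the data source bus."""
    for line in lines:
        source_bus.append(line)


def extract(line):
    """Toy extractor: expects 'subject|relation|object'."""
    parts = [p.strip() for p in line.split("|")]
    if len(parts) != 3 or not all(parts):
        raise ValueError("cannot extract a triple from: " + repr(line))
    subject, relation, obj = parts
    return [
        {"kind": "entity", "id": subject},
        {"kind": "entity", "id": obj},
        {"kind": "relation", "src": subject, "dst": obj, "label": relation},
    ]


def run_extraction():
    """S3: drain the source bus, publish results, log failures."""
    while source_bus:
        line = source_bus.popleft()
        try:
            knowledge_bus.extend(extract(line))
        except ValueError as err:
            failure_log.append({"raw": line, "error": str(err)})


def run_consumers():
    """S4/S5: route each knowledge message to the stores that need it."""
    stores = {"hbase": [], "gdb": [], "es": []}
    while knowledge_bus:
        msg = knowledge_bus.popleft()
        stores["hbase"].append(msg)
        stores["gdb"].append(msg)
        if msg["kind"] == "entity":   # ES indexes entities only
            stores["es"].append(msg)
    return stores


ingest(["alice|works_at|acme", "malformed line", "acme|located_in|beijing"])
run_extraction()
stores = run_consumers()
```

The malformed record never reaches the knowledge bus; it is kept in `failure_log` together with the reason, which mirrors the failure logging described in step S3.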
Preferably, all external data sources are read and written to the KAFKA message queue by Flume, the Java REST API, a data extraction tool or a Spark program. Data access monitoring is set up at the point where the message queue is written, recording the file name or relational database table name of the incoming data, the access time, the volume of data accessed, the volume of successful data and the volume of failed data; failed data is recorded in external storage using HBase.
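The monitoring record described above might look like the following sketch. The class name `AccessMonitor`, its field names and the validity rule (a record must be non-empty text) are assumptions for illustration; a list stands in for both the message queue and the HBase table of failed data.

```python
from dataclasses import dataclass, field
from datetime import datetime, timezone


@dataclass
class AccessMonitor:
    """Per-source ingestion statistics (field names are illustrative)."""
    source_name: str                      # file name or relational table name
    access_time: datetime = field(
        default_factory=lambda: datetime.now(timezone.utc))
    received: int = 0                     # volume of data accessed
    succeeded: int = 0                    # records written successfully
    failed: int = 0                       # records that could not be written
    failed_records: list = field(default_factory=list)  # HBase stand-in

    def write(self, record, queue):
        """Try to append one record to the queue and update the counters."""
        self.received += 1
        try:
            if not isinstance(record, str) or not record.strip():
                raise ValueError("empty or non-text record")
            queue.append(record)
            self.succeeded += 1
        except ValueError as err:
            self.failed += 1
            self.failed_records.append({"record": record, "error": str(err)})


queue = []
monitor = AccessMonitor(source_name="orders.log")
for rec in ["order=1", "", "order=2", None]:
    monitor.write(rec, queue)
```

Because every record increments exactly one of `succeeded` or `failed`, the invariant `received == succeeded + failed` gives a quick check that no record was silently dropped at the write point.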
Preferably, the real-time stream processing framework Spark Streaming consumes messages in KAFKA to perform entity extraction and relationship extraction, and the extraction results are written to KAFKA. Data access monitoring is set up at the point where the message queue is written, recording the table name of the incoming data, the access time, the volume of data accessed, the volume of successful data and the volume of failed data; failed data is recorded in external storage using HBase.
Preferably, ES is the abbreviation of Elasticsearch.
(3) Beneficial effects
The present invention provides a real-time knowledge graph construction system with the following beneficial effects: through the cooperation of the application part, the knowledge storage module, the data source bus, the data source module and the other modules, the system uses Spark Streaming to consume the basic data in KAFKA for entity extraction and relationship extraction, and records failed data in external storage using HBase. This greatly reduces latency and improves working efficiency. It also addresses the loss or duplication of extracted data and the limitation of single-source data access, strengthens the scalability of the scheme, and makes the system convenient for users.
Brief description of the drawings
Fig. 1 is a structural functional block diagram of the present invention.
Detailed description of the embodiments
The technical solutions in the embodiments of the present invention are described clearly and completely below with reference to the accompanying drawings. Obviously, the described embodiments are only some, not all, of the embodiments of the present invention. All other embodiments obtained by a person of ordinary skill in the art on the basis of these embodiments without creative effort fall within the protection scope of the present invention.
Referring to Fig. 1, the present invention provides a technical solution: a real-time knowledge graph construction system comprising an application part, a knowledge storage module, a knowledge consumption module, a knowledge bus, a data source bus and a data source module.
The specific work of each part is as follows:
A1. The data source module is responsible for collecting data and feeding it into the KAFKA message queue;
B1. The data source bus is responsible for transmitting the basic data from which entities/relationships are to be extracted;
C1. The knowledge extraction module is responsible for consuming the basic data in KAFKA with Spark Streaming and performing entity extraction and relationship extraction;
D1. The knowledge bus is responsible for transmitting the entity/relationship data extracted by Spark Streaming;
E1. The knowledge consumption module is responsible for consuming the entity/relationship data in KAFKA and writing it to the corresponding external storage, providing the data for real-time graph queries;
F1. The application part is responsible for graph queries in various real-time query scenarios.
KAFKA cluster 1, consisting of Server1-3, is provided in the data source bus; KAFKA cluster 11, consisting of Server11-13, is provided in the knowledge bus; each Server contains two operating units.
The data sources include business server log files, external REST API requests and external data storage.
The knowledge consumption module includes a GDB consumer, an HBase consumer and an ES consumer, whose roles are as follows:
A2. The HBase consumer program reads entities/relationships from the message queue and writes them to the HBase database, for later viewing of entity and relationship details;
B2. The GDB consumer program reads entities/relationships from the message queue and writes them to the GDB (graph database), for later real-time graph queries;
C2. The ES consumer reads entities from the message queue and writes them to Elasticsearch, where they serve as a secondary index on entities for later real-time queries.
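The idea of a secondary index can be made concrete with a two-step lookup: search the index by name to obtain entity IDs, then fetch the full rows by key. The sketch below is an assumption-laden illustration, not the system's actual query path: a dictionary keyed by entity ID stands in for HBase, a token-to-ID map stands in for Elasticsearch, and the sample entities are invented.

```python
from collections import defaultdict

# Stand-in for HBase: row key (entity ID) -> full entity details.
DETAILS = {
    "e1": {"name": "Acme Trading Co", "type": "Company", "city": "Beijing"},
    "e2": {"name": "Acme Logistics", "type": "Company", "city": "Shanghai"},
    "e3": {"name": "Alice Zhang", "type": "Person", "city": "Beijing"},
}


def build_secondary_index(details):
    """Stand-in for Elasticsearch: token -> set of entity IDs."""
    index = defaultdict(set)
    for entity_id, row in details.items():
        for token in row["name"].lower().split():
            index[token].add(entity_id)
    return index


def search(index, query):
    """Return IDs of entities whose name contains every query token."""
    tokens = query.lower().split()
    if not tokens:
        return set()
    hits = set(index.get(tokens[0], set()))
    for token in tokens[1:]:
        hits &= index.get(token, set())
    return hits


def lookup(index, details, query):
    """Two-step query: secondary index first, then fetch rows by key."""
    return {eid: details[eid] for eid in sorted(search(index, query))}


INDEX = build_secondary_index(DETAILS)
acme_rows = lookup(INDEX, DETAILS, "acme")
```

The design point is that the key-value store is only ever read by row key, while free-text matching is delegated to the index; this is one plausible reading of why the ES consumer stores entities only.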
Query scenarios of the application, such as K-layer expansion, shortest path, full path and community discovery, are served by calling the APIs of the corresponding stores (ES, GDB, HBase) to perform graph relationship lookups and detail queries (ES is the abbreviation of Elasticsearch).
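Two of these query scenarios, K-layer expansion and shortest path, can be sketched with breadth-first search over an adjacency list. The source does not say which algorithms or graph database APIs are used, so the functions below are a generic illustration under that assumption; the graph and vertex names are made up, and K-layer expansion is read here as "all vertices within K hops".

```python
from collections import deque

# Toy undirected graph as an adjacency list; vertex names are made up.
GRAPH = {
    "a": ["b", "c"],
    "b": ["a", "d"],
    "c": ["a", "d"],
    "d": ["b", "c", "e"],
    "e": ["d"],
}


def k_hop_expand(graph, start, k):
    """Return {vertex: distance} for every vertex within k hops of start."""
    dist = {start: 0}
    queue = deque([start])
    while queue:
        node = queue.popleft()
        if dist[node] == k:
            continue  # do not expand beyond depth k
        for nxt in graph.get(node, []):
            if nxt not in dist:
                dist[nxt] = dist[node] + 1
                queue.append(nxt)
    return dist


def shortest_path(graph, src, dst):
    """Breadth-first search; returns one shortest path or None."""
    parent = {src: None}
    queue = deque([src])
    while queue:
        node = queue.popleft()
        if node == dst:
            path = []
            while node is not None:
                path.append(node)
                node = parent[node]
            return path[::-1]
        for nxt in graph.get(node, []):
            if nxt not in parent:
                parent[nxt] = node
                queue.append(nxt)
    return None
```

For example, `k_hop_expand(GRAPH, "a", 1)` covers only the direct neighbours of `a`, and `shortest_path(GRAPH, "a", "z")` returns `None` because `z` is not in the graph.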
The specific operating procedure is as follows:
S1. The data source module obtains source data through Flume, the REST API or external storage and sends it to the data source bus; Flume obtains its data from log files;
S2. After receiving the source data, the data source bus generates the corresponding data on the matching Server in KAFKA cluster 1 and sends it to the knowledge extraction module;
S3. The knowledge extraction module extracts the entity/relationship data from the corresponding data and sends it to the knowledge bus; if extraction fails, a failure log is recorded to external storage through HBase;
S4. The knowledge consumption module sends the entity/relationship data to the corresponding consumers, namely the HBase consumer, the GDB consumer and the ES consumer;
S5. Each consumer processes the entity/relationship data into application-related data and sends it to the knowledge storage module for storage, after which it is sent to the application part, as follows:
A3. The GDB consumer sends the application-related data to the graph database for storage, and K-layer expansion is performed through the application part;
B3. The HBase consumer sends the application-related data to HBase for storage, the shortest path is obtained through the REST API, and the shortest path is displayed by the application part;
C3. The ES consumer sends the application-related data to ES, and the full path is displayed by the application part.
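The "full path" query in C3 is interpreted here, as an assumption, as enumerating every simple path between two vertices. A minimal depth-first sketch follows; the optional `max_hops` limit is an addition for the example, since exhaustive path enumeration grows quickly on dense graphs, and the sample graph is invented.

```python
def all_simple_paths(graph, src, dst, max_hops=None):
    """Yield every path from src to dst that visits no vertex twice."""
    path = [src]
    on_path = {src}

    def dfs(node):
        if node == dst:
            yield list(path)
            return
        if max_hops is not None and len(path) - 1 >= max_hops:
            return  # hop budget used up before reaching dst
        for nxt in graph.get(node, []):
            if nxt in on_path:
                continue  # skip vertices already on the current path
            path.append(nxt)
            on_path.add(nxt)
            yield from dfs(nxt)
            path.pop()
            on_path.remove(nxt)

    yield from dfs(src)


# Toy directed graph; vertex names are made up.
GRAPH = {
    "a": ["b", "c"],
    "b": ["d"],
    "c": ["d"],
    "d": ["e"],
    "e": [],
}
paths = sorted(all_simple_paths(GRAPH, "a", "e"))
```

Here `paths` holds the two routes from `a` to `e`, one through `b` and one through `c`; with `max_hops=2` no route to `e` qualifies, because both need three hops.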
Remarks: all external data sources are read and written to the KAFKA message queue by Flume, the Java REST API, a data extraction tool or a Spark program. Data access monitoring is set up at the point where the message queue is written, recording the file name or relational database table name of the incoming data, the access time, the volume of data accessed, the volume of successful data and the volume of failed data; failed data is recorded in external storage using HBase.
The real-time stream processing framework Spark Streaming consumes messages in KAFKA to perform entity extraction and relationship extraction, and the extraction results are written to KAFKA. Data access monitoring is set up at the point where the message queue is written, recording the table name of the incoming data, the access time, the volume of data accessed, the volume of successful data and the volume of failed data; failed data is recorded in external storage using HBase.
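Spark Streaming processes a stream as a sequence of small batches, so the extraction step and its monitoring can be pictured batch by batch. The following sketch simulates that in plain Python and does not use the Spark API: messages are `(timestamp, payload)` pairs, the 5-second interval and the comma-separated triple format are assumptions, and lists stand in for the output topic and the HBase failure store.

```python
from itertools import groupby


def micro_batches(messages, interval):
    """Group (timestamp, payload) pairs into fixed-width time windows."""
    ordered = sorted(messages, key=lambda m: m[0])
    for window, group in groupby(ordered, key=lambda m: int(m[0] // interval)):
        yield window * interval, [payload for _, payload in group]


def extract_triple(payload):
    """Toy extractor: 'subject,relation,object' -> tuple, else ValueError."""
    parts = [p.strip() for p in payload.split(",")]
    if len(parts) != 3 or not all(parts):
        raise ValueError("not a triple: " + repr(payload))
    return tuple(parts)


def process_stream(messages, interval=5):
    """Extract triples batch by batch and keep per-batch statistics."""
    results, failures, stats = [], [], []
    for start, payloads in micro_batches(messages, interval):
        ok = bad = 0
        for payload in payloads:
            try:
                results.append(extract_triple(payload))
                ok += 1
            except ValueError as err:
                failures.append({"payload": payload, "error": str(err)})
                bad += 1
        stats.append({"batch_start": start, "received": len(payloads),
                      "succeeded": ok, "failed": bad})
    return results, failures, stats


stream = [
    (0.5, "alice,works_at,acme"),
    (1.2, "garbled"),
    (6.0, "acme,located_in,beijing"),
    (7.9, "bob,works_at,acme"),
]
results, failures, stats = process_stream(stream, interval=5)
```

Each entry in `stats` corresponds to one batch and carries the received, succeeded and failed counts, which is the same shape of information the monitoring described above is said to record.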
In summary, through the cooperation of the application part, the knowledge storage module, the data source bus, the data source module and the other modules, the real-time knowledge graph construction system uses Spark Streaming to consume the basic data in KAFKA for entity extraction and relationship extraction, and records failed data in external storage using HBase. This greatly reduces latency and improves working efficiency. It also addresses the loss or duplication of extracted data and the limitation of single-source data access, strengthens the scalability of the scheme, and makes the system convenient for users.
It should be noted that, in this document, relational terms such as first and second are used only to distinguish one entity or operation from another and do not necessarily require or imply any actual relationship or order between those entities or operations. Moreover, the terms "include", "comprise" and any variants thereof are intended to cover non-exclusive inclusion, so that a process, method, article or device comprising a series of elements includes not only those elements but also other elements not explicitly listed, or elements inherent to such a process, method, article or device.
Although embodiments of the present invention have been shown and described, a person of ordinary skill in the art will understand that various changes, modifications, substitutions and variations can be made to these embodiments without departing from the principles and spirit of the present invention; the scope of the present invention is defined by the appended claims.

Claims (4)

1. A real-time knowledge graph construction system, characterized in that it comprises an application part, a knowledge storage module, a knowledge consumption module, a knowledge bus, a data source bus and a data source module;
wherein the specific work of each part is as follows:
A1. the data source module is responsible for collecting data and feeding it into the KAFKA message queue;
B1. the data source bus is responsible for transmitting the basic data from which entities/relationships are to be extracted;
C1. the knowledge extraction module is responsible for consuming the basic data in KAFKA with Spark Streaming and performing entity extraction and relationship extraction;
D1. the knowledge bus is responsible for transmitting the entity/relationship data extracted by Spark Streaming;
E1. the knowledge consumption module is responsible for consuming the entity/relationship data in KAFKA and writing it to the corresponding external storage, providing the data for real-time graph queries;
F1. the application part is responsible for graph queries in various real-time query scenarios;
KAFKA cluster 1, consisting of Server1-3, is provided in the data source bus; KAFKA cluster 11, consisting of Server11-13, is provided in the knowledge bus; each Server contains two operating units;
the data sources include business server log files, external REST API requests and external data storage;
the knowledge consumption module includes a GDB consumer, an HBase consumer and an ES consumer, whose roles are as follows:
A2. the HBase consumer program reads entities/relationships from the message queue and writes them to the HBase database, for later viewing of entity and relationship details;
B2. the GDB consumer program reads entities/relationships from the message queue and writes them to the GDB (graph database), for later real-time graph queries;
C2. the ES consumer reads entities from the message queue and writes them to Elasticsearch, where they serve as a secondary index on entities for later real-time queries;
query scenarios of the application, such as K-layer expansion, shortest path, full path and community discovery, are served by calling the APIs of the corresponding stores (ES, GDB, HBase) to perform graph relationship lookups and detail queries;
wherein the specific operating procedure is as follows:
S1. the data source module obtains source data through Flume, the REST API or external storage and sends it to the data source bus, wherein Flume obtains its data from log files;
S2. after receiving the source data, the data source bus generates the corresponding data on the matching Server in KAFKA cluster 1 and sends it to the knowledge extraction module;
S3. the knowledge extraction module extracts the entity/relationship data from the corresponding data and sends it to the knowledge bus, wherein, if extraction fails, a failure log is recorded to external storage through HBase;
S4. the knowledge consumption module sends the entity/relationship data to the corresponding consumers, namely the HBase consumer, the GDB consumer and the ES consumer;
S5. each consumer processes the entity/relationship data into application-related data and sends it to the knowledge storage module for storage, after which it is sent to the application part, as follows:
A3. the GDB consumer sends the application-related data to the graph database for storage, and K-layer expansion is performed through the application part;
B3. the HBase consumer sends the application-related data to HBase for storage, the shortest path is obtained through the REST API, and the shortest path is displayed by the application part;
C3. the ES consumer sends the application-related data to ES, and the full path is displayed by the application part.
2. The real-time knowledge graph construction system according to claim 1, characterized in that all external data sources are read and written to the KAFKA message queue by Flume, the Java REST API, a data extraction tool or a Spark program; data access monitoring is set up at the point where the message queue is written, recording the file name or relational database table name of the incoming data, the access time, the volume of data accessed, the volume of successful data and the volume of failed data, wherein failed data is recorded in external storage using HBase.
3. The real-time knowledge graph construction system according to claim 1, characterized in that the real-time stream processing framework Spark Streaming consumes messages in KAFKA to perform entity extraction and relationship extraction, and the extraction results are written to KAFKA; data access monitoring is set up at the point where the message queue is written, recording the table name of the incoming data, the access time, the volume of data accessed, the volume of successful data and the volume of failed data, wherein failed data is recorded in external storage using HBase.
4. The real-time knowledge graph construction system according to claim 1, characterized in that ES is the abbreviation of Elasticsearch.

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201910642692.2A CN110377757B (en) 2019-07-16 2019-07-16 Real-time knowledge graph construction system

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201910642692.2A CN110377757B (en) 2019-07-16 2019-07-16 Real-time knowledge graph construction system

Publications (2)

Publication Number Publication Date
CN110377757A 2019-10-25
CN110377757B 2023-02-14

Family

ID=68253468

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201910642692.2A Active CN110377757B (en) 2019-07-16 2019-07-16 Real-time knowledge graph construction system

Country Status (1)

Country Link
CN (1) CN110377757B (en)


Also Published As

Publication number Publication date
CN110377757B (en) 2023-02-14


Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant