CN110377757A - Real-time knowledge graph construction system - Google Patents
Real-time knowledge graph construction system
- Publication number
- CN110377757A (application CN201910642692.2A)
- Authority
- CN
- China
- Prior art keywords
- data
- knowledge
- consumer
- entity
- kafka
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Granted
Classifications
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F16/00—Information retrieval; Database structures therefor; File system structures therefor
- G06F16/30—Information retrieval; Database structures therefor; File system structures therefor of unstructured textual data
- G06F16/36—Creation of semantic tools, e.g. ontology or thesauri
- G06F16/367—Ontology
Landscapes
- Engineering & Computer Science (AREA)
- Theoretical Computer Science (AREA)
- Life Sciences & Earth Sciences (AREA)
- Animal Behavior & Ethology (AREA)
- Computational Linguistics (AREA)
- Data Mining & Analysis (AREA)
- Databases & Information Systems (AREA)
- Physics & Mathematics (AREA)
- General Engineering & Computer Science (AREA)
- General Physics & Mathematics (AREA)
- Information Retrieval, Db Structures And Fs Structures Therefor (AREA)
- Management, Administration, Business Operations System, And Electronic Commerce (AREA)
Abstract
The invention discloses a real-time knowledge graph construction system and relates to the technical field of graph analysis platforms. The system comprises an application program part, a knowledge storage module, a knowledge consumption module, a knowledge bus, a data source bus and a data source module. The specific work of each part is as follows: A1, the data source module is responsible for data collection and for feeding the data into the KAFKA message queue; B1, the data source bus is responsible for transmitting the basic data from which entities/relationships are to be extracted; C1, the knowledge extraction module is responsible for consuming the basic data in KAFKA with Spark Streaming and performing entity extraction and relationship extraction. Through the cooperation of the application program part, knowledge storage module, data source bus, data source module and other modules, the system uses Spark Streaming to consume the basic data in KAFKA for entity and relationship extraction and records failed data in external storage using HBase, which addresses both the loss or duplication of extracted data and the limitation to a single type of data source.
Description
Technical field
The present invention relates to the technical field of graph analysis platforms, and specifically to a real-time knowledge graph construction system.
Background art
Kafka is an open-source stream processing platform written in Scala and Java. It is a high-throughput distributed publish-subscribe messaging system that can handle all of the action stream data generated by consumers on a website. Such actions (page views, searches and other user actions) are a key factor in many social functions on the modern web. Because of throughput requirements, this data is usually handled by processing logs and log aggregation. For log data and offline analysis systems such as Hadoop that nevertheless require real-time processing, this is a feasible solution. The purpose of Kafka is to unify online and offline message processing through Hadoop's parallel loading mechanism, and also to provide real-time messages through a cluster.
In the prior art, there are few practical applications in the field of real-time graph construction, and most of them work on a T+1 basis (graph relationships are built once a day). They therefore cannot produce accurate knowledge extraction results in a single pass and are poorly suited to businesses with high data accuracy requirements. Knowledge extraction also suffers from high latency, extracted data is easily lost or duplicated, few data sources are supported or considered, and the schemes do not scale well, all of which inconveniences users.
Summary of the invention
(1) Technical problem to be solved
In view of the deficiencies of the prior art, the present invention provides a real-time knowledge graph construction system. It addresses the following problems of the prior art: there are few practical applications in the field of real-time graph construction, and most work on a T+1 basis (graph relationships are built once a day), so they cannot produce accurate knowledge extraction results in a single pass and are poorly suited to businesses with high data accuracy requirements; knowledge extraction suffers from high latency; extracted data is easily lost or duplicated; few data sources are supported or considered; and the schemes do not scale well.
(2) Technical solution
To achieve the above object, the present invention is implemented through the following technical solution: a real-time knowledge graph construction system, comprising an application program part, a knowledge storage module, a knowledge consumption module, a knowledge bus, a data source bus and a data source module;
The specific work of each part is as follows:
A1, the data source module is responsible for data collection and for feeding the data into the KAFKA message queue;
B1, the data source bus is responsible for transmitting the basic data from which entities/relationships are to be extracted;
C1, the knowledge extraction module is responsible for consuming the basic data in KAFKA with Spark Streaming and performing entity extraction and relationship extraction;
D1, the knowledge bus is responsible for transmitting the entity/relationship data extracted by Spark Streaming;
E1, the knowledge consumption module is responsible for consuming the entity/relationship data in KAFKA and writing it to the corresponding external storage, providing the data for real-time graph queries;
F1, the application program part is responsible for graph queries in various real-time query scenarios;
KAFKA cluster 1 is provided in the data source bus and consists of Server1-3; KAFKA cluster 11 is provided in the knowledge bus and consists of Server11-13; each Server is provided with two operating units;
The data sources include business server log files, external REST API requests and external data storage;
The knowledge consumption module includes a GDB consumer, an HBase consumer and an ES consumer, whose specific roles are as follows:
A2, the HBase consumer program reads the entities/relationships in the message queue and writes them to the HBase database, for later viewing of entity and relationship details;
B2, the GDB consumer program reads the entities/relationships in the message queue and writes them to the GDB (graph database), for later real-time graph queries;
C2, the ES consumer reads the entities in the message queue and writes them to Elasticsearch, to serve as a secondary index of entities in later real-time queries;
Query scenarios of the application program, such as K-layer expansion, shortest path, all paths and community detection, complete graph relationship lookups and detail queries by calling the APIs of the corresponding stores such as ES, GDB and HBase.
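The fan-out from the knowledge bus to the three consumers can be sketched as follows. This is a minimal in-memory illustration, not the patented implementation: the `KnowledgeBus` and `Consumer` classes, the message fields and the per-consumer offset are all assumptions made for the example (the offset mimics how independent Kafka consumer groups each see every message), and plain Python lists stand in for HBase, the graph database and Elasticsearch.

```python
from dataclasses import dataclass, field


@dataclass
class KnowledgeBus:
    """Append-only log standing in for the KAFKA topic on the knowledge bus."""
    log: list = field(default_factory=list)

    def publish(self, message: dict) -> None:
        self.log.append(message)


class Consumer:
    """Reads the bus from its own offset, so each consumer sees every message."""

    def __init__(self, name: str, accepts: set):
        self.name = name
        self.accepts = accepts  # message kinds this consumer stores
        self.offset = 0         # position of the next unread message
        self.store = []         # stand-in for HBase / GDB / Elasticsearch

    def poll(self, bus: KnowledgeBus) -> int:
        """Consume all unread messages; return how many were read."""
        new_messages = bus.log[self.offset:]
        self.offset = len(bus.log)
        for msg in new_messages:
            if msg["kind"] in self.accepts:
                self.store.append(msg)
        return len(new_messages)


# HBase and GDB consumers take entities and relationships; ES takes entities only.
consumers = [
    Consumer("hbase", {"entity", "relation"}),
    Consumer("gdb", {"entity", "relation"}),
    Consumer("es", {"entity"}),
]

bus = KnowledgeBus()
bus.publish({"kind": "entity", "id": "e1", "name": "Alice"})
bus.publish({"kind": "entity", "id": "e2", "name": "Bob"})
bus.publish({"kind": "relation", "src": "e1", "dst": "e2", "type": "knows"})

for consumer in consumers:
    consumer.poll(bus)

stored = {c.name: len(c.store) for c in consumers}
```

Keeping one offset per consumer means a slow or restarted store does not hold back the other two, which is one plausible reason for splitting the write path into three consumers.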
The specific operation flow is as follows:
S1, the data source module obtains source data through Flume/REST API/external storage and sends it to the data source bus, where Flume obtains its data from log files;
S2, after receiving the source data, the data source bus generates the corresponding data on the corresponding Server in KAFKA cluster 1 and sends the corresponding data to the knowledge extraction module;
S3, the knowledge extraction module extracts the entity/relationship data from the corresponding data and sends it to the knowledge bus; if extraction of the entity/relationship data fails, a failure log is recorded to external storage through HBase;
S4, the knowledge consumption module sends the entity/relationship data to the corresponding consumer, the consumers being the HBase consumer, the GDB consumer and the ES consumer;
S5, the corresponding consumer processes the entity/relationship data into application-related data, sends it to the knowledge storage module for storage, and then sends it to the application program part, specifically as follows:
A3, the GDB consumer sends the application-related data to the graph database for storage, and K-layer expansion is performed through the application program part;
B3, the HBase consumer sends the application-related data to HBase for storage, the shortest path is obtained through the REST API, and the application program part displays the shortest path;
C3, the ES consumer sends the application-related data to ES, and all paths are displayed through the application program part.
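Steps S1 to S3 of the flow above can be sketched with in-memory queues. This is only an illustration under stated assumptions: the `subject|relation|object` record format, the `extract` and `run_extraction` functions and the failure-log fields are hypothetical, `deque` objects stand in for the two KAFKA clusters, and a list stands in for the HBase failure table. The consumer stage (S4 and S5) is left out for brevity.

```python
from collections import deque


def extract(record: str) -> list:
    """Toy extractor: 'subject|relation|object' -> two entities and one relation.

    Raises ValueError on malformed input. The record format is hypothetical.
    """
    parts = [p.strip() for p in record.split("|")]
    if len(parts) != 3 or not all(parts):
        raise ValueError(f"cannot extract from {record!r}")
    subj, rel, obj = parts
    return [
        {"kind": "entity", "id": subj},
        {"kind": "entity", "id": obj},
        {"kind": "relation", "src": subj, "type": rel, "dst": obj},
    ]


def run_extraction(raw_records):
    """Drain the data source bus, extract knowledge, and log failures."""
    source_bus = deque(raw_records)  # S1/S2: data arriving on the data source bus
    knowledge_bus = deque()          # S3: extracted entity/relationship messages
    failure_log = []                 # stand-in for the HBase failure record
    while source_bus:
        record = source_bus.popleft()
        try:
            knowledge_bus.extend(extract(record))
        except ValueError as err:
            # A failed record is recorded instead of being silently dropped.
            failure_log.append({"record": record, "error": str(err)})
    return list(knowledge_bus), failure_log


knowledge, failures = run_extraction(
    ["Alice|knows|Bob", "broken line", "Bob|works_at|Acme"]
)
```

Recording the failed input together with the error keeps a bad record from being lost, so it can be inspected or replayed later.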
Preferably, all external data sources are read and written to the KAFKA message queue by Flume / Java REST API / data extraction tools / Spark programs. Data access monitoring is provided at the point where the message queue is written, covering the file name of the ingested data or the table name of the relational database, the access time, the volume of ingested data, the volume of successful data and the volume of failed data; the failed data is recorded in external storage using HBase.
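The monitoring fields listed above map naturally onto a small record type. The sketch below is an assumption-laden illustration: the `AccessMonitor` class, its field names and the validity rule (a record must be a non-empty string) are invented for the example, and Python lists stand in for the KAFKA queue and the HBase failure store.

```python
from dataclasses import dataclass, field
from datetime import datetime, timezone


@dataclass
class AccessMonitor:
    """Counters kept at the point where records are written to the queue."""
    source_name: str  # file name, or table name of a relational database
    access_time: datetime = field(
        default_factory=lambda: datetime.now(timezone.utc)
    )
    total: int = 0      # volume of ingested data
    succeeded: int = 0  # volume of successful data
    failed: int = 0     # volume of failed data
    failed_records: list = field(default_factory=list)  # stand-in for HBase

    def write(self, queue: list, record) -> bool:
        """Write one record to the queue and update the counters."""
        self.total += 1
        # Hypothetical validity rule: only non-empty strings are accepted.
        if not isinstance(record, str) or not record.strip():
            self.failed += 1
            self.failed_records.append(record)
            return False
        queue.append(record)
        self.succeeded += 1
        return True


queue = []
monitor = AccessMonitor("orders.log")
for rec in ["a|b|c", "", None, "d|e|f"]:
    monitor.write(queue, rec)
```

Because `total` always equals `succeeded + failed`, a mismatch between the counters and the queue length is a quick signal that records were lost or written twice.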
Preferably, the real-time stream processing framework Spark Streaming consumes the messages in KAFKA to perform entity extraction and relationship extraction, and the extraction results are written to KAFKA. Data access monitoring is provided at the point where the message queue is written, covering the table name of the ingested data, the access time, the volume of ingested data, the volume of successful data and the volume of failed data; the failed data is recorded in external storage using HBase.
Preferably, ES is the abbreviation of Elasticsearch.
(3) Beneficial effects
The present invention provides a real-time knowledge graph construction system with the following beneficial effects: through the cooperation of the application program part, knowledge storage module, data source bus, data source module and other modules, the system uses Spark Streaming to consume the basic data in KAFKA for entity extraction and relationship extraction, and records failed data in external storage using HBase. This greatly reduces latency and improves working efficiency. It also addresses the loss or duplication of extracted data and the limitation to a single type of data source, improves the scalability of the scheme, and is convenient for users.
Description of the drawings
Fig. 1 is a structural functional block diagram of the present invention.
Detailed description of the embodiments
The technical solutions in the embodiments of the present invention are described clearly and completely below with reference to the accompanying drawings. Obviously, the described embodiments are only some, not all, of the embodiments of the present invention. All other embodiments obtained by those of ordinary skill in the art based on these embodiments without creative effort fall within the protection scope of the present invention.
Referring to Fig. 1, the present invention provides a technical solution: a real-time knowledge graph construction system, comprising an application program part, a knowledge storage module, a knowledge consumption module, a knowledge bus, a data source bus and a data source module;
The specific work of each part is as follows:
A1, the data source module is responsible for data collection and for feeding the data into the KAFKA message queue;
B1, the data source bus is responsible for transmitting the basic data from which entities/relationships are to be extracted;
C1, the knowledge extraction module is responsible for consuming the basic data in KAFKA with Spark Streaming and performing entity extraction and relationship extraction;
D1, the knowledge bus is responsible for transmitting the entity/relationship data extracted by Spark Streaming;
E1, the knowledge consumption module is responsible for consuming the entity/relationship data in KAFKA and writing it to the corresponding external storage, providing the data for real-time graph queries;
F1, the application program part is responsible for graph queries in various real-time query scenarios;
KAFKA cluster 1 is provided in the data source bus and consists of Server1-3; KAFKA cluster 11 is provided in the knowledge bus and consists of Server11-13; each Server is provided with two operating units;
The data sources include business server log files, external REST API requests and external data storage;
The knowledge consumption module includes a GDB consumer, an HBase consumer and an ES consumer, whose specific roles are as follows:
A2, the HBase consumer program reads the entities/relationships in the message queue and writes them to the HBase database, for later viewing of entity and relationship details;
B2, the GDB consumer program reads the entities/relationships in the message queue and writes them to the GDB (graph database), for later real-time graph queries;
C2, the ES consumer reads the entities in the message queue and writes them to Elasticsearch, to serve as a secondary index of entities in later real-time queries;
Query scenarios of the application program, such as K-layer expansion, shortest path, all paths and community detection, complete graph relationship lookups and detail queries by calling the APIs of the corresponding stores such as ES, GDB and HBase (ES is the abbreviation of Elasticsearch).
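Two of the query scenarios named above, K-layer expansion and shortest path, can be illustrated with breadth-first search. The sketch assumes an unweighted graph and uses plain dictionaries as stand-ins for the three stores: `es_index` for the Elasticsearch secondary index (name to entity id), `graph` for the adjacency data in the graph database, and `details` for the entity details in HBase. All names and sample data are hypothetical, and a real deployment would call each store's own API instead.

```python
from collections import deque

# In-memory stand-ins for the three stores (sample data is invented).
es_index = {"alice": "e1", "bob": "e2", "acme": "e3", "carol": "e4"}
graph = {"e1": ["e2"], "e2": ["e1", "e3"], "e3": ["e2", "e4"], "e4": ["e3"]}
details = {
    "e1": {"name": "Alice"},
    "e2": {"name": "Bob"},
    "e3": {"name": "Acme"},
    "e4": {"name": "Carol"},
}


def k_layer_expansion(graph: dict, start: str, k: int) -> set:
    """Return the ids reachable within k hops of start, excluding start."""
    seen = {start}
    frontier = [start]
    for _ in range(k):
        next_frontier = []
        for node in frontier:
            for neighbor in graph.get(node, []):
                if neighbor not in seen:
                    seen.add(neighbor)
                    next_frontier.append(neighbor)
        frontier = next_frontier
    seen.discard(start)
    return seen


def shortest_path(graph: dict, src: str, dst: str):
    """BFS shortest path in an unweighted graph; list of ids, or None."""
    parents = {src: None}
    queue = deque([src])
    while queue:
        node = queue.popleft()
        if node == dst:
            path = []
            while node is not None:
                path.append(node)
                node = parents[node]
            return path[::-1]
        for neighbor in graph.get(node, []):
            if neighbor not in parents:
                parents[neighbor] = node
                queue.append(neighbor)
    return None


# Resolve names through the index, query the graph, then fetch details.
start = es_index["alice"]
two_hop = k_layer_expansion(graph, start, 2)
path = shortest_path(graph, start, es_index["carol"])
path_names = [details[node]["name"] for node in path]
```

The usage lines mirror the division of labor in the text: the secondary index turns a name into an entity id, the graph store answers the structural query, and the detail store supplies the attributes shown to the user.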
The specific operation flow is as follows:
S1, the data source module obtains source data through Flume/REST API/external storage and sends it to the data source bus, where Flume obtains its data from log files;
S2, after receiving the source data, the data source bus generates the corresponding data on the corresponding Server in KAFKA cluster 1 and sends the corresponding data to the knowledge extraction module;
S3, the knowledge extraction module extracts the entity/relationship data from the corresponding data and sends it to the knowledge bus; if extraction of the entity/relationship data fails, a failure log is recorded to external storage through HBase;
S4, the knowledge consumption module sends the entity/relationship data to the corresponding consumer, the consumers being the HBase consumer, the GDB consumer and the ES consumer;
S5, the corresponding consumer processes the entity/relationship data into application-related data, sends it to the knowledge storage module for storage, and then sends it to the application program part, specifically as follows:
A3, the GDB consumer sends the application-related data to the graph database for storage, and K-layer expansion is performed through the application program part;
B3, the HBase consumer sends the application-related data to HBase for storage, the shortest path is obtained through the REST API, and the application program part displays the shortest path;
C3, the ES consumer sends the application-related data to ES, and all paths are displayed through the application program part.
Remarks: all external data sources are read and written to the KAFKA message queue by Flume / Java REST API / data extraction tools / Spark programs. Data access monitoring is provided at the point where the message queue is written, covering the file name of the ingested data or the table name of the relational database, the access time, the volume of ingested data, the volume of successful data and the volume of failed data; the failed data is recorded in external storage using HBase.
The real-time stream processing framework Spark Streaming consumes the messages in KAFKA to perform entity extraction and relationship extraction, and the extraction results are written to KAFKA. Data access monitoring is provided at the point where the message queue is written, covering the table name of the ingested data, the access time, the volume of ingested data, the volume of successful data and the volume of failed data; the failed data is recorded in external storage using HBase.
In summary, through the cooperation of the application program part, knowledge storage module, data source bus, data source module and other modules, the real-time knowledge graph construction system uses Spark Streaming to consume the basic data in KAFKA for entity extraction and relationship extraction, and records failed data in external storage using HBase. This greatly reduces latency and improves working efficiency. It also addresses the loss or duplication of extracted data and the limitation to a single type of data source, improves the scalability of the scheme, and is convenient for users.
It should be noted that, in this document, relational terms such as first and second are used only to distinguish one entity or operation from another, and do not necessarily require or imply any actual relationship or order between those entities or operations. Moreover, the terms "include", "comprise" or any other variant are intended to cover non-exclusive inclusion, so that a process, method, article or device comprising a series of elements includes not only those elements but also other elements not explicitly listed, or elements inherent to such a process, method, article or device.
Although embodiments of the present invention have been shown and described, those of ordinary skill in the art will understand that various changes, modifications, substitutions and variations can be made to these embodiments without departing from the principles and spirit of the present invention; the scope of the present invention is defined by the appended claims.
Claims (4)
1. A real-time knowledge graph construction system, characterized by comprising an application program part, a knowledge storage module, a knowledge consumption module, a knowledge bus, a data source bus and a data source module;
The specific work of each part is as follows:
A1, the data source module is responsible for data collection and for feeding the data into the KAFKA message queue;
B1, the data source bus is responsible for transmitting the basic data from which entities/relationships are to be extracted;
C1, the knowledge extraction module is responsible for consuming the basic data in KAFKA with Spark Streaming and performing entity extraction and relationship extraction;
D1, the knowledge bus is responsible for transmitting the entity/relationship data extracted by Spark Streaming;
E1, the knowledge consumption module is responsible for consuming the entity/relationship data in KAFKA and writing it to the corresponding external storage, providing the data for real-time graph queries;
F1, the application program part is responsible for graph queries in various real-time query scenarios;
KAFKA cluster 1 is provided in the data source bus and consists of Server1-3; KAFKA cluster 11 is provided in the knowledge bus and consists of Server11-13; each Server is provided with two operating units;
The data sources include business server log files, external REST API requests and external data storage;
The knowledge consumption module includes a GDB consumer, an HBase consumer and an ES consumer, whose specific roles are as follows:
A2, the HBase consumer program reads the entities/relationships in the message queue and writes them to the HBase database, for later viewing of entity and relationship details;
B2, the GDB consumer program reads the entities/relationships in the message queue and writes them to the GDB (graph database), for later real-time graph queries;
C2, the ES consumer reads the entities in the message queue and writes them to Elasticsearch, to serve as a secondary index of entities in later real-time queries;
Query scenarios of the application program, such as K-layer expansion, shortest path, all paths and community detection, complete graph relationship lookups and detail queries by calling the APIs of the corresponding stores such as ES, GDB and HBase.
The specific operation flow is as follows:
S1, the data source module obtains source data through Flume/REST API/external storage and sends it to the data source bus, where Flume obtains its data from log files;
S2, after receiving the source data, the data source bus generates the corresponding data on the corresponding Server in KAFKA cluster 1 and sends the corresponding data to the knowledge extraction module;
S3, the knowledge extraction module extracts the entity/relationship data from the corresponding data and sends it to the knowledge bus; if extraction of the entity/relationship data fails, a failure log is recorded to external storage through HBase;
S4, the knowledge consumption module sends the entity/relationship data to the corresponding consumer, the consumers being the HBase consumer, the GDB consumer and the ES consumer;
S5, the corresponding consumer processes the entity/relationship data into application-related data, sends it to the knowledge storage module for storage, and then sends it to the application program part, specifically as follows:
A3, the GDB consumer sends the application-related data to the graph database for storage, and K-layer expansion is performed through the application program part;
B3, the HBase consumer sends the application-related data to HBase for storage, the shortest path is obtained through the REST API, and the application program part displays the shortest path;
C3, the ES consumer sends the application-related data to ES, and all paths are displayed through the application program part.
2. The real-time knowledge graph construction system according to claim 1, characterized in that: all external data sources are read and written to the KAFKA message queue by Flume / Java REST API / data extraction tools / Spark programs; data access monitoring is provided at the point where the message queue is written, covering the file name of the ingested data or the table name of the relational database, the access time, the volume of ingested data, the volume of successful data and the volume of failed data; the failed data is recorded in external storage using HBase.
3. The real-time knowledge graph construction system according to claim 1, characterized in that: the real-time stream processing framework Spark Streaming consumes the messages in KAFKA to perform entity extraction and relationship extraction, and the extraction results are written to KAFKA; data access monitoring is provided at the point where the message queue is written, covering the table name of the ingested data, the access time, the volume of ingested data, the volume of successful data and the volume of failed data; the failed data is recorded in external storage using HBase.
4. The real-time knowledge graph construction system according to claim 1, characterized in that: ES is the abbreviation of Elasticsearch.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201910642692.2A CN110377757B (en) | 2019-07-16 | 2019-07-16 | Real-time knowledge graph construction system |
Applications Claiming Priority (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201910642692.2A CN110377757B (en) | 2019-07-16 | 2019-07-16 | Real-time knowledge graph construction system |
Publications (2)
Publication Number | Publication Date |
---|---|
CN110377757A true CN110377757A (en) | 2019-10-25 |
CN110377757B CN110377757B (en) | 2023-02-14 |
Family
ID=68253468
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN201910642692.2A Active CN110377757B (en) | 2019-07-16 | 2019-07-16 | Real-time knowledge graph construction system |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN110377757B (en) |
Citations (3)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN1508680A (en) * | 2002-12-20 | 2004-06-30 | 中国科学院计算技术研究所 | Method for rapid path analysis for distributed file system |
US20170366480A1 (en) * | 2016-06-21 | 2017-12-21 | Oracle International Corporation | Internet cloud-hosted natural language interactive messaging system sessionizer |
CN107729413A (en) * | 2017-09-25 | 2018-02-23 | 安徽畅通行交通信息服务有限公司 | Regional traffic intelligent management system based on big data |
Cited By (3)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN111104519A (en) * | 2019-10-29 | 2020-05-05 | 北京海致星图科技有限公司 | Method for constructing full-scale administrative region knowledge base |
CN111639082A (en) * | 2020-06-08 | 2020-09-08 | 成都信息工程大学 | Object storage management method and system of billion-level node scale knowledge graph based on Ceph |
CN111639082B (en) * | 2020-06-08 | 2022-12-23 | 成都信息工程大学 | Object storage management method and system of billion-level node scale knowledge graph based on Ceph |
Also Published As
Publication number | Publication date |
---|---|
CN110377757B (en) | 2023-02-14 |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
CN105069703B (en) | A kind of electrical network mass data management method | |
CN104933112B (en) | Distributed interconnection Transaction Information storage processing method | |
CN109558450B (en) | Automobile remote monitoring method and device based on distributed architecture | |
CN104778188B (en) | A kind of distributed apparatus log collection method | |
CN107038162A (en) | Real time data querying method and system based on database journal | |
CN104751359B (en) | System and method for payment clearing | |
CN111586091B (en) | Edge computing gateway system for realizing computing power assembly | |
CN106503276A (en) | A kind of method and apparatus of the time series databases for real-time monitoring system | |
CN104036025A (en) | Distribution-base mass log collection system | |
CN105824744A (en) | Real-time log collection and analysis method on basis of B2B (Business to Business) platform | |
CN104216989A (en) | Method for storing transmission line integrated data based on HBase | |
CN103995807B (en) | Magnanimity data query and the method for after-treatment under a kind of framework based on Web | |
CN103793493B (en) | A kind of method and system for handling car-mounted terminal mass data | |
CN102750326A (en) | Log management optimization method of cluster system based on downsizing strategy | |
CN107800808A (en) | A kind of data-storage system based on Hadoop framework | |
CN109739919A (en) | A kind of front end processor and acquisition system for electric system | |
CN103455335A (en) | Multilevel classification Web implementation method | |
CN106649687A (en) | Method and device for on-line analysis and processing of large data | |
CN106570145B (en) | Distributed database result caching method based on hierarchical mapping | |
CN110377757A (en) | A kind of real time knowledge map construction system | |
CN112182004A (en) | Method and device for viewing data in real time, computer equipment and storage medium | |
CN103345527B (en) | Intelligent data statistical system | |
CN112465175A (en) | Public service internet of things technology service platform | |
CN117076426A (en) | Traffic intelligent engine system construction method and device based on flow batch integration | |
CN109145109A (en) | User group's message propagation anomaly analysis method and device based on social networks |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | ||
SE01 | Entry into force of request for substantive examination | ||
GR01 | Patent grant | ||