CN110377757A

CN110377757A - Real-time knowledge graph construction system

Info

Publication number: CN110377757A
Application number: CN201910642692.2A
Authority: CN
Inventors: 杨仪军
Original assignee: Beijing Sea - Induced Star Map Technology Co Ltd
Current assignee: Beijing Sea - Induced Star Map Technology Co Ltd
Priority date: 2019-07-16
Filing date: 2019-07-16
Publication date: 2019-10-25
Anticipated expiration: 2039-07-16
Also published as: CN110377757B

Abstract

The invention discloses a real-time knowledge graph construction system and relates to the technical field of graph analysis platforms. The system comprises an application program part, a knowledge storage module, a knowledge consumption module, a knowledge bus, a data source bus, and a data source module. The specific work of each part is as follows: A1, the data source module is responsible for data collection and access to the KAFKA message queue; B1, the data source bus is responsible for transmitting the basic data from which entities/relationships need to be extracted; C1, the knowledge extraction module is responsible for consuming the basic data in KAFKA with Spark Streaming and performing entity extraction and relationship extraction. Through the cooperation of the application program part, the knowledge storage module, the data source bus, the data source module, and the other modules, the system uses Spark Streaming to consume the basic data in KAFKA for entity extraction and relationship extraction and records failed data in external storage using HBase. It thereby addresses the problems of lost or duplicated extraction data and of access being limited to a single kind of data source.

Description

A real-time knowledge graph construction system

Technical field

The present invention relates to the technical field of graph analysis platforms, and specifically to a real-time knowledge graph construction system.

Background technique

Kafka is an open-source stream processing platform written in Scala and Java. It is a high-throughput distributed publish-subscribe messaging system that can handle all of the action stream data generated by consumers on a website. Such actions (web page browsing, searches, and other user actions) are a key factor in many social functions of the modern web. Because of throughput requirements, this data is usually handled by processing logs and log aggregation. For log data and offline analysis systems such as Hadoop that are also required to process data in real time, Kafka is a feasible solution. The purpose of Kafka is to unify online and offline message processing through the parallel loading mechanism of Hadoop, and also to provide real-time messages through a cluster.

In the prior art, there are few practical applications in the field of real-time graph construction, and most of them are T+1 (the graph relationships are built with a one-day delay). As a result, accurate knowledge extraction results cannot be obtained in a single pass, which makes it difficult to serve businesses with high data-accuracy requirements. Knowledge extraction also has high latency, extracted data is easily lost or duplicated, few data sources are supported or considered, and the scheme scales poorly, all of which inconvenience the user.

Summary of the invention

(1) Technical problem to be solved

In view of the deficiencies of the prior art, the present invention provides a real-time knowledge graph construction system. It addresses the following problems of the prior art: there are few practical applications in the field of real-time graph construction, and most of them are T+1 (the graph relationships are built with a one-day delay), so accurate knowledge extraction results cannot be obtained in a single pass and businesses with high data-accuracy requirements are poorly served; knowledge extraction has high latency; extracted data is easily lost or duplicated; few data sources are supported or considered; and the scheme scales poorly.

(2) Technical solution

To achieve the above object, the present invention is realized by the following technical solution: a real-time knowledge graph construction system, comprising an application program part, a knowledge storage module, a knowledge consumption module, a knowledge bus, a data source bus, and a data source module;

The specific work of each part is as follows:

A1, the data source module is responsible for data collection and access to the KAFKA message queue;

B1, the data source bus is responsible for transmitting the basic data from which entities/relationships need to be extracted;

C1, the knowledge extraction module is responsible for consuming the basic data in KAFKA with Spark Streaming and performing entity extraction and relationship extraction;

D1, the knowledge bus is responsible for transmitting the entity/relationship data extracted by Spark Streaming;

E1, the knowledge consumption module is responsible for consuming the entity/relationship data in KAFKA and writing it to the corresponding external storage, providing query data for real-time graph queries;

F1, the application program part is responsible for graph queries in various real-time query scenarios;

KAFKA cluster 1 is provided in the data source bus and consists of Server1-3; KAFKA cluster 11 is provided in the knowledge bus and consists of Server11-13; each Server is provided with two operating units;

The data sources include business server log files, external REST API requests, and external data storage;

The knowledge consumption module includes a GDB consumer, an HBase consumer, and an ES consumer, whose specific roles are as follows:

A2, the HBase consumer program reads entities/relationships from the message queue and writes them to the HBase database, for subsequent viewing of entity and relationship details;

B2, the GDB consumer program reads entities/relationships from the message queue and writes them to the GDB, for subsequent real-time graph queries;

C2, the ES consumer reads entities from the message queue and writes them to Elasticsearch, which serves as a secondary index of entities for subsequent real-time queries;
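The division of work among the three consumers (A2 to C2) can be illustrated with a minimal sketch. It is not the implementation of the invention: the message layout, the function names, and the in-memory dictionaries and sets standing in for HBase, the GDB, and Elasticsearch are all assumptions made for illustration.

```python
from dataclasses import dataclass, field


@dataclass
class Message:
    kind: str      # "entity" or "relation" (hypothetical message layout)
    key: str
    payload: dict


@dataclass
class Stores:
    # In-memory stand-ins for the three external stores (assumption).
    hbase: dict = field(default_factory=dict)      # detail rows by key
    gdb_nodes: set = field(default_factory=set)    # graph vertices
    gdb_edges: set = field(default_factory=set)    # graph edges (src, dst)
    es_index: dict = field(default_factory=dict)   # entity name -> keys


def hbase_consumer(msg, stores):
    # A2: keep full details of every entity and relationship.
    stores.hbase[msg.key] = msg.payload


def gdb_consumer(msg, stores):
    # B2: keep only the graph structure used by real-time graph queries.
    if msg.kind == "entity":
        stores.gdb_nodes.add(msg.key)
    else:
        stores.gdb_edges.add((msg.payload["src"], msg.payload["dst"]))


def es_consumer(msg, stores):
    # C2: index entities only, as a secondary index by name.
    if msg.kind == "entity":
        stores.es_index.setdefault(msg.payload["name"], set()).add(msg.key)


CONSUMERS = (hbase_consumer, gdb_consumer, es_consumer)


def dispatch(queue, stores):
    # Every consumer sees every message, as with independent consumer groups.
    for msg in queue:
        for consume in CONSUMERS:
            consume(msg, stores)
    return stores


queue = [
    Message("entity", "e1", {"name": "Alice"}),
    Message("entity", "e2", {"name": "Bob"}),
    Message("relation", "r1", {"src": "e1", "dst": "e2", "type": "knows"}),
]
stores = dispatch(queue, Stores())
```

In this sketch each consumer reads the same stream independently, so a store can be added or removed without changing the others.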

The query scenarios of the application program, such as K-layer expansion, shortest path, full path, and community discovery, complete graph relationship lookup and detail queries by calling the APIs of the corresponding stores, such as ES, GDB, and HBase.
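Two of the query scenarios named above, K-layer expansion and shortest path, can be sketched with a breadth-first search. This is only an illustration under stated assumptions: the graph is a plain adjacency dictionary rather than a GDB, edges are unweighted, and the function names are hypothetical.

```python
from collections import deque


def k_layer_expansion(graph, start, k):
    """Return every node within k hops of start, mapped to its hop count."""
    depth = {start: 0}
    frontier = deque([start])
    while frontier:
        node = frontier.popleft()
        if depth[node] == k:
            continue  # do not expand beyond the k-th layer
        for nxt in graph.get(node, ()):
            if nxt not in depth:
                depth[nxt] = depth[node] + 1
                frontier.append(nxt)
    return depth


def shortest_path(graph, src, dst):
    """Return one shortest path from src to dst as a list, or None."""
    parent = {src: None}
    frontier = deque([src])
    while frontier:
        node = frontier.popleft()
        if node == dst:
            path = []
            while node is not None:
                path.append(node)
                node = parent[node]
            return path[::-1]
        for nxt in graph.get(node, ()):
            if nxt not in parent:
                parent[nxt] = node
                frontier.append(nxt)
    return None


# Hypothetical directed graph used only for this example.
GRAPH = {"a": ["b", "c"], "b": ["d"], "c": ["d"], "d": ["e"], "e": []}
two_layers = k_layer_expansion(GRAPH, "a", 2)
path = shortest_path(GRAPH, "a", "e")
```

In a deployment along the lines described, the traversal itself would presumably be delegated to the GDB API, with ES used to locate the starting entity and HBase used to fetch details of the nodes returned.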

The specific operation process is as follows:

S1, the data source module obtains data sources through Flume/REST API/external storage and sends them to the data source bus, where Flume obtains data from log files;

S2, after receiving a data source, the data source bus generates the corresponding data on the corresponding Server in KAFKA cluster 1 and sends the corresponding data to the knowledge extraction module;

S3, the knowledge extraction module extracts the entity/relationship data from the corresponding data and sends it to the knowledge bus; if entity/relationship extraction fails, a failure log is recorded to external storage through HBase;

S4, the knowledge consumption module sends the entity/relationship data to the corresponding consumers, where the consumers include the HBase consumer, the GDB consumer, and the ES consumer;

S5, the corresponding consumer processes the entity/relationship data into application-related data and sends it to the knowledge storage module for storage, after which it is sent to the application program part, specifically as follows:

A3, the GDB consumer sends the application-related data to graph database storage, and K-layer expansion is performed through the application program part;

B3, the HBase consumer sends the application-related data to HBase storage, the shortest path is obtained through the REST API, and the shortest path is displayed through the application program part;

C3, the ES consumer sends the application-related data to ES, and the full path is displayed through the application program part.
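Step S3 of the process above, extraction with failure logging, can be sketched as follows. The sketch makes several assumptions that are not part of the disclosure: each raw message is a JSON record holding one subject/predicate/object triple, the extraction rule is a trivial field lookup, and plain lists stand in for the data source bus, the knowledge bus, and the HBase failure log. It does not use Spark Streaming.

```python
import json


def extract(raw):
    """Hypothetical extraction rule: one triple per JSON record."""
    record = json.loads(raw)
    s, p, o = record["subject"], record["predicate"], record["object"]
    entities = [{"id": s}, {"id": o}]
    relation = {"src": s, "type": p, "dst": o}
    return entities, relation


def run_extraction(source_bus):
    knowledge_bus = []   # stand-in for the knowledge bus
    failure_log = []     # stand-in for the HBase failure table
    for offset, raw in enumerate(source_bus):
        try:
            entities, relation = extract(raw)
        except (json.JSONDecodeError, KeyError, TypeError) as exc:
            # Failed records are kept with their position, not dropped.
            failure_log.append(
                {"offset": offset, "raw": raw, "error": type(exc).__name__}
            )
            continue
        knowledge_bus.extend(("entity", e) for e in entities)
        knowledge_bus.append(("relation", relation))
    return knowledge_bus, failure_log


source_bus = [
    '{"subject": "Alice", "predicate": "works_at", "object": "Acme"}',
    "not json",
    '{"subject": "Bob"}',
]
knowledge_bus, failure_log = run_extraction(source_bus)
```

Keeping the offset and raw content of each failed record is what would allow a failed message to be inspected or replayed later.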

Preferably, all external data sources are read and written to the KAFKA message queue by Flume/Java REST API/data extraction tools/Spark programs. Data access monitoring is provided at the point where the message queue is written, covering the file name of the accessed data or the table name of the relational database, the access time, the accessed data volume, the successful data volume, and the failed data volume, where failed data is recorded in external storage using HBase.
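The data access monitoring described above can be sketched as a small counter object wrapped around the write to the message queue. The class name, the field names, and the `send` callback are hypothetical; a list stands in for both the KAFKA message queue and the HBase failure storage.

```python
from dataclasses import dataclass, field
from datetime import datetime, timezone


@dataclass
class AccessMonitor:
    source_name: str   # file name or relational table name
    access_time: datetime = field(
        default_factory=lambda: datetime.now(timezone.utc)
    )
    accessed: int = 0    # accessed data volume
    succeeded: int = 0   # successful data volume
    failed: int = 0      # failed data volume
    failed_records: list = field(default_factory=list)  # HBase stand-in

    def write(self, record, send):
        """Send one record to the queue and update the counters."""
        self.accessed += 1
        try:
            send(record)
        except Exception as exc:
            self.failed += 1
            self.failed_records.append((record, repr(exc)))
        else:
            self.succeeded += 1


message_queue = []


def send(record):
    # Hypothetical producer that rejects empty records.
    if record is None:
        raise ValueError("empty record")
    message_queue.append(record)


monitor = AccessMonitor("orders.log")
for record in ["a", None, "b"]:
    monitor.write(record, send)
```

With counters like these, the accessed volume should always equal the sum of the successful and failed volumes, which gives a simple check for lost data.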

Preferably, the real-time stream processing framework Spark Streaming is used to consume messages in KAFKA and perform entity extraction and relationship extraction, and the extraction results are written to KAFKA. Data access monitoring is provided at the point where the message queue is written, covering the table name of the accessed data, the access time, the accessed data volume, the successful data volume, and the failed data volume, where failed data is recorded in external storage using HBase.

Preferably, ES is an abbreviation of Elasticsearch.

(3) Beneficial effects

The present invention provides a real-time knowledge graph construction system with the following beneficial effects: through the cooperation of the application program part, the knowledge storage module, the data source bus, the data source module, and the other modules, the system uses Spark Streaming to consume the basic data in KAFKA for entity extraction and relationship extraction, and records failed data in external storage using HBase. This greatly reduces latency and improves working efficiency, addresses the problems of lost or duplicated extraction data and of access being limited to a single kind of data source, strengthens the scalability of the scheme, and makes the system convenient for users.

Brief description of the drawings

Fig. 1 is a structural functional block diagram of the present invention.

Specific embodiment

The technical solutions in the embodiments of the present invention are described clearly and completely below with reference to the drawings in the embodiments. Obviously, the described embodiments are only some, not all, of the embodiments of the present invention. All other embodiments obtained by those of ordinary skill in the art based on the embodiments of the present invention without creative effort shall fall within the protection scope of the present invention.

Referring to Fig. 1, the present invention provides a technical solution: a real-time knowledge graph construction system, comprising an application program part, a knowledge storage module, a knowledge consumption module, a knowledge bus, a data source bus, and a data source module;

The specific work of each part is as follows:

The query scenarios of the application program, such as K-layer expansion, shortest path, full path, and community discovery, complete graph relationship lookup and detail queries by calling the APIs of the corresponding stores, such as ES, GDB, and HBase (ES is an abbreviation of Elasticsearch).

The specific operation process is as follows:

Remarks: all external data sources are read and written to the KAFKA message queue by Flume/Java REST API/data extraction tools/Spark programs. Data access monitoring is provided at the point where the message queue is written, covering the file name of the accessed data or the table name of the relational database, the access time, the accessed data volume, the successful data volume, and the failed data volume, where failed data is recorded in external storage using HBase.

The real-time stream processing framework Spark Streaming is used to consume messages in KAFKA and perform entity extraction and relationship extraction, and the extraction results are written to KAFKA. Data access monitoring is provided at the point where the message queue is written, covering the table name of the accessed data, the access time, the accessed data volume, the successful data volume, and the failed data volume, where failed data is recorded in external storage using HBase.

In summary, through the cooperation of the application program part, the knowledge storage module, the data source bus, the data source module, and the other modules, the real-time knowledge graph construction system uses Spark Streaming to consume the basic data in KAFKA for entity extraction and relationship extraction, and records failed data in external storage using HBase. This greatly reduces latency and improves working efficiency, addresses the problems of lost or duplicated extraction data and of access being limited to a single kind of data source, strengthens the scalability of the scheme, and makes the system convenient for users.

It should be noted that, in this document, relational terms such as first and second are used merely to distinguish one entity or operation from another, and do not necessarily require or imply any actual relationship or order between these entities or operations. Moreover, the terms "include", "comprise", or any other variant thereof are intended to cover a non-exclusive inclusion, so that a process, method, article, or device that includes a series of elements includes not only those elements but also other elements not explicitly listed, or elements inherent to such a process, method, article, or device.

Although embodiments of the present invention have been shown and described, those of ordinary skill in the art will understand that various changes, modifications, substitutions, and variations can be made to these embodiments without departing from the principles and spirit of the present invention; the scope of the present invention is defined by the appended claims.

Claims

1. A real-time knowledge graph construction system, characterized by comprising an application program part, a knowledge storage module, a knowledge consumption module, a knowledge bus, a data source bus, and a data source module;

The specific work of each part is as follows:

C1, the knowledge extraction module is responsible for consuming the basic data in KAFKA with Spark Streaming and performing entity extraction and relationship extraction;

E1, the knowledge consumption module is responsible for consuming the entity/relationship data in KAFKA and writing it to the corresponding external storage, providing query data for real-time graph queries;

KAFKA cluster 1 is provided in the data source bus and consists of Server1-3; KAFKA cluster 11 is provided in the knowledge bus and consists of Server11-13; each Server is provided with two operating units;

The knowledge consumption module includes a GDB consumer, an HBase consumer, and an ES consumer, whose specific roles are as follows:

A2, the HBase consumer program reads entities/relationships from the message queue and writes them to the HBase database, for subsequent viewing of entity and relationship details;

B2, the GDB consumer program reads entities/relationships from the message queue and writes them to the GDB, for subsequent real-time graph queries;

C2, the ES consumer reads entities from the message queue and writes them to Elasticsearch, which serves as a secondary index of entities for subsequent real-time queries;

The query scenarios of the application program, such as K-layer expansion, shortest path, full path, and community discovery, complete graph relationship lookup and detail queries by calling the APIs of the corresponding stores, such as ES, GDB, and HBase.

The specific operation process is as follows:

S1, the data source module obtains data sources through Flume/REST API/external storage and sends them to the data source bus, where Flume obtains data from log files;

S2, after receiving a data source, the data source bus generates the corresponding data on the corresponding Server in KAFKA cluster 1 and sends the corresponding data to the knowledge extraction module;

S3, the knowledge extraction module extracts the entity/relationship data from the corresponding data and sends it to the knowledge bus; if entity/relationship extraction fails, a failure log is recorded to external storage through HBase;

S4, the knowledge consumption module sends the entity/relationship data to the corresponding consumers, where the consumers include the HBase consumer, the GDB consumer, and the ES consumer;

S5, the corresponding consumer processes the entity/relationship data into application-related data and sends it to the knowledge storage module for storage, after which it is sent to the application program part, specifically as follows:

A3, the GDB consumer sends the application-related data to graph database storage, and K-layer expansion is performed through the application program part;

B3, the HBase consumer sends the application-related data to HBase storage, the shortest path is obtained through the REST API, and the shortest path is displayed through the application program part;

2. The real-time knowledge graph construction system according to claim 1, characterized in that: all external data sources are read and written to the KAFKA message queue by Flume/Java REST API/data extraction tools/Spark programs; data access monitoring is provided at the point where the message queue is written, covering the file name of the accessed data or the table name of the relational database, the access time, the accessed data volume, the successful data volume, and the failed data volume, where failed data is recorded in external storage using HBase.

3. The real-time knowledge graph construction system according to claim 1, characterized in that: the real-time stream processing framework Spark Streaming is used to consume messages in KAFKA and perform entity extraction and relationship extraction, and the extraction results are written to KAFKA; data access monitoring is provided at the point where the message queue is written, covering the table name of the accessed data, the access time, the accessed data volume, the successful data volume, and the failed data volume, where failed data is recorded in external storage using HBase.

4. The real-time knowledge graph construction system according to claim 1, characterized in that: ES is an abbreviation of Elasticsearch.