CN114691700A - Kafaka cluster-based intelligent park retrieval method - Google Patents

Info

Publication number
CN114691700A
Authority
CN
China
Prior art keywords
message
data
unique value
kafaka
retrieval system
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202011584254.4A
Other languages
Chinese (zh)
Inventor
曾小虎
张大志
吴恺
欧阳少海
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Guangdong Flying Enterprise Internet Technology Co Ltd
Original Assignee
Guangdong Flying Enterprise Internet Technology Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Guangdong Flying Enterprise Internet Technology Co Ltd
Priority to CN202011584254.4A
Publication of CN114691700A
Legal status: Pending

Classifications

    • G - PHYSICS
    • G06 - COMPUTING; CALCULATING OR COUNTING
    • G06F - ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 - Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/20 - Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
    • G06F16/23 - Updating
    • G - PHYSICS
    • G06 - COMPUTING; CALCULATING OR COUNTING
    • G06F - ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 - Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/20 - Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
    • G06F16/22 - Indexing; Data structures therefor; Storage structures
    • G06F16/2228 - Indexing structures
    • G - PHYSICS
    • G06 - COMPUTING; CALCULATING OR COUNTING
    • G06F - ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 - Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/20 - Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
    • G06F16/24 - Querying
    • G06F16/245 - Query processing
    • G06F16/2455 - Query execution
    • G - PHYSICS
    • G06 - COMPUTING; CALCULATING OR COUNTING
    • G06F - ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 - Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/20 - Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
    • G06F16/24 - Querying
    • G06F16/245 - Query processing
    • G06F16/2458 - Special types of queries, e.g. statistical queries, fuzzy queries or distributed queries
    • G06F16/2471 - Distributed queries
    • G - PHYSICS
    • G06 - COMPUTING; CALCULATING OR COUNTING
    • G06F - ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 - Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/30 - Information retrieval; Database structures therefor; File system structures therefor of unstructured textual data
    • G06F16/31 - Indexing; Data structures therefor; Storage structures
    • G06F16/316 - Indexing structures
    • G - PHYSICS
    • G06 - COMPUTING; CALCULATING OR COUNTING
    • G06F - ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 - Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/30 - Information retrieval; Database structures therefor; File system structures therefor of unstructured textual data
    • G06F16/33 - Querying
    • G06F16/3331 - Query processing

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • General Physics & Mathematics (AREA)
  • Databases & Information Systems (AREA)
  • General Engineering & Computer Science (AREA)
  • Data Mining & Analysis (AREA)
  • Software Systems (AREA)
  • Computational Linguistics (AREA)
  • Fuzzy Systems (AREA)
  • Mathematical Physics (AREA)
  • Probability & Statistics with Applications (AREA)
  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)

Abstract

The invention provides a smart park retrieval method based on a Kafka cluster, comprising the following steps: a service end configures Logstash parameters for the table data to be indexed, configures the specified messages, and records a timestamp for each message; a retrieval system collects the messages, and Logstash pushes them to the Kafka cluster; the retrieval system receives the messages it has subscribed to from the Kafka cluster, verifies each message, and assigns it a message unique value; the message unique value is used as the document ID of the message, and the tenant ID, source type ID, data ID, and message content of the message are used as the field values of the document; the message unique value, tenant ID, source type ID, data ID, and message content are synchronized into Elasticsearch, and the data of a message is updated by querying Elasticsearch for the corresponding message unique value. The method collects structured and unstructured data through Logstash, allows the data to be submitted directly, and classifies it by type ID.

Description

Kafaka cluster-based intelligent park retrieval method
Technical Field
The invention relates to the technical field of data retrieval, and in particular to a smart park retrieval method based on a Kafka cluster.
Background
Full-text search is now widely used in many fields, and content needs to be retrievable whether it is structured or unstructured data. In the smart park industry, however, each park typically develops and runs its own independent retrieval system, and the structured and unstructured data generated by the individual business modules cannot be collected and classified by a retrieval system in a unified manner.
Disclosure of Invention
The invention aims to overcome the defects and shortcomings of the prior art and provides a smart park retrieval method based on a Kafka cluster.
One embodiment of the invention provides a smart park retrieval method based on a Kafka cluster, which comprises the following steps:
the service end configures Logstash parameters for the table data to be indexed, configures the specified messages, and records a timestamp for each message; a retrieval system collects the messages, and Logstash pushes them to a Kafka cluster;
the retrieval system receives the messages it has subscribed to from the Kafka cluster, verifies each message, and assigns a message unique value to the message once it passes verification;
the message unique value is used as the document ID of the message, the tenant ID, source type ID, data ID, and message content of the message are used as the values of four fields of the document, and the message unique value, tenant ID, source type ID, data ID, and message content are synchronized into Elasticsearch;
the retrieval system updates the data of a message by querying Elasticsearch for its message unique value.
In one embodiment, the service end stores the message in MySQL and the retrieval system runs the Logstash service; whether a message is new is judged by its timestamp, and when the message is new, the retrieval system collects it from the MySQL database table.
In one embodiment, after the message passes verification, Elasticsearch is searched for the message according to the source type ID and data ID of the message. If the message does not exist, a message unique value is generated by the snowflake algorithm and assigned to the message; if the message exists, its timestamp is updated according to the timestamp recorded by the service end, and the value of each corresponding field in the document is updated according to the message unique value of the message.
In one embodiment, when message data is deleted at the service end, the retrieval system obtains the type ID and data ID of the deleted message, queries whether Elasticsearch holds a message with that type ID and data ID, and deletes the message data if it does.
In one embodiment, the message includes structured data and unstructured data.
Compared with the prior art, the invention provides a smart park retrieval method based on a Kafka cluster. The service end configures Logstash parameters for the table data to be indexed; the retrieval system runs the Logstash service, which collects messages from the MySQL database table and pushes the data into Kafka; Kafka pushes the messages to the retrieval system. The retrieval system receives the Kafka data, verifies and cleans it, and, once the data passes verification, synchronizes it into Elasticsearch and assigns each message a message unique value. If the message does not yet exist, a message unique value is generated by the snowflake algorithm and used as the document ID of the message, and the tenant ID, source type ID, data ID, and message content are synchronized into Elasticsearch as the field values of the document. If the message already exists, the value of each corresponding field in the document is updated according to the retrieved document ID. When message data is deleted at the source system, the retrieval system obtains the type ID and data ID, queries whether the message data exists in Elasticsearch, and deletes it if so. Structured and unstructured data are collected through Logstash, can be submitted directly, and are classified by type ID.
In order that the invention may be more clearly understood, specific embodiments thereof will be described hereinafter with reference to the accompanying drawings.
Drawings
FIG. 1 is a flowchart of a smart park retrieval method based on a Kafka cluster according to an embodiment of the present invention.
Detailed Description
The technical solutions in the embodiments of the present invention will be clearly and completely described below with reference to the drawings in the embodiments of the present invention, and it is obvious that the described embodiments are only a part of the embodiments of the present invention, and not all of the embodiments. All other embodiments, which can be derived by a person skilled in the art from the embodiments given herein without making any creative effort, shall fall within the protection scope of the present invention.
Please refer to FIG. 1, which is a flowchart of a smart park retrieval method based on a Kafka cluster according to an embodiment of the present invention. The method comprises:
the service end configures Logstash (a server-side data processing pipeline) parameters for the table data to be indexed, configures the specified messages, and records a timestamp for each message; a retrieval system collects the messages through Logstash, and Logstash pushes them to a Kafka (a distributed log system) cluster;
the Kafka cluster pushes the message to the retrieval system, the retrieval system verifies the message, and the message is assigned a message unique value once it passes verification;
the message unique value is used as the document ID of the message, the tenant ID, source type ID, data ID, and message content of the message are used as the values of four fields of the document, and the message unique value, tenant ID, source type ID, data ID, and message content are synchronized into Elasticsearch (a search server);
the retrieval system updates the data of a message by querying Elasticsearch for its message unique value.
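The text does not say what the verification step checks. A minimal sketch, assuming each message arrives from the queue as a JSON string carrying the four fields named in the steps above; the field names (`tenant_id`, `source_type_id`, `data_id`, `content`) and the function name are hypothetical:

```python
import json

# Hypothetical field names; the source only names the four concepts.
REQUIRED_FIELDS = ("tenant_id", "source_type_id", "data_id", "content")


def verify_message(raw: str):
    """Return the parsed message if it passes basic checks, else None."""
    try:
        message = json.loads(raw)
    except json.JSONDecodeError:
        return None  # payload is not valid JSON
    if not isinstance(message, dict):
        return None  # payload is JSON but not an object
    for field in REQUIRED_FIELDS:
        value = message.get(field)
        if value is None or value == "":
            return None  # a required field is missing or empty
    return message
```

A message that fails the check is simply dropped in this sketch; a real deployment might instead log it or route it to a separate queue.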
The service end matches specific business requirements to system functions. In one embodiment, the service end comprises a user end that generates messages, configures the Logstash parameters, configures the specified messages, records the timestamps of the messages, and uses the retrieval system.
The table data is the data held in a data table of a database. In one embodiment, the table data comprises business messages held in a data table of a MySQL (a database management system) database.
The timestamp is data generated using digital signature techniques. In one embodiment, the timestamp is data recorded by the service end to attest to the time at which the message data was generated.
The retrieval system is a system that looks up data stored in a database using the data processing capability of a computer. In one embodiment, the retrieval system comprises a system running the Elasticsearch service that looks up information or data stored in Elasticsearch.
The message unique value is a 16-byte value written as 32 hexadecimal digits, so that every element in the distributed system can carry unique identification information. In one embodiment, the message unique value is generated by an algorithm and is used as the document ID of a message, while the field values of the document hold the tenant ID, source type ID, data ID, and message content of the message.
The tenant ID identifies the source user of the message, the source type ID identifies the message type, the data ID identifies the message data, and the message content holds the specific content of the message.
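The document layout described above can be sketched as a small record type. This is an illustration only: the class name, attribute names, and the choice of an integer unique value are assumptions, not taken from the source.

```python
from dataclasses import dataclass, asdict


@dataclass
class IndexedMessage:
    unique_value: int      # message unique value, used as the document ID
    tenant_id: str         # identifies the source user of the message
    source_type_id: str    # identifies the message type
    data_id: str           # identifies the message data
    content: str           # specific content of the message

    def to_index_request(self):
        """Split the record into (document ID, document body)."""
        body = asdict(self)
        doc_id = str(body.pop("unique_value"))
        return doc_id, body
```

Keeping the unique value out of the body and using it only as the document ID means a later update can address the document directly by ID.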
Kafka is a high-throughput distributed system for log collection, publish/subscribe messaging, and message distribution. In one embodiment, the Kafka cluster synchronizes data, provides efficient distributed scaling for real-time indexing of large amounts of data, and connects simultaneously to multiple data sources of one or more service ends and to multiple different types of databases.
Logstash is a platform for transmitting, processing, managing, and searching application logs and events. In one embodiment, Logstash collects and manages the messages configured by the service end and pushes them to the Kafka cluster.
Elasticsearch is a search server that provides the system with a distributed, multi-user full-text search engine. In one embodiment, Elasticsearch queries the message content by fuzzy matching and queries the tenant ID and user ID by exact matching.
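One way to express fuzzy matching on content combined with exact matching on tenant ID is an Elasticsearch `bool` query. The sketch below only builds the request body; the field names are hypothetical, "fuzzy matching" is interpreted here as an analyzed `match` query with `fuzziness` enabled, and `tenant_id` is assumed to be mapped as a keyword field. A user ID filter could be added to the `filter` list in the same way.

```python
def build_search_query(tenant_id: str, text: str) -> dict:
    """Fuzzy full-text match on content, exact match on tenant ID."""
    return {
        "query": {
            "bool": {
                # Scored, analyzed match that tolerates small typos.
                "must": [
                    {"match": {"content": {"query": text, "fuzziness": "AUTO"}}}
                ],
                # Unscored exact match; assumes tenant_id is a keyword field.
                "filter": [{"term": {"tenant_id": tenant_id}}],
            }
        }
    }
```

Placing the tenant restriction in `filter` rather than `must` keeps it from affecting relevance scores.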
The service end stores the message in MySQL and the retrieval system runs the Logstash service; whether a message is new is judged by its timestamp, and when the message is new, the retrieval system collects it from the MySQL database table.
When the service end records a new message, the message includes a timestamp stored in MySQL, and Logstash issues an SQL statement against MySQL to fetch the latest messages by timestamp.
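The timestamp-based collection can be illustrated with a checkpointed query. This is a sketch of the idea, not of Logstash itself: an in-memory SQLite database stands in for the MySQL table, and the table and column names (`messages`, `updated_at`) are hypothetical.

```python
import sqlite3


def fetch_new_messages(conn, last_seen: int):
    """Return rows newer than last_seen plus the advanced checkpoint."""
    rows = conn.execute(
        "SELECT id, content, updated_at FROM messages "
        "WHERE updated_at > ? ORDER BY updated_at ASC",
        (last_seen,),
    ).fetchall()
    # Advance the checkpoint only if something new was found.
    new_checkpoint = rows[-1][2] if rows else last_seen
    return rows, new_checkpoint


def demo():
    conn = sqlite3.connect(":memory:")  # stand-in for the MySQL table
    conn.execute(
        "CREATE TABLE messages (id INTEGER, content TEXT, updated_at INTEGER)"
    )
    conn.executemany(
        "INSERT INTO messages VALUES (?, ?, ?)",
        [(1, "old notice", 100), (2, "new notice", 200)],
    )
    # Only the row stamped after the checkpoint (100) is returned.
    return fetch_new_messages(conn, last_seen=100)
```

Each polling round passes the checkpoint returned by the previous round, so a row is collected once per change to its timestamp.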
After the message passes verification, Elasticsearch is searched for the message according to the source type ID and data ID of the message. If the message does not exist, a message unique value is generated by the snowflake algorithm and assigned to the message; if the message exists, its timestamp is updated according to the timestamp recorded by the service end, and the value of each corresponding field in the document is updated according to the message unique value of the message.
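The insert-or-update decision can be sketched with a dictionary standing in for the Elasticsearch index. Everything here is illustrative: the class, its method names, and the field names are hypothetical, and a simple counter stands in for the snowflake generator.

```python
import itertools


class InMemoryIndex:
    """Tiny stand-in for an Elasticsearch index, keyed by document ID."""

    def __init__(self, id_source):
        self.docs = {}
        self._ids = id_source  # iterator yielding message unique values

    def find_id(self, source_type_id, data_id):
        """Look up an existing document by source type ID and data ID."""
        for doc_id, doc in self.docs.items():
            if (doc["source_type_id"], doc["data_id"]) == (source_type_id, data_id):
                return doc_id
        return None

    def upsert(self, message: dict):
        doc_id = self.find_id(message["source_type_id"], message["data_id"])
        if doc_id is None:
            doc_id = next(self._ids)           # new message: assign a unique value
            self.docs[doc_id] = dict(message)
        else:
            self.docs[doc_id].update(message)  # existing: refresh fields and timestamp
        return doc_id


index = InMemoryIndex(itertools.count(1000))  # counter stands in for snowflake IDs
```

Calling `index.upsert(...)` twice with the same source type ID and data ID returns the same document ID and leaves a single document holding the newer field values.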
An ID produced by the snowflake algorithm is composed of an unused bit, timestamp bits, machine bits, and sequence number bits. In one embodiment, the snowflake algorithm distinguishes the individual nodes of a multi-node distributed environment.
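A minimal sketch of a snowflake-style generator follows. The text does not give bit widths, so the commonly used 64-bit layout (1 unused bit, 41 timestamp bits, 10 machine bits, 12 sequence bits) is assumed, and the custom epoch is an arbitrary choice for the example.

```python
import threading
import time


class SnowflakeGenerator:
    """64-bit IDs: 1 unused bit | 41 time bits | 10 machine bits | 12 sequence bits."""

    EPOCH_MS = 1_577_836_800_000  # 2020-01-01 UTC, an arbitrary custom epoch
    MACHINE_BITS = 10
    SEQUENCE_BITS = 12
    MAX_SEQUENCE = (1 << SEQUENCE_BITS) - 1

    def __init__(self, machine_id: int, clock=lambda: int(time.time() * 1000)):
        if not 0 <= machine_id < (1 << self.MACHINE_BITS):
            raise ValueError("machine_id out of range")
        self.machine_id = machine_id
        self._clock = clock          # injectable millisecond clock
        self._last_ms = -1
        self._sequence = 0
        self._lock = threading.Lock()

    def next_id(self) -> int:
        with self._lock:
            now = self._clock()
            if now < self._last_ms:
                raise RuntimeError("clock moved backwards")
            if now == self._last_ms:
                self._sequence = (self._sequence + 1) & self.MAX_SEQUENCE
                if self._sequence == 0:
                    # Sequence exhausted for this millisecond: wait for the next one.
                    while now <= self._last_ms:
                        now = self._clock()
            else:
                self._sequence = 0
            self._last_ms = now
            return (
                ((now - self.EPOCH_MS) << (self.MACHINE_BITS + self.SEQUENCE_BITS))
                | (self.machine_id << self.SEQUENCE_BITS)
                | self._sequence
            )
```

Because the machine bits differ per node, two nodes can generate IDs in the same millisecond without coordinating, provided each node is given a distinct `machine_id`.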
When message data is deleted at the service end, the retrieval system obtains the type ID and data ID of the deleted message, queries whether Elasticsearch holds a message with that type ID and data ID, and deletes the message data if it does.
When the service end deletes a message it has sent, the message is also deleted from the Elasticsearch search engine, for which Elasticsearch provides an HTTP deletion interface. In one embodiment, when a service system deletes a message, the retrieval system calls the message deletion interface over HTTP, with the service end passing the type ID and data ID as parameters; it then queries over HTTP whether the message data exists in the Elasticsearch index and, if so, deletes it from the index.
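The text describes a query followed by a delete. As an alternative that combines both steps, Elasticsearch's delete-by-query endpoint removes every document matching a query in one call; this is a swapped-in technique, not necessarily what the source uses. The sketch below only builds the request, and the index name and field names are hypothetical.

```python
def build_delete_request(index_name: str, source_type_id: str, data_id: str):
    """Return (HTTP method, path, JSON body) for a delete-by-query call."""
    body = {
        "query": {
            "bool": {
                # Both IDs must match exactly; assumes keyword-mapped fields.
                "filter": [
                    {"term": {"source_type_id": source_type_id}},
                    {"term": {"data_id": data_id}},
                ]
            }
        }
    }
    return "POST", f"/{index_name}/_delete_by_query", body
```

If no document matches, the call deletes nothing, which mirrors the "delete only if it exists" behaviour described above.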
The message includes structured data and unstructured data.
Structured data is data held in database tables, whose row-and-column structure is fixed and whose content can be identified; unstructured data is file data.
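Classification by type ID, as mentioned for both kinds of data, can be pictured as grouping messages on their source type ID. A minimal sketch with hypothetical field names:

```python
from collections import defaultdict


def group_by_source_type(messages):
    """Group data IDs under the source type ID of each message."""
    groups = defaultdict(list)
    for message in messages:
        groups[message["source_type_id"]].append(message["data_id"])
    return dict(groups)
```

Structured records and file-based records would then fall into separate groups as long as they carry different source type IDs.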
Although embodiments of the present invention have been shown and described, it will be appreciated by those skilled in the art that changes, modifications, substitutions and alterations can be made in these embodiments without departing from the principles and spirit of the invention, the scope of which is defined in the appended claims and their equivalents.

Claims (5)

1. A smart park retrieval method based on a Kafka cluster, characterized by comprising the following steps:
the service end configures Logstash parameters for the table data to be indexed, configures the specified messages, and records a timestamp for each message; a retrieval system collects the messages, and Logstash pushes them to a Kafka cluster;
the retrieval system receives the messages it has subscribed to from the Kafka cluster, verifies each message, and assigns a message unique value to the message once it passes verification;
the message unique value is used as the document ID of the message, the tenant ID, source type ID, data ID, and message content of the message are used as the values of four fields of the document, and the message unique value, tenant ID, source type ID, data ID, and message content are synchronized into Elasticsearch;
the retrieval system updates the data of a message by querying Elasticsearch for its message unique value.
2. The method as claimed in claim 1, wherein:
the service end stores the message in MySQL and the retrieval system runs the Logstash service; whether a message is new is judged by its timestamp, and when the message is new, the retrieval system collects it from the MySQL database table.
3. The method as claimed in claim 1, wherein:
after the message passes verification, Elasticsearch is searched for the message according to the source type ID and data ID of the message; if the message does not exist, a message unique value is generated by the snowflake algorithm and assigned to the message; if the message exists, its timestamp is updated according to the timestamp recorded by the service end, and the value of each corresponding field in the document is updated according to the message unique value of the message.
4. The method as claimed in claim 1, wherein:
when message data is deleted at the service end, the retrieval system obtains the type ID and data ID of the deleted message, queries whether Elasticsearch holds a message with that type ID and data ID, and deletes the message data if it does.
5. The method as claimed in claim 1, wherein:
the message includes structured data and unstructured data.
CN202011584254.4A 2020-12-28 2020-12-28 Kafaka cluster-based intelligent park retrieval method Pending CN114691700A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202011584254.4A CN114691700A (en) 2020-12-28 2020-12-28 Kafaka cluster-based intelligent park retrieval method

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202011584254.4A CN114691700A (en) 2020-12-28 2020-12-28 Kafaka cluster-based intelligent park retrieval method

Publications (1)

Publication Number Publication Date
CN114691700A true CN114691700A (en) 2022-07-01

Family

ID=82130776

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202011584254.4A Pending CN114691700A (en) 2020-12-28 2020-12-28 Kafaka cluster-based intelligent park retrieval method

Country Status (1)

Country Link
CN (1) CN114691700A (en)

Cited By (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN116860898A (en) * 2023-09-05 2023-10-10 建信金融科技有限责任公司 Data processing method and device
CN116860898B (en) * 2023-09-05 2024-04-23 建信金融科技有限责任公司 Data processing method and device

Similar Documents

Publication Publication Date Title
CN111552687B (en) Time sequence data storage method, query method, device, equipment and storage medium
US9124612B2 (en) Multi-site clustering
US9130971B2 (en) Site-based search affinity
CN113986873B (en) Method for processing, storing and sharing data modeling of mass Internet of things
EP2263180B1 (en) Indexing large-scale gps tracks
CN111459985B (en) Identification information processing method and device
CN109299183A (en) A kind of data processing method, device, terminal device and storage medium
KR102160318B1 (en) Aggregating data in a mediation system
US10108634B1 (en) Identification and removal of duplicate event records from a security information and event management database
EP3371717A1 (en) Virtual edge of a graph database
CN110245134B (en) Increment synchronization method applied to search service
CN109298978B (en) Recovery method and system for database cluster of specified position
CN111506556A (en) Multi-source heterogeneous structured data synchronization method
CN111061758B (en) Data storage method, device and storage medium
CN112685433A (en) Metadata updating method and device, electronic equipment and computer-readable storage medium
CN110955704A (en) Data management method, device, equipment and storage medium
CN113704790A (en) Abnormal log information summarizing method and computer equipment
CN110674231A (en) Data lake-oriented user ID integration method and system
CN113612306A (en) Distributed power distribution cabinet and control system thereof
CN109902127A (en) History state data processing method, device, computer equipment and storage medium
US10838931B1 (en) Use of stream-oriented log data structure for full-text search oriented inverted index metadata
CN111858722A (en) Big data application system and method based on Internet of things
CN114691700A (en) Kafaka cluster-based intelligent park retrieval method
CN113608952A (en) System fault processing method and system based on log construction support environment
CN109542913B (en) Network asset safety management method in complex environment

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination