CN109299225B - Log retrieval method, system, terminal and computer readable storage medium - Google Patents

Log retrieval method, system, terminal and computer readable storage medium

Info

Publication number
CN109299225B
Authority
CN
China
Prior art keywords
cluster
temporary
log
distributed file
file system
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN201811246089.4A
Other languages
Chinese (zh)
Other versions
CN109299225A (en)
Inventor
石晓龙
黄望
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Ping An Technology Shenzhen Co Ltd
Original Assignee
Ping An Technology Shenzhen Co Ltd

Landscapes

  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)

Abstract

The invention provides a log retrieval method, system, terminal and computer readable storage medium. The log retrieval method comprises the following steps: backing up specified log data in an ELK log system from the ES cluster to a distributed file system; creating a Kubernetes cluster that is independent of the ES cluster and the distributed file system; receiving a query request for the specified log data stored in the distributed file system, and calling the Kubernetes cluster to create a temporary ES cluster; and importing the specified log data stored in the distributed file system into the temporary ES cluster, so that the query request is executed through the temporary ES cluster. According to the invention, the specified log data is stored in a distributed HDFS deployment, retrieval is carried out on a temporary ES cluster created for that purpose, and the temporary ES cluster is released once the user's retrieval operation is detected to have finished, so that it does not continue to occupy system resources.

Description

Log retrieval method, system, terminal and computer readable storage medium
Technical Field
The present invention relates to the field of log data processing, and in particular, to a log retrieval method, system, terminal, and computer readable storage medium.
Background
This section is intended to provide a background or context for embodiments of the invention that are recited in the claims and detailed description. The description herein is not admitted to be prior art by inclusion in this section.
Most current enterprise log systems are built on ELK (short for Elasticsearch, Logstash and Kibana). For important log data, a user will often choose to back the data up from the Elasticsearch (ES) cluster of the ELK system to a distributed file system (Hadoop Distributed File System, HDFS) and, when a query is needed, import the data from the HDFS back into the ES cluster. However, the load on the ES cluster is usually high and its storage space is limited, so importing data largely affects other users on the platform and, in serious cases, makes the whole service unavailable.
Disclosure of Invention
In view of the above, the present invention provides a log retrieval method, system, terminal and computer readable storage medium that leave the original ES cluster unaffected while log data is imported, thereby ensuring the stability of the ELK system.
An embodiment of the present application provides a log retrieval method, including:
backing up specified log data in the ELK log system from the ES cluster to a distributed file system;
creating a Kubernetes cluster, wherein the Kubernetes cluster is independent of the ES cluster of the ELK log system and of the distributed file system;
receiving a query request for the specified log data stored in the distributed file system, and calling the Kubernetes cluster to create a temporary ES cluster; and
importing the specified log data stored in the distributed file system into the temporary ES cluster, so as to execute the query request through the temporary ES cluster.
Preferably, the step of backing up specified log data in the ELK log system from the ES cluster to a distributed file system includes:
backing up the specified log data in the ELK log system from the ES cluster to the distributed file system by using a snapshot migration algorithm; or
establishing an ES-Hadoop framework, and using the ES-Hadoop framework to back up the specified log data in the ELK log system from the ES cluster to the distributed file system.
Preferably, the temporary ES cluster is independent of the ES cluster in the ELK log system.
Preferably, the step of calling the Kubernetes cluster to create a temporary ES cluster includes:
acquiring the data size of the specified log data stored in the distributed file system; and
creating a temporary ES cluster whose number of nodes corresponds to the data size of the specified log data;
wherein the temporary ES cluster comprises at least one ES-Client node, at least one ES-Data node and at least one ES-Master node.
Preferably, the step of importing the specified log data stored by the distributed file system into the temporary ES cluster includes:
and establishing an ES-Hadoop framework, and importing the specified log data stored by the distributed file system into the temporary ES cluster by using the ES-Hadoop framework.
Preferably, the step of importing the specified log data stored by the distributed file system into the temporary ES cluster to execute the query request through the temporary ES cluster includes:
importing the specified log data stored by the distributed file system into the temporary ES cluster, and returning the service address of the temporary ES cluster after the data import operation is completed; and
connecting to the temporary ES cluster according to the service address, so as to execute the query request through the temporary ES cluster.
Preferably, the step of importing the specified log data stored in the distributed file system into the temporary ES cluster to execute the query request through the temporary ES cluster further includes:
monitoring whether the temporary ES cluster has finished executing the query request; and
releasing the temporary ES cluster when the query request has finished.
An embodiment of the present application provides a log retrieval system, including:
a backup module, configured to back up specified log data in the ELK log system from the ES cluster to a distributed file system;
a first creation module, configured to create a Kubernetes cluster, wherein the Kubernetes cluster is independent of the ES cluster of the ELK log system and of the distributed file system;
a second creation module, configured to receive a query request for the specified log data stored in the distributed file system and to call the Kubernetes cluster to create a temporary ES cluster; and
an execution module, configured to import the specified log data stored by the distributed file system into the temporary ES cluster, so as to execute the query request through the temporary ES cluster.
An embodiment of the present application provides a terminal, where the terminal includes a processor and a memory, the memory stores a plurality of computer programs, and the processor is configured to implement the steps of the log retrieval method described above when executing the computer programs stored in the memory.
An embodiment of the present application provides a computer-readable storage medium having stored thereon a computer program which, when executed by a processor, implements the steps of the log retrieval method as described above.
According to the log retrieval method, system, terminal and computer readable storage medium described above, a Kubernetes cluster independent of the HDFS and the ES cluster is created, and the Kubernetes cluster is called according to the user's retrieval requirement to create a temporary ES cluster that is completely isolated from the original ES cluster. Once the temporary ES cluster has been created, the backup log data on the HDFS is automatically imported into it, so that the user can perform log data retrieval on the temporary ES cluster; after the user's retrieval operation is detected to have finished, the temporary ES cluster is released. The original ES cluster is not affected at any point in the process. The temporary ES cluster is available only to the requesting user and does not continue to occupy system resources, which ensures the exclusivity of the user's operations, lets the resources of the whole ELK system be allocated flexibly, and does not affect the operation experience of other users.
Drawings
In order to more clearly illustrate the technical solutions of the embodiments of the present invention, the drawings required for the description of the embodiments will be briefly described below, and it is obvious that the drawings in the following description are some embodiments of the present invention, and other drawings may be obtained according to these drawings without inventive effort for a person skilled in the art.
FIG. 1 is a flowchart illustrating a method for retrieving a log according to an embodiment of the present invention.
FIG. 2 is a flowchart illustrating the steps of a log retrieval method according to another embodiment of the present invention.
FIG. 3 is a functional block diagram of a log retrieval system according to an embodiment of the present invention.
FIG. 4 is a schematic diagram of a computer device according to an embodiment of the invention.
Detailed Description
In order that the above objects, features and advantages of the present application may be more clearly understood, the application is described in detail below with reference to the accompanying drawings and specific embodiments. The embodiments of the present application and the features in the embodiments may be combined with each other provided they do not conflict.
In the following description, numerous specific details are set forth in order to provide a thorough understanding of the present invention, and the described embodiments are merely some, rather than all, embodiments of the present invention. All other embodiments, which can be made by those skilled in the art based on the embodiments of the present invention without making any inventive effort, are intended to fall within the scope of the present invention.
Unless defined otherwise, all technical and scientific terms used herein have the same meaning as commonly understood by one of ordinary skill in the art to which this invention belongs. The terminology used herein in the description of the invention is for the purpose of describing particular embodiments only and is not intended to be limiting of the invention.
Preferably, the log retrieval method of the present invention is applied in one or more computer devices. A computer device is a device capable of automatically performing numerical calculation and/or information processing according to preset or stored instructions, and its hardware includes, but is not limited to, a microprocessor, an application-specific integrated circuit (ASIC), a field-programmable gate array (FPGA), a digital signal processor (DSP), an embedded device, and the like.
The computer device may be a computing device such as a desktop computer, a notebook computer, a tablet computer, a server, etc. The computer device can perform man-machine interaction with a user through a keyboard, a mouse, a remote controller, a touch pad or voice control equipment and the like.
Embodiment one:
FIG. 1 is a flowchart showing the steps of a log retrieval method according to a preferred embodiment of the present invention. The order of the steps in the flowchart may be changed, and some steps may be omitted, according to different needs.
Referring to FIG. 1, the log retrieval method specifically includes the following steps.
Step S11, backing up the specified log data in the ELK log system from the ES cluster to a distributed file system (HDFS).
In one embodiment, the ELK log system includes an ES cluster, a Logstash framework, and a Kibana framework. The Logstash framework can be deployed on each node of the ES cluster; it collects the relevant log data, parses and filters it, and sends it to the ES cluster for storage. The Kibana framework then presents the log data to the user, for example by providing various APIs for the user to query and operate on it. The specified log data can be log data manually specified by a user in the ELK log system, or log data selected according to a preset screening condition, such as a time period or a keyword. For example, the instruction sed -n '/2014-12-17:16:17:20/,/2014-12-17:16:17:36/p' test.log finds the log data between 2014-12-17:16:17:20 and 2014-12-17:16:17:36, and the instruction cat -n test.log | grep "insurance" obtains the line numbers of the log data containing the keyword "insurance".
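The two screening conditions described above (a time period and a keyword) can be sketched in Python as follows. This is only an illustration: the function names and the sample log lines are made up, and plain substring matching stands in for the regular expressions that sed and grep would use.

```python
def select_range(lines, start_marker, end_marker):
    """Mimic a sed address range: keep lines from the first line containing
    start_marker through the next later line containing end_marker."""
    selected, inside = [], False
    for line in lines:
        if inside:
            selected.append(line)
            if end_marker in line:
                inside = False
        elif start_marker in line:
            # The end marker is only checked on lines after the start line.
            selected.append(line)
            inside = True
    return selected


def numbered_matches(lines, keyword):
    """Mimic `cat -n file | grep keyword`: return (line_number, line) pairs."""
    return [(n, line) for n, line in enumerate(lines, start=1) if keyword in line]


# Hypothetical log lines, using the timestamp layout quoted in the text.
sample = [
    "2014-12-17:16:17:19 job started",
    "2014-12-17:16:17:20 insurance policy created",
    "2014-12-17:16:17:30 heartbeat",
    "2014-12-17:16:17:36 insurance policy saved",
    "2014-12-17:16:17:40 job finished",
]

in_window = select_range(sample, "2014-12-17:16:17:20", "2014-12-17:16:17:36")
keyword_hits = numbered_matches(sample, "insurance")
```

Either result could then serve as the "specified log data" selected for backup.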
The HDFS is used to store and manage files. A file in HDFS is physically stored in blocks, whose size may be specified by a configuration parameter (dfs.blocksize) and defaults to 128 MB. The HDFS provides a unified abstract directory tree, and clients access files through a specified path, such as hdfs://namenode:port/dir-a/dir-b/dir-c/file. The directory structure and the file block location information (metadata) are managed by the name-node, which is the master node of the HDFS cluster; it maintains the directory tree of the entire HDFS file system and, for each path (file), the corresponding data block information (the block identifiers and the data-node servers where the blocks reside). The storage of each block of a file is managed by the data-nodes, which are the slave nodes of the HDFS cluster; each block can be stored as several copies on several data-nodes (the number of copies can be set through the parameter dfs.replication and defaults to 3). Each data-node periodically reports the file block information it stores to the name-node, and the name-node is responsible for maintaining the number of copies of each file. Access to the HDFS is requested by applying to the name-node.
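To make the block and copy arithmetic concrete, the following sketch computes how many blocks a backup file occupies and how much raw storage its copies consume, using the default values stated above (128 MB blocks, 3 copies). The function name is hypothetical.

```python
BLOCK_SIZE = 128 * 1024 * 1024  # default block size stated in the text
COPIES = 3                      # default number of copies stated in the text


def hdfs_footprint(file_bytes, block_size=BLOCK_SIZE, copies=COPIES):
    """Return (blocks, stored_block_copies, raw_bytes) for one file.

    The last block of a file only occupies as much space as it holds,
    so raw storage is the file size times the number of copies.
    """
    blocks = (file_bytes + block_size - 1) // block_size  # ceiling division
    return blocks, blocks * copies, file_bytes * copies


one_gib = hdfs_footprint(1024 ** 3)
partial = hdfs_footprint(300 * 1024 * 1024)
```

A 300 MB file, for example, spans three blocks (two full and one partial), each held on three data-nodes.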
In an embodiment, the specified log data may be backed up from the ES cluster onto the HDFS by means of snapshot migration, specifically: adding the repository-hdfs plugin to the ES cluster and modifying the configuration file of each node; establishing a repository for creating and storing snapshots, where the ES cluster can create multiple repositories and a designated repository can be selected for each snapshot; and creating the snapshot. Each snapshot can contain multiple indexes: by default the indexes of the whole ES cluster are backed up, or only the indexes of the specified log data can be designated for backup. The snapshot is then stored in the HDFS, so that the specified log data is backed up from the ES cluster to the HDFS.
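As a rough sketch of the snapshot route, the functions below only build the two REST requests involved (registering an HDFS-backed snapshot repository, then snapshotting selected indexes); they do not send anything. The request paths and body fields follow the Elasticsearch snapshot API with the repository-hdfs plugin, while the repository name, name-node URI, path, and index names are placeholders chosen for illustration.

```python
import json


def register_hdfs_repository(repo, namenode_uri, path):
    """Build the request that registers an HDFS-backed snapshot repository."""
    body = {"type": "hdfs", "settings": {"uri": namenode_uri, "path": path}}
    return "PUT", f"/_snapshot/{repo}", json.dumps(body)


def create_snapshot(repo, snapshot, indices=None):
    """Build the request that snapshots the given indexes.

    With indices=None the body is empty, which snapshots every index.
    """
    body = {}
    if indices:
        body = {
            "indices": ",".join(indices),
            "ignore_unavailable": True,
            "include_global_state": False,
        }
    path = f"/_snapshot/{repo}/{snapshot}?wait_for_completion=true"
    return "PUT", path, json.dumps(body)


# Placeholder names, not taken from the source.
repo_request = register_hdfs_repository(
    "log_backup", "hdfs://namenode:8020/", "elasticsearch/snapshots"
)
snapshot_request = create_snapshot("log_backup", "snapshot_1", ["app-logs-2018.10"])
```

Each tuple could be passed to any HTTP client pointed at the original ES cluster.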
In an embodiment, the specified log data in the ELK log system may also be backed up from the ES cluster to the HDFS by establishing an ES-Hadoop framework, specifically: creating documents in the ES cluster (data in the ES cluster is stored with the document as the basic unit), where each document includes index, type, id and other information; index indicates which index the document belongs to, type indicates which type the document belongs to, and id is the identifier of the document. A mapping table for data migration from the ES cluster to Hadoop is then created, in which the key is the id of a document in the ES cluster and the input value is the content of that document. Next, a Job class for data migration from the ES cluster to Hadoop is created; the Job class reads data from the ES cluster and converts it into the input parameters of the mapping table. Finally, a MapReduce task is started to back up the specified log data from the ES cluster to the HDFS.
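The key/value mapping described above (document id as the key, document content as the value) can be illustrated without Hadoop. The sketch below is a plain-Python stand-in for the map step, not the ES-Hadoop API; the `_id` and `_source` fields follow the shape of an Elasticsearch search hit, and the sample documents are invented.

```python
import json


def map_document(doc):
    """Map one ES document to (key, value): the key is the document id and
    the value is the document content serialized as JSON text."""
    return doc["_id"], json.dumps(doc["_source"], sort_keys=True)


def export_documents(docs):
    """Emit one tab-separated line per document, as a text output would."""
    return [f"{key}\t{value}" for key, value in map(map_document, docs)]


# Invented documents for illustration.
docs = [
    {"_index": "app-logs", "_type": "doc", "_id": "1",
     "_source": {"level": "INFO", "message": "started"}},
    {"_index": "app-logs", "_type": "doc", "_id": "2",
     "_source": {"level": "ERROR", "message": "failed"}},
]
exported = export_documents(docs)
```

In a real job the emitted lines would be written to files under an HDFS path rather than kept in a list.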
Step S12, creating a Kubernetes cluster, wherein the Kubernetes cluster is independent of the ES cluster of the ELK log system and of the HDFS.
In one embodiment, the Kubernetes cluster may be created in the ELK log system. Specifically, the Kubernetes cluster may be created as follows: setting the number and specification of the cloud virtual machines used for the Kubernetes cluster, and creating the required cloud virtual machines; acquiring the IP information and SSH (Secure Shell protocol) information of the cloud virtual machines; copying the binary files required to deploy the Kubernetes cluster to the created cloud virtual machines with an SSH tool, and setting the parameters of the Kubernetes cluster; and finally deploying all components of the Kubernetes cluster with the kubectl tool.
In an embodiment, the Kubernetes cluster is decoupled from both the HDFS and the ES cluster of the ELK log system; that is, the Kubernetes cluster, the ES cluster, and the HDFS are independent of each other.
Step S13, receiving a query request for the specified log data stored to the HDFS, and calling the Kubernetes cluster to create a temporary ES cluster.
In one embodiment, when a user wants to query log data backed up to the HDFS, the user may fill in the basic information of the backed-up data on a page of the ELK log system and issue a query request. The ELK log system then calls the Kubernetes cluster to create a temporary ES cluster for the user; this temporary ES cluster is independent of the ES cluster in the ELK log system.
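One way the call to the Kubernetes cluster could look is to submit one workload manifest per node role. The sketch below only builds such a manifest as a Python dictionary using the standard apps/v1 StatefulSet layout; the naming scheme, labels, and container image are assumptions for illustration and do not come from the source.

```python
# Assumed image; any Elasticsearch image the cluster can pull would do.
DEFAULT_IMAGE = "docker.elastic.co/elasticsearch/elasticsearch:6.8.23"


def es_statefulset(user, role, replicas, image=DEFAULT_IMAGE):
    """Build a StatefulSet manifest for one role of a user's temporary ES cluster."""
    name = f"tmp-es-{user}-{role}"  # hypothetical naming scheme
    labels = {"app": "tmp-es", "owner": user, "role": role}
    return {
        "apiVersion": "apps/v1",
        "kind": "StatefulSet",
        "metadata": {"name": name, "labels": labels},
        "spec": {
            "serviceName": name,
            "replicas": replicas,
            "selector": {"matchLabels": labels},
            "template": {
                "metadata": {"labels": labels},
                "spec": {
                    "containers": [{
                        "name": "elasticsearch",
                        "image": image,
                        # 9200 serves HTTP requests, 9300 node-to-node transport.
                        "ports": [{"containerPort": 9200}, {"containerPort": 9300}],
                    }]
                },
            },
        },
    }


manifest = es_statefulset("alice", "data", replicas=2)
```

Labelling every object with its owner makes it straightforward to find and delete everything that belongs to one user's temporary cluster later.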
In an embodiment, when the temporary ES cluster is created, the data size of the specified log data stored in the HDFS may first be obtained, and a temporary ES cluster with a corresponding number of nodes is created according to that data size. In other words, the larger the amount of specified log data, the more ES-Client nodes, ES-Data nodes, and ES-Master nodes need to be created for the temporary ES cluster; the smaller the amount, the fewer such nodes may be created. It is understood that the temporary ES cluster includes at least one ES-Client node, at least one ES-Data node, and at least one ES-Master node.
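The source does not state how data size maps to node counts, so the following is only one possible sizing rule. The thresholds (50 GiB per data node, three master nodes once there are at least three data nodes, one client node per four data nodes) are invented for illustration; the only property taken from the text is that every role has at least one node and counts grow with the data size.

```python
GIB = 1024 ** 3


def plan_temporary_cluster(data_bytes, gib_per_data_node=50):
    """Return node counts per role for a backup of the given size.

    All thresholds here are hypothetical; the text only requires at least
    one node per role and more nodes for more data.
    """
    per_node = gib_per_data_node * GIB
    data_nodes = max(1, (data_bytes + per_node - 1) // per_node)
    master_nodes = 3 if data_nodes >= 3 else 1
    client_nodes = max(1, data_nodes // 4)
    return {"es-client": client_nodes, "es-data": data_nodes, "es-master": master_nodes}


small_plan = plan_temporary_cluster(10 * GIB)
large_plan = plan_temporary_cluster(400 * GIB)
```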
In one embodiment, the ELK log system has a rights management function: each user has different rights to view ELK logs, and the amount of log data each user can back up to the HDFS also differs. When calling the interface of the Kubernetes cluster to create a temporary ES cluster for a user, the ELK log system creates a corresponding number of ES-Client nodes, ES-Data nodes and ES-Master nodes according to the size of the log data that the user previously backed up to the HDFS. Different user log sizes correspond to different numbers of ES-Client nodes, ES-Data nodes, and ES-Master nodes.
The ES-Data node is mainly used for storing index data and can perform operations such as adding, deleting, querying, and aggregating on documents. The ES-Master node performs cluster-level operations, such as creating or deleting indexes, tracking which nodes are part of the cluster, and deciding which shards are allocated to which nodes. The ES-Client node coordinates the ES-Master nodes and the ES-Data nodes; after an ES-Client node joins the cluster, it obtains the state of the cluster and can route requests directly according to that state. When both the master role and the data role of a node are configured to false, the node acts as an ES-Client node and can handle routing requests, handle searches, distribute indexing operations, and so on.
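The role split described above can be expressed as per-node settings. The sketch below renders the relevant lines of a node configuration file for each role, using the `node.master` and `node.data` boolean settings found in Elasticsearch releases before 7.9 (later releases use `node.roles` instead). The role labels and the cluster name are placeholders.

```python
# Boolean role switches per node type; a client node has both set to false.
ROLE_SETTINGS = {
    "es-master": {"node.master": True, "node.data": False},
    "es-data": {"node.master": False, "node.data": True},
    "es-client": {"node.master": False, "node.data": False},
}


def _yaml_value(value):
    """Render booleans the way YAML expects (true/false, lower case)."""
    return str(value).lower() if isinstance(value, bool) else str(value)


def render_node_config(role, cluster_name):
    """Return the configuration lines for one node of the given role."""
    settings = {"cluster.name": cluster_name, **ROLE_SETTINGS[role]}
    return "\n".join(f"{key}: {_yaml_value(value)}" for key, value in settings.items())


client_config = render_node_config("es-client", "tmp-es")
```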
Step S14, importing the specified log data stored by the distributed file system into the temporary ES cluster, so as to execute the query request through the temporary ES cluster.
In one embodiment, the specified log data stored by the distributed file system may be imported into the temporary ES cluster by establishing an ES-Hadoop framework and using that framework. After the specified log data has been imported into the temporary ES cluster, the service address of the temporary ES cluster is returned, and the user can connect to the temporary ES cluster according to the service address in order to execute the query request through it.
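The ordering described above, where the address is handed out only after the import has finished, might be sketched as follows. The address format uses the conventional in-cluster DNS name of a Kubernetes Service under the default `cluster.local` domain; the service and namespace names are hypothetical, and an address of this form is only resolvable from inside the Kubernetes cluster unless it is exposed separately.

```python
def service_address(service, namespace, port=9200, scheme="http"):
    """Build the in-cluster address of a Kubernetes Service."""
    return f"{scheme}://{service}.{namespace}.svc.cluster.local:{port}"


def finish_import(import_completed, service, namespace):
    """Return the temporary cluster's address only once the import is done."""
    if not import_completed:
        raise RuntimeError("import still running; no service address yet")
    return service_address(service, namespace)


address = finish_import(True, "tmp-es-alice-client", "log-retrieval")
```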
In one embodiment, establishing an ES-Hadoop framework and using it to import the specified log data stored by the distributed file system into the temporary ES cluster may be implemented as follows: first, creating a mapping table for data migration from Hadoop to the temporary ES cluster, where the input of the mapping table is an HDFS file (containing the log data to be imported) and its output is Text in JSON format; then creating a Job class for data migration from Hadoop to the temporary ES cluster, where the Job class converts the MapReduce output (Text in JSON format) into the id and content of a document in the temporary ES cluster; and finally starting a MapReduce task to import the data from the HDFS into the temporary ES cluster.
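The two conversions described above (log line to JSON text, then JSON text to document id plus content) can be illustrated in plain Python. This is a stand-in rather than the ES-Hadoop API: the assumed log line layout, the field names, and the sequential ids are invented, and the Elasticsearch `_bulk` request format is used here merely as a convenient way to show id and content paired together.

```python
import json


def map_line(line):
    """Map step: turn one 'timestamp level message' log line into JSON text."""
    timestamp, level, message = line.rstrip("\n").split(" ", 2)
    return json.dumps({"timestamp": timestamp, "level": level, "message": message})


def to_bulk_body(json_texts, index, first_id=1):
    """Job step: pair each JSON text with a document id and build a
    newline-delimited _bulk request body for the temporary cluster."""
    lines = []
    for doc_id, text in enumerate(json_texts, start=first_id):
        lines.append(json.dumps({"index": {"_index": index, "_id": str(doc_id)}}))
        lines.append(text)
    return "\n".join(lines) + "\n"  # a bulk body must end with a newline


hdfs_lines = ["2018-10-24T10:00:00 INFO service started"]
bulk_body = to_bulk_body([map_line(line) for line in hdfs_lines], "restored-logs")
```

Older Elasticsearch versions also expect a `_type` entry in each action line, which is omitted here for brevity.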
In an embodiment, the specified log data stored by the distributed file system may also be imported into the temporary ES cluster as follows: first, the specified log data is loaded from the HDFS into Spark SQL and held in the form of an RDD (a distributed collection of Java objects); then preset data structure information is added to update the RDD, and a DataFrame (a distributed collection of Row objects) connected to the temporary ES cluster is created from the updated RDD; finally, an index is created through the DataFrame and written into the temporary ES cluster.
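The step of attaching preset structure information to untyped records is the core of this variant. The sketch below shows that step with ordinary Python lists and no Spark at all: the first function plays the part of the untyped record collection and the second the part of the structured rows. The field separator and the schema are assumptions made for the example.

```python
# Hypothetical preset structure: field name and type for each column.
SCHEMA = [("timestamp", str), ("status", int), ("message", str)]


def to_records(lines, separator="|"):
    """Stage 1 (untyped records): split raw lines into tuples of strings."""
    return [tuple(line.split(separator)) for line in lines]


def apply_schema(records, schema=SCHEMA):
    """Stage 2 (structured rows): attach a name and a type to every field."""
    return [
        {name: cast(value) for (name, cast), value in zip(schema, record)}
        for record in records
    ]


rows = apply_schema(to_records(["2018-10-24T10:00:00|200|ok"]))
```

Once rows carry names and types, each one corresponds directly to a document that can be written to an index.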
Referring to FIG. 2, compared with the log retrieval method shown in FIG. 1, the log retrieval method shown in FIG. 2 further includes step S15 and step S16.
Step S15, monitoring whether the temporary ES cluster has finished executing the query request;
Step S16, releasing the temporary ES cluster when the query request has finished.
In an embodiment, after the ELK log system detects that the user has completed the log data query on the temporary ES cluster, it releases the temporary ES cluster. The temporary ES cluster is thus created only for the current user and does not continue to occupy system resources, which ensures the exclusivity of the user's operations, lets the resources of the whole ELK log system be allocated flexibly, and does not affect the operation experience of other users. When a query request for the log data stored in the HDFS is received again, a temporary ES cluster must be created anew to perform the data query.
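The source does not say how the end of the user's query operation is detected. One simple possibility, assumed here purely for illustration, is to treat a period without queries as the end of the session. The class name, the idle threshold, and the injected clock are all hypothetical.

```python
import time


class TemporaryClusterLease:
    """Tracks query activity and releases the cluster after an idle period."""

    def __init__(self, release, idle_seconds=600, clock=time.monotonic):
        self._release = release          # callable that tears the cluster down
        self._idle_seconds = idle_seconds
        self._clock = clock
        self._last_activity = clock()
        self.released = False

    def record_query(self):
        """Call whenever the user runs a query on the temporary cluster."""
        self._last_activity = self._clock()

    def poll(self):
        """Release the cluster once no query was seen for idle_seconds."""
        idle = self._clock() - self._last_activity
        if not self.released and idle >= self._idle_seconds:
            self._release()
            self.released = True
        return self.released


# Demonstration with a fake clock so that no real waiting is needed.
now = [0.0]
released_events = []
lease = TemporaryClusterLease(
    lambda: released_events.append("released"),
    idle_seconds=10,
    clock=lambda: now[0],
)
now[0] = 5.0
lease.record_query()
now[0] = 14.0
still_active = lease.poll()   # only 9 seconds idle
now[0] = 15.0
now_released = lease.poll()   # 10 seconds idle
```

Because the lease releases at most once, polling it repeatedly after release is harmless.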
Embodiment two:
FIG. 3 is a functional block diagram of a log retrieval system according to a preferred embodiment of the present invention.
Referring to FIG. 3, the log retrieval system 10 may include a backup module 101, a first creation module 102, a second creation module 103, an execution module 104, a monitoring module 105, and a release module 106.
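How the six modules listed above could cooperate is sketched below. Each module is represented by a plain callable so that the order of calls is visible; the class name, method names, and the stub implementations are illustrative only and are not taken from the source.

```python
class LogRetrievalSystem:
    """Wires together six callables named after the modules listed above."""

    def __init__(self, backup, create_kubernetes, create_temp_es,
                 execute, monitor, release):
        self.backup = backup
        self.create_temp_es = create_temp_es
        self.execute = execute
        self.monitor = monitor
        self.release = release
        # The Kubernetes cluster is created once, up front.
        self.kubernetes = create_kubernetes()

    def handle_query(self, backup_ref, query):
        """Create a temporary cluster, run the query, then always release it."""
        cluster = self.create_temp_es(self.kubernetes, backup_ref)
        try:
            result = self.execute(cluster, backup_ref, query)
            self.monitor(cluster)  # returns once the query operation is finished
            return result
        finally:
            self.release(cluster)


def demo():
    """Run the flow with stub modules that only record when they are called."""
    events = []
    system = LogRetrievalSystem(
        backup=lambda indices: events.append("backup") or "hdfs-backup-ref",
        create_kubernetes=lambda: events.append("create-kubernetes") or "k8s",
        create_temp_es=lambda k8s, ref: events.append("create-temp-es") or "tmp-es",
        execute=lambda cluster, ref, query: events.append("execute") or ["hit"],
        monitor=lambda cluster: events.append("monitor"),
        release=lambda cluster: events.append("release"),
    )
    ref = system.backup(["app-logs"])
    hits = system.handle_query(ref, "insurance")
    return events, hits


events, hits = demo()
```

Placing the release call in a `finally` clause means the temporary cluster is torn down even if the import or the query fails.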
The backup module 101 is configured to back up specified log data in the ELK log system from the ES cluster to a distributed file system (HDFS).
In one embodiment, the ELK log system includes an ES cluster, a Logstash framework, and a Kibana framework. The Logstash framework can be deployed on each node of the ES cluster; it collects the relevant log data, parses and filters it, and sends it to the ES cluster for storage. The Kibana framework then presents the log data to the user, for example by providing various APIs for the user to query and operate on it. The specified log data can be log data manually specified by a user in the ELK log system, or log data selected according to a preset screening condition, such as a time period or a keyword. For example, the instruction sed -n '/2014-12-17:16:17:20/,/2014-12-17:16:17:36/p' test.log finds the log data between 2014-12-17:16:17:20 and 2014-12-17:16:17:36, and the instruction cat -n test.log | grep "insurance" obtains the line numbers of the log data containing the keyword "insurance".
The HDFS is used to store and manage files. A file in HDFS is physically stored in blocks, whose size may be specified by a configuration parameter (dfs.blocksize) and defaults to 128 MB. The HDFS provides a unified abstract directory tree, and clients access files through a specified path, such as hdfs://namenode:port/dir-a/dir-b/dir-c/file. The directory structure and the file block location information (metadata) are managed by the name-node, which is the master node of the HDFS cluster; it maintains the directory tree of the entire HDFS file system and, for each path (file), the corresponding data block information (the block identifiers and the data-node servers where the blocks reside). The storage of each block of a file is managed by the data-nodes, which are the slave nodes of the HDFS cluster; each block can be stored as several copies on several data-nodes (the number of copies can be set through the parameter dfs.replication and defaults to 3). Each data-node periodically reports the file block information it stores to the name-node, and the name-node is responsible for maintaining the number of copies of each file. Access to the HDFS is requested by applying to the name-node.
In an embodiment, the backup module 101 may back up the specified log data from the ES cluster onto the HDFS by means of snapshot migration, specifically: adding the repository-hdfs plugin to the ES cluster and modifying the configuration file of each node; establishing a repository for creating and storing snapshots, where the ES cluster can create multiple repositories and a designated repository can be selected for each snapshot; and creating the snapshot. Each snapshot can contain multiple indexes: by default the indexes of the whole ES cluster are backed up, or only the indexes of the specified log data can be designated for backup. The snapshot is then stored in the HDFS, so that the specified log data is backed up from the ES cluster to the HDFS.
In an embodiment, the backup module 101 may also back up the specified log data in the ELK log system from the ES cluster to the HDFS by establishing an ES-Hadoop framework, specifically: creating documents in the ES cluster (data in the ES cluster is stored with the document as the basic unit), where each document includes index, type, id and other information; index indicates which index the document belongs to, type indicates which type the document belongs to, and id is the identifier of the document. A mapping table for data migration from the ES cluster to Hadoop is then created, in which the key is the id of a document in the ES cluster and the input value is the content of that document. Next, a Job class for data migration from the ES cluster to Hadoop is created; the Job class reads data from the ES cluster and converts it into the input parameters of the mapping table. Finally, a MapReduce task is started to back up the specified log data from the ES cluster to the HDFS.
The first creation module 102 is configured to create a Kubernetes cluster, wherein the Kubernetes cluster is independent of the ES cluster of the ELK log system and of the HDFS.
In one embodiment, the Kubernetes cluster may be created in the ELK log system. Specifically, the first creation module 102 creates the Kubernetes cluster as follows: setting the number and specification of the cloud virtual machines used for the Kubernetes cluster, and creating the required cloud virtual machines; acquiring the IP information and SSH (Secure Shell protocol) information of the cloud virtual machines; copying the binary files required to deploy the Kubernetes cluster to the created cloud virtual machines with an SSH tool, and setting the parameters of the Kubernetes cluster; and finally deploying all components of the Kubernetes cluster with the kubectl tool.
In an embodiment, the Kubernetes cluster is decoupled from both the HDFS and the ES cluster of the ELK log system; that is, the Kubernetes cluster, the ES cluster, and the HDFS are independent of each other.
The second creating module 103 is configured to receive a query request for specified log data stored to the HDFS, and call the Kubernetes cluster to create a temporary ES cluster.
In one embodiment, when a user wants to query log data backed up to the HDFS, the user may fill in the basic information of the backed-up data on a page of the ELK log system and issue a query request. The ELK log system then calls the Kubernetes cluster to create a temporary ES cluster for the user; this temporary ES cluster is independent of the ES cluster in the ELK log system.
In an embodiment, when creating the temporary ES cluster, the second creation module 103 may first obtain the data size of the specified log data stored in the HDFS and create a temporary ES cluster with a corresponding number of nodes according to that data size. In other words, the larger the amount of specified log data, the more ES-Client nodes, ES-Data nodes, and ES-Master nodes need to be created for the temporary ES cluster; the smaller the amount, the fewer such nodes may be created. It is understood that the temporary ES cluster includes at least one ES-Client node, at least one ES-Data node, and at least one ES-Master node.
In one embodiment, the ELK log system has a rights management function: each user has different rights to view ELK logs, and the amount of log data each user can back up to the HDFS also differs. When calling the interface of the Kubernetes cluster to create a temporary ES cluster for a user, the ELK log system creates a corresponding number of ES-Client nodes, ES-Data nodes and ES-Master nodes according to the size of the log data that the user previously backed up to the HDFS. Different user log sizes correspond to different numbers of ES-Client nodes, ES-Data nodes, and ES-Master nodes.
The ES-Data node is mainly used for storing index data and can perform operations such as adding, deleting, querying, and aggregating on documents. The ES-Master node performs cluster-level operations, such as creating or deleting indexes, tracking which nodes are part of the cluster, and deciding which shards are allocated to which nodes. The ES-Client node coordinates the ES-Master nodes and the ES-Data nodes; after an ES-Client node joins the cluster, it obtains the state of the cluster and can route requests directly according to that state. When both the master role and the data role of a node are configured to false, the node acts as an ES-Client node and can handle routing requests, handle searches, distribute indexing operations, and so on.
The execution module 104 is configured to import specified log data stored by the distributed file system into the temporary ES cluster, so as to execute the query request through the temporary ES cluster.
In one embodiment, the execution module 104 may import the specified log data stored by the distributed file system into the temporary ES cluster by establishing an ES-Hadoop framework and using that framework. After the specified log data stored by the distributed file system has been imported into the temporary ES cluster, a service address of the temporary ES cluster is returned, and the user can link to the temporary ES cluster according to the service address so as to execute the query request through the temporary ES cluster.
In one embodiment, the execution module 104 may establish an ES-Hadoop framework and use it to import the specified log data stored by the distributed file system into the temporary ES cluster as follows: first, a mapping table for data migration from Hadoop to the temporary ES cluster is created, wherein the input of the mapping table is an HDFS file (the HDFS file comprises the log data to be imported) and the output of the mapping table is a Text in JSON format; then, a Job class for data migration from Hadoop to the temporary ES cluster is created, wherein the Job class can convert the MapReduce output (the Text in JSON format) into a document id and document content for the temporary ES cluster; finally, a MapReduce task is started to import the data from the HDFS into the temporary ES cluster.
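The two conversions in this flow (log line to JSON text, then JSON text to a document id plus document content) can be sketched in plain Python as a stand-in for the Java MapReduce classes, which are not reproduced here. The log line layout `timestamp level message`, the use of a SHA-1 digest as the document id, and both function names are assumptions made for the example.

```python
import hashlib
import json
from typing import Dict, Tuple


def map_line_to_json(line: str) -> str:
    """Map step: turn one log line read from an HDFS file into JSON text.

    Assumes a hypothetical 'timestamp level message' layout.
    """
    timestamp, level, message = line.rstrip("\n").split(" ", 2)
    record = {"timestamp": timestamp, "level": level, "message": message}
    return json.dumps(record, sort_keys=True)


def json_to_document(json_text: str) -> Tuple[str, Dict[str, str]]:
    """Job step: turn the JSON text into a document id and document content.

    The id is a SHA-1 digest of the text (an assumption), so re-importing
    the same line produces the same id instead of a duplicate document.
    """
    doc_id = hashlib.sha1(json_text.encode("utf-8")).hexdigest()
    return doc_id, json.loads(json_text)
```

Deriving the id from the content is one possible design choice; it makes a repeated import idempotent, at the cost of collapsing log lines that are byte-for-byte identical.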
In an embodiment, the execution module 104 may alternatively import the specified log data stored by the distributed file system into the temporary ES cluster as follows: first, the specified log data is loaded from the HDFS into Spark SQL and stored in the form of an RDD (a distributed collection of Java objects); then, preset data structure information is added to update the RDD, and a DataFrame (a distributed collection of Row objects) is created from the updated RDD and connected to the temporary ES cluster; finally, an index is created through the DataFrame and written into the temporary ES cluster.
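The same sequence (raw rows, rows with structure information attached, index write) can be illustrated without Spark. The sketch below is a plain-Python stand-in, not the Spark SQL or ES-Hadoop API: `apply_schema` plays the part of adding the preset data structure information, and `to_bulk_body` builds a request body in the newline-delimited format accepted by the Elasticsearch `_bulk` endpoint. The function names, field names and index name are assumptions for the example.

```python
import json
from typing import Dict, Iterable, List, Sequence


def apply_schema(rows: Iterable[Sequence[str]], field_names: Sequence[str]) -> List[Dict[str, str]]:
    """Attach field names to positional rows (stand-in for RDD to DataFrame)."""
    records = []
    for row in rows:
        if len(row) != len(field_names):
            raise ValueError("row does not match the preset structure: %r" % (row,))
        records.append(dict(zip(field_names, row)))
    return records


def to_bulk_body(records: Iterable[Dict[str, str]], index_name: str) -> str:
    """Build a bulk request body: one action line, then one source line, per record."""
    lines = []
    for record in records:
        lines.append(json.dumps({"index": {"_index": index_name}}))
        lines.append(json.dumps(record))
    # The bulk format requires the body to end with a newline.
    return "\n".join(lines) + "\n"
```

The resulting string would then be sent to the service address of the temporary ES cluster; the transport itself is omitted from this sketch.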
The monitoring module 105 is configured to monitor whether the temporary ES cluster has finished executing the query request.
In one embodiment, the log retrieval system 10 may determine whether the temporary ES cluster needs to be released by monitoring whether the temporary ES cluster has finished executing the query request.
The releasing module 106 is configured to release the temporary ES cluster after the operation of the query request has ended.
In an embodiment, after the ELK log system detects that the user has completed the log data query operation on the temporary ES cluster, the releasing module 106 releases the temporary ES cluster. The temporary ES cluster is thus valid only for the current user and does not continue to occupy system resources, which ensures the exclusivity of the user's operation, keeps the resources of the whole ELK log system readily allocatable, and does not affect the operation experience of other users. When a query request for the log data stored by the HDFS is received again, a temporary ES cluster needs to be created anew to implement the data query operation.
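The create, query and release lifecycle described for modules 103 to 106 can be illustrated with a context manager that guarantees release once the query operation ends, even if the query fails. The class `FakeOrchestrator` below is a hypothetical in-memory stand-in for the calls made to the Kubernetes cluster, and the generated names and addresses are placeholders; none of them come from the source.

```python
from contextlib import contextmanager
from typing import Dict, Iterator


class FakeOrchestrator:
    """In-memory stand-in for the Kubernetes cluster interface (hypothetical)."""

    def __init__(self) -> None:
        self.active: Dict[str, str] = {}  # cluster name -> service address
        self._counter = 0

    def create_cluster(self, user: str) -> str:
        self._counter += 1
        name = "tmp-es-%s-%d" % (user, self._counter)
        self.active[name] = "http://%s.example.internal:9200" % name  # placeholder
        return name

    def service_address(self, name: str) -> str:
        return self.active[name]

    def release_cluster(self, name: str) -> None:
        self.active.pop(name, None)


@contextmanager
def temporary_es_cluster(orchestrator: FakeOrchestrator, user: str) -> Iterator[str]:
    """Create a temporary ES cluster, yield its service address, then release it."""
    name = orchestrator.create_cluster(user)
    try:
        yield orchestrator.service_address(name)
    finally:
        # Runs when the query block ends, normally or with an error.
        orchestrator.release_cluster(name)
```

A caller would run its query inside `with temporary_es_cluster(orchestrator, "alice") as address:`; a later query request enters the block again and therefore obtains a newly created cluster.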
FIG. 4 is a schematic diagram of a computer device according to a preferred embodiment of the invention.
The computer device 1 comprises a memory 20, a processor 30 and a computer program 40, such as a log retrieval program, stored in the memory 20 and executable on the processor 30. The processor 30 implements the steps in the above-described log retrieval method embodiments, for example, steps S11 to S14 shown in FIG. 1 and steps S11 to S16 shown in FIG. 2, when executing the computer program 40. Alternatively, the processor 30, when executing the computer program 40, performs the functions of the modules of the above-described log retrieval system embodiment, such as the modules 101-106 of FIG. 3.
Illustratively, the computer program 40 may be partitioned into one or more modules/units that are stored in the memory 20 and executed by the processor 30 to complete the present invention. The one or more modules/units may be a series of computer program instruction segments capable of performing the specified functions, which instruction segments are used for describing the execution of the computer program 40 in the computer device 1. For example, the computer program 40 may be partitioned into a backup module 101, a first creation module 102, a second creation module 103, an execution module 104, a monitoring module 105, and a release module 106 in fig. 3. For specific functions of each module, see embodiment two.
The computer device 1 may be a computing device such as a desktop computer, a notebook computer, a palmtop computer, a cloud server, etc. It will be appreciated by a person skilled in the art that the schematic diagram is only an example of the computer device 1 and does not constitute a limitation of the computer device 1; the computer device 1 may comprise more or fewer components than shown, may combine certain components, or may have different components. For example, the computer device 1 may further comprise input and output devices, network access devices, buses, etc.
The processor 30 may be a central processing unit (CPU), another general purpose processor, a digital signal processor (DSP), an application specific integrated circuit (ASIC), a field-programmable gate array (FPGA) or other programmable logic device, a discrete gate or transistor logic device, discrete hardware components, or the like. The general purpose processor may be a microprocessor, or the processor 30 may be any conventional processor or the like. The processor 30 is the control center of the computer device 1 and connects the various parts of the overall computer device 1 using various interfaces and lines.
The memory 20 may be used to store the computer program 40 and/or modules/units, and the processor 30 may perform various functions of the computer device 1 by running or executing the computer program and/or modules/units stored in the memory 20, and invoking data stored in the memory 20. The memory 20 may mainly include a program storage area and a data storage area, wherein the program storage area may store an operating system, an application program required for at least one function (such as a sound playing function, an image playing function, etc.), and the like; the data storage area may store data created according to the use of the computer device 1 (such as audio data, a phonebook, etc.), and the like. In addition, the memory 20 may include high-speed random access memory, and may also include non-volatile memory, such as a hard disk, memory, plug-in hard disk, smart media card (SMC), secure digital (SD) card, flash card, at least one magnetic disk storage device, flash memory device, or other volatile solid-state storage device.
The modules/units integrated in the computer device 1 may be stored in a computer readable storage medium if implemented in the form of software functional units and sold or used as separate products. Based on such understanding, all or part of the flow of the methods of the above embodiments may also be accomplished by a computer program instructing related hardware, where the computer program may be stored in a computer readable storage medium and, when executed by a processor, may implement the steps of each of the method embodiments described above. The computer program comprises computer program code, which may be in source code form, object code form, executable file form, or some intermediate form, etc. The computer readable medium may include: any entity or device capable of carrying the computer program code, a recording medium, a USB flash drive, a removable hard disk, a magnetic disk, an optical disk, a computer memory, a Read-Only Memory (ROM), a Random Access Memory (RAM), an electrical carrier signal, a telecommunications signal, a software distribution medium, and so forth. It should be noted that the content contained in the computer readable medium may be appropriately increased or decreased according to the requirements of legislation and patent practice in the jurisdiction concerned; for example, in certain jurisdictions, according to legislation and patent practice, the computer readable medium does not include electrical carrier signals and telecommunication signals.
In the several embodiments provided by the present invention, it should be understood that the disclosed computer apparatus and method may be implemented in other ways. For example, the above-described embodiments of the computer apparatus are merely illustrative, and for example, the division of the units is merely a logical function division, and there may be other manners of division when actually implemented.
In addition, the functional units in the embodiments of the present invention may be integrated in the same processing unit, or each unit may exist alone physically, or two or more units may be integrated in the same unit. The integrated units may be implemented in the form of hardware, or in the form of hardware plus software functional modules.
It will be evident to those skilled in the art that the invention is not limited to the details of the foregoing illustrative embodiments, and that the present invention may be embodied in other specific forms without departing from the spirit or essential characteristics thereof. The present embodiments are, therefore, to be considered in all respects as illustrative and not restrictive, the scope of the invention being indicated by the appended claims rather than by the foregoing description, and all changes which come within the meaning and range of equivalency of the claims are therefore intended to be embraced therein. Any reference sign in a claim should not be construed as limiting the claim concerned. Furthermore, it is evident that the word "comprising" does not exclude other elements or steps, and that the singular does not exclude a plurality. Multiple units or computer means recited in the computer means claim may also be implemented by means of software or hardware by means of the same unit or computer means. The terms first, second, etc. are used to denote a name, but not any particular order.
Finally, it should be noted that the above-mentioned embodiments are merely for illustrating the technical solution of the present invention and not for limiting the same, and although the present invention has been described in detail with reference to the preferred embodiments, it should be understood by those skilled in the art that modifications and equivalents may be made to the technical solution of the present invention without departing from the spirit and scope of the technical solution of the present invention.

Claims (9)

1. A log retrieval method, the method comprising:
backing up and storing specified log data in the ELK log system from the ES cluster to a distributed file system;
creating a Kubernetes cluster, wherein the Kubernetes cluster is independent of the ES cluster of the ELK log system and the distributed file system;
receiving a query request for the specified log data stored to the distributed file system, and calling the Kubernetes cluster to create a temporary ES cluster;
importing the specified log data stored by the distributed file system into the temporary ES cluster to execute the query request through the temporary ES cluster;
monitoring whether the operation of the temporary ES cluster for executing the query request is finished; and
when the operation of the query request is finished, releasing the temporary ES cluster.
2. The log retrieval method of claim 1, wherein backing up and storing specified log data in the ELK log system from the ES cluster to a distributed file system comprises:
backing up and storing the specified log data in the ELK log system from the ES cluster to the distributed file system by using a snapshot migration algorithm; or
establishing an ES-Hadoop framework, and using the ES-Hadoop framework to back up and store the specified log data in the ELK log system from the ES cluster to the distributed file system.
3. The log retrieval method of claim 1, wherein the temporary ES cluster is independent of an ES cluster in the ELK log system.
4. The log retrieval method according to any one of claims 1-3, wherein the step of calling the Kubernetes cluster to create a temporary ES cluster comprises:
acquiring the data size of the specified log data stored to the distributed file system; and
creating a temporary ES cluster containing a corresponding number of nodes according to the data size of the specified log data;
wherein the temporary ES cluster comprises at least one ES-Client node, at least one ES-Data node and at least one ES-Master node.
5. The log retrieval method as recited in claim 1, wherein the step of importing the specified log data stored by the distributed file system into the temporary ES cluster comprises:
and establishing an ES-Hadoop framework, and importing the specified log data stored by the distributed file system into the temporary ES cluster by using the ES-Hadoop framework.
6. The log retrieval method according to any one of claims 1 to 3, wherein the step of importing the specified log data stored by the distributed file system into the temporary ES cluster to execute the query request through the temporary ES cluster comprises:
importing the specified log data stored by the distributed file system into the temporary ES cluster, and returning the service address of the temporary ES cluster after the data importing operation is completed; and
linking to the temporary ES cluster according to the service address so as to execute the query request through the temporary ES cluster.
7. A log retrieval system, the system comprising:
a backup module, configured to back up and store specified log data in the ELK log system from the ES cluster to a distributed file system;
a first creation module, configured to create a Kubernetes cluster, wherein the Kubernetes cluster is independent of the ES cluster of the ELK log system and the distributed file system;
a second creation module, configured to receive a query request for the specified log data stored to the distributed file system and to call the Kubernetes cluster to create a temporary ES cluster;
an execution module, configured to import the specified log data stored by the distributed file system into the temporary ES cluster, so as to execute the query request through the temporary ES cluster;
a monitoring module, configured to monitor whether the operation of the temporary ES cluster for executing the query request is finished; and
a releasing module, configured to release the temporary ES cluster when the operation of the query request is finished.
8. A terminal comprising a processor and a memory, the memory having stored thereon a number of computer programs, characterized in that the processor is adapted to implement the steps of the log retrieval method according to any of claims 1-6 when executing the computer programs stored in the memory.
9. A computer readable storage medium, on which a computer program is stored, characterized in that the computer program, when being executed by a processor, implements the steps of the log retrieval method according to any one of claims 1-6.
CN201811246089.4A 2018-10-24 2018-10-24 Log retrieval method, system, terminal and computer readable storage medium Active CN109299225B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201811246089.4A CN109299225B (en) 2018-10-24 2018-10-24 Log retrieval method, system, terminal and computer readable storage medium

Publications (2)

Publication Number Publication Date
CN109299225A CN109299225A (en) 2019-02-01
CN109299225B true CN109299225B (en) 2024-05-28

Family

ID=65158639

Country Status (1)

Country Link
CN (1) CN109299225B (en)

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant