WO2023273544A1 - Log file storage method and apparatus, device, and storage medium - Google Patents

Log file storage method and apparatus, device, and storage medium

Info

Publication number
WO2023273544A1
Authority
WO
WIPO (PCT)
Prior art keywords: log, index, data, current, fragment
Prior art date
Application number
PCT/CN2022/088359
Other languages
English (en)
Chinese (zh)
Inventor
周凯洋
张成思
刘叶
周子站
王雪飞
Original Assignee
中国民航信息网络股份有限公司
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by 中国民航信息网络股份有限公司
Publication of WO2023273544A1

Classifications

    • G - PHYSICS
    • G06 - COMPUTING; CALCULATING OR COUNTING
    • G06F - ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 - Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/10 - File systems; File servers
    • G06F16/13 - File access structures, e.g. distributed indices
    • G06F16/134 - Distributed indices
    • G - PHYSICS
    • G06 - COMPUTING; CALCULATING OR COUNTING
    • G06F - ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 - Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/10 - File systems; File servers
    • G06F16/18 - File system types
    • G06F16/1805 - Append-only file systems, e.g. using logs or journals to store data
    • G06F16/1815 - Journaling file systems
    • Y - GENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
    • Y02 - TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
    • Y02D - CLIMATE CHANGE MITIGATION TECHNOLOGIES IN INFORMATION AND COMMUNICATION TECHNOLOGIES [ICT], I.E. INFORMATION AND COMMUNICATION TECHNOLOGIES AIMING AT THE REDUCTION OF THEIR OWN ENERGY USE
    • Y02D10/00 - Energy efficient computing, e.g. low power processors, power management or thermal management

Definitions

  • The invention relates to the technical field of computers, and in particular to a log file storage method, apparatus, device, and storage medium.
  • Elasticsearch (hereinafter referred to as ES) is an existing open-source distributed search and data analysis engine. When storing files, it creates multiple indexes and then saves the files in these indexes.
  • An ES index is equivalent to a database. An index is divided into multiple index fragments (shards), each of which holds a certain amount of files, and the index fragments are stored across multiple computer nodes of the distributed system.
  • ES's index template (Elasticsearch Index Template) provides a reuse mechanism that can create indexes automatically. In actual use, however, when the ES index template is used to create indexes automatically, each log file can only be stored by one index, and the number of index fragments the index contains must be specified by the user in advance. In real business scenarios it is difficult to estimate the data volume of a log file accurately: a log file may be so large that each index fragment of the corresponding index stores an excessive amount of data (beyond the 20 GB to 40 GB officially recommended by ES). That is, super-large fragments appear, and super-large fragments negatively affect the stability and query performance of the ES system.
  • The present application provides a log file storage method, apparatus, device, and storage medium to solve the problem of excessive index-fragment data volume in the Elasticsearch system.
  • the first aspect of the present application provides a method for storing log files, including:
  • The current index refers to the index currently used to store the target log file; when the data volume of the index fragments of the current index is not within the preset data volume range, the current index is in an unavailable state.
  • the second aspect of the present application provides a log file storage device, including:
  • a collection unit configured to collect log data from the target log file
  • a processing unit configured to process the log data through a log analysis component
  • a judging unit configured to judge whether the current index is available or unavailable, wherein the current index refers to the index currently used to store the target log file, and when the data volume of the index fragments of the current index is not within the preset data volume range, the current index is in an unavailable state;
  • a creating unit configured to create a new index by using the collection item and collection time of the log data if the current index is in an unavailable state
  • a storage unit configured to store the log data in the new index.
  • the third aspect of the present application provides a computer storage medium for storing a computer program.
  • When executed, the computer program is specifically used to implement the log file storage method provided by any implementation of the first aspect of the present application.
  • the fourth aspect of the present application provides an electronic device, including a memory and a processor
  • the memory is used to store computer programs
  • the processor is configured to execute the computer program, specifically to implement the log file storage method provided in any one of the first aspects of the present application.
  • The present application provides a log file storage method, apparatus, device, and storage medium.
  • The method includes: collecting log data from the target log file; processing the log data through a log analysis component; judging whether the current index is in an unavailable state because the data volume of its index fragments is too large, where the current index refers to the index currently used to store the target log file; and, if the current index is unavailable, creating a new index using the name and collection time of the log data and storing the log data in the new index.
  • This solution automatically identifies the size of the index fragments of the current index during storage and, when an index fragment grows too large, automatically creates a new index to save the collected log data, thereby avoiding super-large index fragments.
  • FIG. 1 is a schematic diagram of a module structure of a log file storage system provided by an embodiment of the present application
  • Fig. 2 is a flowchart of a log file storage method provided by an embodiment of the present application.
  • FIG. 3 is a schematic structural diagram of a log file storage device provided in an embodiment of the present application.
  • FIG. 4 is a schematic structural diagram of an electronic device provided by an embodiment of the present application.
  • The term “comprise” and its variations are open-ended, i.e., “including but not limited to”.
  • the term “based on” is “based at least in part on”.
  • the term “one embodiment” means “at least one embodiment”; the term “another embodiment” means “at least one further embodiment”; the term “some embodiments” means “at least some embodiments.” Relevant definitions of other terms will be given in the description below.
  • The log file storage method is mainly used to store, in the Elasticsearch system, the log files generated by each application server during operation, and the method can be executed by the log file storage system shown in FIG. 1.
  • the log file storage system may include a log collection component, an index monitoring component, an index generation component, and an index analysis component. The connections between the components are shown in FIG. 1 .
  • the log collection component is installed on each application server respectively, and is used for collecting log files of each application server, and sending these log files to the log analysis component.
  • The log collection component in this application can be implemented using filebeat, an existing log data collector for local files. It can monitor log directories or specific log files (tail files) and forward them to Elasticsearch through the Kafka message queue, so that the ES system saves the logs in the corresponding index.
  • The log collection component on each application server can run as an independent process to monitor the application server's log files in real time. Whenever new log data is appended to a log file of the application server, the log collection component reads the newly added log data and stores it in the Kafka message queue, so that the log data can be passed to the log analysis component through the message queue.
  • The index monitoring component is used to monitor the current capacity of each index in the ES system in real time. Specifically, it can monitor in real time the amount of data currently stored in each index fragment, the total data volume of each index (the sum of the data in all of the index's fragments), and the number of index fragments. When the monitored data volume of an index fragment is too large, or the monitored total data volume of an index does not match its number of index fragments, the index can be set to an unavailable state, indicating that data can no longer be written to it.
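The unavailable-state rule described above can be sketched as a small predicate. This is an illustrative reconstruction, not the patent's code; the 40 GB ceiling and the list-of-sizes input shape are assumptions taken from the 20-40 GB range quoted later in the text.

```python
def index_writable(shard_gb, max_shard_gb=40.0):
    """Monitoring-rule sketch: an index stays writable only while every
    one of its index fragments holds no more data than the ceiling.
    shard_gb is a list of per-fragment data volumes in GB (assumed input
    shape; the real component reads these figures from Elasticsearch)."""
    return all(size <= max_shard_gb for size in shard_gb)
```

Once this returns False, the monitoring component would mark the index as unavailable so that no further log data is written to it.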
  • The index generation component acts on the monitoring results of the index monitoring component: if there is still an index that can be written normally, no index generation is performed; if there is no such index, a new index is generated according to the naming rules, and the information of the newly generated index is passed to the log analysis component, so that the log analysis component writes the log data newly collected by the log collection component into the newly generated index.
  • The log analysis component in the system provided by this application can be implemented with the logstash program (an existing log data processing program).
  • Logstash is equivalent to a pipeline with real-time data transmission capability, responsible for transmitting data information from the input end of the pipeline to the output end of the pipeline, and processing the transmitted data information.
  • the log analysis component is based on the logstash program deployed on the dedicated log analysis server.
  • The log analysis component can obtain the log data written by the log collection component from the aforementioned Kafka message queue, process the log data according to the configured rules, and then write the processed log data into the index corresponding to the log file.
  • An embodiment of the present application provides a log file storage method; referring to Figure 2, the method may include the following steps:
  • the target log file can refer to the log file of each application server.
  • The log file storage method provided by this application can be used to collect the data contained in the log file of each application server in real time and store the data in multiple indexes of the ES system.
  • each application server will pre-define a log file (equivalent to a file path), and the application server will write log data generated during operation into this log file in real time.
  • the above log data may include data requests sent by the client to the application server, and the application server returns reply data to the client after processing the data requests.
  • the data request sent by the client may be a flight information query request
  • the reply data fed back by the application server may be the flight information of one or more flights being queried.
  • step S202 may be implemented in various specific manners, which are not limited here.
  • the current index refers to the index currently used to store the target log file; when the data volume of the index fragments of the current index is not within the preset data volume range, the current index is in an unavailable state.
  • The ES system can create an index for the log file of the application server and designate this index to save the application server's log data; this index is the current index in step S203.
  • As the application server runs, the data volume of the log data stored in its log file gradually increases, and correspondingly the data volume of each index fragment of the current index that stores this log data also gradually increases.
  • If the index monitoring component detects that the data volume of an index fragment of the current index exceeds the preset data volume range (for example, 20 GB to 40 GB), the current index can be set to an unavailable state. Conversely, if the data volume of the index fragments of the current index has not exceeded the data volume range, the current index is kept in an available state.
  • If the current index is available, execute step S204; otherwise, if the current index is unavailable, execute step S205.
  • the collection item of the log data is used to refer to one or more pre-specified fields in the log data.
  • Step S205 can be executed by calling the original index template of the ES system, and the specific implementation process will not be described in detail.
  • In step S201, the method of collecting log data may be as follows:
  • The log collection component can use two parts, the finder (prospector) and the collector (harvester), to read the data of the target log file (tail file) and send the data to the specified output (that is, the Kafka message queue); the number of finders can be more than one.
  • When the log collection component of an application server is started, it starts one or more finders to look at the pre-specified target log file paths on the application server. For each log file found by a prospector, the prospector starts a harvester; each harvester reads the new log data of its log file. Finally, filebeat combines the log data read by the harvesters from multiple log files into aggregated data and writes the aggregated data to the pre-specified output path, that is, the Kafka message queue.
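The offset-based harvester behaviour described above and in the following paragraphs can be sketched in a few lines. This is a toy reconstruction under the assumption that offsets are kept in memory; real filebeat persists them in a registry file and additionally handles rotation, backoff, and aggregation.

```python
import os
import tempfile

class Harvester:
    """Minimal sketch of the harvester: each collection pass resumes from
    the offset recorded by the previous pass, so only newly appended log
    data is read (the incremental, tail_files-style mode)."""

    def __init__(self, path):
        self.path = path
        self.offset = 0  # last collected position in the target log file

    def collect(self):
        """Read the log lines appended since the last collection."""
        with open(self.path, "rb") as f:
            f.seek(self.offset)       # resume at the recorded offset
            data = f.read()
            self.offset = f.tell()    # record the offset for next time
        return data.decode().splitlines()

# Tiny demonstration of incremental collection (file name is made up).
with tempfile.TemporaryDirectory() as d:
    log_path = os.path.join(d, "httpd-1.log")
    with open(log_path, "a") as f:
        f.write("req-1 received\n")
    h = Harvester(log_path)
    first = h.collect()               # full contents on the first pass
    with open(log_path, "a") as f:
        f.write("req-1 answered\n")
    second = h.collect()              # only the newly appended line
```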
  • the log collection component filebeat can collect the log data of the target log file based on the following configuration information:
  • the paths item specifies the path of the target log file. Multiple paths can be specified here, so that the log collection component can collect log data from multiple target log files.
  • The path of the target log file supports regular-expression-style matching. For example, httpd-*.log covers httpd-1.log, httpd-2.log, httpd-a.log, and so on, that is, all log files whose names start with httpd- and end with .log.
  • The tail_files item specifies the way log files are collected. If it is true, incremental collection is used and only newly generated log data is collected; if it is false, full collection is used and all log data is collected. In this application, tail_files is generally set to true, that is, log data is usually collected incrementally.
  • the harvester collects log data at regular intervals, and only collects the newly added log data in the target log file between the time of the last collection and the current time each time.
  • The harvester records the log offset of each collection (representing the position of the currently collected log data in the target log file); the next collection starts from the previously recorded offset and continues reading new log data from there.
  • A corresponding filebeat monitoring process can also be set up on the application server; it monitors the filebeat process status and automatically restarts filebeat when the process is lost.
  • The information under output.kafka is the configuration of the Kafka message queue in this application. For compatibility reasons, filebeat does not support Kafka authentication methods other than SASL/PLAIN, so the Kafka message queue must be given the SASL/PLAIN authentication method. A Kafka message queue with this authentication method requires other applications to provide a specified username and password for access, so username and password need to be added under output.kafka.
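On the client side, the SASL/PLAIN requirement translates into settings like the following sketch. The keyword names follow kafka-python's KafkaProducer options, which is an illustrative choice; the patent text names no client library, and the server address and credentials below are made up.

```python
def kafka_sasl_plain_settings(servers, username, password):
    """Client-side counterpart of the output.kafka section described
    above: a Kafka queue secured with SASL/PLAIN requires a username
    and password from every connecting application."""
    return {
        "bootstrap_servers": servers,
        "security_protocol": "SASL_PLAINTEXT",  # SASL/PLAIN, the scheme filebeat supports
        "sasl_mechanism": "PLAIN",
        "sasl_plain_username": username,
        "sasl_plain_password": password,
    }

# Hypothetical broker address and credentials, for illustration only.
settings = kafka_sasl_plain_settings(["10.0.0.1:9092"], "logwriter", "secret")
```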
  • the local storage space can be divided into 3 to 30 data partitions according to the data volume of the log data, thereby improving storage efficiency.
  • Since the log data of one target log file will be stored in multiple different index shards, the log data needs to be divided into multiple log fragments, and the context correspondence between the data in the log fragments needs to be determined; specifically, it must be determined which log data correspond to the same data request. Therefore, in step S202, the specific process of processing the log data through the log analysis component may be:
  • a corresponding relationship is established between the reply data contained in the current log fragment and the reply data contained in the previous log fragment and corresponding to the same request.
  • the current log fragment refers to every log fragment obtained by dividing the log data except the first log fragment;
  • the previous log fragment refers to the previous log fragment of the current log fragment.
  • these log fragments may be sorted according to the order in which they appear in the log data.
  • For the second log fragment, the first log fragment is copied as redundant data, the second log fragment is compared with the redundant data to detect the data in the second log fragment and the redundant data that correspond to the same data request, and the correspondence between the second log fragment and the redundant data is recorded as the context correspondence of the second log fragment.
  • After that, the currently copied redundant data can be discarded, and the context correspondence of the third log fragment is detected next.
  • For example, an application server feeds back two pieces of reply information for one data request: the first reply is recorded in the first log fragment, and the second reply is recorded in the second log fragment.
  • For the third log fragment, the second log fragment is copied as redundant data, and the third log fragment is compared with the redundant data to determine whether data in the third log fragment and the redundant data correspond to the same data request; the correspondence between the third log fragment and the redundant data is then recorded as the context correspondence of the third log fragment.
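The redundant-data comparison in the steps above can be sketched as follows. Log entries are modelled as (request_id, text) pairs; that pair format and the example request ids are assumptions, since the text does not give a concrete log layout.

```python
def context_links(prev_fragment, cur_fragment):
    """Copy the previous fragment as redundant data, compare it with the
    current fragment, and return the current-fragment entries that refer
    to a data request also present in the redundant data; these form the
    current fragment's context correspondence."""
    redundant = list(prev_fragment)             # redundant copy of the previous fragment
    seen = {req_id for req_id, _ in redundant}  # requests appearing in the redundant data
    links = [entry for entry in cur_fragment if entry[0] in seen]
    return links                                # the redundant copy can be discarded afterwards

# Example: one data request whose replies span two fragments.
first = [("q42", "flight query received"), ("q42", "first reply")]
second = [("q42", "second reply"), ("q43", "new flight query")]
links = context_links(first, second)
```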
  • the context correspondence of each of the above log fragments can be recorded in the message manager inside the Kafka message queue, and the message manager can be implemented using existing program frameworks such as Zookeeper and Redis.
  • The Kafka message queue is a distributed, partition-supporting, replica-based messaging system based on the Zookeeper framework.
  • the Kafka message queue was originally developed by Linkedin.
  • the most prominent feature of the Kafka message queue is that it can process large amounts of data in real time to meet various demand scenarios.
  • Considering its high throughput, low latency, and high concurrency, the Kafka message queue is used to store the log data collected by the log collection component and the context correspondence between the log fragments.
  • the Kafka message queue can be deployed on the Spark cluster, and correspondingly, the above data can be stored in the internal cache of the Spark cluster.
  • the processing of the log data in step S202 may also include parsing the log data using the logstash program.
  • The logstash program's processing flow of log data can include five stages: input, decode, filter, encode, and output. Input is the input stage, which can be understood as reading the log data to be processed from the Kafka message queue. Decode is the decoding stage, in which the logstash program converts the log data from the data format customized by the Kafka message queue into a data format recognizable by the logstash program.
  • Filter is the filtering stage, in which the logstash program processes the log data converted in the previous stage according to the preset parsing rules.
  • Encode is the encoding stage, in which the logstash program encodes the processed log data into a data format recognizable by subsequent programs; in this application, the data is encoded into a format recognizable by the Elasticsearch system.
  • Output is the output stage, in which the logstash program sends the encoded data to the subsequent program, in this application the Elasticsearch system.
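The five stages above can be illustrated with a small in-memory pipeline. This is a hedged sketch rather than the real logstash configuration: the Kafka envelope and the log-line layout below are invented for the example.

```python
import json
import re

# Assumed log-line layout: "<timestamp> <level> <message>".
LINE = re.compile(r"(?P<ts>\S+) (?P<level>\w+) (?P<msg>.*)")

def run_pipeline(kafka_messages):
    docs = []
    for raw in kafka_messages:                  # input: read from the message queue
        event = json.loads(raw)                 # decode: unwrap Kafka's JSON envelope
        m = LINE.match(event["message"])        # filter: apply the parsing rule
        if m is None:
            continue                            # lines that fail the rule are dropped
        docs.append(json.dumps(m.groupdict()))  # encode: Elasticsearch-ingestible JSON
    return docs                                 # output: hand off to the ES system

sample_out = run_pipeline(['{"message": "2022-04-20T12:00:00 INFO flight query ok"}'])
```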
  • the logstash program can process log data based on the following configuration information:
  • the input item is used to specify the input source of the data to be processed by the logstash program.
  • The logstash program reads the log data to be processed from the Kafka message queue, so the input item configures the relevant information of the Kafka message queue.
  • bootstrap_servers is the IP and port information of the Kafka message queue
  • topics_pattern specifies the names of the Kafka queues to read and supports regular-expression matching; the AOS_APPLOG_* configured above matches all queues whose names start with AOS_APPLOG_.
  • the Filter item is used to specify the parsing rules when the logstash program processes log data.
  • a variety of grammars and plug-ins can be used to define the parsing rules.
  • grok is a plug-in that combines multiple predefined regular expressions to match segmented text and map it to keywords; it is usually used for relatively simple preprocessing of log data.
  • the output item is used to specify the output address of the log data processed by the logstash program.
  • The log data processed by the logstash program needs to be output to the Elasticsearch system so as to be stored in the index corresponding to the target log file. Therefore, the output item specifies output to the designated index on the Elasticsearch system.
  • The index specified by the output item is the current index. If the current index is unavailable, a new index needs to be generated by the aforementioned index generation component; in this case, the index specified in the output item is changed accordingly to the newly generated index, so that the processed log data is saved in the new index.
  • The index monitoring component can call the REST interface (API) based on the HTTP protocol provided by Elasticsearch; by sending REST requests to this interface, it can monitor the data volume of the index fragments of each index in the Elasticsearch system.
  • the index monitoring component can use the restful interface of Elasticsearch to obtain the index information of Elasticsearch.
  • The index information of an index includes the number of index shards of the index, the amount of data stored in each index shard, and other information.
  • The index information obtained through the RESTful interface is in the JSON data format. After parsing the JSON index information, the total data volume of each index, expressed in GB, can be obtained; dividing an index's total data volume by its number of index shards gives the data volume of each of that index's shards.
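The division just described is simple enough to show directly. The JSON layout below is a simplified stand-in for the real Elasticsearch response, which the text does not reproduce, so the field names are assumptions.

```python
import json

def per_shard_volumes(index_json):
    """For each index record, compute per-shard data volume as the
    index's total data volume (GB) divided by its number of index shards."""
    return {rec["index"]: rec["total_gb"] / rec["shard_count"]
            for rec in json.loads(index_json)}

# Hypothetical parsed response: one index with 3 shards and 90 GB in total.
sample = '[{"index": "index-1", "shard_count": 3, "total_gb": 90}]'
volumes = per_shard_volumes(sample)
```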
  • The monitoring component can judge whether the size of each fragment exceeds the preset data volume range (generally the range can be set to 20-40 GB, or set according to the actual situation); if it does, the index is set to the unavailable state.
  • The index name of an index, the index data size, whether the index is available, and the latest update time of the index are recorded in the Elasticsearch index record file. Therefore, after the index monitoring component determines that the data volume of the index shards of the current index exceeds the data volume range, it can access the index record file and mark the current index as unavailable there; in this way, step S203 will determine that the current index is in an unavailable state.
  • The index record file of the Elasticsearch system can be synchronized to the server where the logstash program is located at regular intervals. On this basis, the logstash program can access the server's local index record file: if the status of the current index in the record file is available, the logstash program continues to process log data according to the original configuration information and saves the processed log data in the current index of the Elasticsearch system.
  • If the current index is unavailable, the logstash program can use the collection item and collection time of the currently read log data to create a new index in the Elasticsearch system, and then replace the index name of the current index in the aforementioned configuration information with the index name of the newly created index, so that subsequent log data is stored in the new index; at the same time, the index name of the newly created index is added to the index record file of the Elasticsearch system.
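The new index's name is built from the collection item and the collection time. Since the text does not give the exact naming rule, the pattern below is an assumption for illustration.

```python
from datetime import datetime

def new_index_name(collection_item, when=None):
    """Sketch of the naming rule: collection item plus collection time.
    Elasticsearch index names must be lowercase, hence the .lower()."""
    when = when or datetime.now()
    return "{}-{:%Y.%m.%d.%H%M%S}".format(collection_item.lower(), when)
```

After creating the index, the output index name in the logstash configuration would be swapped to this value and the name appended to the index record file.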
  • the present invention solves the limitation that the current Elasticsearch index template cannot create a personalized index.
  • Multiple indexes for saving the log data of the target log file can also be created.
  • The indexing strategy of the present invention is simple and efficient: it does not need to accurately estimate the data volume of the log files to be stored; it only needs to set the maximum capacity of a single index and the number of fragments according to the actual resources of the system. When the amount of log data stored in a single index becomes too large, a new index is created and the log data is stored in the new index, thereby avoiding excessive data volume in each index fragment.
  • The present invention realizes real-time monitoring of the status of each index in the Elasticsearch system, can dynamically evaluate the actual data volume of each index, and, when an index's data volume is too large, can automatically create a new index according to the indexing strategy.
  • the present invention avoids the occurrence of fragmented small index fragments and super large index fragments through the dynamic creation of indexes in the Elasticsearch system, and improves the stability and query performance of the ES cluster.
  • the present invention realizes the efficient utilization of system resources and reduces the cost of manual maintenance in the later stage through the dynamic creation and refined control of indexes in the Elasticsearch system.
  • the index monitoring component monitors that the data volume of the index fragment of index-1 is greater than 40 GB, so the state of index-1 is set to unavailable in the index record file.
  • In this way, by creating new indexes, the data volume of the index fragments of each index is prevented from becoming too large, preventing the occurrence of super-large fragments.
  • Computer program code for carrying out operations of the present disclosure may be written in one or more programming languages, or combinations thereof, including but not limited to object-oriented programming languages such as Java, Smalltalk, and C++, as well as conventional procedural programming languages such as the “C” language or similar programming languages.
  • the program code may execute entirely on the user's computer, partly on the user's computer, as a stand-alone software package, partly on the user's computer and partly on a remote computer or entirely on the remote computer or electronic device.
  • The remote computer can be connected to the user's computer through any kind of network, including a local area network (LAN) or a wide area network (WAN), or the connection can be made to an external computer (for example, through the Internet using an Internet service provider).
  • Each block in a flowchart or block diagram may represent a module, program segment, or portion of code that contains one or more executable instructions for implementing the specified logical functions.
  • the functions noted in the block may occur out of the order noted in the figures. For example, two blocks shown in succession may, in fact, be executed substantially concurrently, or they may sometimes be executed in the reverse order, depending upon the functionality involved.
  • Each block of the block diagrams and/or flowchart illustrations, and combinations of blocks in the block diagrams and/or flowchart illustrations, can be implemented by a dedicated hardware-based system that performs the specified functions or operations, or by a combination of dedicated hardware and computer instructions.
  • the embodiment of the present application also provides a log file storage device, please refer to Figure 3, the storage device may include the following units:
  • the collection unit 301 is configured to collect log data from a target log file.
  • the processing unit 302 is configured to process log data through a log analysis component.
  • the judging unit 303 is configured to judge whether the current index is available or unavailable.
  • the current index refers to the index currently used to store the target log file; when the data volume of the index fragments of the current index is not within the preset data volume range, the current index is in an unavailable state.
  • the creation unit 304 is configured to create a new index by using the collection item and collection time of the log data if the current index is not available.
  • the storage unit 305 is configured to store log data in a new index.
  • When the collection unit 301 collects log data, it is specifically configured to:
  • When the processing unit 302 processes the log data through the log analysis component, it is specifically configured to:
  • the storage device further includes a monitoring unit 306, configured to:
  • the collection unit 301 is equivalent to the aforementioned log collection component
  • The processing unit 302, the judging unit 303, and the storage unit 305 are equivalent to the aforementioned log analysis component.
  • the creating unit 304 is equivalent to the aforementioned index generating component
  • the monitoring unit 306 is equivalent to the aforementioned index monitoring component.
  • The present application provides a log file storage device, wherein the collection unit 301 collects log data from the target log file; the processing unit 302 processes the log data through a log analysis component; the judging unit 303 judges whether the current index is in an unavailable state because the data volume of its index fragments is too large, where the current index refers to the index currently used to store the target log file; if the current index is unavailable, the creation unit 304 creates a new index using the name and collection time of the log data, and the storage unit 305 stores the log data in the new index.
  • This solution automatically identifies the size of the index fragments of the current index during the storage process, and automatically creates a new index to save the collected log data when an index fragment grows large, thereby avoiding the occurrence of oversized index fragments.
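The storage flow described by these units — check the current index's fragment (shard) size, create a new index named from the collection item and collection time when the index is unavailable, and write the data there — can be sketched as a toy in-memory model. The `LogStore` class, the single-shard simplification, and the 1024-byte bound are illustrative assumptions, not the patented implementation:

```python
from datetime import datetime, timezone

# Assumed bound: the patent only requires each shard's data volume to lie
# within a preset range; the concrete number here is a toy value.
MAX_SHARD_BYTES = 1024


class LogStore:
    """A toy in-memory stand-in for the log storage backend."""

    def __init__(self):
        self.indices = {}    # index name -> list of stored records
        self.current = None  # name of the index currently written to

    def shard_bytes(self, index):
        # Treat the whole index as one shard for simplicity.
        return sum(len(rec) for rec in self.indices[index])

    def index_available(self, index):
        # Unavailable when the shard's data volume leaves the preset range.
        return self.shard_bytes(index) <= MAX_SHARD_BYTES

    def store(self, collection_item, log_data, collected_at=None):
        """Store log data, rolling over to a new index when the current
        index's shard has grown too large."""
        collected_at = collected_at or datetime.now(timezone.utc)
        if self.current is None or not self.index_available(self.current):
            # New index named from the collection item and collection time.
            self.current = f"{collection_item}-{collected_at:%Y.%m.%d.%H%M%S}"
            self.indices[self.current] = []
        self.indices[self.current].append(log_data)
        return self.current
```

A write that pushes a shard past the bound leaves the current index intact; only the next write triggers creation of a fresh index, so no single index fragment keeps growing without limit.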
  • The units involved in the embodiments described in the present disclosure may be implemented by software or by hardware. In some cases, the name of a unit does not constitute a limitation of the unit itself; for example, the first obtaining unit may also be described as "a unit for obtaining at least two Internet Protocol addresses".
  • Exemplary types of hardware logic components that may be used include, without limitation: Field Programmable Gate Arrays (FPGAs), Application Specific Integrated Circuits (ASICs), Application Specific Standard Products (ASSPs), Systems on Chip (SOCs), and Complex Programmable Logic Devices (CPLDs).
  • the embodiment of the present application also provides an electronic device suitable for implementing the embodiments of the present disclosure, and a schematic structural diagram of the electronic device is shown in FIG. 4 .
  • The terminal equipment in the embodiments of the present disclosure may include, but is not limited to, mobile terminals such as mobile phones, notebook computers, digital broadcast receivers, PDAs (personal digital assistants), PADs (tablet computers), PMPs (portable multimedia players), and vehicle-mounted terminals (such as car navigation terminals), as well as fixed terminals such as digital TVs and desktop computers.
  • the electronic device shown in FIG. 4 is only an example, and should not limit the functions and application scope of the embodiments of the present disclosure.
  • An electronic device 400 may include a processing device (such as a central processing unit, a graphics processing unit, etc.) 401, which may execute various appropriate actions and processes according to a program stored in a read-only memory (ROM) 402 or a program loaded from a storage device 408 into a random access memory (RAM) 403. The RAM 403 also stores various programs and data necessary for the operation of the electronic device 400.
  • the processing device 401, the ROM 402, and the RAM 403 are connected to each other through a bus 404.
  • An input/output (I/O) interface 405 is also connected to bus 404 .
  • The following devices can be connected to the I/O interface 405: input devices 406 including, for example, a touch screen, touchpad, keyboard, mouse, camera, microphone, accelerometer, gyroscope, etc.; output devices 407 including, for example, a liquid crystal display (LCD), speaker, vibrator, etc.; storage devices 408 including, for example, a magnetic tape, a hard disk, etc.; and a communication device 409.
  • the communication means 409 may allow the electronic device 400 to perform wireless or wired communication with other devices to exchange data. While FIG. 4 shows electronic device 400 having various means, it should be understood that implementing or having all of the means shown is not a requirement. More or fewer means may alternatively be implemented or provided.
  • An embodiment of the present application also provides a computer storage medium (that is, a computer-readable medium); the computer-readable medium carries one or more programs, and when the above-mentioned one or more programs are executed by the electronic device, the electronic device executes the log file storage method provided by any embodiment of the present application.
  • a computer-readable medium may be a tangible medium that may contain or store a program for use by or in conjunction with an instruction execution system, apparatus, or device.
  • a machine-readable medium may be a machine-readable signal medium or a machine-readable storage medium.
  • a machine-readable medium may include, but is not limited to, electronic, magnetic, optical, electromagnetic, infrared, or semiconductor systems, apparatus, or devices, or any suitable combination of the foregoing.
  • More specific examples of a machine-readable storage medium would include one or more wire-based electrical connections, portable computer discs, hard drives, random access memory (RAM), read-only memory (ROM), erasable programmable read-only memory (EPROM or flash memory), optical fiber, compact disk read-only memory (CD-ROM), optical storage, magnetic storage, or any suitable combination of the foregoing.
  • the above-mentioned computer-readable medium in the present disclosure may be a computer-readable signal medium or a computer-readable storage medium or any combination of the above two.
  • A computer-readable storage medium may be, for example, but not limited to, an electrical, magnetic, optical, electromagnetic, infrared, or semiconductor system, apparatus, or device, or any combination thereof. More specific examples of computer-readable storage media may include, but are not limited to, electrical connections with one or more wires, portable computer diskettes, hard disks, random access memory (RAM), read-only memory (ROM), erasable programmable read-only memory (EPROM or flash memory), optical fiber, portable compact disk read-only memory (CD-ROM), optical storage devices, magnetic storage devices, or any suitable combination of the above.
  • a computer-readable storage medium may be any tangible medium that contains or stores a program that can be used by or in conjunction with an instruction execution system, apparatus, or device.
  • a computer-readable signal medium may include a data signal propagated in baseband or as part of a carrier wave carrying computer-readable program code therein. Such propagated data signals may take many forms, including but not limited to electromagnetic signals, optical signals, or any suitable combination of the foregoing.
  • A computer-readable signal medium may also be any computer-readable medium other than a computer-readable storage medium, which can send, propagate, or transport a program for use by or in conjunction with an instruction execution system, apparatus, or device.
  • Program code embodied on a computer readable medium may be transmitted by any appropriate medium, including but not limited to wires, optical cables, RF (radio frequency), etc., or any suitable combination of the above.
  • the above-mentioned computer-readable medium may be included in the above-mentioned electronic device, or may exist independently without being incorporated into the electronic device.
  • the embodiment shown in FIG. 2 of the present application provides a method for storing log files, including:
  • wherein the current index refers to the index currently used to store the target log file; when the data volume of the index fragments of the current index is not within the preset data volume range, the current index is in an unavailable state;
  • the collecting log data includes:
  • the processing of the log data through the log analysis component includes:
  • A corresponding relationship is established between the reply data contained in the current log fragment and the reply data contained in the previous log fragment that corresponds to the same request; wherein the current log fragment refers to each log fragment, other than the first, obtained by dividing the log data; the previous log fragment refers to the log fragment immediately preceding the current log fragment.
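This fragment-linking step can be pictured roughly as follows; the record shape (dicts with `"type"` and `"req_id"` keys) and the use of a request identifier as the matching key are assumptions for illustration, since the publication does not specify the log format:

```python
def link_reply_data(fragments):
    """For each log fragment after the first, link its reply records to
    reply records in the previous fragment belonging to the same request.

    `fragments` is a list of fragments; each fragment is a list of
    records such as {"type": "reply", "req_id": 1, ...}.
    Returns (current-fragment reply, previous-fragment reply) pairs.
    """
    links = []
    for prev, curr in zip(fragments, fragments[1:]):
        # Index the previous fragment's reply data by request identifier.
        prev_replies = {}
        for rec in prev:
            if rec["type"] == "reply":
                prev_replies.setdefault(rec["req_id"], []).append(rec)
        # Link each reply in the current fragment to earlier reply data
        # for the same request.
        for rec in curr:
            if rec["type"] == "reply" and rec["req_id"] in prev_replies:
                for earlier in prev_replies[rec["req_id"]]:
                    links.append((rec, earlier))
    return links
```

The point of the correspondence is that a reply split across a fragment boundary can still be reassembled per request after the fragments are stored separately.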
  • Before judging whether the current index is in the available state or the unavailable state, the method further includes:
  • the embodiment shown in FIG. 3 of the present application is a log file storage device, including:
  • a collection unit configured to collect log data from the target log file
  • a processing unit configured to process the log data through a log analysis component
  • a judging unit configured to judge whether the current index is available or unavailable; wherein the current index refers to the index currently used to store the target log file; when the data volume of the index fragments of the current index is not within the preset data volume range, the current index is in an unavailable state;
  • a creating unit configured to create a new index by using the collection item and collection time of the log data if the current index is in an unavailable state
  • a storage unit configured to store the log data in the new index.
  • When the collection unit collects log data, it is specifically configured to:
  • When the processing unit processes the log data through the log analysis component, it is specifically configured to:
  • wherein the current log fragment refers to each log fragment, other than the first, obtained by dividing the log data;
  • the previous log fragment refers to the log fragment immediately preceding the current log fragment.
  • the storage device further includes a monitoring unit, configured to:
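The monitoring unit's check can be pictured as a simple classification over shard sizes. The `low`/`high` bounds are placeholders, since the publication only speaks of a "preset data volume range" without giving concrete values:

```python
def monitor_indices(shard_sizes_by_index, low, high):
    """Classify each index as available (True) or unavailable (False),
    depending on whether every shard's data volume lies within the
    preset range [low, high]."""
    status = {}
    for index, sizes in shard_sizes_by_index.items():
        status[index] = all(low <= s <= high for s in sizes)
    return status
```

An index flagged `False` here is the trigger for the creation unit to build a fresh index, so oversized fragments never accumulate further writes.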
  • the embodiment of the present application also provides a computer storage medium for storing a computer program.
  • When the computer program is executed, it is specifically used to implement the log file storage method provided in the embodiments of the present application.
  • the embodiment of the present application also provides an electronic device, including a memory and a processor;
  • the memory is used to store computer programs
  • the processor is configured to execute the computer program, and is specifically configured to implement the log file storage method provided in the embodiment of the present application.
  • embodiments of the present disclosure include a computer program product, which includes a computer program carried on a non-transitory computer readable medium, where the computer program includes program code for executing the method shown in the flowchart.
  • the computer program may be downloaded and installed from a network via communication means 409, or from storage means 408, or from ROM 402.
  • When the computer program is executed by the processing device 401, the above-mentioned functions defined in the methods of the embodiments of the present disclosure are executed.

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Data Mining & Analysis (AREA)
  • Databases & Information Systems (AREA)
  • Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Debugging And Monitoring (AREA)
  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)

Abstract

The present application relates to a log file storage method and apparatus, a device, and a storage medium. The method comprises: collecting log data from a target log file; processing the log data by means of a log analysis component; determining whether a current index is unavailable due to an excessive amount of data in its index fragments, the current index referring to the index currently used to store the target log file; and, if the current index is unavailable, creating a new index using the name and collection time of the log data and storing the log data in the new index. In this solution, the size of the index fragments of the current index is automatically recognized during the storage process, and a new index is automatically created to save the collected log data when the index fragments are large, thereby preventing the occurrence of oversized index fragments.
PCT/CN2022/088359 2021-06-30 2022-04-22 Procédé et appareil de stockage de fichier journal, dispositif, et support de stockage WO2023273544A1 (fr)

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
CN202110741313.2 2021-06-30
CN202110741313.2A CN113485962B (zh) 2021-06-30 2021-06-30 日志文件的存储方法、装置、设备和存储介质

Publications (1)

Publication Number Publication Date
WO2023273544A1 true WO2023273544A1 (fr) 2023-01-05

Family

ID=77936876

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/CN2022/088359 WO2023273544A1 (fr) 2021-06-30 2022-04-22 Procédé et appareil de stockage de fichier journal, dispositif, et support de stockage

Country Status (2)

Country Link
CN (1) CN113485962B (fr)
WO (1) WO2023273544A1 (fr)

Families Citing this family (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN113485962B (zh) * 2021-06-30 2023-08-01 中国民航信息网络股份有限公司 日志文件的存储方法、装置、设备和存储介质
CN113688142B (zh) * 2021-10-25 2022-05-06 北京金山云网络技术有限公司 索引管理方法、装置、存储介质和电子设备
CN114138795A (zh) * 2021-12-08 2022-03-04 兴业银行股份有限公司 线程安全的索引动态更新方法及系统
CN113986944B (zh) * 2021-12-29 2022-03-25 天地伟业技术有限公司 分片数据的写入方法、系统及电子设备
CN117421337B (zh) * 2023-09-26 2024-05-28 东土科技(宜昌)有限公司 数据采集方法、装置、设备及计算机可读介质

Citations (7)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20090192982A1 (en) * 2008-01-25 2009-07-30 Nuance Communications, Inc. Fast index with supplemental store
US20160203061A1 (en) * 2015-01-09 2016-07-14 Ariba, Inc. Delta replication of index fragments to enhance disaster recovery
US20160321352A1 (en) * 2015-04-30 2016-11-03 Splunk Inc. Systems and methods for providing dynamic indexer discovery
CN110442645A (zh) * 2019-07-11 2019-11-12 新华三大数据技术有限公司 数据索引方法及装置
CN110851436A (zh) * 2018-08-03 2020-02-28 Emc Ip控股有限公司 具有虚拟编索引的分布式搜索框架
US20200250163A1 (en) * 2019-01-31 2020-08-06 Thoughtspot, Inc. Index Sharding
CN113485962A (zh) * 2021-06-30 2021-10-08 中国民航信息网络股份有限公司 日志文件的存储方法、装置、设备和存储介质

Family Cites Families (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20170193039A1 (en) * 2015-12-30 2017-07-06 Dropbox, Inc. Servicing queries of an event log
US11016955B2 (en) * 2016-04-15 2021-05-25 Hitachi Vantara Llc Deduplication index enabling scalability
CN110515898B (zh) * 2019-07-31 2022-04-22 济南浪潮数据技术有限公司 一种日志处理方法及装置
CN110990366B (zh) * 2019-12-04 2024-02-23 中国农业银行股份有限公司 一种提升基于es的日志系统性能的索引分配方法及装置

Patent Citations (7)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20090192982A1 (en) * 2008-01-25 2009-07-30 Nuance Communications, Inc. Fast index with supplemental store
US20160203061A1 (en) * 2015-01-09 2016-07-14 Ariba, Inc. Delta replication of index fragments to enhance disaster recovery
US20160321352A1 (en) * 2015-04-30 2016-11-03 Splunk Inc. Systems and methods for providing dynamic indexer discovery
CN110851436A (zh) * 2018-08-03 2020-02-28 Emc Ip控股有限公司 具有虚拟编索引的分布式搜索框架
US20200250163A1 (en) * 2019-01-31 2020-08-06 Thoughtspot, Inc. Index Sharding
CN110442645A (zh) * 2019-07-11 2019-11-12 新华三大数据技术有限公司 数据索引方法及装置
CN113485962A (zh) * 2021-06-30 2021-10-08 中国民航信息网络股份有限公司 日志文件的存储方法、装置、设备和存储介质

Also Published As

Publication number Publication date
CN113485962B (zh) 2023-08-01
CN113485962A (zh) 2021-10-08

Similar Documents

Publication Publication Date Title
WO2023273544A1 (fr) Procédé et appareil de stockage de fichier journal, dispositif, et support de stockage
CN110147398B (zh) 一种数据处理方法、装置、介质和电子设备
CN111258978B (zh) 一种数据存储的方法
WO2023029854A1 (fr) Procédé et appareil d'interrogation de données, support de stockage et dispositif électronique
US20140033262A1 (en) Parsing Single Source Content for Multi-Channel Publishing
CN111949850B (zh) 多源数据的采集方法、装置、设备及存储介质
CN107729570B (zh) 用于服务器的数据迁移方法和装置
WO2018161881A1 (fr) Procédé de traitement de données structurées, support de stockage de données et appareil informatique
WO2014173151A1 (fr) Procédé, dispositif et terminal de traitement de données
US10866960B2 (en) Dynamic execution of ETL jobs without metadata repository
CN111221793A (zh) 数据挖掘方法、平台、计算机设备及存储介质
US20190327342A1 (en) Methods and electronic devices for data transmission and reception
WO2019232932A1 (fr) Procédé et appareil de traitement de nœuds, support d'informations lisible par ordinateur et dispositif électronique
WO2024124789A1 (fr) Procédé et appareil de traitement de fichier, serveur et support
CN111651424B (zh) 一种数据处理方法、装置、数据节点及存储介质
CN110781159B (zh) Ceph目录文件信息读取方法、装置、服务器及存储介质
CN113568938A (zh) 数据流处理方法、装置、电子设备及存储介质
CN113220710B (zh) 数据查询方法、装置、电子设备以及存储介质
CN108319604B (zh) 一种hive中大小表关联的优化方法
CN112948410A (zh) 数据处理方法、装置、设备及介质
WO2023169251A1 (fr) Procédé et appareil de détermination d'indice, et serveur et support
US20240106889A1 (en) Data resource storage method and apparatus, data resource query method and apparatus, and electronic device
US11074244B1 (en) Transactional range delete in distributed databases
CN113051244A (zh) 数据访问方法和装置、数据获取方法和装置
CN111782588A (zh) 一种文件读取方法、装置、设备和介质

Legal Events

Date Code Title Description
121 Ep: the epo has been informed by wipo that ep was designated in this application

Ref document number: 22831373

Country of ref document: EP

Kind code of ref document: A1

NENP Non-entry into the national phase

Ref country code: DE

122 Ep: pct application non-entry in european phase

Ref document number: 22831373

Country of ref document: EP

Kind code of ref document: A1