CN113590556A - Database-based log processing method, device and equipment - Google Patents

Publication number
CN113590556A
Authority
CN
China
Prior art keywords
log file
log
information
file
preset
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202110872299.XA
Other languages
Chinese (zh)
Inventor
吕若昕
严芝芳
余圣嘉
单章邕
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Industrial and Commercial Bank of China Ltd ICBC
Original Assignee
Industrial and Commercial Bank of China Ltd ICBC
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Industrial and Commercial Bank of China Ltd ICBC
Priority to CN202110872299.XA
Publication of CN113590556A

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/10 File systems; File servers
    • G06F16/17 Details of further file system functions
    • G06F16/1734 Details of monitoring file system events, e.g. by the use of hooks, filter drivers, logs
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/10 File systems; File servers
    • G06F16/13 File access structures, e.g. distributed indices
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/10 File systems; File servers
    • G06F16/14 Details of searching files based on file metadata
    • G06F16/148 File search processing
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/10 File systems; File servers
    • G06F16/16 File or folder operations, e.g. details of user interfaces specifically adapted to file systems
    • G06F16/162 Delete operations
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/20 Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
    • G06F16/23 Updating

Abstract

This document belongs to the field of finance, in particular to log processing, and provides a database-based log processing method, device and equipment. The method comprises the following steps: receiving a log file; determining the type information of the log file according to a preset classification model, and storing the log file into a specified folder corresponding to the type information; generating index path information of the log file according to the log file and the specified folder; and updating a search engine of the database according to the index path information. The method optimizes the log files in the database, facilitates statistics and extraction of logs, and improves the management efficiency of the log files.

Description

Database-based log processing method, device and equipment
Technical Field
The present disclosure relates to database technologies, and in particular, to a method, an apparatus, and a device for processing a log based on a database.
Background
Log files are one of the main ways currently used to record system behavior and system operational events, such as user operations, system running state and abnormal information. The records in log files give developers an effective basis for guaranteeing technical operation and for making improvements.
However, as internet technology and the network environment continue to develop, more and more terminals access the internet, so most network service systems need to run around the clock, and a large amount of data, including a large number of log files, is generated while the system runs. Because there are so many log files, different log files may overlap and affect one another, and the log files also contain a lot of irrelevant content. The traditional log management method sends system logs periodically or in real time, which achieves centralized transmission of the logs, but the logs are not analyzed or sorted, so log extraction and management are inefficient.
Disclosure of Invention
In view of the foregoing problems in the prior art, an object of the present invention is to provide a method, an apparatus and a device for processing a log based on a database, which can improve the efficiency of log management.
In order to solve the technical problems, the specific technical scheme is as follows:
in one aspect, provided herein is a database-based log processing method, the method comprising:
receiving a log file;
determining the type information of the log file according to a preset classification model, and storing the log file into a specified folder corresponding to the type information;
generating index path information of the log file according to the log file and the specified folder;
and updating a search engine of the database according to the index path information.
Further, after receiving the log file, the method further includes:
storing the log file to a first folder, wherein the first folder comprises at least one designated folder;
and preprocessing the log file, wherein the preprocessing at least comprises virus scanning and removal processing and log compression processing.
Further, the storing the log file to a first folder includes:
judging whether the size of the log file is larger than a first preset threshold value or not;
if so, segmenting the log file according to a preset segmentation rule so that the size of each segmented sub-log file is smaller than the first preset threshold value;
and sending each segmented sub-log file to the first folder, and merging the sub-log files according to a preset merging rule to restore the original log file.
Further, the preset classification model is obtained by the following steps:
acquiring a history log file;
calculating and obtaining the similarity of any two historical log files according to the historical log files;
clustering the historical log files according to the similarity and a preset clustering rule to obtain a plurality of historical log file sets;
labeling each historical log file set according to a preset labeling rule, so as to determine the labeling information of each historical log file;
and feeding the historical log files and the corresponding labeling information into an initial classification model for training to obtain the trained preset classification model.
Optionally, the initial classification model is a KNN classifier or an SVM classifier.
Further, after storing the log file in a designated folder corresponding to the type information, the method further includes:
performing word segmentation processing on the log file to obtain data information of the log file;
judging whether the data information is invalid data or data with irregular format;
when invalid data exists, clearing the invalid data;
and when the irregular data exist, carrying out format conversion on the irregular data according to a preset conversion rule.
Further, after storing the log file in a designated folder corresponding to the type information, the method further includes:
and according to a specified period, compressing the log files in the specified folder.
Further, after storing the log file in a designated folder corresponding to the type information, the method further includes:
obtaining the total size of the log files in each designated folder;
judging whether the total size of the log files exceeds a second preset threshold value;
if yes, determining the designated folders in which the total size of the log files exceeds the second preset threshold, and compressing those designated folders according to a preset compression rule until the total size of the log files in every designated folder is lower than the second preset threshold.
Further, the generating index path information of the log file according to the log file and the designated folder includes:
acquiring configuration file information of the specified folder and position information of the log file in the specified folder;
determining the archived file position information of the log file according to the configuration file information and the position information;
and generating index path information of the log file according to the archived file position information, wherein the index path information is the log capture path of the log file.
Further, the method further comprises:
acquiring log query information in a preset time period;
determining a log file with query times meeting preset conditions according to the log query information;
adjusting the position information of the log file in the appointed folder to obtain the index path information of the adjusted log file;
and updating a search engine of the database according to the adjusted index path information of the log file.
In another aspect, this document also provides a database-based log processing apparatus, the apparatus comprising:
the log file receiving module is used for receiving log files;
the log file processing module is used for determining the type information of the log file according to a preset classification model and storing the log file into a specified folder corresponding to the type information;
the index path information generating module is used for generating the index path information of the log file according to the log file and the specified folder;
and the search engine updating module is used for updating the search engine of the database according to the index path information.
In another aspect, a computer device is also provided herein, comprising a memory, a processor, and a computer program stored on the memory and executable on the processor, the processor implementing the method as described above when executing the computer program.
Finally, a computer-readable storage medium is also provided herein, which stores a computer program that, when executed by a processor, implements the method as described above.
By adopting the above technical scheme, the database-based log processing method, device and equipment classify received log files according to the preset classification model, so that the log files are stored by category; corresponding index path information is then generated for the stored log files, and the search engine of the database is updated accordingly.
In order to make the aforementioned and other objects, features and advantages of the present invention comprehensible, preferred embodiments accompanied with figures are described in detail below.
Drawings
In order to illustrate the embodiments herein or the technical solutions in the prior art more clearly, the drawings used in the description of the embodiments or the prior art are briefly described below. The drawings in the following description show only some embodiments herein, and those skilled in the art can obtain other drawings from them without creative effort.
FIG. 1 is a schematic diagram illustrating steps of a database-based log processing method provided by an embodiment herein;
FIG. 2 is a schematic diagram showing log file processing steps in an embodiment herein;
FIG. 3 is a diagram illustrating the steps of determining a predetermined classification model in the embodiment of the present disclosure;
FIG. 4 is a diagram showing a process of specifying a folder in the embodiment herein;
FIG. 5 is a schematic diagram illustrating a log file optimization procedure in an embodiment herein;
FIG. 6 is a block diagram of a log processing system according to an embodiment of the present disclosure;
FIG. 7 is a schematic structural diagram illustrating a database-based log processing apparatus provided in an embodiment herein;
FIG. 8 shows a schematic structural diagram of a computer device provided in an embodiment herein.
Description of the symbols of the drawings:
100. a log file receiving module;
200. a log file processing module;
300. an index path information generation module;
400. a search engine update module;
802. a computer device;
804. a processor;
806. a memory;
808. a drive mechanism;
810. an input/output module;
812. an input device;
814. an output device;
816. a presentation device;
818. a graphical user interface;
820. a network interface;
822. a communication link;
824. a communication bus.
Detailed Description
The technical solutions in the embodiments of the present invention will be clearly and completely described below with reference to the drawings in the embodiments of the present invention, and it is obvious that the described embodiments are only a part of the embodiments of the present invention, and not all of the embodiments. All other embodiments, which can be derived by a person skilled in the art from the embodiments herein without making any creative effort, shall fall within the scope of protection.
It should be noted that the terms "first," "second," and the like in the description and claims herein and in the above-described drawings are used for distinguishing between similar elements and not necessarily for describing a particular sequential or chronological order. It is to be understood that the data so used is interchangeable under appropriate circumstances such that the embodiments herein described are capable of operation in sequences other than those illustrated or described herein. Furthermore, the terms "comprises," "comprising," and "having," and any variations thereof, are intended to cover a non-exclusive inclusion, such that a process, method, apparatus, article, or device that comprises a list of steps or elements is not necessarily limited to those steps or elements expressly listed, but may include other steps or elements not expressly listed or inherent to such process, method, article, or device.
With the continuous development of internet technology and the network environment, more and more terminals are connected to the internet, and a large number of log files are generated while the system runs. Because there are so many log files, different log files may overlap and affect one another, and the log files contain a lot of irrelevant content. The traditional log management method sends system logs periodically or in real time, which achieves centralized transmission of the logs, but the logs are not analyzed or sorted, so log extraction and management are inefficient.
In order to solve the above problem, embodiments herein provide a database-based log processing method, which can improve the efficiency of log management. FIG. 1 is a schematic diagram of the steps of a database-based log processing method provided in an embodiment herein. This specification provides the method operation steps as described in the embodiment or the flowchart, but more or fewer operation steps may be included on the basis of routine or non-creative effort. The order of steps recited in the embodiments is merely one of many possible orders and does not represent the only order of execution. When an actual system or apparatus product executes the method, the steps may be executed sequentially or in parallel according to the method shown in the embodiment or the figures. Specifically, as shown in FIG. 1, the method may include:
S101: receiving a log file;
S102: determining the type information of the log file according to a preset classification model, and storing the log file into a specified folder corresponding to the type information;
S103: generating index path information of the log file according to the log file and the specified folder;
S104: and updating a search engine of the database according to the index path information.
It can be understood that received log files are classified in real time by the preset classification model and stored in the corresponding designated folders, which allows different log files to be classified quickly. Index path information is then generated from each classified log file and the designated folder in which it is located, and finally the search engine is updated. This improves the management efficiency of the log files, and a log can be located quickly by its storage position when it is extracted and processed.
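Steps S101 to S104 can be sketched as follows. This is only an illustrative sketch under stated assumptions: `LogStore` is a hypothetical in-memory stand-in for the database folders and the search engine index, and `keyword_classifier` is a hypothetical placeholder for the preset classification model; neither name comes from this specification.

```python
from dataclasses import dataclass, field
from pathlib import PurePosixPath
from typing import Callable, Dict


@dataclass
class LogStore:
    """Hypothetical in-memory stand-in for the database folders and search index."""
    root: PurePosixPath
    folders: Dict[str, Dict[str, str]] = field(default_factory=dict)  # type -> {name: content}
    search_index: Dict[str, str] = field(default_factory=dict)        # name -> index path

    def process(self, name: str, content: str, classify: Callable[[str], str]) -> str:
        # S101: the log file (name, content) has been received.
        # S102: determine the type and store the file in the matching folder.
        log_type = classify(content)
        self.folders.setdefault(log_type, {})[name] = content
        # S103: derive the index path from the folder and the file.
        index_path = str(self.root / log_type / name)
        # S104: update the search index with the new path.
        self.search_index[name] = index_path
        return index_path


def keyword_classifier(content: str) -> str:
    """Hypothetical placeholder for the preset classification model."""
    return "error" if "ERROR" in content else "operation"
```

For example, `LogStore(PurePosixPath("/logs")).process("a.log", "ERROR timeout", keyword_classifier)` stores the file under the `error` folder and records its path in the index.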
The log file may be information recorded while the system runs. In a banking operation platform, user operation information, system running state, abnormal information and the like are recorded as log files and stored in a database (such as a MySQL database), which facilitates subsequent extraction and analysis of the log files, so that the corresponding abnormalities can be resolved or system operation can be optimized.
The designated folders may be different log file storage spaces in the database, obtained according to different classification rules. Log files of the same type are stored in the same designated folder, which provides a first level of processing, makes the log files easier to count and further optimize, reduces the operating pressure on the server, and improves the efficiency of data statistics and optimization.
In an embodiment of this specification, after receiving the log file, the method further includes:
storing the log file to a first folder, wherein the first folder comprises at least one designated folder;
and preprocessing the log file, wherein the preprocessing at least comprises virus scanning and removal processing and log compression processing.
The first folder may be a database space for storing the log file. The log file is preprocessed in this database space, so that the initial log file undergoes processing such as virus scanning and removal and log compression.
Furthermore, a virus scanning script may also be deployed at the file receiving port of the database, so that log files about to enter the database are scanned outside the database in advance. This ensures that the log files entering the database are safe and reliable and further strengthens the protection of the database.
In a further embodiment, the method further comprises:
receiving a log recovery instruction of a user;
determining, according to the log recovery instruction, the position information of the log to be recovered and the recovery position information;
and extracting the log to be recovered according to its position information, and adding the log to be recovered at the recovery position.
It can be understood that a log recovery script may further be provided in this specification to recover deleted logs. For example, a log file is stored for a fixed period and no longer needs to be kept once its validity period has passed, so expired log files may be deleted (for example, by polling), or a user may actively delete a log file. The deleted log file may be stored in a preset folder (i.e., a trash bin), from which it can later be recovered.
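A minimal sketch of the recovery step, assuming the trash bin and the designated folders are modeled as plain dictionaries. The function name `recover_log` and its parameters are hypothetical and only illustrate moving a deleted log from its current position to the recovery position.

```python
from typing import Dict


def recover_log(trash: Dict[str, bytes], folders: Dict[str, Dict[str, bytes]],
                name: str, target_folder: str) -> None:
    """Move a deleted log from the trash folder back to target_folder."""
    if name not in trash:
        raise KeyError(f"{name!r} is not in the trash folder")
    # Extract the log from its current position and add it at the recovery position.
    folders.setdefault(target_folder, {})[name] = trash.pop(name)
```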
In this embodiment of the present specification, as shown in fig. 2, the storing the log file to the first folder includes:
S201: judging whether the size of the log file is larger than a first preset threshold value;
S202: if so, segmenting the log file according to a preset segmentation rule so that the size of each segmented sub-log file is smaller than the first preset threshold value;
S203: and sending each segmented sub-log file to the first folder, and merging the sub-log files according to a preset merging rule to restore the original log file.
When the log file is transmitted to the database, transmitting a large file in a single pass can be avoided in order to improve the safety and reliability of data transmission, which reduces packet loss or data damage during transmission and ensures the integrity of the transmitted data. Optionally, when the log file to be transmitted is large, it may be split by a splitter so that the size of each transmitted file does not exceed the first preset threshold. The first preset threshold is set according to actual conditions, such as the data transmission speed, and is not limited in this specification.
The preset splitting rule corresponds to the preset merging rule. For example, the log file may be split equally, so that the sub-log files have equal sizes and are all smaller than the first preset threshold. This ensures that the original log file can be restored when the sub-log files are merged. The merging is carried out in the first folder, which ensures that the log file enters the designated folder in full and that the data remain complete. The preset splitting rule and the preset merging rule are not limited in this specification.
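The splitting and merging of S201 to S203 might look like the sketch below. It assumes a fixed chunk size of one byte less than the threshold (so the last chunk may be shorter, unlike the equal split mentioned above) and a merging rule that simply concatenates chunks in order; `split_log` and `merge_log` are hypothetical names.

```python
from typing import List


def split_log(data: bytes, threshold: int) -> List[bytes]:
    """Split data so that every chunk is strictly smaller than threshold bytes."""
    if threshold < 2:
        raise ValueError("threshold must be at least 2 bytes")
    if len(data) <= threshold:
        return [data]  # S201: not larger than the threshold, send as is
    chunk_size = threshold - 1  # S202: keep each chunk strictly below the threshold
    return [data[i:i + chunk_size] for i in range(0, len(data), chunk_size)]


def merge_log(chunks: List[bytes]) -> bytes:
    """S203: merge rule matching split_log, concatenating chunks in original order."""
    return b"".join(chunks)
```

Because the merge rule is the exact inverse of the split rule, `merge_log(split_log(data, threshold))` returns the original bytes.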
In the embodiment of the present specification, as shown in fig. 3, the preset classification model is obtained through the following steps:
S301: acquiring historical log files;
S302: calculating the similarity of any two historical log files according to the historical log files;
S303: clustering the historical log files according to the similarity and a preset clustering rule to obtain a plurality of historical log file sets;
S304: labeling each historical log file set according to a preset labeling rule, so as to determine the labeling information of each historical log file;
S305: and feeding the historical log files and the corresponding labeling information into an initial classification model for training to obtain the trained preset classification model.
It can be understood that, in the embodiments of this specification, the historical log files are clustered so that the database is divided into storage spaces for the different log file types. A model is then trained on the clustered historical log files by machine learning, yielding a preset classification model that determines log file types automatically and thereby improves the efficiency with which subsequent log files are classified into the database.
Optionally, the similarity between two historical log files is calculated first, for example as cosine similarity or Euclidean distance. Since log files are generally stored as text, a text vector of each historical log file may be obtained first, and the similarity between any two historical log files is calculated from the text vectors. Obtaining a text vector is a common process in Natural Language Processing (NLP), and the specific calculation process is not limited in the embodiments of this specification. Once the similarities between the historical log files have been obtained, the files are clustered in combination with the preset clustering rule to obtain historical log file sets around a plurality of cluster centers. The preset clustering rule may include, but is not limited to, the similarity, the classification granularity, keywords, log types (by size, service attribute, functional data, and the like), the degree of compression, and so on.
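As one illustration of S302 and S303, the sketch below uses bag-of-words counts as the text vector, cosine similarity as the similarity measure, and a simple greedy threshold rule as the clustering rule. These are assumptions chosen for brevity, not the method prescribed here, and the function names are hypothetical.

```python
import math
import re
from collections import Counter
from typing import List


def text_vector(text: str) -> Counter:
    """Bag-of-words term counts; a simple stand-in for an NLP text vector."""
    return Counter(re.findall(r"[a-z]+", text.lower()))


def cosine_similarity(a: Counter, b: Counter) -> float:
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm = math.sqrt(sum(v * v for v in a.values()) * sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0


def cluster_logs(logs: List[str], min_similarity: float) -> List[List[int]]:
    """Greedy clustering: join the first cluster whose seed log is similar enough."""
    vectors = [text_vector(log) for log in logs]
    clusters: List[List[int]] = []
    for i, vec in enumerate(vectors):
        for cluster in clusters:
            if cosine_similarity(vec, vectors[cluster[0]]) >= min_similarity:
                cluster.append(i)
                break
        else:
            clusters.append([i])  # no similar cluster found, start a new one
    return clusters
```

The returned lists hold indices into `logs`; each list plays the role of one historical log file set.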
After the clustering is finished, a plurality of second folders (namely designated folders) can be arranged in the first folder, and each second folder stores a history log file set, so that the classification processing and the storage of the log files are realized.
The preset labeling rule may label each set according to log file type. Specifically, the log file types (by size, service attribute, functional data, and the like) in each historical log file set are determined; the probability distribution of the log file types in each historical log file set is then determined, and the type with the highest probability is taken as the label of that historical log file set.
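The labeling rule described above amounts to a majority vote over the types in a set. A minimal sketch, assuming the type of each file in the set is already known as a string; `label_cluster` is a hypothetical name.

```python
from collections import Counter
from typing import Dict, List, Tuple


def label_cluster(types: List[str]) -> Tuple[str, Dict[str, float]]:
    """Return the most probable type in a set and the full type distribution."""
    if not types:
        raise ValueError("cannot label an empty set")
    counts = Counter(types)
    distribution = {t: n / len(types) for t, n in counts.items()}
    label = max(distribution, key=distribution.get)
    return label, distribution
```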
In some other embodiments, other labeling modes are also possible, and the embodiments in this specification are not limited.
It should be noted that after a historical log file set is labeled, every historical log file in the set receives the same label. This ensures that the output types of the historical log files in the same set are consistent during model training, which ensures the reliability and accuracy of the classification.
Optionally, the initial classification model is a KNN classifier or an SVM classifier. In some other embodiments, other classification models may be used, which is not limited in the embodiments of this specification.
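To make the KNN option concrete, the following is a small self-contained sketch of a k-nearest-neighbour text classifier over bag-of-words vectors with cosine similarity. The class name `KNNLogClassifier`, the vectorization, and the default `k` are assumptions for illustration; a production system would more likely use an established machine learning library.

```python
import math
import re
from collections import Counter
from typing import List, Tuple


def vectorize(text: str) -> Counter:
    return Counter(re.findall(r"[a-z]+", text.lower()))


def cosine(a: Counter, b: Counter) -> float:
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm = math.sqrt(sum(v * v for v in a.values()) * sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0


class KNNLogClassifier:
    """Hypothetical k-nearest-neighbour classifier for log text."""

    def __init__(self, k: int = 3) -> None:
        self.k = k
        self.samples: List[Tuple[Counter, str]] = []

    def fit(self, texts: List[str], labels: List[str]) -> None:
        # KNN training only stores the labeled historical log files.
        self.samples = [(vectorize(t), y) for t, y in zip(texts, labels)]

    def predict(self, text: str) -> str:
        if not self.samples:
            raise ValueError("classifier has not been fitted")
        query = vectorize(text)
        # Take the k most similar samples and let them vote on the type.
        nearest = sorted(self.samples, key=lambda s: cosine(query, s[0]), reverse=True)[: self.k]
        votes = Counter(label for _, label in nearest)
        return votes.most_common(1)[0][0]
```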
In a further embodiment, in order to ensure the accuracy and the real-time performance of the preset classification model, the preset classification model may be optimized according to a specified time period, for example, a newly added log file at the end of each day or at the end of each week is obtained, and model parameters are adjusted according to the newly added log file to optimize the preset classification model, so as to improve the timeliness of the preset classification model.
In an embodiment of the present specification, after storing the log file in a designated folder corresponding to the type information, the method further includes:
performing word segmentation processing on the log file to obtain data information of the log file;
judging whether the data information is invalid data or data with irregular format;
when invalid data exists, clearing the invalid data;
and when the irregular data exist, carrying out format conversion on the irregular data according to a preset conversion rule.
It can be understood that this step further optimizes the log files that have entered the designated folder. Invalid data may be meaningless data. Because different text positions in a log file carry different meanings, the text type and format of the log file can be determined in advance, and the meaning of the data at each position can be obtained through word segmentation. When data appears at a position where none is expected, that data is invalid and can be removed. Accordingly, the predetermined log format is used to determine whether the format of the log file, or of part of its data, is correct (that is, whether the data is regular). If the format is incorrect (that is, the data is irregular), the irregular data is converted according to the predetermined log format to obtain regularly formatted data.
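The cleaning step can be illustrated with a sketch that assumes a hypothetical line format of `date level message`, where the regular date format is `YYYY-MM-DD`. Lines that cannot be segmented into these fields are treated as invalid data, and dates written with slashes are treated as irregular data to be converted. The format, the level set, and the function names are all assumptions.

```python
import re
from typing import List, Optional

LEVELS = {"INFO", "WARN", "ERROR"}  # hypothetical set of valid levels
DATE_SLASH = re.compile(r"^(\d{4})/(\d{2})/(\d{2})$")
DATE_DASH = re.compile(r"^\d{4}-\d{2}-\d{2}$")


def clean_line(line: str) -> Optional[str]:
    """Return the normalized line, or None when the line is invalid data."""
    tokens = line.split(maxsplit=2)  # word segmentation: date, level, message
    if len(tokens) < 3:
        return None
    date, level, message = tokens
    match = DATE_SLASH.match(date)
    if match:  # irregular format: convert 2021/07/30 to 2021-07-30
        date = "-".join(match.groups())
    if not DATE_DASH.match(date) or level.upper() not in LEVELS:
        return None
    return f"{date} {level.upper()} {message}"


def clean_log(lines: List[str]) -> List[str]:
    """Drop invalid lines and keep the normalized ones in their original order."""
    cleaned = (clean_line(line) for line in lines)
    return [line for line in cleaned if line is not None]
```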
In a further embodiment, after storing the log file in a designated folder corresponding to the type information, the method further includes:
and according to a specified period, compressing the log files in the specified folder.
In this step, the log files in each designated folder may be compressed by polling, where the specified period may be one day, one week, or one month. Compressing the log files frees up storage space in the designated folders, reduces the load on the server, and improves the efficiency of data statistics and optimization. The manner and degree of compression are not limited in the embodiments of this specification.
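A sketch of one compression pass over a designated folder, assuming the folder is a directory on disk, the logs end in `.log`, and gzip is the chosen compression format (the specification leaves the format open). The function `compress_folder` is hypothetical; scheduling it once per specified period is left to an external scheduler such as cron.

```python
import gzip
import shutil
from pathlib import Path
from typing import List


def compress_folder(folder: Path) -> List[Path]:
    """Gzip every uncompressed *.log file in folder and remove the original."""
    compressed = []
    for log_path in sorted(folder.glob("*.log")):
        gz_path = log_path.with_name(log_path.name + ".gz")
        with log_path.open("rb") as src, gzip.open(gz_path, "wb") as dst:
            shutil.copyfileobj(src, dst)
        log_path.unlink()  # keep only the compressed copy
        compressed.append(gz_path)
    return compressed
```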
In this embodiment of the present specification, as shown in fig. 4, after storing the log file in a specified folder corresponding to the type information, the method further includes:
S401: obtaining the total size of the log files in each designated folder;
S402: judging whether the total size of the log files exceeds a second preset threshold value;
S403: if yes, determining the designated folders in which the total size of the log files exceeds the second preset threshold, and compressing those designated folders according to a preset compression rule until the total size of the log files in every designated folder is lower than the second preset threshold.
It can be understood that the used space of each designated folder is checked to decide whether the folder should be compressed, which keeps each designated folder in efficient use. Here the theoretical storage space of every designated folder is the same, so to keep each designated folder working effectively, its free space should be kept at a certain level so that subsequent log files can be stored smoothly. The second preset threshold therefore serves as the warning value that triggers compression of a designated folder; it is set according to actual conditions and is not limited in the embodiments of this specification.
The preset compression rule may be set according to the difference between the total size of the log files and the second preset threshold. For example, different difference gradients are set, each corresponding to a different degree of compression: a first difference gradient of 0 to 50M, a second difference gradient of 50M to 100M, and a third difference gradient of 100M or more. A first compression process is applied to the first difference gradient, a second compression process to the second difference gradient, and a third compression process to the third difference gradient. The larger the difference gradient, the higher the degree of compression (a higher degree of compression consumes more system and computing resources). In this way the efficiency of compression is improved: for designated folders with a smaller difference, compression is faster and consumes less system performance.
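The gradient rule can be sketched by mapping the excess over the second preset threshold to a zlib compression level. The mapping of the three gradients to levels 1, 6 and 9, the treatment of exactly 50M as belonging to the second gradient, and the function names are assumptions made for this sketch.

```python
import zlib

MB = 1024 * 1024


def compression_level(total_size: int, threshold: int) -> int:
    """Map the excess over the threshold to a zlib level (0 means no compression)."""
    excess = total_size - threshold
    if excess <= 0:
        return 0
    if excess < 50 * MB:
        return 1   # first gradient: light, fast compression
    if excess < 100 * MB:
        return 6   # second gradient: balanced compression
    return 9       # third gradient: strongest, slowest compression


def compress_payload(payload: bytes, total_size: int, threshold: int) -> bytes:
    """Compress payload with the level chosen for the folder it belongs to."""
    level = compression_level(total_size, threshold)
    return zlib.compress(payload, level) if level else payload
```

Lower levels trade compression ratio for speed, which matches the idea that folders only slightly over the threshold should be handled cheaply.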
In a further embodiment, the theoretical storage spaces of the designated folders in the first folder may be the same or different, in which case a single size threshold is not conducive to unified management and optimization. A utilization threshold may therefore be set instead: the utilization of each designated folder is calculated and compared with the utilization threshold, and when the utilization exceeds the utilization threshold, the designated folder is compressed so that its utilization falls below the utilization threshold. The compression processing method is the same as in the above steps, and is not limited in the embodiments of the present specification.
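The utilization check can be sketched as follows. The `FolderUsage` type, its field names and the example threshold of 0.75 are assumptions made for illustration only.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class FolderUsage:
    name: str
    used_bytes: int
    capacity_bytes: int  # theoretical storage space of the folder

    @property
    def utilization(self) -> float:
        """Fraction of the theoretical storage space currently in use."""
        return self.used_bytes / self.capacity_bytes


def folders_to_compress(folders: List[FolderUsage], utilization_threshold: float) -> List[str]:
    """Return the names of folders whose utilization exceeds the threshold."""
    return [f.name for f in folders if f.utilization > utilization_threshold]
```

Because the comparison is made on a ratio, folders of different capacities can share one threshold, which is the point of this embodiment.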
In an embodiment of this specification, generating the index path information of the log file according to the log file and the designated folder includes:
acquiring configuration file information of the designated folder and position information of the log file in the designated folder;
determining archive file position information of the log file according to the configuration file information and the position information;
and generating the index path information of the log file according to the archive file position information, wherein the index path information is a log capture path of the log file.
Therefore, by determining the index path information of a newly added log file, the search engine of the database can be updated in time, which makes the log file convenient to query. The configuration file information may be the configuration information of the designated folder in the database, and the position information may be obtained from the storage manner of the designated folder, for example sequential storage by storage time. The archive file position information of the log file can thus be obtained from the configuration file information and the position information. The archive file position information is the specific position of the log file in the database, and the index path information of the log file can be obtained from it.
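As a minimal sketch of these three steps, the code below assumes that the configuration file information is a dictionary with hypothetical `root` and `url_prefix` keys, and that the position of a log file in the folder is a sequence number assigned in storage-time order. Neither assumption is stated in the specification.

```python
from pathlib import PurePosixPath


def archive_location(folder_config: dict, position: int, file_name: str) -> PurePosixPath:
    """Combine folder configuration and in-folder position into an archive location."""
    # A zero-padded position keeps files ordered by storage time (an assumption).
    return PurePosixPath(folder_config["root"]) / f"{position:06d}_{file_name}"


def index_path(folder_config: dict, position: int, file_name: str) -> str:
    """Build the index path (log capture path) that the search engine would store."""
    location = archive_location(folder_config, position, file_name)
    return f"{folder_config['url_prefix']}{location}"
```

For example, with `{"root": "/logs/payment", "url_prefix": "file://"}`, position 42 and file name `app.log`, the index path would be `file:///logs/payment/000042_app.log`.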
After the search engine of the database is updated, the corresponding log file can be queried as required; for example, the log is read and checked by keyword. Further, to improve the efficiency of recording log file information, information such as the access time, the log file name and the user name can be recorded whenever a user accesses a log file. Optionally, an information recording script can be set up for this purpose, so that statistics and recording are automatic.
In a further embodiment, a log index table may be established according to the updated search engine. For example, different log index keywords are set for each log cluster type, and each keyword corresponds to the associated path information, that is, a Uniform Resource Locator (URL). The viewing policy is to set up a log viewer on the web system: the viewer offers classified viewing by log file type, consults the log index table in the database, and obtains real-time log file information according to the log URL reference information in the index table. By selecting the corresponding time and log type, the user can check the log information stored in the designated folder.
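One way to picture the log index table is as a mapping from a log type keyword to the URLs of the log files of that type. The sketch below uses an in-memory structure; the class name, the ISO date strings and the example URLs are all hypothetical.

```python
from collections import defaultdict
from typing import List


class LogIndexTable:
    """Maps a log type keyword to the URLs of the log files of that type."""

    def __init__(self) -> None:
        self._entries = defaultdict(list)  # keyword -> [(date, url), ...]

    def add(self, keyword: str, date: str, url: str) -> None:
        """Register one log file URL under its type keyword and date."""
        self._entries[keyword].append((date, url))

    def lookup(self, keyword: str, date: str) -> List[str]:
        """Return the URLs for one log type on one date ('YYYY-MM-DD')."""
        return [url for d, url in self._entries.get(keyword, []) if d == date]
```

A log viewer could call `lookup` with the time and log type selected by the user and then fetch the log content from the returned URLs.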
In order to further improve the query and statistical efficiency of the log file, as shown in fig. 5, the method further includes:
S501: acquiring log query information in a preset time period;
S502: determining, according to the log query information, a log file whose number of queries meets a preset condition;
S503: adjusting the position information of the log file in the designated folder to obtain adjusted index path information of the log file;
S504: and updating the search engine of the database according to the adjusted index path information of the log file.
It can be understood that, in the embodiments of the present specification, log files that are queried many times are identified by counting the log query data, and the position information of those log files is then adjusted so that they can be queried faster. Adjusting the position information of a log file in the designated folder in this step thus optimizes the query path of the log file and improves its query speed.
The preset time period can be set according to actual conditions; for example, the path information of the log files is optimized once per week. In actual operation, the information recording script can be set to count log query information in real time, and, at each preset time period, the log optimization script performs the optimization using the log query data from the log statistics script.
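Steps S501 to S504 can be sketched as below. The sketch assumes that each access record is a dictionary with `file` and `time` keys, that the preset condition is a minimum query count, and that "adjusting the position" means moving frequently queried files to the front of the folder order. The specification does not fix any of these details, and all function names are hypothetical.

```python
from collections import Counter
from datetime import datetime
from typing import Dict, List


def hot_log_files(access_records: List[dict], start: datetime, end: datetime,
                  min_queries: int) -> List[str]:
    """S501-S502: count queries per file within [start, end) and keep frequent ones."""
    counts = Counter(
        rec["file"] for rec in access_records if start <= rec["time"] < end
    )
    return [name for name, n in counts.most_common() if n >= min_queries]


def reorder_folder(file_order: List[str], hot_files: List[str]) -> List[str]:
    """S503: move frequently queried files to the front, keeping the rest in order."""
    hot = [f for f in hot_files if f in file_order]
    rest = [f for f in file_order if f not in hot]
    return hot + rest


def updated_positions(file_order: List[str]) -> Dict[str, int]:
    """S504 input: the new position of every file, used to regenerate index paths."""
    return {name: pos for pos, name in enumerate(file_order)}
```

The positions returned by `updated_positions` would then be used to regenerate the index path information and update the search engine.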
As described above, the storage time of a log file is fixed, and once its validity period is exceeded the log file no longer needs to be stored. Expired log files can therefore be deleted (for example, in a polling manner), or a user can actively delete log files. Accordingly, each designated folder can also be provided with a corresponding log deletion script to delete the corresponding log files.
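A log deletion script of this kind might look like the following sketch, which would be run periodically to implement the polling. It assumes that a file's age is measured from its modification time and that the retention period is given in seconds; the function name and parameters are hypothetical.

```python
import os
import time
from typing import List, Optional


def delete_expired_logs(folder: str, retention_seconds: float,
                        now: Optional[float] = None) -> List[str]:
    """Delete regular files in `folder` whose age exceeds the retention period.

    Returns the sorted names of the deleted files.
    """
    now = time.time() if now is None else now
    with os.scandir(folder) as it:
        entries = list(it)  # snapshot first, so deletion does not disturb iteration
    deleted = []
    for entry in entries:
        if entry.is_file() and now - entry.stat().st_mtime > retention_seconds:
            os.remove(entry.path)
            deleted.append(entry.name)
    return sorted(deleted)
```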
The embodiments of the specification provide a database-based log processing method in which log files entering a database undergo multilayer optimization processing (virus killing and compression processing) and are divided into designated folders by means of a preset classification model. This achieves classified storage of the log files and reduces the load pressure on the server. In addition, statistical analysis of the log query process is used to optimize the log file query path, improving the efficiency of data statistics and optimization.
On the basis of the method provided above, an embodiment of the present specification further provides a database-based log processing system, which runs on the database. As shown in fig. 6, a main folder is created in the system, and a virus killing script, a log compression script, a log classification script, a log recovery script, and an information recording script are set in the main folder. The scripts work in the following order: the virus killing script runs first, then the log compression script, then the log classification script. The original log files are thereby preliminarily processed and classified, so that they can be better counted, optimized and analyzed. When a user accesses a log file, the access time, the name of the log file and whether file information was extracted are recorded by the information recording script, so that an administrator can conveniently count access data and rearrange the most frequently accessed files, allowing users to access those log files more quickly. A plurality of subfolders (such as subfolder 1, subfolder 2, subfolder 3, ..., subfolder N) are created in the main folder, and a log counting script, a log optimizing script and a log deleting script are set in each subfolder, so that data statistics and optimization can be performed on files of the same type, which relieves the pressure on the server and improves the efficiency of data statistics and optimization.
Based on the same inventive concept, an embodiment of the present specification further provides a database-based log processing apparatus, as shown in fig. 7, the apparatus includes:
a log file receiving module 100, configured to receive a log file;
a log file processing module 200, configured to determine type information of the log file according to a preset classification model, and store the log file into a designated folder corresponding to the type information;
an index path information generating module 300, configured to generate index path information of the log file according to the log file and the designated folder;
and a search engine updating module 400, configured to update the search engine of the database according to the index path information.
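The cooperation of modules 100 to 400 can be sketched as one small class. This is only an illustration under stated assumptions: the preset classification model is replaced by a caller-supplied function, the designated folders and the search engine are in-memory dictionaries, and the index path format is invented for the example.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List


@dataclass
class LogProcessingApparatus:
    classify: Callable[[str], str]  # stands in for the preset classification model
    folders: Dict[str, List[str]] = field(default_factory=dict)  # type -> file names
    search_index: Dict[str, str] = field(default_factory=dict)  # file name -> index path

    def receive(self, name: str, content: str) -> str:
        """Module 100: receive a log file, then run modules 200 to 400 in order."""
        log_type = self.store(name, content)
        path = self.generate_index_path(name, log_type)
        self.update_search_engine(name, path)
        return path

    def store(self, name: str, content: str) -> str:
        """Module 200: determine the type and store the file in the matching folder."""
        log_type = self.classify(content)
        self.folders.setdefault(log_type, []).append(name)
        return log_type

    def generate_index_path(self, name: str, log_type: str) -> str:
        """Module 300: derive an index path from the folder and the file position."""
        position = self.folders[log_type].index(name)
        return f"/{log_type}/{position}/{name}"  # hypothetical path format

    def update_search_engine(self, name: str, path: str) -> None:
        """Module 400: record the index path so the file can be queried."""
        self.search_index[name] = path
```

With a toy classifier such as `lambda text: "error" if "ERROR" in text else "info"`, receiving `a.log` with an error message would store it in the `error` folder and register the path `/error/0/a.log`.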
The advantages obtained by the apparatus are consistent with those obtained by the method, and are not repeated in the embodiments of the present specification.
The database-based log processing method and apparatus provided in the embodiments of the present specification may be used in the financial field in the aspect of log processing, and may also be used in any field other than the financial field.
Fig. 8 shows a computer device provided by embodiments herein. The database-based log processing apparatus provided above may be the computer device of this embodiment, which performs the method provided above. The computer device 802 may include one or more processors 804, such as one or more Central Processing Units (CPUs), each of which may implement one or more hardware threads. The computer device 802 may also include any memory 806 for storing any kind of information, such as code, settings, data, etc. For example, and without limitation, memory 806 may include any one or more of the following in combination: any type of RAM, any type of ROM, flash memory devices, hard disks, optical disks, etc. More generally, any memory may use any technology to store information. Further, any memory may provide volatile or non-volatile retention of information. Further, any memory may represent fixed or removable components of computer device 802. In one case, when the processor 804 executes the associated instructions, which are stored in any memory or combination of memories, the computer device 802 can perform any of the operations of the associated instructions. The computer device 802 also includes one or more drive mechanisms 808, such as a hard disk drive mechanism, an optical disk drive mechanism, etc., for interacting with any memory.
Computer device 802 may also include an input/output (I/O) module 810 for receiving various inputs (via input device 812) and for providing various outputs (via output device 814). One particular output mechanism may include a presentation device 816 and an associated Graphical User Interface (GUI) 818. In other embodiments, the input/output (I/O) module 810, input device 812, and output device 814 may be omitted, with the computer device serving merely as one computer device in a network. Computer device 802 may also include one or more network interfaces 820 for exchanging data with other devices via one or more communication links 822. One or more communication buses 824 couple the above-described components together.
Communication link 822 may be implemented in any manner, such as over a local area network, a wide area network (e.g., the Internet), a point-to-point connection, etc., or any combination thereof. The communication link 822 may include any combination of hardwired links, wireless links, routers, gateway functions, name servers, etc., governed by any protocol or combination of protocols.
Corresponding to the methods in fig. 1-5, the embodiments herein also provide a computer-readable storage medium having stored thereon a computer program, which, when executed by a processor, performs the steps of the above-described method.
Embodiments herein also provide computer readable instructions which, when executed by a processor, cause the processor to perform the method shown in fig. 1-5.
It should be understood that, in various embodiments herein, the sequence numbers of the above-mentioned processes do not mean the execution sequence, and the execution sequence of each process should be determined by its function and inherent logic, and should not constitute any limitation to the implementation process of the embodiments herein.
It should also be understood that, in the embodiments herein, the term "and/or" merely describes an association relation between associated objects, meaning that three kinds of relations may exist. For example, "A and/or B" may represent: A exists alone, A and B exist simultaneously, or B exists alone. In addition, the character "/" herein generally indicates that the former and latter related objects are in an "or" relationship.
Those of ordinary skill in the art will appreciate that the elements and algorithm steps of the examples described in connection with the embodiments disclosed herein may be embodied in electronic hardware, computer software, or combinations of both, and that the components and steps of the examples have been described above generally in terms of their functions in order to clearly illustrate the interchangeability of hardware and software. Whether such functionality is implemented as hardware or software depends upon the particular application and design constraints imposed on the implementation. Skilled artisans may implement the described functionality in varying ways for each particular application, but such implementation decisions should not be interpreted as causing a departure from the scope of the present disclosure.
It is clear to those skilled in the art that, for convenience and brevity of description, the specific working processes of the above-described systems, apparatuses and units may refer to the corresponding processes in the foregoing method embodiments, and are not described herein again.
In the several embodiments provided herein, it should be understood that the disclosed system, apparatus, and method may be implemented in other ways. For example, the above-described apparatus embodiments are merely illustrative, and for example, the division of the units is only one logical division, and other divisions may be realized in practice, for example, a plurality of units or components may be combined or integrated into another system, or some features may be omitted, or not executed. In addition, the shown or discussed mutual coupling or direct coupling or communication connection may be an indirect coupling or communication connection through some interfaces, devices or units, and may also be an electric, mechanical or other form of connection.
The units described as separate parts may or may not be physically separate, and parts displayed as units may or may not be physical units, may be located in one place, or may be distributed on a plurality of network units. Some or all of the units can be selected according to actual needs to achieve the purposes of the embodiments herein.
In addition, functional units in the embodiments herein may be integrated into one processing unit, or each unit may exist alone physically, or two or more units are integrated into one unit. The integrated unit can be realized in a form of hardware, and can also be realized in a form of a software functional unit.
The integrated unit, if implemented in the form of a software functional unit and sold or used as a stand-alone product, may be stored in a computer readable storage medium. Based on such understanding, the technical solutions of the present invention may be implemented in a form of a software product, which is stored in a storage medium and includes several instructions for causing a computer device (which may be a personal computer, a server, or a network device) to execute all or part of the steps of the methods described in the embodiments of the present invention. And the aforementioned storage medium includes: a U-disk, a removable hard disk, a Read-Only Memory (ROM), a Random Access Memory (RAM), a magnetic disk or an optical disk, and other various media capable of storing program codes.
The principles and embodiments of this document are explained herein using specific examples, which are presented only to aid in understanding the methods and their core concepts. Meanwhile, for those of ordinary skill in the art, the specific implementation and the application scope may be changed according to the idea of this document. In summary, the content of this description should not be understood as a limitation of this document.

Claims (13)

1. A database-based log processing method, the method comprising:
receiving a log file;
determining the type information of the log file according to a preset classification model, and storing the log file into a specified folder corresponding to the type information;
generating index path information of the log file according to the log file and the specified folder;
and updating a search engine of the database according to the index path information.
2. The method of claim 1, wherein receiving the log file further comprises:
storing the log file to a first folder, wherein the first folder comprises at least one designated folder;
and preprocessing the log file, wherein the preprocessing at least comprises virus searching and killing processing and log compression processing.
3. The method of claim 2, wherein storing the log file to a first folder comprises:
judging whether the size of the log file is larger than a first preset threshold value or not;
if so, segmenting the log file according to a preset segmentation rule so that the size of each segmented sub-log file is smaller than the first preset threshold value;
and sending each divided sub-log file to the first folder, and merging the sub-log files according to a preset merging rule to form an original log file.
4. The method according to claim 1, wherein the preset classification model is obtained by:
acquiring historical log files;
calculating and obtaining the similarity of any two historical log files according to the historical log files;
clustering the historical log files according to the similarity and a preset clustering rule to obtain a plurality of historical log file sets;
labeling each historical log file set according to a preset labeling rule, so as to determine the labeling information of each historical log file;
and inputting the historical log files and the corresponding labeling information into an initial classification model for training to obtain a trained preset classification model.
5. The method of claim 4, wherein the initial classification model is a KNN classifier or a SVM classifier.
6. The method of claim 1, wherein after storing the log file in a designated folder corresponding to the type information, further comprising:
performing word segmentation processing on the log file to obtain data information of the log file;
judging whether the data information is invalid data or data with irregular format;
when invalid data exists, clearing the invalid data;
and when the irregular data exist, carrying out format conversion on the irregular data according to a preset conversion rule.
7. The method of claim 1, wherein after storing the log file in a designated folder corresponding to the type information, further comprising:
and according to a specified period, compressing the log files in the specified folder.
8. The method of claim 1, wherein after storing the log file in a designated folder corresponding to the type information, further comprising:
obtaining the sum of the sizes of the log files in each designated folder;
judging whether the sum of the sizes of the log files exceeds a second preset threshold value;
if yes, determining the designated folders in which the sum of the sizes of the log files exceeds the second preset threshold, and compressing those designated folders according to a preset compression rule until the sum of the sizes of the log files in every designated folder is lower than the second preset threshold.
9. The method of claim 1, wherein generating the index path information of the log file according to the log file and the designated folder comprises:
acquiring configuration file information of the specified folder and position information of the log file in the specified folder;
determining archive file position information of the log file according to the configuration file information and the position information;
and generating index path information of the log file according to the archive file position information, wherein the index path information is a log capture path of the log file.
10. The method of claim 1, further comprising:
acquiring log query information in a preset time period;
determining a log file with query times meeting preset conditions according to the log query information;
adjusting the position information of the log file in the specified folder to obtain the index path information of the adjusted log file;
and updating a search engine of the database according to the adjusted index path information of the log file.
11. A database-based log processing apparatus, the apparatus comprising:
the log file receiving module is used for receiving log files;
the log file processing module is used for determining the type information of the log file according to a preset classification model and storing the log file into a specified folder corresponding to the type information;
the index path information generating module is used for generating the index path information of the log file according to the log file and the specified folder;
and the search engine updating module is used for updating the search engine of the database according to the index path information.
12. A computer device comprising a memory, a processor and a computer program stored on the memory and executable on the processor, characterized in that the processor implements the method according to any of claims 1 to 10 when executing the computer program.
13. A computer-readable storage medium, characterized in that the computer-readable storage medium stores a computer program which, when executed by a processor, implements the method of any one of claims 1 to 10.
CN202110872299.XA 2021-07-30 2021-07-30 Database-based log processing method, device and equipment Pending CN113590556A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202110872299.XA CN113590556A (en) 2021-07-30 2021-07-30 Database-based log processing method, device and equipment

Publications (1)

Publication Number Publication Date
CN113590556A true CN113590556A (en) 2021-11-02

Family

ID=78252740

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202110872299.XA Pending CN113590556A (en) 2021-07-30 2021-07-30 Database-based log processing method, device and equipment

Country Status (1)

Country Link
CN (1) CN113590556A (en)

Cited By (9)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN114328389A (en) * 2021-12-31 2022-04-12 浙江汇鼎华链科技有限公司 Big data file analysis processing system and method under cloud computing environment
CN114328389B (en) * 2021-12-31 2022-06-17 浙江汇鼎华链科技有限公司 Big data file analysis processing system and method under cloud computing environment
CN114584346A (en) * 2022-01-28 2022-06-03 深圳融安网络科技有限公司 Log stream processing method, system, terminal device and storage medium
CN114584346B (en) * 2022-01-28 2024-01-12 深圳融安网络科技有限公司 Log stream processing method, system, terminal equipment and storage medium
CN115190189A (en) * 2022-07-08 2022-10-14 中国银行股份有限公司 Message information processing method and device
CN116781984A (en) * 2023-08-21 2023-09-19 深圳市华星数字有限公司 Set top box data optimized storage method
CN116781984B (en) * 2023-08-21 2023-11-07 深圳市华星数字有限公司 Set top box data optimized storage method
CN117112791A (en) * 2023-10-18 2023-11-24 中孚安全技术有限公司 Unknown log classification decision system, method and device and readable storage medium
CN117112791B (en) * 2023-10-18 2024-02-20 中孚安全技术有限公司 Unknown log classification decision system, method and device and readable storage medium

Similar Documents

Publication Publication Date Title
CN109034993B (en) Account checking method, account checking equipment, account checking system and computer readable storage medium
CN113590556A (en) Database-based log processing method, device and equipment
US11562286B2 (en) Method and system for implementing machine learning analysis of documents for classifying documents by associating label values to the documents
US9836541B2 (en) System and method of managing capacity of search index partitions
EP3161635B1 (en) Machine learning service
US9690842B2 (en) Analyzing frequently occurring data items
US20130013597A1 (en) Processing Repetitive Data
CN110569214B (en) Index construction method and device for log file and electronic equipment
CN110727643B (en) File classification management method and system based on machine learning
EP4099170B1 (en) Method and apparatus of auditing log, electronic device, and medium
CN113254255B (en) Cloud platform log analysis method, system, device and medium
US20240126817A1 (en) Graph data query
CN105511812A (en) Method and device for optimizing big data of memory system
WO2022095637A1 (en) Fault log classification method and system, and device and medium
WO2020140624A1 (en) Method for extracting data from log, and related device
CN110377576A (en) Create method and apparatus, the log analysis method of log template
CN113535677B (en) Data analysis query management method, device, computer equipment and storage medium
CN111221698A (en) Task data acquisition method and device
CN106919566A (en) A kind of query statistic method and system based on mass data
CN110309206B (en) Order information acquisition method and system
CN109522349B (en) Cross-type data calculation and sharing method, system and equipment
US11822578B2 (en) Matching machine generated data entries to pattern clusters
CN114356712A (en) Data processing method, device, equipment, readable storage medium and program product
CN112433888A (en) Data processing method and device, storage medium and electronic equipment
CN113064597B (en) Redundant code identification method, device and equipment

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination