CN114840421A - Log data processing method and device - Google Patents

Log data processing method and device

Info

Publication number
CN114840421A
Authority
CN
China
Prior art keywords
log data
data
log
cleaning
tested
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202210457673.4A
Other languages
Chinese (zh)
Inventor
高睿
张予
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Industrial and Commercial Bank of China Ltd ICBC
Original Assignee
Industrial and Commercial Bank of China Ltd ICBC
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Industrial and Commercial Bank of China Ltd ICBC
Priority to CN202210457673.4A
Publication of CN114840421A
Legal status: Pending

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F11/00 Error detection; Error correction; Monitoring
    • G06F11/36 Preventing errors by testing or debugging software
    • G06F11/3668 Software testing
    • G06F11/3672 Test management
    • G06F11/3684 Test management for test design, e.g. generating new test cases

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Computer Hardware Design (AREA)
  • Quality & Reliability (AREA)
  • Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Debugging And Monitoring (AREA)

Abstract

The embodiments of the application provide a log data processing method and device that can be used in the field of finance. The method comprises: acquiring log data from a set data lake, and performing data analysis and data cleaning on the log data to obtain the analyzed and cleaned log data, wherein the data lake stores the log data generated by the system to be tested during production operation; and classifying the log data according to the character string similarity of the log data, determining a corresponding data transmission mode for each category of log data, and sending the log data to the system to be tested for system testing. The method and device can classify log data accurately and conveniently and send it to the system to be tested for system testing, improving testing efficiency and accuracy.

Description

Log data processing method and device
Technical Field
The application relates to the field of data processing, can also be used in the field of finance, and particularly relates to a log data processing method and device.
Background
During software system development and testing, both system performance and functionality need to be tested. For application servers with different scenarios and protocol types, a large number of test scripts or stress-test scripts must be written manually whenever functions change. This is labor-intensive, makes it difficult to fully cover the scenarios that arise in actual operation, and has shortcomings in operating efficiency, reliability, and scalability.
Specifically, manually written scripts risk misjudging the existing functional branches of the system, and to some extent lack coverage and coverage accuracy. Scripts written by individual developers or testers cannot be managed uniformly as enterprise-level assets, so asset reuse is insufficient. When the system is put into actual production operation, differences from the development and test environments mean that simulated messages cannot be guaranteed to reproduce actual production conditions as closely as possible.
Disclosure of Invention
Aiming at the problems in the prior art, the application provides a log data processing method and device, which can accurately and conveniently classify log data and send the log data to a system to be tested for system testing, so that the testing efficiency and the accuracy are improved.
In order to solve at least one of the above problems, the present application provides the following technical solutions:
In a first aspect, the present application provides a log data processing method, including:
acquiring log data from a set data lake, and performing data analysis and data cleaning on the log data to obtain the analyzed and cleaned log data, wherein the data lake stores the log data generated by the system to be tested during production operation; and
classifying the log data according to the character string similarity of the log data, determining a corresponding data transmission mode for each category of log data, and sending the log data to the system to be tested for system testing.
Further, acquiring log data from the set data lake includes:
acquiring log data from the set data lake in batches according to a set time period and persistently storing the log data in a local memory, wherein the data lake stores, through a distributed file system, the log data generated by the system to be tested during production operation.
Further, performing data analysis and data cleaning on the log data to obtain the analyzed and cleaned log data includes:
analyzing the log data according to a set data analysis format to obtain log data in a standard format; and
acquiring the log data in the standard format in batches into a data cleaning queue, and cleaning and filtering the log data in the data cleaning queue according to a set data cleaning rule to obtain the analyzed and cleaned log data.
Further, classifying the log data according to the character string similarity of the log data includes:
determining the character string similarity between two pieces of log data according to the character length, the number of identical characters, the number of overlapping characters, the overlap rate, and the matching rate of character strings of a set length in the two pieces of log data; and
classifying log data whose character string similarity is greater than a threshold into the same category.
Further, before determining the character string similarity between the two pieces of log data according to the character length, the number of identical characters, the number of overlapping characters, the overlap rate, and the matching rate of the character strings of the set length in the two pieces of log data, the method includes:
determining the overlap rate according to the character length and the number of overlapping characters of the character strings of the set length in the two pieces of log data; and
determining the matching rate according to the character length and the number of identical characters of the character strings of the set length in the two pieces of log data.
Further, determining the corresponding data transmission mode for each category of log data and sending the log data to the system to be tested for system testing includes:
determining a corresponding transmission protocol, transmission server, and thread sending mode for each category of log data; and
distributing the log data to the system to be tested according to the transmission protocol, the transmission server, and the thread sending mode for system testing.
In a second aspect, the present application provides a log data processing apparatus, including:
a log preprocessing module, configured to acquire log data from a set data lake and perform data analysis and data cleaning on the log data to obtain the analyzed and cleaned log data, wherein the data lake stores the log data generated by the system to be tested during production operation; and
a log classification and distribution module, configured to classify the log data according to the character string similarity of the log data, determine a corresponding data transmission mode for each category of log data, and send the log data to the system to be tested for system testing.
Further, the log preprocessing module includes:
a persistent storage unit, configured to acquire log data from the set data lake in batches according to a set time period and persistently store the log data in a local memory, wherein the data lake stores, through a distributed file system, the log data generated by the system to be tested during production operation.
Further, the log preprocessing module further includes:
a log analysis unit, configured to analyze the log data according to a set data analysis format to obtain log data in a standard format; and
a log cleaning unit, configured to acquire the log data in the standard format in batches into a data cleaning queue, and clean and filter the log data in the data cleaning queue according to a set data cleaning rule to obtain the analyzed and cleaned log data.
Further, the log classification and distribution module includes:
a similarity calculation unit, configured to determine the character string similarity between two pieces of log data according to the character length, the number of identical characters, the number of overlapping characters, the overlap rate, and the matching rate of character strings of a set length in the two pieces of log data; and
a log classification unit, configured to classify log data whose character string similarity is greater than a threshold into the same category.
Further, the log classification and distribution module further includes:
an overlap rate determining unit, configured to determine the overlap rate according to the character length and the number of overlapping characters of the character strings of the set length in the two pieces of log data; and
a matching rate determining unit, configured to determine the matching rate according to the character length and the number of identical characters of the character strings of the set length in the two pieces of log data.
Further, the log classification and distribution module further includes:
a transmission determining unit, configured to determine a corresponding transmission protocol, transmission server, and thread sending mode for each category of log data; and
a log distribution unit, configured to distribute the log data to the system to be tested according to the transmission protocol, the transmission server, and the thread sending mode for system testing.
In a third aspect, the present application provides an electronic device, which includes a memory, a processor, and a computer program stored in the memory and executable on the processor, wherein the processor implements the steps of the log data processing method when executing the program.
In a fourth aspect, the present application provides a computer-readable storage medium having stored thereon a computer program which, when executed by a processor, implements the steps of the log data processing method.
In a fifth aspect, the present application provides a computer program product comprising computer programs/instructions which, when executed by a processor, implement the steps of the log data processing method.
According to the above technical solution, log data is acquired from a set data lake, data analysis and data cleaning are performed on it, the log data is then classified according to its character string similarity, and a corresponding data transmission mode is determined for each category before the log data is sent to the system to be tested for system testing. Log data can thus be classified accurately and conveniently and sent to the system to be tested, improving testing efficiency and accuracy.
Drawings
To illustrate the embodiments of the present application or the technical solutions in the prior art more clearly, the drawings used in describing the embodiments or the prior art are briefly introduced below. The drawings described below show some embodiments of the present application; those skilled in the art can derive other drawings from them without creative effort.
FIG. 1 is a first schematic flowchart of a log data processing method in an embodiment of the present application;
FIG. 2 is a second schematic flowchart of a log data processing method in an embodiment of the present application;
FIG. 3 is a third schematic flowchart of a log data processing method in an embodiment of the present application;
FIG. 4 is a fourth schematic flowchart of a log data processing method in an embodiment of the present application;
FIG. 5 is a fifth schematic flowchart of a log data processing method in an embodiment of the present application;
FIG. 6 is a first structural diagram of a log data processing apparatus in an embodiment of the present application;
FIG. 7 is a second structural diagram of a log data processing apparatus in an embodiment of the present application;
FIG. 8 is a third structural diagram of a log data processing apparatus in an embodiment of the present application;
FIG. 9 is a fourth structural diagram of a log data processing apparatus in an embodiment of the present application;
FIG. 10 is a fifth structural diagram of a log data processing apparatus in an embodiment of the present application;
FIG. 11 is a sixth structural diagram of a log data processing apparatus in an embodiment of the present application;
FIG. 12 is a schematic diagram of a log data processing flow in an embodiment of the present application;
FIG. 13 is a schematic structural diagram of an electronic device in an embodiment of the present application.
Detailed Description
In order to make the objects, technical solutions and advantages of the embodiments of the present application clearer, the technical solutions in the embodiments of the present application will be clearly and completely described below with reference to the drawings in the embodiments of the present application, and it is obvious that the described embodiments are some embodiments of the present application, but not all embodiments. All other embodiments, which can be derived by a person skilled in the art from the embodiments given herein without making any creative effort, shall fall within the protection scope of the present application.
In the technical solution of this application, the acquisition, storage, use, and processing of data comply with the relevant national laws and regulations.
Manually written scripts in the prior art risk misjudging the existing functional branches of the system, so coverage and coverage accuracy are lost to some extent. In view of this, the present application provides a log data processing method and device in which log data is acquired from a set data lake, data analysis and data cleaning are performed on it, the log data is classified according to its character string similarity, and a corresponding data transmission mode is determined for each category before the log data is sent to the system to be tested for system testing. Log data can thus be classified accurately and conveniently and sent to the system to be tested, improving testing efficiency and accuracy.
In order to accurately and conveniently classify log data and send the log data to a system to be tested for system testing, and improve testing efficiency and accuracy, the application provides an embodiment of a log data processing method, which specifically includes the following contents, with reference to fig. 1:
Step S101: acquiring log data from a set data lake, and performing data analysis and data cleaning on the log data to obtain the analyzed and cleaned log data, wherein the data lake stores the log data generated by the system to be tested during production operation.
Optionally, log data can be acquired from the set data lake in batches according to a set time period and persistently stored in a local memory. The data lake stores, through a distributed file system, the log data generated by the system to be tested during production operation, and storing the log data locally speeds up log access.
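The batch pull described above could be sketched roughly as follows. This is an illustration under stated assumptions, not the application's implementation: `fetch_batch` is a hypothetical stand-in for a data lake client call, and the 15-minute window, the file naming, and the JSON-lines layout are all invented for the example.

```python
import json
import tempfile
from datetime import datetime, timedelta
from pathlib import Path


def time_windows(start, end, period):
    """Yield consecutive (window_start, window_end) pairs covering start..end."""
    cursor = start
    while cursor < end:
        upper = min(cursor + period, end)
        yield cursor, upper
        cursor = upper


def fetch_batch(window_start, window_end):
    """Hypothetical stand-in for a data lake client; returns raw log lines."""
    return [json.dumps({"ts": window_start.isoformat(), "body": "demo"})]


def pull_and_persist(start, end, period, local_dir):
    """Pull one batch per time window and persist each batch to a local file."""
    local_dir = Path(local_dir)
    local_dir.mkdir(parents=True, exist_ok=True)
    written = []
    for lo, hi in time_windows(start, end, period):
        lines = fetch_batch(lo, hi)
        target = local_dir / f"logs_{lo:%Y%m%dT%H%M%S}.jsonl"
        target.write_text("\n".join(lines) + "\n", encoding="utf-8")
        written.append(target)
    return written


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        files = pull_and_persist(
            datetime(2022, 4, 27, 0, 0),
            datetime(2022, 4, 27, 1, 0),
            timedelta(minutes=15),
            tmp,
        )
        print(len(files), "batch files written")
```

Keeping the window generator separate from the fetch call means the set time period can be changed without touching the client code.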
Optionally, after being read from the big data lake, the log data can be analyzed and cleaned according to a certain format.
Optionally, the present application can connect to the big data lake through a client and pull log files from the data lake in batches. The pulled logs are persistently stored in the analysis device and read and written in memory to facilitate subsequent analysis and quick access, and the message header and message body of each log are then parsed as JSON according to a standard format.
Optionally, the present application can also clean and filter the parsed structured log data according to certain rules. For example, structured log data is read in batches from the data persistence unit and added to a cleaning queue, the logs are analyzed according to a preset log format, logs that do not match the regular expressions in the user's preset analysis rules are filtered out, and the valid log information is retained.
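A minimal sketch of the parse-then-clean step is given below. The JSON keys `header` and `body` and the regular expression used as the cleaning rule are assumptions made for illustration; the application does not specify either.

```python
import json
import re
from collections import deque

# Hypothetical cleaning rule: keep only bodies that look like "<VERB> <path>".
VALID_BODY = re.compile(r"^(GET|POST) /\S*$")


def parse_log(raw_line):
    """Parse one raw JSON log line into (header, body), or None if malformed."""
    try:
        record = json.loads(raw_line)
    except json.JSONDecodeError:
        return None
    if not isinstance(record, dict) or "header" not in record or "body" not in record:
        return None
    return record["header"], record["body"]


def clean(raw_lines, rule=VALID_BODY):
    """Queue the parsed logs, then drop every entry whose body fails the rule."""
    queue = deque(p for p in map(parse_log, raw_lines) if p is not None)
    kept = []
    while queue:
        header, body = queue.popleft()
        if isinstance(body, str) and rule.match(body):
            kept.append((header, body))
    return kept
```

Lines that are not valid JSON, that lack either field, or whose body fails the rule are all discarded, so only valid log information reaches the classification step.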
Step S102: classifying the log data according to the character string similarity of the log data, determining a corresponding data transmission mode for each category of log data, and sending the log data to the system to be tested for system testing.
Optionally, the present application may classify specific logs by character string similarity, for example:
A category array is set up and stored along the dimensions category, log content, and log occurrence time, and a character string similarity threshold is initialized; character strings whose similarity exceeds the threshold belong to one category. To keep frequent abnormal logs from affecting the user's training, a classification time interval parameter is introduced, so that similar logs within one time interval are recorded only once.
For each row, the specific log is compared for similarity with the specific log in each row of the final category array. If the character string similarity with some row in the final category array exceeds the threshold, the two character strings are assigned to the same category, only the time point of the log being analyzed is stored in that category, and analysis of that row stops. If the similarity with every row in the final category array is below the threshold, a new category has been found: a row is added to the final category array, and the point corresponding to the log's time interval becomes the first entry in that category's time array.
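The category-array procedure can be sketched as a greedy grouping loop. In the sketch below the similarity metric is pluggable; the default uses `difflib.SequenceMatcher.ratio()` purely as a stand-in and is not the application's formula. The dictionary layout, the numeric timestamps, and the function names are likewise assumptions.

```python
from difflib import SequenceMatcher


def default_similarity(a, b):
    """Stand-in metric (difflib ratio), not the application's formula."""
    return SequenceMatcher(None, a, b).ratio()


def classify(logs, threshold, interval, similarity=default_similarity):
    """Group (timestamp, text) logs; similar logs in one interval count once.

    Each category is a dict holding the representative log content and the
    list of time-interval indices in which a similar log was seen.
    """
    categories = []
    for timestamp, text in logs:
        slot = int(timestamp // interval)      # index of the time interval
        for cat in categories:
            if similarity(text, cat["content"]) > threshold:
                if slot not in cat["times"]:   # record each interval only once
                    cat["times"].append(slot)
                break                          # stop analysing this row
        else:
            # No existing category is similar enough: open a new one.
            categories.append({"content": text, "times": [slot]})
    return categories
```

Because only the interval index is stored for a matching log, a burst of identical abnormal logs within one interval adds a single entry to the category's time array.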
Optionally, when determining character string similarity, the present application may use a matrix-similarity comparison method: two character strings of equal length are given and compared while one is shifted relative to the other, as in Table 1:
Table 1: Character comparison table
String 1: a b c d d a c b c b
String 2: a a d a c c b d d c
n: the length of each string, 10 in this case;
m: the number of identical characters, 3 in this case (d, a, c);
r: the number of overlapping characters of the two strings, 8 in this case.
The overlap rate, matching rate, and similarity are defined as follows:
Overlap rate: $L = r/n$.
Matching rate: $M = m/n$.
Similarity: $Q = M^2 \times L = \dfrac{m^2}{n^2} \times \dfrac{r}{n}$.
For Table 1 this gives $Q = (3/10)^2 \times (8/10) = 0.072$.
After the massive logs are grouped by similarity, they fall into different groups.
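The Table 1 values can be reproduced with a short sketch. The interpretation used here is an assumption inferred from the numbers: the second string is shifted two positions to the right against the first, which aligns d, a, c (m = 3) and leaves 8 overlapping positions (r = 8). The function names are hypothetical, and taking the highest similarity over all shifts in `best_similarity` is an added assumption that the text does not state. Both strings must be non-empty and of equal length.

```python
def aligned_counts(a, b, shift):
    """Return (m, r) when b is shifted right by `shift` positions against a.

    m counts positions where the aligned characters are identical;
    r is the number of positions where the two strings overlap.
    """
    if len(a) != len(b):
        raise ValueError("strings must have equal length")
    n = len(a)
    if shift >= 0:
        pairs = list(zip(a[shift:], b[: n - shift]))
    else:
        pairs = list(zip(a[: n + shift], b[-shift:]))
    m = sum(1 for x, y in pairs if x == y)
    return m, len(pairs)


def similarity(a, b, shift):
    """Q = M^2 * L with M = m/n and L = r/n, for one alignment."""
    n = len(a)
    m, r = aligned_counts(a, b, shift)
    return (m / n) ** 2 * (r / n)


def best_similarity(a, b):
    """Assumption: take the highest Q over all shifts that leave some overlap."""
    n = len(a)
    return max(similarity(a, b, s) for s in range(-(n - 1), n))


if __name__ == "__main__":
    first, second = "abcddacbcb", "aadaccbddc"
    print(aligned_counts(first, second, 2))   # (3, 8), matching Table 1
    print(similarity(first, second, 2))       # approximately 0.072
```

Squaring the matching rate penalizes alignments with few identical characters more heavily than alignments with a short overlap.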
As can be seen from the above description, the log data processing method provided in the embodiment of the present application can perform data analysis and data cleaning on the log data by obtaining the log data from the set data lake, then classify the log data according to the character string similarity of the log data, determine the corresponding data transmission mode according to the different types of the log data, and send the log data to the system to be tested for system testing, so that the log data can be accurately and conveniently classified and sent to the system to be tested for system testing, and the testing efficiency and accuracy are improved.
In order to reliably store log data and efficiently obtain the log data, in an embodiment of the log data processing method of the present application, the step S101 may further specifically include the following:
acquiring log data from the set data lake in batches according to a set time period and persistently storing the log data in a local memory, wherein the data lake stores, through a distributed file system, the log data generated by the system to be tested during production operation.
Optionally, log data can be acquired from the set data lake in batches according to a set time period and persistently stored in a local memory. The data lake stores, through a distributed file system, the log data generated by the system to be tested during production operation, and storing the log data locally speeds up log access.
In order to pre-process the acquired log data, in an embodiment of the log data processing method of the present application, referring to fig. 2, the step S101 may further specifically include the following steps:
Step S201: analyzing the log data according to a set data analysis format to obtain log data in a standard format.
Step S202: acquiring the log data in the standard format in batches into a data cleaning queue, and cleaning and filtering the log data in the data cleaning queue according to a set data cleaning rule to obtain the analyzed and cleaned log data.
Optionally, the present application can connect to the big data lake through a client and pull log files from the data lake in batches. The pulled logs are persistently stored in the analysis device and read and written in memory to facilitate subsequent analysis and quick access, and the message header and message body of each log are then parsed as JSON according to a standard format.
Optionally, the present application can also clean and filter the parsed structured log data according to certain rules. For example, structured log data is read in batches from the data persistence unit and added to a cleaning queue, the logs are analyzed according to a preset log format, logs that do not match the regular expressions in the user's preset analysis rules are filtered out, and the valid log information is retained.
In order to accurately classify the log data, in an embodiment of the log data processing method of the present application, referring to fig. 3, the step S102 may further include the following steps:
Step S301: determining the character string similarity between two pieces of log data according to the character length, the number of identical characters, the number of overlapping characters, the overlap rate, and the matching rate of character strings of a set length in the two pieces of log data.
Step S302: classifying log data whose character string similarity is greater than a threshold into the same category.
In order to accurately calculate the overlap rate and the matching rate, in an embodiment of the log data processing method of the present application, referring to FIG. 4, the following may be specifically included:
Step S401: determining the overlap rate according to the character length and the number of overlapping characters of the character strings of the set length in the two pieces of log data.
Step S402: determining the matching rate according to the character length and the number of identical characters of the character strings of the set length in the two pieces of log data.
Optionally, when determining character string similarity, the present application may use a matrix-similarity comparison method: two character strings of equal length are given and compared while one is shifted relative to the other, as in Table 1:
Table 1: Character comparison table
String 1: a b c d d a c b c b
String 2: a a d a c c b d d c
n: the length of each string, 10 in this case;
m: the number of identical characters, 3 in this case (d, a, c);
r: the number of overlapping characters of the two strings, 8 in this case.
The overlap rate, matching rate, and similarity are defined as follows:
Overlap rate: $L = r/n$.
Matching rate: $M = m/n$.
Similarity: $Q = M^2 \times L = \dfrac{m^2}{n^2} \times \dfrac{r}{n}$.
For Table 1 this gives $Q = (3/10)^2 \times (8/10) = 0.072$.
After the massive logs are grouped by similarity, they fall into different groups.
In order to accurately distribute log data, in an embodiment of the log data processing method of the present application, referring to fig. 5, step S102 may further specifically include the following:
Step S501: determining a corresponding transmission protocol, transmission server, and thread sending mode for each category of log data.
Step S502: distributing the log data to the system to be tested according to the transmission protocol, the transmission server, and the thread sending mode for system testing.
Optionally, the classified logs may be distributed across multiple servers with multiple concurrent threads. For example, the classified logs are received and placed in a distribution queue, each log is matched to its corresponding protocol through a log template maintained by the user, the matched logs are distributed to a designated log distribution node, and real access requests are then simulated with concurrent threads and sent to the target system to be tested.
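The distribution step might look roughly like the sketch below. Everything named here is hypothetical: the `ROUTES` table stands in for the user-maintained log template, the category names and node names are invented, and `send` is a stub that returns a string instead of issuing a real request.

```python
from concurrent.futures import ThreadPoolExecutor

# Hypothetical user-maintained template: category -> protocol, node, thread count.
ROUTES = {
    "payment": {"protocol": "http", "server": "node-a", "threads": 4},
    "query": {"protocol": "tcp", "server": "node-b", "threads": 2},
}


def send(protocol, server, log):
    """Stub sender; a real one would replay the log as a request over `protocol`."""
    return f"{protocol}://{server} <- {log}"


def distribute(classified_logs, routes=ROUTES, sender=send):
    """Replay each category's logs concurrently using that category's route."""
    results = {}
    for category, logs in classified_logs.items():
        route = routes.get(category)
        if route is None:
            continue  # no template matches this category, so skip it
        with ThreadPoolExecutor(max_workers=route["threads"]) as pool:
            futures = [
                pool.submit(sender, route["protocol"], route["server"], log)
                for log in logs
            ]
            # Collect results in submission order.
            results[category] = [f.result() for f in futures]
    return results
```

Giving each category its own thread count lets categories that need heavier load be replayed with more concurrency than the others.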
In order to accurately and conveniently classify log data and send the log data to a system to be tested for system testing, and improve testing efficiency and accuracy, the application provides an embodiment of a log data processing device for implementing all or part of the contents of the log data processing method, and referring to fig. 6, the log data processing device specifically includes the following contents:
a log preprocessing module 10, configured to acquire log data from a set data lake and perform data analysis and data cleaning on the log data to obtain the analyzed and cleaned log data, wherein the data lake stores the log data generated by the system to be tested during production operation; and
a log classification and distribution module 20, configured to classify the log data according to the character string similarity of the log data, determine a corresponding data transmission mode for each category of log data, and send the log data to the system to be tested for system testing.
As can be seen from the above description, the log data processing device provided in the embodiment of the present application can acquire log data from a set data lake, perform data analysis and data cleaning on it, classify the log data according to its character string similarity, determine a corresponding data transmission mode for each category, and send the log data to the system to be tested for system testing. Log data can thus be classified accurately and conveniently and sent to the system to be tested, improving testing efficiency and accuracy.
In order to reliably store log data and efficiently obtain the log data, in an embodiment of the log data processing apparatus of the present application, referring to FIG. 7, the log preprocessing module 10 includes:
a persistent storage unit 11, configured to acquire log data from the set data lake in batches according to a set time period and persistently store the log data in a local memory, wherein the data lake stores, through a distributed file system, the log data generated by the system to be tested during production operation.
In order to pre-process the acquired log data, in an embodiment of the log data processing apparatus of the present application, referring to FIG. 8, the log preprocessing module 10 further includes:
a log analysis unit 12, configured to analyze the log data according to a set data analysis format to obtain log data in a standard format; and
a log cleaning unit 13, configured to acquire the log data in the standard format in batches into a data cleaning queue, and clean and filter the log data in the data cleaning queue according to a set data cleaning rule to obtain the analyzed and cleaned log data.
In order to accurately classify log data, in an embodiment of the log data processing apparatus of the present application, referring to FIG. 9, the log classification and distribution module 20 includes:
a similarity calculation unit 21, configured to determine the character string similarity between two pieces of log data according to the character length, the number of identical characters, the number of overlapping characters, the overlap rate, and the matching rate of character strings of a set length in the two pieces of log data; and
a log classification unit 22, configured to classify log data whose character string similarity is greater than a threshold into the same category.
In order to accurately calculate the overlap rate and the matching rate, in an embodiment of the log data processing apparatus of the present application, referring to FIG. 10, the log classification and distribution module 20 further includes:
an overlap rate determining unit 23, configured to determine the overlap rate according to the character length and the number of overlapping characters of the character strings of the set length in the two pieces of log data; and
a matching rate determining unit 24, configured to determine the matching rate according to the character length and the number of identical characters of the character strings of the set length in the two pieces of log data.
In order to accurately distribute log data, in an embodiment of the log data processing apparatus of the present application, referring to FIG. 11, the log classification and distribution module 20 further includes:
a transmission determining unit 25, configured to determine a corresponding transmission protocol, transmission server, and thread sending mode for each category of log data; and
a log distribution unit 26, configured to distribute the log data to the system to be tested according to the transmission protocol, the transmission server, and the thread sending mode for system testing.
To further explain the present solution, the present application further provides a specific application example of implementing the log data processing method by using the log data processing apparatus, and refer to fig. 12, which specifically includes the following contents:
The big data storage device 1 stores the structured log information generated during actual production operation. Its data storage unit stores the massive logs as HDFS files; its metadata management unit collects statistics on, and caches, the log storage status to speed up log access; and its data caching unit caches frequently used retrieval information so that the apparatus can access logs efficiently.
The data analysis device 2 reads data from the big data lake and analyzes and cleans it according to a given format. The data pulling unit is deployed in the analysis device and connects to the big data lake through a client to pull log files from the data lake in batches. The data persistence unit persists the pulled logs in the analysis device and reads and writes them in memory, which facilitates subsequent analysis and fast access. The format analysis unit parses the message header and message body of each log according to a standard format using JSON parsing.
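The header and body parsing step can be sketched in Python as follows. This is only an illustration: the field names `header` and `body` and the one-JSON-object-per-line layout are assumptions, since the application does not give the standard format.

```python
import json
from typing import Any, Optional, Tuple


def parse_log_line(line: str) -> Optional[Tuple[dict, Any]]:
    """Parse one JSON log line into (header, body); return None if malformed.

    The "header" and "body" keys are assumed names for the message
    header and message body.
    """
    try:
        record = json.loads(line)
    except json.JSONDecodeError:
        return None
    if not isinstance(record, dict):
        return None
    header = record.get("header")
    body = record.get("body")
    if not isinstance(header, dict) or body is None:
        return None
    return header, body
```

Returning `None` for malformed lines lets a later cleaning stage drop them without interrupting a batch.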
The log cleaning device 3 cleans and filters the analyzed structured logs according to given rules. The log reading unit reads structured log data in batches from the data persistence unit and adds it to a cleaning queue. The analysis rule management unit manages the log analysis rules, so that a user can have logs analyzed against a preset log format. The cleaning execution unit filters out logs that do not match the regular expression of the user's preset analysis rule and keeps the valid log information.
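A minimal Python sketch of the regular-expression filtering performed by a cleaning step of this kind is shown below. The function name `clean_logs` and the use of `search` rather than a full match are assumptions made for illustration.

```python
import re
from typing import Iterable, Iterator


def clean_logs(lines: Iterable[str], rule: str) -> Iterator[str]:
    """Yield only the log lines that match the user-supplied regular expression."""
    pattern = re.compile(rule)  # compile once, reuse for every line
    for line in lines:
        if pattern.search(line):
            yield line
```

For example, with the hypothetical rule `r"^\d{4}-\d{2}-\d{2} "`, only lines that begin with an ISO-style date are kept.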
The log classifying unit 4 classifies each row of logs read from the big data storage device according to the character string similarity of the specific log. The text extracting unit reads the cleaned logs and passes them to the similarity matching unit. The algorithm is as follows:
Set up a category array that stores entries along three dimensions (category, log content, and log occurrence time), and initialize a character string similarity threshold; character strings whose similarity exceeds the threshold belong to one category. To keep frequently recurring abnormal logs from skewing training, a classification interval parameter is introduced: similar logs within one time interval are recorded only once.
For a given category, each row of specific logs is compared, by similarity, with each row of specific logs in the final category array. If its character string similarity with some row in the final category array exceeds the threshold, the two character strings are classified into one category, only the time point of the log being analyzed is stored in that category, and analysis of that row stops. If its similarity with every row in the final category array is below the threshold, a new category has been found: a row is added to the final category array, and the point corresponding to the log's time interval becomes the first time record in the time array of that category.
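The classification loop described above can be sketched in Python as follows. Several details are assumptions: logs are taken to arrive in time order as `(timestamp, text)` pairs, each category keeps one representative text, and the default metric is `difflib.SequenceMatcher` from the standard library, used here only as a stand-in for the similarity measure of the application.

```python
from dataclasses import dataclass, field
from difflib import SequenceMatcher
from typing import Callable, Iterable, List, Tuple


@dataclass
class Category:
    content: str                                       # representative log text
    times: List[float] = field(default_factory=list)   # recorded occurrence times


def default_similarity(a: str, b: str) -> float:
    # Stand-in metric from the standard library, not the formula of the application.
    return SequenceMatcher(None, a, b).ratio()


def classify(
    logs: Iterable[Tuple[float, str]],
    threshold: float,
    interval: float,
    similarity: Callable[[str, str], float] = default_similarity,
) -> List[Category]:
    """Group time-ordered (timestamp, text) logs into categories by similarity."""
    categories: List[Category] = []
    for timestamp, text in logs:
        for cat in categories:
            if similarity(text, cat.content) > threshold:
                # Similar logs inside one interval are recorded only once.
                if timestamp - cat.times[-1] >= interval:
                    cat.times.append(timestamp)
                break  # stop analysing this row once a category matches
        else:
            # No existing category is similar enough: start a new one.
            categories.append(Category(text, [timestamp]))
    return categories
```

Passing the metric as a parameter makes it possible to swap in a different similarity function without changing the loop.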
Similarity is judged with a matrix similarity comparison method: given two character strings of equal length, the strings are compared while one is moved relative to the other. In the example:
n: the length of the character strings, which is 10 in this example;
m: the number of identical characters, which is 3 in this example (d, a, c);
r: the number of overlapping characters of the two strings, which is 8 in this example;
The overlap rate, matching rate, and similarity are defined as follows:
Overlap rate: $L = r/n$.
Matching rate: $M = m/n$.
Similarity: $Q = M^2 \times L = (m^2/n^2) \times (r/n)$.
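With the example values n = 10, m = 3, and r = 8, the similarity formula gives Q = (3/10)^2 × (8/10) = 0.072. The Python sketch below computes the same quantity while sliding one string over the other. Two points are assumptions, since the text does not state them: m is counted as the number of positions in the overlapping region where the characters agree, and the final score is the maximum of Q over all shifts.

```python
def score(n: int, m: int, r: int) -> float:
    """Q = M^2 * L, with matching rate M = m/n and overlap rate L = r/n."""
    return (m / n) ** 2 * (r / n)


def sliding_similarity(a: str, b: str) -> float:
    """Slide b across a and return the best Q over all shifts (assumed rule)."""
    if len(a) != len(b):
        raise ValueError("strings must have equal length")
    n = len(a)
    if n == 0:
        return 0.0
    best = 0.0
    for shift in range(-(n - 1), n):
        r = n - abs(shift)  # number of overlapping characters at this shift
        if shift >= 0:
            pairs = zip(a[shift:], b[:r])
        else:
            pairs = zip(a[:r], b[-shift:])
        m = sum(x == y for x, y in pairs)  # identical characters in the overlap
        best = max(best, score(n, m, r))
    return best
```

Because M is squared, a low matching rate is penalized more strongly than a short overlap; two identical strings score 1 at zero shift.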
After classification by similarity, the log classifying unit 4 places the massive logs into their respective categories.
The automatic distribution device 5 and the automatic distribution device 6 are responsible for distributing multiple classified logs concurrently through a plurality of servers. The log receiving unit receives the classified logs and adds them to a distribution queue. The protocol splicing unit matches each log with its corresponding protocol through a log template maintained by the user, and the matched logs are distributed to designated log distribution nodes. The simulation request unit simulates real access requests and sends them to the target system using concurrent threads.
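The template matching and concurrent sending described above might look like the following Python sketch. The `$name` template syntax, the `send` callback, and the function names are hypothetical; a real sender would issue a network request where this sketch calls `send`.

```python
from concurrent.futures import ThreadPoolExecutor
from string import Template
from typing import Callable, Iterable, List, Mapping


def build_request(template: str, log: Mapping[str, str]) -> str:
    """Fill a user-maintained template with fields taken from one log."""
    return Template(template).substitute(log)


def dispatch(
    logs: Iterable[Mapping[str, str]],
    template: str,
    send: Callable[[str], str],
    workers: int = 4,
) -> List[str]:
    """Build one request per log and send them with concurrent threads."""
    requests = [build_request(template, log) for log in logs]
    with ThreadPoolExecutor(max_workers=workers) as pool:
        # map() returns results in the order the requests were submitted.
        return list(pool.map(send, requests))
```

A thread pool bounds the number of simultaneous requests, so the replay load on the target system can be tuned through `workers`.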
With the method and apparatus of the present application, data need not be classified and labeled manually, and the system can be subjected to playback testing on massive log information with complete coverage and a high degree of simulation. The solution can be used during system development and testing, with a robot automatically completing log sorting and playback, which goes a long way toward safeguarding software security and reduces the potential losses caused by abnormal transactions.
In order to accurately and conveniently classify log data and send it to a system to be tested for system testing at the hardware level, thereby improving testing efficiency and accuracy, the present application provides an embodiment of an electronic device for implementing all or part of the log data processing method. The electronic device specifically includes the following:
A processor, a memory, a communication interface, and a bus. The processor, the memory, and the communication interface communicate with one another through the bus. The communication interface is used for information transmission between the log data processing apparatus and related equipment such as a core service system, a user terminal, and a related database. The logic controller may be a desktop computer, a tablet computer, a mobile terminal, or the like, but this embodiment is not limited thereto. In this embodiment, the logic controller may be implemented with reference to the embodiments of the log data processing method and of the log data processing apparatus described above; their contents are incorporated herein, and repeated descriptions are omitted.
It is understood that the user terminal may include a smart phone, a tablet electronic device, a network set-top box, a portable computer, a desktop computer, a Personal Digital Assistant (PDA), an in-vehicle device, a smart wearable device, and the like. The smart wearable device may include smart glasses, a smart watch, a smart bracelet, and the like.
In practical applications, part of the log data processing method may be executed on the electronic device side as described above, or all operations may be completed in the client device. The choice may be made according to the processing capability of the client device, the limitations of the user's usage scenario, and the like; the present application does not limit this. If all operations are performed in the client device, the client device may further include a processor.
The client device may have a communication module (i.e., a communication unit), and may be communicatively connected to a remote server to implement data transmission with the server. The server may include a server on the task scheduling center side, and in other implementation scenarios, the server may also include a server on an intermediate platform, for example, a server on a third-party server platform that is communicatively linked to the task scheduling center server. The server may include a single computer device, or may include a server cluster formed by a plurality of servers, or a server structure of a distributed apparatus.
Fig. 13 is a schematic block diagram of the system configuration of an electronic device 9600 according to an embodiment of the present application. As shown in fig. 13, the electronic device 9600 can include a central processor 9100 and a memory 9140; the memory 9140 is coupled to the central processor 9100. Notably, fig. 13 is exemplary; other types of structures may also be used in addition to or in place of this structure to implement telecommunications or other functions.
In one embodiment, the functions of the log data processing method may be integrated into the central processor 9100. The central processor 9100 may be configured to perform the following control:
Step S101: obtaining log data from a set data lake, and performing data analysis and data cleaning on the log data to obtain the log data after the data analysis and the data cleaning, wherein the data lake stores the log data generated by the system to be tested during production operation.
Step S102: classifying the log data according to the character string similarity of the log data, determining corresponding data transmission modes according to the different categories of the log data, and sending the log data to the system to be tested for system testing.
As can be seen from the above description, the electronic device provided by the embodiment of the present application obtains log data from a set data lake, analyzes and cleans the log data, classifies the log data according to its character string similarity, determines the corresponding data transmission mode according to the category of the log data, and sends the log data to the system to be tested for system testing. The log data can thus be accurately and conveniently classified and sent to the system to be tested, which improves testing efficiency and accuracy.
In another embodiment, the log data processing apparatus may be configured separately from the central processor 9100, for example, the log data processing apparatus may be configured as a chip connected to the central processor 9100, and the function of the log data processing method may be implemented by the control of the central processor.
As shown in fig. 13, the electronic device 9600 may further include: a communication module 9110, an input unit 9120, an audio processor 9130, a display 9160, and a power supply 9170. It is noted that the electronic device 9600 also does not necessarily include all of the components shown in fig. 13; in addition, the electronic device 9600 may further include components not shown in fig. 13, which can be referred to in the prior art.
As shown in fig. 13, a central processor 9100, sometimes referred to as a controller or operational control, can include a microprocessor or other processor device and/or logic device, which central processor 9100 receives input and controls the operation of the various components of the electronic device 9600.
The memory 9140 can be, for example, one or more of a buffer, a flash memory, a hard drive, a removable medium, a volatile memory, a non-volatile memory, or another suitable device. It may store the failure-related information as well as a program for processing that information. The central processor 9100 can execute the program stored in the memory 9140 to implement information storage, processing, and the like.
The input unit 9120 provides input to the central processor 9100. The input unit 9120 is, for example, a key or a touch input device. Power supply 9170 is used to provide power to electronic device 9600. The display 9160 is used for displaying display objects such as images and characters. The display may be, for example, an LCD display, but is not limited thereto.
The memory 9140 can be a solid state memory, e.g., a Read Only Memory (ROM), a Random Access Memory (RAM), a SIM card, or the like. It may also be a memory that holds information even when power is off, can be selectively erased, and can be provided with more data; an example of such a memory is sometimes called an EPROM. The memory 9140 could also be some other type of device. The memory 9140 includes a buffer memory 9141 (sometimes referred to as a buffer). The memory 9140 may include an application/function storage portion 9142, which stores application programs and function programs, or a flow by which the central processor 9100 executes operations of the electronic device 9600.
The memory 9140 can also include a data store 9143, the data store 9143 being used to store data, such as contacts, digital data, pictures, sounds, and/or any other data used by an electronic device. The driver storage portion 9144 of the memory 9140 may include various drivers for the electronic device for communication functions and/or for performing other functions of the electronic device (e.g., messaging applications, contact book applications, etc.).
The communication module 9110 is a transmitter/receiver 9110 that transmits and receives signals via an antenna 9111. The communication module (transmitter/receiver) 9110 is coupled to the central processor 9100 to provide input signals and receive output signals, which may be the same as in the case of a conventional mobile communication terminal.
Based on different communication technologies, a plurality of communication modules 9110, such as a cellular network module, a bluetooth module, and/or a wireless local area network module, may be provided in the same electronic device. The communication module (transmitter/receiver) 9110 is also coupled to a speaker 9131 and a microphone 9132 via an audio processor 9130 to provide audio output via the speaker 9131 and receive audio input from the microphone 9132, thereby implementing ordinary telecommunications functions. The audio processor 9130 may include any suitable buffers, decoders, amplifiers and so forth. In addition, the audio processor 9130 is also coupled to the central processor 9100, thereby enabling recording locally through the microphone 9132 and enabling locally stored sounds to be played through the speaker 9131.
An embodiment of the present application further provides a computer-readable storage medium capable of implementing all the steps of the log data processing method of the foregoing embodiments whose execution subject is a server or a client. The computer-readable storage medium stores a computer program which, when executed by a processor, implements all of those steps; for example, the processor implements the following steps when executing the computer program:
Step S101: obtaining log data from a set data lake, and performing data analysis and data cleaning on the log data to obtain the log data after the data analysis and the data cleaning, wherein the data lake stores the log data generated by the system to be tested during production operation.
Step S102: classifying the log data according to the character string similarity of the log data, determining corresponding data transmission modes according to the different categories of the log data, and sending the log data to the system to be tested for system testing.
As can be seen from the above description, the computer-readable storage medium provided in the embodiment of the present application obtains log data from a set data lake, performs data analysis and data cleaning on the log data, classifies the log data according to the similarity of character strings of the log data, determines a corresponding data transmission mode according to different classes of the log data, and sends the log data to a system to be tested for system testing, so that the log data can be accurately and conveniently classified and sent to the system to be tested for system testing, and the testing efficiency and accuracy are improved.
An embodiment of the present application further provides a computer program product capable of implementing all the steps of the log data processing method of the foregoing embodiments whose execution subject is a server or a client. The computer program/instructions, when executed by a processor, implement the steps of the log data processing method; for example, the computer program/instructions implement the following steps:
Step S101: obtaining log data from a set data lake, and performing data analysis and data cleaning on the log data to obtain the log data after the data analysis and the data cleaning, wherein the data lake stores the log data generated by the system to be tested during production operation.
Step S102: classifying the log data according to the character string similarity of the log data, determining corresponding data transmission modes according to the different categories of the log data, and sending the log data to the system to be tested for system testing.
As can be seen from the above description, the computer program product provided in the embodiment of the present application obtains log data from a set data lake, analyzes and cleans the log data, classifies the log data according to the string similarity of the log data, determines a corresponding data transmission mode according to different classes of the log data, and sends the log data to a system to be tested for system testing, so that the log data can be accurately and conveniently classified and sent to the system to be tested for system testing, and the testing efficiency and accuracy are improved.
As will be appreciated by one skilled in the art, embodiments of the present invention may be provided as a method, apparatus, or computer program product. Accordingly, the present invention may take the form of an entirely hardware embodiment, an entirely software embodiment or an embodiment combining software and hardware aspects. Furthermore, the present invention may take the form of a computer program product embodied on one or more computer-usable storage media (including, but not limited to, disk storage, CD-ROM, optical storage, and the like) having computer-usable program code embodied therein.
The present invention is described with reference to flowchart illustrations and/or block diagrams of methods, apparatus (devices), and computer program products according to embodiments of the invention. It will be understood that each flow and/or block of the flow diagrams and/or block diagrams, and combinations of flows and/or blocks in the flow diagrams and/or block diagrams, can be implemented by computer program instructions. These computer program instructions may be provided to a processor of a general purpose computer, special purpose computer, embedded processor, or other programmable data processing apparatus to produce a machine, such that the instructions, which execute via the processor of the computer or other programmable data processing apparatus, create means for implementing the functions specified in the flowchart flow or flows and/or block diagram block or blocks.
These computer program instructions may also be stored in a computer-readable memory that can direct a computer or other programmable data processing apparatus to function in a particular manner, such that the instructions stored in the computer-readable memory produce an article of manufacture including instruction means which implement the function specified in the flowchart flow or flows and/or block diagram block or blocks.
These computer program instructions may also be loaded onto a computer or other programmable data processing apparatus to cause a series of operational steps to be performed on the computer or other programmable apparatus to produce a computer implemented process such that the instructions which execute on the computer or other programmable apparatus provide steps for implementing the functions specified in the flowchart flow or flows and/or block diagram block or blocks.
Specific embodiments have been used herein to explain the principle and implementation of the invention, and the description of these embodiments is intended only to help in understanding the method and core idea of the invention. A person skilled in the art may, following the idea of the invention, vary the specific embodiments and the scope of application. In summary, the content of this specification should not be construed as limiting the invention.

Claims (10)

1. A method of processing log data, the method comprising:
acquiring log data from a set data lake, and performing data analysis and data cleaning on the log data to obtain the log data after the data analysis and the data cleaning, wherein the log data of the tested system during production and operation are stored in the data lake;
and classifying the log data according to the character string similarity of the log data, determining corresponding data transmission modes according to different classes of the log data, and sending the log data to a system to be tested for system testing.
2. The log data processing method of claim 1, wherein the obtaining log data from a set data lake comprises:
and obtaining log data from a set data lake in batches according to a set time period and storing the log data into a local memory in a persistent mode, wherein the data lake stores the log data of the tested system during production and operation through a distributed file system.
3. The log data processing method of claim 1, wherein the performing data analysis and data cleaning on the log data to obtain the log data after the data analysis and data cleaning comprises:
analyzing the log data according to a set data analysis format to obtain log data in a standard format;
obtaining log data in a standard format in batches to a data cleaning queue, cleaning and filtering the log data in the data cleaning queue according to a set data cleaning rule, and obtaining the log data after data analysis and data cleaning.
4. The method according to claim 1, wherein the classifying the log data according to the string similarity of the log data comprises:
determining the similarity of character strings between two log data according to the character length, the same character number, the overlapped character number, the overlapping rate and the matching rate of the character strings with set lengths in the two log data;
and classifying the log data with the character string similarity larger than a threshold value into the same category.
5. The log data processing method according to claim 4, wherein before determining the similarity of the character strings between the log data according to the character length, the number of identical characters, the number of overlapping characters, the overlapping ratio, and the matching ratio of the character strings having the set length in the log data, the method comprises:
determining the overlapping rate according to the character length and the number of overlapping characters of the character string with the set length in the two log data;
and determining the matching rate according to the character length and the same number of characters of the character string with the set length in the two log data.
6. The log data processing method according to claim 1, wherein the determining corresponding data transmission modes according to different classes of the log data and sending the log data to a system to be tested for system testing comprises:
determining a corresponding transmission protocol, a transmission server and a thread sending mode according to different types of the log data;
and distributing the log data to a system to be tested according to the transmission protocol, the transmission server and the thread sending mode to carry out system test.
7. A log data processing apparatus characterized by comprising:
the log preprocessing module is used for acquiring log data from a set data lake, and performing data analysis and data cleaning on the log data to obtain the log data subjected to the data analysis and the data cleaning, wherein the log data of the tested system during production and operation are stored in the data lake;
and the log classification and distribution module is used for classifying the log data according to the character string similarity of the log data, determining a corresponding data transmission mode according to different classes of the log data and sending the log data to a system to be tested for system testing.
8. An electronic device comprising a memory, a processor and a computer program stored on the memory and executable on the processor, characterized in that the steps of the log data processing method of any one of claims 1 to 6 are implemented when the processor executes the program.
9. A computer-readable storage medium, on which a computer program is stored, which, when being executed by a processor, carries out the steps of the log data processing method of any one of claims 1 to 6.
10. A computer program product comprising computer programs/instructions, characterized in that the computer programs/instructions, when executed by a processor, implement the steps of the log data processing method of any of claims 1 to 6.
CN202210457673.4A 2022-04-28 2022-04-28 Log data processing method and device Pending CN114840421A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202210457673.4A CN114840421A (en) 2022-04-28 2022-04-28 Log data processing method and device

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202210457673.4A CN114840421A (en) 2022-04-28 2022-04-28 Log data processing method and device

Publications (1)

Publication Number Publication Date
CN114840421A true CN114840421A (en) 2022-08-02

Family

ID=82567650

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202210457673.4A Pending CN114840421A (en) 2022-04-28 2022-04-28 Log data processing method and device

Country Status (1)

Country Link
CN (1) CN114840421A (en)

Cited By (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN116303322A (en) * 2023-05-19 2023-06-23 北京长亭科技有限公司 Declaration type log generalization method and device

Cited By (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN116303322A (en) * 2023-05-19 2023-06-23 北京长亭科技有限公司 Declaration type log generalization method and device
CN116303322B (en) * 2023-05-19 2023-08-11 北京长亭科技有限公司 Declaration type log generalization method and device


Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination